Sorting

Jed Rembold

November 24, 2025

Announcements

  • Be working on Infinite Adventure!
  • Graphics Contest entries are due by midnight tonight!
  • Advent of Code begins next Monday!
    • Can join my leaderboard using code 3198345-61d515c2
    • I’ll give you some class participation points for each day’s puzzles that you complete
      • There are 12 days of puzzles
  • Polling: polling.jedrembold.prof

Review Question

You want to look for the term "fish" in the list below. What search method would prove fastest in this specific case?

  1. Linear Search
  2. Binary Search
  3. Both would be equal
list_to_search = [
    "onions",
    "puppies",
    "fish",
    "donkey",
    "goats",
    "carrots",
    "lasagna",
    "sheep",
    "bears",
    "beets",
    "battlestar galactica"
]

The Sorting Hat

Sorting

  • Binary search only works on arrays in which the elements are ordered.
    • The process of putting the elements into order is called sorting.
  • Lots of different sorting algorithms, which can vary substantially in their efficiency.
  • From an algorithms view, sorting is probably the most applicable algorithm we’ll discuss in this course
    • Organizing data makes it easier to digest that data, whether the data is being digested by other machines or by humans
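As a minimal sketch of why ordering matters, the snippet below sorts a small list with Python's built-in `sorted` and then binary searches it using the standard library's `bisect_left`. The `binary_search` helper and the `animals` sample data are hypothetical names chosen for illustration, not part of the lecture code.

```python
from bisect import bisect_left

def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    i = bisect_left(sorted_items, target)  # leftmost position where target could go
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

animals = ["sheep", "goats", "bears", "donkey"]  # hypothetical sample data
ordered = sorted(animals)                        # sorting comes first
print(ordered)
print(binary_search(ordered, "goats"))
print(binary_search(ordered, "fish"))
```

The search is only trustworthy because `ordered` is sorted; handing `bisect_left` the original `animals` list would give no such guarantee.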

Selection Sort

  • The easiest sorting algorithm to explain is selection sort

  • Imagine your left hand keeping track of where you are in the array, while your right hand scans the rest of the array to find the next smallest value to move to that location on each iteration

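    def selection_sort(array):
        # lh ("left hand") marks the position being filled on this pass
        for lh in range(len(array)):
            # rh ("right hand") tracks the smallest value found so far
            rh = lh
            for i in range(lh+1, len(array)):
                if array[i] < array[rh]:
                    rh = i
            # Swap the smallest remaining value into position lh
            array[lh], array[rh] = array[rh], array[lh]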

Following Selection Sort

def selection_sort(array):
    for lh in range(len(array)):
        rh = lh
        for i in range(lh+1, len(array)):
            if array[i] < array[rh]:
                rh = i
        array[lh], array[rh] = array[rh], array[lh]
L = [31, 41, 59, 26, 53, 58, 97, 93, 23, 84]
selection_sort(L)
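
One way to follow the algorithm by hand is to record the list after every pass of the outer loop. The sketch below is the same selection sort with a hypothetical `snapshots` list added for tracing; the function name `selection_sort_trace` is an assumption made for this example.

```python
def selection_sort_trace(array):
    """Selection sort that also returns a copy of the list after each pass."""
    snapshots = []
    for lh in range(len(array)):
        rh = lh
        for i in range(lh + 1, len(array)):
            if array[i] < array[rh]:
                rh = i
        array[lh], array[rh] = array[rh], array[lh]
        snapshots.append(list(array))  # copy, so later swaps do not alter it
    return snapshots

L = [31, 41, 59, 26, 53, 58, 97, 93, 23, 84]
for lh, state in enumerate(selection_sort_trace(L)):
    print(lh, state)
```

On the first pass the smallest value, 23, trades places with 31, so the first line printed is `0 [23, 41, 59, 26, 53, 58, 97, 93, 31, 84]`.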

Selection Sort Efficiency

  • One way to investigate the efficiency is to time how long sorting takes for arrays of different sizes
  • For one particular laptop, those times might look like:
Array Size    Running Time (sec)    Increase of
10            0.000013
100           0.000581              44x
1,000         0.0578                99x
10,000        5.738                 99x
100,000       574.2                 100x
1,000,000     57395                 100x
  • The time to sort 1 million entries is just under 16 hours!
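
Timings like these could be gathered with a sketch along the following lines, which uses `time.perf_counter` from the standard library. The `time_sort` helper and the choice of random integers as test data are assumptions, and the measured values will differ from machine to machine, so no expected output is shown.

```python
import random
import time

def selection_sort(array):
    for lh in range(len(array)):
        rh = lh
        for i in range(lh + 1, len(array)):
            if array[i] < array[rh]:
                rh = i
        array[lh], array[rh] = array[rh], array[lh]

def time_sort(size):
    """Return the seconds taken to selection-sort `size` random integers."""
    data = [random.randint(0, size) for _ in range(size)]
    start = time.perf_counter()
    selection_sort(data)
    return time.perf_counter() - start

# Sizes kept small here; each tenfold increase costs roughly 100x the time
for size in (10, 100, 1000):
    print(f"{size:>6,}  {time_sort(size):.6f} s")
```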

A Quicker Estimation

  • Alternatively, we can estimate the efficiency by counting up how many times the most frequent operation is executed
    • The idea being that all the basic Python operations take around the same amount of time to do
  • In selection sort that is the inner for loop comparison
    • Checks 10 (\(N\)) values the first time through
    • Checks 9 (\(N-1\)) values the second time through
    • Checks 8 (\(N-2\)) values the third time through, etc
  • Can simplify with some series math \[\displaystyle N + (N-1) + (N-2) + (N-3) + \cdots = \sum_{i=1}^N i = \frac{N\times(N+1)}{2}\]
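
The count can also be checked empirically. The sketch below adds a hypothetical `comparisons` counter to selection sort and tallies every `array[i] < array[rh]` test. Counted this strictly, each pass performs one fewer test than the tally above (the starting value of a pass is never compared with itself), giving \(N(N-1)/2\) rather than \(N(N+1)/2\); both grow in the same way as \(N\) increases.

```python
def count_comparisons(array):
    """Selection-sort `array` in place and return how many `<` tests ran."""
    comparisons = 0
    for lh in range(len(array)):
        rh = lh
        for i in range(lh + 1, len(array)):
            comparisons += 1  # one test of array[i] < array[rh]
            if array[i] < array[rh]:
                rh = i
        array[lh], array[rh] = array[rh], array[lh]
    return comparisons

for n in (10, 100, 1000):
    # A reversed list is used as sample input; the count depends only on n
    print(n, count_comparisons(list(range(n, 0, -1))))
```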

Quadratic Growth

  • Like with the run times, we can compare how this value scales with increasing \(N\) values in the table below
  • Tracks closely with runtimes
  • Multiplying things out, for large \(N\) our efficiency scales as \[\frac{1}{2}(N^2 + N) \approx \frac{1}{2}N^2 \propto N^2\]
\(N\)       \(\tfrac{N\times(N+1)}{2}\)    Increase of
10          55
100         5,050                          92x
1,000       500,500                        99x
10,000      50,005,000                     100x
100,000     5,000,050,000                  100x
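
The table values can be reproduced directly from the formula. This is a short sketch; the function name `checks` is an assumption made for the example.

```python
def checks(n):
    """Evaluate N(N+1)/2 for a list of n elements."""
    return n * (n + 1) // 2

previous = None
for n in (10, 100, 1_000, 10_000, 100_000):
    current = checks(n)
    # The growth factor is only defined from the second row onward
    growth = "" if previous is None else f"{current / previous:.0f}x"
    print(f"{n:>7,} {current:>13,} {growth}")
    previous = current
```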

O

Big-O Notation

  • The common way to express computational complexity is to use big-O notation, introduced by German mathematician Paul Bachmann in 1892
  • Big-O notation consists of the letter \(\mathcal{O}\) followed by a formula that offers a qualitative assessment of the program running time as a function of the problem size (\(N\))
  • The complexity of:
    • linear search was \(\mathcal{O}(N)\)
    • selection sort was \(\mathcal{O}(N^2)\)
  • Read aloud, these would be “big-O of \(N\)” or “big-O of \(N^2\)”

Simplifying Big-O

  • Big-O just gives a qualitative estimate, so it makes sense to keep the expression on the inside as simple as possible
  • When writing big-O expressions, make the following simplifications:
    • Eliminate any constant factors
    • Eliminate any term whose contribution ceases to be significant when \(N\) becomes large
  • Thus the computational complexity of selection sort is \[\mathcal{O}(N^2)\qquad\text{and}\,\,\text{NOT}\qquad\mathcal{O}\left(\frac{N^2 + N}{2}\right)\]
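
To see why both simplifications are safe, the sketch below compares the full expression \((N^2 + N)/2\) against \(N^2\) alone. The helper name `exact_count` is an assumption for illustration.

```python
def exact_count(n):
    """The unsimplified expression (N^2 + N) / 2."""
    return (n * n + n) / 2

for n in (10, 1_000, 100_000):
    ratio = exact_count(n) / n**2
    print(f"N = {n:>7,}   (N^2 + N)/2 divided by N^2 = {ratio:.6f}")
```

The ratio settles toward the constant 1/2 as \(N\) grows: the \(N\) term stops mattering, and the leftover factor of 1/2 does not change how the running time scales.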

Deducing complexity from code

  • Can often get a feeling for complexity just by looking at the code structure
  • Find the section of code that seems to be executed the most
  • How many times does that piece of code get executed as a function of the problem size \(N\)?
  • Loops are often key!
    • Code in a single loop that iterates \(N\) times gets executed \(N\) times
    • Code in a pair of nested loops that each iterate \(N\) times gets executed \(N^2\) times

Understanding Check

What would be the big-O complexity of the below function?

def func(array):
    n = 0
    while n < len(array):
        for i in range(n):
            array[i] = array[n]
        n += 2
  1. \(\mathcal{O}(N)\)
  2. \(\mathcal{O}(N^2)\)
  3. \(\mathcal{O}(N^{1/2})\)
  4. \(\mathcal{O}(\frac{N^2}{2})\)

A More Efficient Strategy

  • As long as arrays are small, selection sort is perfectly workable!
    • Even 10,000 elements get sorted in under 6 seconds
  • Less attractive to commercial applications with huge arrays though
    • Sorting a million values takes nearly 16 hours?!
  • The \(\mathcal{O}(N^2)\) complexity does offer a little hope though
    • Sorting twice as many elements takes four times as long = BAD
    • But sorting half as many elements takes only a quarter the time = GOOD!
    • Can we break the array into smaller pieces and just work with those?
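
A rough sketch of the potential savings, using the \(N(N+1)/2\) estimate from earlier. The `checks` helper is a hypothetical name, and the estimate deliberately ignores whatever extra work would be needed to combine the two sorted halves.

```python
def checks(n):
    """Estimated selection sort work for n elements: N(N+1)/2."""
    return n * (n + 1) // 2

n = 1_000
whole = checks(n)            # sorting all 1,000 elements at once
halves = 2 * checks(n // 2)  # sorting two 500-element halves separately

print(whole, halves, halves / whole)
```

Each half costs about a quarter of the full sort, so sorting both halves costs roughly half as much as sorting the whole array in one go.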