Trees

Jed Rembold

November 20, 2020

Announcements

Family Trees

  • Family trees are common examples of a tree structure
  • Useful for defining terminology
    • William I is the root of the tree
    • Adela is a child of William I and the parent of Stephen
    • Robert, William II, Adela, and Henry are siblings
    • Henry II is a descendant of William I, Henry I, and Matilda
    • William I is an ancestor of everyone else in this tree
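
The terminology above can be made concrete with a small Python sketch. The names come from this slide, but the exact parent links are partly assumed: the bullets only say that Henry II descends from Henry I and Matilda, so the chain Henry I, Matilda, Henry II below is an assumption, as is reading "Henry" in the sibling list as Henry I. The helper names `ancestors` and `siblings` are hypothetical.

```python
# Each person maps to their parent; the root (William I) has no parent.
# The Henry I -> Matilda -> Henry II chain is assumed from the bullets above.
PARENT = {
    "William I": None,
    "Robert": "William I",
    "William II": "William I",
    "Adela": "William I",
    "Henry I": "William I",
    "Stephen": "Adela",
    "Matilda": "Henry I",
    "Henry II": "Matilda",
}


def ancestors(person):
    """Return the chain of ancestors from the parent up to the root."""
    chain = []
    parent = PARENT[person]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def siblings(person):
    """Return everyone else who shares this person's parent."""
    parent = PARENT[person]
    if parent is None:
        return []  # the root has no siblings in this tree
    return [p for p, par in PARENT.items() if par == parent and p != person]


print(ancestors("Henry II"))  # ['Matilda', 'Henry I', 'William I']
print(siblings("Adela"))      # ['Robert', 'William II', 'Henry I']
```

Storing only the parent of each node is enough to answer ancestor and sibling questions; a descendant lookup would be easier with a child list per node instead.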

Binary Search Trees

  • Trees can be used to implement dictionaries using a structure called a binary search tree (or BST)
  • Each node in a BST has exactly two subtrees:
    • A left subtree that contains all the nodes that come before the current node
    • A right subtree that contains all the nodes that come after it
  • The classic example of a binary search tree uses the names of the seven dwarves:
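
A minimal Python sketch of such a tree, assuming the names are ordered alphabetically and that Grumpy (the middle name alphabetically) sits at the root. The `Node` class, the `insert` and `contains` helpers, and the insertion order are all assumptions made for illustration, not necessarily how the course implements a BST.

```python
class Node:
    """One node of a binary search tree holding a single key."""

    def __init__(self, key):
        self.key = key
        self.left = None   # subtree of keys that come before self.key
        self.right = None  # subtree of keys that come after self.key


def insert(node, key):
    """Insert key into the subtree rooted at node and return the subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    return node  # duplicate keys are ignored


def contains(node, key):
    """Return True if key appears in the subtree rooted at node."""
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


# Inserting Grumpy first makes it the root, with three names on each side.
root = None
for name in ["Grumpy", "Doc", "Sleepy", "Bashful", "Dopey", "Happy", "Sneezy"]:
    root = insert(root, name)

print(root.key, root.left.key, root.right.key)  # Grumpy Doc Sleepy
print(contains(root, "Happy"), contains(root, "Snow White"))  # True False
```

Each comparison in `contains` discards one whole subtree, which is what makes lookups fast when the tree is balanced.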

Balance Your Trees

  • Ideally, a binary search tree would look similar to our last picture
  • If you placed a different dwarf at the root, then the tree would end up unbalanced
    • If you placed Bashful at the root, the entire left subtree would be empty!
  • Balancing BSTs is important for finding the desired values quickly and efficiently
  • If you go on to take CS 343 (Analysis of Algorithms), you will learn several strategies for maintaining balanced binary search trees
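
To see why balance matters, the sketch below builds two trees from the same seven names and compares their heights. It uses plain insertion with no rebalancing; the nested-list representation `[left, key, right]` and the helper names are assumptions chosen to keep the example short.

```python
def insert(tree, key):
    """Insert key into a BST stored as nested [left, key, right] lists."""
    if tree is None:
        return [None, key, None]
    if key < tree[1]:
        tree[0] = insert(tree[0], key)
    elif key > tree[1]:
        tree[2] = insert(tree[2], key)
    return tree


def height(tree):
    """Number of nodes on the longest path from the root down to a leaf."""
    if tree is None:
        return 0
    return 1 + max(height(tree[0]), height(tree[2]))


def build(names):
    """Insert the names one at a time, in the order given."""
    tree = None
    for name in names:
        tree = insert(tree, name)
    return tree


balanced = build(["Grumpy", "Doc", "Sleepy", "Bashful", "Dopey", "Happy", "Sneezy"])
# Alphabetical insertion puts Bashful at the root and every later name to the right.
lopsided = build(["Bashful", "Doc", "Dopey", "Grumpy", "Happy", "Sleepy", "Sneezy"])

print(height(balanced))  # 3
print(height(lopsided))  # 7
print(lopsided[0])       # None: Bashful's left subtree is empty
```

In the lopsided tree a search for Sneezy visits all seven nodes, so the tree behaves no better than a list.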

Graphs

  • Trees are more flexible than lists, but we can get more flexible yet
  • Trees still require a single root, and no cycles.
  • If we eliminate those restrictions, we get a more general structure called a graph
  • Graphs consist of a set of nodes connected in various relationships by links called arcs
  • Graphs are frequently used in many types of practical applications
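
One common way to store a graph in Python is a dictionary that maps each node to the nodes its arcs point to. The sketch below is illustrative only: the node names and arcs are made up, and `reachable` is a hypothetical helper.

```python
# Each node maps to the list of nodes its outgoing arcs point to.
# The node names and arcs are made up for illustration.
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],   # C -> A closes the cycle A -> C -> A, which a tree forbids
    "D": ["C"],   # C is now pointed to by A, B, and D, not by a single parent
}


def reachable(graph, start):
    """Return the set of nodes reachable from start by following arcs."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:  # the seen set stops cycles from looping forever
            seen.add(node)
            stack.extend(graph[node])
    return seen


print(sorted(reachable(graph, "A")))  # ['A', 'B', 'C']
print(sorted(reachable(graph, "D")))  # ['A', 'B', 'C', 'D']
```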

Graph Examples

 

Google It

  • The big innovation of the late 1990s was the development of search engines
    • Beginning with Alta Vista
    • Reaching its modern pinnacle with Google
  • Google was founded by Stanford graduate students Larry Page and Sergey Brin in 1998
  • The heart of the Google search engine is the Page Rank algorithm, designed by Page and Brin under the direction of their advisors, Rajeev Motwani and Terry Winograd
Larry Page and Sergey Brin

The Page Rank Algorithm

  • Gives each webpage a rating based on its importance, wherein a page becomes more important if other pages link to it
    • If a page that links to the page in question is itself important, this boosts the weight of that link
  • Imagine a random person surfing the web, clicking on links randomly
    • The Page Rank of a page is roughly the probability that the person will land on that page
    • More links pointing to a page mean that the individual is more likely to end up on that page, and thus that page has greater importance and a higher Page Rank
  • The behavior of the web surfer is an example of a Markov process
    • A random process that only depends on the current state of the system and not any of its history
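
The random surfer can be simulated directly. The sketch below assumes a made-up three-page web with no dead ends and a hypothetical helper `random_surf`; it counts how often each page is visited over many clicks.

```python
import random

# Hypothetical three-page web: each page maps to the pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}


def random_surf(links, start, steps, seed=0):
    """Follow random links for `steps` clicks and count visits to each page."""
    rng = random.Random(seed)  # seeded so repeated runs give the same walk
    visits = {page: 0 for page in links}
    page = start
    for _ in range(steps):
        # The next page depends only on the current page (the Markov property).
        page = rng.choice(links[page])
        visits[page] += 1
    return visits


steps = 10_000
visits = random_surf(links, "A", steps)
for page, count in visits.items():
    print(page, count / steps)
```

For this particular graph the long-run visit fractions work out to 0.4 for A, 0.2 for B, and 0.4 for C, so the simulated fractions should land near those values. Note that A has only one incoming link, but it comes from the heavily visited page C, which is the "important pages boost the weight of their links" effect described above.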

Walking the Algorithm

You start with a set of pages

Walking the Algorithm

Crawl the web to determine the link structure

Walking the Algorithm

Assign each page an initial rank of \(1/N\)

Walking the Algorithm

Update the rank of each page by adding up the rank of every page that links to it divided by the number of links emanating from the referring page.

  • Node E has two incoming links, one from C and one from D
    • Node C contributes 1/3 of its current rank
    • Node D contributes 1/2 of its current rank
  • New rank for Node E is: \[PR(E) = \frac{PR(C)}{3} + \frac{PR(D)}{2} = \frac{0.2}{3} + \frac{0.2}{2} \approx 0.17\]
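
The same calculation as a Python sketch. The link structure is hypothetical: it is chosen so that C has three outgoing links, D has two, and both link to E, matching the numbers above, but the remaining arcs are made up. The function name `incoming_rank` is also an assumption.

```python
# Hypothetical link structure chosen to match the slide: C has three
# outgoing links, D has two, and both link to E. The other arcs are made up.
links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["A", "D", "E"],
    "D": ["B", "E"],
    "E": [],
}

# Every page starts with rank 1/N.
ranks = {page: 1 / len(links) for page in links}


def incoming_rank(page, links, ranks):
    """Sum rank / (number of outgoing links) over every page that links to `page`."""
    total = 0.0
    for source, targets in links.items():
        if page in targets:
            total += ranks[source] / len(targets)
    return total


print(round(incoming_rank("E", links, ranks), 2))  # 0.17
```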

Walking the Algorithm

If a page (like Node E) has no outward links, redistribute its rank equally among the other pages in the graph

  • Here, 1/4 of E’s page rank is distributed to each of A, B, C, and D
  • The idea is that users will keep searching if they hit a dead end
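
A sketch of the dead-end step in Python, assuming a hypothetical five-page graph in which only E has no outward links and every page currently has rank 0.2. The function name `redistribute_dead_ends` is made up for illustration.

```python
def redistribute_dead_ends(links, ranks):
    """Return the extra rank each page receives from pages with no outward links."""
    extra = {page: 0.0 for page in links}
    for source, targets in links.items():
        if not targets:  # dead end: no outward links
            share = ranks[source] / (len(links) - 1)
            for page in links:
                if page != source:
                    extra[page] += share
    return extra


# Hypothetical five-page graph in which only E is a dead end.
links = {"A": ["B", "C"], "B": ["D"], "C": ["A", "D", "E"], "D": ["B", "E"], "E": []}
ranks = {page: 0.2 for page in links}

extra = redistribute_dead_ends(links, ranks)
print(extra["A"], extra["E"])  # 0.05 0.0
```

Each of the four other pages receives 0.2 / 4 = 0.05 from E, and E gives nothing to itself.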

Walking the Algorithm

Apply this redistribution to every page in the graph

Walking the Algorithm

Repeat process until ranks stabilize
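
Putting the steps together, the whole walk-through can be sketched as a short loop. This follows the simplified procedure on these slides (no damping factor, which the published algorithm also uses) and assumes at least two pages, no duplicate links, and a hypothetical graph; the function name, `tolerance`, and `max_rounds` are illustrative choices.

```python
def page_rank(links, tolerance=1e-9, max_rounds=1000):
    """Repeat the rank update until no rank moves by more than tolerance."""
    pages = list(links)
    n = len(pages)
    ranks = {page: 1 / n for page in pages}  # initial rank of 1/N
    for _ in range(max_rounds):
        new_ranks = {page: 0.0 for page in pages}
        for source, targets in links.items():
            if targets:
                # Split this page's rank evenly across its outgoing links.
                for target in targets:
                    new_ranks[target] += ranks[source] / len(targets)
            else:
                # Dead end: share the rank equally among the other pages.
                for page in pages:
                    if page != source:
                        new_ranks[page] += ranks[source] / (n - 1)
        change = max(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if change < tolerance:
            break  # the ranks have stabilized
    return ranks


# Hypothetical five-page graph; only E has no outward links.
links = {"A": ["B", "C"], "B": ["D"], "C": ["A", "D", "E"], "D": ["B", "E"], "E": []}
ranks = page_rank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Every round hands out exactly as much rank as it collects, so the ranks always sum to 1 and can be read as probabilities.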

Enter Player 2

  • A challenge for any search engine is ensuring that commercial interests cannot “game” the system
  • Page Rank makes this difficult, since the ranking depends on the prestige of other web pages, which are normally outside the control of those looking to manipulate the system
  • To ensure that rankings remain fair, Google keeps the details of the ranking algorithms secret and changes them often to outwit those trying to cheat the system

Actually Searching

  • For each page, along with its page rank, Google indexes the words that appear on that page
  • When you search for a word, Google checks which pages contain that word, and then returns them for you sorted by page rank
  • Fancy combinations like pairs of words can be determined by looking for pages where the words appear at successive indices on the page!
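
The indexing idea can be sketched in a few lines of Python. This is a toy illustration, not Google's implementation: the page names, their text, and their page ranks are made up, and `build_index`, `search`, and `search_pair` are hypothetical helpers.

```python
# Hypothetical pages and page ranks, made up for illustration.
pages = {
    "trees.html": "a binary search tree keeps keys in order",
    "graphs.html": "a graph is more general than a tree",
    "search.html": "a search engine ranks every page",
}
page_rank = {"trees.html": 0.2, "graphs.html": 0.5, "search.html": 0.3}


def build_index(pages):
    """Map each word to {page: [positions where the word appears]}."""
    index = {}
    for page, text in pages.items():
        for position, word in enumerate(text.split()):
            index.setdefault(word, {}).setdefault(page, []).append(position)
    return index


def search(index, word):
    """Return the pages containing word, highest page rank first."""
    hits = index.get(word, {})
    return sorted(hits, key=lambda page: page_rank[page], reverse=True)


def search_pair(index, first, second):
    """Return pages where second appears immediately after first."""
    hits = []
    for page, positions in index.get(first, {}).items():
        following = index.get(second, {}).get(page, [])
        if any(p + 1 in following for p in positions):
            hits.append(page)
    return sorted(hits, key=lambda page: page_rank[page], reverse=True)


index = build_index(pages)
print(search(index, "tree"))                 # ['graphs.html', 'trees.html']
print(search(index, "search"))               # ['search.html', 'trees.html']
print(search_pair(index, "search", "tree"))  # ['trees.html']
```

Storing word positions rather than just page names is what makes the pair lookup possible: "search" at index 2 and "tree" at index 3 on the same page are successive.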