---
title: "Foreign Subqueries about Pivots"
author: Jed Rembold
date: March 19, 2025
slideNumber: true
theme: nord_light
highlightjs-theme: nord
width: 1920
height: 1080
transition: slide
---


## Announcements
- Homework
  - HW7 is due tonight
  - HW8 is coming out tonight, due Wed after break
  - I'm spending part of my break getting caught up
- Today is going to be mostly SQL focused
- Projects
  - Is your scraper up and running? Ensure that it is before Spring Break starts!
  - Is your automatic cleaner/transformer up and running? Try to have that done before we meet next
  - We'll be talking the serving components after break

# Database Connections
## Connecting Databases
- Postgres has a few ways you can access tables from one database while in another
- Both are extensions that you would **need to activate**
  - dblink: Simple for one-off data access
  - postgres_fdw: Better if you need a more permanent connection
- Both can be useful, so I'll try to touch on each


## The Connection
- Accessing data using `dblink` is a two-step process: connect and then query
- To connect, use `dblink_connect` (shocking!)
  ```pgsql
  SELECT dblink_connect(|||connection string|||)
  ```
- The connection string can either be a full database url, or a space delimited sequence of keywords, e.g.:
  ```pgsql
  'host=localhost user=postgres password=1234'
  ```

## Querying
- Once the connection has been made (within a session) you can send across queries
  ```pgsql
  SELECT * FROM dblink(|||query string|||)
  ```
- `dblink` is essentially blind, in that it has no idea what would be returned by the query
  - As such, similar to creating a pivot table (coming soon!), you **must** supply a table alias **and** columns/types at the end
  ```pgsql
  SELECT * 
  FROM dblink(|||query string|||)
  AS |||table alias||| ( |||col₁||| |||type₁|||, |||col₂||| |||type₂|||, |||etc.|||);
  ```

## Executing
- If you have a command to run remotely, and don't need a table back, you can use `dblink_exec`
- This may be useful for creating, updating or deleting data from a remote table.
  ```pgsql
  SELECT dblink_exec(
    'CREATE TABLE test (col1 text, col2 int)'
  );
  ```

## Foreign Data Wrapper
- Using the `postgres_fdw` extension is a bit more of a setup, but makes working with remote tables a breeze
- Multistep process:
  - Activate the extension
  - Create a foreign server object, representing the remote database server
  - Create a user wrapping which allows that user to access that foreign server
  - Create the foreign tables


## The Foreign Server Objects
- You first need to define a foreign server
  ```pgsql
  CREATE SERVER |||server_name|||
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (|||server config settings|||)
  ```
- Server name can be whatever you like, provided it is unique
- Server settings are a comma delimited, space separated sequence of host, port, dbname, etc


## The User Wrapper
- You essentially need to give your user permission to access the remote server
  ```pgsql
  CREATE USER MAPPING FOR CURRENT_USER
  SERVER |||server_name|||
  OPTIONS (|||user options|||)
  ```
- Server name will be the name you just used for the created server
- User options should include the corresponding _remote_ user info like username and password

## Bring On the Tables!
- Last but not least, you can import in the tables you want
- Can do one at a time with `CREATE FOREIGN TABLE`, but I far prefer `IMPORT FOREIGN SCHEMA`
  ```pgsql
  IMPORT FOREIGN SCHEMA |||schema name|||
  FROM SERVER |||server_name|||
  INTO |||local schema name|||
  ```
- Can also include `LIMIT TO` or `EXCEPT` followed by a list of table names to fine-tune which tables are imported


# Nesting Queries
## Subqueries
- A _subquery_ is simply a query embedded within another piece of SQL
- Commonly used to prepare some data or compute a value to be used by the surrounding SQL
- We have already seen very simple versions
  - Creating a new table, for example
    
  ```pgsql
  CREATE TABLE |||new table||| AS (
    SELECT * FROM |||og table|||
  );
  ```
- Often, you could have alternatively saved the subquery as its own table and then used that directly, but not always


## Common Uses of Subqueries
- Computing a single value to use in comparisons
- Computing a single column or list of values to use either directly in a table or in comparisons
- Performing "join-like" operations (potentially in a more flexible manner)
- Filtering one table based on the contents of another table
- Otherwise combining operations that would otherwise have taken several steps


## Filter on single value
- Think back to HW3 where you needed to filter out all the outlier taxi-ride speeds
  - There you had to compute the quartiles, the whisker edges, and then filter directly by typing that number back in
- Subqueries can combine those steps! You can directly compare a value to the subquery output **so long as the query outputs only a single value**

  ```{.pgsql style='font-size:1em'}
  SELECT *
  FROM |||table 1|||
  WHERE |||column 1||| > (
    SELECT avg(|||column 2|||)
    FROM |||table 2|||
  );
  ```

## Derived Tables
- You can use the output of a subquery anywhere you would normally reference a table name
- Such a table is then called a _derived table_
- You **must** give such subqueries a table alias, else you have no way to refer to them

  ```pgsql
  SELECT *
  FROM (
    SELECT |||column 1, column 2|||
    FROM |||table name|||
  ) AS |||table alias|||; 
  ```

## Repeating Column Values
- Sometimes it is very useful to have a constant value assigned to an entire column
  - We can't just use an aggregate function (or similar) though, else we get mismatched column lengths
  - Window functions could work, but only if within the same table
- You can include a subquery _in your selected columns_ if it outputs only a **single value**
  - That value will be propagated to all the rows
- Give it an alias so the column heading has some meaning!

  ```{.pgsql style='font-size:.9em'}
  SELECT
    |||column 1|||,
    |||column 2|||,
    (SELECT avg(|||column 3|||) FROM |||table 2|||) as |||column alias|||
  FROM |||table 1|||;
  ```


## Subquery Expressions
- Can also use subqueries to filter based on whether certain content is present within the subquery output
- Combinations of a keyword followed by the subquery
  - `|||expr||| IN (|||subquery|||)` and `|||expr||| NOT IN (|||subquery|||)`
  - `EXISTS (|||subquery|||)` and `NOT EXISTS (|||subquery|||)`
  - `|||expr||| |||op||| ANY (|||subquery|||)`
  - `|||expr||| |||op||| ALL (|||subquery|||)`
- `|||expr|||` is a column name or value, and `|||op|||` is a boolean operator (`=,>,>=`,etc)
- All subquery expressions are evaluated in short-circuit mode: they will return an answer as soon as they can


## IN Subquery
- Will check to see if a term appears anywhere in the subquery output
- The subquery must output a **single column** (though it could be empty)
- `NOT IN` just reverses the situation
- **Be careful of NULLs**
  - `1 IN (1,2,3)` gives True
  - `1 IN (2,3)` gives False
  - `1 IN (NULL,2,3)` gives `NULL`
  - Most dangerous with `NOT IN`, since `1 NOT IN (NULL,2,3)` will give `NULL`, and not True as you might expect


## EXISTS Subquery
- Checks just to see if the subquery has **any** rows in the output
  - No expression to compare to, just looking at subquery output
- The _contents_ of the subquery do not matter at all, so a simple `SELECT 1 FROM ...` is usually used
- To be useful, the subquery is frequently what is called a _correlated subquery_, where it references the table in the outer query
  - These types of subqueries can **not** be run in isolation

  ```{.pgsql style='font-size:.9em'}
  SELECT |||column₁, column₂|||
  FROM |||table₁|||
  WHERE EXISTS (
    SELECT 1
    FROM |||table₂|||
    WHERE |||table₁.column₁||| = |||table₂.column₂|||
  );
  ```


## ANY and ALL
- At times you don't want to see if a value is _in_ the subquery output, but rather how it _compares_ to the output
- A boolean expression needs to return a single True or False though
- `ANY` and `ALL` "broadcast" the boolean comparison across all the subquery rows
  - `ANY` will return True if _any_ of the rows evaluate to True with the expression
  - `ALL` will only return True if _all_ of the rows evaluate to True with the expression
- `|||expr||| = ANY (|||subquery|||)` is thus identical to `|||expr||| IN (|||subquery|||)`
  ```{.pgsql style='font-size:.9em'}
  SELECT |||column|||
  FROM |||table 1|||
  WHERE |||column||| < ALL ( SELECT |||column||| FROM |||table 2|||);
  ```

## Subqueries vs Joins
- You may have realized that subqueries can do many similar things to what we used joins for!
- When should each be used?
  - Subqueries are great when you only need information from a single table, but what information depends on another table
  - Joins are necessary when you need information from multiple tables
- Historically, most RDMS were better optimized for joins, but many have improved substantially in recent years for subqueries
  - If in doubt, use the form that makes the most sense for what you _semantically_ want to accomplish. You can always convert it later if better optimization is needed and could be achieved


## Be careful!
- It can be pretty easy to leap off the deep end with subqueries
- Always keep in mind what you want the subquery to achieve, and test it individually to make sure it is doing what you expect
  - Use them where they fit the needs of the situation
  - A nice aspect of subqueries is that you can work from the "bottom-up", starting with smaller subqueries you know and building on them
- **Organize your queries nicely!**
  - This becomes even more important with subqueries


## Activity!
- I've prepped a simple grades database [here](../activity_data/assignment_submission.sql) that you can download and run the sql file to create a `subq1` schema in your database of choice which will contain three common tables: `roster`, `assignments`, `submissions`
- See if you can answer the questions below **without using joins** and only subqueries (as practice).
  - Which student(s) have turned in _nothing_?
  - Which assignment name has the worst scored average overall?
  - Which students scored below average on every quiz?
<!--- Pairings-->
  <!--- Karol and Jullian-->
  <!--- Megan and Rochelle-->
  <!--- Nina and Ellie-->
  <!--- Andrew and Liam-->
  <!--- Maribel and Connor-->

# CTEs
## Common Table Expressions
- Sometimes you may want to use the same derived tables multiple places in a query
- You could copy and paste those subqueries, but there is a better way
- _Common Table Expressions_ or CTEs are a way for you to define up front a derived table that can be referenced anywhere in the rest of the query
- To do so, you use a `WITH` keyword:
```pgsql
WITH
  |||derived table name||| (|||new column names|||) AS (
    |||subquery|||
  )
SELECT ...
```

## CTE Details
- The number of columns returned by your subquery **must** match the number of column names you initially define
  - If you aren't going to rename the columns, you do not even really need to include the new column names, but it can be nice for clarity
- You do not need to include data types, as these will be inherited from the subquery columns
- You can include multiple derived tables in the CTE statement, separated with a comma
- Good use of CTE's can help clarify and simplify queries
  - _Especially_ if you have a subquery that you need to reference a few times, use a CTE!

## Example
Using a CTE and other subqueries, let's extract the middle 50% of the taxi_rides by duration.


# Pivot Tables
## Why Cross Tabulation?
- SQL tables tend to operate with observations (records/rows) and variables (columns)
  - Commonly called a "tidy" data format
- This isn't always the most _human readable_ format
- Cross-tabulations or pivot tables compare variables to other variables

::::::{.cols style='font-size:.9em'}
::::col

| Name      |  Color | Condition | Number |
|-----------|:------:|:---------:|:------:|
| Apple     |   Red  |    Good   |    5   |
| Orange    | Orange |    Bad    |    3   |
| Cherry    |   Red  |    Good   |   10   |
| Banana    | Yellow |    Good   |    1   |
| Pineapple | Yellow |    Bad    |    4   |

::::

::::col

| Color  | Good | Bad |
|--------|:----:|:---:|
| Red    |  15  |  0  |
| Orange |   0  |  3  |
| Yellow |   1  |  4  |

::::
::::::


## Extensions
- Standard SQL has no way to create these cross-tabulations
- Postgres has a function to help, but it is located in an _extension_ or _module_
  - Extensions to Postgres are collections of functions, date types, and more which add some functionality to a particular database
- To use an extension, you need only enable it for your particular database
  ```pgsql
  CREATE EXTENSION |||extension_name|||;
  ```
- To facilitate making pivot tables, we need to add the `tablefunc` extension to our database, which will add the `crosstab` function
- You can always remove an extension using `DROP` if you need later


## Pieces of the Crosstab
- The `crosstab` function has a lot going on, so let's break things down

::::::cols
::::col
```{.pgsql style='max-height:800px'}
SELECT *
FROM crosstab(
  |||subquery 1 string|||,
  |||subquery 2 string|||
  )
AS (
  |||row label column||| TEXT,
  |||column 1||| |||type 1|||,
  |||column 2||| |||type 2|||,
  ⋮
);
```
::::

::::{.col style='font-size:.8em'}

:::incremental
- Subquery 1 must return 3 columns
  - The first is the row label values
  - The second is the column label values
  - The third is the particular data values
- Subquery 2 returns a single column
  - What column categories should rows be placed into?
- You still need to define the column names and data types
  - First should be the row label name and data type
  - Rest should be whatever column categories you are creating
:::

::::
::::::

## Example
Let's construct a pivot table comparing the total number of taxi rides at different days of the week across different hours.

# A Case Study
## On the Case
- It can be useful in certain situations to employ a little conditional logic _outside_ a filtering statement
- Can be particularly useful to tweak or reclassify values
- Standard SQL has the `CASE` statement to accomplish this
  - Essentially works like programming languages if...else if...else blocks
  
```pgsql
CASE 
  WHEN |||some condition||| THEN |||output|||
  WHEN |||some other condition||| THEN |||different output|||
  ⋮
  ELSE |||fallback output|||
END
```

## A Case Study
- Comparisons are made in order, so the first condition that matches, that output is used
- If you do not include an `ELSE` part, then `NULL` will be output if nothing else matches
- `CASE` statements most often show up in `SELECT` statements where column outputs are being selected, but they could potentially also show up in filtering or ordering statements
- A `CASE` statement will not evaluate results where the condition is not met, so they can be used to prevent certain errors as well (such as dividing by 0)
  ```{.pgsql style='font-size:.9em'}
  SELECT
    CASE WHEN a != 0 THEN 5/a END
  FROM tablename;
  ```

## Example
Let's compute how many taxi rides were considered:

- "Time Travelers" with a duration of less than 0
- "Short Trips" with a duration less than 5 minutes
- "Medium Trips" with a duration between 5 and 15 minutes
- "Long Trips" with a duration greater than 15 minutes


## Coalesce
- Similar to `CASE`, SQL also has the `COALESCE` statement
- `COALESCE` returns the first non-null value in the list
- Frequently used to substitute a default value for `NULL` values
```pgsql
SELECT
  student_id,
  COALESCE(grade, 0)
FROM grades;
```

## Activity!
Using the same grade book tables as earlier ([here](../activity_data/assignment_submission.sql) if you need them again), do the following:

- Create a pivot table for student quiz point scores with full student names as the rows and individual quizes as the columns. You can concatenate two strings in SQL using the `||` operator. So `fname || ' ' || lname` would give the first and last name with a space between as a single string. Feel free to use joins here.
 
 | student_name | Quiz 1 | Quiz 2 | Quiz 3 |
 |:------------:|:------:|:------:|:------:|
 |  Hailey Gray |    1   |    1   |   21   |
 |       ⋮      |    ⋮   |    ⋮   |    ⋮   |



- Create a pivot table for student letter grades on each quiz. Parts of this will look really similar to what you did in the first part, but there will be some important differences.

<!--
## Window Functions
- One useful piece of kit that is not included in the text is that of _window functions_
- A window function is like a mix between a normal column value and an aggregate function
  - Unlike aggregate functions, a window function returns a value for each row
  - Unlike normal column values, a window function can utilize other rows included within its "window" in making an aggregation
- Any normal aggregate function can be used as a window function, though there are specific window functions as well
- Window functions can only be used inside `SELECT` or `ORDER BY` statements
  - They are evaluated _after_ any filtering, grouping, or normal aggregations


## Over the Hill
- The defining characteristic of any window function is the `OVER ()` keyword, which comes after the aggregating window function
- Content inside the `()` determines the "window" of the window function
- By default, if nothing is provided, the entire column is the window
- The below would output the average of the column in every row
  - This means it is technically no different from accomplishing this with a subquery
```pgsql
SELECT AVG(col1) OVER()
FROM table_name;
```

## Dedicated Window Functions
- You can use any existing aggregate function as a window function, but there are also more specific window functions


:::{style='font-size:.9em'}

| Function           | Description                                                                           |
|--------------------|---------------------------------------------------------------------------------------|
| `row_number()`     | Assigns an ascending row number to each row in a window                               |
| `rank()`           | Assigns an ascending rank to each row, with possible ties skipping the next value     |
| `dense_rank()`     | Assigns an ascending rank to each row, with possible ties not skipping the next value |
| `first_value(col)` | Returns the first value in the window of column `col`                                 |
| `last_value(col)`  | Returns the last value in the window of column `col`                                  |
| `lag(col)`         | Returns the previous row of column `col`                                              |
| `lead(col)`        | Returns the next row of column `col`                                                  |

:::

## DETERMINING ORDER
- Often, to be useful, you may want to define an ordering inside the `OVER()` statement
- As soon as you specify an ordering, **the default window changes**
  - By default, each window now encompasses everything from the start, up to that current row
    - Easiest to see with classic aggregate functions

```pgsql
SELECT COUNT(*) OVER(ORDER BY col2)
FROM table_name;
```

## Tweaking the Window
- You can tweak this window by specifying the starting and stopping point, using the syntax:
  ```pgsql
  ... ROWS BETWEEN offset1 PRECEDING AND offset2 FOLLOWING
  ```
  which appears in the `OVER` clause, after any provided ordering
- Can also exclude the current row or group from the window:
  ```pgsql
  ... EXCLUDE CURRENT ROW
  ```
  - Group here refers to all other rows tied with the current row
- Can be used to compute "rolling" averages and similar

## Partitioning
- Additionally, you can specify a partition for the window
- The default partition is the entire column
- Partitioning here is determined _before_ the rows comprising the window are computed
  - This has a similar feel to `GROUP BY`, but _every_ row will get a value here
- Window functions evaluated within each window within each partition
```pgsql
SELECT AVG(col1) OVER (
    PARTITION BY col2
  )
FROM table_name;
```


## Activity!
- Contained [here](../activity_data/halloween.sql) is an SQL file to generate a table of Halloween trick-or-treater data, which contains two columns:
  - The minute within an hour span when a trick-or-treater visited
  - What candy was given to the trick-or-treater
- You started out with a supply of 40 of each type of candy
- Using this information, answer the following questions:
  - Between the 20th and 30th minutes, what were the three most popular candies given?
  - At what minute of the hour did you run out of Starburst?
  - Which candy did you run out of first?
  - Construct a 5-minute rolling average of the number of trick-or-treaters you saw each minute of the evening. According to this, about when were things busiest?
-->

<!--## Pairs-->
<!--- Andrew and Liam-->
<!--- Nina and Rochelle-->
<!--- Ellie, Connor, and Megan-->
<!--- Karol, Maribel, and Jullian-->

<!--## Text Power-->
<!--- Time to focus on everything we can do with strings!-->
<!--- Chapter topic fall into several main ideas:-->
  <!--- Manipulating strings-->
  <!--- More complicated pattern matching-->
  <!--- Full text searching using normalization and lexemes-->
<!--- All are geared around making using text and strings much more powerful and flexible-->

<!--
## Stringy Functions (CORE)

:::{style='font-size:.9em'}

| Function                  | Description                                                                                           |
|-----------------------------------|---------------------------------------------|
| `str1 || str2` | Concatenates string 1 and string 2 together |
| `upper(str)`              | Converts a string to all uppercase characters                                                         |
| `lower(str)`              | Converts a string to all lowercase characters                                                         |
| `char_length(str)`        | Returns the number of characters in the string                                                        |
| `position(str IN substr)` | Find the number of the character where the substring begins                                           |
| `trim(opt chr FROM str)`  | Removes the given characters from the string, optionally taking from the _leading_ or _trailing_ edge |
| `substring(str FROM n FOR l)` | Returns the portion of the string starting at position n and continuing for l characters |

:::

## String Functions (Postgres)

:::{style='font-size:.9em'}

| Function                  | Description                                                                              |
|--------------------------------|------------------------------------------------------------------------------------------|
| `initcap(str)`            | Converts the first character of each word to uppercase, and the rest lower               |
| `left(str,n)`             | Returns the first n characters of the string                                             |
| `right(str,n)`            | Returns the last n characters of the string                                              |
| `ltrim(str,chr)`          | Remove the characters (space by default) from the start of the string                    |
| `rtrim(str,chr)`          | Remove the characters (space by default) from the end of the string                      |
| `replace(str,from,to)`    | Replaces all occurance of _from_ in the string to _to_                                   |
| `length(str)`             | Returns the number of characters in the string                                           |
| `substr(str, n, l)` | Returns the portion of the string starting at position n and continuing l characters |

:::


## Enhanced Pattern Matching
- We've already seen basic pattern matching with `LIKE` and `LIKE`
  - Some flexibility with wildcard characters: `%` and `_`
- To get (much) more flexibility, we need to pivot to something made for exactly this purpose: _regular expressions_ (or _regex_)
- Regular expressions are a sequence of mostly single character symbols that denote exactly what patterns one could wish for
  - These sequences of characters can initially look very inscrutable! Stick with it!
- Regex's are useful all over, and supported in almost all programming languages as well. Learning at least the basics is time very well spent.


## Basic Regex Terms

:::{style='font-size:.7em'}

::::::cols
::::col

| Expression | Description                                                |
|------------|------------------------------------------------------------|
| `.`        | Matches _any_ character except a new line                  |
| `[abc]`    | Matches any character in the square brackets (a or b or c) |
| `[a-z]`    | Matches a range of characters (all lowercase letters here) |
| `[^a-z]`   | Caret negates what follows (so no lowercase letters here)  |
| `\w`       | Any word character, digit or underscore                    |
| `\d`       | Any digit                                                  |
| `\s`       | A space                                                    |
| `\t`       | A tab character                                            |
| `\n`       | A newline character                                        |

::::

::::col

| Expression | Description                                   |
|------------|-----------------------------------------------|
| `^`        | Match at the start of the string                           |
| `$`        | Match at the end of the string                             |
| `?`        | Get the preceding match 0 or one time         |
| `*`        | Get the preceding match zero or more times    |
| `+`        | Get the preceding match one or more times     |
| `{m}`      | Get the preceding match exactly m times       |
| `{m,n}`    | Get the preceding match between m and n times |
| `a|b`      | Match on either a or b                        |
| `( )`      | Create a capture group or set precedence      |
| `(?: )`    | Negate reporting a capture group              |
::::
::::::

:::

## Other Regex Concepts
- If you ever want to match off a symbol that has special meaning in regex (a parenthese, for instance) you must _escape it_ with a backslash: `\(`
- Reserved characters include: `{ } [ ] / \ + * . $ ^ | ?`
- Flags can be added at the end to tweak matching
  - `/i` means that matches will be case insensitive
  - `/g` means that all instances of the match will be returned, not just the first
  - `/m` allows the anchor characters (`^` and `$`) to operate on each line, not just across the entire string.



## {data-background-iframe="https://regexr.com/6hfai"}

## Your Turn!
- The link [here](https://regexone.com/problem/matching_phone_numbers) has a nice sequence of short problems to test your skills against
- Most problems consist of:
  - Terms that you want to match correctly
  - Terms that you want to **not** match
  - Capture groups that you'd like to capture
- Working with a partner, see how many you can figure out in our remaining time tonight
-->
<!--## Back to SQL-->
<!--- One of the main ways we previously used pattern matching was for filtering-->
<!--- You can also use regexes for pattern matching!-->
  <!--- `~` is a case sensitive match using the following regex-->
  <!--- `~*` is a case insensitive match using the following regex-->
  <!--- Either can have a `!` in front to negate the search (where things do **not** match the regex)-->

<!--```pgsql-->
<!--SELECT colname-->
<!--FROM tablename-->
<!--WHERE colname ~ '[a-z]*\s\d{2}';-->
<!--```-->

<!--## Extracting Data-->
<!--- Another hugely common use of regex is to extract only the data you want from a much larger string-->
<!--- This can be particularly useful when cleaning data or constructing useful database tables-->
<!--- `regexp_match(str, regex)` returns the first matching instance in the string-->
  <!--- What is returned is whatever is in any _capture groups_ you may have included in your regex, or the entire match if there are no capture groups-->
  <!--- Output is returned as an array, to allow for potentially multiple capture groups-->
  <!--- If you just have one capture group and don't want it in an array, index it out using `[1]` at the end after wrapping entire expression in ()-->

<!--```{.pgsql style='font-size:.9em'}-->
<!--SELECT (regexp_match('today is March 15, 2022', '\d{4}'))[1];-->
<!--```-->


<!--## Back to You!-->
<!--- In the same pairings as previously, take a look at [this](../activity_data/regex_parsing.csv) CSV file, which contains a simple subset of artists and dimensions from the MoMA data set-->
<!--- Create the simple table and import in the data from the CSV-->
<!--- See if you can achieve the following:-->
  <!--- Create and populate new columns for first, middle, and last name-->
  <!--- Create and populate new columns to hold the width and height in inches (in `1 1/4` form)-->
  <!--- If feeling good about the above, create new columns to hold decimal equivalents of the width and height-->
