Being Selective

Jed Rembold

Wednesday, August 27, 2025

Announcements

  • HW1 is posted!
    • I’ve posted a short video walkthrough of using GitHub Classroom as well
  • Let me know if you are having issues following along with the book’s description of the PostgreSQL install
  • No class next Monday!
  • Look over Ch 3 for next Wednesday
  • We’ll have some polling today: polling.jedrembold.prof

Review!

The SQL commands to the right involve creating and then populating a simple table. If they were run one after the other, which of the below possibilities would mostly closely mimic how the rev table would look?

CREATE TABLE rev (
  name VARCHAR(15), 
  age int);

INSERT INTO rev (name)
VALUES ('Bob'), ('Sally');
  1. name age
    Bob 15
    Sally 15
  2. name age
    Bob
    Sally
  3. name age
    Bob Sally
  4. This would give an error.

Subsets by Column

Query Time

  • The prime use of SQL for individuals not in charge of managing a database is using it to query the database for information

  • Queries return only a subset of information from the database, and don’t alter the database in any way

  • The keyword that begins any query is SELECT, followed by:

    • The column name(s) you would like information from, and
    • The table you want to get the information from
  • Basic syntax then could look something like:

    SELECT |||column₁|||, |||column₂||| FROM |||table name|||;

Star Power

  • While you can list out as many columns as you want, for tables with tons of columns that can be annoying

  • An asterisk *, can stand in for the column names to stand for “all the columns”

    SELECT * FROM |||table name|||;
  • There is no way in core SQL or Postgres to indicate that you want “all but a few columns”

    • Some variants do have a method of doing this though

You are Distinct

  • In many instances, a particular column may have duplicate entries
    • Each row should be unique, but if looking at just a single column, there very well could be duplicates
  • The DISTINCT keyword allows you to only display the unique values that appear in a column
    • Appears before the names of the desired columns
  • Can be applied to multiple columns at a time
    • Then interpreted as “For each X in the table, what are all the Y values?”
SELECT DISTINCT |||column₁||| FROM |||table name|||;
SELECT DISTINCT |||column₁|||, |||column₂||| FROM |||table name|||;

Reversing Entropy

  • Sometimes patterns can be more evident if some ordering is applied to the query results
    • The ordering only applies to the query, the original data is untouched
    • If you do no specify an ordering, the order of returned rows is not guaranteed!
  • SQL provides the ORDER BY keyword to indicate a column to use for ordering
  • Can use ASC (the default) or DESC to specify the direction of ordering
  • Can also order by multiple columns (separate by comma)
SELECT |||column₁|||, |||column₂|||
FROM |||table name|||
ORDER BY |||column₁||| DESC;

Ordering Basics

  • When your database was created, Postgres assigned it a collation method based on your systems locale
    • You can see the collation method for a connected database using: SHOW lc_collate;
  • This collation method is what determines what characters “come before” other characters
    • On my system, the collation method looks like: en_US.utf8
    • If your system is using a different collation method, then your sorting might come out slightly different!
    • You can choose a different collation method by providing one after the ORDER BY statement
    SELECT |||column|||
    FROM |||table name|||
    ORDER BY |||column||| DESC COLLATE "C"

Subsets by Row

Filtration

  • Especially in tables with thousands to millions of rows, you likely don’t want all of the information pertaining to a column
  • Instead it makes sense to filter or only return rows that meet certain criteria
  • SQL’s WHERE keyword provides exactly this functionality
SELECT |||column||| FROM |||table name|||
WHERE |||some condition|||;
  • Each condition is some sort of comparison check, which could be summarized as a true / false question

Comparison Operations

Operator Function Example
= Equal to WHERE colname = 5
<> or != Not equal to WHERE colname != 'fish'
> Greater than WHERE colname > 0
< Less than WHERE colname < 100
>= Greater than or equal to WHERE colname >= 0
<= Less than or equal to WHERE colname <= 1000
BETWEEN Within a range WHERE colname BETWEEN 50 and 100
IN Matches one of a set WHERE colname IN ('red', 'blue')
LIKE or ILIKE Match a pattern WHERE colname LIKE '%ed'
NOT Negates a condition WHERE colname NOT LIKE '%ed'

Not Null

  • Probably have come across instances where you don’t want to have a bunch of NULL values in your output query
  • You can not filter them out using colname != NULL
    • A NULL value comparison always gives NULL, no matter the comparison
  • Need to use the keywords IS NOT NULL (or IS NULL if you want to find the null values)
SELECT |||column|||
FROM |||tablename|||
WHERE |||column||| IS NOT NULL;

Understanding Check

Suppose I’d like to select from my cereal table an alphabetized list of all the cereal names that have the same amount of grams of proteins as they do grams of fat. Which query would best deliver on this?

SELECT name FROM cereal
WHERE fat = protein
ORDER BY cereal;
SELECT DISTINCT name 
FROM cereal
WHERE fat = protein;
SELECT name FROM cereal
WHERE fat = protein
ORDER BY name;
SELECT name FROM cereal
WHERE protein = fat
ORDER BY protein;

Pattern Matching

  • LIKE and ILIKE differ only in whether capitalization matters in the match
    • LIKE is capitalization sensitive, ILIKE is not
  • Each can use several special characters in the desired pattern
    • % is a wildcard matching zero or more of any character
    • _ is a wildcard matching just a single character
    • If you want to actually match off one of these characters, you need to “escape” it with a backslash (\_ or \%)

Conditional Combinations

  • Often, you might need to filter or several or more complicated conditions
  • SQL also has the boolean AND and OR operators that you can use to stitch together multiple conditions
  • Order of operations is not necessarily left to right, so surround terms in parentheses if you need to force prioritization
SELECT *
FROM |||table name|||
WHERE |||column₁||| LIKE 'F%' AND
      (|||column₂||| > 50 OR |||column₃||| <= 10);

Reaching Your Limit

  • Often times, especially if sorting, you may not even be interested in all the results
  • Maybe you just want the first few, or even just the first result
  • You can control the maximum number of rows returned using LIMIT
  • Comes last, after all the other keywords
SELECT * FROM |||table name|||
WHERE |||some condition|||
ORDER BY |||column|||
LIMIT 5;

Your Turn!

Introduce yourself to a neighbor and then work together to construct a query to solve the following situation:

  • Suppose you are particularly short and thus can only access cereals on the first two shelves (shelves 1 and 2). You’d like to grab the cereal with “Cheerios” somewhere in its name and which has the highest amount of sugar per serving. Construct a query that could help you figure out what cereal you should be grabbing!
// reveal.js plugins // Added plugins