SQL Utilities

Jed Rembold

April 16, 2025

Announcements

  • HW10 due tonight (or as soon as you can get it in) and then you just have the project!
  • I continue to be terribly behind in my grading for this class, and for that I sincerely apologize. I should be able to make meaningful progress in the coming week.
  • For next week, we just have the presentations planned. I’ll put together and post an order soon.
    • You can set your Grafana instance to “Serverless” as well to save on credits!
    • Make sure you are meeting the requirements on the project guide and scoring rubric!
    • The supplementary GitHub repo will be available soon. (I’ll get it available tonight and post an announcement)

DRY

  • A common programming term is that of DRY: Don’t repeat yourself
    • For convenience sake (why redo work you have already done?)
    • For debugging/maintenance sake (in case of a change, each replicate needs to be found and changed)
  • Want to look at several methods in SQL that we can utilize to better adhere to DRY principles
    • Views
    • Custom functions
    • Triggers

A Splendid View

Views

  • A view in SQL is essentially a saved query

  • Existing views can be treated just like tables, selecting from their contents

    • You can even, to a limited extent, add, remove, or update contents
  • Syntax similar to creating a new table, just with VIEW

    CREATE VIEW |||view name||| AS (
      SELECT |||colnames|||
      FROM |||tablename|||
    )

View vs Table

  • Despite seeming similar, views are different from tables
    • A view stores no new internal information in the database, whereas creating a new table copies that information
      • This means a view will always reflect the latest data, whereas a created table would need to be updated
    • Technically, a view runs the original query each time it is accessed. This makes them less ideal for computationally heavy operations
  • Views can be a way to give users access to only particular columns of a larger table
    • This can be a boon for security and giving only those who need them permissions to change various tables

View Operations

  • You can drop views just like tables: DROP VIEW |||view_name|||;
  • You can create or replace views if the original number of columns and types are still present: CREATE OR REPLACE VIEW |||view_name||| ...
    • Avoids having to drop first to recreate
    • New view could have more columns, but can not have less
  • Insertions, updates and removes can be made on simple views, which will propogate back to the original table!
    • Views must reference single tables (no joins) and have no distinct or group by type clauses
    • Data in the original table not present in the view will be given NULL values

Limiting Views

  • Can also add WITH LOCAL CHECK OPTION to the end of a view creation query
  • This will enforce that only data visible within the view can be edited with inserts, updates or removes
    • This includes any filtering done by the view!
  • Can be an excellent way to limit what data can be changed
  • Can replace LOCAL with CASCADED if you are referencing several nested views and want the restrictions to apply based on all the views’ restrictions

API Views

  • PostgREST will serve up endpoints for views as well as tables!
  • This can make them an excellent option for collecting together information from your normalized tables that you’d like to offer up to the API user
    • Since the view always showcases the lastest data, you don’t need to worry about keeping in sync
  • Just create the view in your API schema, referencing your other tables
    • Don’t forget to give the anonymous user account permissions to SELECT from it after you do!

Activity!

  • I’ve given you a simple starter bit of sql here to create and populate some tables. Use them to practice the below in the shown pairings.
  • Tasks:
    • Create a view called newest that will only show the “new” boxes from best_boxes
    • Create a view called newred that will only show new and red boxes using only newest
    • Adjust the CHECK OPTION setting for both views so that you could add ('Jane', 2, 7, 9, 'red', 'new') to newred but not ('Bart', 6, 10, 12, 'red', 'used') to newred

Break!

Break Time!

Functional SQL

Functions

  • Postgres has many prebuilt functions that we have used over the semester
  • Sometimes you need something a bit more bespoke though, for which Postgres gives you some “easy” methods of writing your own functions
  • What language you use to write your own function can be exceptionally flexible:
    • Pure SQL
    • PL/pgSQL is an extension of SQL that offers more programming logic
    • Python
  • All functions will follow a similar structure, but the body will depend on the details of your chosen language
    • Documentation for all the languages (and functions in general) can be found here

Structure of Function

  • The structure of any defined function will generally look like:

    CREATE FUNCTION |||function name|||(|||function arguments|||)
    RETURNS |||return type||| AS
    |||function body string|||
    LANGUAGE |||function lang|||;
  • |||function name||| - The desired name of the function

  • |||function arguments||| - A comma separated list of paired arguments and types

  • |||return type||| - The variable type of what will be returned (selected)

  • |||function lang||| - The language used for the function body

  • |||function body string||| - a string containing the query or logic to process whenever the function is run

In SQL

  • LANGUAGE SQL
  • Using the basic SQL language gives you access to any existing SQL queries that you already know
  • To return something you should use the SELECT
  • You can also write functions that do not return something by specifying RETURNS void and then using INSERT, UPDATE, DROP, etc.
  • Using $$ to quote the body is recommended so that you do not need to double up any single quotes

SQL Example

  • Suppose we wanted to define a function to return us the difference between two timestamps in hours

    CREATE FUNCTION diff_in_hours 
      (start_ts TIMESTAMP, end_ts TIMESTAMP)
    RETURNS FLOAT AS
    $$
    SELECT date_part('epoch', end_ts - start_ts) / 60
    $$
    LANGUAGE sql;

In plpgSQL

  • LANGUAGE plpgsql
  • Extends SQL to add classic programming control structures
  • Needs to be wrapped in BEGIN, END keywords
  • Operates like a hybrid between SQL and a more typical programming language
  • Each phrase needs a terminating semicolon at the end
  • More flexible than pure SQL, but has additional syntax to learn

plpgSQL Structures

  • You can define variables

    x := 56;
  • Control statements

    IF x > 5 THEN
      |||...do something...|||
    ELSE
      |||...do something else...|||
    END IF;
  • Returns

    RETURN x;

plpgSQL Example

  • Suppose we wanted to define a function to return us the same difference between two timestamps in hours, but now in pgsql

    CREATE FUNCTION diff_in_hours_plpgsql 
      ( start_ts TIMESTAMP, end_ts TIMESTAMP )
    RETURNS FLOAT AS
    $$
    BEGIN
    IF end_ts > start_ts THEN
      RETURN date_part('epoch', end_ts - start_ts) / 60;
    ELSE
      RETURN NULL;
    END IF;
    END;
    $$
    LANGUAGE plpgsql;

In Python

  • Python is another supported language in which functions can be written

  • Requires an extension to be active:

    CREATE EXTENSION plpython3u;
  • Having the extension available might depend on how your database was installed

  • Most common data types with corresponding types are converted, anything else (like timestamps!) are just utilized as strings

  • I’d generally write the Python in a more Python-suited editor and then copy it in.

Activity!

  • Returning to the boxes tables you were working with earlier:
    • Write a function named box_volume to compute the volume of a cubic object, and use that function to populate a new volume column in best_boxes
    • Write a function called scale_boxes which updates the best_boxes table to scale the value in the named width, height, or depth columns by the provided argument. Run it to ensure it is working.

Obligatory Trigger Warning

Triggers

  • Triggers are contingencies that you can set up so that when a certain event occurs, a particular function is run
  • Commonly used to update or change values in other tables when something is newly added or changed in a table
  • Requires two things:
    • A defined trigger function
      • Can’t use the pure SQL language, plpgsql probably best
    • A defined trigger itself

The Trigger Function

  • Initially going to avoid inputs for complexity reasons
    • Technically, this makes the function what Postgres calls a procedure
    • In truth, you can have inputs to better generalize and potentially reuse trigger functions, but we won’t cover that here.
  • You still likely want a way to get information about the inserted/changed/deleted rows
    • The general names NEW and OLD will refer to the newly changed or previous table values (respectively)
  • The function needs to return a special TRIGGER type
  • What you return will depend somewhat on what type of trigger you are making
    • Only really matters for triggers firing before an event
      • UPDATE or INSERT triggers should return NEW
      • DELETE triggers should return OLD

The Trigger Itself

CREATE TRIGGER |||trigger name|||
AFTER UPDATE
  ON |||table_name|||
FOR EACH ROW
EXECUTE FUNCTION |||func_name|||();
  • Need to give the trigger a name
  • Specify when it will trigger
    • Could be before or after a particular event
    • Specifies what table will be “watched”
  • Specify how many times it will trigger
    • Could be for each row affected by the event, or just once for the entire event
  • Specify what function will be run (the earlier defined trigger function)

When to Trigger

  • You can listen for INSERT, UPDATE, or DELETE events that affect a particular table
  • Can trigger before the event occurs
    • Would be before any conditions are checked or changes made
    • What is being returned from your trigger function is important here!
  • Can trigger after the event occurs
    • After all conditions are checked and changes have been made
    • What is returned from the trigger function has no bearing here
  • Technically can also trigger INSTEAD OF the event occurring, but we won’t cover that here

Fine-tuning Triggering

  • In some cases, triggering off of any update or insert is too broad

  • Can localize to just a single (or few) columns by using OF colname1:

    AFTER UPDATE OF |||column_name₁|||, |||column_name₂|||
  • Can localize to only triggering off certain rows by inserting a WHEN condition before the EXECUTE line

    • Can use NEW and OLD within this condition
    WHEN OLD.|||name||| != NEW.|||name|||
    EXECUTE FUNCTION |||myfunc|||();

How much triggering

  • Using FOR EACH ROW will cause the trigger function to run once for every row that was a part of the triggered event
    • Run an UPDATE that affected 10 rows? The trigger function will be run 10 times, once on each row
    • Ensures you will have access to the NEW and OLD variables in your trigger function
  • Using FOR EACH STATEMENT will cause the trigger function to run once when the triggering event occurs
    • Run an UPDATE that affects 10 rows? The trigger function just runs once.
    • A statement altering 0 rows will still have an effect!
    • Statement level triggers do not have a way to examine individual modified rows using OLD or NEW

Extra Considerations

  • You can have multiple triggers acting off a single event
    • In this case, Postgres processes them in alphabetical order
  • It is possible to set up cyclical triggers, where Trigger A makes a change that triggers B, which makes a change that then triggers A again, etc
    • Postgres does not have built-in tools to prevent this, and it is up to you not to trigger your way into bad situations!

Activity!

  • You are very worried about the data within best_boxes from our earlier activity, such that you want to keep a log of any name changes that might occur. The table name_changes was already created for you from the earlier activity.

    • Write a function and corresponding trigger that, upon changing a name from best_boxes, will add a row to the name_changes table, where the operation would be 'update' in this case. The changed_at column should be set to the current timestamp. Test it by changing ‘Bob’ to ‘Bobby’.
    • Write another function and corresponding trigger that will add a row to name_changes whenever a new row is added to best_boxes. In this case, the operation would be 'insert' and the og_name would be NULL.
    • (If time) Add a trigger so that whenever a dimension is altered (like, from your scaling function), the volume is automatically recomputed.
// reveal.js plugins // Added plugins