The Right Type

Jed Rembold

Wednesday, September 3, 2025

Announcements

  • HW1 is due at the end of the day today
    • You just need to have uploaded your solutions back to the GitHub repository. Nothing else needs to be done.
  • HW2 will be posted by the end of the day, due next Wednesday night
  • Look over Ch 5 for next Monday
  • Polling: polling.jedrembold.prof

Review Question

Suppose a table had been created using the SQL statement below. How would you then query for the most recently measured number of positive cases from schools beginning with “University”?

CREATE TABLE cases (
  id INT,
  school_name VARCHAR(50),
  case_positive INT,
  tests_run INT,
  date_taken DATE
);
SELECT case_positive FROM cases
WHERE school_name LIKE 'University_'
ORDER BY date_taken;
SELECT case_positive FROM cases
WHERE school_name LIKE 'University%'
ORDER BY date_taken DESC;
SELECT case_positive FROM cases
WHERE school_name LIKE '%University%'
ORDER BY case_positive DESC, 
         date_taken DESC;
SELECT case_positive FROM cases
WHERE school_name LIKE '%University%'
ORDER BY date_taken DESC;

Data Type Fundamentals

  • Each column in a table can only have data of a single type
    • Or be missing an entry, designated with a special NULL entry
  • Assigning the correct data types of columns can
    • Improve memory allocation
    • Improve performance
    • Prevent errors in data entry
    • Prevent mistakes in calculations
  • Postgres can accept a large number of data types, but today we will focus in depth on the most common: characters, numbers, and dates and times

Text

Characters or Text

  • One of the more common things to be stored are characters or sequences of characters
  • Postgres has several types you can use to store this sort of information:
    • CHAR(n): A fixed length column of n characters. Always uses this amount of characters, padding with spaces if your stored sequence of characters is shorter.
      • Omitting the (n) is the same as CHAR(1)
    • VARCHAR(n): A variable length column with a maximum of n characters. If your stored sequence is smaller, then no padding is done and the data is stored as is.
      • Omitting the (n) is the same as TEXT
    • TEXT: A variable length column of “unlimited length” (about 1 GB).
  • In Postgres (unlike in some other variants) there is not really a difference in performance between all 3 types


– Postgres specific, though similar implementations exist in other variants

Character Takeaways

  • In general, don’t use CHAR unless what you are storing is always a given length of characters
    • State abbreviations for instance, or possibly other single character encodings
  • Use VARCHAR or TEXT in most situations
    • VARCHAR is more portable, but make sure to choose a max that is well above whatever your longest sequence of characters could be
    • Can also choose VARCHAR if you want an error to be output if too many characters are entered
    • If you don’t mind being Postgres specific, TEXT should always work

Numbers

Types of numbers

  • Important to be able to do built-in math calculations on columns
  • If the information you want to store is numeric, always use a numeric data type rather than the characters of a number
  • A few different options for numbers:
    • Integers: whole numbers (positive and negative)
    • Fixed-point and Floating-point: fractions of whole numbers (also positive and negative)
  • For each there are several data types that you can choose from, largely depending on the size of the numbers you are expecting.

Integers

  • Probably the most common form of number you will use in a database
  • Can use 3 different data types in SQL to represent an integer
    • Differ just in the size of the numbers they can hold
    Type Size Range
    SMALLINT 2 bytes ±32,767
    INTEGER 4 bytes ±2,147,483,648
    BIGINT 8 bytes ±9,223,372,036,854,775,808
  • SMALLINT is generally only used if disk space is at a premium
  • Probably only use BIGINT if sure that INTEGER is insufficient

Serial Integers

  • If you want a column to be auto-incrementing, you can use the serial form of the integer types:
    • SMALLSERIAL
    • SERIAL
    • BIGSERIAL
  • Have the same allowed ranges as their integer counterparts
  • No need to specify that column when adding a row: Postgres will automatically increment and add it
  • While each increment will be unique, you may not have perfect even intervals
    • Removed row values are not reused
    • If an INSERT is interrupted because of an error, that unique integer is basically skipped

Your Identity

  • The SERIAL datatypes are Postgres specific

  • Since Postgres 10 the SQL core version with IDENTITY has been supported

  • Automatically fill the column with the incremented number:

    INTEGER GENERATED ALWAYS AS IDENTITY
  • Automatically fill column but allow overwriting:

    INTEGER GENERATED BY DEFAULT AS IDENTITY

Fixed-Point Numbers

  • Represents a fractional value
  • Two methods of writing:
    • NUMERIC(|||precision|||, |||scale|||)
    • DECIMAL(|||precision|||, |||scale|||)
  • precision is a positive integer representing the max total number of digits comprising the number (on both sides of the decimal point)
    • If provided, needs to be between 1 and 1000
  • scale is a positive integer representing the total number of digits to the right of the decimal point
  • Will round or pad the decimal digits with 0’s if needed to get the correct number of digits after the decimal
  • Will return an error if the input number can’t fit within the precision

Fixed-Point Examples

  • Consider the number: 192.837465
  • In various fixed-point formats:
    • NUMERIC(5,2) = 192.84
    • NUMERIC(10,2) = 192.84
    • NUMERIC(15,10) = 192.8374650000
    • NUMERIC(6,3) = 192.837
    • NUMERIC(4,2) = ERROR
  • If you don’t include inputs:
    • NUMERIC(5) = 193
      • Treats the scale as 0
    • NUMERIC = 192.837465
      • Will store up to the maximum precision and scale allowed (131,072 digits before and 16,383 digits after the decimal, lol)

Floating-Point Numbers

  • Still represents a fractional value
  • Differs from Fixed-Point in that they use exponents to store the information, so the decimal point could “float”
  • Two data types that can be used:
    • REAL
    • DOUBLE PRECISION
  • They differ just in their precision (and how much storage space they take)
    • REAL will show at most 9 significant digits and takes up 4 bytes
    • DOUBLE PRECISION will show at most 17 significant digits and takes up 8 bytes
      • Postgres will also accept the type FLOAT to be synonymous with DOUBLE PRECISION

Floating Pitfalls

  • Float-point representations are computed using binary math to make them easy to store in the computer
  • Some fractional values cannot be perfectly captured with a binary decimal though
    • In the same way that we can’t exactly write 1/3 as a decimal
  • Postgres will use the best approximation that it can given the precision, but it is still an approximation!
  • Takeaways:
    • Do not use floating-point values if you will be doing highly sensitive calculations with the numbers!
      • i.e. money, landing a rocket on the Moon
    • Comparing two float-point values for equality might not work as expected, due to these tiny approximations
      • Better to check if greater or less than, or within a small interval around the desired value

Understanding Check!


A table is made as shown to the right. Only one of the rows below could be added to that table using INSERT INTO without causing an error. Which row could be successfully added?

CREATE TABLE revq (
  col1 CHAR(2),
  col2 TEXT,
  col3 SMALLINT,
  col4 NUMERIC(3,2),
  col5 REAL
);
Answer col1 col2 col3 col4 col5
A ‘OR’ ‘Hello!’ 30200 12.2 3.14
B ‘12’ ‘How odd!’ 43210 2.23 2.718281828
C ‘C’ ‘Puzzling.’ 13.6 1.1 1.41
D ‘okay’ ‘Mysterious…’ 2132 8 1.618

It’s a Date (and time)

Dates and Times

  • One very nice aspect of all SQL databases is that they can work with times and dates very easily, assuming the correct data type is used
    • This shouldn’t be trivialized! Working with times and dates is often a pain given all the complexities of the Gregorian calendar, time zones, daylight savings time, etc.
  • Approximately 4 major time and date data types, summarized in the below table
Name Storage Size Lowest Highest Resolution
DATE 4 bytes 4713 BC 5874897 AD 1 day
TIME 8 bytes 00:00:00 24:00:00 1 μs
TIMESTAMP 8 bytes 4713 BC 294276 AD 1 μs
INTERVAL 16 bytes -178000000 yrs -178000000 yrs 1 μs

DATES

  • Holds information for a single day
    • No information about time
  • Data needs to be enclosed in single quotes, like text strings
  • Postgres can actually correctly parse a ridiculous number of ways of writing the date, but you can never go wrong with the ISO standard of YYYY-MM-DD
  • Other acceptable variants include:
January 18, 2022 1/18/2022 01/18/22
2022-Jan-18 18-Jan-2022 Jan-18-22
20220118 Jan 18, 22

Times

  • Holds information about a single time
    • No information about date is stored
    • A date CAN be included, but it is largely ignored
  • Still needs to be enclosed in single quotes
  • Has variants that are both with and without timezone
    • TIME is technically without timezone
    • TIME WITH TIME ZONE is, as expected, including a timezone
      • Postgres has a shorthand notation for this, called TIMETZ
      • Due to the way timezones are stored and some issues that arise, using this data type is generally discouraged.

More times

  • ISO time formats looks like HH:MM:SS.FFFF, where the hours are on the 24 hour clock
  • Other formats are accepted, including
16:05:06 16:05 04:05 PM
4:05:06.123 PM 0 4:05:06 AM

Timezones

  • If specifying a timezone, it comes after the time and there are a few ways to describe it:
    • Abbreviation: PST (for Pacific Standard Time)
    • Full name: America/Los_Angeles
      • This method requires you to enter the date as well, so that it can tell if daylight savings is active or not!
    • UTC Offset: -8 (our clocks are currently 8 hours behind GMT)
  • If no timezone is specified, but the field requires one, the system timezone is used
  • Examples of times with time zones:
    • 04:05:06 PM PST
    • 2022-01-18 16:05:06 America/Los_Angeles
    • 16:05:06-8

Timestamps

  • A TIMESTAMP field holds basically both a date and a time (in that order)
  • The same properties and formats that apply to each individually apply here as well
  • Also have the option of with or without a timezone
    • TIMESTAMP by itself is without a timezone, as per SQL standards
    • TIMESTAMP WITH TIME ZONE is with a timezone. Shorthand in Postgres is TIMESTAMPTZ
  • Internally, timestamps with a timezone are always stored in UTC, but then when displayed are converted back to the local timezone
  • Examples:
    • 2022-01-18 14:30:00
    • Jan-18-2022 2:30 PM PST

Intervals

  • Represents a span of time
  • Generally given by first a number and then by a unit
    • Possible units: microsecond, millisecond, second, minute, hour, day, week, month, year, decade, century, millennium
    • Abbreviations or plurals of the above also work
  • Examples:
    • 1 day
    • 3 century 2 mins
    • 45 ms
    • 1 mon 87 us
  • Intervals can be used in calculations with other date/time data types
    • Can also be used in comparisons if subtracting timestamps

MORE!

  • These are just a small sample of some of the most common data types
  • Postgres supports many more!
    • Booleans
    • Geometric types
    • Network address types
    • JSON types
    • A full list can be found here
  • We’ll address others as they come up sporadically through the rest of the semester, but you have the core basics now

Mix and Match

Conversions

  • There are plenty of instances where you may need to convert between types

    • Sometimes calculations (coming soon!) need a certain data type
    • Sometimes the information was just not stored in the most ideal type
  • Can use the CAST function to convert between data types (where it makes sense)

  • Core syntax is

    CAST(col_name AS new_data_type)
  • Postgres has a shorthand conversion syntax that uses double colons:

    col_name::new_data_type
// reveal.js plugins // Added plugins