Jed Rembold
January 22, 2025
There are three main concerns when building any data system:



“It seems that relational and document databases are becoming more similar over time, and that is a good thing: the data models complement each other. If a database is able to handle document-like data and also perform relational queries on it, applications can use the combination of features that best fits their needs. A hybrid of the relational and document models is a good route for databases to take in the future.”
NULL entryCHAR(|||n|||): A fixed length column of
n characters. Always uses this amount of
characters, padding with spaces if your stored sequence of characters is
shorter.
(|||n|||) is the same as
CHAR(1)VARCHAR(|||n|||): A variable length
column with a maximum of n characters. If your
stored sequence is smaller, then no padding is done and the data is
stored as is.
(|||n|||) is the same as
TEXTTEXT★: A
variable length column of “unlimited length” (about 1 GB).★ – Postgres specific, though similar implementations exist in other variants
CHAR unless what
you are storing is always a given length of characters
VARCHAR or
TEXT in most situations
VARCHAR is more portable, but make sure
to choose a max that is well above whatever your longest sequence of
characters could beVARCHAR if you
want an error to be output if too many characters are
enteredTEXT should always work| Type | Size | Range |
|---|---|---|
SMALLINT |
2 bytes | ±32,767 |
INTEGER |
4 bytes | ±2,147,483,648 |
BIGINT |
8 bytes | ±9,223,372,036,854,775,808 |
SMALLINT is generally only used if disk
space is at a premiumBIGINT if sure that
INTEGER is insufficientSMALLSERIALSERIALBIGSERIALINSERT is interrupted because of an
error, that unique integer is basically skippedThe SERIAL datatypes are Postgres
specific
Since Postgres 10 the SQL core version with
IDENTITY has been supported
Automatically fill the column with the incremented number:
INTEGER GENERATED ALWAYS AS IDENTITYAutomatically fill column but allow overwriting:
INTEGER GENERATED BY DEFAULT AS IDENTITYNUMERIC(|||precision|||, |||scale|||)DECIMAL(|||precision|||, |||scale|||)NUMERIC(5,2) = 192.84NUMERIC(10,2) = 192.84NUMERIC(15,10) = 192.8374650000NUMERIC(6,3) = 192.837NUMERIC(4,2) =
ERRORNUMERIC(5) = 193
NUMERIC = 192.837465
REALDOUBLE PRECISIONREAL will show at most 9 significant
digits and takes up 4 bytesDOUBLE PRECISION will show at most 17
significant digits and takes up 8 bytes| Name | Storage Size | Lowest | Highest | Resolution |
|---|---|---|---|---|
DATE |
4 bytes | 4713 BC | 5874897 AD | 1 day |
TIME |
8 bytes | 00:00:00 | 24:00:00 | 1 μs |
TIMESTAMP |
8 bytes | 4713 BC | 294276 AD | 1 μs |
INTERVAL |
16 bytes | -178000000 yrs | -178000000 yrs | 1 μs |
| January 18, 2022 | 1/18/2022 | 01/18/22 |
| 2022-Jan-18 | 18-Jan-2022 | Jan-18-22 |
| 20220118 | Jan 18, 22 |
TIME is technically without timezoneTIME WITH TIME ZONE is, as expected,
including a timezone
TIMETZ★| 16:05:06 | 16:05 | 04:05 PM |
| 04:05:06.123 PM | 04:05:06 AM |
TIMESTAMP field holds basically both a
time and a dateTIMESTAMP by itself is without a
timezone, as per SQL standardsTIMESTAMP WITH TIME ZONE is with a
timezone. Shorthand in Postgres is
TIMESTAMPTZ★There are plenty of instances where you may need to convert between types
Can use the CAST function to convert
between data types (where it makes sense)
Core syntax is
CAST(|||column name||| AS |||new data type|||)Postgres has a shorthand conversion syntax that uses double colons:
|||column name|||::|||new data type|||Be aware that casting data to a text string with a maximum size smaller than the data will truncate the conversion to get it to fit, not give an error.
INSERT INTO is great for small tables or
adding a few rows, not so great for bulk populating of a tableCOPY
command
C:\Users\Public should be
universally accessible/Users/Shared folder should be
universally accessible, or you could use /tmp
(but that gets purged each time you reboot)id,first_name,last_name,birthday
1,Michael,Scott,"Mar 15, 1964"
2,Dwight,Schrute,"Jan 20, 1970"
3,Pam,Beesly,"Mar 25, 1979"
4,Jim,Halpert,"Oct 1, 1978"
5,Kelly,Kapoor,"Feb 5, 1980"
The COPY command will copy information
into an existing table, it won’t create the table
COPY will be useful
Syntax of the COPY statement:
COPY |||table name|||
FROM |||'full path to filename'|||
WITH ( |||import options||| );C:\/FORMAT CSV sets the delimiter to a comma
and the default text qualifier to double quotesHEADER specifies that there is a header,
and so the first row of the file should be skippedDELIMITER 'x' sets
x to be the delimiter instead of the default
comma (or tab)QUOTE 'x' sets
x to be the new text qualifier, instead of
double quotesCOPY command multiple
times, it will just keep adding the CSV contents to the end of the
current tableSometimes an CSV might not have all the data you want in your SQL table
You can import all the information from the CSV into only a subset of the SQL table columns
Just requires that you specify the target SQL columns after specifying the table
COPY |||table name||| (|||col₁|||, |||col₂|||, |||col₃|||)
FROM |||full path to filename|||
WITH (FORMAT CSV, HEADER);Starting with Postgres 12, the option was added to use a
WHERE keyword to only copy rows that met a
certain condition to the table
The WHERE comes after the
WITH statement:
COPY |||table name||| (|||col₁|||, |||col₂|||)
FROM |||full path to filename|||
WITH (FORMAT CSV)
WHERE |||col₂||| = 'Baboon'Exporting data takes information from your SQL table and allows you to store it in a text file
Syntax-wise, it is almost exactly like copying into a SQL table,
except using TO instead of
FROM
COPY |||table name|||
TO |||full path to filename|||
WITH ( |||import options||| );This exports all columns from the specified table to the filename with the desired format
COPY |||table name||| (|||col₁|||, |||col₂|||, |||col₃|||)
TO |||full path to filename|||
WITH ( |||import options||| );COPY (
SELECT |||col₁|||, |||col₂||| FROM |||table name|||
ORDER BY |||col₂|||
)
TO |||full path to filename|||
WITH ( |||import_options||| );