Jed Rembold
Mar 5, 2025
To focus on the acquisition of data, without worrying about the scripting details, I have created a Docker container you can use
docker pull jrembold/web2db:latestThis container will handle:
You control the behavior of the Web2DB container through environment variables
Usually a good idea to store secrets such as database passwords in environment variables anyway
Locally, the easiest way to handle the variables
is to create a vars.env file and provide it
to the docker container
docker run --rm --env-file vars.env web2dbYou can specify them individually, one after each
-e flag, if you really wanted
docker run --rm -e SITE_URL=www.google.com web2dbweb2db recognizes (or requires) 6 environment variables:
SITE_URL (required)TABLE_NUM (situational/optional)FILE_NAME (optional)DATABASE_URL (optional)TABLE_NAME (situational)DEBUG (optional)HEADER_KEYS (optional)When providing in vars.env, use no
spaces:
SITE_URL=https://willamette.edu/~jjrembold SITE_URLTABLE_NUM
? in the URLPOST functionality
if neededSITE_URL?), include them in your
URLSome APIs may demand that you provide certain information in what is known as a header key
Sometimes used to indicate authentication keys or type of response requested
Provided as a string version of a key-value dictionary:
"{'key_1':13, 'key_2': 'Jed'}"FILE_NAME environment
variable, the data will be written to a newline JSON file with this name
in the container’s /app/output folderCOPY
DATABASE_URL and
TABLE_NAME variablesDATABASE_URL contains
all the info to connect to your database
username:password@hostname:port/databaseTABLE_NAME should be a table that
already exists in the database and has a column named
raw_json
raw_json columnDEFAULT now() to automatically add the time
each entry is added to the database--network=host option when running the
docker containerjsonb data typejsonb column
in Postgres!jrembold/web2db image!DATABASE_URL one!)+ New Variable
button to add the same variables and corresponding values you had in
your var.env file!
DATE: holds a single individual dayTIME: holds a single individual timeTIMESTAMP or variants with
TIMESTAMPTZ: holds a combination of date and
time, along with a potential time zoneINTERVAL: holds a duration of timeextract( part FROM |||datetime_value||| )date_part( |||part|||, |||datetime_value||| )DOUBLE PRECISION value
of whatever part was requested| text | Description |
|---|---|
| century | What century the date is in. 1st century starts 0001-01-01★ |
| day | What day of the month |
| decade | The year divided by 10 |
| dow | The day of the week (0-6, starting with Sunday) |
| doy | The day of the year |
| epoch | Number of seconds since 1970-01-01 |
| hour | The current hour (0-23) |
| microseconds | The number of microseconds |
| text | Description |
|---|---|
| milliseconds | The number of milliseconds |
| minute | The minute |
| month | The month (1-12) |
| quarter | What quarter of the year (1-4) |
| second | The number of seconds |
| timezone | The timezone offset in seconds |
| timezone_hour | The timezone offset in hours |
| week | What week of the year. ISO weeks start on Monday |
| year | The year |
★ – If you disagree with this, please write your complaint to: Pope, Cathedral Saint-Peter of Roma, Vatican.
make_date( |||year|||, |||month|||, |||day|||):
Returns a new DATE type valuemake_time( |||hour|||, |||minute|||, |||seconds||| ):
Returns a new TIME type value (with no
timezone)make_timestamptz(|||year|||,|||month|||,|||day|||,|||hour|||,|||minute|||,|||second|||,|||time zone|||):
Returns a new TIMESTAMPTZ type valuemake_timestamp and
make_interval also existDATE type values will
give just an INT (in days)TIMESTAMP type values
will give an INTERVAL, with the biggest
“unit” in daysage() function can
smooth over both and give units larger than days
age( |||datetime₁|||, |||datetime₂||| ):
Subtracts datetime2 from datetime1justify_interval( |||interval||| ), which
breaks intervals into divisions that don’t exceed a categories max
| function | description |
|---|---|
current_date |
Returns the current date |
current_time |
Returns the current time with timezone |
localtime |
Returns the current time without timezone |
current_timestamp★ |
Returns the current date and time with timezone |
localtimestamp |
Returns the current date and time without timezone |
★ – Postgres also offers the shorter
now() function to do the same
thing
SHOW timezone;SELECT * FROM pg_timezone_abbrevs;SELECT * FROM pg_timezone_names;Change your postgressql.conf file,
which controls your Postgres server. Only recommended if you have
permanently moved elsewhere and the database time zone has not updated
appropriately.
Set future queries in a single session to be from a new timezone:
SET |||timezone||| TO |||time_zone_name_or_abbrv|||;
localtime or
localtimestamp report!Transform a single query to be reported in a different time zone:
SELECT |||dt_col_name||| AT TIME ZONE |||tz_name_or_abbrv|||
FROM |||tablename|||;