Jed Rembold
Datafest 2023

If no condition provided, tries to match across all shared columns
facebook %>% inner_join(linkedin)If needing to match differently named columns across both tables, or only a sub-selection of similarly named columns
facebook %>% inner_join(linkedin,
by=join_by(name == name2)
)Pandas uses the same method for all joins, but you need to
specify the how
Not providing a condition here will match on the index
Unless you’ve set up your indexes, you’ll probably want to
provide an on parameter
Even then, this will attempt to map the index of the second dataframe to the specified column in the first
facebook.join(linkedin, on='name', how='inner')Frequently will want to set the index to the desired column to aid matching
facebook.set_index('name')Can run a lightweight SQL server using SQLite
SQL is where joins originated, so they are very well supported:
SELECT *
FROM facebook
JOIN linkedin
ON facebook.name = linkedin.nameLEFT JOIN
or RIGHT JOIN
LEFT JOIN is decidedly the more common,
and you can make any RIGHT JOIN a
LEFT JOIN just by flipping the table
orderingNULL or NAN
values will be inserted for the secondary table columns if no match is
foundIn R:
facebook %>% left_join(linkedin,
by=join_by(name == name))In Pandas:
facebook.join(linkedin, how='left')In SQL:
SELECT *
FROM facebook LEFT JOIN linkedin
ON facebook.name = linkedin.nameFULL OUTER JOIN will
do what you want
LEFT JOIN followed by
a RIGHT JOIN with the existing tableNULL or NAN
valuesIn R:
facebook %>% full_join(linkedin,
by=join_by(name == name))In Pandas:
facebook.join(linkedin, how='outer')In SQL:
SELECT *
FROM facebook FULL OUTER JOIN linkedin
ON facebook.name = linkedin.nameCROSS JOIN will return a table of all
of these possibilitiesIn R:
facebook %>% cross_join(linkedin)In Pandas:
facebook.join(linkedin, how='cross')In SQL:
SELECT *
FROM facebook CROSS JOIN linkedinGo forth and join!