Using Multiple Tables: JOIN
So far we've looked at queries restricted to a single table. As we saw in the introductory section, the
real power of a database (as compared to a spreadsheet) is in being able to combine tables. We
do this with the JOIN
clause.
Joining tables works by finding shared values (keys) in two tables. If a post was created by a user with ID 9753, we can find the user in the Users table with an ID of 9753. This allows us to display the name of the user (which comes from the Users table) as part of the results of a query on the Posts table:
SELECT p.Title, p.Id, p.Score, p.ViewCount, u.DisplayName FROM Posts p JOIN Users u ON p.OwnerUserId = u.Id WHERE p.PostTypeId = 1 AND p.Score >= 25Run Query
If you're new to SQL, the best way to read this statement is from the inside out. The logic is:
-
Start with the Posts table:
FROM Posts p
. Because we're now talking about more than one table we need to use names;p
refers to any row in the Posts table. Think of it sort of like a loop counter;p
means "whatever row we're currently looking at". A query will end up looking at every row in the table to decide whether that row is relevant. -
Now we're going to cross-reference this with the Users table:
JOIN Users u ON p.OwnerUserId = u.Id
. Just as we usedp
to mean a row in the Posts table, here we useu
to mean a row in the Users table. TheJOIN
says to take the value in the OwnerUserId column inp
(a post), find the same value in the Id column in u (a user), and match them up.JOIN
does a lot of the heavy lifting in SEDE (and many other databases). -
Remember what we said about reading this query from the inside out? Now we go up to the first line:
SELECT p.Title, p.Id, p.Score, p.ViewCount, u.DisplayName
. Notice that we've now added prefixes to the column names to indicate which table they come from. To the four columns from the Posts table we've added the display name from the Users table. -
Finally, we go to the bottom for further restrictions. The
WHERE
clause restricts the values we show. Note that we've also added the prefixes to the values here.
This query produces results in an unspecified order:

We can use any column in either of the joined tables to control the ordering. It doesn't need to be one of the
columns listed in the SELECT
statement.
For the purpose of illustration, we'll use user reputation, starting with the highest, to sort the results. This
also has the effect of grouping all questions from the same author together. We add the
ORDER BY
clause to the previous query:
SELECT p.Title, p.Id, p.Score, p.ViewCount, u.DisplayName FROM Posts p JOIN Users u ON p.OwnerUserId = u.Id WHERE p.PostTypeId = 1 AND p.Score >= 25 ORDER BY u.Reputation DESCRun Query
See how that changes the results.
(If you wanted the lowest-reputation user first, you would replace DESC
(descending) with ASC
(ascending) in the
ORDER BY
clause.)
JOIN
s are not limited to two tables. They
also aren't limited to different tables. Consider the Posts table, which contains both
questions and answers (among other things). Suppose you want to access a question and its accepted answer? You can
do that by joining Posts to Posts:
SELECT p.Title, p.Id, p.Score, a.Score AS "Accepted Score", u.DisplayName AS "Asker", au.DisplayName AS "Answerer" FROM Posts p JOIN Posts a ON p.AcceptedAnswerId = a.Id JOIN Users u ON p.OwnerUserId = u.Id JOIN Users au ON a.OwnerUserId = au.Id WHERE p.PostTypeId = 1 AND p.Score >= 25 and p.AcceptedAnswerId IS NOT NULL ORDER BY p.Score DESCRun Query
Yes, that's a four-way JOIN
. For each post
(that is a question), we find its accepted answer (if that is not null). The answer, being a post, has an ID. We
can therefore join a question to its answer: JOIN
Posts a ON
p.AcceptedAnswerId =
a.Id
.
The JOIN
of the (question) post to its user
is unchanged. We need a separate JOIN
to do the same for the answering user, because answers are in a, not p,
even though they are both references to the Posts table.
Notice that with all the extra posts and users involved in this query, we've used
AS
clauses in the
SELECT
to specify more meaningful column
headers:
SELECT p.Title, p.Id, p.Score, a.Score AS "Accepted Score", u.DisplayName AS "Asker", au.DisplayName AS "Answerer"
We've also changed the sorting:
