Tutorial

Computations

So far we've seen how to find individual interesting rows, like posts with high scores. A lot of the queries you'll find on SEDE are of this type—find posts from low-reputation users, all comments over a certain score threshold, all self-answered questions with positive scores for both question and answer, most-edited tag wikis, and so on. But SEDE (and SQL) can also do computations and aggregation.

Consider the question "who answers my questions"—that is, who are the users on the site who tend to help me? We can start with the following query:

SELECT a.Id AS [Post Link], a.Score, au.Id AS [User Link]
FROM Posts p 
JOIN Posts a ON p.Id = a.ParentId
JOIN Users au ON a.OwnerUserId = au.Id
WHERE p.PostTypeId = 1
AND p.OwnerUserId = ##UserId##
ORDER BY a.Score DESC
Run Query

This query finds all questions (PostTypeId = 1) from a given user(##UserId##) and returns the answer links, answer scores, and answering users, ordered by answer score (highest first). With this query we can start to see who provides the most-popular answers to somebody's questions.

But this doesn't tell us anything about overall patterns; it operates at the single-answer level. We're really more interested in patterns–which users tend to give good answers to my (or anybody else's) questions?

The next query produces some aggregate data, like this:

output from next query: total score, number of answers, answering user

Here's the query:

SELECT SUM(a.Score) as 'Total Score',
COUNT(au.Id) as '# Answers',
au.Id AS [User Link]
FROM Posts p 
JOIN Posts a ON p.Id = a.ParentId
JOIN Users au ON a.OwnerUserId = au.Id
WHERE p.PostTypeId = 1
AND p.OwnerUserId = ##UserId##
-- We want one row per user:
GROUP BY au.Id
-- In order by total score, highest first:
ORDER BY SUM(a.Score) DESC
Run Query

We use the SUM and COUNT functions in the SELECT statement to get the row values. We want the total score (SUM) of all answers from each user, and we want to know how many answers there were. That's the first part of the query:

SELECT SUM(a.Score) as 'Total Score',
COUNT(au.Id) as '# Answers',
au.Id AS [User Link]

(Perhaps you would prefer the average score, not the sum. You can do that with AVG.)

The JOINs are the same as in the previous query; we're linking question, answers, and answering users. The restrictions (WHERE) are also the same.

When reporting results we group the rows by answering user—one row per user, as opposed to one row per answer—and we also want to show the highest score first. We used the SUM earlier, in the SELECT, but it's not actually a value stored in the table, so we have to use it again here:

GROUP BY au.Id
ORDER BY SUM(a.Score) DESC

Aggregate functions you might find helpful, in addition to SUM and COUNT, are AVG (average), MIN, and MAX.

Notice, by the way, that while the "base" table in the last two queries is Posts, we don't actually return any columns from that table. That's fine. We're using it entirely as "glue" to hold the rest of the query together. Just because you include a table doesn't mean you have to show anything from it.