Tutorial

A First Look at Writing Queries

Data in SEDE is organized in several tables–posts (questions and answers are together in one table), users, badges, comments, and so on. There are a lot of tables; you can see the list in the Query Composer:

compose-query page with tables circled

Some of these tables are specialized; you can come back to them later. As a newcomer to SEDE you are probably most interested in the first two tables, Posts and Users. Let's look at Posts.

To view data from a table in SEDE, you write a SQL SELECT statement. (You can type your SQL right into the Query Composer window.) As the name implies, a SELECT statement picks out the specific information that you're looking for. But what if you just want to browse, to get a feel for what kind of data is in a table? Try this:

SELECT TOP 100 * FROM Posts
Run Query

This query returns all the columns for the first 100 posts (questions and answers). This shows you what columns are in the data and what their values look like. Let's take a look at some results from this query:

output from previous query

Notice that one of the columns is PostTypeId, and its value (in these results) is always either 1 or 2. Looking at the data, we might observe that only posts with a type of 1 have values for AcceptedAnswerId and ViewCount, and might (correctly) conclude that a post type of 1 means the post is a question. By extension, then, a value of 2 would probably mean an answer (with ParentId pointing to the question).

That's correct in this case, but reasoning from the data like this is risky. In SEDE there are two kinds of numeric values you might see–"real numbers", like view count or score, and numbers like post types which represent categories or types. If you see a number that looks like the latter, look for an explicit list of the values instead of guessing. In this case the column heading is "PostTypeId" and the table list has a PostTypes table. That table shows two columns, a number and a name:

entry for PostTypes table

Unfortunately, the Posts table itself doesn't tell you that; it just records the data type:

entry for Posts table

In SEDE, a type of tinyint usually means there is another table that has values that explain what this number means. If you see something listed as a tinyint, look for a table listing the values. The table's name matches up with the name of the column using it–"PostTypeId" and "PostTypes", in this case.

So what are the values? Click on the (information) icon next to the table name:

entry for PostTypes table with info link circled

Clicking on that reveals the values defined for this table:

list of post types from schema

(Click the to return to the main list.)

You'll notice that not all tables in the composer have the icon. This is specifically for tables with these special (enumerated) values, which always have the data type of tinyint. There are lots of other number values (of type int, for integer) running around in the data, like scores and view counts and user numbers; there's nothing to show there. But if there is something to show, you'll see the .

Ok, back to writing queries. So far we've just sampled the data–we selected everything in the first few rows so we could browse. That's handy if you need to familiarize yourself with the data, but it's not how real queries are written.