Using queries to make sense of your data

Originally published in Media Magazine, Winter 2001

Alright.  You’ve interviewed  your data till you’re blue in the face. You know what all the fields are and what all the codes mean.  You’ve cleaned up your tables so Smith Street isn’t spelled six different ways.  In fact, you probably know more about that government database and its flaws than most of the bureaucrats who fought so hard to keep it out of your hands.
Now, the payoff.  The chance to ask the big questions that will give you the lead on the big story.
But how? By writing queries, that’s how.
For journalists, queries are the heart and soul of database managers; they’re the powerful tools that allow you to sort out the wheat from the chaff, or at least to find out how many farmers grow wheat and how many grow soybeans.
It used to be before the bright folks at software companies invented the query interface, that writing queries was hard work.
You had to learn a mildly-challenging computer lingo called Structured Query Language before you could make any sense of all that raw data.
But these days, programs such as Microsoft Access take most of the work out of it.  In fact, with its simple query wizards, you barely need to know a thing about how queries work to actually construct one.
Which is not to say you shouldn’t learn.  Anyone who is going to drive this CAR seriously should probably teach themselves SQL.  It is still the standard database query language, and when you use those nice ‘n easy interfaces and query wizards, you are writing SQL queries in the background, even if you don’t know it.
But for now, I’ll talk about the query interface in Access because frankly that’s all I’ve got room for here.  When the editor hands over the whole magazine, well then….
In any case, a query is a way to sort, filter or summarize information contained either in one database table or in more than one relational table. Next time, we’ll talk about how the ability to “join” more than one table can lead to solid gold CAR stories.  Today, I’ll talk about simple, “flat file” queries.
Let’s say you’ve got a database that lists all of the motor vehicle accidents in your city.  With some effort, you can probably get this from your local police, minus personal stuff such as names.
A simple query can narrow down the data to those accidents that happened at the corner of Main and 5th, assuming that the database has a field for intersections.  A slightly more sophisticated query can narrow down to accidents at Main and 5th that involved male drivers, assuming there is a field for gender.  Add some more limiting criteria, and you can query for accidents at the corner of Main and 5th involving drunk male drivers under 35.  You get the picture.
You can also do some simple math.  For example, you can add up the number of accidents in 2000 involving drunk male drivers under 35 at each of the city’s intersections.  If the database has a field containing the value of damage in the accidents, you can calculate which intersection saw the most damage in accidents.
The variations are almost endless, once you get the hang of writing queries.
In Access, the job is made relatively simple with the query interface. Basically, it offers you a small box containing all of the fields in the data table.  You then pick the fields you want to use in the query, and move them down into a grid, one column for each field.  You then set different criteria below each field name, such as whether to sort the field up from the smallest number or down from the largest (or in alphabetical or reverse alphabetical order as the case may be), and whether to limit the query results only to those records having a particular entry in that field.  A separate button on the toolbar adds in a “totals” line which allows you to do math.
With some practice, you will quickly become a query whiz (if not Wizard TM), and make some real sense of your data tables.
Once you have finished your queries–and you will invariably do far more than you will need–you should have a good sense of what this database can tell you, and what kinds of stories you will be doing.
From here, it’s generally a matter of doing interviews to confirm and flesh out your findings, and actually writing your story.
As you get closer to deadline, you will probably find yourself going back to your database manager, to run new queries to get new information you didn’t realized you’d need.  It’s a good idea to have your CAR computer nearby the one you are writing on.  Better still, make them the same machine.
Naturally, anytime you can compare calculated totals to published material, it is a good idea to do so.  This can aid immeasurably in ensuring the ultimate accuracy of your work.  As mentioned in the last Media column, your data can produce impressive query results, even if it is wrong.  Unfortunately, wrong results look just as good as correct ones.
Depending on your medium, you will need to think about graphics, visuals and sound.  You may also want to decide what additional data you can present on your outlet’s web site, to supplement the limited information you can provide in print or in your broadcast story.
With some practice and patience, you can be on your way to page one or newscast leading stories, one query at a time.