Data journalism: It’s all in the name
In this assignment, we will use the Social Security Administration’s baby name dataset to examine the popularity of a name or names over times.
The Social Security Administration tracks the frequency of each baby name used during a given year. They release the datasets each year in April. When it’s released, news organizations all over the country use the data to create stories about trends in baby names. Then throughout the year you’ll see the occasional, topical baby names story. This guide will show how you can make your own baby names story using Excel.
Step 1: Get the data. Usually before Step 1, we would need to figure out our story and find an appropriate dataset to answer our questions. For this story, we already know our story and data set, so we can just go get it.
You can find the Social Security Administration’s baby names dataset can be found through the government’s data repository, Data.gov. You can search for “baby names” or something like it.
A bunch of datasets will pop up, but the first two are the baby names data. The first dataset is the national-level data, but for our assignment, we will use the second dataset, which is state-level data. If you click the “Zip” button, it should automatically start downloading the dataset.
The data will come down as one relatively large zip file. Once you uncompress the file, you will see a text file for each state.
Step 2: Import the data in Excel. Since the data is provided in text files, we can’t just open them in Excel. Instead we need to use Excel’s import function to convert the text into a Excel spreadsheet.
Start by opening up Excel and choosing File > Import. Excel will ask you what type of file it is (i.e., text) and where it is saved. Then you will get to the “Text Import Wizard.”
In this set of windows, you will be telling Excel how the data is organized within the text file, so it knows how to convert it in to a spreadsheet. The first question is whether the data is fixed width (i.e., the beginning of each piece of data is a fixed number of spaces apart) or delimited (i.e., the pieces of data are separated by some kind of indicator, like a tab or comma). As you can see in the preview window, our data is delimited and separated by commas.
The final screen asks you about the format of the data you are importing. In our case, all of the data is just “general” data, but if you were importing dates or numbers as text you could tell it here.
Finally, it will ask you where you want to put the data. You should be able to leave it how it is.
Step 3: Find the data you need. You should come into the assignment with an idea of what you’re looking. In my in-class example, I looked at the popularity of the name Peyton over time.
The easiest way to find all the Peytons is to use Excel’s filtering function. [note: before I started filtering I added a row above all my data and added column labels, like year and gender] You’ll find the filter button on the right side of the screen.
When you turn on the filtering, you will see a drop-down arrow has been appeared on the right side of each column label. If you click the arrow for a column, the filtering box for the column will pop up.
So for our example, you probably want filter by name, so you can get rid of all the data for the names you don’t care about.
In the above picture, I have filtered down to just the Peytons. If I wanted I could also filter by gender to see just the male (or female) Peytons. Now that I have it filtered down, I can check if my story idea (e.g., the rise in popularity in Tennessee of the name Peyton) is actually a story. Then I can decide how to report the data.
For this assignment we want to create a simple line chart. To do this, I would start by copying the relevant data to a new sheet within the same Excel file. Once it is organized on the new sheet, use Excel’s chart tool to start making the chart. For this one I chose to use a stacked bar chart.
Then we work on the formatting to make it the perfect chart.