Harnessing growing data volumes: The St Louis Fed

St Louis Fed officials discuss handling ever bigger, more granular and more global data


The Federal Reserve Bank of St Louis has become a hub of global data, with its Federal Reserve Economic Data (Fred) database growing from small beginnings to a leading example of data management. Across nearly 400,000 time series – and growing – the database provides an easily searchable resource for economic researchers.

With such large volumes of data being produced, gathering the data quickly and then organising it in such a way that people can find what they need is a major challenge. As any user knows, many national statistical websites are showing their age, with poor search functionality and frustrating interfaces. Search for ‘headline inflation’ and you are often presented with 100 different series – but not the one you are looking for. The St Louis Fed’s experience developing Fred may offer a useful example of how to cope as data becomes bigger.

 

How did Fred get started?

Katrina Stierholz: More than 50 years ago at the bank, back in the print days, there was an interest in making data available. We put out data publications to provide people with information on the current economic conditions, particularly monetary indicators. These became very popular, and when everything went online so did Fred. This year is the 25th anniversary of Fred being accessible via the internet.

It started out as an electronic bulletin board, and it was a fairly small product for many years. In the 1990s, I think we only had 3,000 time series available. It was not until probably 2004 that we grew to more than 10,000 series. There has been an incredible growth rate in the past 10 years.


What changed in 2004?

Katrina Stierholz: A couple of things. We realised that the paper data publication world was disappearing, so we decided to focus on Fred. We hired some new people who had fresh ideas and we surveyed our users. We asked users what they wanted and the answer was overwhelming: “more data”.

We doubled down on adding data, especially a lot of international data. We had regional data and continued to work on that, but then we added international data as a focus. You can imagine, when you start adding time series for every country in the world, that adds up pretty quickly.

 

How is the data represented? Fred charts are one of the standout features.

Keith Taylor: Originally, it was just downloading data. Around the early 2000s we added charts, and since then we have steadily tried to develop better charting software. Much of that was initially driven by an internal interest in being able to represent the more advanced graphics that you would see in our data publications. Many of them are basic time series, but we also add other kinds of interesting graphs. We have various graph types – line charts, bar charts, pie charts, area charts and scatter plots. 

We also have GeoFred. We realised that, once we have this regional data along with the international data, mapping it is one of the best ways to represent it. And we have dashboards, which allow you to put a bunch of different charts all on one webpage, where it all updates automatically.

 

Have you seen demand for the data change over time?

Keith Taylor: The big change was that users really wanted a lot of international data and more granular data. If you ask any researcher: “Would you like this data?” they say: “Yes, of course.” No one ever says no to data.

Our core data is macroeconomic indicators such as GDP, inflation
and unemployment. But there has been great interest in indicators that impact the economy but maybe aren’t traditionally thought of as macroeconomic indicators. We are exploring crime statistics, poverty data, health data and health insurance – there has been a lot of interest in that kind of data. 

In addition, especially over the past five years, we have seen our user base shift from super-sophisticated people – bankers, academics, economists – towards less sophisticated users who are much more interested in data that is easily accessible.


You’ve touched on some non-conventional data sources – are you interested in big data?

Keith Taylor: We’re not doing anything in the sense of, for instance, the Billion Prices Project. But I think I have a big data problem any time I have more data than I can handle. For certain kinds of users, a spreadsheet with 10 columns is a big data problem, because they just do not have the right knowledge. Then, at the other end of the scale are Google, Amazon or Facebook, which are dealing with petabytes of data.

Fred is really about solving the big data problem for the people at the other end of the spectrum. If you wanted to get GDP from the Bureau of Economic Analysis (BEA), you could do analysis and compare that with, say, the Bureau of Labor Statistics’ (BLS) non-farm payrolls. For many people, that is a big data problem because they have to go to two different sources, navigate two different sets of websites, download relatively large files, parse out what they want and make certain kinds of conversions. That’s really where our focus is: streamlining that process and ensuring that someone can do their analysis much more quickly.
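
Taylor's two-agency example is easy to make concrete. The sketch below is not Fred's own tooling – it simply assumes the public Fred web API, the Python requests and pandas packages and a placeholder API key – and pulls the BEA's GDP series and the BLS's non-farm payrolls (Fred IDs GDP and PAYEMS) into a single table.

```python
# A minimal sketch of the "one source instead of two" idea: fetch two Fred
# series and align them in one table. API key below is a placeholder.
import requests
import pandas as pd

FRED_API_KEY = "YOUR_API_KEY"  # placeholder: obtain a key from the St Louis Fed
BASE = "https://api.stlouisfed.org/fred/series/observations"

def fetch_series(series_id: str) -> pd.Series:
    """Download one Fred series as a pandas Series indexed by date."""
    params = {
        "series_id": series_id,
        "api_key": FRED_API_KEY,
        "file_type": "json",
    }
    obs = requests.get(BASE, params=params, timeout=30).json()["observations"]
    values = {o["date"]: float(o["value"]) for o in obs if o["value"] != "."}
    return pd.Series(values, name=series_id)

# GDP (quarterly, sourced from the BEA) next to total non-farm payrolls
# (monthly, sourced from the BLS), aligned on a common date index.
frame = pd.concat([fetch_series("GDP"), fetch_series("PAYEMS")], axis=1)
print(frame.dropna().tail())
```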

 

Are you seeing that kind of problem appear more often now that data volumes are growing?

Keith Taylor: Yes, definitely. I’ve been here for about five years, and when I started we would typically work with a release that had several hundred series, maybe a few thousand. Now, let’s say you want all the counties for the US – that’s 3,000 series, and it is going to sit in a data set of 60,000 series. It is not super complex, but many people don’t have the skills to tease that stuff out.

 

Are you hiring new specialists to manage these challenges?

Keith Taylor: Not yet, but we recognise that day is coming. However, we are already hiring more people with a high degree of fluency with metadata.

 

Can you explain the idea of metadata more?

Keith Taylor: Metadata is the data about the data. From the Fred perspective, you have the series-level information, and that describes what the indicator is. Take GDP. Which country is it? What are the units? Is it seasonally adjusted or not? What other adjustments have been made? Is it inflation adjusted?

Having well-defined metadata allows users to have a high degree of confidence that, when they search for something, they find what they are looking for. It also gives them a greater degree of understanding if they have additional questions. We’re focusing a lot on metadata to improve search functions and to help people understand the data.
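
As a rough illustration of what a series-level record contains, the sketch below queries the public Fred API's series endpoint for GDP; the field names follow the API's published JSON output, and the key is a placeholder.

```python
# Hedged illustration of series-level metadata: units, frequency and seasonal
# adjustment are carried on a per-series record.
import requests

FRED_API_KEY = "YOUR_API_KEY"  # placeholder
resp = requests.get(
    "https://api.stlouisfed.org/fred/series",
    params={"series_id": "GDP", "api_key": FRED_API_KEY, "file_type": "json"},
    timeout=30,
)
meta = resp.json()["seriess"][0]  # one metadata record per series

for field in ("id", "title", "units", "frequency", "seasonal_adjustment"):
    print(f"{field}: {meta.get(field)}")
```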

 

It seems like metadata is a big issue, even if it doesn’t grab headlines in the way standard data does.

Keith Taylor: That’s especially true when you’re talking about big data. For example, on Fred, if you search for GDP, you get perhaps 27,000 results. They all measure GDP in some way, or they are a component of GDP, but how do you sort through that and then know that what you have really is what you were looking for? Metadata is a good way to do that.

 

Does the St Louis Fed have research streams looking to use the data in innovative ways?

Keith Taylor: Michael McCracken, an economist here, is working on something we call Fred-MD. The relatively famous Stock and Watson data set has not been updated since it was originally compiled. It is very difficult for other researchers to obtain that data.

Michael worked to identify related series that existed in Fred; then he wrote a paper demonstrating that the data set he created was basically the same as Stock and Watson on a number of different measures.1 We now update that routinely; we have been adding vintages to it as well. You not only have the latest values for these 90 series, but also the monthly vintages, going back about a year now, which allow a new kind of analysis.
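
For readers who want to try Fred-MD, a minimal sketch of loading the current monthly vintage follows; the file location and layout (a header row of series mnemonics plus a row of suggested transformation codes) are assumptions based on how the publicly posted CSV is typically organised.

```python
# Minimal sketch: load the current Fred-MD monthly vintage as one table.
import pandas as pd

# Assumed location of the publicly posted current vintage.
FRED_MD_URL = "https://files.stlouisfed.org/files/htdocs/fred-md/monthly/current.csv"

raw = pd.read_csv(FRED_MD_URL)
transform_codes = raw.iloc[0, 1:]          # suggested transformation code per series
data = raw.iloc[1:].set_index("sasdate")   # monthly observations, one column per series
data = data.apply(pd.to_numeric, errors="coerce")

print(f"{data.shape[1]} series, {data.shape[0]} monthly observations")
```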

 

How does the St Louis Fed handle data governance?

Katrina Stierholz: Everything is in the research department. The Fred data team handles it well and has many processes and procedures for uploading the data and checking it. We also have a data librarian who looks at the data to ensure it complies with copyright and licensing. If it is not public data, sometimes data series require permission, although around 99.9% of data is publicly available.

 

Is security a challenge? Where do you store the data?

Keith Taylor: Like all large organisations, we take security very seriously. We cannot really discuss where we keep the data. 

 

What other challenges are thrown up by data management?

Keith Taylor: Our biggest challenge is keeping the database going, from a processing standpoint. 

We can have high levels of traffic on our website, which ends up really pulling on the database a lot. We have done a number of things in terms of virtualising servers, and we have put in place caching software to lessen the burden.
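
The caching idea can be sketched in a few lines. The code below is purely illustrative – a time-bucketed in-memory cache in front of a stand-in database query – and not the bank's actual software stack.

```python
# Illustrative only: keep recently requested observation sets in memory for a
# short period so repeated chart requests do not hit the database again.
import time
from functools import lru_cache

CACHE_TTL_SECONDS = 300  # assumed freshness window

def query_database_for_observations(series_id: str) -> list[float]:
    """Stand-in for the real (expensive) database query."""
    return [1.0, 2.0, 3.0]

@lru_cache(maxsize=10_000)
def _cached(series_id: str, ttl_bucket: int) -> tuple:
    # ttl_bucket changes every CACHE_TTL_SECONDS, so stale entries stop being reused.
    return tuple(query_database_for_observations(series_id))

def get_observations(series_id: str) -> tuple:
    return _cached(series_id, int(time.time() // CACHE_TTL_SECONDS))
```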

We are an aggregator, so when the data becomes available at the source we go out and grab it. We want to get it into Fred as quickly as possible, so if someone is making some market decisions they can use Fred to do that. Another constraint is how quickly we can get the data down, process it, run it through all of our checks and then load it into Fred. 

 

Can you explain more about how the aggregation process works?

Keith Taylor: We do it in a variety of different ways. In a small number of situations, we partner with the source organisation, which tells us it would like us to host its data on Fred, but that tends to be only very small agencies. For the big agencies – such as the BEA, BLS, Organisation for Economic Co-operation and Development and the World Bank – we go out and use the interfaces they provide to the public. Whenever possible, we use application programming interfaces (APIs) or bulk downloads, which many sites now have.

In the cases where they do not have that we ‘scrape’ the data, or we download a file that is really meant for human consumption. We have written custom scripts that will deconstruct these files and analyse the data before loading it into Fred. In an ideal world, they just say “here’s the data”, which is perfectly formatted, but that is a very small percentage of the time.
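
The two ingestion paths Taylor describes can be sketched roughly as follows; the helper names, URLs and file layout are hypothetical stand-ins rather than the bank's own scripts.

```python
# Rough sketch of the two paths: prefer a machine-readable interface when the
# source offers one, otherwise download a file meant for people and parse it.
import csv
import io
import requests

def fetch_via_api(url: str, params: dict) -> dict:
    """Preferred path: a documented API or bulk download returning structured data."""
    return requests.get(url, params=params, timeout=30).json()

def fetch_via_scrape(url: str) -> list[dict]:
    """Fallback path: deconstruct a file laid out for human consumption."""
    text = requests.get(url, timeout=30).text
    rows = list(csv.reader(io.StringIO(text)))
    # Source-specific logic: drop title and footnote rows, keep date/value pairs.
    return [
        {"date": row[0], "value": float(row[1])}
        for row in rows
        if len(row) >= 2 and row[0][:4].isdigit()
    ]
```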

 

Where the interface is provided, is that quick enough for your needs?

Keith Taylor: It depends. Some sources have interfaces for pulling the data down very quickly and easily, but do not have all the data in the release. The BEA set up an API recently, which is awesome, but it is not updated as quickly as its public consumption files, so we are not able to use it.

There has been a move to put many of these things in APIs or in bigger bulk download programs. We have to keep looking out there to see if there is a more efficient way to get it.

 

And does the data match up as well as you would like? Is that a question of metadata?

Keith Taylor: Yes, we spend our days trying to get the data to match up. One challenge is that the metadata is revised over time, so even if you get the metadata to match up initially, that may change. We track those revisions in another product called Alfred. We have all of the revisions of both the series-level metadata, and the observations themselves. That takes up a huge amount of time.

Right now, observations are handled automatically in terms of tracking revisions to them. The series metadata is kind of half-automated – and so we are looking to improve our metadata schema here and trying to work with others to establish standards around metadata, which would allow us to build an automated process.
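
In the spirit of Alfred, the public Fred API accepts real-time period parameters, so the same series can be requested as it looked on two different dates; a hedged sketch follows, with a placeholder API key and parameter names taken from the API's documentation.

```python
# Hedged sketch of vintage-aware retrieval: asking for the same series at two
# different real-time dates can surface a revision.
import requests

FRED_API_KEY = "YOUR_API_KEY"  # placeholder
BASE = "https://api.stlouisfed.org/fred/series/observations"

def observations_as_of(series_id: str, realtime_date: str) -> list[dict]:
    """Return the series as it appeared to users on `realtime_date`."""
    params = {
        "series_id": series_id,
        "api_key": FRED_API_KEY,
        "file_type": "json",
        "realtime_start": realtime_date,
        "realtime_end": realtime_date,
    }
    return requests.get(BASE, params=params, timeout=30).json()["observations"]

# Compare the first estimate of a quarter's GDP with a later, revised vintage.
first_pass = observations_as_of("GDP", "2015-08-01")
revised = observations_as_of("GDP", "2016-08-01")
```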

 

What are your plans for the future? What improvements would you like to implement?

Keith Taylor: I am really interested in looking at how we can improve our metadata, which would lead, I think, to improvements in searching. According to user feedback, this is an area we can do better in. Because we have about 400,000 series, it’s a case of finding the right series and having a high degree of confidence that what you found is what you want. The other area of focus is to continue to expand the data into other areas.

Katrina Stierholz: We have a growing base of novice users, so we’re doing things like adding related content to series to help people understand what the series is. We’re adding links to our economic education materials, to historical publications and to articles written on the subject. If you search for GDP, and you don’t know what GDP is composed of, then you can look at some of that information and understand it more. We are trying to help those fairly new users understand what they are getting.

 

Many central banks are testing new communication methods. Do you have educational programmes? 

Katrina Stierholz: We have a very comprehensive economic education programme here at the St Louis Fed. We have curriculum-based materials written to support teachers, who then deliver them to students. The curriculum covers basic economic concepts, is matched up to curriculum standards and is available in several formats – print, online, podcasts and videos. There is a ton of really powerful economic education, and it is widely used in the US.

 

Do you have ways of measuring the impact?

Katrina Stierholz: The programmes include pre-tests, and post-tests for the online courses. Students are required to take a pre-test, and then a post-test to see if they have learned anything. They consistently and regularly show improvement in their scores, to a degree of statistical confidence.

 


Keith Taylor is co-ordinator of the St Louis Fed’s data desk, which is part of the reserve bank’s research division, and manages the collection, organisation and publication of Fred data. Before joining the Fed, Taylor practised law at firms in Missouri and Illinois. He holds a doctorate in law from the Washington University School of Law.


Katrina Stierholz is a vice-president in the St Louis Fed’s research division, and director of library and information services. She is in charge of the reserve bank’s physical and digital libraries, and oversees the data desk, which posts Fred data. Before joining the Fed, Stierholz worked at the Washington University School of Law Library. She holds an MSc in library and information science from the University of Illinois at Urbana–Champaign.


Notes
1. Michael McCracken and Serena Ng, Fred-MD: a monthly database for macroeconomic research (Research Division, Federal Reserve Bank of St Louis, August 2015) 

This feature is part of the Central Banking focus report, Big data in central banks, published in association with BearingPoint.
