Quandl update
It’s been a few months since I wrote about how much I love Quandl, so you might be wondering how things are working out for me. The good stuff is still really good, but a few things still need work.
The good
- I can import data directly from R/Python code without having to go near the ONS website. It’s as easy as
Quandl.get('UKONS/ABMI')
and I have a properly indexed dataframe ready to go.
- Once I’ve written the import I can re-run the code and it automatically pulls the latest data so I never need to worry about when the last release was. People lucky enough to have Bloomberg/Datastream probably don’t think this is a big deal but I’m not one of those people.
- The Quandl maintainers are incredibly good at adding new data sources and you can even upload your own datasets now.
- It charts the series on the website for you and gives you embeddable charts like this in seconds, without leaving your browser. That is incredibly useful when you want to have a quick look at a series without having to download the data and chart it yourself.
The frustrating
- Quandl are great at adding datasets and great at aggregating data but there’s very little organisation of it so far. For instance, the UK’s GDP series is available from the ONS in three different datasets at two different frequencies and in various forms (raw, SA, %age change, etc). Quandl hosts them all, which is great, but doesn’t draw the links between them. For someone who knows the data that’s fine but it could prove very confusing for a novice. The ONS give the series the same code across all releases so perhaps linking the releases is something that could be done in future.
- Searching for data is a nightmare. If I search ‘UK GDP’ there are 17417 pages of results and the official ONS data isn’t even on the first page. Okay, let’s narrow the search by suggested sources to the ONS. Now there are only nine pages of results but the GDP series still aren’t on the first page. Luckily, I know the ONS code for GDP, YBHA, so let’s search on that. Uh-oh, no results! How am I going to find this series?! Well, I can see that the other series are coded by Quandl as UKONS/[dataset]_[ONS code]_[frequency], so let’s try just pulling the series UKONS/QNA_YBHA_Q from a Python interpreter. Success, I’ve got the series! But I still can’t see the metadata, or the webpage, or embed a chart, or do any of the other great website things. Update: Abraham Thomas of Quandl helpfully points out that I could go to the address to get the metadata and the link will be automatically expanded.
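The guess-the-code workaround above can be sketched as a small helper. This is only a rough illustration: it assumes the UKONS/[dataset]_[ONS code]_[frequency] pattern holds for the series you want, and the function name ukons_code is made up for this example rather than anything Quandl provides. The actual download is left commented out because it needs the Quandl package and a network connection.

```python
def ukons_code(dataset: str, ons_code: str, frequency: str) -> str:
    """Build a Quandl code of the form UKONS/[dataset]_[ONS code]_[frequency].

    Hypothetical helper: the pattern is inferred from how other UKONS
    series appear to be named, so it may not hold for every series.
    """
    parts = [dataset.strip(), ons_code.strip(), frequency.strip()]
    if not all(parts):
        raise ValueError("dataset, ons_code and frequency must all be non-empty")
    return "UKONS/" + "_".join(parts)


# Quarterly National Accounts (QNA), GDP series YBHA, quarterly frequency (Q)
code = ukons_code("QNA", "YBHA", "Q")
print(code)  # UKONS/QNA_YBHA_Q

# Fetching the series, as in the call shown earlier (requires network access):
# import Quandl
# gdp = Quandl.get(code)
```

If the guessed code doesn’t exist the download will simply fail, so this saves typing rather than replacing a working search.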
As Quandl has grown it has added data very quickly, but it’s now at the point where it’s becoming a bit unwieldy without effective search and organisational tools. I really hope that’s next on the devs’ list of things to do so it can become the fantastic repository that it has the potential to be.
Hi James, Abraham here from Quandl. You’re absolutely right — data discovery is our biggest weakness, and the single most frustrating thing about Quandl. There’s no point having millions of datasets “somewhere” on the website if users can’t find the precise datasets they need, quickly and efficiently. Fixing search is our #1 dev priority. We’re working on it!
Thanks Abraham, I’m sure you guys are on top of it and it’s obviously a fairly tough problem to crack. I look forward to seeing what you come up with!