Quandl and Zillow

Quick news: Quandl, through its R interface, now gives access to county-level price data from Zillow. The interface is decent and it’s cheap, so it seems to beat most of the competition, including buying county-level Case-Shiller price data from Standard & Poor’s.


There are pluses and minuses.

Pluses:

  • The R interface is simple and easy to use. It takes five minutes to get the historical time series of one county’s median price.
  • Zillow provides median rents.
  • Zillow provides list prices as well as sales prices, something that used to be available in the expensive CoreLogic data.
  • Foreclosure counts are also in the data. These are also available in the expensive RealtyTrac data (compiled from county records), but RealtyTrac still provides more information, such as Real Estate Owned properties, Lis Pendens, and Notice of Trustee Sale, i.e. the full timeline of the foreclosure process.

Minuses:

  • There are more than 3,000 counties in the U.S., so getting the full U.S. data requires more than 3,000 function calls. Not really the simplest thing on earth, especially when the same data stored in a CSV file would take a few seconds to load in a statistical package.
  • The data goes back only to 1996, which may be good enough for you but is not enough for some other applications.
  • Micro data is a must, and county-level data points are not sufficient any more. In that sense Quandl overstates the amount of information it delivers to its customers. It calls a single time series a data set, which is not really the standard way of defining a data set. (Where is the “set” in data “set”?)
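To make the first minus concrete, here is a minimal sketch of the loop that the one-call-per-county design forces on you, written in Python rather than R. Everything named here is an assumption for illustration: `collect_counties`, `write_csv`, and the `fetch` argument are hypothetical helpers, not part of Quandl's interface, and the county codes are whatever identifiers the service expects. The `fetch` function stands in for the single-county download, so the sketch runs without a network connection.

```python
import csv
from typing import Callable, Iterable, List, Tuple

# One observation is (date, value); one county's series is a list of them.
Series = List[Tuple[str, float]]


def collect_counties(
    county_codes: Iterable[str],
    fetch: Callable[[str], Series],
) -> Tuple[List[Tuple[str, str, float]], List[str]]:
    """Call fetch once per county and stack the results in long format.

    Returns (rows, failed): rows are (county_code, date, value) tuples,
    failed lists the codes whose download raised an exception.
    """
    rows: List[Tuple[str, str, float]] = []
    failed: List[str] = []
    for code in county_codes:
        try:
            series = fetch(code)  # hypothetical: one remote call per county
        except Exception:
            failed.append(code)   # keep going; retry these later
            continue
        rows.extend((code, date, value) for date, value in series)
    return rows, failed


def write_csv(rows: List[Tuple[str, str, float]], path: str) -> None:
    """Save the stacked panel so later sessions load one file instead."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["county_code", "date", "value"])
        writer.writerows(rows)
```

In practice `fetch` would wrap whatever single-series call your Quandl client exposes and convert the result to (date, value) pairs. Once the panel is saved to CSV, the 3,000-plus calls only have to be paid once.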

So we’re still going to rely either on Census values at the block group level, which are self-reported and top-coded, or on other data sets such as DataQuick or FNC. I use FNC data in my latest paper.

 

 
