December 13, 2014

DataReader - import data directly from FRED, Yahoo Finance, Google Analytics, etc.

Pandas has a fancy DataReader command that allows you to import data into a data frame directly from several websites. Currently these include Google Finance, Yahoo Finance, FRED, World Bank, Kenneth French's data library, and Google Analytics (see here for current list).

This is extremely cool. Once upon a time you had to download a million csv files and then load/merge them all. DataReader does this all automatically. You just provide the name of the series you need and the agency that provides it. You can also provide starting/stopping dates if you like.

Here's an example of using DataReader to fetch some data from FRED. I then use it to make a stacked line plot. (For tips on making good plots, read this. If you wanted to make a simpler graph, you could do it in maybe 5 lines of code.)

Code here.

import pandas as pd
from import DataReader

import numpy as np

from matplotlib import pyplot as plt
import matplotlib as mpl
from mpltools import style

# -------------------------------------------------------
# read in data
# -------------------------------------------------------
id_list = ['UEMPLT5','UEMP5TO14', 'UEMP15T26', 'UEMP27OV']
unemp_df = DataReader(id_list, 'fred', start='01/01/1948')

# compute percentage unemployed by unemployment duration
unemp_df['unemp_sum'] = unemp_df[id_list].sum(axis=1)
for col in id_list:
 unemp_df[col+'_pct'] = unemp_df[col]/unemp_df['unemp_sum']

# -------------------------------------------------------
# make plots
# -------------------------------------------------------
fig = plt.figure(figsize=(9, 6))
ax = fig.add_axes([0.1, 0.1, 0.6, 0.75])

# x and y coordinates for 2d plot
X = np.arange(0,len(unemp_df),1)
Y = unemp_df.filter(regex="_pct").values.T

# choose a style from mpltools
style.use('grayscale') #ggplot
sp = ax.stackplot(X, *Y, alpha=.55)

# add legend (reverse to match top-down order of graph)
proxy = [mpl.patches.Rectangle((0,0), 0,0, \
  facecolor=pol.get_facecolor()[0]) for pol in sp]
ax.legend(proxy, ('27+ weeks', '15-26 weeks', '5-14 weeks', '0-4 weeks'),\
  ncol=1,bbox_to_anchor=(1.05, 1),loc=2,fontsize='medium',\
  title='Unemployment Duration')

# tidy up ticks and titles
labels = [int(i) for i in ax.get_xticks().tolist()]
ax.set_xticklabels(unemp_df.index.year[labels], rotation=45 )
plt.title('Fraction of Unemployed by Unemployment Duration \n')

# save figure
fig.savefig('unemp_dur_CS.png', bbox_inches='tight', dpi=100)

February 2, 2014

Smolyak Interpolation

Last semester, Spencer and I dedicated some time (probably too much time in fact) to writing up a Python class that could handle Smolyak interpolation.  For those that are unfamiliar with what Smolyak interpolation is, it is a sparse grid collocation method that uses perpendicular polynomials for accurate interpolation between grid points.  It is slowly becoming a "frequently" used tool by computationally focused economists (even saw a student from Penn use it in his job market talk).  Krueger and Kubler had what I believe was the first paper in economics to use this method in 2004 and they have since written some papers jointly with Malin that use it.  Additionally, Judd, the Maliars, and Rafael Valero have written some papers on the method and they have a working paper that was very helpful in our exploration of the material.  It is useful for tackling high dimensional problems where one would quickly run into the curse of dimensionality.

For example, imagine a model where you have 5 state variables.  Even if you wanted to build a grid with just 5 grid points in each dimension you get a grid that is 5^5 (3125) points large.  The same problem can be approximated quite well with a Smolyak grid of 241 points.  It reduces the growth in states from exponential to polynomial.

Our code is hosted at this github repository and some pretty decent documentation (and basic intro to the mathematics of the method, examples, code documentation) can be found here.  We are by no means finished with this project and we have plans to add more functionality to the package (after we finish our first year), but at this point there is good working code that builds grids, interpolates, and even handles first derivatives.  We would love some feedback (bugs, useful suggestions, etc...) if anyone tests or uses the code.