December 13, 2014

DataReader - import data directly from FRED, Yahoo Finance, Google Analytics, etc.

Pandas has a fancy DataReader function that lets you import data into a DataFrame directly from several online sources. These currently include Google Finance, Yahoo Finance, FRED, the World Bank, Kenneth French's data library, and Google Analytics (see here for the current list).

This is extremely cool. Once upon a time you had to download a million CSV files and then load and merge them all. DataReader does all of this automatically: you just provide the name of the series you need and the agency that provides it. You can also provide start and end dates if you like.

Here's an example of using DataReader to fetch some data from FRED, which I then use to make a stacked area plot. (For tips on making good plots, read this. If you wanted a simpler graph, you could do it in maybe 5 lines of code.)

Code here.

import pandas as pd
from pandas.io.data import DataReader
# note: in newer pandas versions this has moved to the separate
# pandas-datareader package: from pandas_datareader.data import DataReader

import numpy as np

from matplotlib import pyplot as plt
import matplotlib as mpl
from mpltools import style


# -------------------------------------------------------
# read in data
# -------------------------------------------------------
id_list = ['UEMPLT5','UEMP5TO14', 'UEMP15T26', 'UEMP27OV']
unemp_df = DataReader(id_list, 'fred', start='01/01/1948')

# compute percentage unemployed by unemployment duration
unemp_df['unemp_sum'] = unemp_df[id_list].sum(axis=1)
for col in id_list:
    unemp_df[col+'_pct'] = unemp_df[col]/unemp_df['unemp_sum']


# -------------------------------------------------------
# make plots
# -------------------------------------------------------
fig = plt.figure(figsize=(9, 6))
ax = fig.add_axes([0.1, 0.1, 0.6, 0.75])

# x and y coordinates for 2d plot
X = np.arange(0,len(unemp_df),1)
Y = unemp_df.filter(regex="_pct").values.T

# choose a style from mpltools
style.use('grayscale')  # alternative: 'ggplot'
sp = ax.stackplot(X, *Y, alpha=.55)

# add legend (reverse to match top-down order of graph)
proxy = [mpl.patches.Rectangle((0,0), 0,0, \
  facecolor=pol.get_facecolor()[0]) for pol in sp]
proxy.reverse()
ax.legend(proxy, ('27+ weeks', '15-26 weeks', '5-14 weeks', '0-4 weeks'),\
  ncol=1,bbox_to_anchor=(1.05, 1),loc=2,fontsize='medium',\
  title='Unemployment Duration')

# tidy up ticks and titles
plt.ylim([0,1])
plt.xlim([0,len(unemp_df)-3])
labels = [int(i) for i in ax.get_xticks().tolist()]
ax.set_xticklabels(unemp_df.index.year[labels], rotation=45 )
plt.title('Fraction of Unemployed by Unemployment Duration \n')

# save figure
fig.savefig('unemp_dur_CS.png', bbox_inches='tight', dpi=100)

plt.show()
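On the "simpler graph" point: pandas can produce a comparable stacked area chart in a few lines via DataFrame.plot.area. The sketch below uses synthetic duration shares in place of the FRED series (so it runs without a network connection); the column names mirror the legend above.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, safe for scripts
import numpy as np
import pandas as pd

# synthetic monthly duration shares standing in for the FRED data;
# each row of a Dirichlet draw sums to one, like the _pct columns above
idx = pd.date_range('1948-01-01', periods=120, freq='MS')
shares = pd.DataFrame(np.random.dirichlet([4, 3, 2, 1], size=120),
                      index=idx,
                      columns=['0-4 weeks', '5-14 weeks',
                               '15-26 weeks', '27+ weeks'])

# one call builds the stacked area plot, legend included
ax = shares.plot.area(figsize=(9, 6), alpha=0.55)
ax.set_title('Fraction of Unemployed by Unemployment Duration')
```

With the real unemp_df, the same one-liner on the `_pct` columns would do the job; you lose the fine-grained legend placement of the longer version, but it is hard to beat for a quick look.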

February 2, 2014

Smolyak Interpolation

Last semester, Spencer and I dedicated some time (probably too much time, in fact) to writing a Python class that handles Smolyak interpolation.  For those unfamiliar with it, Smolyak interpolation is a sparse-grid collocation method that uses orthogonal polynomials for accurate interpolation between grid points.  It is slowly becoming a frequently used tool among computationally focused economists (I even saw a student from Penn use it in his job market talk).  Krueger and Kubler had what I believe was the first paper in economics to use the method, in 2004, and they have since written several papers jointly with Malin that use it.  Additionally, Judd, the Maliars, and Rafael Valero have written papers on the method, and their working paper was very helpful in our exploration of the material.  Smolyak interpolation is useful for tackling high-dimensional problems where one would quickly run into the curse of dimensionality.

For example, imagine a model with 5 state variables.  Even if you build a grid with just 5 points in each dimension, the tensor-product grid has 5^5 = 3,125 points.  The same problem can be approximated quite well with a Smolyak grid of 241 points.  The method reduces the growth in grid points from exponential in the number of dimensions to polynomial.
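The point counts above can be reproduced by enumerating the sparse-grid index sets directly. This is a minimal sketch assuming the standard nested Chebyshev construction (node counts m(1) = 1, m(i) = 2^(i-1) + 1, as in Judd, Maliar, Maliar, and Valero); under that convention the 241 figure corresponds to a 5-dimensional grid at approximation level mu = 3.

```python
from itertools import product

def smolyak_point_count(d, mu):
    """Count points in a d-dimensional Smolyak grid at accuracy level mu,
    assuming nested Chebyshev nodes with m(1) = 1, m(i) = 2**(i-1) + 1."""
    def new_points(i):
        # nodes at level i that are not already present at level i-1
        if i == 1:
            return 1
        if i == 2:
            return 2
        return 2 ** (i - 2)

    total = 0
    # sum over multi-indices (i_1, ..., i_d) with d <= i_1+...+i_d <= d+mu
    for idx in product(range(1, mu + 2), repeat=d):
        if sum(idx) <= d + mu:
            contrib = 1
            for i in idx:
                contrib *= new_points(i)
            total += contrib
    return total

print(5 ** 5)                     # tensor-product grid: 3125
print(smolyak_point_count(5, 3))  # Smolyak grid: 241
```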

Our code is hosted at this github repository and some pretty decent documentation (a basic intro to the mathematics of the method, examples, and code documentation) can be found here.  We are by no means finished with this project and plan to add more functionality to the package (after we finish our first year), but at this point there is good working code that builds grids, interpolates, and even handles first derivatives.  We would love feedback (bugs, useful suggestions, etc.) if anyone tests or uses the code.

October 22, 2013

Rouwenhorst (1995) Markov Transition Matrix

If you're going to approximate an AR(1) process with a finite-state Markov chain, many people will point you to the method in either Tauchen (1986) or Tauchen and Hussey (1991). For a highly persistent process, however, Rouwenhorst (1995) outperforms both of these methods considerably. A rigorous comparison is given by Kopecky and Suen (2010).

Python code for Tauchen and Hussey is in a previous post. An implementation of Rouwenhorst is given here (and shown below).


import numpy as np

def rouwen(rho, mu, step, num):
    '''
    Adapted from Lu Zhang and Karen Kopecky. Python by Ben Tengelsen.
    Construct transition probability matrix for discretizing an AR(1)
    process. This procedure is from Rouwenhorst (1995), which works
    well for very persistent processes.

    INPUTS:
    rho  - persistence (close to one)
    mu   - mean and the middle point of the discrete state space
    step - step size of the even-spaced grid
    num  - number of grid points on the discretized process

    OUTPUT:
    transP - transition probability matrix over the grid (returned
             transposed, so each column sums to one)
    dscSp  - discrete state space (num by 1 vector)
    '''

    # discrete state space, evenly spaced and centered at mu
    dscSp = np.linspace(mu - (num-1)/2.*step, mu + (num-1)/2.*step, num)

    # transition probability matrix
    q = p = (rho + 1)/2.
    transP = np.array([[p**2, p*(1-q), (1-q)**2], \
                    [2*p*(1-p), p*q+(1-p)*(1-q), 2*q*(1-q)], \
                    [(1-p)**2, (1-p)*q, q**2]]).T

    while transP.shape[0] <= num - 1:

        # see Rouwenhorst 1995
        len_P = transP.shape[0]
        transP = p*np.vstack((np.hstack((transP,np.zeros((len_P,1)))), np.zeros((1, len_P+1)))) \
        + (1-p)*np.vstack((np.hstack((np.zeros((len_P, 1)), transP)), np.zeros((1, len_P+1)))) \
        + (1-q)*np.vstack((np.zeros((1, len_P+1)), np.hstack((transP, np.zeros((len_P, 1)))))) \
        + q*np.vstack((np.zeros((1, len_P+1)), np.hstack((np.zeros((len_P, 1)), transP))))

        transP[1:-1] /= 2.

    # sanity check: each row of transP sums to one (these become the
    # columns of the returned matrix)
    if np.max(np.abs(np.sum(transP, axis=1) - np.ones(num))) >= 1e-12:
        print('Problem in rouwen routine!')
        return None
    else:
        return transP.T, dscSp
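One practical question the routine leaves open is how to pick step. A common choice (discussed in Kopecky and Suen 2010) is to match the unconditional standard deviation of the AR(1) process, which places the grid endpoints at mu +/- sigma_y*sqrt(num-1). The values of rho, sigma, and num below are purely illustrative.

```python
import numpy as np

# Discretizing y' = rho*y + eps, eps ~ N(0, sigma**2), on num points.
rho, sigma, num = 0.95, 0.01, 5  # illustrative values

sigma_y = sigma / np.sqrt(1 - rho**2)  # unconditional std dev of y
# endpoints at +/- sigma_y*sqrt(num-1) around the mean imply a spacing of
step = 2 * sigma_y * np.sqrt(num - 1) / (num - 1)
print(step)
```

This step can then be passed straight to rouwen along with rho, the mean mu, and num.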