December 13, 2014

DataReader - import data directly from FRED, Yahoo Finance, Google Analytics, etc.

Pandas has a fancy DataReader function (now maintained in the separate pandas-datareader package) that imports data into a data frame directly from several websites. Currently these include Google Finance, Yahoo Finance, FRED, the World Bank, Kenneth French's data library, and Google Analytics (see the documentation for the current list).

This is extremely cool. Once upon a time you had to download a million CSV files and then load and merge them all yourself. DataReader does all of this automatically: you just provide the name (or a list of names) of the series you need and the source that provides it, plus optional start and end dates.
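To see what DataReader saves you, here is a minimal sketch of the manual route: read each downloaded file, then align the series on their dates. The two in-memory CSV strings are hypothetical stand-ins for downloaded files (toy numbers, not real FRED values), and the `DATE` column name is an assumed file layout.

```python
import io

import pandas as pd

# Two hypothetical "downloaded" files (toy numbers, not real FRED values)
csv_a = "DATE,UEMPLT5\n1948-01-01,100\n1948-02-01,110\n"
csv_b = "DATE,UEMP27OV\n1948-01-01,20\n1948-02-01,25\n"

# Load each file with its date column as a parsed DatetimeIndex
frames = [
    pd.read_csv(io.StringIO(text), index_col="DATE", parse_dates=True)
    for text in (csv_a, csv_b)
]

# Align the series on their shared dates: one column per series
merged = pd.concat(frames, axis=1)
print(merged)
```

With dozens of series you would repeat the download-and-read step for every file; DataReader collapses all of it into a single call.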

Here's an example that uses DataReader to fetch four unemployment-duration series from FRED and then draws a stacked area plot of each duration's share of total unemployment. (Most of the code is styling; if you wanted a simpler graph, you could do it in maybe 5 lines of code.)


import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
# pandas.io.data has been removed from pandas; DataReader now lives in the
# separate pandas-datareader package (pip install pandas-datareader)
from pandas_datareader.data import DataReader


# -------------------------------------------------------
# read in data
# -------------------------------------------------------
# FRED series: number unemployed for <5, 5-14, 15-26 and 27+ weeks
id_list = ['UEMPLT5', 'UEMP5TO14', 'UEMP15T26', 'UEMP27OV']
unemp_df = DataReader(id_list, 'fred', start='1948-01-01')

# compute percentage unemployed by unemployment duration
unemp_df['unemp_sum'] = unemp_df[id_list].sum(axis=1)
for col in id_list:
    unemp_df[col + '_pct'] = unemp_df[col] / unemp_df['unemp_sum']


# -------------------------------------------------------
# make plots
# -------------------------------------------------------
# apply the style BEFORE creating the axes so they pick up its colors
plt.style.use('grayscale')  # or 'ggplot'

fig = plt.figure(figsize=(9, 6))
ax = fig.add_axes([0.1, 0.1, 0.6, 0.75])

# x = dates; y = one row per duration bucket (shortest at the bottom)
X = unemp_df.index
Y = unemp_df.filter(regex='_pct$').values.T
sp = ax.stackplot(X, *Y, alpha=.55)

# add legend (reverse to match top-down order of graph)
proxy = [Rectangle((0, 0), 0, 0, facecolor=pol.get_facecolor()[0])
         for pol in sp]
proxy.reverse()
ax.legend(proxy, ('27+ weeks', '15-26 weeks', '5-14 weeks', '0-4 weeks'),
          ncol=1, bbox_to_anchor=(1.05, 1), loc=2, fontsize='medium',
          title='Unemployment Duration')

# tidy up ticks and titles (dates on the x axis label themselves)
ax.set_ylim(0, 1)
ax.set_xlim(X[0], X[-1])
ax.tick_params(axis='x', labelrotation=45)
ax.set_title('Fraction of Unemployed by Unemployment Duration \n')

# save figure
fig.savefig('unemp_dur_CS.png', bbox_inches='tight', dpi=100)

plt.show()
