*[Note: If you're using Python for data work, you can do this super easily via Pandas. I wrote this back in the stone age before knowing about Pandas. I leave this post up as a demonstration of using matrix algebra to keep loops out of your code. But really, Pandas is awesome so go learn it!]*Suppose we have data in the following form:

PersonID Bread Meat Amount 0 1 1 10 0 1 1 14 0 2 1 30 0 3 0 6 1 1 0 20 1 2 0 15

We call this panel data (or longitudinal data). Many statistical packages (e.g. STATA) have a "collapse" function that will reduce the panel data to a unique entry for each member of the panel with the corresponding column sum/average/min/max for the entries of the panel member. Using such a command to sum over panel members would reduce the table given above to the following:

PersonID Bread Meat Amount 0 7 3 30 1 3 0 35

Looping over panel members works, but could become very cumbersome when the dataset

is large. We can speed things up considerably by thinking about this operation in terms of matrix multiplication (you will want to assign numeric values to any strings). Consider the following matrix:

This gives us exactly the matrix we want for all columns except our PersonID column, but replacing the PersonID with the correct values is straightforward. Call the left-most matrix in the figure above C and the data matrix A. The column with the panel member identifying info is the "index" column. To generate C in python, we use:

C = A[:,index] == Unique values of A[:,index]

It is important to note that this only makes C if "A[:,index]" is a row vector and "Unique values of A[:,index]" is a column vector.

We can now collapse our matrix with the following (PS, sort your data first!):

def sumcollapse(A,index): A1 = sp.atleast_2d(sp.unique(A[:,index])) C = A[:,index] ==A1.T B = sp.dot(C,A) #replace the index row with the true panel member values B[:,index] = A1 return B

Similarly to average over panel members:

def meancollapse(A,index): A1 = sp.atleast_2d(sp.unique(A[:,index])) C = A[:,index] ==A1.T # make a "dividing vector" based on the number of entries for each panel member C1 = sp.atleast_2d(sp.sum(C,axis=1))*1.0 B = sp.dot(C,A) B = sp.divide(B,C1.T) return B

To take the first row for each panel member:

def firstcollapse(A,index): A1 = sp.atleast_2d(sp.unique(A[:,index])) C = A[:,index] == A1.T B = sp.argmax(C, axis=1) return A[B]

Hopefully you see the pattern. Manipulating this to find minima/maxima is straightforward. Don't forget to sort your data first!