Intermediate Data Science

Data Wrangling: Join, Combine, Reshape

Author

Joanna Bieri
DATA201

Data Wrangling

Data Wrangling is the art of managing data that might be spread across many files or databases. It also involves organizing data that comes to you in an inconvenient format. We are going to explore some ways that Pandas can help us in organizing our data!

# Some basic package imports
import os
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default = 'colab'

Hierarchical Indexing

Hierarchical indexing is a feature of pandas that allows you to have more than one index on a single axis in a DataFrame. This is like working with data in a table but allowing it to be higher dimensional. Here is the example from our book:

# The index of this Series is a list of lists
data = pd.Series(np.random.uniform(size=9),
                 index=[["a", "a", "a", "b", "b", "c", "c", "d", "d"],
                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])
data
a  1    0.236333
   2    0.414026
   3    0.846063
b  1    0.032594
   3    0.883162
c  1    0.938927
   2    0.266740
d  2    0.873309
   3    0.767932
dtype: float64
data.index
MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )

Notice here that the index now has two levels. You can reach in and grab everything inside the ‘a’ grouping and then, within that, choose 1, 2, or 3. We can use partial indexing to grab different parts of the data.

# To get a specific entry you select both index values
data['a',1]
np.float64(0.23633307419404725)
# You can grab all of an outer index
data['a']
1    0.236333
2    0.414026
3    0.846063
dtype: float64
# You can grab multiple outer index values
data[['a','c']]
a  1    0.236333
   2    0.414026
   3    0.846063
c  1    0.938927
   2    0.266740
dtype: float64
# You can select from an inner level index with a : meaning all of the outer index values
data.loc[:,1]
a    0.236333
b    0.032594
c    0.938927
dtype: float64
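
You can also slice a range of outer-level labels. A quick sketch (this relies on the outer level being sorted, as it is here):

# You can slice ranges of the outer index when it is sorted
data.loc["b":"c"]
b  1    0.032594
   3    0.883162
c  1    0.938927
   2    0.266740
dtype: float64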

Let’s see what happens when we put this data into a DataFrame. Notice how we have only one column and hierarchical indexing on the left side.

pd.DataFrame(data)
0
a 1 0.236333
2 0.414026
3 0.846063
b 1 0.032594
3 0.883162
c 1 0.938927
2 0.266740
d 2 0.873309
3 0.767932

We can use the .unstack() method to pivot this Series into a DataFrame. Unstack takes the inner index and sends it to separate columns, while keeping the outer index as the DataFrame index.

data.unstack()
1 2 3
a 0.236333 0.414026 0.846063
b 0.032594 NaN 0.883162
c 0.938927 0.266740 NaN
d NaN 0.873309 0.767932

You can also .stack() to go the other way, pivoting the columns back into an inner row index and returning a hierarchically indexed Series.

data.unstack().stack()
a  1    0.236333
   2    0.414026
   3    0.846063
b  1    0.032594
   3    0.883162
c  1    0.938927
   2    0.266740
d  2    0.873309
   3    0.767932
dtype: float64
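
One caveat: in this round trip, stack() dropped the NaN values that unstack() introduced, which is why we got exactly the original nine entries back. In the classic pandas behavior you can keep the missing slots with the dropna flag (newer pandas versions are changing this default, so treat it as version dependent):

# Keep the missing combinations instead of dropping them (classic behavior)
data.unstack().stack(dropna=False)
a  1    0.236333
   2    0.414026
   3    0.846063
b  1    0.032594
   2         NaN
   3    0.883162
c  1    0.938927
   2    0.266740
   3         NaN
d  1         NaN
   2    0.873309
   3    0.767932
dtype: float64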

In a DataFrame either the columns or the rows can have hierarchical levels.

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
                     columns=[["Ohio", "Ohio", "Colorado"],
                              ["Green", "Red", "Green"]])
frame
Ohio Colorado
Green Red Green
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11
# The keys are two dimensional
frame.keys()
MultiIndex([(    'Ohio', 'Green'),
            (    'Ohio',   'Red'),
            ('Colorado', 'Green')],
           )
# The indices are two dimensional
frame.index
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )
# How do I get data out of this thing?
frame['Ohio','Green']
a  1    0
   2    3
b  1    6
   2    9
Name: (Ohio, Green), dtype: int64
frame['Ohio','Green'].loc['a',2]
np.int64(3)
# Right now the columns don't have names
frame.columns.names
FrozenList([None, None])
# And the index labels don't have names
frame.index.names
FrozenList([None, None])
# Let's add some names
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]
frame
state Ohio Colorado
color Green Red Green
key1 key2
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11

You try

What does frame.unstack() do in this case? Go ahead and run the command and see if you can understand the results.

# Code and/or words here

Reordering and Sorting Levels

In the example above the levels were state then color for the columns, and a/b then number for the rows. So it is easy to select by the outer level, and a bit harder to get at the inner levels.

frame['Ohio']
color Green Red
key1 key2
a 1 0 1
2 3 4
b 1 6 7
2 9 10
# This would give a key error because it expects outer level first
# frame['Green']

So if we wanted to look at just the green data, we would need to reorder the levels, making color outer and state inner.

new_frame = frame.swaplevel('state','color', axis=1)
new_frame
color Green Red Green
state Ohio Ohio Colorado
key1 key2
a 1 0 1 2
2 3 4 5
b 1 6 7 8
2 9 10 11
new_frame['Green']
state Ohio Colorado
key1 key2
a 1 0 2
2 3 5
b 1 6 8
2 9 11
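
The sorting half of this section's title is handled by sort_index. After the swap the two Green columns are not adjacent; sorting on the color level groups them together:

# Sort the columns by the outer (color) level
new_frame.sort_index(level='color', axis=1)
color Green Red
state Colorado Ohio Ohio
key1 key2
a 1 2 0 1
2 5 3 4
b 1 8 6 7
2 11 9 10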

You Try

How would you swap the index keys? See if you can swap key1 and key2 in the new_frame.

# Your code here

Summary Statistics by Level

If we run statistics on the DataFrame as a whole, it ignores the levels and applies the operation to the whole thing. This might be okay in some instances, but in the DataFrame above maybe we want to sum the “a” and “b” groupings separately.

# Good old fashioned sum
frame.sum()
state     color
Ohio      Green    18
          Red      22
Colorado  Green    26
dtype: int64
# Grouped sum
frame.groupby(level='key1').sum()
state Ohio Colorado
color Green Red Green
key1
a 3 5 7
b 15 17 19

The grouped sum lets us look at sums within a specific index level. We could also group by the columns! If we just transpose the DataFrame, then the column names become the index names!

frame.T
key1 a b
key2 1 2 1 2
state color
Ohio Green 0 3 6 9
Red 1 4 7 10
Colorado Green 2 5 8 11
frame.T.groupby(level="color").sum()
key1 a b
key2 1 2 1 2
color
Green 2 8 14 20
Red 1 4 7 10

Combining and Merging Datasets

Sometimes in your analysis you will want to grab data from more than one file, or maybe you scrape data from more than one website. In these cases you need to be able to merge the data into a single dataset for analysis. There are a few great Pandas commands for this:

  • pd.merge() - connects the rows in separate DataFrames based on one or more keys. It implements the database join operations.
  • pd.concat() - this is short for concatenate. Concatenate stacks objects along an axis. For example, stacking rows to add more observations or stacking columns to add more variables to the existing observations.
  • combine_first() - splices together overlapping data to fill in missing values in one object with values from another.

Pandas Merge

Merge connects separate DataFrames based on comparing keys (or column labels). There are different merge types available:

  • inner is the most restrictive and only includes cases where the keys match across both datasets.
  • left includes all entries in the left dataset and only those that match from the right dataset.
  • right includes all entries in the right dataset and only those that match from the left dataset.
  • outer includes all entries in both datasets.

We will start with two example DataFrames and explore the results for the different merge types.

employees = pd.DataFrame({
    'employee_id': [1, 2, 3, 4, 5],
    'name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'department_id': [10, 20, 10, 30, 99]  # Eve is in unknown dept 99
})

# Departments DataFrame (has an extra dept with no employees)
departments = pd.DataFrame({
    'department_id': [10, 20, 30, 40],
    'department_name': ['Engineering', 'HR', 'Marketing', 'Sales']  # Dept 40 has no employees
})

display(employees)
display(departments)
employee_id name department_id
0 1 Alice 10
1 2 Bob 20
2 3 Charlie 10
3 4 Diana 30
4 5 Eve 99
department_id department_name
0 10 Engineering
1 20 HR
2 30 Marketing
3 40 Sales

Looking at these two datasets we see that there are 5 employees and 4 departments. The two DataFrames share the key department_id. You need a shared key or shared data to merge! You will also notice some mismatch between the datasets: none of our employees has department_id 40 (Sales), and one of our employees has department_id 99, which does not appear in our departments DataFrame.

Let's look at the merges below. Note: we are using employees as the left and departments as the right dataset. This could be switched. Both datasets have the department_id key!

# Inner - THIS IS DEFAULT if you don't choose how
# What data is missing? Do you see any Nan?
pd.merge(employees, departments, on='department_id', how='inner')
employee_id name department_id department_name
0 1 Alice 10 Engineering
1 2 Bob 20 HR
2 3 Charlie 10 Engineering
3 4 Diana 30 Marketing
# Left
# What data is missing? Do you see any Nan?
pd.merge(employees, departments, on='department_id', how='left')
employee_id name department_id department_name
0 1 Alice 10 Engineering
1 2 Bob 20 HR
2 3 Charlie 10 Engineering
3 4 Diana 30 Marketing
4 5 Eve 99 NaN
# Right
# What data is missing? Do you see any Nan?
pd.merge(employees, departments, on='department_id', how='right')
employee_id name department_id department_name
0 1.0 Alice 10 Engineering
1 3.0 Charlie 10 Engineering
2 2.0 Bob 20 HR
3 4.0 Diana 30 Marketing
4 NaN NaN 40 Sales
# Outer
# What data is missing? Do you see any Nan?
pd.merge(employees, departments, on='department_id', how='outer')
employee_id name department_id department_name
0 1.0 Alice 10 Engineering
1 3.0 Charlie 10 Engineering
2 2.0 Bob 20 HR
3 4.0 Diana 30 Marketing
4 NaN NaN 40 Sales
5 5.0 Eve 99 NaN
Merge Type | Includes All Employees | Includes All Departments | Notes
Inner      | ❌ Only matched        | ❌ Only matched          | Most restrictive
Left       | ✅ Yes                 | ❌ Only matched          | Focus on employees
Right      | ❌ Only matched        | ✅ Yes                   | Focus on departments
Outer      | ✅ Yes                 | ✅ Yes                   | Full outer view
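
If you ever need to audit which rows matched, pd.merge() also accepts indicator=True, which adds a _merge column labeling each row as both, left_only, or right_only:

# Outer merge with an audit column
pd.merge(employees, departments, on='department_id', how='outer', indicator=True)
employee_id name department_id department_name _merge
0 1.0 Alice 10 Engineering both
1 3.0 Charlie 10 Engineering both
2 2.0 Bob 20 HR both
3 4.0 Diana 30 Marketing both
4 NaN NaN 40 Sales right_only
5 5.0 Eve 99 NaN left_only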

How would you merge datasets if one had the correct data but not a matching key name? One option would be to change the column labels to match, but you can also give pd.merge() two different keys with left_on and right_on.

departments.rename(columns={'department_id':'dept_code'}, inplace=True)
display(departments)
dept_code department_name
0 10 Engineering
1 20 HR
2 30 Marketing
3 40 Sales
pd.merge(employees, departments, left_on='department_id', right_on='dept_code')
employee_id name department_id dept_code department_name
0 1 Alice 10 10 Engineering
1 2 Bob 20 20 HR
2 3 Charlie 10 10 Engineering
3 4 Diana 30 30 Marketing

You will notice that the merge resets the index. If you want to preserve the original index values, you may want to store them as a column in each dataset first. This will allow you to look up the original rows in the separate datasets if needed.
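
A minimal sketch of that idea (the column name orig_index is just our choice):

# Copy the current row labels into an ordinary column before merging
employees_with_idx = employees.reset_index().rename(columns={'index': 'orig_index'})
employees_with_idx.head(2)
orig_index employee_id name department_id
0 0 1 Alice 10
1 1 2 Bob 20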

Sometimes you need to use index values to do your merge. Maybe the matching data is in the index instead of a separate column.

departments = pd.DataFrame({
    'department_name': ['Engineering', 'HR', 'Marketing']
}, index=[10, 20, 30])

display(departments)
department_name
10 Engineering
20 HR
30 Marketing
pd.merge(employees,departments, left_on='department_id', right_index=True)
employee_id name department_id department_name
0 1 Alice 10 Engineering
1 2 Bob 20 HR
2 3 Charlie 10 Engineering
3 4 Diana 30 Marketing
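
For index-based matches like this, there is also the DataFrame.join() shortcut, which joins the caller's column(s) against the other frame's index and defaults to a left join (so Eve is kept, unlike the inner merge above):

# join matches employees.department_id against departments' index
employees.join(departments, on='department_id')
employee_id name department_id department_name
0 1 Alice 10 Engineering
1 2 Bob 20 HR
2 3 Charlie 10 Engineering
3 4 Diana 30 Marketing
4 5 Eve 99 NaN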

You Try

Merge the following data sets using all four ways: inner, left, right, and outer. See if you can predict before running the code what the output will be!

df_animals = pd.DataFrame({
    'animal_id': [1, 2, 3, 4],
    'name': ['Leo', 'Stripes', 'Spot', 'Fluffy'],
    'type': ['Lion', 'Tiger', 'Cheetah', 'Cat']
})

df_habitats = pd.DataFrame({
    'animal_id': [1, 2, 5, 4],
    'habitat': ['Savannah', 'Jungle', 'Mountains', 'Domestic'],
    'population_estimate': [25000, 3200, 120, 50000000]
})

display(df_animals)
display(df_habitats)
animal_id name type
0 1 Leo Lion
1 2 Stripes Tiger
2 3 Spot Cheetah
3 4 Fluffy Cat
animal_id habitat population_estimate
0 1 Savannah 25000
1 2 Jungle 3200
2 5 Mountains 120
3 4 Domestic 50000000
# Your prediction here
# Your code here
# Repeat

Hierarchical index values

When you have hierarchical index values or columns, things get more confusing, but merges are still possible!

left_data = pd.DataFrame({"key1": ["Ohio", "Ohio", "Ohio",
                               "Nevada", "Nevada"],
                      "key2": [2000, 2001, 2002, 2001, 2002],
                      "data": pd.Series(range(5), dtype="Int64")})
right_data_index = pd.MultiIndex.from_arrays(
    [
        ["Nevada", "Nevada", "Ohio", "Ohio", "Ohio", "Ohio"],
        [2001, 2000, 2000, 2000, 2001, 2002]
    ]
)

right_data = pd.DataFrame({"event1": pd.Series([0, 2, 4, 6, 8, 10], dtype="Int64",
                                           index=right_data_index),
                       "event2": pd.Series([1, 3, 5, 7, 9, 11], dtype="Int64",
                                           index=right_data_index)})

display(right_data)
display(left_data)
event1 event2
Nevada 2001 0 1
2000 2 3
Ohio 2000 4 5
2000 6 7
2001 8 9
2002 10 11
key1 key2 data
0 Ohio 2000 0
1 Ohio 2001 1
2 Ohio 2002 2
3 Nevada 2001 3
4 Nevada 2002 4

Now we need to indicate multiple columns to merge on as a list.

pd.merge(left_data, right_data, left_on=["key1", "key2"], right_index=True)
key1 key2 data event1 event2
0 Ohio 2000 0 4 5
0 Ohio 2000 0 6 7
1 Ohio 2001 1 8 9
2 Ohio 2002 2 10 11
3 Nevada 2001 3 0 1

Concatenate

When you need to stack new rows or columns onto an existing data set pd.concat() is a great way to do that.

Let’s imagine that we are working with the employee and department information above. Now suddenly HR sends us information about two new employees and data that contains all the salaries. They have confirmed that the salaries are in increasing order of the employee id. How do we get all this data into a single dataframe?

  1. Concat the new_hires onto the employees data
  2. Merge the employees and department data - keeping all information
  3. Concat the new columns onto the full data set.

When using concat you generally want the dimensions to line up (mismatches are filled with NaN):

  • axis=0 should have the same columns - you are adding rows
  • axis=1 should have the same number of rows - you are adding columns
new_hires = pd.DataFrame({
    'employee_id': [6, 7],
    'name': ['Joanna', 'Bella'],
    'department_id': [30, 20]
})

salaries = pd.DataFrame({
    'emp_num': ['emp_'+str(i+1) for i in range(7)],
    'salary': [60_000, 55_000, 62_000, 58_000, 500_000, 40_000, 40_000]
})

display(new_hires)
display(salaries)
employee_id name department_id
0 6 Joanna 30
1 7 Bella 20
emp_num salary
0 emp_1 60000
1 emp_2 55000
2 emp_3 62000
3 emp_4 58000
4 emp_5 500000
5 emp_6 40000
6 emp_7 40000
# Concat the new hires
# Here we ignore the old index values and reindex so the rows are 0-6
all_employees = pd.concat([employees, new_hires],ignore_index=True)
display(all_employees)
employee_id name department_id
0 1 Alice 10
1 2 Bob 20
2 3 Charlie 10
3 4 Diana 30
4 5 Eve 99
5 6 Joanna 30
6 7 Bella 20
# Merge the department data
all_employees = pd.merge(all_employees,departments,left_on='department_id',right_index=True,how='outer')
display(all_employees)
employee_id name department_id department_name
0 1 Alice 10 Engineering
2 3 Charlie 10 Engineering
1 2 Bob 20 HR
6 7 Bella 20 HR
3 4 Diana 30 Marketing
5 6 Joanna 30 Marketing
4 5 Eve 99 NaN
# Concat the salaries
full_data = pd.concat([all_employees, salaries], axis=1)
display(full_data)
employee_id name department_id department_name emp_num salary
0 1 Alice 10 Engineering emp_1 60000
2 3 Charlie 10 Engineering emp_3 62000
1 2 Bob 20 HR emp_2 55000
6 7 Bella 20 HR emp_7 40000
3 4 Diana 30 Marketing emp_4 58000
5 6 Joanna 30 Marketing emp_6 40000
4 5 Eve 99 NaN emp_5 500000
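
One thing to appreciate here: pd.concat() with axis=1 aligns rows by index label, not by position. The salary concat only lined up because all_employees kept its original 0 through 6 labels after the merge. A tiny sketch to convince yourself:

a = pd.DataFrame({'x': [1, 2]}, index=[0, 1])
b = pd.DataFrame({'y': [10, 20]}, index=[1, 0])
# Rows pair by label, so x=1 (label 0) lands next to y=20 (label 0)
pd.concat([a, b], axis=1)
x y
0 1 20
1 2 10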

Combining Data with Overlap

Sometimes you have two datasets that overlap, but one or both of them are incomplete, and you want to use one to fill in NaNs in the other. You can think of the combine_first() operation as patching up the data. It basically performs an if-else that inserts values from the other object wherever there are null values in the original dataset.

Here is a scenario where you have incomplete employee data: each of the datasets has missing values, and you want one complete dataset.

employee_profiles = pd.DataFrame({
    'name': ['Alice', None, 'Charlie'],
    'dept_code': [10, 20, None],
    'email': [None, 'bob@example.com', None]
}, index=[1, 2, 3])  

backup_profiles = pd.DataFrame({
    'name': ['Alice A.', 'Bob B.', 'Charlie C.', 'Diana D.'],
    'dept_code': [10, 20, 10, 30],
    'email': ['alice@example.com', None, 'charlie@example.com', 'diana@example.com'],
    'phone': ['111-1111', '222-2222', '333-3333', '444-4444']  
}, index=[1, 2, 3, 4]) 

display(employee_profiles)
display(backup_profiles)
name dept_code email
1 Alice 10.0 None
2 None 20.0 bob@example.com
3 Charlie NaN None
name dept_code email phone
1 Alice A. 10 alice@example.com 111-1111
2 Bob B. 20 None 222-2222
3 Charlie C. 10 charlie@example.com 333-3333
4 Diana D. 30 diana@example.com 444-4444
combined_profiles = employee_profiles.combine_first(backup_profiles)
display(combined_profiles)
dept_code email name phone
1 10.0 alice@example.com Alice 111-1111
2 20.0 bob@example.com Bob B. 222-2222
3 10.0 charlie@example.com Charlie 333-3333
4 30.0 diana@example.com Diana D. 444-4444

combine_first() does not overwrite existing data in the first DataFrame. It aligns on both index and column names, and mismatches are handled gracefully. You can think of it as a data patching tool.

combine_first() is perfect when:

  • You have a primary source of data that may be incomplete.
  • You have a secondary or backup source you want to use to fill in the blanks.
  • You want row-wise alignment based on index.

But what happens if the indexes don't align?
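
A small sketch of the answer: the result covers the union of the two indexes, the calling object wins wherever it has data, and labels that exist only in the backup are brought in whole:

s1 = pd.Series([1.0, np.nan], index=['x', 'y'])
s2 = pd.Series([10.0, 20.0], index=['y', 'z'])
s1.combine_first(s2)
x     1.0
y    10.0
z    20.0
dtype: float64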

Reshaping and Pivoting

The goal with reshaping and pivoting is to rearrange tabular data in a way that is better for your specific analysis. We have already seen some of these commands.

  • stack() rotates or pivots the columns in the data into the rows, producing hierarchical row indexes.
  • unstack() rotates or pivots the rows into the columns, producing hierarchical column indexes.
  • pivot() reshapes the data from long (tall) format to a wide format.
  • melt() reshapes the data from wide to long (tall), melting the column names into the data.

Long (Tall) Format

Also called “tidy” data in some contexts:

  • One row per observation.
  • Repeated categories or measurements in a single column.
  • More rows, fewer columns.
  • One row per person per test

Wide Format

  • Unique values in a categorical column become column headers.
  • More columns, fewer rows.
  • Easier for humans to read, but not always ideal for statistical analysis.
  • One row per person, one column per test

Below we will look at the ways we might rearrange or reshape our data!

df = pd.DataFrame({
    'person': ['Alice', 'Alice', 'Bob', 'Bob'],
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'sales': [200, 180, 210, 190],
    'expenses': [150, 120, 160, 140]
})

display(df)
person month sales expenses
0 Alice Jan 200 150
1 Alice Feb 180 120
2 Bob Jan 210 160
3 Bob Feb 190 140

Pivot

When you pivot a DataFrame you tell it which columns to use for the new dataset:

  • index - which column should be used as the new row (index) labels.
  • columns - which column should be used as the new column labels.
  • values - which column should be used to fill in the values of the new dataset.
pivoted = df.pivot(index='person',columns='month', values='sales')
display(pivoted)
month Feb Jan
person
Alice 180 200
Bob 190 210
pivoted = df.pivot(index='month',columns='person', values='expenses')
display(pivoted)
person Alice Bob
month
Feb 120 140
Jan 150 160
# You can specify more than one value to be added as hierarchical columns
pivoted = df.pivot(index='person',columns='month', values=['sales','expenses'])
display(pivoted)
sales expenses
month Feb Jan Feb Jan
person
Alice 180 200 120 150
Bob 190 210 140 160
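
One caveat: .pivot() raises a ValueError if any index/column pair appears more than once, since it would not know which value to keep. In that situation pd.pivot_table(), which aggregates duplicates (mean by default), is the usual alternative. A minimal sketch with made-up duplicate rows:

df_dup = pd.DataFrame({
    'person': ['Alice', 'Alice'],
    'month': ['Jan', 'Jan'],
    'sales': [200, 300]
})
# df_dup.pivot(...) would raise ValueError; pivot_table averages the duplicates
df_dup.pivot_table(index='person', columns='month', values='sales', aggfunc='mean')
month Jan
person
Alice 250.0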

You Try

Do a pivot on your merged animal data. You can decide how to pivot, but try to say before running the code what you expect to happen.

# Your prediction here
# Your code here

Stack

Stack takes the column labels and pivots them into the row index, shifting from wide to tall format. You will see one grouping of data for each index value. In many cases you will want to set a more meaningful index first to get a better breakdown of the data.

We go from wide data to tall data, meaning one observation per index per column.

# Stack without setting the index
stacked = df.stack()
display(stacked)
0  person      Alice
   month         Jan
   sales         200
   expenses      150
1  person      Alice
   month         Feb
   sales         180
   expenses      120
2  person        Bob
   month         Jan
   sales         210
   expenses      160
3  person        Bob
   month         Feb
   sales         190
   expenses      140
dtype: object
# Stack with index set
stacked = df.set_index(['person', 'month']).stack()
display(stacked)
person  month          
Alice   Jan    sales       200
               expenses    150
        Feb    sales       180
               expenses    120
Bob     Jan    sales       210
               expenses    160
        Feb    sales       190
               expenses    140
dtype: int64
# Adding the .reset_index() command to a series will send it to a dataframe again
# notice how the stacking is preserved in the row order.
stacked = df.set_index(['person', 'month']).stack().reset_index()
display(stacked)
person month level_2 0
0 Alice Jan sales 200
1 Alice Jan expenses 150
2 Alice Feb sales 180
3 Alice Feb expenses 120
4 Bob Jan sales 210
5 Bob Jan expenses 160
6 Bob Feb sales 190
7 Bob Feb expenses 140
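
The autogenerated labels level_2 and 0 are not very descriptive. One way to clean them up (the names metric and value here are our choice) is to name the index levels before resetting:

# Name all three index levels, then name the value column while resetting
tidy = (df.set_index(['person', 'month'])
          .stack()
          .rename_axis(['person', 'month', 'metric'])
          .reset_index(name='value'))
tidy.head(4)
person month metric value
0 Alice Jan sales 200
1 Alice Jan expenses 150
2 Alice Feb sales 180
3 Alice Feb expenses 120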

Unstack

Let's say you are given data with observations for each person, but what you want is a wide DataFrame with fewer rows and more categorical columns. This is what unstack can do!

Let’s start with some stacked data - hierarchical indexes

stacked = df.set_index(['person','month']).stack()
display(stacked)
person  month          
Alice   Jan    sales       200
               expenses    150
        Feb    sales       180
               expenses    120
Bob     Jan    sales       210
               expenses    160
        Feb    sales       190
               expenses    140
dtype: int64
# By default unstack moves the innermost index level
# (here the sales/expenses labels) into the columns.
stacked.unstack()
sales expenses
person month
Alice Feb 180 120
Jan 200 150
Bob Feb 190 140
Jan 210 160
# We can specify which level to use
# Here use the Person level index as the columns
stacked.unstack(level=0)
person Alice Bob
month
Feb sales 180 190
expenses 120 140
Jan sales 200 210
expenses 150 160
# Here use the month level index as the columns
stacked.unstack(level=1)
month Feb Jan
person
Alice sales 180 200
expenses 120 150
Bob sales 190 210
expenses 140 160
# Here use the innermost values as the columns
stacked.unstack(level=2)
sales expenses
person month
Alice Feb 180 120
Jan 200 150
Bob Feb 190 140
Jan 210 160
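
Since our index levels have names, you can also refer to a level by name instead of position, which reads more clearly:

# Same result as level=1 above
stacked.unstack(level='month')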

Melt

The melt command turns column names into values in the rows of the data. You choose identifier columns to keep fixed with id_vars (or use none); the remaining columns are melted into variable/value pairs.

df
person month sales expenses
0 Alice Jan 200 150
1 Alice Feb 180 120
2 Bob Jan 210 160
3 Bob Feb 190 140
# Use all the columns - not so useful in this case!
pd.melt(df)
variable value
0 person Alice
1 person Alice
2 person Bob
3 person Bob
4 month Jan
5 month Feb
6 month Jan
7 month Feb
8 sales 200
9 sales 180
10 sales 210
11 sales 190
12 expenses 150
13 expenses 120
14 expenses 160
15 expenses 140
# Melt to create new rows for each observation for each person each month
# You keep the columns 'person' and 'month' the rest are melted
pd.melt(df,id_vars=['person','month'])
person month variable value
0 Alice Jan sales 200
1 Alice Feb sales 180
2 Bob Jan sales 210
3 Bob Feb sales 190
4 Alice Jan expenses 150
5 Alice Feb expenses 120
6 Bob Jan expenses 160
7 Bob Feb expenses 140
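
melt also takes value_vars to melt only a subset of the columns, plus var_name and value_name to replace the generic variable and value labels:

# Melt just the sales column, with friendlier column names
pd.melt(df, id_vars=['person', 'month'], value_vars=['sales'],
        var_name='metric', value_name='amount')
person month metric amount
0 Alice Jan sales 200
1 Alice Feb sales 180
2 Bob Jan sales 210
3 Bob Feb sales 190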