Introduction to Data Science

Final Projects

Author

Joanna Bieri
DATA101

Important Information

Final Projects

Due Dates:

  • 11/3 Final Project Proposals
  • 11/12 Final Project Groups Assigned
  • 12/1 Final Project Check in - submit draft of slides and project.
  • 12/3 and 12/8 Groups Present Final Projects
  • 12/10 Final Project Presentations - During Exam 12:00-2:30
  • 12/10 11:59pm Final Projects Due

Project Proposals (individual work)

Your project proposal should reflect your own individual work and should not overlap too much with the work of your peers. Project proposals should take the form of a JupyterLab notebook that includes the following sections:

  • Introduction and Background for the problem you are interested in.
    • What is the question you hope you can answer?
    • Why is the question important to you?
    • What are you hoping to achieve with the project?
  • Description of the data.
    • Where did you get your data?
    • What data is found in your data set?
    • Are there any ethical concerns with using the data or how the data was obtained?
    • Is there a public location where the data is freely available? If so, where? If not, why not?
  • Initial Exploratory Data Analysis
    • Definition of all variables (or explanation for why you are not considering some variables)
    • Descriptive statistics for the data set, counts of the variables, value counts (frequency tables) for each categorical variable of interest.
    • Plots that help a reader understand your data.
    • Initial analysis to answer initial data questions.
  • Proposal for further study.
    • What larger questions would you like to try to answer?
    • What predictive analysis do you think might be possible?
    • Are there any larger impacts that your study might have?

Proposals should be submitted as .ipynb files on Canvas. Make sure your notebook can be run completely, from the first cell to the last. If it requires a data file, make sure to submit that file too. The notebook should contain just as many words and as much description of the problem/results as it does code.
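
One way to keep a notebook runnable on someone else's machine is to load the data through a relative path and fail with a clear message when the file is missing. The sketch below is only an illustration: the file name `my_data.csv` and the helper `load_project_data` are hypothetical and are not part of the assignment.

```python
from pathlib import Path

import pandas as pd

# Hypothetical file name: replace it with the data file you submit with the notebook.
DATA_FILE = Path("my_data.csv")


def load_project_data(path=DATA_FILE):
    """Read the project CSV from a relative path; raise a clear error if it is missing."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"Expected {path} next to the notebook. Submit the data file with your .ipynb."
        )
    return pd.read_csv(path)
```

Calling `load_project_data()` in the first code cell means a missing file is reported immediately instead of partway through the analysis.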

If you already have a group of people you plan to work with - please list their names at the end of the proposal.

Final Project Groups

Groups should plan to use the Wednesday night lab time (6-8pm) to meet and work together.

After groups are assigned, groups will decide on a specific focused set of questions that they hope to answer and a data set (or data sets) that they want to focus on. Try to propose a nice narrative for your study that ties the different questions together.

Groups will form a GitHub repo for their project and have at least one organizational meeting where they divide up the parts of the work. Every person should contribute to the data analysis and writing portions of the work.

Final Project Check In

Each group will submit a draft of their presentation slides and a link to their final project GitHub repo for review before leaving for Thanksgiving Break.

Final Project Presentations

Final presentations will be 10 minutes long. You should plan for the following:

  • 1 Slide - Introduce your team
  • 1-2 Slides - Introduce your project/background. What question are you trying to answer?
  • 1-2 Slides - Information about the data. Ethical considerations.
  • 5-6 Slides - Your most interesting results
  • 1 Slide - Summary of your findings
  • 1 Slide - Future Directions

Final Projects Due

Final projects should be submitted as a carefully constructed JupyterLab Notebook. There should be the following sections:

  • Title, Authors and Abstract
  • Introduction and Background
  • Data Used
  • Exploratory Data Analysis
  • Proposed Questions
  • Analysis and Results
  • Conclusion

Final Project Outline

Below are some .ipynb cells that outline how you might write up your final project.

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default = 'colab'

from itables import show

Title: FINAL PROJECT OUTLINE

By: Author Names

Abstract

Here you type a 3-4 sentence summary of your final project.

Introduction and Background

Convince your audience that the rest of the project is worth learning about. What interesting problem are you exploring? Why should EVERYONE be interested in your analysis?

Data Used

Give some background about your data. Where did you get it? How was it compiled? How many observations? What are the variables?

Show a data frame with the most important columns.

Do some descriptive statistics and talk about the results.

## This section will have some code. Make sure to use Markdown cells to explain what the code is demonstrating
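
As a rough idea of what the code in this section might look like, here is a minimal sketch. It assumes a small made-up data frame; the column names `species`, `weight_kg`, and `age_years` are hypothetical stand-ins for your own variables.

```python
import pandas as pd

# Hypothetical toy data for illustration only; replace with your own DataFrame.
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "dog", "cat", "dog", "bird"],
        "weight_kg": [4.1, 20.5, 18.0, 3.8, 25.2, 0.4],
        "age_years": [3, 5, 2, 7, 4, 1],
    }
)

n_rows, n_cols = df.shape                      # number of observations and variables
summary = df.describe()                        # count, mean, std, min, quartiles, max (numeric columns)
species_counts = df["species"].value_counts()  # frequency table for a categorical variable
missing = df.isna().sum()                      # number of missing values in each column

print(f"{n_rows} observations, {n_cols} variables")
print(summary)
print(species_counts)
print(missing)
```

Each printed table would normally be followed by a Markdown cell that says what the numbers tell you about the data.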

Exploratory Data Analysis

Show a few plots or some data tables that help your reader understand your data better. What are some initial questions that you were able to quickly answer? How do those questions lead you to a deeper analysis?

## This section will have some code. Make sure to use Markdown cells to explain what the code is demonstrating
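
A sketch of one possible exploratory cell is shown below. It assumes the same kind of made-up data as a stand-in for your own (the column names are hypothetical) and uses matplotlib, though plotly would work just as well.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data for illustration only; replace with your own DataFrame.
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "dog", "cat", "dog", "bird"],
        "weight_kg": [4.1, 20.5, 18.0, 3.8, 25.2, 0.4],
    }
)

# A quick initial question: how does the average weight differ by species?
mean_weight = df.groupby("species")["weight_kg"].mean().sort_values()

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: distribution of one numeric variable
ax_hist.hist(df["weight_kg"], bins=5)
ax_hist.set_xlabel("Weight (kg)")
ax_hist.set_ylabel("Count")
ax_hist.set_title("Distribution of weight")

# Right panel: comparison of a numeric variable across categories
ax_bar.bar(list(mean_weight.index), mean_weight.values)
ax_bar.set_xlabel("Species")
ax_bar.set_ylabel("Mean weight (kg)")
ax_bar.set_title("Mean weight by species")

fig.tight_layout()
# In a notebook the figure is displayed when the cell finishes;
# in a plain script you would call plt.show() here.
```

Labeling the axes and adding a title, as above, lets a reader understand the plot without reading the code.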

Proposed Questions

What specific questions are you going to explore and present in the rest of the paper? Give a very brief overview of what you are going to do to answer those questions. Imagine this as a road map so your reader knows what to expect.

Analysis and Results

Here is where you show your code and results (plots, tables, predictions, etc.) that help to explore and answer your questions. There should be code here that can be run to reproduce your results and conclusions. After each plot/table/final number, you should add a Markdown cell where you explain to the reader what the result means and how/why it answers the question.

## This section will have lots of code. Make sure to use Markdown cells to explain what the code is demonstrating
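
As one illustration of a reproducible result, the sketch below fits a straight line to a handful of synthetic numbers and reports how well the line fits. The variables `hours` and `score` and their values are invented for this example, and a least-squares line is only one of many analyses a project might use.

```python
import numpy as np

# Synthetic, made-up numbers for illustration only.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # hypothetical predictor
score = np.array([52.0, 55.0, 61.0, 64.0, 68.0])  # hypothetical response

# Least-squares straight line: score ≈ slope * hours + intercept
slope, intercept = np.polyfit(hours, score, deg=1)
predicted = slope * hours + intercept

# R^2: the fraction of the variation in score that the line explains
ss_res = np.sum((score - predicted) ** 2)
ss_tot = np.sum((score - score.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.3f}")
```

The Markdown cell after a result like this would explain what the slope means in the context of the question and mention any limitations, such as the small sample size.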

Conclusion

Give a brief statement about what you achieved in your analysis, what issues or limitations your analysis contains, a discussion of any ethical concerns with the data or analysis, and some possible future directions (if you had more time and money to keep going).