import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default = 'colab'
from itables import show
Introduction to Data Science
Final Projects
Important Information
- Email: joanna_bieri@redlands.edu
- Office Hours: Duke 209 (see Joanna’s schedule)
Final Projects
Due Dates:
- 11/3 Final Project Proposals
- 11/12 Final Project Groups Assigned
- 12/1 Final Project Check in - submit draft of slides and project.
- 12/3 and 12/8 Groups Present Final Projects
- 12/10 Final Project Presentations - During Exam 12:00-2:30
- 12/10 11:59pm Final Projects Due
Project Proposals (individual work)
Your project proposal should reflect your own individual work and should not overlap too much with the work of your peers. Project proposals should take the form of a JupyterLab notebook that includes the following sections:
- Introduction and Background for the problem you are interested in.
- What is the question you hope you can answer?
- Why is the question important to you?
- What are you hoping to achieve with the project?
- Description of the data.
- Where did you get your data?
- What data is found in your data set?
- Are there any ethical concerns with using the data or how the data was obtained?
- Is there a public location where the data is freely available? If so where? If not why?
- Initial Exploratory Data Analysis
- Definition of all variables (or explanation for why you are not considering some variables)
- Descriptive statistics for the data set, counts of the variables, value counts (frequency tables) for each categorical variable of interest.
- Plots that help a reader understand your data.
- Initial analysis to answer initial data questions.
- Proposal for further study.
- What larger questions would you like to try to answer?
- What predictive analysis do you think might be possible?
- Are there any larger impacts that your study might have?
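As a rough illustration of the descriptive statistics and frequency tables asked for under Initial Exploratory Data Analysis, here is a minimal sketch using pandas. The data frame, its column names, and its values are all made up for this example; substitute your own data set.

```python
import pandas as pd

# Hypothetical toy data (made-up values); replace with your own data set.
df = pd.DataFrame({
    "species": ["cat", "dog", "dog", "cat", "dog"],
    "age_years": [2, 5, 3, 7, 1],
    "weight_kg": [4.1, 20.3, 15.8, 5.0, 8.2],
})

# Descriptive statistics (count, mean, std, min, quartiles, max)
# for the numeric variables.
numeric_summary = df.describe()

# Frequency table (value counts) for a categorical variable of interest.
species_counts = df["species"].value_counts()

# Number of non-missing values in each variable.
non_missing = df.count()

print(numeric_summary)
print(species_counts)
print(non_missing)
```

In a notebook, follow each of these outputs with a Markdown cell that says what the numbers tell you about the data.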
Proposals should be submitted as .ipynb files on Canvas. Make sure your notebook runs completely from top to bottom. If it requires a data file, make sure to submit that file too. The notebook should contain just as many words and as much description of the problem and results as it does code.
If you already have a group of people you plan to work with - please list their names at the end of the proposal.
Final Project Groups
Groups should plan to use the Wednesday night lab time (6-8pm) to meet and work together.
After groups are assigned, groups will decide on a specific focused set of questions that they hope to answer and a data set (or data sets) that they want to focus on. Try to propose a nice narrative for your study that ties the different questions together.
Groups will form a GitHub repo for their project and have at least one organizational meeting where they divide up the parts of the work. Every person should contribute to the data analysis and writing portions of the work.
Final Project Check In
Each group will submit a draft of their presentation slides and a link to their final project GitHub repo for review before leaving for Thanksgiving Break.
Final Project Presentations
Final presentations will be 10 minutes long. You should plan for the following:
- 1 Slide - Introduce your team
- 1-2 Slides - Introduce your project/background. What question are you trying to answer?
- 1-2 Slides - Information about the data. Ethical considerations.
- 5-6 Slides - Your most interesting results
- 1 Slide - Summary of your findings
- 1 Slide - Future Directions
Final Projects Due
Final projects should be submitted as a carefully constructed JupyterLab Notebook. There should be the following sections:
- Title, Authors and Abstract
- Introduction and Background
- Data Used
- Exploratory Data Analysis
- Proposed Questions
- Analysis and Results
- Conclusion
Final Project Outline
Below are some .ipynb cells that outline how you might write up your final project.
Title: FINAL PROJECT OUTLINE
Abstract
Here you type a 3-4 sentence summary of your final project.
Introduction and Background
Convince your audience that the rest of the project is worth learning about. What interesting problem are you exploring? Why should EVERYONE be interested in your analysis?
Data Used
Give some background about your data. Where did you get it? How was it compiled? How many observations? What are the variables?
Show a data frame with the most important columns.
Do some descriptive statistics and talk about the results.
## This section will have some code. Make sure to use Markdown cells to explain what the code is demonstrating
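One possible way to report the number of observations, list the variables, and show only the most important columns is sketched below. The data frame and the choice of `key_columns` are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical toy data (made-up values); replace with your own data set.
df = pd.DataFrame({
    "group": ["A", "B", "C"],
    "measurement": [10.5, 12.1, 9.8],
    "category": ["low", "high", "low"],
    "internal_id": [101, 102, 103],
})

# How many observations (rows) and variables (columns) are there?
n_observations, n_variables = df.shape

# What are the variables and their data types?
variable_types = df.dtypes

# Keep only the columns that matter for the story you are telling.
key_columns = ["group", "measurement", "category"]
df_key = df[key_columns]

print(f"{n_observations} observations, {n_variables} variables")
print(variable_types)
print(df_key.head())
```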
Exploratory Data Analysis
Show a few plots or some data tables that help your reader understand your data better. What are some initial questions that you were able to quickly answer? How do those questions lead you to a deeper analysis?
## This section will have some code. Make sure to use Markdown cells to explain what the code is demonstrating
Proposed Questions
What specific questions are you going to explore and present in the rest of the paper? Give a very brief overview of what you are going to do to answer those questions. Imagine this as a road map so your reader knows what to expect.
Analysis and Results
Here is where you show your code and results (plots, tables, predictions, etc.) that help to explore and answer your questions. There should be code here that can be run to reproduce your results and conclusions. After each plot, table, or final number, you should add a Markdown cell where you explain to the reader what the result means and how and why it answers the question.
## This section will have lots of code. Make sure to use Markdown cells to explain what the code is demonstrating
Conclusion
Give a brief statement about what you achieved in your analysis, what issues or limitations your analysis contains, a discussion of any ethical concerns with the data or analysis, and some possible future directions (if you had more time and money to keep going).