Intermediate Data Science
Final Projects
Important Information
- Email: joanna_bieri@redlands.edu
- Office Hours: Duke 209 (click here for Joanna’s schedule)
Final Projects
Due Dates:
- 12/12/25 at 2:30pm - submitted on Canvas.
Sections
Submit your final project as a carefully constructed JupyterLab notebook with the following sections:
- Title, Author and Abstract
- Introduction and Background
- Data Used
- Exploratory Data Analysis
- Proposed Questions
- Analysis and Results
- Conclusion
This project should represent the best you can do given the tools that you learned in class. Every project will be different based on the data used. You must describe your decisions and conclusions in clearly written prose. All of your conclusions must be supported by your code and the data.
Final Project Outline
Below are some .ipynb cells that outline how you might write up your final project.
FINAL PROJECT TITLE
By: Your Name(s)
Abstract
Write a concise 3–4 sentence summary of:
- The problem you investigated
- The dataset you used
- The methods applied
- Your main findings
Introduction and Background
Provide context for your project:
- What motivated this topic?
- Why should someone care about the problem you’re addressing?
- What prior work, domain knowledge, or assumptions matter?
- What is the goal of the analysis?
Your job is to convince the reader that your investigation is meaningful.
Data Used
Describe your dataset:
- Source and method of collection
- Number of observations and variables
- Key fields and their meanings
- Whether the data is real or synthetic
- Ethical concerns (privacy, bias, representativeness)
Below you will preview the dataset, compute descriptive statistics, and summarize what you learn.
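A minimal sketch of what such a preview cell might look like, assuming the data fits in a pandas DataFrame. The columns `age`, `income`, and `group` are hypothetical stand-ins generated here only so the cell runs on its own; in a real project you would load your own file instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data (an assumption for illustration only).
# In your project this would be something like: df = pd.read_csv("your_data.csv")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=200).astype(float),
    "income": rng.normal(50_000, 15_000, size=200),
    "group": rng.choice(["a", "b", "c"], size=200),
})
df.loc[::25, "income"] = np.nan  # inject a few missing values so there is something to find

# Preview: size of the dataset and the first few rows
print(df.shape)
print(df.head())

# Descriptive statistics for the numeric columns
summary = df.describe()
print(summary)

# Value counts for a categorical column
group_counts = df["group"].value_counts()
print(group_counts)

# Number of missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)
```

Each printed result should be followed by a short Markdown note on what it tells you, for example which columns have missing values and whether any ranges look implausible.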
Notes from Data Summary
Explain what the descriptive statistics and value counts reveal about:
- Trends
- Outliers
- Missing values
- Data quality issues
Exploratory Data Analysis (EDA)
In this section include a small set of the most informative visualizations or tables, such as:
- Distributions of key variables
- Relationships between variables
- Correlations
- Missing-data patterns
- Scatter plots or pair plots
Do not include every plot you generated; choose only those that genuinely help tell the story.
Explain what early insights you gained and how they shaped your later analysis.
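As one possible illustration of the items listed above, the sketch below computes a correlation table and draws one distribution plot and one scatter plot with matplotlib. The `hours` and `score` columns are hypothetical, generated only so the example is self-contained; your own plots should use the variables that matter for your questions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data (an assumption for illustration only)
rng = np.random.default_rng(1)
hours = rng.uniform(0, 10, size=150)
score = 50 + 4 * hours + rng.normal(0, 5, size=150)
df = pd.DataFrame({"hours": hours, "score": score})

# Pairwise Pearson correlations between the numeric columns
corr = df.corr()
print(corr)

# One distribution plot and one relationship plot, side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].hist(df["score"], bins=20)
axes[0].set_title("Distribution of score")
axes[0].set_xlabel("score")
axes[0].set_ylabel("count")

axes[1].scatter(df["hours"], df["score"], alpha=0.6)
axes[1].set_title("score vs. hours")
axes[1].set_xlabel("hours")
axes[1].set_ylabel("score")

fig.tight_layout()
```

Labeling axes and titling every panel makes it much easier to refer back to a figure in the written interpretation.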
Interpretation of EDA
Explain what the plots/tables show and how they informed your next steps.
Proposed Questions
List the specific questions your project investigates.
Good guiding questions:
- Are clear and measurable
- Can be answered with your data
- Are linked to your motivation
- Suggest reasonable analysis or modeling approaches
Provide a brief plan for how you will address each question.
Analysis and Results
This is the core of the project.
For each research question:
- Show the code you used (clean, runnable, documented).
- Include outputs such as tables, plots, metrics, or model results.
- Immediately follow each output with a Markdown explanation of:
  - What the result means
  - Why it matters
  - Whether it answers the question
Include at least one advanced machine learning method from class (e.g., decision trees, KNN, regularized regression, Naive Bayes, or dimensionality reduction).
Be sure to:
- Split into train/test
- Scale the data
- Avoid data leakage
- Justify your choices
- Evaluate your models carefully
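The checklist above can be sketched with scikit-learn as follows. This is only one way to do it, under the assumption of a classification problem with numeric features; the synthetic data from `make_classification` and the choice of KNN with `n_neighbors=5` are placeholders, not requirements.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration only)
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)

# Split first, so the test set plays no part in any fitting step
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Putting the scaler inside the pipeline means its mean and standard deviation
# are learned from the training data only, which avoids one common form of leakage
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# Evaluate on the held-out test set
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Fitting the scaler on the full dataset before splitting would let information from the test set influence the model, so the reported accuracy would be overly optimistic. Accuracy alone is rarely enough; depending on your question, other metrics or cross-validation on the training set may be more appropriate.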
Interpretation of Results
Explain what the model results show:
- Do they answer the research question?
- How strong is the model?
- What limitations or caveats exist?
Conclusion
Summarize the project:
- What did you accomplish?
- What did the analysis reveal?
- What limitations should the reader keep in mind?
- What ethical considerations (bias, fairness, privacy) matter?
- What future work would you pursue with more time or resources?
This section should tie the entire narrative together.