Intermediate Data Science

Final Projects

Author

Joanna Bieri
DATA101

Important Information

Final Projects

Due Dates:

  • 12/12/25 at 2:30pm - submitted on Canvas.

Sections

Final projects should be submitted as a carefully constructed JupyterLab Notebook containing the following sections:

  • Title, Author and Abstract
  • Introduction and Background
  • Data Used
  • Exploratory Data Analysis
  • Proposed Questions
  • Analysis and Results
  • Conclusion

This project should represent the best you can do given the tools that you learned in class. Every project will be different based on the data used. You must include clearly written explanations of your decisions and conclusions. All of your conclusions must be supported by your code and the data.

Final Project Outline

Below are some .ipynb cells that outline how you might write up your final project.

FINAL PROJECT TITLE

By: Your Name(s)


Abstract

Write a concise 3–4 sentence summary of:

  • The problem you investigated
  • The dataset you used
  • The methods applied
  • Your main findings

Introduction and Background

Provide context for your project:

  • What motivated this topic?
  • Why should someone care about the problem you’re addressing?
  • What prior work, domain knowledge, or assumptions matter?
  • What is the goal of the analysis?

Your job is to convince the reader that your investigation is meaningful.

Data Used

Describe your dataset:

  • Source and method of collection
  • Number of observations and variables
  • Key fields and their meanings
  • Whether the data is real or synthetic
  • Ethical concerns (privacy, bias, representativeness)

Below you will preview the dataset, compute descriptive statistics, and summarize what you learn.
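
A preview cell along these lines is one possible starting point. It is a minimal sketch that assumes pandas; the small DataFrame and its column names (age, income, group) are invented for illustration and stand in for your own dataset, which you would normally load from a file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; replace with your own dataset,
# e.g. df = pd.read_csv("your_file.csv")
df = pd.DataFrame({
    "age": [23, 35, 41, np.nan, 29, 52],
    "income": [32000, 54000, 61000, 48000, np.nan, 250000],
    "group": ["a", "b", "a", "b", "a", None],
})

print(df.head())   # first rows of the dataset
print(df.shape)    # (number of observations, number of variables)

# Descriptive statistics for the numeric columns
summary = df.describe()
print(summary)

# Number of missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)

# Frequency of each category, counting missing entries as their own row
group_counts = df["group"].value_counts(dropna=False)
print(group_counts)
```

In this made-up data, the summary would flag one income value far above the rest and one missing entry per column, which is the kind of observation the notes that follow should discuss.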

Notes from Data Summary

Explain what the descriptive statistics and value counts reveal about:

  • Trends
  • Outliers
  • Missing values
  • Data quality issues

Exploratory Data Analysis (EDA)

In this section include a small set of the most informative visualizations or tables, such as:

  • Distributions of key variables
  • Relationships between variables
  • Correlations
  • Missing-data patterns
  • Scatter plots or pair plots

Do not include every plot you generated—choose only those that genuinely help tell the story.

Explain what early insights you gained and how they shaped your later analysis.
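
Two of the items listed above, correlations and missing-data patterns, can be summarized as tables before deciding which plots are worth keeping. The following is a minimal sketch assuming pandas and NumPy; the data is synthetic and the column names x, y, and z are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data, for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=200),  # built to depend on x
    "z": rng.normal(size=200),                      # independent noise
})
df.loc[::10, "z"] = np.nan  # make every tenth value of z missing

# Pairwise Pearson correlations (missing values are skipped pair by pair)
corr = df.corr()
print(corr.round(2))

# Fraction of missing values in each column
missing_fraction = df.isna().mean()
print(missing_fraction)
```

A table like this helps narrow down which relationships deserve a scatter plot in the final notebook and which columns need a missing-data strategy.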

Interpretation of EDA

Explain what the plots/tables show and how they informed your next steps.

Proposed Questions

List the specific questions your project investigates.

Good guiding questions:

  • Are clear and measurable
  • Can be answered with your data
  • Are linked to your motivation
  • Suggest reasonable analysis or modeling approaches

Provide a brief plan for how you will address each question.

Analysis and Results

This is the core of the project.

For each research question:

  1. Show the code you used (clean, runnable, documented).
  2. Include outputs such as tables, plots, metrics, or model results.
  3. Immediately follow each output with a Markdown explanation of:
    • What the result means
    • Why it matters
    • Whether it answers the question

Include at least one advanced machine learning method from class (e.g., decision trees, KNN, regularized regression, Naive Bayes, Dimensionality Reduction).

Be sure to:

  • Split into train/test
  • Scale the data
  • Avoid data leakage
  • Justify your choices
  • Evaluate your models carefully
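
One way to handle splitting, scaling, and data leakage together is to wrap the scaler and the model in a single scikit-learn pipeline, so the scaler is fit on the training rows only. The sketch below assumes scikit-learn; the bundled breast cancer dataset, the KNN classifier, and the parameter values (test_size, n_neighbors, random_state) are illustrative stand-ins, not requirements of the project.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset that ships with scikit-learn; use your own X and y
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set before any preprocessing is fit
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training data only and reuses
# those training statistics when transforming the test data
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Fitting the scaler on the full dataset before splitting would let information from the test rows influence training, which is one common form of leakage.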

Interpretation of Results

Explain what the model results show:

  • Do they answer the research question?
  • How strong is the model?
  • What limitations or caveats exist?
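
To judge how strong a model is, it can help to compare it with a trivial baseline and to look beyond a single accuracy number. This is a minimal sketch assuming scikit-learn; the dataset, the logistic regression model, and the parameter values are hypothetical stand-ins for your own choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset that ships with scikit-learn; use your own X and y
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline that always predicts the most frequent training class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)

# Scaled, L2-regularized logistic regression (the scikit-learn default penalty)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
model_acc = model.score(X_test, y_test)

# Cross-validation on the training set shows how stable the score is
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, model.predict(X_test))

print(f"Baseline accuracy: {baseline_acc:.3f}")
print(f"Model accuracy:    {model_acc:.3f}")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(cm)
```

A model that barely beats the baseline, or whose cross-validation scores vary widely, is a caveat worth stating in this section.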

Conclusion

Summarize the project:

  • What did you accomplish?
  • What did the analysis reveal?
  • What limitations should the reader keep in mind?
  • What ethical considerations (bias, fairness, privacy) matter?
  • What future work would you pursue with more time or resources?

This section should tie the entire narrative together.