Coffee

Goals

Answer the following questions:

1. What was the price of a cup of coffee in the state of New York between 1900 and 1909?
2. How much does data cleaning on the source dataset change the answer to the first question?
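
As a rough illustration of what the first question asks of the data, the sketch below filters menus by place, currency, and year, then follows the links from menu to page to item to dish. It is not the code in `src/main.py`. The function name `coffee_prices` and the column names (`id`, `menu_id`, `menu_page_id`, `dish_id`, `place`, `date`, `currency`, `price`, `name`) are assumptions about the dataset's CSV layout, and the exact-match targets assume the data has already been cleaned.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]  # one CSV record, e.g. as produced by csv.DictReader


def coffee_prices(
    menus: Iterable[Row],
    pages: Iterable[Row],
    items: Iterable[Row],
    dishes: Iterable[Row],
) -> List[float]:
    """Collect coffee prices from New York menus dated 1900 to 1909.

    Assumed (hypothetical) layout: dates start with a four-digit year and
    prices are plain decimal strings in dollars.
    """
    menu_ids = {
        m["id"]
        for m in menus
        if m["place"] == "New York"
        and m["currency"] == "Dollars"
        and m["date"][:4].isdigit()
        and 1900 <= int(m["date"][:4]) <= 1909
    }
    # Menu -> MenuPage -> MenuItem -> Dish
    page_ids = {p["id"] for p in pages if p["menu_id"] in menu_ids}
    coffee_ids = {d["id"] for d in dishes if d["name"] == "Coffee"}
    return [
        float(i["price"])
        for i in items
        if i["price"] and i["menu_page_id"] in page_ids and i["dish_id"] in coffee_ids
    ]
```

Passing the returned list to `statistics.mean` or `statistics.median` would give one possible summary of "the price"; which summary the project reports is not stated here.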

Dataset

Summary

What’s on the Menu? is a project to transcribe The New York Public Library’s restaurant menu collection. The collection contains menus from around the world, stretching from the 1850s to the 2000s.

Tables

The dataset contains four tables (Menu, MenuPage, MenuItem, and Dish) with the following relationships.

[Diagram: relationships between the dataset tables. Credit: @monsieur-le-git]

Exploration

The first target question (its place, decade, and dish) was chosen to fit the data available in the dataset.

Workflow

Cleaning

Table	Attribute	Change
Menu	date	Trim leading and trailing whitespace
Menu	call_number	Trim leading and trailing whitespace
Menu	place	Trim leading and trailing whitespace
Menu	currency	Trim leading and trailing whitespace
MenuItem	price	Trim leading and trailing whitespace
Dish	name	Trim leading and trailing whitespace
Menu	date	Set empty dates to year from `call_number`
Menu	date	Repair typos observed in manual data exploration
Menu	place	Repair misspellings of "New York"
Menu	currency	Repair misspellings of "Dollars"
Menu	currency	Change instances of "Cents" to "Dollars" (standardize US currency)
MenuItem	price	Divide by 100 where the `Menu` `currency` was changed from "Cents" to "Dollars"
Dish	name	Repair misspellings of "Coffee"
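
A few of the steps in the table can be sketched in plain Python. This is a minimal illustration and not the project's implementation. The helper names (`trim`, `year_from_call_number`, `clean_menu`, `clean_price`) are hypothetical, and the sketch assumes that a call number begins with a four-digit year, which may not hold for every record.

```python
import re
from typing import Dict, Optional, Tuple


def trim(value: str) -> str:
    """Trim leading and trailing whitespace."""
    return value.strip()


def year_from_call_number(call_number: str) -> Optional[str]:
    """Return the leading four-digit year of a call number, if present.

    Assumption: call numbers look like "1905-0012" (year first).
    """
    match = re.match(r"\d{4}", call_number)
    return match.group(0) if match else None


def clean_menu(menu: Dict[str, str]) -> Dict[str, str]:
    """Trim every attribute, then fill an empty date from the call number."""
    cleaned = {key: trim(value) for key, value in menu.items()}
    if not cleaned["date"]:
        year = year_from_call_number(cleaned["call_number"])
        if year is not None:
            cleaned["date"] = year
    return cleaned


def clean_price(price: str, currency: str) -> Tuple[float, str]:
    """Convert a price in cents to dollars; leave other currencies alone."""
    value = float(trim(price))
    unit = trim(currency)
    if unit == "Cents":
        return value / 100, "Dollars"
    return value, unit
```

The misspelling repairs are omitted because they depend on the specific typos found during manual exploration.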

Results

Price of a Cup of Coffee

Impact of Data Cleaning

The goal of data cleaning is to increase the number of Menu and Dish records whose applicable attributes all match their target values, for example a `place` of "New York", a `currency` of "Dollars", and a `name` of "Coffee".
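
One way to quantify that goal is to count matching records before and after cleaning. The sketch below does this for a handful of made-up menu rows. The helper `count_matching` and the sample data are hypothetical and are not taken from the project or the dataset.

```python
from typing import Dict, Iterable

Record = Dict[str, str]


def count_matching(records: Iterable[Record], targets: Dict[str, str]) -> int:
    """Count records whose value equals the target for every listed attribute."""
    return sum(
        all(record.get(attr) == want for attr, want in targets.items())
        for record in records
    )


# Made-up rows: two of them only miss the target because of stray whitespace.
menus = [
    {"place": "New York", "currency": "Dollars"},
    {"place": " New York ", "currency": "Dollars"},
    {"place": "New York", "currency": " Dollars"},
    {"place": "Boston", "currency": "Dollars"},
]
targets = {"place": "New York", "currency": "Dollars"}

before = count_matching(menus, targets)
cleaned = [{key: value.strip() for key, value in menu.items()} for menu in menus]
after = count_matching(cleaned, targets)
print(f"matching menus before cleaning: {before}, after: {after}")
```

With these sample rows, trimming whitespace alone raises the count from 1 to 3, and the Boston row remains correctly excluded.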

Creating the Project Environment

conda env create -f environment.yml
conda activate coffee

Running the Project

Exploration

python src/explore.py /path/to/dataset

Cleaning and Query

python src/main.py /path/to/dataset

Testing the Project

mypy src/*.py
python -m unittest discover -s src

EricSchrock/coffee
