rc-9/BrewModel

Image Source: VinePair

BrewModel: Hops Flavor & Aroma Profiler

Tomer D. & Romith C.

Table of Contents
  1. About The Project
  2. Getting Started
    • Prerequisites
    • Repository Cloning
    • Environment Setup
  3. Usage
    • Scraper
    • Cleaner
    • EDA
    • Classifier
    • Analyzer
  4. Acknowledgments

About The Project

Hops are primarily used as a bittering, flavoring, and stabilizing agent in beer. In recent years, hops have become the center of attention in the beer industry, as hop-forward beers are now among the most popular styles. Hop varieties are developed and grown in moderate climates around the world, and every branded variety has a unique flavor and aroma profile. This makes for an exciting and delicious reason to explore what flavor and aroma a hop can offer on its own, or together with other hops to build layers of flavor and aroma.

The purpose of this project is to build a processed dataset for exploring definitive hop characteristics, draw initial insights into geographical relationships between hops, and lay the groundwork for model-building in future studies. The first step was to compile a comprehensive dataset of these characteristics, with both numeric brew values and an aroma profile for each hop. This was achieved by scraping BeerMaverick's database of 300+ hops from around the world. The raw data was then processed for exploratory studies and feature-engineered to create visualizations and prepare for initial model-building. Finally, preliminary models were built with supervised tree-based ensemble methods (XGBoost & Random Forest) for a deeper look into classification techniques for beer hops.

(back to top)

Getting Started

This section walks through the steps to download a local copy of the project and reproduce the findings.

Prerequisites

Python 3.x and a way to open Jupyter (IPython) notebook files are assumed. The links below walk through the steps needed to meet these prerequisites.

Windows:
https://medium.com/@kswalawage/install-python-and-jupyter-notebook-to-windows-10-64-bit-66db782e1d02
MacOS:
https://docs.python-guide.org/starting/install3/osx/
https://medium.com/@blessedmarcel1/how-to-install-jupyter-notebook-on-mac-using-homebrew-528c39fd530f

Repository Cloning

  1. Navigate to a directory to save the repo through a Terminal.
    cd c:\path\to\directory
    
  2. Clone the repo
    git clone https://github.com/rc-9/tools1_project.git
    

Environment Setup

  1. Install virtual environment capability. This is optional for the steps below, since the venv module used in step 2 ships with Python 3.
    pip install virtualenv
    
  2. Navigate to the directory of the cloned repo and create a virtual environment for the project.
    python -m venv c:\path\to\directory\venvname
    
  3. Activate the virtual environment, using the same name chosen in step 2. On macOS or Linux, run source venvname/bin/activate instead.
    .\venvname\Scripts\activate
    
  4. Install necessary modules from the provided requirements file.
    pip install -r requirements.txt
    
  5. Launch Jupyter from within the virtual environment to view and execute the notebooks, or run them directly from the Terminal to generate the output files.
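
For the Terminal route in step 5, the notebooks can also be executed headlessly. The following is a minimal sketch rather than part of the project: it assumes that nbconvert is available in the environment (it is normally installed alongside Jupyter), that the notebooks sit in the current working directory, and that the scraper is skipped because its output is already in the repository. The helper names `build_command` and `run_all` are hypothetical.

```python
import subprocess

# Notebook names as listed under Usage, in execution order.
# step1_scraper.ipynb is left out on purpose (assumption: its output
# is already present, so the long scrape can be skipped).
NOTEBOOKS = [
    "step2_cleaner.ipynb",
    "eda_and_summary_visuals.ipynb",
    "region_classifier.ipynb",
    "purpose_classifier.ipynb",
]


def build_command(notebook):
    """Return the nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(notebooks=NOTEBOOKS, runner=subprocess.run):
    """Execute each notebook in order; check=True stops at the first failure."""
    for notebook in notebooks:
        runner(build_command(notebook), check=True)
```

Calling `run_all()` from the directory that holds the notebooks would run them one after another. The `runner` parameter exists only so the function can be exercised without launching Jupyter.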

(back to top)

Usage

This section outlines the order in which to execute the notebooks to retrieve and clean the necessary data, generate visuals, perform analysis, and build basic classification models.

  1. Execute step1_scraper.ipynb to collect the info from BeerMaverick's Hops Database. This notebook takes 40+ minutes to fully execute and scrape all the necessary data. As the database contains 300+ hops, each with an individual webpage for detailed info, the scraper waits between requests so that the scraping can complete without running into automatic IP blocks. This step is optional and can be skipped to avoid the long run-time, as the raw output is already provided in the repository.

    The raw data is stored in the raw_data directory, which contains the following csv files:

    • raw_hops_main.csv: primary data file to be used for cleaning & analysis
    • raw_ref_aroma_types.csv: reference document for metadata info on aroma types
    • raw_ref_brew_values.csv: reference document for metadata info on brew values
    • raw_ref_hops_substitutions.csv: reference document for metadata info on pre-determined hop substitutions


  2. Execute step2_cleaner.ipynb to wrangle and feature-engineer the raw data files from raw_data and store the cleaned data in clean_data.

    The clean_data directory consists of the following csv files:

    • cln_hops_brewvalues.csv: processed numerical data for various brew values for each hop
    • cln_hops_profile.csv: processed categorical & boolean data of country, purpose, aroma info for each hop
    • cln_ref_aroma_types.csv: processed reference document for metadata info on aroma types
    • cln_ref_brew_values.csv: processed reference document for metadata info on brew values
    • cln_ref_hops_substitutions.csv: processed reference document for metadata info on pre-determined hop substitutions


  3. Execute eda_and_summary_visuals.ipynb, which uses the processed data files from clean_data to perform exploratory analysis, develop insights into the dataset, and produce summary visualizations of the data.

    Executing the notebook generates the images directory, which holds all of its png output files.


  4. Execute region_classifier.ipynb to construct two tree-based ensemble models, using the XGBoost & Random Forest classification algorithms, to classify hop region.

    In this notebook, we attempt to classify geographical regions based on various hop characteristics. The processed data from clean_data undergoes further feature-engineering before being fed into the models. The notebook is self-contained and displays the confusion matrix resulting from the model predictions.

    Additionally, a tool was built that takes user-supplied hop characteristics as input and runs the classifier model to predict the region to which the hop belongs.


  5. Execute purpose_classifier.ipynb to construct two tree-based ensemble models, using the XGBoost & Random Forest classification algorithms, to classify hop purpose.

    In this notebook, we attempt to classify hop purpose (Dual vs Aroma vs Bittering) based on various hop characteristics. The processed data from clean_data undergoes further feature-engineering before being fed into the models. The notebook is self-contained and displays the confusion matrix resulting from the model predictions.

    Additionally, a tool was built that takes user-supplied hop characteristics as input and runs the classifier model to predict the purpose of the hop.
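
The wait-time mentioned in step 1 can be illustrated with a small throttling loop. This is a sketch of the general technique, not the scraper's actual code: the function name `fetch_all`, the `fetch` callable, and the default `wait_seconds` value are all assumptions, and the delay used in step1_scraper.ipynb is not stated here.

```python
import time


def fetch_all(urls, fetch, wait_seconds=5.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    `fetch` is any callable that takes a URL and returns the page content.
    `wait_seconds` is an illustrative value, not the notebook's setting.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(wait_seconds)  # pause before every request after the first
        pages[url] = fetch(url)
    return pages
```

Passing the download function in as `fetch` keeps the pacing logic separate from the HTTP library in use. With one page per hop, the pauses account for most of the run-time: 300 pages at 8 seconds apiece, for example, would come to 40 minutes.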

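
For readers unfamiliar with the confusion matrices shown in steps 4 and 5, the sketch below builds one by hand. It uses only the standard library and made-up labels, so it illustrates how to read the matrix rather than reproducing the notebooks' code or results. The function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return rows of counts: row = actual label, column = predicted label."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of all predictions that fall on the matrix diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up example data; these are NOT results from the project.
labels = ["Aroma", "Bittering", "Dual"]
actual = ["Aroma", "Aroma", "Bittering", "Dual", "Dual", "Dual"]
predicted = ["Aroma", "Dual", "Bittering", "Dual", "Dual", "Aroma"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>10}: {row}")
print(f"accuracy: {accuracy(matrix):.2f}")
```

Each row corresponds to the true class and each column to the predicted class, so off-diagonal entries show which purposes (or regions) a model confuses with one another.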

(back to top)

Acknowledgments

(back to top)
