AmlanJSarmah/ml-from-scratch-cpp


ml-from-scratch-cpp

Implementations of various supervised and unsupervised machine learning algorithms in C++.

Setup instructions

  1. Make sure cmake and g++ are installed.
  2. Run the following commands in the project directory:
mkdir build && cd build
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ..
make -j
  3. The project is now built in the build directory, which contains executables such as linear_regression.
  4. To run linear regression from the build folder:
./linear_regression ../data/LinearRegression/california_housing.csv 9 0

Here, ../data/LinearRegression/california_housing.csv is the path to the dataset, 9 is the column that holds the target variable, and 0 indicates that the target column is numerical.
See the data directory for demo datasets.

Features Implemented and Demonstration

CSV Parser

The load_csv() function parses a CSV file and stores its numerical data. The loaded dataset can then be printed to the console as a formatted table with the .print_dataset() function.

[Image: CSV parser demonstration]
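
To illustrate the idea, here is a minimal sketch of a numeric CSV reader. It is not the repository's load_csv(): the Table struct and the parse_numeric_csv() name are hypothetical, and the sketch assumes a header row, comma separators, no quoted fields, and purely numeric cells.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container (not the repository's type): column names plus
// row-major numeric data.
struct Table {
    std::vector<std::string> header;
    std::vector<std::vector<double>> rows;
};

// Hypothetical helper: reads comma-separated text whose first line is a header.
// Assumes every remaining cell is numeric; std::stod throws otherwise.
Table parse_numeric_csv(std::istream& in) {
    Table t;
    std::string line, cell;
    if (std::getline(in, line)) {
        std::stringstream ss(line);
        while (std::getline(ss, cell, ',')) t.header.push_back(cell);
    }
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // skip blank lines
        std::stringstream ss(line);
        std::vector<double> row;
        while (std::getline(ss, cell, ',')) row.push_back(std::stod(cell));
        assert(row.size() == t.header.size());  // every row must match the header width
        t.rows.push_back(row);
    }
    return t;
}
```

Passing a std::ifstream opened on a file such as ../data/LinearRegression/california_housing.csv to parse_numeric_csv() would return the table, provided the file meets the assumptions above.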

Splitting data

We can use test_train_split() to split the data into training and testing sets.
The data is randomly shuffled before it is split.
[Image: data splitting demonstration]
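
The shuffle-then-split step can be sketched as follows. The function name shuffle_and_split(), its parameters, and the choice of std::mt19937 are assumptions made for illustration; the repository's own function may have a different signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Rows = std::vector<std::vector<double>>;

// Hypothetical helper: shuffles a copy of the rows with a seeded generator,
// then returns {train, test}, where test holds the last test_ratio share.
std::pair<Rows, Rows> shuffle_and_split(Rows rows, double test_ratio, unsigned seed) {
    assert(test_ratio >= 0.0 && test_ratio <= 1.0);
    std::mt19937 rng(seed);
    std::shuffle(rows.begin(), rows.end(), rng);

    // Number of test rows, rounded down.
    const std::size_t n_test =
        static_cast<std::size_t>(static_cast<double>(rows.size()) * test_ratio);
    const auto split = rows.end() - static_cast<std::ptrdiff_t>(n_test);

    Rows train(rows.begin(), split);
    Rows test(split, rows.end());
    return {train, test};
}
```

Fixing the seed makes the split reproducible between runs, which helps when comparing results against another library.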

Scaling Data

We use Z-score normalization, i.e. we subtract the mean from each value and then divide by the standard deviation.
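
As a minimal sketch of this step for a single feature column: the helper name z_score_normalize() is hypothetical, and the use of the population standard deviation (dividing by n rather than n - 1) is an assumption, since the text does not say which one the library uses.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: standardizes one feature column in place so that it
// has mean 0 and standard deviation 1.
void z_score_normalize(std::vector<double>& column) {
    assert(!column.empty());
    const double n = static_cast<double>(column.size());

    double mean = 0.0;
    for (double v : column) mean += v;
    mean /= n;

    double variance = 0.0;
    for (double v : column) variance += (v - mean) * (v - mean);
    const double stddev = std::sqrt(variance / n);  // population std (assumption)

    // A constant column has zero spread; map it to zeros instead of dividing by 0.
    if (stddev == 0.0) {
        for (double& v : column) v = 0.0;
        return;
    }
    for (double& v : column) v = (v - mean) / stddev;
}
```

In practice the mean and standard deviation would be computed on the training set and reused for the test set, so that both are scaled consistently.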

Linear Regression

We can perform linear regression on a dataset. It uses the normal equation to calculate the parameters theta.
[Image: linear regression demonstration]
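
The normal equation gives theta = (X^T X)^(-1) X^T y. The sketch below shows one way to evaluate it without an external linear algebra library, by building X^T X and X^T y and solving the resulting system with Gaussian elimination. The names solve() and fit_normal_equation() are hypothetical, and adding an intercept column of ones is an assumption; the repository may compute the inverse differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solves A x = b by Gaussian elimination with partial pivoting.
std::vector<double> solve(Matrix A, std::vector<double> b) {
    const std::size_t n = A.size();
    for (std::size_t col = 0; col < n; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::abs(A[r][col]) > std::abs(A[pivot][col])) pivot = r;
        assert(std::abs(A[pivot][col]) > 1e-12);  // singular systems are not handled
        std::swap(A[col], A[pivot]);
        std::swap(b[col], b[pivot]);
        for (std::size_t r = col + 1; r < n; ++r) {
            const double f = A[r][col] / A[col][col];
            for (std::size_t c = col; c < n; ++c) A[r][c] -= f * A[col][c];
            b[r] -= f * b[col];
        }
    }
    // Back substitution on the upper-triangular system.
    std::vector<double> x(n, 0.0);
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t c = i + 1; c < n; ++c) s -= A[i][c] * x[c];
        x[i] = s / A[i][i];
    }
    return x;
}

// Hypothetical helper: X holds one sample per row, without a bias column.
// Returns theta, where theta[0] is the intercept (assumption).
std::vector<double> fit_normal_equation(const Matrix& X, const std::vector<double>& y) {
    assert(!X.empty() && X.size() == y.size());
    const std::size_t d = X[0].size() + 1;  // +1 for the intercept term
    Matrix XtX(d, std::vector<double>(d, 0.0));
    std::vector<double> Xty(d, 0.0);
    for (std::size_t i = 0; i < X.size(); ++i) {
        std::vector<double> row(1, 1.0);  // leading 1 for the intercept
        row.insert(row.end(), X[i].begin(), X[i].end());
        for (std::size_t j = 0; j < d; ++j) {
            Xty[j] += row[j] * y[i];
            for (std::size_t k = 0; k < d; ++k) XtX[j][k] += row[j] * row[k];
        }
    }
    return solve(XtX, Xty);
}
```

Solving the linear system directly avoids forming the explicit inverse, which tends to be cheaper and numerically safer for small, well-conditioned problems.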

Logistic Regression

We can perform logistic regression on a dataset. It uses gradient ascent and the sigmoid function to find the parameters.
[Image: logistic regression demonstration]
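
A minimal sketch of gradient ascent on the log-likelihood is shown below. The names fit_logistic() and predict_proba(), the batch update averaged over the samples, and the fixed iteration count are all assumptions for illustration; the repository's training loop may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Hypothetical helper: probability of class 1, with theta[0] as the intercept.
double predict_proba(const std::vector<double>& theta, const std::vector<double>& x) {
    assert(theta.size() == x.size() + 1);
    double z = theta[0];
    for (std::size_t j = 0; j < x.size(); ++j) z += theta[j + 1] * x[j];
    return sigmoid(z);
}

// Hypothetical helper: batch gradient ascent on the mean log-likelihood.
// Labels in y must be 0 or 1.
std::vector<double> fit_logistic(const std::vector<std::vector<double>>& X,
                                 const std::vector<int>& y,
                                 double learning_rate, int iterations) {
    assert(!X.empty() && X.size() == y.size());
    const std::size_t d = X[0].size();
    const double n = static_cast<double>(X.size());
    std::vector<double> theta(d + 1, 0.0);

    for (int it = 0; it < iterations; ++it) {
        std::vector<double> grad(d + 1, 0.0);
        for (std::size_t i = 0; i < X.size(); ++i) {
            // Gradient of the log-likelihood: (y - sigmoid(theta . x)) * x
            const double err = y[i] - predict_proba(theta, X[i]);
            grad[0] += err;
            for (std::size_t j = 0; j < d; ++j) grad[j + 1] += err * X[i][j];
        }
        // Ascent step: move *along* the gradient to increase the likelihood.
        for (std::size_t j = 0; j <= d; ++j) theta[j] += learning_rate * grad[j] / n;
    }
    return theta;
}
```

Gradient ascent on the log-likelihood is equivalent to gradient descent on the negative log-likelihood (the cross-entropy loss); only the sign of the update changes.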

Naive Bayes

We have also implemented Gaussian Naive Bayes for classification.
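
As a rough sketch of how Gaussian Naive Bayes works: training estimates a prior plus a per-feature mean and variance for each class, and prediction picks the class with the highest log-posterior. The names ClassStats, fit_gnb() and predict_gnb(), and the small variance-smoothing constant, are assumptions and not the repository's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <vector>

// Hypothetical per-class statistics.
struct ClassStats {
    double log_prior = 0.0;
    std::vector<double> mean, var;
};
using Model = std::map<int, ClassStats>;

// Hypothetical helper: estimates priors, means and variances per class.
Model fit_gnb(const std::vector<std::vector<double>>& X, const std::vector<int>& y) {
    assert(!X.empty() && X.size() == y.size());
    const std::size_t d = X[0].size();
    Model model;
    std::map<int, double> count;
    for (std::size_t i = 0; i < X.size(); ++i) {
        ClassStats& s = model[y[i]];
        s.mean.resize(d, 0.0);
        s.var.resize(d, 0.0);
        for (std::size_t j = 0; j < d; ++j) s.mean[j] += X[i][j];
        count[y[i]] += 1.0;
    }
    for (auto& kv : model)
        for (std::size_t j = 0; j < d; ++j) kv.second.mean[j] /= count[kv.first];
    for (std::size_t i = 0; i < X.size(); ++i) {
        ClassStats& s = model[y[i]];
        for (std::size_t j = 0; j < d; ++j) {
            const double diff = X[i][j] - s.mean[j];
            s.var[j] += diff * diff;
        }
    }
    for (auto& kv : model) {
        const double n_c = count[kv.first];
        // 1e-9 keeps the variance positive for constant features (assumption).
        for (std::size_t j = 0; j < d; ++j) kv.second.var[j] = kv.second.var[j] / n_c + 1e-9;
        kv.second.log_prior = std::log(n_c / static_cast<double>(X.size()));
    }
    return model;
}

// Returns the label with the highest log-posterior under the model.
int predict_gnb(const Model& model, const std::vector<double>& x) {
    const double pi = std::acos(-1.0);
    int best_label = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (const auto& kv : model) {
        const ClassStats& s = kv.second;
        double score = s.log_prior;
        for (std::size_t j = 0; j < x.size(); ++j) {
            const double diff = x[j] - s.mean[j];
            score += -0.5 * std::log(2.0 * pi * s.var[j]) - diff * diff / (2.0 * s.var[j]);
        }
        if (score > best_score) {
            best_score = score;
            best_label = kv.first;
        }
    }
    return best_label;
}
```

Working with log-probabilities rather than raw probabilities avoids underflow when many features are multiplied together.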

Performance Benchmark

Linear Regression

We ran linear regression on housing_dataset.
Results in scikit-learn:
[Image: scikit-learn linear regression results]
Results in our custom library:
[Image: custom library linear regression results]

Logistic Regression

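We ran logistic regression on the breast cancer dataset.
Results in scikit-learn:
[Image: scikit-learn logistic regression results]
Results in our custom library:
[Image: custom library logistic regression results]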

Naive Bayes

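We ran Naive Bayes on the breast cancer dataset.
Results in our custom library:
[Image: custom library Naive Bayes results]
Results in scikit-learn, for comparison:
[Image: scikit-learn Naive Bayes results]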
