Implementation of various supervised and unsupervised machine learning algorithms in C++.
- It is assumed that `cmake` and `g++` are installed.
- Run the following commands in the project directory:
mkdir build && cd build
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ..
make -j
- The project is now built in the `build` directory, where you will see various executables such as `linear_regression`.
- To run Linear Regression from the `build` folder:
./linear_regression ../data/LinearRegression/california_housing.csv 9 0
Here, `../data/LinearRegression/california_housing.csv` is the path to the dataset, `9` is the column that holds the target variable, and `0` indicates that the target column is numerical.
See the `data` directory for various demo datasets.
The `load_csv()` function parses a CSV file and stores the numerical data, which can then be printed to the console in a neatly formatted layout using the `.print_dataset()` function.
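To make the parsing step concrete, here is a minimal sketch of reading numeric CSV rows. The name `parse_numeric_csv` and its signature are hypothetical and are not the library's actual `load_csv()`. It assumes every cell is numeric, with no quoted fields or embedded commas.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a CSV loader: reads comma-separated numeric rows.
// Assumption: no quoted fields, no embedded commas, every cell parses as a double.
std::vector<std::vector<double>> parse_numeric_csv(std::istream& in, bool has_header) {
    std::vector<std::vector<double>> rows;
    std::string line;
    if (has_header) std::getline(in, line);  // discard the column names
    while (std::getline(in, line)) {
        if (line.empty()) continue;          // skip blank lines
        std::vector<double> row;
        std::stringstream cells(line);
        std::string cell;
        while (std::getline(cells, cell, ',')) row.push_back(std::stod(cell));
        // Every row must have the same number of columns as the first one.
        assert(rows.empty() || row.size() == rows.front().size());
        rows.push_back(row);
    }
    return rows;
}
```

Taking a `std::istream&` rather than a file name keeps the sketch easy to exercise with an in-memory `std::istringstream`; a `std::ifstream` works the same way.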
We can now use `test_train_split()` to split the data into training and testing sets.
Before splitting we also randomly shuffle our data.
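The shuffle-then-split step might look roughly like the sketch below. `shuffle_and_split`, its parameters, and the `Row`/`Dataset` aliases are hypothetical names chosen for illustration; the library's own function may take different arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

using Row = std::vector<double>;
using Dataset = std::vector<Row>;

// Hypothetical helper: shuffles a copy of the rows, then cuts it in two.
// test_fraction is the share of rows placed in the test set.
// Returns {train, test}.
std::pair<Dataset, Dataset> shuffle_and_split(Dataset rows, double test_fraction, unsigned seed) {
    assert(test_fraction >= 0.0 && test_fraction <= 1.0);
    std::mt19937 rng(seed);                       // fixed seed gives a reproducible split
    std::shuffle(rows.begin(), rows.end(), rng);  // shuffle before splitting
    const std::size_t n_test = static_cast<std::size_t>(rows.size() * test_fraction);
    const std::ptrdiff_t n_train = static_cast<std::ptrdiff_t>(rows.size() - n_test);
    Dataset train(rows.begin(), rows.begin() + n_train);
    Dataset test(rows.begin() + n_train, rows.end());
    return std::make_pair(train, test);
}
```

Shuffling first matters when the CSV is sorted (for example by target value); otherwise the test set would contain only one end of the distribution.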

We use Z-score normalization, i.e. we subtract the mean and then divide by the standard deviation.
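As an illustration, Z-score normalization of a single feature column can be sketched as follows. The function name `z_score_normalize` is hypothetical, and the sketch assumes the population standard deviation (dividing by n); the library may divide by n - 1 instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standardizes one feature column in place: x := (x - mean) / stddev.
// Assumption: population standard deviation (divide by n, not n - 1).
void z_score_normalize(std::vector<double>& column) {
    assert(!column.empty());
    const double n = static_cast<double>(column.size());
    double mean = 0.0;
    for (double x : column) mean += x;
    mean /= n;
    double var = 0.0;
    for (double x : column) var += (x - mean) * (x - mean);
    const double stddev = std::sqrt(var / n);
    if (stddev == 0.0) {
        // Constant column: only centre it, to avoid dividing by zero.
        for (double& x : column) x -= mean;
        return;
    }
    for (double& x : column) x = (x - mean) / stddev;
}
```

In practice the mean and standard deviation would be computed on the training set only and reused for the test set, so that no information leaks from test to train.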
We can perform linear regression on a dataset. It uses the normal equation to calculate the parameters theta.
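The normal equation gives theta = (X^T X)^-1 X^T y. The sketch below solves the equivalent linear system (X^T X) theta = X^T y with Gaussian elimination instead of forming the inverse explicitly. `normal_equation` and the `Matrix` alias are hypothetical names, and the library may compute the solution differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solves (X^T X) theta = X^T y by Gaussian elimination with partial pivoting.
// X is n x d (add a column of ones for the intercept); y has n entries.
std::vector<double> normal_equation(const Matrix& X, const std::vector<double>& y) {
    assert(!X.empty() && X.size() == y.size());
    const std::size_t n = X.size(), d = X[0].size();
    // Build the augmented system [X^T X | X^T y].
    Matrix A(d, std::vector<double>(d + 1, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < d; ++j) {
            for (std::size_t k = 0; k < d; ++k) A[j][k] += X[i][j] * X[i][k];
            A[j][d] += X[i][j] * y[i];
        }
    }
    // Forward elimination with partial pivoting.
    for (std::size_t col = 0; col < d; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < d; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        assert(std::fabs(A[pivot][col]) > 1e-12);  // X^T X must be invertible
        std::swap(A[col], A[pivot]);
        for (std::size_t r = col + 1; r < d; ++r) {
            const double f = A[r][col] / A[col][col];
            for (std::size_t k = col; k <= d; ++k) A[r][k] -= f * A[col][k];
        }
    }
    // Back substitution.
    std::vector<double> theta(d, 0.0);
    for (std::size_t i = d; i-- > 0;) {
        double s = A[i][d];
        for (std::size_t k = i + 1; k < d; ++k) s -= A[i][k] * theta[k];
        theta[i] = s / A[i][i];
    }
    return theta;
}
```

For example, fitting the points (0, 1), (1, 3), (2, 5), (3, 7) with an intercept column recovers an intercept close to 1 and a slope close to 2.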

We can perform logistic regression on a dataset. It uses gradient ascent and the sigmoid function to find the parameters.
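A minimal sketch of batch gradient ascent on the logistic log-likelihood is shown below. The names `sigmoid` and `fit_logistic`, the learning rate `alpha`, and the fixed iteration count are assumptions made for illustration; the library's stopping rule and step size are not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Batch gradient ascent on the log-likelihood:
//   theta_j += alpha * sum_i (y_i - sigmoid(theta . x_i)) * x_ij
// X is n x d (add a column of ones for the intercept); y holds labels 0 or 1.
std::vector<double> fit_logistic(const std::vector<std::vector<double>>& X,
                                 const std::vector<int>& y,
                                 double alpha, int iterations) {
    assert(!X.empty() && X.size() == y.size());
    const std::size_t d = X[0].size();
    std::vector<double> theta(d, 0.0);
    for (int it = 0; it < iterations; ++it) {
        std::vector<double> grad(d, 0.0);
        for (std::size_t i = 0; i < X.size(); ++i) {
            double z = 0.0;
            for (std::size_t j = 0; j < d; ++j) z += theta[j] * X[i][j];
            const double err = y[i] - sigmoid(z);  // label minus predicted probability
            for (std::size_t j = 0; j < d; ++j) grad[j] += err * X[i][j];
        }
        // Ascent: move along the gradient to increase the log-likelihood.
        for (std::size_t j = 0; j < d; ++j) theta[j] += alpha * grad[j];
    }
    return theta;
}
```

A prediction for a new row `x` is then `sigmoid(theta . x)`, thresholded at 0.5 to obtain a class label.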

We have also implemented Gaussian Naive Bayes, which is used for classification.
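Gaussian Naive Bayes fits one normal distribution per class and feature, then picks the class with the highest log-posterior. The sketch below shows the idea; `ClassStats`, `fit_gnb`, `predict_gnb`, and the small variance-smoothing constant are hypothetical and are not taken from the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

struct ClassStats {
    double prior = 0.0;
    std::vector<double> mean;
    std::vector<double> var;
};

// Estimates the prior and the per-feature mean and variance of each class.
std::map<int, ClassStats> fit_gnb(const std::vector<std::vector<double>>& X,
                                  const std::vector<int>& y) {
    assert(!X.empty() && X.size() == y.size() && !X[0].empty());
    const std::size_t d = X[0].size();
    std::map<int, ClassStats> model;
    std::map<int, double> counts;
    for (std::size_t i = 0; i < X.size(); ++i) {
        ClassStats& s = model[y[i]];
        if (s.mean.empty()) { s.mean.assign(d, 0.0); s.var.assign(d, 0.0); }
        for (std::size_t j = 0; j < d; ++j) s.mean[j] += X[i][j];
        counts[y[i]] += 1.0;
    }
    for (auto& kv : model)
        for (double& m : kv.second.mean) m /= counts[kv.first];
    for (std::size_t i = 0; i < X.size(); ++i) {
        ClassStats& s = model[y[i]];
        for (std::size_t j = 0; j < d; ++j) {
            const double diff = X[i][j] - s.mean[j];
            s.var[j] += diff * diff;
        }
    }
    for (auto& kv : model) {
        kv.second.prior = counts[kv.first] / static_cast<double>(X.size());
        // Assumed smoothing term keeps the variance strictly positive.
        for (double& v : kv.second.var) v = v / counts[kv.first] + 1e-9;
    }
    return model;
}

// Returns the label with the highest log prior plus summed Gaussian log-likelihoods.
int predict_gnb(const std::map<int, ClassStats>& model, const std::vector<double>& x) {
    assert(!model.empty());
    const double two_pi = 2.0 * std::acos(-1.0);
    int best_label = model.begin()->first;
    double best_score = 0.0;
    bool first = true;
    for (const auto& kv : model) {
        const ClassStats& s = kv.second;
        double score = std::log(s.prior);
        for (std::size_t j = 0; j < x.size(); ++j) {
            const double diff = x[j] - s.mean[j];
            score += -0.5 * std::log(two_pi * s.var[j]) - diff * diff / (2.0 * s.var[j]);
        }
        if (first || score > best_score) { best_score = score; best_label = kv.first; first = false; }
    }
    return best_label;
}
```

Working in log space avoids underflow when many small per-feature likelihoods are multiplied together.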
We used linear regression on the housing dataset.
Results in scikit-learn

Results in our custom library

We used logistic regression on the breast cancer dataset.
Results in scikit-learn

Results in our custom library

We used Gaussian Naive Bayes on the breast cancer dataset.

For comparison, the performance in scikit-learn
