Qt Data Cleaner

Interactive desktop tool for cleaning and transforming datasets. Built with Python, PyQt5, and pandas.

Features

  • File Support: Load and export CSV and Excel files (.csv, .xlsx, .xls)
  • Column Profiling: View detailed statistics for each column including:
    • Data type
    • Count of values
    • Missing values and percentage
    • Unique values
    • Statistics for numeric columns (mean, std, min, max)
  • Missing Value Handling: Multiple strategies for dealing with missing data:
    • Drop rows with missing values
    • Fill with mean, median, or mode
    • Forward fill or backward fill
    • Fill with custom constant value
  • Data Transformations:
    • Normalize numeric columns to the 0-1 range (min-max scaling)
    • Standardize numeric columns (z-score: zero mean, unit variance)
    • Label encode categorical columns
    • Drop duplicate rows
    • Reset dataframe index
  • Undo/Redo Support: Every operation is recorded in a history, so changes can be undone and redone
  • Data Preview: Interactive table view with alternating row colors and missing value highlighting
  • Export Pipeline: Save cleaned data to CSV or Excel format
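The statistics shown under Column Profiling correspond to standard pandas reductions. The following is a minimal sketch, assuming only pandas; `profile_column` and `profile_dataframe` are hypothetical names, not the application's own code:

```python
import pandas as pd


def profile_column(series: pd.Series) -> dict:
    """Summarize one column: dtype, counts, missing values, uniques, numeric stats."""
    n_rows = len(series)
    missing = int(series.isna().sum())
    profile = {
        "dtype": str(series.dtype),
        "count": int(series.count()),  # non-missing values only
        "missing": missing,
        "missing_pct": 100.0 * missing / n_rows if n_rows else 0.0,
        "unique": int(series.nunique()),  # NaN is not counted as a distinct value
    }
    # Mean, std, min and max only make sense for numeric (non-boolean) columns
    if pd.api.types.is_numeric_dtype(series) and not pd.api.types.is_bool_dtype(series):
        profile.update(
            mean=float(series.mean()),
            std=float(series.std()),  # sample standard deviation (ddof=1)
            min=float(series.min()),
            max=float(series.max()),
        )
    return profile


def profile_dataframe(df: pd.DataFrame) -> dict:
    """Profile every column of the DataFrame, keyed by column name."""
    return {col: profile_column(df[col]) for col in df.columns}
```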

Installation

Requirements

  • Python 3.7 or higher
  • pip package manager

Setup

  1. Clone the repository:
git clone https://github.com/BaseMax/qt-data-cleaner.git
cd qt-data-cleaner
  2. Install dependencies:
pip install -r requirements.txt

Usage

Running the Application

python main.py

Or make it executable and run directly:

chmod +x main.py
./main.py

Quick Start Guide

  1. Open a Dataset:

    • Click "File" → "Open..." or use Ctrl+O
    • Select a CSV or Excel file
    • The data will be displayed in the table view
  2. View Column Profile:

    • The right panel shows detailed statistics for each column
    • Missing values are highlighted in red in the table
  3. Handle Missing Values:

    • Click "Data" → "Handle Missing Values..." or use the toolbar button
    • Select a fill method (mean, median, mode, etc.)
    • Choose which columns to apply the operation to
    • Click OK to apply
  4. Transform Data:

    • Click "Data" → "Transform..." or use the toolbar button
    • Select a transformation (normalize, standardize, encode, etc.)
    • Choose columns if applicable
    • Click OK to apply
  5. Undo/Redo:

    • Use "Edit" → "Undo" (Ctrl+Z) to revert changes
    • Use "Edit" → "Redo" (Ctrl+Shift+Z) to reapply changes
  6. Export Data:

    • Click "File" → "Export..." or use Ctrl+S
    • Choose output format (CSV or Excel)
    • Save the cleaned dataset
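Each fill method offered in step 3 has a direct pandas counterpart. The sketch below shows one way such a dispatch could look; the function name `handle_missing` and the method keys are hypothetical, and the mean and median strategies assume numeric columns:

```python
import pandas as pd


def handle_missing(df: pd.DataFrame, method: str, columns=None, value=None) -> pd.DataFrame:
    """Return a new DataFrame with missing values in `columns` handled by `method`."""
    cols = list(columns) if columns is not None else list(df.columns)
    out = df.copy()  # never mutate the caller's frame
    if method == "drop":
        # Drop any row that has a missing value in one of the selected columns
        return out.dropna(subset=cols)
    for col in cols:
        s = out[col]
        if method == "mean":  # numeric columns only
            out[col] = s.fillna(s.mean())
        elif method == "median":  # numeric columns only
            out[col] = s.fillna(s.median())
        elif method == "mode":
            modes = s.mode()  # may contain several ties; use the first
            if not modes.empty:
                out[col] = s.fillna(modes.iloc[0])
        elif method == "ffill":
            out[col] = s.ffill()  # propagate the last valid value forward
        elif method == "bfill":
            out[col] = s.bfill()  # propagate the next valid value backward
        elif method == "constant":
            out[col] = s.fillna(value)
        else:
            raise ValueError(f"unknown method: {method!r}")
    return out
```

Returning a copy rather than filling in place matters for undo/redo: a stored snapshot of the previous DataFrame stays valid only if later operations never modify it.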

Sample Data

A sample dataset (sample_data.csv) is included with the repository for testing purposes. It contains employee data with some missing values.
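To inspect the sample file outside the GUI, a few lines of pandas are enough. This sketch assumes pandas is installed; `load_dataset` and `missing_summary` are hypothetical helpers that dispatch on the extensions listed under File Support:

```python
from pathlib import Path

import pandas as pd


def load_dataset(path) -> pd.DataFrame:
    """Load a CSV or Excel file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in (".xlsx", ".xls"):
        # pandas reads .xlsx via openpyxl; legacy .xls needs the separate xlrd package
        return pd.read_excel(path)
    raise ValueError(f"unsupported file type: {suffix}")


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Missing-value count per column, restricted to columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]
```

For example, `missing_summary(load_dataset("sample_data.csv"))` lists how many values are missing in each incomplete column.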

Keyboard Shortcuts

  • Ctrl+O: Open file
  • Ctrl+S: Export file
  • Ctrl+Z: Undo
  • Ctrl+Shift+Z: Redo
  • F5: Refresh profile
  • Ctrl+Q: Quit application

Architecture

The application consists of four modules:

  • main.py: Application entry point
  • main_window.py: Main GUI window and user interface
  • data_model.py: Data management with undo/redo support
  • transformers.py: Data transformation utilities

Dependencies

  • PyQt5: GUI framework
  • pandas: Data manipulation and analysis
  • numpy: Numerical computing
  • openpyxl: Excel file support
  • scikit-learn: Data preprocessing and transformations
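scikit-learn is listed for preprocessing, but the README does not say which of its classes transformers.py uses. The pandas-only sketch below shows what the three column transforms compute; the function names are hypothetical, and the comments name the roughly corresponding scikit-learn classes:

```python
import pandas as pd


def normalize(s: pd.Series) -> pd.Series:
    """Min-max scale to [0, 1] (cf. sklearn.preprocessing.MinMaxScaler)."""
    lo, hi = s.min(), s.max()
    if hi == lo:
        return s * 0.0  # constant column: avoid division by zero; NaN stays NaN
    return (s - lo) / (hi - lo)


def standardize(s: pd.Series) -> pd.Series:
    """Z-score (cf. StandardScaler), using the population std (ddof=0)."""
    std = s.std(ddof=0)
    if std == 0:
        return s * 0.0
    return (s - s.mean()) / std


def label_encode(s: pd.Series) -> pd.Series:
    """Map sorted categories to 0..k-1 (cf. LabelEncoder); missing values become -1."""
    codes, _ = pd.factorize(s, sort=True)
    return pd.Series(codes, index=s.index, name=s.name)
```

Note that pandas' `Series.std()` defaults to the sample standard deviation (ddof=1), so passing `ddof=0` explicitly is what makes the z-scores match a population-based scaler.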

License

MIT License. See the LICENSE file for details.

Author

Max Base

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
