This project implements a modular ELT (Extract → Load → Transform) pipeline for analytical workloads. Data is ingested from Excel sources, loaded into Supabase (Postgres), staged, cleaned, and transformed with dbt, then served for analytics.
Final data is exposed via:
- Supabase REST API endpoints (per table/view).
- Direct PostgreSQL connections for BI tools.
- Power BI dashboards consuming curated tables.
The pipeline is containerized with Docker and orchestrated via Apache Airflow, running both manually (for demo) and on a scheduler (for production-like automation).
+-----------------+
| Excel Sources |
+-----------------+
|
v
+-----------------+
| Extraction | (Python in Airflow DAG)
+-----------------+
|
v
+-----------------+
| Raw Layer | (Supabase/Postgres)
+-----------------+
|
v
+-----------------+
| Staging Layer | (dbt transformations)
+-----------------+
|
v
+-----------------+
| Business Logic | (dbt models → analytics schema)
+-----------------+
|
v
+-----------------+
| Serving | (Supabase REST / Views)
+-----------------+
|
+---------------------+
| Power BI Dashboards |
+---------------------+
- Python – ingestion scripts (Excel → Postgres)
- PostgreSQL (Supabase) – warehouse
- dbt – transformations (staging + business logic)
- Docker – containerized environment
- Apache Airflow (CLI only) – pipeline orchestration & scheduling
- Supabase REST API – serving layer
- Power BI – visualization
.
├── dags/ # Airflow DAG definitions (elt_dbt_pipeline)
├── script/ # Python scripts for Excel ingestion
├── dbt/ # dbt models (staging + analytics)
├── docker-compose.yml # Docker services (Airflow, dbt, Postgres)
├── requirements.txt # Python dependencies
└── README.mdgit clone https://github.com/JumaOchi/cohort_metrics_everythingdata.git
cd cohort_metrics_everythingdataCreate .env file:
POSTGRES_HOST=...
POSTGRES_DB=...
POSTGRES_USER=...
POSTGRES_PASSWORD=...
SUPABASE_URL=...
SUPABASE_SERVICE_ROLE_KEY=...Build and start services:
docker-compose up --build
docker exec -it airflow bash #(moving inside container to run the dags manually for testing)List DAGs:
airflow dags listTrigger the pipeline:
airflow dags trigger elt_dbt_pipelineThis will:
- Extract data from Excel and load into raw schema in Supabase.
- Run dbt staging models to clean/standardize data.
- Run dbt business logic models to build analytics tables.
- Push final curated tables to Supabase for serving.
Airflow is configured with a scheduler that automatically runs elt_dbt_pipeline at predefined intervals.
This ensures continuous ingestion, transformation, and serving without manual triggers.
| Table | Description |
|---|---|
| unified_intake | Unified intake data across all cohorts. |
| fact_performance_metrics | Cohort-level performance metrics. |
| fact_application_details | Detailed application data per cohort. |
| dim_applicant_demographics | Demographic dimension of applicants. |
| mart_total_intake_summary | Aggregate summary of total intakes and trends. |
| mart_recruitment_effectiveness | Metrics evaluating recruitment channels and effectiveness. |
-
Project URL:
https://hsiovdjiuffpwimlhhok.supabase.co -
Anon Key:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6ImhzaW92ZGppdWZmcHdpbWxoaG9rIiwicm9sZSI6ImFub24iLCJpYXQiOjE3NTc1MTU0MjUsImV4cCI6MjA3MzA5MTQyNX0.oEUyQZSHG4kVJ9AQT-gfxnPKNGzOCqVWsL_tt2bo_2g
-H "apikey: YOUR_ANON_KEY" \
-H "Authorization: Bearer YOUR_ANON_KEY"JSON response (default):
curl "https://hsiovdjiuffpwimlhhok.supabase.co/rest/v1/{table_name}" \
-H "apikey: YOUR_ANON_KEY" \
-H "Authorization: Bearer YOUR_ANON_KEY"CSV download:
curl "https://hsiovdjiuffpwimlhhok.supabase.co/rest/v1/{table_name}" \
-H "apikey: YOUR_ANON_KEY" \
-H "Authorization: Bearer YOUR_ANON_KEY" \
-H "Accept: text/csv" \
-o {table_name}.csvReplace {table_name} with one of the names in the Analytics Tables section above.
I used the rest endpoints but you could connect directly though requires ssl:
- Connect to Supabase REST endpoints (each table separately).
Contributions are welcome!
- Fork the repo
- Create a feature branch
- Commit changes
- Open a PR
