PJ hsjoi0214

PJ · Embedded Software Engineer → Data/AI Engineer

Embedded software engineer in e-mobility with experience in data analysis and transformation. I'm now moving into Data Engineering and ML systems roles out of strong interest and curiosity.

Professional Background

Experienced Embedded Software Engineer with a strong foundation in automation, data-driven systems, and scalable software architectures.
Currently transitioning into Data Engineering and Applied Machine Learning, leveraging a deep understanding of system design, data flows, and distributed computation.

Technical Alignment with Data Engineering

Engineered complex automation and control systems using PA-Base/Script, an object-oriented scripting environment conceptually similar to Python/C++, which helped me build strong foundations in modular software design, data manipulation, and process automation.
1. Designed and deployed automated data acquisition and transformation pipelines for large-scale battery testing, analogous to modern ETL (Extract, Transform, Load) workflows in data engineering.
2. Implemented process control flows via DAG-based orchestration (PA-Graph), mirroring dependency management in tools like Apache Airflow.
1. Developed structured and distributed databases for managing cell, pack, and end-of-line test data, conceptually similar to PostgreSQL, AWS RDS, and DynamoDB architectures.
2. Implemented cloud-based data synchronization for global test environments, paralleling AWS S3 and Azure Data Lake solutions.
1. Analyzed large-scale battery performance data to detect trends and anomalies using statistical and algorithmic reasoning, laying the groundwork for machine learning workflows.
2. Built user-facing dashboards (PA-Design) for visualization and reporting, comparable to frameworks like Streamlit or Plotly.
1. Built real-time monitoring solutions for distributed test systems, providing insight into data quality, system health, and performance, conceptually similar to Prometheus, Grafana, and AWS CloudWatch.
2. Defined alerting and metric-tracking logic for anomaly detection and proactive maintenance.
Automated deployment and testing pipelines for hardware-software integration, extending continuous integration and delivery (CI/CD) concepts into data and MLOps workflows.
Led global customer training sessions across Europe, the USA, and China, and authored internal documentation and user guides to standardize testing and data workflows.
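
The ETL-style pipelines and DAG-based orchestration mentioned above can be illustrated with a minimal Python sketch. This is not PA-Graph or Airflow code: the task names, the sample readings, and the `run_pipeline` helper are hypothetical, and the standard-library `graphlib` module stands in for a real orchestrator.

```python
from graphlib import TopologicalSorter


def extract():
    # Hypothetical raw readings: (cell_id, voltage in volts); None marks a failed read
    return [("cell-1", 3.71), ("cell-2", 3.68), ("cell-3", None)]


def transform(rows):
    # Drop incomplete readings and convert volts to integer millivolts
    return [(cell, round(volts * 1000)) for cell, volts in rows if volts is not None]


def load(rows, store):
    # Append cleaned rows to an in-memory list standing in for a database table
    store.extend(rows)
    return len(rows)


def run_pipeline():
    store = []
    results = {}
    # Each task reads the outputs of its upstream tasks from `results`
    tasks = {
        "extract": lambda: extract(),
        "transform": lambda: transform(results["extract"]),
        "load": lambda: load(results["transform"], store),
    }
    # Each key maps to the set of tasks it depends on
    dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
    # A topological order guarantees that dependencies run before dependents
    order = list(TopologicalSorter(dag).static_order())
    for name in order:
        results[name] = tasks[name]()
    return order, store
```

Calling `run_pipeline()` returns the execution order together with the loaded rows. In a real orchestrator the same dependency mapping would typically also drive retries, scheduling, and parallel execution, which this sketch leaves out.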

Broader Experience

Developed full-stack applications and data science / ML projects, demonstrating proficiency across both software engineering and data infrastructure layers.
Familiar with AWS Cloud, Python, SQL, Databricks, Terraform, Docker, and CI/CD pipelines.

My experience in embedded systems taught me to build reliable, data-centric automation in distributed environments: skills that map directly to modern data engineering and cloud computing.

Skills & Transition Path

Moving toward production-grade data engineering, data science, and applied ML work.
AWS Cloud Solutions: Glue, Lambda, API Gateway, S3, IaC (Terraform, CloudFormation), Simple Data Lake, CloudWatch, Cost Explorer, RDS, DynamoDB, IAM, VPC Security, Databricks, Jenkins (CI/CD), Airflow (DAGs).
AWS (Cloud): Lambda, S3, API Gateway, RDS, DynamoDB, IAM, Service Catalog, Terraform (IaC), CloudWatch, Cost Explorer, EKS, SQS, Glue, Athena, VPC, and others.

Programming & Tools: Python, SQL, Unix Shell Scripting, PySpark, ETL.

DevOps & Automation: CI/CD, Git, Jenkins, Airflow, Terraform (IaC), Kafka (Basic), Containerisation (EKS, Docker).

Design & Architecture: System Design, Client-Server Architecture, Microservices, Serverless Architecture, Event-Driven Architecture, Data Modeling, Database Design.

Observability & Monitoring: OpenTelemetry (OTel), Jaeger, Databricks, Prometheus, Grafana, custom DIY Monitoring & Observability Panel.

Featured Projects (learning + build)

- Market Data Platform: Cloud-native streaming & batch pipelines for financial market data, data quality gates + real-time & analytical serving.
- The Knowledge Drip: AI-driven knowledge delivery platform using hybrid search (BM25 + embeddings) & personalized insights via SMS.
- RAGbot: RAG chatbot for Crime and Punishment, combining information retrieval + LLM via Streamlit.
- Housing Price Prediction: Feature-engineered XGBoost pipeline; Streamlit app; Kaggle RMSE 0.12033.
- Brazil Market Expansion: SQL + Tableau dashboards on an artificial Brazil market dataset; structured insights & schema design.
- Eniac Discount Analysis: Discount strategy & product segmentation on €7.8M revenue; seasonal demand & margin impact.
- Weather App: Minimalist JS + OpenWeather app with essentials + outfit suggestions.
- Movie Night: CLI scraper curating top 50 films of 2023; filters + GCS/Heroku.
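
As a rough illustration of the hybrid search idea (BM25 + embeddings) named in the project list above, the sketch below blends a lexical BM25 score with an embedding cosine similarity. It is not the project's actual code: the `hybrid_rank` helper, the min-max normalization, the `alpha` weight, and the precomputed vectors passed in by the caller are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Plain BM25 over whitespace tokens; k1 and b are common default values
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for t in tokenized if term in t)
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(values):
    # Rescale to [0, 1] so lexical and semantic scores are comparable
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    # alpha weights the lexical score; (1 - alpha) weights the embedding score
    lexical = min_max(bm25_scores(query, docs))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    combined = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    # Return document indices, best match first
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
```

The vectors would normally come from an embedding model; here the caller supplies them directly so the example stays self-contained. A weighted sum is only one way to fuse the two rankings, and other schemes may suit a given dataset better.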

Current Work & Learning

1. Knowledge app integrating multiple APIs + Supabase (PostgreSQL) + hosting environment + recommendation system (repo is private, permission-based access).
2. Medium article explaining the detailed workflow of the Knowledge app.
3. Personal blogging website built from scratch — roadmap includes adding a text-to-speech model (private repo).
4. Medium Article explaining the workings of Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) in depth.
1. Agentic knowledge graph construction
2. Building AI Agents and Agentic Workflows

Journey & Achievements

Moving closer to downstream data roles through projects, certifications, and writing:

- Data Engineering — DeepLearning.AI (4-course specialization using AWS)
- Data Science — WBS Academy, Berlin
- Deep Learning — DeepLearning.AI (5-course specialization)
- RAG — DeepLearning.AI
- Docker & Kubernetes
- Short courses: GCP Essential Training; Statistics (3-part series)

Collaboration & Contact

Open to and excited about collaborating on end-to-end data engineering, data science, and applied ML projects, from small builds to production-grade pipelines.
From embedded systems to end-to-end data workflows: engineering pipelines, applied ML, RAG and deep learning — deployed with DataOps/DevOps practices (CI/CD, IaC, automation, monitoring, Docker/Kubernetes).