Classification of Liquid and Ice Particles in Mixed-Phase Clouds via Hybrid Convolutional Neural Network (CNN) and Multimodal Feature Learning
Cloud composition is critical in climate modeling and weather prediction, as the phase of water (liquid, solid, or mixed) governs fundamental physical processes like solar radiation transfer (albedo), energy exchange, and precipitation formation.
The NSF/NCAR Research Aviation Facility utilizes airborne instruments, such as the Two-Dimensional, Stereo, Particle Imaging Probe (2D-S), which captures a binary 2D image (shadowgraph) of particles. This provides high-resolution data on size and complex shape.
Cloud composition is critical in climate modeling and weather prediction because the phase of water governs physical processes like albedo, energy exchange, and precipitation formation. The NSF/NCAR Research Aviation Facility uses airborne instruments, like the Two-Dimensional, Stereo, Particle Imaging Probe (2D-S) to study these cloud microphysics. The probe captures a binary 2D image representation as the particles pass through the probe, to provide high-resolution data on size and complex shape.
- Definitive Phases: In air temperatures above 0-1 °C, particles are assumed liquid (water). Below -40 °C, they are assumed solid (ice).
- Mixed-Phase: Between these values is the mixed-phase range, where supercooled water and ice particles coexist, and we are unable to assume the particle state based on air temperature.
- Current Challenge: Currently, phase classification within the mixed-phase range is tedious, often relying on manual inspection of the imagery. There is no reliable, automated way to accurately classify these particles.
I am seeking to address this gap by using a CNN on the particle-by-particle data to classify mixed-phase cloud particles as either liquid or ice. I will first train just using the particle images, and then I will add in environmental context to help refine the model.
The core objective is a Binary Classification task to identify if a particle is liquid or solid.
- Data Filtering: To make sure our data is clean for training, we are filtering out particles under 100 microns and perfectly rectangular particles (with a 5% margin). This removes small particles that show up as square and could confuse the model, and removes rectangles which are almost always noise.
- Labeling: Using the temperature data taken concurrently alongside the particle measurements, our labeled dataset is created.
-
Image Segmentation & Standardization: The NetCDF file contains the raw imagery data and the boundary indices (
$\mathbf{starty}$ ,$\mathbf{stopy}$ ,$\mathbf{startx}$ ,$\mathbf{stopx}$ ) for each particle. I am extracting cropped 2D binary images for each particle using these indices. Each image is then scaled (preserving its aspect ratio) and zero-padded to a uniform$128 \times 128$ pixel canvas size, creating the required grid input for the CNN. - Class Imbalance: I have ~3.6 times the images for solid as I do for liquid. To handle the class imbalance, I will try oversampling the liquid images with augmentation, ensuring balanced batches.
Initially, I will test a CNN just using the labeled images to see how it does, first with randomly initialized weights, and then exploring options for pretrained weights.
I ultimately expect to employ a Multi-Modal Convolutional Neural Network (CNN) to process both data types as referenced here:
-
Image Branch (CNN): The
$\mathbf{128 \times 128 \times 1}$ grayscale image is processed through 4 convolutional blocks:- Conv2D layers with filters: 32 → 64 → 128 → 128
- Each block includes BatchNormalization and MaxPooling (2×2)
- Output is flattened and passed through Dense layers (256 neurons) with Dropout (0.3)
-
Tabular Branch (Fully Connected or MLP): The features (temperature, air speed, altitude) are processed through:
- Dense layers: 32 → 16 neurons
- Dropout regularization
- Fusion & Prediction: Feature vectors from both branches are concatenated and passed through additional Dense layers (128 → 64 neurons) with Dropout ending with a softmax output layer producing binary class probabilities (0: Liquid, 1: Ice).
| Metric | Detail |
|---|---|
| Project Data Source | CGWaveS RF02 project data (with potential for multi-project validation) |
| Dataset Size (Labeled Samples) |
|
| Target Variable | Particle Phase (Binary Classification): 0 (Water) / 1 (Ice) |
| Input Feature Type | Multimodal (Image Array + Tabular Vector) |
| Image Input Dimensions |
|
| Tabular Features | 3 (Temperature, Air Speed, Altitude) |
| Component | Status | Details |
|---|---|---|
| Data Access | Complete | NetCDF particle-by-particle and state variables loaded in pbp_plotting.ipynb
|
| Pre-processing | Complete | Image extraction, scaling, and standardization to plot_particle_standardized() function |
| Image Export | Complete | Standardized particle images saved. |
| Model Architecture | In progress | Multi-modal CNN implemented in particle_classification_CNN.ipynb with separate branches for images and environmental features |
| Training Pipeline | In progress | Complete training pipeline with data loading, splitting, callbacks, and evaluation. Train, validate, and refine the model on classified particles |
| Variable | Type | Role | Implementation |
|---|---|---|---|
| Array ( |
Primary input for CNN; provides detailed shape morphology | Processed through 4-block CNN architecture | |
| Binary | Target variable: 0 (Liquid) / 1 (Ice) | One-hot encoded for training |
-
$\mathbf{aspectratio}$ : Particle shape metric (could be added to tabular features) -
$\mathbf{diam}$ : Particle size metric (could be added to tabular features) -
$\mathbf{area}$ : Particle area (could be added to tabular features) -
$\mathbf{TASX}$ :True Air Speed for environmental context -
$\mathbf{GGALT}$ : Flight altitude for environmental context -
$\mathbf{ATX}$ : Air temperature. Used for classification but could be used in the mix phase as a feature.