# A Scalable Graph Kernel Method for Graph Classification Using Wasserstein Distances and Multiscale Embeddings
The Multiscale Wasserstein Shortest Path (MWSP) graph kernel is a state-of-the-art graph classification system that combines:

- Hierarchical structural feature extraction
- Optimal transport theory (Wasserstein distances)
- GPU-accelerated computation
- Support Vector Machines (SVMs)
It is designed for large-scale graph datasets (1,000+ graphs), with applications in:

- Bioinformatics (molecular property prediction)
- Social network analysis
- Fraud detection systems
- Recommendation engines
The system implements a novel graph kernel approach.

## Key Innovations

### Label-Aware BFS Traversal

- Performs breadth-first search while respecting node labels
- Generates consistent features for isomorphic graphs
- Captures local neighborhood structures
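A label-aware traversal might look like the following minimal sketch. The graph representation (adjacency dict plus label dict), the function name, and the per-level sorting are illustrative assumptions, not the project's actual implementation:

```python
from collections import deque

def label_aware_bfs(adj, labels, start, max_depth=2):
    """Collect node labels level by level from `start` (illustrative sketch).

    Sorting the labels within each level makes the feature invariant to
    node ordering, so isomorphic graphs yield identical sequences.
    """
    visited = {start}
    frontier = deque([(start, 0)])
    levels = {}
    while frontier:
        node, depth = frontier.popleft()
        levels.setdefault(depth, []).append(labels[node])
        if depth < max_depth:
            for nbr in adj[node]:
                if nbr not in visited:
                    visited.add(nbr)
                    frontier.append((nbr, depth + 1))
    # One string per BFS level, labels sorted for order-invariance
    return ["|".join(sorted(levels[d])) for d in sorted(levels)]

# Tiny example: a labeled path graph a - b - c
adj = {0: [1], 1: [0, 2], 2: [1]}
labels = {0: "a", 1: "b", 2: "c"}
print(label_aware_bfs(adj, labels, 0))  # ['a', 'b', 'c']
```

Because each level's labels are sorted before being joined, relabeling node IDs (but not node labels) leaves the output unchanged, which is what makes the features consistent across isomorphic graphs.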
### Multiscale Feature Abstraction

- Hierarchical feature extraction at multiple depths
- Progressive abstraction of structural patterns
### Shortest Path Embedding

- Encodes all shortest paths from each node
- Represents paths as label sequences
- Uses hashing for a fixed-dimension representation
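One way to realize the hashed embedding is to take a 64-bit digest of each path's label sequence and fold it into a fixed-size feature index. The digest choice (BLAKE2b) and helper name are illustrative assumptions; the 1,048,576-entry dimension is the fixed feature dimension described later in this document:

```python
import hashlib

FEATURE_DIM = 1_048_576  # fixed feature dimension (2**20)

def path_to_index(path_labels):
    """Hash a shortest-path label sequence to a 64-bit digest, then fold
    it into a fixed-size feature index (illustrative sketch)."""
    key = "->".join(path_labels).encode()
    h64 = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")
    return h64 % FEATURE_DIM

# The same path always maps to the same index; distinct paths rarely collide
idx = path_to_index(["C", "N", "O"])
assert 0 <= idx < FEATURE_DIM
```

A fixed output range means the feature matrix shape never depends on how many distinct paths the dataset contains, at the cost of rare hash collisions.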
### Wasserstein Distance

- Measures distributional similarity between graphs
- Hybrid computation strategy:
  - Exact Sinkhorn for small graphs
  - Sliced approximation for large graphs
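The sliced approximation replaces one expensive optimal-transport problem with many cheap one-dimensional ones: project both point clouds onto random directions, where 1-D optimal transport reduces to sorting. A minimal NumPy sketch, assuming equal-size point clouds and a hypothetical function name:

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=120, seed=0):
    """Approximate the Wasserstein-1 distance between equal-size point
    clouds X and Y by averaging 1-D distances over random projections
    (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)       # random unit direction
        # In 1-D, optimal transport pairs sorted samples
        total += np.abs(np.sort(X @ theta) - np.sort(Y @ theta)).mean()
    return total / n_projections

X = np.zeros((4, 3))
assert sliced_wasserstein(X, X) == 0.0  # identical clouds have distance 0
```

Each projection costs only a sort, so the approximation scales to graphs whose embeddings are too large for exact Sinkhorn iterations.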
## Engineering Innovations

- Memory-optimized graph processing pipeline
- Hybrid CPU-GPU workload balancing
- Automatic resource adaptation
- Cluster-scale execution design
### Memory Control Mechanisms

- Node subsampling (`MAX_NODES = 180`)
- Projection limiting (`SLICED_PROJECTIONS = 120`)
- Batch processing (`GPU_BATCH_SIZE = 96`)
- Depth limiting (`MAXH_LIMIT = 3`, `DEPTH_LIMIT = 2`)
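In code, these limits might appear as module-level constants guarding each stage. The constant names and values come from the list above; the subsampling helper is an illustrative assumption:

```python
import random

# Resource limits (values from the list above)
MAX_NODES = 180           # subsample larger graphs down to this many nodes
SLICED_PROJECTIONS = 120  # projections used by the sliced approximation
GPU_BATCH_SIZE = 96       # graphs processed per GPU batch
MAXH_LIMIT = 3            # maximum multiscale hierarchy depth
DEPTH_LIMIT = 2           # maximum BFS depth per node

def subsample_nodes(nodes, rng):
    """Cap a graph at MAX_NODES nodes before feature extraction
    (illustrative sketch)."""
    nodes = list(nodes)
    if len(nodes) <= MAX_NODES:
        return nodes
    return sorted(rng.sample(nodes, MAX_NODES))
```

Bounding node count, projection count, batch size, and depth together puts a hard ceiling on both the feature-extraction cost and the size of any single distance computation.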
### Memory Optimization Strategy

#### Hash-Based Compression

- Converts paths to 64-bit hashes (8 bytes vs. 100+ bytes for strings)
- Fixed feature dimension (1,048,576) prevents vocabulary explosion
#### Sparse Matrix Representation

- Uses LIL (list of lists) format during construction
- Converts to CSR (compressed sparse row) for storage
- Saves 100-1000× memory for sparse features
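With SciPy this build-then-convert pattern is a few lines; the sketch below is illustrative (the matrix shape reuses the fixed feature dimension from above):

```python
from scipy.sparse import lil_matrix

# Build incrementally in LIL format, which supports cheap element-wise writes
features = lil_matrix((3, 1_048_576), dtype="float32")
features[0, 42] = 1.0
features[2, 7] = 3.0

# Convert once to CSR for compact storage and fast row slicing
features = features.tocsr()
print(features.nnz)  # 2
```

Only the nonzero entries are stored, which is where the large memory savings come from: a dense float32 matrix of this shape would take 3 × 1,048,576 × 4 bytes (~12 MB) regardless of how few features are actually set.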
#### Incremental Processing

- Processes graphs sequentially
- Clears intermediate results immediately
- Triggers garbage collection explicitly
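The sequential pattern can be sketched as follows; the function and its callable parameters are hypothetical stand-ins for the pipeline's real stages:

```python
import gc

def process_graphs(graphs, extract, summarize):
    """Process graphs one at a time, keeping only a compact summary per
    graph and releasing large intermediates eagerly (illustrative sketch)."""
    results = []
    for g in graphs:
        features = extract(g)                # potentially large intermediate
        results.append(summarize(features))  # keep only what is needed
        del features                         # drop the reference immediately
        gc.collect()                         # explicit collection between graphs
    return results
```

Peak memory then depends on the largest single graph rather than on the whole dataset, which is what makes 1,000+ graph collections tractable.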
#### GPU-CPU Hybrid Workload

- GPU: BFS/SSSP for large graphs (>100 nodes)
- CPU: small-graph processing
- Automatic fallback mechanism
### Resource Monitor Features

- Tracks execution time per stage
- Measures RAM allocation and delta
- Records peak GPU memory usage
- Reports total memory consumption
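A per-stage monitor of this kind can be built from the standard library alone; the sketch below is an illustrative assumption (tracking GPU memory would additionally need a GPU library such as PyTorch or CuPy):

```python
import time
import tracemalloc
from contextlib import contextmanager

@contextmanager
def stage_monitor(name, report):
    """Record wall time and peak Python heap usage for one pipeline stage
    (illustrative sketch)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        yield
    finally:
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        report[name] = {
            "seconds": time.perf_counter() - t0,
            "peak_bytes": peak,
        }

report = {}
with stage_monitor("embedding", report):
    _work = [0] * 100_000  # stand-in workload
```

Wrapping each pipeline stage in such a context manager yields a per-stage table of time and memory, which is exactly the breakdown the monitor reports.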
This system represents a significant advancement in graph representation learning, balancing expressiveness with computational feasibility for real-world large-scale graph analysis tasks. The memory-optimized design makes graph kernel methods accessible for datasets previously considered infeasible due to memory constraints.
## Getting Started

```bash
git clone https://github.com/celestine1729/MWSP.git
sbatch run_cluster.sh
```
For questions or issues, please open a GitHub issue or contact Celestine1729@proton.me