faranbutt/GraphFMD


🪙 GraphFMD: Graph-based Financial Misconduct Detection

GraphFMD is a temporal graph learning benchmark for financial misconduct detection in the Bitcoin transaction network.
Participants must classify transactions as illicit (fraudulent) or licit (legitimate).

This repository is designed for a Human vs. LLM comparison task.


🏆 Leaderboard

View the real-time rankings here: https://faranbutt.github.io/GraphFMD/

🚀 How to Participate

To ensure the secrecy of the test labels and participant data, we use a Secure Submission Portal.

Step 1: Prepare your Files

You must prepare two files:

  1. predictions.csv: Must contain exactly two columns, id and y_pred, where y_pred takes one of two values:
    • 1: Illicit (Fraudulent)
    • 2: Licit (Legal)
  2. metadata.json: A short description of your approach, for example:
{
  "team": "Your_Team_Name",
  "run_id": "run_01/run_02.... etc",
  "author_type": "human / llm / hybrid",
  "model": "GCN / GraphSAGE / etc.",
  "notes": "Briefly describe your layers/hyperparameters"
}
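
As a minimal sketch, both files can be produced with the Python standard library. The helper name `write_submission`, the temporary output directory, and the IDs and labels below are placeholders for illustration and are not part of the repository:

```python
import csv
import json
import tempfile
from pathlib import Path


def write_submission(out_dir, ids, preds, metadata):
    """Write predictions.csv (columns: id, y_pred) and metadata.json into out_dir."""
    if len(ids) != len(preds):
        raise ValueError("ids and preds must have the same length")
    if any(p not in (1, 2) for p in preds):
        raise ValueError("y_pred must be 1 (illicit) or 2 (licit)")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # newline="" lets the csv module control line endings itself.
    with open(out / "predictions.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "y_pred"])
        writer.writerows(zip(ids, preds))

    with open(out / "metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)


# Placeholder values; replace them with your own predictions and description.
example_metadata = {
    "team": "Your_Team_Name",
    "run_id": "run_01",
    "author_type": "human",
    "model": "GCN",
    "notes": "Briefly describe your layers/hyperparameters",
}
out_dir = tempfile.mkdtemp()  # hypothetical output location
write_submission(out_dir, [6418, 7952], [1, 2], example_metadata)
```

In practice, `ids` would come from test_nodes.csv and `preds` from your trained model.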

Step 2: Upload to the Submission Portal

Submit your files via the official Google Form:
👉 Official Submission Form

Step 3: Automated Scoring

Once you submit the form:

  • A GitHub Action is triggered automatically.
  • Your predictions are scored against the hidden ground truth.
  • The Leaderboard is updated instantly.

1. Task Overview

  • Task: Temporal Inductive Node Classification (Licit vs. Illicit).
  • Domain: Cryptocurrency (Bitcoin) Forensics.
  • Target: Predict the class label of each transaction (Illicit = 1, Licit = 2).
  • Metric: Macro-F1 across both classes (Illicit and Licit).
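
To make the metric concrete, here is a small sketch of Macro-F1 under its standard definition: the unweighted mean of the per-class F1 scores. It is not taken from the repository's metrics.py, which may differ in details such as zero-division handling, and the toy labels are invented:

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Assumption: a class with no true or predicted members scores 0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(1, 2)):
    """Unweighted mean of per-class F1 (1 = illicit, 2 = licit)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy example: two illicit and four licit transactions.
y_true = [1, 1, 2, 2, 2, 2]
y_pred = [1, 2, 2, 2, 2, 1]
score = macro_f1(y_true, y_pred)  # illicit F1 = 0.5, licit F1 = 0.75, macro = 0.625
```

Because both classes are weighted equally, a model that predicts only the majority (licit) class scores poorly even when its accuracy is high.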

2. The Data

  • Nodes (node feature matrix X): Bitcoin transactions, each described by 165 local and aggregate features (Train = 16,658; Test = 8,896).
  • Edges (adjacency matrix A): The flow of BTC between transactions.
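
The relationship between an edge list and the adjacency matrix A can be sketched as follows. This assumes directed edges given as (source, destination) transaction ID pairs; the toy IDs are invented, and the actual column layout of edgelist.csv is not shown here:

```python
def adjacency_matrix(node_ids, edges):
    """Build a dense 0/1 adjacency matrix A where A[i][j] = 1 if BTC flows i -> j."""
    index = {node: i for i, node in enumerate(node_ids)}
    n = len(node_ids)
    A = [[0] * n for _ in range(n)]
    for src, dst in edges:
        A[index[src]][index[dst]] = 1
    return A


# Toy graph with three transactions (hypothetical IDs).
toy_nodes = [101, 102, 103]
toy_edges = [(101, 102), (101, 103), (102, 103)]
A = adjacency_matrix(toy_nodes, toy_edges)
```

A dense matrix is only practical for toy graphs. For a graph of this size, a sparse edge-index representation is the usual choice.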

3. Difficulty Level

  • Feature Noise: Gaussian noise was added to the features to simulate noisy real-world data.
  • Temporal Shifting: Time-based split (Train: time steps 1–34, Test: time steps 35+).
  • Class Imbalance & Graph Sparsity: All illicit transactions are preserved, while only 50% of licit transactions are retained (unknown nodes removed).
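
The time-based split and the noise injection described above can be illustrated with a short sketch. The field name `time_step`, the noise scale `sigma`, and the toy rows are assumptions made for illustration; the benchmark's actual preprocessing parameters are not stated here:

```python
import random


def temporal_split(rows, last_train_step=34):
    """Split rows by time step: steps up to last_train_step train, later steps test."""
    train = [r for r in rows if r["time_step"] <= last_train_step]
    test = [r for r in rows if r["time_step"] > last_train_step]
    return train, test


def add_gaussian_noise(features, sigma, rng):
    """Return a copy of `features` with zero-mean Gaussian noise added to each value."""
    return [x + rng.gauss(0.0, sigma) for x in features]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
toy_rows = [
    {"id": 1, "time_step": 3, "features": [0.5, 1.0]},
    {"id": 2, "time_step": 34, "features": [0.1, 0.2]},
    {"id": 3, "time_step": 35, "features": [0.9, 0.4]},
]
train_rows, test_rows = temporal_split(toy_rows)
noisy = add_gaussian_noise(toy_rows[0]["features"], sigma=0.1, rng=rng)
```

Since the test set comes from later time steps, a validation split carved out of the latest training steps tends to estimate test performance more realistically than a random split.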

4. Submission Policy

To keep the competition fair:

  • A one-submission policy is enforced: each participant may submit the form only once.

6. Submission Format

To enter the competition, you must submit a CSV file named exactly predictions.csv inside the submissions/ folder.

submissions/participant1/predictions.csv
id,y_pred
6418,1
7952,2
.....
.....

id: Transaction ID (must match test_nodes.csv).

y_pred: The predicted class label:

  • 1: Illicit (Fraudulent)
  • 2: Licit (Legal)

7. Automated Validation Checks

When a pull request is opened, the bot will:

  • Check identity (verify whether you have already submitted).
  • Check formats (ensure your JSON and CSV files are structured properly).
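
Before submitting, you may want to run a similar format check locally. The sketch below is not the bot's actual logic (which is not shown here); `validate_submission` and the required-key set are assumptions derived from the file descriptions in this README:

```python
REQUIRED_METADATA_KEYS = {"team", "run_id", "author_type", "model", "notes"}


def validate_submission(header, rows, test_ids, metadata):
    """Return a list of human-readable format errors (empty if none are found)."""
    errors = []
    if header != ["id", "y_pred"]:
        errors.append("header must be exactly: id,y_pred")

    seen = set()
    for row in rows:
        if len(row) != 2:
            errors.append(f"malformed row: {row}")
            continue
        node_id, label = row
        if label not in ("1", "2"):
            errors.append(f"invalid y_pred for id {node_id}: {label}")
        if node_id in seen:
            errors.append(f"duplicate id: {node_id}")
        seen.add(node_id)

    if seen != set(test_ids):
        errors.append("ids do not match the test set")

    missing = REQUIRED_METADATA_KEYS - set(metadata)
    if missing:
        errors.append(f"metadata.json is missing keys: {sorted(missing)}")
    return errors


# Values are strings, as they would be when read with csv.reader.
meta = {k: "placeholder" for k in REQUIRED_METADATA_KEYS}
good = validate_submission(["id", "y_pred"], [["6418", "1"], ["7952", "2"]],
                           ["6418", "7952"], meta)
bad = validate_submission(["id", "label"], [["6418", "3"]],
                          ["6418", "7952"], {"team": "x"})
```

Here `good` is an empty list, while `bad` collects one error each for the header, the label value, the ID mismatch, and the missing metadata keys.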

8. Repository Structure

.
├── data/
│   └── public/
│       ├── train_nodes.csv
│       ├── train_labels.csv
│       ├── test_nodes.csv
│       └── edgelist.csv
├── competition/
│   ├── baseline.py         # Starter GCN model
│   ├── evaluate.py         # Scoring logic
│   ├── metrics.py          # F1-Score calculation
│   └── update_leaderboard.py
├── submissions/            # Submission directory
│   └── participant1/
│       └── predictions.csv
├── leaderboard/            # CSV/Markdown rankings
├── docs/                   # Interactive Leaderboard
└── images/

📝 Citation

If you use this challenge, dataset, or repository in your research, please cite:

@dataset{graphfmd_2026,
  title={GraphFMD: Graph-based Financial Misconduct Detection Benchmark},
  author={Faran Taimoor Butt},
  year={2026},
  url = {https://github.com/faranbutt/GraphFMD}
}

Organizer

Faran Taimoor Butt, Software Engineer and Researcher in Computer Vision, NLP & Graph ML.

For questions regarding the competition setup, data preprocessing or automated scoring issues, please open an Issue in this repository or contact me directly.

📚 References

Datasets

  • [1] Elliptic, www.elliptic.co.
  • [2] M. Weber, G. Domeniconi, J. Chen, D. K. I. Weidele, C. Bellei, T. Robinson, C. E. Leiserson, "Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics", KDD '19 Workshop on Anomaly Detection in Finance, August 2019, Anchorage, AK, USA.
