NEMAR Datasets

Research datasets in Brain Imaging Data Structure (BIDS) format, hosted on GitHub + AWS S3 infrastructure.

Purpose

This organization hosts BIDS-formatted datasets that cannot be hosted on public repositories (OpenNeuro, Zenodo, etc.) due to restrictive licenses, while remaining freely available for research use.

Hosting criteria:

  • ✅ BIDS-compliant format
  • ✅ Freely available for academic/research use
  • ✅ Restrictive license preventing hosting on public repositories (e.g., non-commercial, research-only)

Available Datasets

| Dataset | ID | DOI | Size | Modality | Description |
|---|---|---|---|---|---|
| HBN-EEG NC | nm000103 | 10.5281/zenodo.17306880 | 270 GB | EEG | Healthy Brain Network EEG, Non-commercial |
| emg2qwerty | nm000104 | 10.5281/zenodo.17287903 | 149 GB | EMG | Typing task sEMG dataset |
| discrete_gestures | nm000105 | 10.5281/zenodo.17283593 | 14 GB | EMG | Hand gesture recognition |
| handwriting | nm000106 | 10.5281/zenodo.17283865 | 30 GB | EMG | Handwriting sEMG dataset |
| wrist | nm000107 | 10.5281/zenodo.17282507 | 1.9 GB | EMG | Wrist control sEMG dataset |

Downloading Datasets

Using DataLad (recommended)

DataLad enables efficient access to large datasets stored across GitHub (metadata) and S3 (data files).

# Install DataLad (macOS)
brew install datalad

# Clone dataset (lightweight - only downloads metadata)
datalad clone https://github.com/nemarDatasets/nm000107.git
cd nm000107

# Download specific files
datalad get sub-01/emg/sub-01_task-wrist_emg.edf

# Download all data
datalad get .

# Remove data files (keep metadata)
datalad drop .
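
For datasets in the hundreds of gigabytes, it can help to fetch, process, and drop one subject at a time so that only one subject's files occupy disk. The following is a minimal sketch of that pattern, not part of the NEMAR tooling: the helper names `run` and `fetch_process_drop` are hypothetical, and `DRY_RUN=1` only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: keep at most one subject's data on disk at a time.
# `run` and `fetch_process_drop` are illustrative names, not DataLad commands.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# fetch_process_drop SUBJECT_DIR [COMMAND...]
# Downloads SUBJECT_DIR, runs the optional COMMAND, then drops the data again.
# If the download or the command fails, the data is left in place for inspection.
fetch_process_drop() {
  local subject="$1"
  shift
  run datalad get "$subject" || return 1
  if [ "$#" -gt 0 ]; then
    run "$@" || return 1
  fi
  run datalad drop "$subject"
}
```

Run from inside the cloned dataset, `DRY_RUN=1 fetch_process_drop sub-01 ls sub-01/emg` previews the three commands; without `DRY_RUN` they are executed in order.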

Using Git (metadata only)

# Clone repository (metadata only, no large files)
git clone https://github.com/nemarDatasets/nm000107.git
cd nm000107

# View S3 URLs for data files
cat .git/annex/objects/.../...

Direct S3 Access

Large binary files (.edf, .bdf) are stored on S3 with public read access:

# List dataset files
aws s3 ls s3://nemar/nm000107/ --recursive --no-sign-request

# Download specific file
aws s3 cp s3://nemar/nm000107/path/to/file.edf . --no-sign-request
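
To download a single subject rather than one file or the whole dataset, `aws s3 sync` can be combined with `--exclude`/`--include` filters. This is a sketch under an assumption: that the bucket mirrors the BIDS directory tree (for example `sub-01/` directly under the dataset prefix), which the listing command above can confirm. The helper names `run` and `s3_subject_sync` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sync one subject directory from the public bucket.
# Assumes the S3 layout mirrors the BIDS tree; verify with `aws s3 ls` first.

# Print the command when DRY_RUN=1, otherwise execute it.
# Note: the printed form does not show shell quoting.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# s3_subject_sync DATASET_ID SUBJECT_DIR [DEST]
# Filters are applied in order: exclude everything, then re-include the subject.
s3_subject_sync() {
  local dataset="$1" subject="$2" dest="${3:-$1}"
  run aws s3 sync "s3://nemar/${dataset}/" "$dest" \
    --exclude "*" --include "${subject}/*" --no-sign-request
}
```

For example, `s3_subject_sync nm000107 sub-01` would copy only `sub-01/` into a local `nm000107/` directory.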

Contributing

Reporting Issues

Found incorrect metadata, missing files, or BIDS compliance issues?

  1. Go to the dataset repository (e.g., nm000107)
  2. Click Issues → New Issue
  3. Describe the problem with:
    • File path or subject ID
    • Expected vs actual behavior
    • BIDS validator output (if applicable)

Proposing Changes

For metadata corrections (JSON, TSV, README):

  1. Fork the dataset repository
  2. Clone your fork locally
  3. Make changes to metadata files
  4. Commit with a clear message, e.g. fix: correct participant age in participants.tsv
  5. Push to your fork
  6. Open a pull request with a description of the changes

For data file issues:

  • File an issue only (data files are immutable git-annex content)
  • Corrections will be released as new dataset versions

Dataset Versioning

Datasets use semantic versioning (v1.0.0, v1.1.0, etc.):

  • Patch (v1.0.1): Metadata fixes, documentation updates
  • Minor (v1.1.0): New participants, additional sessions
  • Major (v2.0.0): Breaking changes, restructuring
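
The bump rules above can be sketched as a small shell function. `bump_version` is a hypothetical helper, not part of the NEMAR tooling; it assumes tags of the form `vMAJOR.MINOR.PATCH` and does not validate its input.

```shell
#!/usr/bin/env bash
# Sketch: compute the next semantic version for a given bump type.

# bump_version VERSION PART
# VERSION looks like v1.0.0; PART is one of: major, minor, patch.
bump_version() {
  local version="${1#v}" part="$2"
  local major minor patch
  # Split "1.0.0" on dots into its three components.
  IFS=. read -r major minor patch <<EOF
$version
EOF
  case "$part" in
    major) major=$((major + 1)); minor=0; patch=0 ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    patch) patch=$((patch + 1)) ;;
    *) echo "unknown part: $part" >&2; return 1 ;;
  esac
  printf 'v%s.%s.%s\n' "$major" "$minor" "$patch"
}
```

For example, `bump_version v1.0.0 minor` prints `v1.1.0`, and a major bump resets the minor and patch components to zero.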

Each version gets:

  • Git tag
  • GitHub release
  • Zenodo DOI (versioned)

Revising a Dataset

If you're a dataset maintainer and need to create a new version:

Step 1: Update Metadata

# Clone dataset
datalad clone https://github.com/nemarDatasets/nm000XXX.git
cd nm000XXX

# Update dataset_description.json, README.md, or other metadata
# Make sure Authors field lists only data creators (not curators)

# Save changes
datalad save -m "fix: update metadata with complete author information"

# Push and create PR
datalad push --to origin
gh pr create --title "Update dataset metadata"

Step 2: Version Bump (after PR merged)

Minor version updates (v1.0.0 → v1.1.0):

  • Metadata improvements
  • Adding participants/sessions
  • Non-breaking changes

A version bump creates:

  • New git tag
  • New GitHub release
  • New version DOI under same concept DOI
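
The commands for this step are not shown here, so the following is only a sketch of one way to produce the tag and the GitHub release with `git` and the GitHub CLI. The helper names `run` and `release_version` are hypothetical, and how the Zenodo DOI is minted (for instance through a GitHub release integration) is an assumption to confirm with the maintainers.

```shell
#!/usr/bin/env bash
# Hypothetical release helper; not taken from the NEMAR documentation.

# Print the command when DRY_RUN=1, otherwise execute it.
# Note: the printed form does not show shell quoting.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# release_version TAG
# Creates an annotated tag on the current commit, pushes it,
# and opens a GitHub release with auto-generated notes.
release_version() {
  local tag="$1"
  run git tag -a "$tag" -m "Release $tag" || return 1
  run git push origin "$tag" || return 1
  run gh release create "$tag" --title "$tag" --generate-notes
}
```

Previewing with `DRY_RUN=1 release_version v1.1.0` before running for real is a cheap safeguard, since published DOIs cannot be withdrawn.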

Important Notes:

  • ✅ Only modify metadata files (*.json, *.tsv, *.md)
  • ❌ Never modify data files (*.edf, *.bdf) - create new dataset instead
  • ✅ Remove curator names from Authors field
  • ✅ List only original data creators in Authors
  • ⚠️ DOIs are permanent - review carefully before publication
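
The metadata-only rule can be checked mechanically before committing. This is a minimal sketch, with hypothetical helper names `is_metadata_file` and `check_metadata_only`; it applies exactly the three extensions listed above, so other text files (such as an extensionless LICENSE) would be flagged and need a manual decision.

```shell
#!/usr/bin/env bash
# Sketch: reject changes that touch anything other than *.json, *.tsv, or *.md.

# is_metadata_file PATH: succeeds only for the metadata extensions listed above.
is_metadata_file() {
  case "$1" in
    *.json|*.tsv|*.md) return 0 ;;
    *) return 1 ;;
  esac
}

# check_metadata_only: reads one path per line on stdin,
# reports every non-metadata path on stderr, and fails if any was found.
check_metadata_only() {
  local status=0 path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    if ! is_metadata_file "$path"; then
      printf 'not a metadata file: %s\n' "$path" >&2
      status=1
    fi
  done
  return "$status"
}
```

A typical use would be `git diff --cached --name-only | check_metadata_only`, run after staging changes and before saving them.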

For the detailed workflow, see the repository documentation.

License

Each dataset has its own license specified in dataset_description.json and root LICENSE file. Common restrictions:

  • ✅ Academic/research use
  • ❌ Commercial use
  • ❌ Redistribution without attribution
  • ❌ Public repository hosting (e.g., OpenNeuro)

Always check the dataset's LICENSE file before use.
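
As a quick first look, the `License` field can be read from dataset_description.json after cloning. This is a rough sketch: `dataset_license` is a hypothetical helper that uses a simple `sed` pattern rather than a real JSON parser, so it assumes the key and its string value sit on one line. It does not replace reading the full LICENSE file.

```shell
#!/usr/bin/env bash
# Sketch: print the License field of a BIDS dataset_description.json.

# dataset_license [DATASET_DIR]
# Prints the first "License" string value found; prints nothing if absent.
dataset_license() {
  local dir="${1:-.}"
  sed -n 's/.*"License"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
    "$dir/dataset_description.json" | head -n 1
}
```

Inside a cloned dataset, `dataset_license .` prints the declared license identifier, which can then be compared against the terms in the root LICENSE file.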

Technical Details

Infrastructure:

  • GitHub: Metadata (JSON, TSV, README) + DataLad/git-annex pointers
  • AWS S3: Binary data files (EEG and EMG recordings)
  • Zenodo: DOI registration + archived releases

BIDS Validation:

  • Datasets pass basic BIDS checks (required files, structure)
  • Full validator compliance is a work in progress

Data Access:

  • S3 public read access (no AWS account needed)
  • No rate limiting on downloads
  • Free egress for research use

Contact

  • Issues: Use repository-specific issue trackers
  • General questions: Open a discussion in the .github repository
  • New dataset submissions: Contact dataset maintainers

Hosted by NEMAR (NeuroElectroMagnetic Archive) infrastructure
