A Bash library for parsing and processing BIDS datasets into CSV-like structures, enabling flexible data filtering, extraction, and iteration within shell scripts.
Pattern matching is permissive with respect to the BIDS specification; it may match some files that do not meet validation requirements.
- Converts BIDS datasets into a flat CSV format
- Extracts key BIDS entities from filenames
- Provides filtering, column selection, and row operations
- Allows iteration over rows with associative arrays
- Handles JSON sidecar files and metadata
- Designed for shell scripting in pipelines and automation
- Bash version: ≥ 4.3 (due to associative arrays, `readarray`, and `declare -n` usage)
macOS users: Apple's default Bash (3.2) is too old. You must upgrade to ≥ 4.3.
See: https://apple.stackexchange.com/questions/193411/update-bash-to-version-4-0-on-osx
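Since sourcing the library on an old Bash fails with confusing syntax errors, a script can check the interpreter version up front. This is a minimal sketch, not part of libBIDS.sh itself:

```shell
# Guard: verify the running Bash is >= 4.3 before sourcing libBIDS.sh
# (associative arrays, readarray, and declare -n need it).
bash_version_ok() {
  local major=${BASH_VERSINFO[0]} minor=${BASH_VERSINFO[1]}
  (( major > 4 || (major == 4 && minor >= 3) ))
}

if ! bash_version_ok; then
  echo "Error: Bash >= 4.3 required, found ${BASH_VERSION}" >&2
  exit 1
fi
```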
Include the library in your script:

```bash
source libBIDS.sh
```

Run directly to dump a dataset as CSV:

```bash
./libBIDS.sh bids-examples/ds001
```

Parses a directory tree, identifies BIDS files, extracts BIDS entities, and outputs CSV.
```bash
csv_data=$(libBIDSsh_parse_bids_to_csv "bids-examples/ds001")
```

Output columns:
The CSV columns use the full BIDS entity names (display names), not the short keys found in filenames.
- `derivatives`: Pipeline name if in derivatives folder
- `data_type`: BIDS data type (anat, func, dwi, etc.)
- BIDS entities: `subject` (not `sub`), `session` (not `ses`), `task`, `acquisition` (not `acq`), `run`, etc.
- `suffix`: File suffix (bold, T1w, dwi, etc.)
- `extension`: File extension
- `path`: Full file path
Note: When filtering or accessing columns, always use these full names (e.g., `subject`, `session`, `acquisition`).
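To see exactly which columns a given dataset produces, you can split the CSV header row in pure Bash. The `csv_data` contents below are a made-up stand-in for real `libBIDSsh_parse_bids_to_csv` output:

```shell
# Split the first (header) line of the CSV on commas.
# csv_data here is illustrative, not real parser output.
csv_data=$'subject,task,run,suffix,extension,path\nsub-01,rest,1,bold,.nii.gz,sub-01/func/sub-01_task-rest_bold.nii.gz'
IFS=',' read -r -a columns <<< "${csv_data%%$'\n'*}"
printf '%s\n' "${columns[@]}"
```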
Filters CSV data by columns, values, regex, and missing data.
```bash
libBIDSsh_csv_filter "${csv_data}" [OPTIONS]
```

Options:
- `-c, --columns <col1,col2,...>`: Select columns by name or index
- `-r, --row-filter <col:pattern>`: Keep rows where column matches value/regex (AND logic for multiple filters)
- `-d, --drop-na <col1,col2,...>`: Drop rows where listed columns are "NA"
Examples:
```bash
# Keep only subject and task columns
libBIDSsh_csv_filter "$csv_data" -c "subject,task"

# Filter for balloon analog risk task (ds001)
libBIDSsh_csv_filter "$csv_data" -r "task:balloonanalogrisktask"

# Multiple filters: task AND subject 01
libBIDSsh_csv_filter "$csv_data" -r "task:balloonanalogrisktask" -r "subject:sub-01"

# Complex filtering with regex
libBIDSsh_csv_filter "$csv_data" -r "task:(rest|motor)" -r "run:[1-3]"
```

Removes columns that contain only NA values across all rows.
```bash
cleaned_csv=$(libBIDSsh_drop_na_columns "$csv_data")
```

Example:
```bash
# Remove empty columns from dataset
csv_data=$(libBIDSsh_parse_bids_to_csv "bids-examples/ds001")
cleaned_csv=$(libBIDSsh_drop_na_columns "$csv_data")
```

Processes CSV data to add a `json_path` column that links data files to their direct JSON sidecars.
Note: This only matches files where a JSON file exists with the exact same name (except extension). It does not resolve BIDS inheritance.
```bash
updated_csv=$(libBIDSsh_extension_json_rows_to_column_json_path "$csv_data")
```

Behavior:
- Matches JSON files to corresponding data files based on BIDS entities
- Drops JSON rows that have matching data files
- Keeps unmatched JSON files with their path in `json_path`
- Adds `NA` for data files without direct JSON sidecars
Example:
```bash
csv_data=$(libBIDSsh_parse_bids_to_csv "bids-examples/ds001")
csv_with_json=$(libBIDSsh_extension_json_rows_to_column_json_path "$csv_data")
```

Parses a JSON file into a bash associative array with type information.
```bash
declare -A json_data
libBIDSsh_json_to_associative_array "file.json" json_data
```

Value format:
- `type:value` for primitives (e.g., `string:hello`, `number:42`)
- `array:item1,item2,item3` for arrays
- `object:{json_string}` for nested objects
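Since every value carries a `type:` prefix, the raw value can be recovered with standard parameter expansion. This sketch uses a hand-written `value` as a stand-in for an entry read from the populated array:

```shell
# Split a "type:value" entry into its two parts.
value="number:2"         # stand-in for an entry such as ${sidecar[RepetitionTime]}
value_type=${value%%:*}  # everything before the first colon -> "number"
raw=${value#*:}          # everything after the first colon  -> "2"
echo "type=${value_type} raw=${raw}"
```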
Example:
```bash
declare -A sidecar
libBIDSsh_json_to_associative_array "bids-examples/volume_timing/sub-01/func/sub-01_task-rest_acq-dense_bold.json" sidecar
echo "TR: ${sidecar[RepetitionTime]}" # Output: number:2
```

Extracts a column as a Bash array with deduplication and NA filtering.
```bash
libBIDSsh_csv_column_to_array "$csv_data" "column" array_var [unique] [exclude_NA]
```

Arguments:
- `csv_data`: CSV-formatted string
- `column`: Column name (e.g., `subject`) or index
- `array_var`: Name of array variable to populate
- `unique`: "true" (default) to return only unique values
- `exclude_NA`: "true" (default) to exclude NA values
Example:
```bash
declare -a subjects
# Note: Use "subject", not "sub"
libBIDSsh_csv_column_to_array "$csv_data" "subject" subjects true true
echo "Unique subjects: ${subjects[@]}"

declare -a all_runs
libBIDSsh_csv_column_to_array "$csv_data" "run" all_runs false false
echo "All runs (including duplicates and NA): ${all_runs[@]}"
```

Iterates CSV rows, exposes fields in an associative array with optional sorting.
```bash
while libBIDS_csv_iterator "$csv_data" row_var [sort_col1] [sort_col2] [-r]; do
  # Process row
done
```

Arguments:
- `csv_data`: CSV data string
- `row_var`: Name of associative array to populate with each row. Keys correspond to column headers (e.g., `row[subject]`).
- `sort_columns`: Optional column names to sort by
- `-r`: Optional reverse sort flag
Example:
```bash
declare -A row
while libBIDS_csv_iterator "$csv_data" row "subject" "session" "run"; do
  echo "Processing: ${row[subject]} ${row[session]} ${row[run]}: ${row[path]}"
done
```

Internal function that parses BIDS filenames into component entities.
```bash
declare -A file_info
_libBIDSsh_parse_filename "sub-01_task-rest_bold.nii.gz" file_info
```

Populated fields:
- Individual BIDS entities using short keys (`sub`, `ses`, `task`, `acq`, etc.)
- `suffix`: File suffix
- `extension`: File extension
- `data_type`: Inferred data type
- `derivatives`: Pipeline name if applicable
- `path`: Full path
- `_key_order`: Order of keys for iteration
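Assuming `_key_order` holds a space-separated list of keys (an assumption based on the field description above, not confirmed by the library), the parsed fields can be walked in order. The array below is filled in by hand for illustration; in practice `_libBIDSsh_parse_filename` populates it:

```shell
# file_info is populated by hand here purely for illustration.
declare -A file_info=(
  [sub]="01" [task]="rest" [suffix]="bold"
  [_key_order]="sub task suffix"  # assumed space-separated format
)
# Walk the keys in their recorded order and print each entity.
for key in ${file_info[_key_order]}; do
  printf '%s=%s\n' "$key" "${file_info[$key]}"
done
```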
```bash
#!/usr/bin/env bash
source libBIDS.sh

bids_path="bids-examples/ds001"
csv_data=$(libBIDSsh_parse_bids_to_csv "$bids_path")

# Extract unique subjects
declare -a subjects
libBIDSsh_csv_column_to_array "$csv_data" "subject" subjects true true
echo "Found subjects: ${subjects[*]}"

# Clean up empty columns
csv_data=$(libBIDSsh_drop_na_columns "$csv_data")

# Add JSON sidecar information (if sidecars match exactly)
csv_data=$(libBIDSsh_extension_json_rows_to_column_json_path "$csv_data")
```

```bash
#!/usr/bin/env bash
source libBIDS.sh

# Using volume_timing dataset which has sidecars
bids_path="bids-examples/volume_timing"
csv_data=$(libBIDSsh_parse_bids_to_csv "$bids_path")

# Filter for functional BOLD data
func_csv=$(libBIDSsh_csv_filter "$csv_data" \
  -r "data_type:func" \
  -r "suffix:bold")

# Add JSON paths
func_csv=$(libBIDSsh_extension_json_rows_to_column_json_path "$func_csv")

# Process each file with its JSON metadata
declare -A row
while libBIDS_csv_iterator "$func_csv" row "subject" "task" "run"; do
  echo "Processing: ${row[path]}"
  if [[ "${row[json_path]}" != "NA" ]]; then
    declare -A json_data
    libBIDSsh_json_to_associative_array "${row[json_path]}" json_data
    echo "  TR: ${json_data[RepetitionTime]:-NA}"
    echo "  Task: ${json_data[TaskName]:-NA}"
  fi
done
```

If your dataset uses an entity that is not part of the official BIDS specification, you can include it in the parsing logic via JSON file(s) in the `custom` directory:
```json
{
  "entities": [
    {
      "name": "foo",
      "display_name": "fooval",
      "pattern": "*(_foo-+([a-zA-Z0-9]))"
    },
    {
      "name": "bar",
      "display_name": "baridx",
      "pattern": "*(_bar-+([0-9]))"
    }
    ...
  ]
}
```

To see an example, rename the template file from `custom/custom_entities.json.tpl` to `custom/custom_entities.json`.
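The `pattern` fields use Bash extended-glob (`extglob`) syntax, which you can probe interactively before committing a custom entity. A quick check of the `foo` pattern from the template above:

```shell
# Enable extended globbing so *(...) and +(...) work in pattern matches.
shopt -s extglob
pattern='*(_foo-+([a-zA-Z0-9]))'

# One "_foo-<alphanumeric>" occurrence matches.
[[ "_foo-abc123" == $pattern ]] && echo "matches"

# A bare "_foo-" does not: +([a-zA-Z0-9]) needs at least one character.
[[ "_foo-" == $pattern ]] || echo "no match"
```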
- All functions handle CSV data as strings, not files
- NA values are used for missing BIDS entities
- Pattern matching is permissive and may match non-BIDS-compliant files
- JSON processing requires `jq` to be installed
- Sort operations use version sort for natural ordering of numbers
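Because `jq` is a hard requirement for the JSON helpers, scripts can fail fast when it is missing. `require_cmd` below is a hypothetical helper for illustration, not a libBIDS.sh function:

```shell
# Check that a command exists on PATH before relying on it.
# require_cmd is illustrative, not part of libBIDS.sh.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "Error: '$1' is required" >&2; return 1; }
}

require_cmd jq || echo "Install jq before using the JSON helpers." >&2
```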