This project provides the code for a master's thesis on Video-based Fall Detection using Multimodal Large Language Models (MLLMs), specifically the detection of Human Falls and the subsequent state of being fallen. We also evaluate MLLMs jointly on general Human Activity classes such as walking or standing to assess the models on Human Activity Recognition (HAR).
The main experiments we conduct are:
- Zero-shot: No exemplars are given, just the task instruction
- Few-shot: A few (usually 1-10) video exemplars with their associated ground-truth labels are supplied for In-Context Learning (ICL)
- Chain-of-Thought (CoT): Specifically, Zero-Shot CoT, i.e., no exemplars with reasoning traces are given; the model comes up with its own reasoning trace
Requirements:
- Set up the environment with conda/uv
- Set the recommended environment variables
The main entrypoint is `scripts/vllm_inference.py`, and experiments can be configured using, e.g., `experiment=zeroshot` (the default is `debug`).
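For example, running the entrypoint without any overrides should pick up the default `debug` experiment:

```bash
# No overrides: uses the default experiment (debug) and the default model config (qwenvl)
python scripts/vllm_inference.py
```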
To run zero-shot experiments with InternVL3.5-8B, execute
```bash
python scripts/vllm_inference.py experiment=zeroshot model=internvl model.params=8B
```

To run few-shot experiments with Qwen3-VL-4B, execute

```bash
python scripts/vllm_inference.py experiment=fewshot model=qwenvl model.params=4B
```

To run CoT experiments with the default model, execute

```bash
python scripts/vllm_inference.py experiment=zeroshot_cot
```

Besides setting experiments, the main configuration options are the following (see the combined example after this list):
- vLLM configs in `config/vllm` (default: `default`; for faster warmup times, use `debug`)
- Sampling configs in `config/sampling` (e.g., `greedy`, `qwen3_instruct`)
- Model configs in `config/model` (default: `qwenvl`)
- Prompt configs in `config/prompt` (default: `default`) with text-based output and a Role Prompt
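A minimal sketch of combining these config groups in one call (the exact group values depend on the files present under `config/`; the combination below is illustrative, not a recommended setting):

```bash
# Illustrative combination of config groups via Hydra overrides
python scripts/vllm_inference.py experiment=zeroshot model=qwenvl model.params=4B \
    sampling=greedy prompt=default vllm=debug
```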
Other settings include (see the example after this list):
- Data processing options, e.g., `data.size=224` or `data.split=cv`
- Hardware settings, notably
  - `batch_size`: specifies how many videos are loaded into memory at once. Reduce if RAM-constrained.
  - `num_workers`: number of worker processes for data loading
- Wandb logging config, notably
  - `wandb.mode` (online, offline, or disabled)
  - `wandb.project` (also configured by `experiment`)
- `num_samples` (int): constrain the number of samples used for inference
- `vllm.use_mock` (bool): if True, skip the vLLM engine and produce random predictions for debugging purposes that do not depend on vLLM
- `vllm=debug` for faster warm-up times
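A hypothetical quick smoke-test run combining these settings (the exact key paths, e.g. whether `batch_size` is a top-level key, are assumed from the list above):

```bash
# Hypothetical smoke test: few samples, mocked vLLM engine, no Wandb logging
python scripts/vllm_inference.py experiment=zeroshot num_samples=50 \
    vllm.use_mock=true wandb.mode=disabled batch_size=2 num_workers=4
```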
We use the vLLM inference engine, which is optimized for high-throughput, memory-efficient LLM inference with multimodal support. Hydra is used for configuration management (see above).
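Since Hydra handles configuration, its standard command-line features should apply as well, for example inspecting the composed config or sweeping over the documented model configs (whether sequential multirun sweeps interact cleanly with the vLLM engine here has not been verified):

```bash
# Print the fully composed job config without running inference (standard Hydra flag)
python scripts/vllm_inference.py experiment=zeroshot --cfg job

# Sweep over both documented model configs using Hydra multirun
python scripts/vllm_inference.py --multirun experiment=zeroshot model=qwenvl,internvl
```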
- Install Conda
- Run
```bash
conda env create -f environment.yml
conda activate cu129_vllm15
```

- Install additional dependencies using uv (installed inside the conda environment)
```bash
uv pip install vllm==0.15.1 --torch-backend=cu129
MAX_JOBS=4 uv pip install flash-attn==2.8.3 --no-build-isolation
uv pip install -r requirements.txt
uv pip install -r requirements-dev.txt
uv pip install -e .
```

At the time of writing, vLLM is compiled for cu129 by default. If you need a different version of CUDA, you have to install vLLM from source.
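As an optional sanity check (assuming the pins above installed successfully), you can verify that the packages import with the expected versions:

```bash
# Optional: confirm vLLM and flash-attn resolved to the pinned versions
python -c "import vllm; print(vllm.__version__)"              # expected: 0.15.1
python -c "import flash_attn; print(flash_attn.__version__)"  # expected: 2.8.3
```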
```bash
OMNIFALL_ROOT=path/to/omnifall
VLLM_WORKER_MULTIPROC_METHOD=spawn
```

These variables should be set before launching the vLLM inference script.

```bash
CUDA_VISIBLE_DEVICES=0 # or e.g., 0,1
VLLM_CONFIGURE_LOGGING=0
```
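A minimal sketch of setting these in a shell before a run (the `path/to/omnifall` placeholder must point to your local OmniFall root):

```bash
# Export the recommended variables, then launch inference (placeholder paths)
export OMNIFALL_ROOT=path/to/omnifall
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export CUDA_VISIBLE_DEVICES=0
export VLLM_CONFIGURE_LOGGING=0
python scripts/vllm_inference.py experiment=zeroshot
```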