This project provides the code for a master's thesis on Video-based Fall Detection using Multimodal Large Language Models (MLLMs), specifically the detection of Human Falls and the subsequent state of being fallen. We also evaluate MLLMs jointly on general Human Activity classes such as walking or standing to assess the models on Human Activity Recognition (HAR).
The main experiments we conduct are:
- Zero-shot: No exemplars are given, just the task instruction
- Few-shot: A few (usually 1-10) video exemplars with their associated ground-truth labels are supplied for In-Context Learning (ICL)
- Chain-of-Thought (CoT): Specifically, Zero-Shot CoT, i.e., no exemplars with reasoning traces are given; the model comes up with its own reasoning trace
Requirements:
- Set up the environment with conda/uv
- Set the recommended environment variables
The main entrypoint is `scripts/vllm_inference.py`, and experiments can be configured using, e.g., `experiment=zeroshot` (the default is `debug`).
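For example, running the entrypoint without any overrides should pick up the default `debug` experiment:

```bash
# No overrides: uses the default experiment (debug) and the default model config (qwenvl)
python scripts/vllm_inference.py
```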
To run zero-shot experiments with InternVL3.5-8B, execute
```bash
python scripts/vllm_inference.py experiment=zeroshot model=internvl model.params=8B
```

To run few-shot experiments with Qwen3-VL-4B, execute

```bash
python scripts/vllm_inference.py experiment=fewshot model=qwenvl model.params=4B
```

To run CoT experiments with the default model, execute

```bash
python scripts/vllm_inference.py experiment=zeroshot_cot
```

Besides setting experiments, the main configuration options are the following (see the combined example after this list):
- vLLM configs in `config/vllm` (default: `default`; for faster warmup times, use `debug`)
- Sampling configs in `config/sampling` (e.g., `greedy`, `qwen3_instruct`)
- Model configs in `config/model` (default: `qwenvl`)
- Prompt configs in `config/prompt` (default: `default`) with text-based output and a Role Prompt
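A minimal sketch of combining these config groups in one call (the exact group values depend on the files present under `config/`; the combination below is illustrative, not a recommended setting):

```bash
# Illustrative combination of config groups via Hydra overrides
python scripts/vllm_inference.py experiment=zeroshot model=qwenvl model.params=4B \
    sampling=greedy prompt=default vllm=debug
```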
Other settings include (see the example after this list):
- Data processing options, e.g., `data.size=224` or `data.split=cv`
- Hardware settings, notably
  - `batch_size`: specifies how many videos are loaded into memory at once. Reduce if RAM-constrained.
  - `num_workers`: number of worker processes for data loading
- Wandb logging config, notably
  - `wandb.mode` (online, offline, or disabled)
  - `wandb.project` (also configured by `experiment`)
- `num_samples` (int): constrain the number of samples used for inference
- `vllm.use_mock` (bool): if True, skip the vLLM engine and produce random predictions for debugging purposes that do not depend on vLLM
- `vllm=debug` for faster warm-up times
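A hypothetical quick smoke-test run combining these settings (the exact key paths, e.g. whether `batch_size` is a top-level key, are assumed from the list above):

```bash
# Hypothetical smoke test: few samples, mocked vLLM engine, no Wandb logging
python scripts/vllm_inference.py experiment=zeroshot num_samples=50 \
    vllm.use_mock=true wandb.mode=disabled batch_size=2 num_workers=4
```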
We use the vLLM inference engine, which is optimized for high-throughput, memory-efficient LLM inference with multimodal support. Hydra is used for configuration management (see above).
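Since Hydra handles configuration, its standard command-line features should apply as well, for example inspecting the composed config or sweeping over the documented model configs (whether sequential multirun sweeps interact cleanly with the vLLM engine here has not been verified):

```bash
# Print the fully composed job config without running inference (standard Hydra flag)
python scripts/vllm_inference.py experiment=zeroshot --cfg job

# Sweep over both documented model configs using Hydra multirun
python scripts/vllm_inference.py --multirun experiment=zeroshot model=qwenvl,internvl
```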
- Install Conda
- Run
```bash
conda env create -f environment.yml
conda activate cu129_vllm15
```

- Install additional dependencies using uv (installed inside the conda environment)
```bash
uv pip install vllm==0.15.1 --torch-backend=cu129
MAX_JOBS=4 uv pip install flash-attn==2.8.3 --no-build-isolation
uv pip install -r requirements.txt
uv pip install -r requirements-dev.txt
uv pip install -e .
```

At the time of writing, vLLM is compiled for cu129 by default. If you need a different version of CUDA, you have to install vLLM from source.
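As an optional sanity check (assuming the pins above installed successfully), you can verify that the packages import with the expected versions:

```bash
# Optional: confirm vLLM and flash-attn resolved to the pinned versions
python -c "import vllm; print(vllm.__version__)"              # expected: 0.15.1
python -c "import flash_attn; print(flash_attn.__version__)"  # expected: 2.8.3
```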
```bash
OMNIFALL_ROOT=path/to/omnifall
VLLM_WORKER_MULTIPROC_METHOD=spawn
```

These variables should be set before launching the vLLM inference script.

```bash
CUDA_VISIBLE_DEVICES=0 # or e.g., 0,1
VLLM_CONFIGURE_LOGGING=0
```
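A minimal sketch of setting these in a shell before a run (the `path/to/omnifall` placeholder must point to your local OmniFall root):

```bash
# Export the recommended variables, then launch inference (placeholder paths)
export OMNIFALL_ROOT=path/to/omnifall
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export CUDA_VISIBLE_DEVICES=0
export VLLM_CONFIGURE_LOGGING=0
python scripts/vllm_inference.py experiment=zeroshot
```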