fix(model): resolve NFS flock error when loading pretrained timm weights #7

Open
gomezzz wants to merge 3 commits into main from fix/pretrained-nfs-flock

Conversation

gomezzz (Collaborator) commented Feb 17, 2026

Summary

  • Use cache_dir for pretrained weight downloads: Pass timm's native cache_dir parameter pointing to anomaly_match/pretrained_cache/, so HuggingFace Hub downloads and their file locks happen on a local filesystem instead of on an NFS mount where fcntl.flock is unavailable.
  • Bundle default efficientnet-lite0 weights via Git LFS: Ship the default tf_efficientnet_lite0.in1k pretrained weights (~18 MB) with the repo, eliminating the need for network access when using the default model.
  • Skip redundant pretrained download for eval_model: The eval_model's pretrained weights were immediately overwritten by copying from train_model, making the download wasteful. Now eval_model is created with pretrained=False.
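
A minimal sketch of the first bullet, assuming a timm version whose create_model accepts cache_dir (as the summary states). The helper names pretrained_kwargs and build_pretrained, and the way the cache path is resolved, are hypothetical and not taken from the PR diff.

```python
from pathlib import Path

# Assumption: the cache directory is resolved relative to the working directory.
# The PR only names the folder anomaly_match/pretrained_cache/.
PRETRAINED_CACHE = Path("anomaly_match") / "pretrained_cache"


def pretrained_kwargs(model_name="tf_efficientnet_lite0.in1k", cache_dir=PRETRAINED_CACHE):
    """Keyword arguments for timm.create_model that point the weight cache at cache_dir."""
    return {
        "model_name": model_name,
        "pretrained": True,
        "cache_dir": str(cache_dir),
    }


def build_pretrained(**overrides):
    """Create the model; timm is imported lazily so the helper above runs without it."""
    import timm

    return timm.create_model(**{**pretrained_kwargs(), **overrides})
```

Calling build_pretrained() would then read or download weights under the given directory instead of the default Hub cache.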

Fixes OSError: [Errno 37] No locks available when running on NFS filesystems.

Context

Release 1.3.0 switched from efficientnet_lite_pytorch to timm, which uses huggingface_hub for pretrained weight downloads. huggingface_hub uses fcntl.flock() for cache file locking, which fails on NFS filesystems that don't support POSIX file locks.
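
To check whether a given directory is affected, a small probe along these lines may help. It is a sketch, not part of the PR: supports_flock is a hypothetical helper that tries to take and release a flock on a temporary file and treats ENOLCK (errno 37 on Linux, the error quoted above) as "locks unavailable".

```python
import errno
import fcntl
import os
import tempfile


def supports_flock(directory):
    """Return True if an exclusive flock can be taken on a temp file in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOLCK:  # "No locks available"
            return False
        raise  # any other error is unexpected and should surface
    finally:
        os.close(fd)
        os.unlink(path)
```

Running supports_flock on the default cache location and on anomaly_match/pretrained_cache/ shows which of the two can safely hold lock files.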

Test plan

  • All 267 unit tests pass
  • Verified timm loads from local pretrained_cache/ without network access
  • Verified LFS tracking for the model blob
  • Test on NFS filesystem

Use timm's cache_dir parameter to store pretrained weights locally in
anomaly_match/pretrained_cache/, avoiding fcntl.flock failures on NFS
filesystems. Bundle the default tf_efficientnet_lite0.in1k weights so
the model loads without network access.

Also skip redundant pretrained weight download for eval_model in
FixMatch, since its weights are immediately overwritten by copying
from train_model.
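
The eval_model change can be sketched as follows. This is an illustration under assumptions: build_train_and_eval and the stub class are hypothetical, and copying via state_dict()/load_state_dict() is assumed to match how FixMatch copies weights from train_model.

```python
def build_train_and_eval(build_fn):
    """build_fn(pretrained=...) must return an object with state_dict()/load_state_dict()."""
    train_model = build_fn(pretrained=True)   # the only pretrained load (and possible download)
    eval_model = build_fn(pretrained=False)   # no download: weights are overwritten next
    eval_model.load_state_dict(train_model.state_dict())
    return train_model, eval_model


class _StubModel:
    """Stand-in for a torch.nn.Module so the sketch runs without torch."""

    def __init__(self, pretrained):
        self.pretrained = pretrained
        self.weights = {"w": 1.0 if pretrained else 0.0}

    def state_dict(self):
        return dict(self.weights)

    def load_state_dict(self, state):
        self.weights = dict(state)


train, eval_ = build_train_and_eval(lambda pretrained: _StubModel(pretrained))
```

After the call, eval_ holds the same weights as train even though it was built with pretrained=False.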

github-actions bot commented Feb 17, 2026

Overall Coverage

Coverage Report
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| __init__.py | 8 | 0 | 100% | |
| data_io | | | | |
| SessionIOHandler.py | 354 | 47 | 86% | 112, 129–130, 174–175, 185–186, 253–254, 319–320, 323, 363–365, 407, 425–427, 449, 451–452, 456–458, 523, 598, 619, 633–634, 636–637, 639–641, 785–790, 804, 808–809, 813–815 |
| find_images_in_folder.py | 21 | 0 | 100% | |
| load_images.py | 91 | 28 | 69% | 51–53, 58–60, 81, 93, 110, 140–147, 150, 156–157, 189, 248, 256, 277, 285, 292–294 |
| metadata_handler.py | 80 | 8 | 90% | 70–71, 117, 122–123, 170–172 |
| save_config.py | 33 | 3 | 90% | 92–94 |
| datasets | | | | |
| AnomalyDetectionDataset.py | 248 | 39 | 84% | 126–127, 356, 362–363, 365, 369, 371–372, 374, 395–396, 398, 404–405, 409, 422–423, 484–485, 491–496, 498–499, 503–506, 508–510, 512–514, 525 |
| BasicDataset.py | 52 | 4 | 92% | 59, 61, 97, 103 |
| Label.py | 5 | 0 | 100% | |
| SSL_Dataset.py | 68 | 3 | 95% | 136, 139, 209 |
| __init__.py | 0 | 0 | 100% | |
| data_utils.py | 56 | 2 | 96% | 80, 199 |
| datasets/augmentation | | | | |
| randaugment.py | 92 | 11 | 88% | 222, 225–226, 245, 328–330, 332–335 |
| randaugment_multispectral.py | 77 | 23 | 70% | 62–63, 77–79, 120–121, 148–149, 166, 170, 174, 248, 253, 270–272, 274–277, 280–281 |
| image_processing | | | | |
| transforms.py | 57 | 3 | 94% | 46–47, 76 |
| models | | | | |
| FixMatch.py | 214 | 29 | 86% | 108, 201–202, 217, 222, 254, 281–282, 285, 288–290, 293, 301, 306–307, 366, 404, 430–434, 495, 497–498, 502, 519–520 |
| pipeline | | | | |
| SessionTracker.py | 122 | 3 | 97% | 127, 219–220 |
| session.py | 535 | 85 | 84% | 131–134, 384–387, 432, 445, 597, 654, 656–657, 665, 668, 679, 687, 717, 721, 725–726, 730, 735–736, 743, 747–748, 760, 770–771, 773, 777, 791–792, 794–796, 798, 801–802, 860–861, 863–866, 868–872, 878, 884, 889–893, 895, 903, 906–907, 923, 964–965, 971–972, 976–978, 980, 1005, 1031, 1065–1066, 1074, 1085, 1093, 1097, 1101, 1106, 1109, 1153, 1156 |
| utils | | | | |
| accuracy.py | 13 | 0 | 100% | |
| consistency_loss.py | 18 | 0 | 100% | |
| create_model_string.py | 4 | 0 | 100% | |
| cross_entropy_loss.py | 9 | 0 | 100% | |
| cutana_stream_utils.py | 70 | 9 | 87% | 64–65, 88, 108, 118–122 |
| get_cosine_schedule_with_warmup.py | 11 | 1 | 90% | 39 |
| get_default_cfg.py | 61 | 0 | 100% | |
| get_net_builder.py | 49 | 7 | 85% | 81–82, 89, 92, 146–147, 151 |
| get_optimizer.py | 21 | 2 | 90% | 54, 58 |
| print_cfg.py | 45 | 4 | 91% | 89, 95, 117, 119 |
| set_log_level.py | 15 | 0 | 100% | |
| set_seeds.py | 13 | 2 | 84% | 25–26 |
| validate_config.py | 131 | 11 | 91% | 180, 199, 214, 228, 244, 248, 264, 290, 379–381 |
| TOTAL | 2573 | 324 | 87% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 365 | 0 💤 | 0 ❌ | 0 🔥 | 58.877s ⏱️ |

Add lfs: true to actions/checkout in CI so bundled pretrained weights
are fetched. Replace **kwargs with explicit pretrained parameter in
build_test_cnn to satisfy vulture dead code detection.
…s unavailable

If the repo is cloned without git-lfs, the bundled pretrained cache
contains LFS pointer files instead of actual weights. This adds a
try/except fallback that downloads from HuggingFace in that case,
with a warning suggesting git-lfs for offline use.
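
The fallback described in this commit can be sketched roughly as below. The function names are hypothetical and the two loaders are passed in as callables, since the actual loading code is not shown here. The pointer check relies on the Git LFS pointer format, whose first line is "version https://git-lfs.github.com/spec/v1".

```python
import warnings

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if `path` holds a Git LFS pointer file rather than the real weight blob."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def load_with_fallback(load_bundled, load_remote):
    """Try the bundled cache first; on failure, warn and fall back to a download."""
    try:
        return load_bundled()
    except Exception as exc:  # broad on purpose: a pointer file can fail in several ways
        warnings.warn(
            f"Bundled pretrained weights could not be loaded ({exc}); "
            "downloading from HuggingFace instead. Install git-lfs and run "
            "`git lfs pull` to use the bundled weights offline."
        )
        return load_remote()
```

A check with is_lfs_pointer before loading could give a clearer warning than the exception raised by a failed load, but whether the PR does this is not stated.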