Repositories list
10 repositories
- General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models.
- FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation
- BizFinBench.v2: A Unified Offline–Online Bilingual Benchmark for Expert-Level Financial Capability Evaluation of LLMs
- Compress2Focus: Efficient Coordinate Compression for Policy Optimization in Multi-Turn GUI Agents
- A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
- PuzzleClone: An SMT-Powered Framework for Synthesizing Verified Mathematical Reasoning Data
- [MM 2025] A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning
- [MM 2025] NEXUS-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision