Add LM-Evaluation-Harness integration with unified adapter pattern #7

shreyashkar-ml wants to merge 1 commit into main
Conversation
damian1996 left a comment:

Comments for now; I will probably add more later for the adapter file, covering score and sample extraction.
eval_lmeval/adapter.py (Outdated)

```python
# Helpers

def _infer_quantization(model_name_or_path: str) -> tuple[BitPrecision, Method]:
```
damian1996: What do you think about moving this function to common/utils.py? There are often issues with extracting quantization info directly from evaluation logs, so it could be useful as a shared way to recover this information.
shreyashkar-ml: Yes, this is used in more than a few places; I will probably move it, along with the get_context_size part, to utils.py.
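For reference, a minimal sketch of what the context-window side of that move could look like in common/utils.py. The function name follows the commit description later in this thread (_extract_context_window_from_config); the config keys checked are common Hugging Face config fields, and the exact signature used in the PR is an assumption.

```python
# Illustrative sketch only: the real helper in this PR is not shown in the diff.
from typing import Any, Optional


def _extract_context_window_from_config(config: dict[str, Any]) -> Optional[int]:
    """Return the model's context window from an HF-style config dict, if present."""
    # Different architectures expose the context length under different keys.
    for key in ("max_position_embeddings", "n_positions", "seq_length", "max_seq_len"):
        value = config.get(key)
        if isinstance(value, int) and value > 0:
            return value
    return None
```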
pyproject.toml (Outdated)

```diff
 [tool.setuptools.packages.find]
-include = ["helm*", "schema*", "common*", "config*"]
+include = ["eval_helm*", "eval_lmeval*", "schema*", "common*", "config*"]
```
eval_lmeval/adapter.py (Outdated)

```python
    }

    method = method_map.get(method_key, Method.None_)
    return precision, method
```
damian1996: What about adding the quant type (gptq, awq, ...) to this function's output and handling it in our schema as well?
shreyashkar-ml: Can do. Would you recommend editing the Method class in eval_types.py, or adding a new class that maps from quant type to the existing Method class?
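For concreteness, a minimal sketch of the first option, extending Method in eval_types.py. The enum's current members other than None_ are not visible in this diff, so everything else here is an assumption.

```python
# Sketch: extend the existing Method enum with quant-type members.
# Only None_ is confirmed by this PR; the other members are assumptions.
from enum import Enum


class Method(str, Enum):
    None_ = "none"
    GPTQ = "gptq"
    AWQ = "awq"
    BITSANDBYTES = "bitsandbytes"
```

The alternative would leave Method untouched and add a separate QuantType enum (or a plain mapping) that the adapter records alongside Method in the unified schema.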
eval_lmeval/adapter.py

```python
class LMEvalAdapter(BaseEvaluationAdapter):

    CONFIG_FILE = "config.yaml"
```
damian1996: Would you always have access to these four files in lm-eval logs?
shreyashkar-ml: The file names may change, so I will add some checks to ensure naming consistency is maintained. But lm_eval works with a YAML file (unless the run is driven purely by terminal arguments), so yes.
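As a sketch of the consistency checks mentioned here, the adapter could fail fast when an expected file is missing from the log directory. Only CONFIG_FILE and RESULTS_FILE are visible in this PR; the helper name and the rest of the file set are hypothetical.

```python
from pathlib import Path


def _check_required_files(dir_path: Path, required: tuple[str, ...]) -> None:
    """Raise a clear error if an expected lm-eval log file is missing."""
    missing = [name for name in required if not (dir_path / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"lm-eval log directory {dir_path} is missing expected files: {missing}"
        )


# Hypothetical usage inside the adapter, assuming these class attributes exist:
# _check_required_files(dir_path, (self.CONFIG_FILE, self.RESULTS_FILE))
```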
eval_lmeval/adapter.py (Outdated)

```python
        if not dir_path.is_dir():
            raise FileNotFoundError(f"Directory {dir_path} does not exist")

        cfg_path = dir_path / self.CONFIG_FILE
```
damian1996: Have you tested it? I received TypeError: unsupported operand type(s) for /: 'str' and 'str'. Maybe use:

cfg_path = os.path.join(dir_path, self.CONFIG_FILE)

or

cfg_path = f'{dir_path}/{self.CONFIG_FILE}'
shreyashkar-ml: Agreed, I will change it for robustness. It was working fine on my system, so I didn't think much of it at the time.
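Another option that keeps the / operator working is to normalize the argument to a pathlib.Path once at the top of the method, so both str and Path callers are supported. A sketch, with the surrounding method name assumed:

```python
from pathlib import Path


def load(self, dir_path: str | Path):       # hypothetical method name
    dir_path = Path(dir_path)               # accept both str and Path
    if not dir_path.is_dir():
        raise FileNotFoundError(f"Directory {dir_path} does not exist")

    cfg_path = dir_path / self.CONFIG_FILE  # now safe even for str callers
    ...
```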
eval_lmeval/adapter.py (Outdated)

```python
        # Load task-level metrics
        task_scores: Dict[str, Dict[str, float]] = {}
        results_path = dir_path / self.RESULTS_FILE
```
(new file, @@ -0,0 +1,66 @@)

```python
from __future__ import annotations
```
damian1996: What is the point of this file? We probably don't need it, because we won't run any experiments; we are only converting users' eval logs to our unified schema.
shreyashkar-ml: Will remove.
```python
    # generative tasks often expose `exact_match` / `bleu` - handled ad-hoc
}


def detect_prompt_class(task_name: str) -> PromptClass:
```
damian1996: Did you consider filling this in later, in the adapter file? Even if we don't have a field like HELM's (about question type), we can still confirm that a task is multiple_choice when we read the responses to its questions from the eval log.
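A sketch of what that per-sample inference could look like in the adapter. The PromptClass members and the sample-record keys used here are assumptions about lm-eval's per-sample JSON, not something confirmed by this PR.

```python
from common.eval_types import PromptClass   # assumed import path


def infer_prompt_class_from_sample(sample: dict) -> PromptClass:
    """Heuristic: multiple-choice samples carry one scored request per answer option."""
    arguments = sample.get("arguments", [])
    if isinstance(arguments, list) and len(arguments) > 1:
        # Several scored continuations for one document -> multiple choice.
        return PromptClass.MultipleChoice   # assumed member name
    return PromptClass.Generation           # assumed member name
```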
(new file, @@ -0,0 +1,340 @@)

```python
import pytest
```
damian1996: You can remove this file, because we don't run anything here.
(new file, @@ -0,0 +1,10 @@)

```yaml
model: hf
```
shreyashkar-ml: Thanks @damian1996, I will wait until Andrew's PR for the file cleanup and renaming is merged, then commit per your recommendations to avoid merge conflicts.
This commit adds support for the lm-eval library and moves infer quantization and model context window extraction to the common utils module.

1. Added adapter for transforming lm-eval outputs to the unified schema format.
2. Added converter for running lm-eval and dumping outputs to the unified schema format.
3. Added tests for the adapter and converter, with a test config for the lm-eval library in config/lm_eval_test_config.yaml.
4. Added _infer_quantization and _extract_context_window_from_config functions to the common utils module.
ca040da to b3dd337
shreyashkar-ml: @damian1996, I have updated the lm-eval integration accordingly; kindly check.
This commit adds support for the lm-eval library and moves the infer-quantization and model context window extraction helpers to the common utils module.
Complete pipeline: YAML → LMEvalRunner → lm-eval → LMEvalAdapter → Unified Schema
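For orientation, a hedged sketch of the adapter end of that pipeline, i.e. converting an existing lm-eval output directory to the unified schema. The load/to_unified method names and the output handling are assumptions; only the CONFIG_FILE and RESULTS_FILE handling is visible in this PR.

```python
# Hypothetical usage of LMEvalAdapter; the method names are assumptions.
from pathlib import Path

from eval_lmeval.adapter import LMEvalAdapter

log_dir = Path("outputs/lm_eval_run")   # directory containing config.yaml and results
adapter = LMEvalAdapter()
adapter.load(log_dir)                    # parse the lm-eval config and results files
unified = adapter.to_unified()           # emit records in the unified schema
print(unified)
```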