feat: #evaluation in ... command for easier evaluation#1706
feat: #evaluation in ... command for easier evaluation#1706alexkeizer wants to merge 8 commits intomainfrom
#evaluation in ... command for easier evaluation#1706Conversation
…single JSON object
some options suggested by @bollu, such as omitting the details of messages and including walltime, and some other options to make it easier to run tests
|
@bollu I added the options you mentioned, could you have a look to see if this is usable in its current form, or whether we need more? |
|
bv_decide solved 0 theorems. |
|
bv_decide solved 0 theorems. |
|
@alexkeizer No, the LLVM evaluation has not been changed by me. What's in the repo is the "latest". I attempted to improve it using |
So when I open, say, theorem test_sext_zext_thm (e : IntW 16) : sext 64 (zext 32 e) ⊑ zext 64 e := by
simp_alive_undef
simp_alive_ops
simp_alive_case_bash
simp_alive_split
extract_goals
all_goals sorryI assumed the |
This PR adds an
#evaluation in $cmdcommand similar to#guard_msgs in $cmd, which collects all messages generated by an arbitrary command, and serializes them into a JSON object. The idea is that this will be a building block for any kind of evaluation we'd like to do, to easily have existing tactics output JSONL without having to modify those tactics.I was originally planning to use this to convert the existing LLVM evaluation to JSONL, but it seems that evaluation has fundamentally changed. @bollu is the LLVM evaluation superseded by your evaluation?
TODO:
EvaluationHarness