mgbench bench2/q2 ~2x perf drop on Parquet in vector runtime #6631

@philrz

Description

In the five months since we last focused on benchmark performance, the mgbench bench2/q2 query has become ~2x slower in the vector runtime on Parquet input.

Details

Repro is with super commit 1bced44. The test data is too large to attach to the issue but can be downloaded from https://brim-benchmarks.s3.us-east-2.amazonaws.com/mgbench/bench2.parquet.

This effect was first seen on an "official" run of the benchmarks on an AWS EC2 m6idn.xlarge instance, but the repro below on my Intel-based MacBook shows the same effect.

$ super -version
Version: v0.1.0-22-g1bced4402

$ time super -c "
SELECT *
FROM 'bench2.parquet'
WHERE status_code >= 200
  AND status_code < 300
  AND request LIKE '%/etc/passwd%'
  AND log_time >= TIMESTAMP '2012-05-06 00:00:00'
  AND log_time < TIMESTAMP '2012-05-20 00:00:00'
ORDER BY log_time"

{log_time:2012-05-09T14:46:58Z,client_ip:"201.183.185.11",request:"/?-nd+auto_prepend_file%3D/etc/passwd",status_code:200::int16,object_size:21173}
{log_time:2012-05-13T20:17:05Z,client_ip:"201.183.185.11",request:"/?-nd+auto_prepend_file%3D/etc/passwd",status_code:200::int16,object_size:21809}

real	0m0.998s
user	0m7.133s
sys	0m0.459s

The last commit at which I can show the better performance on this query is 6559986.

$ super -version
Version: 65599869b

$ SUPER_VAM=1 time super -c "
SELECT *
FROM 'bench2.parquet'
WHERE status_code >= 200
  AND status_code < 300
  AND request LIKE '%/etc/passwd%'
  AND log_time >= TIMESTAMP '2012-05-06 00:00:00'
  AND log_time < TIMESTAMP '2012-05-20 00:00:00'
ORDER BY log_time"

{log_time:2012-05-09T14:46:58Z,client_ip:"201.183.185.11",request:"/?-nd+auto_prepend_file%3D/etc/passwd",status_code:200::int16,object_size:21173}
{log_time:2012-05-13T20:17:05Z,client_ip:"201.183.185.11",request:"/?-nd+auto_prepend_file%3D/etc/passwd",status_code:200::int16,object_size:21809}

        0.56 real         3.72 user         0.40 sys
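
For reference, the slowdown implied by the two timings above can be worked out directly. This is a minimal sketch: the `slowdown` helper is a hypothetical name introduced here, and the inputs are the `real` and `user` figures copied from the two runs.

```shell
# slowdown NEW OLD
# Hypothetical helper: prints NEW/OLD to two decimal places.
slowdown() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.2f\n", new / old }'
}

slowdown 0.998 0.56   # real: prints 1.78
slowdown 7.133 3.72   # user: prints 1.92
```

Both ratios are in line with the "~2x" figure, with the `user` (CPU) time showing the larger increase.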

Normally I'd binary search for the precise commit that introduced the perf delta. Unfortunately, the merge of #6334 broke the ability to run this query at all, and by the time the breakage was fixed with the merge of #6374, performance was already ~2x worse and has stayed there since.
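
When some commits in a range can't run the query at all, `git bisect` can still narrow things down by skipping them: a script driven by `git bisect run` signals "skip this commit" by exiting with status 125. The sketch below is only an illustration under assumptions not stated in this issue. The `classify_run` helper is hypothetical, as is the 0.75 s threshold (roughly midway between the two `real` times above), and how `super` would be built and timed at each commit is left out entirely.

```shell
# classify_run STATUS ELAPSED THRESHOLD
# Hypothetical helper: maps one benchmark run onto the exit codes that
# `git bisect run` understands (0 = good, 125 = skip, 1 = bad).
classify_run() {
  status=$1      # exit status of the build + query at this commit
  elapsed=$2     # wall-clock seconds the query took
  threshold=$3   # assumed cutoff between "fast" and "slow"

  # The query could not run at all (e.g. between #6334 and #6374):
  # ask bisect to skip this commit.
  if [ "$status" -ne 0 ]; then
    return 125
  fi

  # awk does the floating-point comparison; it exits 0 when elapsed <= threshold.
  if awk -v e="$elapsed" -v t="$threshold" 'BEGIN { exit !(e <= t) }'; then
    return 0   # fast enough: mark the commit good
  fi
  return 1     # slow: mark the commit bad
}
```

A wrapper script (say `./wrapper.sh`, a hypothetical name) that builds `super`, times the query, and ends with `classify_run "$status" "$elapsed" 0.75` could then be driven by `git bisect start 1bced44 6559986` followed by `git bisect run ./wrapper.sh`. The wrapper would also need to set `SUPER_VAM=1` for older commits, as in the 6559986 repro. Because every commit in the broken span would be skipped, bisect would likely report only a range of candidate commits rather than a single one, which is the same limitation described above.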
