mgbench bench2/q4 more than 2x worse perf on CSUP than Parquet #6632

@philrz

Description

We have the high-level expectation that on tabular data (such as in the mgbench benchmarks), SuperDB's performance querying a CSUP file should meet or beat the performance of querying the same data in a Parquet file. However, as of super commit 1bced44, there are a few mgbench queries for which CSUP performs noticeably worse than Parquet, with bench2/q4 standing out as more than 2x worse.

Details

Repro is with super commit 1bced44.

Here's a table summarizing CSUP vs. Parquet performance (vector runtime, and in -dynamic mode) across all 18 mgbench queries in an "official" run on an AWS EC2 m6idn.xlarge instance. The CSUP ones highlighted in yellow are of interest since they represent where it's doing worse than Parquet.

[Image: table of CSUP vs. Parquet timings across the 18 mgbench queries]

As you can see, there are three queries where CSUP is currently lagging, but let's start by studying bench2/q4 since it has the biggest gap. The input files are too big to attach to a GitHub Issue, but they can be downloaded from:

I can replicate a similar perf delta on my Intel-based MacBook.

$ super -version
Version: v0.1.0-22-g1bced4402

$ time super -dynamic -c "
SELECT client_ip,
       COUNT(*) AS num_requests
FROM 'bench2.csup'
WHERE log_time >= TIMESTAMP '2012-10-01 00:00:00'
GROUP BY client_ip
HAVING COUNT(*) >= 100000
ORDER BY num_requests DESC;"

{client_ip:219.63.173.93,num_requests:1540391}
{client_ip:229.50.247.232,num_requests:743801}
{client_ip:97.211.80.244,num_requests:733261}
{client_ip:152.149.228.251,num_requests:492221}
{client_ip:198.156.249.133,num_requests:370834}
{client_ip:70.86.124.37,num_requests:273057}
{client_ip:67.153.111.239,num_requests:167287}
{client_ip:249.92.17.134,num_requests:112909}

real	0m2.302s
user	0m11.978s
sys	0m0.701s

$ time super -dynamic -c "
SELECT client_ip,
       COUNT(*) AS num_requests
FROM 'bench2.parquet'
WHERE log_time >= TIMESTAMP '2012-10-01 00:00:00'
GROUP BY client_ip
HAVING COUNT(*) >= 100000
ORDER BY num_requests DESC;"

{client_ip:"219.63.173.93",num_requests:1540391}
{client_ip:"229.50.247.232",num_requests:743801}
{client_ip:"97.211.80.244",num_requests:733261}
{client_ip:"152.149.228.251",num_requests:492221}
{client_ip:"198.156.249.133",num_requests:370834}
{client_ip:"70.86.124.37",num_requests:273057}
{client_ip:"67.153.111.239",num_requests:167287}
{client_ip:"249.92.17.134",num_requests:112909}

real	0m0.978s
user	0m6.199s
sys	0m0.404s
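
To put a number on the gap from the two runs above, here is a minimal sketch that parses the `time` output and computes the CSUP-to-Parquet ratio for each field. The helper names `parse_time` and `slowdown` are hypothetical and not part of super or mgbench. The inputs are just the single MacBook runs pasted above, so treat the ratios as a rough indication rather than a benchmark result.

```python
import re


def parse_time(value: str) -> float:
    """Convert a bash `time` value such as '0m2.302s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def slowdown(slow: dict, fast: dict) -> dict:
    """Return slow/fast ratios for each field present in `slow`."""
    return {key: parse_time(slow[key]) / parse_time(fast[key]) for key in slow}


# Timings copied from the two runs above (single runs, not averaged).
CSUP = {"real": "0m2.302s", "user": "0m11.978s", "sys": "0m0.701s"}
PARQUET = {"real": "0m0.978s", "user": "0m6.199s", "sys": "0m0.404s"}

ratios = slowdown(CSUP, PARQUET)
for field, ratio in ratios.items():
    print(f"{field}: CSUP/Parquet = {ratio:.2f}x")
```

On these numbers the wall-clock ratio comes out at roughly 2.35x and the user CPU ratio at roughly 1.93x, so the gap shows up in total CPU work as well as elapsed time.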
