Description
We have the high-level expectation that on tabular data (such as in the mgbench benchmarks) SuperDB's performance querying a CSUP file should meet-or-beat the performance of querying the same data in a Parquet file. However, as of super commit 1bced44, there are a few mgbench queries for which CSUP performs noticeably worse than Parquet, with mgbench bench2/q4 standing out as more than 2x worse.
Details
Repro is with super commit 1bced44.
Here's a table summarizing CSUP vs. Parquet performance (vector runtime, and in -dynamic mode) across all 18 mgbench queries in an "official" run on an AWS EC2 m6idn.xlarge instance. The CSUP results highlighted in yellow are the ones of interest, since they mark the queries where CSUP is doing worse than Parquet.
As you can see, there are three where CSUP is currently lagging, but let's start by studying bench2/q4 since it has the biggest gap. The input files are too big to attach to a GitHub Issue, but they can be downloaded from:
- https://brim-benchmarks.s3.us-east-2.amazonaws.com/mgbench/bench2.csup
- https://brim-benchmarks.s3.us-east-2.amazonaws.com/mgbench/bench2.parquet
I can replicate a similar perf delta on my Intel-based MacBook.
$ super -version
Version: v0.1.0-22-g1bced4402
$ time super -dynamic -c "
SELECT client_ip,
COUNT(*) AS num_requests
FROM 'bench2.csup'
WHERE log_time >= TIMESTAMP '2012-10-01 00:00:00'
GROUP BY client_ip
HAVING COUNT(*) >= 100000
ORDER BY num_requests DESC;"
{client_ip:219.63.173.93,num_requests:1540391}
{client_ip:229.50.247.232,num_requests:743801}
{client_ip:97.211.80.244,num_requests:733261}
{client_ip:152.149.228.251,num_requests:492221}
{client_ip:198.156.249.133,num_requests:370834}
{client_ip:70.86.124.37,num_requests:273057}
{client_ip:67.153.111.239,num_requests:167287}
{client_ip:249.92.17.134,num_requests:112909}
real 0m2.302s
user 0m11.978s
sys 0m0.701s
$ time super -dynamic -c "
SELECT client_ip,
COUNT(*) AS num_requests
FROM 'bench2.parquet'
WHERE log_time >= TIMESTAMP '2012-10-01 00:00:00'
GROUP BY client_ip
HAVING COUNT(*) >= 100000
ORDER BY num_requests DESC;"
{client_ip:"219.63.173.93",num_requests:1540391}
{client_ip:"229.50.247.232",num_requests:743801}
{client_ip:"97.211.80.244",num_requests:733261}
{client_ip:"152.149.228.251",num_requests:492221}
{client_ip:"198.156.249.133",num_requests:370834}
{client_ip:"70.86.124.37",num_requests:273057}
{client_ip:"67.153.111.239",num_requests:167287}
{client_ip:"249.92.17.134",num_requests:112909}
real 0m0.978s
user 0m6.199s
sys 0m0.404s
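To put a number on the gap between the two runs above, here is a minimal sketch that divides the CSUP timings by the Parquet timings. The `ratio` helper is a hypothetical name introduced only for this illustration (it is not part of super), and the inputs are the `real` and `user` values copied from the single runs shown above, so treat the result as a rough indication rather than a careful measurement.

```shell
#!/usr/bin/env bash
# ratio is a hypothetical helper (not part of super): it prints
# numerator/denominator rounded to two decimal places, using awk
# for the floating-point division that plain bash lacks.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Timings in seconds, copied from the `time` output of the two runs above.
echo "wall-clock CSUP/Parquet: $(ratio 2.302 0.978)x"
echo "user CPU   CSUP/Parquet: $(ratio 11.978 6.199)x"
```

On these numbers the wall-clock ratio comes out to about 2.35x and the user CPU ratio to about 1.93x, consistent with the "more than 2x" gap described for bench2/q4.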