
Conversation

@andygrove
Member

@andygrove andygrove commented Jan 26, 2026

Improve the `row_columnar.rs` benchmark to cover the full range of data types processed by `process_sorted_row_partition()` in JVM shuffle:

  • Primitive columns (100 Int64 columns)
  • Struct (flat with 5/10/20 fields)
  • Nested struct (2 levels deep)
  • Deeply nested struct (3 levels deep)
  • List<Int64>
  • Map<Int64, Int64>

These benchmarks help measure the performance of the row-to-columnar conversion used by CometColumnarShuffle when writing shuffle data.
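The row sizes these benchmarks allocate follow Spark's UnsafeRow layout: a null bitset of one 64-bit word per 64 fields, followed by one 8-byte slot per field. A minimal sketch of that arithmetic for the 100-Int64-column case; the helper names here are illustrative, not the Comet API:

```rust
// Sketch of how an UnsafeRow's fixed-length size is derived for the
// primitive benchmark case (100 Int64 columns). `bitset_width` mirrors
// Spark's calculateBitSetWidthInBytes: one 64-bit word per 64 fields.
const WORD_SIZE: usize = 8;

fn bitset_width(num_fields: usize) -> usize {
    num_fields.div_ceil(64) * WORD_SIZE
}

fn row_size(num_fields: usize) -> usize {
    // null bitset followed by one 8-byte slot per field
    bitset_width(num_fields) + num_fields * WORD_SIZE
}

fn main() {
    // 100 fields need two bitset words (16 bytes) plus 800 bytes of values.
    println!("{}", row_size(100)); // prints 816
}
```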

andygrove and others added 2 commits January 26, 2026 10:42
Add jvm_shuffle.rs benchmark that covers the full range of data types
processed by `process_sorted_row_partition()` in JVM shuffle:

- Primitive columns (100 Int64 columns)
- Struct (flat with 5/10/20 fields)
- Nested struct (2 levels deep)
- Deeply nested struct (3 levels deep)
- List<Int64>
- Map<Int64, Int64>

This replaces the old row_columnar.rs which only tested primitive columns.

These benchmarks help measure the performance of the row-to-columnar
conversion used by CometColumnarShuffle when writing shuffle data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove andygrove marked this pull request as draft January 26, 2026 17:55
@codecov-commenter

codecov-commenter commented Jan 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.13%. Comparing base (f09f8af) to head (69fb543).
⚠️ Report is 896 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3290      +/-   ##
============================================
+ Coverage     56.12%   60.13%   +4.00%     
- Complexity      976     1468     +492     
============================================
  Files           119      175      +56     
  Lines         11743    16085    +4342     
  Branches       2251     2665     +414     
============================================
+ Hits           6591     9672    +3081     
- Misses         4012     5066    +1054     
- Partials       1140     1347     +207     

☔ View full report in Codecov by Sentry.

andygrove and others added 4 commits January 26, 2026 12:18
Use div_ceil() instead of manual ceiling division and replace
needless range loop with iterator pattern.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove andygrove marked this pull request as ready for review January 26, 2026 19:33
@andygrove andygrove changed the title from "perf: Improve benchmarks for row-to-columnar conversion in JVM shuffle" to "perf: Improve benchmarks for native row-to-columnar used by JVM shuffle" Jan 26, 2026
fn get_row_size(num_struct_fields: usize) -> usize {
// Top-level row has 1 column (the struct)
let top_level_bitset_width = SparkUnsafeRow::get_row_bitset_width(1);
// Struct pointer (offset + size) is 8 bytes
Contributor


Suggested change
// Struct pointer (offset + size) is 8 bytes
// Struct pointer (offset + size) is 8 bytes on 64-bit architectures

let top_level_bitset_width = SparkUnsafeRow::get_row_bitset_width(1);

// Nested struct starts after top-level row header + pointer
let nested_offset = top_level_bitset_width + 8;
Contributor


Just a thought: there are a lot of eights here, so it would probably be clearer to name them. Which one is the pointer size, which is the int64 size, etc.?
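One way to act on this suggestion is to hoist the repeated 8s into named constants so each use site says which 8 it means. A self-contained sketch; the constant and helper names are suggestions, not existing Comet identifiers:

```rust
// Sketch of the offset computation with the repeated 8s named.
const WORD_SIZE: usize = 8; // one 64-bit word in the null bitset
const STRUCT_POINTER_SIZE: usize = 8; // packed (offset, size) pair

fn bitset_width(num_fields: usize) -> usize {
    num_fields.div_ceil(64) * WORD_SIZE
}

fn main() {
    // Top-level row has a single column: the struct.
    let top_level_bitset_width = bitset_width(1);
    // Nested struct starts after the top-level header plus the pointer.
    let nested_offset = top_level_bitset_width + STRUCT_POINTER_SIZE;
    println!("{nested_offset}"); // prints 16
}
```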

// Fill nested struct with some data
for i in 0..num_struct_fields {
let value_offset = nested_offset + nested_bitset_width + i * 8;
let value = (i as i64) * 100;
Contributor


What is 100 here? Is it for alignment?
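For what it's worth, the 100 looks like a plain stride that gives each field a distinct synthetic value, rather than anything alignment-related; that reading is an assumption. A self-contained sketch of the write plan the loop produces, with the literals named (names are illustrative):

```rust
// Sketch of the fill loop with the magic numbers named. Assumption: the 100
// is just a stride that makes each field's synthetic value distinct; it has
// nothing to do with alignment.
const INT64_SIZE: usize = 8;
const VALUE_STRIDE: i64 = 100;

/// Returns the (byte offset, value) pairs the loop would write.
fn fill_plan(num_fields: usize, nested_offset: usize, bitset_width: usize) -> Vec<(usize, i64)> {
    (0..num_fields)
        .map(|i| {
            let value_offset = nested_offset + bitset_width + i * INT64_SIZE;
            (value_offset, (i as i64) * VALUE_STRIDE)
        })
        .collect()
}

fn main() {
    // With nested_offset = 16 and a one-word bitset (8 bytes):
    println!("{:?}", fill_plan(3, 16, 8)); // prints [(24, 0), (32, 100), (40, 200)]
}
```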

false,
0,
None,
&CompressionCodec::Zstd(1),
Contributor


Should we also check other codecs? 🤔
I might be wrong, but doesn't Spark default to LZ4 for IO_COMPRESSION_CODEC, which is used for shuffle?

private[spark] val IO_COMPRESSION_CODEC =
  ConfigBuilder("spark.io.compression.codec")
    .doc("The codec used to compress internal data such as RDD partitions, event log, " +
      "broadcast variables and shuffle outputs. By default, Spark provides four codecs: " +
      "lz4, lzf, snappy, and zstd. You can also use fully qualified class names to specify " +
      "the codec")
    .version("0.8.0")
    .stringConf
    .createWithDefaultString("lz4")
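If the benchmark were parameterized over codecs, it could loop over the variants rather than hard-coding Zstd. A sketch of that pattern using a stand-in enum, since the exact variants available on Comet's CompressionCodec would need to be checked:

```rust
// Sketch of parameterizing the benchmark over several codecs. The enum is a
// stand-in for Comet's CompressionCodec; its real variants must be verified.
#[derive(Debug)]
enum Codec {
    None,
    Lz4, // Spark's spark.io.compression.codec default
    Snappy,
    Zstd(i32),
}

fn main() {
    for codec in [Codec::None, Codec::Lz4, Codec::Snappy, Codec::Zstd(1)] {
        // In the benchmark this would become one bench_function per codec:
        let id = format!("process_sorted_row_partition/{codec:?}");
        println!("{id}");
    }
}
```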

}

/// Create a schema with nested structs: Struct<Struct<int64 fields>>
fn make_nested_struct_schema(num_fields: usize) -> DataType {
Contributor


I have a feeling `make_nested_struct_schema` and `make_deeply_nested_struct_schema` could be generalized into a single function?
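One possible generalization: take the nesting depth as a parameter and wrap the flat struct repeatedly. Sketched here with a simplified stand-in for Arrow's DataType so the example is self-contained:

```rust
// Sketch of a depth-parameterized schema builder. This DataType is a
// simplified stand-in for arrow's DataType, used only for illustration.
#[derive(Debug, Clone)]
enum DataType {
    Int64,
    Struct(Vec<DataType>),
}

/// depth = 1 builds Struct<int64 fields>, depth = 2 builds
/// Struct<Struct<int64 fields>>, and so on.
fn make_struct_schema(num_fields: usize, depth: usize) -> DataType {
    let mut dt = DataType::Struct(vec![DataType::Int64; num_fields]);
    for _ in 1..depth {
        dt = DataType::Struct(vec![dt]);
    }
    dt
}

fn main() {
    println!("{:?}", make_struct_schema(2, 2)); // prints Struct([Struct([Int64, Int64])])
}
```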

fn get_nested_row_size(num_inner_fields: usize) -> usize {
// Top-level row has 1 column (the outer struct)
let top_level_bitset_width = SparkUnsafeRow::get_row_bitset_width(1);
let struct_pointer_size = 8;
Contributor


This value should probably be a crate-level `const`?

Contributor

@comphead comphead left a comment


Thanks @andygrove. WDYT, would it be nice to have a follow-up PR covering nested list/map/struct combinations?
