Add vectorization comparison benchmark for OpenMP/Kokkos/SIMD #21

Closed
Amazingkivas wants to merge 1 commit into main from
codex/analyze-vectorization-efficiency-in-cokkos

@Amazingkivas
Owner

Motivation

  • Provide a small, reproducible micro-benchmark that compares OpenMP, Kokkos (with different layouts and iteration orders), and an explicit SIMD implementation, so we can measure whether Kokkos backends inhibit vectorization.
  • Give a simple playground to validate layout/iteration choices and to investigate why Kokkos-generated loops may be poorly vectorized.
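
The layout/iteration question above comes down to which index is contiguous in memory. The sketch below is not code from this PR; it is a plain C++ illustration with hypothetical names (`Layout`, `Inner`, `offset`, `inner_stride`) of the index mappings that Kokkos::LayoutRight (row-major, rightmost index has stride 1) and Kokkos::LayoutLeft (column-major, leftmost index has stride 1) use for a 2D array. A loop nest usually vectorizes well only when its innermost index is the stride-1 one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; these are not Kokkos types.
enum class Layout { Right, Left };  // Right = row-major, Left = column-major
enum class Inner { I, J };          // which index the innermost loop walks

// Flat element offset of (i, j) in an n0 x n1 array.
inline std::size_t offset(Layout layout, std::size_t i, std::size_t j,
                          std::size_t n0, std::size_t n1) {
    assert(i < n0 && j < n1);
    return layout == Layout::Right ? i * n1 + j   // j is contiguous
                                   : i + j * n0;  // i is contiguous
}

// Distance in elements between two consecutive innermost iterations.
// A result of 1 means unit-stride access; anything larger means strided access.
// Requires n0 >= 2 and n1 >= 2 so that the neighbouring element exists.
inline std::size_t inner_stride(Layout layout, Inner inner,
                                std::size_t n0, std::size_t n1) {
    const std::size_t base = offset(layout, 0, 0, n0, n1);
    const std::size_t next = (inner == Inner::I)
                                 ? offset(layout, 1, 0, n0, n1)
                                 : offset(layout, 0, 1, n0, n1);
    return next - base;
}
```

For a 4 x 8 array, `inner_stride(Layout::Right, Inner::J, 4, 8)` and `inner_stride(Layout::Left, Inner::I, 4, 8)` both give 1, while the two mismatched combinations give 8 and 4 respectively.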

Description

  • Added a new perf target vectorization_compare in perf-tests/CMakeLists.txt that builds vectorization_compare.cpp and places the binary in bin/ via RUNTIME_OUTPUT_DIRECTORY.
  • Implemented perf-tests/vectorization_compare.cpp, a benchmark built around a minimal memory-bound kernel. It reports timings and checksums for the following variants: an OpenMP baseline (#pragma omp parallel for); Kokkos MDRangePolicy with LayoutRight; Kokkos MDRangePolicy with LayoutLeft under two iteration orders; and a manual SIMD version using Kokkos::Experimental::native_simd.
  • Added perf-tests/README_vectorization.md with build/run instructions, guidance on interpreting layout/iteration effects, and recommended commands to search Kokkos backend sources and enable compiler vectorization reports for root-cause analysis.
  • Each mode computes a checksum, so that the numerical work can be verified as equivalent across variants and the benchmark outputs stay comparable.
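
As a rough illustration of the baseline-plus-checksum idea described above, here is a minimal sketch. It is not the code in vectorization_compare.cpp: the update `y[k] = a * x[k] + y[k]`, the `RunResult` struct, and the function names `run_baseline` and `checksums_match` are all assumptions chosen for the example. Only the use of `#pragma omp parallel for` and the idea of printing a timing and a checksum come from the description. Without OpenMP enabled, the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result record: wall time and a checksum of the output array.
struct RunResult {
    double seconds;
    double checksum;
};

// Assumed stand-in for a memory-bound kernel: y = a * x + y, repeated `repeats` times.
inline RunResult run_baseline(std::size_t n, int repeats) {
    assert(n > 0 && repeats > 0);
    std::vector<double> x(n), y(n, 0.0);
    for (std::size_t k = 0; k < n; ++k) x[k] = static_cast<double>(k % 7);

    const double a = 0.5;
    const auto n_signed = static_cast<std::ptrdiff_t>(n);  // signed loop index for OpenMP

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        #pragma omp parallel for
        for (std::ptrdiff_t k = 0; k < n_signed; ++k) {
            const auto idx = static_cast<std::size_t>(k);
            y[idx] = a * x[idx] + y[idx];
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    // Serial reduction so the checksum does not depend on thread scheduling.
    double checksum = 0.0;
    for (double v : y) checksum += v;
    return {std::chrono::duration<double>(stop - start).count(), checksum};
}

// Compare two checksums with a relative tolerance, since variants may
// accumulate floating-point results in a different order.
inline bool checksums_match(double a, double b, double rel_tol = 1e-12) {
    return std::fabs(a - b) <= rel_tol * std::fmax(std::fabs(a), std::fabs(b));
}
```

Each other variant would fill the same inputs, run the same update, and report its own `RunResult`; comparing checksums with `checksums_match` then flags a variant that did different work before its timing is trusted.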

Testing

  • Ran git diff --check, which passed without issues.
  • Attempted to configure the build with cmake .., but configuration failed: KokkosConfig.cmake was not found because the submodules had not been fetched in this environment, so the build could not be completed.
  • Attempted to fetch the submodules with git submodule update --init --recursive, but it failed with network/GitHub access errors (CONNECT tunnel failed, response 403), which prevented backend inspection and binary execution.
  • No runtime performance runs were executed because Kokkos and submodules were not available in this environment; the README includes exact commands to build and run the benchmark once submodules are present locally.

Codex Task
