perf-cpp: Effortless Hardware Performance Monitoring for C++ Applications

Quick Start | How to Build | Documentation | System Requirements

perf-cpp lets you profile specific parts of your code, not the entire program.

Tools like Linux Perf, Intel® VTune™, and AMD uProf profile everything: application startup, configuration parsing, data loading, and all your helper functions. perf-cpp is different: place start() and stop() around exactly the code you want to measure. Profile one sorting algorithm. Measure cache misses in your hash table lookup. Compare two memory allocators. Skip all the noise.

What can perf-cpp do?

Built around Linux's perf subsystem, perf-cpp lets you count and sample hardware events for specific code blocks:

Count hardware events like perf stat, but only around the code you care about, not the entire binary (documentation)
Calculate metrics like cycles per instruction or cache miss ratios from the counters (documentation)
Read counter values without stopping for low-overhead measurements in tight loops (documentation)
Sample instructions and memory accesses like perf [mem] record, but targeted at specific functions (documentation)
Export and analyze results in your code: write samples to CSV, generate flame graphs, or correlate memory accesses with specific data structures
Mix built-in and processor-specific events like cycles, cache misses, or vendor PMU features (documentation)

See various practical examples and the documentation for more details.

Quick Start

Record Hardware Event Statistics

Count hardware events like perf stat—instructions, cycles, cache misses—while your code runs.

#include <perfcpp/event_counter.h>
#include <iostream>

/// Initialize the counter
auto event_counter = perf::EventCounter{};

/// Specify hardware events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
code_to_profile(); /// <-- Statistics are recorded during execution
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto& [event_name, value] : result)
{
    std::cout << event_name << ": " << value << std::endl;
}

Possible output:

seconds:      0.0955897 
instructions: 5.92087e+07
cycles:       4.70254e+08
cache-misses: 1.35633e+07

Note

See the guides on recording event statistics and event statistics on multiple CPUs/threads. Check out the hardware events documentation for built-in and processor-specific events.
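To illustrate what a derived metric means, the following minimal sketch computes cycles per instruction and cache misses per 1,000 instructions by hand from raw counter values. It does not use the perf-cpp metrics API (see the metrics documentation for that); the `Counters` struct and all function names are hypothetical and exist only for this example.

```cpp
#include <cstdio>

/// Raw counter values, e.g., copied from the "Possible output" above.
/// Hypothetical struct for illustration; not part of perf-cpp.
struct Counters
{
    double instructions;
    double cycles;
    double cache_misses;
};

/// Cycles per instruction (CPI); returns 0 if no instructions were counted.
double cycles_per_instruction(const Counters& counters)
{
    return counters.instructions > 0.0 ? counters.cycles / counters.instructions : 0.0;
}

/// Cache misses per 1,000 instructions; returns 0 if no instructions were counted.
double cache_misses_per_kilo_instruction(const Counters& counters)
{
    return counters.instructions > 0.0 ? 1000.0 * counters.cache_misses / counters.instructions : 0.0;
}

/// Print both metrics to the console.
void print_metrics(const Counters& counters)
{
    std::printf("CPI:  %.2f\n", cycles_per_instruction(counters));
    std::printf("MPKI: %.2f\n", cache_misses_per_kilo_instruction(counters));
}
```

With the values from the output above (`Counters{5.92087e+07, 4.70254e+08, 1.35633e+07}`), this yields roughly 7.94 cycles per instruction and roughly 229 cache misses per 1,000 instructions.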

Record Samples

Record snapshots like perf [mem] record—instruction pointer, CPU, timestamp—every 50,000 cycles.

#include <perfcpp/sampler.h>
#include <iostream>

/// Create the sampler
auto sampler = perf::Sampler{};

/// Specify when a sample is recorded: every 50,000th cycle
sampler.trigger("cycles", perf::Period{50000U});

/// Specify what data is included in a sample: timestamp, CPU ID, instruction pointer
sampler.values()
    .timestamp(true)
    .cpu_id(true)
    .instruction_pointer(true);

/// Run the workload
sampler.start();
code_to_profile(); /// <-- Samples are recorded during execution
sampler.stop();

const auto samples = sampler.result();

/// Materialize samples as CSV (-> analyze with python etc) ...
samples.to_csv("samples.csv");

/// ... or print the samples to the console
for (const auto& record : samples)
{
    const auto timestamp = record.metadata().timestamp().value();
    const auto cpu_id = record.metadata().cpu_id().value();
    const auto instruction = record.instruction_execution().logical_instruction_pointer().value();
    
    std::cout 
        << "Time = " << timestamp << " | CPU = " << cpu_id
        << " | Instruction = 0x" << std::hex << instruction << std::dec
        << std::endl;
}

Possible output:

Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c

Note

See the sampling guide for what data you can record. Also check out the sampling on multiple CPUs/threads guide for parallel sampling.
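Once the instruction pointers are extracted from the samples, a common next step is to find the addresses that were sampled most often. The sketch below assumes the pointers have already been copied into a plain `std::vector<std::uint64_t>`; the function name `hottest_instructions` is hypothetical and not part of perf-cpp.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

/// Count how often each instruction pointer was sampled and return
/// (address, count) pairs, most frequently sampled address first.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
hottest_instructions(const std::vector<std::uint64_t>& instruction_pointers)
{
    /// Build a histogram: address -> number of samples
    auto histogram = std::unordered_map<std::uint64_t, std::uint64_t>{};
    for (const auto address : instruction_pointers)
    {
        ++histogram[address];
    }

    /// Sort by descending count; break ties by address for a deterministic order
    auto sorted = std::vector<std::pair<std::uint64_t, std::uint64_t>>(histogram.begin(), histogram.end());
    std::sort(sorted.begin(), sorted.end(), [](const auto& left, const auto& right) {
        return left.second != right.second ? left.second > right.second : left.first < right.first;
    });

    return sorted;
}
```

Because the sampler triggers on a fixed period, addresses that show up more often in this histogram belong to code that consumed more of the trigger event (here: cycles).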

More Examples

The examples/ directory contains examples that show how perf-cpp works:

counting hardware events (examples/statistics)
sampling (examples/sampling)

Building

perf-cpp is designed as a library (static or shared) that can be linked to your application.

# Clone the repository
git clone https://github.com/jmuehlig/perf-cpp.git

# Switch to the repository folder
cd perf-cpp

# Optional: Check out a specific version
git checkout v0.12.6

# Build the library (in build/)
# -DBUILD_EXAMPLES=1        compiles all examples (optional)
# -DBUILD_LIB_SHARED=1      creates the library as a shared one (optional)
# -DGEN_PROCESSOR_EVENTS=1  generates and compiles a .cpp file that adds events specific to the underlying CPU (optional)
cmake . -B build -DBUILD_EXAMPLES=1
cmake --build build

# Optional: Build the examples (into build/examples/bin); requires -DBUILD_EXAMPLES=1
cmake --build build --target examples

Note

See the building guide for how to integrate perf-cpp into CMake projects.

Full Documentation

Building: Integrate perf-cpp into your C++ projects.
Counting Performance Events
- Basics: Record hardware event statistics directly in your application (like perf stat but fine-grained).
- Parallel and Multithreaded: Monitor events across threads and CPU cores.
- Metrics: Combine hardware events into metrics for better analysis.
- Live Access: Read counters without stopping, great for tight loops.
Recording Samples
- Basics: Record samples for specific code paths (like perf record and perf mem record but fine-grained).
- Parallel and Multithreaded: Record samples across multiple threads and CPU cores.
Analyzing Samples
- CSV Export: Export samples for analysis with statistical tools, spreadsheets, or custom scripts.
- Linux Perf Tools: Analyze samples with perf report and perf mem report.
- Flame Graphs: Translate instruction pointers to symbols and generate flame graphs.
- Memory Access Patterns: Link samples to data objects for per-instance memory profiling.
Built-in and Hardware-specific Events: Built-in events and how to add new ones for your CPU.
Perf Paranoid: Configure perf permissions.

System Requirements

Clang / GCC with support for C++17 features.
CMake version 3.10 or higher.
Linux kernel 4.0 or newer (some features require a newer kernel).
perf_event_paranoid setting: Adjust as needed to allow access to performance counters (see the Paranoid Value documentation).
Python3, if you make use of processor-specific hardware event generation.
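To check the current perf_event_paranoid level from within a program, one option is to read `/proc/sys/kernel/perf_event_paranoid`, where the Linux kernel exposes the setting. Lower values are less restrictive, and -1 lifts most restrictions. The following is a minimal sketch; both function names are hypothetical helpers, not part of perf-cpp.

```cpp
#include <fstream>
#include <optional>
#include <string>

/// Read a single integer from a procfs-style file.
/// Returns std::nullopt if the file is missing or does not start with an integer.
std::optional<int> read_integer_setting(const std::string& path)
{
    auto file = std::ifstream{path};
    auto value = int{};
    if (file >> value)
    {
        return value;
    }

    return std::nullopt;
}

/// Current perf_event_paranoid level of the running kernel, if readable.
std::optional<int> perf_event_paranoid()
{
    return read_integer_setting("/proc/sys/kernel/perf_event_paranoid");
}
```

An application could call `perf_event_paranoid()` at startup and print a hint when the level is too restrictive for the events it wants to record; which level is required depends on the events, see the Paranoid Value documentation.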

Contribute and Contact

We welcome contributions and feedback. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: jan.muehlig@tu-dortmund.de.

Further PMU-related Projects

Other profiling tools:

PAPI monitors CPU counters, GPUs, I/O, and more.
Likwid is a set of command-line tools for benchmarking with an extensive wiki.
PerfEvent is a lightweight wrapper for performance counters.
Intel's Instrumentation and Tracing Technology lets you control Intel VTune Profiler from your code.
Want to go lower-level? Use perf_event_open directly.

Resources about (Perf-) Profiling

Papers and articles about profiling (feel free to add your own via pull request):

Academic Papers

Blog Posts

C2C - False Sharing Detection in Linux Perf (2016)
PMU counters and profiling basics. (2018)
Detect false sharing with Data Address Profiling. (2019)
Advanced profiling topics. PEBS and LBR. (2018)
