Skip to content

exd02/hints_analyser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

UTXO Bitfield Compression Statistical Analysis

This project provides a statistical analysis of the bitfield used to represent the unspent transaction output (UTXO) set, as described in the Transaction Output Bitfield Compression research problem. The goal is to visualize and understand the distribution of runs of zeros and ones in the so-called "hints file" to determine which compression method may be most suitable for this data structure.

Problem Context

The UTXO set can be represented as a bitfield over all transaction outputs (TXOs), where each bit indicates whether a TXO is spent (0) or unspent (1). Efficiently compressing this bitfield is important for reducing the size of the hints file, which is used in SwiftSync to speed up initial block download (IBD) and may be integrated into Bitcoin Core in the future.

Note that this repository does not implement compression algorithms; it is a statistical study to inform future development. For more details on the problem, see the original research post.

Output and Results

When you run the analysis, the program displays a graph of the run-length distribution on the screen and also saves it locally as run_distribution.pkl for further inspection.

Additionally, the program outputs statistics similar to the following example, which was generated using the main.hints file based on Bitcoin mainnet at block 916,874 (file size: 447 MB):

runs >= 9 (RLE candidates): 53,510,451x
bit packing with less than 64 bits happens: 52,540,220x
bit packing with more than 64 bits happens: 587,583x
total spent (0): 3,293,272,927
total unspent (1): 167,110,763

Graph output:

image

This plot shows how often consecutive runs of identical bits appear. The X-axis is the run length (number of consecutive equal bits), and the Y-axis is how many times runs of that length occur; both axes use a logarithmic scale (you can turn it off by pressing 'j' on graph view). Red points represent runs of bit 0 (spent) and green points represent runs of bit 1 (unspent). Each point answers: “how many times does a run of N identical bits appear?”

For example, at run length = 1, a green point near 10^8 means single-bit runs like 1 occur about 10^8 times, while a red point near 10^7 means single-bit runs like 0 occur about 10^7 times. At run length = 10, points correspond to sequences such as 0000000000 or 1111111111, and their vertical position gives the frequency of those exact-length runs.

Runs with more than 9 consecutive zeros are considered candidates for run-length encoding (RLE), which could provide significant compression gains. The statistics on bit packing with less or more than 64 bits give a sense of the size and sparsity of the data, helping to evaluate the effectiveness of different packing and compression strategies.

This analysis is intended as a first step toward selecting the most appropriate compression method for the hints file, based on real-world data characteristics.

Building the Project

To build and run this project in Python, follow these steps:

1. Install Python

Make sure you have Python 3.13.11 or newer installed. You can check your version with:

python --version

2. (Optional) Create a Virtual Environment

It is recommended to use a virtual environment to manage dependencies:

python -m venv .venv
source .venv/bin/activate

3. Install Dependencies

If your project has dependencies, list them in a requirements.txt file. Then run:

pip install -r requirements.txt

4. Run the Project

You can now run the analysis script. It accepts an optional parameter for the hint file to analyze. If not specified, it defaults to main.hints:

python hints_analyser.py [hintfile]

After running you will have a run_distribution.pkl file. You can plot the graphic at any time without running the hints_analyser again by executing:

python view_graph.py

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages