Skip to content

Comments

feat: prune InnerForest#1635

Open
drahnr wants to merge 20 commits intomainfrom
bernhard-cleanup-inner-forest-for-main
Open

feat: prune InnerForest#1635
drahnr wants to merge 20 commits intomainfrom
bernhard-cleanup-inner-forest-for-main

Conversation

@drahnr
Copy link
Contributor

@drahnr drahnr commented Feb 3, 2026

Important

Targeting main

What

Does cleanup the InnerForest, both the lookup tables/maps and the actual SmtForest.

Required for bounding the in-memory size growth.

Context

Related #1175

How

There are a few requirements:

  1. loading from DB is rather expensive
    • limit maximum loaded storage map slot entries from DB or return limit exceeded
    • use an LRU cache for the entries to avoid DB lookups for the recent history
  2. forest cleanup
  • in order to cleanup effectively we need store additional meta info for each of vault and storage map slots
    • per block changes of accounts -> avoid full iteration over all accounts
    • forest does deduplication of leaves, but does not refcount roots -> need to do this manually, we use *_refcount tables
  • retain the latest version of a storage map root, just like we do with the DB -> required for partial queries to create smt proofs / witnesses

Caveats

  • For proofs we only need the roots, not all entries of individual storage slots
  • We need storage map slot map entries only if we get a request for ::AllEntries , the SmtProof contains the relevant SmtLeaf itself
    • we don't have a way to access all entries from the current SmtForest for a specific key

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements periodic pruning of the InnerForest in-memory data structures to bound memory growth, addressing issue #1175 about pruning account storage maps and vault tables.

Changes:

  • Added pruning mechanism that retains only the last 50 blocks (configurable via HISTORICAL_BLOCK_RETENTION constant) of account vault and storage map data
  • Introduced block-indexed tracking structures (vault_roots_by_block and storage_slots_by_block) to efficiently identify entries eligible for pruning
  • Refactored state/mod.rs to cache block.body() calls and improve code readability

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
crates/store/src/inner_forest/mod.rs Added pruning logic with new tracking data structures, HISTORICAL_BLOCK_RETENTION constant, and helper methods to prune old vault and storage map roots
crates/store/src/inner_forest/tests.rs Added comprehensive test coverage for pruning functionality, including edge cases like empty forest, young chains, boundary conditions, and multiple accounts/slots
crates/store/src/state/mod.rs Refactored to cache block.body() result and renamed block_data to block_bytes for clarity; split forest write lock acquisition into separate line
crates/store/src/db/tests.rs Renamed test function to follow consistent naming convention using db_roundtrip_ prefix
CHANGELOG.md Added enhancement entry for periodic cleanup feature

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 60 out of 81 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@drahnr drahnr force-pushed the bernhard-cleanup-inner-forest-for-main branch 2 times, most recently from 72dad48 to a3be45b Compare February 6, 2026 16:48
@drahnr
Copy link
Contributor Author

drahnr commented Feb 6, 2026

Found one more bug, when using RPC with response ::AllEntries when the last updates was older than history depth kept.

The large changeset is mostly due to tests added.

Copy link
Collaborator

@igamigo igamigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, but left a couple of comments. Not sure if they are fully correct (or if they land within the scope of this PR exactly), but worth reviewing for the long term at least.

@Mirko-von-Leipzig Mirko-von-Leipzig self-requested a review February 9, 2026 18:30
@Mirko-von-Leipzig
Copy link
Collaborator

Mirko-von-Leipzig commented Feb 10, 2026

@bobbinth it looks like the main branch still has the CI/test related protections on it (unless this is just some sort of github artifact)

@drahnr drahnr force-pushed the bernhard-cleanup-inner-forest-for-main branch from 35b80c3 to 098c805 Compare February 10, 2026 10:54
@drahnr drahnr marked this pull request as draft February 10, 2026 15:36
@drahnr drahnr marked this pull request as ready for review February 11, 2026 14:12
@drahnr drahnr force-pushed the bernhard-cleanup-inner-forest-for-main branch from 3e0f462 to e2c5289 Compare February 16, 2026 18:04
@drahnr drahnr requested a review from Copilot February 16, 2026 18:11
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

crates/store/src/state/mod.rs:1149

  • In the loop starting at line 1109, if a SlotData::All request is followed by a SlotData::MapKeys request and the MapKeys request fails (e.g., returns None at line 1112 or an error at line 1123), the forest_guard that was re-acquired at line 1143 will still be held when the error is returned. This keeps the write lock on the forest unnecessarily.

While Rust's RAII will drop the guard eventually, it's better to explicitly drop it before returning errors to avoid holding the lock during error handling. Consider wrapping the MapKeys logic in a scope that ensures the guard is dropped, or use an early break pattern.

        for StorageMapRequest { slot_name, slot_data } in storage_requests {
            let details = match &slot_data {
                SlotData::MapKeys(keys) => forest_guard
                    .get_storage_map_details_for_keys(
                        account_id,
                        slot_name.clone(),
                        block_num,
                        keys,
                    )
                    .ok_or_else(|| DatabaseError::StorageRootNotFound {
                        account_id,
                        slot_name: slot_name.to_string(),
                        block_num,
                    })?
                    .map_err(DatabaseError::MerkleError)?,
                SlotData::All => {
                    // we don't want to hold the forest guard for a prolonged time
                    drop(forest_guard);
                    // we collect all storage items, if the account is small enough or
                    // return `AccountStorageMapDetails::LimitExceeded`
                    let details = self
                        .db
                        .reconstruct_storage_map_from_db(
                            account_id,
                            slot_name.clone(),
                            block_num,
                            Some(
                                // TODO unify this with
                                // `AccountStorageMapDetails::MAX_RETURN_ENTRIES`
                                // and accumulated the limits
                                <QueryParamStorageMapKeyTotalLimit as QueryParamLimiter>::LIMIT,
                            ),
                        )
                        .await?;
                    forest_guard = self.forest.write().await;
                    details
                },
            };

            storage_map_details.push(details);
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

use miden_protocol::EMPTY_WORD;

// TODO this remains expensive with a large history until we implement pruning for DB
// columns
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already done in next

TreeId::new(lineage, block_num.as_u64())
}

fn storage_lineage_id(account_id: AccountId, slot_name: &StorageSlotName) -> LineageId {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Determines the LineageId for storage slots

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what a LineageId is?


### Enhancements

- Added cleanup of old account data from the in-memory forest ([#1175](https://github.com/0xMiden/miden-node/issues/1175)).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Added cleanup of old account data from the in-memory forest ([#1175](https://github.com/0xMiden/miden-node/issues/1175)).

match result {
Ok(details) => details,
Err(err) => {
drop(forest_guard);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to manually drop the guard?


// Use forest for storage map queries
let forest_guard = self.forest.read().await;
let mut forest_guard = self.forest.write().await;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need write access? The only method used is get_ -- is that mutable?

Comment on lines +1131 to +1133
SlotData::All => {
// we don't want to hold the forest guard for a prolonged time
drop(forest_guard);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to split the request into a forest list, and a database list?

As in:

let with_keys = storage_requests.iter().filter(SlotData::MapKeys);
let with_all  = storage_requests.iter().filter(SlotData::All);

let mut forest_guard = ..;
for (slot, keys) in with_keys {
    forest_guard.get_storage...
}
drop(forest_guard);

for slot in with_all {
    self.db.reconstruct(...)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can.

Comment on lines +661 to +662
.checked_sub(HISTORICAL_BLOCK_RETENTION)
.unwrap_or(BlockNumber::GENESIS);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't have BlockNumber::saturating_sub?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't unfortunately

///
/// The `LargeSmtForest` itself is truncated to drop historical versions beyond the cutoff.
///
/// Returns the number of _roots_ that got pruned.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this important?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not, but I'd rather have an indication how many we prune than flying blind.

TreeId::new(lineage, block_num.as_u64())
}

fn storage_lineage_id(account_id: AccountId, slot_name: &StorageSlotName) -> LineageId {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what a LineageId is?

use crate::errors::{DatabaseError, DatabaseSetupError, NoteSyncError, StateSyncError};
use crate::genesis::GenesisBlock;

const ROW_OVERHEAD_BYTES: usize = 2 * size_of::<Word>() + size_of::<u32>() + size_of::<u8>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overhead of what?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants