Skip to content

Add efficient ClusterTable-based split_frequencies C++ function for SplitFrequency(NULL)#252

Merged
ms609 merged 11 commits intomainfrom
copilot/update-splitfrequency-implementation
Feb 17, 2026
Merged

Add efficient ClusterTable-based split_frequencies C++ function for SplitFrequency(NULL)#252
ms609 merged 11 commits intomainfrom
copilot/update-splitfrequency-implementation

Conversation

Copy link
Contributor

Copilot AI commented Feb 16, 2026

SplitFrequency(NULL, forest) used an O(N·U·M) R-level loop comparing each unique split against each tree's splits. This adds a C++ implementation using the ClusterTable pairwise matching algorithm from consensus.cpp for efficient frequency counting.

C++ changes

  • src/consensus.cpp: Added calc_split_frequencies template — mirrors calc_consensus_tree but removes threshold filtering and early exit, iterates all trees, and uses std::map<vector<Rbyte>> for split deduplication across reference trees. Returns both the packed split matrix and per-split counts.
  • src/consensus.cpp: Added split_frequencies() entry point with the same stack/heap allocation strategy as consensus_tree().
  • src/RcppExports.cpp: Registered _TreeTools_split_frequencies wrapper and CallEntries entry.
  • R/RcppExports.R: Added R-side .Call wrapper.

R changes

  • R/Support.R: SplitFrequency() NULL path now calls split_frequencies() after RenumberTips + Preorder, then normalizes the ClusterTable output (filters trivial splits, complements bit patterns to match as.Splits convention where tip 1 is in-group).

Known remaining work

The R post-processing layer still has formatting gaps vs the as.Splits output:

  • Missing dimnames (node-number row names)
  • Some complement splits not normalizing correctly (e.g. 67 vs 18)
  • Minor ordering differences

The core C++ counting is correct; remaining issues are purely in the R format conversion.

Original prompt

This section details on the original issue you should resolve

<issue_title>Efficient implementation of SplitFrequency(NULL)</issue_title>
<issue_description>Computing the frequency of all splits would be MUCH faster if we used a ClusterTable approach.</issue_description>

<agent_instructions>Please update the SplitFrequency(NULL, forest) case so that it uses the efficient ClusterTable approach implemented in src/consensus.cpp. Essentially we want to calc_consensus_tree but without the early exit clause, so that we get counts of all splits in forest.

Recall that you will need to add any new API functions to RcppExports-manual.cpp as we can't use Rcpp::compileAttributes in this package.

Test cases that must pass tip tests/testthat/test-Support.R:
test_that("Node supports calculated correctly", {
treeSample <- list(
correct = ape::read.tree(text = "((((((A,B),C),D),E),F),out);"),
swapFE = ape::read.tree(text = "((((((A,B),C),D),F),E),out);"),
DEClade = ape::read.tree(text = "(((((A,B),C),(D,E)),F),out);"),
swapBC = ape::read.tree(text = "((((((A,C),B),D),E),F),out);"),
DbyA = ape::read.tree(text = "((((((A,D),C),B),E),F,G),out);")
)
expect_equal(
SplitFrequency(treeSample$correct, treeSample),
c("10" = 4, "11" = 4, "12" = 4, "13" = 3)
)

expect_error(SplitFrequency(NULL, treeSample), "must bear identical")
sameTips <- KeepTip(treeSample, TipLabels(treeSample$correct))
sameSplits <- do.call(c, as.Splits(sameTips))
expect_equal(SplitFrequency(sameTips),
structure(sameSplits[[!duplicated(sameSplits)]],
count = c(4, 4, 4, 3, 1, 1, 1, 1, 1))
)

monoSplit <- ape::read.tree(text = "((a, b, c, d), (e, f, g));")
expect_equal(SplitFrequency(list(monoSplit)),
structure(as.Splits(monoSplit), count = 1))

Internal nodes on each side of root

balanced <- ape::read.tree(text="((D, (E, (F, out))), (C, (A, B)));")
freq <- SplitFrequency(balanced, treeSample)
expect_equal(freq,
c("9" = 4, "10" = 4, "11" = 4, "12" = 4, "13" = 3)[names(freq)])

})
</agent_instructions>

Comments on the Issue (you are @copilot in this section)


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Co-authored-by: ms609 <1695515+ms609@users.noreply.github.com>
Copilot AI changed the title [WIP] Update SplitFrequency(NULL) to use ClusterTable approach Add efficient ClusterTable-based split_frequencies C++ function for SplitFrequency(NULL) Feb 16, 2026
Copilot AI requested a review from ms609 February 16, 2026 13:44
@ms609 ms609 marked this pull request as ready for review February 16, 2026 15:42
@codecov
Copy link

codecov bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 94.93671% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.19%. Comparing base (2d350fe) to head (2174acf).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/consensus.cpp 90.90% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #252      +/-   ##
==========================================
- Coverage   96.20%   96.19%   -0.02%     
==========================================
  Files          75       75              
  Lines        5481     5618     +137     
==========================================
+ Hits         5273     5404     +131     
- Misses        208      214       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link

github-actions bot commented Feb 16, 2026

⚠️ This benchmark result is outdated. See the latest comment below.

Performance benchmark results

Call Status Change Time (ms)
as.Splits(bigTrees) ⚪ NSD 5.95% 24.8 →
25.2, 22
as.Splits(someTrees) ⚪ NSD 1.87% 11.7 →
11.5, 11.4
Consensus(forest1k.888, check = FALSE) ⚪ NSD 0.94% 96.6 →
95.9, 95.6
Consensus(forest201.80, check = FALSE) ⚪ NSD -0.76% 4.01 →
4.04, 4.07
Consensus(forest21.260, 0.5, FALSE) ⚪ NSD 0.23% 1.22 →
1.22, 1.22
Consensus(forest21.260) ⚪ NSD 0.36% 1.24 →
1.24, 1.24
Consensus(forestMaj, 0.5, FALSE) ⚪ NSD 1.44% 3.03 →
2.99, 2.98
DropTip(tr2000, 5) ⚪ NSD 0.71% 17 →
16.9, 16.9
DropTip(tr80, 5) ⚪ NSD -0.79% 0.105 →
0.104, 0.107
DropTip(unlen2k, 5) 🟠 Slower 🙁 -39.17% 0.207 →
0.217, 0.295
DropTip(unlen80, 5) ⚪ NSD -2.94% 0.0398 →
0.0405, 0.0414
lapply(bigSplits, as.phylo) 🟣 ~same 0.87% 31.7 →
31.4, 31.4
lapply(someSplits, as.phylo) ⚪ NSD -0.64% 14.1 →
14.2, 14.1
PathLengths(tr2000, full = TRUE) ⚪ NSD 0.62% 16.4 →
16.2, 16.4
PathLengths(tr80, full = TRUE) 🟠 Slower 🙁 -230.14% 0.0349 →
0.113, 0.116
PathLengths(tr80Unif, full = TRUE) 🟠 Slower 🙁 -279.11% 0.0307 →
0.116, 0.117
RootTree(tr2000, 5) ⚪ NSD 2.38% 0.394 →
0.375, 0.394
RootTree(tr80, c("t3", "t36")) ⚪ NSD -1.58% 0.0691 →
0.0688, 0.0709
RootTree(tr80, "t3") ⚪ NSD -0.59% 0.0494 →
0.0489, 0.0502
RootTree(tr80, "t30") ⚪ NSD -0.57% 0.0497 →
0.0491, 0.0506
RootTree(unlen2k, 5) ⚪ NSD -1.03% 0.325 →
0.329, 0.329
RootTree(unlen80, c("t3", "t36")) ⚪ NSD -1.25% 0.064 →
0.0636, 0.0657
RootTree(unlen80, "t3") ⚪ NSD -0.49% 0.043 →
0.0428, 0.0435
RootTree(unlen80, "t30") ⚪ NSD -1.99% 0.0428 →
0.0432, 0.0441
TreeDist::RobinsonFoulds(forest201.80) ⚪ NSD 0.39% 16.7 →
16.6, 16.6
TreeDist::RobinsonFoulds(forest21.888) ⚪ NSD 0.04% 3.31 →
3.29, 3.33
TreeTools:::path_lengths(tr80$edge, tr80$edge.length, FALSE) ⚪ NSD -1.37% 0.103 →
0.104, 0.105
TreeTools:::postorder_order(bal40) ⚪ NSD 1.7% 0.00176 →
0.0017, 0.00175
TreeTools:::postorder_order(bal40k) ⚪ NSD 0.26% 0.434 →
0.432, 0.433
TreeTools:::postorder_order(dbal40) ⚪ NSD 1.12% 0.00187 →
0.00182, 0.00187
TreeTools:::postorder_order(dbal40k) ⚪ NSD 0.42% 2.4 →
2.38, 2.41
TreeTools:::postorder_order(dpec40) ⚪ NSD 1.61% 0.0031 →
0.00305, 0.00305
TreeTools:::postorder_order(dpec40k) 🟣 ~same -0.17% 3610 →
3620, 3620
TreeTools:::postorder_order(drnd80) ⚪ NSD 0.97% 0.00517 →
0.00513, 0.00511
TreeTools:::postorder_order(nbal40) ⚪ NSD 0.5% 0.0022 →
0.00217, 0.00219
TreeTools:::postorder_order(nbal40k) ⚪ NSD -0.34% 2.42 →
2.41, 2.45
TreeTools:::postorder_order(npec40) ⚪ NSD 1.16% 0.00337 →
0.00334, 0.00333
TreeTools:::postorder_order(npec40k) 🟣 ~same 0.34% 3650 →
3640, 3640
TreeTools:::postorder_order(nrnd80) ⚪ NSD 0.2% 0.00566 →
0.00564, 0.00565
TreeTools:::postorder_order(pec40) ⚪ NSD 3.42% 0.00175 →
0.00169, 0.00169
TreeTools:::postorder_order(pec40k) ⚪ NSD -0.44% 0.431 →
0.43, 0.437
TreeTools:::postorder_order(rnd80) ⚪ NSD 4.49% 0.00222 →
0.00214, 0.0021

@github-actions
Copy link

github-actions bot commented Feb 16, 2026

⚠️ This benchmark result is outdated. See the latest comment below.

Performance benchmark results

Call Status Change Time (ms)
as.Splits(bigTrees) ⚪ NSD 4.83% 23.9 →
20.8, 23.6
as.Splits(someTrees) ⚪ NSD -1.89% 11 →
11.3, 11.1
Consensus(forest1k.888, check = FALSE) ⚪ NSD -3.08% 95.3 →
101, 96.7
Consensus(forest201.80, check = FALSE) ⚪ NSD -0.47% 4.01 →
4.07, 4.01
Consensus(forest21.260, 0.5, FALSE) ⚪ NSD -0.55% 1.23 →
1.23, 1.23
Consensus(forest21.260) ⚪ NSD -0.11% 1.24 →
1.24, 1.25
Consensus(forestMaj, 0.5, FALSE) ⚪ NSD -1.03% 2.93 →
2.96, 2.95
DropTip(tr2000, 5) ⚪ NSD -0.51% 16.9 →
17, 17
DropTip(tr80, 5) ⚪ NSD -1.01% 0.104 →
0.106, 0.103
DropTip(unlen2k, 5) ⚪ NSD -37.07% 0.208 →
0.289, 0.216
DropTip(unlen80, 5) ⚪ NSD 0.84% 0.0407 →
0.0403, 0.0405
lapply(bigSplits, as.phylo) 🟣 ~same -0.84% 30.8 →
31.1, 31
lapply(someSplits, as.phylo) 🟣 ~same -2.82% 13.5 →
13.9, 13.7
PathLengths(tr2000, full = TRUE) ⚪ NSD -0.96% 16.3 →
16.4, 16.4
PathLengths(tr80, full = TRUE) ⚪ NSD 2.56% 0.0294 →
0.113, 0.0277
PathLengths(tr80Unif, full = TRUE) ⚪ NSD 2.73% 0.0309 →
0.115, 0.0294
RootTree(tr2000, 5) ⚪ NSD -0.33% 0.375 →
0.377, 0.377
RootTree(tr80, c("t3", "t36")) ⚪ NSD -1.83% 0.0685 →
0.0702, 0.0692
RootTree(tr80, "t3") ⚪ NSD -3.16% 0.0478 →
0.0495, 0.0491
RootTree(tr80, "t30") ⚪ NSD -2.03% 0.0485 →
0.0495, 0.0494
RootTree(unlen2k, 5) ⚪ NSD 0.13% 0.331 →
0.33, 0.33
RootTree(unlen80, c("t3", "t36")) ⚪ NSD -1.8% 0.0626 →
0.0638, 0.0635
RootTree(unlen80, "t3") ⚪ NSD -1.58% 0.0424 →
0.0433, 0.0429
RootTree(unlen80, "t30") ⚪ NSD -1.91% 0.0424 →
0.0433, 0.0432
TreeDist::RobinsonFoulds(forest201.80) ⚪ NSD 0.17% 16.1 →
16.2, 16.1
TreeDist::RobinsonFoulds(forest21.888) ⚪ NSD -1.56% 3.28 →
3.32, 3.34
TreeTools:::path_lengths(tr80$edge, tr80$edge.length, FALSE) ⚪ NSD -0.1% 0.103 →
0.105, 0.102
TreeTools:::postorder_order(bal40) ⚪ NSD 1.73% 0.00173 →
0.0017, 0.00171
TreeTools:::postorder_order(bal40k) ⚪ NSD -1.97% 0.424 →
0.435, 0.429
TreeTools:::postorder_order(dbal40) ⚪ NSD 0.6% 0.00184 →
0.00182, 0.00183
TreeTools:::postorder_order(dbal40k) ⚪ NSD -3.03% 2.32 →
2.41, 2.37
TreeTools:::postorder_order(dpec40) ⚪ NSD 0.62% 0.00308 →
0.00305, 0.00306
TreeTools:::postorder_order(dpec40k) ⚪ NSD 0.17% 3620 →
3620, 3610
TreeTools:::postorder_order(drnd80) ⚪ NSD -0.2% 0.00513 →
0.00512, 0.00516
TreeTools:::postorder_order(nbal40) ⚪ NSD 0.46% 0.00217 →
0.00214, 0.00218
TreeTools:::postorder_order(nbal40k) 🟣 ~same -2.77% 2.34 →
2.41, 2.4
TreeTools:::postorder_order(npec40) ⚪ NSD 0.63% 0.00337 →
0.00334, 0.00335
TreeTools:::postorder_order(npec40k) ⚪ NSD -0.02% 3620 →
3620, 3620
TreeTools:::postorder_order(nrnd80) ⚪ NSD 0.35% 0.00568 →
0.00566, 0.00566
TreeTools:::postorder_order(pec40) ⚪ NSD 1.67% 0.00173 →
0.0017, 0.00171
TreeTools:::postorder_order(pec40k) ⚪ NSD -1.41% 0.424 →
0.433, 0.425
TreeTools:::postorder_order(rnd80) ⚪ NSD 0.46% 0.00216 →
0.00216, 0.00215

Repository owner deleted a comment from github-actions bot Feb 16, 2026
Repository owner deleted a comment from github-actions bot Feb 17, 2026
@ms609 ms609 merged commit 7e5f175 into main Feb 17, 2026
17 checks passed
@ms609 ms609 deleted the copilot/update-splitfrequency-implementation branch February 17, 2026 08:16
@github-actions
Copy link

Performance benchmark results

Call Status Change Time (ms)
as.Splits(bigTrees) ⚪ NSD -1.56% 24 →
22.2, 25.1
as.Splits(someTrees) ⚪ NSD -0.2% 11.4 →
11.4, 11.3
Consensus(forest1k.888, check = FALSE) ⚪ NSD -4.48% 95.4 →
102, 97.8
Consensus(forest201.80, check = FALSE) ⚪ NSD -0.89% 4.05 →
4.11, 4.06
Consensus(forest21.260, 0.5, FALSE) ⚪ NSD -0.79% 1.22 →
1.24, 1.23
Consensus(forest21.260) ⚪ NSD -1.15% 1.24 →
1.27, 1.24
Consensus(forestMaj, 0.5, FALSE) ⚪ NSD -1.13% 2.97 →
3.01, 3
DropTip(tr2000, 5) ⚪ NSD -0.5% 16.6 →
16.5, 16.9
DropTip(tr80, 5) ⚪ NSD 0.77% 0.106 →
0.104, 0.106
DropTip(unlen2k, 5) ⚪ NSD 1.03% 0.213 →
0.207, 0.216
DropTip(unlen80, 5) ⚪ NSD 0.29% 0.041 →
0.0404, 0.0412
lapply(bigSplits, as.phylo) ⚪ NSD -1.07% 31.1 →
31, 31.5
lapply(someSplits, as.phylo) ⚪ NSD -0.79% 14 →
14.2, 13.9
PathLengths(tr2000, full = TRUE) ⚪ NSD -0.26% 15.9 →
15.7, 16
PathLengths(tr80, full = TRUE) ⚪ NSD -273.64% 0.0294 →
0.111, 0.0322
PathLengths(tr80Unif, full = TRUE) ⚪ NSD 2.78% 0.0317 →
0.028, 0.0319
RootTree(tr2000, 5) 🟣 ~same -5.26% 0.376 →
0.394, 0.396
RootTree(tr80, c("t3", "t36")) ⚪ NSD -1.58% 0.0703 →
0.0712, 0.0716
RootTree(tr80, "t3") ⚪ NSD -2.38% 0.049 →
0.0502, 0.0502
RootTree(tr80, "t30") ⚪ NSD -2.23% 0.0495 →
0.0509, 0.0503
RootTree(unlen2k, 5) ⚪ NSD 1.55% 0.33 →
0.326, 0.323
RootTree(unlen80, c("t3", "t36")) ⚪ NSD -0.11% 0.065 →
0.0654, 0.0647
RootTree(unlen80, "t3") ⚪ NSD 0.05% 0.0433 →
0.0433, 0.0433
RootTree(unlen80, "t30") ⚪ NSD -0.23% 0.0436 →
0.044, 0.0433
TreeDist::RobinsonFoulds(forest201.80) ⚪ NSD -0.39% 16.3 →
16.6, 16
TreeDist::RobinsonFoulds(forest21.888) ⚪ NSD -0.58% 3.29 →
3.33, 3.28
TreeTools:::path_lengths(tr80$edge, tr80$edge.length, FALSE) ⚪ NSD 1.42% 0.103 →
0.1, 0.103
TreeTools:::postorder_order(bal40) ⚪ NSD 1.71% 0.00175 →
0.0017, 0.00174
TreeTools:::postorder_order(bal40k) ⚪ NSD 1.28% 0.433 →
0.424, 0.432
TreeTools:::postorder_order(dbal40) ⚪ NSD 1.02% 0.00186 →
0.00183, 0.00186
TreeTools:::postorder_order(dbal40k) ⚪ NSD 1.56% 2.38 →
2.34, 2.38
TreeTools:::postorder_order(dpec40) ⚪ NSD 0% 0.00307 →
0.00306, 0.00307
TreeTools:::postorder_order(dpec40k) 🟣 ~same 0.08% 3620 →
3610, 3610
TreeTools:::postorder_order(drnd80) ⚪ NSD 0.39% 0.00519 →
0.00516, 0.00517
TreeTools:::postorder_order(nbal40) ⚪ NSD 0.46% 0.0022 →
0.00217, 0.00221
TreeTools:::postorder_order(nbal40k) ⚪ NSD 1.17% 2.42 →
2.37, 2.4
TreeTools:::postorder_order(npec40) ⚪ NSD -0.9% 0.00334 →
0.00338, 0.00337
TreeTools:::postorder_order(npec40k) ⚪ NSD 0.01% 3630 →
3620, 3630
TreeTools:::postorder_order(nrnd80) ⚪ NSD 0.35% 0.00572 →
0.00571, 0.00569
TreeTools:::postorder_order(pec40) ⚪ NSD 1.15% 0.00174 →
0.00173, 0.00172
TreeTools:::postorder_order(pec40k) ⚪ NSD 0.76% 0.43 →
0.425, 0.429
TreeTools:::postorder_order(rnd80) ⚪ NSD 0.91% 0.00219 →
0.00215, 0.00218

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Efficient implementation of SplitFrequency(NULL)

2 participants