ClusterSearcher and RecoCluster by flemmons · Pull Request #365 · ANNIEsoft/ToolAnalysis

flemmons · 2025-10-27T16:02:25Z

-Updated DigitBuilder to be more consistent with and independent of ClusterFinder's corresponding functions
--Added some extra features, such as hit LAPPD count and strip-by-strip LAPPD hits. (some of these are considered experimental/preliminary)
-Added ClusterSearcher to replace ClusterFinder using RecoDigit and RecoCluster classes
-Updated RecoDigit and RecoCluster classes for corresponding use and new features, such as various cluster parameters
-Added NeutronCheck tool as output for RecoCluster information
-Added sample toolchain configfolder for using the new tools

jminock · 2025-11-06T15:02:26Z

UserTools/DigitBuilder/DigitBuilder.cpp

+    }
+
+
+    /*while (!file_singlepe.eof()) {


Please delete the old commented-out code. It unnecessarily clutters the file.

jminock · 2025-11-06T15:10:07Z

DataModel/RecoCluster.cpp

+            //}
+        }
+    }
+    //FIXME: Need a method to have the 123 be equal to the number of operating detectors


Should this be fixed now?

jminock · 2025-11-06T15:17:47Z

DataModel/RecoCluster.cpp

+        double max_angle = 0;
+        double angle;
+        Position i_position, j_position;
+        for (int i = 0; i < fDigitList.size(); i++) {


There are many separate loops of the same size with the same index going on. Could we combine them? If this is a DataModel function, it could be used across multiple Tools so it would be best to improve efficiency within reason.
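As an illustration of the suggestion, here is a minimal sketch of folding several same-range loops into one pass. `ClusterHit`, `ClusterSums`, and `AccumulateSums` are hypothetical stand-ins, not the RecoDigit or RecoCluster API, and the quantities accumulated are only examples.

```cpp
#include <vector>

// Hypothetical stand-in for RecoDigit; only the fields used in this sketch.
struct ClusterHit {
    double charge;
    double time;
};

// All per-cluster accumulators that would otherwise each get their own loop.
struct ClusterSums {
    double totalCharge = 0.0;
    double totalChargeSquared = 0.0;
    double chargeWeightedTime = 0.0;
};

// One pass over the digit list fills every accumulator at once.
ClusterSums AccumulateSums(const std::vector<ClusterHit>& hits) {
    ClusterSums sums;
    for (const ClusterHit& hit : hits) {
        sums.totalCharge += hit.charge;
        sums.totalChargeSquared += hit.charge * hit.charge;
        sums.chargeWeightedTime += hit.charge * hit.time;
    }
    return sums;
}
```

Pairwise quantities such as the maximum opening angle still need a nested loop, but any per-digit sums can share the outer pass.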

S81D · 2025-11-12T17:32:16Z

configfiles/ClusterSearcher/LoadWCSimConfig

+LappdNumStrips 60            ## num channels to construct from each LAPPD
+LappdStripLength 100         ## relative x position of each LAPPD strip, for dual-sided readout [mm]
+LappdStripSeparation 10      ## stripline separation, for calculating relative y position of each LAPPD strip [mm]
+PMTMask configfiles/BeamClusterAnalysisMC/DeadPMTIDs_p2v7.txt ## Which PMTs should be masked out? / are dead?


You can point to the most up to date path configfiles/LoadWCSim/DeadPMTIDs_p2v7.txt

S81D · 2025-11-12T17:52:54Z

UserTools/ClusterSearcher/ClusterSearcher.cpp

+    if (!fisMC){
+      int pmtid = recoDigit->GetDetectorID();
+      unsigned long chankey = pmt_tubeid_to_channelkey[pmtid];
+      if (pmt_gains[chankey]>0) qep/=pmt_gains[chankey];


In line 77 (m_variables.Get("SinglePEGains",singlePEgains);) you grab the SPE gains from the gains file then use it to convert from nQ --> pe for data. It doesn't look like you ever specify the gains file in the Config file. Could you add it to ClusterSearcherConfig? Alternatively you can grab that directly from the Store since the LoadGeometry tool populates it for downstream tools (like this one) to use. See PhaseIITreeMaker for an example.

The current configuration in the provided toolchain is for use on MC simulations, which do not require the conversion. So this variable is not necessary within the configuration. I will add it to the readme file.
Though as I start looking at Data in the next couple of weeks, I'll likely move into taking the gains from the store in the next update to this tool. Thank you for the suggestion.

S81D · 2025-11-12T17:55:30Z

UserTools/NeutronCheck/NeutronCheck.cpp

+            if(fMCParticles->at(i).GetPdgCode()==2112 && fMCParticles->at(i).GetParentPdg()==0) {
+                fTrueNeutronMult++;
+
+                if(fMCParticles->at(i).GetStopTime()>10000) fTrueNeutronDelayed++;


It might be wise to make this a configurable variable in case someone needs to look for neutrons in a different region of interest. Especially considering 10us is used extensively throughout the code.
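A minimal sketch of what a configurable cut could look like. `ConfigStore` is a hypothetical stand-in for the tool's configuration store, and the key name `DelayedTimeCut` is an assumption; the default of 10000 simply mirrors the hard-coded value in the diff.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical minimal key/value store standing in for the tool's config variables.
class ConfigStore {
public:
    void Set(const std::string& key, const std::string& value) { fValues[key] = value; }

    // Returns true and fills `out` only if the key exists and parses as T.
    template <typename T>
    bool Get(const std::string& key, T& out) const {
        auto it = fValues.find(key);
        if (it == fValues.end()) return false;
        std::istringstream stream(it->second);
        T parsed;
        if (!(stream >> parsed)) return false;
        out = parsed;
        return true;
    }

private:
    std::map<std::string, std::string> fValues;
};

// "DelayedTimeCut" is an assumed key name; the default keeps the current behaviour.
double LoadDelayedTimeCut(const ConfigStore& config) {
    double delayedTimeCut = 10000.0;  // same value as the hard-coded cut
    config.Get("DelayedTimeCut", delayedTimeCut);  // left unchanged if the key is absent
    return delayedTimeCut;
}
```

The comparison in the tool would then read `GetStopTime() > delayedTimeCut`, with the value loaded once during initialisation.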

S81D · 2025-11-12T17:56:58Z

UserTools/NeutronCheck/NeutronCheck.cpp

+        //true_Emu*=1000;  //GeV->MeV to match other energies(unneeded, possibly)
+
+        double theta = truevtx->GetDirection().GetTheta();
+        double p = sqrt(pow(true_Emu,2)-pow(105.7,2));


@jminock is there a way to grab the momentum and Q^2 from the Store? I thought the LoadGenieEvent tool will store those values for use.

There absolutely is via: m_data->Stores["GenieInfo"]->Get("EventQ2",TrueQ2). Magnitude of muon momentum is not saved to the Store so this is the intended method to get the true muon momentum
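For reference, the momentum calculation in the diff is the usual relation p = sqrt(E^2 - m^2). A small sketch, assuming the energy passed in is the muon's total energy in MeV; the function and constant names are hypothetical.

```cpp
#include <cmath>

// Muon rest mass in MeV/c^2 (rounded PDG value).
constexpr double kMuonMassMeV = 105.658;

// Total muon energy [MeV] -> momentum magnitude [MeV/c].
// Returns 0 when the energy is below the rest mass, where sqrt would give NaN.
double MuonMomentumMeV(double totalEnergyMeV) {
    const double pSquared =
        totalEnergyMeV * totalEnergyMeV - kMuonMassMeV * kMuonMassMeV;
    return pSquared > 0.0 ? std::sqrt(pSquared) : 0.0;
}
```

The guard matters if the energy ever arrives in GeV rather than MeV, since the argument of the square root then goes negative for any realistic muon.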

jminock

Please make the changes listed. With the large number of pointers, I worry about whether memory is being handled properly. I don't know how necessary all of the pointers are, and I am not enough of an expert to say for certain. I recommend double-checking that all of them follow general best practice before this gets sent off to Level 0 review.

jminock · 2025-11-13T02:35:35Z

DataModel/RecoCluster.cpp

+    double maxCharge=0;
+    int tempPDG = -5;
+    for (RecoDigit* i_digit : fDigitList) {
+        cout<<"Digit parent list size: "<<i_digit->GetParents().size()<<endl;


Please add verbosity conditions to the cout statements throughout this file and others, preferably as Log functions if you would like Marcus to give approval down the road.
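A sketch of the verbosity gating being asked for. `LogMessage` is a hypothetical stand-in that mirrors the `Log(message, level, verbosity)` call shape used in the DigitBuilder diff in this PR; it is not the framework's implementation.

```cpp
#include <iostream>
#include <string>

// Hypothetical stand-in for the framework's Log call: the message is printed
// only when its level does not exceed the configured verbosity.
// Returns whether anything was printed.
bool LogMessage(const std::string& message, int messageLevel, int verbosity) {
    if (messageLevel > verbosity) return false;
    std::cout << message << std::endl;
    return true;
}
```

With this shape, the debug line above would become something like `LogMessage("Digit parent list size: " + std::to_string(n), 4, verbosity)`, and stays silent at the default verbosity.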

jminock · 2025-11-13T02:44:46Z

UserTools/ClusterSearcher/ClusterSearcher.cpp

+
+  // Default Clustering parameters
+  fConfig = ClusterSearcher::kPulseHeightAndClusters;
+  fPmtMinPulseHeight = 5.0;     // minimum pulse height (PEs) //Ioana... initial 1.0


The configuration file should take care of the initialization. Leave the default parameters in the configuration file. If any of these NEED to be initialized outside of the configuration file, please do so in the header file upon declaration. It cuts down on clutter
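A minimal sketch of initialising a default at declaration in the header. The class name is hypothetical; the member name and value follow the diff above.

```cpp
// Hypothetical header excerpt: the default lives next to the declaration,
// so the constructor and Initialise() need no assignment block.
class ClusterSearcherSketch {
public:
    double PmtMinPulseHeight() const { return fPmtMinPulseHeight; }
    void SetPmtMinPulseHeight(double value) { fPmtMinPulseHeight = value; }

private:
    double fPmtMinPulseHeight = 5.0;  // minimum pulse height [PE]
};
```

A value read from the configuration file would then overwrite the default only when the key is present.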

jminock · 2025-11-13T02:46:26Z

UserTools/ClusterSearcher/ClusterSearcher.cpp

+    double temp_gain;
+    while (!file_singlepe.eof()){
+      file_singlepe >> temp_chankey >> temp_gain;
+      if (file_singlepe.eof()) break;


This line seems unnecessary. While loop already includes this condition

I think it would also prevent the emplace call for the last line of the file.

jminock · 2025-11-13T02:51:49Z

UserTools/ClusterSearcher/ClusterSearcher.cpp

+
+      carryon = 1;
+      while( carryon ){
+        carryon = 0;


Why is this for loop inside of a while loop?

Seemingly because only one of the two loops had a well-defined range over which to run. This code was inherited from the HitCleaner tool, so I can't answer the why of that decision with complete certainty, but the code flows to my reading, and works as needed.
This seems like a pet peeve, more than a problem, so for the time being, I'll stand by it as is.

jminock · 2025-11-13T02:54:19Z

UserTools/DigitBuilder/DigitBuilder.cpp


  /////////////////// Usefull header ///////////////////////
-  if(verbosity) cout<<"Initializing Tool DigitBuilder"<<endl;
+  cout<<"Initializing Tool DigitBuilder"<<endl;


I think this has been mentioned before. Please include verbosity conditions, preferably as Log functions

marc1uk · 2025-11-12T23:30:42Z

DataModel/RecoCluster.cpp

+    double sumY=0;
+    Position pos;
+    //std::vector<RecoDigits> hullDigits;
+    std::sort(fDigitList.begin(), fDigitList.end());


is sorting a vector of pointers with the default comparator going to do what you expect? I see you have an operator< for RecoDigit, but I think you still need to provide a comparator lambda to perform the pointer dereference:

std::sort(fDigitList.begin(), fDigitList.end(), [](const RecoDigit* a, const RecoDigit* b){ return *a < *b; });

I'm... just going to remove the RecoCluster::ConvexHull function, at least for now. It's not actually useful right now.
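For the record, should the function return later: sorting a vector of pointers by the pointed-to values needs a comparator that dereferences both arguments, otherwise the addresses are compared. A self-contained sketch, with `TimedDigit` as a hypothetical stand-in for RecoDigit:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for RecoDigit, ordered by time.
struct TimedDigit {
    double time;
    bool operator<(const TimedDigit& other) const { return time < other.time; }
};

// Sorts the pointers by the objects they point to, not by address.
void SortByValue(std::vector<TimedDigit*>& digits) {
    std::sort(digits.begin(), digits.end(),
              [](const TimedDigit* a, const TimedDigit* b) { return *a < *b; });
}
```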

marc1uk · 2025-11-13T12:23:51Z

UserTools/ClusterSearcher/ClusterSearcher.cpp

+  }
+
+  if( !fgClusterSearcher ){
+    assert(fgClusterSearcher);


this seems redundant. If the previous allocation didn't succeed it would have thrown bad_alloc.

marc1uk · 2025-11-13T12:24:31Z

UserTools/ClusterSearcher/ClusterSearcher.cpp

+  }
+
+  if( fgClusterSearcher ){
+


marc1uk · 2025-11-13T12:32:47Z

UserTools/ClusterSearcher/ClusterSearcher.cpp

+  }
+
+  if (!fisMC){
+    ifstream file_singlepe(singlePEgains.c_str());


recommend adding a check file_singlepe.is_open() to catch typos in filename.
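A sketch of that check. `OpenGainsFile` is a hypothetical helper; in the tool the message would presumably go through Log rather than `std::cerr`.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Opens `path` into `file`; reports and returns false if the open fails,
// so a typo in the configured filename is caught before any parsing.
bool OpenGainsFile(const std::string& path, std::ifstream& file) {
    file.open(path);
    if (!file.is_open()) {
        std::cerr << "Could not open single-PE gains file '" << path << "'" << std::endl;
        return false;
    }
    return true;
}
```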

marc1uk · 2025-11-13T17:29:33Z

UserTools/ClusterSearcher/ClusterSearcher.cpp

+
+  // run clustering algorithm
+  // ========================
+  std::vector<RecoCluster*>* ClusterList = (std::vector<RecoCluster*>*)(this->RecoClusters(DigitList));


again shouldn't need the c-style cast - in fact the local variable ClusterList is just an alias for the member variable fRecoClusters, so seems redundant. The aliasing in this Tool makes it hard to track objects.

marc1uk · 2025-11-13T17:31:04Z

UserTools/ClusterSearcher/ClusterSearcher.cpp

+  // ===================
+  for(int idigit=0; idigit<int(fSelectByNeighbours->size()); idigit++ ){
+    RecoDigit* recoDigit = (RecoDigit*)(fSelectByNeighbours->at(idigit));
+    RecoClusterDigit* clusterDigit = new RecoClusterDigit(recoDigit);


why do the digits need to be on the heap?

marc1uk · 2025-11-13T18:45:56Z

UserTools/DigitBuilder/DigitBuilder.cpp

+        //Loop over lines, collect all detector data (should only be one line here)
+        while (getline(file_singlepe, line)) {
+            if (verbosity > 3) std::cout << line << std::endl; //has our stuff;
+            if (line.find("#") != std::string::npos) continue;


be wary that this will skip lines with trailing comments, as well as commented-out lines.
perhaps if(line.empty() || line[0]=='#') continue is a safer/clearer check
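One alternative, sketched below, is to strip the comment and keep whatever data precedes it, then skip the line only if nothing is left. `StripComment` is a hypothetical helper, and it assumes `#` never appears inside a data field.

```cpp
#include <string>

// Returns the data part of a line: the text before any '#',
// with trailing spaces, tabs, and carriage returns removed.
std::string StripComment(const std::string& line) {
    std::string data = line.substr(0, line.find('#'));
    const std::size_t last = data.find_last_not_of(" \t\r");
    return last == std::string::npos ? std::string() : data.substr(0, last + 1);
}
```

A fully commented-out line and a blank line both come back empty, while `1,2,3 # note` yields `1,2,3`.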

marc1uk · 2025-11-13T18:48:36Z

UserTools/DigitBuilder/DigitBuilder.cpp

+            if (verbosity > 3) std::cout << line << std::endl; //has our stuff;
+            if (line.find("#") != std::string::npos) continue;
+            std::vector<std::string> DataEntries;
+            boost::split(DataEntries, line, boost::is_any_of(","), boost::token_compress_on);


i think token_compress_on merges repeated tokens? Is this desirable? If there are repeated tokens, doesn't this suggest an empty column, and by compressing that, your later columns will be shifted?
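To make the concern concrete: with `token_compress_on`, adjacent delimiters are merged, so `a,,c` splits into two fields and the third column moves into the second slot. The sketch below swaps in a plain `std::getline` split instead of Boost, purely to show a split that keeps empty fields; `SplitKeepEmpty` is a hypothetical helper.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits on ',' and keeps empty fields, so column indices stay stable
// even when a column is left blank.
std::vector<std::string> SplitKeepEmpty(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) {
        fields.push_back(field);
    }
    // getline does not report a final empty field after a trailing comma; restore it.
    if (!line.empty() && line.back() == ',') fields.emplace_back();
    return fields;
}
```

Staying with Boost, passing `boost::token_compress_off` (the default) should give the same field-preserving behaviour.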

marc1uk · 2025-11-13T18:50:18Z

UserTools/DigitBuilder/DigitBuilder.cpp

+              Log("This HIT'S TIME AND CHARGE: " + to_string(ahit.GetTime()) + ", " + to_string(ahit.GetCharge()),v_debug,verbosity);
            double hitTime = ahit.GetTime()*1.0;
-          	if(hitTime>-10 && hitTime<40) {
+          	if(hitTime>-10 && hitTime<70) {


better for magic numbers to be configuration variables, especially if they are subject to change

-Changed RecoCluster and RecoDigit lists in several tools to be vectors of objects, rather than vectors of pointers, to avoid memory complications
-Removed unused convex hull function from RecoCluster class
-Simplified CalcAS function in RecoCluster class by consolidating for loops
-Removed several instances of commented-out code from older versions
-Removed several debug outputs
-Altered hard-coded time window values in several instances to rely on configuration input
-Added use of the true Q2 value from the GenieInfo store to NeutronCheck's output
-Removed several uncontrolled cout lines, and replaced useful ones with Log
-Tidied the Instance() function of ClusterSearcher
-Added vertex information to NeutronCheck

jminock · 2026-01-13T15:49:30Z

Thank you @flemmons for the updates! Unfortunately, this branch has conflicts with the main branch that must be resolved in order to be merged. I will wait for the conflicts to be resolved and for the workflow check to be performed such that ToolAnalysis is confirmed to compile before I review the actual changes to the files

flemmons · 2026-01-14T02:41:15Z

Conflicts were just Factory and Unity listing different added tools. I've cleared them.

jminock · 2026-01-14T14:21:40Z

Okay, but could you still resolve those conflicts? Also, it's great that ToolAnalysis can compile in your workspace, but it would be better to get confirmation that ToolAnalysis can compile in a general workspace (on GitHub via workflow). Given that this PR is self contained, there is no reason to not resolve these conflicts. I will look at the other files in the meantime.

jminock · 2026-02-08T23:01:43Z

configfiles/ClusterSearcher/EventSelectorConfig

+MRDRecoCut 0
+RecoPMTVolCut 0
+RecoFVCut 0 
+ArgonFV 0


Not a holdup for merging, but what is ArgonFV? It isn't expected by the current EventSelector.

ArgonFV is a holdover from a previous experimental study. It...shouldn't be there. Good catch.

jminock · 2026-02-08T23:06:50Z

configfiles/ClusterSearcher/LoadGenieEventConfig

@@ -0,0 +1,12 @@
+verbosity 0
+FluxVersion 0  # use 0 to load genie files based on bnb_annie_0000.root etc files


The flux files corresponding to version 0 are very outdated. None of the GENIE files I produced use that form of flux file. I don't even know where those flux files are. I'm pointing this out as a courtesy for your consideration.

That's good to know... And likely explains why I've seen strange behavior in the extracted true q2 values I've been working with since adding them from LoadGenieEvent. I appreciate it.

jminock · 2026-02-08T23:12:24Z

configfiles/ClusterSearcher/ToolsConfig

+MCParticleProperties MCParticleProperties ./configfiles/ClusterSearcher/MCParticlePropertiesConfig
+DigitBuilder DigitBuilder ./configfiles/ClusterSearcher/DigitBuilderConfig
+ClusterSearcher ClusterSearcher ./configfiles/ClusterSearcher/ClusterSearcherConfig
+ClusterSearcher2 ClusterSearcher ./configfiles/ClusterSearcher/ClusterSearcher2Config


You have to use ClusterSearcher twice? Would a second run of ClusterSearcher overwrite the first use? Sorry, just confused why this is here. Also, how does the information get written out to a file? Does NeutronCheck double as an output file maker? Tools are meant to be singular in purpose, and there already exists output file makers (PhaseIITreeMaker)

ClusterSearcher is designed to be used multiple times, yes. It will add the new clusters to the prior list. This enables different sorts of clusters to be made for multi-level analysis. The current setup is running ClusterSearcher to make concentrated Cherenkov ring-like clusters to analyze the initial muon emission for vertexing and CC verification, and then ClusterSearcher2 collects other clusters without any spatial density requirements to be tested for neutron identification. The different types of clusters are separated by the ClusterMode tag.

As to NeutronCheck, it was meant to be more of a debug tool, akin to VertexGeometryCheck but for the neutron analysis. Its scope grew as I needed more relevant values, but it still doesn't represent a final-state of analysis. In the current ToolAnalysis, it's the only 'end of toolchain' tool that uses RecoClusters, and so adding it was the best option for this toolchain. However, I'm also not sure that PhaseIITreeMaker should be the be-all, end-all output tool for all of ANNIE. Having all output centered in a single tool will make it very cumbersome for focused analyses that only need certain variables. It may be worth considering having separate output tools for separate analyses. As I finalize the neutron multiplicity analysis, there will likely be a more concrete output, whether that's adding the important variables to PhaseIITreeMaker, advancing NeutronCheck to a more immediately useful form, or something else.

Gotcha, please document this and how to use ClusterSearcher in a central, accessible place.

As for the opinion on separate output Tools: having multiple Tools with similar functionality that could be altered via a configuration file is against the intended direction of ANNIE analysis and software. Different sets of Tools for different analyses will have duplicated work, divergent branches, and different standards for different analyses. Yes, PhaseIITreeMaker is cumbersome to work with. But I can guarantee that standardizing different Tools with the same functionality is much more cumbersome. There already is ANNIETreeMaker as another output file maker. I'm not opposed to that Tool being the new standard, but there should be a singular standard across analyses. Making a new separate output Tool will result in either: it continuing to advance and develop after you, diverging from other analysis ToolChains and siloing off neutron analysis from everything else; or it getting scrapped in favor of shared output in a Tool, losing important work done now and making unnecessary work in the future. Do what you need for your present analysis, but please keep in mind that existing resources are here to help each of us, and the work we do serves ourselves, each other, and those who come after.

That is detailed in the ClusterSearcher tool's readme file.

You're right. I'll plan to fold the final analysis outputs into PhaseIITreeMaker or the like as I complete them, and hold NeutronCheck to the level of a debugging check of mid-level values. But, in that role, and for the time being, it is useful, and should not hold back this pull request. Take it as a demonstration of what ClusterSearcher is doing, more than any analysis output.

jminock · 2026-02-08T23:17:22Z

UserTools/HitCleaner/HitCleaner.cpp

+        //}
+    }
+    //FIXME: Need a method to have the 123 be equal to the number of operating detectors
+    double ucharge_balance = sqrt((total_QSquared) / (total_Q * total_Q) - (1. / 123.));


123 is a magic number. While it doesn't impact functionality for now, it can be troublesome for documentation and updating it. What does 123 represent? What is an "operating detector"?
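A sketch of the same expression with the detector count passed in rather than hard-coded. How that count is obtained (geometry, PMT mask, configuration) is left open here, and `ChargeBalance` and its sentinel return value are assumptions for illustration.

```cpp
#include <cmath>
#include <vector>

// Charge balance, sqrt( sum(q^2) / (sum q)^2 - 1/N ), for hit charges over
// `nDetectors` operating channels. `nDetectors` replaces the hard-coded 123.
// Returns -1 when the quantity is undefined (no charge or no detectors).
double ChargeBalance(const std::vector<double>& charges, int nDetectors) {
    double totalQ = 0.0;
    double totalQSquared = 0.0;
    for (double q : charges) {
        totalQ += q;
        totalQSquared += q * q;
    }
    if (totalQ <= 0.0 || nDetectors <= 0) return -1.0;
    const double argument = totalQSquared / (totalQ * totalQ) - 1.0 / nDetectors;
    // Guard against a negative argument, e.g. more hit channels than nDetectors.
    return argument > 0.0 ? std::sqrt(argument) : 0.0;
}
```

With N as a parameter, the value is 0 for charge spread evenly over all N channels and approaches 1 as the charge concentrates in a single channel.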

jminock

Please address all the comments. I left some that are more general questions that would be nice to have accessible documentation addressing, and shouldn't be an issue to create. There are some comments that ask about memory. Please address them as they do impact functionality

jminock · 2026-02-08T23:22:45Z

UserTools/HitCleaner/HitCleaner.cpp

-    for(int idigit=0; idigit<myCluster->GetNDigits(); idigit++ ){
-      RecoDigit* myDigit = (RecoDigit*)(myCluster->GetDigit(idigit));
+    for(int idigit=0; idigit<myCluster.GetNDigits(); idigit++ ){
+      RecoDigit* myDigit = new RecoDigit;


You are creating pointers in a nested loop. How are you freeing the memory once they're done? This looks like it can and will cause memory issues.
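A sketch of the allocation-free alternative, assuming the cluster now stores its digits by value as the updated commit message describes. `SimpleDigit`, `SimpleCluster`, and `SumClusterCharge` are hypothetical stand-ins, not the DataModel classes.

```cpp
#include <vector>

// Hypothetical stand-in for RecoDigit.
struct SimpleDigit {
    double charge;
};

// Hypothetical stand-in for RecoCluster, holding digits by value.
struct SimpleCluster {
    std::vector<SimpleDigit> digits;
    int GetNDigits() const { return static_cast<int>(digits.size()); }
    const SimpleDigit& GetDigit(int index) const { return digits.at(index); }
};

// Binds a const reference inside the nested loop instead of calling new,
// so there is nothing to delete and nothing to leak.
double SumClusterCharge(const std::vector<SimpleCluster>& clusters) {
    double total = 0.0;
    for (const SimpleCluster& cluster : clusters) {
        for (int idigit = 0; idigit < cluster.GetNDigits(); idigit++) {
            const SimpleDigit& digit = cluster.GetDigit(idigit);
            total += digit.charge;
        }
    }
    return total;
}
```

If a modifiable copy is really needed inside the loop, a local `SimpleDigit copy = cluster.GetDigit(idigit);` on the stack serves the same purpose without manual cleanup.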

jminock · 2026-02-08T23:24:58Z

UserTools/NeutronCheck/NeutronCheck.cpp

Why does this Tool exist when there already exists output file maker Tools?

As to NeutronCheck, it was meant to be more of a debug tool, akin to VertexGeometryCheck but for the neutron analysis. Its scope grew as I needed more relevant values, but it still doesn't represent a final-state of analysis. In the current ToolAnalysis, it's the only 'end of toolchain' tool that uses RecoClusters, and so adding it was the best option for this toolchain. However, I'm also not sure that PhaseIITreeMaker should be the be-all, end-all output tool for all of ANNIE. Having all output centered in a single tool will make it very cumbersome for focused analyses that only need certain variables. It may be worth considering having separate output tools for separate analyses. As I finalize the neutron multiplicity analysis, there will likely be a more concrete output, whether that's adding the important variables to PhaseIITreeMaker, advancing NeutronCheck to a more immediately useful form, or something else.

jminock self-requested a review October 30, 2025 13:03

jminock self-assigned this Oct 30, 2025

S81D self-assigned this Oct 30, 2025

jminock reviewed Nov 6, 2025

S81D reviewed Nov 12, 2025

jminock suggested changes Nov 13, 2025

marc1uk reviewed Nov 13, 2025

jminock added the waiting for submitter label Dec 4, 2025

jminock added the Conflicts label Jan 13, 2026

Merge branch 'Application' into Application

1bae850

jminock added bug and removed Conflicts labels Jan 16, 2026

flemmons added 2 commits January 23, 2026 19:08

Removed ClusterCA from NeutronCheck to fix bug.

8a6499f

Modified HitCleaner for non-pointer Digits list.

0236d07

jminock removed the bug label Jan 26, 2026

Update/fix to ClusterSearcher sample toolchain.

efe448a

jminock added Ready for Software meeting discussion and removed waiting for submitter labels Jan 27, 2026

jminock reviewed Feb 8, 2026
