Improved operators, locking mechanism, and seeding initial population with decision trees#65

Open
gAldeia wants to merge 27 commits into master from operators

@gAldeia (Collaborator) commented Jan 15, 2026

This PR introduces several improvements and fixes:

  • Added GEQ and EQ operators:
    • GEQ compares two floats and returns a boolean result. This is useful for 'casting' floats to booleans and can occur in SplitOn nodes;
    • EQ compares an integer to a specific value (which Brush uses to represent categories). This casts ints to booleans and can be used in SplitOn nodes;
    • Improved boolean operators to work properly;
  • Added a new initialization option for starting from decision trees:
    • Added a new parameter start_from_decision_trees to EstimatorInterface and the underlying C++ Parameters, allowing the initial population to consist only of decision trees. This is now exposed in Python and passed to the C++ bindings.
  • Enhanced the node locking mechanism, with better control of locking coefficients:
    • In BrushEstimator.partial_fit, added a keep_current_weights parameter to control whether current weights are locked during optimization, and ensured the population is replicated from the best estimator before fitting. The underlying C++ lock_nodes method and its Python bindings now support this option.
  • Improved the implementation of average_precision_score in C++ to correctly handle cases where all predicted probabilities are constant, matching sklearn's behavior, and fixed an off-by-one error in the loop.
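
To illustrate the constant-probability case from the last bullet, here is a minimal Python sketch. It is not the C++ code in this PR: the function name `average_precision` and the choice to return 0.0 when there are no positive labels are assumptions of the sketch. Tied scores are grouped into a single threshold, so constant predictions produce one precision/recall point and the score equals the fraction of positives, which is what sklearn's average_precision_score returns in that situation.

```python
def average_precision(y_true, y_score):
    """Step-wise AP: sum over distinct thresholds of (R_k - R_{k-1}) * P_k.

    Illustrative sketch only; y_true holds 0/1 labels, y_score holds floats.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0  # assumption of this sketch: no positives -> 0.0

    # Visit samples from the highest score to the lowest.
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])

    ap = 0.0
    tp = 0
    prev_recall = 0.0
    i = 0
    while i < len(order):
        # Consume every sample tied at this score before evaluating the
        # threshold, so ties never create intermediate points.
        j = i
        while j < len(order) and y_score[order[j]] == y_score[order[i]]:
            tp += y_true[order[j]]
            j += 1
        precision = tp / j  # j samples are predicted positive so far
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


constant = average_precision([1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5])
perfect = average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```

With constant scores the inner loop consumes the whole dataset in one pass, so the outer loop runs exactly once and there is no index left to run past.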

…iations

I will not create something fully random if variation fails, because
the population could lose the locked nodes or weights. Instead, I try
subtree mutations and, if those also fail, clone the parent.
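
The fallback order described above can be sketched as follows. This is a hypothetical Python illustration, not Brush's C++ implementation: `vary_with_fallback`, the dict-based individuals, and the `max_subtree_tries` limit are all assumptions made for the example.

```python
import copy
from typing import Callable, Optional

# Hypothetical convention: a variation operator returns a new individual,
# or None when it fails to produce a valid one.
Variation = Callable[[dict], Optional[dict]]


def vary_with_fallback(
    parent: dict,
    variation: Variation,
    subtree_mutation: Variation,
    max_subtree_tries: int = 3,  # assumed retry budget, not from the PR
) -> dict:
    """Try the chosen variation, then subtree mutation, then clone the parent."""
    child = variation(parent)
    if child is not None:
        return child

    # Fall back to subtree mutation instead of a fully random individual,
    # so locked nodes and weights are not thrown away.
    for _ in range(max_subtree_tries):
        child = subtree_mutation(parent)
        if child is not None:
            return child

    # Last resort: an independent copy of the parent.
    return copy.deepcopy(parent)
```

Cloning as the final step means the offspring slot is always filled by an individual that still carries the parent's locked parts.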

This is how mutations handle locked weights:

Delete – should not work on nodes with locked weight
Toggle – will not work if there is a fixed weight (cannot turn on or off)
Subtree – will keep the weight
Point – will keep the weight
Insert – should “steal” the weight from the fixed node
Crossover – will keep the weight of the receiving parent
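
The rules above can also be written as a small lookup table. The sketch below is only an illustration in Python: the enum, the operator keys, and `applicable_operators` are hypothetical names, not identifiers from the Brush code base.

```python
from enum import Enum
from typing import List


class LockedWeightPolicy(Enum):
    REJECT = "operator must not be applied to the node"
    KEEP = "node keeps its locked weight"
    STEAL = "inserted node takes the weight from the fixed node"
    KEEP_RECEIVER = "weight of the receiving parent is kept"


# One entry per operator in the list above (keys are hypothetical labels).
LOCKED_WEIGHT_POLICY = {
    "delete": LockedWeightPolicy.REJECT,
    "toggle": LockedWeightPolicy.REJECT,
    "subtree": LockedWeightPolicy.KEEP,
    "point": LockedWeightPolicy.KEEP,
    "insert": LockedWeightPolicy.STEAL,
    "crossover": LockedWeightPolicy.KEEP_RECEIVER,
}


def applicable_operators(node_has_locked_weight: bool) -> List[str]:
    """Return the operators that may target a node, in alphabetical order."""
    if not node_has_locked_weight:
        return sorted(LOCKED_WEIGHT_POLICY)
    return sorted(
        op
        for op, policy in LOCKED_WEIGHT_POLICY.items()
        if policy is not LockedWeightPolicy.REJECT
    )
```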

That was making complexity values explode by considering intermediate
nodes when doing the recursive calculation, often leading to
overflow of the integer value.

I also updated the cpp test cases to print the min and max values
for each data type so we can manually check if the value is suitable
for the calculations we are doing.
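
To give a sense of how quickly a recursive complexity value can pass fixed-width integer limits, here is a small Python sketch. The formula `c(node) = w * (1 + c(child))` is an assumed stand-in, not Brush's actual complexity definition, and the function names are hypothetical. Python integers do not overflow, so the sketch compares against the 32-bit and 64-bit signed maxima explicitly.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def chain_complexity(depth: int, node_weight: int = 3) -> int:
    """Complexity of a unary chain of `depth` nodes under the assumed formula.

    c(leaf) = w and c(node) = w * (1 + c(child)); illustrative only.
    """
    if depth == 1:
        return node_weight
    return node_weight * (1 + chain_complexity(depth - 1, node_weight))


def first_overflow_depth(limit: int, node_weight: int = 3) -> int:
    """Smallest chain depth whose complexity no longer fits under `limit`."""
    depth = 1
    while chain_complexity(depth, node_weight) <= limit:
        depth += 1
    return depth


for name, limit in (("int32", INT32_MAX), ("int64", INT64_MAX)):
    print(name, "max =", limit, "exceeded at depth", first_overflow_depth(limit))
```

Because the value grows geometrically with depth, printing the type limits next to the computed values, as the updated test cases do, makes it easy to see how much headroom remains.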
@gAldeia gAldeia requested a review from lacava January 15, 2026 12:21
…ataframe.

Better error message when accessing the dataset directly fails to
find a feature by its name.
New assertions in API interface tests.
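
As an illustration of what a more helpful lookup error could look like, here is a hypothetical Python sketch. The `get_feature` helper, the dict-backed dataset, and the message wording are assumptions; the actual message added in this commit is not shown here.

```python
import difflib


def get_feature(features: dict, name: str):
    """Return the column stored under `name`, or raise a descriptive KeyError."""
    try:
        return features[name]
    except KeyError:
        # Suggest similarly spelled feature names, if there are any.
        close = difflib.get_close_matches(name, list(features), n=3)
        hint = f" Did you mean: {', '.join(close)}?" if close else ""
        raise KeyError(
            f"Feature '{name}' not found in dataset. "
            f"Available features: {sorted(features)}.{hint}"
        ) from None
```

Listing the available names in the message lets the caller spot a typo or a renamed column without inspecting the dataset separately.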