[tmva][sofie] Restructure emitted code to be differentiable with Clad #18332
guitargeek wants to merge 5 commits into root-project:master
Conversation
tmva/sofie/inc/TMVA/SOFIE_common.hxx
```cpp
    return out;
}

inline void Copy(float const *b, float const *e, float *o)
```
Does providing a pullback for std::copy not work?
No, I tried a bit but then gave up. This was my approach:

```cpp
#include <Math/CladDerivator.h>

#include <algorithm>
#include <iostream>
namespace std {
void copy_pullback(double const *first, double const *last, double *out_first, double *_d_out, double *_d_first,
double *_d_last, double *_d_out_first)
{
// Implementation doesn't matter yet, it doesn't compile anyway
}
} // namespace std
void fooImpl(double const *x, double *y)
{
std::copy(x, x + 1, y);
}
void foo(double const *x, double *y)
{
fooImpl(x, y);
}
double g(double *variables)
{
double out;
foo(variables, &out);
return out * variables[1];
}
void clademo()
{
// Call clad to generate the gradient of g.
auto g_grad = clad::gradient(g, "variables");
// Execute the generated gradient function.
double variables[]{3., 4.};
double grad_output[]{0., 0.};
g_grad.execute(variables, grad_output);
std::cout << "grad_output[0]: " << grad_output[0] << std::endl;
std::cout << "grad_output[1]: " << grad_output[1] << std::endl;
// Dump the generated gradient code to standard output.
g_grad.dump();
}
```

It segfaults. I think Clad just doesn't play well with the STL algorithms that take iterators, so it's better to avoid them, no?
In any case, supporting this is not crucial for this PR. I was refactoring things to avoid this copy call in the generated code anyway.
Once this PR is functional for our use case (it actually is now, but I also want to make the ROOT CI pass again), I'll write up what was not ideal about Clad for this and open issues.
Ah, I see. Probably worth opening an issue in clad…
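As an aside, a plain pointer loop in the spirit of the `Copy` helper from the diff above is the kind of code Clad handles well. A minimal sketch (only the signature appears in the diff; the body is an assumption):

```cpp
// Loop-based copy helper matching the signature shown in the diff above.
// Clad can generate a pullback for this automatically, since it only uses
// raw pointers and a simple loop instead of iterator-based STL machinery.
inline void Copy(float const *b, float const *e, float *o)
{
   for (float const *it = b; it != e; ++it, ++o) {
      *o = *it;
   }
}
```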
```cpp
});

TMVA_SOFIE_Equal::Session s("Equal_FromONNX.dat");
std::vector<bool> output = s.infer(input1.data(), input2.data());
```
Did that fail to differentiate?
No, I didn't even try to differentiate the models in the test. I'm solely focusing on the SBI use case that we implement with LHCb. The reason why I changed this is that `std::vector<bool>` is not a good output type parameter: it is a bit-packed specialization with proxy references, so there is no contiguous `bool` storage to point into.
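For context, a short sketch of why `std::vector<bool>` is awkward as an output buffer (this illustrates standard-library behavior and is not code from the PR):

```cpp
#include <vector>

void demo()
{
   std::vector<bool> packed(8);
   // packed.data();          // does not compile: the bit-packed
   //                         // specialization provides no data() -> bool*
   // bool &ref = packed[0];  // does not compile: operator[] returns a
   //                         // proxy object, not a bool&

   std::vector<char> bytes(8); // one byte per element works as a plain buffer
   bytes[0] = true;
}
```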
Test Results: 22 files, 22 suites, 3d 13h 28m 21s ⏱️ — some failures; results for commit 9ea0f01.
*force-pushed from 6b90cb6 to 87597cd*
**Proof of concept test for this PR**

Take this ONNX file (remove the `.txt` suffix): VRlL_real_500k_evts_model.onnx.txt

Here are the scripts to convert the model to C++ and then to differentiate it with Clad:

```cpp
// onnx_to_cpp.C
#include "TMVA/RModel.hxx"
#include "TMVA/RModelParser_ONNX.hxx"

void onnx_to_cpp()
{
using namespace TMVA::Experimental;
SOFIE::RModelParser_ONNX parser;
SOFIE::RModel model = parser.Parse("./VRlL_real_500k_evts_model.onnx");
model.SetOptimizationLevel(SOFIE::OptimizationLevel::kBasic);
model.Generate();
model.PrintRequiredInputTensors();
model.OutputGenerated("./VRlL_real_500k_evts_model.hxx");
}
```

```cpp
// sofie_ad.C
#include "VRlL_real_500k_evts_model.hxx"
#include <Math/CladDerivator.h>

#include <algorithm>
#include <iostream>
#include <span>
#include <vector>
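
// Free-function wrapper around doInfer that returns the output as a scalar,
// which is the form clad::gradient expects to differentiate.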
float my_func(TMVA_SOFIE_VRlL_real_500k_evts_model::Session const *session, float const *tensor_x,
float *tensor_theory_params)
{
float out = 0.;
TMVA_SOFIE_VRlL_real_500k_evts_model::doInfer(session, tensor_x, tensor_theory_params, &out);
return out;
}
void sofie_ad()
{
std::vector<float> input1{5.0, 2.0, 1.0, -1.0, 1.0};
std::vector<float> input2{0.0};
// Generated header file shall contain a Session class which requires
// initialization to load the corresponding weights.
TMVA_SOFIE_VRlL_real_500k_evts_model::Session s("VRlL_real_500k_evts_model.dat");
// Once instantiated the session object's infer method can be used
// std::vector<float> out = s.infer(input1.data(), input2.data());
auto func = [&](std::span<float> params) { return s.infer(input1.data(), params.data())[0]; };
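// Numerical cross-check via central finite differences:
// d(out)/d(params[i]) ~ (f(p + eps) - f(p - eps)) / (2 * eps).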
auto numDiff = [&](int i) {
const float eps = 1e-4;
std::vector<float> p{input2};
p[i] = input2[i] - eps;
float funcValDown = func(p);
p[i] = input2[i] + eps;
float funcValUp = func(p);
return (funcValUp - funcValDown) / (2 * eps);
};
for (std::size_t i = 0; i < input2.size(); ++i) {
std::cout << i << ":" << std::endl;
std::cout << " numr : " << numDiff(i) << std::endl;
}
float grad_output[]{0., 0., 0., 0., 0.};
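// Generate the gradient of my_func w.r.t. tensor_theory_params only.
// disable_tbr switches off Clad's to-be-recorded (TBR) analysis
// (assumption based on the option name; see the Clad documentation).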
auto g_grad = clad::gradient<clad::opts::disable_tbr>(my_func, "tensor_theory_params");
g_grad.execute(&s, input1.data(), input2.data(), grad_output);
std::fill(std::begin(grad_output), std::end(grad_output), 0);
g_grad.execute(&s, input1.data(), input2.data(), grad_output);
std::cout << " clad : " << grad_output[0] << std::endl;
g_grad.dump();
}
```
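Both macros can be run with ROOT in the usual way, e.g. `root -l -b -q onnx_to_cpp.C` followed by `root -l -b -q sofie_ad.C` (the exact invocation is an assumption; any way of executing a ROOT macro works).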
*force-pushed from 89b638c to a3d545f*
*force-pushed from 3f40542 to 78fcc20*
*force-pushed from 4c9920f to 97903fa*
Why did we decide not to pursue this?

@vgvassilev, sorry, that was totally an accident. Maybe I confused it with another PR, or I wanted to close and reopen the PR to run the tests, but apparently I missed the "reopen" button.
TMVA SOFIE development is sometimes challenging because of how the tests are structured. The tests that cover many possible models imported from ONNX or ROOT include **all** emitted code in the compiled executables. This means that one gets a build failure on the first model that generates invalid code, and that's it, which makes it difficult to debug what is going wrong. This commit instead includes the generated code via the interpreter, as sketched below. Then one can check for each individual model whether the code is valid and, if not, print the emitted code that failed to compile and skip to the next test. This has some performance overhead, but the tests still only take about 6 seconds. The drastically improved debugging experience justifies these few extra seconds spent on testing. This was motivated by the effort to refactor the SOFIE-emitted code to make it differentiable with Clad.
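A minimal sketch of such an interpreter-based check (`tryDeclareGeneratedCode` is a hypothetical helper; only `gInterpreter->Declare` is real ROOT API):

```cpp
#include "TInterpreter.h"

#include <iostream>
#include <string>

// Try to compile SOFIE-emitted code with the ROOT interpreter. On failure,
// print the offending code and return false, so the test suite can skip to
// the next model instead of aborting the whole build.
bool tryDeclareGeneratedCode(std::string const &generatedCode)
{
   if (!gInterpreter->Declare(generatedCode.c_str())) {
      std::cerr << "Generated code failed to compile:\n" << generatedCode << std::endl;
      return false;
   }
   return true;
}
```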
This way, we don't have to add these forward declarations conditionally to the emitted code.
Restructure emitted code to be differentiable with Clad.