
[tmva][sofie] Restructure emitted code to be differentiable with Clad #18332

Draft
guitargeek wants to merge 5 commits into root-project:master from guitargeek:sofie_ad

Conversation

@guitargeek (Contributor)

Restructure emitted code to be differentiable with Clad.

@guitargeek guitargeek self-assigned this Apr 9, 2025
return out;
}

inline void Copy(float const *b, float const *e, float *o)
Member:

Does providing a pullback for std::copy not work?

Contributor Author:

No, I tried a bit but then gave up. This was my approach:

#include <Math/CladDerivator.h>

#include <algorithm>
#include <iostream>

namespace std {

void copy_pullback(double const *first, double const *last, double *out_first, double *_d_out, double *_d_first,
                   double *_d_last, double *_d_out_first)
{
   // Implementation doesn't matter yet, it doesn't compile anyway
}

} // namespace std

void fooImpl(double const *x, double *y)
{
   std::copy(x, x + 1, y);
}

void foo(double const *x, double *y)
{
   fooImpl(x, y);
}

double g(double *variables)
{
   double out;
   foo(variables, &out);
   return out * variables[1];
}

void clademo()
{

   // Call clad to generate the gradient of g.
   auto g_grad = clad::gradient(g, "variables");

   // Execute the generated gradient function.
   double variables[]{3., 4.};
   double grad_output[]{0., 0.};
   g_grad.execute(variables, grad_output);
   std::cout << "grad_output[0]: " << grad_output[0] << std::endl;
   std::cout << "grad_output[1]: " << grad_output[1] << std::endl;

   // Dump the generated gradient code to standard output.
   g_grad.dump();
}

It segfaults. I think Clad just doesn't play well with the STL algorithms that take iterators; it's better to avoid them, no?

In any case, supporting this is not crucial for this PR. I was refactoring things to avoid this copy call in the generated code anyway.

Once this PR is functional for our use case (it actually is now, but I also want to make the ROOT CI pass again), I'll write up what was not perfect in Clad for this and open issues.

Member:

Ah, I see. Probably worth opening an issue in clad…

});

TMVA_SOFIE_Equal::Session s("Equal_FromONNX.dat");
std::vector<bool> output = s.infer(input1.data(),input2.data());
Member:

Did that fail to differentiate?

Contributor Author:

No, I didn't even try to differentiate the models in the test. I'm solely focusing on the SBI use case that we implement with LHCb. The reason I changed this is that std::vector<bool> is not a good output type parameter. See:

@github-actions

github-actions bot commented Apr 9, 2025

Test Results

22 files   22 suites   3d 13h 28m 21s ⏱️
3 777 tests   3 771 ✅   0 💤   6 ❌
75 124 runs   75 069 ✅   0 💤   55 ❌

For more details on these failures, see this check.

Results for commit 9ea0f01.

♻️ This comment has been updated with latest results.

@guitargeek guitargeek force-pushed the sofie_ad branch 8 times, most recently from 6b90cb6 to 87597cd Compare April 15, 2025 06:33
@guitargeek (Contributor Author)

guitargeek commented Apr 22, 2025

Proof of concept test for this PR

Take this ONNX file (remove the .txt suffix after downloading):

VRlL_real_500k_evts_model.onnx.txt

Here are the scripts to convert the model to C++ and then to differentiate it with Clad:

// onnx_to_cpp.C

void onnx_to_cpp()
{
   using namespace TMVA::Experimental;
   SOFIE::RModelParser_ONNX parser;
   SOFIE::RModel model = parser.Parse("./VRlL_real_500k_evts_model.onnx");
   model.SetOptimizationLevel(SOFIE::OptimizationLevel::kBasic);
   model.Generate();
   model.PrintRequiredInputTensors();

   model.OutputGenerated("./VRlL_real_500k_evts_model.hxx");
}
// sofie_ad.C

#include "VRlL_real_500k_evts_model.hxx"

#include <Math/CladDerivator.h>

#include <algorithm>
#include <iostream>
#include <span>
#include <vector>

float my_func(TMVA_SOFIE_VRlL_real_500k_evts_model::Session const *session, float const *tensor_x,
              float *tensor_theory_params)
{
   float out = 0.;
   TMVA_SOFIE_VRlL_real_500k_evts_model::doInfer(session, tensor_x, tensor_theory_params, &out);
   return out;
}

void sofie_ad()
{
   std::vector<float> input1{5.0, 2.0, 1.0, -1.0, 1.0};
   std::vector<float> input2{0.0};

   // Generated header file shall contain a Session class which requires
   // initialization to load the corresponding weights.
   TMVA_SOFIE_VRlL_real_500k_evts_model::Session s("VRlL_real_500k_evts_model.dat");

   // Once instantiated the session object's infer method can be used
   // std::vector<float> out = s.infer(input1.data(), input2.data());

   auto func = [&](std::span<float> params) { return s.infer(input1.data(), params.data())[0]; };

   auto numDiff = [&](int i) {
      const float eps = 1e-4;
      std::vector<float> p{input2};
      p[i] = input2[i] - eps;
      float funcValDown = func(p);
      p[i] = input2[i] + eps;
      float funcValUp = func(p);
      return (funcValUp - funcValDown) / (2 * eps);
   };

   for (std::size_t i = 0; i < input2.size(); ++i) {
      std::cout << i << ":" << std::endl;
      std::cout << "  numr : " << numDiff(i) << std::endl;
   }

   float grad_output[]{0., 0., 0., 0., 0.};
   auto g_grad = clad::gradient<clad::opts::disable_tbr>(my_func, "tensor_theory_params");
   g_grad.execute(&s, input1.data(), input2.data(), grad_output);
   std::fill(std::begin(grad_output), std::end(grad_output), 0);
   g_grad.execute(&s, input1.data(), input2.data(), grad_output);

   std::cout << "  clad : " << grad_output[0] << std::endl;

   g_grad.dump();
}

Note that clad::opts::disable_tbr can probably be removed when this Clad issue is fixed:

Usage with expected output (replace libblas.so location with relevant path for your system):

   ------------------------------------------------------------------
  | Welcome to ROOT 6.35.01                        https://root.cern |
  | (c) 1995-2024, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Jan 01 1980, 00:00:00                 |
  | From heads/sofie_ad@v6-35-01-2277-g8ddebb98bb                    |
  | With g++ (GCC) 14.2.1 20250322                                   |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------

root [0] .L /nix/store/6kknwpcf8fl7ihkkxmdb6p764kdn443n-blas-3/lib/libblas.so
root [1] .x onnx_to_cpp.C
Model requires following inputs:
Fully Specified Tensor name: theory_params	type: float	shape: [1]
Fully Specified Tensor name: x	type: float	shape: [5]

root [2] .x sofie_ad.C
0:
  numr : -0.531077
  clad : -0.532437
root [3] .q

@guitargeek guitargeek force-pushed the sofie_ad branch 3 times, most recently from 89b638c to a3d545f Compare May 7, 2025 14:42
@guitargeek guitargeek changed the title [TMVA][SOFIE] Restructure emitted code to be differentiable with Clad [tmva][sofie] Restructure emitted code to be differentiable with Clad May 7, 2025
@guitargeek guitargeek force-pushed the sofie_ad branch 2 times, most recently from 3f40542 to 78fcc20 Compare May 8, 2025 09:12
@guitargeek guitargeek force-pushed the sofie_ad branch 2 times, most recently from 4c9920f to 97903fa Compare July 15, 2025 14:52
@guitargeek guitargeek closed this Aug 5, 2025
@guitargeek guitargeek deleted the sofie_ad branch August 5, 2025 17:14
@vgvassilev (Member)

Why did we decide to not pursue this?

@guitargeek (Contributor Author)

guitargeek commented Aug 11, 2025

@vgvassilev, sorry that was totally an accident. Maybe I confused it with another PR, or I wanted to close and re-open the PR to run the tests, but apparently I missed the "reopen" button.

@guitargeek guitargeek restored the sofie_ad branch August 11, 2025 08:20
@guitargeek guitargeek reopened this Aug 11, 2025
TMVA SOFIE development is sometimes challenging because of how the
tests are structured.

The tests that cover many possible models imported from ONNX or ROOT
have the issue that they include **all** emitted code in the
compiled executables. This means one gets a build failure on the
first model that generated invalid code, and that's it. This makes it
difficult to debug what is going wrong.

This commit suggests including the generated code via the interpreter
instead. Then, one can check for each individual model whether the code
is valid, and if not, skip to the next test and print the emitted code
that failed to compile.

This has some performance overhead, but the tests still only take about
6 seconds. The drastically improved debugging experience justifies these
few extra seconds spent on testing.

This was motivated by the effort to refactor the SOFIE-emitted code to
make it differentiable with Clad.

This way, we don't have to add these forward declarations conditionally
to the emitted code.