Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ML keras #4172

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

ML keras #4172

wants to merge 5 commits into from

Conversation

bikagit
Copy link

@bikagit bikagit commented Aug 22, 2024

Integrating Keras capabilities to OPM.
Draft pull request to test/discuss implications of the changes in OPM.

This enables the straightforward and adaptable integration of neural networks into OPM scripts. These models are initially trained using the Keras library in Python, stored in a format readable for the OPM framework and subsequently deployed within OPM.
When the user initializes and loads a stored Keras model inside an OPM script, an automated deployment process handles all the translation. This process works by operating a series of steps handling model interpretation, layer conversion, optimization, and code generation steps to adapt the Keras model to a native OPM function.

@totto82
Copy link
Member

totto82 commented Aug 22, 2024

jenkins build this please

@daavid00
Copy link
Contributor

jenkins build this please

@atgeirr
Copy link
Member

atgeirr commented Aug 22, 2024

This does not add a dependency on a third party library, so then I assume it instead embeds Keras in some way? Or have I misunderstood the purpose of this PR?

@bska
Copy link
Member

bska commented Aug 22, 2024

Is there a reason this is added to opm-common instead of being a separate repository? Do we, somehow, need to make (selected) objects in this repository, or any of its downstream repositories for that matter, "aware" of Keras?

@bikagit
Copy link
Author

bikagit commented Aug 22, 2024

This does not add a dependency on a third party library, so then I assume it instead embeds Keras in some way? Or have I misunderstood the purpose of this PR?

We use Keras for the training process (it doesnt need to be done in OPM). The generated models are subsequently embedded and run in OPM. I have updated the description to provide some context.

@totto82
Copy link
Member

totto82 commented Aug 26, 2024

jenkins build this please

@daavid00
Copy link
Contributor

jenkins build this please

@bska
Copy link
Member

bska commented Aug 28, 2024

Maybe I'm missing something, but as far as I can tell no-one have answered my question from last week

Is there a reason this is added to opm-common instead of being a separate repository? Do we, somehow, need to make (selected) objects in this repository, or any of its downstream repositories for that matter, "aware" of Keras?

I would really like an answer to this before I consider the details of the PR.

@totto82
Copy link
Member

totto82 commented Aug 28, 2024

I would really like an answer to this before I consider the details of the PR.

Sorry for not answering earlier. The idea is to apply the ML-Keras inside OPM for different tasks. This is only the first PR to add the Keras ML model. The applications will follow. For an example of a ML near well model using ML-Keras check out https://github.com/cssr-tools/ML_near_well. Since the ML-Keras model framework is general. We hope it would be useful for the OPM community and therefore suggest to add it to opm-common

@bska
Copy link
Member

bska commented Aug 28, 2024

The idea is to apply the ML-Keras inside OPM for different tasks [...] Since the ML-Keras model framework is general, we hope it would be useful for the OPM community and therefore suggest to add it to opm-common

Okay, utility/convenience is clearly one reason for adding it here. Would it be impossible to make [your/certain use cases] work if it were located elsewhere? Do you, for instance, need access to the internals/private data members or member functions of Well or Connection objects or similar in your use cases?

@totto82
Copy link
Member

totto82 commented Aug 28, 2024

jenkins build this please

@bikagit bikagit marked this pull request as ready for review August 28, 2024 16:43
@bikagit bikagit marked this pull request as draft August 29, 2024 10:38
@bikagit
Copy link
Author

bikagit commented Sep 1, 2024

The idea is to apply the ML-Keras inside OPM for different tasks [...] Since the ML-Keras model framework is general, we hope it would be useful for the OPM community and therefore suggest to add it to opm-common

Okay, utility/convenience is clearly one reason for adding it here. Would it be impossible to make [your/certain use cases] work if it were located elsewhere? Do you, for instance, need access to the internals/private data members or member functions of Well or Connection objects or similar in your use cases?

Exact! For instance, we need access to the automatic differentiation tools within OPM.

@totto82
Copy link
Member

totto82 commented Sep 6, 2024

jenkins build this please

@bikagit bikagit marked this pull request as ready for review September 6, 2024 13:41
Copy link
Member

@atgeirr atgeirr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of changes needed here. I have only looked at the C++ code, and I probably missed some. I have not really checked that the activation functions or the layers do what a user of Keras would expect. I have not looked at any of the Python code, someone else must do that.

I have requested many changes, but I hope it provides a useful learning experience. Feel free to ask about anything that is unclear!

opm/ml/ml_tools/__init__.py Outdated Show resolved Hide resolved
opm/ml/keras_model.hpp Outdated Show resolved Hide resolved
opm/ml/keras_model.hpp Outdated Show resolved Hide resolved
opm/ml/keras_model.hpp Outdated Show resolved Hide resolved
opm/ml/keras_model.hpp Outdated Show resolved Hide resolved
opm/ml/keras_model.cpp Outdated Show resolved Hide resolved
opm/ml/keras_model.cpp Outdated Show resolved Hide resolved
opm/ml/keras_model.cpp Outdated Show resolved Hide resolved
opm/ml/keras_model.cpp Outdated Show resolved Hide resolved
opm/ml/keras_model.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@kjetilly kjetilly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the python bits. I think a major point is that the folder opm/ml_tools, which contains only python code, should probably be moved to say the python folder or similar. And since these scripts use external libraries (tf, numpy, keras) I would really like to see a requirements.txt file specifying the versions used. Especially tensorflow is known to be problematic her.

opm/ml/ml_tools/kerasify.py Outdated Show resolved Hide resolved
opm/ml/ml_tools/kerasify.py Outdated Show resolved Hide resolved
opm/ml/ml_tools/kerasify.py Outdated Show resolved Hide resolved
opm/ml/ml_tools/kerasify.py Outdated Show resolved Hide resolved
opm/ml/ml_tools/kerasify.py Outdated Show resolved Hide resolved
opm/ml/ml_tools/kerasify.py Outdated Show resolved Hide resolved
opm/ml/ml_tools/__init__.py Outdated Show resolved Hide resolved
opm/ml/ml_tools/scaler_layers.py Outdated Show resolved Hide resolved
@bikagit
Copy link
Author

bikagit commented Sep 27, 2024

Thanks all for the valuables comments and suggestions.
We have started adding most of the modifications.

@bikagit bikagit requested a review from atgeirr October 11, 2024 14:55
@daavid00
Copy link
Contributor

jenkins build this please

1 similar comment
@daavid00
Copy link
Contributor

jenkins build this please

Copy link
Member

@atgeirr atgeirr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked at the c++ parts, which have been improved a lot. Still quite a few things to address, but I think this will converge eventually!

opm/ml/ml_model.hpp Outdated Show resolved Hide resolved
data_.resize(i * j * k * l);
}

inline void Flatten() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a problem.



template <typename Type>
void resizeI(std::vector<Type> c) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason to take a copy of c, use a const reference. Also, you do not need to force this to be a vector:

template <typename Sizes>
void resizeI(const Sizes& sizes)
{
...
}

opm/ml/ml_model.hpp Outdated Show resolved Hide resolved
opm/ml/ml_model.hpp Outdated Show resolved Hide resolved
opm/ml/ml_model.hpp Outdated Show resolved Hide resolved
opm/ml/ml_model.cpp Outdated Show resolved Hide resolved
template<class Evaluation>
bool NNLayerScaling<Evaluation>::loadLayer(std::ifstream& file) {
OPM_ERROR_IF(!readFile<float>(file, data_min), "Failed to read min");
OPM_ERROR_IF(!readFile<float>(file, data_max), "Failed to read min");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error messages are wrong, except for the first.

break;
case kHardSigmoid:
for (size_t i = 0; i < out.data_.size(); i++) {
Evaluation x = (out.data_[i] * sigmoid_scale) + 0.5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be const.

break;
case kSigmoid:
for (size_t i = 0; i < out.data_.size(); i++) {
Evaluation& x = out.data_[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be const. Always make variables const unless they have to be mutable.

Copy link
Member

@atgeirr atgeirr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This review block just contains a single comment, from a long time ago, but that is still relevant. I will make a new review block now.

opm/ml/keras_model.cpp Outdated Show resolved Hide resolved
Copy link
Member

@atgeirr atgeirr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from reformatting, I see none of the earlier requested code changes here.

opm/ml/ml_model.hpp Outdated Show resolved Hide resolved
*/
template <class T> class Tensor {
public:
Tensor() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation and formatting is not in line with common OPM practice, although it seems internally consistent now! I assume you used clang-format to process this? Please do so again, using the .clang-format file at the top level of opm-common. (That would give 4-space indents for example.)

You could consider setting a slightly wider allowed width to make the fmt::format() calls look nicer (i.e. on a single line) such as on line 78.

std::accumulate(begin(c), end(c), 1.0, std::multiplies<Type>()));
}

inline void flatten() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove inline in front of functions that are defined inline in the class. It is unnecessary.

This was commented on already!


Tensor(int i, int j, int k, int l) { resizeI<int>({i, j, k, l}); }

template <typename Type> void resizeI(std::vector<Type> c) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See earlier comment on this.

Copy link
Author

@bikagit bikagit Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly, you asked to use the dims_, not c?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the comment:

No reason to take a copy of c, use a const reference. Also, you do not need to force this to be a vector:

template <typename Sizes>
void resizeI(const Sizes& sizes)
{
...
}

The use of dims_ was further down in an accumulate call.

Copy link
Author

@bikagit bikagit Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use now a const ref.

opm/ml/ml_model.hpp Outdated Show resolved Hide resolved
"Cannot add tensors with different dimensions");
Tensor result;
result.dims_ = dims_;
result.data_.reserve(data_.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a resize(), and then not use a back_inserter below, as discussed earlier! Same with the multiply() further down.

opm/ml/ml_model.hpp Outdated Show resolved Hide resolved
opm/ml/ml_model.hpp Outdated Show resolved Hide resolved
template <class Evaluation>
class NNLayerActivation : public NNLayer<Evaluation> {
public:
enum ActivationType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was requested to make this an enum class, and you commented that this was done, but it is not done.

template <class Evaluation>
bool NNLayerActivation<Evaluation>::apply(const Tensor<Evaluation>& in,
Tensor<Evaluation>& out) {
constexpr double sigmoid_scale = 0.2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not define this up here, but just before it is used (line 96). As already commented...

@atgeirr
Copy link
Member

atgeirr commented Jan 2, 2025

Note that in addition to re-commenting on several of the unchanged problems, I un-resolved several more since the existing comment was still valid. Please make code changes first, in a separate commit so I can easily see what changes you made, then after that add a separate commit to do the auto-reformatting.

@bikagit
Copy link
Author

bikagit commented Jan 3, 2025

Note that in addition to re-commenting on several of the unchanged problems, I un-resolved several more since the existing comment was still valid. Please make code changes first, in a separate commit so I can easily see what changes you made, then after that add a separate commit to do the auto-reformatting.

Thanks for spotting the missing points. The changes were done in November. However, unfortunately, during re-basing we removed some commits inadvertently. The requested changes will all be added back.

@bikagit bikagit requested a review from atgeirr January 6, 2025 06:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants