Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralise Ocean parameters #64

Closed
wants to merge 32 commits into from
Closed

Conversation

merijn
Copy link
Collaborator

@merijn merijn commented Oct 29, 2019

Doesn't currently include validation because THCM parameters are passed via ocean and don't pass validation.

I think the best approach is to refactor THCM in a similar fashion so that Ocean can simply query the default parameters from THCM and extend its default parameters with those. The main obstacle is that some of THCM's parameters are only used in the constructor, so it's unclear what "setting" those would mean/look like. Specifically THCM parameters only used in the constructor to initialise other values are:

double qz      = paramList.get("Grid Stretching qz", 1.0);
int    itopo   = paramList.get("Topography", 1);
bool   flat    = paramList.get("Flat Bottom", false);
bool   rd_mask = paramList.get("Read Land Mask", false); //== false in experiment0
bool rd_spertm     = paramList.get("Read Salinity Perturbation Mask",false);
int coriolis_on    = paramList.get("Coriolis Force", 1);
int forcing_type   = paramList.get("Forcing Type", 0);
std::string spertm_file = paramList.get("Salinity Perturbation Mask", "no_mask_specified");
std::string windf_file = paramList.get("Wind Forcing Data", "wind/trtau.dat");

// Let THCM know the sst forcing file
std::string sst_file = paramList.get("Temperature Forcing Data", "levitus/new/t00an1");

// Let THCM know the sss forcing file
std::string sss_file = paramList.get("Salinity Forcing Data", "levitus/new/s00an1");

bool time_dep_forcing = paramList.get("Time Dependent Forcing",false);
int Nic = paramList.get("Integral row coordinate i", N-1);
int Mic = paramList.get("Integral row coordinate j", M-1);

@erik808 @Sbte what do you think should be done with those parameters?

Merijn Verstraaten added 5 commits October 29, 2019 13:38
This frees up the name to use as consistent interface for setting/updating
ParameterList.
It's only used once to initialise belos and not touched later, if needed in the
future it should be integrated with the default parameter list.
@merijn merijn self-assigned this Oct 29, 2019
@Sbte
Copy link
Member

Sbte commented Oct 29, 2019

  • You forgot to remove the commit of the other pull request.
  • The nice thing about get with default values is that it sets the parameter as used. See
    https://trilinos.org/docs/dev/packages/teuchos/doc/html/classTeuchos_1_1ParameterList.html#acf06d867a960000f651b1d8657e083ff
    Using the templated get does not do this. Also using the parameter list as const implies that you do not do this (but knowing Trilinos, it's possible that they made it mutable, you could check this). The use of this used is that you can print the parameter list at the end of the run or initialization, and it tells you for every parameter whether you set it and it was used, the default value was used, or it was unused. It would be nice if we could retain this. Or get it to work properly if it doesn't work yet.
  • I don't see a problem with setting parameters in THCM from Ocean only. To make sure you could put an assert in THCM.C to check that they were not changed at the end of the run. All initialization can be moved to Ocean. Unless @erik808 disagrees.
  • Please use a _ suffix for all class members. It's really confusing if you don't.

@merijn
Copy link
Collaborator Author

merijn commented Oct 30, 2019

  • The nice thing about get with default values is that it sets the parameter as used. See
    https://trilinos.org/docs/dev/packages/teuchos/doc/html/classTeuchos_1_1ParameterList.html#acf06d867a960000f651b1d8657e083ff
    Using the templated get does not do this. Also using the parameter list as const implies that you do not do this (but knowing Trilinos, it's possible that they made it mutable, you could check this). The use of this used is that you can print the parameter list at the end of the run or initialization, and it tells you for every parameter whether you set it and it was used, the default value was used, or it was unused. It would be nice if we could retain this. Or get it to work properly if it doesn't work yet.

If we want to make the original ParameterList the "single source of truth" we really don't want other parts of the code accidentally "adding" parameters that were not in the original list, because then we're back to square 1 of having no centralised places were all parameters can be found. I'll have to think how we can make that work together with still tracking whether a parameter was used or not.

Having a "default" parameter list we lose the ability to track whether a default was used anyway, since all the used parameters were already explicitly set in this default list I initialise from. If this is really important I think the simplest way to do it would be to subclass Teuchos::ParameterList and track "unused"/"used"/"default" in that?

The current THCM doesn't actually ever use the default value anyway, since the get function is only ever called for parameters that are actually in the list, see: https://github.com/nlesc-smcm/i-emic/pull/64/files#diff-5c096fb489dae5c82b49f6a8261e9a45L1902

Which basically means that it can only ever report used/unused. I actually think we don't lose anything here since there's no conditional selection of which variables to set, so each variable that THCM knows about is always set if present (so they can't be left unused). This means only variables that THCM doesn't know about will ever be "unused", but once we add the validation so that we only allow "known" parameters, that can't happen either.

Anyway, if you still think this functionality is important we can brainstorm some way to make it work again.

@Sbte
Copy link
Member

Sbte commented Oct 30, 2019

Adding parameters is not a problem. You should just always add them with the default value by using get. That way it will tell you it used the default value. For instance

#include <iostream>

#include "Teuchos_ParameterList.hpp"

int main()
{
    Teuchos::ParameterList pars;
    pars.set("Test 1", 1);
    pars.set("Test 2", 2);
    std::cout << pars.get("Test 1", 2) << std::endl;
    std::cout << pars.get("Test 3", 3) << std::endl;
    std::cout << pars.get("Test 3", 4) << std::endl;
    std::cout << pars;
}

gives as output

1
3
3
Test 1 = 1
Test 2 = 2   [unused]
Test 3 = 3   [default]

Note that the second get of "Test 3" is also 3, not 4! So in your createDefaultParameterList you should just pass the parameters that were passed to the constructor, and then use get instead of set.

The reason that the THCM parameters are not read in this way yet, is that the default values are in the Fortran code. If you move them to the C++ code this can be done with the proper get method.

@merijn
Copy link
Collaborator Author

merijn commented Oct 30, 2019

Adding parameters is not a problem. You should just always add them with the default value by using get. That way it will tell you it used the default value. For instance

It's a problem in the sense that, if I query Ocean "hey, which parameters do you support?" and 'Test3' is not in the default list, then omuse doesn't know 'Test3' is a parameter that we can set, since omuse only knows about the parameters it can query. That's why I made the ParameterList const, that way we know it can only use parameters that were already in the list (and thus must be in the default list, since the validation will reject any parameter that's not in the default list).

@Sbte
Copy link
Member

Sbte commented Oct 30, 2019

Yes, but that's my point. You just get every parameter into the default parameter list, as well as any other parameter list. So getDefaultParameters becomes

Teuchos::ParameterList getDefaultParameters()
{
Teuchos::ParameterList params;
setParameters(params);
return params;
}

and setParameters only contains get methods. You can also call setParameters from the constructor to actually set the parameters.

@merijn
Copy link
Collaborator Author

merijn commented Oct 30, 2019

Just to be sure, are we talking about the same thing? I thought you were referring to the change in THCM.C's readParameters, but do you mean you're referring to the parameter setting inside Ocean?

@Sbte
Copy link
Member

Sbte commented Oct 30, 2019

Both. They should both do the same thing. ReadParameters should become setParameters for THCM.

@merijn
Copy link
Collaborator Author

merijn commented Oct 30, 2019

Ok, I see what you're getting at, but there's a bit of trickiness. I can see two approaches that make sense, but I'm unsure what has the best trade-off.

The approach I just pushed uses get + default value before initialising Ocean's params_, so that should correctly report whether we're using defaults or not. There's a bit of ugly duplication in that setParameters still uses explicit get without default to assign to Ocean's class variables. Your example (seems to) implies just having a single setParameters both for setting defaults and setting the class variables, however this approach has a problem because then simply calling getDefaultParameters() will immediately reset Ocean's class variables to the defaults, which seems...suboptimal. Hence, the duplication.

There's an alternate approach which is to indeed use a single setParameters but make assigning to Ocean's class variables conditional. This would look something like:

template<typename T>
void set(bool doAssign, T& var, T val)
{ if (doAssign) var = val; }

void
Ocean::setParametersHelper(bool doAssign, Teuchos::ParameterList& params)
{
    set(doAssign, loadSalinityFlux_, params.get("Load salinity flux", false));
    ....
}

Teuchos::ParameterList
Ocean::getDefaultParameters()
{
    Teuchos::ParameterList result;
    setParametersHelper(false, result);
    return result;
}

void
Ocean::setParameters(Teuchos::ParameterList newParams)
{
    setParametersHelper(true, newParams);
    ...
}

This second option gets rid of the duplication, but 1) I think the set() approach looks rather unreadable, and 2) we lose the ability to make getDefaultParameters() a static class function, since it needs an instance to access the class variables.

@merijn
Copy link
Collaborator Author

merijn commented Oct 30, 2019

Relatedly, I've been using const for the ParameterList references being passed in, but if you want to use this used/unused/default setup that'd mean having setParameters/constructors mutating the argument ParameterList. I'm not a big fan of that, but if you think it's important I can switch to doing that.

Although I think the same information would be available in params_, so that might work without needlessly mutating the input lists...

@Sbte
Copy link
Member

Sbte commented Oct 30, 2019

Ah, you're right. We actually talked about this the first time we discussed these parameters (here in Groningen). What I suggested was to use parameter list validators. Something like this:

static Teuchos::ParameterList getValidParameters() const
{
    Teuchos::ParameterList params;

    Teuchos::RCP<Teuchos::StringToIntegralParameterEntryValidator<int> >
        varValidator = Teuchos::rcp(
                new Teuchos::StringToIntegralParameterEntryValidator<int>(
                  Teuchos::tuple<std::string>("Velocity", "Pressure", "Temperature"),"Velocity"));
    params.set("Variable Type", "Velocity", "The types of variables that can be found at the node.", varValidator);
    params.set("Dimension", 3, "Dimension of the problem");
    return params;
}

static Teuchos::ParameterList getDefaultParameters() const
{
    return getValidParameters();
}

void setParameters(Teuchos::ParameterList &params)
{
    params.validateParametersAndSetDefaults(getValidParameters());
    varType_ = params.get<std::string>("Variable Type");
    dim_ = params.get<int>("Dimension");
}

In this example, using "Salinity" as a "Variable Type" now results in an error, since it is not in the list in the StringToIntegralParameterEntryValidator. Also the "Dimension" can not be "3.0" since it expects an int.

Note that the 3rd parameter of set is a docstring, so this can also be used to document the parameters. The documentation can now only be found spread all over the xml files. Does that work for you? It is still some duplication, but now with a reason (validation and documentation).

@merijn
Copy link
Collaborator Author

merijn commented Oct 30, 2019

Your suggestions was my plan all along. But I first wanted to centralise the parameterlists, then ensure setParameters verifies against the default, and then figure out how to add the validators (I actually already have some stuff for validation, but it breaks everything until I sort out THCM and solver_params).

@Sbte
Copy link
Member

Sbte commented Oct 30, 2019

Ok, perfect. The commits that you pushed now also seem fine to me. The only thing I noticed is the belosParamList_ instead of belosParamList.

src/ocean/THCM.C Outdated Show resolved Hide resolved
src/ocean/THCM.C Outdated Show resolved Hide resolved
src/ocean/THCM.C Outdated
@@ -2058,7 +2010,7 @@ bool THCM::setParameter(std::string label, double value)
else if (param==0) // 0 is non-dimensional time
{
// set monthly forcing data
bool time_dep_forcing = paramList.get("Time Dependent Forcing",false);
bool time_dep_forcing = paramList.get<bool>("Time Dependent Forcing");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should probably become an instance variable.

src/ocean/THCM.C Outdated Show resolved Hide resolved
@merijn
Copy link
Collaborator Author

merijn commented Nov 1, 2019

Some of the constructor only parameters used in THCM are used to initialise this here, would it be sensible to allow updating these later or should these only really be set at initialisation?

The fortran code gets initialised here but ideally we want to be able to set/overwrite these parameters later via setParameters is that feasible?

@Sbte
Copy link
Member

Sbte commented Nov 4, 2019

I think everything but the grid can be changed later. Maybe even the grid could be changed later, but then you would have to re-initialize most of the code anyway, so it would be better to just call the constructor again.

@merijn
Copy link
Collaborator Author

merijn commented Nov 4, 2019

Relatedly, right now I've been adding the same functions to different classes, but would it perhaps make sense to define a baseclass that provides virtual implementations of all those getParameters/setParameters function and inherit from those? (This would especially be useful to avoid duplicating a lot of work if/when we also add a separate setInitParameters to account for stuff that can only be set at initialisation time.

@Sbte
Copy link
Member

Sbte commented Nov 4, 2019

Yeah, go ahead.

@Sbte
Copy link
Member

Sbte commented Jan 20, 2020

Well, this pull request actually changes the THCM sublist from Ocean here (if the lists are shared):

https://github.com/nlesc-smcm/i-emic/pull/64/files#diff-24e73cc71ccdf36ee75c0b8350cdf86cR2231

@ipelupessy
Copy link
Contributor

ok, I am not sure 100% what setparams of teuchos::ParameterList actually does, but my understanding is that this snippet:

params_->setParameters(newParams);
params_->validateParametersAndSetDefaults(getDefaultInitParameters());

first updates params_ from the newParams and then validates them, checks whether any parameters are in there that should not be and fills in any missing parameters from defaults?
so question is what happens/ should happen if:

  1. newParams contains THCM sublist
  2. when it doesn't
    I think at option 1: it should just propagate the sublist (and probably it does?)
    not sure about option 2?

can you two (@meriJ, @Sbte) comment on this?

@Sbte
Copy link
Member

Sbte commented Jan 20, 2020

Well, as I commented, the first thing that is wrong is already that it updates the parameters BEFORE validating that they are valid. But as for your comments, what happens:

  1. Since params_ is the "single source of truth" the THCM parameter list gets set as well (but the actual parameters do not!!!).
  2. Nothing

What I think should happen is that setParameters also calls thcm->setParameters and sets the internal parameter list from there. It is safe to call setParameters without a THCM sublist (or without any parameters at all for that matter).

@ipelupessy
Copy link
Contributor

ipelupessy commented Jan 20, 2020

Well, as I commented, the first thing that is wrong is already that it updates the parameters >BEFORE validating that they are valid. But as for your comments, what happens:

ok, from what I get the newparams can also be just a subset or one particular parameter that you want to change - in this case validating before will fill it with default values for the other parameters??

1. Since `params_` is the "single source of truth" the THCM parameter list gets set as well (but the actual parameters do not!!!).

2. Nothing

What I think should happen is that setParameters also calls thcm->setParameters and sets the >internal parameter list from there. It is safe to call setParameters without a THCM sublist (or without >any parameters at all for that matter).

ok, i see... this is necessary...basically setParameters and constructor are the only places that should be used to set/change parameters..its not too far from the current setup I think though..let me think about this (and also @merijn to comment)...

@ipelupessy
Copy link
Contributor

this commit fixes the validation and propagates the sublist to THCM setParameters (and fixes constructor). the paramlist passed to the constructor will get updated, which is ok. The only thing which you should not do is update the paramlist afterwards with the expectation that these changes will be committed to the sim code..

@Sbte
Copy link
Member

Sbte commented Jan 22, 2020

Apparently that made the tests fail...

Also with that change the parameter list passed to setParameters doesn't get updated with what is done with the parameters (used, unused, default, etc). The entire point of passing a nonconst reference to that method is that the parameter list gets updated.

I still don't see what's wrong with my approach, since at the very least it's much easier to read.

@ipelupessy
Copy link
Contributor

yea we are working on it (its not entirely obvious why it fails..

@ipelupessy
Copy link
Contributor

this should fix it.

note that at the end the input newparam gets updated so that you have same information as your proposal. in your proposal the original input parameterlist to the constructor is only updated after the constructor - not with later changes. I guess what is preferable depends a bit on the usage pattern!

The main items for conversation are according to @merijn:
i) should the classes have their own private copy (currently: no, you: yes)
ii) should the argument to setparameters and the constructor argument be updated (yes, yes)
iii) should the original argument to the constructor be kept up to date after later setparameters (yes, no)

@ipelupessy
Copy link
Contributor

btw we are not fundamentally opposed to your proposal (merijn is ok with implementing that if you wish so), but I think we should first reach agreement on how it looks on the "user side" cause I got the feeling Merijn was chasing a moving target and led to a bit of unhappiness ;-)

@Sbte
Copy link
Member

Sbte commented Jan 22, 2020

Yes, your points perfectly summarize the current state of the discussion.

Also, as for the extra thing you added to setParameters: Erik noted to me earlier today that we have to make sure (test) that this happens correctly for all parameters. I suggested during one of our earlier meetings that all parameter setting (and handling of that) is moved to the setParameters function (and removed from the constructor). Merijn noted that some parameters are only allowed to be set during the construction, so these may be an exception (but less exceptions is better).

@merijn
Copy link
Collaborator Author

merijn commented Jan 22, 2020

Also, as for the extra thing you added to setParameters: Erik noted to me earlier today that we have to make sure (test) that this happens correctly for all parameters. I suggested during one of our earlier meetings that all parameter setting (and handling of that) is moved to the setParameters function (and removed from the constructor). Merijn noted that some parameters are only allowed to be set during the construction, so these may be an exception (but less exceptions is better).

Ideally I think the check we added in setParameters should be deleted (there and in the constructor) and replaced with an error during validation.

As for testing, I didn't want to invest any time into writing any tests until we at least agreed on the actual API provided by Ocean & friends.

@Sbte
Copy link
Member

Sbte commented Jan 22, 2020

Sure, maybe that check should be deleted, but there are many more things that are done in the constructor that depend on parameters that can be set. If the getting these parameters has been moved to setParameters, the stuff that depends on it also has to be moved. But I guess this can wait until after this has been merged, so it doesn't get even bigger.

Let's say the ultimate goal (not from this pull request, but after many more pull requests) is to be able to call the constructor first (without parameters), then setParameters, and then some Initialize function (to call the fortran init function and handle all stuff that should be immutable through calling setParameters only).

@Sbte
Copy link
Member

Sbte commented Jan 23, 2020

i) should the classes have their own private copy (currently: no, you: yes)
iii) should the original argument to the constructor be kept up to date after later setparameters (yes, no)

On my bike I came up with an example of why this is preferable. If the parameter list is shared, we get this:

Say we have a class C with internal parameters a_ and b_ that can be set through the parameter list with "A" and "B".

Teuchos::RCP<Teuchos::ParameterList> params = Teuchos::rcp(new Teuchos::ParameterList);
params->set("A", 1);
params->set("B", 2);

C c(params);
// internal parameter a_ = 1
// internal parameter b_ = 2

// lots of code

params->set("A", 3);

// lots of code

// internal parameter a_ = 1
// internal parameter b_ = 2
Teuchos::ParameterList newParams;
newParams.set("B", 4);
c.setParameters(newParams);
// internal parameter a_ = 3!!!!!
// internal parameter b_ = 4

Now a_ is something completely different while we didn't pass it to setParameters. Note that this doesn't even have to happen in the same file. It can be changed from somewhere completely different because of the shared RCP.

@ipelupessy
Copy link
Contributor

ok, that can indeed be a problem; in effect the input parameterlist is only safe for reading in our case..@merijn shall we change it and go for @Sbte approach? note that this will make the "check the parameters" part (after constructor or setparameters) brittle in the sense that the input parameter list only provides a valid view immediately after the call..but maybe it is better to explicitly to a getParameters call on the class if you want an up to date status...

@ipelupessy
Copy link
Contributor

ok, I have changed parameter list now..(let me know what you think, @Sbte ..it may need some cleanup/simplfication)

Copy link
Member

@Sbte Sbte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found 2 things that are actually broken after the recent changes. There are many other broken things, such as THCM not setting the proper internal variables after a parameter change, but this can be changed in another pull request, since it doesn't affect how we currently use the code. In this new pull request, please fix one variable at a time and add a test for every fix.

Also, since we now have something that supposedly works (for the most part), please add some tests to make sure it doesn't break in the future. Some tests can be: checking that the defaults are set, used and unused are set, we can't validate with an incorrect parameter, changing the parameter list after the constructor doesn't change the internal parameter list. This should be done in this pull request (to spot possible errors).

src/ocean/Ocean.C Show resolved Hide resolved
src/ocean/Ocean.H Outdated Show resolved Hide resolved
@ipelupessy
Copy link
Contributor

ipelupessy commented Jan 23, 2020

to do, tests to:

  • checking that the defaults are set
  • used and unused are set
  • we can't validate with an incorrect parameter
  • changing the parameter list after the constructor doesn't change the internal parameter list.
  • add tests for the omuse interface. At the moment getDefaultParameters is only called from getDefaultInitParameters, meaning that they could be merged. I guess this is required for omuse. Please add a test for this so we can't break it.

@ipelupessy
Copy link
Contributor

please add any other to do items!

@Sbte
Copy link
Member

Sbte commented Jan 23, 2020

I made an issue for the remaining stuff that is required after this pull request. I also updated your comment with something else I came up with.

@merijn
Copy link
Collaborator Author

merijn commented Feb 3, 2020

Rebasing these changes into a cleaner/more minimal PR.

@merijn merijn closed this Feb 3, 2020
@merijn merijn deleted the params branch March 4, 2020 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants