-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix axes expand mecanism #933
base: master
Are you sure you want to change the base?
Conversation
* suffixes are axe indexes instead of just element index in expanded array
assert simulation_builder.get_ids('households') == ['housea0', 'houseb1', 'housea2', 'houseb3'] | ||
assert simulation_builder.get_input('rent', '2018-11') == approx([0, 0, 3000, 0]) | ||
assert simulation_builder.get_ids('households') == ['housea0', 'houseb0', 'housea1', 'houseb1'] | ||
assert simulation_builder.get_input('rent', '2018-11') == approx([0, 0, 3000, 3000]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benjello your insight would be appreciated. What would you expect in given this input data? (You will have to expand the visible code snippet to get the full context)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never used it in that case (or i do not remember). I think I would stick to your convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to say because this was quite some time ago, but it's safest to assume that this was to reproduce the behaviour of the existing implementation, as I understood it (perhaps poorly).
The problem here is that the YAML specification for axes has lots of ambiguity. When using axes, we replicate a seed simulation N times along one or more axes. It's easy to say what this means when we have one person in the seed simulation and we say "vary this along an axis of salary between 0 and 3000", or one household and we say "vary the rent between 0 and 3000".
It becomes more complicated when there are two households, because then a "rent" value for instance is no longer a scalar but a vector. So what does it mean to say "replicate this two-household seed simulation varying the rent from 0 to 3000" ? This "0" is in fact a shorthand for the vector [0, 0]
. But the target value is ambiguous. It could be the vector [3000,0]
or the vector [3000,3000]
.
As usual with OpenFisca the problem is not choosing a behaviour. There is not always a "right" or "wrong" answer and sometimes many ways could make sense. The problem is when you change something that was not formally specified but that other people have come to rely on, mentally supplying their own informal specification which is not compatible with your way. So now it works for you but it breaks for others.
So my best advice is to be careful, and to give as much explicit information as you can about why you would pick one way rather than the other. Document why you think it should work that way, with real-world examples. Compare with other users' examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @adrienpacifico uses this as well and could give us some insight as how he understands the behaviour and some real-world examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this one be considered a real-world example @Morendil ? https://github.com/openfisca/tutorial/blob/master/notebooks/how_to_handle_axes.ipynb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maukoquiroga Yes, insofar as it's an example of something you might actually want to calculate where axes play a role; but also no, as it's not an example of what's being discussed here i.e. how specifically we expect a seed situation with more than one household to be expanded along an axis. So you couldn't really conclude anything from that specific example that motivates a decision in this present case.
@@ -146,4 +146,4 @@ def test_simulation_with_axes(simulation_builder): | |||
data = yaml.safe_load(input_yaml) | |||
simulation = simulation_builder.build_from_dict(tax_benefit_system, data) | |||
assert simulation.get_array('salary', '2018-11') == approx([0, 0, 0, 0, 0, 0]) | |||
assert simulation.get_array('rent', '2018-11') == approx([0, 0, 3000, 0]) | |||
assert simulation.get_array('rent', '2018-11') == approx([0, 0, 3000, 3000]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here @benjello. (You will have to expand the visible code snippet to get the full context.)
Many thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
@@ -45,7 +45,7 @@ def test_add_axis_with_group(simulation_builder, persons): | |||
simulation_builder.add_parallel_axis({'count': 2, 'name': 'salary', 'min': 0, 'max': 3000, 'period': '2018-11'}) | |||
simulation_builder.expand_axes() | |||
assert simulation_builder.get_count('persons') == 4 | |||
assert simulation_builder.get_ids('persons') == ['Alicia0', 'Javier1', 'Alicia2', 'Javier3'] | |||
assert simulation_builder.get_ids('persons') == ['Alicia0', 'Javier0', 'Alicia1', 'Javier1'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is prudent; if two people have the same name then ids will be non-unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That non-uniqueness would be consistent with the input data.
IMHO, the axe expand mecanism should not try to mitigate non-uniqueness issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding non-uniqueness, SimulationBuilder
already ignores duplicated people before, as you can see with this test:
def test_ignores_duplicated_people(simulation_builder, persons):
simulation_builder.add_person_entity(persons, {'Alicia': {}, 'Alicia': {}, 'Alicia': {}})
assert simulation_builder.get_count('persons') == 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After taking a look at the code, I guess this change won't have an impact on simulations as those ids are used solely in the context of axes expansion.
@guillett What leads you to conclude that the currently implemented behaviour is wrong ? I cannot say I am 100% certain that the existing code is correct, it took me quite a bit of effort at the time but it was long ago and I don't remember the context well enough; but I'd advise you to be careful as it's a somewhat complex feature. |
Thanks @Morendil, that PR is an refactor extracted from #934. The expected array Plus, I don't understand why/how |
@guillett @maukoquiroga : ce serais bien de voir si ce point est toujours à régler ? |
Ce problème est toujours à régler mais on a de temps disponible et c'est un problème qui n'apparait que dans des cas assez particuliers et en particulier sur https://betagouv.github.io/mes-aides-changent/ avec plusieurs membres dans une famille. |
Hi @benjello and @guillett ! I've been working on axes this weekend in order to better understand what is expected and how this proposal could be just a bug fix or a new feature. As it has been already mentioned, it is a bit an archeological work of trial, error, and documentation. What do I know up to this point:
[0, 0, 3000, 3000]
[0, 3000, 0, 3000] The current behaviour of
I've tried an got passing in master the following: def test_add_perpendicular_axes_on_group(simulation_builder, persons, group_entity):
simulation_builder.add_person_entity(persons, {'Alicia': {}, 'Javier': {}})
simulation_builder.add_group_entity('persons', ['Alicia', 'Javier', 'Tom'], group_entity, {
'housea': {'parents': ['Alicia']},
'houseb': {'parents': ['Tom'], 'children': ['Javier']},
})
simulation_builder.register_variable('rent', group_entity)
simulation_builder.register_variable('taxes', group_entity)
simulation_builder.add_parallel_axis({'count': 3, 'name': 'rent', 'min': 0, 'max': 3000, 'period': '2018-11'})
simulation_builder.add_perpendicular_axis({'count': 2, 'name': 'taxes', 'min': 0, 'max': 2000, 'period': '2018-11'})
simulation_builder.expand_axes()
assert simulation_builder.get_input('rent', '2018-11') == approx([0, 0, 1500, 0, 3000, 0, 0, 0, 1500, 0, 3000, 0])
assert simulation_builder.get_input('taxes', '2018-11') == approx([0, 0, 0, 0, 0, 0, 2000, 0, 2000, 0, 2000, 0])
assert simulation_builder.get_ids('households') == ['housea0', 'houseb1', 'housea2', 'houseb3', 'housea4', 'houseb5', 'housea6', 'houseb7', 'housea8', 'houseb9', 'housea10', 'houseb11'] It would probably make sense to rather have it this way: [0, 0, 0, 0, 1500, 1500, 1500, 1500, 3000, 3000, 3000, 3000]
[0, 0, 0, 0, 0, 0, 2000, 2000, 2000, 2000, 2000, 2000] I've tried that test with the changes proposed in this pull request and it fails:
So your proposal makes sense, but opens the question I guess posed already by @Morendil: if by meaningful we mean expanded entities should be named per axis count and not per cell count, How should we name them when we're expanding them over several axes? (in this case a parallel and a perpendicular one). What's next?I think the current documentation about how axes are supposed to work is incomplete, and I would be in favour of making those contracts explicit with tests and documentation —I've started that effort here #996. This will be a breaking change, but as a counterpart, give users a clear way of understanding how axes are being expanded. To be continued. |
Maybe that one is actually easy, something like this for two parallel with 2 steps and one perpendicular with 3 steps axes?
And so on. |
Breaking changes ?
Change expected axes values and adapt code base
https://github.com/openfisca/openfisca-core/compare/axes?expand=1#diff-419538942f2d50eb0605ed1c278f959eR51