-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add generator for existing atoms #310
base: master
Are you sure you want to change the base?
Add generator for existing atoms #310
Conversation
Co-authored-by: Jan Uhlig <[email protected]>
Co-authored-by: Jan Uhlig <[email protected]>
7ad4d17
to
61041f0
Compare
Last commit provides first tests. |
@kostis this is ready for a first review I think. |
@@ -1398,6 +1398,20 @@ sampleshrink_test_() -> | |||
?_shrinksTo([a], Gen)}, | |||
?_test(proper_gen:sampleshrink(Gen))]}]. | |||
|
|||
existing_atom_test() -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very unclear what these tests really test and what their underlying assumptions are.
For example, their correctness crucially depends on not loading any other Erlang modules between the two calls to erlang:system_info(atom_count)
. Is this robust to e.g. executing all the PropEr tests in parallel?
Also, it's unclear why the default_atom_test
is using the any()
instead of the atom()
generator. For example, the test trivially succeeds if any()
happens to return terms other than atoms...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, the tests are brittle, meant as a first attempt =(
Executing PropEr tests in parallel is likely to break them. In fact, anything else that may be going on in the system may break them.
Also, it's unclear why the
default_atom_test
is using theany()
instead of theatom()
generator. For example, the test trivially succeeds ifany()
happens to return terms other than atoms...
The default_atom_generator
option, by and large, decides what atom generator is used in the any
generator, either atom
or existing_atom
(there is probably a better name for this). Anyway, this is the reason why any
is used in the test. And yes, if any
happens to generate only non-atoms, this test will pass. I tried with ?SUCHTHAT
, but then the test fails if the generator fails to produce any atom.
I'm at a loss about what to do about this, or rather how to test it PropErly. Thankful for advice 🤷♀️
@@ -418,6 +420,53 @@ atom_gen(Size) -> | |||
atom_rev(Atom) -> | |||
{'$used', atom_to_list(Atom), Atom}. | |||
|
|||
%% @private |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two functions, and the PR in general, should describe the main idea of the existing atom generation algorithm and the assumptions it relies on. Also its limitations.
Having written that, if I understand what the existing_atom_gen()
does, it is in fact not an atom generator (in the sense that proper_gen:atom_gen/1
is), but it simply returns a random atom from those (currently) loaded in the atom table, which it stores in its own map, right?
Since there is a magic way (the <<131, 75, Index:24>>
trick) to access each of these atoms given an Index
, and the upper limit of this index is known ( it is given by erlang:system_info(atom_count)
), what is the point of first iterating through all these atoms putting them in an internal map, rather than randomly picking one Index
in the range and returning the corresponding atom? (Perhaps after a num_tries
if this process is unlucky to only end up in atoms starting with $
.) What I am missing?
Incidentally, I do not like the magic constants 131, 75 and 24. They should be made appropriate defines and ideally obtained by the Erlang/OTP system somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having written that, if I understand what the
existing_atom_gen()
does, it is in fact not an atom generator (in the sense thatproper_gen:atom_gen/1
is), but it simply returns a random atom from those (currently) loaded in the atom table, which it stores in its own map, right?
That is correct, yes. "Find all existing atoms, (store them in the map,) pick one at random". The term "generator" is probably not the exact term to use here (I guess this is you objection?), but what is the correct term to use?
What I am missing?
Nothing I think ;) That is a great suggestion, I'll do that. How about we use a fallback atom like ''
to return if all tries turn up atoms starting with $
? We need to return something, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For starters, address the three comments I've left in the code.
With the last commit, I have reverted the cosmetic changes and changed the generation algorithm as suggested here (the number of retries-if-starts-with- We are still undecided how to provide good tests here, so that is not part of the commit. |
Updated the PR description to clarify what this is all about. |
@Maria-12648430 Excuse my intrusion, but I have had to deal with a production system that exhausted atoms and it took me weeks to find where it was happening - so this is of interest to me. (it turns out we had a custom JSON parser that was doing UUID to atom...) How about an atom generator that will try not to exhaust atoms? Given a standard runtime without the
We can solve for Lets create an alphabet of chars which we will use to build atoms. Dropped some vowels to avoid some rude words but include
Solve the above equation to see the longest atoms we can make that will not exhaust the space:
Write a generator (excuse it being in Elixir, I couldn't find the equivalent LET in Erlang:
Sample the generator:
|
Hi @devstopfix, no worries, every opinion is appreciated 🙂
Hm, I don't see how a different atom generator, or the current one for that matter, could have helped in detecting this? 😅
It's not quite as simple as that, is it? The above holds only for atoms (or strings or whatever) of length of exactly
I see no reason to exclude anything here, this is just for tests, I don't think the runtime will take offense no matter how much profanity you throw at it 😁 (and, just saying, by not excluding
Given the polite (:grin:) alphabet of 23 letters (without If we were using the full alphabet in lower- and uppercase + the numbers (so, 62 characters), it would be only IMO, those are very severe limitations which make the proposed generator pretty useless (sorry, no offense 🥺)
(Put an IMO on most of what's in the following paragraphs 😉) Thing is, none of those is any more likely to cause (and thereby help detect) problems than one of the others, except The common use case for atoms is to be matched on, not to be compared. So it doesn't matter if you throw However, another use case where the atom name is indeed relevant is in sorting (like, sorting a list of stuff which may be atoms), but even for such use cases, the set of existing atoms should contain ample specimens. That all said, contrary to what the PRs title indicates, the problem I was having and which did lead up to it is actually not the |
@Maria-12648430 no worries I guess it just comes down to the fact that the OTP team will be testing "atoms" so they appreciate a broad spectrum of weird and wonderful atoms, whereas I normally just need an atom that is fairly short and readable and won't blow up the BEAM when running a long test :-) |
Heh, I don't think they do, picking from the set of existing atoms is just fine 😉 Normally it's the users who come up with weird stuff. I'm waiting for the day when instead of
I guess what I was trying to say is that, if you already make up atoms for testing at random, where do you draw the line between what is generally reasonable to randomly generate with and what isn't? |
This PR adds a new generator
existing_atom/0
which does not create random new atoms (likeatom/0
does) but only uses existing atoms. Also, a new option has been added,default_atom_generator
with optionsatom
orexisting_atom
, which will be used by theany/0
generator.The main idea behind those additions is rooted in the fact that the number of atoms in an Erlang node is limited. The default atom generator, which produces new atoms at random, may exhaust this limit. This happened when we wrote a proptest suite for the
lists
module of OTP, which is quite large; and while we were able to work around this, it is obvious that a noble goal like property-testing all of OTP is nigh impossible this way.The (existing) atom generator proposed in this PR does not generate new atoms but instead picks one of the existing ones at random.
As the
any
generator includes the (default) atom generator and is not easy to replace there, and is in turn used implicitly in many other generators likelist/0
, another aspect of this PR is thedefault_atom_generator
option. This can be used to control which atom generator will be used by theany
generator: the default one or the existing atom one.As PropEr is quite big and I don't understand all of it, I'm not sure if I made all the necessary changes/additions in all places, and would be thankful for some advise.