Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add XXXX humanization #88

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Conversation

thomas-topway-it
Copy link
Collaborator

fix #80

@thomas-topway-it
Copy link
Collaborator Author

thomas-topway-it commented Jan 16, 2023

if ( $unspecifiedDigit->getYear() === 4

(in InternationalizedHumanizer class)
could be moved to UnspecifiedDigit class, for instance method yearIsFullyUnspecified


// *** if the expression $date->getDay() === null && $date->getMonth() === null
// is superfluous as it seems the function can be restored to its
// original declaration ( int $year, UnspecifiedDigit $unspecifiedDigit )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll, find out by removing the code. We have the tests, and you can add more tests if there are cases you are not sure about.

// is superfluous as it seems the function can be restored to its
// original declaration ( int $year, UnspecifiedDigit $unspecifiedDigit )

if ( $unspecifiedDigit->getYear() === 4 && $date->getDay() === null && $date->getMonth() === null ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this 4 about?

If all 3 of these conditions are needed, then extracting into isUnspecifiedYear(): bool would be nice. Remember: one level of abstraction.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think that -5XXXX is possible, which might break your === 4 check.

@@ -33,6 +33,10 @@ public function humanizationProvider(): Generator {

yield 'Month only' => [ 'XXXX-12-XX', 'December' ];
yield 'Day only' => [ 'XXXX-XX-12', '12th' ];

// https://github.com/ProfessionalWiki/EDTF/issues/80
yield 'Some year' => [ 'XXXX', 'some year' ];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add
yield 'Some year' => [ 'X', 'some year' ];
yield 'Some year' => [ 'XXX', 'some year' ];

Curious as to what "-X" would yield.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And check what -5XXXX results into, unless already tested

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And check what -5XXXX results into, unless already tested

50000 BC

this is nice, but still doesn't account for the unspecified digits

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should perhaps be:

50000 BC (millennium, century and year in decem millennium unspecified)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does not scale. Because what is -5XXXXXXX ? How about 50000 BC (year unspecified)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-5XXXXXXX would be "50 millions years BC" and -5XXXX " 5 decem millennium BC",
so (given that your solution is the only viable at the moment) in general for unspecified digits we could use always the literal expression, that is "5 millenniums BC" instead than "5000 BC (year unspecified)" (with the first notation it is implicit)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means implementing scales (https://en.wikipedia.org/wiki/Long_and_short_scales#Short_scale) e.g. the short scale to billion? As (according to Google) the universe is 13 billion years old and will likely exists for 22 billion years and this is about dates. :)

Copy link
Collaborator Author

@thomas-topway-it thomas-topway-it Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was checking further the specification here https://www.loc.gov/standards/datetime/
(Unspecified Digit) and the parser,
but I'm not sure that the parser is checking whether
the unspecified digit is placed incrementally from the right of the date component.
e.g.
the following should throw an error:

-XXXXXXX4 (4th year of any millennium BC)

yield 'Scales (8 digits)' => [ '5XXXXXXXX', '5 hundreds of millions' ];
yield 'Scales (9 digits)' => [ '5XXXXXXXXX', '5 billions' ];
yield 'Scales (10 digits)' => [ '5XXXXXXXXXX', '5 tens of billions' ];
yield 'Scales (11 digits)' => [ '5XXXXXXXXXXX', '5 hundreds of billions' ];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"5 decem millennia"? What does decem mean here?

"5 hundreds of thousands" is just an amount; should it not also have "years" in there somewhere? Is this even a valid EDTF date to begin with? @mzeinstra

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sometimes the word "year" is missing, I have still to add it where necessary

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"5 decem millennia"? What does decem mean here?

"Decem millennium" theoretically means "Ten thousand years" it could be also used the expression "tens of thousands"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tens of thousands is indeed better, but we can also leave that up to the translators.

Cosmology seems to me a valid border of amounts of years that we humanise. Maybe a fallback to just presenting the years in numbers?

The most years I can find in the sciences is the half-life of Xenon, which is 18 billion trillion years :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we should not forget the objective here, that is to make these edtf-string human readable. If 99% of the use cases can be captured than that is ok with me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously I also posted this link

https://asistdl.onlinelibrary.wiley.com/doi/epdf/10.1002/pra2.552

in the element.io SemanticMediawiki group because somebody asked about the support for "geological and historical eras".

Regarding the following

Cosmology seems to me a valid border of amounts of years that we humanise. Maybe a fallback to just presenting the years in numbers?

in my opinion if the user registers a date with unspecified digits and the library outputs a human readable format, and additionally computes correctly comparisons and intervals, that's really useful.

Still in my opinion the approach to present a year with mixed unspecified digits using numbers for the specified digits and literals for the unspecified (scaled) part, seems also formally rigorous. So using the example above -5XXXXXXX is not to be precise "50 millions years BC" but "5 tens of millions years BC" (in the output the number represents the specified digits, and the literal part the unspecified digits)

So what about, when this is more refined, to publish a demo (which could query the library using Ajax) and to propose this approach to the authors of the format, the Washington' Library of Congress, as far as I know ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik there is no standard for the humanisation of EDTF, humanised examples are used on the library of congres website but they are not the owner of the standard.

So your best efforts here will de facto help set the humanisation standard.

@mnhn-paul you might be able to provide us with a standard of humanising eras in EDTF. Eg. What would the date: -5XXXXXXX be humanised to?

Paul is from our museum of natural history and might have more insight into this.


return $ret + 1;
}
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -247,7 +244,55 @@ private function humanizeYearMonthDay( ?string $year, ?string $month, ?string $d
);
}

private function humanizeYear( int $year, UnspecifiedDigit $unspecifiedDigit ): string {
private function scaleToMessageKey( int $scale ): string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay for putting this in its own method!


// FIXME: reuse recursively the scale with trillions
// e.g. tens-of-trillions etc.,
return 'edtf-tens-of-trillions';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's first make sure we actually need this. I am dubious that these are valid EDTF dates.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, the only sensible use-case is in cosmology

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's first make sure we actually need this. I am dubious that these are valid EDTF dates.

actually they are required only for testing purposes, e.g. by an user who wonder what is the output entering something like that 56XXXXXXXXXXXXXXXXXXX


if ( $date->isBC() ) {
$ret .= " " . $this->message( "edtf-date-BC" );
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concatenating messages like this bakes in an assumption about the order of these messages, which might not hold in all languages. That is why for instance we have messages such as "edtf-qualified-date": "$1 ($2)".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I guessed something like that, so that should be taken into account and enhanced

$ret .= " " . $this->message( "edtf-date-BC" );
}

return $ret;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$humanizedYear

@JeroenDeDauw JeroenDeDauw changed the title Kma XXXX humanization Add XXXX humanization Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XXXX should not be humanised to 'Year 0'
3 participants