Add XXXX humanization #88

Open
wants to merge 19 commits into
base: master
19 changes: 19 additions & 0 deletions i18n/en.json
@@ -17,11 +17,30 @@
"edtf-month": "month",
"edtf-year": "year",

"edtf-year-unspecified": "some year",

"edtf-spring": "Spring",
"edtf-summer": "Summer",
"edtf-autumn": "Autumn",
"edtf-winter": "Winter",

"edtf-date-unspecified": "some",
"edtf-date-BC": "BC",

"edtf-decade": "{{PLURAL:$1|decade|decades}}",
"edtf-century": "{{PLURAL:$1|century|centuries}}",
"edtf-millennium": "{{PLURAL:$1|millennium|millennia}}",
"edtf-decem-millennium": "decem {{PLURAL:$1|millennium|millennia}}",
"edtf-hundreds-of-thousands": "{{PLURAL:$1|hundred|hundreds}} of thousands",
"edtf-million": "{{PLURAL:$1|million|millions}}",
"edtf-tens-of-millions": "{{PLURAL:$1|ten|tens}} of millions",
"edtf-hundreds-of-millions": "{{PLURAL:$1|hundred|hundreds}} of millions",
"edtf-billion": "{{PLURAL:$1|billion|billions}}",
"edtf-tens-of-billions": "{{PLURAL:$1|ten|tens}} of billions",
"edtf-hundreds-of-billions": "{{PLURAL:$1|hundred|hundreds}} of billions",
"edtf-thousands-of-billions": "{{PLURAL:$1|thousand|thousands}} of billions",
"edtf-trillion": "{{PLURAL:$1|trillion|trillions}}",

"edtf-spring-north": "Spring (Northern Hemisphere)",
"edtf-summer-north": "Summer (Northern Hemisphere)",
"edtf-autumn-north": "Autumn (Northern Hemisphere)",
21 changes: 21 additions & 0 deletions src/Model/ExtDate.php
@@ -175,6 +175,27 @@ private function resolveMaxDay( $year, $month ): int {
		return null === $this->day ? $lastDayOfMonth : $this->day;
	}

	public function getUnspecifiedYearScale() : int {
		if ( $this->unspecifiedDigit->unspecified( 'year' ) ) {
			$ret = $this->unspecifiedDigit->getYear();

			if ( $this->getYear() === 0 ) {
				return $ret;
			}

			return $ret + 1;
		}
		return 0;
	}

	public function getSpecifiedYears() : int {
		return $this->getYear() / ( pow( 10, $this->unspecifiedDigit->getYear() ) );
	}

	public function isBC(): bool {
		return $this->unspecifiedDigit->isBC();
	}

	/**
	 * This function is applicable for 2-digits placeholders (month, day).
	 * Means that decimal: 0 < n < 10
6 changes: 5 additions & 1 deletion src/Model/UnspecifiedDigit.php
@@ -76,6 +76,10 @@ public function getDay(): int {
		return $this->day;
	}

	public function isBC(): bool {
		return ( (int)str_replace( "X", "1", $this->rawYear ) < 0 );
	}

	public function century(): bool {
		if ( $this->year == 2 && substr( $this->rawYear, -2 ) == "XX" ) {
			return true;
@@ -91,4 +95,4 @@ public function decade(): bool {

		return false;
	}
-}
+}
55 changes: 50 additions & 5 deletions src/PackagePrivate/Humanizer/InternationalizedHumanizer.php
@@ -215,10 +215,7 @@ private function humanizeDateWithoutUncertainty( ExtDate $date ): string {
		$day = $date->getDay();

		if ( $year !== null ) {
-			$year = $this->humanizeYear(
-				$year,
-				$date->getUnspecifiedDigit()
-			);
+			$year = $this->humanizeYear( $year, $date );
		}

		if ( $month !== null ) {
@@ -247,7 +244,55 @@ private function humanizeYearMonthDay( ?string $year, ?string $month, ?string $d
		);
	}

-	private function humanizeYear( int $year, UnspecifiedDigit $unspecifiedDigit ): string {
+	private function scaleToMessageKey( int $scale ): string {
Member

Yay for putting this in its own method!

		switch( $scale ) {
			case 1 : return 'edtf-year'; // X
			case 2 : return 'edtf-decade'; // XX
			case 3 : return 'edtf-century'; // XXX
			case 4 : return 'edtf-millennium'; // XXXX
			case 5 : return 'edtf-decem-millennium'; // XXXXX
			case 6 : return 'edtf-hundreds-of-thousands'; // XXXXXX
			case 7 : return 'edtf-million'; // XXXXXXX
			case 8 : return 'edtf-tens-of-millions'; // XXXXXXXX
			case 9 : return 'edtf-hundreds-of-millions'; // XXXXXXXXX
			case 10 : return 'edtf-billion'; // XXXXXXXXXX
			case 11 : return 'edtf-tens-of-billions'; // XXXXXXXXXXX
			case 12 : return 'edtf-hundreds-of-billions'; // XXXXXXXXXXXX
			case 13 : return 'edtf-trillion'; // XXXXXXXXXXXXX
		}

		// FIXME: reuse recursively the scale with trillions
		// e.g. tens-of-trillions etc.,
		return 'edtf-tens-of-trillions';
Member

Let's first make sure we actually need this. I am dubious that these are valid EDTF dates.

Collaborator Author

Sure, the only sensible use case is in cosmology.

Collaborator Author

> Let's first make sure we actually need this. I am dubious that these are valid EDTF dates.

Actually, they are needed only for testing purposes, e.g. for a user who wonders what the output would be when entering something like 56XXXXXXXXXXXXXXXXXXX.
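
One possible way to handle scales beyond the mapped ones, sketched here as an assumption rather than as what the PR does: keep the scale-to-key mapping in a lookup table and return null for anything outside it, so the caller can decide on a fallback. The function name `scaleToMessageKeyOrNull` is hypothetical; the keys are the ones used in `scaleToMessageKey()`.

```php
<?php
// Hypothetical helper, not part of the PR: returns the message key for a
// scale, or null when no dedicated message exists for that scale.
function scaleToMessageKeyOrNull( int $scale ): ?string {
	$keys = [
		1 => 'edtf-year',
		2 => 'edtf-decade',
		3 => 'edtf-century',
		4 => 'edtf-millennium',
		5 => 'edtf-decem-millennium',
		6 => 'edtf-hundreds-of-thousands',
		7 => 'edtf-million',
		8 => 'edtf-tens-of-millions',
		9 => 'edtf-hundreds-of-millions',
		10 => 'edtf-billion',
		11 => 'edtf-tens-of-billions',
		12 => 'edtf-hundreds-of-billions',
		13 => 'edtf-trillion',
	];

	// Unknown scales yield null instead of a key that has no message.
	return $keys[$scale] ?? null;
}
```

With this shape, an input such as 56XXXXXXXXXXXXXXXXXXX would produce null, and the caller could choose whether to show the raw value or reject the input.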

	}

	private function humanizeYear( int $year, ExtDate $date ): string {
		$unspecifiedYearScale = $date->getUnspecifiedYearScale();
		$unspecifiedDigit = $date->getUnspecifiedDigit();
		$specifiedYears = $date->getSpecifiedYears();

		if ( $unspecifiedYearScale === 0 ||
			( $this->needsYearEndingChar( $unspecifiedDigit ) && $specifiedYears !== 0 ) ) {
			return $this->humanizeYearSpecified( $year, $unspecifiedDigit );
		}

		$specifiedYearsStr = (string)abs( $specifiedYears );

		$ret = ( $specifiedYears === 0 && $unspecifiedYearScale != 0 ? $this->message( "edtf-date-unspecified" )
			: $specifiedYearsStr );

		if ( $unspecifiedYearScale > 0 ) {
			$ret .= " " . $this->message( $this->scaleToMessageKey( $unspecifiedYearScale ), $specifiedYearsStr );
		}

		if ( $date->isBC() ) {
			$ret .= " " . $this->message( "edtf-date-BC" );
		}
Member

Concatenating messages like this bakes in an assumption about the order of these messages, which might not hold in all languages. That is why for instance we have messages such as "edtf-qualified-date": "$1 ($2)".

Collaborator Author

Yes, I suspected as much; this should be taken into account and improved.
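
A minimal sketch of the composite-message idea, under stated assumptions: the key `edtf-scaled-year-bc`, the function `renderMessage`, and both message tables are hypothetical and not part of the PR. It only illustrates how positional parameters let each translation choose its own word order, in the same spirit as "edtf-qualified-date": "$1 ($2)".

```php
<?php
// Hypothetical sketch: the message key 'edtf-scaled-year-bc' and both
// message tables below are illustrative and are not part of the PR.

/**
 * Substitutes $1, $2, ... in the message stored under $key.
 */
function renderMessage( array $messages, string $key, string ...$params ): string {
	$replacements = [];
	foreach ( $params as $index => $param ) {
		// Parameters are 1-based in the message text: $1, $2, ...
		$replacements['$' . ( $index + 1 )] = $param;
	}

	return strtr( $messages[$key], $replacements );
}

// English word order: amount, scale, era.
$english = [ 'edtf-scaled-year-bc' => '$1 $2 BC' ];
// An invented word order, standing in for a language that puts the era first.
$eraFirst = [ 'edtf-scaled-year-bc' => 'BC $2 $1' ];

echo renderMessage( $english, 'edtf-scaled-year-bc', '5', 'millennia' ), "\n";
echo renderMessage( $eraFirst, 'edtf-scaled-year-bc', '5', 'millennia' ), "\n";
```

The calling code passes the same parameters in both cases; only the message text decides where they end up.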


		return $ret;
Member

$humanizedYear

	}

	private function humanizeYearSpecified( int $year, UnspecifiedDigit $unspecifiedDigit ): string {
		$yearStr = (string)abs( $year );

		if ( $year <= -1000 ) {
4 changes: 3 additions & 1 deletion src/PackagePrivate/Parser/RegexMatchesMapper.php
@@ -42,6 +42,8 @@ private function mapDate( array $rawDateMatches ): Date {
		);
	}

	// FIXME: ensure that -XXXXXXX4 or -XXXXX4XX throws an error
	// see https://github.com/ProfessionalWiki/EDTF/pull/88
	private function prepareNumValue( string $str ): ?int {
		$value = (int)str_replace( 'X', '0', $str );
		return $value !== 0 ? $value : null;
@@ -89,4 +91,4 @@ private function regroupMatches( array $matches ): array {

		return $regrouped;
	}
-}
+}
20 changes: 20 additions & 0 deletions tests/Functional/EnglishHumanizationTest.php
@@ -33,6 +33,26 @@ public function humanizationProvider(): Generator {

		yield 'Month only' => [ 'XXXX-12-XX', 'December' ];
		yield 'Day only' => [ 'XXXX-XX-12', '12th' ];

		// https://github.com/ProfessionalWiki/EDTF/issues/80
		yield 'Some year (4 digits)' => [ 'XXXX', 'some millennium' ];

		yield 'Some year (1 digit)' => [ 'X', 'some year' ];
		yield 'Some year (2 digit)' => [ 'XX', 'some decade' ];
		yield 'Some year (3 digits)' => [ 'XXX', 'some century' ];
		yield 'Scales (4 digits minus)' => [ '-5XXXX', '5 decem millennia BC' ];
		yield 'Scales (5 digits)' => [ '5XXXXX', '5 hundreds of thousands' ];
		yield 'Scales (6 digits)' => [ '5XXXXXX', '5 millions' ];
		yield 'Scales (7 digits)' => [ '5XXXXXXX', '5 tens of millions' ];
		yield 'Scales (8 digits)' => [ '5XXXXXXXX', '5 hundreds of millions' ];
		yield 'Scales (9 digits)' => [ '5XXXXXXXXX', '5 billions' ];
		yield 'Scales (10 digits)' => [ '5XXXXXXXXXX', '5 tens of billions' ];
		yield 'Scales (11 digits)' => [ '5XXXXXXXXXXX', '5 hundreds of billions' ];
Member

"5 decem millennia"? What does decem mean here?

"5 hundreds of thousands" is just an amount; should it not also have "years" in there somewhere? Is this even a valid EDTF date to begin with? @mzeinstra

Collaborator Author

Yes, the word "year" is sometimes missing; I still have to add it where necessary.

Collaborator Author

> "5 decem millennia"? What does decem mean here?

"Decem millennium" nominally means "ten thousand years"; the expression "tens of thousands" could also be used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tens of thousands is indeed better, but we can also leave that up to the translators.

Cosmology seems to me a valid border of amounts of years that we humanise. Maybe a fallback to just presenting the years in numbers?

The most years I can find in the sciences is the half-life of Xenon, which is 18 billion trillion years :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we should not forget the objective here, that is to make these edtf-string human readable. If 99% of the use cases can be captured than that is ok with me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously I also posted this link

https://asistdl.onlinelibrary.wiley.com/doi/epdf/10.1002/pra2.552

in the element.io SemanticMediawiki group because somebody asked about the support for "geological and historical eras".

Regarding the following

Cosmology seems to me a valid border of amounts of years that we humanise. Maybe a fallback to just presenting the years in numbers?

in my opinion if the user registers a date with unspecified digits and the library outputs a human readable format, and additionally computes correctly comparisons and intervals, that's really useful.

Still in my opinion the approach to present a year with mixed unspecified digits using numbers for the specified digits and literals for the unspecified (scaled) part, seems also formally rigorous. So using the example above -5XXXXXXX is not to be precise "50 millions years BC" but "5 tens of millions years BC" (in the output the number represents the specified digits, and the literal part the unspecified digits)

So what about, when this is more refined, to publish a demo (which could query the library using Ajax) and to propose this approach to the authors of the format, the Washington' Library of Congress, as far as I know ?
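
The split described above, into specified digits and an unspecified (scaled) part, could be sketched as follows. This is an assumption-laden illustration, not the PR's parser: the function name `splitUnspecifiedYear` and the array keys are hypothetical, and the sketch only accepts years whose X placeholders are all trailing, so inputs like -XXXXX4XX are rejected.

```php
<?php
// Hypothetical helper, not part of the PR: splits a year with trailing
// unspecified digits into sign, specified digits and number of X's.
// Returns null when the input has no X or when an X is not trailing.
function splitUnspecifiedYear( string $rawYear ): ?array {
	if ( preg_match( '/^(-?)(\d*)(X+)$/', $rawYear, $matches ) !== 1 ) {
		return null;
	}

	return [
		'negative' => $matches[1] === '-',
		'specified' => $matches[2],
		'unspecifiedCount' => strlen( $matches[3] ),
	];
}

// -5XXXXXXX: negative year, specified digits "5", seven unspecified digits.
var_dump( splitUnspecifiedYear( '-5XXXXXXX' ) );
// A specified digit after an X does not match the pattern.
var_dump( splitUnspecifiedYear( '-XXXXX4XX' ) );
```

The number would then be rendered from `specified` and the words chosen from `unspecifiedCount`; how the count maps to a scale message is left to the humanizer.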

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik there is no standard for the humanisation of EDTF, humanised examples are used on the library of congres website but they are not the owner of the standard.

So your best efforts here will de facto help set the humanisation standard.

@mnhn-paul you might be able to provide us with a standard of humanising eras in EDTF. Eg. What would the date: -5XXXXXXX be humanised to?

Paul is from our museum of natural history and might have more insight into this.


		// TODO throw error
		// yield 'Scales (throw error 1)' => [ 'XXXXXXXXXX4', '' ];
		// yield 'Scales (throw error 2)' => [ 'XXXXXX4XXXX', '' ];

		yield 'Month and day' => [ 'XXXX-12-11', 'December 11th' ];
		yield 'Year and day' => [ '2020-XX-11', '11th of unknown month, 2020' ];
		yield 'Unspecified year decade' => [ '197X', '1970s' ];