Fix dateformat of timestamp field for the flights.json.gz test dataset #440

Open: wants to merge 3 commits into main

Fju
Contributor

@Fju Fju commented Feb 24, 2022

Problem:

I recently used the flights.json.gz dataset to test Elasticsearch queries in another project. When indexing the data into Elasticsearch, I noticed that the timestamp field causes parsing errors. The date format for this field, specified in tests/__init__.py, is strict_date_hour_minute_second. However, some documents in the dataset have a date-only timestamp such as "2018-02-10", which this format rejects.
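
As a rough illustration of why a date-only value fails, here is a small Python sketch that approximates the strict_date_hour_minute_second pattern (yyyy-MM-dd'T'HH:mm:ss) with datetime.strptime. It is not Elasticsearch's actual parser, the function name is made up for this example, and the first sample value is invented; only "2018-02-10" comes from the dataset.

```python
from datetime import datetime


def matches_strict_date_hour_minute_second(value: str) -> bool:
    """Approximate check for yyyy-MM-dd'T'HH:mm:ss (hypothetical helper)."""
    # strptime tolerates unpadded fields such as "2018-1-2", so also require
    # the exact length of the fully padded form to mimic the "strict" variant.
    if len(value) != 19:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True


# Invented sample value with a time part
print(matches_strict_date_hour_minute_second("2018-02-10T12:30:00"))  # True
# Date-only value of the kind found in flights.json.gz
print(matches_strict_date_hour_minute_second("2018-02-10"))  # False
```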

I wrote this short bash "script" to find all timestamp values in the dataset that don't contain the "T" separator. It's not very efficient, but it proves the point.

gunzip --stdout flights.json.gz | while read -r line; do
    # Print each timestamp, keeping only those without the "T" separator
    printf '%s\n' "$line" | jq '.timestamp' | grep -v "T"
done

The output is:

"2018-01-02"
"2018-01-03"
"2018-01-04"
"2018-01-05"
"2018-01-06"
"2018-01-07"
"2018-01-08"
"2018-01-09"
"2018-01-10"
"2018-01-11"
"2018-01-12"
"2018-01-12"
"2018-01-12"
"2018-01-13"
"2018-01-14"
"2018-01-15"
"2018-01-16"
"2018-01-17"
"2018-01-18"
"2018-01-19"
"2018-01-20"
"2018-01-21"
"2018-01-22"
"2018-01-23"
"2018-01-24"
"2018-01-25"
"2018-01-26"
"2018-01-27"
"2018-01-28"
"2018-01-29"
"2018-01-30"
"2018-01-31"
"2018-02-01"
"2018-02-02"
"2018-02-03"
"2018-02-04"
"2018-02-05"
"2018-02-06"
"2018-02-07"
"2018-02-08"
"2018-02-09"
"2018-02-09"
"2018-02-09"
"2018-02-10"
"2018-02-11"

BTW, these "invalid" timestamps occur only in the flights.json.gz dataset, not in flights_small.json.gz.
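
The same check can be sketched in Python, which avoids spawning jq once per line. This assumes the file is gzipped, newline-delimited JSON with one document per line (which the read loop above implies); the function names are hypothetical. Unlike the bash version, it skips documents whose timestamp is missing or not a string.

```python
import gzip
import json
from typing import Iterable, List


def date_only_timestamps(lines: Iterable[str]) -> List[str]:
    """Return timestamp values that lack the "T" date/time separator."""
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        timestamp = json.loads(line).get("timestamp")
        # Ignore documents with a missing or non-string timestamp
        if isinstance(timestamp, str) and "T" not in timestamp:
            found.append(timestamp)
    return found


def scan_dataset(path: str) -> List[str]:
    """Scan a gzipped, newline-delimited JSON file (assumed layout)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return date_only_timestamps(handle)
```

Calling scan_dataset("flights.json.gz") and scan_dataset("flights_small.json.gz") from the directory that holds the datasets should let you compare the two files.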

Solution:

To support these timestamps, I changed the date format to strict_date_optional_time, which accepts a date with or without a time part.
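
For readers who haven't looked at the mapping, the change amounts to something like the sketch below. The dict names are hypothetical and this is not the actual contents of tests/__init__.py; only the field name and the two format names come from this PR.

```python
# Hypothetical excerpt, not the actual contents of tests/__init__.py.
# Before: requires a full yyyy-MM-dd'T'HH:mm:ss value.
TIMESTAMP_MAPPING_BEFORE = {
    "timestamp": {"type": "date", "format": "strict_date_hour_minute_second"}
}

# After: the time part is optional, so "2018-02-10" is accepted as well.
TIMESTAMP_MAPPING_AFTER = {
    "timestamp": {"type": "date", "format": "strict_date_optional_time"}
}
```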

@elasticmachine

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@sethmlarson
Contributor

jenkins test this please

@pquentin
Member

pquentin commented Nov 6, 2023

buildkite test this please

@pquentin
Member

pquentin commented Nov 6, 2023

buildkite test this please

4 participants