str(stats["min"].to_pydatetime()) error #7

Open
mike-vogel opened this issue May 18, 2017 · 2 comments

Comments

@mike-vogel

I'm profiling Parquet files and they fail as shown below.

```
  File "/usr/lib/python3.4/site-packages/spark_df_profiling/base.py", line 326, in describe_date_1d
    stats["min"] = str(stats["min"].to_pydatetime())
AttributeError: 'datetime.datetime' object has no attribute 'to_pydatetime'
```

Line 323, copied below, checks only the 'max' value, but it's the 'min' value that fails:

```python
if isinstance(stats["max"], pd.tslib.Timestamp):
```

Should there be a test of 'min' the same way 'max' is tested?
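One possible shape for that test is sketched below, assuming the surrounding code in `describe_date_1d` only needs a string for each statistic. Rather than `isinstance(..., pd.tslib.Timestamp)`, it duck-types on `to_pydatetime`, so a plain `datetime.datetime` (which is what `stats["min"]` was in the traceback above) is simply stringified as-is. `format_date_stat` and `stringify_date_stats` are hypothetical names, not functions from spark_df_profiling.

```python
import datetime


def format_date_stat(value):
    """Hypothetical helper: turn a "min"/"max" date statistic into a string.

    pandas Timestamps have to_pydatetime(); plain datetime.datetime objects
    do not, so the method is only called when it exists.
    """
    if hasattr(value, "to_pydatetime"):
        value = value.to_pydatetime()
    return str(value)


def stringify_date_stats(stats):
    """Hypothetical: apply the same guard to both "min" and "max", not just "max"."""
    for key in ("min", "max"):
        stats[key] = format_date_stat(stats[key])
    return stats


if __name__ == "__main__":
    stats = {
        "min": datetime.datetime(2388, 1, 1),  # plain datetime, no to_pydatetime()
        "max": datetime.datetime(2390, 6, 30, 12, 0),
    }
    print(stringify_date_stats(stats))
    # {'min': '2388-01-01 00:00:00', 'max': '2390-06-30 12:00:00'}
```

Checking `isinstance(stats[key], pd.Timestamp)` for each key would work just as well; duck typing just avoids depending on where the Timestamp class lives in a given pandas version.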

@mike-vogel
Author

mike-vogel commented May 18, 2017

The problem is a date whose year pandas considers too large, e.g. 2388. The code below is what line 326 ultimately calls. Would it be appropriate to add logic that changes dates that are too large to some special value, or skips them, rather than exiting with an exception? The pandas code that checks the year is copied below.

```cython
cdef inline _check_dts_bounds(pandas_datetimestruct *dts):
    cdef:
        bint error = False

    if dts.year <= 1677 and cmp_pandas_datetimestruct(dts, &_NS_MIN_DTS) == -1:
        error = True
    elif (
            dts.year >= 2262 and
            cmp_pandas_datetimestruct(dts, &_NS_MAX_DTS) == 1):
        error = True

    if error:
        fmt = '%d-%.2d-%.2d %.2d:%.2d:%.2d' % (dts.year, dts.month,
                                               dts.day, dts.hour,
                                               dts.min, dts.sec)

        raise OutOfBoundsDatetime(
            'Out of bounds nanosecond timestamp: %s' % fmt)
```
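The 1677 and 2262 cut-offs in that check follow from pandas storing timestamps as signed 64-bit nanosecond counts since 1970-01-01, which covers roughly 292 years either side of the epoch. A minimal, standard-library-only sketch of the same range check, plus the two behaviours asked about above (replace an out-of-range date with a special value, or skip it), might look like this. It assumes naive (timezone-free) `datetime.datetime` values; `NS_MIN`, `NS_MAX`, `in_ns_bounds`, `sanitize_dates` and its `sentinel`/`skip` parameters are hypothetical names, not pandas or spark_df_profiling APIs.

```python
import datetime

# pandas keeps Timestamps as signed 64-bit nanoseconds since the Unix epoch,
# so the usable range is about +/- (2**63 - 1) ns, i.e. roughly 292 years
# either side of 1970. datetime.datetime only has microsecond resolution,
# so the bounds are truncated inward to whole microseconds.
_EPOCH = datetime.datetime(1970, 1, 1)
_SPAN = datetime.timedelta(microseconds=(2**63 - 1) // 1000)
NS_MIN = _EPOCH - _SPAN  # 1677-09-21 00:12:43.145225
NS_MAX = _EPOCH + _SPAN  # 2262-04-11 23:47:16.854775


def in_ns_bounds(value):
    """True if a naive datetime fits in a pandas nanosecond Timestamp."""
    return NS_MIN <= value <= NS_MAX


def sanitize_dates(values, sentinel=None, skip=False):
    """Hypothetical pre-profiling pass over a column of naive datetimes.

    Out-of-range values are dropped when skip=True, otherwise replaced
    by `sentinel` (None by default). Existing None values pass through.
    """
    cleaned = []
    for value in values:
        if value is None or in_ns_bounds(value):
            cleaned.append(value)
        elif not skip:
            cleaned.append(sentinel)
    return cleaned


if __name__ == "__main__":
    dates = [datetime.datetime(2017, 5, 18), datetime.datetime(2388, 1, 1)]
    print(sanitize_dates(dates))             # 2388 date replaced by None
    print(sanitize_dates(dates, skip=True))  # 2388 date dropped
```

Defaulting the sentinel to `None` means an out-of-range date would show up as a missing value instead of aborting the run; skipping instead changes how many values are profiled.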

@julioasotodv
Owner

I see, great point!

Well, my guess is that the best way to deal with it is to make sure your dates "make sense for the time being", literally.

But feel free to send a PR should you desire :)
