When I look at the report output, my first thought (particularly with numeric types) is: what if the data changes a bit?
It might be useful to indicate how much margin there would be for the data to change if the suggestion were followed. Perhaps a flag could add this extra information to the output, for example:
wm_yr_wk (int64) currently taking 54,729,096 bytes, to save 41,046,726 bytes try wm_yr_wk.astype(int16)
int16 range: -32768 to +32767
data range: -1000 to +29067
Obviously you could work that out yourself, but it would be nice to be given it directly. You could report more than just the data range (percentiles or standard deviations), but the range alone seems like an easy addition.
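A minimal sketch of the extra info such a flag could print, assuming NumPy is available. `headroom` is a hypothetical helper, not part of dtype_diet's API; it uses `np.iinfo` to get the representable range of the suggested integer dtype and compares it with the observed min and max:

```python
import numpy as np

def headroom(values, target_dtype):
    """Hypothetical helper: compare a candidate integer dtype's range with the data's range."""
    info = np.iinfo(target_dtype)  # representable min/max of the target integer dtype
    data_min = int(np.min(values))
    data_max = int(np.max(values))
    return {
        "dtype": info.dtype.name,
        "dtype_range": (info.min, info.max),
        "data_range": (data_min, data_max),
        # How far the data could move in each direction before overflowing the dtype
        "room_above": info.max - data_max,
        "room_below": data_min - info.min,
    }

report = headroom([-1000, 5, 29067], np.int16)
print(f"{report['dtype']} range: {report['dtype_range'][0]} to +{report['dtype_range'][1]}")
print(f"data range: {report['data_range'][0]} to +{report['data_range'][1]}")
print(f"headroom: {report['room_above']} above, {report['room_below']} below")
```

A negative headroom would mean the data already does not fit. Percentiles (e.g. `np.percentile(values, [1, 99])`) could be added to the same dict if the range alone is not enough.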
Hey Antony, thanks for the interest. Agreed that more info could be useful. I think it would also be worth noting that some conversions might change floating-point results (though if the change is under a threshold, that may be fine). Good food for thought :-)
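One way to express the threshold idea, as a sketch assuming NumPy: round-trip the float64 data through the smaller dtype and report the worst relative error. `downcast_error_ok` and its default `rel_tol` are illustrative choices, not anything dtype_diet provides:

```python
import numpy as np

def downcast_error_ok(values, target_dtype=np.float32, rel_tol=1e-6):
    """Hypothetical check: is the float64 -> target_dtype round-trip error within rel_tol?"""
    original = np.asarray(values, dtype=np.float64)
    roundtrip = original.astype(target_dtype).astype(np.float64)
    # Divide by |x|, but use 1.0 for exact zeros so they don't cause division by zero
    scale = np.where(original == 0, 1.0, np.abs(original))
    # nanmax skips NaNs that were already present in the data
    max_rel_error = float(np.nanmax(np.abs(roundtrip - original) / scale))
    return max_rel_error <= rel_tol, max_rel_error
```

For example, `downcast_error_ok([0.1, 1.23456789, 12345.678])` passes, because float32 keeps roughly 7 significant digits, while the same values checked against `np.float16` fail this tolerance.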
I deal with a lot of data from Excel at the moment, and much of it has noise in the 15th decimal place (or so). In that case a reduction in accuracy would be fine. Arguably a user could round the data before passing it to dtype_diet if they don't need the precision, but perhaps the library could help find the optimum rounding level. It might be a nice feature anyway. Presumably you would need input from the user on how much accuracy they are prepared to lose.
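A sketch of searching for the coarsest acceptable rounding level, assuming the user supplies an absolute tolerance as the input mentioned above. `coarsest_rounding` is hypothetical; it tries 0, 1, 2, ... decimal places with `np.round` and stops at the first level whose worst-case error is within tolerance:

```python
import numpy as np

def coarsest_rounding(values, abs_tol, max_decimals=15):
    """Hypothetical helper: fewest decimal places whose rounding error is <= abs_tol."""
    arr = np.asarray(values, dtype=np.float64)
    for decimals in range(max_decimals + 1):
        # Worst-case absolute change introduced by rounding to this many places
        err = float(np.nanmax(np.abs(np.round(arr, decimals) - arr)))
        if err <= abs_tol:
            return decimals, err
    return None, None  # no level up to max_decimals meets the tolerance

# Excel-style noise: 0.1 + 0.2 is stored as 0.30000000000000004
print(coarsest_rounding([0.1 + 0.2, 1.5000000000000002, 2.25], abs_tol=1e-9))
```

Rounding by itself does not shrink a float64 column; it only tells you how much precision you can give up before picking a smaller dtype.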
In fact, I imagine you could plot a curve of rounding error vs storage size, although only a few points on the storage-size axis would correspond to valid data types. That is probably overkill; I'm just doing some blue-sky thinking :-)
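Since only a few float dtypes exist, the "curve" reduces to a handful of points, and a small table may be enough. A sketch assuming NumPy; `error_vs_storage` is a hypothetical helper that converts the data to each candidate dtype and records bytes used and the worst absolute error:

```python
import numpy as np

def error_vs_storage(values, dtypes=(np.float16, np.float32, np.float64)):
    """Hypothetical helper: (dtype name, bytes used, max abs error) for each candidate dtype."""
    original = np.asarray(values, dtype=np.float64)
    rows = []
    for dt in dtypes:
        converted = original.astype(dt)
        # Cast back to float64 to measure what the smaller dtype lost
        err = float(np.nanmax(np.abs(converted.astype(np.float64) - original)))
        rows.append((np.dtype(dt).name, converted.nbytes, err))
    return rows

for name, nbytes, err in error_vs_storage(np.linspace(0, 1, 1000)):
    print(f"{name}: {nbytes} bytes, max error {err:.2e}")
```

These rows could be plotted if a picture is wanted, but with three points a printed table probably says as much.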