Precision of awkward array #3218

roy-brener-cern · 2024-08-14T10:10:42Z

roy-brener-cern
Aug 14, 2024

Version of Awkward Array

awkward1

Description and code to reproduce

Dear experts,
I am using awkward array for HEP analysis. Recently, I've found out that the precision used in the array is float32 although I need float64 for all intents and purposes:

>>> aka
<Array [85.5, 69.1] type='2 * float32'>

However, upon extracting the values from the array, the type is float64:

>>> aka[1]
69.09111785888672
>>> np.array(aka[1]).itemsize
8

i.e. 8 bytes × 8 = 64 bits, which is what I want.
How can I keep using awkward arrays but store float64 in them?

Many thanks for your help.

Cheers,

Roy

Answered by jpivarski

Aug 14, 2024


ianna · 2024-08-14T10:39:32Z

ianna
Aug 14, 2024
Maintainer

Hi @roy-brener-cern ,

Awkward arrays are immutable. I think if you want to convert a float32 array to a float64 array you could multiply it by np.float64(1.0)

>>> narr = np.array([85.5, 69.1], dtype = np.float32)
>>> array = ak.from_numpy(narr)
>>> array
<Array [85.5, 69.1] type='2 * float32'>
>>> array_64 = array*np.float64(1.0)
>>> array_64
<Array [85.5, 69.1] type='2 * float64'>

0 replies

roy-brener-cern · 2024-08-14T10:44:14Z

roy-brener-cern
Aug 14, 2024
Author

Hi @ianna,
Many thanks for your reply.
Weirdly the last step doesn't change it:

>>> array
<Array [85.5, 69.1] type='2 * float32'>
>>> array_64 = array*np.float64(1.0)
>>> array_64
<Array [85.5, 69.1] type='2 * float32'>

What do you think?
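A note on the multiplication trick above: whether multiplying by a np.float64 scalar widens a float32 array depends on scalar promotion rules, and for plain NumPy those rules differ between NumPy 1.x and NumPy 2.x (NEP 50). Whether that explains the difference seen here is not established in this thread. The minimal NumPy-only sketch below shows an explicit cast, which does not rely on promotion at all; the array values are the ones used earlier in the thread.

```python
import numpy as np

narr = np.array([85.5, 69.1], dtype=np.float32)

# Explicit cast: the result dtype does not depend on any promotion rule.
widened = narr.astype(np.float64)
print(widened.dtype)

# Widening float32 -> float64 is exact, so casting back recovers the input.
print(np.array_equal(widened.astype(np.float32), narr))

# The widened value is the stored float32 value, not the decimal literal 69.1.
print(widened[1] == 69.1)
```

The last line prints False: the cast keeps the float32 value exactly, it does not restore digits that float32 never stored.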

0 replies

ianna · 2024-08-14T10:46:51Z

ianna
Aug 14, 2024
Maintainer

Is it specific to Awkward version 1? Have you tried using Awkward v2?

0 replies

roy-brener-cern · 2024-08-14T11:02:15Z

roy-brener-cern
Aug 14, 2024
Author

Hi, yes indeed, it was a version issue. Seems to be giving the full float64 w/ version 2.
I'll try to implement in the analysis.
Thanks!
Roy

0 replies

ianna · 2024-08-14T11:11:24Z

ianna
Aug 14, 2024
Maintainer

Please, let me know if I can close the issue. Thanks!

0 replies

roy-brener-cern · 2024-08-14T11:12:48Z

roy-brener-cern
Aug 14, 2024
Author

Hi @ianna,
Sure, please give a bit of time to test it and I'll LYK.
Cheers,
Roy

0 replies

roy-brener-cern · 2024-08-14T11:44:06Z

roy-brener-cern
Aug 14, 2024
Author

Hi @ianna,
Perhaps you could recommend how to implement this on something nested in the following form:

>>> leadlep.type
ArrayType(RecordType([NumpyType('float32'), NumpyType('float32'), NumpyType('float32'), NumpyType('float32'), NumpyType('int32'), NumpyType('int32'), NumpyType('int32')], ['pt', 'eta', 'phi', 'charge', 'passTTVA', 'passIso', 'isBad']), 2, None)

naturally only on the floating variables ('pt', 'eta', 'phi')?
Cheers,
Roy

0 replies

ianna · 2024-08-14T12:05:48Z

ianna
Aug 14, 2024
Maintainer

Hi @roy-brener-cern ,
You can access the values as leadlep.pt, leadlep.eta, and leadlep.phi.

However, please check whether the calculations you need already give you float64 precision in v2. For example, an ak.mean reducer returns a float64:

>>> array
<Array [85.5, 69.1] type='2 * float32'>
>>> type(ak.mean(array, axis=-1))
<class 'numpy.float64'>

0 replies

roy-brener-cern · 2024-08-14T12:11:44Z

roy-brener-cern
Aug 14, 2024
Author

Hi @ianna,
Thanks.
Yes, that's true, I can access the variables this way and extract the float64-precision values.
But, I need to perform operations on the vectors, e.g. add them and get the invariant mass of the resulting vector. All this, whilst keeping the float64 precision internally.
How can this be achieved?
Cheers,
Roy

0 replies

ianna · 2024-08-14T12:21:28Z

ianna
Aug 14, 2024
Maintainer

Hi @roy-brener-cern ,

Perhaps, this answer in the discussions is what you are looking for #1342?

0 replies

jpivarski · 2024-08-14T12:24:59Z

jpivarski
Aug 14, 2024
Maintainer

You can call ak.values_astype on your input data as the first step. If the input to the vector calculation is np.float64, then all of the intermediate steps will be, too. If you want to make some fields np.float64 while keeping other fields as np.float32 or integer types, then you can construct a full type expression and use ak.enforce_type. The string (DataShape) representation of types is generally easier to work with, and str and ak.types.from_datashape will help you convert between strings and type objects.

(This isn't a version issue—Awkward Arrays have always supported all numerical types. Some of these functions are new or have changed names since Awkward 1—I've written all of the Awkward 2 names above.)

0 replies

roy-brener-cern · 2024-08-14T13:45:26Z

roy-brener-cern
Aug 14, 2024
Author

Dear @ianna and @jpivarski,
Many thanks for your replies.
Directly to @jpivarski's last comment: yes indeed, I've found ak.enforce_type to be probably what I want. But I'm still struggling to understand something. Looking at the ntuple with a simple ROOT terminal, I get:

root [15] nominal->Scan("mu_eta","event==69584518","colsize=40")
*************************************************************
*    Row   * Instance *                              mu_eta *
*************************************************************
*  9417662 *        0 *                -1.38058340549468994 *

i.e. the desired, original value is -1.38058340549468994.

After applying enforce_type I'm getting something similar but not quite the same:

>>> ak.enforce_type(the_event_extra.muons_signal.eta[1],'float64')[0]
-1.38058340549469

Why aren't they equal? Shouldn't they be..?

Cheers,

Roy

0 replies

ianna · 2024-08-14T14:00:52Z

ianna
Aug 14, 2024
Maintainer

No, floating-point numbers cannot be compared the same way as integers.

0 replies

roy-brener-cern · 2024-08-14T14:27:55Z

roy-brener-cern
Aug 14, 2024
Author

Hi @ianna,
Sorry, I don't understand. Why wouldn't the number be exactly equal to the original one (in the ntuple), thereby containing all available digits..?
Cheers,
Roy

0 replies

ianna · 2024-08-14T14:53:34Z

ianna
Aug 14, 2024
Maintainer

Hi @roy-brener-cern ,

It's the way the IEEE 754 standard defines them. Please, check https://indico.cern.ch/event/1287965/contributions/5411743/attachments/2687210/4662439/Floating-point%20Arithmetic%20is%20not%20Real.pdf for a longish explanation.

0 replies

roy-brener-cern · 2024-08-14T18:40:50Z

roy-brener-cern
Aug 14, 2024
Author

Hi @ianna,
Many thanks! Very nice slides if I might add..
I see.
So is it fair to say that what I would get from enforce_type w/ float64 is a precise representation of the data?
Cheers,
Roy

0 replies

jpivarski · 2024-08-14T19:15:18Z

jpivarski
Aug 14, 2024
Maintainer

The key thing is that it's the file itself that has 32-bit floats. When you look at TTree::Scan in ROOT, it will show you the closest decimal expansion to that 32-bit number. If you look at the number in Python before conversion to 64-bit, it will also show you the closest decimal expansion to the number, though it might do so with a different number of decimals.

Side-note: Python's float.__repr__ shows enough decimals to see all of the binary precision; print-outs in C++ vary, and C's default %g shows very few:

root [0] TMath::Pi()
(double) 3.1415927
root [1] std::cout << TMath::Pi() << std::endl;
3.14159
root [2] printf("%g\n", TMath::Pi());
3.14159

versus

>>> np.pi
3.141592653589793
>>> print(np.pi)
3.141592653589793
>>> print("%g" % np.pi)   # explicitly choose to use C's %g
3.14159

If you convert the 32-bit number from your file into a 64-bit number and then print that, it will be the closest decimal expansion to the closest 64-bit representation of the 32-bit representation. If you needed the original data to have 64-bit precision, that was lost when the file was made, but up-converting it from 32-bit to 64-bit and then doing all computations in 64-bit is the best you can make of the situation. (I doubt that you really needed the original data to be 64-bit, given experimental uncertainties in $p_T$, $\eta$, and $\phi$, but I can believe that you do need subsequent calculations to be performed in 64-bit, to prevent errors from growing too large during the calculation.)
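The point about lost precision can be sketched with the standard library alone. The "true" value below is hypothetical (invented for illustration), and to_float32 is a helper defined here, not a library function. It rounds a Python float (64-bit) to the nearest 32-bit float and returns it widened back to 64-bit, which mimics writing a value to a float32 branch and reading it back.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest float32, then widen it back to a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hypothetical 64-bit value that existed before the file was written.
true_value = -1.3805834123456789

stored = to_float32(true_value)   # what a float32 branch can hold
error = abs(stored - true_value)  # rounding error, fixed when the file was made

print(stored)
print(error)

# Widening is exact: pushing the widened value through float32 again
# changes nothing, so no further information is lost by up-converting.
print(to_float32(stored) == stored)
```

For values of magnitude between 1 and 2 the rounding error is at most 2**-24 (about 6e-8), and widening to 64-bit cannot shrink it; it only prevents later arithmetic from adding comparable errors.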

Despite the number of digits in "-1.38058340549468994", a 32-bit floating point number of this scale has only 6 digits of precision after the decimal point:

>>> original = np.float32(-1.38058340549468994)
>>> original
-1.3805834
>>> np.nextafter(original, np.float32(np.inf))   # the next float32 in the direction of ∞
-1.3805833
>>> np.nextafter(original, np.float32(-np.inf))  # the next float32 in the direction of -∞
-1.3805835

The original value is only known with 0.0000001 precision in either direction. So when we cast that float32 as a float64,

>>> as64bit = np.float64(original)
>>> as64bit
-1.38058340549469

the base-2 zeros that were padded to the significand are, in base-10, "0549469". The "0549468994" that ROOT is showing presumably comes from up-casting the same 32-bit number to 64-bit and then showing more digits than even a 64-bit number can contain.

>>> np.nextafter(as64bit, np.inf)
-1.3805834054946897
>>> np.nextafter(as64bit, -np.inf)
-1.3805834054946902

(The 64-bit precision is in the last digit shown.)
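The two spacings can also be checked without NumPy. This is a sketch assuming Python 3.9 or later (for math.nextafter and math.ulp); float32_neighbors is a helper written for this example that steps the raw 32-bit pattern by one, which is valid for ordinary finite values like this one but not at zero, infinities, or NaN.

```python
import math
import struct

def float32_neighbors(x: float) -> tuple[float, float]:
    """Return the adjacent float32 values (smaller magnitude, larger magnitude)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    smaller = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    larger = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return smaller, larger

# The float32 value from the file, widened to a Python float (64-bit).
as64bit = struct.unpack("<f", struct.pack("<f", -1.38058340549468994))[0]

smaller, larger = float32_neighbors(as64bit)
spacing32 = abs(smaller - as64bit)                           # float32 grid step
spacing64 = abs(math.nextafter(as64bit, math.inf) - as64bit) # float64 grid step

print(spacing32)
print(spacing64)
print(spacing32 / spacing64)   # number of float64 steps per float32 step
```

At this magnitude the float32 step is 2**-23 and the float64 step is 2**-52, so there are 2**29 representable 64-bit values between neighboring 32-bit values.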

The output of TTree::Scan in ROOT shows more digits than the precision of the number supports, even as a 64-bit number. We can make Python do that:

>>> print("%.17f" % as64bit)
-1.38058340549468994

just as we can have it make up arbitrarily many digits past the actual precision of the number:

>>> print("%.25f" % as64bit)
-1.3805834054946899414062500

But Python's default float.__repr__ (what it shows you on the terminal when you just enter the number) is the least misleading.
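To compare the three views side by side, here is a small standard-library sketch: the shortest repr that round-trips, a fixed 17-decimal format like the one ROOT printed, and the exact decimal expansion of the stored binary value via decimal.Decimal. The variable names are chosen for this example.

```python
import struct
from decimal import Decimal

# The float32 value from the file, widened to a Python float (64-bit).
as64bit = struct.unpack("f", struct.pack("f", -1.38058340549468994))[0]

shortest = repr(as64bit)           # fewest digits that still round-trip
fixed17 = format(as64bit, ".17f")  # 17 decimals, as in the TTree::Scan output
exact = str(Decimal(as64bit))      # every digit of the binary value

print(shortest)
print(fixed17)
print(exact)

# The short form loses nothing: parsing it gives back the same float.
print(float(shortest) == as64bit)
```

All three strings describe the same stored number; the longer ones only spell out more of the decimal expansion of a value whose information content was fixed at 32 bits.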

0 replies

roy-brener-cern · 2024-08-15T13:36:41Z

roy-brener-cern
Aug 15, 2024
Author

Dear @ianna and @jpivarski,
Many many thanks again for all your explanations. This is highly illuminating.
I deeply appreciate your help.
Kind regards,
Roy

0 replies
