[Python] Arrow IPC 30x slower than Numpy's, and MemoryMappedFile is slower than OSFile #44121

u3Izx9ql7vW4 commented Sep 15, 2024

Describe the bug, including details regarding any error messages, version, and platform.

I was looking around the internet for why Arrow's IPC write is essentially the speed of going to disk, came across a Stack Overflow post from 2022, and ran its benchmark on my machine. I was amazed at the disparity; the scripts extracted from the post are below.

The following script, using NumPy, took ~0.02 s on my machine.

import numpy as np
import time
import ctypes

from multiprocessing import sharedctypes

data = np.ones([1, 1, 544, 192], dtype=np.float32)

capacity = 1000 * 1 * 544 * 192 * 10

buffer = sharedctypes.RawArray(ctypes.c_uint8, capacity + 1)
ndarray = np.ndarray((capacity,), dtype=np.uint8, buffer=buffer)

cur_offset = 0

t = time.time()
for i in range(1000):
    data = np.frombuffer(data, dtype=np.uint8)
    data_size = data.shape[0]
    ndarray[cur_offset:data_size + cur_offset] = data
    cur_offset += data_size
e = time.time()

print(e - t)

The script below, using PyArrow, took ~0.8 s on my machine.

import numpy as np
import pyarrow as pa
import time
import os

data = np.ones((1, 1, 544, 992), dtype=np.float32)

tensor = pa.Tensor.from_numpy(data)

path = os.path.join(str("./"), 'pyarrow-tensor-ipc-roundtrip')
mmap = pa.create_memory_map(path, 5000000 * 1000)

s = time.time()
for i in range(1000):
    result = pa.ipc.write_tensor(tensor, mmap)
e = time.time()

print(e - s)

output_stream = pa.BufferOutputStream()

s = time.time()
for i in range(1000):
    result = pa.ipc.write_tensor(tensor, output_stream)
e = time.time()

print(e - s)

Surprisingly, the second one using BufferOutputStream is 2x slower than the first one using the memory map. I also tried replacing the path with /dev/shm/, which is backed by memory (tmpfs), and that sped things up to 0.6 s. It's as if create_memory_map isn't using memory mapping at all. In fact, if you swap out

mmap = pa.create_memory_map(path, 5000000 * 1000)

with

mmap = pa.OSFile(path, 'wb')

you'll decrease the write time by half! What's causing this?
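For completeness, here is the full OSFile variant I'm describing, as a minimal sketch (the output filename is arbitrary; the tensor and loop count are the same as in the script above):

import numpy as np
import pyarrow as pa
import time

data = np.ones((1, 1, 544, 992), dtype=np.float32)
tensor = pa.Tensor.from_numpy(data)

path = "./pyarrow-tensor-ipc-osfile"
sink = pa.OSFile(path, 'wb')  # plain OS file instead of create_memory_map()

s = time.time()
for i in range(1000):
    result = pa.ipc.write_tensor(tensor, sink)
e = time.time()
sink.close()

print(e - s)

This is the swap that halves the write time in my runs.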

os         Ubuntu 24
arrow      1.3.0
pyarrow    16.1.0

Component(s)

Python
