Significant slowdown on large files #10

xelatihy · 2019-07-30T23:59:40Z

The current version of happly is quite slow for large files compared to an home-brewed solution I cooked up. The profiler suggest that the problem is allocating many small vectors in list properties. On the Lucy model from the Stanford repo, happly takes about 16 seconds of which 7 are just vector allocs. My home-brewed solution takes half that time.

I propose the following changes:

change the storage of list props from vector<vector<T>> data to three vectors for start, count and data std::vector<size_t> start; std::vector<uint8_t> count; vector<T> data;, where data has the concatenated list of elements, start has the starting index for each list and count contains the lists sizes
in a backward compatible manner, add getListProperty(vector<array<T, N>>& data, vector<uint8_t>& count) to read the data in preallocated lists; maintain previous versions for backward compatibility

If this sounds good, I may even take a crack at it, but only if this feels right.

The text was updated successfully, but these errors were encountered:

nmwsharp · 2019-08-12T01:38:21Z

This sounds like a good idea to me! There's a little extra complexity, but it's hidden from the user. And it makes total sense that it would speed things up.

nmwsharp · 2019-09-11T17:47:16Z

FYI, started working on this in https://github.com/nmwsharp/happly/tree/compressed_list

The core data structure for storing list properties is changed to a flat list in 82a5dd2. Seems to make just parsing in binary data ~40% faster.

Still need to expose direct access to the flat list via the API.

xelatihy · 2019-10-03T07:00:28Z

I took a look at that and it seems fine. Giving access to list properties directly will also help too (otherwise, we still get the constructor called many times). I took a crack at a similar thing in our library including adding helpers for loading points, lines, triangles, quads. For us, it works really well and scales well. If you care, I can e up a summary here of what one might want to do.

nmwsharp · 2019-10-06T11:59:06Z

Great! Agreed about direct access to the underlying buffers, can implement an API for that soon. Any tips you can share about helpers are certainly appreciated.

I spent a while looking at performance in happly a few weeks ago. Although some of the worst offenders (like nested std::vector allocs) can be cleaned up, ultimately the "simple" template-based approach happly uses holds it back from being ultra-fast. We just spend too much time in virtual function calls and streaming operators. Perhaps one day I'll find the time to rip out the insides and replace it with something faster, but for now happly will have to stay "medium speed" at best.

By the way, @mhalber has an awesome benchmark of ply reader/writers here: https://github.com/mhalber/ply_io_benchmark, if anyone is looking for something faster. At least happly does very well in terms of lines of code :)

nmwsharp added the enhancement New feature or request label Aug 12, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Significant slowdown on large files #10

Significant slowdown on large files #10

xelatihy commented Jul 30, 2019

nmwsharp commented Aug 12, 2019

nmwsharp commented Sep 11, 2019

xelatihy commented Oct 3, 2019

nmwsharp commented Oct 6, 2019

Significant slowdown on large files #10

Significant slowdown on large files #10

Comments

xelatihy commented Jul 30, 2019

nmwsharp commented Aug 12, 2019

nmwsharp commented Sep 11, 2019

xelatihy commented Oct 3, 2019

nmwsharp commented Oct 6, 2019