Performance with cf-units #2

Open
pelson opened this issue Oct 23, 2024 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@pelson
Owner

pelson commented Oct 23, 2024

A quick glance suggests that the generated converter from pyudunits2 (which uses sympy lambdify) is significantly quicker than the one that is generated by udunits2 (used within cf-units). However, this performance doesn't shine until you have a lot of data to convert (e.g. 5000*2500*12 data points). Before that, the cost of reading the XML, parsing, etc. is much higher in pyudunits2. It would be good to micro-benchmark this so that we can focus on speeding things up at pinch points.

I hacked together the following script to roughly compare the two:

import numpy as np
import timeit


def prepare_cf_units():
    import cf_units

    u_from = cf_units.Unit('degC')
    u_to = cf_units.Unit('K')

    def convert(data):
        return u_from.convert(data, u_to)

    return convert


def prepare_pyudunits():
    from pyudunits2._udunits2_xml_parser import read_all

    from pyudunits2._unit import Converter

    unit_system = read_all()

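    def convert_w_pyudunits(data):
        # Note: the unit lookups and the Converter construction happen inside
        # the timed call here, whereas the cf-units version builds its units
        # once, outside the timed call.
        u_from = unit_system.unit('degC')
        u_to = unit_system.unit('K')
        converter = Converter(u_from, u_to)
        return converter.convert(data)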

    return convert_w_pyudunits


def prepare_data():
    data = np.arange(50 * 25 * 20)
    return data


if __name__ == "__main__":

    print('cf-units:', timeit.repeat(
        "convert_w_cf_units(data)",
        "from __main__ import prepare_cf_units, prepare_data; convert_w_cf_units = prepare_cf_units(); data = prepare_data()",
        repeat=3, number=2,
    ))

    print('pyudunits2:', timeit.repeat(
        "convert_w_pyudunits(data)",
        "from __main__ import prepare_pyudunits, prepare_data; convert_w_pyudunits = prepare_pyudunits(); data = prepare_data()",
        repeat=3, number=2,
    ))

With results along the lines of:

cf-units: [0.0002636730205267668, 0.0002245799987576902, 0.00021556997671723366]
pyudunits2: [0.34791191801195964, 0.04072356101823971, 0.04109554295428097]
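
The first pyudunits2 repeat above is much slower than the next two, so it may help to report the first call separately from the steady state. Below is a minimal sketch of that idea, not a finished benchmark. `first_vs_steady` and `stand_in_convert` are hypothetical names, and the stand-in only simulates a one-off cost with `time.sleep`; in a real run it would be swapped for the `convert` callables from the script above.

```python
import time


def first_vs_steady(func, *args, repeat=5):
    """Time the first call on its own, then the best of `repeat` later calls."""
    start = time.perf_counter()
    func(*args)
    first = time.perf_counter() - start

    steady = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        steady.append(time.perf_counter() - start)
    return first, min(steady)


# Hypothetical stand-in for a converter that pays a one-off cost on first use.
_cache = {}


def stand_in_convert(data):
    if "offset" not in _cache:
        time.sleep(0.05)  # simulated one-off cost (assumption, not measured)
        _cache["offset"] = 273.15
    return [x + _cache["offset"] for x in data]


first, steady = first_vs_steady(stand_in_convert, list(range(1000)))
print(f"first call: {first:.4f}s, steady state: {steady:.6f}s")
```

Reporting the two numbers separately keeps any one-off cost from being averaged into the per-call figure.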
@pelson pelson added the help wanted Extra attention is needed label Oct 23, 2024
@pelson
Owner Author

pelson commented Nov 13, 2024

@ocefpaf highlighted some performance issues in ioos/compliance-checker#1118 (comment). I don't really think the magnitude there is representative of the performance of pyudunits2 (given that the entire pyudunits2 test suite runs in <2s), but it would be good both to get the aforementioned metrics and to track down where the performance penalties in the checker are coming from (and whether they are a result of some of the workarounds for missing pyudunits2 features).
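
One way to start tracking down where the time goes is to run a single slow call under the standard-library profiler and look at the top entries by cumulative time. This is a rough sketch only. `profile_top` and `slow_call` are hypothetical names; the stand-in would be replaced by whichever checker or pyudunits2 call is suspected of being slow.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=10):
    """Profile one call of `func` and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


# Hypothetical stand-in for the call under investigation.
def slow_call(n):
    return sum(i * i for i in range(n))


report = profile_top(slow_call, 100_000)
print(report)
```

Sorting by cumulative time tends to surface the high-level entry points first, which is usually the quickest way to see which phase dominates.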

@ocefpaf

ocefpaf commented Nov 13, 2024

I'm pretty sure we do things in the least optimized way possible in compliance-checker, so please take those numbers with a grain of salt. Yet, when using cf-units, the tests run super fast. I'll try to debug this further to figure out where the hiccups are.
