prepare_iter_for_array PR #43

Open, wants to merge 23 commits into base: master

Conversation

chaburkland
Collaborator

Closes #9

chaburkland self-assigned this Apr 26, 2021
chaburkland marked this pull request as ready for review April 28, 2021 22:29
@chaburkland
Collaborator Author

Current performance

cls                  func         ak           ref          ref/ak
IsGenCopyValues      main         0.2276659    0.39814382   1.74880746
PrepareIterForArray  iter_small   1.07673633   1.28456536   1.19301757
PrepareIterForArray  iter_large   1.46419539   2.07376239   1.41631534 
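
The ref/ak column is the ref timing divided by the ak timing, so a value above 1 means the ak measurement is the smaller of the two. A minimal sketch that recomputes the column from the rows above; the names `rows` and `ratios` are illustrative only, and the last digits can differ slightly because the timings in the table are already rounded:

```python
# Timings copied from the table above: (cls, func, ak, ref)
rows = [
    ("IsGenCopyValues", "main", 0.2276659, 0.39814382),
    ("PrepareIterForArray", "iter_small", 1.07673633, 1.28456536),
    ("PrepareIterForArray", "iter_large", 1.46419539, 2.07376239),
]


def ratios(rows):
    """Map (cls, func) to ref / ak for each benchmark row."""
    return {(cls, func): ref / ak for cls, func, ak, ref in rows}


for (cls, func), ratio in ratios(rows).items():
    # Print each row's recomputed ratio, rounded for readability
    print(cls, func, round(ratio, 6))
```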

@flexatone
Contributor

I have optimized the Python implementation of this function, which reduces and clarifies its requirements. The new implementation is as follows:

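import typing as tp
from enum import Enum

import numpy as np

# DtypeSpecifier, INEXACT_TYPES, INT_MAX_COERCIBLE_TO_FLOAT, and is_gen_copy_values
# are assumed to be defined elsewhere in the project.

def prepare_iter_for_array(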
        values: tp.Iterable[tp.Any],
        restrict_copy: bool = False
        ) -> tp.Tuple[DtypeSpecifier, bool, tp.Sequence[tp.Any]]:
    is_gen, copy_values = is_gen_copy_values(values)

    if not is_gen and len(values) == 0: #type: ignore
        return None, False, values #type: ignore

    if restrict_copy:
        copy_values = False

    v_iter = values if is_gen else iter(values)

    if copy_values:
        values_post = []

    resolved = None # None is valid specifier if the type is not ambiguous

    has_tuple = False
    has_str = False
    has_non_str = False
    has_inexact = False
    has_big_int = False

    for v in v_iter:
        if copy_values:
            # if a generator, have to make a copy while iterating
            values_post.append(v)

        value_type = type(v)

        if (value_type is str
                or value_type is np.str_
                or value_type is bytes
                or value_type is np.bytes_):
            # must compare to both string types
            has_str = True
        elif hasattr(v, '__len__'):
            # values with __len__ (tuples and other sized containers, including SF types)
            # must be assigned after array creation, so we treat them like tuples
            has_tuple = True
            resolved = object
            break
        elif isinstance(v, Enum):
            # must check isinstance, as Enum types are always derived from Enum
            resolved = object
            break
        else:
            has_non_str = True
            if value_type in INEXACT_TYPES:
                has_inexact = True
            elif value_type is int and abs(v) > INT_MAX_COERCIBLE_TO_FLOAT:
                has_big_int = True

        if (has_str and has_non_str) or (has_big_int and has_inexact):
            resolved = object
            break

    if copy_values:
        # v_iter is an iter, we need to finish it
        values_post.extend(v_iter)
        return resolved, has_tuple, values_post
    return resolved, has_tuple, values #type: ignore
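
To look at the resolution rules in isolation, here is a reduced sketch of the loop above. It is not the project's implementation: `scan_values`, `SKETCH_INEXACT_TYPES`, and `SKETCH_INT_MAX` are hypothetical names, the two constants are assumed stand-ins for `INEXACT_TYPES` and `INT_MAX_COERCIBLE_TO_FLOAT` (whose real values are not shown in this thread), the NumPy string types are left out, and the input is always copied as if it were a generator.

```python
import typing as tp
from enum import Enum

# Assumed stand-ins for the project's constants; the real values may differ.
SKETCH_INEXACT_TYPES = (float, complex)
SKETCH_INT_MAX = 2 ** 53  # every int up to this magnitude is exact as a float64


def scan_values(values: tp.Iterable[tp.Any]) -> tp.Tuple[tp.Any, tp.List[tp.Any]]:
    """Return (resolved, collected): resolved is object or None."""
    v_iter = iter(values)
    collected: tp.List[tp.Any] = []
    resolved = None
    has_str = has_non_str = has_inexact = has_big_int = False

    for v in v_iter:
        # Copy while iterating, since a generator can only be consumed once
        collected.append(v)
        value_type = type(v)

        if value_type is str or value_type is bytes:
            has_str = True
        elif hasattr(v, '__len__') or isinstance(v, Enum):
            # Sized values and Enum members force object
            resolved = object
            break
        else:
            has_non_str = True
            if value_type in SKETCH_INEXACT_TYPES:
                has_inexact = True
            elif value_type is int and abs(v) > SKETCH_INT_MAX:
                has_big_int = True

        # Mixing strings with non-strings, or big ints with floats, forces object
        if (has_str and has_non_str) or (has_big_int and has_inexact):
            resolved = object
            break

    # After an early break, drain the rest of the iterator into the copy
    collected.extend(v_iter)
    return resolved, collected
```

For example, `scan_values(x for x in ('a', 1, 2.5))` stops classifying at `1` (a string mixed with a non-string), resolves to `object`, and still returns all three values because the remainder of the iterator is drained after the break.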

chaburkland requested a review from brandtbucher May 28, 2021 22:25
brandtbucher removed their request for review November 17, 2022 21:30
Development

Successfully merging this pull request may close these issues.

Implement prepare_iter_for_array
2 participants