PERF-#6696: Use cached dtypes in fillna when possible. #6697

AndreyPavlenko · 2023-11-02T19:25:46Z

What do these changes do?

first commit message and PR title follow format outlined here

NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.
passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
signed commit with git commit -s
Resolves PERF: Use cached dtypes in fillna when possible. #6696
tests added and passing
module layout described at docs/development/architecture.rst is up-to-date

modin/core/dataframe/pandas/dataframe/dataframe.py

dchigarev

Overall this looks good, apart from the changes related to laziness. Let's split those out and discuss them in a separate PR.

modin/core/storage_formats/pandas/query_compiler.py

Signed-off-by: Andrey Pavlenko <[email protected]>

dchigarev · 2023-11-13T14:57:39Z

modin/core/storage_formats/pandas/query_compiler.py

+                dtype = pandas.Series(value).dtype
+                if all(
+                    find_common_type([t, dtype]) == t for t in self._modin_frame.dtypes
+                ):
+                    new_dtypes = self._modin_frame.dtypes


why can't we simply fill new dtypes with the find_common_type results?

Suggested change

-                dtype = pandas.Series(value).dtype
-                if all(
-                    find_common_type([t, dtype]) == t for t in self._modin_frame.dtypes
-                ):
-                    new_dtypes = self._modin_frame.dtypes
+                dtype = pandas.Series(value).dtype
+                new_dtypes = pandas.Series({col: find_common_type([t, dtype]) for col, t in self._modin_frame.dtypes.items()})

AndreyPavlenko · 2023-11-13

Because the new dtype depends on whether the frame contains NAs. If there are no NAs, the dtype is unchanged; otherwise it changes to the common type. We can't inspect the data here, but we can assume that the dtype will not change if it is already the common one.
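The reasoning above can be illustrated with plain pandas. This is a minimal sketch, not the merged Modin code: `cached_dtypes_after_scalar_fillna` is a hypothetical helper that mirrors the check in the diff, and `find_common_type` is assumed to be the function from `pandas.core.dtypes.cast`.

```python
import pandas
from pandas.core.dtypes.cast import find_common_type


def cached_dtypes_after_scalar_fillna(dtypes, value):
    """Return `dtypes` if filling with `value` cannot change them, else None.

    Hypothetical helper that mirrors the check in the diff above.
    """
    value_dtype = pandas.Series(value).dtype
    if all(find_common_type([t, value_dtype]) == t for t in dtypes):
        return dtypes  # every column already has the common type: reuse the cache
    return None  # the result depends on whether the data contains NAs


int_frame = pandas.DataFrame({"a": [1, 2]})  # int64, no NAs
float_frame = pandas.DataFrame({"a": [1.0, float("nan")]})  # float64, one NA

# float64 column filled with an int: the common type is still float64,
# so the cached dtypes can be reused.
print(cached_dtypes_after_scalar_fillna(float_frame.dtypes, 0))

# int64 column filled with a float: the common type would be float64,
# but pandas keeps int64 here because there is nothing to fill.
print(cached_dtypes_after_scalar_fillna(int_frame.dtypes, 0.5))
print(int_frame.fillna(0.5).dtypes)
```

The second case is why filling `new_dtypes` with the `find_common_type` results unconditionally could cache a wrong dtype: the prediction would be `float64`, while the actual result stays `int64`.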

dchigarev · 2023-11-13T14:58:06Z

modin/core/storage_formats/pandas/query_compiler.py

+                    value_dtypes = pandas.DataFrame(
+                        {k: [v] for (k, v) in value.items()}
+                    ).dtypes
+                    if all(


same question here
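For the dict-valued case, the same idea applies per column. The sketch below is an assumption about how such a check could look, since the diff hunk above is cut off at `if all(`; the helper name `dict_fill_keeps_dtypes` and the handling of keys that are not columns are hypothetical.

```python
import pandas
from pandas.core.dtypes.cast import find_common_type


def dict_fill_keeps_dtypes(dtypes, value):
    """Return True if filling with the dict `value` cannot change `dtypes`.

    Hypothetical helper: a per-column version of the scalar check.
    """
    # Build a one-row frame so pandas infers a dtype for each fill value.
    value_dtypes = pandas.DataFrame({k: [v] for (k, v) in value.items()}).dtypes
    return all(
        find_common_type([dtypes[col], dtype]) == dtypes[col]
        for col, dtype in value_dtypes.items()
        if col in dtypes.index  # assumption: keys that are not columns are ignored
    )


frame = pandas.DataFrame({"a": [1.0, float("nan")], "b": [1, 2]})

# Both fill values fit the existing column dtypes.
print(dict_fill_keeps_dtypes(frame.dtypes, {"a": 0, "b": 1}))
# A float fill value for the int64 column "b" makes the result uncertain.
print(dict_fill_keeps_dtypes(frame.dtypes, {"a": 0, "b": 0.5}))
```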

AndreyPavlenko marked this pull request as ready for review November 2, 2023 20:36

AndreyPavlenko requested review from devin-petersohn, mvashishtha, RehanSD, YarShev, vnlitvinov, anmyachev, dchigarev and a team as code owners November 2, 2023 20:36

anmyachev reviewed Nov 6, 2023

modin/core/dataframe/pandas/dataframe/dataframe.py (outdated, resolved)

dchigarev reviewed Nov 7, 2023

modin/core/storage_formats/pandas/query_compiler.py (outdated, resolved)

PERF-modin-project#6696: Use cached dtypes in fillna when possible.

73fe01e

Signed-off-by: Andrey Pavlenko <[email protected]>

AndreyPavlenko force-pushed the issue-6696 branch from e14be58 to d6712ce November 9, 2023 08:43

Apply suggestions from code review

92640b1

AndreyPavlenko force-pushed the issue-6696 branch from d6712ce to 92640b1 November 9, 2023 09:02

dchigarev approved these changes Nov 13, 2023

dchigarev reviewed Nov 13, 2023

dchigarev self-requested a review November 13, 2023 14:58

dchigarev merged commit 41ecc92 into modin-project:master Nov 13, 2023
37 checks passed
