Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report comparisons #1069

Merged
merged 38 commits into from
Nov 18, 2022
Merged

Report comparisons #1069

merged 38 commits into from
Nov 18, 2022

Conversation

sbrugman
Copy link
Collaborator

@sbrugman sbrugman commented Sep 25, 2022

This pull request introduces the report comparison functionality into pandas-profiling. Various other fixes/changes are included.

  • Rebase on develop
  • Add feature tests
  • Ensure tests are passing
  • User testing of reports created in CI/CD
  • Refactor into consistent data structure

dataset-comparison


pandas-profiling can be used to compare multiple version of the same dataset.
This is useful when comparing data from multiple time periods, such as two years.
Another common scenario is to view the dataset profile for training, validation and test sets in machine learning.

The following syntax can be used to compare two datasets:

from pandas_profiling import ProfileReport

train_df = pd.read_csv("train.csv")
train_report = ProfileReport(train_df, title="Train")

test_df = pd.read_csv("test.csv")
test_report = ProfileReport(test_df, title="Test")

comparison_report = train_report.compare(test_report)
comparison_report.to_file("comparison.html")

Settings.html.style.primary_color is replaced by a sequence of colors Settings.html.style.primary_colors. For backwards compatibility, the first element of the sequence is still accessible through the primary_color attribute, however deprecation in the future is desired. The comparison feature requires to change the configuration of colors. Where previously there was no distinction between the parts specific to the report, and parts specific to a single dataset summary, this distinction is introduced now. It could be misleading to simply use the first of the colors as a report color (The first #377eb8 now is close to the default report color #337ab7). Future work could go into configuring the report color and chosing better defaults.


Refactoring the summary datastructure is partially out of scope for this PR. (For this there are other design decisions that have been considered before), see: #1102

@sbrugman sbrugman force-pushed the feat/report-comparison branch 2 times, most recently from b5f3f40 to 71ac411 Compare September 25, 2022 22:31
@sbrugman sbrugman linked an issue Sep 25, 2022 that may be closed by this pull request
@fabclmnt fabclmnt self-requested a review September 28, 2022 21:16
@sbrugman sbrugman force-pushed the feat/report-comparison branch 3 times, most recently from 302da3a to 54d24ea Compare October 1, 2022 23:32
@codecov-commenter
Copy link

codecov-commenter commented Oct 2, 2022

Codecov Report

Base: 90.90% // Head: 90.44% // Decreases project coverage by -0.45% ⚠️

Coverage data is based on head (5d37756) compared to base (7b73e8a).
Patch coverage: 87.16% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1069      +/-   ##
===========================================
- Coverage    90.90%   90.44%   -0.46%     
===========================================
  Files          178      181       +3     
  Lines         5090     5507     +417     
===========================================
+ Hits          4627     4981     +354     
- Misses         463      526      +63     
Flag Coverage Δ
py3.8-ubuntu-latest-pandas 90.44% <87.16%> (-0.46%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/pandas_profiling/model/missing.py 91.42% <ø> (+1.95%) ⬆️
.../pandas_profiling/model/pandas/dataframe_pandas.py 92.85% <ø> (-1.27%) ⬇️
...ing/report/presentation/flavours/html/templates.py 100.00% <ø> (ø)
...iling/report/structure/variables/render_complex.py 29.41% <ø> (ø)
...ofiling/report/structure/variables/render_count.py 33.33% <ø> (ø)
...iling/report/structure/variables/render_generic.py 100.00% <ø> (ø)
...ofiling/report/structure/variables/render_image.py 29.62% <ø> (ø)
...rofiling/report/structure/variables/render_path.py 31.57% <ø> (ø)
...profiling/report/structure/variables/render_url.py 100.00% <ø> (ø)
tests/issues/test_issue169.py 100.00% <ø> (ø)
... and 36 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@sbrugman sbrugman changed the title [WIP] Report comparisons Report comparisons Oct 2, 2022
@sbrugman sbrugman marked this pull request as ready for review October 2, 2022 00:45
Copy link
Contributor

@akx akx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • 🙌 for small granular commits :)

  • There are a bunch of long if isinstance(..., tuple) / ... style branches which seem to partially duplicate code and as such will eventually cause drift between the branches. I suppose these are to adapt rendering for one report or more than one report when comparing.

    • Similarly there are functions like _get_n for that, not to mention the @list_args decorator...
    • This also repeats for the templates, with if alerts is mapping sort of branches that also have a slightly different formatting..?
    • It would probably be nicer to not have those branches, but adapt the code everywhere to the general case of 1 or more reports.
    • a7b4c11 wouldn't probably be required separately if that was the case.
  • Should there be a test for adfa846?

  • Should the (mechanical?) reindentation of the template(s) happen in a separate commit?

  • There are some commits that don't really look refactor: to me, such as cbfc018, 85ad2c5, d7c583c, 3c97aed, 4828c15, 6354e90, fe181ea (what does "namespace invariant type check" actually mean?)

  • 54d24ea's commit message doesn't make much sense to me..? Should the new? behavior be documented?

  • To paraphrase @fabclmnt, can you follow the correct PR naming pattern?

src/pandas_profiling/compare_reports.py Outdated Show resolved Hide resolved
src/pandas_profiling/compare_reports.py Outdated Show resolved Hide resolved
src/pandas_profiling/compare_reports.py Outdated Show resolved Hide resolved
src/pandas_profiling/compare_reports.py Outdated Show resolved Hide resolved
src/pandas_profiling/compare_reports.py Outdated Show resolved Hide resolved
src/pandas_profiling/report/structure/report.py Outdated Show resolved Hide resolved
src/pandas_profiling/report/structure/report.py Outdated Show resolved Hide resolved
src/pandas_profiling/visualisation/plot.py Outdated Show resolved Hide resolved
src/pandas_profiling/visualisation/plot.py Show resolved Hide resolved
@sbrugman
Copy link
Collaborator Author

sbrugman commented Oct 5, 2022

@akx Thanks for the thorough review! Addressed your comments in the last commit.

  • There are a bunch of long if isinstance(..., tuple) / ... style branches which seem to partially duplicate code and as such will eventually cause drift between the branches. I suppose these are to adapt rendering for one report or more than one report when comparing.

    • Similarly there are functions like _get_n for that, not to mention the @list_args decorator...
    • This also repeats for the templates, with if alerts is mapping sort of branches that also have a slightly different formatting..?
    • It would probably be nicer to not have those branches, but adapt the code everywhere to the general case of 1 or more reports.
    • a7b4c11 wouldn't probably be required separately if that was the case.

Agreed - added a note to the PR. This is out of scope for the feature itself and requires a redesign of the data structure used for the dataset summaries.

  • Should there be a test for adfa846?

Added a test case for this. One of the items on the wishlist is rewriting the test suits entirely.

Regarding the other comments: the commit history is not rewritten yet, this was pending the review of the code. Will rewrite once all feedback has been processed.

@sbrugman sbrugman force-pushed the feat/report-comparison branch from 74947cd to b077d34 Compare October 5, 2022 21:26
@sbrugman sbrugman requested a review from akx October 5, 2022 21:27
@akx
Copy link
Contributor

akx commented Oct 7, 2022

Agreed - added a note to the PR. This is out of scope for the feature itself and requires a redesign of the data structure used for the dataset summaries.

My two Euro cents (or more accounting for inflation) I honestly think it would be better to do the work to make the data structures consistent now (or even in a separate PR this could then build upon), as opposed to adding special casing and "hacks" now and cleaning it up later.

@sbrugman
Copy link
Collaborator Author

sbrugman commented Oct 9, 2022

@akx Agreed, the work needs to be done anyway. I've looked for a sensible split in the refactoring and am working on it - will request rereview when it's there.

To maintain backward compatibility (for now) and not blowing up this PR, the data structure will wrap each 'leaf' in the data structure 'tree' with a list. With that data structure, only one of the two branches (list/no list) are required, and most duplicate code can be removed. To not break backward compatibility for users that rely on the report.description_set, we'll only convert the structure once the report is generated.

The follow-up PR #1102 will take care of having a well-defined JSON schema fixing several issues.

@sbrugman sbrugman force-pushed the feat/report-comparison branch from b077d34 to da26545 Compare October 9, 2022 22:23
@aquemy
Copy link
Contributor

aquemy commented Oct 11, 2022

Hi @sbrugman,

I did a review and I did not find anything more than @akx already pointed out. It looks really neat!

However, I had a question or like a feature request.
One typical use case of a comparison report is to assess the data quality of two versions of the same dataset (e.g. after some data cleaning operations).
For this use case, it could be great to have a delta column to see in a glance the noticeable differences between the two versions.
So my question is: do you think this is something easy to do within this PR or it would be better to open another issue and discuss it more for a future iteration?

@aquemy aquemy requested review from aquemy and removed request for akx October 11, 2022 15:31
@sbrugman
Copy link
Collaborator Author

sbrugman commented Oct 11, 2022

@aquemy That's a good point. Actually, in the first designs the delta was a central part. The challenge is providing a meaningful way to present the delta, without discarding information on the absolute compared values. Juxtapositioning is a sensible default: all values are there, the user can easily see the comparison.

In case you have concrete improvements on delta's in mind that increase clarity and are consistent with the layout then they are of course welcome. However this may be hard to know upfront, so we might discover them from user research that are working with the basic version.

@fabclmnt
Copy link
Contributor

@aquemy That's a good point. Actually, in the first designs, the delta was a central part. The challenge is providing a meaningful way to present the delta, without discarding information on the absolute compared values. Juxtapositioning is a sensible default: all values are there, and the user can easily see the comparison.

In case you have concrete improvements on the delta in mind that increase clarity and are consistent with the layout then they are of course welcome. However this may be hard to know upfront, so we might discover them from user research that is working with the basic version.

Although I do find this is relevant and important for the comparison report, I would prefer to keep the first iteration simpler - without the computed delta. My suggestion is to keep this in a separate feature request.

I see no changes needed. This version is already looking great! One small request, and as @akx mentioned, please update the PR message to feat: (description)


.. pull-quote::

⌛ Interested in uncovering more temporal patterns? Check out `popmon <https://github.com/ing-bank/popmon>`_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please remove the reference to popmon from the comparison report documentation?

Copy link
Contributor

@fabclmnt fabclmnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small documentation changes request.

@fabclmnt fabclmnt requested a review from aquemy October 11, 2022 22:46
sbrugman and others added 10 commits November 18, 2022 07:48
* fix: refactoring bugs

* fix: update protected var labels for comparison

* fix: add support to timeseries comparison

* fix: style changes for readability

* test: add simple run test
* fix: rewording

Co-authored-by: Aarni Koskela <[email protected]>
* feat: add comparison validations
* feat: add new missing histogram plot

* feat: add new missing matrix plot

* feat: add new missing heatmap plot

* feat: remove dendrogram
* feat: select only the left side of the comparison

* chore: pre-commit fixes

* fix: not intersection of columns
@alexbarros alexbarros force-pushed the feat/report-comparison branch from dfe0ce5 to b9f6665 Compare November 18, 2022 11:06
@alexbarros alexbarros dismissed fabclmnt’s stale review November 18, 2022 12:28

all requested changes are done

@alexbarros alexbarros merged commit e551d25 into develop Nov 18, 2022
@alexbarros alexbarros deleted the feat/report-comparison branch November 18, 2022 15:31
alexbarros added a commit that referenced this pull request Nov 19, 2022
* ci: check for flake8 comprehensions

* fix(config): configuration order is now respected

* fix: index is no longer automatically added to dataframe

* feat: correlation alerts show the name of the correlation

* fix: strip tags from the title of the web report

* feat: comparing two or more datasets (see docs)

* docs(comparison): feature description

* docs(readme): include reference to the dataset comparison use case

* refactor: config private attribute

* refactor: config update, exclude defaults

* refactor: include style attribute in timeseries code

* refactor: include style attribute in templates

* test(comparisons): add tests for report comparison

* refactor: overall correlation lowercase

* refactor: frequency table kwargs

* refactor: frequency table styling

* refactor: fixing renderable tests

* refactor: fixing renderable tests

* style: formatting

* refactor: senstive test

* refactor: pass style argument

* feat: check for empty dataframe

* refactor: namespace invariant type check

* refactor: ipywidgets fixes

* refactor: ipywidgets no comparison support yet

* refactor: process feedback

* fix: comparison bugs (#1137)

* fix: refactoring bugs

* fix: update protected var labels for comparison

* fix: add support to timeseries comparison

* fix: style changes for readability

* test: add simple run test

* fix: reword comparison report doc (#1136)

* fix: rewording

Co-authored-by: Aarni Koskela <[email protected]>

* feat: add comparison validations (#1143)

* feat: add comparison validations

* feat: replace missing plots to avoid dependencies' confilicts (#1148)

* feat: add new missing histogram plot

* feat: add new missing matrix plot

* feat: add new missing heatmap plot

* feat: remove dendrogram

* feat: ignore columns not present on the base report (#1150)

* feat: select only the left side of the comparison

* chore: pre-commit fixes

* fix: not intersection of columns

* [skip ci] Code formatting

* fix: missing plots columns order

* [skip ci] Code formatting

* fix: interactions/missing plot colors

* fix: code formatting

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
Co-authored-by: alexbarros <[email protected]>
vascoalramos pushed a commit that referenced this pull request Nov 22, 2022
* ci: check for flake8 comprehensions

* fix(config): configuration order is now respected

* fix: index is no longer automatically added to dataframe

* feat: correlation alerts show the name of the correlation

* fix: strip tags from the title of the web report

* feat: comparing two or more datasets (see docs)

* docs(comparison): feature description

* docs(readme): include reference to the dataset comparison use case

* refactor: config private attribute

* refactor: config update, exclude defaults

* refactor: include style attribute in timeseries code

* refactor: include style attribute in templates

* test(comparisons): add tests for report comparison

* refactor: overall correlation lowercase

* refactor: frequency table kwargs

* refactor: frequency table styling

* refactor: fixing renderable tests

* refactor: fixing renderable tests

* style: formatting

* refactor: senstive test

* refactor: pass style argument

* feat: check for empty dataframe

* refactor: namespace invariant type check

* refactor: ipywidgets fixes

* refactor: ipywidgets no comparison support yet

* refactor: process feedback

* fix: comparison bugs (#1137)

* fix: refactoring bugs

* fix: update protected var labels for comparison

* fix: add support to timeseries comparison

* fix: style changes for readability

* test: add simple run test

* fix: reword comparison report doc (#1136)

* fix: rewording

Co-authored-by: Aarni Koskela <[email protected]>

* feat: add comparison validations (#1143)

* feat: add comparison validations

* feat: replace missing plots to avoid dependencies' confilicts (#1148)

* feat: add new missing histogram plot

* feat: add new missing matrix plot

* feat: add new missing heatmap plot

* feat: remove dendrogram

* feat: ignore columns not present on the base report (#1150)

* feat: select only the left side of the comparison

* chore: pre-commit fixes

* fix: not intersection of columns

* [skip ci] Code formatting

* fix: missing plots columns order

* [skip ci] Code formatting

* fix: interactions/missing plot colors

* fix: code formatting

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
Co-authored-by: alexbarros <[email protected]>
vascoalramos pushed a commit that referenced this pull request Nov 22, 2022
* ci: check for flake8 comprehensions

* fix(config): configuration order is now respected

* fix: index is no longer automatically added to dataframe

* feat: correlation alerts show the name of the correlation

* fix: strip tags from the title of the web report

* feat: comparing two or more datasets (see docs)

* docs(comparison): feature description

* docs(readme): include reference to the dataset comparison use case

* refactor: config private attribute

* refactor: config update, exclude defaults

* refactor: include style attribute in timeseries code

* refactor: include style attribute in templates

* test(comparisons): add tests for report comparison

* refactor: overall correlation lowercase

* refactor: frequency table kwargs

* refactor: frequency table styling

* refactor: fixing renderable tests

* refactor: fixing renderable tests

* style: formatting

* refactor: senstive test

* refactor: pass style argument

* feat: check for empty dataframe

* refactor: namespace invariant type check

* refactor: ipywidgets fixes

* refactor: ipywidgets no comparison support yet

* refactor: process feedback

* fix: comparison bugs (#1137)

* fix: refactoring bugs

* fix: update protected var labels for comparison

* fix: add support to timeseries comparison

* fix: style changes for readability

* test: add simple run test

* fix: reword comparison report doc (#1136)

* fix: rewording

Co-authored-by: Aarni Koskela <[email protected]>

* feat: add comparison validations (#1143)

* feat: add comparison validations

* feat: replace missing plots to avoid dependencies' confilicts (#1148)

* feat: add new missing histogram plot

* feat: add new missing matrix plot

* feat: add new missing heatmap plot

* feat: remove dendrogram

* feat: ignore columns not present on the base report (#1150)

* feat: select only the left side of the comparison

* chore: pre-commit fixes

* fix: not intersection of columns

* [skip ci] Code formatting

* fix: missing plots columns order

* [skip ci] Code formatting

* fix: interactions/missing plot colors

* fix: code formatting

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
Co-authored-by: alexbarros <[email protected]>
vascoalramos pushed a commit that referenced this pull request Nov 22, 2022
* ci: check for flake8 comprehensions

* fix(config): configuration order is now respected

* fix: index is no longer automatically added to dataframe

* feat: correlation alerts show the name of the correlation

* fix: strip tags from the title of the web report

* feat: comparing two or more datasets (see docs)

* docs(comparison): feature description

* docs(readme): include reference to the dataset comparison use case

* refactor: config private attribute

* refactor: config update, exclude defaults

* refactor: include style attribute in timeseries code

* refactor: include style attribute in templates

* test(comparisons): add tests for report comparison

* refactor: overall correlation lowercase

* refactor: frequency table kwargs

* refactor: frequency table styling

* refactor: fixing renderable tests

* refactor: fixing renderable tests

* style: formatting

* refactor: senstive test

* refactor: pass style argument

* feat: check for empty dataframe

* refactor: namespace invariant type check

* refactor: ipywidgets fixes

* refactor: ipywidgets no comparison support yet

* refactor: process feedback

* fix: comparison bugs (#1137)

* fix: refactoring bugs

* fix: update protected var labels for comparison

* fix: add support to timeseries comparison

* fix: style changes for readability

* test: add simple run test

* fix: reword comparison report doc (#1136)

* fix: rewording

Co-authored-by: Aarni Koskela <[email protected]>

* feat: add comparison validations (#1143)

* feat: add comparison validations

* feat: replace missing plots to avoid dependencies' confilicts (#1148)

* feat: add new missing histogram plot

* feat: add new missing matrix plot

* feat: add new missing heatmap plot

* feat: remove dendrogram

* feat: ignore columns not present on the base report (#1150)

* feat: select only the left side of the comparison

* chore: pre-commit fixes

* fix: not intersection of columns

* [skip ci] Code formatting

* fix: missing plots columns order

* [skip ci] Code formatting

* fix: interactions/missing plot colors

* fix: code formatting

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
Co-authored-by: alexbarros <[email protected]>
vascoalramos pushed a commit that referenced this pull request Nov 22, 2022
* ci: check for flake8 comprehensions

* fix(config): configuration order is now respected

* fix: index is no longer automatically added to dataframe

* feat: correlation alerts show the name of the correlation

* fix: strip tags from the title of the web report

* feat: comparing two or more datasets (see docs)

* docs(comparison): feature description

* docs(readme): include reference to the dataset comparison use case

* refactor: config private attribute

* refactor: config update, exclude defaults

* refactor: include style attribute in timeseries code

* refactor: include style attribute in templates

* test(comparisons): add tests for report comparison

* refactor: overall correlation lowercase

* refactor: frequency table kwargs

* refactor: frequency table styling

* refactor: fixing renderable tests

* refactor: fixing renderable tests

* style: formatting

* refactor: senstive test

* refactor: pass style argument

* feat: check for empty dataframe

* refactor: namespace invariant type check

* refactor: ipywidgets fixes

* refactor: ipywidgets no comparison support yet

* refactor: process feedback

* fix: comparison bugs (#1137)

* fix: refactoring bugs

* fix: update protected var labels for comparison

* fix: add support to timeseries comparison

* fix: style changes for readability

* test: add simple run test

* fix: reword comparison report doc (#1136)

* fix: rewording

Co-authored-by: Aarni Koskela <[email protected]>

* feat: add comparison validations (#1143)

* feat: add comparison validations

* feat: replace missing plots to avoid dependencies' confilicts (#1148)

* feat: add new missing histogram plot

* feat: add new missing matrix plot

* feat: add new missing heatmap plot

* feat: remove dendrogram

* feat: ignore columns not present on the base report (#1150)

* feat: select only the left side of the comparison

* chore: pre-commit fixes

* fix: not intersection of columns

* [skip ci] Code formatting

* fix: missing plots columns order

* [skip ci] Code formatting

* fix: interactions/missing plot colors

* fix: code formatting

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
Co-authored-by: alexbarros <[email protected]>
vascoalramos pushed a commit that referenced this pull request Nov 22, 2022
* ci: check for flake8 comprehensions

* fix(config): configuration order is now respected

* fix: index is no longer automatically added to dataframe

* feat: correlation alerts show the name of the correlation

* fix: strip tags from the title of the web report

* feat: comparing two or more datasets (see docs)

* docs(comparison): feature description

* docs(readme): include reference to the dataset comparison use case

* refactor: config private attribute

* refactor: config update, exclude defaults

* refactor: include style attribute in timeseries code

* refactor: include style attribute in templates

* test(comparisons): add tests for report comparison

* refactor: overall correlation lowercase

* refactor: frequency table kwargs

* refactor: frequency table styling

* refactor: fixing renderable tests

* refactor: fixing renderable tests

* style: formatting

* refactor: senstive test

* refactor: pass style argument

* feat: check for empty dataframe

* refactor: namespace invariant type check

* refactor: ipywidgets fixes

* refactor: ipywidgets no comparison support yet

* refactor: process feedback

* fix: comparison bugs (#1137)

* fix: refactoring bugs

* fix: update protected var labels for comparison

* fix: add support to timeseries comparison

* fix: style changes for readability

* test: add simple run test

* fix: reword comparison report doc (#1136)

* fix: rewording

Co-authored-by: Aarni Koskela <[email protected]>

* feat: add comparison validations (#1143)

* feat: add comparison validations

* feat: replace missing plots to avoid dependencies' confilicts (#1148)

* feat: add new missing histogram plot

* feat: add new missing matrix plot

* feat: add new missing heatmap plot

* feat: remove dendrogram

* feat: ignore columns not present on the base report (#1150)

* feat: select only the left side of the comparison

* chore: pre-commit fixes

* fix: not intersection of columns

* [skip ci] Code formatting

* fix: missing plots columns order

* [skip ci] Code formatting

* fix: interactions/missing plot colors

* fix: code formatting

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
Co-authored-by: alexbarros <[email protected]>
vascoalramos pushed a commit that referenced this pull request Nov 22, 2022
* ci: check for flake8 comprehensions

* fix(config): configuration order is now respected

* fix: index is no longer automatically added to dataframe

* feat: correlation alerts show the name of the correlation

* fix: strip tags from the title of the web report

* feat: comparing two or more datasets (see docs)

* docs(comparison): feature description

* docs(readme): include reference to the dataset comparison use case

* refactor: config private attribute

* refactor: config update, exclude defaults

* refactor: include style attribute in timeseries code

* refactor: include style attribute in templates

* test(comparisons): add tests for report comparison

* refactor: overall correlation lowercase

* refactor: frequency table kwargs

* refactor: frequency table styling

* refactor: fixing renderable tests

* refactor: fixing renderable tests

* style: formatting

* refactor: senstive test

* refactor: pass style argument

* feat: check for empty dataframe

* refactor: namespace invariant type check

* refactor: ipywidgets fixes

* refactor: ipywidgets no comparison support yet

* refactor: process feedback

* fix: comparison bugs (#1137)

* fix: refactoring bugs

* fix: update protected var labels for comparison

* fix: add support to timeseries comparison

* fix: style changes for readability

* test: add simple run test

* fix: reword comparison report doc (#1136)

* fix: rewording

Co-authored-by: Aarni Koskela <[email protected]>

* feat: add comparison validations (#1143)

* feat: add comparison validations

* feat: replace missing plots to avoid dependencies' confilicts (#1148)

* feat: add new missing histogram plot

* feat: add new missing matrix plot

* feat: add new missing heatmap plot

* feat: remove dendrogram

* feat: ignore columns not present on the base report (#1150)

* feat: select only the left side of the comparison

* chore: pre-commit fixes

* fix: not intersection of columns

* [skip ci] Code formatting

* fix: missing plots columns order

* [skip ci] Code formatting

* fix: interactions/missing plot colors

* fix: code formatting

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
Co-authored-by: alexbarros <[email protected]>
aquemy pushed a commit that referenced this pull request Nov 22, 2022
* ci: check for flake8 comprehensions

* fix(config): configuration order is now respected

* fix: index is no longer automatically added to dataframe

* feat: correlation alerts show the name of the correlation

* fix: strip tags from the title of the web report

* feat: comparing two or more datasets (see docs)

* docs(comparison): feature description

* docs(readme): include reference to the dataset comparison use case

* refactor: config private attribute

* refactor: config update, exclude defaults

* refactor: include style attribute in timeseries code

* refactor: include style attribute in templates

* test(comparisons): add tests for report comparison

* refactor: overall correlation lowercase

* refactor: frequency table kwargs

* refactor: frequency table styling

* refactor: fixing renderable tests

* refactor: fixing renderable tests

* style: formatting

* refactor: senstive test

* refactor: pass style argument

* feat: check for empty dataframe

* refactor: namespace invariant type check

* refactor: ipywidgets fixes

* refactor: ipywidgets no comparison support yet

* refactor: process feedback

* fix: comparison bugs (#1137)

* fix: refactoring bugs

* fix: update protected var labels for comparison

* fix: add support to timeseries comparison

* fix: style changes for readability

* test: add simple run test

* fix: reword comparison report doc (#1136)

* fix: rewording

Co-authored-by: Aarni Koskela <[email protected]>

* feat: add comparison validations (#1143)

* feat: add comparison validations

* feat: replace missing plots to avoid dependencies' confilicts (#1148)

* feat: add new missing histogram plot

* feat: add new missing matrix plot

* feat: add new missing heatmap plot

* feat: remove dendrogram

* feat: ignore columns not present on the base report (#1150)

* feat: select only the left side of the comparison

* chore: pre-commit fixes

* fix: not intersection of columns

* [skip ci] Code formatting

* fix: missing plots columns order

* [skip ci] Code formatting

* fix: interactions/missing plot colors

* fix: code formatting

Co-authored-by: Aarni Koskela <[email protected]>
Co-authored-by: Azory YData Bot <[email protected]>
Co-authored-by: alexbarros <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Report Comparison
7 participants