hvplot Explorer is giving nuisance Line plot when using over 10,000 points #1406

Closed
1 task
hagaishalevaei opened this issue Sep 4, 2024 · 8 comments · Fixed by #1408

Comments

@hagaishalevaei

ALL software version info

(this library, plus any other relevant software, e.g. bokeh, python, notebook, OS, browser, etc)
Windows 11,
Python=3.11.9,
Jupyterlab 4.2.5,
hvplot 0.10.0
pandas 2.2.2

Description of expected behavior and the observed behavior

The hvplot Explorer gives a nuisance line plot when the data has more than 10,000 points. Is that a bug? Can I configure the threshold value?

Complete, minimal, self-contained example code that reproduces the issue

import numpy as np
import pandas as pd
import hvplot.pandas

N=1001
Range = 10
DF = []
for n in range(Range):
    x = np.linspace(0.0, 6.4, num=N)
    y = np.sin(x) + n/10
    df = pd.DataFrame({'x': x, 'y': y})
    df['#'] = n 
    DF.append(df)

DF = pd.concat(DF)
print(f'param number= {N*Range}')
DF.hvplot.explorer(x='x', y='y', by = ['#'], kind='line')

Stack traceback and/or browser JavaScript console output

https://stackoverflow.com/questions/78950183/python-hvplot-explorer-limit-of-10-000-point

Screenshots or screencasts of the bug in action

image

image

  • I may be interested in making a pull request to address this
@hagaishalevaei
Author

hagaishalevaei commented Sep 5, 2024

Digging into the hvplot code, I found MAX_ROWS = 10000 in \site-packages\hvplot\ui.py. Changing it to MAX_ROWS = 100001 solved the issue.

@ahuang11
Collaborator

ahuang11 commented Sep 8, 2024

To clarify, nuisance in this sense means that it's not showing over 10k lines?

I suppose to improve this, we can add an editable slider widget to cap the number of lines.

@hagaishalevaei
Author

@ahuang11 - it is not 10k lines. The lines start to act strangely when the total number of points exceeds 10,000. For example, with 10 lines of 999 points each there is no problem, but with 1001 points per line you get this strange behavior.

Good

N=999
Range = 10
DF = []
for n in range(Range):
    x = np.linspace(0.0, 6.4, num=N)
    y = np.sin(x) + n/10
    df = pd.DataFrame({'x': x, 'y': y})
    df['#'] = n 
    DF.append(df)

DF = pd.concat(DF)
print(f'param number= {N*Range}')
DF.hvplot.explorer(x='x', y='y', by = ['#'], kind='line')

Bad

N=1001
Range = 10
DF = []
for n in range(Range):
    x = np.linspace(0.0, 6.4, num=N)
    y = np.sin(x) + n/10
    df = pd.DataFrame({'x': x, 'y': y})
    df['#'] = n 
    DF.append(df)

DF = pd.concat(DF)
print(f'param number= {N*Range}')
DF.hvplot.explorer(x='x', y='y', by = ['#'], kind='line')
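
The arithmetic behind the "Good" and "Bad" cases can be checked without hvplot. The sketch below assumes the threshold is the MAX_ROWS = 10000 value reported earlier in this thread; build_frame is a hypothetical helper that only mirrors the loop above.

```python
import numpy as np
import pandas as pd

# Assumption: same value as the MAX_ROWS constant reported for hvplot/ui.py.
MAX_ROWS = 10_000


def build_frame(n_points: int, n_lines: int = 10) -> pd.DataFrame:
    """Stack n_lines sine curves of n_points rows each, as in the examples above."""
    frames = []
    for n in range(n_lines):
        x = np.linspace(0.0, 6.4, num=n_points)
        frames.append(pd.DataFrame({"x": x, "y": np.sin(x) + n / 10, "#": n}))
    return pd.concat(frames)


for n_points in (999, 1001):
    total_rows = len(build_frame(n_points))
    # The comparison is on the total row count, not on the number of lines.
    print(n_points, total_rows, total_rows > MAX_ROWS)
```

With 999 points per line the frame has 9,990 rows and stays under the threshold; with 1001 points per line it has 10,010 rows and crosses it.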

@ahuang11
Collaborator

ahuang11 commented Sep 9, 2024

Okay here's the logic. I don't think it's supposed to sample if it's a line plot.

        if len(df) > MAX_ROWS and not (
            self.kind in KINDS['stats'] or kwargs.get('rasterize') or kwargs.get('datashade')
        ):
            df = df.sample(n=MAX_ROWS)

@hagaishalevaei
Author

@ahuang11 - I don't know what such a change would cost, but using df = df.sample(n=MAX_ROWS).sort_index() solves the issue.
Apparently pandas.DataFrame.sample returns the rows in random order:
https://stackoverflow.com/questions/59594516/how-to-sample-from-pandas-dataframe-while-keeping-row-order
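
A minimal pandas-only sketch of that behavior, using a single sine curve and an illustrative sample size of 500 (both are assumptions for the demo, not values from hvplot):

```python
import numpy as np
import pandas as pd

x = np.linspace(0.0, 6.4, num=1001)
df = pd.DataFrame({"x": x, "y": np.sin(x)})

# sample() returns the selected rows in shuffled order, so a line drawn
# through them in row order jumps back and forth along the x axis.
sampled = df.sample(n=500, random_state=0)
print(sampled["x"].is_monotonic_increasing)   # False

# Sorting by the original index puts the surviving rows back in order.
restored = sampled.sort_index()
print(restored["x"].is_monotonic_increasing)  # True
```

This works here because the index reflects the original row order. In the reproduction above, pd.concat keeps a repeated 0..N-1 index for every line, so sort_index() interleaves the groups but still leaves x increasing within each '#' group.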

image

@ahuang11
Collaborator

ahuang11 commented Sep 9, 2024

In the PR, I just used .head()

@hagaishalevaei
Author

My opinion is that sample().sort_index() is better than head() in most cases.
image
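
One way to see the difference between the two approaches is to count how many lines survive the cut. This is only a sketch: the 2000 points per line and the limit of 10,000 are illustrative assumptions, chosen so the frame is well over the limit.

```python
import numpy as np
import pandas as pd

frames = []
for n in range(10):
    x = np.linspace(0.0, 6.4, num=2000)
    frames.append(pd.DataFrame({"x": x, "y": np.sin(x) + n / 10, "#": n}))
df = pd.concat(frames, ignore_index=True)  # 20,000 rows in total

limit = 10_000  # illustrative stand-in for MAX_ROWS

# head() keeps the first rows only, so later lines disappear entirely.
head_part = df.head(limit)

# sample().sort_index() thins every line but keeps all of them, in order.
sample_part = df.sample(n=limit, random_state=0).sort_index()

print(head_part["#"].nunique())    # 5
print(sample_part["#"].nunique())  # 10
```

With head(), the lines that remain keep their full resolution; with sample().sort_index(), every line remains but at roughly half resolution. Which is preferable depends on whether the plot should show all groups or exact detail for some of them.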

@maximlt
Member

maximlt commented Sep 13, 2024

@hagaishalevaei feel free to directly review #1408 :)
