Implement show_dataframe #177

henrifroese · 2020-09-06T17:23:28Z

This new function in the visualization module allows users to scroll through a DataFrame and search it. We believe this is much nicer than the built-in pandas printing and could be a heavily used function 🥈 . See this notebook for an example.

Internally, this works by creating an HTML DataTable. The relevant files are in the new texthero subfolder visualization-server, which implements an extremely lightweight way to create our visualizations. It's adapted from pyLDAvis and refactored/simplified by us. This folder can also be used for further texthero visualization functions in the future.
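For readers unfamiliar with the approach, here is a minimal sketch of what "creating an HTML DataTable" can look like. It is not the code from this PR: `build_datatable_page`, the `texthero-table` id, and the pinned CDN versions are assumptions made for illustration. Only the `$(...).DataTable()` call is the jQuery DataTables entry point.

```python
import html

# Assumption: jQuery and DataTables are loaded from public CDNs; the versions
# pinned here are illustrative and not taken from the PR.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<link rel="stylesheet" href="https://cdn.datatables.net/1.10.21/css/jquery.dataTables.min.css">
<script src="https://code.jquery.com/jquery-3.5.1.min.js"></script>
<script src="https://cdn.datatables.net/1.10.21/js/jquery.dataTables.min.js"></script>
</head>
<body>
{table}
<script>$(document).ready(function () {{ $('#{table_id}').DataTable(); }});</script>
</body>
</html>"""


def build_datatable_page(columns, rows, table_id="texthero-table"):
    """Return a standalone HTML page with a searchable, paginated table.

    Hypothetical helper: `columns` is an iterable of header names and `rows`
    an iterable of row tuples. Every cell is HTML-escaped.
    """
    head = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    table = (
        f'<table id="{table_id}" class="display">'
        f"<thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
    )
    return PAGE_TEMPLATE.format(table=table, table_id=table_id)
```

With a pandas DataFrame `df`, the same helper could be called as `build_datatable_page(df.columns, df.itertuples(index=False))`.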

Note: travis/setup changed because of #171 . This is branched straight from master.

EDIT: ~~still working on support for online Jupyter Notebooks (e.g. Colab)~~ ✔️

Co-authored-by: Maximilian Krahn <[email protected]>

henrifroese · 2020-09-06T18:37:02Z

Finally fixed colab issues. Here is an example notebook. Below you can see a screenshot from the notebook.

jbesomi · 2020-09-07T09:52:16Z

Beautiful!!

How fast is the search with large datasets? Also, is the whole dataset loaded into the browser, or only the page we are looking at? (i.e. is it fast?)
It would be great if the column widths adapted to the content. For instance, we don't need topic to be that wide (search Google for "html table adapt column width to content", for instance).
Are the left and right padding (margin) necessary?
We could also just call this function hero.show (accepting both Series and DataFrame) or even add it as a custom accessor so that it can also be called as df.hero.show() (not sure about this approach, as it might confuse users).
Can we add a parameter that sets the maximum number of lines displayed for each text?
Is the whole webbrowser part of the code stable and safe?

henrifroese · 2020-09-10T16:17:04Z

> How fast is the search with large datasets? Also, is the whole dataset loaded into the browser, or only the page we are looking at? (i.e. is it fast?)

Yes, the whole dataset is loaded into the browser. Locally, this works for me up to around 50k rows; beyond that it gets really slow to load. Once it is loaded, the search is really quick. I think that changing this (i.e. loading only parts and dynamically loading more, etc.) would be a huge amount of work that's probably out of scope at this point, but it might be interesting to revisit later. I think that for many users it's already very useful: as long as either the dataset is not very big or the machine is powerful, everything works great.
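One cheap guard, short of dynamic loading, would be to cap how many rows are sent to the browser. The sketch below is an assumption and not part of this PR: `cap_rows` is a hypothetical helper, and its `max_rows` default simply reuses the roughly 50k figure mentioned above.

```python
def cap_rows(data, max_rows=50_000):
    """Return (possibly shortened data, truncated flag).

    Hypothetical helper: works on anything that supports len() and slicing,
    such as a list of rows. For a pandas DataFrame, `df.head(max_rows)` is
    the equivalent way to take the first rows.
    """
    if len(data) <= max_rows:
        return data, False
    return data[:max_rows], True
```

The flag lets the caller warn the user that only the first `max_rows` rows are shown instead of silently dropping data.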

> It would be great if the column widths adapted to the content. For instance, we don't need topic to be that wide (search Google for "html table adapt column width to content", for instance).

That is enabled by default for DataTables (see here), and it also works locally. It does not work in Colab, and we're not sure why. It's difficult to see why or how some HTML rendering fails in Colab/Jupyter 🦡

> Are the left and right padding (margin) necessary?

Again, that's also only a colab problem and we're not sure how to solve this without a huge amount of work 😞

> We could also just call this function hero.show (accepting both Series and DataFrame) or even add it as a custom accessor so that it can also be called as df.hero.show() (not sure about this approach, as it might confuse users).

I agree that a custom accessor might be confusing; I think we could call it hero.show, but to me that sounds a little more general and hero.show_dataframe describes the function a little better 🐖
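For context, this is roughly what the custom-accessor variant would involve. It is only a sketch of the alternative being discussed, not something this PR implements: the `hero` accessor name and `HeroAccessor` class are hypothetical, and `show_dataframe` here is a stand-in that just returns `df.to_html()` rather than the real function. The decorator `pandas.api.extensions.register_dataframe_accessor` is pandas' extension mechanism for this.

```python
import pandas as pd


def show_dataframe(df):
    """Stand-in for the PR's function (assumption): returns a plain HTML table."""
    return df.to_html()


# Hypothetical accessor name "hero"; registering it makes `df.hero` available
# on every DataFrame in the session.
@pd.api.extensions.register_dataframe_accessor("hero")
class HeroAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def show(self):
        # Delegate to the module-level function so both spellings behave alike.
        return show_dataframe(self._obj)
```

With this in place, `df.hero.show()` and `show_dataframe(df)` would be two spellings of the same call, which is exactly the duplication that might confuse users.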

> Can we add a parameter that sets the maximum number of lines displayed for each text?

Again, that's also quite difficult as far as I can see.
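A true line limit depends on how the browser wraps each cell, which is what makes it hard. A much rougher approximation, shown here only as a sketch, is to shorten each text to a character budget in Python before rendering. `truncate_text` and its `max_chars` parameter are hypothetical names, and this limits characters, not displayed lines.

```python
import textwrap


def truncate_text(text, max_chars=200):
    """Shorten `text` to at most `max_chars` characters, cutting at a word boundary.

    Hypothetical helper. Note that textwrap.shorten also collapses runs of
    whitespace (including newlines) into single spaces.
    """
    return textwrap.shorten(str(text), width=max_chars, placeholder=" ...")
```

For a pandas Series `s` of texts, this could be applied with `s.map(truncate_text)` before building the table.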

> Is the whole webbrowser part of the code stable and safe?

Yes, very. Everything runs locally and we're basically only serving one HTML file with the datatable.
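To make "serving one HTML file locally" concrete, here is a minimal standard-library sketch. It is not the server code from this PR (which is adapted from pyLDAvis): `make_handler` and `serve_page` are hypothetical names, and binding to `127.0.0.1` with an OS-chosen port is an assumption about what "runs locally" means.

```python
import http.server
import threading


def make_handler(page_html):
    """Build a request handler class that answers every GET with one fixed page."""
    body = page_html.encode("utf-8")

    class SinglePageHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            # Silence the default per-request logging to stderr.
            pass

    return SinglePageHandler


def serve_page(page_html, host="127.0.0.1", port=0):
    """Serve `page_html` in a background thread and return the server.

    Binding to 127.0.0.1 keeps the page reachable from this machine only;
    port=0 lets the OS pick a free port (read it from server.server_address).
    """
    server = http.server.HTTPServer((host, port), make_handler(page_html))
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

A caller could then open the page with `webbrowser.open(f"http://127.0.0.1:{server.server_address[1]}/")` and stop the server with `server.shutdown()` when done.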

**Summary:**

We could probably spend a lot of time making this better for big data / ..., but that's probably out-of-scope for texthero; we think that the simple, relatively lightweight implementation here is good and other, more dynamic stuff would take lots and lots of effort spent on this one function. That's why we're a little hesitant to do that - we think it's already quite useful for most users, and still quite simple. Making it perfect would be interesting and fun, but also hard and time-consuming.

mk2510 · 2020-09-22T19:55:33Z

We just went through this PR again and, from our side, it is ready for review 🍾 🍺 🍻

henrifroese · 2020-12-31T10:12:15Z

TODO

make usable for bigger datasets
test in different environments
make better looking

henrifroese and others added 4 commits September 4, 2020 09:51

Fix travis version. See Issue jbesomi#171

53632f0

roll back accidental push to master

b329fb3

Merge remote-tracking branch 'upstream/master'

daa90ca

Implement show_dataframe.

3ec7ee8

Co-authored-by: Maximilian Krahn <[email protected]>

vercel bot deployed to Preview September 6, 2020 17:23 View deployment

fix HTML-return

4aa066e

henrifroese marked this pull request as draft September 6, 2020 17:39

vercel bot deployed to Preview September 6, 2020 17:39 View deployment

try again to fix HTML return

a78e991

vercel bot deployed to Preview September 6, 2020 17:41 View deployment

try another fix

d6671b7

vercel bot deployed to Preview September 6, 2020 17:49 View deployment

another approach

7fc5f7d

vercel bot deployed to Preview September 6, 2020 18:13 View deployment

henrifroese added 2 commits September 6, 2020 20:23

-

db0c62f

--

a43d76e

vercel bot deployed to Preview September 6, 2020 18:24 View deployment

another try

27dd081

vercel bot deployed to Preview September 6, 2020 18:28 View deployment

final fix for colab

2fc46a0

vercel bot deployed to Preview September 6, 2020 18:32 View deployment

remove import for deleted module

27d78a5

vercel bot deployed to Preview September 6, 2020 18:42 View deployment

henrifroese mentioned this pull request Sep 6, 2020

👩‍💻 API next steps: checklist #85

Open

17 tasks

henrifroese added the enhancement New feature or request label Sep 6, 2020

henrifroese marked this pull request as ready for review September 6, 2020 18:53

henrifroese requested a review from jbesomi September 6, 2020 18:53

jbesomi marked this pull request as draft September 14, 2020 15:58

mk2510 marked this pull request as ready for review September 22, 2020 19:53
