Welcome to the datascroller project! While IDEs and notebooks are excellent
for interactive data exploration, there will always be some of us who prefer
to stay in the terminal. For exploring Pandas data frames, that meant
painstakingly tedious use of `.iloc`, until now...
See datascroller in action on YouTube:
pip install datascroller
In a command line environment where datascroller is installed, run
scroll_demo
and refer to the "Keys" Section below. Press "q" to quit.
In a command line environment where datascroller is installed, run
scroll <path to your file>
and refer to the "Keys" Section for navigation.
Press "q" to quit.
Import the `scroll` function from the `datascroller` module.
import pandas as pd
from datascroller import scroll
# Call `scroll` with a Pandas DataFrame as the sole argument:
my_df = pd.read_csv('<path to your csv>')
scroll(my_df)
# Or pass a path to scroll directly
scroll_parquet('<path to your parquet>')
See the "Keys" section below for navigation. Press 'q' to quit viewing.
Until configuration options are provided in a later version, the keys are set up to resemble Vim's normal mode.
The following keys are currently supported:
- Movement
  - `h`: move left
  - `j`: move down
  - `k`: move up
  - `l`: move right
- Quick Movement
  - `Ctrl + F`: page down
  - `Ctrl + B`: page up (currently less reliable than page down)
- Highlight mode
  - Press `,` to highlight the current line for easier horizontal reading.
  - Scrolling up and down will move the highlight bar within the window.
  - Press `,` again to exit highlight mode.
- Goto line
  - Press `;`, then type a line number (e.g. `:1000`) and press `Enter`.
- Filter columns
  - Press `.`, then type a comma-separated list of columns (e.g. `.name, age, survived`) and press `Enter`.
- SQL querying
  - Press `/`, then type your query (use 'df' as the table name), e.g. `/SELECT AVG(age) AS average_age, sex, survived FROM df GROUP BY sex, survived`
  - Then press `Enter`.
  - Note that you can execute new queries against the data frame you just created, or go back to the entire data frame.
- Return from query/filter view to entire data frame
  - `b`
- Exiting
  - `q`
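The SQL querying key treats the current data frame as a table named df. How datascroller evaluates the query internally is not described here, so the following is only a rough illustration of the idea: it loads a few made-up rows into an in-memory SQLite table called df using Python's standard `sqlite3` module and runs a query of the same shape as the example in the list. The rows and the `run_query` helper are hypothetical, and an `ORDER BY` clause is added only to make the output order predictable.

```python
import sqlite3

# Made-up rows standing in for a data frame (not the real Titanic data).
ROWS = [
    ("female", 1, 30.0),
    ("female", 1, 20.0),
    ("male", 0, 40.0),
    ("male", 1, 10.0),
    ("male", 0, 20.0),
]


def run_query(query, rows=ROWS):
    """Run `query` against an in-memory table named df and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE df (sex TEXT, survived INTEGER, age REAL)")
        conn.executemany("INSERT INTO df VALUES (?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


result = run_query(
    "SELECT AVG(age) AS average_age, sex, survived "
    "FROM df GROUP BY sex, survived ORDER BY sex, survived"
)
for row in result:
    print(row)
```

Because the result of a query is itself a table, the same approach could be applied again to the returned rows, which mirrors the note above about running new queries against the data frame you just created.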
Using IPython is a good way to try out datascroller interactively:
import pandas as pd
from datascroller import scroll
train = pd.read_csv(
'https://raw.githubusercontent.com/datasets/house-prices-uk/master/data/data.csv')
scroll(train)
Read in the Titanic dataset as a parquet file and view it like any other table:
import pandas as pd
from datascroller import scroll
table = pd.read_parquet("https://raw.githubusercontent.com/baogorek/datascroller/parquet/datascroller/demo_data/titanic.parquet")
scroll(table)
Making datascroller work in arbitrarily sized terminal windows is challenging. The ViewingArea and DFWindow classes help with keeping track of state and separating concepts.
The ViewingArea class represents the character matrix available to curses. The following example instantiates a ViewingArea object with character paddings of 4 and 2 in the horizontal and vertical orientations, respectively:
from datascroller.scroller import ViewingArea
va = ViewingArea(4, 2)
va.show_curses_representation()
The `show_curses_representation()` method provides a brief visual display of
the character matrix and the bounds of display for the data window.
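To make the idea of padded display bounds concrete, here is a minimal sketch that does not use datascroller at all. It assumes the padding is applied symmetrically on both sides of the terminal, which may not match what ViewingArea actually does; the `usable_area` function and its parameter names are hypothetical.

```python
def usable_area(total_cols, total_rows, pad_x, pad_y):
    """Return (first_col, last_col, first_row, last_row) inside the padding.

    Bounds are inclusive, zero-based character positions.
    Assumes the same padding on the left/right and on the top/bottom.
    """
    if total_cols <= 2 * pad_x or total_rows <= 2 * pad_y:
        raise ValueError("terminal too small for the requested padding")
    return (pad_x, total_cols - pad_x - 1, pad_y, total_rows - pad_y - 1)


# A classic 80x24 terminal with paddings of 4 (horizontal) and 2 (vertical).
bounds = usable_area(80, 24, pad_x=4, pad_y=2)
print(bounds)
```

Under these assumptions, an 80x24 terminal leaves columns 4 through 75 and rows 2 through 21 for the data window.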
The DFWindow class is responsible for maintaining a subset of the original data frame, made clear by its flagship method:
def get_dataframe_window(self):
    """DataFrame window of form self.df[r_1:r_2, c_1:c_2]"""
    return self.full_df.iloc[self.r_1:self.r_2, self.c_1:self.c_2]
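The windowing idea can be tried without Pandas or curses. The sketch below is a hypothetical stand-in, not DFWindow itself: `ToyWindow` keeps the same four half-open bounds (`r_1`, `r_2`, `c_1`, `c_2`) over a plain list of rows and shifts them by one position per move. The real class presumably also accounts for column widths when it moves, which this toy version ignores.

```python
class ToyWindow:
    """Hypothetical stand-in for a data window over a list-of-rows table."""

    def __init__(self, rows, n_rows, n_cols):
        self.rows = rows
        self.r_1, self.r_2 = 0, n_rows
        self.c_1, self.c_2 = 0, n_cols

    def get_window(self):
        # Same half-open slicing idea as .iloc[r_1:r_2, c_1:c_2].
        return [row[self.c_1:self.c_2] for row in self.rows[self.r_1:self.r_2]]

    def move_right(self):
        # Shift one column unless the window already touches the last column.
        if self.c_2 < len(self.rows[0]):
            self.c_1 += 1
            self.c_2 += 1

    def move_down(self):
        # Shift one row unless the window already touches the last row.
        if self.r_2 < len(self.rows):
            self.r_1 += 1
            self.r_2 += 1


# A 4x5 table whose cell values encode their position (10 * row + column).
table = [[10 * r + c for c in range(5)] for r in range(4)]
window = ToyWindow(table, n_rows=2, n_cols=3)
first_view = window.get_window()
window.move_right()
window.move_down()
second_view = window.get_window()
print(first_view)
print(second_view)
```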
DFWindow must be aware of the viewing area in order to set an appropriate value
of `self.c_2`, and hence DFWindow requires an instance of ViewingArea for
initialization.
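One way to picture how the right-hand bound could depend on the viewing area is a greedy fit over column widths. This is a sketch under stated assumptions, not the logic of `find_last_fitting_column()`: the function name `last_fitting_column`, the one-character gap between columns, and the rule that at least one column is always shown are all hypothetical.

```python
def last_fitting_column(col_widths, c_1, available_width, gap=1):
    """Return c_2 such that columns c_1..c_2-1 fit in available_width.

    Each column after the first is preceded by `gap` separator characters.
    At least one column is always included, even if it is too wide.
    """
    used = 0
    c_2 = c_1
    for width in col_widths[c_1:]:
        extra = width if c_2 == c_1 else width + gap
        if c_2 > c_1 and used + extra > available_width:
            break
        used += extra
        c_2 += 1
    return c_2


# Hypothetical printed widths (in characters) of five columns.
widths = [6, 10, 4, 12, 8]
from_start = last_fitting_column(widths, 0, 24)
from_third = last_fitting_column(widths, 2, 24)
print(from_start, from_third)
```

With 24 characters available, the first three columns fit when starting at column 0, while starting at column 2 allows the window to reach up to (but not including) column 4.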
import pandas as pd
from datascroller.scroller import DFWindow
from datascroller.scroller import ViewingArea
my_df = pd.read_csv(
'https://raw.githubusercontent.com/datasets/house-prices-uk/master/data/data.csv')
va = ViewingArea(4, 2)
df_window = DFWindow(my_df, va)
df_window.find_last_fitting_column()
print(df_window.c_1)
print(df_window.c_2)
df_window.move_right()
print(df_window.c_1)
print(df_window.c_2)
print(df_window.get_dataframe_window())
import curses
stdscr = curses.initscr()
try:
    df_window.add_data_to_screen(stdscr)
    stdscr.refresh()
finally:
    # Always restore the terminal, even if drawing fails
    curses.endwin()