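If you are a native speaker of this language, you are welcome to help us maintain this translation. You can submit a PR here.
PyGWalker is a visual exploratory analysis tool that runs in Jupyter Notebook environments. A single command generates an interactive graphical interface in which you analyze data by dragging and dropping fields, in the style of Tableau/PowerBI.
In the past, visualizing data in Python often meant looking up a lot of plotting code and writing glue code to apply it to a dataset. PyGWalker aims to turn a dataset into a visual analysis tool with one line of code, where charts are built by drag and drop, reducing the time data analysts spend on visualization.
Why is it called PyGWalker? PyGWalker stands for "Python binding of Graphic Walker". It integrates Jupyter Notebook (or Jupyter-like notebooks) with Graphic Walker, a lightweight open-source alternative to Tableau/Power BI that helps data analysts visualize and explore data with simple drag-and-drop operations.
If you prefer R, you can use GWalkR in R.
Try PyGWalker with one click using the following services: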
Kaggle Notebook | Google Colab | Graphic Walker Online Demo |
---|---|---|
Install pygwalker with pip or Conda
pip install pygwalker
Note
Use
pip install pygwalker --upgrade
to update to the latest release of PyGWalker. Use
pip install pygwalker --upgrade --pre
to try the latest pre-release and get the newest bug fixes.
conda install -c conda-forge pygwalker
or
mamba install -c conda-forge pygwalker
For more, see: conda-forge feedstock
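After installing or upgrading, it can help to confirm which version actually ended up in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of PyGWalker.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pygwalker")
    if version is None:
        print("pygwalker is not installed in this environment")
    else:
        print(f"pygwalker {version} is installed")
```

Run it with the same interpreter your notebook kernel uses, since pip and Conda environments are tracked separately.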
Import pygwalker and pandas in your Jupyter Notebook to get started.
import pandas as pd
import pygwalker as pyg
You can use pygwalker without interrupting your existing workflow. For example, you can call Graphic Walker by loading your DataFrame, like this:
df = pd.read_csv('./bike_sharing_dc.csv')
walker = pyg.walk(df)
df = pd.read_csv('./bike_sharing_dc.csv')
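walker = pyg.walk(
    df,
    spec="./chart_meta_0.json",    # This JSON file stores your chart state. When you finish a chart, click the save button in the UI manually. Auto-save is planned for a future release.
    kernel_computation=True,       # With `kernel_computation=True`, pygwalker uses DuckDB as its computing engine, which lets you explore larger datasets (<=100GB).
)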
- Notebook Code: Click Here
- Preview Notebook Html: Click Here
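The `spec` and `kernel_computation` arguments above can also be chosen programmatically. The sketch below builds them from the size of the CSV on disk; the helper name `walk_options` and the 100 MB threshold are assumptions made for illustration, not recommendations from PyGWalker.

```python
import os


def walk_options(csv_path: str, spec_path: str,
                 big_file_bytes: int = 100 * 1024 * 1024) -> dict:
    """Build keyword arguments for pyg.walk based on the CSV size on disk.

    `big_file_bytes` is an arbitrary, hypothetical cutoff: files at or above
    it get kernel_computation=True, smaller files get False.
    """
    use_kernel = os.path.getsize(csv_path) >= big_file_bytes
    return {"spec": spec_path, "kernel_computation": use_kernel}
```

With pygwalker and pandas imported as above, the result can be unpacked into the call, for example `pyg.walk(df, **walk_options('./bike_sharing_dc.csv', './chart_meta_0.json'))`.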
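That's it. You can now work on the dataframe directly with drag and drop to build visualizations and analyze your data:
Examples:
Quick data preview | Line chart |
---|---|
Facet chart | Concat view |
For more, see: Graphic Walker GitHub page.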
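Since PyGWalker 0.1.6, you can export your visualization as code.
2. The visualization is provided as code. Click the Copy to Clipboard button to save the code.
![Export PyGWalker to code](https://docs-us.oss-us-west-1.aliyuncs.com/img/pygwalker/export-to-string-pygwalker.png)
- To import the code into PyGWalker, assign the code you just exported to vis_spec. Example vis_spec string: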
vis_spec = """
[{"visId":"65b894b5-23fb-4aa6-8f31-d0e1a795d9de","name":"Chart 1","encodings":{"dimensions":[{"dragId":"9e1666ef-461d-4550-ac6a-465a74eb281d","fid":"gwc_1","name":"date","semanticType":"temporal","analyticType":"dimension"},...],"color":[],"opacity":[],"size":[],"shape":[],"radius":[],"theta":[],"details":[],"filters":[]},"config":{"defaultAggregated":true,"geoms":["auto"],"stack":"stack","showActions":false,"interactiveScale":false,"sorted":"none","size":{"mode":"auto","width":320,"height":200},"exploration":{"mode":"none","brushDirection":"default"}}}]
"""
Then load PyGWalker with vis_spec:
pyg.walk(df, spec=vis_spec)
- Call the built-in help documentation:
help(pyg.walk)
Quickly preview a vis_spec string:
pyg.to_html(df, spec=vis_spec)
Example output of help(pyg.walk):
Signature: pyg.walk(df: 'pl.DataFrame | pd.DataFrame', gid: Union[int, str] = None, *, env: Literal['Jupyter', 'Streamlit'] = 'Jupyter', **kwargs)
Docstring:
Walk through pandas.DataFrame df with Graphic Walker
Args:
- df (pl.DataFrame | pd.DataFrame, optional): dataframe.
- gid (Union[int, str], optional): GraphicWalker container div's id ('gwalker-{gid}')
Kargs:
- env: (Literal['Jupyter' | 'Streamlit'], optional): The enviroment using pygwalker. Default as 'Jupyter'
- hide_data_source_config (bool, optional): Hide DataSource import and export button (True) or not (False). Default to True
- theme_key ('vega' | 'g2'): theme type.
- appearance (Literal['media' | 'light' | 'dark']): 'media': auto detect OS theme.
- return_html (bool, optional): Directly return a html string. Defaults to False.
File: /usr/local/lib/python3.9/dist-packages/pygwalker/gwalker.py
Type: function
For more, see: PyGWalker changelog
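The exported vis_spec string shown earlier is JSON, so it can be inspected with the standard library before being passed to `pyg.walk`. The sketch below lists each chart's name and geoms. The field names (`name`, `config`, `geoms`) are taken from the example string above and may differ in other PyGWalker versions. The helper `summarize_spec` and the trimmed `sample_spec` are made up for illustration, because the example in this document is elided with `...` and is not valid JSON as printed.

```python
import json


def summarize_spec(vis_spec: str) -> list:
    """Return (chart name, geoms) pairs for each chart in a vis_spec JSON string."""
    charts = json.loads(vis_spec)
    summary = []
    for chart in charts:
        name = chart.get("name", "<unnamed>")
        geoms = chart.get("config", {}).get("geoms", [])
        summary.append((name, geoms))
    return summary


# A trimmed-down, hypothetical spec with the same top-level shape as the example.
sample_spec = """
[{"visId": "demo", "name": "Chart 1", "encodings": {"dimensions": []},
  "config": {"defaultAggregated": true, "geoms": ["auto"]}}]
"""

print(summarize_spec(sample_spec))  # prints [('Chart 1', ['auto'])]
```

A `json.JSONDecodeError` from this helper is an early hint that the copied string was truncated or edited by hand.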
- Jupyter Notebook
- Google Colab
- Kaggle Code
- Jupyter Lab (minor CSS issues remain)
- Jupyter Lite
- Databricks Notebook (minimum version: 0.1.4a0)
- Jupyter Extension for Visual Studio Code (minimum version: 0.1.4a0)
- Hex Projects (minimum version: 0.1.4a0)
- Most web applications compatible with IPython kernels (minimum version: 0.1.4a0)
- Streamlit (minimum version: 0.1.4.9), usage: pyg.walk(df, env='Streamlit')
- DataCamp Workspace (minimum version: 0.1.4a0)
- Need support for another environment? Open an issue!
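The minimum versions listed above can be checked against an installed version string. This is a simplified sketch: `release_tuple` and `meets_minimum` are hypothetical helpers that compare only the numeric release part, so a pre-release such as `0.1.4a0` is treated the same as `0.1.4`. For strict PEP 440 ordering, `packaging.version.Version` from the third-party `packaging` library is the more reliable choice.

```python
import re


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.1.4a0' -> (0, 1, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release found in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def _pad(release: tuple, width: int) -> tuple:
    """Right-pad a release tuple with zeros so that 0.1.4 equals 0.1.4.0."""
    return release + (0,) * (width - len(release))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` is at least `minimum` (pre-release tags ignored)."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    return _pad(a, width) >= _pad(b, width)
```

For example, `meets_minimum("0.1.6", "0.1.4.9")` is true, so a 0.1.6 install satisfies the Streamlit requirement from the list.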
$ pygwalker config --help
usage: pygwalker config [-h] [--set [key=value ...]] [--reset [key ...]] [--reset-all] [--list]
Modify configuration file. (default: /Users/douding/Library/Application Support/pygwalker/config.json)
Available configurations:
- privacy ['offline', 'update-only', 'events'] (default: events).
"offline": fully offline, no data is send or api is requested
"update-only": only check whether this is a new version of pygwalker to update
"events": share which events about which feature is used in pygwalker, it only contains events data about which feature you arrive for product optimization. No DATA YOU ANALYSIS IS SEND.
- kanaries_token ['your kanaries token'] (default: empty string).
your kanaries token, you can get it from https://kanaries.net.
refer: https://space.kanaries.net/t/how-to-get-api-key-of-kanaries.
by kanaries token, you can use kanaries service in pygwalker, such as share chart, share config.
options:
-h, --help show this help message and exit
--set [key=value ...]
Set configuration. e.g. "pygwalker config --set privacy=update-only"
--reset [key ...] Reset user configuration and use default values instead. e.g. "pygwalker config --reset privacy"
--reset-all Reset all user configuration and use default values instead. e.g. "pygwalker config --reset-all"
--list List current used configuration.
For more details, see: How to set your privacy configuration?
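The privacy setting can also be applied from a script by invoking the CLI shown in the help text. The sketch below builds the `pygwalker config --set privacy=...` command and validates the level against the three values listed above. The helper names `privacy_command` and `set_privacy` are hypothetical, and `set_privacy` assumes the `pygwalker` executable is on `PATH`.

```python
import subprocess

# Allowed values copied from the `pygwalker config --help` output above.
PRIVACY_LEVELS = ("offline", "update-only", "events")


def privacy_command(level: str) -> list:
    """Build the argument list for `pygwalker config --set privacy=<level>`."""
    if level not in PRIVACY_LEVELS:
        raise ValueError(f"privacy must be one of {PRIVACY_LEVELS}, got {level!r}")
    return ["pygwalker", "config", "--set", f"privacy={level}"]


def set_privacy(level: str) -> None:
    """Run the command; raises CalledProcessError if the CLI exits with an error."""
    subprocess.run(privacy_command(level), check=True)
```

Calling `set_privacy("offline")` is then equivalent to running `pygwalker config --set privacy=offline` in a shell.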
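- PyGWalker paper: PyGWalker: On-the-fly Assistant for Exploratory Visual Data Analysis
- For Graphic Walker, see Graphic Walker GitHub
- We are also developing RATH, a next-generation augmented-analytics BI tool. RATH uses AI, causal inference, and an intelligent visualization engine to assist your data analysis with a high degree of automation. For more, visit: RATH GitHub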