Quickly set up MySQL databases from large static datafiles (MySQL Community Server)
Multiple datafiles can be inserted and connected through foreign keys, all in one step.
For each datafile, the following input is needed:
- A chunked pandas DataFrame, generated by one of pandas' many reader functions
  (each chunk is processed by one insert statement)
- A dictionary with table specifications (sketched below):
  - keys: column names (only specified columns are inserted)
  - values: datatypes + additional specifications (like "PRIMARY KEY")
  - nested dictionaries (optional): hold foreign key constraints
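For illustration, the two inputs could look roughly like this. The file name, column names, and the exact dictionary layout (including the nested foreign key entry) are hypothetical placeholders, not the module's fixed API; see the demo notebook for the real format:

```python
import pandas as pd

# Chunked reader: with chunksize, pd.read_csv returns an iterator of
# DataFrames, so each chunk can later be sent in a single INSERT statement.
chunks = pd.read_csv("ratings.csv", chunksize=100_000)

# Table specification: keys select the columns to insert, values carry the
# MySQL datatype plus optional extras such as "PRIMARY KEY". The nested
# dictionary sketches how a foreign key constraint might be attached
# (hypothetical layout).
ratings_spec = {
    "rating_id": "INT PRIMARY KEY",
    "movie_id": "INT",
    "rating": "DECIMAL(2,1)",
    "foreign_keys": {"movie_id": "movies(movie_id)"},  # hypothetical nesting
}
```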
Check out the Jupyter notebook "demo" to see the module in action.
The pandas library already provides the convenient pd.DataFrame.to_sql() method. It uses SQLAlchemy and allows bulk inserts (the full dataframe at once, or chunkwise). However, when selecting "mysqlconnector" as the driver in the SQLAlchemy engine, I found the bulk insert to be slow. Using the MySQLCursor.executemany() method directly from mysql.connector seems to work a lot faster. Since I wanted to stick with this particular driver, here's my attempt at creating a custom bulk insert function.
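As a minimal sketch of the underlying idea (connection parameters, table name, and column names are placeholders; the actual module adds table creation and foreign key handling on top of this):

```python
import mysql.connector
import pandas as pd

# Hypothetical connection parameters for a local MySQL Community Server.
conn = mysql.connector.connect(
    host="localhost", user="root", password="your_password", database="demo"
)
cursor = conn.cursor()

insert_stmt = (
    "INSERT INTO ratings (rating_id, movie_id, rating) VALUES (%s, %s, %s)"
)

# One executemany() call per chunk: for INSERT statements, mysql.connector
# rewrites the batch into a single multi-row INSERT, which is where the
# speed advantage over row-by-row inserts comes from.
for chunk in pd.read_csv("ratings.csv", chunksize=100_000):
    # astype(object) turns numpy scalars into plain Python values,
    # which mysql.connector can serialize.
    rows = (
        chunk[["rating_id", "movie_id", "rating"]]
        .astype(object)
        .values.tolist()
    )
    cursor.executemany(insert_stmt, rows)
    conn.commit()

cursor.close()
conn.close()
```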