Tool to adapt multiple dataframes to one unique format

lucianosrp/dapter

Data + Adapter

Dapter is a convenient tool that helps you work with multiple data sources. It lets you rename columns and transform your data in one go.

With Dapter, you can store a series of instructions for your data cleaning routines in custom objects. You can then reuse such an object on any DataFrame, anywhere in your code. See the step-by-step example below.

📝 Example

Renaming columns and adding transformations can be "lazily" set up in a tuple:

import pandas as pd
from dapter import accepts

def convert_to_eur(col: pd.Series) -> pd.Series:
    return col * 0.92

euro_col = (accepts("Amount USD", "amount_usd", "USD"), convert_to_eur)

euro_col is a series of instructions that will tell dapter to:

  • Match any column whose name is one of the names passed to accepts
  • Apply convert_to_eur to the matched column
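To make the two steps above concrete, here is a minimal hand-written pandas sketch of what such an instruction amounts to. The helper apply_instruction is hypothetical and is not part of dapter; in particular, replacing the source column and leaving unmatched frames unchanged are assumptions made for illustration, not documented dapter behaviour.

```python
import pandas as pd


# Hypothetical helper for illustration only; it is not part of dapter's API.
def apply_instruction(df, new_name, accepted_names, transform=None):
    for name in accepted_names:
        if name in df.columns:
            col = df[name]
            if transform is not None:
                col = transform(col)
            # Assumption: the source column is replaced by the new one.
            return df.drop(columns=[name]).assign(**{new_name: col})
    return df  # no accepted name found: leave the frame unchanged


def convert_to_eur(col: pd.Series) -> pd.Series:
    return col * 0.92


accepted = ("Amount USD", "amount_usd", "USD")
frame_a = pd.DataFrame({"Amount USD": [100.0]})
frame_b = pd.DataFrame({"USD": [50.0]})

# Both frames end up with the same column name despite different inputs.
result_a = apply_instruction(frame_a, "amount_eur", accepted, convert_to_eur)
result_b = apply_instruction(frame_b, "amount_eur", accepted, convert_to_eur)
```

Writing this by hand for every column and every source is the repetition that the tuple of instructions is meant to remove.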

Once we have defined all the column "instructions", we can store them together in a custom object that inherits from dapter.BaseMapper:

from dapter import BaseMapper

class TransactionMapper(BaseMapper):
    amount_eur = euro_col

This declares that the result of the euro_col instructions will be assigned to a new column called amount_eur.
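For readers curious how class attributes can act as stored instructions, the following is a library-free sketch of the general pattern. It is not dapter's implementation: Accepts, SketchMapper and TransactionSketch are hypothetical names, and plain dicts of lists stand in for DataFrames so the example runs without any dependencies.

```python
class Accepts:
    """Hypothetical stand-in for the value returned by dapter's accepts."""

    def __init__(self, *names):
        self.names = names


class SketchMapper:
    """Hypothetical base class; dapter's BaseMapper may work differently."""

    @classmethod
    def instructions(cls):
        found = {}
        for target, value in vars(cls).items():
            if target.startswith("_"):
                continue
            # Accept either a bare Accepts or an (Accepts, function) tuple.
            if isinstance(value, tuple) and len(value) == 2:
                accepted, func = value
            else:
                accepted, func = value, None
            if isinstance(accepted, Accepts):
                found[target] = (accepted.names, func)
        return found

    def apply(self, *tables):
        results = []
        for table in tables:
            out = {}
            for target, (names, func) in self.instructions().items():
                for name in names:
                    if name in table:
                        values = table[name]
                        out[target] = func(values) if func else list(values)
                        break  # use the first accepted name that matches
            results.append(out)
        return results


def to_eur(values):
    return [round(v * 0.92, 2) for v in values]


class TransactionSketch(SketchMapper):
    amount_eur = (Accepts("Amount USD", "amount_usd", "USD"), to_eur)


out = TransactionSketch().apply({"USD": [100.0]}, {"amount_usd": [50.0]})
```

The attribute name (amount_eur) becomes the output column, while the attribute value carries the accepted input names and the optional transformation.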

This object can then be used to apply all the renaming and transformations stored inside it to any DataFrame:

mapper = TransactionMapper()

dfs = mapper.apply(df1, df2, df3)
df = pd.concat(dfs)

🧰 Installation

Using pip:

pip install dapter

🔄 Infinite DataFrame compatibility

Dapter uses narwhals in the background, so it accepts DataFrames from any supported library (see footnote 1).

This means you can define Polars Series and Expr transformations for pandas Series, and vice versa!

You can also feed any DataFrame to the apply method.

Full sample code

from dapter import BaseMapper, accepts, accepts_anycases
import pandas as pd

df1 = pd.DataFrame(
    [
        {
            "Date": "2023-02-01 10:00:01",
            "Vendor Name": "Golden Oil LLC",
            "Amount USD": 49.99,
            "Category": "Personal",
        }
    ]
)

df2 = pd.DataFrame(
    [
        {
            "transaction_date": "2023-03-01 10:00:01",
            "vendor_name": "Get Cars Inc.",
            "amount_usd": 2999.9,
            "category": "Transportation",
        }
    ]
)
df3 = pd.DataFrame(
    [
        {
            "DATE": "2023-04-01 10:00:01",
            "VENDOR_NAME": "Maintainers Exc.",
            "USD": 5249.0,
            "CAT": "Personal",
        }
    ]
)


def convert_to_eur(col: pd.Series) -> pd.Series:
    return col * 0.92

def clean_str(col: pd.Series) -> pd.Series:
    # pandas spells this accessor method .str.lower(); .str.to_lower() does not exist
    return col.str.lower().str.replace(" ", "_")

class TransactionMapper(BaseMapper):
    transaction_date = accepts("transaction_date", "Date", "DATE")
    vendor_name = accepts_anycases()
    amount_eur = accepts("Amount USD", "amount_usd", "USD"), convert_to_eur
    category = accepts("Category", "category", "CAT"), clean_str

mapper = TransactionMapper()

dfs = mapper.apply(df1, df2, df3)
df = pd.concat(dfs)
df
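| transaction_date    | vendor_name      | amount_eur | category       |
|---------------------|------------------|------------|----------------|
| 2023-02-01 10:00:01 | Golden Oil LLC   | 45.99      | personal       |
| 2023-03-01 10:00:01 | Get Cars Inc.    | 2759.91    | transportation |
| 2023-04-01 10:00:01 | Maintainers Exc. | 4829.08    | personal       |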

Footnotes

  1. cuDF, Modin, pandas, Polars, PyArrow, Dask, Ibis, Vaex
