BFH

A Python DSL for schema transformations

Docs: http://percolate.github.io/bfh/

Look, you can put square pegs in round holes:

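import math

# Schema, Mapping, the field classes and the transformations
# (Concat, Str, Get, Do) are all provided by bfh.


class SquarePeg(Schema):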
    id = IntegerField()
    name = UnicodeField()
    width = NumberField()


class RoundHole(Schema):
    id = UnicodeField()
    name = UnicodeField()
    diameter = NumberField()


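def largest_square(width):
    # Diagonal of a square with this side length, i.e. the smallest
    # hole diameter the peg will fit through.
    return math.sqrt(2 * width**2)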


class SquarePegToRoundHole(Mapping):
    source_schema = SquarePeg
    target_schema = RoundHole

    id = Concat('from_square', ':', Str(Get('id')))
    name = Get('name')
    diameter = Do(largest_square, Get('width'))

my_peg = SquarePeg(id=1, name="peggy", width=50)

transformed = SquarePegToRoundHole().apply(my_peg).serialize()
transformed['id']
# u'from_square:1'
transformed['name']
# u'peggy'
transformed['diameter']
# 70.71067811865476
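
The diameter above is simply the diagonal of the square peg. As a quick standalone check of that arithmetic, here is a small sketch that does not use BFH at all; `square_diagonal` is a name made up for this illustration.

```python
import math


def square_diagonal(width):
    """Diagonal of a square with side `width` (Pythagoras: sqrt(w^2 + w^2))."""
    return math.sqrt(2 * width ** 2)


# The same quantity computed two other ways, as a sanity check.
width = 50
via_hypot = math.hypot(width, width)   # hypotenuse of a width-by-width right triangle
via_root_two = width * math.sqrt(2)    # side length times sqrt(2)

print(square_diagonal(width), via_hypot, via_root_two)
```

All three values should agree to within floating-point rounding.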

BFH is a DSL for mapping blobs to other blobs. It can map dict-ish or object-ish things: as long as you've got names and the names have values, you can whack it into shape. Explicit schema objects are totally optional. A mapping can be used without input and output schemas, because the names in the mapping are all you really need. Viz.

class ImpliesSchemas(Mapping):
    id = Concat('author', ':', Get('nom_de_plume'))
    name = Get('nom_de_plume')
    book = Get('best_known_for')

source = {
    "nom_de_plume": "Mark Twain",
    "best_known_for": "Huckleberry Finn"
}
output = ImpliesSchemas().apply(source)

type(output)
# <class 'bfh.GenericSchema'>
sorted(output.serialize().keys())
# ['book', 'id', 'name']

Explicit schemas, however, can help preserve your sanity when things get complex.
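
For comparison, the schema-less mapping above amounts to roughly the following hand-written function. This is only a sketch of what the mapping computes for this one input shape, not how BFH works internally; `implies_schemas` is a hypothetical name.

```python
def implies_schemas(source):
    """Plain-Python equivalent of the ImpliesSchemas mapping (illustration only)."""
    return {
        # Concat('author', ':', Get('nom_de_plume'))
        "id": "author" + ":" + source["nom_de_plume"],
        # Get('nom_de_plume')
        "name": source["nom_de_plume"],
        # Get('best_known_for')
        "book": source["best_known_for"],
    }


source = {
    "nom_de_plume": "Mark Twain",
    "best_known_for": "Huckleberry Finn",
}
result = implies_schemas(source)
print(sorted(result.keys()))
```

The declarative version keeps the field-to-field wiring in one place, which is where it starts to pay off once there are more than a handful of fields.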

Validation

While BFH can validate a schema, it's not primarily a validation library, OK? There are lots of those out there, so we don't get too fancy here. Just some sanity checking.

my_peg = SquarePeg(id=1, name="peggy", width=50)
my_peg.validate()
# True

broken_peg = SquarePeg(id=2, name="broken", width="a hundred")
broken_peg.validate()
# ... (raises Invalid)

Reserved Words

Obviously your schema needs a field called 'if' or 'finally'. Use a double-underscore name and all will be well:

class Fancy(Schema):
    # if = IntegerField()  # Ouch! SyntaxError!
    __if = IntegerField()

You can init in any of these ways:

Fancy(__if=1)
Fancy(**{"__if": 1})
Fancy(**{"if": 1})

In a mapping, use the de-dundered name:

Get("if")
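
To make the naming rule concrete, here is a minimal sketch of the kind of key normalization that would let all three init styles land on the same field name. Both helpers are hypothetical and written for this illustration; this is not BFH's actual code.

```python
def dedunder(name):
    """Strip a leading double underscore, so '__if' becomes 'if'."""
    return name[2:] if name.startswith("__") else name


def normalize_kwargs(**kwargs):
    """Return the keyword arguments keyed by their de-dundered names."""
    return {dedunder(key): value for key, value in kwargs.items()}


# The three call styles shown above all end up with the key 'if'.
print(normalize_kwargs(__if=1))
print(normalize_kwargs(**{"__if": 1}))
print(normalize_kwargs(**{"if": 1}))
```

Note that `**{"if": 1}` works even though `if` is a reserved word, because keys unpacked with `**` only need to be strings, not valid identifiers.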

Implicit null values and serialization

Given a sample dataset:

    {
        "name": "peg",
        "age": null
    }

If you call serialize with the default kwargs (here parsed_data is the sample above, already parsed into a dict), null values are kept:

transformed = SomeMapping().apply(parsed_data).serialize()
# {'name': 'peg', 'age': None}

But if implicit_nulls is set to True, keys whose value is null are left out:

transformed = SomeMapping().apply(parsed_data).serialize(implicit_nulls=True)
# {'name': 'peg'}

This can be useful to filter out null values from a dataset on serialization.
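
The effect on the sample dataset can be mimicked in plain Python as below. This is a sketch under assumptions: `drop_nulls` is a hypothetical helper, it only looks at top-level keys, and whether BFH also filters nested values is not stated here.

```python
import json


def drop_nulls(data):
    """Return a copy of `data` without the keys whose value is None."""
    return {key: value for key, value in data.items() if value is not None}


# JSON null becomes Python None when parsed.
parsed_data = json.loads('{"name": "peg", "age": null}')

with_nulls = dict(parsed_data)            # mirrors the default behavior
without_nulls = drop_nulls(parsed_data)   # mirrors implicit_nulls=True

print(with_nulls)
print(without_nulls)
```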

Build Status

Tested on Python 2.7.10 and 3.5.0:

Circle CI