Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple Enum Support #502

Open
luckydonald opened this issue Apr 19, 2020 · 17 comments
Open

Simple Enum Support #502

luckydonald opened this issue Apr 19, 2020 · 17 comments

Comments

@luckydonald
Copy link

luckydonald commented Apr 19, 2020

Abstract

This proposes support for normal type enums to be supported.

There would be no special tables added to model those inside the database.
They would be implemented only on a pure python level.
The database would simply store the value of those.

Previous implementation attempt, e.g. #346 and #392, needed to change the type of the field to store the names string, thus introducing an overhead and possible headaches like max length of a VARCHAR(x) and such.

Storing the underlaying values, e.g. int of an IntEnum allows it to use the same code used for int, as well as all related settings when defining them in models.

Introduction

In python Enums can have a native type associated with them.

The popular IntEnum (from enum import IntEnum) is defined as the following:

class IntEnum(int, Enum):  
    """Enum where members are also (and must be) ints"""

which would be typically be used as:

class FoobarEnum(IntEnum):
   MANGO = 1
   BANANA = 2
   ANSWER = 42
   SOMETHING = 123
   WORLD_DOMINATION = 666
   EASTEREGG = 4458

The Enum class now automatically makes all the attributes the same type as the Enum class.
So here that's int and Enum
Now, because FoobarEnum (and so FoobarEnum.MANGO, FoobarEnum.BANANA, etc.) are already valid int types:

>>> FoobarEnum.MANGO + FoobarEnum.SOMETHING
124

Those can already be used when writing to the database.

Having pony model

class Whatever(db.Entity):
    id =enum_field = Required(int)

you can easily set that with the enum:

new_whatever = Whatever(enum_field=FoobarEnum.MANGO)
some_whatever = Whatever.get(enum_field=FoobarEnum.BANANA)

The problem here is they won't be converted back to that enum class:

>>> another_whatever.enum_field
4458

>>> some_enum
<FoobarEnum.EASTEREGG: 4458>
>>> isinstance(FoobarEnum.EASTEREGG, int)
True
>>> isinstance(FoobarEnum.EASTEREGG, FoobarEnum)
True

>>> isinstance(another_whatever.enum_field, int)
True
>>> isinstance(another_whatever.enum_field, FoobarEnum)
False

Proposal

The proposal is to store the underlying enum values, as those are basically already implemented, and all the corner cases in queries and such should be ironed out.
So only the retrieval must have an additional step of loading the value in the right Enum type.

This is as easy as FoobarEnum(4458):

val = 4458
py_type = FoobarEnum

>>> py_type(val)
<FoobarEnum.EASTEREGG: 4458>

>>> py_type(val) == FoobarEnum.EASTEREGG
True

Implementation Approaches

Both share that the upgrade would be strait forward replacing a type with any fitting class being subclass of that original type (here: int) and implementing enum.Enum. An example would be enum.IntEnum:

class Whatever(db.Entity):
   id = …
-  enum_field = Required(int,        size=8, min=0, max=4458, nullable=True, default=IntEnum.ANSWER)
+  enum_field = Required(FoobarEnum, size=8, min=0, max=4458, nullable=True, default=IntEnum.ANSWER)
This can be monkey patched for testing purposes, like the following (click to expand)
from pony.orm.dbproviders.mysql import MySQLProvider
from pony.orm.dbproviders.sqlite import SQLiteProvider
from pony.orm.dbproviders.postgres import PGProvider
from pony.orm.dbproviders.oracle import OraProvider
MySQLProvider.converter_classes = [(Enum, EnumConverter)] + MySQLProvider.converter_classes
SQLiteProvider.converter_classes = [(Enum, EnumConverter)] + SQLiteProvider.converter_classes
PGProvider.converter_classes = [(Enum, EnumConverter)] + PGProvider.converter_classes
OraProvider.converter_classes = [(Enum, EnumConverter)] + OraProvider.converter_classes

Both work for enums with native types and the additional types supported by pony.
So class IntEnum(int, Enum), class StrEnum(str, Enum), class BoolEnum(bool, Enum).
Even more unusual enum types like class DateEnum(datetime.datetime, Enum), etc. would work.

Implementation Approach A

from pony.orm.dbapiprovider import IntConverter, StrConverter, Converter
from enum import Enum
from typing import Type


class EnumConverter(Converter):
    @staticmethod
    def _get_real_converter(py_type) -> Type[Converter]:
        """
        Gets a converter for the underlying type.
        :return:
        """
        if issubclass(py_type, int):
            return IntConverter
        elif issubclass(py_type, str):
            return StrConverter
        else:
            raise TypeError('only str and int based enums supported')
        # end if
    # end def

    def __init__(self, provider, py_type, attr=None):
        super(EnumConverter, self).__init__(provider=provider, py_type=py_type, attr=attr)
        self.converter_class = self._get_real_converter(self.py_type)
        self.converter = self.converter_class(provider=self.provider, py_type=self.py_type, attr=self.attr)
    # end if

    def init(self, kwargs):
        self.converter.init(kwargs=kwargs)
    # end def

    def validate(self, val, obj=None):
        assert issubclass(self.py_type, Enum)
        assert issubclass(self.py_type, (int, str))
        return self.converter.validate(val=val, obj=obj)
    # end def

    def py2sql(self, val):
        return self.converter.py2sql(val=val)
    # end def

    def sql2py(self, val):
        return self.converter.sql2py(val=val)
    # end def

    def val2dbval(self, val, obj=None):
        """ passes on the value to the right converter """
        return self.converter.val2dbval(val=val, obj=obj)
    # end def

    def dbval2val(self, dbval, obj=None):
        """ passes on the value to the right converter """
        py_val = self.converter.dbval2val(self, dbval=dbval, obj=obj)
        if py_val is None:
            return None
        # end if
        return self.py_type(py_val)  # SomeEnum(123) => SomeEnum.SOMETHING
    # end def

    def dbvals_equal(self, x, y):
        self.converter.dbvals_equal(self, x=x, y=y)
    # end def

    def get_sql_type(self, attr=None):
        return self.converter.get_sql_type(attr=attr)
    # end def

    def get_fk_type(self, sql_type):
        return self.converter.get_fk_type(sql_type=sql_type)
    # end def
# end class

Possibly self._get_real_converter(self.py_type) can be replaced by simply using self.provider.get_converter_by_py_type(self.py_type).
It is just added above as an extra function to illustrate the working of it better.

Needed changes:

  • The converter_classes need to include EnumConverter as well.
    • It must be one of the first elements there, so that the EnumConverter class would always be chosen before any class implementing the underlying enum type, like int.
    • This needs to be done for all the database providers (pony.orm.dbproviders).
      Example for pony.orm.dbproviders.mysql:
    converter_classes = [
+       (Enum, dbapiprovider.EnumConverter),
        (NoneType, dbapiprovider.NoneConverter),
        (bool, dbapiprovider.BoolConverter),
        (basestring, MySQLStrConverter),
        (int_types, dbapiprovider.IntConverter),
        (float, MySQLRealConverter),
        …
    ]

Implementation Approach B

Here the existing Converters would be modified to support enums.
In their dbval2val they would be modified to include the following code before returning the value:

if is subclass(self.py_type, Enum):
   return self.py_type(val)
return val

Implementation Decision

Probably A is the cleaner approach as that requires less places to modify (or forget to do so), and also works with all database-native types as well as future types as well.

Limitations

The special meaning a enum has, that is both having a name and making sure those values are whitelisted will not be reflected inside the database, it would only be enforced by the python side.

Only enums combining native types (or other types already supported by pony) can be used, as the value is stored, not the keys.
This means no enums combining multiple types.
The following would be impossible:

class MixedEnum(enum.Enum):
  FOO = None
  BAR = True
  BATZ = False
  MORE = 123
  STUFF = "ponies!"
  TO_ADD = datetime.now()
@luckydonald
Copy link
Author

Sample Text

@matee911
Copy link

matee911 commented Jun 9, 2020

Any plans to release this soon? Also, I doesn't look like it supports Enums in select()

@luckydonald
Copy link
Author

@matee911 You mean currently, or one of the planned approaches (A or B)?

@luckydonald
Copy link
Author

luckydonald commented Jan 29, 2021

I'd now change the _get_real_converter of EnumConverter to:

    def _get_real_converter(self, py_type):
        """
        Gets a converter for the underlying type.
        :return: Type[Converter]
        """
        for t, converter_cls in self.provider.converter_classes:
            if issubclass(t, EnumConverter):
                # skip our own type, otherwise this could get ugly
                continue
            # end if
            if issubclass(py_type, t):
                return converter_cls
            # end if
        # end for
        throw(TypeError, 'No database converter found for enum base type %s' % py_type)
    # end def

(no longer a @staticmethod)

That should allow to use all the types the database/pony supports.

@luckydonald
Copy link
Author

I'll mirror the discussion we have in telegram here.

@luckydonald
Copy link
Author

luckydonald commented Feb 10, 2021

Alexander Kozlovsky, 31. Jan 2021:

[1.] I see the practical problem with enums in the fact that many programmers will want to change the list of values some time later. Some of them will change enum values in Python without understanding that the values in the database remain the same, and then their application will stop working correctly.

[2.] I have a look at #502 and #585. In general it looks good. I'm not ready to include it right now, need to think about it a bit more.

[3.] Currently Pony supports Python 2.7 as well, so it should work on 2.7 too.

[4.] On the other side, Pony does not depend on other packages, so I want to avoid specifying dependency to some backported version of Enum for Python 2.7

[5.] The methods _prepare_int_kwargs and _prepare_str_kwargs look too complicated to me. I'm not sure it is good to specify things like max_len looking at the list of specified values. Most users will want to extend the number of possible values later, and with specified max_len or overly restricted int size, it will not be possible to extend the list of elements without some database migration.

[6.] Some people expect from enum to support database enums, for example, for PostgreSQL. I think it is a different topic to add database enums support, but it is good to understand, is the proposed solution compatible with such future addition.

[7.] Will return to this PR on the next week

[Numbers added to easier reference them.]

@luckydonald
Copy link
Author

luckydonald commented Feb 10, 2021

Thanks for taking your time for looking into this. I'll respond to your paragraphs below, counting them from the top, starting at 1.

1) I think the fact that this is no database side enum but actually just the values in the database is a fact which just has to be documented clearly, really.

2) Thanks! I think that will really be helpful for working with enum values.

3+4) Okey, I'll check py2 that in detail.
By the way, how long is support for 2.7 planned to be kept? It went End-of-life over a year ago, after all, so adding a backport (enum34 or aenum) just for those could still be a viable solution. An alternative I could look into would be making that enum import fail in a graceful way which would simply disable enum support on that old system. If you still use python 2.7, you're not used to enums anyway.

5) _prepare_*_kwargs:
I think of those as a convince method. If a user set's any of those values, those will be take into account properly.
Again as in 1, this will be documented in the documentation in detail, so that people who want an enum with a big size can simply set those values appropriately from the start.
As soon as we have a migration in place which can handle the change of the size attribute of a int, this should be possible to mirror into this as well.
But of course that one could always use the biggest int available and call it a day.

6+1) I still think that doesn't really matter for this approach too much. If database enums are added later it could be done in a transparent and forward compatible way. Either opt-in or opt-out. I am thinking Required(SomeEnum, database=True) or something similar (don't pin me on that variable name).

In fact, from reading up on enum types in postgres, I think it is compatible as long as you manually run a CREATE TYPE someenum AS ENUM ('sad', 'ok', 'happy'); and ALTER TABLE "foobar" ALTER some_column TYPE someenum USING some_column::someenum; to create the enum and change the column type of your table to use that manually.

The only thing this is missing is the automised approach, but for that we should probably do migrations first, and thus I would want to keep those topics separated.

[typos fixed]

@luckydonald
Copy link
Author

Additionally I want to add, there was asked about a key based version of this, so an int enum would still store a string like FIRST_VALUE, and I think that could be easily integrated as a store_keys=True key (or whatever the name would end up being), if so desired it should be quite trivial to add as well.

@luckydonald
Copy link
Author

luckydonald commented Feb 10, 2021

luckydonald, 10. February 2021:


3+4) I had a bit of thought about py2 compatibility.
For that I would rewrite it to try to import std-lib Enum, and aenum, but if none of those are available disable that converter.
That way you'd either install aenums or be on a supported system, or you can't use it.

I mean I if there's no Enums in py2 one can simply use normal numbers instead. We don't need to support something which doesn't exist on py2.


5) Regarding _prepare_*_kwargs:
I how about rewriting it so that it does the needed checks (i.e. the given enum fits max_size), but that 'automatic' size finding would be opt-in?
So for example having something like automatic_size=True as parameter to turn on the magic or keep it off and it would simply use the biggest int64 and not set max and min for you.


6) Regarding database enums,
Did you had some time to think about it more, if that would be a deal breaker?

@luckydonald
Copy link
Author

luckydonald commented Feb 10, 2021

Alexander Kozlovsky:

[3+4] I think conditional import is ok


[5] think using int64 is not necessary, as all enums with explicitly defined values should fit to default int size.

The same for strings, I think default str without explicitly specified size should be ok most of the time. In SQLite and PostgreSQL strings may be of any size, and in MySQL it will be VARCHAR(255) by default

automatic_size looks good, how about renaming it to a shorter auto_size?

@luckydonald
Copy link
Author

luckydonald commented Feb 10, 2021

luckydonald:

5.) Sure, will use that.

@arian-f
Copy link

arian-f commented Jun 14, 2021

@luckydonald I'm trying to use your fork. Works very nice with a sqlite-DB 👍 . With a mysql-DB I'm getting this exception:
UnexpectedError: Object Artifact[new:1] cannot be stored in the database. DataError: 1366 Incorrect integer value: 'ProcessingState.NEW' for column 'processingstate' at row 1

is there a reason it shouldn't work against mysql? The traceback doesn't look useful to me, but I can share it of course if you think it is useful

@luckydonald
Copy link
Author

luckydonald commented Oct 10, 2021

@arian-f
Good find. Sorry for not looking into it yet.
Yes, please, if you still have it laying around or can reproduce it.

I would also especially be interested in the parts of the Pony model which breaks it, so basically a minimal code example making it go boom:boom: would be great!

  • Pony model definition
  • Enum definition
  • Python code making the breaking database query
  • Exception message and stack trace

Or considering how long it's been ago, anything you can remember about it 😅

@tweigel-dev
Copy link

tweigel-dev commented Dec 20, 2021

Nice Idea @luckydonald,
is there any ambition merging this?
maybe @luckydonald could open an MR with the necessary changes and we can have a look whether the maintainer will have a blink on it

@austinjp
Copy link

Hi all. Is there any movement on this?

I was bitten by this yesterday and it took me a while to figure out what was going on. For anyone else who Googles their way to this page, my workaround was to use pony.orm.raw_sql like so:

raw = [
    'x.some_col=="{}"'.format(MyEnum.some_attr.value),
    'x.some_other_col=="{}"'.format(MyEnum.some_other_attr.value),
]
res = orm.select(
    x for x in MyDbEntity if orm.raw_sql(
        " and ".join(raw)
    )
)

@luckydonald
Copy link
Author

luckydonald commented Mar 20, 2022

@tweigel-dev In fact has a PR ready (#585), and I did test it a bit as well. See the unit tests there.
Maybe give it a go and say if it's working for you.

@austinjp not that I'm aware of, that PR is waiting for review.

Actually it would need a merge with the new changes, but hopefully that should still work.

@rudaoshi
Copy link

Seems that no enum support are merged. Should this be merged? Enum support is important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants