Switch from pickled blobs to JSON data #1786
If we are moving from BLOBs to JSON then we should really use the new format. See PR #800. The new format uses the [...] The main benefit of the new format is that it is easier to maintain and debug. Instead of lists we use dictionaries. So, for example, we refer to the field "parent_family_list" instead of field number 9. Upgrades are no problem: we just read and write the raw data. When I have more time I'll update you on the discussions that took place while you were away.
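To illustrate the point about dictionaries versus lists, here is a minimal sketch. The record is hypothetical and heavily shortened (three fields instead of the real layout), and the field names other than "parent_family_list" are assumptions made for the example.

```python
# Hypothetical, shortened record. In the real list layout the parent
# family list sits at field number 9; here it is at index 2.
person_as_list = ("a1b2c3", "I0001", ["F0001", "F0002"])
PARENT_FAMILY_LIST = 2  # the position has to be remembered by every caller

# The same record as a dictionary: fields are addressed by name.
person_as_dict = {
    "handle": "a1b2c3",          # assumed field name
    "gramps_id": "I0001",        # assumed field name
    "parent_family_list": ["F0001", "F0002"],
}

families_by_index = person_as_list[PARENT_FAMILY_LIST]
families_by_name = person_as_dict["parent_family_list"]
```

Both lookups return the same list, but only the second one survives a change in field order and is readable in a debugger.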
Oh, that sounds like a great idea! I'll take a look at the JSON format and switch to that. Should work even better with the SQL JSON_EXTRACT().
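As a rough sketch of why named JSON fields pair well with JSON_EXTRACT(), the example below uses SQLite's `json_extract` through Python's `sqlite3` module. The table name `person`, the column name `json_data`, and the record contents are assumptions for illustration, not the schema used in the PR, and the SQLite build must include its JSON functions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one text column holding the JSON document.
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, json_data TEXT)")

record = {"handle": "a1b2c3", "parent_family_list": ["F0001", "F0002"]}
conn.execute(
    "INSERT INTO person (handle, json_data) VALUES (?, ?)",
    (record["handle"], json.dumps(record)),
)

# Pull a single named field out of the stored document inside SQL,
# without unpickling or loading the whole object in Python.
row = conn.execute(
    "SELECT json_extract(json_data, '$.parent_family_list[0]') "
    "FROM person WHERE handle = ?",
    ("a1b2c3",),
).fetchone()
first_family = row[0]
```

With a pickled blob the same query would need to fetch every row and unpickle it in Python before filtering.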
There are a few places where the new format is used, so we will get some bonus performance improvements. Feel free to make changes to my existing code if you see a benefit. You may also want to have a quick look at how we serialize
Making some progress. It turns out the serialized format had leaked into many other places, probably for speed. These are probably good candidates for business logic.
I added a
@Nick-Hall, I will probably need your assistance regarding the complete save/load of the to_json and from_json functions. I looked at your PR, but as it touches 590 files, there is a lot there. In this PR, I can now upgrade a database and load the people views (except for the name functions, which I still have to figure out).
Thanks @Nick-Hall, that was very useful. I think that I will cherry-pick some of the changes (like the attribute name changes and the elimination of private attributes). You'll see that I made many of the same changes you did. But one thing I found is that if we want to allow upgrades from previous versions, then we need to be able to read in blob_data and write out json_data. I think my version has that covered. I'll continue to make progress.
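The upgrade path described above (read blob_data, write json_data) could look roughly like the sketch below. The function name `upgrade_row`, the field list, and the sample record are hypothetical; the real conversion in the PR goes through the objects' own serialization code.

```python
import json
import pickle

# Assumed field order of the old list-based format (shortened).
FIELDS = ["handle", "gramps_id", "parent_family_list"]


def upgrade_row(blob_data, field_names):
    """Convert one pickled list-style record into a JSON string.

    Only unpickle data from a trusted database file: pickle.loads can
    execute arbitrary code on malicious input.
    """
    values = pickle.loads(blob_data)
    return json.dumps(dict(zip(field_names, values)))


# Simulate a row from an old database and upgrade it.
old_blob = pickle.dumps(("a1b2c3", "I0001", ["F0001"]))
json_data = upgrade_row(old_blob, FIELDS)
```

After the upgrade the old column is no longer needed for reads, so new code only has to understand the JSON form.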
@dsblank Why are you removing the properties? The validation in the setters will no longer be called.
@Nick-Hall, I thought that was what @prculley did for optimization, and I thought it was needed. I can put those back :)
Perhaps we could consider a solution similar to that provided by the pickle [...] I expect that only a handful of classes would need to override the default methods.
Yes, I agree that this is the best approach.
I can make arguments for either approach. Consistency is of greater value. I can't imagine the strings will change frequently.
@Nick-Hall, the one thing I didn't convert was the metadata; it is still in a pickled format. I didn't even look at the format of the metadata. I suspect that we'll want to remove pickle from there, too. That can be done in a follow-up PR, or here.
Have a look at how I do it in the MongoDB backend. All the tables become collections of JSON documents. It's only really a proof of concept, so feel free to use a different JSON structure. Getting rid of all the pickled data is probably the way to go. I don't mind if you continue with this PR.
@Nick-Hall, thanks for the pointer to your MongoDB database implementation. I stole the basic framework. This was a little tricky, as I was using the version to decide whether to use JSON or blob, and the version is of course stored in the metadata. So I moved away from using the version number and added a probe to see whether the DB supports JSON. I think that is the last thing I'd want to add to this PR.
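The comment does not say how the probe works. One possible approach, shown here purely as an assumption, is to check whether a table already has a JSON column instead of reading a version number from the metadata. The helper name `supports_json` and the table and column names are hypothetical.

```python
import sqlite3


def supports_json(conn, table="person", column="json_data"):
    """Return True if `table` has a column named `column`.

    PRAGMA table_info yields one row per column; index 1 is the name.
    The table name is interpolated directly, so pass trusted names only.
    """
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return column in columns


# An old-style database with only a blob column ...
old_db = sqlite3.connect(":memory:")
old_db.execute("CREATE TABLE person (handle TEXT, blob_data BLOB)")

# ... and an upgraded one with a JSON column.
new_db = sqlite3.connect(":memory:")
new_db.execute("CREATE TABLE person (handle TEXT, json_data TEXT)")

old_has_json = supports_json(old_db)
new_has_json = supports_json(new_db)
```

A probe like this avoids the circular dependency mentioned above, because it does not need to decode the metadata before knowing which format the metadata is stored in.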
Co-authored-by: stevenyoungs <[email protected]>
This PR converts the database interface to use JSON data rather than the pickled blobs used since the early days.
db.serializer
a. abstracts data column name
b. contains serialize/unserialize functions
a. It does this by switching between serializers
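The db.serializer notes above can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: the class names `BlobSerializer`, `JSONSerializer` and `Database`, the attribute `data_field`, and the `insert_sql` helper are all hypothetical. Only the ideas (a serializer that knows its data column name and provides serialize/unserialize functions, with the database switching between serializers) come from the description.

```python
import json
import pickle


class BlobSerializer:
    """Old format: pickled data stored in the blob_data column."""
    data_field = "blob_data"

    @staticmethod
    def serialize(data):
        return pickle.dumps(data)

    @staticmethod
    def unserialize(raw):
        return pickle.loads(raw)


class JSONSerializer:
    """New format: JSON text stored in the json_data column."""
    data_field = "json_data"

    @staticmethod
    def serialize(data):
        return json.dumps(data)

    @staticmethod
    def unserialize(raw):
        return json.loads(raw)


class Database:
    """Hypothetical database wrapper that switches between serializers."""

    def __init__(self, use_json):
        self.serializer = JSONSerializer if use_json else BlobSerializer

    def insert_sql(self, table):
        # The column name comes from the serializer, so callers never
        # hard-code "blob_data" or "json_data".
        return (
            f"INSERT INTO {table} (handle, {self.serializer.data_field}) "
            "VALUES (?, ?)"
        )


db = Database(use_json=True)
raw = db.serializer.serialize({"handle": "a1b2c3"})
restored = db.serializer.unserialize(raw)
```

Because the rest of the code only talks to `db.serializer`, an upgrade can read with one serializer and write with the other without touching the calling code.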