* feat: schema migration
  - schema migration as norm namespace in datahike
  - Closes #13
* fixup! feat: schema migration
* fixup! remove orchestra from prod (#612)
1 parent c0cc8f3, commit 11da6f1
Showing 25 changed files with 542 additions and 0 deletions.
@@ -0,0 +1,85 @@
# Schema Migration

Schema migration with Datahike is the evolution of your current schema into a future schema.

## Why use the schema migration tool?
You could use the `transact` function of the `datahike.api` namespace to apply your schema,
but with the `datahike.norm.norm` namespace you can define your migrations centrally and each
of them is applied to your database exactly once.

This helps when your production database is hard to reach from your developer machines and
you want to apply the migrations from a server that runs next to your production code.
When you set up a database from scratch, e.g. for development purposes, you can rely on its
schema being up to date with your production environment because you keep your original
schema along with your migrations in a central repository.

## How to migrate a database schema
Changes to your schema should always add new definitions and never change existing
definitions. If you want to convert existing data to a new format, you will have to create
a new schema and transact your existing data again in its transformed form. A good
introduction to this topic [can be found here](https://docs.datomic.com/cloud/schema/schema-change.html).

## Transaction-functions
Your transaction functions need to be on your classpath so that they can be called, and they
need to take one argument, the connection to your database. Each function needs to return a
vector of transaction data so that it can be applied during the migration.

Please be aware that the transaction data created by a transaction function is held in
memory. Very large migrations might exceed your available memory.
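
A minimal sketch of such a transaction function could look like the following. The namespace and function name are taken from the migration example in this document; the source attribute `:character/workplace` and the helper `rows->tx-data` are hypothetical and only serve as an illustration.

```clojure
(ns my-transactions.my-project
  (:require [datahike.api :as d]))

(defn rows->tx-data
  "Pure helper (hypothetical): turns [entity-id place] rows into
   transaction maps that set :character/place-of-occupation."
  [rows]
  (mapv (fn [[eid place]]
          {:db/id eid
           :character/place-of-occupation place})
        rows))

(defn my-first-tx-fn
  "Takes the connection and returns a vector of transaction data.
   Assumes an existing attribute :character/workplace (hypothetical)
   whose values should be copied to the new attribute."
  [conn]
  (rows->tx-data
   (d/q '[:find ?e ?place
          :where [?e :character/workplace ?place]]
        @conn)))
```

Keeping the data transformation in a pure helper makes it possible to test it without a database. The whole result is built in memory, so the note about very large migrations applies here as well.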

## Norms?
Like [conformity for Datomic](https://github.com/avescodes/conformity), we use the term
norm for our tool. You can use it to declare expectations about the state of your database
and enforce them idempotently without repeatedly transacting schema. These expectations
can be the shape of your schema, data in a certain format, or pre-transacted data, e.g. for
a development database.

## Migration folder
Preferably create a folder called `migrations` in your project resources. You can, however,
use any folder you like, even one outside your resources. If you don't want to package the
migrations into a jar, you can point the migration functions at a folder directly:
`update-checksums!` takes the path as a string, while `ensure-norms!` and `verify-checksums`
take a `java.io.File` or a resource URL. Your migration files are stored in this folder.
Be aware that all subfolders of your migration folder are searched for migrations as well.
Don't store any other files in your migration folder besides your migrations!
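
To preview which files a folder would contribute, something like the following sketch can help. It only approximates the file discovery described above (all `.edn` files in the folder and its subfolders, with `checksums.edn` left out) and is not the library's own implementation; `migration-file-names` is a hypothetical helper name.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

(defn migration-file-names
  "Lists the names of all .edn files below `folder`, including
   subfolders, leaving out checksums.edn. Illustration only."
  [folder]
  (->> (file-seq (io/file folder))                 ; walks subfolders too
       (filter #(.isFile ^java.io.File %))
       (map #(.getName ^java.io.File %))
       (filter #(string/ends-with? % ".edn"))
       (remove #(string/ends-with? % "checksums.edn"))
       sort
       vec))
```

Calling `(migration-file-names "resources/migrations")` before running a migration shows quickly whether a stray `.edn` file has ended up in the folder.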

## How to migrate

1. Create a folder of your choice; for now let's call it `migrations`. In this folder,
create a new file with an `.edn` extension, e.g. `001-my-first-norm.edn`. Preferably the
file name begins with a number. Please be aware that the name of your file will be the id of
your migration. Since you will probably create more migrations in the future, left-pad the
numbers with zeros to keep the sort order stable. Keep in mind that your migrations are
transacted one after another, sorted by their ids. Spaces in file names are replaced with
dashes to compose the id.

2. Write the transactions themselves into your newly created file. The content of the file
needs to be an edn map with one or both of the keys `:tx-data` and `:tx-fn`. `:tx-data` is
plain transaction data in the form of a vector. `:tx-fn` is a function that runs during the
migration, for example to migrate data from one attribute to another. It needs to be given
as a fully qualified symbol and be callable from the classpath. It is evaluated during the
migration and needs to return transaction data, which is transacted together with `:tx-data`
in one transaction.

Example of a migration:
```clojure
{:tx-data [{:db/doc "Place of occupation"
            :db/ident :character/place-of-occupation
            :db/valueType :db.type/string
            :db/cardinality :db.cardinality/one}]
 :tx-fn my-transactions.my-project/my-first-tx-fn}
```

3. When you are sufficiently confident that your migrations work, you usually want to store
them in some kind of version control system. To avoid conflicts with colleagues we
implemented a safety net. Run the function `update-checksums!` from the `datahike.norm.norm`
namespace to create or update a `checksums.edn` file inside your migration folder. This file
contains the names and checksums of your migration files. If a colleague has checked in a
migration that you were not aware of, your VCS should report a merge conflict on
`checksums.edn` instead of merging it silently.

4. To apply your migrations you most likely want to package them into a jar together with
Datahike and a piece of code that actually runs them, and then run that jar on a server.
First check the correctness of the checksums with `datahike.norm.norm/verify-checksums`,
then run `datahike.norm.norm/ensure-norms!` to apply your migrations. For each applied
migration a `:tx/norm` attribute with the id of the migration is stored, so it will not be
applied twice.
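
The steps above can be wired together roughly as follows. This is a sketch under assumptions: the namespace `my-project.migrate`, the function `migrate!` and the configuration map `cfg` are placeholders, and the migrations are assumed to be packaged as the resource folder `migrations`.

```clojure
(ns my-project.migrate
  (:require [clojure.java.io :as io]
            [datahike.api :as d]
            [datahike.norm.norm :as norm]))

(defn migrate!
  "Verifies the checksums and applies all pending migrations.
   `cfg` is your Datahike configuration map (placeholder)."
  [cfg]
  (let [conn       (d/connect cfg)
        migrations (io/resource "migrations")] ; URL of the packaged folder
    ;; Throws when the migration files no longer match checksums.edn.
    (norm/verify-checksums migrations)
    ;; Transacts every migration that is not yet marked with :tx/norm.
    (norm/ensure-norms! conn migrations)))

(comment
  ;; During development, before committing new migration files:
  (norm/update-checksums! "resources/migrations"))
```

Running the verification first means that a stale `checksums.edn` stops the process before anything is transacted.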
@@ -0,0 +1,245 @@
(ns datahike.norm.norm
  (:require
   [clojure.java.io :as io]
   [clojure.string :as string]
   [clojure.pprint :as pp]
   [clojure.data :as data]
   [clojure.edn :as edn]
   [clojure.spec.alpha :as s]
   [taoensso.timbre :as log]
   [hasch.core :as h]
   [hasch.platform :as hp]
   [datahike.api :as d]
   [datahike.tools :as dt])
  (:import
   [java.io File]
   [java.util.jar JarFile JarEntry]
   [java.net URL]))

(def checksums-file "checksums.edn")

(defn- attribute-installed? [conn attr]
  (some? (d/entity @conn [:db/ident attr])))

(defn- ensure-norm-attribute! [conn]
  (if-not (attribute-installed? conn :tx/norm)
    (:db-after (d/transact conn {:tx-data [{:db/ident :tx/norm
                                            :db/valueType :db.type/keyword
                                            :db/cardinality :db.cardinality/one}]}))
    @conn))

(defn- norm-installed? [db norm]
  (->> {:query '[:find (count ?t) .
                 :in $ ?tn
                 :where
                 [_ :tx/norm ?tn ?t]]
        :args [db norm]}
       d/q
       some?))

(defn- get-jar [resource]
  (-> (.getPath resource)
      (string/split #"!" 2)
      first
      (subs 5)
      JarFile.))

(defmulti ^:private retrieve-file-list
  (fn [file-or-resource] (type file-or-resource)))

(defmethod ^:private retrieve-file-list File [file]
  (if (.exists file)
    (let [migration-files (file-seq file)
          xf (comp
              (filter #(.isFile %))
              (filter #(string/ends-with? (.getPath %) ".edn")))]
      (into [] xf migration-files))
    (dt/raise (format "Norms folder %s does not exist." (str file)) {:folder file})))

(defmethod ^:private retrieve-file-list URL [resource]
  (if resource
    (let [abs-path (.getPath resource)
          last-path-segment (-> abs-path (string/split #"/") peek)]
      (if (string/starts-with? abs-path "file:")
        (->> (get-jar resource)
             .entries
             enumeration-seq
             (filter #(and (string/starts-with? (.getName %) last-path-segment)
                           (not (.isDirectory %))
                           (string/ends-with? % ".edn"))))
        (->> (file-seq (io/file abs-path))
             (filter #(not (.isDirectory %))))))
    (dt/raise "Resource does not exist." {:resource (str resource)})))

(defmethod ^:private retrieve-file-list :default [arg]
  (dt/raise "Can only read a File or a URL (resource)" {:arg arg :type (type arg)}))

(defn- filter-file-list [file-list]
  (filter #(and (string/ends-with? % ".edn")
                (not (string/ends-with? (.getName %) checksums-file)))
          file-list))

(defn filename->keyword [filename]
  (-> filename
      (string/replace #" " "-")
      (keyword)))

(defmulti ^:private read-edn-file
  (fn [file-or-entry _file-or-resource] (type file-or-entry)))

(defmethod ^:private read-edn-file File [f _file]
  (when (not (.exists f))
    (dt/raise "Failed reading file because it does not exist" {:filename (str f)}))
  [(-> (slurp f)
       edn/read-string)
   {:name (.getName f)
    :norm (filename->keyword (.getName f))}])

(defmethod ^:private read-edn-file JarEntry [entry resource]
  (when (nil? resource)
    (dt/raise "Failed reading resource because it does not exist" {:resource (str resource)}))
  (let [file-name (-> (.getName entry)
                      (string/split #"/")
                      peek)]
    [(-> (get-jar resource)
         (.getInputStream entry)
         slurp
         edn/read-string)
     {:name file-name
      :norm (filename->keyword file-name)}]))

(defmethod ^:private read-edn-file :default [t _]
  (dt/raise "Can not handle argument" {:type (type t) :arg t}))

(defn- read-norm-files [norm-list file-or-resource]
  (->> norm-list
       (map (fn [f]
              (let [[content metadata] (read-edn-file f file-or-resource)]
                (merge content metadata))))
       (sort-by :norm)))

(defn- compute-checksums [norm-files]
  (->> norm-files
       (reduce (fn [m {:keys [norm] :as content}]
                 (assoc m
                        norm
                        (-> (select-keys content [:tx-data :tx-fn])
                            h/edn-hash
                            hp/hash->str)))
               {})))

(s/def ::tx-data vector?)
(s/def ::tx-fn symbol?)
(s/def ::norm-map (s/keys :opt-un [::tx-data ::tx-fn]))
(defn- validate-norm [norm]
  (if (s/valid? ::norm-map norm)
    (log/debug "Norm validated" {:norm-map norm})
    (let [res (s/explain-data ::norm-map norm)]
      (dt/raise "Invalid norm" {:validation-error res}))))

(defn- neutral-fn [_] [])

(defn- transact-norms [conn norm-list]
  (let [db (ensure-norm-attribute! conn)]
    (log/info "Checking migrations ...")
    (doseq [{:keys [norm tx-data tx-fn]
             :as norm-map
             :or {tx-data []
                  tx-fn 'datahike.norm.norm/neutral-fn}}
            norm-list]
      (log/info "Checking migration" norm)
      (validate-norm norm-map)
      (when-not (norm-installed? db norm)
        (log/info "Running migration")
        (->> (d/transact conn {:tx-data (vec (concat [{:tx/norm norm}]
                                                     tx-data
                                                     ((var-get (requiring-resolve tx-fn)) conn)))})
             (log/info "Done"))))))

(defn- diff-checksums [checksums edn-content]
  (let [diff (data/diff checksums edn-content)]
    (when-not (every? nil? (butlast diff))
      (dt/raise "Deviation of the checksums found. Migration aborted." {:diff diff}))))

(defmulti verify-checksums
  (fn [file-or-resource] (type file-or-resource)))

(defmethod verify-checksums File [file]
  (let [norm-list (-> (retrieve-file-list file)
                      filter-file-list
                      (read-norm-files file))
        edn-content (-> (io/file (io/file file) checksums-file)
                        (read-edn-file file)
                        first)]
    (diff-checksums (compute-checksums norm-list)
                    edn-content)))

(defmethod verify-checksums URL [resource]
  (let [file-list (retrieve-file-list resource)
        norm-list (-> (filter-file-list file-list)
                      (read-norm-files resource))
        edn-content (-> (->> file-list
                             (filter #(-> (.getName %) (string/ends-with? checksums-file)))
                             first)
                        (read-edn-file resource)
                        first)]
    (diff-checksums (compute-checksums norm-list)
                    edn-content)))

(defmulti ^:private ensure-norms
  (fn [_conn file-or-resource] (type file-or-resource)))

(defmethod ^:private ensure-norms File [conn file]
  (let [norm-list (-> (retrieve-file-list file)
                      filter-file-list
                      (read-norm-files file))]
    (transact-norms conn norm-list)))

(defmethod ^:private ensure-norms URL [conn resource]
  (let [file-list (retrieve-file-list resource)
        norm-list (-> (filter-file-list file-list)
                      (read-norm-files resource))]
    (transact-norms conn norm-list)))

(defn ensure-norms!
  "Takes Datahike-connection and optionally a java.io.File object
   or java.net.URL to specify the location of your norms.
   Defaults to the resource `migrations`.
   Returns nil when successful and throws exception when not.
   Ensures your norms are present on your Datahike database.
   All the edn-files in this folder and its subfolders are
   considered migration-files aka norms and will be transacted
   ordered by their names into your database. All norms that
   are successfully transacted will have an attribute that
   marks them as migrated and they will not be applied twice."
  ([conn]
   (ensure-norms! conn (io/resource "migrations")))
  ([conn file-or-resource]
   (ensure-norms conn file-or-resource)))

(defn update-checksums!
  "Optionally takes a folder as string. Defaults to the
   folder `resources/migrations`.
   Returns nil when successful and throws exception when not.
   All the edn-files in the folder and its subfolders are
   considered migration-files aka norms. For each of
   these norms a checksum will be computed and written to
   the file `checksums.edn`. Each time this fn is run,
   the `checksums.edn` will be overwritten with the current
   values.
   This prevents inadvertent migrations of your database
   when used in conjunction with a VCS. A merge-conflict
   should be raised when trying to merge a checksums.edn
   with stale data."
  ([]
   (update-checksums! "resources/migrations"))
  ([^String norms-folder]
   (let [file (io/file norms-folder)]
     (-> (retrieve-file-list file)
         filter-file-list
         (read-norm-files file)
         compute-checksums
         (#(spit (io/file norms-folder checksums-file)
                 (with-out-str (pp/pprint %))))))))