Database for storing run status #102
Check this piece in the

More notes: try to make this not mandatory, i.e. TACA will try to upload run status to a backend database only if such a backend is defined in the configuration file:

```yaml
db_backend: couchdb
db_credentials:
  user:
  password:
  url:
  port:
```

If no backend is defined, TACA shouldn't crash, but only log a WARNING.
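A minimal sketch of the "don't crash, only warn" behavior described above, assuming the configuration has already been parsed into a dict (the function name `load_db_backend` is hypothetical, not existing TACA code):

```python
import logging

logger = logging.getLogger(__name__)

def load_db_backend(config):
    """Return (backend_name, credentials) from the TACA config,
    or None if no backend is defined -- log a WARNING instead of crashing."""
    backend = config.get("db_backend")
    if not backend:
        logger.warning("No db_backend defined in configuration; "
                       "run status will not be uploaded")
        return None
    credentials = config.get("db_credentials", {})
    # Warn about incomplete credentials, but still let the caller try
    missing = [k for k in ("user", "password", "url", "port")
               if k not in credentials]
    if missing:
        logger.warning("db_credentials missing keys: %s", ", ".join(missing))
    return backend, credentials
```

With this, callers can simply skip the status upload when the function returns `None`.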
Some comments:
EDIT: Is it possible to make this sub-database general enough that both NGI and Clinical could make use of it? I'll try to think about what we would need to store to make it work for us.
Thanks for the comments @robinandeer !
I guess it's a trade-off between CLI cleanness and abstraction, isn't it? For example, if we decide to do this through entry points, we still have to implement the different backends; it's just that we also have to implement the CLI part, so it would be something like You may argue that we are doing precisely that for archiving, i.e.
I don't see how this could affect the idea behind this issue. What we want (at least for now) is a very simple status, i.e.
We thought about that as well; the problem is that we don't want a local database, because what we want is something that helps the NASs and the processing machines communicate. For example, remove a run in
That was the idea, so it's good you're in for that ^^! We could define a simple API on the First set of methods proposed for the API:
Thanks again! Let's keep discussing this ^^
This is not necessary or else I'm misunderstanding you perhaps 😅 - You could still deduce the backend from a YAML file! But this isn't such a big deal I guess. If you're interested you can just ask and I'll explain it further
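One way to "deduce the backend from a YAML file", as suggested here, is a simple registry keyed by the `db_backend` value from the parsed config. This is a hedged sketch: the registry and class names below are hypothetical, not existing TACA or CG code:

```python
# Hypothetical backend classes; real ones would open actual connections.
class CouchDBBackend:
    def __init__(self, **credentials):
        self.credentials = credentials

class MySQLBackend:
    def __init__(self, **credentials):
        self.credentials = credentials

# Registry mapping "db_backend" values (from the YAML config) to classes.
BACKENDS = {"couchdb": CouchDBBackend, "mysql": MySQLBackend}

def backend_from_config(config):
    """Instantiate the backend named in the already-parsed YAML config."""
    name = config.get("db_backend")
    if name not in BACKENDS:
        raise KeyError("Unknown or missing db_backend: %r" % name)
    return BACKENDS[name](**config.get("db_credentials", {}))
```

No CLI changes or extra entry points are needed: adding a backend means adding one class and one registry entry.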
I see. So we are thinking that we want to back up on a per-sample level from now on. We will generate FASTQ files and just get rid of BCLs in the near future. I guess my question was whether the system will be flexible enough to handle this sort of thing?
Awesome! I will clue in busy, busy @ingkebil to give his opinion on the practical aspects 😄
@robinandeer and I had a discussion yesterday about this. We tried to reach a "consensus" on a design that would work both for NGI and Clinical Genomics (CG). Here is roughly what we talked about:

**API design**

This shouldn't matter for the calls that we already proposed, which are (modified to fit the new design):

Where entity would be run/flowcell for us, and sample for CG. @robinandeer, @vezzi, @senthil10: any API call that you can immediately think of? We can always add more later.

**Database design**

@vezzi do we (you..) plan to replace the

**Code design**

```python
class TACADB():
    """ Base class for TACA database.

    Takes care of reading credentials from configuration and instantiating
    the correct backend.
    """
    def __init__(self):
        # Read config, detect backend
        try:
            instantiate_backend(backend, config)
        except WhateverError:
            logger.error('Could not load backend database, not updating run status')

    def get_latest_event(self, entity):
        ...


class CouchDBBackend(TACADB):
    """CouchDB backend for TACA."""
    def __init__(self, **config):
        # 1. Create connection with database
        # 2. Check "schema" of database
        # 3. Implement API calls
        ...
```

The idea behind this is that it should be fairly easy to add backends to TACA; we in NGI can develop the one for CouchDB, and CG can develop one for... is it MySQL? I would like others' opinions before moving on! Otherwise we'll have fun on our own ^^
In my opinion, the "date" entry for an event should always have high resolution, at least to the second, and it doesn't cost much to also store down to the millisecond. For debugging purposes and potential future analytics, high-resolution temporal data is required; just having the date is not good enough. To reflect this, call the field "timestamp". Also, to avoid complications with daylight savings and timezones, the timestamp should always be in UTC, and be stored explicitly as such, to avoid future confusion. E.g. "2015-04-15T14:11:54.725Z"
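One way to produce exactly that millisecond-resolution UTC format in Python (a sketch; the helper name is made up):

```python
from datetime import datetime, timezone

def utc_timestamp():
    """UTC timestamp with millisecond resolution and an explicit 'Z'
    suffix, e.g. '2015-04-15T14:11:54.725Z'."""
    now = datetime.now(timezone.utc)
    # strftime has no millisecond directive, so truncate microseconds
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % (now.microsecond // 1000)
```

Storing this string sorts chronologically as a plain text key, which is convenient for CouchDB views.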
**Database design**

**API design**

I've made some new suggestions for what the methods should be named, which anyone can comment on! I agree with @pekrau about the dates, but I guess we hadn't gotten to the details yet - super!

EDIT: changed link to point to actual plugin module
@pekrau absolutely, that screenshot is only a manually written database entry; totally agree on the date format, ISO format #FTW. @robinandeer excellent!
+1 for @pekrau's suggestion, and I also have a few questions :) Since this
@senthil10 We can discuss the implementation, but I don't want to start adding dependencies if they're not 100% needed. |
@guillermo-carrasco and @robinandeer I really like it. The API calls seem OK to me; once we start to implement them, it will be natural to find new ones. About replacing the FC db: the plan with @Galithil, for now, is to check how we can add HiSeqX FCs to the flowcellDB, or whether it is better to create a new DB. I don't see a real need anyway to move the old FC DB here; it would contain the same data as now plus status info. On the other hand, the risk is that we end up using the FC name as a key to access the FC db, creating an external key, which is exactly how a non-relational DB should not be used. Anyhow, the discussion on the FC database needs to be put on hold for a while; we first need to understand what will happen with HiSeqX FCs.
Would be nice to have a small and simple database just to save the status of the runs, i.e. `SEQUENCING`, `ARCHIVING`, `ARCHIVED`, etc. The idea is to implement it in such a way that the database backend is abstract/pluggable. Basically, define an API (in the `Run` class probably) with 2 main functions: `get_run_status()` and `set_run_status(status)`.

@vezzi you can use this issue to discuss implementation and/or define statuses.
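A minimal sketch of the two proposed calls on a `Run` class, assuming an optional backend object and an illustrative status set (none of this is existing TACA code):

```python
class Run:
    """Sketch: a Run gets and sets its status through whatever database
    backend is configured, falling back to local state when there is none."""
    STATUSES = ("SEQUENCING", "ARCHIVING", "ARCHIVED")

    def __init__(self, run_id, backend=None):
        self.run_id = run_id
        self._backend = backend   # hypothetical object with get()/set()
        self._status = None       # local fallback when no backend is set

    def get_run_status(self):
        if self._backend is not None:
            return self._backend.get(self.run_id)
        return self._status

    def set_run_status(self, status):
        if status not in self.STATUSES:
            raise ValueError("Unknown status: %r" % status)
        if self._backend is not None:
            self._backend.set(self.run_id, status)
        else:
            self._status = status
```

The point is only that callers depend on the two-method API, not on any particular backend.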