What form should that database (or repository) be for storing experimental data? #1

mrshirts · 2014-04-02T03:08:00Z

No description provided.

leeping · 2014-04-02T04:06:20Z

Hi Michael,

I think that this repository could either be highly flexible and contain information as we find it (e.g. mixed-format tables and papers from the literature), or it could be formal and curated, or it could contain both (i.e. "raw data" and "curated data" folders).

Yesterday we had a discussion about how best to store experimental data for a ForceBalance calculation, but I am not sure whether that format is suitable for a repository: leeping/forcebalance#59

John suggested using one of the existing standards for experimental data formats like IASTAB. I don't think it's the best choice for ForceBalance because it could overcomplicate simple jobs - plus it would take me too long to write a parser that fully conforms to the standard - but it might be a good solution for more "long term" data storage.

davidlmobley · 2014-04-11T21:40:57Z

So, in general I don't think files exported by Excel are a good choice
for a flexible data format. Updating them by script (for example) could be
nontrivial, since one would then have to ensure the output is identical to
what would have been obtained by exporting from Excel.

Also, while delimited text files can be helpful in some cases, they can be
particularly problematic in others. For example, IUPAC names can contain
BOTH spaces AND commas. In a space-delimited file the spaces obviously
cause problems, and in a comma-delimited file the commas do. When I use a
delimited file for chemical information, I typically end up having to use
an alternate delimiter (currently I'm using ';'), which is not particularly
Excel-friendly. If the data contains URLs (which it would if it links to
papers), special characters in those URLs could cause similar trouble.
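To make the delimiter problem concrete, here is a minimal Python sketch. The records, placeholder values, and example.org URLs are purely illustrative, not data from this repository. It joins each record on a delimiter and then naively splits it back, as a hand-rolled parser might; commas break the first name (and its URL), spaces break the second name, and ';' happens to survive both, though nothing guarantees a future field won't contain ';' too.

```python
# Hypothetical records: (IUPAC name, placeholder value, placeholder source URL).
# "2,2-dimethylpropan-1-ol" contains commas; "ethyl 2-methylprop-2-enoate" contains spaces.
records = [
    ("2,2-dimethylpropan-1-ol", "1.23", "https://example.org/data?ids=1,2"),
    ("ethyl 2-methylprop-2-enoate", "4.56", "https://example.org/data?ids=3"),
]


def delimiter_survives(records, delimiter):
    """True if joining each record on `delimiter` and naively splitting it back
    recovers exactly the original fields, i.e. no field contains the delimiter."""
    return all(
        tuple(delimiter.join(rec).split(delimiter)) == rec for rec in records
    )


for d in [",", " ", ";"]:
    status = "round-trips" if delimiter_survives(records, d) else "breaks"
    print(f"{d!r}: {status}")
# ',': breaks
# ' ': breaks
# ';': round-trips
```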

I think a better solution would be some type of XML or XML-like format. I
propose not reinventing the wheel: instead, see what Python libraries are
available (probably XML libraries) and just adopt a format that works with
them. We should plan on writing one tool that updates the library and
another that dumps the library into a human-readable format for easy
perusal. That dump could use tabs to delimit data, and since it would never
need to be parsed back into any other format, there would be no risk of a
parser misreading the delimiters.
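A rough sketch of the "update tool plus human-readable dump" idea, using Python's standard-library `xml.etree.ElementTree`. The `<dataset>`/`<measurement>` schema, the field names, and the helper functions are hypothetical choices for illustration, not an agreed format, and the values are placeholders rather than real measurements. The XML library takes care of escaping, so commas, spaces, and `&` in names or URLs round-trip without any custom delimiter handling; the tab-delimited dump is for reading only.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: a <dataset> root holding <measurement> elements whose
# child elements are the fields below. Names are illustrative only.
FIELDS = ["name", "property", "value", "units", "source"]


def add_measurement(root, **fields):
    """'Update' step: append one measurement. ElementTree escapes special
    characters (e.g. '&', '<') on output, so field contents need no quoting."""
    m = ET.SubElement(root, "measurement")
    for key in FIELDS:
        ET.SubElement(m, key).text = str(fields.get(key, ""))
    return m


def dump_tab_delimited(root):
    """'Dump' step: render the library as tab-delimited text for human
    perusal. This output is never parsed back in."""
    lines = ["\t".join(FIELDS)]
    for m in root.iter("measurement"):
        lines.append("\t".join(m.findtext(key, default="") for key in FIELDS))
    return "\n".join(lines)


root = ET.Element("dataset")
add_measurement(root, name="2,2-dimethylpropan-1-ol", property="density",
                value=1.23, units="g/mL",
                source="https://example.org/data?ids=1,2&x=y")
add_measurement(root, name="ethyl 2-methylprop-2-enoate", property="density",
                value=4.56, units="g/mL", source="https://example.org/data?ids=3")

# Serialize and re-parse to show the data survives a round trip.
# (ET.ElementTree(root).write(path) and ET.parse(path) do the same with files.)
reparsed = ET.fromstring(ET.tostring(root, encoding="unicode"))
print(dump_tab_delimited(reparsed))
```

Keeping the dump one-way is what lets it use a simple delimiter safely: the XML file stays the single source of truth, and only the update tool writes to it.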

David

David Mobley
Assistant Professor
Department of Pharmaceutical Sciences
Department of Chemistry
3134B Natural Sciences I
University of California, Irvine
Irvine, CA 92697
[email protected]
work (949) 824-6383
cell (949) 385-2436

mrshirts added the question label Apr 2, 2014
