Some musings about the merger.
Or more precisely: when do we need to update and merge?
Merging records is the most complex and error-prone action we have.
Don't do it unless necessary.
Process:
harvest: OAI from arXiv, feed from publisher - leads to original.V1
conversion to inspire-json - leads to basic_record.V1
enrichment - adds enrichment_record.V1 to basic_record.V1 (visible in HoldingPen)
harvest of updated version - leads to original.V2
conversion to inspire-json - leads to basic_record.V2
Here we could compare basic_record.V2 and basic_record.V1.
No change -> end workflow
If the pdf changed (can we see that?), replace only the fulltext and re-run refextract
enrichment
auto-merge incl. info from BAI tables - leads to (partially-)merged_record.V2 (visible in HoldingPen)
It is difficult (for me, impossible) to tell which information really came from arXiv and which should be updated. I'm not sure this is the best procedure.
To determine whether an update and merge are necessary, I would base the comparison on the converted INSPIRE JSON record, not on the original harvest. Publisher metadata can be very rich and might change in a place we don't use. In addition, the structure might change. The conversion to JSON is a good filter against such problems.
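A minimal sketch of that decision, assuming both inputs are the `metadata` part of a converted record as plain Python dicts. The function names and the list of "volatile" fields are my assumptions, not existing workflow code. If the `_files` checksums are already available at this point in the workflow, they might also be one way to see whether the PDF changed.

```python
# Assumption: these top-level fields change on every harvest and carry
# no bibliographic content, so they are left out of the comparison.
VOLATILE_FIELDS = ("acquisition_source", "_files", "documents")


def pdf_checksums(record):
    """Return {key: checksum} for the PDF entries in the record's `_files`."""
    return {
        f.get("key"): f.get("checksum")
        for f in record.get("_files", [])
        if f.get("key", "").endswith(".pdf")
    }


def bibliographic_part(record):
    """Return the record without the volatile fields."""
    return {k: v for k, v in record.items() if k not in VOLATILE_FIELDS}


def decide_update_action(basic_v1, basic_v2):
    """Return 'merge', 'refresh_fulltext' or 'skip' (hypothetical labels)."""
    if bibliographic_part(basic_v1) != bibliographic_part(basic_v2):
        return "merge"
    if pdf_checksums(basic_v1) != pdf_checksums(basic_v2):
        return "refresh_fulltext"
    return "skip"
```

With this ordering the merger only runs when a field we actually keep has changed; a new PDF alone would only trigger the fulltext replacement and refextract.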
{
  "id": 1119139,
  "metadata": {
    "$schema": "https://labs.inspirehep.net/schemas/records/hep.json",
    "_collections": [
      "Literature"
    ],
    "_files": [
      {
        "bucket": "7a52c6cf-2889-4233-8fb6-4fdfccf87f53",
        "checksum": "md5:a4c818b1694a6a502a0a2f21674ca92e",
        "key": "1807.02123.tar.gz",
        "size": 66761,
        "version_id": "d8afbc29-0514-43a5-9263-2adef7b8d371"
      },
      {
        "bucket": "7a52c6cf-2889-4233-8fb6-4fdfccf87f53",
        "checksum": "md5:005bb51602500a9a0b66c925205e2afd",
        "key": "1807.02123.pdf",
        "size": 916611,
        "version_id": "ba0ccc5d-2c9e-42fd-9dd5-3ecb47bb412a"
      }
    ],
    "abstracts": [
      {
        "source": "arXiv",
        "value": "Gravity theories beyond general relativity (GR) can change the properties of gravitational waves: their polarizations, dispersion, speed, and, importantly, energy content are all heavily theory- dependent. All these corrections can potentially be probed by measuring the stochastic gravitational- wave background. However, most existing treatments of this background beyond GR overlook modifications to the energy carried by gravitational waves, or rely on GR assumptions that are invalid in other theories. This may lead to mistranslation between the observable cross-correlation of detector outputs and gravitational-wave energy density, and thus to errors when deriving observational constraints on theories. In this article, we lay out a generic formalism for stochastic gravitational- wave searches, applicable to a large family of theories beyond GR. We explicitly state the (often tacit) assumptions that go into these searches, evaluating their generic applicability, or lack thereof. Examples of problematic assumptions are: statistical independence of linear polarization amplitudes; which polarizations satisfy equipartition; and which polarizations have well-defined phase velocities. We also show how to correctly infer the value of the stochastic energy density in the context of any given theory. We demonstrate with specific theories in which some of the traditional assumptions break down: Chern-Simons gravity, scalar-tensor theory, and Fierz-Pauli massive gravity. In each theory, we show how to properly include the beyond-GR corrections, and how to interpret observational results."
      }
    ],
    "acquisition_source": {
      "datetime": "2018-07-10T03:36:36.182790",
      "method": "hepcrawl",
      "source": "arXiv",
      "submission_number": "1117913"
    },
    "arxiv_eprints": [
      {
        "categories": [
          "gr-qc",
          "astro-ph.CO",
          "astro-ph.HE",
          "hep-th"
        ],
        "value": "1807.02123"
      }
    ],
    "authors": [
      {
        "full_name": "Isi, Maximiliano"
      },
      {
        "full_name": "Stein, Leo C."
      }
    ],
    "documents": [
      {
        "fulltext": true,
        "hidden": true,
        "key": "1807.02123.pdf",
        "material": "preprint",
        "original_url": "http://export.arxiv.org/pdf/1807.02123",
        "source": "arxiv",
        "url": "/api/files/7a52c6cf-2889-4233-8fb6-4fdfccf87f53/1807.02123.pdf"
      }
    ],
    "license": [
      {
        "license": "arXiv nonexclusive-distrib 1.0",
        "material": "preprint",
        "url": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/"
      },
      {
        "license": "arXiv nonexclusive-distrib 1.0",
        "url": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/"
      }
    ],
    "preprint_date": "2018-07-05",
    "public_notes": [
      {
        "source": "arXiv",
        "value": "18 pages (plus appendices), 1 figure"
      }
    ],
    "report_numbers": missing due to problem with merger
    "titles": [
      {
        "source": "arXiv",
        "title": "Measuring stochastic gravitational-wave energy beyond general relativity"
      }
    ]
  }
}
merged_record.V2
I assume this is additional information from enrichment and automerge.
Difficult to say since the steps in between are not accessible to me.
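One partial way to see where information claims to come from is the `source` key that many entries carry (abstracts, titles, public_notes and documents in the record above). A rough sketch, assuming the record's `metadata` is available as a plain dict; `fields_by_source` is a hypothetical helper, and it only looks at top-level fields, so it cannot tell which workflow step wrote an entry.

```python
from collections import defaultdict


def fields_by_source(metadata):
    """Map each `source` value to the top-level fields that contain it."""
    found = defaultdict(set)
    for field, value in metadata.items():
        # Treat a single dict (e.g. acquisition_source) like a one-item list.
        entries = value if isinstance(value, list) else [value]
        for entry in entries:
            if isinstance(entry, dict) and "source" in entry:
                found[entry["source"]].add(field)
    return {source: sorted(fields) for source, fields in found.items()}
```

Note that the spelling of the source is not uniform in the record above ("arXiv" in most fields, "arxiv" in `documents`), so the two would show up as separate groups unless normalized first.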
Example: arXiv.1807.02123
At arXiv:
Current metadata
[v1] Thu, 5 Jul 2018 18:00:12 GMT (65kb,D)
INSPIRE 1st harvest
HP 1117913
Can anyone find out which arXiv metadata were harvested?
basic_record.V1
record after conversion to json
enrichment_record.V1
Information added during the workflow
Update
HP 1119139
basic_record.V2
looks very much the same as basic_record.V1
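To turn "looks very much the same" into a concrete list, the two records could be compared field by field. A small sketch, assuming both are plain dicts; `diff_top_level` is a hypothetical helper and reports only which top-level fields differ, not what changed inside them.

```python
def diff_top_level(old, new):
    """List the top-level fields that were added, removed or changed."""
    old_keys, new_keys = set(old), set(new)
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
        "changed": sorted(k for k in old_keys & new_keys if old[k] != new[k]),
    }
```

The same helper could be pointed at a basic record and its merged counterpart to list what enrichment and auto-merge added, for instance whether `report_numbers` went missing.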