Make EM the generic translator. Addresses #1092 #1434
base: master
My main concern here is duplicating the page scan for DOI. It's probably quite fast, but we obviously want to be very careful with translators that run on every page load, so we should at least do some profiling to see how many milliseconds it would be adding. But I agree that we can merge the |
I have tested it for functionality and it works! Otherwise, yes, there are somewhat significant, but expected differences between EM and the current code we have. |
21dbb16 to 72f5ceb
Okay, so I've pushed just the change that allows EM to save even when it doesn't detect any significant tags. Good to review and merge. |
See the scan code. It is an XPath query and then running a regexp for each returned result. The load time we're adding for each page is proportional to the length of the page, at O(n log n). On this webpage (which is one of the testcases for DOI |
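For anyone who wants to profile this, here is a minimal sketch of the scan shape described above: one regexp pass over each string returned by a query, timed over repeated runs. It assumes the XPath results are already available as plain strings. `scanForDOIs`, `timeScan`, and the simplified `DOI_RE` pattern are hypothetical names for illustration, not the translator's actual code.

```javascript
// Simplified DOI pattern (an assumption; the real translator may use a different one).
const DOI_RE = /\b10\.[0-9]{4,}\/[^\s&"']*[^\s&"'.,]/g;

// `textNodes` stands in for the strings an XPath text-node query would return.
function scanForDOIs(textNodes) {
  const found = new Set();
  for (const text of textNodes) {
    // One regexp pass per returned node.
    const matches = text.match(DOI_RE);
    if (matches) {
      for (const m of matches) found.add(m);
    }
  }
  // Unique DOIs, in order of first appearance.
  return [...found];
}

// Average wall-clock milliseconds per scan over `runs` repetitions.
function timeScan(textNodes, runs = 100) {
  const start = Date.now();
  for (let i = 0; i < runs; i++) scanForDOIs(textNodes);
  return (Date.now() - start) / runs;
}
```

Running `timeScan` on the text of a few long pages would give a rough per-page cost in milliseconds, which is the number asked for earlier in the thread.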
@adam3smith please take a look at this when you have time |
Should I be seeing this?
Because that doesn't appear to be the case. e.g. still getting the grey Webpage icon on |
No, we took that out for the moment. The only thing necessary for this version is to make sure we're OK applying the EM low-quality save to all webpages, in place of the current special-case webpage code (which just saves document.title, URL, and access date), noting that these would include pages where no effort was made to embed metadata. I haven't looked at the whole function, but I would say, for example, that we don't want to assign the hostname as Library Catalog for all webpages (or at all for EM saves). And I'm not sure if including the |
The tags will be emptied if EM is not called from another translator, see https://github.com/adomasven/translators/blob/21dbb16f0f9e6a54ef5109a8fa58f75a3ba8a0c6/Embedded%20Metadata.js#L821-L823 |
We should probably do the same thing for Otherwise the function adds:
Creators and abstract note quality will vary wildly. I'm not sure if even the worst possible abstract is worse than no abstract. A bad creator, on the other hand, is worse than no creator when citing. So maybe add creators only for saves from another translator too? (Although if creators come from Highwire metadata, then they're okay to include. This is specifically applicable to low-quality fields.)
We can look around, but I think we want to keep the author meta tag — I’m sure it’s often wrong, but it’s a legit field that, unlike |
72f5ceb to e7ee6c3
I've updated this to exclude libraryCatalog for non-child-EM-translator saves. There will be a follow up after we merge some changes into the connector. Awaiting approval. |
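To make the intended behavior concrete, here is a rough sketch of the rule discussed in this thread: tags are emptied and libraryCatalog is left out unless EM was called from another translator. The function name, its parameters, and the hostname default for child saves are assumptions for illustration, not the code in this PR.

```javascript
// Hypothetical helper. `calledFromTranslator` is true when another translator invoked EM.
function finalizeItem(item, hostname, calledFromTranslator) {
  // Shallow copy so the caller's item is not mutated.
  const result = { ...item };
  if (calledFromTranslator) {
    // Assumption: child saves keep the hostname-based Library Catalog.
    result.libraryCatalog = result.libraryCatalog || hostname;
  } else {
    // Generic webpage save: drop the low-quality fields.
    result.tags = [];
    delete result.libraryCatalog;
  }
  return result;
}
```

The same switch could gate creators, if we decide they should only be kept for saves from another translator.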
I'm sorry I'm dense here (and by all means feel free to merge this if it's blocking other things). But if you want me to look at it more closely, could someone give me a page where to see the changed behavior at work? |
Ugh, sorry. The main change was supposed to be this, although it seems that I'll push the initial changes to this PR and we can start over, although we need to merge the connector code first, so it will have to wait. Once again, sorry to bother you. |
no worries, thanks for clarifying. |
As per this comment:
The changes in the connector are coming soon, but we should get this EM translator into circulation early. The consequence of this update is that the connector will detect the EM translator on all pages that previously displayed only webpage saving options.
Alternatively, we should at least merge the part in `importRDF()` that will allow the EM translator to save even when `init()` doesn't detect a viable item type, so that when we push out the updated Connector and it tries to save with EM, translation doesn't break.

Awaiting comments from @dstillman regarding connector-related stuff.
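As a rough illustration of that fallback (not the actual `importRDF()` change), the sketch below saves a bare webpage item when no item type was detected, using the same three fields the current special-case webpage code saves: title, URL, and access date. `buildFallbackItem` and its parameters are hypothetical.

```javascript
// Hypothetical sketch. `detectedType` is whatever type detection produced, or null.
// `page` is assumed to carry the document title and URL.
function buildFallbackItem(detectedType, page, now = new Date()) {
  return {
    // Fall back to a plain webpage when nothing viable was detected.
    itemType: detectedType || "webpage",
    title: page.title,
    url: page.url,
    accessDate: now.toISOString(),
  };
}
```

With this shape, a save attempt never fails just because detection came back empty; the worst case is a plain webpage item.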