Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid mobility issues #2465

Closed
wants to merge 1 commit into from
Closed

avoid mobility issues #2465

wants to merge 1 commit into from

Conversation

fgnievinski
Copy link
Contributor

@fgnievinski fgnievinski commented Sep 16, 2019

Copy link
Contributor

@israelcefrin israelcefrin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prevents Google bot to index and crawl HTML chunks generated by citation plugin for reference section in the article detail page. The "*" makes this solution works for single and multi-journal installation.

@asmecher
Copy link
Member

If I understand this correctly, the proposal prevents indexing of the citations entirely in order to avoid a layout warning in the Google analysis tools, correct? Is it not possible to adujst the presentation of the citations to satisfy the analyser's requirements, or to exclude that part of the page from analysis but not indexing? It feels like taking a sledgehammer to crack a nut.

@israelcefrin
Copy link
Contributor

The code which is being prevented from indexing is a piece like this:

<div class="csl-bib-body">
<div class="csl-entry">Stranack, K. (2018) “Editorial”, <i>OJS3 Testdrive Journal</i>, 
1(3). Available at: https://demo.publicknowledgeproject.org/ojs3/testdrive/index.php/testdrive-journal/article/view/710 
(Accessed: 18September2019).</div>
 </div>

Which is called from a URL like this one: https://demo.publicknowledgeproject.org/ojs3/testdrive/index.php/testdrive-journal/citationstylelanguage/get/harvard-cite-them-right?submissionId=710

This chunk is embedded into citation box:
image

As far I can see, this HTML is not accessed directly by human users, only by OJS citation fetching script and machines, i.e.: Googlebot.
We could add a whole HTML page structure presentation, i.e.: HTML | HEAD > META-TAG:VIEW-PORT > BODY > DIV ... to satisfy the Google analysis tool , however it will impact in adjusting the citation box fetching also.

@asmecher
Copy link
Member

@israelcefrin, it's a JSON exchange between the article page and the Citation Style Language plugin that fetches and presents the citation, not e.g. a HTML page load containing only the citation contents. So I'm pretty sure the Google tool is emulating a browser and correctly handling the JSON load within the article view page, rather than trying to present the citation format request stand-alone.

From the thread linked at the top, the two warnings still appearing on the webmaster tool are

  • Text too small to read
  • Viewport not set

Could those not be legitimate warnings?

@israelcefrin
Copy link
Contributor

israelcefrin commented Sep 18, 2019

@asmecher Google bot crawls the source-code of the article detail page.

In the article detail page we have the citation formats being called with a hyperlink (A) HREF (HTML) parameter and a data-load-citation data-json-href (JSON) parameter, the second has the return=json flag.

<ul class="citation_formats_styles">
<li>
<a aria-controls="citationOutput" 
href="https://demo.publicknowledgeproject.org/ojs3/testdrive/index.php/testdrive-journal/citationstylelanguage/get/acm-sig-proceedings?submissionId=710"
data-load-citation data-json-href="https://demo.publicknowledgeproject.org/ojs3/testdrive/index.php/testdrive-journal/citationstylelanguage/get/acm-sig-proceedings?submissionId=710&amp;return=json"
>ACM</a>
 </li>

The former is being indexed by Google Bot, which crawls every HREF value in a page, and ignores data-load-citation data-json-href. It is the first HREF which is assessed by Google tool and returns the issues.

I think that A HREF is a fallback in case that JSON doesn't load for any reason and the citation content is still accessible in such case.

I've submit only this value for a test and you can see that Text too small to read and Viewport not set are real issues in a small screen.

https://search.google.com/test/mobile-friendly?id=d9T6yCbU_2crKT12rZZe7g

OTOH we could also adjust the output code in the plugin to add a rel="nofollow" to the A HREF, however Google itself recommends to use robots.txt[1] for internal links and this parameter only for external links.
[1] https://support.google.com/webmasters/answer/96569

@asmecher
Copy link
Member

Ah, yes, I see that now. Mind if I punt this over to @NateWr? The robots.txt approach still feels too heavy-handed to me, but whether a no-follow would be better (or a changed approach to that drop-down) is too front-end-best-practicesy for me.

@israelcefrin
Copy link
Contributor

Mind if I punt this over to @NateWr?

Not at all, please feel free to hop in @NateWr.
I hadn't tagged him because I thought he was away this week.

On that note, just a screenshot showing how Google bot sees the Citation box when crawling it (I've disable CSS, it is like a screenreader sees it also):
image

@NateWr
Copy link
Contributor

NateWr commented Sep 23, 2019

I agree that intervening at the robots.txt level is not a good solution here. Automated tools are always going to throw up false positives and I would consider this to be bordering on a false positive (ie - technically correct but probably not relevant).

In this case, the URL that does not return JSON is intended to act as a fallback for browsers with JavaScript disabled.

The appropriate solution is either to get rid of the raw HTML fallback or to improve it so that it's valid HTML. Two options:

  1. Convert the <a> elements to <button> elements and require JavaScript. Semantically this is more correct and would also more closely align with accessibility recommendations as I understand them. This would involve removing the non-JSON return option in CitationStyleLanguageHandler.

  2. Update the HTML output to return valid doctype, <html>/<body> wrappers, and CSS code to break long links on mobile devices.

Option 1 is probably want we want to be moving towards. But Option 2 would preserve compatibility with existing theme setups.

I'd recommend Option 2 for now, with a plan to use Option 1 in all future themes.

cc @sssoz and @Vitaliy-1 regarding future themes.

@fgnievinski
Copy link
Contributor Author

maybe this should PR be converted to an issue, to reach a broader consensus?

@fgnievinski
Copy link
Contributor Author

OTOH we could also adjust the output code in the plugin to add a rel="nofollow" to the A HREF, however Google itself recommends to use robots.txt[1] for internal links and this parameter only for external links.
[1] https://support.google.com/webmasters/answer/96569

a third related alternative is to use the noindex directive:

<meta name="robots" content="noindex">

https://developers.google.com/search/docs/crawling-indexing/block-indexing

@fgnievinski
Copy link
Contributor Author

related to #7183:

with several citation styles enabled, google search console reports thousands of "pages" (citation snippets) affected.

@kaitlinnewson
Copy link
Member

Related issue with linked PRs: pkp/pkp-lib#10474

Going to close this PR in favour of the "nofollow" approach for now, but feel free to re-open or add comments in the issue if further discussion is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants