The results of a Google search for "oniguruma" are crazy! (in Japan) #234

Open
kkos opened this issue Apr 30, 2021 · 39 comments

@kkos
Owner

kkos commented Apr 30, 2021

If you do a Google search for the keyword "oniguruma" you'll see some very strange results. The first few links that appear on the first page are related to the keyword oniguruma, but the rest of the pages are mostly made up of completely unrelated links. I noticed this last August. However, this may be the case only in Japan. I don't know what is going on in other parts of the world.

The rest of this article is written below.
https://kkos.fc2.net/blog-entry-1.html

@kkos kkos added the question label May 27, 2021
@kkos kkos pinned this issue May 27, 2021
@kkos
Owner Author

kkos commented May 27, 2021

The attack on Google search is still going on.
I still don't know what it looks like outside of Japan.
After I opened this issue, the behavior changed slightly, so I think the culprit is watching this page.

@dbqpdb

dbqpdb commented Jun 4, 2021

From San Francisco:
Screen Shot 2021-06-03 at 19 23 22

@kkos
Owner Author

kkos commented Jun 4, 2021

Thank you.
For the first time, I was able to learn how things look outside of Japan.
At least there doesn't seem to be anything weird on the first page.
In my environment, a dozen or so pages of mostly irrelevant stuff are displayed.

@ruigazio

ruigazio commented Jun 4, 2021

Portugal. Page count is under 100k instead of 1.14M
oniguruma

@kkos
Owner Author

kkos commented Jun 5, 2021

The number of search results in Japan is close to that.
It seems there is little impact outside of Japan.

@andreseduardop

This is how it looks today in Colombia.

oniguruma - Google Search_Página_1

@kkos
Owner Author

kkos commented Jun 8, 2021

Most of the unrelated pages I see here are in Japanese, so it seems to be fine for non-Japanese areas.
If you don't see any unfamiliar or unusual characters (Japanese characters: kanji, hiragana, etc.) within the first few pages, you should be fine.
This may be due to the fact that the culprits are in Japan, where they are mechanically manipulating clicks to increase their rankings.

@SergioInToronto

In Canada the results on Google and DuckDuckGo are similar to those posted above. Looks fine to me.
image

@Gerst20051

The search results also look fine for me in Los Angeles, CA 👍

@SamuelMarks

Works fine in Sydney, Australia

@kanevbg

kanevbg commented Aug 9, 2021

From Bulgaria:
image

Repository owner deleted a comment from shenlebantongying Aug 12, 2021
@kkos
Owner Author

kkos commented Sep 12, 2021

I don't think so.
Even here, the first six or seven results on the first page are relevant. But after that, the results are mostly irrelevant pages for more than ten pages.
In other words, most of what comes up in a search is irrelevant links.
In your image, "すること。7 記載容量6は、営業外収益の「その他」" and "70 花粉発生源対策推進事業" probably have nothing to do with Oniguruma.

@HeveraletLaidCenx

HeveraletLaidCenx commented Sep 23, 2021

image

Hi there~ Here's the result for me; it seems fine? I'm from China and use a global network _(:3

Have you tried changing Google's search settings? There are some settings for the region and languages of the search results ...

By the way, I used Singapore as the region setting (for reasons I'd rather not get into), and I set the result languages to 简体中文, 繁體中文, English, and 日本語.

@kkos
Owner Author

kkos commented Sep 23, 2021

Thank you.
I am convinced that the Japanese search results are abnormal and that the non-Japanese search results are normal.
I just checked the contents of the two links in the previous example by @mmizutani.

Neither of them contains the strings "Oniguruma" or "鬼車", and neither of them has anything to do with Oniguruma.
Moreover, this is the result of the first page, and the next pages are full of irrelevant links.
Although @mmizutani hasn't produced a second page, I'm convinced of that from my own results.
I have no idea about the impact of where you search.

@HeveraletLaidCenx

Confirmed. After I changed the region to Japan, the search results showed these completely unrelated items... I'm trying to find the reason.

@HeveraletLaidCenx

image

It seems that changing the search option to 完全一致 (exact match) under Tools helps. Also note that most of those results are PDF documents.

@HeveraletLaidCenx

I have a guess about it... Is it possible that Google converted all that content into romanization and then split it up for matching, which led to this?

@kkos
Owner Author

kkos commented Sep 23, 2021

You're right, most of the irrelevant links are PDFs.
But not all of them, maybe 60%.
When I set it to exact match, the irrelevant links disappeared.
That doesn't mean that the cause isn't an attack.

@kkos kkos unpinned this issue Nov 5, 2021
@kocoten1992

It's fine from Vietnam.

image

@kkos kkos changed the title from 'The results of a Google search for "oniguruma" are crazy!' to 'The results of a Google search for "oniguruma" are crazy! (in Japan)' Mar 19, 2022
@tonco-miyazawa

tonco-miyazawa commented Apr 6, 2022

Zip file of screenshots ( canada, france, indonesia, Taiwan ) 9.47MB
https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/234ver3.zip

@kkos
Owner Author

kkos commented Apr 9, 2022

I looked at your search results.
I used to think that the results depended only on the language, but now I know that they depend on both the language and the location.
In other words, the results are terrible when you search in Japan specifying Japanese, and not so terrible otherwise.
However, your search also showed that the effects of this attack are not entirely absent outside of Japan.
Some examples are shown below.
These have nothing to do with Oniguruma.
And these are links that I have seen many times.

france_ja_p6, indonesia_ja_p7, Taiwan_ja_p6
円行東自治会 - FC2
https://engyouhigashi.web.fc2.com/inout-hiritu.html

canada_ja_p6, france_ja_p4, indonesia_ja_p4, Taiwan_ja_p3
持 続 可 能 な 医 療 保 険 制 度 を 構 築 す る た め の 国 民 健
https://www.sangiin.go.jp/japanese/gianjoho/ketsugi/189/f069_052601.pdf

france_ja_p6, indonesia_ja_p5, Taiwan_ja_p4
食品流通合理化促進事業
https://www.maff.go.jp/j/shokusan/sijyo/info/attach/attach/pdf/sijyou_yosan2-9.pdf

canada_ja_p7, france_ja_p7, indonesia_ja_p6
お 困 り の 方 へ 騒 音 や 悪 臭 な ど で
https://www.city.tochigi-sakura.lg.jp/manage/contents/upload/61bb4ab4dbec7.pdf

In canada_ja, p7 is mostly irrelevant links.
In indonesia_ja, the pages from p6 onward are mostly irrelevant links.
In Taiwan_ja, the pages from p4 onward are mostly irrelevant links.
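
As a rough cross-check, the screenshot labels listed above can be tallied per region. This is only a sketch: the helper name `pages_by_region` is made up for illustration, and it counts only the four example links quoted above, so it shows where those particular links were sighted, not where a results page becomes mostly irrelevant.

```python
import re
from collections import defaultdict

# Screenshot labels copied from the list above: <region>_ja_p<page>.
SIGHTINGS = {
    "https://engyouhigashi.web.fc2.com/inout-hiritu.html":
        ["france_ja_p6", "indonesia_ja_p7", "Taiwan_ja_p6"],
    "https://www.sangiin.go.jp/japanese/gianjoho/ketsugi/189/f069_052601.pdf":
        ["canada_ja_p6", "france_ja_p4", "indonesia_ja_p4", "Taiwan_ja_p3"],
    "https://www.maff.go.jp/j/shokusan/sijyo/info/attach/attach/pdf/sijyou_yosan2-9.pdf":
        ["france_ja_p6", "indonesia_ja_p5", "Taiwan_ja_p4"],
    "https://www.city.tochigi-sakura.lg.jp/manage/contents/upload/61bb4ab4dbec7.pdf":
        ["canada_ja_p7", "france_ja_p7", "indonesia_ja_p6"],
}

# Splits a label such as "france_ja_p6" into region "france_ja" and page 6.
LABEL = re.compile(r"^(?P<region>.+_ja)_p(?P<page>\d+)$")


def pages_by_region(sightings):
    """Return {region: sorted page numbers on which any listed link appeared}."""
    pages = defaultdict(set)
    for labels in sightings.values():
        for label in labels:
            match = LABEL.match(label)
            pages[match["region"]].add(int(match["page"]))
    return {region: sorted(nums) for region, nums in pages.items()}


summary = pages_by_region(SIGHTINGS)
for region, nums in sorted(summary.items()):
    print(f"{region}: earliest sighting on p{nums[0]}, all pages {nums}")
```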

@tonco-miyazawa,
I would like to know what happens to the "other keywords" that appear below the results when I search for oniguruma and specify a time period of 24 hours or less.
Here are the results I just ran (in Japan, in Japanese)
Screen shot 2022-04-09 22 43 13
These bullshit words have been showing up at a high rate for nearly two years.

@kkos kkos added status and removed question labels Apr 29, 2022
@kkos
Owner Author

kkos commented May 1, 2022

@tonco-miyazawa
I did not notice the April 15 addendum until today.

I wrote a rebuttal on my blog (in both English and Japanese).
https://kkos.fc2.net/blog-entry-2.html

@tonco-miyazawa

After re-investigating, I found that my idea was wrong.

I have deleted my previous remarks.

I'm sorry about those remarks and for the trouble they caused.

@Befzz

Befzz commented Nov 25, 2022

Google tries to show you the most relevant information in your language, based on your IP address/location or your preferences (if you are signed in).

You can ask Google to show results in another language or in multiple languages:

https://www.google.com/search?q=Oniguruma&lr=lang_ja|lang_en

Searching English and Japanese pages (private browsing mode):

image

japanese + english:
lr=lang_ja|lang_en 

english:
lr=lang_en

lang_XX (language collection values for lr)
https://developers.google.com/custom-search/docs/xml_results_appendices#languageCollections

lr (language restriction)
https://developers.google.com/custom-search/docs/xml_results#lrsp

hl (interface language)
https://developers.google.com/custom-search/docs/xml_results#hlsp
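
For anyone who wants to generate such URLs rather than type them, here is a minimal sketch. The helper name `build_search_url` and its arguments are made up for illustration; only the `q`, `lr`, and `hl` parameters and the `lang_XX` values come from the links above.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"


def build_search_url(query, languages, interface_language=None):
    """Build a Google search URL restricted to the given result languages.

    languages: codes such as "ja" or "en"; each becomes a lang_XX value,
        and the values are joined with "|" (OR) for the lr parameter.
    interface_language: optional value for hl (the interface language).
    """
    params = {
        "q": query,
        "lr": "|".join(f"lang_{code}" for code in languages),
    }
    if interface_language:
        params["hl"] = interface_language
    # safe="|" keeps the pipe readable instead of encoding it as %7C.
    return f"{GOOGLE_SEARCH}?{urlencode(params, safe='|')}"


# Japanese + English results, as in the URL above.
print(build_search_url("Oniguruma", ["ja", "en"]))
# English results only, with an English interface.
print(build_search_url("Oniguruma", ["en"], interface_language="en"))
```

The first call reproduces the URL given earlier in this comment; the second adds `hl` on top of a single-language restriction.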

IMHO it is not that you are being "attacked"; it is simply that the keyword is less popular/cited than its Japanese counterpart.

It is often desirable to search for English results only, especially in programming...

You can add a new search engine and make it the default:

image

@kkos
Owner Author

kkos commented Nov 27, 2022

@Befzz
If you read and understand the following two, I don't think you would make such a claim.
https://kkos.fc2.net/blog-entry-1.html
https://kkos.fc2.net/blog-entry-2.html

I didn't want to write the same thing twice, so I wrote another article.
https://kkos.fc2.net/blog-entry-3.html

@Adirelle

Adirelle commented Dec 9, 2022

Have you taken into account the effect of the personalized search algorithms used by Google?

@kkos
Owner Author

kkos commented Dec 9, 2022

Did you not read my first entry?
https://kkos.fc2.net/blog-entry-1.html

I don't think this is because Google is displaying customized results for users. The reason is that searching in Chrome's incognito mode did not make any difference.

I've heard that in incognito mode (or secret mode?), the search results will not be personalized.

And by @tonco-miyazawa
https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/onig_SS2.zip
The following two files in this archive show the same strange related keywords I saw.

Japan_aichi_JP_24h.png
Japan_hokkaido_JP_24h.png

(* But since it is in Japanese, I don't think you would know what it is when you see it.)

@anuraaga

I randomly found this issue when looking up regex engines from Wikipedia. Guess advertising on front page works :)

I was able to reproduce in Japan. More importantly my friend at Google could too. It looks like a search bug so hope it gets fixed.

My hypothesis is that the romaji gets converted to the kanji 鬼車, but is then perhaps split into two tokens, 鬼 and 車. The latter especially will retrieve a lot of unrelated pages; a page just needs something like 車でお越しの方 ("for those coming by car") somewhere.

The issue doesn't reproduce outside Japan because the conversion from romaji to kanji is probably disabled elsewhere.

I suspect the attacker is a software bug and hope it gets squashed! I think we all know how hard CJK can be to get right ;)
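
To make the hypothesis concrete, here is a toy sketch of the difference between matching the whole word and matching its over-split characters. The page names and snippets in `PAGES` are invented for illustration, and nothing here is a claim about how Google's tokenizer or ranking actually works.

```python
# Toy corpus: hypothetical page snippets, not real search results.
PAGES = {
    "regex-library": "鬼車は正規表現ライブラリです",
    "access-info": "車でお越しの方は駐車場をご利用ください",
    "folk-tale": "鬼が山から下りてきた",
    "budget-report": "営業外収益のその他",
}


def phrase_match(query, pages):
    """Pages that contain the whole query as one contiguous string."""
    return [name for name, text in pages.items() if query in text]


def split_token_match(query, pages):
    """Pages that contain ANY single character of the query.

    This models the hypothesised over-split of 鬼車 into 鬼 and 車.
    """
    tokens = list(query)
    return [name for name, text in pages.items()
            if any(token in text for token in tokens)]


query = "鬼車"
print(phrase_match(query, PAGES))       # only the page about the library
print(split_token_match(query, PAGES))  # also pages mentioning just 鬼 or 車
```

Under this toy model a page is pulled in only if it contains at least one of the two characters, so the hypothesis can be tested by checking the text of an unrelated result for 鬼 or 車.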

@kkos
Owner Author

kkos commented Dec 12, 2022

I don't think so.
I just searched for "Oniguruma", downloaded the first unrelated link (the 11th result overall), and looked at its contents, but neither "鬼" nor "車" appeared in it.
http://kitakyuminibas.g2.xrea.com/yamagata2018-1.pdf
(Acrobat Reader has a search function.)

Earlier, in my response to @mmizutani's comment, I looked at the contents of some of the links, and those characters were not there either.
#234 (comment)

Besides, "鬼車" is not the only word for which locating word boundaries in Japanese is difficult; I believe this is true of all words composed of multiple characters.
I don't know if Google's search engine uses a morphological analyzer for Japanese, but this is irrelevant since the problem I am having is specific to the keyword "Oniguruma".

@Adirelle

Besides the how and the what, who would have an interest in such an attack, and what would they hope to achieve?

@kkos
Owner Author

kkos commented Dec 15, 2022

I guess the goal would be to harass me.
I have no control over this, so Google should identify and prosecute this culprit.

@zzak

zzak commented Jan 29, 2023

Screenshot 2023-01-30 at 8 17 12

Hello Kosako-san, the issue seems fixed for me (from Akita). Sorry if you were impacted by this!

@tonco-miyazawa

It's not fixed yet. Please see the second and subsequent pages of the search results.
The first page often looks normal.
I have reported this issue to Google several times, but have received no response.

@pseudoClone

Nepal seems fine too! I assumed it was targeted for Asian countries only.

screenshot-2023-02-21_18:28:48

@aj3423

aj3423 commented Nov 25, 2023

Hi, I just happened to see this. I know a little Japanese. In Japanese,
鬼 == oni == ghost (in English)
車 == guruma == vehicle
I would say this name is very "Japanese", and from the plugin name alone I assumed you were Japanese. I think that's why Google shows different results for different countries.

@larouxn

larouxn commented Jan 3, 2024

Searching Google here in Japan, search results seem fine.
Screenshot from 2024-01-03 17-46-29

@kkos
Owner Author

kkos commented Jan 4, 2024

I am aware that I have not seen any attacks since last June.
However, I am not inclined to close this issue now, as I have been under attack for over three years and attacks can resume at any time.

@rubyFeedback

rubyFeedback commented Mar 27, 2024

Although this issue here is about an attack on/at/via Google search, I would like to add that Google search, for
various reasons - including Google internal ones - has become significantly worse in recent years.

We may have to go back to the days before Google Search in terms of adding links to, e.g., oniguruma
and other sites that should realistically be the first google search result.

On a tiny side note, I always found "onig" versus "oniguruma" a slight annoyance.

E. g.:

https://github.com/kkos/oniguruma/releases/download/v6.9.9/onig-6.9.9.tar.gz

IMO it is better to use one, same name - be it onig or oniguruma I have no preference,
but it is weird that the project is called oniguruma, but the download then tells you
that the name is onig.
