reconsider which code points can be in hashtags #106

Open
Johann150 opened this issue May 8, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@Johann150
Contributor

Johann150 commented May 8, 2022

I have often seen hashtags being recognized in something that was definitely not intended to be a hashtag.

for example:

T=EÔ"º8øzÄ?k=ÿ#Ô¥ÄÀ¬"Üpz_µkýQ<)ÝÑIµë|`®ÿfeäÁ©¶Æ×çcDØ6=²Áå7À¾l|<à¾,3;«V(Cµ#ÒlP;Â0·¶R»ÛW篻ø6®9ÊëDa+¼ôà¬WG´w¾½Èírs¡Ò+p\z¿L9ÊGÞ7îR

Also apparently some spacing characters may be part of hashtags which is definitely incorrect. For example a nonbreaking space (U+00A0) is recognized as part of a hashtag. https://genau.qwertqwefsday.eu/notes/901diers1g
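To make the spacing problem concrete, here is a minimal sketch of a hashtag reader that stops at any whitespace character. `readHashtag` is a hypothetical helper written for illustration, not the parser's actual code. It relies on JavaScript's `\s` class, which matches Unicode space separators such as U+00A0 in addition to ASCII whitespace.

```typescript
// Hypothetical helper for illustration only; not the project's real parser.
// Reads the hashtag body that begins at text[start] === "#".
// Returns the body without the leading "#", or null if there is none.
function readHashtag(text: string, start: number): string | null {
  if (text.charAt(start) !== "#") return null;
  let end = start + 1;
  // \s in JavaScript includes U+00A0 (no-break space) and U+3000
  // (ideographic space), so both end the hashtag here.
  while (end < text.length && !/\s/.test(text.charAt(end))) end++;
  const body = text.slice(start + 1, end);
  return body.length > 0 ? body : null;
}

console.log(readHashtag("#foo\u00A0bar", 0)); // "foo"
console.log(readHashtag("#foo\u3000bar", 0)); // "foo"
```

With this rule the no-break space ends the hashtag instead of becoming part of it.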

@marihachi
Contributor

Are the examples you gave ones you constructed yourself? What matters is how often this actually occurs in practice.

@Johann150
Contributor Author

Neither example is mine. The first one is probably the less common case; it was taken from https://genau.qwertqwefsday.eu/notes/8zvhj6kdbj

@syuilo
Member

syuilo commented May 12, 2022

It might be fine to not treat something as a hashtag unless the # is preceded by whitespace or is at the start of a line.

@syuilo
Member

syuilo commented May 12, 2022

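However, that could be inconvenient in some cases, mainly in languages such as Japanese that are not written with spaces between words.
None of the following would be recognized as a hashtag any more: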

  • ふー、#foo
  • ふー(#foo)
  • 「#foo ふー」
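The trade-off can be checked with a small sketch of the proposed rule. `isHashtagStart` is a hypothetical name, and the rule is an assumed reading of the proposal: a `#` counts only at the start of the text or directly after a whitespace character, where a line break counts as whitespace.

```typescript
// Hypothetical check for illustration; an assumed reading of the proposed rule.
// A "#" at `index` may start a hashtag only if it is the first character
// or the character before it is whitespace (which includes line breaks).
function isHashtagStart(text: string, index: number): boolean {
  if (text.charAt(index) !== "#") return false;
  if (index === 0) return true;
  return /\s/.test(text.charAt(index - 1));
}

const samples = ["ふー、#foo", "ふー(#foo)", "「#foo ふー」", "foo #bar", "#foo"];
for (const s of samples) {
  // Only the last two samples satisfy the rule.
  console.log(s, isHashtagStart(s, s.indexOf("#")));
}
```

The three Japanese examples fail because the character before `#` is punctuation or a bracket rather than whitespace. The same rule would also reject a `#` buried in the middle of a run of arbitrary characters.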

@Johann150
Contributor Author

As another example, I often see people on other Fediverse software try to separate a hashtag from the rest of a word when they want only part of the word to be the hashtag, e.g. #hash|tag. See for example https://genau.qwertqwefsday.eu/notes/8zwzta88ki
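One way to picture the intended behaviour is a sketch that treats `|` as a character that ends the hashtag. Both `splitHashtag` and the `TERMINATORS` string are hypothetical; which characters belong in that set is exactly what this issue asks to reconsider.

```typescript
// Hypothetical terminator set for illustration only. The real set of
// characters that should end a hashtag is the open question here.
const TERMINATORS = "| \u00A0\t\n";

// Splits text that begins with "#" into the hashtag body and the remainder.
// Returns null if the text does not start with a non-empty hashtag.
function splitHashtag(text: string): { tag: string; rest: string } | null {
  if (text.charAt(0) !== "#") return null;
  let end = 1;
  while (end < text.length && TERMINATORS.indexOf(text.charAt(end)) === -1) end++;
  if (end === 1) return null;
  return { tag: text.slice(1, end), rest: text.slice(end) };
}

const result = splitHashtag("#hash|tag");
console.log(result); // tag is "hash", rest is "|tag"
```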

@marihachi
Contributor

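Cases where a hashtag is misrecognized are rare in the first place, so this does not seem to be high priority.

It might be fine to not treat something as a hashtag unless the # is preceded by whitespace or is at the start of a line.

Even if we addressed it with this proposal, the downsides would be large.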

Also apparently some spacing characters may be part of hashtags which is definitely incorrect. For example a nonbreaking space (U+00A0) is recognized as part of a hashtag. https://genau.qwertqwefsday.eu/notes/901diers1g

This one does seem worth fixing.

@marihachi marihachi added the enhancement New feature or request label May 21, 2022
3 participants