Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove html tags in overviews #292

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

SethFalco
Copy link
Contributor

@SethFalco SethFalco commented Jul 21, 2023

Problem

In JellyCon, a series overview can often contain HTML, namely the <br> element. This does not break anything, but can make the UI look and feel unpolished.

For example, 07-Ghost returns an overview featuring <br> tags, and Ascendance of a Bookworm has <i> and <a> tags. The Jellyfin web client preserves safe HTML, so these are correctly interpreted.

{
  "Name": "07-Ghost",
  "OriginalTitle": "07-Ghost",
  
  "Overview": "Barsburg Empire\u0027s Military Academy is known for training elites who bring victory to the empire. Students of the academy freely utilize an ability called \u0022Zaiphon\u0022 to fight, while the types of Zaiphon usable depends on the nature of the soldier.\u003Cbr\u003E\u003Cbr\u003E\nTeito Klein, a student at the academy, is one of the most promising soldiers produced. Although ridiculed by everyone for being a sklave (German for slave) with no memories of his past, he is befriended by a fellow student called Mikage. While preparing for the final exam, Teito uncovers a dark secret related to his past. When an attempt to assassinate Ayanami, a high-ranking official who killed his father, fails, Teito is locked away awaiting punishment.\u003Cbr\u003E\u003Cbr\u003E\nOnly wanting the best for Teito, Mikage helps him escape. Teito ends up at the 7th District Church where he is taken in by the bishops. It is here that Teito attempts to evade the grasp of Ayanami and the Military, so he can rediscover his memories and learn why he is the person that can change the fate of the world.\u003Cbr\u003E\u003Cbr\u003E\n[Written by MAL Rewrite]",
  
}
{
  "Name": "Ascendance of a Bookworm",
  "OriginalTitle": "Honzuki no Gekokujou: Shisho ni Naru Tame ni wa Shudan wo Erandeiraremasen",
  
  "Overview": "Anime adaptation of Part 1 of <i>Honzuki no Gekokujou</i>.<br><br>When a sickly young girl suddenly becomes obsessed with inventing new things, her family and friends are all puzzled. \"What has gotten into Myne?\" they wondered, never dreaming that the answer is not a \"What\" but a \"Who\": Urano Motosu, a book-loving apprentice librarian who died in an earthquake in Tokyo who somehow found herself in Myne's body! And since Myne's world is still in a medieval stage, where books can only be owned by the elite, the new Myne intends to do everything she can to bring her beloved books to the masses in the time she has left.<br><br>(Source: Sentai Filmworks)<br><br><i>Note: The first episode was included in a bonus DVD that came with the limited edition of the 8th volume of <a href=\"https://anilist.co/manga/110802\">Part 4</a> of the light novel that was released on September 10th, 2019 prior to its TV premiere.</i>",
  
}

Solution

Jellyfin Web handles this by delegating to DOMPurify to sanitize the overview string. This cleans unsafe-HTML, and leaves safe-HTML intact to be interpreted by the browser.

Reference: Item Details controller in the jellyfin/jellyfin-web repository

I propose we do something similar to this, but also take advantage of some luxuries, as we're ultimately on a different platform.

  • Kodi has limited support for formatting in the plot, so most HTML can just be stripped off.
  • We can securely opt for a DIY solution because the HTML will not be interpreted anyway.

I think the best solution to this is to use the built-in HTMLParser from the standard library, to trim off HTML tags, and HTML elements that are not intended to be human-readable.

For example, the <br>, <p>, <i> tags can have the tags removed but their content preserved. Meanwhile, tags like style and script are omitted entirely, as these aren't relevant.

Stylized titles and emotes like 1968 < 2018 > 2068 and Lily <3 have been accounted for. If an overview has text like this, we determine that it's not an HTML tag and preserve how it was written.

Notes

  • I don't use content for formatting, we look at the HTML only. This keeps everything locale/language agnostic.

Tests

# Modified
assert f("And since Myne's world is still in a medieval stage, where books can only be owned by the elite, the new Myne intends to do everything she can to bring her beloved books to the masses in the time she has left.<br><br>(Source: Sentai Filmworks)") == "And since Myne's world is still in a medieval stage, where books can only be owned by the elite, the new Myne intends to do everything she can to bring her beloved books to the masses in the time she has left. (Source: Sentai Filmworks)"
assert f("The first episode was included in a bonus DVD that came with the limited edition of the 8th volume of <a href=\"https://anilist.co/manga/110802\">Part 4</a> of the light novel that was released on September 10th, 2019 prior to its TV premiere.") == "The first episode was included in a bonus DVD that came with the limited edition of the 8th volume of Part 4 of the light novel that was released on September 10th, 2019 prior to its TV premiere."
assert f("Anime adaptation of Part 1 of <i>Honzuki no Gekokujou</i>.") == "Anime adaptation of Part 1 of Honzuki no Gekokujou."
assert f("<p>This is an overview!</p><p>As two paragraphs!</p>") == "This is an overview! As two paragraphs!"
assert f("<details><summary><i>More details on the plot:</i></summary>And since Myne's world is still in a medieval stage, where books can only be owned by the elite, the new Myne intends to do everything she can to bring her beloved books to the masses in the time she has left.<br><br>(Source: Sentai Filmworks)</details>") == "And since Myne's world is still in a medieval stage, where books can only be owned by the elite, the new Myne intends to do everything she can to bring her beloved books to the masses in the time she has left. (Source: Sentai Filmworks)"

# Untouched
assert f("When a sickly young girl suddenly becomes obsessed with inventing new things, her family and friends are all puzzled.") == "When a sickly young girl suddenly becomes obsessed with inventing new things, her family and friends are all puzzled."
assert f("This is an overview for 1968 < 2018 > 2068!") == "This is an overview for 1968 < 2018 > 2068!"
assert f("This is an overview for Lily <3!") == "This is an overview for Lily <3!"
assert f("This is an <Overview/> with <fake> HTML!") == "This is an <Overview/> with <fake> HTML!"

Screenshots

This previously contained <br><br> in the overview, which has been stripped off.

@SethFalco SethFalco force-pushed the fix/html-in-overviews branch 4 times, most recently from 63c0660 to 0a6d659 Compare July 22, 2023 20:50
@SethFalco SethFalco marked this pull request as draft July 25, 2023 12:15
@SethFalco
Copy link
Contributor Author

SethFalco commented Jul 25, 2023

I've just realized that the label formatting syntax actually works on variables passed in from code. If we'd like, we can make the parser interpret some of it to the label formatting equivalent.

We could consider doing the following:

  • <strong> / <b>[B]
  • <em> / <i>[I]
  • <br>[CR]

However, I'm unsure if it's wise to use the syntax in code. It looks like this syntax is intended for skins to use, even if it may work when the variables appear in code.

Reference: Kodi Documentation for Label Formatting

Can I get thoughts from others? Should we apply the formatting syntax in the variable, or should we leave it so that only the skin is responsible for style?

The same example as above, but replacing <br> with [CR] instead of removing it.

@SethFalco SethFalco marked this pull request as ready for review July 25, 2023 22:40
resources/lib/utils.py Fixed Show fixed Hide fixed
@TrueTechy
Copy link
Contributor

I've had a look over the code and at first glace it seems fine, except for the dependency on python 3. Since we've not officially dropped support from Python 2 yet (I know, I know, it's coming soon I swear) this will need to wait till that's done before this can be merged

@TrueTechy TrueTechy added the Blocked This is blocked by something label Aug 30, 2023
@SethFalco
Copy link
Contributor Author

SethFalco commented Aug 30, 2023

Thanks for checking this out!

this will need to wait till that's done before this can be merged

Ahh, I didn't realize this was a consideration. Though I do wonder if that's relevant as I thought Kodi migrated to Python 3 in Kodi 19, which is the earliest version of Kodi the Jellyfin project aims to support.

We are making an effort in supporting current and previous release of Kodi

#293 (comment)

the nightly builds for Kodi 19 "Matrix" are now using the Python 3 interpreter to run all Python-based add-ons

https://kodi.tv/article/kodi-19-python-3-goes-live/

It could just be that I've misunderstood Kodi's release notes, as there isn't a clear table pairing Kodi against Python versions.

Later I'll look into how much work it'd be to implement a solution backward compatible with Python 2.7, but if it's two entirely different interfaces than perhaps just waiting is best. ^-^'

@TrueTechy
Copy link
Contributor

Our informal support policy is as oddstr mentioned in the jellyfin-kodi repo PR discussion of the deprecation

current release±1, which currently translates to Matrix (old), Nexus (current) and Omega (next).

Our intent is to follow suit with this addon I've just not gotten around to doing it yet. Since it's so imminent I don't think it'd be worth looking at a custom solution

I shall circle back to this once we've dropped the Kodi support officially, and we can look to get it merged then

@oddstr13
Copy link
Member

I'm intending on one or two more releases in jf4kodi before starting to rip out py2 compatibility and adding typing annotations 😸

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocked This is blocked by something
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants