Miscellaneous Sites not parsing / different Markup #316

Open
fuxifex opened this issue Jul 4, 2023 · 1 comment

Comments

@fuxifex

fuxifex commented Jul 4, 2023

Description

miscsites() returns an empty array

Movies / TV-Shows / Person

tt0479884
Crank

Type

Bug:
The "Miscellaneous Sites" section of the page is not parsed correctly: miscsites() returns an empty array.
In Title.php (line 2325): "!<h4 class="li_group">$title\s*\s*(.+?)<(h4|div)!ims"
This regex pattern needs to be updated, because the markup in the document is now different:
<h3 class="ipc-title__text"><span id="misc">Miscellaneous Sites</span> ...
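If the HTML route is kept, the pattern could key on the new `<h3 class="ipc-title__text"><span id="misc">` heading instead of the old `<h4 class="li_group">` one. This is a hypothetical sketch: `extractMiscSitesFromHtml()` is not part of Title.php, and it assumes the section runs until the next `<h3` heading and that each site is a plain `<a href="...">` link, because the markup after the heading is not shown in this issue.

```php
<?php
// Hypothetical helper, not part of Title.php.
// ASSUMPTION: the "Miscellaneous Sites" block ends at the next <h3 heading
// (or at the end of the page) and each site is an <a href="...">label</a>.
function extractMiscSitesFromHtml(string $html): array
{
    // Locate the new heading and capture everything up to the next <h3.
    $heading = '!<h3 class="ipc-title__text"><span id="misc">Miscellaneous Sites</span>(.*?)(?=<h3|$)!is';
    if (!preg_match($heading, $html, $block)) {
        return [];
    }

    $sites = [];
    // Capture the href value and the (possibly tag-containing) link text.
    $linkPattern = '!<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>!is';
    if (preg_match_all($linkPattern, $block[1], $links, PREG_SET_ORDER)) {
        foreach ($links as $link) {
            $sites[] = [
                'url'   => html_entity_decode($link[1], ENT_QUOTES | ENT_HTML5),
                'title' => trim(strip_tags($link[2])),
            ];
        }
    }
    return $sites;
}
```

Because a pattern like this is tied to IMDb's generated class names, it is likely to break again at the next redesign.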

Please note: the markup also contains embedded JSON that seems to provide all of this information as well, so the HTML would not need to be parsed at all:
{"id":"misc","name":"Miscellaneous Sites","section":{"items":[{"id":"http://www.abandomoviez.net/db/pelicula.php?film=13399","rowTitle":"Abandomoviez.net","rowLink":"http://www.abandomoviez.net/db/pelicula.php?film=13399","listContent":[{"text":"Spanish"}],"rowLinkType":"external","refTagSuffix":"msc_os_0"},{"id":"http://www.aceshowbiz.com/movie/crank/","rowTitle":"AceShowbiz.com","rowLink":"http://www.aceshowbiz.com/movie/crank/","rowLinkType":"external","refTagSuffix":"msc_os_1"},{"id":"https://www.aveleyman.com/FilmCredit.aspx? ....
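A minimal sketch of the JSON route, assuming the JSON document containing the `{"id":"misc", ...}` object has already been cut out of the page (where exactly IMDb embeds it, possibly a `<script>` tag, is not confirmed here). Instead of relying on a fixed path, it searches the decoded data recursively for the object whose `id` is `"misc"` and reads the `rowTitle`, `rowLink` and `listContent` fields seen in the excerpt above. Both function names are hypothetical.

```php
<?php
// Hypothetical helpers, not part of the library.
// ASSUMPTION: $json is the full JSON document that contains the
// {"id":"misc","name":"Miscellaneous Sites","section":{"items":[...]}} object.

// Depth-first search for the first array with the given "id" and section items.
function findSectionById($node, string $id): ?array
{
    if (!is_array($node)) {
        return null;
    }
    if (($node['id'] ?? null) === $id && isset($node['section']['items'])) {
        return $node;
    }
    foreach ($node as $child) {
        $found = findSectionById($child, $id);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}

function extractMiscSitesFromJson(string $json): array
{
    $data = json_decode($json, true); // null on invalid JSON
    $section = findSectionById($data, 'misc');
    if ($section === null) {
        return [];
    }
    $sites = [];
    foreach ($section['section']['items'] as $item) {
        $sites[] = [
            'title'     => $item['rowTitle'] ?? '',
            'url'       => $item['rowLink'] ?? '',
            // listContent holds entries such as {"text":"Spanish"}.
            'languages' => array_column($item['listContent'] ?? [], 'text'),
        ];
    }
    return $sites;
}
```

Entries without a `listContent` array (such as AceShowbiz.com in the excerpt) simply get an empty language list.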

@fuxifex

fuxifex commented Apr 4, 2024

I can confirm this problem. The content even seems to be loaded via AJAX(?).
For example, with "Terminator":
https://www.imdb.com/title/tt0088247/externalsites/?ref_=tt_ql_dts_5#misc
I want to grab the Wikipedia URL, but it doesn't even show up in the browser's page source.
Thanks
