Example produces non valid JSON (single quotes) #16

Open
mlliarm opened this issue Nov 24, 2022 · 1 comment
Labels
bug Something isn't working

mlliarm commented Nov 24, 2022

Hello,

Today I found that, with Python 3.9.15 and 3.10.xx, your library produces output that is not recognized as valid JSON: the printed result uses single quotes.

The code I wrote:

# bad_test.py
import html_to_json

html_string = """
<head>
    <title>Floyd Hightower's Projects</title>
    <meta charset="UTF-8">
    <meta name="description" content="Floyd Hightower&#39;s Projects">
    <meta name="keywords" content="projects,fhightower,Floyd,Hightower">
</head>
"""
output_json = html_to_json.convert(html_string)
print(output_json)

Result:

{'head': [{'title': [{'_value': "Floyd Hightower's Projects"}], 'meta': [{'_attributes': {'charset': 'UTF-8'}}, {'_attributes': {'name': 'description', 'content': "Floyd Hightower's Projects"}}, {'_attributes': {'name': 'keywords', 'content': 'projects,fhightower,Floyd,Hightower'}}]}]}
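For context, here is a minimal sketch of why this output is rejected. Judging by the printed result, `html_to_json.convert` returns a Python dict, and `print()` shows the dict's `repr`, which quotes keys and most strings with single quotes. JSON requires double quotes, so a JSON parser refuses it. The sketch rebuilds the printed dict by hand (it does not call html_to_json, so it runs without the library) and checks both forms with the standard-library `json.loads`. `is_valid_json` is a hypothetical helper name used only for illustration.

```python
import json

# The dict printed in the report above, written out by hand.
# (Assumption: this is what html_to_json.convert returned; print() showed its repr.)
result = {
    "head": [{
        "title": [{"_value": "Floyd Hightower's Projects"}],
        "meta": [
            {"_attributes": {"charset": "UTF-8"}},
            {"_attributes": {"name": "description", "content": "Floyd Hightower's Projects"}},
            {"_attributes": {"name": "keywords", "content": "projects,fhightower,Floyd,Hightower"}},
        ],
    }]
}

python_repr = repr(result)      # what print(result) writes: single-quoted keys
json_text = json.dumps(result)  # a real JSON document: double-quoted keys


def is_valid_json(text: str) -> bool:
    """Hypothetical helper: return True if text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(python_repr))  # False
print(is_valid_json(json_text))    # True
```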

The workaround I found was to call json.dumps on the returned dict:

# good_test.py
import json
import html_to_json

html_string = """
<head>
    <title>Floyd Hightower's Projects</title>
    <meta charset="UTF-8">
    <meta name="description" content="Floyd Hightower&#39;s Projects">
    <meta name="keywords" content="projects,fhightower,Floyd,Hightower">
</head>
"""

output_json = html_to_json.convert(html_string)
print(json.dumps(output_json))

Output:

{"head": [{"title": [{"_value": "Floyd Hightower's Projects"}], "meta": [{"_attributes": {"charset": "UTF-8"}}, {"_attributes": {"name": "description", "content": "Floyd Hightower's Projects"}}, {"_attributes": {"name": "keywords", "content": "projects,fhightower,Floyd,Hightower"}}]}]}

Result of "python good_test.py | prettyjson":

{
    "head": [
        {
            "title": [
                {
                    "_value": "Floyd Hightower's Projects"
                }
            ],
            "meta": [
                {
                    "_attributes": {
                        "charset": "UTF-8"
                    }
                },
                {
                    "_attributes": {
                        "name": "description",
                        "content": "Floyd Hightower's Projects"
                    }
                },
                {
                    "_attributes": {
                        "name": "keywords",
                        "content": "projects,fhightower,Floyd,Hightower"
                    }
                }
            ]
        }
    ]
}
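If the goal is only readable output, the standard library can do the pretty-printing that `prettyjson` does here. A minimal sketch, using a trimmed copy of the dict above rather than calling html_to_json: `json.dumps` with `indent=4` emits multi-line JSON in a 4-space-indented layout like the one shown.

```python
import json

# Same structure as in the report, trimmed to one meta entry for brevity.
result = {"head": [{"title": [{"_value": "Floyd Hightower's Projects"}],
                    "meta": [{"_attributes": {"charset": "UTF-8"}}]}]}

# indent=4 produces multi-line, 4-space-indented JSON; no external tool needed.
pretty = json.dumps(result, indent=4)
print(pretty)
```

From the shell, `python good_test.py | python -m json.tool` should give similar indented output without a third-party tool. If the HTML contains non-ASCII text, note that `json.dumps` escapes it as `\uXXXX` by default; pass `ensure_ascii=False` to keep the characters as written.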

If you agree, I can change what html_to_json.convert returns so that the result is valid JSON, and send a PR.
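One possible shape for such a change, sketched under assumptions: rather than changing what `html_to_json.convert` returns, a separate helper could serialize its result. `convert_to_json_string` and `fake_convert` are hypothetical names, not part of the library. The converter is passed in as a parameter so the sketch runs without html_to_json installed; in practice you would pass `html_to_json.convert`.

```python
import json
from typing import Any, Callable, Optional


def convert_to_json_string(
    html_string: str,
    convert: Callable[[str], Any],
    indent: Optional[int] = None,
) -> str:
    """Hypothetical helper: convert HTML to a dict, then serialize it as JSON text.

    `convert` stands in for html_to_json.convert so this sketch runs without
    the library; `indent` is forwarded to json.dumps for optional pretty-printing.
    """
    return json.dumps(convert(html_string), indent=indent)


def fake_convert(html: str) -> dict:
    # Hypothetical stand-in for html_to_json.convert: ignores its input and
    # returns a fixed dict shaped like the one in the report.
    return {"head": [{"title": [{"_value": "Floyd Hightower's Projects"}]}]}


print(convert_to_json_string("<head>...</head>", fake_convert))
# {"head": [{"title": [{"_value": "Floyd Hightower's Projects"}]}]}
```

Keeping `convert` returning a dict avoids breaking callers that already index into the result (for example `output["head"]`); the trade-off is one more function in the public API.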

fhightower (Owner) commented:

Thanks for reporting this! Makes sense - I'll happily accept this as a PR 😄.

fhightower added the bug (Something isn't working) label Dec 2, 2022
mlliarm added a commit to mlliarm/html-to-json that referenced this issue Feb 3, 2023