writeJSON is not UTF-8 compliant #5

Open
adinapoli opened this issue Aug 18, 2013 · 3 comments

Comments

@adinapoli
Contributor

If we write a value of a ToJSON data type using writeJSON, UTF-8 text that contains accented letters is not handled correctly. This is an excerpt of an Italian text, as produced by the current function:

di essere il più precisi possibile nell'inserimento

The problem is twofold:

a) We need to encode with the encode function from Data.Aeson.Encode rather than the standard one.
b) We need to set charset=utf-8 in the Content-Type HTTP header.

This is the proposed patch:

-------------------------------------------------------------------------------
-- | Set MIME to 'application/json; charset=utf-8' and write the given object
-- into the 'Response' body. Exactly like Snap.Extras' @writeJSON@, but handles
-- UTF-8 text correctly.
writeEncodedJSON :: (MonadSnap m, ToJSON a) => a -> m ()
writeEncodedJSON a = do
  modifyResponse $ setHeader "Content-Type" "application/json; charset=utf-8"
  writeLBS . AE.encode $ a

where AE.encode is the encode function from Data.Aeson.Encode, imported qualified as AE.

With the proposed patch, everything works as expected:

di essere il più precisi possibile nell'inserimento

I also suggest we factor out the modifyResponse call, maybe by creating a combinator that adds the utf-8 charset to the content type, so that we can reuse what we already have: jsResponse, jsonResponse, etc.
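
To make the combinator idea concrete, here is a minimal sketch of the pure part only. The name withUtf8Charset is hypothetical and not part of Snap or snap-extras; it just appends the charset parameter to a MIME type unless one is already there. Wiring it into jsResponse, jsonResponse and friends (for example through modifyResponse) is left out and would need checking against the actual code.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Char8 as B
import Data.Char (toLower)

-- | Hypothetical helper (not an existing Snap function): append
-- "; charset=utf-8" to a MIME type, unless the value already carries
-- a charset parameter, in which case it is returned unchanged.
withUtf8Charset :: B.ByteString -> B.ByteString
withUtf8Charset mime
  | "charset=" `B.isInfixOf` B.map toLower mime = mime
  | otherwise = mime <> "; charset=utf-8"

main :: IO ()
main = do
  -- No charset yet, so one is appended.
  B.putStrLn (withUtf8Charset "application/json")
  -- An explicit charset is left alone.
  B.putStrLn (withUtf8Charset "text/html; charset=ISO-8859-1")
```

Keeping the helper pure means it can be applied to whatever content type each existing response function already sets, without touching how the body is written.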

A.

@ozataman
Owner

Ah, weird. Couple of questions:

  1. Aren't we already using the encode from Data.Aeson? A look at http://hackage.haskell.org/packages/archive/aeson/0.6.2.0/doc/html/src/Data-Aeson-Generic.html#encode shows that we are using Data.Aeson.Encode.encode. Am I missing something here?
  2. As explained here (http://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean), I thought all JSON is automatically interpreted as UTF8 and therefore the additional denotation is unnecessary?
  3. What front-end/client/browser are you using to interpret the results? It almost sounds like you're using an invalid parser that is NOT assuming any JSON is UTF-8 but is instead assuming it is latin-1 or ascii or something. As far as I know, that is invalid behavior. For example, try passing a string that is not valid UTF-8 to aeson for parsing and it will crap out with an error. It forces you to ensure your input is UTF-8 encoded.
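
For illustration, the kind of mismatch described in point 3 can be reproduced with the text library alone. This is only a sketch under the assumption that the client decodes the bytes as latin-1; nothing in this thread confirms that this is what actually happens. It also shows strict UTF-8 decoding rejecting latin-1 bytes, using decodeUtf8' from Data.Text.Encoding rather than aeson itself.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

main :: IO ()
main = do
  let original  = "più" :: T.Text
      utf8Bytes = TE.encodeUtf8 original

  -- 'ù' (U+00F9) becomes the two bytes 0xC3 0xB9 in UTF-8.
  print (BS.unpack utf8Bytes)

  -- A reader that wrongly assumes latin-1 turns those two bytes
  -- into two separate characters.
  print (TE.decodeLatin1 utf8Bytes == "piÃ¹")

  -- A reader that assumes UTF-8 recovers the original text.
  print (TE.decodeUtf8 utf8Bytes == original)

  -- The reverse mistake: latin-1 bytes for "più" are not valid UTF-8,
  -- so a strict decoder reports an error instead of guessing.
  let latin1Bytes = BS.pack [0x70, 0x69, 0xF9]
  putStrLn $ case TE.decodeUtf8' latin1Bytes of
    Left _  -> "rejected: not valid UTF-8"
    Right _ -> "accepted"
```

If the bytes on the wire are already correct UTF-8, the difference between a good and a garbled result comes down to which decoding the reader picks, which is where an explicit charset can matter for clients that do not default to UTF-8.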

@adinapoli
Contributor Author

Hi Oz, let me look into this further and I will get back to you. I can reply to 3) straight away:

  3. I'm using Google Chrome, so I don't think I'm doing anything an end user wouldn't do.

I'll get back to you later on points 1 and 2.

@tom-bop

tom-bop commented Aug 27, 2018

@adinapoli any update on this?


3 participants