Issues with UTF-8 encoded files #21

Open
TomOne opened this issue Nov 2, 2014 · 4 comments

@TomOne

TomOne commented Nov 2, 2014

This issue was originally reported here: https://stackoverflow.com/questions/26693809/creating-a-chocolatey-package-using-the-quick-start-guide-fails-the-utf-8-test-e

Summary: Warmup does not seem to parse UTF-8 encoded files correctly.

Consider the following comment, which contains non-ASCII characters, from the nuspecs in the Chocolatey templates repository:

<!-- Do not remove this test for UTF-8: if “Ω” doesn’t appear as greek uppercase omega letter enclosed in quotation marks, you should use an editor that supports UTF-8, not this one. -->

After running warmup chocolatey some-package, the comment in some-package.nuspec becomes:

<!-- Do not remove this test for UTF-8: if ??? doesn?t appear as greek uppercase omega letter enclosed in quotation marks, you should use an editor that supports UTF-8, not this one. -->
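The output has exactly one ? per non-ASCII character. That is the pattern you get when correctly decoded text is written with an encoding that cannot represent those characters and substitutes ? for them. The snippet below reproduces the symptom. It is only an illustration and makes no claim about how warmup is actually implemented.

```python
# Illustration only: encode the UTF-8 test text to ASCII with errors="replace",
# which substitutes "?" for every character ASCII cannot represent.
original = "if “Ω” doesn’t appear"
lossy = original.encode("ascii", errors="replace").decode("ascii")
print(lossy)  # if ??? doesn?t appear
```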

As you can see, every non-ASCII character is replaced with ?. Similar issues affect many Microsoft products, e.g. notepad.exe, PowerShell, cmd.exe and Visual Studio. A BOM would force these programs to parse the file as UTF-8. However, the Unicode Standard states that a BOM is neither required nor recommended for UTF-8, and it can cause problems for programs that do not expect one. Microsoft is probably the only large company that still ignores this part of the standard, more than a decade after UTF-8 became the de-facto standard for text files. 😠
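For reference, the UTF-8 BOM is the three bytes EF BB BF at the start of a file. The Python sketch below shows the difference at the byte level. Python's "utf-8-sig" codec writes this BOM and plain "utf-8" does not. Decoding with "utf-8-sig" strips a leading BOM when one is present and otherwise behaves like "utf-8".

```python
import codecs

text = "“Ω”"
with_bom = text.encode("utf-8-sig")   # BOM (EF BB BF) followed by UTF-8 bytes
without_bom = text.encode("utf-8")    # UTF-8 bytes only, no BOM

print(with_bom[:3] == codecs.BOM_UTF8)          # True
print(without_bom.startswith(codecs.BOM_UTF8))  # False

# "utf-8-sig" tolerates both forms when reading.
assert with_bom.decode("utf-8-sig") == without_bom.decode("utf-8-sig") == text
```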

This issue is probably also one reason why some packages on chocolatey.org have broken Unicode characters.

The solution here would be to integrate a character encoding detection library that recognizes all common character encodings, and then to write the nuspecs out as UTF-8 without a BOM.
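The proposed approach can be sketched in Python: detect the input encoding, then rewrite the file as UTF-8 without a BOM. The sketch substitutes a simple heuristic for a real detection library. It checks for a UTF-8 or UTF-16 BOM, then tries strict UTF-8, and falls back to Windows-1252 as an assumed legacy default. The function names decode_guess and normalize_to_utf8 are hypothetical and are not part of warmup or Chocolatey.

```python
import codecs
import tempfile
from pathlib import Path

# Recognized BOMs, mapped to the codec that decodes the bytes after the BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_guess(data: bytes, fallback: str = "cp1252") -> tuple[str, str]:
    """Hypothetical helper: return (text, encoding) using a simple heuristic.

    Assumption: BOM-less bytes that are not valid UTF-8 are Windows-1252.
    A real detection library would make a better-informed guess.
    """
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec), codec
    try:
        return data.decode("utf-8"), "utf-8"  # strict: raises on invalid bytes
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace"), fallback


def normalize_to_utf8(path: Path) -> str:
    """Hypothetical helper: rewrite path as UTF-8 without a BOM.

    Returns the encoding that was detected.
    """
    text, encoding = decode_guess(path.read_bytes())
    path.write_bytes(text.encode("utf-8"))  # plain "utf-8" never emits a BOM
    return encoding


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        nuspec = Path(d) / "some-package.nuspec"
        nuspec.write_bytes(codecs.BOM_UTF8 + "“Ω”".encode("utf-8"))
        print(normalize_to_utf8(nuspec))                        # utf-8
        print(nuspec.read_bytes().startswith(codecs.BOM_UTF8))  # False
```

The fallback is a design choice. Windows-1252 is a common legacy encoding on Windows, but any single-byte fallback can misinterpret text from other code pages. That risk is why a proper detection library is suggested above.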

@ferventcoder
Member

Chocolatey (the exe) has a command, choco new, that should write the file out correctly. This brings the functionality in-house for choco, so there is no reliance on, or awkward setup of, external tools. And folks don't have to remember to update templates every once in a while either.

@chribben
Copy link

chribben commented Nov 2, 2014

Thanks. @ferventcoder: how would you use choco new in this context? I had no success trying it out in PowerShell and can't find any documentation.

@ferventcoder
Member

It's in the rewritten Chocolatey, which has not been released yet (it is about 70% of the way there on existing features). See our Kickstarter for a preview: https://www.kickstarter.com/projects/ferventcoder/chocolatey-the-alternative-windows-store-like-yum

The Kickstarter is to support improvements to infrastructure, the addition of pro/business licenses for features that cost $$ to provide, and the longer-term future of Chocolatey itself.

@chribben

chribben commented Nov 4, 2014

Ah I see, thanks.
