added micro_blog #3268

Draft
wants to merge 4 commits into
base: main
meatball133
Member

No description provided.

@BethanyG added and then removed the paused label (Work paused until further notice) on Dec 5, 2022
@vaeng
Contributor

vaeng commented Feb 1, 2023

Are you still working on this @meatball133? I would be happy to help if you want any.

@BethanyG
Member

BethanyG commented Feb 1, 2023

Hi @vaeng 👋🏽

Thank you for your interest. 😄

The open PRs here are drafts of work that have been pre-agreed. @meatball133 and I are still working through them. Overall, wider community contributions have been paused for this track until at least May/June. But if you have issues or proposals, we will be happy to discuss them in the exercism forum.


- **ASCII** can encode English language characters.
All characters are precisely 1 byte long.
- **UTF-8** is a Unicode text encoding.
Member

Suggested change
- **UTF-8** is a Unicode text encoding.
- **UTF-8** is a variable-length Unicode text encoding.

All characters are precisely 1 byte long.
- **UTF-8** is a Unicode text encoding.
Characters take between 1 and 4 bytes.
- **UTF-16** is a Unicode text encoding.
Member

Suggested change
- **UTF-16** is a Unicode text encoding.
- **UTF-16** is also a variable-length Unicode text encoding.

- **UTF-16** is a Unicode text encoding.
Characters are either 2 or 4 bytes long.

UTF-8 and UTF-16 are both Unicode encodings which means they're capable of representing a massive range of characters including:
Member

@BethanyG BethanyG Jun 15, 2023

Suggested change
UTF-8 and UTF-16 are both Unicode encodings which means they're capable of representing a massive range of characters including:
UTF-8 and UTF-16 are both capable of representing a massive range of reader-perceived 'characters' or [graphemes][grapheme] including:

Consider the letter 'a' and the emoji '😛'.
In UTF-16 the letter takes 2 bytes but the emoji takes 4 bytes.

The trick to this exercise is to use APIs designed around Unicode characters (codepoints) instead of Unicode codeunits.
Member

Suggested change
The trick to this exercise is to use APIs designed around Unicode characters (codepoints) instead of Unicode codeunits.
The trick to this exercise is to use APIs designed around Unicode characters (codepoints) instead of Unicode codeunits.
[grapheme]: https://dictionary.cambridge.org/us/dictionary/english/grapheme
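For illustration, here is a minimal sketch of the code unit versus code point distinction discussed above. It assumes Python, which this page does not state, and `utf16_code_units` is a hypothetical helper name rather than anything from the PR.

```python
def utf16_code_units(text):
    """Count UTF-16 code units (hypothetical helper, for illustration only)."""
    # Each UTF-16 code unit is 2 bytes. "utf-16-le" is used so that no
    # byte order mark is prepended to the encoded bytes.
    return len(text.encode("utf-16-le")) // 2


letter = "a"
emoji = "\U0001F61B"  # the emoji from the example text

# len() on a Python str counts code points, not code units.
print(len(letter), utf16_code_units(letter))
print(len(emoji), utf16_code_units(emoji))
```

The emoji is a single code point but needs a surrogate pair in UTF-16, so an API that counts code units would report it as two items.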


- Text in most of the world's languages and scripts
- Historic text
- Emoji
Member

Suggested change
- Emoji
- Emoji
- Symbols used in Physics and Mathematics

- Historic text
- Emoji

UTF-8 and UTF-16 are both variable length encodings, which means that different characters take up different amounts of space.
Member

Suggested change
UTF-8 and UTF-16 are both variable length encodings, which means that different characters take up different amounts of space.
UTF-8 and UTF-16 are both variable length encodings, which means that different graphemes can take up different amounts of space.
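As a rough illustration of "variable length", the sketch below prints how many bytes a few sample characters occupy in each encoding. Python is an assumption here, and `encoded_sizes` and the sample characters are hypothetical choices, not part of the PR.

```python
def encoded_sizes(char):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string.

    Hypothetical helper for illustration only.
    """
    # "utf-16-le" avoids the 2-byte byte order mark that "utf-16" prepends.
    return len(char.encode("utf-8")), len(char.encode("utf-16-le"))


# Sample characters: ASCII letter, accented letter, euro sign, emoji.
for char in ("a", "\u00e9", "\u20ac", "\U0001F61B"):
    utf8_size, utf16_size = encoded_sizes(char)
    print(f"U+{ord(char):04X}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

In this sample the UTF-8 sizes range from 1 to 4 bytes, while the UTF-16 sizes are 2 bytes for everything except the emoji, which takes 4.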

@@ -0,0 +1,19 @@
{
"blurb": "Given an input string, truncate it to 5 characters.",
Member

Suggested change
"blurb": "Given an input string, truncate it to 5 characters.",
"blurb": "Given a Unicode input string, truncate it to 5 grapheme clusters.",
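To make the wording difference concrete, here is a minimal sketch of truncation by code points, assuming Python. The function name `truncate` and the default limit are hypothetical. Slicing a Python `str` counts code points, so this matches "5 characters" in the code point sense but not necessarily "5 grapheme clusters".

```python
def truncate(text, limit=5):
    """Keep at most `limit` code points (hypothetical sketch, not the PR's solution)."""
    # str slicing works on code points, so an emoji outside the Basic
    # Multilingual Plane is never split the way a UTF-16 code unit slice would.
    return text[:limit]


print(truncate("Hello, world"))
print(truncate("\U0001F61B" * 6))

# Caveat: a base letter plus a combining accent is two code points that a
# reader perceives as one character, so slicing can cut such a pair apart.
accented = "e\u0301" * 5  # five "e + combining acute accent" pairs
print(len(truncate(accented)))
```

Truncating by grapheme cluster would need a text segmentation step that plain slicing does not provide, which is likely why the choice of wording in the blurb matters.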
