Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ensure safe text slicing boundaries with multi-byte characters #67

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

MujahedSafaa
Copy link

This commit resolves a panic that occurs when the text_info function attempts to slice a string containing non-ASCII characters.

The fix addresses the issue reported in the Deno repository: denoland/deno#23875

Use Case:
The bug arises because the range_text method tries to slice a string using byte indices (start and end) that may not align with UTF-8 character boundaries. In Rust, attempting to slice a string at invalid UTF-8 boundaries results in a panic.

Solution:
The solution involves modifying the range_text method to correctly handle non-ASCII characters. The method now uses char_indices() to collect character boundaries and maps the start and end byte indices to these boundaries. The resulting string slice is based on valid UTF-8 character boundaries, ensuring the operation is safe and the encoding remains intact.

@CLAassistant
Copy link

CLAassistant commented Sep 2, 2024

CLA assistant check
All committers have signed the CLA.

@MujahedSafaa
Copy link
Author

@dsherret Could you please review this PR?

@magurotuna
Copy link
Contributor

I think it would be really great if we could have a unit test for this :)

let char_indices: Vec<_> = text.char_indices().collect();
let start_char_idx = char_indices.iter().position(|&(i, _)| i == start).unwrap_or(0);
let end_char_idx = char_indices.iter().position(|&(i, _)| i == end).unwrap_or(char_indices.len() - 1);
&text[char_indices[start_char_idx].0..char_indices[end_char_idx].0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might just be masking a bug. The range provided to this should be valid instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants