fix: ensure safe text slicing boundaries with multi-byte characters #67

MujahedSafaa · 2024-09-02T14:38:06Z

This commit resolves a panic that occurs when the text_info function attempts to slice a string containing non-ASCII characters.

The fix addresses the issue reported in the Deno repository: denoland/deno#23875

Cause:
The bug arises because the range_text method tries to slice a string using byte indices (start and end) that may not align with UTF-8 character boundaries. In Rust, attempting to slice a string at invalid UTF-8 boundaries results in a panic.
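A minimal sketch of the failure mode described above, using only the standard library. The helper names `is_valid_byte_range` and `checked_slice` are hypothetical and do not appear in the PR; they only illustrate how a byte range can be checked against UTF-8 character boundaries before slicing.

```rust
/// Hypothetical helper (not part of the PR): reports whether `start..end`
/// is a byte range that `&text[start..end]` could slice without panicking.
fn is_valid_byte_range(text: &str, start: usize, end: usize) -> bool {
    start <= end && text.is_char_boundary(start) && text.is_char_boundary(end)
}

/// Hypothetical helper (not part of the PR): non-panicking slice.
/// `str::get` returns `None` when either index is out of bounds or falls
/// inside a multi-byte character, where `&text[start..end]` would panic.
fn checked_slice(text: &str, start: usize, end: usize) -> Option<&str> {
    text.get(start..end)
}
```

For example, in `"héllo"` the character `é` occupies bytes 1 and 2, so byte index 2 is not a character boundary: `checked_slice("héllo", 0, 2)` returns `None`, while `&"héllo"[0..2]` would panic.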

Solution:
The solution involves modifying the range_text method to correctly handle non-ASCII characters. The method now uses char_indices() to collect character boundaries and maps the start and end byte indices to these boundaries. The resulting string slice is based on valid UTF-8 character boundaries, ensuring the operation is safe and the encoding remains intact.

CLAassistant · 2024-09-02T14:38:12Z

All committers have signed the CLA.

MujahedSafaa · 2024-09-05T06:39:41Z

@dsherret Could you please review this PR?

magurotuna · 2024-09-08T04:15:06Z

I think it would be really great if we could have a unit test for this :)

dsherret · 2024-09-08T17:40:48Z

rs-lib/src/common/text_info.rs

+    let char_indices: Vec<_> = text.char_indices().collect();
+    let start_char_idx = char_indices.iter().position(|&(i, _)| i == start).unwrap_or(0);
+    let end_char_idx = char_indices.iter().position(|&(i, _)| i == end).unwrap_or(char_indices.len() - 1);
+    &text[char_indices[start_char_idx].0..char_indices[end_char_idx].0]


I think this might just be masking a bug. The range provided to this should be valid instead.
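To illustrate the concern, here is a sketch under stated assumptions. `lenient_range_text` copies the logic of the diff above into a free function whose signature is assumed (the real `range_text` method is not shown in this thread). `strict_range_text` is a hypothetical alternative, not something proposed by the reviewer, that reports an invalid range instead of silently adjusting it.

```rust
/// Logic copied from the diff above, wrapped in a free function.
/// The signature is an assumption made for illustration.
fn lenient_range_text(text: &str, start: usize, end: usize) -> &str {
    let char_indices: Vec<_> = text.char_indices().collect();
    // An index that is not the start of a character falls back to a default.
    let start_char_idx = char_indices.iter().position(|&(i, _)| i == start).unwrap_or(0);
    let end_char_idx = char_indices.iter().position(|&(i, _)| i == end).unwrap_or(char_indices.len() - 1);
    &text[char_indices[start_char_idx].0..char_indices[end_char_idx].0]
}

/// Hypothetical alternative: surface the invalid range to the caller.
fn strict_range_text(text: &str, start: usize, end: usize) -> Result<&str, String> {
    text.get(start..end).ok_or_else(|| {
        format!("invalid byte range {}..{} for text of length {}", start, end, text.len())
    })
}
```

With `"héllo"`, an `end` of 2 lands inside `é`, and the lenient version returns `"héll"` rather than signalling an error. Because `char_indices()` never yields `text.len()`, the valid range `0..6` also returns `"héll"`, dropping the final character. The strict version returns `Ok("héllo")` for `0..6` and an `Err` for `0..2`, which leaves the caller responsible for supplying a valid range.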

fix: ensure safe text slicing boundaries with multi-byte characters

9a200cd
