Skip to content

Commit

Permalink
Fix a bug with sorting & uniquifying when using a locale
Browse files Browse the repository at this point in the history
The locale comparer can return equal for strings which are different. For
example, when sorting with `en-US` the following strings are equal:

    A
    a
    ª

This means that the sorter would simply keep these in their original
order. That means that the sorted output would depend on the original order,
meaning two files with different orderings but the same lines would sort
differently.

This also broke the `--unique` flag because the `lines.dedup_by` call relies
on duplicate lines being adjacent. If they're not it won't properly dedup the
`Vec`.
  • Loading branch information
autarch committed Nov 4, 2023
1 parent 98fe8cb commit 624108e
Show file tree
Hide file tree
Showing 3 changed files with 29 additions and 1 deletion.
5 changes: 5 additions & 0 deletions Changes.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,8 @@
## 0.1.3

- When sorting a file with repeated lines with a `--locale`, the sorting order was not always
consistent, and the `--unique` flag could leave duplicates behind.

## 0.1.2 - 2023-06-04

- Sorting is now done in parallel using [rayon](https://docs.rs/rayon/latest/rayon/). This
Expand Down
11 changes: 10 additions & 1 deletion src/comparer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -444,7 +444,16 @@ fn compare_two_strings(
str2: &str,
) -> Ordering {
if let Some(c) = collator {
c.compare(str1, str2)
let ord = c.compare(str1, str2);
if ord != Ordering::Equal {
return ord;
}
// If the strings are equal according to the collator they may still
// be different, in which case we want to further sort them
// somehow. Otherwise they end up sorted based on their original order
// in the file, which is random and means two files containing the
// same lines in different order could be sorted differently.
str1.cmp(str2)
} else if case_insensitive {
str1.to_lowercase().cmp(&str2.to_lowercase())
} else {
Expand Down
14 changes: 14 additions & 0 deletions src/test-cases/locale-not-unique.test
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
--sort text --case-insensitive --locale en-US --unique
####
NotSorted
####
ª
A
a
ª
A
a
####
A
a
ª

0 comments on commit 624108e

Please sign in to comment.