
NSString: Cache ICU collator in thread-local storage #450

Merged: hmelder merged 2 commits into master from string-compare-opts on Sep 23, 2024


hmelder (Contributor) commented Sep 17, 2024

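Apple's CoreFoundation caches a UCollator instance for a language in thread-specific data (TSD). It turns out that instantiating a collator is a very expensive operation, so reusing existing collators greatly improves runtime when strings are compared repeatedly (e.g. when sorting). This PR applies the same idea to NSString in GNUstep Base by caching the ICU collator in thread-local storage.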
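To make the caching idea concrete, here is a minimal sketch of a per-thread collator cache in plain C (which also compiles as Objective-C). It is not the code from this PR: `FakeCollator`, `fake_collator_open` and `get_cached_collator` are hypothetical stand-ins so the example builds without ICU; real code would call ICU's `ucol_open()` and `ucol_close()` instead. The cache remembers the locale and option mask the collator was configured for and only creates a new one when either changes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LOCALE_MAX 32

/* Hypothetical stand-in for ICU's UCollator. Real code would create one with
   ucol_open(locale, &status) and release it with ucol_close(). */
typedef struct {
  char locale[LOCALE_MAX];
  unsigned mask; /* the comparison options it was configured for */
} FakeCollator;

/* Counts the "expensive" instantiations so the effect of the cache is visible. */
static int collatorsCreated = 0;

/* One cached collator per thread (C11 _Thread_local), so no locking is needed. */
static _Thread_local FakeCollator *cachedCollator = NULL;

static FakeCollator *fake_collator_open(const char *locale, unsigned mask) {
  size_t len = strlen(locale);
  if (len >= LOCALE_MAX) {
    return NULL; /* locale identifier too long for this sketch */
  }
  FakeCollator *c = malloc(sizeof *c);
  if (c == NULL) {
    return NULL;
  }
  memcpy(c->locale, locale, len + 1);
  c->mask = mask;
  collatorsCreated++;
  return c;
}

/* Returns this thread's collator, creating a new one only when the
   requested locale or option mask differs from the cached one. */
static FakeCollator *get_cached_collator(const char *locale, unsigned mask) {
  assert(locale != NULL);
  if (cachedCollator != NULL && cachedCollator->mask == mask
      && strcmp(cachedCollator->locale, locale) == 0) {
    return cachedCollator; /* cache hit: reuse the existing collator */
  }
  free(cachedCollator); /* real code: ucol_close() on the old collator */
  cachedCollator = fake_collator_open(locale, mask);
  return cachedCollator;
}
```

Calling `get_cached_collator("en_US", mask)` 1000 times in a row on one thread instantiates the collator only once; only a change of locale or mask triggers a new instantiation. A real implementation would also release the per-thread collator when the thread exits (for example through a destructor registered with `pthread_key_create()`), which this sketch omits.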

Benchmarks

Here are some selected benchmarks that validate the collator optimisation and the KVC optimisation from #445.
For the following micro benchmarks, I exported the titles of a large media library (~70000 songs).

Benchmarks were performed on an AMD Ryzen 7 5700G (Freq. scaling disabled), 16GB of DDR4 RAM, Fedora 40.

Run on (16 X 400 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 512 KiB (x8)
  L3 Unified 16384 KiB (x1)

Comparison

static void
BM_CaseInsensitiveNumericComparisonGNUstep(benchmark::State &state, NSString *str1, NSString *str2) {
  NSUInteger mask = NSCaseInsensitiveSearch | NSNumericSearch;
  NSRange range = NSMakeRange(0, [str1 length]);
  NSLocale *locale = [NSLocale currentLocale];

  IMP compareIMP = (IMP)compareOptionsRangeLocaleIMP;

  OVERWRITE_COMPAREIMP

  for (auto _ : state) {
    benchmark::DoNotOptimize([str1 compare:str2
                                   options:mask
                                     range:range
                                    locale:locale]);
  }

  RESET_COMPAREIMP
}

GNUstep Base master

-------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations
-------------------------------------------------------------------------------------
BM_CaseInsensitiveNumericComparison              1081 ns         1081 ns       645262

GNUstep Base with Collator Opts

-------------------------------------------------------------------------------------
Benchmark                                           Time             CPU   Iterations
-------------------------------------------------------------------------------------
BM_CaseInsensitiveNumericComparison               147 ns          147 ns      4711968

Sorting

I cannot share the full micro benchmark because it uses the same sorting logic as Djay (isolated into mediaObjectAttributeValueCompare).
For BM_DjayMediaLibrarySort, I load the titles into an NSMutableArray, shuffle all entries, and sort them with a comparator block. I count the number of comparisons by swizzling compare:options:range:locale: with compareOptionsRangeLocaleStatisticsIMP, so items_per_second reports comparisons per second.

static NSUInteger compareCounter = 0;
typedef NSComparisonResult (*CompareFunction)(NSString *, SEL, NSString *, NSUInteger, NSRange, id);
static CompareFunction compareIMP;

// Hooked IMP for compare:options:range:locale:
// Counts every call, then forwards to the original implementation.
NSComparisonResult
compareOptionsRangeLocaleStatisticsIMP(NSString *self, SEL _cmd, NSString *string,
                                       NSUInteger mask, NSRange compareRange, id locale) {
  compareCounter++;

  return compareIMP(self, _cmd, string, mask, compareRange, locale);
}

static void BM_CaseInsensitiveNumericArraySortLibrary(benchmark::State &state) {
  NSArray<NSString *> *array = [NSArray arrayWithContentsOfFile:@"/Users/hmelder/Music/djayLibraryTitles.plist"];
  if (array == nil) {
    return;
  }

  NSUInteger mask = NSCaseInsensitiveSearch | NSNumericSearch;
  NSLocale *locale = [NSLocale currentLocale];

  for (auto _ : state) {
    [array sortedArrayUsingComparator:^NSComparisonResult(NSString *str1, NSString *str2) {
      return [str1 compare:str2 options:mask range:NSMakeRange(0, [str1 length]) locale:locale];
    }];
  }

  state.SetItemsProcessed(state.iterations() * [array count]);
}


static void BM_DjayMediaLibrarySort(benchmark::State &state) {
  NSMutableArray<NSString *> *array = [NSMutableArray arrayWithContentsOfFile:@"/Users/hmelder/Music/djayLibraryTitles.plist"];
  if (array == nil) {
    return;
  }

  // Shuffle the array in place (Fisher-Yates). Starting at count rather than
  // count - 1 avoids an NSUInteger underflow when the array is empty.
  NSUInteger count = [array count];
  for (NSUInteger i = count; i > 1; i--) {
    // Pick a random index j in the range [0, i - 1]
    NSUInteger j = arc4random_uniform((uint32_t)i);

    // Swap the elements at indices i - 1 and j
    [array exchangeObjectAtIndex:i - 1 withObjectAtIndex:j];
  }

  compareCounter = 0;
  SEL sel = sel_registerName("compare:options:range:locale:");
  Method meth = class_getInstanceMethod([NSString class], sel);
  compareIMP = (CompareFunction)method_getImplementation(meth);
  method_setImplementation(meth, (IMP)compareOptionsRangeLocaleStatisticsIMP);

  for (auto _ : state) {
    [array sortedArrayUsingComparator:^NSComparisonResult(NSString *str1, NSString *str2) {
      return mediaObjectAttributeValueCompare(str1, str2, YES);
    }];
  }

  method_setImplementation(meth, (IMP)compareIMP);
  // compareCounter already accumulates across all iterations of this run
  state.SetItemsProcessed(compareCounter);
}

For BM_DJayMediaLibraryObjectSort, I use a sort descriptor instead of a comparator, because it sorts instances of DummyClass by their title property instead of comparing the array entries directly. Resolving the property value relies heavily on KVC, so this benchmark shows how the KVC optimisations from #445 perform in practice.

@interface DummyClass : NSObject
@property (nonatomic, strong) NSString *title;
@end

@implementation DummyClass
@end

static void BM_DJayMediaLibraryObjectSort(benchmark::State &state) {
  NSArray<NSString *> *array = [NSArray arrayWithContentsOfFile:@"/Users/hmelder/Music/djayLibraryTitles.plist"];
  if (array == nil) {
    return;
  }


  NSMutableArray<DummyClass *> *objects = [NSMutableArray arrayWithCapacity: [array count]];
  for (NSString *title in array) {
    DummyClass *object = [[DummyClass alloc] init];
    object.title = title;
    [objects addObject:object];
  }

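  // Shuffle the objects in place (Fisher-Yates). Starting at count rather than
  // count - 1 avoids an NSUInteger underflow when the array is empty.
  NSUInteger count = [objects count];
  for (NSUInteger i = count; i > 1; i--) {
    // Pick a random index j in the range [0, i - 1]
    NSUInteger j = arc4random_uniform((uint32_t)i);

    // Swap the elements at indices i - 1 and j
    [objects exchangeObjectAtIndex:i - 1 withObjectAtIndex:j];
  }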

  NSSortDescriptor *sortDescriptor = [NSSortDescriptor sortDescriptorWithKey:@"title" ascending:YES comparator:^NSComparisonResult(id obj1, id obj2) {
    return mediaObjectAttributeValueCompare(obj1, obj2, YES);
  }];

  compareCounter = 0;
  SEL sel = sel_registerName("compare:options:range:locale:");
  Method meth = class_getInstanceMethod([NSString class], sel);
  compareIMP = (CompareFunction)method_getImplementation(meth);
  method_setImplementation(meth, (IMP)compareOptionsRangeLocaleStatisticsIMP);

  for (auto _ : state) {
    [objects sortedArrayUsingDescriptors:@[sortDescriptor]];
  }

  method_setImplementation(meth, (IMP)compareIMP);
  // compareCounter already accumulates across all iterations of this run
  state.SetItemsProcessed(compareCounter);
}

GNUstep Base master

----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------
BM_CaseInsensitiveNumericArraySortLibrary        257 ms          256 ms            3 items_per_second=281.696k/s
BM_DjayMediaLibrarySort                         1812 ms         1812 ms            1 items_per_second=1.18121M/s
BM_DJayMediaLibraryObjectSort                   4247 ms         4247 ms            1 items_per_second=503.963k/s

GNUstep Base with Collator Opts

----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------
BM_CaseInsensitiveNumericArraySortLibrary       70.5 ms         70.5 ms            9 items_per_second=1.02437M/s
BM_DjayMediaLibrarySort                          760 ms          760 ms            1 items_per_second=2.81475M/s
BM_DJayMediaLibraryObjectSort                   3004 ms         3004 ms            1 items_per_second=712.651k/s

GNUstep Base with Collator and KVC Opts

----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------
BM_CaseInsensitiveNumericArraySortLibrary       70.0 ms         70.0 ms            9 items_per_second=1.03151M/s
BM_DjayMediaLibrarySort                          746 ms          746 ms            1 items_per_second=2.86756M/s
BM_DJayMediaLibraryObjectSort                    798 ms          798 ms            1 items_per_second=2.68376M/s
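The speedups implied by the tables are simply the ratio of wall-clock times (master divided by optimised). A small C helper that reproduces them from the numbers above; the `Timing` struct and the `speedup` and `print_speedups` functions are illustrative names, not part of the benchmark code.

```c
#include <assert.h>
#include <stdio.h>

/* Wall-clock times in milliseconds, copied from the tables above. */
typedef struct {
  const char *name;
  double master;   /* GNUstep Base master */
  double collator; /* with collator opts */
  double both;     /* with collator and KVC opts */
} Timing;

static const Timing timings[] = {
  {"BM_CaseInsensitiveNumericArraySortLibrary", 257.0, 70.5, 70.0},
  {"BM_DjayMediaLibrarySort", 1812.0, 760.0, 746.0},
  {"BM_DJayMediaLibraryObjectSort", 4247.0, 3004.0, 798.0},
};

/* Speedup of an optimised run relative to a baseline: baseline / optimised. */
static double speedup(double baseline, double optimised) {
  assert(optimised > 0.0);
  return baseline / optimised;
}

/* Prints one line per benchmark with both speedups. */
static void print_speedups(void) {
  for (size_t i = 0; i < sizeof timings / sizeof timings[0]; i++) {
    const Timing *t = &timings[i];
    printf("%-42s %.2fx (collator)  %.2fx (collator + KVC)\n", t->name,
           speedup(t->master, t->collator), speedup(t->master, t->both));
  }
}
```

With these numbers the collator cache alone gives roughly 3.6x, 2.4x and 1.4x for the three sorts, and both optimisations together roughly 3.7x, 2.4x and 5.3x. The single-comparison benchmark works out to 1081 / 147 ≈ 7.4x.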

There is still room for further improvements:

  1. Using a fixed-size inline buffer on the stack that loads characters on demand
  2. Reconfiguring the collator's options instead of caching a collator with a fixed mask
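The second idea could look roughly like the following: cache one collator per thread keyed only by locale, and when the requested mask differs, change the collator's attributes in place instead of creating a new one. This is a hypothetical sketch in plain C with a stand-in `FakeCollator` type and made-up `OPT_*` bits; with ICU the reconfiguration step would use calls such as `ucol_setStrength()` or `ucol_setAttribute(coll, UCOL_NUMERIC_COLLATION, UCOL_ON, &status)`, and the mapping of NSString options onto ICU attributes shown in the comments is an assumption, not taken from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LOCALE_MAX 32

/* Hypothetical option bits standing in for NSCaseInsensitiveSearch and
   NSNumericSearch. */
enum { OPT_CASE_INSENSITIVE = 1 << 0, OPT_NUMERIC = 1 << 6 };

/* Hypothetical stand-in for a UCollator and the attributes it carries. */
typedef struct {
  char locale[LOCALE_MAX];
  bool caseInsensitive; /* e.g. ICU: ucol_setStrength(coll, UCOL_SECONDARY) */
  bool numeric;         /* e.g. ICU: UCOL_NUMERIC_COLLATION set to UCOL_ON */
} FakeCollator;

static int collatorsCreated = 0;      /* expensive instantiations */
static int collatorsReconfigured = 0; /* cheap in-place attribute updates */

/* One collator per thread, keyed by locale only. */
static _Thread_local FakeCollator *cachedCollator = NULL;

static void apply_options(FakeCollator *c, unsigned mask) {
  c->caseInsensitive = (mask & OPT_CASE_INSENSITIVE) != 0;
  c->numeric = (mask & OPT_NUMERIC) != 0;
}

static bool options_match(const FakeCollator *c, unsigned mask) {
  return c->caseInsensitive == ((mask & OPT_CASE_INSENSITIVE) != 0)
      && c->numeric == ((mask & OPT_NUMERIC) != 0);
}

static FakeCollator *get_collator(const char *locale, unsigned mask) {
  assert(locale != NULL);
  if (cachedCollator != NULL && strcmp(cachedCollator->locale, locale) == 0) {
    if (!options_match(cachedCollator, mask)) {
      apply_options(cachedCollator, mask); /* reconfigure instead of reopening */
      collatorsReconfigured++;
    }
    return cachedCollator;
  }
  size_t len = strlen(locale);
  if (len >= LOCALE_MAX) {
    return NULL; /* locale identifier too long for this sketch */
  }
  FakeCollator *c = malloc(sizeof *c);
  if (c == NULL) {
    return NULL;
  }
  memcpy(c->locale, locale, len + 1);
  apply_options(c, mask);
  free(cachedCollator); /* real code: ucol_close() on the old collator */
  cachedCollator = c;
  collatorsCreated++;
  return c;
}
```

Switching between masks for the same locale now only costs an attribute update; a new collator is created only when the locale changes.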

hmelder requested a review from rfm as a code owner on September 17, 2024 16:44
hmelder (Contributor, Author) commented Sep 17, 2024

The test failures are unrelated to this change.

rfm (Contributor) left a comment


Great idea. The implementation looks sound too.

hmelder merged commit 5cd1997 into master on Sep 23, 2024
3 of 9 checks passed
hmelder deleted the string-compare-opts branch on September 23, 2024 12:32
Labels: None yet

2 participants