gh-122288: Improve performances of `fnmatch.translate` #122289

picnixz · 2024-07-25T16:12:29Z

This is a smaller PR than the one for the C implementation and is probably easier to review. Below are the benchmarks reported on the issue (`fnmatch-translate-ref` is the current implementation, `fnmatch-translate-py` is this PR):

+------------------------------------------+-----------------------+-----------------------+
| Benchmark                                | fnmatch-translate-ref | fnmatch-translate-py  |
+==========================================+=======================+=======================+
| abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!]  | 6.09 us               | 3.99 us: 1.53x faster |
+------------------------------------------+-----------------------+-----------------------+
| !abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!] | 6.39 us               | 4.07 us: 1.57x faster |
+------------------------------------------+-----------------------+-----------------------+
| a**?**cd**?**??k***                      | 2.24 us               | 1.51 us: 1.49x faster |
+------------------------------------------+-----------------------+-----------------------+
| a/**/b/**/c                              | 1.97 us               | 1.12 us: 1.76x faster |
+------------------------------------------+-----------------------+-----------------------+
| man/man1/bash.1                          | 3.00 us               | 1.21 us: 2.48x faster |
+------------------------------------------+-----------------------+-----------------------+
| a*b*c*d*e*f*g*h*i*j*k*l*m*n*o**          | 5.40 us               | 3.33 us: 1.62x faster |
+------------------------------------------+-----------------------+-----------------------+
| Geometric mean                           | (ref)                 | 1.71x faster          |
+------------------------------------------+-----------------------+-----------------------+
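The table looks like pyperf's comparison output. For a rough local check without pyperf, a sketch using only the standard library's `timeit` could time `fnmatch.translate` on the same patterns. The `time_translate` helper and its defaults are illustrative assumptions, and absolute numbers will vary with the machine and Python version.

```python
import timeit
from fnmatch import translate

# Patterns copied from the benchmark table.
PATTERNS = [
    r'abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!]',
    r'!abc/[!]b-ac-z9-1]/def/\*?/*/**c/?*[][!]',
    'a**?**cd**?**??k***',
    'a/**/b/**/c',
    'man/man1/bash.1',
    'a*b*c*d*e*f*g*h*i*j*k*l*m*n*o**',
]


def time_translate(pattern, number=5_000, repeat=5):
    """Return the best per-call time of fnmatch.translate(pattern), in microseconds."""
    timer = timeit.Timer(lambda: translate(pattern))
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e6


for pat in PATTERNS:
    print(f"{pat:<42} {time_translate(pat):6.2f} us")
```

Best-of-N timing mirrors what benchmark tools report as the least noisy figure; run it once on each branch to compare.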

Issue: Improve performances of fnmatch.translate #122288

picnixz · 2024-08-14T07:26:39Z

@barneygale Friendly ping in case you forgot you were interested in this PR.

Lib/fnmatch.py

barneygale · 2024-08-18T16:54:50Z

Could you add some timings to the PR description please?

barneygale

This is a nice improvement :)

Lib/fnmatch.py

Lib/test/test_fnmatch.py

Lib/fnmatch.py

Misc/NEWS.d/next/Library/2024-07-25-18-06-51.gh-issue-122288.-_xxOR.rst

Lib/fnmatch.py

dg-pb · 2024-08-23T10:13:07Z

Small optimization possibility: about 1% faster for one pattern, not sure about the others. It may also be slightly more readable.

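from itertools import repeat

REP1 = repeat('\\')    # replace each backslash...
REP2 = repeat(r'\\')   # ...with an escaped backslash
REP3 = repeat('-')     # replace each hyphen...
REP4 = repeat(r'\-')   # ...with an escaped hyphen
_replace = str.replace

...

# Escape backslashes first, then hyphens, then rejoin the chunks on '-'
stuff = map(_replace, chunks, REP1, REP2)
stuff = map(_replace, stuff, REP3, REP4)
stuff = '-'.join(stuff)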
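To check that the suggested `map()`/`repeat()` form builds the same string as a plain generator expression that escapes backslashes and then hyphens, a small self-contained comparison could look like this. The function names and sample chunks are hypothetical; what carries over from the snippet is the escaping order (backslashes first, so the backslashes added in front of hyphens are not doubled).

```python
from itertools import repeat

# Infinite replacement-argument iterators shared across calls, as in the suggestion;
# map() stops as soon as `chunks` is exhausted, so they never run out.
REP1 = repeat('\\')
REP2 = repeat(r'\\')
REP3 = repeat('-')
REP4 = repeat(r'\-')
_replace = str.replace


def join_chunks_genexpr(chunks):
    """Escape backslashes, then hyphens, in each chunk and join on '-'."""
    return '-'.join(s.replace('\\', r'\\').replace('-', r'\-') for s in chunks)


def join_chunks_map(chunks):
    """Same result using map() with the unbound str.replace."""
    stuff = map(_replace, chunks, REP1, REP2)
    stuff = map(_replace, stuff, REP3, REP4)
    return '-'.join(stuff)


sample = ['a', 'c\\d', 'x-y']  # hypothetical chunks of a bracket expression
assert join_chunks_map(sample) == join_chunks_genexpr(sample)
print(join_chunks_map(sample))  # a-c\\d-x\-y
```

The gain comes from avoiding a Python-level generator frame per chunk; `map()` calls `str.replace` directly from C.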

And the cache size: not sure what it should be. Are there any rules for this?

Apart from this, LGTM.

picnixz · 2024-08-23T11:54:44Z

> And the cache size: not sure what it should be. Are there any rules for this?

I kept the same cache size. For re.escape, though, we could use a smaller cache if we assume US (ASCII) glyphs. A cache of 512 (the next power of two after 256) would handle Latin-1 plus special characters from other languages. What do you think? I used 32k, but that is probably overkill; we would probably be safe with 4096 distinct glyphs, even with characters from languages such as Chinese.

dg-pb · 2024-08-23T21:27:53Z

My thinking is as follows.

With a 32K cache, the maximum memory used by this cache (assuming an average string length of 20 characters) is roughly:

In [39]: sys.getsizeof('a' * 20) * 2 * 32000 / 1024**2
Out[39]: 3.7   # MB
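The same back-of-the-envelope estimate can be repeated for other candidate sizes. This sketch keeps the comment's assumptions (two strings of about 20 ASCII characters per entry) and, like the figure above, ignores the cache's own bookkeeping (dict slots and linked-list nodes), so it underestimates the real footprint. `estimated_cache_mib` is a hypothetical helper, and results depend on the Python version because string object headers differ between releases.

```python
import sys


def estimated_cache_mib(maxsize, avg_len=20):
    """Rough string payload of a full cache, in MiB.

    Assumes each entry holds two ASCII strings (key and value) of about
    `avg_len` characters; the cache's own per-entry overhead is ignored.
    """
    per_string = sys.getsizeof('a' * avg_len)
    return per_string * 2 * maxsize / 1024**2


for maxsize in (32768, 4096, 2048, 512):
    print(f"maxsize={maxsize:>5}: ~{estimated_cache_mib(maxsize):.2f} MiB")
```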

That is not much for a standard machine, but is Python used on some micro platforms where that might be big? I have little experience with such platforms and 4MB might not be an issue at all, but I think it is worth looking into.

I think the easiest approach would be to find other uses of lru_cache in the standard library and see what maximum size limits they use. If there are any, this could be answered without much effort.

Otherwise, if you can't find anything, maybe someone else has some insight?

As a last resort, we could make a conservative choice. My Python starts with about 5MB of memory; taking 5% of that (about 0.25MB) corresponds to a cache size of about 2048.
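The conservative rule of thumb can also be turned around to derive a size from a memory budget. A sketch under the same assumptions (two ~20-character ASCII strings per entry, cache bookkeeping ignored); the 5MB baseline and 5% share come from the comment, and `max_entries_for_budget` is a hypothetical helper:

```python
import sys


def max_entries_for_budget(budget_bytes, avg_len=20):
    """Largest number of cache entries whose string payload fits in budget_bytes.

    Assumes two ASCII strings of about `avg_len` characters per entry and
    ignores the cache's own per-entry overhead.
    """
    per_entry = 2 * sys.getsizeof('a' * avg_len)
    return budget_bytes // per_entry


baseline = 5 * 1024**2          # ~5MB initial interpreter footprint
budget = int(baseline * 0.05)   # 5% of it, i.e. 256 KiB
print(max_entries_for_budget(budget))  # about 2000 on 64-bit CPython
```

Because the per-entry overhead is ignored, the real number of entries that fit is lower, which argues for rounding down rather than up.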

picnixz · 2024-08-23T22:06:53Z

> I think the easiest approach would be to find other uses of lru_cache in the standard library and see what maximum size limits they use. If there are any, this could be answered without much effort.

fnmatch._compile_pattern takes 32k as a cache size. This is also documented. For re.escape, the cache could be much smaller (see my comment on glyphs), so I think 2048 or 4096 would be enough.

> but is Python used on some micro platforms where that might be big?

For microcontrollers there is MicroPython, where only a subset of Python is available. 4MB can be quite big there, but I don't think they would implement an LRU cache of that size.

dg-pb · 2024-08-23T22:11:10Z

> fnmatch._compile_pattern takes 32k as a cache size.

In that case, 32K seems fine to me. Just follow the existing convention; if any issues come up in the future, the two caches can be handled together.

picnixz · 2024-08-27T17:09:06Z

@barneygale This one is the (only) remaining fnmatch PR that I decided to keep. I'd appreciate your opinion on the size of the cache for re.escape (I think we could live with maxsize=4096 instead of 32k).

Lib/fnmatch.py

The rationale for this change is as follows: re.escape() is only called on single Unicode characters in shell patterns; we may heuristically assume that they are ISO-8859-1 encodable, which requires a cache of size 256. To allow non-traditional glyphs (or alphabets with a small number of common glyphs), we double the cache size to 512.

picnixz · 2024-10-14T11:54:45Z

@barneygale friendly ping in case you forgot about this PR!

barneygale

LGTM, nice work :)

Lib/fnmatch.py

Co-authored-by: Barney Gale <[email protected]>

picnixz · 2024-10-18T03:15:22Z

I'm committing from my phone, so hopefully it'll be fine. Otherwise, I'll take a look when I'm back next week!

Lib/test/test_fnmatch.py

picnixz · 2024-10-22T12:35:58Z

@barneygale I've also updated the tests to record what the indices are, so it should be ready to merge now (feel free to take a last look, since I've stared waaaay too much at this PR).

picnixz added 3 commits July 25, 2024 18:10

improve performances of fnmatch.translate

7f15063

add tests

83d0904

blurb

275a1c7

bedevere-app bot mentioned this pull request Jul 25, 2024

Improve performances of fnmatch.translate #122288

Open

bedevere-app bot added the awaiting review label Jul 25, 2024

picnixz added the performance (Performance or resource usage) and stdlib (Python modules in the Lib dir) labels Jul 25, 2024

picnixz added 4 commits July 25, 2024 18:20

fix usages

e60d057

keep legacy version for glob

03217d7

actually not needed...

804da13

Merge branch 'main' into fnmatch-py-implementation

06a3652

barneygale self-requested a review July 27, 2024 16:12

picnixz mentioned this pull request Jul 28, 2024

Add C accelerator for fnmatch #121445

Closed

picnixz requested a review from serhiy-storchaka August 14, 2024 13:03

reduce the number of calls to str.join

baa6ce3

picnixz commented Aug 17, 2024

Lib/fnmatch.py (outdated)

micro-optimization on re.sub

80b22e0

barneygale reviewed Aug 18, 2024

address Barney's review

7a9a87c

dg-pb reviewed Aug 23, 2024

Lib/fnmatch.py (outdated)

picnixz commented Aug 23, 2024

Lib/fnmatch.py (outdated)

Misc/NEWS.d/next/Library/2024-07-25-18-06-51.gh-issue-122288.-_xxOR.rst (outdated)

Update Misc/NEWS.d/next/Library/2024-07-25-18-06-51.gh-issue-122288.-_xxOR.rst

90539bc

dg-pb reviewed Aug 23, 2024

Lib/fnmatch.py

use lower-case parameter names

1d52949

barneygale reviewed Aug 27, 2024

Lib/fnmatch.py (outdated)

barneygale reviewed Aug 27, 2024

Lib/fnmatch.py (outdated)

barneygale reviewed Aug 27, 2024

Lib/fnmatch.py

barneygale reviewed Aug 27, 2024

Lib/fnmatch.py (outdated)

picnixz force-pushed the fnmatch-py-implementation branch from a205e6b to 0518912 August 28, 2024 09:44

picnixz added 3 commits August 28, 2024 12:08

rename variable indices to star_indices

0226437

remove ambiguous comment about '?' case

01a5173

picnixz force-pushed the fnmatch-py-implementation branch from 0518912 to bb6c3ee August 28, 2024 10:08

picnixz requested a review from barneygale October 3, 2024 18:58

barneygale approved these changes Oct 18, 2024

Lib/fnmatch.py (outdated)

bedevere-app bot added awaiting merge and removed awaiting review labels Oct 18, 2024

Update Lib/fnmatch.py

c14ce4f

Co-authored-by: Barney Gale <[email protected]>

picnixz commented Oct 22, 2024

Lib/test/test_fnmatch.py (outdated)

Update Lib/test/test_fnmatch.py

38d3427
