memoryview is a Sequence but does not implement the full Sequence API #125420

JelleZijlstra · 2024-10-13T23:44:33Z

Bug report

Bug description:

The memoryview builtin is registered as a Sequence, but doesn't implement the full API, because it doesn't inherit from Sequence at runtime and doesn't implement the mixin methods.

>>> import collections.abc
>>> issubclass(memoryview, collections.abc.Sequence)
True
>>> collections.abc.Sequence.index
<function Sequence.index at 0x10148d250>
>>> memoryview.index
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    memoryview.index
AttributeError: type object 'memoryview' has no attribute 'index'

Among the methods listed for Sequence in the documentation, memoryview is missing __contains__, __reversed__, index, and count. This is causing problems for typing; in the typeshed stubs we have to either lie one way by claiming that memoryview has methods it doesn't have, or lie another way by claiming it is not a Sequence when issubclass() says otherwise at runtime (python/typeshed#12800). To fix this, we should either make memoryview not a Sequence, or add the missing methods. The former has compatibility implications, so I propose to add the methods in 3.14.

CPython versions tested on:

CPython main branch

Operating systems tested on:

No response

Linked PRs

The text was updated successfully, but these errors were encountered:

JelleZijlstra · 2024-10-13T23:47:49Z

cc @rhettinger for ABCs and @AlexWaygood for prior discussion on typeshed.

picnixz · 2024-10-14T00:13:14Z

I think it would be nice to have such addition. If we were to add them in 3.14, I'd like to help! (at least with one method; I think it's better if we don't add them all in one go).

JelleZijlstra · 2024-10-14T00:50:25Z

Go ahead, I haven't started coding yet.

ZeroIntensity · 2024-10-14T01:36:18Z

IIRC, the sequence protocol on memoryview works in the C API. You guys should be able to just use those functions to implement the methods.

picnixz · 2024-10-14T10:24:25Z

By the way:

>>> reversed(memoryview(b'a'))
<reversed object at 0x7f6ece3fd8b0>

works and we actually have tests for that. However, I think there is somewhere an implicit cast or a generic reversed that is being used via indices + length maybe? So I don't know whether we should have a dedicated reversed method or just dispatch to the generic implementation.

JelleZijlstra · 2024-10-14T12:59:55Z

Yes, reversed() works through len and getitem lookup. It may be worth adding that method if it makes reversing more efficient, but I'd also be OK with skipping that one and using some hacks in the stub; after all, it's unlikely people are calling __reversed__ directly.

__contains__ is in a similar situation; in already works on memoryviews. However, there it's likely that a contains implementation written directly for memoryview is significantly more efficient.

picnixz · 2024-10-14T13:02:16Z

Well... the __contains__ I used is just by iterating over the object for now. I actually haven't checked that __contains__ worked beforehand. I can definitely make it more efficient though if performances matter (it's just that the __contains__ does not exist per se).

ZeroIntensity · 2024-10-14T13:03:46Z

I took a quick glance at the PRs and they seem fine to me, but I would be very cautious about making sure that e.g. memoryview.index does the same thing as PySequence_Index, otherwise you're bound to break untold amounts of code relying on the buffer protocol.

JelleZijlstra · 2024-10-14T13:12:51Z

That's a good point (though I'm not sure there are untold amounts of code using PySequence_Index on memoryviews). PySequence_Index delegates to _PySequence_IterSearch in abstract.c, which works similar to the implementation in Bénédikt's PRs.

picnixz · 2024-10-14T13:15:39Z

... I forgot to finish the multi-dimensional case. And the fact that it's not only basic memoryviews that should be supported but the entire protocol... I'll need to check the multi-dimensional case carefully.

you're bound to break untold amounts of code relying on the buffer protocol.

(I'll try passing on that achievement)

JelleZijlstra · 2024-10-14T13:18:25Z

It appears iteration on multidimensional memoryviews currently isn't implemented at all, so we don't need to support it for these new methods either.

picnixz · 2024-10-15T08:26:10Z

I've implemented the missing (naive) interface. The performances should be better than just the generic implementations since we have less checks but some methods can be improved by directly accessing the underlying buffer instead of creating iterators.

If such performance gain is needed, we'll address those concerns in a separate issue.

ZeroIntensity · 2024-10-15T11:17:12Z

I can look closer at the PRs sometime today. I have a few general comments at a quick glance:

You can probably get some micro optimizations by using the thread state and the private API (if we even do that in the standard library, I'm not sure).
There any new assertions--those are good sanity checks, and especially useful when debugging.
Be careful to make sure that the buffer has everything it should. If you just copied over from the existing implementation, this should be fine (though, assertions are your friend here).

picnixz · 2024-10-15T11:26:27Z

Performances should really be addressed separately. We try to avoid using the private API if this is not performance critical or if it is exposed "almost directly" to the user. Before using the private API, we need benchmarks but I think it's first better not to do it in those PRs. We will definitely gain performances by directly accessing the buffers but those should be investigated.

For assertions, which place do you have in in mind? I can find some in the __reversed__ implementation since I'm making assumption on the inputs, so I can update it later but otherwise I can't find other places where assertions need to be required.

ZeroIntensity · 2024-10-15T11:29:35Z

When writing C code, I like to put assertions everywhere and anywhere. They add no overhead at runtime (they get compiled out on release builds) and make debugging a lot easier. Generally, if you make an assumption, add an assertion.

JelleZijlstra · 2024-10-15T14:19:23Z

The performances should be better

That's a dangerous statement; it would be good to run some microbenchmarks on operations like x in memoryview(), reversed(memoryview()).

I am going to wait a bit before merging any of the PRs here in case other core devs want to voice their opinion on whether this is a good idea.

rhettinger · 2024-10-17T23:58:35Z

I'm conflicted about this one.

On the plus side, this makes the APIs more consistent and will simplify the typeshed stubs.

On the minus side, it feels a bit off to implement (and have to test and maintain) an actual concrete implementation for functionality that no user has ever needed. It seems a little like the tail wagging the dog. In the past, we've usually applied a principle of parsimony and resisted API expansions until good use cases have arisen. That has helped avoid cruft accumulating for features than never get used.

picnixz · 2024-10-18T03:20:34Z

On the plus side, we could also benefit from possible improvements though that's just a gut feeling.

For instance materializing the reverse iterator is now slower but iterating over it is faster if the buffer is large enough.

This could also serve for the multidimensional case but I have no idea if this is a feature that we plan to have or not. The APIs were simple enough to implement so it didn't bother me btw.

JelleZijlstra · 2024-10-18T03:41:00Z

I feel that we should add count and index. memoryview is registered as a Sequence, and Sequence is documented as having those methods, so to me it seems like a bug that the methods don't exist on memoryview.

I don't feel as strongly about the missing dunders, __contains__ and __reversed__. Adding them makes the typing a bit more tidy, but we could manage without them. Adding those methods may produce performance wins, but there probably isn't a lot of real code that relies on reversing memoryviews or doing containment checks on them.

picnixz · 2024-10-18T04:02:32Z

Yeah index and count seem important since they are public and documented. For contains and reversed I think we can just rely on the default dunders.

I'm actually wondering but does "being a sequence" means "having a __contains__ method" or does it mean "x in seq" works? or is it not detailed on purpose?

JelleZijlstra added the type-bug An unexpected behavior, bug, or error label Oct 13, 2024

picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Oct 14, 2024

bedevere-app bot mentioned this issue Oct 14, 2024

gh-125420: implement Sequence.__contains__ API on memoryview objects #125441

Open

This was referenced Oct 14, 2024

gh-125420: implement Sequence.count API on memoryview objects #125443

Open

gh-125420: implement Sequence.index API on memoryview objects #125446

Open

bedevere-app bot mentioned this issue Oct 15, 2024

gh-125420: implement Sequence.__reversed__ API on memoryview objects #125505

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

memoryview is a Sequence but does not implement the full Sequence API #125420

memoryview is a Sequence but does not implement the full Sequence API #125420

JelleZijlstra commented Oct 13, 2024 •

edited by bedevere-app bot

Loading

JelleZijlstra commented Oct 13, 2024

picnixz commented Oct 14, 2024

JelleZijlstra commented Oct 14, 2024

ZeroIntensity commented Oct 14, 2024

picnixz commented Oct 14, 2024

JelleZijlstra commented Oct 14, 2024

picnixz commented Oct 14, 2024

ZeroIntensity commented Oct 14, 2024

JelleZijlstra commented Oct 14, 2024

picnixz commented Oct 14, 2024

JelleZijlstra commented Oct 14, 2024

picnixz commented Oct 15, 2024 •

edited

Loading

ZeroIntensity commented Oct 15, 2024

picnixz commented Oct 15, 2024

ZeroIntensity commented Oct 15, 2024 •

edited

Loading

JelleZijlstra commented Oct 15, 2024

rhettinger commented Oct 17, 2024

picnixz commented Oct 18, 2024 •

edited

Loading

JelleZijlstra commented Oct 18, 2024

picnixz commented Oct 18, 2024 •

edited

Loading

memoryview is a Sequence but does not implement the full Sequence API #125420

memoryview is a Sequence but does not implement the full Sequence API #125420

Comments

JelleZijlstra commented Oct 13, 2024 • edited by bedevere-app bot Loading

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs

JelleZijlstra commented Oct 13, 2024

picnixz commented Oct 14, 2024

JelleZijlstra commented Oct 14, 2024

ZeroIntensity commented Oct 14, 2024

picnixz commented Oct 14, 2024

JelleZijlstra commented Oct 14, 2024

picnixz commented Oct 14, 2024

ZeroIntensity commented Oct 14, 2024

JelleZijlstra commented Oct 14, 2024

picnixz commented Oct 14, 2024

JelleZijlstra commented Oct 14, 2024

picnixz commented Oct 15, 2024 • edited Loading

ZeroIntensity commented Oct 15, 2024

picnixz commented Oct 15, 2024

ZeroIntensity commented Oct 15, 2024 • edited Loading

JelleZijlstra commented Oct 15, 2024

rhettinger commented Oct 17, 2024

picnixz commented Oct 18, 2024 • edited Loading

JelleZijlstra commented Oct 18, 2024

picnixz commented Oct 18, 2024 • edited Loading

JelleZijlstra commented Oct 13, 2024 •

edited by bedevere-app bot

Loading

picnixz commented Oct 15, 2024 •

edited

Loading

ZeroIntensity commented Oct 15, 2024 •

edited

Loading

picnixz commented Oct 18, 2024 •

edited

Loading

picnixz commented Oct 18, 2024 •

edited

Loading