Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metaclasses cannot have tp_new in Python 3.14 #1216

Closed
astrelsky opened this issue Sep 12, 2024 · 21 comments
Closed

Metaclasses cannot have tp_new in Python 3.14 #1216

astrelsky opened this issue Sep 12, 2024 · 21 comments

Comments

@astrelsky
Copy link
Contributor

python/cpython#123947

Of course it is not mentioned in the documentation for PyType_FromMetaclass where the deprecation is documented, but I think you are supposed to use __call__. I'm not 100% sure but I think simply switching to using __call__ in PyJPClass_hook instead of directly using PyJPClass_Type->tp_new should prevent this from being a problem for jpype. If PyJPClass_Type still has the tp_new slot it will raise a TypeError when calling PyType_FromSpecWithBases. I can't actually tell for sure if this will be a problem or if the current hacks just completely circumvent it.

JPPyObject vself = JPPyObject::call(PyJPClass_Type->tp_new(PyJPClass_Type, rc.get(), PyJPClassMagic));

@Thrameos
Copy link
Contributor

This likely won't work for us. This is part of their continued effort to make all of the machinery behind the scenes private. I tried their official way of doing things. The problem is that types are closed and there is no way to add the memory required for our meta class storage. If they want us to have any hope to proceeding forward, they will need to look at our code, understand our use case, and recommend a solution.

@Thrameos
Copy link
Contributor

The function we are doing with tp_new are the following....

  1. Check to see how we are being called. If called by a user rather than Java we produce an error. This is potentially movable to the "init" though that will occur after the point that type is being created.
    (*) if we are ever going to support extensions of Java classes directly this is the point were we have to do it. We will be walking the function tree for the requested class and moving methods out of the way to underbar so that JMethods can go in their place and create the stub class for the forward. (THIS MUST HAPPEN IN TYPE NEW!)

  2. Next we check for the type magic keyword. Only internal calls can satisfy this requirement.

  3. Call their new with our allocator which makes a new type with the space required for multiple inheritance.

  4. Change the slots. This should be a safety check in most cases. If the class was properly constructed our alloc and finalizer should have been installed.

  5. Remove "BAD" decisions such as Py_TPFLAGS_INLINE_VALUES which change the memory layout.

The second from meta is more drastic as it is replicating a heap type allocation entirely. It was added because Python does not support FromSpecWithBases which is a specialization. If we get them to accept a patch for that concept then it would reduce our burdens.

In theory the calls could be moved elsewhere though I suspect the type system has similar issues with those places because we have irregular class structures. Unfortunately every hour of my time is accounted for until the end of the fiscal year and then I have other real life concerns like finding food (account numbers). In the worst case scenario, I will have to accept a project that prohibits public contributions to free software. My employer uses JPype and in theory they should provide me an account to deal with these issues so that I can be responsive.

My recommendation would be to start a PEP to try to get the FromSepcWithBases accepted or direction how to put it in place. I will need to find a new path for the tp_new slot. But this will take me several weekends of time to get you an answer. I am still working the Python 3.13 issue.

@Thrameos
Copy link
Contributor

Okay I investigated this one. The problem is not with the PyType_FromMetaclass. It takes the meta class, module, spec, and bases so everything we need.

The problem is that I can't find any way of passing kwargs to the places that need them when creating new types. I always get

   class JInterface(_jpype._JObject, internal=True):  # type: ignore[call-arg]
TypeError: JInterface.__init_subclass__() takes no keyword arguments

The error message doesn't make much sense as I am creating JInterface class so of course it wouldn't be called. I tried adding __init_subclass__ to all the parents and the derived types. I just can't figure out how it gets to that error. Assuming that I can place the proper checks for our types (either in the _JClass or the _JObject implementation then I can complete this task. But for now I remain stumped. I will need some documentation on the path of type creation with metaclass from the C API to make further progress.

@Thrameos
Copy link
Contributor

Thrameos commented Sep 15, 2024

I have found a way to make the kwargs pass through. The problem appears to be that __init_subclass__ is not sticky so I can't expect it to pass properly through the object model. Unfortunately the same problem appears to happen for the tp_alloc slot, meaning that I can't allocate proper memory for the slots. This means I have to find every single path that results in a Java type object and try to find some place where I can set the slots before the first object is created.

Edit: Reading the code it appears that they should be sticky. I am guessing bugs in our calling path are sending Python down into dead code or other bad, bad places.

@Thrameos
Copy link
Contributor

I gave it a shot, but after 14 hours at fixing Python issues I am beat.

See #1217

@astrelsky
Copy link
Contributor Author

astrelsky commented Sep 15, 2024

I have found a way to make the kwargs pass through. The problem appears to be that __init_subclass__ is not sticky so I can't expect it to pass properly through the object model. Unfortunately the same problem appears to happen for the tp_alloc slot, meaning that I can't allocate proper memory for the slots. This means I have to find every single path that results in a Java type object and try to find some place where I can set the slots before the first object is created.

Edit: Reading the code it appears that they should be sticky. I am guessing bugs in our calling path are sending Python down into dead code or other bad, bad places.

I think I just found the cause of this specific issue, at least for JBoolean.

bases = PyTuple_Pack(1, &PyLong_Type, PyJPObject_Type);

It should be bases = PyTuple_Pack(2, &PyLong_Type, PyJPObject_Type);

@Thrameos
Copy link
Contributor

Thrameos commented Sep 15, 2024 via email

@astrelsky
Copy link
Contributor Author

astrelsky commented Sep 15, 2024

Excellent. I will be out on a hike for the day and try to get a few more hours on this later. I recommend rebasing against the Py3.13 branch latest. There were a number of instabilities in the prior attempts.

I probably should have said this in the beginning, but there really is no rush here. I was just trying to make noise to prevent the same situation we currently have with 3.13.

I do see the same issue with JInterface.

I had an idea to eliminate the need to have extra storage for the PyLong_Type subclasses . However, I can't seem to find where this would be needed. JBoolean, JByte, JInt, JShort and JLong are for the primitives so they don't need and currently don't appear to use a java slot.

I spent a few hours this morning setting this up to build with cmake and ninja so I can build and test in place without having to wait for everything to rebuild everytime and copy the output pyd. I also "cleaned" up the headers so now I can use modern tooling without having 12354684651383 errors thrown at me. I will hopefully be more helpful sooner rather than later.

Edit: Nevermind on not being able to find the javaslot on JInt.

@astrelsky
Copy link
Contributor Author

astrelsky commented Sep 15, 2024

Ok. So from what I can gather, the hassle of having to allocate additional storage for all the primitive types can go away. The only 2 which require a "java slot" are JClass and JObject.

Each JClass "java slot" should only hold a JPClass*. The reason for this is because the value you would get from JPValue::m_Value should be obtainable from JPClass::m_Class. Each JObject instance "java slot" will hold the jobject and the JPClass* can be obtained from it's type which is guarenteed to be a JClass. I think the primitive JInt, JLong, etc, types can be obtained in the same fashion but if not I can work around it with a miniscule performance hit. That only leaves JException` instances. The way I look at that is we can just add whatever we need to the python object to make it work. If you're throwing an exception across the barrier, you're already incurring a performance hit and whatever is added from this would only amount to a tiny fraction.

The benefit of doing it this way would allow jpype to work for older versions of Python and newer versions in the "same" way while allowing it to be done correctly with PyObject_GetTypeData and PyObject_GetItemData in versions where available. I'm hopeful this can be done without having to introduce another PEP, mostly because that is something I will never volunteer for. The reason I won't volunteer for writing a PEP is simply because having vague, unclear specifications, which likely won't make sense, is just a bad idea.

The downside of this, which I'm sure is quite obvious, is that it will require a lot of refactoring. I've already started and I think I can make this work, it's just going to take some time. It may be a month or two before I have anything done. Worst case scenerio, I'm completely wrong and I end up becoming much more familiar with jpype's internals which is beneficial anyways.

tl;dr we can take advantage of every Python object having a type and being able to store specific data in a type, to hold the JPClass* in the type. The primitive wrappers would probably all need to be declared in c++ instead of Python so the 8 bytes can be allocated and set correctly.

@Thrameos
Copy link
Contributor

The issue for the Java slot is for the boxed types as well as the overload. We do keep the JClass instance on JInt, JLong, etc for type matching when overloading. Thus all types require that extra information. Remember the main reason for those types it to resolve overloading the deciding the correct box type when box casting. In theory you could have a lookup table when performing the overload rather that a direct box lookup but that was punishing when performing overload matching.

I am going to try to get my son to write a PEP for moving user data ahead of the object into the opaque space.

@Thrameos
Copy link
Contributor

Thrameos commented Sep 15, 2024

If the sole goal is to avoid the memory model issues, then the solution is to allocate two types for every JClass, the abstract version obeys the class diagram tree of the Java and the concrete tree adds the memory on top of each specific type as needed (type, exception, long, float, object, exception). There would then be a pointer in the JClass which denotes the offset to the JValue. All of the new methods for the abstract would construct the concrete with the correct type. No extra memory is required though boxed types and primitives become a real pain as both of them have very hard coded layouts making meaning you will just end up fighting the next breakage. It just requires the extra association on the type hints and a bunch of extra layers to make sure the user can accidently create one of the abstract classes. The issue that I had when I tried that solution was that isa tests which seem trivial were broken. Everything is a derived class and not the type that people expected. So I abandoned it in favor of the "extra memory" solution. After all what is the point of having an allocator slot if you can't add to the memory profile?

The issue here is that the user level GC API in Python is an after thought and because it is off the beaten path, it breaks so regularly that it might as well not exist. I spend weekend after weekend tracing through Python code with debuggers deciding that the methods they gave me don't work and are corrupting method, and then I just abandon it in favor of a hack which works at least until they start adding opaque memory everywhere. For now my polymorph solution works for every known version of Python and hopefully we can negotiate a workable API going forward. It is a three line change to add a preheader size slot to the type structure, add the new opaque data to preheader size when the type is allocated, and then return that pointer when the opaque data is request. I tested it without issues on the release candidate, and it makes it so every object can have an inline dictionary if they want. So we should just press them for that solution in 3.14. Once they go to a grand unified object model we will use the user data slot. No need to redo jpype in the meantime. (It took me months to perform the testing and research to create this model in the first place.)

Also don't dismiss the overhead of the cost of name binding. As it is currently we do a few if statements and we immediately get a type slot we can test. The previous solution was the "dictionary option" in which we forced a managed dictionary on all of the types we created and then inserted a capsule in the dictionary with the value. The lookups for binding were 20+ times slower as though it seems trivial to access the dictionary and find the slot then ask for the capsule information, every one of those required many steps each of which had many guard conditions. (where is the dict pointer located in the object, is the dict null, is the key slot located, is the capsule null, is the type a capsule, does the capsule type match the requested key.) We are currently running nearly at the limit of the Python/JNI transfer rate which took a long time to achieve.

PS: It isn't so simple even if we did everything above board to work on many versions of Python. Every version has bugs/idiosyncrasies like the Init functions that forgot to increment the ref count on the parent type, so you won't escape testing every version. If there is a corruption problem in one version it can be a man month of time tracing to the source. Hence I am very loath to performing a rewrite.

@astrelsky
Copy link
Contributor Author

At this point I'm going to continue on anyway at least until I understand the root of the problem. I don't see how an additional dereference into the type to get the JPClass* would be problematic.

@Thrameos
Copy link
Contributor

Good news. I managed to get a version that works on all available versions of Python which does not use tp_new on a meta class. Please feel free to test it on Python 3.14.

@Thrameos
Copy link
Contributor

Thrameos commented Sep 16, 2024

Getting a JClass is fast, but remember there is no guarantee that it is a JClass. I used to test using IsInstance but that turned out to be order O(n) which given the rather large inheretance tree of Java was problematic. (I implemented fast isinstance and tried to get it added to python but nevet got traction.) Internally Python cheats by using a bit flag. The closest way to make a fast check was to set something in the slots that can't be defeated. Method testing was the easiest solution. The finalizer slot which we must have to clean up the jvalue reference is thus the best fast check we can make for a JClass.

@astrelsky
Copy link
Contributor Author

Getting a JClass is fast, but remember there is no guarantee that it is a JClass. I used to test using IsInstance but that turned out to be order O(n) which given the rather large inheretance tree of Java was problematic. (I implemented fast isinstance and tried to get it added to python but nevet got traction.) Internally Python cheats by using a bit flag. The closest way to make a fast check was to set something in the slots that can't be defeated. Method testing was the easiest solution. The finalizer slot which we must have to clean up the jvalue reference is thus the best fast check we can make for a JClass.

What I was thinking about doing, since all the jpype types would be moved into c++, is creating a large char[] of all the jpype type names aligned to a boundary. Ten pointers into the table would be used for spec->name and then set tp_name directly after creation. A copy of the name is made that they deallocate later because tp_name is public API and they expect it to be setable independently. This would allow taking the tp_name, checking if it falls within the table, and then doing arithmetic on the pointer to get an index into an array of JPClass* for the special types.

Even if it doesn't work out with tp_name there is bound to be another similar possibility.

@Thrameos
Copy link
Contributor

Doesn't that mean the data must grow with every new type that is imported from java. That seems rather expensive.

If you are going that way it is better to look at the __mro__.

>>> jpype.java.lang.String.__mro__
(<java class 'java.lang.String'>, <java class 'java.io.Serializable'>, <java class 'java.lang.Comparable'>, <java class '_jpype._JComparable'>, <java class 'java.lang.CharSequence'>, <java class 'java.lang.constant.Constable'>, <java class 'java.lang.constant.ConstantDesc'>, <java class 'java.lang.Object'>, <java class '_jpype._JObject'>, <class 'object'>)
>>> jpype.java.lang.String.__class__.__mro__
(<class '_jpype._JClass'>, <class 'type'>, <class 'object'>)
>>> jpype.java.lang.RuntimeException.__mro__
(<java class 'java.lang.RuntimeException'>, <java class 'java.lang.Exception'>, <java class 'java.lang.Throwable'>, <java class 'JException'>, <java class '_jpype._JException'>, <class 'Exception'>, <class 'BaseException'>, <java class 'java.io.Serializable'>, <java class 'java.lang.Object'>, <java class '_jpype._JObject'>, <class 'object'>)
>>> jpype.JInt[:].__mro__
(<java class 'int[]'>, <java class '_jpype._JArrayPrimitive'>, <java class '_jpype._JArray'>, <java class 'java.lang.Object'>, <java class '_jpype._JObject'>, <class 'object'>)

Notice that all java types have _JObject in the second position to last or _JClass in the third position. You can arrange for _JClass to have a _JObject and order the __mro__ to always place _JObject in the second to last spot. That was the basis of the "fast" type check method which was in the second rewrite of JPype. It is still slower than the tp_finalize check though as you need to test the tuple length, access the tuple position, verify the type.

@astrelsky
Copy link
Contributor Author

astrelsky commented Sep 16, 2024

Doesn't that mean the data must grow with every new type that is imported from java. That seems rather expensive.

Nope. It would only be the jpype types and not instances of them. It would just be a static table in read-only memory.

For instances of JClass it would at most require an additional dereference to get the JClass type.

@Thrameos
Copy link
Contributor

Thrameos commented Sep 16, 2024

Then I am not quite sure I understand the solution.

>>> jpype.java.lang.String.__class__.__name__
'_JClass'
>>> jpype.java.lang.String("A").__class__.__name__
'java.lang.String'

If you want to test a String instance then the tp_name of its class with be java.lang.String. I can see it working if you test

if (Py_TYPE(obj) == PyJPClass_Type ||  Py_TYPE(Py_TYPE(obj)) == PyJPClass_Type )  { /* isa java type */ }

though I don't see how that is any different than

if (Py_TYPE(obj)->tp_finalizer == PyJPValue_finalize) { /* isa java type */ }

@astrelsky
Copy link
Contributor Author

This is just an example, it is incomplete. I only put enough to present the idea.

alignas(0x100) static const char NAME_TABLE[] = {
/* 0x00 */ '_', 'j', 'p', 'y', 'p', 'e', '.', '_', 'J', 'C', 'l', 'a', 's', 's', 0x0, 0x0,
/* 0x10 */ '_', 'j', 'p', 'y', 'p', 'e', '.', 'J', 'B', 'o', 'o', 'l', 'e', 'a', 'n', 0x0,
/* 0x20 */ '_', 'j', 'p', 'y', 'p', 'e', '.', 'J', 'B', 'y', 't', 'e', 0x0, 0x0, 0x0, 0x0,
/* 0x30 */ '_', 'j', 'p', 'y', 'p', 'e', '.', 'J', 'S', 'h', 'o', 'r', 't', 0x0, 0x0, 0x0,
/* 0x40 */ '_', 'j', 'p', 'y', 'p', 'e', '.', 'J', 'I', 'n', 't', 0x0, 0x0, 0x0, 0x0, 0x0,
/* 0x50 */ '_', 'j', 'p', 'y', 'p', 'e', '.', 'J', 'L', 'o', 'n', 'g', 0x0, 0x0, 0x0, 0x0,
/* 0x60 */ '_', 'j', 'p', 'y', 'p', 'e', '.', '_', 'J', 'A', 'r', 'r', 'a', 'y', 0x0, 0x0,
/* 0x70 */ '_', 'j', 'p', 'y', 'p', 'e', '.', 'J', 'F', 'l', 'o', 'a', 't', 0x0, 0x0, 0x0,
/* 0x80 */ '_', 'j', 'p', 'y', 'p', 'e', '.', 'J', 'D', 'o', 'u', 'b', 'l', 'e', 0x0, 0x0,
};


const char *const JClassName = NAME_TABLE + 0x00;
const char *const JBooleanName = NAME_TABLE + 0x10;
const char *const JByteName = NAME_TABLE + 0x20;
const char *const JShortName = NAME_TABLE + 0x30;
const char *const JIntName = NAME_TABLE + 0x40;
const char *const JLongName = NAME_TABLE + 0x50;
const char *const JArrayName = NAME_TABLE + 0x60;
const char *const JFloatName = NAME_TABLE + 0x70;
const char *const JDoubleName = NAME_TABLE + 0x80;

const uintptr_t NAME_TABLE_START = (uintptr_t) NAME_TABLE;
const uintptr_t NAME_TABLE_END = NAME_TABLE_START + sizeof(NAME_TABLE);


PyTypeObject *PyJPJint_Type = nullptr;
PyType_Spec numberJintSpec = {
	JIntName,
	0,
	0,
	Py_TPFLAGS_DEFAULT,
	0
};

void sample(PyObject* module) {
	PyObject *bases;

	bases = PyTuple_Pack(1, &PyJPNumberLong_Type);
	PyJPJint_Type = (PyTypeObject*) PyJPClass_FromSpecWithBases(&numberJintSpec, bases);
	Py_DECREF(bases);
	JP_PY_CHECK(); // GCOVR_EXCL_LINE
	PyJPJint_Type->tp_name = JIntName;
	// alias this in types.py
	PyModule_AddObject(module, "JInt", (PyObject*) PyJPJint_Type);
	JP_PY_CHECK(); // GCOVR_EXCL_LINE

}

enum class PyJPTypeKind {
	CLASS,
	BOOLEAN,
	BYTE,
	SHORT,
	INT,
	LONG,
	ARRAY,
	FLOAT,
	DOUBLE,
	INVALID
};

static inline bool isJavaType(PyTypeObject *type) {
	if (type == nullptr) {
		return false;
	}
	uintptr_t addr = (uintptr_t) type->tp_name;
	return addr >= NAME_TABLE_START && addr < NAME_TABLE_END;
}

// place somewhere it can be inlined
// this gives you something you can switch over or use as an array index
inline PyJPTypeKind getJavaTypeKind(PyObject *self) {
	PyTypeObject *type = Py_TYPE(self);
	if (!isJavaType(type)) {
		type = Py_TYPE(type);
	}
	if (!isJavaType(type)) {
		return PyJPTypeKind::INVALID;
	}
	return PyJPTypeKind((((uintptr_t)type->tp_name) >> 4) & 0xf);
}

@Thrameos
Copy link
Contributor

Thrameos commented Sep 17, 2024

I am still not sure that this solves a problem. All calls to get a slot are on a derived class so tp_name will never be one of those checked for. When the derive class is created you can set the offset of in a regular place like the m_Doc field. So sorting it out isnt the issue.

The problem is that some types are varible size and others have incompatible basicsizes. This makes it such the describing the slot location depends on the object if after the var section. And challenging to build in the first place if in basicsize. Java insists that Exception and Boxed types come from Object. Python insists those classes can't change the basicsize. I recommend starting the same way I did by creating a dummy Python module that inherits each of the base types in the order that Java requires. After a few months of playing I gave up and went with the allocator solution. It was the only solution that maintained the isinstance checks. Perhaps you will find differently.

@Thrameos
Copy link
Contributor

I did consider the solution where the object tree was flat such the classes did not actually derive from each other and all gave fake isinstance and subclass checks. Though technically that passes, it was complex on the C backed.

@marscher marscher mentioned this issue Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants