-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cython xdrlib #441
Cython xdrlib #441
Conversation
assert_equal(self.ts.has_forces, True) | ||
assert_array_almost_equal(self.ts.forces, self.refpos + 101) | ||
|
||
# class TestXTCTimestep(_TestTimestep, _XTCTimestep): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there anything really special about the old XTCTimestep? I didn't could so far to everything with the base.Timestep. If there is nothing special I'll remove these test as the class doesn't exist anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like some stuff for keeping the status of the Reader and _frame
which is the trajectories opinion of the frame, rather than MDA's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and the unitcell isn't standard either. Timestep._unitcell
should be the native format representation of the box, gromacs has 3 vectors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I already use the native format with the conversion functions defined in 'lib.mdamath'. If this is all then we don't have a reason that to keep that special Timestep class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with that is if you write out your unit cell again you might end up with rounding errors (89.999 angles) , which then isn't a cuboid any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this code that is actually what MDAnalysis is doing currently.
unitcell = self.convert_dimensions_to_unitcell(ts).astype(np.float32) # must be float32 (!) |
in a convoluted way currently we are directly using for every save
box = triclinic_vectors(triclinic_box(box_vectos))
But I'll definitely run a check how often that can be done until the errors sum.
ef4dddb
to
c327d35
Compare
I think another thing that's going to have to be checked is performance. Probably just something simple like iterating through a huge trajectory and seeing how fast it goes |
I'll do a performance check once I implemented the old seek code again. For proper performance tests we should have a least two files. One with a large number of frames and another with a large number of atoms, I'm think that that memory allocations could be a problem with a large number of atoms per frame. But I'll have to test that first. |
self.n_atoms = len(sub) | ||
else: | ||
self.n_atoms = self.xtc.n_atoms | ||
self.xtc.seek(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to put the trajectory before frame 0? The Reader should read the first frame, so that calling next reads the second frame
u = mda.Universe(etc)
u.trajectory.next() # should be second frame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK I didn't know that. I added a TODO for me and I will also add tests for all readers.
94eb23e
to
1a98b6d
Compare
#474 definitely was a good idea. The new tests are already catching stuff that I didn't notice before. |
Yeah I was going to move a few Readers over into the new tests and I'm half expecting to find a few nice little bugs. Don't worry about the unitcell thing too much, I'm just wary of boxes having an angle of 89.99. There's a huge difference between that and 90.0. |
3bd46ae
to
793d85d
Compare
8df826c
to
925d192
Compare
OK finally all the tests are passing, old and new! This means that just the optimized seeking is left to do + some cleaning up. As far as performance goes this isn't doing to bad. Version 0.12.1 needs 24s to run the xdr tests and this needs 28s. But my hope is that this will go away once I optimized the seeking behavior |
BTW is there something super special about the single frame readers? What is the expected behavior of the Readers when the trajectory only contains 1 frame? |
I think with the SingleFrameReader, what was happening previously was each Reader had an implementation of |
6ecd7cf
to
09058db
Compare
4797572
to
de978f5
Compare
cdef np.ndarray dims = np.array([est_nframes], dtype=np.int64) | ||
cdef np.ndarray _offsets = ptr_to_ndarray(<void*> offsets, dims, np.NPY_INT64) | ||
print("read {} frames, estimated {}".format(n_frames, est_nframes)); | ||
print("first offset = {}".format(offsets[0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm finally trying to read the offsets from the file in the same way we used to before. But I can't quite seem to make if possible. This code compiles but doesn't interface with the correctly with the C xdrlib. offsets
is still a NULL-ptr
after calling the read_xtc_n_frames
. This shouldn't have happened. Anyone has an idea of why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the memory is allocated in the xdrlib but the pointer at the cython level is never updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This [SO}(http://stackoverflow.com/questions/1398307/how-can-i-allocate-memory-and-return-it-via-a-pointer-parameter-to-the-calling#1398321) posts answers it. If I want to change the address a pointer points to in a function I have to pass a pointer to that pointer. Ok so offset reading works I just need to include it in the API (YEAH)
ba87760
to
54f3b3c
Compare
raise RuntimeError('Trying to seek over max number of frames') | ||
print(frame, self.offsets[frame]) | ||
# print(xdrlib.xdr_seek(self.xfp, self.offsets[frame], xdrlib.SEEK_SET)) | ||
print(xdrlib.xdr_seek(self.xfp, 0, xdrlib.SEEK_SET)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hitting a wall again here. Every time I'm calling this function from python I get SystemError: error return without exception set
. I have no clue where that comes from. The xdrlib functions seem to be called correctly if I add printf
statements there. As far as I can make it out there are no errors occuring. @mnmelo did you see similar things implementing the seeking in SWIG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... didn't bump into that one. My first reaction was to blame the frame type, as they always must be 64bit (large file access, and the such).
Another thing I noticed is that, in xdrfile.c
/xdrfile.h
the exdr_message
is being built with exdrNR-1
members. This might be related, since exdrNR
is out-of-bounds, and that's what seek returns on error. Maybe you can change xdrstdio_setpos
or xdr_seek
to return another error (though none of the available ones really fit it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I overlooked exdr_message
so far but that will be useful to get better errors messages. So i'm sure that is not the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea... I couldn't also tell it would be the cause, but do flag it as problematic because exdrNR
shouldn't be used for an error code and it will break when we attempt to retrieve the error message from it.
Anyway, a Google search led me to http://pythonextensionpatterns.readthedocs.org/en/latest/exceptions.html, where it is pointed out that somewhere a function is returning NULL
without setting an exception. None of the seek functions can return a NULL
...
So I think I'm done with the largest part of the implementation. The offsets are now also calculated directly in the xdrfile-library. I've started to do some benchmarks. The test-suite runs in the same time. But when I try to benchmark bigger files I don't get conclusive results. I have a gut feeling that this version is faster but I can't really prove it right now. It would be nice if others could also benchmark and test this code. When I have time this week I'll run some more tests on different systems. 1.7 GB XTCMDA v0.12.1
MDA this PR
150 MB XTCMDA v0.12.1
MDA this PR
|
Also I'm not sure why the travis tests are failing. The files exists in the repository and when I do a fresh checkout everything works on my laptop. |
This also includes information how to read the content of the files in ascii without using MDAnalysis
Base test classes in test files seem to be needed to start with an underscore otherwise nose picks them up and tries to lets their tests run.
09ac232
to
afcd9da
Compare
@richardjgowers I added the changelog information. You can merge this if you like. |
|
||
Enhancement | ||
* Offsets reading for xtc/trr files has been speed up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sped up (exception in english)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
2dc85bd
to
6002e9f
Compare
Awesome. So afaik it's just DCD and lots of six things that need doing now? |
@richardjgowers hopefully yes. But the dcd will be a big one again. |
See #419
This is the progress I made on the cython port of xdrlib. Most tests are passing, some are failing because currently XTC/TRR-File has less features then the old version. I'll add them later, see list below). The reimplemented Readers/Writers are now just a minimal subclass of
base.Reader
. I'd like to imagine this is easier to understand then the previous codeThings that have happened
short list of what I did.
TODO