Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enabled selecting between serial and OpenMP distance functions #529

Merged
merged 3 commits into from
Nov 12, 2015

Conversation

orbeckst
Copy link
Member

  • all functions in lib.distances acquired the mode= keyword;
    • "serial" selects the standard serial C/Cython code from lib._distances;
    • "OpenMP" selects the OpenMP parallelized C/Cython code from
      lib._distances_openmp
  • on systems without an OpenMP capable compiler, this will likely just mean
    that the OpenMP versions are actually serial (the OpenMP pragma should
    get ignored) --- need to be confirmed/tested
  • tests in test_distances were refactored so that both serial and OpenMP
    versions are tested
  • the current implementation is not very pretty (see lib.distances.select())
    but will make it very easy to add additional acceleration schemes such as
    CUDA or numba as long as the call signatures of the calc_* functions remain
    the same.

@orbeckst
Copy link
Member Author

Btw, I am very open for any comments on how to make this nicer, and I am also not very happy with the name mode for the keyword argument as used to determine if running in serial or OpenMP, e.g

d = MDAnalysis.lib.distances.distance_array(g.positions, ow.positions, box=u.dimensions, 
                                            mode="OpenMP")

@kain88-de
Copy link
Member

@orbeckst instead of mode you could use a boolean switch parallel=True/False. Or if the openMP code supports a variable number of jobs/CPUs some packages have adopted a jobs argument, stating the number of CPU's used. -1 then means use all available CPU's.

@jbarnoud
Copy link
Contributor

I agree with @kain88-de : mode is a rather vague keyword.

@richardjgowers richardjgowers self-assigned this Nov 11, 2015
pass
del importlib

def select(funcname, args=None, kwargs=None, mode="serial"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this _select so people don't try and use directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to _run().

@richardjgowers
Copy link
Member

Maybe backend instead of mode?

@jbarnoud
Copy link
Contributor

backend looks clearer already.  I would go for parallel.
Le 11 nov. 2015 18:54, Richard Gowers [email protected] a écrit :Maybe backend instead of mode?

—Reply to this email directly or view it on GitHub.

@orbeckst
Copy link
Member Author

On 11 Nov, 2015, at 09:54, Richard Gowers wrote:

Maybe backend instead of mode?

Funnily enough, that's exactly what I just changed it to. It's a bit technical but the best description I can think of. parallel is not good because parallel="serial" sounds really strange.

With backend we can also imagine backend="CUDA", backend="OpenCL", backend="SPIDAL" etc.

@hainm
Copy link
Contributor

hainm commented Nov 11, 2015

I am +1 for backend if you have more than 2 choices.

- all functions in lib.distances acquired the backend=<backend> keyword;
  - "serial" selects the standard serial C/Cython code from lib._distances;
  - "OpenMP" selects the OpenMP parallelized C/Cython code from
    lib._distances_openmp
- on systems without an OpenMP capable compiler, this will likely just mean
  that the OpenMP versions are actually serial (the OpenMP pragma should
  get ignored) --- need to be confirmed/tested
- tests in test_distances were refactored so that both serial and OpenMP
  versions are tested
- the current implementation is not very pretty (see lib.distances._run())
  but will make it very easy to add additional acceleration schemes such as
  CUDA or numba as long as the call signatures of the calc_* functions remain
  the same.
- 'cimport c_numpy; c_numpy.import_array()' replaced with
  'import numpy; cimport numpy'
- replaced ndarray.dimensions --> ndarray.shape
- re-ran cython to generate C files
@orbeckst orbeckst force-pushed the feature-selectparallel branch from 9c147ca to 789acd9 Compare November 11, 2015 22:06
@orbeckst
Copy link
Member Author

Rebased to include

  • mode --> backend
  • select() --> _run()

Also removed cimport c_numpy (see mailing list for additional discussion).

@hainm
Copy link
Contributor

hainm commented Nov 11, 2015

I am not sure why duplicate a lot of code here

$ diff distances_openmp.pyx distances.pyx 
19,20c19,20
< Parallel distance calculation library --- :mod:`MDAnalysis.lib._distances_openmp`
< =================================================================================
---
> Distance calculation library --- :mod:`MDAnalysis.lib._distances`
> =================================================================
22,23c22
< 
< Contains OpenMP versions of the contents of "calc_distances.h"
---
> Serial versions of all distance calculations
25a25,28
> cimport cython
> cimport c_numpy
> c_numpy.import_array()
> 
49a53
>     void minimum_image(double *x, float *box, float *inverse_box)
83c87
<                                    <double*>result.data) 
---
>                                    <double*>result.data)
255a260,307
> 
> 
> @cython.boundscheck(False)
> def contact_matrix_no_pbc(coord, sparse_contacts, cutoff):
>     cdef int rows = len(coord)
>     cdef double cutoff2 = cutoff ** 2
>     cdef float[:, ::1] coord_view = coord
> 
>     cdef int i, j
>     cdef double[3] rr;
>     cdef double dist
>     for i in range(rows):
>         sparse_contacts[i, i] = True
>         for j in range(i+1, rows):
>             rr[0] = coord_view[i, 0] - coord_view[j, 0]
>             rr[1] = coord_view[i, 1] - coord_view[j, 1]
>             rr[2] = coord_view[i, 2] - coord_view[j, 2]
>             dist = rr[0]*rr[0] + rr[1]*rr[1] + rr[2]*rr[2]
>             if dist < cutoff2:
>                 sparse_contacts[i, j] = True
>                 sparse_contacts[j, i] = True
> 
> 
> @cython.boundscheck(False)
> def contact_matrix_pbc(coord, sparse_contacts, box, cutoff):
>     cdef int rows = len(coord)
>     cdef double cutoff2 = cutoff ** 2
>     cdef float[:, ::1] coord_view = coord
>     cdef float[::1] box_view = box
>     cdef float[::1] box_inv = 1. / box
> 
>     cdef int i, j
>     cdef double[3] rr;
>     cdef double dist
>     for i in range(rows):
>         sparse_contacts[i, i] = True
>         for j in range(i+1, rows):
>             rr[0] = coord_view[i, 0] - coord_view[j, 0]
>             rr[1] = coord_view[i, 1] - coord_view[j, 1]
>             rr[2] = coord_view[i, 2] - coord_view[j, 2]
> 
>             minimum_image(rr, &box_view[0], &box_inv[0])
> 
>             dist = rr[0]*rr[0] + rr[1]*rr[1] + rr[2]*rr[2]
> 
>             if dist < cutoff2:
>                 sparse_contacts[i, j] = True
>                 sparse_contacts[j, i] = True

_distances['serial'] = importlib.import_module("._distances",
package="MDAnalysis.lib")
try:
_distances['OpenMP'] = importlib.import_module("._distances_openmp",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty minor, but can we make the backend selection case insensitive? I think you could just call this 'openmp' and then call .lower() on whatever string is passed.

- backend is now case-insensitive (as requested by @richardjgowers)
- missing/wrongly spelled backend now raises ValueError instead of
  RuntimeError
- added tests for both
richardjgowers added a commit that referenced this pull request Nov 12, 2015
enabled selecting between serial and OpenMP distance functions
@richardjgowers richardjgowers merged commit d168f86 into develop Nov 12, 2015
@richardjgowers richardjgowers deleted the feature-selectparallel branch November 12, 2015 17:04
@orbeckst
Copy link
Member Author

With OpenMP-versions available I can now implement #530 and remove lib.parallel.distances.

(This comment mainly serves to link to #530.)

orbeckst added a commit that referenced this pull request Nov 12, 2015
- the cython/prange-based/OpenMP versions of some of the distance calculations were
  slower and had fewer features than the C/OpenMP based ones that are now available
  with backend="OpenMP" as implemented in PR #529 so these ones (and there tests)
  are fully removed
- closes issue #530
- The whole lib.parallel module was also removed because at the moment there is nothing
  in it; we can revisit the discussion about organization of lib (issue #287) when
  necessary.
orbeckst added a commit that referenced this pull request Nov 12, 2015
- The cython/prange-based/OpenMP versions of some of the distance calculations were
  slower and had fewer features than the C/OpenMP based ones that are now available
  with backend="OpenMP" as implemented in PR #529 so these ones (and their tests)
  are fully removed.
- removed test for deprecated core.parallel.distances
- The whole lib.parallel module was also removed because at the moment there is nothing
  in it; we can revisit the discussion about organization of lib (issue #287) when
  necessary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants