Add local memory CUDA support to Kernel #126
Conversation
Hello @adityapb! Thanks for updating the PR.
Comment last updated on October 24, 2018 at 10:47 UTC
Looks like we should discuss this a bit. In the meantime, can you address the comments I have made?
pysph/cpy/translator.py
Outdated
@@ -255,14 +259,17 @@ def get_struct_from_instance(self, obj):
        helper = CStructHelper(obj)
        return helper.get_code() + '\n'

-   def parse(self, obj):
+   def parse(self, obj, local_decl=None):
I don't like this too much and I am not sure why you need to do this. Can you explain why the signature of the object method does not have the necessary information?
Also, if this is going to be added, we should have a proper docstring, as the context here is totally non-obvious. For the case of parse itself, obj is somewhat obvious, but local_decl is not, as it seems very CUDA specific.
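If the parameter stays, a docstring along these lines would make the context clear. This is a hypothetical sketch, not the PR's actual code; the class name and the body are stand-ins, only the signature follows the diff:

```python
class CUDAConverterSketch:
    """Hypothetical stand-in class; only the parse signature follows the PR."""

    def parse(self, obj, local_decl=None):
        """Parse the given object and return the translated C source.

        Parameters
        ----------
        obj : object
            Function, method, or class instance to translate.
        local_decl : dict, optional
            Mapping of argument name to C dtype for arguments that should
            be placed in CUDA local (shared) memory. Only meaningful for
            the CUDA backend; ignored elsewhere.
        """
        self._local_decl = local_decl
        return ''  # a real implementation would emit the translated code
```

Documenting the CUDA-only meaning up front would address the "non-obvious context" concern without changing the signature.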
pysph/cpy/translator.py
Outdated
        else:
            raise TypeError('Unsupported type to wrap: %s' % obj_type)
        self._local_decl = None
Shouldn't this be set even when the type error is raised?
pysph/cpy/translator.py
Outdated
@@ -532,6 +539,26 @@ def visit_FunctionDef(self, node):
        args = self._get_function_args(node)
        body = '\n'.join(self._indent_block(self.visit(item))
                         for item in self._remove_docstring(node.body))
        local_decl = ''
        if self._local_decl:
            decls = ['extern LOCAL_MEM float shared_buff[];']
This is very strange, why is it declared as a float?
I guess I am missing something with the way you define and use local memory in CUDA but I think this needs to be properly explained.
pysph/cpy/translator.py
Outdated
        if self._local_decl:
            decls = ['extern LOCAL_MEM float shared_buff[];']
            for arg, dtype in self._local_decl.items():
                if len(decls) == 1:
But len(decls) will always be 1 according to the above!
pysph/cpy/translator.py
Outdated
                        '&%(prev_arg)s[size_%(prev_arg)s];')
                    local_decl = local_decl % {'dtype': dtype, 'arg': arg,
                                               'prev_arg': prev_arg}
                    decls.append(local_decl)
This whole block of code is not at all intuitive, and it is very difficult to see what you are trying to do. I think it really needs to be refactored.
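One way the block could be refactored into a single, self-contained helper. This is a sketch, not the PR's code; it assumes, as the diff does, that a `size_<arg>` variable for each local array exists in the generated kernel:

```python
def get_local_declarations(local_decl):
    """Build CUDA shared-memory declaration strings from an {arg: dtype} map.

    CUDA permits only one dynamically sized ``extern`` shared array per
    kernel, so a single float buffer is declared and every argument is
    carved out of it by casting a pointer at a running offset.
    """
    decls = ['extern LOCAL_MEM float shared_buff[];']
    prev_arg = None
    for arg, dtype in local_decl.items():
        if prev_arg is None:
            # The first array starts at the beginning of the shared buffer.
            decls.append('%s* %s = (%s*) shared_buff;' % (dtype, arg, dtype))
        else:
            # Each later array starts where the previous one ends; size_<arg>
            # is assumed to be defined in the generated kernel.
            decls.append('%s* %s = (%s*) &%s[size_%s];' %
                         (dtype, arg, dtype, prev_arg, prev_arg))
        prev_arg = arg
    return decls
```

Pulling the logic into one pure function with a docstring makes the intent (one buffer, back-to-back casts) readable at a glance and trivially unit-testable.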
pysph/cpy/translator.py
Outdated
@@ -693,3 +694,141 @@ def __init__(self, detect_type=ocl_detect_type, known_types=None):

    def _get_self_type(self):
        return KnownType('GLOBAL_MEM %s*' % self._class_name)


class CUDAConverter(CConverter):
Please try to reuse as much code as possible; derive this from OpenCLConverter so we don't have to update things in two places.
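The suggested reuse could look roughly like this. A minimal sketch with stubbed-out classes (KnownType and the `_get_local_arg` hook here are simplified stand-ins, not the real pysph code), assuming OpenCLConverter carries all the shared behaviour:

```python
class KnownType:
    """Stub standing in for pysph.cpy's KnownType."""
    def __init__(self, type_str):
        self.type = type_str


class OpenCLConverter:
    """Stub of the existing converter; holds the behaviour common to both."""
    _class_name = 'Particle'

    def _get_self_type(self):
        # Shared by OpenCL and CUDA: both use GLOBAL_MEM for self.
        return KnownType('GLOBAL_MEM %s*' % self._class_name)


class CUDAConverter(OpenCLConverter):
    """Derive from OpenCLConverter and override only the CUDA specifics."""

    def _get_local_arg(self, name, dtype):
        # Hypothetical CUDA-only hook: pass the local array's size as an
        # extra integer kernel argument instead of a LOCAL_MEM pointer.
        return 'size_%s' % name, 'int'
```

With this shape, `_get_self_type` and everything else common lives in one place, and a fix to the OpenCL path automatically reaches the CUDA path.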
pysph/cpy/translator.py
Outdated
    def _get_self_type(self):
        return KnownType('GLOBAL_MEM %s*' % self._class_name)

    def _get_function_args(self, node):
Again, please try to refactor to minimize cut/pasting.
I like that this is cleaner, but I do want no cut/pasting of code, as it makes it very hard to maintain. Please make the suggested changes and, if possible, squash your unnecessary commits.
pysph/cpy/translator.py
Outdated
        return code

    def visit_FunctionDef(self, node):
        assert node.args.vararg is None, \
Again, a lot of similar code that is cut-and-pasted. Any cut/pasting makes it very hard to update things later. I don't mind adding an empty method to the base class that is only used for CUDA, but you need to think about reducing cut/pasting and try to avoid sending PRs with any of it. :) Also, please measure coverage for just the translator alone and ensure that the translator has almost 100% coverage.
Looks much better. I am sorry to ask for more changes, but the alignment can be a cause of subtle issues and errors, so I think it should be handled cleanly. You do have the local size information, so would it be possible to compute a 4-byte-aligned quantity and use that for the size? At the very least this should be clearly documented.
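The suggested rounding is a one-liner. A sketch, assuming sizes are given in bytes and the host-side code (which already has the local size information) applies it before launching the kernel:

```python
def aligned_size(nbytes, alignment=4):
    """Round nbytes up to the next multiple of alignment.

    Used to pad each local array's byte size so that the next array
    carved out of the shared buffer starts on an aligned boundary.
    """
    return (nbytes + alignment - 1) // alignment * alignment
```

For example, a 10-byte array would reserve 12 bytes, so whatever is laid out after it in the shared buffer starts 4-byte aligned.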
                    prev_arg = arg
                else:
                    local_decl = ('%(dtype)s* %(arg)s = (%(dtype)s*) '
                                  '&%(prev_arg)s[size_%(prev_arg)s];')
There is one big problem here if the sizes are not aligned to 4 bytes, as discussed in the CUDA programming guide. So this should perhaps be carefully documented, or a warning given if there is an alignment problem.
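A defensive check along the suggested lines could warn at code-generation time. A sketch: the function name and the (name, nbytes) layout description are hypothetical, and it assumes the byte size of each local array is known up front:

```python
import warnings


def check_local_alignment(arrays, alignment=4):
    """Warn for every local array that would start at a misaligned offset.

    ``arrays`` is a list of (name, nbytes) pairs laid out back to back in
    the shared buffer, in declaration order.
    """
    offset = 0
    for name, nbytes in arrays:
        if offset % alignment:
            warnings.warn('local array %r starts at misaligned offset %d; '
                          'accesses may fail or be slow on CUDA'
                          % (name, offset))
        offset += nbytes
```

Emitting the warning while translating, rather than failing at kernel launch, gives the user a chance to reorder or pad the arrays.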
I've filed issue #130 for you to address. I will merge this for now. Thanks!
No description provided.