Write a kernel in Kotlin, run some magic and voilà: a freshly baked kernel wrapper, ready to be executed.

This is a proof of concept providing a way to write Cuda kernels in Kotlin. The Kotlin code is transpiled into CPP/Cuda source code. That code is then compiled to a `ptx` file with `nvcc`, to be executed.
First, define a kernel class with the `@Kernel` annotation. Only one `@Kernel` per file is supported.

Declare a global function with the `@Global` annotation. Only one `@Global` per kernel is supported.
```kotlin
@Kernel
class SaxpySample : KudaContext() {

    @Global
    fun saxpy(n: Int, a: Float, x: FloatArray, @Return y: FloatArray) {
        val i: Int = blockIdx.x * blockDim.x + threadIdx.x
        if (i < n) y[i] = a * x[i] + y[i]
    }
}
```
Run some generator magic. This is currently achieved as a gradle task; have a look at the sample project's `kuda` task for details.
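For illustration only, here is a minimal sketch of how such a task could be wired with the Gradle Kotlin DSL. The `kuda.GeneratorKt` main class, its arguments and the task dependency are hypothetical; the sample project's `kuda` task shows the real invocation.

```kotlin
// build.gradle.kts — illustrative sketch only.
// The generator entry point and its arguments are hypothetical; refer to the
// sample project's kuda task for the actual wiring.
tasks.register<JavaExec>("kuda") {
    group = "build"
    description = "Generates the Cuda sources and kernel call wrappers from @Kernel classes"
    classpath = configurations["runtimeClasspath"]
    mainClass.set("kuda.GeneratorKt")                  // hypothetical entry point
    args("src/main/kotlin", "build/generated/kuda")    // hypothetical: input sources, output dir
}

// Assumption: compilation should see the generated wrappers.
tasks.named("compileKotlin") {
    dependsOn("kuda")
}
```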
After code generation, a kernel call wrapper is available.
```kotlin
fun main() {
    val saxpy = SaxpySampleWrapper()

    val a = 0.5f
    val x = FloatArray(116) { it.toFloat() }
    val y = FloatArray(116) { -1.0f }

    val res = saxpy(KernelParameters.for1D(x.size), 10, a, x, y)
    println(res.joinToString())
}
```
All the boilerplate code is in the wrapper. This uses jCuda to forward the kernel call to the graphics card.
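For reference, the boilerplate the wrapper hides boils down to the usual JCuda driver-API calls. The sketch below is not the generated code, just an illustration in plain Kotlin of the steps it covers for the saxpy example; the `saxpy.ptx` file name, the kernel name and the way launch parameters are passed are assumptions.

```kotlin
import jcuda.Pointer
import jcuda.Sizeof
import jcuda.driver.*
import jcuda.driver.JCudaDriver.*

// Illustrative sketch only: roughly what a generated wrapper does with JCuda.
// The ptx file name, kernel name and parameter layout are assumptions.
fun saxpyByHand(gridSize: Int, blockSize: Int, n: Int, a: Float, x: FloatArray, y: FloatArray): FloatArray {
    setExceptionsEnabled(true)
    cuInit(0)
    val device = CUdevice().also { cuDeviceGet(it, 0) }
    val context = CUcontext().also { cuCtxCreate(it, 0, device) }

    // Load the nvcc-compiled module and look up the kernel function.
    val module = CUmodule().also { cuModuleLoad(it, "saxpy.ptx") }
    val function = CUfunction().also { cuModuleGetFunction(it, module, "saxpy") }

    // Copy the input arrays to the device.
    val byteCountX = (x.size * Sizeof.FLOAT).toLong()
    val byteCountY = (y.size * Sizeof.FLOAT).toLong()
    val dX = CUdeviceptr().also { cuMemAlloc(it, byteCountX) }
    val dY = CUdeviceptr().also { cuMemAlloc(it, byteCountY) }
    cuMemcpyHtoD(dX, Pointer.to(x), byteCountX)
    cuMemcpyHtoD(dY, Pointer.to(y), byteCountY)

    // Scalars and device pointers are packed into a single parameter pointer.
    val kernelParams = Pointer.to(
        Pointer.to(intArrayOf(n)),
        Pointer.to(floatArrayOf(a)),
        Pointer.to(dX),
        Pointer.to(dY)
    )
    cuLaunchKernel(function, gridSize, 1, 1, blockSize, 1, 1, 0, null, kernelParams, null)
    cuCtxSynchronize()

    // Copy the @Return parameter back and release the device resources.
    val result = FloatArray(y.size)
    cuMemcpyDtoH(Pointer.to(result), dY, byteCountY)
    cuMemFree(dX)
    cuMemFree(dY)
    cuCtxDestroy(context)
    return result
}
```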
# ⚠ Very experimental.
It currently supports some basic C-like operations, with a lot of restrictions. Anything beyond the simple examples shown here would require either a lot of parsing and processing or a lot of manual implementation to achieve anything remotely useful (a complete math lib, the most used Cuda functions).
Kotlin data types are mapped to their C equivalent according to the following table.
Kotlin | C++ |
---|---|
Byte | char |
Short | short |
Int | int |
Long | long |
Float | float |
Double | double |
BooleanArray | Not supported[^1] |
ByteArray | char * |
ShortArray | short * |
IntArray | int * |
LongArray | long * |
FloatArray | float * |
DoubleArray | double * |
Unsigned types are handled with the Kotlin 1.3 experimental unsigned types.
Casts are supported with the `variable.toXxx()` Kotlin cast notation for all primitive types, except between float and unsigned types, as Kotlin does not provide those conversions.
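For instance, here is a sketch (not from the sample project) of a cast inside a kernel body, written with the usual Kotlin notation:

```kotlin
// Illustrative sketch only: a plain Kotlin cast inside a kernel body.
@Kernel
class CastSample : KudaContext() {

    @Global
    fun halve(n: Int, x: IntArray, @Return y: FloatArray) {
        val i: Int = blockIdx.x * blockDim.x + threadIdx.x
        if (i < n) {
            val xi: Int = x[i]
            // Expected to end up as something like: float value = (float) xi / 2.0f;
            val value: Float = xi.toFloat() / 2.0f
            y[i] = value
        }
    }
}
```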
Tested operators are:

Arithmetic

- `+`
- `+` unary
- `-`
- `-` unary
- `*` multiplication
- `/`
- `%`
- `++` prefixed
- `--` prefixed
- `++` postfixed
- `--` postfixed
Relational

- `(` `)` priority, not function call
- `>`
- `<`
- `>=`
- `<=`
Logical

- `&&`
- `||`
- `!`
Binary

- `&`
- `|`
- `^`

Kotlin's infix bitwise functions map as follows:

Kotlin | C++ |
---|---|
and | & |
or | \| |
xor | ^ |
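As a small sketch (assuming the mapping above, and a `@Kernel` class wrapper like the one shown earlier):

```kotlin
// Illustrative sketch only: Kotlin infix bitwise functions in a kernel body,
// expected to follow the mapping of the table above.
@Global
fun lowByteOrFlag(n: Int, flag: Int, x: IntArray, @Return y: IntArray) {
    val i: Int = blockIdx.x * blockDim.x + threadIdx.x
    if (i < n) y[i] = (x[i] and 255) or flag    // expected: (x[i] & 255) | flag
}
```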
Supported control flow:

- `while`
- `if`

`for` is explicitly not supported, as the syntaxes are very different. `while` will do the job just fine, as sketched below.
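A sketch (not from the sample project) of a kernel body using `while` where one might reach for `for`:

```kotlin
// Illustrative sketch only: a while loop with an explicit counter replaces for.
// This @Global function would live in a @Kernel class, like SaxpySample above.
@Global
fun rowSum(rows: Int, width: Int, x: FloatArray, @Return y: FloatArray) {
    val row: Int = blockIdx.x * blockDim.x + threadIdx.x
    if (row < rows) {
        var acc: Float = 0.0f
        var col: Int = 0
        while (col < width) {
            acc = acc + x[row * width + col]
            col++
        }
        y[row] = acc
    }
}
```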
The C matrix notations `int foo[][]` and `int ** foo` are supported with nested arrays (`val foo: Array<IntArray>`), but only inside the kernel. Passing such arguments via the wrapper is not supported.
A lot... ʘ︵ʘ
- No conversion for `Char`s and `CharArray`s.
- Don't use a variable name that is valid in Kotlin but is a C++ keyword, such as `extern`, `bool`, `unsigned`, ...
- Names are not resolved. Use `threadIdx.x`, not `KudaContext.threadIdx.x`.
- Kotlin types are converted by name (`java.lang.Class.getSimpleName`). A class named `BoolArray` will be translated to `bool *`, whichever package it comes from.
- `val b = true` will not work. Use explicit types: `val b: Boolean = true`.
- No support for `for (x in xs) { ... }`.
None so far.
Not tested
Kotlin forbids the reassignment of function parameters. Either redeclare the variable, or use a 1-element array (see the sketch below).
Limited to special cases of binary operators.
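A sketch of the parameter-reassignment workaround mentioned above (illustrative only, not from the sample project):

```kotlin
// Illustrative sketch only: kernel parameters are read-only in Kotlin,
// so the value is redeclared into a local variable before being modified.
@Global
fun clamp(n: Int, limit: Float, x: FloatArray, @Return y: FloatArray) {
    val i: Int = blockIdx.x * blockDim.x + threadIdx.x
    if (i < n) {
        // Cannot write `limit = 0.0f`; shadow it with a local instead.
        var bound: Float = limit
        if (bound < 0.0f) bound = 0.0f
        if (x[i] > bound) y[i] = bound
        if (x[i] <= bound) y[i] = x[i]
    }
}
```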
Lots of them in the code! This is a section for TODOs which are not bound to a specific code location.

- Grab all the nVidia doc and try their samples.
- Formalize the code generator `kuda` task as a gradle plugin. Especially check for path validity; the rest should be handled by the libs.
- Propose placeholders for all the Cuda functions.
- Map C structs to Kotlin data classes.
```kotlin
val i: Long = 1
val j: Int = i.toInt()
```

should be translated as

```cpp
long i = 1;
int j = (int) i;
```

Same for `toFloat()`, `toDouble()`, ...
In Kotlin/Java, `int`, `double`, ... have default initial values. Also initialize these values in the generated C.
Write your kernels in C, with the true Cuda API, and call them from the JVM.

Kuda is not translating from bytecode to Cuda; it is source to source. For a bytecode-to-kernel approach, you may have a look at aparapi, which provides such a mechanism for OpenCL.
https://github.com/nativelibs4java/JavaCL
Attempting to provide the basic Cuda math operations turned out to be much harder than anticipated. All the functions are declared in CPP headers. These headers use non-trivial templates and macros, which makes parsing them either incomplete or very hard with the tools/knowledge I have (ANTLR, beginner CPP experience).

To anyone who may want to do something similar to this project and wants to go further: have a look at CPP header parsing, the way variables are passed as pointers, and how functions are declared for both the device and the host. There are challenges to overcome in these areas to make Kotlin a real alternative to writing C++ for Cuda.
[^1]: The size of a Boolean is JVM implementation dependent and JCuda doesn't offer a way to get a boolean's size nor a pointer to a boolean array. As a workaround, use any of the integer types.