🐛 build cuda_codegen of densenet161-fp16.onnx failed
I can now generate CUDA code for densenet161-fp32 and densenet161-fp16 and get correct output, but for fp16 the build fails with this message:
error: more than one operator "+" matches these operands:
built-in operator "arithmetic + arithmetic"
function "operator+(const __half &, const __half &)"
operand types are: double + half
According to my research, this is caused by CUDA not implementing operator overloads that mix the half datatype with other arithmetic types (here double + half), as in this generated kernel:
extern "C" __launch_bounds__(49) __global__ void BatchNormInference_half_half_half_half_half_half_cuda_BatchNormInference_1049(half* input0, half* input1, half* input2, half* input3, half* input4, half* output0)
{
    const int st = blockIdx.x * 7 * 7;
    const int c_id = blockIdx.x % 736;
    #pragma unroll 1
    for (int i = threadIdx.x; i < 7 * 7; i += blockDim.x)
    {
        output0[st + i] = (input1[c_id] + (input0[c_id] * (input2[st + i] - input3[c_id]) / sqrtf(1e-05 + input4[c_id])));
    }
}
It worked when I restructured the expression as below:
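One way such a restructuring can look is sketched here. It is an illustration only, not necessarily the exact change made in the generator: the kernel name `BatchNormInference_half_sketch` and the host helper `max_abs_error_of_sketch` are hypothetical. The idea is to widen every half operand to float with `__half2float`, use a float literal (`1e-05f`) for epsilon, and narrow the result once with `__float2half`, so the compiler never has to resolve double + half.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cmath>

// Hypothetical variant of the generated kernel: same arithmetic, but all
// math is done in float and converted back to half exactly once.
extern "C" __launch_bounds__(49) __global__ void BatchNormInference_half_sketch(
    const half* input0, const half* input1, const half* input2,
    const half* input3, const half* input4, half* output0)
{
    const int st = blockIdx.x * 7 * 7;
    const int c_id = blockIdx.x % 736;
    // Per-channel values, widened to float up front.
    const float in0 = __half2float(input0[c_id]);
    const float in1 = __half2float(input1[c_id]);
    const float in3 = __half2float(input3[c_id]);
    const float in4 = __half2float(input4[c_id]);
    #pragma unroll 1
    for (int i = threadIdx.x; i < 7 * 7; i += blockDim.x)
    {
        const float in2 = __half2float(input2[st + i]);
        // 1e-05f is a float literal, so this is plain float arithmetic.
        output0[st + i] = __float2half(in1 + in0 * (in2 - in3) / sqrtf(1e-05f + in4));
    }
}

// Hypothetical host helper: runs two blocks on small test data and returns
// the largest absolute difference from a float reference computed on the host.
float max_abs_error_of_sketch()
{
    const int blocks = 2, spatial = 7 * 7, n = blocks * spatial;
    half *in0, *in1, *in2, *in3, *in4, *out;
    cudaMallocManaged(&in0, blocks * sizeof(half));
    cudaMallocManaged(&in1, blocks * sizeof(half));
    cudaMallocManaged(&in2, n * sizeof(half));
    cudaMallocManaged(&in3, blocks * sizeof(half));
    cudaMallocManaged(&in4, blocks * sizeof(half));
    cudaMallocManaged(&out, n * sizeof(half));
    for (int c = 0; c < blocks; ++c)
    {
        in0[c] = __float2half(1.5f);
        in1[c] = __float2half(0.25f);
        in3[c] = __float2half(0.5f);
        in4[c] = __float2half(4.0f);
    }
    for (int i = 0; i < n; ++i)
        in2[i] = __float2half(0.125f * (i % 8));

    BatchNormInference_half_sketch<<<blocks, spatial>>>(in0, in1, in2, in3, in4, out);
    cudaDeviceSynchronize();

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i)
    {
        const float x = 0.125f * (i % 8);
        const float ref = 0.25f + 1.5f * (x - 0.5f) / sqrtf(1e-05f + 4.0f);
        max_err = fmaxf(max_err, fabsf(ref - __half2float(out[i])));
    }
    cudaFree(in0); cudaFree(in1); cudaFree(in2);
    cudaFree(in3); cudaFree(in4); cudaFree(out);
    return max_err;
}
```

Computing in float and rounding once at the end also avoids accumulating half-precision rounding in the intermediate steps, at the cost of a few conversion instructions per element.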
Currently I have rewritten cuda::BatchNormNCHW::emit_function_body() to work around this problem, but I think there is another solution: following the cutlass code, implement operator overloads for the half datatype, for example:
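A minimal sketch of what such overloads could look like follows. The definitions are hypothetical: they are not copied from cutlass, `cuda_fp16.h`, or the code generator, and the kernel `stddev_sketch` and helper `stddev_of` exist only for illustration. The error message lists two candidates for double + half (the built-in arithmetic `+` and `operator+(const __half &, const __half &)`), each needing a conversion on one operand. An overload whose parameters match double and half exactly is a better match than either, which removes the ambiguity.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cmath>

// Hypothetical mixed-type overloads. Both operands match exactly, so
// overload resolution picks these over the two candidates in the error.
// The result is float so that it can feed sqrtf() directly.
__host__ __device__ inline float operator+(double a, const half& b)
{
    return static_cast<float>(a) + __half2float(b);
}

__host__ __device__ inline float operator+(const half& a, double b)
{
    return __half2float(a) + static_cast<float>(b);
}

// Tiny kernel using the same sub-expression as the failing line.
__global__ void stddev_sketch(const half* variance, float* out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = sqrtf(1e-05 + variance[i]);  // double + half: resolves to the overload above
}

// Hypothetical host helper: evaluates sqrtf(1e-05 + v) on the device for one value.
float stddev_of(float variance_value)
{
    half* var;
    float* out;
    cudaMallocManaged(&var, sizeof(half));
    cudaMallocManaged(&out, sizeof(float));
    *var = __float2half(variance_value);
    stddev_sketch<<<1, 1>>>(var, out, 1);
    cudaDeviceSynchronize();
    const float result = *out;
    cudaFree(var);
    cudaFree(out);
    return result;
}
```

With this approach the emitted kernel text can stay unchanged, but every mixed pair the generator may emit (other operators, float operands) would need its own overload, so the set above is only a starting point.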
I'm now trying to figure it out.