All GpuConvOp kernels should allow non-c_contiguous memory access.
Use templates to implement this, since supporting non-contiguous access makes a kernel use more registers.
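
A minimal sketch of the template idea, written as host-side C++ so it can be compiled without a GPU. Nothing here is taken from the GpuConvOp source: load_elem, sum_image and sum_image_dispatch are hypothetical names, and strides are assumed to be counted in elements rather than bytes. The point is that the contiguous instantiation never touches the stride arguments, so the compiler can drop them, while the strided instantiation pays for the extra index arithmetic only when it is needed. The same pattern should carry over to a `__global__` kernel templated on a bool.

```cpp
#include <cstddef>

// Hypothetical helper (not from GpuConvOp): read element (row, col) of a 2D image.
// Strides are assumed to be in elements, not bytes.
template <bool c_contiguous>
inline float load_elem(const float* data, int row, int col, int ncols,
                       int stride_row, int stride_col) {
    if (c_contiguous) {
        // Strides are implied by the shape, so the stride arguments are unused
        // in this instantiation and can be optimized away.
        return data[row * ncols + col];
    } else {
        // General case: arbitrary strides cost extra index arithmetic.
        return data[row * stride_row + col * stride_col];
    }
}

// Stand-in for a kernel body: sums an nrows x ncols view of data.
template <bool c_contiguous>
float sum_image(const float* data, int nrows, int ncols,
                int stride_row, int stride_col) {
    float acc = 0.0f;
    for (int r = 0; r < nrows; ++r) {
        for (int c = 0; c < ncols; ++c) {
            acc += load_elem<c_contiguous>(data, r, c, ncols,
                                           stride_row, stride_col);
        }
    }
    return acc;
}

// Choose the instantiation at launch time from the actual strides.
float sum_image_dispatch(const float* data, int nrows, int ncols,
                         int stride_row, int stride_col) {
    const bool contiguous = (stride_col == 1 && stride_row == ncols);
    return contiguous
        ? sum_image<true>(data, nrows, ncols, stride_row, stride_col)
        : sum_image<false>(data, nrows, ncols, stride_row, stride_col);
}
```

With this layout, callers that already pass c_contiguous data keep the cheaper instantiation, and only strided inputs take the general path.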
TODO: conv_patch, conv_full_patch_split, conv_full_patch
DONE: conv_patch_split_stack (while it was merged into conv_patch_stack)