unflip filters to be more c_contiguous
In back-propagation we flip the filters. We could instead unflip the ndarray
once and not flip it when we load it into shared memory. That way we would be
able to use the c_contiguous version of the kernel more often. Since that
version uses fewer registers, we would get a speed-up whenever this allows a
higher occupancy.
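
To illustrate the contiguity point, here is a minimal NumPy sketch (not the actual GPU code). The filter shape and the variable names are assumptions made up for the example. It shows that a flip done by slicing gives a view that is not c_contiguous, while copying the flipped data once gives an array that is.

```python
import numpy as np

# Hypothetical filter bank: (n_filters, stack_size, rows, cols).
filters = np.arange(2 * 3 * 5 * 5, dtype=np.float32).reshape(2, 3, 5, 5)

# Flipping both spatial axes by slicing returns a view with negative
# strides, so the result is no longer C-contiguous.
flipped_view = filters[:, :, ::-1, ::-1]

# Copying the flipped data once gives a C-contiguous array with the
# same values, which could then be read without any further flip.
flipped_copy = np.ascontiguousarray(flipped_view)

print(filters.flags['C_CONTIGUOUS'])
print(flipped_view.flags['C_CONTIGUOUS'])
print(flipped_copy.flags['C_CONTIGUOUS'])
```

The copy costs one extra pass over the filters, so it only pays off if the contiguous kernel is enough faster to make up for it.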
on 2010-01-26 15:08 *
By nouiz
Assigned to changed from none to -none-
Component changed from administrative stuff to code
Priority changed from Highest (1) to Normal (3)
TODO:
-conv_rows_stack{2,3} (21% for test_lenet_64, 12% for test_lenet_108): are they limited by registers?
-conv_patch_stack (32% for test_lenet_108): is it limited by registers?
DONE: conv_patch_stack_reduce, conv_full_patch_stack_padded
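
One rough way to check the "limited by registers?" question is a back-of-the-envelope occupancy estimate. The sketch below is only an approximation: the function names are hypothetical, the per-multiprocessor limits passed in are assumed example values (the real ones depend on the GPU), and it ignores shared memory usage and register allocation granularity.

```python
def blocks_per_sm(regs_per_thread, threads_per_block,
                  regs_per_sm, max_threads_per_sm, max_blocks_per_sm):
    """Resident blocks per multiprocessor, counting only registers,
    the thread limit and the block limit."""
    by_regs = regs_per_sm // (regs_per_thread * threads_per_block)
    by_threads = max_threads_per_sm // threads_per_block
    return min(by_regs, by_threads, max_blocks_per_sm)


def occupancy(regs_per_thread, threads_per_block,
              regs_per_sm, max_threads_per_sm, max_blocks_per_sm):
    """Fraction of the multiprocessor's thread slots that are in use."""
    blocks = blocks_per_sm(regs_per_thread, threads_per_block,
                           regs_per_sm, max_threads_per_sm, max_blocks_per_sm)
    return blocks * threads_per_block / max_threads_per_sm


# Assumed example limits; replace them with the values for the real device.
limits = dict(regs_per_sm=16384, max_threads_per_sm=1024, max_blocks_per_sm=8)

# Hypothetical register counts for two versions of the same kernel.
occ_more_regs = occupancy(20, 256, **limits)
occ_fewer_regs = occupancy(16, 256, **limits)
print(occ_more_regs, occ_fewer_regs)
```

If lowering the register count of a kernel does not change this estimate, registers are probably not what limits it.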