finish the new convolution with FFT
left to do:
2010/12/10 Josh Bleecher Snyder :
> (1) Commenting out and then replacing different parts of the
> calculations suggest that, for the dimensions I looked at, 1/3 of the
> time is spent in the inverse fft, 1/3 of the time in
> add_across_images_and_normalize (which you suggest is optimizable),
> 1/12 the time in the other parts and 1/4 the time in...something
> mysterious that is left over after you comment out
> pad_images_and_kernels, elementwise_image_kernel_multiply,
> add_across_images_and_normalize, and both ffts.
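The stages named in the profile above (padding, the two FFTs, the elementwise multiply, and the sum across input channels with normalization) can be sketched in NumPy. This is only a reference sketch of the technique, not the actual op's code; the function name and the (batch, channels, height, width) shape convention are assumptions:

```python
import numpy as np

def fft_conv2d_full(images, kernels):
    """'Full' 2-D convolution via FFT (hypothetical reference sketch).

    Assumed shapes:
      images:  (batch, in_channels, H, W)
      kernels: (out_channels, in_channels, kH, kW)
    Returns:   (batch, out_channels, H + kH - 1, W + kW - 1)
    """
    b, c, H, W = images.shape
    n, c2, kH, kW = kernels.shape
    assert c == c2
    outH, outW = H + kH - 1, W + kW - 1

    # pad_images_and_kernels: zero-pad both operands to the full output
    # size, done implicitly here via the `s` argument of rfft2
    fimg = np.fft.rfft2(images, s=(outH, outW))    # (b, c, outH, fW)
    fker = np.fft.rfft2(kernels, s=(outH, outW))   # (n, c, outH, fW)

    # elementwise_image_kernel_multiply + add_across_images:
    # pointwise product in the frequency domain, summed over input channels
    fout = np.einsum('bcij,ncij->bnij', fimg, fker)

    # inverse FFT; irfft2 already applies the 1/N normalization
    return np.fft.irfft2(fout, s=(outH, outW))
```

The elementwise multiply and the channel sum are fused into one `einsum` here, which is one way the add/normalize stage the profile flags as optimizable could be cheapened in the NumPy setting.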
- Handling strides. There's no particular reason the input must be contiguous.
- Setup -- how to handle plans, scratch space, etc.
- In the support C code, put the plan into a shared variable to bypass the GC, and keep only one plan per apply node (now that it is shared, we can recreate it each time).
- Optimization. This includes both optimizing the FFT version and, as you mentioned, deciding when to use this version and when to use the non-FFT version.
- Preventing usage when compiled against cuFFT < 3.2.
- Testing.
- Perhaps some code cleanup once all the dust has settled.
- I have seen segfaults, but I don't remember whether they were fixed in the end.
- Why is the C code much faster than the Python op version?
- valid/same convolution?
  - 1st version using subtensor (slice the full output)
  - 2nd version doing the slicing inside the op to save computation