finish the new convolution with FFT
left to do:
2010/12/10 Josh Bleecher Snyder :
> (1) Commenting out and then replacing different parts of the
> calculations suggest that, for the dimensions I looked at, 1/3 of the
> time is spent in the inverse fft, 1/3 of the time in
> add_across_images_and_normalize (which you suggest is optimizable),
> 1/12 the time in the other parts and 1/4 the time in...something
> mysterious that is left over after you comment out
> pad_images_and_kernels, elementwise_image_kernel_multiply,
> add_across_images_and_normalize, and both ffts.
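The stages named in the profile above (padding, the two FFTs, the elementwise multiply, and the sum across input channels with normalization) can be sketched in NumPy. This is only a reference sketch of the technique, not the actual op's code; the function name and the (batch, channels, height, width) shape convention are assumptions:

```python
import numpy as np

def fft_conv2d_full(images, kernels):
    """'Full' 2-D convolution via FFT (hypothetical reference sketch).

    Assumed shapes:
      images:  (batch, in_channels, H, W)
      kernels: (out_channels, in_channels, kH, kW)
    Returns:   (batch, out_channels, H + kH - 1, W + kW - 1)
    """
    b, c, H, W = images.shape
    n, c2, kH, kW = kernels.shape
    assert c == c2
    outH, outW = H + kH - 1, W + kW - 1

    # pad_images_and_kernels: zero-pad both operands to the full output
    # size, done implicitly here via the `s` argument of rfft2
    fimg = np.fft.rfft2(images, s=(outH, outW))    # (b, c, outH, fW)
    fker = np.fft.rfft2(kernels, s=(outH, outW))   # (n, c, outH, fW)

    # elementwise_image_kernel_multiply + add_across_images:
    # pointwise product in the frequency domain, summed over input channels
    fout = np.einsum('bcij,ncij->bnij', fimg, fker)

    # inverse FFT; irfft2 already applies the 1/N normalization
    return np.fft.irfft2(fout, s=(outH, outW))
```

The elementwise multiply and the channel sum are fused into one `einsum` here, which is one way the add/normalize stage the profile flags as optimizable could be cheapened in the NumPy setting.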
- Handling strides. There's no particular reason the input must be contiguous.
- Setup -- how to handle plans, scratch space, etc.
- In the support C code, put the plan into a shared variable to bypass the GC, and keep only one plan per apply node (now that it is shared, we can recreate it each time).
- Optimization. This includes both optimizing the FFT version and, as you mentioned, deciding when to use this version and when to use the non-FFT version.
- Preventing usage when compiled against cuFFT < 3.2.
- Testing.
- Perhaps some code cleanup once all the dust has settled.
- I have seen segfaults, but I don't remember whether they were fixed in the end.
- Why is the C code much faster than the Python op version?
- valid/same convolution?
  - 1st version using subtensor (slice the full output)
  - 2nd version doing the slicing inside the op to save computation