Should we unroll the full patch multiplication when the image and kernel fit in shared memory?

Maybe we can get a good speedup by unrolling the full patch
multiplication instead of only the multiplication by a row. Would this be
faster? What is the impact on the number of registers?
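To make the two options concrete, here is a minimal CPU sketch in C++17. It is not Theano's CUDA kernel, and every name in it (`row_dot`, `patch_row_unrolled`, `patch_full_unrolled`, `valid_corr`) is hypothetical. It assumes the kernel width (and, for the full version, the kernel height) is known at compile time, and computes a valid cross-correlation with kernel flipping omitted for brevity. The "row" version unrolls only the KW taps of one kernel row and loops over rows at run time. The "full patch" version unrolls all KH*KW taps with a fold expression.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: one kernel row times one image row segment,
// fully unrolled over the column indices C... by a fold expression.
template <std::size_t... C>
float row_dot(const float* img_row, const float* ker_row,
              std::index_sequence<C...>) {
    return ((img_row[C] * ker_row[C]) + ... + 0.0f);
}

// "Row" unrolling: columns are unrolled, rows are a run-time loop.
template <std::size_t KW>
float patch_row_unrolled(const float* img, std::size_t img_w,
                         const float* ker, std::size_t kh) {
    float acc = 0.0f;
    for (std::size_t r = 0; r < kh; ++r)
        acc += row_dot(img + r * img_w, ker + r * KW,
                       std::make_index_sequence<KW>{});
    return acc;
}

// "Full patch" unrolling: index I enumerates all KH*KW taps,
// row = I / KW, column = I % KW, all resolved at compile time.
template <std::size_t KW, std::size_t... I>
float patch_full_impl(const float* img, std::size_t img_w, const float* ker,
                      std::index_sequence<I...>) {
    return ((img[(I / KW) * img_w + I % KW] * ker[I]) + ... + 0.0f);
}

template <std::size_t KH, std::size_t KW>
float patch_full_unrolled(const float* img, std::size_t img_w,
                          const float* ker) {
    return patch_full_impl<KW>(img, img_w, ker,
                               std::make_index_sequence<KH * KW>{});
}

// Valid cross-correlation of an ih x iw image with a KH x KW kernel,
// using either unrolling strategy for each output pixel.
template <std::size_t KH, std::size_t KW, bool FULL>
std::vector<float> valid_corr(const std::vector<float>& img, std::size_t ih,
                              std::size_t iw,
                              const std::array<float, KH * KW>& ker) {
    assert(ih >= KH && iw >= KW && img.size() == ih * iw);
    const std::size_t oh = ih - KH + 1, ow = iw - KW + 1;
    std::vector<float> out(oh * ow);
    for (std::size_t y = 0; y < oh; ++y)
        for (std::size_t x = 0; x < ow; ++x) {
            const float* p = img.data() + y * iw + x;
            if constexpr (FULL)
                out[y * ow + x] = patch_full_unrolled<KH, KW>(p, iw, ker.data());
            else
                out[y * ow + x] = patch_row_unrolled<KW>(p, iw, ker.data(), KH);
        }
    return out;
}
```

Both variants compute the same result. For a 4x4 image holding 0..15 and the 2x2 kernel {1, 2, 3, 4}, the first output is 0*1 + 1*2 + 4*3 + 5*4 = 34. The trade-off the ticket asks about shows up in code generation. Full unrolling removes the row loop and its index arithmetic, but the compiler may keep more kernel taps and partial products live at once, so register use tends to grow with KH*KW. On a GPU that can reduce occupancy, so the speedup is not guaranteed. Compiling both CUDA variants with `nvcc -Xptxas -v` and comparing the reported register counts and timings would answer the question for concrete kernel sizes.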

Followers: none, nouiz, josharian

Followers will receive email updates about new ticket activity or emails sent to theano+460@tickets.assembla.com