parallel compilation for multi-cpu machines
All of the make_thunk() calls could be run in parallel using threads. Even Python threads are good enough here: the actual work happens in separate processes (gcc or nvcc), and a thread that is waiting on a child process does not hold the GIL. This would make compilation much faster on multi-CPU machines.
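Here is a minimal sketch of the idea. It assumes that each make_thunk() call can be wrapped in an independent function that is safe to call from several threads at once. It does not check whether the existing compilation cache tolerates concurrent callers. `compile_module` and `compile_all` are hypothetical names, not existing functions. A child Python process stands in for gcc/nvcc so the example runs without a compiler.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def compile_module(source_name):
    """Hypothetical stand-in for one make_thunk() call.

    The real step would invoke gcc or nvcc; here a child Python process
    plays that role. While subprocess.run() waits for the child, the GIL
    is released, so other threads can launch their own compiles.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"print('compiled {source_name}')"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the "compiler" fails
    )
    return result.stdout.strip()


def compile_all(source_names, max_workers=None):
    """Run every compile job on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compile_module, source_names))


if __name__ == "__main__":
    for line in compile_all(["node_a", "node_b", "node_c"], max_workers=3):
        print(line)
```

`ThreadPoolExecutor.map` returns results in the same order as its inputs, so each compiled module can be matched back to the node that requested it. On a multi-CPU machine, `max_workers` could reasonably be set to `os.cpu_count()`.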