minimize memory usage of join
This is especially useful on the GPU.
We could optimize the memory usage of join by preallocating its output and
giving each op that produces an input the matching part of that buffer to use as its own output.
Consider this case:
join(var1,var2,...)
We can determine the size to preallocate with infer_shape if each var
has an owner that implements infer_shape, otherwise with the shape object.
Then we can use subtensor to get the right part for each var.
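To make the size computation concrete, here is a minimal sketch in plain Python. It is not Theano code: `plan_join` is a hypothetical helper, and it assumes the input shapes are already known as tuples (in Theano they would come from infer_shape or the shape object). It returns the shape to preallocate and the range on the join axis that each var would get as a subtensor.

```python
def plan_join(shapes, axis=0):
    """Return the joined shape and each input's (start, stop) on `axis`.

    Hypothetical helper for illustration; not part of Theano.
    """
    if not shapes:
        raise ValueError("need at least one input shape")
    first = tuple(shapes[0])
    if not 0 <= axis < len(first):
        raise ValueError("axis out of range")
    spans = []
    offset = 0
    for shape in shapes:
        shape = tuple(shape)
        # Every dimension except the join axis must agree.
        if len(shape) != len(first) or any(
            a != b for i, (a, b) in enumerate(zip(shape, first)) if i != axis
        ):
            raise ValueError("incompatible shapes: %r vs %r" % (shape, first))
        spans.append((offset, offset + shape[axis]))
        offset += shape[axis]
    # The join axis grows to the sum of the inputs; other axes are unchanged.
    out_shape = first[:axis] + (offset,) + first[axis + 1:]
    return out_shape, spans


out_shape, spans = plan_join([(2, 3), (4, 3), (1, 3)], axis=0)
print(out_shape)  # (7, 3)
print(spans)      # [(0, 2), (2, 6), (6, 7)]
```

Each `(start, stop)` pair is the slice of the preallocated output that the corresponding var would write into.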
1. Use set_subtensor to put the output into the right part? Does this do what we want?
2. Rely on an inplace subtensor followed by an inplace computation of the
var? This doesn't work for vars that can't be computed inplace (conv, ...).
3. We can make 2 ops: a prejoin and a postjoin. The prejoin preallocates the output and puts the right part into the output_storage of each node that should produce it. (What about inplace ops? Disable them for those nodes?) The postjoin then checks that the output of each producing node is the data we gave it in the output_storage. If it is not, the op didn't use the output it was given; in that case, copy its output into the memory preallocated by prejoin. What if some ops don't implement infer_shape? We won't be able to preallocate for them without memory overhead, but we could allow only those ops to be executed before the memory is preallocated, and copy their outputs in the prejoin op.
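The prejoin/postjoin idea in option 3 can be sketched outside Theano, using Python's standard `array` and `memoryview` as stand-ins for tensors and their views. Everything here is an assumption for illustration: `prejoin`, `postjoin`, `fill_in_place` and `allocate_fresh` are hypothetical names, the data is 1-d, and the identity check plays the role of "is the output still the storage we gave it".

```python
from array import array


def prejoin(lengths):
    """Preallocate the joined buffer and return one writable view per input."""
    buf = array("d", [0.0] * sum(lengths))
    whole = memoryview(buf)
    views, start = [], 0
    for n in lengths:
        views.append(whole[start:start + n])  # a view, not a copy
        start += n
    return buf, views


def postjoin(buf, views, produced):
    """Copy back any result that was not written into its preallocated view."""
    copies = 0
    for view, result in zip(views, produced):
        if result is view:
            continue  # the producer used the storage it was given
        view[:] = result  # fallback: one extra copy into the joined buffer
        copies += 1
    return buf, copies


def fill_in_place(out, values):
    # Hypothetical producer that writes into the storage it was given.
    for i, v in enumerate(values):
        out[i] = v
    return out


def allocate_fresh(out, values):
    # Hypothetical producer that ignores `out`, like an op that cannot
    # reuse the storage it is handed.
    return array("d", values)


buf, views = prejoin([2, 3])
produced = [
    fill_in_place(views[0], [1.0, 2.0]),
    allocate_fresh(views[1], [3.0, 4.0, 5.0]),
]
joined, copies = postjoin(buf, views, produced)
print(list(joined), copies)  # [1.0, 2.0, 3.0, 4.0, 5.0] 1
```

Only the producer that ignored its preallocated view costs an extra copy; the other one writes straight into the joined output, which is the memory saving this issue is after.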