Meta-ticket: Memory management
This is a meta-ticket. It should be the parent of all tickets related to memory management in Theano.
We will describe the high-level relationships between them here.
Here are the tickets included:
216: asks for many generic improvements (Done)
94: memory handling for efficient resizing
695: make DebugMode warn about ops that don't reuse memory from previous calls
675: tell Theano which applies have a preallocated output
795: garbage collection for ops with c_code (Done)
509: minimize the memory usage of Join
797: Theano function to manually trigger garbage collection on functions while they are not running
798: make the toposort also lower the maximum memory used
Here is the plan:
0) 695; should be quick
1) 674; finish the automatic tests in DebugMode
2) 674, 795; make each thunk stop keeping a pointer to its output in C and reread the output on each call, as is already done for the inputs
3) 509 (needs 674 finished)
4) 675; does not need 674 or 509 finished, as Scan already preallocates the outputs of each tap, but it will become more useful once we preallocate more things
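To make the preallocated-output idea concrete, here is a minimal sketch (not Theano's actual thunk API; `add_thunk` and the bare-list `output_storage` convention are assumptions for illustration) of an op that reuses a previously allocated output buffer instead of allocating a fresh one on every call:

```python
import numpy as np

def add_thunk(inputs, output_storage):
    """Sketch of a thunk that reuses a preallocated output.

    output_storage is a one-element list whose cell may still hold
    the ndarray produced by a previous call; it is reused when shape
    and dtype match, otherwise a new buffer is allocated.
    (Hypothetical helper, not Theano's real interface.)
    """
    a, b = inputs
    out = output_storage[0]
    if out is None or out.shape != a.shape or out.dtype != a.dtype:
        out = np.empty_like(a)  # allocate only when reuse is impossible
        output_storage[0] = out
    np.add(a, b, out=out)  # write in place into the (possibly reused) buffer
    return out

# Usage: the second call reuses the buffer allocated by the first.
storage = [None]
x = np.ones(3)
first = add_thunk((x, x), storage)
second = add_thunk((x, x), storage)
assert first is second  # same buffer was reused
```

This is the behavior DebugMode would check for in plan step 1: calling the thunk twice must not silently allocate a second output when the old one is still usable.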
Should we make a memory pool for CudaNdarray? We could reuse an existing malloc algorithm and include it in Theano if its license allows it. I have seen someone do this for the GPU with jemalloc. Check for other implementations too.
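The core idea of such a pool can be sketched in a few lines: recycle freed blocks by size instead of returning them to the system allocator. This is only a toy free-list in host memory (a real pool for CudaNdarray would bin sizes and manage device allocations, as jemalloc-style allocators do); the `MemoryPool` class name is an assumption:

```python
class MemoryPool:
    """Toy free-list memory pool.

    Freed blocks are kept in per-size free lists and handed back on
    the next allocation of the same size, avoiding a round trip to
    the underlying allocator. Sketch only, not a real GPU pool.
    """

    def __init__(self):
        self._free = {}  # size -> list of recycled blocks

    def alloc(self, size):
        blocks = self._free.get(size)
        if blocks:
            return blocks.pop()  # reuse a recycled block of this size
        return bytearray(size)  # fall back to a fresh allocation

    def free(self, block):
        # Keep the block for later reuse instead of releasing it.
        self._free.setdefault(len(block), []).append(block)

# Usage: a freed block is handed back on the next same-size request.
pool = MemoryPool()
a = pool.alloc(1024)
pool.free(a)
b = pool.alloc(1024)
assert a is b  # the freed block was recycled, not reallocated
```

For GPU buffers this pays off because device malloc/free calls are comparatively expensive and can synchronize the device, while popping from a free list is cheap.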