Followers
Klaas Bosteels (Assigned To) , Daniel Lescohier
Activity on Mar 19, 2009 @ 09:39am UTC, by Klaas Bosteels
Status changed from New to Invalid

The reason why we use RLIMIT_AS is because RLIMIT_DATA doesn't seem to work in practice:

    $ python
    Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
    [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import resource
    >>> resource.setrlimit(resource.RLIMIT_AS, (10000, 10000))
    >>> try: manyints = range(100000)
    ... except MemoryError: print "memerror"
    ...
    memerror
    >>>

    $ python
    Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
    [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import resource
    >>> resource.setrlimit(resource.RLIMIT_DATA, (10000, 10000))
    >>> try: manyints = range(100000)
    ... except MemoryError: print "memerror"
    ...
    >>> len(manyints)
    100000
    >>>

Feel free to reopen if you know a way to make RLIMIT_DATA work, but for now I'm closing this ticket...
RLIMIT_AS
bq. The maximum size of the process's virtual memory (address space) in bytes. This limit affects calls to brk(2), mmap(2) and mremap(2),
which fail with the error ENOMEM upon exceeding this limit. Also automatic stack expansion will fail (and generate a SIGSEGV that kills
the process when no alternate stack has been made available). Since the value is a long, on machines with a 32-bit long either this limit
is at most 2 GiB, or this resource is unlimited.
RLIMIT_DATA
bq. The maximum size of the process's data segment (initialized data, uninitialized data, and heap). This limit affects calls to brk() and
sbrk(), which fail with the error ENOMEM upon encountering the soft limit of this resource.
So, basically, I don't think we should include mmap in the memory limit. You already know the size of the files you're going to memory-map, so you'd program the job to mmap only files that fit in memory. Also, if you mmap data that you don't write to, that memory is shared by all the MapReduce jobs running on the system. So if one has configured 8 mappers on a machine, each mapping the same 2GB file, and each map job uses another 250MB of anonymous memory, the total is 2GB + 8*250MB = 4GB. Because of this sharing, anonymous memory and mmap memory limits really need to be specified separately.
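To make the arithmetic above concrete, here is a small sketch. The function names are made up for illustration, and 2GB is treated as 2000MB to match the round figures used above. It compares the memory actually needed when the mapped file is shared with what you'd get by naively adding up each process's address space, which is roughly what a per-process RLIMIT_AS accounts for.

```python
def total_with_shared_mmap_mb(shared_mmap_mb, anon_per_mapper_mb, mappers):
    """Memory needed when all mappers map the same read-only file.

    The mapped file is counted once; only anonymous memory is per mapper.
    """
    return shared_mmap_mb + mappers * anon_per_mapper_mb


def naive_per_process_sum_mb(shared_mmap_mb, anon_per_mapper_mb, mappers):
    """Sum of per-process address space, counting the mapping in every mapper."""
    return mappers * (shared_mmap_mb + anon_per_mapper_mb)


# Figures from the example: 8 mappers, a 2GB file, 250MB anonymous memory each.
shared_total = total_with_shared_mmap_mb(2000, 250, 8)
naive_total = naive_per_process_sum_mb(2000, 250, 8)

print(shared_total)  # 4000
print(naive_total)   # 18000
```

The gap between the two numbers is why a single limit covering both kinds of memory is hard to pick sensibly.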
If you access the mmapped file data read-only, the data will never be written out to swap, because the file itself acts as the backing store. Under memory pressure the kernel just discards the pages from RAM and re-reads them from the file later.
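For reference, a minimal sketch of what such a read-only mapping looks like from Python, using the standard `mmap` module. The helper name `read_only_view` and the temporary demo file are assumptions for illustration only; a real job would map its own data file.

```python
import mmap
import os
import tempfile


def read_only_view(path):
    """Map a whole file read-only; the file stays the backing store."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file. The mapping remains valid after
        # the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Hypothetical demo file standing in for a large lookup table.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"lookup table contents")
    demo_path = tmp.name

view = read_only_view(demo_path)
first_bytes = view[:6]

# Writes through a read-only mapping are rejected by the mmap module.
try:
    view[0:1] = b"X"
    write_rejected = False
except TypeError:
    write_rejected = True

view.close()
os.remove(demo_path)
```

Since nothing can be written through the mapping, its pages never need to be saved anywhere other than the file they came from.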