$Header: /u/moraes/src/mallocs/moraes/malloc/RCS/malloc.doc,v 1.12 1989/10/31 02:09:48 moraes Exp $ Yet another malloc() -------------------- Mark Moraes Standard calls It provides the standard calls we expect from any self-respecting malloc, viz. char * malloc(nbytes) unsigned int nbytes; which returns pointer to a contiguous memory block, at least nbytes bytes long, char * calloc(nelements, element_size) unsigned int nelements, element_size; which returns a pointer to a contiguous memory block, at least nelements * element_size bytes long, with all locations set to zero, char * realloc(ptr, nbytes) char *ptr; unsigned int nbytes; attempts to change the size of the previously malloc'ed block pointed to by ptr to nbytes, and failing that, returns a pointer to a new block nbytes long, after copying the contents of the old block to it. Realloc returns NULL if it fails, and DOES NOTHING WITH THE OLD BLOCK. (This is not defined; some reallocs may free the old block they were passed) void free(ptr) char *ptr; returns the previously malloc'ed block pointed to by ptr to the storage pool. It will do its best to keep storage as unfragmented as possible. void cfree(ptr) char *ptr; The same as free(), used to free calloc()ed blocks. There are a couple of additional functions that Sun malloc provides, and that some programs need (notably the X Windows server on Suns) char * memalign(alignment, nbytes) unsigned int alignment; unsigned int nbytes; This returns a pointer to a memory block at least nbytes long, guaranteeing that the block starting address is an even multiple of alignment. Alignment must be a power of 2. char * valloc(nbytes) unsigned int nbytes; This is the same as memalign(getpagesize(), nbytes). A frequently used function is to save a copy of a NULL-terminated character array (a C string) in storage malloc'ed exactly to fit it, for example when reading or parsing some input line from a large buffer. Newer systems provide the strdup() function to do just this. (This may not appear a complex funtion - using it eliminates the all-too-frequent error where you forget to add 1 to the length of the string to allow for the NULL terminator) char * strdup(s) char *s; strdup returns NULL if the malloc fails. Additional functions In addition to the usual functions, this malloc provides some debugging and profiling functions, which we discuss in more detail below. (NOTE: To use these special features, you have to include the header file "malloc.h" provided with this malloc. This header file also defines the C preprocessor symbol CSRIMALLOC. This allows anyone using any of the special features of this malloc to enclose any calls to the new features within #ifdef CSRIMALLOC, thus preserving code portability) Since this malloc library is usually installed as libmalloc.a, to use it instead of a regular system malloc, you need to specify something like -lmalloc on your link/load command line before -lc, if any. Make sure your program has at least one call to any one of malloc/free/calloc/realloc or the Unix loader may not link in this malloc, and will instead use the system one from libc, since it makes only one pass. Most of the debugging features will be available in a version of the malloc called malloc_d, which is what you should use for development and testing by specifying -lmalloc_d. For production programs, link with -lmalloc to get the fast version. Frequently, people forget to check the return value on a malloc() assuming that modern systems never run out of memory. Inevitably, Murphy ensures that some system will run out of memory, and a user will be faced with the illuminating error message "Segmentation violation - core dumped". Alas, when memory runs out, the core dump is likely to be large. This malloc provides an emalloc() function which exits if NULL is returned, with an out of memory message. Using this in place of malloc is advised if running out of memory is a fatal condition. (If not, either write your own emalloc() to recover gracefully, or check the value returned by malloc) char * emalloc(nbytes) unsigned nbytes; Similarly, a realloc which will die instead of returning NULL is erealloc(). char * erealloc(ptr, nbytes) unsigned nbytes; A similar function, like strdup() but not returning NULL is strsave(). char * strsave(s) char *s; Debugging support Alas, one of the more common, and unpleasant errors a C programmer can make is to accidentally exceed the bounds of an array, and write over the data that immediately follows (or less frequently, precedes) it in memory. Such "corruption" errors usually show up at a completely different place in the program from the place the actual overwriting occurs (A corollary to Murphy's Law suggests the error will appear in the last related module of the program), thus making it at least as hard to track down as the proverbial needle in the haystack. While there is little that we can do to help detect errors in static, automatic or global data, we can at least try to ensure the validity of the malloc()ed data. To get this support, the malloc() must be compiled with -DDEBUG - since this reduces performance somewhat, it is usually done in a separate object module from the one usually used in production code. This debugging malloc() makes sure all internal heap pointers relevant to an allocation (or free) are correct on every call to one of the allocation routines, and aborts if it detects any trouble, with a message starting with "assertion botched". This is useful because it causes the program to die nearer the trouble spot than would otherwise be the case, helping to pinpoint the error. This narrows down the location of the error to something less than the aforementioned haystack, but the problem is still somewhat large. int mal_verify(fullcheck) int fullcheck; lends a helping hand in such distressing situations. It runs through all allocated and free blocks, checking all heap pointers for validity. Since the heap pointers (and block size values) are at the beginning and end of allocated (and free) blocks, any overwriting off the end of a memory block usually results in the corruption of the heap size values, or the pointer values of the next block. If 'fullcheck' is non-zero, then additional checking of the contents of free blocks is done, to ensure that they haven't been written into after being freed. Calling mal_verify() with fullcheck as 1 is recommended. (More on this later) On detecting an error, mal_verify() aborts the program, on the not unreasonable grounds that such an error is A BAD THING and should be fixed immediately. A careful programmer will probably want to put calls to mal_verify() at frequent points in any code, as checkpoints to trap any stray overwriting or memory corruption that may take place. Since mal_verify() does nothing in a malloc compiled without -DDEBUG, it has no overhead in production code other than the procedure call. ("But I can't be overwriting any data between X and Y in my code" are famous last words) Instead of putting calls to mal_verify(), the programmer can set the debugging level of the allocator. void mal_debug(level) int level; Most debugging is done at level 1, which is the default. This only checks the internal pointers that are encountered during a malloc() or free(). Setting level to 2 with the mal_debug() call results in a complete check of the heap (i.e. a call to mal_verify(0) ) every time malloc(), realloc() or free() are called. (The other allocation routines call these three) This can be very slow if your program does a lot of debugging, but is worth turning on for a specific segment of the program where you suspect trouble. (The author uses this mode when writing and debugging a program initially, switches to normal debugging for alpha and beta test) Level 3 asks for mal_verify(1) on every malloc(), realloc() or free() which checks all free blocks' insides for any signs of someone writing to them. We recommend the use of a binary search technique for pinpointing the needle, er, the overwriting. This presupposes that malloc(), free(), or a previously (wisely) inserted mal_verify() has caused the program to die at some point with an "assertion botched" error. (If the programmer has not been using this malloc so far, or a variant compiled without -DDEBUG, then the only indication of a memory corruption error is utterly strange things happening to the program, possibly variable changing values when they're not supposed to, or mysterious deaths of the program in malloc(), free() or some such call) Insert a mal_verify() call (Referred to from now on as Checkpoint 1) at some point well before the error occurs, if necessary, at the first executable statement in the program. Insert another mal_verify call on the statement just before you suspect the error manifesting itself. (Checkpoint 2) (Note that when we say insert a mal_verify() call at some point "before" an error occurs, we are referring to a temporal location, i.e some piece of code that is executed by the program in the time before the error occurs. Physically, this may be in a different procedure, or a different file altogether.) Run the program. If Checkpoint 1 causes the program to die, then the error is trapped in between it and the start of the program. (case A) If Checkpoint 2 causes the program to die, then the error is trapped between it and Checkpoint 1, (case B) and if neither dies, the error is after Checkpoint 2, (case C) or is not an overwriting error at all, or is an error subtle enough to avoid the heap pointer checking. In the last case, we wish you luck... Case A: (The bug is before checkpoint 1) Move Checkpoint 2 to where checkpoint 1 presently is, and move checkpoint 1 further back. Run the program again. Case B: (The bug is between checkpoint 1 and checkpoint 2 - narrow the search area down further) This is the case we attempt to maintain. Having attained it, we promptly attempt to lose it again. (Why is this starting to sound like Zen...) To do so, we move either Checkpoint 1 or Checkpoint 2 closer to the other, and run the program again. Case C: (The bug is after checkpoint 2) Move Checkpoint 1 to where checkpoint 2 presently is, and move checkpoint 2 further ahead. Run the program again. The objective is to bring the two checkpoints as close to each other as is necessary to spot the bug. (Recognizing the bug is up to the programmer - loops exceeding the bound by one are a common problem causing this error - confusion with C's starting arrays at 0 unlike other languages which start them at 1 is another problem). Those familiar with numerical methods will see the similarity between this method and the binary search method for finding a root of an equation. (It also bears a resemblance to the binary search method on sorted data, but the resemblance is less striking) In a modular program, which has been well structured, placing such checkpoints is easy to do; simply start at the top level, narrow it down to some procedure call at that level, insert them at the entry and exit points of that procedure, narrow it down to some procedure call at this level, and recurse downward till the fault is detected. We noted earlier that some corruption bugs manifest themselves (Why is this starting to read like a ghostbusters' script...) as data values that change when they shouldn't. In this case, a simpler method to trace the bug is often to put a trace on that data location by setting a global pointer variable to point to it, and printing the value at the two checkpoints. The same search strategy can be employed. This has been found useful in at least one bug the author has encountered, which sneakily refused to corrupt the heap pointers, jumping over them straight into another block to do its dirty work. A vicious form of heap corruption is when someone frees a block of memory, but forgets to NULL or reset all pointers pointing to that block. At some point in the future, if that block is accessed, strange things can happen. If that block is still free, and in the heap, there is a chance of corrupting malloc's internal pointers and structures used to maintain the free list, in which case mal_verify() will detect the corruption and abort. Or the corruption may go into the middle of the block and go undetected. Even worse, the block or part of the block may have been allocated to the program for some other purpose, and the corruption may now be smashing data in another part of the program. This sort of corruption is insidious, and very hard to reproduce, let alone trace down. To help trace this down, when a block is freed, in debugging mode, the allocator scribbles a magic pattern all over it, thus making sure any data in there is likely to be wrong. Invoking mal_verify(1) will check every free block to make sure its contents are that magic pattern and abort if it detects corruption. Setting debug level to 3 with mal_debug() will force this sort of verification on every call to malloc(), realloc() or free(). Obviously, if the block gets allocated, and then corrupted, the malloc verification cannot detect it, since it has no idea what goes on inside allocated blocks. Advanced debugging tools. void mal_heapdump(fd) FILE *fd; If all else fails, the programmer may obtain a printout of the entire heap by calling mal_heapdump() with a file descriptor to an already open file (fopen(3)). This will print the start address of all blocks, allocated or free, and their sizes. These can be matched with pointer addresses in the program and used to trace heap corruption. If you use this call, you probably know how to do this. Large doses of intuition (and strong beverages) are recommended. mal_heapdump() performs the same checking that mal_verify(1) does, but does not abort unless the error makes it impossible to dump the heap further. A knowledge of the malloc internals is necessary to fully exploit this call's output. Profiling support This malloc() is faster than most other mallocs that I've seen - the search strategy is a first-free, which is not excitingly fast in theory, but, with small modifications turns out to be quite fast in practice; See Knuth for more details. (Theoretically, the time is of the same order as the size of the free list, so the free list is kept as small as possible by coalescing a block with adjacent freed blocks if possible, when it is freed) The free() is based on a boundary tags scheme, as described in Knuth. It is linear-time, for those who care about such things, and is a few statements of C code - 4 comparisons and a little pointer arithmetic. It also accesses memory locations next to the block only, so it has good virtual memory performance. (The malloc turns out to have good VM performance most of the time, but will occasionally scan through a few pages, and in worst case, through a lot of pages. If however, those pages are all being used, then the VM performance is likely to be influenced more by the program using the malloc than the malloc itself) Nonetheless, a program which calls malloc and free *very* frequently might be slow. In order to track down malloc usage, if compiled with -DPROFILEDSIZES, this malloc keeps count of the number of block sizes being used by the program. To print out these collected statistics, call void mal_statsdump(fd) FILE *fd; where fd is an already opened file descriptor, and it will print a list of block sizes, and the number of times a block of that size was allocated. This information provides some indication of the allocation profile of the calling program. When mal_statsdump() is called, it zeroes all the counters, so that you can check the allocation profile of specific segments of a program by calling mal_statsdump() repeatedly. If more detailed tracing information is needed, the malloc must be compiled with -DTRACE. This prints out the size of every allocation or free. This can be turned on or off using void mal_trace(flag) int flag; If flag is 0, tracing is turned off, (this is the default state), if it is non-zero, then tracing commences. The trace records are of the form: For a malloc, +
where is the number of bytes requested, is the number of bytes allocated (usually nbytes rounded up to a multiple of the word size, plus any slop if that is requested with mal_slopset), and
is the block returned. If the malloc results in the system being asked for more memory, via sbrk(), it prints sbrk where is the number of bytes the system was asked for. After the first such call, it will then print heapstart
where
is the For a free, -
where
is the address of the block being freed and is the malloc'ed size of the block. (the size requested by the user, rounded up to the word size with slop, if any) For a realloc, it may print out the same as as a free if we're shrinking and the part of the block was freed no-op if we're shrinking and the remaining part of the block was too small to be freed, it prints ++
where is the number of bytes requested, is the actual number of bytes in the block returned, and
is the address of the block returned, is the actual size of the free block after the one we're returning, which we grew into, is the address of the free block after the one we're returning. The free block information is internal to the malloc and is of little use to the user. the same as for a malloc and a free if the block ends up being copied to a new block. The trace records may be prefixed by the filename and linenumber of the call is the source if the file malloc.h was included and the C preprocessor symbol MALLOC_TRACE defined when compiling the program, and the malloc library was compield with the tracing option enabled. (Ask your system adminstrator, or whoever installed the malloc) (typically with the flag -DMALLOC_TRACE on the cc command line) This is advised - it makes debugging and leak tracing much easier. The file to which this information is printed can be set with void mal_setstatsfile(fd) FILE *fd; The default is stderr. There are two variables in this malloc that can be set to tune performance somewhat, and keep memory fragmentation down. One is 'slop', the other is the sbrk() size. void mal_slopset(slop) int slop; The minimum size block allocated is slop. (The default for this is the minimum possible to maintain the heap pointers required for a free block, denoted by a slop of 0) If you notice however, that a lot of blocks are being used in a specific small size, or small range of small sizes, then you might want to increase slop so that slop is big enough to cover all those sizes - while this may waste some memory, it will speed up allocation for those sizes by guaranteeing that all blocks in the free list are at least that size, so the first fit becomes as fast as possible, and the memory fragmentation is reduced because of the more uniform block size. void mal_sbrkset(nbytes) int nbytes; If there isn't a block large enough to supply a malloc request, then the allocator asks for the system to increase the data space of the process using the sbrk call. By default, sbrk() is called with 1K * sizeof(Word). (unless the malloc request is for a larger size) If your program uses less memory than this, you may want to reduce the size. On the other hand, in a program that allocates a lot of memory, to reduce the number of system calls, you may want to increase this size, using the mal_sbrkset() call. Performance The 4.3 malloc, (variants of which are distributed with perl, tcsh, GNU and so on) is slightly faster than this malloc but it wastes more space because it allocates blocks in powers of 2. It does not coalesce free space, which can lead to it wasting lots of space. (There are some pathological allocation sequences where it will ask for more space from the system even though it has enough free space) The Sun malloc wastes somewhat less than this malloc, but is twice as slow, and causes more paging because it stores free blocks and their headers separately. It has debugging support similar to mal_verify() and mal_debug(), but not quite as thorough. The 4.1 malloc is much slower than this malloc, and wastes about the same space. Incompatibilities There is only one serious incompatibility that I know of with some other malloc()s. In the old 4.1 malloc(), free was kept fast by not having it merge adjacent free blocks. This resulted in seriously fragmented arenas for programs that did a lot of allocation and freeing. The realloc() kludge provided a hook to force the merging of such blocks - the user called realloc() with a a block which had been freed. I think this practice is bad (Also, both ANSI and SVID do not support it) - since this malloc does a fast free that also merges the freed block to maintain the largest possible contiguous free blocks, there is no need for storage compaction. If compiled with -DDEBUG, this realloc() will die with an error if a freed block is passed to realloc() which would enable fixing programs that adhere to the old convention. If not compiled with -DDEBUG, it sets errno to EINVAL and returns NULL. Memory leaks Some memory allocated by a program is meant for the entire lifetime of the program. (For many simple programs, this constitutes all the allocation done by the program, and this section does not apply to such programs) Some memory however is allocated for some time, and is later freed. Keeping track of memory to be freed is often a nuisance, and sometimes, programs may change the only pointer to an allocated block without freeing the block first. such unreferenced, but allocated memory is called "garbage" or a "memory leak", and is wasted, since the program has no way of finding it again. (Other languages, like Lisp, perform garbage collection frequently, finding all blocks that are unreferenced, and freeing them. In the Unix/C environment, this is difficult to do, even though garbage collecting mallocs exist. Even worse, it is often inefficient) Memory leaks are serious for programs that run for a long time - window managers, daemons, and suchlike, since the total memory wasted over time may be large. Meticulously freeing everything allocated by a program is the best solution - alas, for object-oriented programming styles, it becomes very hard to keep track of every object ever created so that open can free it. One method or providing temporary storage is the alloca() function. This is used to allocate space off the run-time stack so that it is automatically reclaimed upon procedure exit. It can therefore be used to provide temporary storage needed by a procedure and the procedures called by the procedure. Whether or not to use the alloca() call is a somewhat controversial matter. The manual page warns that "alloca() is both machine- and compiler-dependent; its use is discouraged." On the other hand, a fairly portable implementation using malloc() and free() does exist, and some compilers (eg) GNU cc provide a __builtin_alloca function, which they translate into a couple of machine instructions to extend the frame pointer in the appropriate direction. With these compilers, alloca() can be very fast - much faster than any other memory allocation technique short of statically allocated buffers. Alloca() still does not address the problem of storage which is needed temporarily, but which may be passed to a routine's parent. Another way for a programmer to trace what is going on is to define the preprocessor symbol MALLOC_TRACE (for example, with -DMALLOC_TRACE on the cc command line when compiling the program) and then include the header file "malloc.h" in the program. When MALLOC_TRACE is defined, this header redefines malloc() and friends to macros, which invoke _malloc() etc; the latter are routines which take the filename and linenumber at which they are called. Calling void mal_leaktrace(value) int value; with value > 0 will start the leaktracing process, recording the address of each block returned by malloc, calloc, etc, along with the filename:linenumber at which it was called. If that block is freed, it deletes the record for that block. (Calling mal_leaktrace() with value == 0 turns off tracing) At any time, the programmer can call void mal_dumpleaktrace(fd) FILE *fd; where fd is a file pointer onto a file openeed for writing with fopen() or freopen(), into which a list of unfreed blocks is dumped. This list is in the form filename:linenumber: sequence-no. address-in-decimal (address-in-hex) This permits the programmer to examine the places in the program where blocks were allocated and not freed. These represent potential memory leaks. (Several text-editors will understand this sort of output as output from grep -n, and will be able to load and step through these files, eg. Jove) Typically, memory leaks take place within the main loop of a program, so the general structure of the program may look something like initialization allocation while (something) { things to do } The initial allocation is at worst a one-time memory wastage and does not concern us that much. Memory that is allocated inside the loop, and is not freed does concern us since it is a continuous loss, so after the initialization allocation, we insert a call to mal_leaktrace() to start tracing. When the second iteration starts, we call mal_dumpleaktrace() to dump the blocks that were presumably allocated during the first iteration and have not yet been freed, and do the same at the start of the third iteration for unfreed blocks from the second iteration and so on. The code now looks something like initialization allocation mal_leaktrace(1); while(something) { mal_dumpleaktrace(stderr); things to do; } The above is a simple example - more complex control-flow may require turning leak tracing on and off repeatedly, so as not to get deluged with information. If you use allocation functions within your code that layer on top of malloc, this leak tracing as it is will not be too useful since it will only report the location of your allocation functions. In that case, you have to define subsidiary allocation functions like _malloc() and #defines like those in the malloc.h file so that you can record the real address of the call. See the _malloc.c file and the malloc.h header for examples on how to do this. (You have to call the real allocation and then use the RECORD_FILE_AND_LINE() macro from trace.h to store the address of the allocated block. When you free the block you have to call the DELETE_RECORD() macro to remove that address. Do not include malloc.c in these files, and make sure you do call the real malloc from your allocation function - otherwise the allocation package will attempt to record the same address twice and fail) You may also want to include defs.h and then use PRTRACE() to print the line number and file name in the trace file.