I need a fast, thread-safe memory pooling library.
I've googled a lot, but the fast solutions aren't thread-safe, while the thread-safe ones are really big.
Any suggestions?
Both nedmalloc and ptmalloc are C-based thread-caching memory managers built around Doug Lea's malloc (the core of most Linux allocators). They are both under good licences as well, unlike Hoard, which required payment for commercial use last I looked. Google's tcmalloc also has C bindings IIRC, and is built from the ground up as a thread-caching allocator, with built-in heap and CPU profiling tools. It is, however, built for massive memory usage (the example they give is 300MB+ per thread), and as such may not work as well as expected for smaller-scale apps.
You're supposed to use one memory pool per thread.
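That way no locking is needed, since each thread only ever touches its own pool. A minimal bump-allocator sketch using C11 thread-local storage (the names and sizes are illustrative, not a real library's API):

    #include <stddef.h>

    #define POOL_SIZE (1 << 20)   /* 1 MB per thread -- tune to taste */

    static _Thread_local char   pool[POOL_SIZE];
    static _Thread_local size_t pool_used = 0;

    void *pool_alloc(size_t n)
    {
        n = (n + 15) & ~(size_t)15;               /* keep 16-byte alignment */
        if (pool_used + n > POOL_SIZE) return NULL;
        void *p = pool + pool_used;
        pool_used += n;
        return p;
    }

    void pool_reset(void) { pool_used = 0; }      /* "free" everything at once */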
The Apache Portable Runtime works well and shouldn't be all that big.
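For reference, basic APR pool usage looks roughly like this (a minimal sketch; as far as I know an individual pool is not internally locked, which fits the one-pool-per-thread advice above):

    #include <apr_general.h>
    #include <apr_pools.h>

    int main(void)
    {
        apr_initialize();

        apr_pool_t *pool;
        apr_pool_create(&pool, NULL);             /* NULL = no parent pool */

        char *buf = apr_palloc(pool, 4096);       /* no matching free call */
        (void)buf;

        apr_pool_destroy(pool);                   /* releases everything at once */
        apr_terminate();
        return 0;
    }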
Have you tried Hoard?
See also these two articles from Intel.com
I was wondering if I could make a large number of system calls at the same time, paying the kernel/userland switch overhead only once. I need this because my (speed-sensitive) library has to make many (128) system calls at a time. If I could do that without switching between kernel and userland 256+ times, I think it could be significantly faster.
You really can't do that from an application program. What you could do is build a loadable kernel module that implements those operations and presents a simple API -- then you can change context once, do all the work, and return.
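To make that concrete, here is a hedged sketch of such a module: a misc device whose (made-up) BATCH_IOC_RUN ioctl crosses into the kernel once and processes a whole array of operations. Everything here -- the device name, the opcodes, the batch_op layout -- is illustrative, not an existing API:

    #include <linux/module.h>
    #include <linux/miscdevice.h>
    #include <linux/fs.h>
    #include <linux/errno.h>
    #include <linux/uaccess.h>

    /* Hypothetical batch descriptor -- opcodes and semantics are made up.
       A real version would kmalloc a larger batch instead of using the
       kernel stack. */
    struct batch_op { int opcode; long arg; long result; };
    #define BATCH_MAX 16
    #define BATCH_IOC_RUN _IOWR('b', 1, struct batch_op[BATCH_MAX])

    static long batch_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
    {
        struct batch_op ops[BATCH_MAX];
        int i;

        if (cmd != BATCH_IOC_RUN)
            return -ENOTTY;
        if (copy_from_user(ops, (void __user *)arg, sizeof(ops)))
            return -EFAULT;
        for (i = 0; i < BATCH_MAX; i++)
            ops[i].result = ops[i].opcode + ops[i].arg; /* stand-in for real work */
        if (copy_to_user((void __user *)arg, ops, sizeof(ops)))
            return -EFAULT;
        return 0; /* one user/kernel round trip for the whole batch */
    }

    static const struct file_operations batch_fops = {
        .owner          = THIS_MODULE,
        .unlocked_ioctl = batch_ioctl,
    };

    static struct miscdevice batch_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "batch",
        .fops  = &batch_fops,
    };

    static int __init batch_init(void) { return misc_register(&batch_dev); }
    static void __exit batch_exit(void) { misc_deregister(&batch_dev); }
    module_init(batch_init);
    module_exit(batch_exit);
    MODULE_LICENSE("GPL");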
However, as with most of these sorts of optimization questions, the first thing to ask is "why do you think it's going to be necessary?" Do you have timing information etc? Have you profiled? How much of a performance issue do you really have, and is the additional complexity going to be worth the speedup?
I don't think Linux will support syscall chaining anytime soon. You might have more luck implementing this on another kernel and porting your application.
That said, it's not difficult to write a proxy module that does the job in kernel space for you, but don't expect it to be merged upstream. I've worked on real-time stuff and we had a solution like that, but it never went into production because of support issues :/
I see that some C libraries have the ability to specify custom memory allocators (malloc/free replacements).
In what systems/environments/conditions is that useful? Isn't this feature just a leftover from the MS-DOS era, or some similarly no-longer-relevant problem?
Background story:
I'm planning to make pngquant a library that can be embedded in various software (from iOS apps to Apache modules). I'm using malloc()/free() plus my own memory pools for small allocations, and 2MB-50MB of memory in total. I use threads, but only need to allocate on the main thread.
In any application where control over memory allocation is critical (for example my field, game development, or other real or near real time systems) the inability to control memory allocations in a library immediately disqualifies it from use.
Many malloc/free algorithms exist. The system malloc is sometimes not optimized for the workload the library handles, so the caller might want to try a few different ones to optimize performance (a minimal hook interface is sketched after the list below).
A few that come to mind are:
dlmalloc
jemalloc
TCMalloc
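The way a library usually exposes this is just a pair of function pointers the caller can override. A minimal hypothetical sketch (the lib_ names are made up, not any real library's API):

    #include <stdlib.h>

    /* Hypothetical library-side hooks; defaults fall back to the system malloc. */
    static void *(*lib_malloc)(size_t) = malloc;
    static void  (*lib_free)(void *)   = free;

    void lib_set_allocator(void *(*m)(size_t), void (*f)(void *))
    {
        lib_malloc = m ? m : malloc;
        lib_free   = f ? f : free;
    }

    /* Every internal allocation goes through the hooks. */
    void *lib_internal_alloc(size_t n) { return lib_malloc(n); }
    void  lib_internal_free(void *p)   { lib_free(p); }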
There are also garbage collection libraries, such as the Boehm Garbage Collector, which are usable in C by calling the provided malloc/free replacements (even though free is then a dummy function call, kept for compatibility).
There are also many other possible uses. For example, one may write debug malloc/free functions that trace the library's allocations and deallocations, such as one that I wrote that uses SQLite to record statistics about how the memory is used (admittedly at a cost in performance, but it's for debugging).
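As a tiny illustration of that idea, a hedged sketch of counting wrappers (the dbg_ names are made up; it stashes each block's size in a header so the free side can account for it):

    #include <stdlib.h>

    /* Hypothetical debug replacements: track bytes in flight. Note the
       returned pointer is only size_t-aligned -- fine for a sketch. */
    static size_t bytes_live = 0;

    void *dbg_malloc(size_t n)
    {
        size_t *p = malloc(sizeof(size_t) + n);
        if (!p) return NULL;
        *p = n;
        bytes_live += n;
        return p + 1;
    }

    void dbg_free(void *ptr)
    {
        if (!ptr) return;
        size_t *p = (size_t *)ptr - 1;
        bytes_live -= *p;
        free(p);
    }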
I'm looking to implement branch and bound over a cluster (like Amazon's, say), as I want it to be horizontally scalable rather than limited to a single CPU. There's a paper, "Task Pool Teams: A Hybrid Programming Environment for Irregular Algorithms on SMP Clusters" by Judith Hippold and Gudula Runger. It's basically a bottom-up, task-stealing framework like Intel's TBB, except for ad-hoc networks instead of shared memory. If this library were available I'd use it (replacing the local, threaded part with TBB). Unfortunately they don't seem to have made it available for download anywhere that I could find, so I wonder: are there other implementations, or similar libraries out there?
It doesn't look like Microsoft's Task Parallel Library has the equivalent, either, to steal from.
(I tried to make a tag 'taskpool' after 'threadpool', the most-used variant (before 'thread-pool'), but didn't have enough points. Anyone heavy enough think it's worth adding?)
edit:
I haven't tried it yet, but PEBBL (under software.sandia.gov/trac/acro/wiki/Packages) claims to scale really high. I found it via the chapter the answerer mentions, 'Parallel Branch-and-Bound Algorithms' by Crainic, Le Cun and Roucairol, in 'Parallel Combinatorial Optimization' (2006, edited by El-Ghazali Talbi); other libraries are listed there too, and some may be better, so I reserve the right to update this :). Funny that Google didn't find these libs; either my googling was weak or Google itself fails to be magic sometimes.
When you say "over a cluster" it sounds like you mean distributed memory, and parallelizing branch and bound is a notoriously difficult problem for distributed memory - at least in a way that guarantees scalability. The seminal paper on this topic is available here, and there's an excerpt from a Wiley book on the topic here.
Shared-memory branch and bound is an easier problem because you can implement a global task queue. A good high-level description of how to do both shared-memory and message-passing implementations is available here. If nothing else, the references section is worth perusing for ideas and existing implementations.
One thing you might consider is investigating shared message queues like RabbitMQ. It is an AMQP server (AMQP being a messaging protocol developed so distributed applications can send messages to each other).
You basically need some kind of distributed synchronization/queue.
I suggest looking into ARMCI as a low-level distributed-memory interface with synchronization, and building on top of that.
An alternative is to dedicate one MPI process as a master that distributes the work; see the sketch after the link below.
http://www.cs.utk.edu/~dongarra/ccgsc2008/talks/Talk10-Lusk.pdf
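A minimal master/worker sketch of that alternative in C with MPI (the task IDs and the squaring "work" are stand-ins for real branch-and-bound subproblems; it assumes at least as many tasks as workers):

    #include <mpi.h>
    #include <stdio.h>

    #define TAG_WORK 1
    #define TAG_STOP 2

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                            /* master */
            int ntasks = 128, next = 0, active = 0;
            for (int w = 1; w < size && next < ntasks; w++) {
                MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                next++; active++;
            }
            while (active > 0) {
                int result, stop = -1;
                MPI_Status st;
                MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                active--;
                if (next < ntasks) {                /* refill that worker */
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                             MPI_COMM_WORLD);
                    next++; active++;
                } else {
                    MPI_Send(&stop, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                             MPI_COMM_WORLD);
                }
            }
        } else {                                    /* worker */
            for (;;) {
                int task;
                MPI_Status st;
                MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP) break;
                int result = task * task;           /* stand-in for bound work */
                MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }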
I will be a TA for an operating systems class this upcoming semester. The labs will deal specifically with the Linux kernel.
What concepts/components of the Linux kernel do you think are the most important to cover in the class?
What do you wish was covered in your studies that was left out?
Any suggestions regarding the Linux kernel or overall operating systems design would be much appreciated.
My list:
What an operating system's concerns are: Abstraction and extension of the physical machine and resource management.
How the build process works, i.e. how the architecture-specific/machine-code pieces get incorporated
How system calls work and how modules can link up
Memory management / Virtual Memory / Paging and all the rest
How processes are born, live and die in POSIX and other systems
Userspace vs. kernel threads, and the difference between processes and threads
Why the monolithic kernel design is growing tiresome, and what the alternatives are
Scheduling (and some of the alternative / domain specific schedulers)
I/O, Driver development and how they are dynamically loaded
The early stages of booting and what the kernel does to setup the environment
Problems with clocks, mmu-less systems etc
... I could go on ...
I almost forgot: IPC and the Unix 'everything is a file' design decisions
POSIX, why it exists, why it shouldn't
In the end, just get them to go through Tanenbaum's Modern Operating Systems, and also do case studies on some other kernels, like Mach/Hurd's microkernel setup and maybe some distributed and exokernel stuff.
Give a broad view past Linux too, I reckon.
For those who are super geeky, the history of operating systems and why they are the way they are.
The Virtual File System layer is an absolute must for any Linux Operating System class.
I took a similar class in college. The most frustrating, but at the same time most helpful, project was writing a small file system for Linux. Getting it to work takes ~2-3 weeks for a group of 4 people and really teaches you the ins and outs of the kernel.
I recently took an operating systems class, and I found the projects to be challenging, but essential in understanding the concepts in class. The projects were also fun, in that they involved us actually working with the Linux source code (version 2.6.12, or thereabouts).
Here's a list of some pretty good projects/concepts that I think should be covered in any operating systems class:
The difference between user space and kernel space
Process management (i.e. fork(), exec(), etc.)
Write a small shell that demonstrates knowledge of fork() and exec() (a minimal sketch appears after this list)
How system calls work, i.e. how do we switch from user to kernel mode
Add a simple system call to the Linux kernel, write a test application that calls the system call to demonstrate it works.
Synchronization in and out of the kernel
Implement synchronization primitives in user space (see the spinlock sketch after this list)
Understand how synchronization primitives work in kernel space
Understand how synchronization primitives differ between single-CPU architectures and SMP
Add a simple system call to the Linux kernel that demonstrates knowledge of how to use synchronization primitives in the kernel: something that has to acquire, say, the tasklist lock, but that also has to kmalloc, which can't be done while holding a spinlock (unless you use GFP_ATOMIC, but you really shouldn't)
Scheduling algorithms, and how scheduling takes place in the Linux kernel
Modify the Linux task scheduler by adding your own scheduling policy
What is paging? How does it work? Why do we have paging? How does it work in the Linux kernel?
Add a system call to the Linux kernel which, given an address, will tell you if that address is present or if it's been swapped out (or some other assignment involving paging).
File systems - what are they? Why do they exist? How do they work in the Linux kernel?
Disk scheduling algorithms - why do they exist? What are they?
Add a simple file system to the Linux kernel via the VFS layer
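For the shell project mentioned above, a minimal sketch of the fork()/exec() skeleton (no pipes, quoting, or redirection; the prompt string is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    /* Toy shell: read a line, split on spaces, fork + exec, wait. */
    int main(void)
    {
        char line[1024];
        for (;;) {
            fputs("tiny$ ", stdout);
            if (!fgets(line, sizeof line, stdin)) break;
            line[strcspn(line, "\n")] = '\0';
            if (strcmp(line, "exit") == 0) break;

            char *argv[64];
            int argc = 0;
            for (char *t = strtok(line, " "); t && argc < 63; t = strtok(NULL, " "))
                argv[argc++] = t;
            argv[argc] = NULL;
            if (argc == 0) continue;

            pid_t pid = fork();
            if (pid == 0) {                 /* child: become the command */
                execvp(argv[0], argv);
                perror("execvp");           /* only reached on failure */
                _exit(127);
            } else if (pid > 0) {
                int status;
                waitpid(pid, &status, 0);   /* parent: wait for completion */
            } else {
                perror("fork");
            }
        }
        return 0;
    }

And for the user-space synchronization project, a test-and-set spinlock sketch using C11 atomics (initialize with { ATOMIC_FLAG_INIT }):

    #include <stdatomic.h>

    typedef struct { atomic_flag locked; } spinlock_t;

    void spin_lock(spinlock_t *l)
    {
        while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
            ;  /* busy-wait; a real lock would back off or yield */
    }

    void spin_unlock(spinlock_t *l)
    {
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
    }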
Well, I just finished my OS course this semester so I thought I'd chime in.
I was kind of upset that we didn't actually play around with the OS itself; rather, we just did systems programming. I'd recommend having the labs touch something in the OS itself, which sounds like what you want to do.
One lab that I did enjoy and found useful however was writing our own malloc/free routines. It was difficult, but pretty entertaining as well.
Maybe also cover loading programs into memory and/or setting up the memory manager (such as paging).
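If you do run that malloc/free lab, the core of a toy solution fits in a page. A hedged first-fit sketch over sbrk() (not thread-safe, never splits or returns memory -- illustration only):

    #include <stddef.h>
    #include <unistd.h>

    /* One header per block; freed blocks are reused on a first-fit basis. */
    struct header { size_t size; int free; struct header *next; };
    static struct header *head = NULL;

    void *my_malloc(size_t n)
    {
        for (struct header *h = head; h; h = h->next)
            if (h->free && h->size >= n) { h->free = 0; return h + 1; }

        struct header *h = sbrk(sizeof *h + n);   /* grow the heap */
        if (h == (void *)-1) return NULL;
        h->size = n; h->free = 0; h->next = head; head = h;
        return h + 1;
    }

    void my_free(void *p)
    {
        if (p) ((struct header *)p - 1)->free = 1;
    }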
For labs, one thing that may be cool is to show them actual code and discuss it: ask them why they think things are done one way and not another, etc.
If I were at university again I would certainly appreciate more in-depth lessons about synchronization primitives, concurrency and so on... those are hard topics that are difficult to approach without proper guidance. I remember I went to a talk by Paul "Rusty" Russell about spinlocks and other synchronization primitives that was absolutely rad; maybe you could find it on YouTube and borrow some ideas.
Another good topic (or possible exercise for the students) would be looking at virtualisation, especially Rusty Russell's "lguest", which is designed as a simple introduction to what is required to virtualise an operating system. The docs are good reading too.
I actually just took a class that perfectly fits your description (OS design using Linux) in the spring. I was actually very frustrated with it, because I felt the teacher focused too narrowly in the projects rather than giving a broader understanding. For instance, our last project revolved around futexes. My partner and I barely learned what they were, got it working (kinda), and then turned it in. I came away with no real general knowledge from that project. I wish one of the projects had been to write a simple device driver or something like that.
In other words, I think it's good to make sure a broad overview is presented, with as much detail as you can afford, but ultimately broad. I felt like my teacher nitpicked these tiny areas and made us focus intensely on them, while in the end I did NOT come away with that great a general understanding of the inner workings of Linux.
Another thing I'd note: a lot of why I didn't retain knowledge from the class was the lack of organization. Topics came out of nowhere any given week, and there was no roadmap. Give the material a logical flow; mental organization is the key to retaining the knowledge.
The networking sub-system is also quite interesting. You could follow a packet as it goes from the socket system call to the wire and the other way around.
Fun assignments could be:
create a stateful firewall using netfilter
create an HTTP load balancer
design and implement a simple tunneling protocol
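For the netfilter assignment, a hedged starting-point sketch: the smallest possible hook module (a toy "rule" that drops inbound ICMP); a stateful firewall grows out of tracking connections in the hook. Written against the modern (4.13+) hook API, which differs from older kernels:

    #include <linux/module.h>
    #include <linux/netfilter.h>
    #include <linux/netfilter_ipv4.h>
    #include <linux/ip.h>
    #include <linux/in.h>
    #include <net/net_namespace.h>

    static unsigned int drop_icmp(void *priv, struct sk_buff *skb,
                                  const struct nf_hook_state *state)
    {
        struct iphdr *iph = ip_hdr(skb);
        if (iph && iph->protocol == IPPROTO_ICMP)
            return NF_DROP;               /* toy rule: drop all ICMP */
        return NF_ACCEPT;
    }

    static struct nf_hook_ops ops = {
        .hook     = drop_icmp,
        .hooknum  = NF_INET_PRE_ROUTING,
        .pf       = NFPROTO_IPV4,
        .priority = NF_IP_PRI_FIRST,
    };

    static int __init fw_init(void) { return nf_register_net_hook(&init_net, &ops); }
    static void __exit fw_exit(void) { nf_unregister_net_hook(&init_net, &ops); }
    module_init(fw_init);
    module_exit(fw_exit);
    MODULE_LICENSE("GPL");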
Memory-mapped I/O, and the 1G/3G vs 2G/2G split between kernel address space and user-addressable space in 32-bit operating systems.
Limitations of 32-bit architectures on hard drive size, and what this means for the design of file systems.
Actually, just all the pros and cons of going to 64-bit: what it means and why, as well as the history and why we aren't there yet.
What memory leak detectors have people had a good experience with?
Here is a summary of the answers so far:
Valgrind - Instrumentation framework for building dynamic analysis tools.
Electric Fence - A tool that works with GDB
Splint - Annotation-Assisted Lightweight Static Checking
GlowCode - A complete real-time performance and memory profiler for Windows and .NET programmers who develop applications with C++, C#, or any .NET Framework language
Also see this stackoverflow post.
I second the Valgrind suggestion... and I'll add Electric Fence.
Valgrind under Linux is fairly good; I have no experience with it under Windows.
If you have the money: IBM Rational Purify is an extremely powerful industrial-strength memory leak and memory corruption detector for C/C++. It exists for Windows, Solaris and Linux. If you're Linux-only and want a cheap solution, go for Valgrind.
Mudflap for gcc! It actually compiles the checks into the executable. Just add
-fmudflap -lmudflap
to your gcc flags.
I've had quite a few hits with cppcheck, which does static analysis only. It is open source and has a command-line interface (I did not use it any other way).
lint (and the very similar open-source tool splint)
Also worth using, if you're on Linux with glibc, is the built-in debug heap code. To use it, link with -lmcheck, or define (and export) the MALLOC_CHECK_ environment variable with the value 1, 2, or 3. The glibc manual provides more information.
This mode is most useful for detecting double-frees, and it often catches writes outside the allocated memory area at free() time. I don't think it reports leaked memory.
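A quick way to see it in action (assuming glibc): the double free below should abort with a diagnostic when run with MALLOC_CHECK_ set, instead of silently corrupting the heap:

    #include <stdlib.h>

    /* Build normally, then run:  MALLOC_CHECK_=3 ./a.out
       (or link with -lmcheck). glibc's debug heap should abort here. */
    int main(void)
    {
        char *p = malloc(16);
        free(p);
        free(p);   /* double free -- caught by the glibc debug heap */
        return 0;
    }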
Painful, but if you had to use one: I'd recommend the DevPartner BoundsChecker suite; that's what people at my workplace use for this purpose. It's paid and proprietary, not freeware.
I've had minimal love for any memory leak detectors. Typically there are far too many false positives for them to be of any use. I would recommend these two as being the least intrusive:
GlowCode
Debug heap
For Win32 debugging of memory leaks I have had very good experiences with the plain old CRT Debug Heap that comes as a lib with Visual C.
In a Debug build, malloc (et al.) get redefined as _malloc_dbg (et al.), and there are other calls to retrieve results, all of which are undefined if _DEBUG is not set. It sets up all sorts of boundary guards on the heap, and allows you to display the results at any time.
I had a few false positives when I was writing some timing routines that messed with the runtime library's allocations, until I discovered _CRT_BLOCK.
I had to produce first DOS, then Win32 console and service programs that would run forever. As far as I know there are no memory leaks, and in at least one place the code ran for two years unattended before the monitor on the PC failed (though the PC was fine!).
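For reference, the minimal setup looks something like this (MSVC Debug build only; the deliberate leak is just to trigger the report):

    /* MSVC only: map malloc/new onto the debug heap and dump leaks at exit. */
    #define _CRTDBG_MAP_ALLOC
    #include <stdlib.h>
    #include <crtdbg.h>

    int main(void)
    {
        _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);
        char *leaked = malloc(32);   /* deliberately leaked */
        (void)leaked;
        return 0;                    /* leak report goes to the debug output */
    }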
On Windows, I have used Visual Leak Detector. Integrates with VC++, easy to use (just include a header and set LIB to find the lib), open source, free to use FTW.
At university, when I was doing most things under Solaris, I used gdb. Under Linux, however, I would go with Valgrind.
The granddaddy of these tools is the commercial, closed-source Purify tool, which was sold to IBM and then to UNICOM.
Parasoft's Insure++ (source-code instrumentation) and Valgrind (open source) are the two other real competitors.
Trivia: the original author of Purify, Reed Hastings, went on to found NetFlix.
No one mentioned clang's MSan, which is quite powerful. It is officially supported on Linux only, though.
This question may be old, but I'll answer it anyway; maybe my answer will help someone find their memory leaks.
This is my own project - I've put it as open source code:
https://sourceforge.net/projects/diagnostic/
Windows 32- and 64-bit platforms are supported; native and mixed-mode call stacks are supported.
.NET garbage collection is not supported (C++/CLI's gcnew or C#'s new).
It's a high-performance tool and does not require any integration (unless you really want to integrate it).
Complete manual can be found here:
http://diagnostic.sourceforge.net/index.html
Don't be surprised by how many leaks it detects in your process; it catches memory leaks from the whole process. Analyze only the biggest leaks, not all of them.
I'll second Valgrind as an external tool for memory leaks.
But, for most of the problems I've had to solve I've always used internally built tools. Sometimes the external tools have too much overhead or are too complicated to set up.
Why use already written code when you can write your own :)
I joke, but sometimes you need something simple and it's faster to write it yourself.
Usually I just replace calls to malloc() and free() with functions that keep better track of who allocates what. Most of my problems seem to be that someone forgot to free, and this helps to solve that problem.
It really depends on where the leak is, and if you knew that, then you would not need any tools. But if you have some insight into where you think it's leaking, then put in your own instrumentation and see if it helps you.
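A hedged sketch of that approach -- wrappers that log every allocation site, so a leak shows up as an alloc line with no matching free (the trace_ names are made up):

    #include <stdio.h>
    #include <stdlib.h>

    /* Define the wrappers first so they call the real malloc/free... */
    static void *trace_malloc(size_t n, const char *file, int line)
    {
        void *p = malloc(n);
        fprintf(stderr, "alloc %p %zu %s:%d\n", p, n, file, line);
        return p;
    }

    static void trace_free(void *p, const char *file, int line)
    {
        fprintf(stderr, "free  %p %s:%d\n", p, file, line);
        free(p);
    }

    /* ...then route the rest of the code through them. */
    #define malloc(n) trace_malloc((n), __FILE__, __LINE__)
    #define free(p)   trace_free((p), __FILE__, __LINE__)

Then post-process the log (sort/uniq on the pointer values) to find blocks that were allocated but never freed.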
Our CheckPointer tool can do this for GNU C 3/4, MS dialects of C, and GreenHills C. It can find memory management problems that Valgrind cannot.
If your code simply leaks, on exit CheckPointer will tell you where all the unfreed memory was allocated.