What are some good resources on the pros/cons of different ways of implementing heap allocators? Resources touching on efficiency (fragmentation, throughput, etc.) are preferred. I am NOT looking for simple code repositories.
edit:
I'm not really interested in the philosophical grounding of this wiki. As such, I don't really want to get into 'why' I'm interested in this. Regardless of the underlying intentions/problems/etc, this information exists, so if you know of any good resources, please link to them here!
This is a very old problem, and to get a comprehensive view you will have to dig through the research literature. (I'm not aware of a good textbook treatment.)
A few places to start:
Doug Lea's description of his memory allocator
The Art of Computer Programming, Volume 1 by Don Knuth
Quick fit: an efficient algorithm for heap storage allocation by Weinstock and Wulf
This one is worth spending a day in the library. Yes, a big building full of paper—the problem is that old.
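To get a feel for the design space before diving into the literature: most of these allocators are variations on a free list, and the fragmentation/throughput trade-offs come from how the list is searched, split, and coalesced. Below is a deliberately naive first-fit sketch in C over a static arena - not any of the allocators above, just an illustration of the baseline they all improve on:

```c
#include <stddef.h>

/* Naive first-fit allocator over a static arena: no coalescing,
   no alignment guarantees, not thread-safe. */
typedef struct Block {
    size_t size;             /* payload size in bytes */
    int    free;
    struct Block *next;
} Block;

static unsigned char arena[1 << 16];
static Block *head = NULL;

void *my_alloc(size_t size) {
    if (!head) {             /* lazily set up one big free block */
        head = (Block *)arena;
        head->size = sizeof(arena) - sizeof(Block);
        head->free = 1;
        head->next = NULL;
    }
    for (Block *b = head; b; b = b->next) {
        if (b->free && b->size >= size) {
            /* split off the remainder if it can hold another header */
            if (b->size > size + sizeof(Block)) {
                Block *rest = (Block *)((unsigned char *)(b + 1) + size);
                rest->size = b->size - size - sizeof(Block);
                rest->free = 1;
                rest->next = b->next;
                b->size = size;
                b->next = rest;
            }
            b->free = 0;
            return b + 1;    /* payload starts right after the header */
        }
    }
    return NULL;             /* arena exhausted */
}

void my_free(void *p) {
    if (p) ((Block *)p - 1)->free = 1;  /* never coalesces: fragmentation! */
}
```

Everything the references above discuss - boundary tags, size-class bins, quick lists - exists to fix the weaknesses visible here: the linear search, the missing coalescing, and the fragmentation that results.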
I recently came across a blog post that introduced the term "Bayesian Spam Filtering" and talked about how this was the approach behind spam filtering for emails.
I also remember a paper (perhaps it was this?) discussing how Game Theory is involved in packet routing, or how it is used for Resource Allocation in Cloud Computing. Also, I recall a university course on Formal Methods, and how they're used in Software Engineering.
I am looking for books which talk about how concepts from Mathematics or CS Theory are actually applied in everyday technology.
Any good textbook will at least mention and ideally discuss everyday applications of the subject matter whenever possible. For example, the excellent Artificial Intelligence: A Modern Approach by Russell and Norvig does this throughout.
More specialized books will of course tend to have deeper discussions of the applications. What is one 30-page chapter in Russell and Norvig is a whole book by Sutton and Barto: Reinforcement Learning: An Introduction (free pdf at the link, courtesy of the authors). Have a look at chapter 16 for some fascinating applications.
There are also books that start with an application, i.e. a practical problem, and then develop all the theory necessary to solve it. One of my favorites in this category is In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation by William Cook.
You might be interested in Math for Programmers, which takes some interesting approaches to programming via pure math. Conversely, A Programmer's Introduction to Mathematics teaches math using programming, which might be helpful if you want to learn it from the other direction.
I am looking for good academic references on how to benchmark programs. There seems to be a lot of lore in benchmarking, but I haven't seen many references that explain what a good benchmark is, what a bad one is, and how to write one.
Thanks.
Academically speaking, a relevant article is "Statistically rigorous Java performance evaluation" from OOPSLA 2007 (which you can find via Google Scholar). While focused on Java, it contains general lessons on benchmarking, and the Java-specific content generalizes nicely to most garbage-collected languages running on a virtual machine. The authors also summarize the statistics knowledge needed for analyzing the results.
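To make the paper's central recommendation concrete - report a mean with a confidence interval over many runs, rather than a single best time - here is a minimal C sketch (the `workload` function pointer stands in for whatever you are measuring; this illustrates the statistics, not the paper's actual harness):

```c
#include <math.h>
#include <stdio.h>
#include <time.h>

/* Wall-clock time of one run of the code under test. */
double time_run(void (*workload)(void)) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    workload();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Mean and approximate 95% confidence interval over n runs. */
void report(void (*workload)(void), int n) {
    double sum = 0.0, sumsq = 0.0;
    for (int i = 0; i < n; i++) {
        double t = time_run(workload);
        sum   += t;
        sumsq += t * t;
    }
    double mean = sum / n;
    double var  = (sumsq - n * mean * mean) / (n - 1);  /* sample variance */
    double ci   = 1.96 * sqrt(var / n);  /* normal approximation; use a
                                            Student's t value for small n */
    printf("mean %.6f s +/- %.6f s (95%% CI, n = %d)\n", mean, ci, n);
}
```

Two results whose confidence intervals overlap should not be reported as "A is faster than B" - that is the paper's point in a nutshell.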
Additionally, here is a framework from Google:
http://code.google.com/p/caliper/
And here their wiki discusses some criteria for a good benchmark:
http://code.google.com/p/caliper/wiki/JavaMicrobenchmarkReviewCriteria
I'm considering doing something with Domain Specific Languages for my undergraduate project. My one problem is I can't really find any interesting examples that I can root around in. Does anyone have any good examples of DSELs (preferably open source)?
Also, one area I would love to look at is solving/addressing concurrency problems (coroutines, etc.) with DSELs. Are there any good examples of this in DSELs that anyone uses? If this is a stupid application of DSELs, please explain why...
Another potential area to explore would be database programming. Again, is this a stupid area to explore with DSELs? For example, would adding some crazy database-manipulation syntax to, say, C# be a good project to undertake?
EDIT: General languages I would be looking at implementing in would be Java, Python, Scala, C# etc. Probably not C++ or C.
Linda implementations can be considered eDSLs; STM implementations like CL-STM certainly are.
Unrelated to concurrency, but extremely useful, are embedded Prolog implementations; there are plenty of them for Scheme, Lisp and Clojure. Parsing eDSLs have been mentioned already - and their patriarch, Parsec, is definitely worth digging into.
EDIT: with your list of implementation languages you're missing the most interesting eDSL opportunities. The most powerful and flexible eDSLs are made with metaprogramming. Scala-style (or even Haskell-style) eDSLs are based on higher-order functions, i.e., on mini-interpreters. They're more complicated to design, much less flexible, and limited to the syntax of your host language.
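To make "mini-interpreter" concrete: an eDSL in this style is nothing more than functions that build a term, plus an evaluator that walks it. A sketch of the pattern, written here in C for neutrality (the num/add/mul constructors are invented for the example; in Scala or Haskell the same shape falls out of algebraic data types):

```c
#include <stdio.h>
#include <stdlib.h>

/* A tiny arithmetic eDSL: constructors build an AST, eval interprets it. */
typedef struct Expr {
    enum { NUM, ADD, MUL } tag;
    double value;               /* used by NUM */
    struct Expr *lhs, *rhs;     /* used by ADD/MUL */
} Expr;

static Expr *node(int tag, double v, Expr *l, Expr *r) {
    Expr *e = malloc(sizeof *e);
    e->tag = tag; e->value = v; e->lhs = l; e->rhs = r;
    return e;
}
static Expr *num(double v)         { return node(NUM, v, NULL, NULL); }
static Expr *add(Expr *l, Expr *r) { return node(ADD, 0, l, r); }
static Expr *mul(Expr *l, Expr *r) { return node(MUL, 0, l, r); }

/* The "mini-interpreter". */
static double eval(const Expr *e) {
    switch (e->tag) {
    case NUM: return e->value;
    case ADD: return eval(e->lhs) + eval(e->rhs);
    default:  return eval(e->lhs) * eval(e->rhs);
    }
}

int main(void) {
    Expr *p = mul(add(num(1), num(2)), num(3));  /* the embedded "program" */
    printf("%g\n", eval(p));                     /* prints 9 */
    return 0;  /* freeing the AST omitted for brevity */
}
```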
boost::spirit is an interesting example if you're after C++. Quote:
Spirit is a set of C++ libraries for parsing and output generation implemented as Domain Specific Embedded Languages (DSEL)...
(I have no idea what you mean by "solving concurrency" though. I don't see how you can solve "concurrency problems" in general, or how a DSEL could help.)
Object-Oriented programmers seem to have all the fun. Not only are they treated to major framework revisions every two years, and new and improved languages every five, they also get to deal with design practices tailor-made to their programming style. From test-driven development to design patterns, Object-Oriented programmers have a lot to keep up with.
By contrast, the C programming world seems far more sedate. The last major revision to the language was in 1999, and the next one is likely to be far less impressive. K&R 2nd edition is still held up as a good introductory text by many, despite being twenty years old now.
If we, as C programmers, have developed and improved our skills and practices (and I think we probably have), we don't seem to be very good at communicating them. We don't sell books about them, post about them on blogs, or organise workshops around them. Not in the way the rest of the software development world seems to.
So, let's share.
What are your preferred 'modern' C programming practices?
Do you use `template' libraries of long, involved preprocessor macros to squeeze the last inch of performance out of hardware in the same way C++ programmers can? Do you use an allocation library like halloc to minimize the time you spend on managing memory, or do you use a full-blown automatic garbage collector?
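(To illustrate the kind of `template' macro I mean - a hypothetical sketch, not any particular library:)

```c
#include <stdlib.h>

/* Hypothetical "template" macro: expands to a type-specialized
   growable array with a push operation (error handling omitted). */
#define DEFINE_VEC(T)                                            \
    typedef struct { T *data; size_t len, cap; } vec_##T;        \
    static void vec_##T##_push(vec_##T *v, T x) {                \
        if (v->len == v->cap) {                                  \
            v->cap = v->cap ? v->cap * 2 : 8;                    \
            v->data = realloc(v->data, v->cap * sizeof(T));      \
        }                                                        \
        v->data[v->len++] = x;                                   \
    }

DEFINE_VEC(int)  /* instantiates vec_int and vec_int_push */
```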
Of course, if you've been using these things since 1987, feel free to chime in as well; the point of this question is to share practices that are out of the ordinary but might benefit others.
What are your preferred 'modern' C software design practices?
Design considerations are at least as important, of course. Do you adapt design practices from the Object-Oriented world? Do you use UML? Or do you opt to iron out specifications in a language-neutral style (flowcharts, Z, weakest-precondition calculus, anything)?
I try to use ready-made libraries for basic functionality when possible. I find glib (part of the GTK+ GUI framework) absolutely brilliant when it comes to general data structures and such. No more writing your own hash table, linked list, dynamic array or whatever.
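A small example of what that buys you - a string-keyed hash table with glib, no hand-rolled hashing anywhere (the data here is made up, of course):

```c
#include <glib.h>

int main(void) {
    GHashTable *ages = g_hash_table_new(g_str_hash, g_str_equal);

    /* glib hash tables store generic pointers; these macros pack ints */
    g_hash_table_insert(ages, "alice", GINT_TO_POINTER(30));
    g_hash_table_insert(ages, "bob",   GINT_TO_POINTER(25));

    gpointer v = g_hash_table_lookup(ages, "alice");
    g_print("alice is %d\n", GPOINTER_TO_INT(v));

    g_hash_table_destroy(ages);
    return 0;
}
```

Build with pkg-config, e.g. gcc example.c $(pkg-config --cflags --libs glib-2.0).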
I also think the object-oriented ideas in the GTK+ toolkit are great, and often structure my code the same. There's nothing stopping you from adopting paradigms in C, it's flexible enough to express many things that are just made "first-class" in other languages, even if doing so often involves a certain ... verbosity, of course.
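For instance, the GTK-style "virtual method" pattern boils down to a struct of function pointers embedded as the first member. A minimal sketch (Shape/Rect are invented for the example):

```c
#include <stdio.h>

typedef struct Shape Shape;
struct Shape {
    double (*area)(const Shape *self);   /* the "virtual method" */
};

typedef struct {
    Shape  base;    /* base struct first, so Rect* casts safely to Shape* */
    double w, h;
} Rect;

static double rect_area(const Shape *self) {
    const Rect *r = (const Rect *)self;
    return r->w * r->h;
}

int main(void) {
    Rect   r = { { rect_area }, 3.0, 4.0 };
    Shape *s = &r.base;
    printf("area: %g\n", s->area(s));    /* dynamic dispatch, by hand */
    return 0;
}
```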
Not really a C programming practice, because I'm one of those newfangled object-oriented programmers working in C++, but this:
Object Oriented Programming is not a silver bullet
I wish my company had more pure C programmers to teach the juniors that there is life beyond Object Orientation.
To be honest, my answer would be that I finally gave in to C++ after fighting it for a long time. I've come to really enjoy its advantages.
I like being able to let the compiler take care of the OO plumbing, being able to use exceptions and RAII instead of littering return codes and resource releases all over, not reimplementing a linked list or an automatically expanding vector or a smarter string library for the umpteenth time, operator overloading instead of vector_add() everywhere, etc. Granted, there are libraries for much of this in C, but it seems like such things are rather fragmented between competing solutions. It's nice having such amenities standardized in C++.
The nice thing is that I'm still free to drop down and do all the stuff I might have done in C if I feel like that's what suits the program best. There's no OO straitjacket like in Java.
1999: Use C, it is fast, low-level, efficient
2009: Use Python, it is fast-enough, productive, multi-platform, popular and fun
I've been working on a project which is a combination of an application server and an object database, and which currently runs on a single machine only. Some time ago I read a paper describing a distributed relational database, and got some ideas on how to apply its approach to my project, so that I could make a high-availability version of it running on a cluster with a shared-nothing architecture.
My problem is that I don't have experience designing distributed systems and their protocols - I did not take the advanced CS courses on distributed systems at university. So I'm worried about being able to design a protocol that does not cause deadlock, starvation, split brain and other problems.
Question: Where can I find good material about designing distributed systems? What methods are there for verifying that a distributed protocol works right? Recommendations of books, academic articles and other sources are welcome.
I learned a lot by looking at what is published about really huge web-based platforms, and especially how their systems evolved over time to meet their growth.
Here are some examples I found enlightening:
eBay Architecture: Nice history of their architecture and the issues they had. Obviously they can't use a lot of caching for the auctions and bids, so their story differs on that point from many others. As of 2006, they deployed 100,000 new lines of code every two weeks - and were able to roll back an ongoing deployment if issues arose.
Paper on Google File System: Nice analysis of what they needed, how they implemented it and how it performs in production use. After reading this, I found it less scary to build parts of the infrastructure myself to meet exactly my needs, if necessary, and learned that such a solution can and probably should be quite simple and straightforward. There is also a lot of interesting stuff on the net (including YouTube videos) on BigTable and MapReduce, other important parts of Google's architecture.
Inside MySpace: One of the few really huge sites built on the Microsoft stack. You can learn a lot about what not to do with your data layer.
A great start for finding many more resources on this topic is the Real Life Architectures section on the "High Scalability" website. For example, they have a good summary of Amazon's architecture.
Learning distributed computing isn't easy. It's really a vast field covering communication, security, reliability, concurrency, etc., each of which would take years to master. Understanding will eventually come through a lot of reading and practical experience. You seem to have a challenging project to start with, so here's your chance :)
The two most popular books on distributed computing are, I believe:
1) Distributed Systems: Concepts and Design - George Coulouris et al.
2) Distributed Systems: Principles and Paradigms - A. S. Tanenbaum and M. Van Steen
Both of these books give a very good introduction to current approaches (including communication protocols) that are being used to build successful distributed systems. I've personally used the latter mostly, and I've found it to be an excellent text. If you think the reviews on Amazon aren't very good, it's because most readers compare this book to other books written by A. S. Tanenbaum (who IMO is one of the best authors in the field of computer science), which are quite frankly better written.
PS: I really question your need to design and verify a new protocol. If you are working with application servers and databases, what you need is probably already available.
I liked the book Distributed Systems: Principles and Paradigms by Andrew S. Tanenbaum and Maarten van Steen.
At a more abstract and formal level, Communicating and Mobile Systems: The Pi-Calculus by Robin Milner gives a calculus for verifying systems. There are variants of pi-calculus for verifying protocols, such as SPI-calculus (the wikipedia page for which has disappeared since I last looked), and implementations, some of which are also verification tools.
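For a taste of what that looks like: the heart of the pi-calculus is a single reduction rule, in which a send and a receive on the same channel synchronize (notation as in Milner's book, modulo minor variations):

$$\overline{x}\langle y\rangle.P \;\mid\; x(z).Q \;\longrightarrow\; P \;\mid\; Q\{y/z\}$$

Properties of a protocol are then statements about which reductions a system of processes can make.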
Where can I find good material about designing distributed systems?
I have never been able to finish the famous book by Nancy Lynch. However, I find the book by Sukumar Ghosh, Distributed Systems: An Algorithmic Approach, much easier to read, and it points to the original papers when needed.
Admittedly, I haven't read the books by Gerard Tel and Nicola Santoro. Perhaps they are even easier to read...
What methods are there for verifying that a distributed protocol works right?
To survey the possibilities (and to understand the question in the first place), I think it is useful to get an overview of the available tools from the book Software Specification Methods.
My final decision was to learn TLA+. Why? The language and tools seem good, but I really decided to try TLA+ because the man behind it is Leslie Lamport. That is, not just a prominent figure in distributed systems, but also the author of LaTeX!
You can get the TLA+ book and several examples for free.
There are many classic papers written by Leslie Lamport (http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html) and Edsger Dijkstra (http://www.cs.utexas.edu/users/EWD/).
For the database side, one main stream is the NoSQL movement; many projects are appearing on the market, including CouchDB (couchdb.apache.org), MongoDB and Cassandra. These all promise scalability and manageability (replication, fault tolerance, high availability).
One good book is Birman's Reliable Distributed Systems, although it has its detractors.
If you want to formally verify your protocol you could look at some of the techniques in Lynch's Distributed Algorithms.
It is likely that whatever protocol you are trying to implement has been designed and analysed before. I'll just plug my own blog, which covers e.g. consensus algorithms.