No recent books on MPI: is it dying?

I've never used the Message Passing Interface (MPI), but I've heard its name thrown about, most recently with Windows HPC Server. I had a quick look on Amazon to see if there were any books on it, but they're all dated seven or more years ago. Is MPI still a valid technology choice for new applications, or has it been largely superseded by other distributed programming alternatives (e.g. DataSynapse GridServer)?
As it's not really an implementation, but rather a standard, what is the likelihood (assuming it's not dead) that learning it will result in better design of distributed programming systems? Is there something else I should be looking at instead?

For what MPI is good for, it's still a good choice. It's just possible that there are no recent books on the topic because the existing ones are good enough and most of us using MPI don't need anything more.
I wouldn't characterise MPI as a distributed programming standard, more a standard for parallel programming on distributed memory computers -- which covers most of the largest computers in the world right now.
If I were betting on it being replaced I'd be looking at Chapel, X10, or, most likely, Fortran 2008.
What you should be looking at depends on your requirements, but if they include high-performance number-crunching for scientific and engineering codes, Fortran or C/C++ with MPI should be in your sights. I've never heard of DataSynapse GridServer; a quick Google search suggests it's aimed at a completely different class of computational problems.
EDIT: I just checked Amazon for books 'on MPI'. While the Gropp et al. books are a bit old now, there are still plenty of other books being published which cover (the use of) MPI. This is, in part, a reflection of how MPI is used. It's not terribly interesting to computer scientists, so there aren't many books on 'MPI for MPI's sake', but it is of interest to many computational scientists, so there's a steady stream of 'physics with MPI' and 'engineering with MPI' books. If these are outside your sphere of interest, MPI probably is too.
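For a flavour of what MPI code looks like, here's a minimal sketch of the standard point-to-point send/receive pattern (an illustrative example of mine, not taken from any of the books above):

```c
/* Minimal MPI sketch: rank 0 sends an integer to rank 1.
   Build and run (assuming an MPI implementation is installed):
   mpicc hello.c -o hello && mpirun -np 2 ./hello */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```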

2016 update. MPI is still king for distributed memory programming on low latency networks of reliable compute nodes. I think the question poser is correct in that MPI is probably not the protocol layer where fault tolerance should take place. Circa 2006 we had MPI over SunGridEngine. Lately MPI on Mesos is becoming popular.
The MPI standard is in active development:
http://meetings.mpi-forum.org/MPI_3.0_main_page.php
The main issue is that now we have some machines with over 10,000 processors, and MPI itself is having a hard time scaling. Lots of open research problems. http://www.springerlink.com/content/q11r042317g88230/

Why do you need a book? The API is well documented.
On distributed systems you don't really have any other option besides MPI.
Some Fortran compilers, like the ones from Cray and G95, support coarrays. Then you have UPC, but I haven't seen anyone using it.

Probably because there's not 'enough to it' (or the user base is still too small, or they're too smart) for just the API description and a few examples to support a separate book. Lots of books on parallel programming do cover it as one of several parallel methods, though. One recent one (Feb 2010) is: "Parallel Programming: For Multicore and Cluster Systems" by Thomas Rauber and Gudula Rünger. I haven't read it; I mention it because it's recent and by real experts in the field (both => MPI isn't dead). As for the best book to help you wrap your head around how to use MPI, I can only refer you to people's reviews on Amazon. But look for 'parallel' in the title.


What data structures and algorithms are not implementable in C?

This may sound naive, but are there any data structures / algorithms that cannot be constructed in C, given enough code? I understand the argument of being Turing complete. I also know it's beneficial to have an elegant solution and that time complexity is important (i.e. more expressive or succinct when implemented in Ruby / Java / C# / Haskell / Lisp). All the languages I've researched or used seem to have been created or subsequently refactored into C-based compilers, interpreters, and/or virtual machines. Are some complex data structures only implementable with an interpreter and/or virtual machine? If that virtual machine or interpreter is C-based, isn't that just another data structure abstraction of the underlying C code? i.e. C has a simple type system but serves as the foundation for a dynamic type system. I was surprised to learn metaprogramming seems possible in C using the preprocessor (ioccc.org, Immanuel Herrmann). I've also seen some intriguing C algorithms that mimic the concurrency model of Erlang, but don't recall the source.
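(For concreteness, the kind of preprocessor metaprogramming alluded to here includes things like the classic X-macro idiom; the sketch below is a generic illustration, not the IOCCC entry mentioned.)

```c
/* X-macro idiom: one list definition generates both an enum and a
   matching string table, keeping the two in sync automatically. */
#include <stdio.h>

#define COLOR_LIST \
    X(RED)         \
    X(GREEN)       \
    X(BLUE)

#define X(name) COLOR_##name,
enum color { COLOR_LIST COLOR_COUNT };
#undef X

#define X(name) #name,
static const char *color_names[] = { COLOR_LIST };
#undef X

int main(void)
{
    for (int i = 0; i < COLOR_COUNT; i++)
        printf("%d = %s\n", i, color_names[i]);
    return 0;
}
```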
What inspired this question was the StackOverflow post (Lesser Known Useful Data Structures) and the Patrick Dussud interview on Channel 9 (Garbage Collection - Past, Present and Future), explaining how they wrote the first CLR garbage collector (written in Lisp targeting the JVM, compiled from Lisp to C++ for the CLR).
So, at the end of the day, after I finish punching my cards, I'm wondering if this question is probably more about C programming language design than convenience of programming and time complexity. For example, I could implement a highly complex algorithm in Prolog that is very elegant and quite difficult to understand expressed any other way, but I'm still limited by the assembly instructions and the computer architecture (on/off) at the other end of the stick, so I'd be here all night.
Shor's algorithm for factorizing integers in O((log n)^3) polynomial time cannot be implemented in C, because the computers that it can run on do not yet officially exist. Maybe someday there will be a quantum circuit complete version of C and I'll have to revise my answer.
Joking aside, I don't think anybody can give you a satisfying answer to this. I will try to cover some aspects:
Vanilla, standard C might not be able to make use of the whole feature set of your processor. For example, you are not able to use the TSX feature of recent Intel processors explicitly. You can of course resort to OS primitives, inline assembly, language extensions or third-party libraries to circumvent that.
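To make that concrete, here's a rough sketch of reaching TSX through the RTM intrinsics that GCC and Clang expose, one of the language extensions mentioned. This assumes a TSX-capable CPU and compilation with -mrtm; a real implementation needs a proper lock-based fallback.

```c
/* Accessing TSX/RTM via compiler intrinsics rather than plain C.
   Compile with: gcc -mrtm tsx.c (runs only on TSX-capable CPUs). */
#include <immintrin.h>
#include <stdio.h>

static int counter;

int main(void)
{
    unsigned status = _xbegin();        /* begin hardware transaction */
    if (status == _XBEGIN_STARTED) {
        counter++;                      /* transactional update */
        _xend();                        /* commit */
    } else {
        counter++;                      /* aborted: real code takes a lock here */
    }
    printf("counter = %d\n", counter);
    return 0;
}
```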
C by itself is not very good at parallel/asynchronous/concurrent/distributed programming. Some examples of languages that probably make a lot of tasks infinitely easier in this area are Haskell (maybe Data Parallel Haskell soon?), Erlang, etc. that provide very fast and lightweight threads/processes and async I/O. Working with green threads and heavily asynchronous I/O in C is probably less pleasant, although I'm sure it can be done.
In the end, on the user level side of things, of course you can emulate every Turing complete language with any other, as you pointed out so correctly.
Any Turing-complete machine or language can implement any other Turing-complete language, which means it can implement any program in any other Turing-complete language by interpretation if no other way. So the question you're asking is ill-formed; the issue is not whether tasks can be accomplished but how hard you have to work to accomplish them.
C in particular functions almost as a "high-level assembler language", since it will let you get away with many things that more recent languages won't, and thus may allow solutions that would be harder to implement in a more strongly-checked language.
That doesn't mean C is the best language for all those purposes. It forces you to pay much more attention to detail in many areas ranging from memory management to bounds checking to object-orientation (you CAN write OO code in C, but you have to implement it from the ground up). You have to explicitly load and invoke libraries for things that may be built into other languages. C datatypes can be incredibly convoluted (though typedefs and macros can hide much of that complexity). And so on.
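As a sketch of what "implementing OO from the ground up" means in practice, the usual trick is a struct of function pointers acting as a hand-rolled vtable (the names below are invented for illustration):

```c
/* Hand-rolled dynamic dispatch in C: a struct of function pointers
   plays the role of a C++ vtable. */
#include <stdio.h>

struct shape;

struct shape_ops {
    double (*area)(const struct shape *self);
};

struct shape {
    const struct shape_ops *ops;    /* the "vtable" pointer */
};

struct circle {
    struct shape base;              /* base must come first for the cast */
    double radius;
};

static double circle_area(const struct shape *self)
{
    const struct circle *c = (const struct circle *)self;
    return 3.14159265358979 * c->radius * c->radius;
}

static const struct shape_ops circle_ops = { circle_area };

int main(void)
{
    struct circle c = { { &circle_ops }, 2.0 };
    struct shape *s = &c.base;
    printf("area = %f\n", s->ops->area(s));   /* a "virtual" call */
    return 0;
}
```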
The best tool for any given task is the one that (a) you are, or can become, comfortable with; (b) is a good fit for the task at hand; and (c) is available to you.
Take a look at Turing completeness.
Basically, any language which is Turing complete can execute all Turing-computable functions. C is a Turing-complete language, so in theory you can implement any known solvable algorithm in C (although perhaps terribly inefficiently).

Actual Value of Machine-"Specifics" in C

I have been reading the C++ Primer, due to all the claims of how usable the language has become because of C++11. The book is probably OK, but it still leaves me wondering every few pages about what the code really does. As a result I end up googling lots and lots of stuff, only to eventually end up with something along the lines of "that bit is machine-specific" or "implementation-defined", etc.
I realize what that means, and how portable code is important.
Yet I do wonder: what are the actual values of these specifics for "run-of-the-mill" 64-bit x86 PCs? Since GCC, Visual Studio, etc. don't actually ask what to do in all these cases, but just compile the code (and it works!), there seems to be some sane set of defaults for targeting desktops.
Is there a document that covers these details (in a way understandable to non-compiler-writers, like the pages that I linked to)?
On most Unix or Linux systems you can log in and issue the command
locate limits.h
and it will find a number of include files that list the "limits" for values used by the compiler. Many of the limits files in the Linux kernel code are architecture-specific, which is of particular interest to you.
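If you'd rather query the values from code than hunt through headers, a short C program against <limits.h> will print what your compiler actually uses:

```c
/* Print a few implementation-defined limits for this compiler/target. */
#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    printf("sizeof(int)  = %zu\n", sizeof(int));
    printf("sizeof(long) = %zu\n", sizeof(long));
    printf("sizeof(void*)= %zu\n", sizeof(void *));
    printf("INT_MAX      = %d\n", INT_MAX);
    printf("LONG_MAX     = %ld\n", LONG_MAX);
    return 0;
}
```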
Frankly, portability is difficult to obtain 100%. I've been programming for 30 years and have never seen anything but simple programs that are 100% portable. Given the PC is ubiquitous, I don't think you should concern yourself with portability over functionality. Hence, all the references you find to "implementation defined".
In a perfect world, programs would be portable. In the real world, OS makers add features to compete with other OS makers and even themselves (Win 95, 98, 2000, XP, Vista, 7) (and Linux distros have differences). As a result, being portable -- IN MY EXPERIENCE -- means a trade-off you're not willing to make: too slow, too bulky, too much development time, too much testing, etc. If you seek portability, you need to ask why, and whether it is worth it. Even if you decide to do so, you will find yourself adding compile-time options based on your environment and may end up with entire files that are specific and non-portable.
When I write code for an Atmel Mega16 I don't consider whether I'm going to port that code. In this case, you don't have the luxury of infinite CPU cycles and boundless memory to consider a portable solution -- we're trying to squeeze all the juice out of a little micro.
Likewise, it's often the case you need to optimize routines in assembler in order to gain back CPU cycles for more features. (Like a DSP running a DFT -- it's ok in C when you first ship it, but eventually you need to reduce that to ASM to get back a pile of CPU cycles for 23 more features your boss wants you to add by tomorrow morning. Portability be damned.)
So, yes, much is implementation specific. In the PC world you have a little more luxury, but if you're writing code that interfaces with hardware you're often forced to create non-portable code. I could go on and on about this, but I have a loop that needs optimizing...

Sample solutions for low-level problems written in C

Does anyone know where I might find sample solutions written in C for low level / systems level applications? A really good website or book recommendation would be cool too.
I've learned some of the basics, but would like to see some code within the context of a real solution written in C, and specifically for a lower-level problem. I'd be interested in how C is used within the context of OS programming, for example. What are some areas where C is used for lower-level programming?
Thanks.
I would suggest you study MINIX 3 from Tanenbaum: http://www.minix3.org/
It's a microkernel architecture, and with his book ( http://vig.prenhall.com/catalog/academic/product/0,1144,0131429388,00.html ) it is really enlightening.
In my opinion, studying the Linux kernel is a bit hardcore for a start ;), and from an academic point of view the microkernel architecture is superior to the monolithic kernel.
Furthermore, at only a few thousand lines of code, unlike the Linux kernel, it's digestible on a realistic timetable.
And it's a serious project; the European Union sponsored it with several million euros, as far as I'm aware. I seem to remember him saying that in one of his talks.
And you have an X server running there, a GCC toolchain, etc.
Have fun :)
EDIT: As I read the comments, someone mentions the Ruby interpreter. It's written in a mixture of C and Ruby, and as far as it was mentioned in one episode of se-radio.net, it is really nice source code. Though I have to admit, I haven't looked into it myself. It might be worth digging into if you have some interest in Ruby too.
I'd suggest looking at some (for you) interesting open source projects written in C. For example, there's busybox, a piece of software that runs on embedded devices and has lots of smaller programs to study. You could, for example, take the source for the telnet client on one side and the corresponding RFC on the other. Or, for a steeper learning curve, you could also try studying the open source OSes, like the Linux kernel (here's the tree for browsing) or the BSDs. It's a lot more involved than busybox, but you can still find some parts that are fairly easy to understand if you're familiar with the context.
Studying the Linux kernel, maybe in conjunction with one of the several books on the kernel or device drivers, would provide a wealth of material. Much of this is available free.
Any or all of the books by W. Richard Stevens that walk through the implementation (TCP/IP Illustrated) or use (UNIX Network Programming) of the networking stack, or his Advanced Programming in the UNIX Environment book.
If you have a leaning toward Windows there are several good books, even if they're quite old, including:
Programming Server-Side Applications for Microsoft Windows 2000 by Richter and Clark
Programming Applications for Microsoft Windows by Richter
I would suggest the following sources might be interesting re: operating systems from a learning perspective. Be aware that modern kernels contain many advancements beyond what's present here:
The original Linux code.
xv6. This is a simple Unix OS that goes along with MIT's excellent OpenCourseWare course on operating systems.
Other ideas:
The current grub stage 1 bootloader isn't that complicated - it's pretty hard to be complicated with 512 bytes to play with.
The Linux kernel module guide gives you an introduction to building kernel modules (a minimal module sketch appears at the end of this answer). You could experiment with building custom, yet pointless, drivers that add, say, character devices to /dev or proc entries to /proc, and work towards implementing something interesting. People have implemented web servers in kernel space...
If you want to experiment with Windows kernels, have a go with Native NT applications. I'd start with printing a pointless boot message, then move up to drivers.
Beyond that, it's hard to suggest where you might want to go. Systems level is a wide space.
In the context of low-level programming, C and C++ are portable assembler. In many of the above spaces the standard library is either partially or totally missing, and extra functionality may be implemented by existing parts of the system-level code you're modifying, so you have to be aware of the API functions available to you in any given space and what you need to implement yourself, as well as what your memory and processing requirements must be. For example, a bootloader written to the MBR has to use BIOS interrupts and starts in real (16-bit) mode. Those are the constraints of the hardware design. Likewise, functions like fopen() aren't available in kernel space since they wrap system calls; you'd need to use kernel-specific constructs to achieve this if it really made sense to write a file from kernel space.
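As promised above, the canonical hello-world kernel module is only a few lines; this is a sketch you'd build against your kernel headers with a standard module Makefile, then load with insmod and watch via dmesg:

```c
/* hello.c - the canonical "hello world" Linux kernel module. */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

static int __init hello_init(void)
{
    printk(KERN_INFO "hello: module loaded\n");
    return 0;                       /* nonzero would abort the load */
}

static void __exit hello_exit(void)
{
    printk(KERN_INFO "hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
```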

What sort of businesses still hire C programmers?

I'm starting a job search, ideally ending up at a C shop. So far, I'm coming up empty in my local ads, and am starting to think I need to broaden my search, targeting specific types of businesses.
So, what type of places typically use this language?
C is typically used for fairly low-level development. You'll see it used in embedded systems, frequently, which is often listed as a computer engineering position (rather than computer science, or software engineering.) C is also used frequently for device drivers and 'generic' low-level code like math utility code for larger projects.
Generally the sorts of jobs that -need- C are taken by developers who've been using it forever, and have likely been in that position a long time.
Just keep looking! C is a rarity in terms of seeing a job just listed as "C Developer" as you've seen - so obviously they'll just be hard to find.
But I'd just wonder why you're exclusively looking for a C job as opposed to a language like C++ or Objective C :)
Edit:
Just a little note also, not to mislead you with the answer: C is still used for a lot of different stuff. Browsers, instant messengers, server daemons, even the network code for programs written in other languages. The problem is that this is just inefficient in terms of the amount of time spent doing the work, when you could easily write it in Python, on .NET, or with any number of other technologies. As such, it just isn't common, but the work does exist.
I work primarily as a C (and Perl) developer, because the application is mature, with a fairly long history (i.e. originally developed in the early 90s). The application suite originally was developed for Unix based graphical workstations. My previous job was a similar situation, a mature distributed application that was developed on multiple Unix platforms, originally in the early 1990s, and due to the source code size and maturity, it would be difficult to justify simply throwing that code base away to move to a new development language or even migrating to C++.
I would imagine there are still a number of larger in-house applications (used for internal purposes, not sold as a product) written in C that are still being maintained. Not entirely unlike the massive COBOL applications at large companies (insurance, finance, banking) that are also still being maintained.
For new development in C, others have already mentioned the embedded systems market, where the development is often for software put into ROM or EEPROM / flash memory where it is referred to as firmware, for microcontrollers (Microchip PIC, Atmel AVR, 80C51, 68HC11, etc.), where object code size, RAM usage, and performance matters so the usage of a programming language with fewer high-level or generic abstractions or assumptions is desirable.
One critical thing about good-to-great C programmers is that they are expected, if not required, to know more about data structures and algorithms. Priority queues, binary trees, MergeSort, QuickSort, Knuth-Morris-Pratt, and Karp-Rabin should be at least vaguely familiar. This is because the C language lacks the STL, Boost, CPAN, and other standard libraries available in other languages, and because C lacks templates, dynamic typing, or a similar mechanism, most implementations are type-specific rather than generic enough to be reusable in practice.
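The closest C comes to generic code out of the box is void pointers plus callbacks, as in the standard library's qsort; a small sketch:

```c
/* C's approach to genericity: void pointers and a comparison callback,
   as used by the standard library's qsort. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);       /* avoids the overflow of x - y */
}

int main(void)
{
    int v[] = { 5, 2, 9, 1, 7 };
    size_t n = sizeof v / sizeof v[0];

    qsort(v, n, sizeof v[0], cmp_int);

    for (size_t i = 0; i < n; i++)
        printf("%d ", v[i]);
    printf("\n");
    return 0;
}
```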
Knowing more than one programming language is not a bad thing, even if you don't feel comfortable enough to claim to be competent to program in the additional languages professionally. A "modern" scripting or "trendy" web development language might be a good match. Perl, Python, and Ruby are good potential candidates.
For programming experience, functional languages like LISP, Scheme, Prolog*, ML, Objective Caml, Haskell, and Scala are good candidates for making you "think different." Admittedly Prolog is actually a declarative logic programming language, but it is still programming experience expanding.
To add on to Anthony's excellent answer, C is still used extensively in the development of operating systems and firmware, so you may want to try looking in that direction as well.
Good luck in your search for a job.
Things that must run close to the metal, and be fast.
So in addition to what Anthony wrote -- networking protocols, storage device drivers, file systems, the core of operating systems, are still big on C.
Because the focus of interest has largely moved to applied and web development, where you can't do much with C.
Either extend your search geography to other cities/countries or follow the industry trend and learn something new.
Most C programming jobs are in "embedded systems": things like televisions, cars, phones, alarms, clocks, and toys. Such applications are often memory-constrained for cost reasons, so higher-level languages (e.g., Python) are not an option there.
At a time when C and C++ were the predominant coding environments, it was said that 90% of the C programming jobs were for embedded work. Stuff that isn't advertised as software, and for which there are rarely any famous names or faces associated. This is even more the case today.
Linux is completely in C. So any company that contributes to Linux is likely to employ C coders. I worked for an industrial automation company that developed in C. Though most automation shops run PLCs and ladder logic.
iPhone development shops online. Try craigslist as well.
Objective-C is a slim superset of C, so your C skills translate nicely.
Good luck!

Advice for a C, CUDA, & ANN Newbie?

I'm a business major, two-thirds of the way through my degree program, with a little PHP experience, having taken one introductory C++ class, and now regretting my choice of business over programming/computer science.
I am interested in learning more advanced programming; specifically C, and eventually progressing to using the CUDA architecture for artificial neural network data analysis (not for AI, vision, or speech processing, but for finding correlations between data-points in large data sets and general data/statistical analysis).
Any advice about how I should start learning C? As well as ANN/Bayesian technology for analyzing data? There are so many books out there, I don't know what to choose.
Since CUDA is fairly new, there doesn't seem to be much learner-friendly (i.e. dumbed-down) material for it. Are there learning resources for CUDA beyond the NVIDIA documentation?
Further, what resources would you recommend to me that talk about GPGPU computing and massively parallel programming that would help me along?
I don't recommend trying to learn CUDA first since it's a new technology and you don't have much background in programming.
Since you don't have much experience in C (or C++), CUDA will be a pain to learn since it lacks maturity, libs, nice error messages, etc.
CUDA is meant for people who are familiar with C (C++ experience helps too) and have a problem which needs performance improvement by recoding or rethinking the solution of a well known problem.
If you're trying to solve "ANN/Bayesian" problems I would recommend creating your solution in C++ or C, your choice. Don't bother with creating threads or multithreading. Then, after evaluating the response times of your serial solution, try to make it parallel by using OpenMP, Boost threads, or whatever. After this, if you still need more performance, then I would recommend learning CUDA.
I think these are valid points because CUDA has some pretty cryptic errors, hard to debug, totally different architecture, etc.
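To show how small the OpenMP step is compared with a CUDA port, here's a minimal sketch (my own example, not from any course material) of parallelizing a serial loop; one pragma does the work:

```c
/* Parallelizing a serial reduction with OpenMP: one pragma, no explicit
   thread management. Compile with: gcc -fopenmp sum.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = i * 0.5;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}
```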
If you're still interested, these are some links to learn CUDA:
Online courses:
GPGP
CIS 665
Richard Edgar's GPU Computing Pages
Forum (the best source of information):
NVIDIA CUDA Forum
Tools:
CUDPP
Problems solved in CUDA:
gpuDG
Histogram Computation
You've expressed 3 different goals:
Learning to program in C
Learning to write code for the CUDA platform
Learning to use Bayes' Nets and/or Neural nets for data analysis
Firstly: these things are not easy for people who already have several degrees in the field. If you only do one, make sure to learn about Bayesian inference. It's by far the most powerful framework available for reasoning about data, and you need to know it. Check out MacKay's book (mentioned at the bottom). You certainly have set yourself a challenging task - I wish you all the best!
Your goals are all fairly different kettles of fish. Learning to program in C is not too difficult. I would, if at all possible, take the "Intro to Algorithms & Data Structures" course (usually the first course for CS majors) at your university (it's probably taught in Java). This will be extremely useful for you, and basic coding in C will then simply be a matter of learning the syntax.
Learning to write code for the CUDA platform is substantially more challenging. As recommended above, please check out OpenMPI first. In general, you will be well served to read something about computer architecture (Patterson & Hennessy is nice), as well as a book on parallel algorithms. If you've never seen concurrency (i.e. if you haven't heard of a semaphore), it would be useful to look it up (lecture notes from an operating systems course will probably cover it; see MIT OpenCourseWare). Finally, as mentioned, there are few good references available for GPU programming since it's a new field. So your best bet will be to read example source code to learn how it's done.
Finally, Bayesian nets and neural nets. First, please be aware that these are quite different. Bayesian networks are a graphical (nodes & edges) way of representing a joint probability distribution over a (usually large) number of variables. The term "neural network" is somewhat vaguer, but generally refers to using simple processing elements to learn a nonlinear function for classifying data points. A book that gives a really nice introduction to both Bayes' nets and neural nets is David J.C. MacKay's Information Theory, Inference, and Learning Algorithms. The book is available for free online at http://www.inference.phy.cam.ac.uk/mackay/itila/. This book is by far my favorite on the topic. The exposition is extremely clear, and the exercises are illuminating (most have solutions).
If you're looking for a friendly introduction to parallel programming, instead consider Open MPI or POSIX threads on a CPU cluster. All you need to get started on this is a single multi-core processor.
The general consensus is that multi-programming on these new architectures (GPU, Cell, etc.) has a way to go in terms of the maturity of the programming models and APIs. Conversely, Open MPI and PThreads have been around for quite a while and there are lots of resources around for learning them. Once you have gotten comfortable with these, then consider trying out the newer technologies.
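For a taste of the PThreads starting point, here's a minimal sketch (an illustrative example) of spawning and joining a handful of workers:

```c
/* Minimal POSIX threads sketch: spawn a few workers and join them.
   Compile with: gcc -pthread workers.c */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static void *worker(void *arg)
{
    long id = (long)arg;            /* thread index smuggled through void* */
    printf("worker %ld running\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);

    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);

    return 0;
}
```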
While there are certainly programming interfaces for many other languages, C is probably the most common modern language (Fortran and Pascal are still kicking around in this area) in use in high-performance computing. C++ is also fairly popular, though; several bioinformatics packages use it. In any case, C is certainly a good starting place, and you can bump up to C++ if you want more language features or libraries (probably at the cost of some performance, though).
If you are interested in data mining, you might also want to look at the open source system called Orange. It is implemented in C++ but it also supports end-user programming in Python or in a visual link-and-node language.
I don't know if it supports NNs but I do know people use it for learning datamining techniques. It supports stuff like clustering and association rules.
(Also, in case you didn't know about it, you might want to track down somebody in your B-school who does operations management. If you're interested in CS and datamining, you might find likeminded people there.)
Link: gpgpu.org has some interesting discussion.
The latest CUDA releases (3.1, 3.2) have a full-featured set of functions called CUBLAS that will handle matrix operations for you on single-card setups. Parallelizing the backpropagation will be a bit more of a challenge, but I'm working on it.
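For reference, here's a rough sketch of the kind of CUBLAS call meant here: a single-precision matrix multiply. Note that this uses the cublas_v2 handle-based API, which postdates the CUDA 3.x releases mentioned, and omits all error checking:

```c
/* Rough CUBLAS sketch: C = A * B for two 4x4 single-precision matrices.
   Error checking omitted; link with -lcublas -lcudart. */
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const int n = 4;                       /* n x n matrices */
    const float alpha = 1.0f, beta = 0.0f;
    float hA[16], hB[16], hC[16];
    float *dA, *dB, *dC;

    for (int i = 0; i < 16; i++) { hA[i] = 1.0f; hB[i] = 2.0f; }

    cudaMalloc((void **)&dA, sizeof hA);
    cudaMalloc((void **)&dB, sizeof hB);
    cudaMalloc((void **)&dC, sizeof hC);
    cudaMemcpy(dA, hA, sizeof hA, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof hB, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    /* column-major SGEMM: C = alpha*A*B + beta*C */
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);
    cublasDestroy(handle);

    cudaMemcpy(hC, dC, sizeof hC, cudaMemcpyDeviceToHost);
    printf("C[0] = %f\n", hC[0]);          /* expect 8.0 for these inputs */

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```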
I was able to find some great video courses, free from Stanford, on iTunes U:
Programming Methodology (CS106A)
Programming Abstractions (CS106B)
Programming Paradigms (CS107)
Machine Learning (CS229)
Programming Massively Parallel Processors with CUDA
Each one of these courses has around 20 or so lectures, so it's an investment to watch them all, but well worth it.
