I have been given a college assignment to propose an algorithm that improves or speeds up the performance of some existing program, system, or algorithm. I additionally have to prove with simulations, execution-time comparisons, etc. that my proposed changes actually improve the chosen program, increase its performance, make the existing application "better", and so on. Do you have any ideas?
Alternatively, I could just pick a real-life situation where a program would help and write it. I would prefer something that is not complex and does not require too much time or knowledge. Thanks in advance!
A colleague suggested writing a garbage collector for a program written in C and then comparing execution times to show that the version with the GC runs faster than the one without (for example, this: https://www.geeksforgeeks.org/snake-game-in-c/). I don't know whether I would run into too many problems with this.
I am a biologist and I am trying to learn programming languages. But when I was experimenting with the pthread library (-lpthread), the result seemed odd: the parallel version was slower than the sequential one.
In fact, I am still reading the Tanenbaum book, but my main focus is learning the basics of computing the secondary structure of RNAs. I found an explanation of the Nussinov algorithm in a book and implemented it. But when I tried to make a parallel version, I believe I might be missing the whole point, as this is my first contact with parallel implementations.
My questions are:
1. How should I implement a data-parallel version of this algorithm?
2. Why is my parallel implementation slightly slower than the sequential one?
The code is available on: https://gist.github.com/drenge/6395472 (each file is a different version parallel/sequential)
There are two ways to make a parallel version of an algorithm/program.
1. You study the algorithm and write the serial program. Afterwards, you start profiling the program to see where you can obtain speed gains. Those are the places where parallelism might come in handy (might, not will). I call this method the "desperate man's tool". This method is useful (!), but most of the time method 2 below provides better performance gains. This way of optimising only takes programming and user experience into account.
2. You take the algorithm and try to figure out another algorithm that permits parallel handling of the problem. Are there independent calculations or steps in the algorithm? Are there parts of the algorithm that can be done before other parts completely finish? This could be called "the theoretical approach". Keep in mind that every thread has its overhead, and you don't want the overhead to be bigger than the gain you wish to obtain.
In fact, a combination of both is the best way to go (if parallelism is really necessary): first concentrate on method 2 (optimise the algorithm so that it stays scientifically correct but can be handled by multiple threads). Then look at the critical thread (found while profiling) and start optimising that thread.
As Kerrek SB already said: parallel programming is a very complex topic, with lots of possible pitfalls. And at the end of the road you should ask yourself: is it worth the effort? After all, losing weeks of study and programming time to gain some minutes is not worth your while.
On the other hand, if your program will run thousands of times, frustrating users with long waiting times or a lack of responsiveness, then maybe it is worth making a more performant version after all. But again: can't you reach the same goal by optimising a sequential version without the parallel clutter? Many naive algorithms are of order O(exp(x)) or worse and can be replaced by ones of order O(x) or even O(log(x)).
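To make method 2 concrete for the Nussinov recurrence you are working with: all cells on the same anti-diagonal of the DP matrix depend only on shorter subsequences, so they can be filled in parallel, diagonal by diagonal (a "wavefront"). Below is a minimal sketch of that idea using OpenMP rather than raw pthreads; the scoring function and all names are illustrative assumptions, not your code from the gist.

```c
/* Minimal wavefront sketch of the Nussinov fill, parallelised with OpenMP.
 * Cells on anti-diagonal d = j - i depend only on cells from earlier
 * diagonals, so the inner loop over i is safe to run in parallel.
 * Compile with: cc -O2 -fopenmp nussinov_omp.c
 * (can_pair and the row-major dp layout are assumptions for illustration.) */
#include <string.h>

static int can_pair(char a, char b)
{
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'G' && b == 'C') || (a == 'C' && b == 'G') ||
           (a == 'G' && b == 'U') || (a == 'U' && b == 'G');
}

/* dp is an n*n row-major matrix; dp[i*n + j] = max base pairs in seq[i..j]. */
void nussinov_fill(const char *seq, int n, int *dp)
{
    memset(dp, 0, (size_t)n * (size_t)n * sizeof *dp);

    for (int d = 1; d < n; d++) {                  /* diagonals, in order    */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n - d; i++) {          /* independent cells on d */
            int j = i + d;
            int best = dp[(i + 1) * n + (j - 1)] + can_pair(seq[i], seq[j]);
            if (dp[(i + 1) * n + j] > best) best = dp[(i + 1) * n + j];
            if (dp[i * n + (j - 1)] > best) best = dp[i * n + (j - 1)];
            for (int k = i + 1; k < j; k++) {      /* bifurcations           */
                int split = dp[i * n + k] + dp[(k + 1) * n + j];
                if (split > best) best = split;
            }
            dp[i * n + j] = best;
        }
    }
}
```

Whether this beats your sequential version still depends on the sequence length: for short RNAs the per-diagonal scheduling overhead easily eats the gain, which is exactly the thread-overhead warning above.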
Kind regards,
PB
I've spent my afternoon reading up on processor caches after reading about the effect powers of two can have on cache conflicts. Now I wish to apply this new knowledge to my memory allocator for multi-threaded programs. However, I don't fully understand it yet.
I was under the impression that processors loved powers of two, so my allocator rounds requested sizes to their next power of two and then slices pages into multiples of this size and hands them out. When a page is full, it simply maps a new page and slices it up the same way. This leads to very similar and predictable offsets into pages.
To what extent should I adapt my allocator to avoid this issue? For example, should I try to randomize addresses slightly or am I screwed for using powers of two in the first place?
Thanks!
Until you have incontrovertible proof that this is performance-critical, just leave it be. The extra complication will most probably not be worth it.
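That said, if profiling ever does show that conflict misses from those identical page offsets dominate, a common low-effort mitigation is page colouring: stagger where each new page's first object starts by a small multiple of the cache-line size. A rough sketch follows (the constants and the mmap-based page source are assumptions about your allocator, not a drop-in patch):

```c
/* Rough page-colouring sketch: each freshly mapped page hands out objects
 * starting at a slightly different offset, so same-sized allocations from
 * different pages no longer all land on the same cache sets.
 * PAGE_SIZE, CACHE_LINE and the colour count are illustrative assumptions. */
#include <stddef.h>
#include <sys/mman.h>

#define PAGE_SIZE  4096
#define CACHE_LINE 64
#define NUM_COLORS 8            /* offsets 0, 64, 128, ..., 448 bytes */

static unsigned next_color;     /* rotates across newly mapped pages;
                                   would need to be atomic in a real
                                   multi-threaded allocator */

void *page_with_color(void)
{
    unsigned char *page = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;

    /* Shift the usable start of the page; the few hundred bytes lost at the
     * end are the price for spreading loads over more cache sets. */
    unsigned color = next_color++ % NUM_COLORS;
    return page + color * CACHE_LINE;
}
```

But again: measure first. If your working sets are small or your cache is highly associative, you may never see the problem at all.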
Everybody should read (and understand!) Bentley's "Writing Efficient Programs" (sadly out of print now; his "Programming Pearls" contains a summary and is well worth a read too).
Before embarking on a code-optimization bout, make sure it is worth it. If the performance is adequate, there are better uses of your time. Yes, you have to measure first.
Measure where the cost is being spent. Programmers are notoriously bad at guessing where the costs are.
The biggest performance gains come from restating the problem (sometimes a slightly different problem that is faster to solve is good enough), then from the overall organization of the system, next from better algorithms and data structures; and only at the very, very end from detail optimizations like the one considered here.
Your friendly compiler, given a bit of prodding in the direction of "generate good code", will today generate much better code than an experienced assembly-language programmer given similar (full-function-scale) tasks. Most local source-code reorganizations "for performance" are either moot (the compiler would have done the same on its own) or deleterious (the compiler will recognize and rewrite the usual code sequences; unusual code can confuse it into doing nothing or generating bad code).
Programmer time (writing, debugging, maintaining) is much more valuable than a few microseconds of computer time here and there, except in extremely unusual circumstances. Write the simplest code that does the job, and rework it only if experience shows it is worthwhile.
I wrote a Gibbs sampler in R and decided to port it to C to see whether it would be faster. A lot of pages I have looked at claim that C will be up to 50 times faster, but every time I have used it, it's only about five or six times faster than R. My question is: is this to be expected, or are there tricks which I am not using which would make my C code significantly faster than this (like how using vectorization speeds up code in R)? I basically took the code and rewrote it in C, replacing matrix operations with for loops and making all the variables pointers.
Also, does anyone know of good resources for C from the point of view of an R programmer? There's an excellent book called The Art of R Programming by Matloff, but it seems to be written from the perspective of someone who already knows C.
Also, the screen tends to freeze when my C code is running in the standard R GUI for Windows. It doesn't crash; it unfreezes once the code has finished running, but it stops me from doing anything else in the GUI. Does anybody know how I could avoid this? I am calling the function using .C().
Many of the existing posts have explicit examples you can run; for example, Darren Wilkinson has several posts on his blog analyzing this in different languages, and later even on different hardware (e.g. comparing his high-end laptop to his netbook and to a Raspberry Pi). Some of his posts are
the initial (then revised) post
another later post
and there are many more on his site -- these often compare C, Java, Python and more.
Now, I also turned this into a version using Rcpp -- see this blog post. We also used the same example in a comparison between Julia, Python and R/C++ at useR this summer so you should find plenty other examples and references. MCMC is widely used, and "easy pickings" for speedups.
Given these examples, allow me to add that I disagree with the two earlier comments your question received. The speed will not be the same, it is easy to do better in an example such as this, and your C/C++ skills will mostly determine how much better.
Finally, an often overlooked aspect is that the speed of the RNG matters a lot. Running down loops and adding things up is cheap -- doing "good" draws is not, and a lot of inter-system variation comes from that too.
About the GUI freezing, you might want to call R_CheckUserInterrupt and perhaps R_ProcessEvents every now and then.
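For example (a minimal sketch, not your actual sampler: the function name, arguments and loop body are placeholders), a .C()-callable routine can check for interrupts every so many iterations so the GUI gets a chance to respond:

```c
/* Sketch of an interruptible .C() entry point. R_CheckUserInterrupt() is
 * declared in R_ext/Utils.h; calling it periodically lets R handle a pending
 * Esc/Ctrl-C (and helps the GUI stay responsive) instead of freezing until
 * the C code returns. The sampler body itself is a placeholder. */
#include <R.h>
#include <R_ext/Utils.h>

void gibbs_c(int *n_iter, double *out)
{
    for (int i = 0; i < *n_iter; i++) {
        /* ... one Gibbs sweep, writing results into out[] ... */

        if (i % 1000 == 0)          /* checking every iteration has a cost */
            R_CheckUserInterrupt();
    }
}
```

Note that R_CheckUserInterrupt() does not return if the user actually interrupts; R jumps back to the top level, so don't rely on cleanup code after that point.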
I would say C, done properly, is much faster than R.
Some easy gains you could try:
Set the compiler to optimise for speed (e.g. -O2 or -O3).
Compile with the -march flag (e.g. -march=native) so the compiler can target your CPU's instruction set.
Also, if you're using Visual Studio, make sure you're compiling a Release build, not Debug.
Your observed performance difference will depend on a number of things: the type of operations you are doing, how you write the C code, what compiler-level optimizations you use, your target CPU architecture, and so on.
You can write basic, sloppy C and get something that works and runs with decent efficiency. You can also fine-tune your code for the unique characteristics of your target CPU - perhaps invoking specialized assembly instructions - and squeeze every last drop of performance that you can out of the code. You could even write code that runs significantly slower than the R version. C gives you a lot of flexibility. The limiting factor here is how much time that you want to put into writing and optimizing the C code.
The reverse is also true (duplicate the previous paragraph here, but swap "C" and "R").
I'm not trying to sound facetious, but there's really not a straightforward answer to your question. The only way to tell how much faster your C version would be is to write the code both ways and benchmark them.
Maybe this has been asked before, but I couldn't find it. My question is simple: does it make sense to write an application in a higher-level language (Java, C#, Python) and the time/performance-critical functions in C? Or, at this point, unless you are doing very low-level OS/game/sensor programming, does it make no real difference to write the whole application in, say, Java?
It makes sense if you a) notice a performance issue, AND b) use performance measurements to locate where the problem occurs, AND c) can't achieve the desired performance by modifying the existing code.
If any of these items don't apply, then it's probably premature optimization.
If you are fluent and productive in a higher-level language such as Python or Lua, then by all means start writing in that language. Look for bottlenecks if and when they exist.
Speed can be quite similar with things like C#.
What is tricky is latency. So if you want to write something which you know takes < 10ms then C is reasonably predictable (ignoring whatever variability your operating system might introduce).
Having said that, for very tight, long-running loops (image processing, for example), C/C++ can offer some speed-up. You can get quite reasonable performance out of C#, though you do have to be careful how you program it; in general, I have found you can still squeeze more out of C/C++.
Usually your preferred language will do whatever you need it to in acceptable time (er, blazing fast).
Sure, critical time/performance functions can be written in a "more optimal/suitable" language like C or assembly - but whether that actually makes the whole program faster is another story. There are laws that govern how much overall speed-up you'll get, specifically Amdahl's law and the law of diminishing returns.
To answer your question: it only makes sense to rewrite these critical functions in a lower-level language if the resulting speed-up is large enough to warrant the extra work.
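As a rough worked example of Amdahl's law (the 20% and 10x figures are made up purely for illustration): if profiling shows the function you want to rewrite accounts for a fraction p = 0.2 of total runtime, and the C version makes that part s = 10 times faster, the overall speedup is

1 / ((1 - p) + p/s) = 1 / (0.8 + 0.02) ≈ 1.22x

and no matter how fast the rewritten part gets, you can never exceed 1 / (1 - p) = 1.25x without touching the other 80%.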
I suggest you read Cliff Click's excellent Java vs. C Performance....Again.. It outlines many points of comparison between Java and C++.
Predictably, the conclusion is that it depends, but it's a worthwhile read.
You can only really answer this on a case-by-case basis; without knowing what you are doing, it's impossible to say.
But maybe what you actually want here is some kind of sanity check to ensure that this approach isn't crazy to consider. I have worked on tools ranging from a very large graphics application (about a million lines) to relatively small physics simulation engines (about 10,000 lines) that were written just as you describe: Python on the outside for the interface (both API and GUI), C/C++ on the inside for the heavy lifting. They all benefited from this division of responsibility.
This is done especially with scripting languages. Games written in Python come to mind: Python is often too slow for the number-crunching parts of a game, so those parts are moved into a C module for speed. Be sure that you actually need the speed, though, and that number crunching really is your performance bottleneck rather than a general algorithmic issue - a brute-force search over a huge list is going to be slow in both C and Python.
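As a small illustration of the kind of thing that gets pushed into a C module (the file name, function and the ctypes route below are assumptions for the example, not a recommendation of any particular binding tool):

```c
/* hot.c - a tiny number-crunching routine a Python game might call via
 * ctypes or a thin extension module. Build as a shared library, e.g.:
 *   cc -O2 -shared -fPIC -o libhot.so hot.c
 * The function and names are purely illustrative. */
#include <stddef.h>

/* Sum of squared distances from (px, py) to n points: the kind of tight
 * numeric loop that is slow in pure Python but trivial in C. */
double sum_sq_dist(const double *xs, const double *ys, size_t n,
                   double px, double py)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = xs[i] - px;
        double dy = ys[i] - py;
        total += dx * dx + dy * dy;
    }
    return total;
}
```

The same caveat applies here too: if the real bottleneck is an O(n^2) algorithm, moving it to C only shaves a constant factor; fixing the algorithm helps in both languages.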
I'd say that it depends a lot on your application.
What sort of performance is important? Short startup time? High throughput? Low latency? Is it important that response time is always predictable?
Is the application short lived or does it run for long periods of time?
Java can give you high throughput, but occasional short freezes while doing garbage collection. C# is probably similar.
Python, well the performance there will often lag behind the others for anything not written in C (some things ARE written in C, even if you didn't do it yourself).
So, as others said: it depends.
But as always with performance: Measure first, optimize when you know you need to.
It seems that safety requirements do not look kindly on systems that use AI for safety-related functions (particularly where large potential risks of destruction or death are involved). Can anyone suggest why? I always thought that, provided you program your logic properly, the more intelligence you put into an algorithm, the more likely it is that the algorithm can prevent a dangerous situation. Are things different in practice?
Most AI algorithms are fuzzy -- typically learning as they go along. For items of critical safety importance, what you want is deterministic behaviour. Deterministic algorithms are easier to prove correct, which is essential for many safety-critical applications.
I would think that the reason is twofold.
First, it is possible that the AI will make unpredictable decisions. Granted, they can be beneficial, but when talking about safety concerns you can't take risks like that, especially when people's lives are on the line.
Second, the "reasoning" behind the decisions can't always be traced (sometimes there is a random element used for generating results with an AI), and when something goes wrong, not having the ability to determine "why" (in a very precise manner) becomes a liability.
In the end, it comes down to accountability and reliability.
The more complex a system is, the harder it is to test.
And the more crucial a system is, the more important it becomes to have 100% comprehensive tests.
Therefore, for crucial systems, people prefer sub-optimal features that can be tested, and rely on human interaction for complex decision making.
From a safety standpoint, one often is concerned with guaranteed predictability/determinism of behavior and rapid response time. While it's possible to do either or both with AI-style programming techniques, as a system's control logic becomes more complex it's harder to provide convincing arguments about how the system will behave (convincing enough to satisfy an auditor).
I would guess that AI systems are generally considered more complex. Complexity is usually a bad thing, especially when it relates to "magic" which is how some people perceive AI systems.
That's not to say that the alternative is necessarily simpler (or better).
When we've done control-systems coding, we've had to show trace tables for every single code path and permutation of inputs. This was required to ensure that we didn't put equipment into a dangerous state (for employees or infrastructure), and to "prove" that the programs did what they were supposed to do.
That'd be awfully tricky to do if the program were fuzzy and non-deterministic, as tvanfosson indicated. I think you should accept that answer.
The key statement is "provided you program your logic properly". Well, how do you "provide" that? Experience shows that most programs are chock full of bugs.
The only way to guarantee that there are no bugs would be formal verification, but that is practically infeasible for all but the most primitively simple systems, and (worse) it is usually done on specifications rather than code, so you still don't know whether the code correctly implements your spec after you've proven the spec to be flawless.
I think it is because AI is very hard to understand, and that makes it impossible to maintain.
Even if an AI program is considered fuzzy, or "learns" up to the moment it is released, it is tested against all known cases (and has already learned from them) before it is even finished. In most cases this "learning" changes some "thresholds" or weights in the program, and after that it is very hard to really understand and maintain that code, even for its creators.
This has been changing over the last 30 years with languages that are easier for mathematicians to understand, making it easier for them to test and deliver new pseudo-code around the problem (like the MATLAB AI toolboxes).
As there is no accepted definition of AI, the question should be more specific.
My answer concerns adaptive algorithms that merely employ parameter estimation - a kind of learning - to improve the safety of the output information. Even this is not welcome in functional safety, although it may seem that the behaviour of such an algorithm is not only deterministic (all computer programs are) but also easy to determine.
Be prepared for the assessor to ask you to demonstrate test reports covering all combinations of input data and failure modes. Because your algorithm is adaptive, it depends not only on the current input values but on many or all of the earlier ones. You know that full test coverage is impossible within the age of the universe.
One way to score is to show that previously accepted, simpler algorithms (the state of the art) are not safe. This should be easy if you know your problem space (if not, keep away from AI).
Another possibility may exist for your problem: a compelling monitoring function indicating whether the parameter is estimated accurately.
There are enough ways that ordinary algorithms, when shoddily designed and tested, can wind up killing people. If you haven't read about it, you should look up the case of the Therac-25. This was a system whose behaviour was supposed to be completely deterministic, and things still went horribly, horribly wrong. Imagine if it had been trying to reason "intelligently", too.
"Ordinary algorithms" for a complex problem space tend to be arkward. On the other hand, some "intelligent" algorithms have a simple structure. This is especially true for applications of Bayesian inference. You just have to know the likelihood function(s) for your data (plural applies if the data separates into statistically independent subsets).
Likelihood functions can be tested. If the test cannot cover the tails far enough to reach the required confidence level, just add more data, for example from another sensor. The structure of your algorithm will not change.
A drawback is/was the CPU performance required for Bayesian inference.
Besides, mentioning the Therac-25 is not helpful, since no algorithm at all was involved, just multitasking spaghetti code. Citing the authors, "[the] accidents were fairly unique in having software coding errors involved -- most computer-related accidents have not involved coding errors but rather errors in the software requirements such as omissions and mishandled environmental conditions and system states."