Basic guidelines for high-performance benchmarking

I am going to benchmark several implementations of a numerical simulation software on a high-performance computer, mainly with regard to time - but other resources like memory usage, inter-process communication etc. could be interesting as well.
As of now, I have no knowledge of general guidelines for benchmarking software in this area. Neither do I know how much measurement noise is reasonable to expect, nor how many tests one usually carries out. Although these issues are system dependent, of course, I am pretty sure there exist some standards that are considered reasonable.
Can you provide me with such (introductory) information?

If a test doesn't take much time, then I repeat it (e.g. 10,000 times) to make it take several seconds.
I then do that multiple times (e.g. 5 times) to see whether the test results are reproducible (or whether they're highly variable).
There are limits to this approach (e.g. it's testing with a 'warm' cache), but it's better than nothing, and it's especially good at comparing similar code, e.g. for seeing whether or not a performance tweak to some existing code did in fact improve performance (i.e. for doing 'before' and 'after' testing).
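For what it's worth, here is a minimal sketch of that repeat-and-time loop in C. The dummy work() function is just a placeholder for whatever you are measuring; the 10,000 inner iterations and 5 outer runs are the counts mentioned above:

    #include <stdio.h>
    #include <time.h>

    /* Stand-in for the short test being benchmarked. */
    static double work(void) {
        double s = 0.0;
        for (int i = 1; i <= 1000; i++)
            s += 1.0 / i;
        return s;
    }

    int main(void) {
        volatile double sink = 0.0;   /* keeps the compiler from optimising the loop away */
        const int inner = 10000;      /* repeat a short test until it takes several seconds */
        const int runs  = 5;          /* repeat the measurement to see how variable it is */

        for (int r = 0; r < runs; r++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (int i = 0; i < inner; i++)
                sink += work();
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            printf("run %d: %.3f s total, %.3f us per iteration\n",
                   r, secs, secs / inner * 1e6);
        }
        (void)sink;
        return 0;
    }

If the five per-run numbers agree to within a few percent, the measurement is probably stable enough to compare implementations; if they don't, look for warm-up effects, frequency scaling or other load on the machine.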

The best way is to test with the job you will actually be using the machine for!
Can you run a sub-sample of the actual problem - one that will only take a few minutes - and simply time that on various machines?

Best Database for remote sensor data logging

I need to choose a database for storing data remotely from a large number (thousands to tens of thousands) of sensors, each generating around one entry per minute.
The data needs to be queried in a variety of ways, from counting entries with certain characteristics for statistics to simply outputting it for plotting.
I am looking around for the right tool. I started with MySQL, but I feel like it lacks the scalability needed for this project, and this led me to NoSQL databases, which I don't know much about.
Which database, relational or not, would be a good choice?
Thanks.
There is usually no "best" database since they all involve trade-offs of one kind or another. Your question is also very vague because you don't say anything about your performance needs other than the number of inserts per minute (how much data per insert?) and that you need "scalability".
It also looks like a case of premature optimization because you say you "feel like [MySQL] lacks the scalability needed for this project", but it doesn't sound like you've run any tests to confirm whether this is a real problem. It's always better to get real data rather than base an important architectural decision on "feelings".
Here's a suggestion:
Write a simple test program that inserts 10,000 rows of sample data per minute
Run the program for a decent length of time (a few days or more) to generate a sizable chunk of test data
Run your queries to see if they meet your performance needs (which you haven't specified -- how fast do they need to be? how often will they run? how complex are they?)
You're testing at least two things here: whether your database can handle 10,000 inserts per minute and whether your queries will run quickly enough once you have a huge amount of data. With large datasets these will become competing priorities since you need indexes for fast queries, but indexes will start to slow down your inserts over time. At some point you'll need to think about data archival as well (or purging, if historical data isn't needed) both for performance and for practical reasons (finite storage space).
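To make that concrete, here is a rough sketch of the first step above in C. SQLite is used purely as a stand-in for whichever engine you end up evaluating (MySQL's C API or any NoSQL client would take the same shape), and the readings table and its columns are made up for illustration; only the 10,000 rows per batch comes from the suggestion above:

    #include <stdio.h>
    #include <time.h>
    #include <sqlite3.h>   /* SQLite is only a placeholder for the engine under test */

    int main(void) {
        sqlite3 *db;
        sqlite3_stmt *ins;
        if (sqlite3_open("sensors_test.db", &db) != SQLITE_OK) return 1;

        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS readings("
            " sensor_id INTEGER, ts INTEGER, value REAL);",
            NULL, NULL, NULL);

        sqlite3_prepare_v2(db,
            "INSERT INTO readings(sensor_id, ts, value) VALUES(?, ?, ?);",
            -1, &ins, NULL);

        const int batch = 10000;                 /* one simulated minute of traffic */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);
        for (int i = 0; i < batch; i++) {
            sqlite3_bind_int(ins, 1, i);                    /* hypothetical sensor id */
            sqlite3_bind_int64(ins, 2, (sqlite3_int64)time(NULL));
            sqlite3_bind_double(ins, 3, 20.0 + i % 10);     /* made-up reading */
            sqlite3_step(ins);
            sqlite3_reset(ins);
        }
        sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d inserts in %.2f s (%.0f rows/s)\n", batch, secs, batch / secs);

        sqlite3_finalize(ins);
        sqlite3_close(db);
        return 0;
    }

Drive something like this once a minute for a few days (a sleep loop or cron), then run your real queries against the accumulated data and time those too; the insert rate and the query latency together will tell you whether the engine is actually the bottleneck.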
These will be concerns no matter what database you select. From what little you've told us about your retrieval needs ("counting data with certain characteristics" and "simple outputting for plotting") it sounds like any type of database will do. It may be that other concerns are more important, such as ease of development (what languages and tools are you using?), deployment, management, code maintainability, etc.
Since this is sensor data we're talking about, you may also want to look at a round robin database (RRD) such as RRDTool to see if that approach better serves your needs.
I found this question while googling for "database for sensor data".
One of the most helpful search results (along with this SO question) was this blog:
Actually, I had started a similar project (http://reatha.de) but realized too late that I was not using the best technologies available. My approach was similar: MySQL + PHP. Eventually I realized that this was not scalable, and I stopped the project.
Additionally, a good starting point is looking at the list of databases offered on Heroku:
If they offer one, it is probably not the worst option.
I hope this helps.
You could also try Redis, a NoSQL database.

Why are databases not used with 3D and drawing software (to store points and other primitive shapes)?

My question is: why aren't databases used with drawing, 3D modelling, 3D design, game engine, architecture, etc. software to save the current state of the images or the objects that are on the screen or are part of a project?
One obvious answer is speed: retrieving or saving the millions of triangles or points forming the geometry would be very slow, as there would be hundreds or thousands of queries per second. But is that really the cause? Consider the apparent advantages: using databases can allow sharing the design live over a network when it is being saved at a common location, and more than one person can work on it at a time or give live feedback while something is being designed and shared, especially when a time-based update is used (say, every 5 or 10 seconds), which is not as good as live synchronization but should be quick enough. What is the basic problem with using databases in this type of software that has kept them from being used this way, or kept new algorithms or techniques from being developed or studied to make the most of using them this way? And why hasn't this idea been researched much?
The short answer is essentially speed. The speed of writing information to a disk drive is an order of magnitude slower than writing it to RAM. The speed of network access is in turn an order of magnitude slower than writing to or reading from a hard disk. Live sharing apps like the one you describe are indeed possible, but they wouldn't necessarily require what you would call a "database", nor would using a database be such a great idea. The reason more don't exist is that they are actually fiendishly difficult to program. Programming by itself is difficult enough, even when thinking in a straight line, with a single narrative. But to program something like that requires you to accurately visualise multiple parallel dimensions acting on the same data simultaneously, without breaking anything. This is genuinely hard.
Your obvious answer is correct; I'm not an expert in that particular field, but even from a distance you can see that's (probably) the main reason.
This doesn't negate the possibility that such systems will use a database at some point.
Consider the apparent advantages: using databases can allow sharing the design live over a network when it is being saved at a common location...
True, but just because the information is being passed around doesn't mean that you have to store it in a database in order to be able to do that. Once something is in memory you can do anything with it - the issue with not persisting stuff is that you will lose data if the server fails / stops / etc.
I'll be interested to see if anyone has a more enlightened answer.
Interesting discussion... I have a biased view towards avoiding adding too much "structure" (as in the indexes, tables, etc. of a DBMS solution) to spatial problems or challenges, beyond what cannot be readily recreated from a much smaller subset of data. I think if the original poster looked at the problem from the standpoint of what is truly needed to answer the need and develop a solution, he may not need to query a DBMS nearly as often... so the DBMS in question might relate to a 3D space but in fact hold only a small subset of the total values in that space, in a similar way that digital displays do not concern themselves with every change in pixel (or voxel) value but only those that have changed. Since most 3D problems would have a much lower "refresh" rate than video displays, the hits on the DBMS would be infrequent enough to possibly make a 3D DB look "live".

How to know if HW/SW codesign will be useful for a specific application?

I will be in my final year (Electrical and Computer Engineering) next semester, and I am searching for a graduation project in embedded systems or hardware design. My professor advised me to look for an existing system and try to improve it using hardware/software codesign, and he gave me the example of an automated license plate recognition system, where I could use dedicated hardware written in VHDL or Verilog to make the system perform better.
I have searched a bit and found some YouTube videos showing such systems working fine.
So I don't know if there is any room for improvement. How do I know if certain algorithms or systems are slow and can benefit from codesign?
How do I know if certain algorithms or systems are slow and can benefit from codesign?
In many cases, this is an architectural question that is only answered with large amounts of experience or even larger amounts of system modeling and analysis. In other cases, 5 minutes on the back of an envelope could show you that a specialized co-processor adds weeks of work but no performance improvement.
An example of a hard case is any modern mobile phone processor. Take a look at the TI OMAP5430. Notice it has at least 10 processors of varying types (the PowerVR block alone has multiple execution units) and dozens of full-custom peripherals. Any time you wish to offload something from the 'main' CPUs, there is a potential bus bandwidth/silicon area/time-to-market cost that has to be considered.
An easy case would be something like what your professor mentioned. A DSP/GPU/FPGA will perform image processing tasks, like 2D convolution, orders of magnitude faster than a CPU. But 'housekeeping' tasks like file-management are not something one would tackle with an FPGA.
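To illustrate the easy case, here is the sort of kernel that is worth offloading: a naive 3x3 convolution in C (the image size is an arbitrary placeholder and border handling is skipped for brevity). Every output pixel is the same handful of multiply-adds with no dependencies between pixels, which is exactly the regular, data-parallel structure a DSP, GPU or FPGA handles far better than a general-purpose CPU:

    #define W 640
    #define H 480

    /* Naive 3x3 convolution over an 8-bit image. */
    void convolve3x3(const unsigned char in[H][W], int out[H][W], const int k[3][3]) {
        for (int y = 1; y < H - 1; y++) {
            for (int x = 1; x < W - 1; x++) {
                int acc = 0;
                for (int ky = -1; ky <= 1; ky++)
                    for (int kx = -1; kx <= 1; kx++)
                        acc += k[ky + 1][kx + 1] * in[y + ky][x + kx];
                out[y][x] = acc;   /* ~9 multiply-adds per pixel, millions of times per frame */
            }
        }
    }

The housekeeping around it (opening files, scheduling frames, logging) has none of that regularity, which is why it stays in software.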
In your case, I don't think that your professor expects you to do something 'real'. I think what he's looking for is your understanding of what CPUs/GPUs/DSPs are good at, and what custom hardware is good at. You may wish to look for an interesting niche problem, such as those in bioinformatics.
I don't know what codesign is, but I have done some Verilog before; I think simple image (or signal) processing tasks are good candidates for such embedded systems, because they often involve real-time processing of massive loads of data (preferably SIMD operations).
Image processing tasks often look easy, because our brain does mind-bogglingly complex processing for us, but they are actually very challenging. I think this challenge is what's important, not whether such a system has been implemented before. I would go with implementing the Hough transform (first for lines and circles, then the generalized one - it's considered a slow algorithm in image processing) and doing some real-time segmentation. I'm sure it will be a challenging task as it evolves.
First thing to do when partitioning is to look at the dataflows. Draw a block diagram of where each of the "subalgorithms" fits, along with the data going in and out. Anytime you have to move large amounts of data from one domain to another, start looking to move part of the problem to the other side of the split.
For example, consider an image processing pipeline which does an edge-detect followed by a compare with threshold, then some more processing. The output of the edge-detect will be (say) 16-bit signed values, one for each pixel. The final output is a binary image (a bit set indicates where the "significant" edges are).
One (obviously naive, but it makes the point) implementation might be to do the edge detect in hardware, ship the edge image to software and then threshold it. That involves shipping a whole image of 16-bit values "across the divide".
Better: do the threshold in hardware as well. Then you can ship 8 one-bit pixels per byte (or even run-length encode them).
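A small C model of that packing step makes the bandwidth argument concrete (the function name and the simple greater-than test are just illustrative; in the better partition this would happen in the hardware block before the data crosses the divide):

    #include <stdint.h>
    #include <stddef.h>

    /* Threshold 16-bit edge magnitudes and pack 8 one-bit pixels per byte.
       Shipping 'out' across the HW/SW divide moves 1/16th of the data that
       shipping the raw 16-bit edge image would. */
    void threshold_and_pack(const int16_t *edges, uint8_t *out, size_t npixels,
                            int16_t threshold) {
        for (size_t i = 0; i < npixels; i++) {
            if (i % 8 == 0)
                out[i / 8] = 0;                            /* start a fresh byte */
            if (edges[i] > threshold)                      /* 'significant' edge */
                out[i / 8] |= (uint8_t)(1u << (7 - i % 8));
        }
    }

Run-length encoding the bit image would shrink the transfer further still, at the cost of a little more hardware.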
Once you have a sensible bandwidth partition, you have to find out if the blocks that fit in each domain are a good fit for that domain, or maybe consider a different partition.
I would add that in general, HW/SW codesign is useful when it reduces cost.
There are 2 major cost factors in embedded systems:
development cost
production cost
The higher your production volume, the more important production cost becomes, and the less important development cost is.
Today it is harder to develop hardware than software, which means the development cost of a codesign solution will be higher, so codesign is useful mostly for high-volume production. However, you need FPGAs (or something similar) to do codesign today, and they cost a lot.
That means codesign is useful when the cost of the necessary FPGA is lower than that of an existing solution for your type of problem (CPU, GPU, DSP, etc.), assuming both solutions meet your other requirements. And that will mostly be the case for high-performance systems, because FPGAs are costly today.
So, basically, you will want to codesign your system if it will be produced in high volumes and it is a high-performance device.
This is a bit simplified and might become false in a decade or so. There is ongoing research on HW/SW synthesis from high-level specifications, and FPGA prices are falling. That means that in a decade or so codesign might become useful for most embedded systems.
Whatever project you end up doing, my suggestion would be to make a software version and a hardware version of the algorithm for performance comparison. You can also compare development time, etc. This will make your project a lot more scientific and helpful for everyone else, should you choose to publish anything. Blindly assuming hardware is faster than software is not a good idea, so profiling is important.

Should I manage pages or just lean on virtual memory?

I'm writing a database-style thing in C (i.e. it will store and operate on about 500,000 records). I'm going to be running it in a memory-constrained environment (VPS) so I don't want memory usage to balloon. I'm not going to be handling huge amounts of data - perhaps up to 200MB in total, but I want the memory footprint to remain in the region of 30MB (pulling these numbers out of the air).
My instinct is to do my own page handling (real databases do this), but I have received advice saying that I should just allocate it all and allow the OS to do the VM paging for me. My numbers will never rise above this order of magnitude. Which is the best choice in this case?
Assuming the second choice, at what point would it be sensible for a program to do its own paging? Obviously RDBMSes that can handle gigabytes must do this, but there must be a point along the scale at which the question is worth asking.
Thanks!
Use malloc until it's running. Then, and only then, start profiling. If you run into the same performance issues as the proprietary and mainstream "real databases", you will naturally begin to perform cache/page/alignment optimizations. These things can easily be slotted in after you have a working database, and they are orthogonal to getting it working in the first place.
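If you want numbers of your own before deciding, a pair of thin wrappers around malloc/free is enough to watch the footprint. This is just one way to do the bookkeeping (the names are made up), and a tool like valgrind's massif will give you the same answer with no code at all:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    /* Running total of live allocations made through these wrappers. */
    static size_t g_live_bytes = 0;

    /* A header stored in front of each block so db_free() knows its size;
       the union keeps the returned pointer suitably aligned. */
    typedef union { size_t size; max_align_t pad; } header;

    void *db_alloc(size_t n) {
        header *h = malloc(sizeof(header) + n);
        if (!h) return NULL;
        h->size = n;
        g_live_bytes += n;
        return h + 1;
    }

    void db_free(void *ptr) {
        if (!ptr) return;
        header *h = (header *)ptr - 1;
        g_live_bytes -= h->size;
        free(h);
    }

    void db_report(void) {
        printf("live allocations: %.1f MB\n", g_live_bytes / (1024.0 * 1024.0));
    }

If db_report() stays comfortably under your 30MB budget for a realistic record load, the question answers itself; if it doesn't, you will at least know which structures are worth paging before writing any paging code.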
The database management systems that perform their own paging also benefit from the investment of huge research efforts to make sure their paging algorithms function well under varying system and load conditions. Unless you have a similar set of resources at your disposal I'd recommend against taking that approach.
The OS paging system you have at your disposal has already benefited from the tuning efforts of many people.
There are, however, some things you can do to tune your OS to favour database-type access (large sequential I/O operations) over the typical desktop tuning (a mix of sequential and random I/O).
In short, if you are a one man team or a small team, you probably should make use of existing tools rather than trying to roll your own in that particular area.

Exhaustive testing and the cost of "Bug Free"

When I was learning software development, we were taught that actual "bug free" software was mathematically impossible for anything but the most trivial programs. For a mathematical mind, it's very simple to see how basic things like the number of possible inputs and the variability of platforms make "bug free" not only impossible (in realistic time), but economically stupid for anything short of nuclear power generation.
However I'm constantly hearing business people spout off with "It's understood that software will be bug free, and if it's not all bugs should be fixed for free". I typically respond with "No, we'll fix any bugs found in the UAT period of (x) weeks" where x is defined by contract. This leads to a lot of arguments, and loss of work to people who are perfectly willing to promise the impossible.
Does anyone know of (or can anyone offer) a good explanation of why "bug free" is NOT realistic or standard -- one that your average middle manager can understand?
See Gödel's incompleteness theorems
See also Hilbert's second problem
See also Turing: Halting Problem
See also MSFT financial reports for headcount numbers and Microsoft Connect for bug reporting on Microsoft products for a less intellectual explanation.
Perhaps the best way to explain it would be to read up a little on places that really, really, really don't want software bugs, and what they do about it. (You can use NASA as an example, and the Ariane 5 first launch as an example of what happens with such software bugs.) Middle managers tend to relate to stories and parallel examples.
Alternately, find out what happens with deals where one side does promise the impossible (and it happens much more often than developers like). If you can show that it doesn't end well for the promisee, that might help.
Also, you might want to go into what you would need as a bare minimum to promise bug-free software, which would be a truly comprehensive spec.
I usually walk managers through a simple explanation of how most programs are really state engines, and that as soon as you start interfacing with the real world the number of possible input states rapidly reaches infinity: because the user over here is inputting X while Y happens over there, within 50ms of Z happening somewhere else... etc., etc.
Granted, after about five minutes of this their eyes tend to glaze over. But at least I tried.
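If the state-engine speech doesn't land, a back-of-the-envelope number sometimes does. A tiny illustration in C (the input sizes and the test rate are made up, and deliberately generous):

    #include <stdio.h>

    int main(void) {
        /* Even a pure function of just two 32-bit inputs has 2^64 possible input
           combinations -- before any timing, threading or platform variation
           enters the picture. */
        double combinations     = 18446744073709551616.0;  /* 2^64 */
        double tests_per_second = 1e9;                      /* a wildly optimistic test rate */
        double seconds_per_year = 60.0 * 60.0 * 24.0 * 365.0;

        printf("years to test exhaustively: %.0f\n",
               combinations / tests_per_second / seconds_per_year);  /* roughly 585 years */
        return 0;
    }

And that is the trivial case; anything touching the real world has far more state than that.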
