How much processing power a computer has [closed] - c

I have the following code, which I run in a terminal.
In another terminal I have top open, where I can see the %CPU of each new process I create. I run this for N = 2, 4, 8, 16 processes.
The average %CPU I see for each value of N is:
2 - 100%
4 - 97%
8 - 50%
16 - 25%
How can the processing power of the computer be determined by these results?
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <unistd.h>   /* for fork() and getpid() */

#define N 2 /* define the total number of processes we want */

/* Set global variable */
float total = 0;

/* compute function just does something. */
int compute()
{
    int i;
    float oldtotal = 0, result = 0;

    /* for a large number of times just square root and square
       the arbitrary number 1000 */
    for (i = 0; i < 2000000000; i++)
    {
        result = sqrt(1000.0) * sqrt(1000.0);
    }

    /* Print the result - should be no surprise */
    printf("Result is %f\n", result);

    /* We want to keep a running total in the global variable total */
    oldtotal = total;
    total = oldtotal + result;

    /* Print running total so far. */
    printf("Total is %f\n", total);
    return 0;
}

int main()
{
    int pid[N], i, j;
    float result = 0;

    printf("\n"); /* bit of whitespace */

    /* Loop to create the required number of processes.
       Note carefully how only the child is left to continue the loop. */
    for (i = 0; i < N; i++)
    {
        /* Do the fork and catch it if it/one fails */
        if ((pid[i] = fork()) == -1)
        {
            exit(1);
        }
        /* the parent does the computation; the child carries on forking */
        else if (pid[i] > 0)
        {
            /* give a message about the proc ID */
            printf("Process Id for process %d is %d\n", i, getpid());

            /* call the function to do some computation. If we used sleep
               the process would simply sleep. We do not want that */
            compute();

            /* After we have done our computation we must quit the for
               loop otherwise we get a fork bomb! */
            break;
        }
    }

    /* nothing else to do so end main function (and program) */
    return 0;
}

It depends on your definition of processing power. The classic measures are millions of instructions per second (MIPS) or floating-point operations per second (FLOPS).
Finding MIPS is fiddly because in C code you don't know how many instructions each line of code represents.
You can do a mega-FLOPS calculation, though. Loop in C doing a float * float operation on random numbers. See how long it takes to do a lot of calculations (say 10^9), then calculate how many you did per second.
Then multiply by the number of processors you have.
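A minimal sketch of that measurement, assuming a POSIX system with clock_gettime(); the iteration count and the operand values are arbitrary:

#include <stdio.h>
#include <time.h>

int main(void)
{
    const long iterations = 100000000L;        /* 1e8 float multiplies */
    volatile float a = 1.0001f, b = 0.9999f, r = 0.0f;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++)
        r = a * b;                             /* one multiply per iteration */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("~%.1f MFLOPS on one core (r=%f)\n", iterations / secs / 1e6, r);
    return 0;
}

The volatile qualifiers keep the compiler from deleting the loop; without them an optimized build would report a nonsensically high figure.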

Your results show very little difference between the CPU usage for 2 processes and 4 processes, so it is almost certainly a quad-core processor. Apart from that, not much can be said about the speed of the processor from the percentages alone. You have also used printf statements, which make it even harder to estimate the processing speed since they occasionally flush the buffer.

"Processing power" is a suggestive, yet annoyingly vague term. For most purposes, we don't care about MIPs or FIPs—only for pure number crunching do we give it any mind. Even then, it is as useless a measure for comparing computers as it is to compare automobiles with their maximum RPMs. See BogoMips.
Even with MIPS and FLOPS, it is usually peak performance that gets measured; few try to determine average values.
Another useful parameter of CPU power is branches per second, which no one measures (but they should). BPS varies considerably with the intricacies of the CPU architecture, caching, paging, perhaps context switching, and the nature of the branches.
In any useful program, input and output are part of computing power. Hence the bandwidth of memory, I/O devices, file systems, networks and other connected devices is part of the computer's "computing power".
If you mean a subset of these, please clarify your question.

Related

Fast I/O in c, stdin/out

In a coding competition specified at this link there is a task where you need to read a lot of data on stdin, do some calculations and present a whole lot of data on stdout.
In my benchmarking it is almost only I/O that takes time, although I have tried to optimize it as much as possible.
The input is a string (1 <= len <= 100'000) and q rows of pairs of ints, where 1 <= q <= 100'000.
I benchmarked my code on a 100 times larger dataset (len = 10M, q = 10M) and this is the result:
Activity          time (s)   accumulated (s)
Read text:        0.004      0.004
Read numbers:     0.146      0.150
Parse numbers:    0.200      0.350
Calc answers:     0.001      0.351
Format output:    0.037      0.388
Print output:     0.143      0.531
By implementing my own formatting and number parsing inline I managed to get the time down to a third of the time taken when using printf and scanf.
However, when I uploaded my solution to the competition's webpage, it took 1.88 seconds (I think that is the total time over 22 datasets). When I look at the high score there are several implementations (in C++) that finished in 0.05 seconds, nearly 40 times faster than mine! How is that possible?
I guess I could speed it up a bit by using two threads, so that I can start calculating and writing to stdout while still reading from stdin. In a theoretical best case on my large dataset that only hides min(0.150, 0.143) seconds, though, so I'm still nowhere close to the high score.
The program gets compiled by the website with these options:
gcc -g -O2 -std=gnu99 -static my_file.c -lm
and timed like this:
time ./a.out < sample.in > sample.out
My code looks like this:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN (100000 + 1)
#define ROW_LEN (6 + 1)
#define DOUBLE_ROW_LEN (2*ROW_LEN)

int main(int argc, char *argv[])
{
    int ret = 1;

    // Set custom buffers for stdin and out
    char stdout_buf[16384];
    setvbuf(stdout, stdout_buf, _IOFBF, 16384);
    char stdin_buf[16384];
    setvbuf(stdin, stdin_buf, _IOFBF, 16384);

    // Read stdin to buffer
    char *buf = malloc(MAX_LEN);
    if (!buf) {
        printf("Failed to allocate buffer");
        return 1;
    }
    if (!fgets(buf, MAX_LEN, stdin))
        goto EXIT_A;

    // Get the num tests
    int m;
    scanf("%d\n", &m);

    char *num_buf = malloc(DOUBLE_ROW_LEN);
    if (!num_buf) {
        printf("Failed to allocate num_buffer");
        goto EXIT_A;
    }

    int *nn;
    int *start = calloc(m, sizeof(int));
    int *stop = calloc(m, sizeof(int));
    int *staptr = start;
    int *stpptr = stop;
    char *cptr;
    for (int i = 0; i < m; i++) {
        fgets(num_buf, DOUBLE_ROW_LEN, stdin);
        nn = staptr++;
        cptr = num_buf - 1;
        while (*(++cptr) > '\n') {
            if (*cptr == ' ')
                nn = stpptr++;
            else
                *nn = *nn * 10 + *cptr - '0';
        }
    }

    // Count for each test
    char *buf_end = strchr(buf, '\0');
    int len, shift;
    char outbuf[ROW_LEN];
    char *ptr_l, *ptr_r, *out;
    for (int i = 0; i < m; i++) {
        ptr_l = buf + start[i];
        ptr_r = buf + stop[i];
        while (ptr_r < buf_end && *ptr_l == *ptr_r) {
            ++ptr_l;
            ++ptr_r;
        }

        // Print length of same sequence
        shift = len = (int)(ptr_l - (buf + start[i]));
        out = outbuf;
        do {
            out++;
            shift /= 10;
        } while (shift);
        *out = '\0';
        do {
            *(--out) = "0123456789"[len % 10];
            len /= 10;
        } while (len);
        puts(outbuf);
    }

    ret = 0;
    free(start);
    free(stop);
EXIT_A:
    free(buf);
    return ret;
}
Thanks to your question, I went and solved the problem myself. Your time is better than mine, but I'm still using some stdio functions.
I simply do not think the high score of 0.05 seconds is bona fide. I suspect it's the product of a highly automated system that returned that result in error, and that no one ever verified it.
How to defend that assertion? There's no real algorithmic complexity: the problem is O(n). The "trick" is to write specialized parsers for each aspect of the input (and avoid work done only in debug mode). A total time of 50 milliseconds over 22 trials means each trial averages about 2.3 ms; we're down near the threshold of measurability.
Competitions like the problem you addressed yourself to are unfortunate, in a way. They reinforce the naive idea that performance is the ultimate measure of a program (there's no score for clarity). Worse, they encourage going around things like scanf "for performance" while, in real life, getting a program to run correctly and fast basically never entails avoiding or even tuning stdio. In a complex system, performance comes from things like avoiding I/O, passing over the data only once, and minimizing copies. Using the DBMS effectively is often key (as it were), but such things never show up in programming challenges.
Parsing and formatting numbers as text does take time, and in rare circumstances can be a bottleneck. But the answer is hardly ever to rewrite the parser. Rather, the answer is to parse the text into a convenient binary form, and use that. In short: compilation.
That said, a few observations may help.
You don't need dynamic memory for this problem, and it's not helping. The problem statement says the input array may be up to 100,000 elements, and the number of trials may be as many as 100,000. Each trial is two integer strings of up to 6 digits each, separated by a space and terminated by a newline: 6 + 1 + 6 + 1 = 14. Total input, maximum, is 100,000 + 1 + 6 + 1 + 100,000 * 14 bytes: about 1.5 MB. You are allowed 1 GB of memory.
I just allocated a single buffer big enough for the whole input, and read it in all at once with read(2). Then I made a single pass over that input.
You got suggestions to use asynchronous I/O and threads. The problem statement says you're measured on CPU time, so neither of those help. The shortest distance between two points is a straight line; a single read into statically allocated memory wastes no motion.
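A rough sketch of that kind of single-read setup (the buffer size and the short-read loop are illustrative, not the exact code used):

#include <stdio.h>
#include <unistd.h>

#define BUF_SIZE (16 * 1024 * 1024)   /* comfortably larger than the maximum input */

static char buf[BUF_SIZE];            /* statically allocated, no malloc needed */

int main(void)
{
    size_t total = 0;
    ssize_t n;

    /* read(2) may return short reads, so loop until EOF (0) or error (-1) */
    while ((n = read(0, buf + total, BUF_SIZE - total)) > 0)
        total += (size_t)n;

    /* ... single pass over buf[0 .. total) to parse the string, the queries,
       and to emit the answers ... */
    printf("read %zu bytes\n", total);
    return 0;
}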
One ridiculous aspect of the way they measure performance is that they compile with gcc -g and without -DNDEBUG. That means assert(3) stays live in code that is measured for performance! I couldn't get under 4 seconds on test 22 until I removed my asserts.
In sum, you did pretty well, and I suspect the winner you're baffled by is a phantom. Your code does faff about a bit, and you can dispense with dynamic memory and tuning stdio. I bet your time can be trimmed by simplifying it. To the extent that performance matters, that's where I'd direct your attention.
You should allocate all your buffers contiguously.
Allocate one buffer the size of all your buffers combined (num_buf, start, stop), then set the pointers to the corresponding offsets.
This can reduce your cache misses / page faults.
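A hedged sketch of that idea, reusing the names from the question (the example value of m and the error handling are placeholders):

#include <stdio.h>
#include <stdlib.h>

#define DOUBLE_ROW_LEN (2 * (6 + 1))    /* as in the question */

int main(void)
{
    int m = 100000;                     /* number of queries (example value) */

    /* One allocation: the int arrays go first so they stay naturally aligned,
       the small char buffer goes last. */
    size_t bytes = 2 * (size_t)m * sizeof(int) + DOUBLE_ROW_LEN;
    char *block = calloc(1, bytes);
    if (!block)
        return 1;

    int  *start   = (int *)block;       /* start[0..m) */
    int  *stop    = start + m;          /* stop[0..m)  */
    char *num_buf = (char *)(stop + m); /* scratch line buffer */

    /* ... use num_buf, start and stop exactly as before ... */
    printf("carved %zu bytes into three buffers\n", bytes);

    free(block);                        /* one free releases everything */
    return 0;
}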
Since the read and the write operations seem to consume a lot of time, you should consider adding threads. One thread should deal with I/O and another should deal with the computation. (It is worth checking whether another thread for the prints could speed things up as well.) Make sure you don't use any locks while doing this.
Answering this question is tricky because optimization heavily depends on the problem you have.
One idea is to look at the content of the file you are trying to read and see if there are patterns or things that you can use in your favor.
The code you wrote is a "general" solution for reading from a file, executing something and then writing to a file. But if the file is not randomly generated each time and the content is always the same, why not try to write a solution for that specific file?
On the other hand, you could try to use low-level system functions. One that comes to mind is mmap, which allows you to map a file directly into memory and access that memory instead of using scanf and fgets.
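For example, a minimal mmap sketch; this assumes stdin is redirected from a regular file, as the judge does with ./a.out < sample.in, and error handling is trimmed:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat st;
    if (fstat(0, &st) == -1)            /* size of the file behind stdin */
        return 1;

    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, 0, 0);
    if (data == MAP_FAILED)
        return 1;

    /* parse data[0 .. st.st_size) directly, no fgets/scanf copies */
    printf("mapped %lld bytes\n", (long long)st.st_size);

    munmap(data, st.st_size);
    return 0;
}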
Another thing I noticed is that your solution has two loops over the data; try to use only one. You could also do some asynchronous I/O reading: instead of reading the whole file in one loop and then doing the calculation in another loop, read a portion at the beginning, start processing it asynchronously and continue reading.
This link might help for the async part

Relative C code execution time analysis using system monotonic clock

Is there a simple but sure way to measure the relative difference in performance between two algorithm implementations in C programs? More specifically, I want to compare the performance of implementation A vs. B. I'm thinking of a scheme like this:
In a unit test program:
start timer
call function
stop timer
get difference between start stop time
Run the scheme above for a pair of functions A and B, then get a percentage difference in execution time to determine which is faster.
Upon doing some research I came across this question about using a monotonic clock on OSX in C, which apparently can give me at least nanosecond precision. To be clear, I understand that precise, controlled measurements are hard to perform, as discussed in "With O(N) known and system clock known, can we calculate the execution time of the code?", but I assume that should be irrelevant in this case because I only want a relative measurement.
Everything considered, is this a sufficient and valid approach towards the kind of analysis I want to perform? Are there any details or considerations I might be missing?
The main modification I make to the timing scheme you outline is to ensure that the same timing code is used for both functions — assuming they do have an identical interface, by passing a function pointer to skeletal code.
As an example, I have some code that times some functions that validate whether a given number is prime. The control function is:
static void test_primality_tester(const char *tag, int seed, int (*prime)(unsigned), int count)
{
    srand(seed);
    Clock clk;
    int nprimes = 0;
    clk_init(&clk);
    clk_start(&clk);
    for (int i = 0; i < count; i++)
    {
        if (prime(rand()))
            nprimes++;
    }
    clk_stop(&clk);
    char buffer[32];
    printf("%9s: %d primes found (out of %d) in %s s\n", tag, nprimes,
           count, clk_elapsed_us(&clk, buffer, sizeof(buffer)));
}
I'm well aware of srand() — why call it once?, but the point of using srand() once each time this function is called is to ensure that the tests process the same sequence of random numbers. On macOS, RAND_MAX is 0x7FFFFFFF.
The type Clock contains analogues of two struct timespec structures, for the start and stop times. The clk_init() function initializes the structure; clk_start() records the start time in the structure; clk_stop() records the stop time in the structure; and clk_elapsed_us() calculates the elapsed time between the start and stop times in microseconds. The package is written to give me cross-platform portability (at the cost of some headaches in determining which is the best sub-second timing routine available at compile time).
You can find my code for timers on Github in the repository https://github.com/jleffler/soq, in the src/libsoq directory — files timer.h and timer.c. The code has not yet caught up with macOS Sierra having clock_gettime(), though it could be compiled to use it with -DHAVE_CLOCK_GETTIME as a command-line compiler option.
This code was called from a function one_test():
static void one_test(int seed)
{
    printf("Seed: %d\n", seed);
    enum { COUNT = 10000000 };
    test_primality_tester("IsPrime1", seed, IsPrime1, COUNT);
    test_primality_tester("IsPrime2", seed, IsPrime2, COUNT);
    test_primality_tester("IsPrime3", seed, IsPrime3, COUNT);
    test_primality_tester("isprime1", seed, isprime1, COUNT);
    test_primality_tester("isprime2", seed, isprime2, COUNT);
    test_primality_tester("isprime3", seed, isprime3, COUNT);
}
And the main program can take one or a series of seeds, or uses the current time as a seed:
int main(int argc, char **argv)
{
    if (argc > 1)
    {
        for (int i = 1; i < argc; i++)
            one_test(atoi(argv[i]));
    }
    else
        one_test(time(0));
    return(0);
}

Erratic average execution times for C function

I'm trying to optimize a chunk of code given to me by a friend, but my baseline for its average execution time is extremely erratic and I'm lost as to why, and how to fix it.
Code:
#include <sys/time.h>
#include <time.h>
#include <stdio.h>
#include "wall.h" /* Where his code is */

int main()
{
    int average;
    struct timeval tv;
    int i;
    for (i = 0; i < 1000; i++) /* Running his code 1,000 times */
    {
        gettimeofday(&tv, NULL); /* Starting time */
        start(); /* Launching his code */
        int ret = tv.tv_usec; /* Finishing time */
        ret /= 1000; /* Converting to milliseconds */
        average += ret; /* Adding to the average */
    }
    printf("Average execution time: %d milliseconds\n", average/1000);
    return 0;
}
Output of 5 different runs:
804 milliseconds
702 milliseconds
394 milliseconds
642 milliseconds
705 milliseconds
I've tried multiple different ways of getting the average execution time, but each one either doesn't give me a precise enough answer or gives me a completely erratic one. I'm lost as to what to do now; any help would be greatly appreciated!
I know these types of benchmarks are very much system dependent, so I've listed my system specs below:
Ubuntu 12.10 x64
7.8 GiB RAM
Intel Core i7-3770 CPU @ 3.40GHz x 8
GeForce GT 620/PCIe/SSE2
Edit
Thank you all for your input, but I decided to go with gprof instead of constructing my own benchmark. Thank you, once again!
Your line int ret = tv.tv_usec; /* Finishing time */ doesn't give you the finishing time, it's still the starting time. You should make a second struct timeval, call gettimeofday with that and compare the two.
However, using clock() is probably easier. Of course, if you want to really analyse the performance of your code use a profiler.
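For instance, a corrected version of the loop with two timevals could look like this (wall.h and start() are from the question; averaging in microseconds avoids the truncation to milliseconds in the original):

#include <stdio.h>
#include <sys/time.h>
#include "wall.h"                    /* start() is the code being measured */

int main(void)
{
    struct timeval before, after;
    long long total_us = 0;

    for (int i = 0; i < 1000; i++) {
        gettimeofday(&before, NULL);          /* starting time */
        start();                              /* code under test */
        gettimeofday(&after, NULL);           /* finishing time */

        total_us += (after.tv_sec  - before.tv_sec)  * 1000000LL
                  + (after.tv_usec - before.tv_usec);
    }
    printf("Average execution time: %lld microseconds\n", total_us / 1000);
    return 0;
}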
There are several problems here, including zero details on what the code you're benchmarking is doing, and that you're using "gettimeofday()" incorrectly (and, perhaps, inappropriately).
SUGGESTIONS:
1) Don't use "gettimeofday()":
http://blog.habets.pp.se/2010/09/gettimeofday-should-never-be-used-to-measure-time
2) Supplement your "time elapsed" with gprof:
http://www.cs.duke.edu/~ola/courses/programming/gprof.html

Split text files across threads

The problem: I have a few text files (10) with a number on every line. I need to split them across some threads I create using the pthread library. These worker threads are to find the largest prime number sent to them (and, overall, the largest prime from all of the text files).
My current thoughts on a solution: I am thinking of having two arrays: one holding all of the text files, and the other holding a binary file from which I can read, say, 1000 lines at a time, passing each worker a struct that contains the thread id, a file pointer and a file position so it can crank through its share.
A little bit of what I am talking about:
pthread_create(&threads[index], NULL, workerThread, (void *)threadFields[index]); // Pass struct to each worker
Struct:
typedef struct threadFields {
    int *id, *position;
    FILE *Fin;
} tField;
If anyone has any insight or a better solution it would be greatly appreciated
EDIT:
Okay, so I found a solution to my problem and I believe it is similar to what SaveTheRbtz suggested. Here is what I implemented:
I took the files and merged them into one binary file, keeping track of my position in the loop (I had to account for how many bytes each entry was; this was hard-coded).
struct threadFields *info = threadStruct;
int index;
int id = info->id;
unsigned int currentNum = 0;
int Seek = info->StartPos;
unsigned int localLargestPrime = 0;
char *buffer = malloc(50);
int isPrime = 0;

while (Seek < info->EndPos) {
    for (index = 0; index < 1000; index++) { // Loop 1000 times
        fseek(fileOut, Seek * sizeof(char) * 20, SEEK_SET);
        fgets(buffer, 20, fileOut);
        Seek++;
        currentNum = atoi(buffer);
        if (currentNum > localLargestPrime && currentNum > 0) {
            isPrime = ChkPrim(currentNum);
            if (isPrime == 1)
                localLargestPrime = currentNum;
        }
    }
}
You can do ten threads, each of which processes a file specified as an argument. Each thread will read its own file, checking whether each value is larger than the largest prime it has recorded so far, and if so, checking whether the new number is prime. Then, when it's finished, it can return the prime to the coordinator thread. The coordinator thread sits back and waits for the threads to finish, collecting the largest prime from each thread and keeping only the largest. You can probably use 0 as a sentinel value to indicate 'no primes found (yet)'. (A rough sketch of this scheme follows after the discussion below.)
Let's say I wanted 11 threads instead of 10; how would I split the workload then?
I'd have the 11th thread do pthread_exit() immediately. If you want to make coordination problems for yourself, you can, but why make life harder than you have to?
If you absolutely must have 11 threads process 10 files and divvy up the work, then I suppose I would have a set of 10 file streams initially in a queue. The threads would wait on a 'queue not empty' condition to get a file stream (mutexes and condition variables and all that). When a thread acquires a file stream, it would read one number from the file and push the stream back onto the queue (signalling 'queue not empty'), then process the number. On EOF, a thread would close the file and not push it back onto the queue (so the threads have to detect 'no file streams left with unread data'). This means that each thread would read about one eleventh of the data, depending on how long the prime calculation takes for the numbers it actually reads. That's much, much, much trickier to code than a simple one-thread-per-file solution, but it scales (more or less) to an arbitrary number of threads and files. In particular, it could be used to have 7 threads process 10 files, as well as having 17 threads process 10 files.
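Going back to the simple one-thread-per-file scheme, here is a rough pthread sketch of it. ChkPrim() is assumed to be the question's own primality test; the file names coming from the command line and the result being returned through pthread_join() are illustrative choices:

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

extern int ChkPrim(unsigned int n);     /* primality test from the question */

/* Each worker scans one file and returns its largest prime (0 = none found). */
static void *scan_file(void *arg)
{
    const char *name = arg;
    unsigned int value, largest = 0;
    FILE *fp = fopen(name, "r");
    if (fp != NULL) {
        while (fscanf(fp, "%u", &value) == 1)
            if (value > largest && ChkPrim(value) == 1)
                largest = value;
        fclose(fp);
    }
    return (void *)(uintptr_t)largest;
}

int main(int argc, char **argv)
{
    enum { MAX_FILES = 10 };
    pthread_t tid[MAX_FILES];
    int nfiles = (argc - 1 > MAX_FILES) ? MAX_FILES : argc - 1;
    unsigned int overall = 0;

    for (int i = 0; i < nfiles; i++)            /* one worker per file */
        pthread_create(&tid[i], NULL, scan_file, argv[i + 1]);

    for (int i = 0; i < nfiles; i++) {          /* coordinator: join and keep the max */
        void *res;
        pthread_join(tid[i], &res);
        if ((unsigned int)(uintptr_t)res > overall)
            overall = (unsigned int)(uintptr_t)res;
    }
    printf("Largest prime found: %u\n", overall);
    return 0;
}

Compile with -pthread and pass the ten file names as arguments.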
Looks like a job for a message queue:
1) A set of "supplier" threads which split the data into chunks and put them on the queue. In your case a chunk can be represented by a file name or an (fd, offset, size) tuple. For simplicity there can be one such supplier.
2) A number of "worker" threads that pull data from the input queue, process it and put results on another queue. For performance reasons there are usually many workers; for example, if your task is CPU-intensive then sysconf(_SC_NPROCESSORS_ONLN) should be a good choice.
3) One "aggregator" thread that "reduces" the result queue to a single value. In your case it's a simple max() function.
This highly scalable solution will let you easily combine many different kinds of processing stages into an easily understandable pipeline.
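A simplified sketch of that pipeline: the "supplier" (here just main) pre-loads the chunk queue, one worker per core pulls chunks under a mutex, and main acts as the aggregator after joining. process_chunk() is a placeholder for the real work, and a full pipeline would use condition variables so suppliers and workers can overlap:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Chunks of work: here just (offset, size) pairs into some shared input. */
typedef struct { long offset, size; } Chunk;

#define NCHUNKS 64
static Chunk queue[NCHUNKS];
static int next_chunk = 0;                       /* index of next unclaimed chunk */
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the real work (e.g. scanning a chunk for primes). */
static unsigned int process_chunk(const Chunk *c)
{
    return (unsigned int)(c->offset % 97);
}

static void *worker(void *arg)
{
    unsigned int *local_max = arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        int i = (next_chunk < NCHUNKS) ? next_chunk++ : -1;   /* pull one chunk */
        pthread_mutex_unlock(&qlock);
        if (i < 0)
            break;                               /* queue drained */
        unsigned int r = process_chunk(&queue[i]);
        if (r > *local_max)
            *local_max = r;
    }
    return NULL;
}

int main(void)
{
    long nworkers = sysconf(_SC_NPROCESSORS_ONLN);
    if (nworkers < 1 || nworkers > 64)
        nworkers = 4;                            /* fallback */
    pthread_t tid[64];
    unsigned int local_max[64] = {0}, overall = 0;

    for (int i = 0; i < NCHUNKS; i++)            /* "supplier": fill the queue */
        queue[i] = (Chunk){ .offset = i * 4096L, .size = 4096 };

    for (long i = 0; i < nworkers; i++)          /* workers */
        pthread_create(&tid[i], NULL, worker, &local_max[i]);

    for (long i = 0; i < nworkers; i++) {        /* "aggregator": reduce with max() */
        pthread_join(tid[i], NULL);
        if (local_max[i] > overall)
            overall = local_max[i];
    }
    printf("overall result: %u\n", overall);
    return 0;
}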

Is it possible to create multi-threading program for PIC12 MCU in HI-TECH C

My friend asked me to help him write a small program for a PIC12 MCU. We want:
1. The program to stop running when the input voltage stays below 1.9 V for 2 seconds.
2. The program to react immediately when the input voltage exceeds 2.5 V.
I tried to solve the 1st problem by reading and comparing the system's timestamp:
#include <time.h>
... ...
time_t beg, end;
beg = 0;
end = 0;
while (1) {
    if (INP_VOL < 1.9) {
        if (beg == 0) {
            /* Read timestamp when voltage < 1.9 */
            gmtime(&beg);
        }
        /* Compare timestamps */
        gmtime(&end);
        if (end - beg > 2) {
            break; /* !!stop running!! */
        }
    }
    else {
        /* if voltage back to normal, reset beg timestamp. */
        beg = 0;
    }
}
I've found the function gmtime(time_t *) in the PIC12 User Manual, but I'm not sure if it's a good solution.
But I can't figure out how to solve the 2nd problem. It would need a kind of independent thread that monitors the input voltage during the execution of the program, so the program can react immediately (by calling another function) before the circuit is damaged.
I'm a computer programmer, but I've never coded for an MCU. I'd like to know if it's possible to do such a thing in HI-TECH C.
The typical thing to do here is to use interrupts, specifically timer interrupts.
You set up an interrupt to run e.g. every 1 ms, and in that interrupt code you do whatever the program needs to react to quickly. That leaves the normal execution flow alone and makes it look as if the two tasks run in parallel.
For the over-voltage case you could also have a circuit attached to the external interrupt pin that outputs 1 when the voltage goes above 2.5 V; the external interrupt can be programmed to trigger whenever its input goes from 0 to 1.
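A very rough sketch of the timer-interrupt idea in HI-TECH C. It assumes a mid-range PIC12F device that has interrupts at all (baseline PIC12 parts do not), the usual INTCON bit names (T0IE, T0IF, GIE) and HI-TECH's "void interrupt" ISR syntax; the ADC thresholds, the tick count that corresponds to 2 seconds, and the two helper routines are placeholders that depend on the real hardware:

#include <htc.h>

#define ADC_1V9   389        /* placeholder ADC count for 1.9 V */
#define ADC_2V5   512        /* placeholder ADC count for 2.5 V */
#define TICKS_2S  200        /* placeholder: ISR rate multiplied by 2 seconds */

static volatile unsigned int low_ticks = 0;   /* time spent below 1.9 V */

static unsigned int read_voltage(void)
{
    /* start an ADC conversion, wait, return the result (device specific) */
    return 0;
}

static void react_to_overvoltage(void)
{
    /* protect the circuit immediately (device specific) */
}

void interrupt isr(void)
{
    unsigned int v;
    if (T0IF) {                      /* periodic Timer0 overflow */
        T0IF = 0;
        v = read_voltage();
        if (v > ADC_2V5)
            react_to_overvoltage();  /* requirement 2: immediate reaction */
        if (v < ADC_1V9)
            low_ticks++;             /* requirement 1: accumulate "low" time */
        else
            low_ticks = 0;
    }
}

void main(void)
{
    /* ...configure the ADC and the Timer0 prescaler per the datasheet... */
    T0IE = 1;                        /* enable the Timer0 interrupt */
    GIE = 1;                         /* enable interrupts globally */
    while (low_ticks < TICKS_2S) {
        /* normal program work */
    }
    /* voltage has been below 1.9 V for about 2 seconds: stop running */
}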
I don't think C is the best solution for the PIC12 family.
My suggestion is to use assembly; it only takes a few instructions. After setting up the ADC you can use a subtraction instruction and check the C (carry) flag: that tells you whether the reading is above or below the threshold. Test C and skip if zero, skipping the next instruction, the one with the call.
You could also change micro and use a PIC18 for better C code performance.

Resources