C Program slowing down after a couple read-ins - c

So I have to write a C program for a class that solves a Connect Four-like game: roughly speaking, it resolves matches of same-colored pieces, lets the pieces above fall down, then checks again for matches; if no matches are found, it drops the next piece, and so on.
The input is a text file where each line holds the color and the x-position at which the piece is supposed to fall.
My implementation is based on AVL trees, and the program works fine for inputs of roughly up to 2,000,000 lines, solving them in about 3 seconds.
One bigger example file I have to solve is roughly 2,600,000 lines long, but the program doesn't terminate on it.
The program reads in about 1,000,000 lines per second, but right after about the 2,000,000 mark it slows down tremendously and only reads a few hundred lines per second.
It's one of my first "bigger" projects and has about 900 lines of code, so I don't see the point in posting it all here (unless someone really wants to see it).
I am really clueless about what the cause could be, so maybe someone has an idea or can point me toward things I should look out for.

Add logging with timestamps to spot where the slowdown occurs. You need to isolate the code that has the performance issue.
The first step to getting better is to measure where you currently are. Once you know how long the current process takes, you can decide if a change you make helps, hurts or makes no change.
There's a vast world of software optimization; look around for expensive loops. Read up on big-O notation. If you have a loop inside a loop inside a loop, that's O(n^3): what seems fast with 100 iterations each can slow down dramatically when each loop runs a million times.
If you have a performance profiler it can help organize the timestamps from the logs it creates so that the code that's taking the most time is highlighted. Many times what happens in cases like this is that something that seems good enough when you start out isn't good enough once you start to scale.
Nobody says you have to add logging to every single line of your source code. That can be overwhelming. Figuring out what you don't have to focus on can be very helpful.
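For instance, a minimal sketch of timestamped progress logging for a line-oriented read loop (reading from stdin here just for illustration; the chunk size and output format are arbitrary) could look like this:

#include <stdio.h>
#include <time.h>

/* Minimal sketch: report how long each chunk of 100000 lines takes,
   so a sudden slowdown shows up as a jump in the per-chunk time. */
int main(void)
{
    char line[256];
    long count = 0;
    clock_t chunk_start = clock();

    while (fgets(line, sizeof line, stdin) != NULL) {
        /* ... process the line as the program already does ... */
        if (++count % 100000 == 0) {
            clock_t now = clock();
            fprintf(stderr, "%ld lines, last chunk took %.2f s\n",
                    count, (double)(now - chunk_start) / CLOCKS_PER_SEC);
            chunk_start = now;
        }
    }
    return 0;
}

If the per-chunk time starts climbing around the 2,000,000-line mark, that narrows the problem to something that grows with the amount of data already processed rather than to the input parsing itself.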

Related

How to debug against the same loop that generates different outputs?

I'm learning debugging. For this purpose I'm using x64dbg. So I understood how to place breakpoints (according to strings) and how to "block" the execution of the program near the assembly code that interests me.
Unfortunately I came across some strange cases.
In some cases the "flow" is not linear. The flow "follows" a sort of loop. With each pass of this loop "something new" is written. For example, let's say that the software has to write HELLO. On the first pass just an H appears, on the second pass HE, and so on; I can see this in the comments column of x64dbg.
The stranger thing is that if the program has to write PIZZA, it goes through the SAME loop that previously wrote HELLO... but this time P appears first, then PI, then PIZZ, and so on; again I can see this in the comments column of x64dbg.
In other words, for any operation I perform in the software, the SAME piece of code is always executed; it runs hundreds of times (in a loop) and each pass contributes a little piece of the final result (which changes).
How is this possible? What should I do?
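For what it's worth, this pattern is exactly what a generic, data-driven output or string-building routine looks like: one loop body is shared by every string, and each pass handles one more character. A purely illustrative C sketch of that kind of loop (not the actual code being stepped through):

#include <stdio.h>

/* One generic loop writes any string, one character per iteration.
   Stepping through it for "HELLO" or "PIZZA" executes the very same
   instructions; only the data (the string being built) differs. */
static void emit(const char *s, char *out)
{
    out[0] = '\0';
    for (size_t i = 0; s[i] != '\0'; i++) {
        out[i] = s[i];       /* after pass 1: "H", pass 2: "HE", ...          */
        out[i + 1] = '\0';   /* a debugger would show the partial result here */
    }
}

int main(void)
{
    char buf[64];
    emit("HELLO", buf);
    puts(buf);
    emit("PIZZA", buf);
    puts(buf);
    return 0;
}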

Profile C code execution percentage (line-by-line annotation)

I need to profile a couple of C programs and get an annotated file with the percentage of execution time taken by each line, or at least by each block (while/if-else/for/functions), etc.
So far I have looked into valgrind (callgrind), gperf, and some other tools. What I get so far is:
a count for each function or line of source code, i.e. how many times it is executed;
or a percentage of that count;
or the execution time taken by each function call.
What I need, however, is the percentage of execution time, not the count, and it should be for each line of source code or at least for all blocks (while/if-else/for/functions).
Can someone let me know of a way I can do this?
Thanks,
I believe perf(1) (part of the linux-tools-common package in Ubuntu) will get you what you want. It makes use of a kernel-based subsystem called Performance counters for Linux, included in newer kernels. More information can be found here.
Simple usage example below. Make sure to compile with debugging symbols.
$ perf record ./myprogram arg1 arg2
$ perf report
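For per-line rather than per-function percentages, perf can also annotate the source, provided the binary was built with debugging symbols (e.g. gcc -g). Roughly, after the perf record step above:
$ perf annotate
This shows the recorded samples broken down per instruction, interleaved with the corresponding source lines when debug info is available.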
Cachegrind might be worth looking into too.
You need something that samples the stack, either on CPU time, or wall-clock time if you want to include I/O.
You don't need a particularly high sampling frequency.
Each stack sample is a list of code locations, some of which can be traced to lines in the code.
The inclusive percentage cost of a line of code is just the number of stack samples that contain it, divided by the total number of stack samples (multiplied by 100).
The usefulness of this is it tells you what percentage of total time you could save if you could get rid of that line.
The exclusive percentage is the fraction of samples where that line appears at the end.
If a line of code appears more than once in a stack sample, that's still just one sample containing it. (That takes care of recursion.)
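As a purely illustrative sketch of that bookkeeping (the sample data here is invented): a line's inclusive count goes up once per sample that contains it anywhere, and its exclusive count goes up once per sample where it is the innermost location.

#include <stdio.h>

#define NSAMPLES 4
#define MAXDEPTH 3

int main(void)
{
    /* each row is one stack sample, outermost to innermost; 0 marks the end */
    int samples[NSAMPLES][MAXDEPTH] = {
        { 10, 42, 0 },    /* main -> line 42                        */
        { 10, 42, 99 },   /* main -> line 42 -> line 99             */
        { 10, 77, 0 },    /* main -> line 77                        */
        { 10, 42, 42 },   /* recursion: line 42 twice, counted once */
    };
    int line = 42, inclusive = 0, exclusive = 0;

    for (int i = 0; i < NSAMPLES; i++) {
        int present = 0, last = 0;
        for (int j = 0; j < MAXDEPTH && samples[i][j] != 0; j++) {
            if (samples[i][j] == line)
                present = 1;          /* contains the line at least once */
            last = samples[i][j];     /* remember the innermost location */
        }
        inclusive += present;
        exclusive += (last == line);
    }
    printf("inclusive %.0f%%, exclusive %.0f%%\n",
           100.0 * inclusive / NSAMPLES, 100.0 * exclusive / NSAMPLES);
    return 0;
}

With these made-up samples, line 42 appears in 3 of 4 samples (75% inclusive) and is innermost in 2 of 4 (50% exclusive).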
What tools can do this? Maybe oprofile. Certainly Zoom.

Do I have to avoid appending for performance?

I am new to Golang.
Should I always avoid appending to slices?
I need to load a linebreak-separated data file in memory.
With performance in mind, should I count lines, then load all the data in a predefined length array, or can I just append lines to a slice?
You should stop thinking about performance and start measuring what the actual bottleneck of your application is.
Any answer to a question like "Should I do/avoid X because of performance?" is useless in 50% of cases and counterproductive in 25%.
There are a few really general pieces of advice, like "do not needlessly generate garbage", but your question cannot be answered in general, as it depends a lot on the size of your file:
Your file is ~3 terabytes? Most probably you will have to read it line by line anyway...
Your file has just a bunch (~50) of lines: Probably counting lines first is more work than reallocating a []string slice 4 times (or 0 times if you make([]string, 0, 100) it initially). A string header is just 2 words.
Your file has an unknown but large (>10k) number of lines: Maybe it might be worth it. "Maybe" in the sense that you should measure on real data.
Your file is known to be big (>500k lines): Definitely count first, but you might start hitting the problem from the first bullet point.
You see: A general advice for performance is a bad advice so I won't give one.

Seek time vs Sequential read

Let's assume that on a hard drive I have some very large data file of a sequence of characters:
ABRDZ....
My question is as follows: if the head is positioned at the beginning of the file, and I need 5 characters at every 1000-position interval, would it be better to do a seek (since I know where to look) or simply to have a large buffer that just reads sequentially and then do the work in memory?
Naively, I'd have answered that reading 'A' and then seeking to read 'V' is faster than reading the whole file up to, say, position 200 (the position of 'V'). OK, this is just an example, since the smallest I/O is 512 bytes.
Edit: my naive answer above is partly justified by the following case: given a 100 GB file, I need the first and the last characters; here I obviously would do a seek... right?
Maybe there is a trade-off between how "long" the seek is vs. how much data to retrieve?
Can someone clarify this to me?
[UPDATE]
Generally, from your original numbers of 5 out of every 1000 (I'll assume that the 5 bytes are part of the 1000, making your step size 1000): if your step size is less than 2x your block size, then my original answer is a pretty good explanation. It does get a bit trickier once you get past 2x your HD block size, because at that point you would easily be wasting read time when you could be speeding things up by seeking past unused (or, for that matter, unnecessary) HD blocks.
[ORIGINAL]
Well, this is an extremely interesting question, with what I believe to be an equally interesting answer (also somewhat complex). I think this actually comes down to a couple of other questions, like how big the block size is on your drive (or the drive your software is going to run on). If your block size is 4KB, then the (true) minimum your hard drive will fetch for you at a time is 4096 bytes. In your case, if you truly need 5 chars every 1000 and you did this with pure disk I/O, you would essentially be re-reading the same block 4 times and doing 3 seeks in between (REALLY NOT EFFICIENT).
My personal belief is that you could (if you wanted to be drive-efficient) try, in your code, to find out what the block size of the drive you are using is, then use that number to decide how many bytes at a time you should bring into RAM. This way you wouldn't need a HUGE RAM buffer, but at the same time you wouldn't really have to seek, nor would you be wasting (or performing) any extra reads. See the sketch below.
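A rough sketch of that approach, assuming a 4 KB block size and the 5-characters-every-1000-positions pattern from the question (the file name, block size, and processing step are all illustrative):

#include <stdio.h>

#define BLOCK 4096   /* assumed drive/filesystem block size        */
#define STEP  1000   /* a wanted piece starts at every 1000th byte */
#define TAKE  5      /* and is 5 characters long                   */

int main(void)
{
    FILE *f = fopen("data.txt", "rb");
    if (!f)
        return 1;

    char buf[BLOCK];
    long pos = 0;            /* absolute position in the file */
    size_t got;

    /* read sequentially in block-sized chunks, never seeking */
    while ((got = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < got; i++, pos++) {
            if (pos % STEP < TAKE) {
                /* this byte is part of one of the 5-char pieces;
                   do the in-memory work on it here */
                putchar(buf[i]);
            }
        }
    }
    fclose(f);
    return 0;
}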
IS THIS THE MOST EFFICIENT?
I don't think it is the most efficient, but it may be good enough for the performance you need, who knows. I do think that even if the read head is where you want it to be, if you perform algorithmic work in the middle of each block read, rather than reading the whole file all at once, you will lose time waiting for the next rotation of the drive platters. Whereas, if you were to read it all at once, the drive should be able to perform a sequential read of all parts of the file in one go. Again, it's not that simple: if your file is more than 1 block long, then on a rotational drive you may suffer IF your drive has not been defragmented, as it may have to perform random seeks just to get to the next block.
Sorry for the long-winded answer, but as usual, there is no simple answer in your case.
I do think that overall performance would PROBABLY be better if you simply read the whole file at once. There is no way to assure this, as each system is going to have inherently different parameters of their drive setup, etc...

What should my program do when it sees an integer overflow that does not affect the program run?

There is a small program which takes input from users on a prompt. It takes predefined inputs from the users and executes them.
It also displays a number with the prompt indicating the count of the commands :
myprompt 1) usercommand1
...
myprompt 2) usercommand2
...
...
myprompt 3)
I do not expect the user to give more than 65535 commands at a time, so the count is stored as an unsigned short.
Problem:
I am not sure how the program should handle the case where the user actually crosses this limit on the number of commands. Should I let the count roll over to 0 (and keep counting up) or have it stay put at 65535?
I want the program to still function normally, as in take user inputs and process them just as before. Also, the value of count has no effect at all on the command execution.
It looks like you're tackling a problem that might never occur.
Let's assume your users are quite fast, and it takes them 10 seconds to input a command line. Rollover would happen after 655350 seconds, i.e. approximately seven and a half days.
Let the counter roll over. If that still troubles you, then take the high path and make it an unsigned long. Then it will only roll over after 1361 years (on 32-bit machines).
If you ask yourself this question it means you should go the easy way: make the counter an unsigned int.
How to handle the limit depends very much on what this counter is used for. My feeling is that it is not used for anything really interesting, so your question is kind of moot. Whichever choice you make, it will still work correctly.
On the other hand, if this counter has some real use, you should ask the users of this counter about the correct way to proceed: both options have pros and cons (the counter either going back in time or stalling), so your users risk being surprised.
You forgot to mention other alternatives: terminate your program, or remove the limit and use some form of big integers (the GMP library, for example), but that sounds like overkill.
Note that DNS chose to wrap the serial number around at 2^32, which makes it usable forever; users of the counter are supposed to detect the overflow. See RFC 1982.
To be honest, this: "I want the program to still function normally, as in take user inputs and process them just as before. Also, the value of count has no effect at all on the command execution" answers your own question. If it has no effect at all, then just let it start from 0 again.
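If you go with letting it roll over, note that unsigned arithmetic in C makes this well defined: incrementing an unsigned short past 65535 simply wraps to 0. A minimal sketch of such a prompt loop (the prompt text and buffer size are illustrative, not taken from the original program):

#include <stdio.h>

int main(void)
{
    unsigned short count = 1;
    char line[256];

    for (;;) {
        printf("myprompt %u) ", (unsigned)count);
        fflush(stdout);
        if (fgets(line, sizeof line, stdin) == NULL)
            break;
        /* ... look up and execute the command in `line` ... */
        count++;   /* unsigned wraparound is well defined: 65535 + 1 becomes 0 */
    }
    return 0;
}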
