I want to make a function that returns the roots of a polynomial, as part of a basic exercise. Is there a safe way to return a variable number of elements (0, 1 or 2, in this case), i.e. making sure the receiver knows how many items were returned so they can avoid a segfault ?
This is arguably more a question of API design than pure programming, the roots themselves are trivial but getting them there safely puzzles me.
The case of a polynomial is special, as the caller knows how many roots there are at maximum. So the caller knows how much memory is needed to safely store the return value.
In this case it is easiest to let the caller pass a buffer to your function and store the roots in there. Your roots function would then only return the number of roots it found.
The other case (where the function returns freshly allocated memory) is way more complicated, and should be avoided if possible. You would introduce strong dependencies to your function (memory allocator) and this way fix the memory allocation schema. This is quite restraining (and unnecessary for your and perhaps many other use-cases). [1]
This could work:
int roots( double roots_out[], double coefs_in[], int ord)
This would be better (less error prone, even if one demands roots_out must have ord-1 elements)
int roots( double roots_out[], int nroots_max, double coefs_in[], int ord)
ord is the order of the polynomial (not the degree) and indicates how many elements there are in coefs_in
Note that there are still some questions left:
What to return if there are infinitely many roots? (zero polynomial?)
What to do if the buffer is too small? (Your solver may generate duplicate roots due to numerical inaccuracies. In other words nroots_max = ord-1 may be sae from mathematical point of view, but not numerically)
[1]: Also even if you may think this way one would save some memory as the number of roots may be less than the degree, this comes with a caveat. During root calculation one needs a big buffer anyway as the number of roots is not known beforehand. So memory saving is a two step process anyway, and, in my opinion, this task should be delegated to the caller. This is most efficient I guess.
Related
I'm trying to find some effective techniques which I can base my integer-overflow detection tool on. I know there are many ready-made detection tools out there, but I'm trying to implement a simple one on my own, both for my personal interest in this area and also for my knowledge.
I know techniques like Pattern Matching and Type Inference, but I read that more complicated code analysis techniques are required to detect the int overflows. There's also the Taint Analysis which can "flag" un-trusted sources of data.
Is there some other technique, which I might not be aware of, which is capable of detecting integer overflows?
It may be worth to try with cppcheck static analysis tool, that claims to detect signed integer overflow as of version 1.67:
New checks:
- Detect shift by too many bits, signed integer overflow and dangerous sign conversion
Notice that it supports both C and C++ languages.
There is no overflow check for unsigned integers, as by Standard unsigned types never overflow.
Here is some basic example:
#include <stdio.h>
int main(void)
{
int a = 2147483647;
a = a + 1;
printf("%d\n", a);
return 0;
}
With such code it gets:
$ ./cppcheck --platform=unix64 simple.c
Checking simple.c...
[simple.c:6]: (error) Signed integer overflow for expression 'a+1'
However I wouldn't expect too much from it (at least with current version), as slighly different program:
int a = 2147483647;
a++;
passes without noticing overflow.
It seems you are looking for some sort of Value Range Analysis, and detect when that range would exceed the set bounds. This is something that on the face of it seems simple, but is actually hard. There will be lots of false positives, and that's even without counting bugs in the implementation.
To ignore the details for a moment, you associate a pair [lower bound, upper bound] with every variable, and do some math to figure out the new bounds for every operator. For example if the code adds two variables, in your analysis you add the upper bounds together to form the new upper bound, and you add the lower bounds together to get the new lower bound.
But of course it's not that simple. Firstly, what if there is non-straight-line code? if's are not too bad, you can just evaluate both sides and then take the union of the ranges after it (which can lose information! if two ranges have a gap in between, their union will span the gap). Loops require tricks, a naive implementation may run billions of iterations of analysis on a loop or never even terminate at all. Even if you use an abstract domain that has no infinite ascending chains, you can still get into trouble. The keywords to solve this are "widening operator" and (optionally, but probably a good idea) "narrowing operator".
It's even worse than that, because what's a variable? Your regular local variable of scalar type that never has its address taken isn't too bad. But what about arrays? Now you don't even know for sure which entry is being affected - the index itself may be a range! And then there's aliasing. That's far from a solved problem and causes many real world tools to make really pessimistic assumptions.
Also, function calls. You're going to call functions from some context, hopefully a known one (if not, then it's simple: you know nothing). That makes it hard, not only is there suddenly a lot more state to keep track of at the same time, there may be several places a function could be called from, including itself. The usual response to that is to re-evaluate that function when a range of one of its arguments has been expanded, once again this could take billions of steps if not done carefully. There also algorithms that analyze a function differently for different context, which can give more accurate results, but it's easy to spend a lot of time analyzing contexts that aren't different enough to matter.
Anyway if you've made it this far, you could read Accurate Static Branch Prediction by Value Range Propagation and related papers to get a good idea of how to actually do this.
And that's not all. Considering only the ranges of individual variables without caring about the relationships between (keyword: non-relational abstract domain) them does bad on really simple (for a human reader) things such as subtracting two variables that always close together in value, for which it will make a large range, with the assumption that they may be as far apart as their bounds allow. Even for something trivial such as
; assume x in [0 .. 10]
int y = x + 2;
int diff = y - x;
For a human reader, it's pretty obvious that diff = 2. In the analysis described so far, the conclusions would be that y in [2 .. 12] and diff in [-8, 12]. Now suppose the code continues with
int foo = diff + 2;
int bar = foo - diff;
Now we get foo in [-6, 14] and bar in [-18, 22] even though bar is obviously 2 again, the range doubled again. Now this was a simple example, and you could make up some ad-hoc hacks to detect it, but it's a more general problem. This effect tends to blow up the ranges of variables quickly and generate lots of unnecessary warnings. A partial solution is assigning ranges to differences between variables, then you get what's called a difference-bound matrix (unsurprisingly this is an example of a relational abstract domain). They can get big and slow for interprocedual analysis, or if you want to throw non-scalar variables at them too, and the algorithms start to get more complicated. And they only get you so far - if you throw a multiplication in the mix (that includes x + x and variants), things still go bad very fast.
So you can throw something else in the mix that can handle multiplication by a constant, see for example Abstract Domains of Affine Relations⋆ - this is very different from ranges, and won't by itself tell you much about the ranges of your variables, but you could use it to get more accurate ranges.
The story doesn't end there, but this answer is getting long. I hope this does not discourage you from researching this topic, it's a topic that lends itself well to starting out simple and adding more and more interesting things to your analysis tool.
Checking integer overflows in C:
When you add two 32-bit numbers and get a 33-bit result, the lower 32 bits are written to the destination, with the highest bit signaled out as a carry flag. Many languages including C don't provide a way to access this 'carry', so you can use limits i.e. <limits.h>, to check before you perform an arithmetic operation. Consider unsigned ints a and b :
if MAX - b < a, we know for sure that a + b would cause an overflow. An example is given in this C FAQ.
Watch out: As chux pointed out, this example is problematic with signed integers, because it won't handle MAX - b or MIN + b if b < 0. The example solution in the second link (below) covers all cases.
Multiplying numbers can cause an overflow, too. A solution is to double the length of the first number, then do the multiplication. Something like:
(typecast)a*b
Watch out: (typecast)(a*b) would be incorrect because it truncates first then typecasts.
A detailed technique for c can be found HERE. Using macros seems to be an easy and elegant solution.
I'd expect Frama-C to provide such a capability. Frama-C is focused on C source code, but I don't know if it is dialect-sensitive or specific. I believe it uses abstract interpretation to model values. I don't know if it specifically checks for overflows.
Our DMS Software Reengineering Toolkit has variety of langauge front ends, including most major dialects of C. It provides control and data flow analysis, and also abstract interpretation for computing ranges, as foundations on which you can build an answer. My Google Tech Talk on DMS at about 0:28:30 specifically talks about how one can use DMS's abstract interpretation on value ranges to detect overflow (of an index on a buffer). A variation on checking the upper bound on array sizes is simply to check for values not exceeding 2^N. However, off the shelf DMS does not provide any specific overflow analysis for C code. There's room for the OP to do interesting work :=}
I've noticed that it is very common (especially in interview questions and homework assignments) to implement a dynamic array; typically, I see the question phrased as something like:
Implement an array which doubles in capacity when full
Or something very similar. They almost always (in my experience) use the word double explicitly, rather than a more general
Implement an array which increases in capacity when full
My question is, why double? I understand why it would be a bad idea to use a constant value (thanks to this question) but it seems like it makes more sense to use a larger multiple than double; why not triple the capacity, or quadruple it, or square it?
To be clear, I'm not asking how to double the capacity of an array, I'm asking why doubling is the convention.
Yes, it is common practice.
Doubling is a good way to manage memory. Heap management algorithms are often based on the classic Buddy System, its an easy way to deal with addressing and coalescing and other challenges. Knowing this, it is good to stick with multiples of 2 when dealing with allocation (though there are hybrid algorithms, like slab allocator, to help with fragmentation, so it isn't so important as it once was to use the multiple).
Knuth covers it in one of his books that I have but forgot the title.
See http://en.wikipedia.org/wiki/Buddy_memory_allocation
Another reason to double an array size is about the addition cost. You don't want each Add() operation to trigger a reallocation call. If you've filled N slots, there is a good chance you'll need some multiple of N anyway, history is a good indicator of future needs, so the object needs to "graduate" to the next arena size. By doubling, the frequency of reallocation falls off logarithmically (Log N). Doubling is just the most convenient multiple (being the smallest whole multiplier it is more memory efficient than 3*N or 4*N, plus it tends to follow heap memory management models closely).
The reason behind doubling is that it turns repeatedly appending an element into an amortized O(1) operation. Put another way, appending n elements takes O(n) time.
More accurately, increasing by any multiplicative factor achieves that, but doubling is a common choice. I've seen other choices, such as in increasing by a factor of 1.5.
In the program I'm working on, this particular operation is definitely not going to be the bottleneck, but it did get me thinking. From the answers to questions such as this one and this one I've learned two ways to easily (efficiently) set all the elements of an array to zero in C:
double myArray[3];
static const double zeroes[3] = {0};
memcpy(myArray, zeroes, sizeof(zeroes));
and
double myArray[3];
memset(myArray, 0, numberOfElementsInMyArray * sizeof(myArray[0]));
Before I move onto my real question: I'm not entirely sure but based on the information I've read, I assume this method would, at least in principle, fill the array with int zeroes (well, unsigned char's but these seem to be fairly equivalent). Is that correct? If so, is an explicit conversion of the int zeroes to double zeroes necessary or is it done implicitly if myArray is declared as an array of double's?
Anyway, my real question is this: if the array isn't very big at all (like the myArray I've declared above), is either of these methods still preferred over a little loop? What if you have a few arrays of the same small size that all need to be assigned zeroes? If commented properly, do you think readability is a factor in the decision and favours a particular solution?
Just to be entirely clear: I am not looking to initialize an array to zeroes.
If it's just a small array (like three elements), it probably won't make much difference whether you use mem* functions, or a loop, or three distinct assignments. In fact, that latter case may even be faster as you're not suffering the cost of a function call:
myArry[0] = myArray[1] = myArray[2] = 0;
But, even if one is faster, the difference would probably not be worth worrying about. I tend to optimise for readability first then, if needed, optimise for space/storage later.
If it was a choice between memcpy and memset, I'd choose the latter (assuming, as seems to be the case, that the all-zero bit pattern actually represented 0.0 in your implementation) for two reasons:
it doesn't require storage of a zeroed array; and
the former will get you into trouble if you change the size of one array and forget the other.
And, for what it's worth, your memset solution doesn't need to have the multiplication. Since you can get the size of the entire array, you can just do:
memset (myArray, 0, sizeof (myArray));
i think the first method of setting without using a loop is better for performance
what happen is that the merroy of array is bitwised by 0 (& 0) so it faster than using a loop for each element in the array.
Just a quick question: What are people's practices when you have to define the (arbitrary) maximum that some array can take in C. So, some people just choose a round number hoping it will be big enough, others the prime number closer to the round number (!), etc., other some more esoteric number, like the prime number closer to... and so on.
I'm wondering, then, what are some best practices for deciding such values?
Thanks.
There is no general rule. Powers of twos work for buffers, I use 1024 quite often for string buffers in C but any other number would work. Prime numbers are useful for hash tables where simple modulo-hashing works well with prime-number sizes. Of course you define the size as a symbolic constant so that you can change it later.
If I can't pin down a reasonable maximum I tend to use malloc and realloc to grow the array as needed. Using a fixed size array when you can't gurantee that it is large enough for the intended purpose is hazardous.
Best practice is to avoid arbitrary limits whenever possible.
It's not always possible, so second-best practice is to take an educated estimate of the largest thing that the array is ever likely to need to hold, and then round up by a healthy margin, at least 25%. I tend to prefer powers of ten when I do this, because it makes it obvious on inspection that the number is an arbitrary limit. (Powers of two also often signify that, but only if the reader recognizes the number as a power of two, and most readers-of-code don't have that table memorized much past 216. If there's a good reason to use a power of two and it needs to be bigger than that, write it in hex. End of digression.) Always document the reasoning behind your estimate of the largest thing the array needs to hold, even if it's as simple as "anyone with a single source file bigger than 2GB needs to rethink their coding style" (actual example)
Don't use a prime number unless you specifically need the properties of a prime number (e.g. as Juho mentions, for hash tables -- but you only need that there if your hash function isn't very good -- but often it is, unfortunately.) When you do, document that you are intentionally using prime numbers and why, because most people do not recognize prime numbers on sight or know why they might be necessary in a particular situation.
If I need to do this I usually go with either a power of two, or for larger data sets, the number of pages required to hold the data. Most of the time though I prefer to allocate a chunk of memory on the heap and then realloc if the buffer size is insufficient later.
I only define a maximum when I have a strong reason for a particular number to be the maximum. Otherwise, I size it dynamically, perhaps with a sanity-check maximum (e.g. a person's name should not be several megabytes long).
Round numbers (powers of 2) are used because they are often easy for things like malloc to use (many implementations keep up with memory in blocks of various power of two sizes), easier for linkers to use (in the case of static or global arrays), and also because you can use bitwise operations to test for limits of them, which are often faster than < and >.
Prime numbers are used because using prime number sized hash tables is supposed to avoid collision.
Many people likely use both prime number and power of two sizes for things in cases where they don't actually provide any benefit, though.
It really isn't possible to predict at the outset what the maximum size could be.
For example, I coded a small cmdline interpreter, where each line of output produced was stored in a char array of size 200. Sufficient for all possible outputs, don't you think?
That was until I issued the env command which had a line with ~ 400 characters(!).
LS_COLORS='no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;
05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;
32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;
31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;
35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:';
Moral of the story: Try to use dynamic allocation as far as possible.
I have a circular, statically allocated buffer in C, which I'm using as a queue for a depth breadth first search. I'd like have the top N elements in the queue sorted. It would be easy to just use a regular qsort() - except it's a circular buffer, and the top N elements might wrap around. I could, of course, write my own sorting implementation that uses modular arithmetic and knows how to wrap around the array, but I've always thought that writing sorting functions is a good exercise, but something better left to libraries.
I thought of several approaches:
Use a separate linear buffer - first copy the elements from the circular buffer, then apply qsort, then copy them back. Using an additional buffer means an additional O(N) space requirement, which brings me to
Sort the "top" and "bottom" halve using qsort, and then merge them using the additional buffer
Same as 2. but do the final merge in-place (I haven't found much on in-place merging, but the implementations I've seen don't seem worth the reduced space complexity)
On the other hand, spending an hour contemplating how to elegantly avoid writing my own quicksort, instead of adding those 25 (or so) lines might not be the most productive either...
Correction: Made a stupid mistake of switching DFS and BFS (I prefer writing a DFS, but in this particular case I have to use a BFS), sorry for the confusion.
Further description of the original problem:
I'm implementing a breadth first search (for something not unlike the fifteen puzzle, just more complicated, with about O(n^2) possible expansions in each state, instead of 4). The "bruteforce" algorithm is done, but it's "stupid" - at each point, it expands all valid states, in a hard-coded order. The queue is implemented as a circular buffer (unsigned queue[MAXLENGTH]), and it stores integer indices into a table of states. Apart from two simple functions to queue and dequeue an index, it has no encapsulation - it's just a simple, statically allocated array of unsigned's.
Now I want to add some heuristics. The first thing I want to try is to sort the expanded child states after expansion ("expand them in a better order") - just like I would if I were programming a simple best-first DFS. For this, I want to take part of the queue (representing the most recent expanded states), and sort them using some kind of heuristic. I could also expand the states in a different order (so in this case, it's not really important if I break the FIFO properties of the queue).
My goal is not to implement A*, or a depth first search based algorithm (I can't afford to expand all states, but if I don't, I'll start having problems with infinite cycles in the state space, so I'd have to use something like iterative deepening).
I think you need to take a big step back from the problem and try to solve it as a whole - chances are good that the semi-sorted circular buffer is not the best way to store your data. If it is, then you're already committed and you will have to write the buffer to sort the elements - whether that means performing an occasional sort with an outside library, or doing it when elements are inserted I don't know. But at the end of the day it's going to be ugly because a FIFO and sorted buffer are fundamentally different.
Previous answer, which assumes your sort library has a robust and feature filled API (as requested in your question, this does not require you to write your own mod sort or anything - it depends on the library supporting arbitrary located data, usually through a callback function. If your sort doesn't support linked lists, it can't handle this):
The circular buffer has already solved this problem using % (mod) arithmetic. QSort, etc don't care about the locations in memory - they just need a scheme to address the data in a linear manner.
They work as well for linked lists (which are not linear in memory) as they do for 'real' linear non circular arrays.
So if you have a circular array with 100 entries, and you find you need to sort the top 10, and the top ten happen to wrap in half at the top, then you feed the sort the following two bits of information:
The function to locate an array item is (x % 100)
The items to be sorted are at locations 95 to 105
The function will convert the addresses the sort uses into an index used in the real array, and the fact that the array wraps around is hidden, although it may look weird to sort an array past its bounds, a circular array, by definition, has no bounds. The % operator handles that for you, and you might as well be referring to the part of the array as 1295 to 1305 for all it cares.
Bonus points for having an array with 2^n elements.
Additional points of consideration:
It sounds to me that you're using a sorting library which is incapable of sorting anything other than a linear array - so it can't sort linked lists, or arrays with anything other than simple ordering. You really only have three choices:
You can re-write the library to be more flexible (ie, when you call it you give it a set of function pointers for comparison operations, and data access operations)
You can re-write your array so it somehow fits your existing libraries
You can write custom sorts for your particular solution.
Now, for my part I'd re-write the sort code so it was more flexible (or duplicate it and edit the new copy so you have sorts which are fast for linear arrays, and sorts which are flexible for non-linear arrays)
But the reality is that right now your sort library is so simple you can't even tell it how to access data that is non linearly stored.
If it's that simple, there should be no hesitation to adapting the library itself to your particular needs, or adapting your buffer to the library.
Trying an ugly kludge, like somehow turning your buffer into a linear array, sorting it, and then putting it back in is just that - an ugly kludge that you're going to have to understand and maintain later. You're going to 'break' into your FIFO and fiddle with the innards.
-Adam
I'm not seeing exactly the solution you asked for in c. You might consider one of these ideas:
If you have access to the source for your libc's qsort(), you might copy it and simply replace all the array access and indexing code with appropriately generalized equivalents. This gives you some modest assurance that the underling sort is efficient and has few bugs. No help with the risk of introducing your own bugs, of course. Big O like the system qsort, but possibly with a worse multiplier.
If the region to be sorted is small compared to the size of the buffer, you could use the straight ahead linear sort, guarding the call with a test-for-wrap and doing the copy-to-linear-buffer-sort-then-copy-back routine only if needed. Introduces an extra O(n) operation in the cases that trip the guard (for n the size of the region to be sorted), which makes the average O(n^2/N) < O(n).
I see that C++ is not an option for you. ::sigh:: I will leave this here in case someone else can use it.
If C++ is an option you could (subclass the buffer if needed and) overload the [] operator to make the standard sort algorithms work. Again, should work like the standard sort with a multiplier penalty.
Perhaps a priority queue could be adapted to solve your issue.'
You could rotate the circular queue until the subset in question no longer wraps around. Then just pass that subset to qsort like normal. This might be expensive if you need to sort frequently or if the array element size is very large. But if your array elements are just pointers to other objects then rotating the queue may be fast enough. And in fact if they are just pointers then your first approach might also be fast enough: making a separate linear copy of a subset, sorting it, and writing the results back.
Do you know about the rules regarding optimization? You can google them (you'll find a few versions, but they all say pretty much the same thing, DON'T).
It sounds like you are optimizing without testing. That's a huge no-no. On the other hand, you're using straight C, so you are probably on a restricted platform that requires some level of attention to speed, so I expect you need to skip the first two rules because I assume you have no choice:
Rules of optimization:
Don't optimize.
If you know what you are doing, see rule #1
You can go to the more advanced rules:
Rules of optimization (cont):
If you have a spec that requires a certain level of performance, write the code unoptimized and write a test to see if it meets that spec. If it meets it, you're done. NEVER write code taking performance into consideration until you have reached this point.
If you complete step 3 and your code does not meet the specs, recode it leaving your original "most obvious" code in there as comments and retest. If it does not meet the requirements, throw it away and use the unoptimized code.
If your improvements made the tests pass, ensure that the tests remain in the codebase and are re-run, and that your original code remains in there as comments.
Note: that should be 3. 4. 5. Something is screwed up--I'm not even using any markup tags.
Okay, so finally--I'm not saying this because I read it somewhere. I've spent DAYS trying to untangle some god-awful messes that other people coded because it was "Optimized"--and the really funny part is that 9 times out of 10, the compiler could have optimized it better than they did.
I realize that there are times when you will NEED to optimize, all I'm saying is write it unoptimized, test and recode it. It really won't take you much longer--might even make writing the optimized code easier.
The only reason I'm posting this is because almost every line you've written concerns performance, and I'm worried that the next person to see your code is going to be some poor sap like me.
How about somthing like this example here. This example easely sorts a part or whatever you want without having to redefine a lot of extra memory.
It takes inly two pointers a status bit and a counter for the for loop.
#define _PRINT_PROGRESS
#define N 10
BYTE buff[N]={4,5,2,1,3,5,8,6,4,3};
BYTE *a = buff;
BYTE *b = buff;
BYTE changed = 0;
int main(void)
{
BYTE n=0;
do
{
b++;
changed = 0;
for(n=0;n<(N-1);n++)
{
if(*a > *b)
{
*a ^= *b;
*b ^= *a;
*a ^= *b;
changed = 1;
}
a++;
b++;
}
a = buff;
b = buff;
#ifdef _PRINT_PROGRESS
for(n=0;n<N;n++)
printf("%d",buff[n]);
printf("\n");
}
#endif
while(changed);
system( "pause" );
}