Why does C support negative array indices? - c

From this post on SO, it is clear that C supports negative indices.
Why support such a potential memory violation in a program?
Shouldn't the compiler throw a Negative Index warning at least? (am using GCC)
Or is this calculation done in runtime?
EDIT 1: Can anybody hint at its uses?
EDIT 2: Regarding 3.), using loop counters inside the [] of arrays/pointers implies run-time calculation of indices.

The calculation is done at runtime.
Negative indices don't necessarily have to cause a violation, and have their uses.
For example, let's say you have a pointer that is currently pointing to the 10th element in an array. Now, if you need to access the 8th element without changing the pointer, you can do that easily by using a negative index of -2.
char data[] = "01234567890123456789";
char* ptr = &data[9];
char c = ptr[-2]; // 7

Here is an example of use.
An Infinite Impulse Response filter is calculated partially from recent previous output values. Typically, there will be some array of input values and an array where output values are to be placed. If the current output element is $y_i$, then $y_i$ may be calculated as $y_i = a_0 x_i + a_1 x_{i-1} + a_2 y_{i-1} + a_3 y_{i-2}$.
A natural way to write code for this is something like:
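void IIR(float *x, float *y, size_t n)
{
    /* a0..a3 are the filter coefficients, defined elsewhere.
       i is signed (ptrdiff_t, from <stddef.h>) so that i-1 and i-2 are
       really -1 and -2 when i is 0, instead of wrapping around. */
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; ++i)
        y[i] = a0*x[i] + a1*x[i-1] + a2*y[i-1] + a3*y[i-2];
}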
Observe that when i is zero, x[i-1], y[i-1] and y[i-2] have negative indices. In this case, the caller is responsible for creating an array, setting the initial two elements to "starter values" for the output (often either zero or values held over from a previous buffer), and passing a pointer to where the first new value is to be written; likewise, the input pointer must be preceded by one earlier input value. Thus, this routine, IIR, normally receives pointers into the middle of arrays and uses negative indices to address some elements.
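A self-contained, caller-side sketch of that arrangement, assuming file-scope coefficients with made-up values (a0 to a3 below are illustrative only) and a hypothetical helper impulse_demo: the buffers reserve one history slot for x and two starter slots for y, and IIR receives pointers just past them, so its negative indices land inside the arrays.

```c
#include <stddef.h>

/* Illustrative coefficients only; a real filter would use designed values. */
static const float a0 = 0.5f, a1 = 0.25f, a2 = 0.125f, a3 = 0.0625f;

static void IIR(const float *x, float *y, size_t n)
{
    /* Signed loop index so that i-1 and i-2 are genuinely negative at i == 0. */
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; ++i)
        y[i] = a0*x[i] + a1*x[i-1] + a2*y[i-1] + a3*y[i-2];
}

/* Hypothetical caller: computes the filter's response to a unit impulse. */
static void impulse_demo(float out[4])
{
    float xbuf[1 + 4] = { 0.0f, 1.0f, 0.0f, 0.0f, 0.0f }; /* x[-1], then x[0..3] */
    float ybuf[2 + 4] = { 0.0f, 0.0f };                  /* y[-2], y[-1] starters; rest zeroed */
    IIR(xbuf + 1, ybuf + 2, 4);   /* pointers into the middle of each buffer */
    for (int k = 0; k < 4; ++k)
        out[k] = ybuf[2 + k];
}
```

After impulse_demo(out), out holds 0.5, 0.3125, 0.0703125 and 0.0283203125; the values are exact because every coefficient is a power of two.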

Why support such a potential memory violation in a program?
Because it follows from pointer arithmetic, and may be useful in certain cases.
Shouldn't the compiler throw a Negative Index warning at least? (am using GCC)
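For the same reason, the compiler is not required to warn you when you access array[10] on an array of only 10 elements: it leaves that work to the programmer. (GCC can diagnose some constant out-of-bounds subscripts on true arrays through -Warray-bounds, which -Wall enables, but no compiler can catch every case.)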
Or is this calculation done in runtime?
Yes, the calculation is done at run time.

Elaborating on Taymon's answer:
float arr[10];
float *p = &arr[2];
p[-2]
is now perfectly OK. I haven't seen a good use of negative indices, but why should the standard exclude them when it is, in general, undecidable whether you are pointing outside of a valid range?
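A small sketch of the range check involved, assuming p = base + offset points into an array of len elements and that offset does not exceed len (relative_index_ok is a hypothetical helper): p[k] is valid exactly when -offset <= k < len - offset. For the float arr[10] / p = &arr[2] example above, that is -2 to 7.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: p = base + offset points into an array of len
   elements. p[k] is in bounds exactly when 0 <= offset + k < len.
   offset and k are usually only known at run time, which is why the
   compiler cannot decide this in general. */
static bool relative_index_ok(size_t len, size_t offset, ptrdiff_t k)
{
    ptrdiff_t pos = (ptrdiff_t)offset + k;  /* position relative to base */
    return pos >= 0 && pos < (ptrdiff_t)len;
}
```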

OP: Why support ... a potential memory violation?
It has potential uses: as OP says, it is a potential violation, not a certain one. C is about allowing users to do many things, including giving them all the rope they need to hang themselves.
OP: ... throw a Negative Index warning ...
If concerned, use an unsigned index type, or better yet, size_t.
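A minimal sketch of why an unsigned index helps, using a hypothetical helper get_checked: size_t cannot hold a negative value, so a stray -1 converted to size_t becomes SIZE_MAX, and the single comparison idx >= len rejects it along with ordinary overruns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores arr[idx] in *out and returns true only when
   idx is in range. Because idx is unsigned, a "negative" value arrives as
   a huge number, so one comparison covers both kinds of error. */
static bool get_checked(const int *arr, size_t len, size_t idx, int *out)
{
    if (idx >= len)
        return false;
    *out = arr[idx];
    return true;
}
```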
OP ... calculation done in runtime?
Yes, quite often as in a[i], where i is not a constant.
OP: hint at its uses?
Example: one is processing a point in an array of points (Pt) and wants to determine whether the middle point is a candidate for removal because it is coincident. Assume the calling function has already determined that Mid is neither the first nor the last point.
static int IsCoincident(Pt *Mid) {
Pt *Left = &Mid[-1]; // fixed negative index
Pt *Right = &Mid[+1];
return foo(Left, Mid, Right);
}
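The snippet above leaves Pt and foo undefined. A self-contained sketch, assuming a two-dimensional point type and a hypothetical foo that treats the middle point as coincident when it equals either neighbor:

```c
#include <stdbool.h>

/* Hypothetical point type; the original answer does not define Pt. */
typedef struct { double x, y; } Pt;

/* Hypothetical stand-in for foo(): Mid is removable when it coincides
   with its left or right neighbor. */
static bool foo(const Pt *Left, const Pt *Mid, const Pt *Right)
{
    bool same_as_left  = Left->x == Mid->x && Left->y == Mid->y;
    bool same_as_right = Right->x == Mid->x && Right->y == Mid->y;
    return same_as_left || same_as_right;
}

/* Mid must have a valid element on each side. */
static int IsCoincident(const Pt *Mid)
{
    const Pt *Left  = &Mid[-1];   /* fixed negative index */
    const Pt *Right = &Mid[+1];
    return foo(Left, Mid, Right);
}
```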

Array subscripts are just syntactic sugar for dereferencing pointers to arbitrary places in memory. In general, the compiler can't warn you about negative indexes, because it doesn't know at compile time where a pointer will be pointing. Any given pointer arithmetic expression might or might not result in a valid address for memory access.

a[b] does the same thing as *(a+b). Since the latter allows the negative b, so does the former.
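A quick check of that equivalence, assuming p points at least two elements past the start of an array (subscript_forms_agree is a hypothetical name). Since a[b] is defined as *(a+b), even the unusual spelling (-2)[p] names the same element:

```c
#include <stdbool.h>

/* a[b] means *(a + b), so p[-2], *(p - 2) and (-2)[p] are one object.
   Assumes p points at least two elements into an array. */
static bool subscript_forms_agree(const int *p)
{
    return &p[-2] == p - 2 && &(-2)[p] == p - 2 && p[-2] == *(p - 2);
}
```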

Example of using negative array indices.
I use negative indices to check message protocols. For example, one protocol format looks like:
<nnn/message/f>
or, equally valid:
<nnn/message>
The parameter f is optional and must be a single character if supplied.
If I want to get to the value of character f, I first get a pointer to the > character:
char * end_ptr = strchr(msg, '>');
char f_char = '1'; /* default value */
Now I check if f is supplied and extract it (here is where the negative array index is used):
if (end_ptr[-2] == '/')
{
f_char = end_ptr[-1];
}
Note that I've left out error checking and other code that is not relevant to this example.
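For completeness, a sketch of the same check with basic error checking added. get_f_char is a hypothetical wrapper, and the end_ptr - msg >= 2 test is an assumption about what that error checking should cover: it ensures end_ptr[-2] still lies inside the string before reading it.

```c
#include <stddef.h>
#include <string.h>

/* Returns the optional flag f from "<nnn/message/f>", or the default '1'
   when the message is "<nnn/message>" or malformed. */
static char get_f_char(const char *msg)
{
    char f_char = '1';                       /* default value */
    const char *end_ptr = strchr(msg, '>');
    /* Only look back two characters if they are inside msg. */
    if (end_ptr != NULL && end_ptr - msg >= 2 && end_ptr[-2] == '/')
        f_char = end_ptr[-1];
    return f_char;
}
```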

Related

Are indexes easier to vectorize than pointers?

Is there any example (e.g. on https://godbolt.org/ ) where Clang generates worse code when an algorithm is expressed with pointer iteration instead of array indexes? E.g. where it can vectorize/unroll in one case but can't in the other one?
In simple examples apparently it doesn't matter. Here is a pointer iteration style:
while (len-- > 0) {
*dst++ = *src++;
}
Here is the logically same code in index style:
while (idx != len) {
dst[idx] = src[idx];
idx++;
}
Disregard any UB and/or off by one errors here.
Edit: the argument about indices being syntactic sugar is irrelevant, as desugaring doesn't change the algorithm style. So the following pointer-based code is still in the index style:
while (idx != len) {
*(dst + idx) = *(src + idx);
idx++;
}
Note that the index-based loop has only 1 changing variable, while the pointer-based loop has 2, and the compiler must infer that they always change together.
You should look at this in the context of https://en.wikipedia.org/wiki/Induction_variable and https://en.wikipedia.org/wiki/Strength_reduction. Pointer style is essentially strength-reduced index-style, as addition is replaced by increments. And this reduction was beneficial for performance for some time, but no longer.
So my question boils down to whether there are situations in which this strength reduction cannot be performed or reversed by a compiler.
Another possible case is when indexes are not induction variables, so the corresponding pointer code includes "arbitrary jumps" and it's somehow harder to transform the loop due to the "history" of past iterations.
As long as no overloaded operator [] is involved, a subscript expression is literally defined to be identical to pointer arithmetic followed by dereferencing the result [expr.sub]/1. Thus, as long as both versions are indeed equivalent, compilers should generally be able to optimize both versions equally well (I'd probably go as far as considering a compiler's failure to optimize one but not the other a performance bug). That being said, note that there are lots of subtleties such as the wrap-around behavior of unsigned arithmetic that can make iterating over an index not exactly equivalent to iterating over a pointer…
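To try this on https://godbolt.org/, the two styles can be written as complete functions, for example as below (copy_ptr and copy_idx are hypothetical names). This sketch only makes the comparison concrete; it makes no claim about which version a given Clang release vectorizes better. Without restrict, the compiler must also allow for dst and src overlapping in both versions.

```c
#include <stddef.h>

/* Pointer-iteration style: dst, src and len all change on every step. */
void copy_ptr(int *dst, const int *src, size_t len)
{
    while (len-- > 0)
        *dst++ = *src++;
}

/* Index style: a single induction variable drives both accesses. */
void copy_idx(int *dst, const int *src, size_t len)
{
    for (size_t idx = 0; idx != len; idx++)
        dst[idx] = src[idx];
}
```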

Dynamically indexing an array in C

Is it possible to create arrays based on their index, as in
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y] = someNr;
dynamically/on the run, without creating foo[0...3][0...4]?
If not, is there a data structure that allow me to do something similar to this in C?
No.
As written, your code makes no sense at all. You need foo to be declared somewhere; then you can index into it with foo[x][y] = someNr;. But you can't just make foo spring into existence, which is what it looks like you are trying to do.
Either create foo with correct sizes (only you can say what they are) int foo[16][16]; for example or use a different data structure.
In C++ you could use a std::map<std::pair<int, int>, int>.
Variable Length Arrays
Even if x and y were replaced by constants, you could not initialize the array using the notation shown. You'd need to use:
int fixed[3][4] = { someNr };
or similar (extra braces, perhaps; more values, perhaps). You can, however, declare/define variable length arrays (VLA), but you cannot initialize them with values (C23 allows only the empty initializer = {}, which zero-fills them). So, you could write:
int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y];
for (int i = 0; i < x; i++)
{
for (int j = 0; j < y; j++)
foo[i][j] = someNr + i * (x + 1) + j;
}
Obviously, you can't use x and y as indexes without writing (or reading) outside the bounds of the array. The onus is on you to ensure that there is enough space on the stack for the values chosen as the limits on the arrays (it won't be a problem at 3x4; it might be at 300x400 though, and will be at 3000x4000). You can also use dynamic allocation of VLAs to handle bigger matrices.
VLA support is mandatory in C99, optional in C11 and C18, and non-existent in strict C90.
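A sketch of the dynamic-allocation variant, assuming a compiler with VLA support and positive x and y: foo is declared as a pointer to a row type int[y], so one malloc creates the whole x-by-y matrix on the heap and foo[i][j] indexing works unchanged. heap_vla_demo is a hypothetical name; it returns the last element so the result can be checked.

```c
#include <stdlib.h>

/* Builds an x-by-y matrix on the heap through a pointer to a VLA row
   type, fills it like the stack example, and returns foo[x-1][y-1]
   (or -1 if allocation fails). */
static int heap_vla_demo(int x, int y, int someNr)
{
    int (*foo)[y] = malloc((size_t)x * sizeof *foo);  /* x rows of y ints */
    if (foo == NULL)
        return -1;
    for (int i = 0; i < x; i++)
        for (int j = 0; j < y; j++)
            foo[i][j] = someNr + i * (x + 1) + j;
    int last = foo[x - 1][y - 1];
    free(foo);                                        /* one free for the whole matrix */
    return last;
}
```

Because the storage comes from the heap, sizes such as 300x400 no longer depend on the stack limit.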
Sparse arrays
If what you want is 'sparse array support', there is no built-in facility in C that will assist you. You have to devise (or find) code that will handle that for you. It can certainly be done; Fortran programmers used to have to do it quite often in the bad old days when megabytes of memory were a luxury and MIPS meant millions of instructions per second and people were happy when their computer could do double-digit MIPS (and the Fortran 90 standard was still years in the future).
You'll need to devise a structure and a set of functions to handle the sparse array. You will probably need to decide whether you have values in every row, or whether you only record the data in some rows. You'll need a function to assign a value to a cell, and another to retrieve the value from a cell. You'll need to think what the value is when there is no explicit entry. (The thinking probably isn't hard. The default value is usually zero, but an infinity or a NaN (not a number) might be appropriate, depending on context.) You'd also need a function to allocate the base structure (would you specify the maximum sizes?) and another to release it.
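A deliberately minimal sketch of such a structure, with all names hypothetical: cells are stored as (row, col, value) triples in a growable array, lookup is a linear scan, and unset cells return a configurable default. A hash table or a sorted layout would scale much better; this only shows the create/set/get/destroy shape described above.

```c
#include <stdlib.h>

typedef struct { int row, col; double value; } SparseEntry;

typedef struct {
    SparseEntry *entries;   /* only explicitly assigned cells */
    size_t count, capacity;
    double default_value;   /* returned for cells never assigned */
} SparseArray;

static SparseArray *sparse_create(double default_value)
{
    SparseArray *sa = calloc(1, sizeof *sa);
    if (sa != NULL)
        sa->default_value = default_value;
    return sa;
}

static void sparse_destroy(SparseArray *sa)
{
    if (sa != NULL) {
        free(sa->entries);
        free(sa);
    }
}

/* Returns 0 on success, -1 if memory could not be allocated. */
static int sparse_set(SparseArray *sa, int row, int col, double value)
{
    for (size_t k = 0; k < sa->count; k++) {
        if (sa->entries[k].row == row && sa->entries[k].col == col) {
            sa->entries[k].value = value;   /* overwrite existing cell */
            return 0;
        }
    }
    if (sa->count == sa->capacity) {        /* grow geometrically */
        size_t new_cap = sa->capacity ? 2 * sa->capacity : 8;
        SparseEntry *p = realloc(sa->entries, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        sa->entries = p;
        sa->capacity = new_cap;
    }
    sa->entries[sa->count++] = (SparseEntry){ row, col, value };
    return 0;
}

static double sparse_get(const SparseArray *sa, int row, int col)
{
    for (size_t k = 0; k < sa->count; k++)
        if (sa->entries[k].row == row && sa->entries[k].col == col)
            return sa->entries[k].value;
    return sa->default_value;
}
```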
An efficient way to create a dynamic index of an array is to create an empty array of the same data type as the array being indexed holds.
Let's imagine we are using integers for the sake of simplicity. You can then stretch the concept to any other data type.
The ideal index depth depends on the length of the data to index: the number of index entries it produces should be somewhere close to the number of data elements.
Let's say you have 1 million 64-bit integers in the array to index.
First, sort the data and eliminate duplicates. That's easy to achieve with qsort() (the C standard library's sorting function, declared in <stdlib.h>; despite its name, the standard does not require it to use quicksort) followed by a duplicate-removal function such as this one, which works on the already sorted array:
uint64_t remove_dupes(const uint64_t *sorted_arr, uint64_t *ord_arr, uint64_t arr_size)
{
    /* uint64_t comes from <stdint.h>. sorted_arr must already be sorted;
       each distinct value is copied once into ord_arr, and the number of
       values copied is returned. */
    uint64_t i, j = 0;
    for (i = 0; i < arr_size; i++)
    {
        /* keep the first element and every element that differs from its predecessor */
        if (i == 0 || sorted_arr[i] != sorted_arr[i-1]) {
            ord_arr[j] = sorted_arr[i];
            j++;
        }
    }
    return j;
}
Adapt the code above to your needs. If the sorted input array was allocated with malloc(), you should free() it once the function has copied the unique values into the output array. The function is fast, making a single pass over the data, and it also handles arrays with zero or one element.
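The sorting step relies on qsort() and a comparison function. A minimal comparator for uint64_t data might look like this (cmp_u64 is a hypothetical name); it compares rather than subtracts, because the difference of two 64-bit values can wrap around and does not fit in the int that qsort() expects.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for uint64_t: returns -1, 0 or 1 without
   subtracting, so large values cannot overflow the int result. */
static int cmp_u64(const void *pa, const void *pb)
{
    uint64_t a = *(const uint64_t *)pa;
    uint64_t b = *(const uint64_t *)pb;
    return (a > b) - (a < b);
}
```

Sort with qsort(data, n, sizeof data[0], cmp_u64) before calling remove_dupes.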
Once the data is sorted and unique, create an index with a length close to that of the data. It does not need to be of an exact length, although sticking to powers of 10 will make everything easier in the case of integers.
uint64_t* idx = calloc((size_t)pow(10, indexdepth), sizeof(uint64_t)); /* pow() is in <math.h>, calloc() in <stdlib.h> */
This will create a zero-filled index array.
Then populate the index. Traverse the array to index just once, and every time the leading indexdepth significant digits change, store in the index the position where that new prefix was first detected.
If you choose an indexdepth of 2 you will have 10² = 100 possible values in your index, typically going from 0 to 99.
When you detect that some number starts with 10 (say 103456), you add an entry to the index. If 103456 was detected at position 733, your index entry would be:
idx[10] = 733;
The next entry, for numbers beginning with 11, goes into the next index slot. If the first number beginning with 11 is found at position 2023:
idx[11] = 2023;
And so on.
When you later need to find some number in your original array of 1 million entries, you don't have to iterate over the whole array; you just check the index entry for the value's first two significant digits, which tells you where the first number starting with those digits is stored. Entry idx[10] tells you where the first number starting with 10 is stored. You can then iterate forward until you find your match.
In my example I employed a small index, so each index segment holds on average 1000000/100 = 10000 entries (assuming the leading digits are evenly spread), and a lookup may have to scan up to that many.
If you enlarge your index to somewhere close to the length of the data, the number of iterations will tend to 1, making any search blazing fast.
What I like to do is to create some simple algorithm that tells me what's the ideal depth of the index after knowing the type and length of the data to index.
Please note that in the example I have posed, 64-bit numbers are indexed by their first indexdepth significant digits, so 10 and 100001 will be stored in the same index segment. In a numerically sorted array, other values (such as 20 or 99999) can sit between them, so a segment is not one contiguous run and scanning forward from its first position is not reliable. Treating numbers as fixed-length hexadecimal strings can help keep a strict numerical order.
You don't have to change the base, though: you could treat 10 as 0000010 (zero-padded to a fixed width) to keep it in the 00 index segment and keep base-10 numbers ordered. Using different numerical bases is nonetheless straightforward in C, which is of great help for this task.
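A sketch of the fixed-width variant, under stated assumptions: data is sorted and duplicate-free, every value is below 10^6 (six digits), and the index depth is 2, so the segment of a value is value / 10000, which equals the first two digits of its zero-padded six-digit form. Unlike the calloc version above, this hypothetical build_prefix_index stores a start position for every segment, including empty ones, plus a sentinel at idx[nseg], so segment s is exactly data[idx[s]] to data[idx[s+1] - 1]. With this scheme 10 lands in segment 00 and 100001 in segment 10.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Builds idx[0..nseg], where idx[s] is the position of the first element
   whose segment (value / divisor) is >= s. Returns NULL on allocation
   failure; the caller frees the result. */
static size_t *build_prefix_index(const uint64_t *data, size_t n,
                                  uint64_t divisor, size_t nseg)
{
    size_t *idx = malloc((nseg + 1) * sizeof *idx);
    if (idx == NULL)
        return NULL;
    size_t pos = 0;
    for (size_t s = 0; s <= nseg; s++) {
        /* skip values belonging to earlier segments; data is sorted */
        while (pos < n && data[pos] / divisor < s)
            pos++;
        idx[s] = pos;   /* empty segments get idx[s] == idx[s + 1] */
    }
    return idx;
}

/* Returns the position of key in data, or -1 if it is absent. Only the
   key's own segment is scanned. */
static ptrdiff_t prefix_lookup(const uint64_t *data, const size_t *idx,
                               uint64_t divisor, size_t nseg, uint64_t key)
{
    uint64_t s = key / divisor;
    if (s >= nseg)
        return -1;                  /* key is outside the indexed range */
    for (size_t i = idx[s]; i < idx[s + 1]; i++)
        if (data[i] == key)
            return (ptrdiff_t)i;
    return -1;
}
```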
As you make your index depth larger, the number of entries per index segment will be reduced.
Please note that programming, especially at a lower level as in C, consists in great part of understanding the tradeoff between CPU cycles and memory use.
Creating the proposed index is a way to reduce the number of CPU cycles required to locate a value, at the cost of using more memory as the index becomes larger. This is nonetheless the way to go nowadays, as massive amounts of memory are cheap.
As SSD speeds get closer to that of RAM, storing indexes in files is worth taking into account. Nevertheless, modern OSs tend to cache as much as they can in RAM, so using files would end up performing similarly.

What happens if i don't use zero-based array in C

Can someone explain what would happen? Is it really necessary to start at index 0 instead of 1 (which would be easier for me)?
You can do whatever you want, as long as your array subscript is non-negative and strictly less than the size of the array.
Example:
int a[100];
a[1] = 2; // fine, 1 < 100
What happens if I don't use zero-based array in C
Well, you can't. C arrays are zero based, by definition, by standard.
Is it really necessary to start at 0?
Well, there is no rule to prevent you from leaving index 0 unused, but then you'll almost certainly not get the desired result.
Using non-zero based arrays in C is possible, but not recommended. Here is how you would allocate a 1-based array of 100 integers:
int * a = ((int*)malloc(100*sizeof(int)))-1;
The -1 moves the start of the pointer back one from the start of the array, making the first valid index 1. So this array will have valid indices from 1 to 100 inclusive.
a[1] = 10; /* Fine */
a[100] = 7; /* Also fine */
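a[0] = 5; /* Error: a[0] lies before the allocated block */
The reason why this isn't recommended is that everything else in C assumes that pointers to blocks of memory point to the first element of interest, not one before it. For example, the array above won't work with memcpy unless you add 1 to the pointer every time you pass it in, and you must likewise call free(a + 1) rather than free(a). Strictly speaking, merely computing a pointer one element before the start of the block is undefined behavior in standard C, even though it usually appears to work in practice.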
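A sketch of the alternative mentioned earlier in this thread, leaving index 0 unused: allocating n + 1 elements makes indices 1 to n valid without ever forming a pointer outside the block, and the pointer passed to free() is the one malloc() returned. alloc_one_based is a hypothetical name.

```c
#include <stdlib.h>

/* Allocates room for indices 0..n and simply ignores index 0, so a[1]
   through a[n] are valid and free(a) works as usual. */
static int *alloc_one_based(size_t n)
{
    return malloc((n + 1) * sizeof(int));
}
```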

Runtime of Initializing an Array Zero-Filled

If I were to define the following array using the zero-fill initialization syntax on the stack:
int arr[ 10 ] = { 0 };
... is the run time constant or linear?
My assumption is that it's a linear run time -- my assumption is only targeting the fact that calloc must go over every byte to zero-fill it.
If you could also provide a why and not just its order xxx, that would be tremendous!
The runtime is linear in the array size.
To see why, here's a sample implementation of memset, which initializes an array to an arbitrary value. At the assembly-language level, this is no different than what goes on in your code.
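/* Named my_memset because the identifier memset is reserved for the standard library. */
void *my_memset(void *dst, int val, size_t count) {
    unsigned char *start = dst;
    for (size_t i = 0; i < count; i++)
        *start++ = (unsigned char)val;   /* one store per byte: O(count) */
    return dst;
}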
Of course, compilers will often use intrinsics to set multiple array elements at a time. Depending on the size of the array and things like alignment and padding, this might make the runtime over array length more like a staircase, with the step size based on the vector length. Over small differences in array size, this would effectively make the runtime constant, but the general pattern is still linear.
This is actually a tip-of-the-iceberg question. What you are really asking is what the order (Big O) of initializing an array is. Essentially, the code loops through each element of the array and sets it to zero. You could write a for loop to do the same thing.
The order of magnitude of that loop is O(n); that is, the time spent in the loop increases in proportion to the number of elements being initialized.
If the hardware supported an instruction that set all bytes from location X to Y to zero, and that instruction took M instruction cycles with M never changing regardless of the number of bytes being set, then the operation would take constant time, written O(1).
In general, O(1) is referred to as constant time and O(n) as linear time.

Dynamic Arrays and structs

Thanks! I just had to cast the right side of the assignment to Term.
I have to make a dynamic array of polynomials that each have a dynamic array of terms. When giving the term an exponent and coefficient, I get an error "expected expression before '{' token". What am I doing incorrectly when assigning the values?
Also, is there an easy way of keeping the dynamic array of terms ordered by their exponent? I was just planning on looping through, printing the max value but would prefer to store them in order.
Thanks!
polynomialArray[index].polynomialTerm[0] = {exponent, coefficient}; // ISSUE HERE
change to
polynomialArray[index].polynomialTerm[0] = (Term){exponent, coefficient};
or assign the members one at a time:
polynomialArray[index].polynomialTerm[0].exponent = exponent;
polynomialArray[index].polynomialTerm[0].coefficient = coefficient;
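There's an efficiency problem here in your code:
if(index > (sizeof(polynomialArray)/sizeof(Polynomial)))
polynomialArray = (Polynomial*)realloc(polynomialArray, index * sizeof(Polynomial));
As polynomialArray is a pointer, I think sizeof(polynomialArray) is always the size of the pointer itself (4 or 8 bytes on a 64-bit system), not of the memory it points to. So the above if condition will always be true as long as index is greater than 0, and realloc runs on every call. Note also that index * sizeof(Polynomial) only makes room for elements 0 to index - 1, so polynomialArray[index] would still be out of bounds; you need index + 1 elements.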
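A sketch of tracking the allocated size explicitly, with hypothetical names (PolyArray, poly_array_reserve) and a stand-in Polynomial type because the question's struct is not shown: the capacity is stored next to the pointer, and the array grows geometrically so that realloc is not called on every insertion.

```c
#include <stdlib.h>

/* Stand-in for the question's Polynomial type. */
typedef struct { int degree; } Polynomial;

typedef struct {
    Polynomial *items;
    size_t capacity;   /* number of elements currently allocated */
} PolyArray;

/* Ensures items[index] is valid; returns 0 on success, -1 on failure. */
static int poly_array_reserve(PolyArray *pa, size_t index)
{
    if (index < pa->capacity)
        return 0;                       /* already large enough */
    size_t new_cap = pa->capacity ? pa->capacity : 4;
    while (new_cap <= index)
        new_cap *= 2;                   /* grow geometrically */
    Polynomial *p = realloc(pa->items, new_cap * sizeof *p);
    if (p == NULL)
        return -1;                      /* old block is still in pa->items */
    pa->items = p;
    pa->capacity = new_cap;
    return 0;
}
```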
If this is C99, I think you need
polynomialArray[index].polynomialTerm[0] = (Term){exponent, coefficient};
You cannot assign values like that (a brace-enclosed list is only allowed as an initializer in a declaration).
You should assign like this:
polynomialArray[index].polynomialTerm[0].exponent = exponent;
polynomialArray[index].polynomialTerm[0].coefficient = coefficient;
About the other question, you really don't need an assert here. If malloc succeeded, the pointer will not be NULL; if it failed, malloc returns NULL, which is exactly what you test to detect the failure.
To keep the terms ordered, you will need to order them with some sorting algorithm. If you are looking for an easy way, the way you are doing it is fine. If it is critical for them to be ordered (as in real-time applications), then you need to rethink the approach. If not, keep it and go forward!
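If keeping the terms sorted does matter, one simple approach is inserting into a sorted array: each new term is shifted into place as it is added, so the array stays ordered by exponent (highest first here, which suits printing the leading term). The Term layout and insert_term_sorted are hypothetical, since the question's struct is not shown. Each insertion costs O(n) moves, which is fine for polynomials with few terms.

```c
#include <stddef.h>

/* Hypothetical layout, following the {exponent, coefficient} order used
   in the question. */
typedef struct { int exponent; double coefficient; } Term;

/* Inserts t into terms[0..*count-1], kept sorted by descending exponent.
   The caller must ensure there is room for one more element. */
static void insert_term_sorted(Term *terms, size_t *count, Term t)
{
    size_t i = *count;
    /* shift terms with smaller exponents one slot to the right */
    while (i > 0 && terms[i - 1].exponent < t.exponent) {
        terms[i] = terms[i - 1];
        i--;
    }
    terms[i] = t;
    (*count)++;
}
```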
Take care,
Beco
