OK to use a terminator to manage fixed length arrays? - c

I'm working in ANSI C with lots of fixed-length arrays. Rather than setting an array length variable for every array, it seems easier just to add a "NULL" terminator at the end of the array, similar to character strings. For my current app I'm using "999999", which would never occur in the actual arrays. I can execute loops and determine array lengths just by looking for the terminator. Is this a common approach? What are the issues with it? Thanks.

This approach is technically used by your main arguments, where the last value is a terminal NULL, but it's also accompanied by an argc that tells you the size.
Using just terminals sounds like it's more prone to mistakes in the future. What's wrong with storing the size along with an array?
Something like:
struct fixed_array {
    unsigned long len;
    int arr[];
};
This will also be more efficient and less error-prone.
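As a minimal sketch of how the struct above might be used: the empty brackets are a flexible array member, which needs C99 or later rather than strict ANSI C. The helper names fixed_array_new and fixed_array_sum are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as the struct above: a length followed by the elements.
   The empty brackets are a C99 flexible array member. */
struct fixed_array {
    unsigned long len;
    int arr[];
};

/* Hypothetical helper: allocate room for n ints and record the length once.
   Returns NULL if the allocation fails. */
struct fixed_array *fixed_array_new(unsigned long n)
{
    struct fixed_array *fa = malloc(sizeof *fa + n * sizeof fa->arr[0]);
    if (fa != NULL) {
        fa->len = n;
        for (unsigned long i = 0; i < n; i++)
            fa->arr[i] = 0;
    }
    return fa;
}

/* Hypothetical helper: the loop bound comes from len, so no value
   has to be reserved as a terminator. */
long fixed_array_sum(const struct fixed_array *fa)
{
    assert(fa != NULL);
    long total = 0;
    for (unsigned long i = 0; i < fa->len; i++)
        total += fa->arr[i];
    return total;
}
```

Because the length travels with the data, a value such as 999999 can be stored like any other element. The caller releases the whole object with a single free().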

The main problem I can think of is that keeping track of the length is useful: many built-in C functions take a length as a parameter, and you need the length to know where to add the next element.
In practice it depends on the size of your array. If it is a huge array, then you should keep track of the length; otherwise, looping through it to determine the length every time you want to add an element to the end would be very expensive: O(n) instead of the O(1) time you normally get with arrays.

The main problem with this approach is that you can't know the length in advance without looping to the end of the array - and that can affect the performance quite negatively if you only want to determine the length.
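To make the cost concrete, here is a rough sketch of what finding the length of a terminated array involves. The sentinel 999999 is the one from the question; the macro name TERMINATOR and the function name terminated_length are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define TERMINATOR 999999  /* the sentinel value from the question */

/* Count the elements before the sentinel. Every call walks the array
   from the start, so the cost grows linearly with the length. */
size_t terminated_length(const int *arr)
{
    assert(arr != NULL);
    size_t n = 0;
    while (arr[n] != TERMINATOR)
        n++;
    return n;
}
```

If the terminator is ever missing or overwritten, this loop reads past the end of the array, which is undefined behavior.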

Why don't you just
Initialize it with a const int that you can use later in the code to check the size, or
Use int len = sizeof(my_array) / sizeof(the_type).
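A small sketch of the sizeof approach follows. The macro name ARRAY_LEN, the array contents, and the function sum_ints are made up for illustration. Note that the idiom only works where the real array is in scope: once the array is passed to a function it decays to a pointer, so the length has to be passed alongside it.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array. Must not be applied to a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

const int my_array[] = {10, 20, 30, 40, 50};

/* The length is computed by the compiler, so adding or removing
   initializers above keeps it correct automatically. */
enum { MY_ARRAY_LEN = ARRAY_LEN(my_array) };

/* Inside a function only a pointer is visible, so the count is a parameter. */
long sum_ints(const int *p, size_t n)
{
    assert(p != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

A typical call would be sum_ints(my_array, ARRAY_LEN(my_array)).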

Since you're using 2-dimensional arrays to hold a ragged array, you could just use a ragged array: type *my_array[];. Or you could put the length in element 0 of each row and treat the rows as 1-indexed arrays. With some evil trickery you could even put the lengths at element -1 of each row! [1]
[1] Left as exercise ;)
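The "length in element 0" variant could look roughly like this. The row capacity MAX_COLS, the sample data, and the function name row_sum are assumptions for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COLS 8  /* hypothetical row capacity, including the length slot */

/* Element 0 of each row holds that row's length; data lives in 1..length. */
const int rows[][MAX_COLS] = {
    {3, 10, 20, 30},
    {1, 7},
    {0},
};

/* Sum one row, treating it as a 1-indexed array bounded by row[0]. */
long row_sum(const int *row)
{
    assert(row != NULL);
    long total = 0;
    for (int i = 1; i <= row[0]; i++)
        total += row[i];
    return total;
}
```

The trade-off is that every loop over a row has to remember to start at index 1.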

Related

Is checking the size of a list linear?

I've seen many answers online saying that checking the size of a list is constant time, but I don't understand why.
My understanding was that a list isn't stored in contiguous memory chunks (like an array), meaning there is no way of getting the size of a list (last element index + 1), without first traversing through every element.
Thoughts?
I've seen many answers online saying that checking the size of a list is constant time.
This fallacy may originate in a fundamental flaw in the Python language: arrays are called lists in Python. As Python gains popularity, the word list has become ambiguous.
Computing the length of a linked list is an O(n) operation, unless the length has been stored separately and maintained properly.
Retrieving the size of an array is performed in constant time if the size is stored along with the array, as is the case in Python, so a=[1,2,3]; len(a) is indeed very fast.
Computing the length of an array may be an O(n) operation if the array must be scanned for a terminating value, such as a null pointer or a null byte. Thus strlen() in C, which computes the number of bytes in a C string (a null terminated array of char) operates in linear time.
The only way to find the length of a list without storing it beforehand is to iterate through the list. The nodes are not stored in one contiguous chunk, so you have to follow the links until you reach the last element. Therefore the time complexity is O(n). Of course, if you store a variable holding the length and increment it every time an element is added (and decrement it when an element is removed), you do not need to iterate, which makes the operation constant time: you only need to read the length from wherever it is stored. One could, for example, keep the length alongside the root element. In short, you are correct. The reason one might say "constant" is that the length is stored beforehand.
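A sketch of the stored-count idea in C, with a linear walk alongside it for comparison. All type and function names here are illustrative assumptions, not taken from any particular library.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* The list keeps a count that is updated on every insert and removal. */
struct list {
    struct node *head;
    size_t count;
};

/* Insert at the front and bump the stored count. Returns 0 on success. */
int list_push_front(struct list *l, int value)
{
    assert(l != NULL);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = l->head;
    l->head = n;
    l->count++;
    return 0;
}

/* Remove the front node, if any, and decrement the stored count. */
void list_pop_front(struct list *l)
{
    assert(l != NULL);
    if (l->head != NULL) {
        struct node *n = l->head;
        l->head = n->next;
        free(n);
        l->count--;
    }
}

/* O(n): what has to happen when no count is stored. */
size_t list_count_walk(const struct list *l)
{
    assert(l != NULL);
    size_t n = 0;
    for (const struct node *p = l->head; p != NULL; p = p->next)
        n++;
    return n;
}
```

Reading l->count is a single memory access, while list_count_walk touches every node. The two agree only as long as every operation that changes the list also updates the count.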

Advantages of null termination vs count variable

I need to use array of graph data, i.e. struct with x and y integers. This array will be passed through many functions, and I need to decide the API choice.
typedef struct {
    int x;
    int y;
} GraphData_t;
How should I choose whether to use NULL-termination for the array, or supply count variable?
I have three approaches for my API:
1: loadGraph(GraphData_t *data, int count); //use count variable
2: loadGraph(GraphData_t *data); // use null-termination (or any other termination value)
typedef struct {
    GraphData_t *data;
    int count;
} GraphArray_t;
3: loadGraph(GraphArray_t *data); //use a struct which has integrated count variable
So far these seem equal to me. Which one would be the preferable method, and why?
As a rather old dinosaur, I will use history here.
Anyway, the size + pointer idiom is the multi purpose and bullet proof way. If in doubt, use it.
The delimited way is just more natural for human beings, especially when you want to initialize an array: there is no need to count the items manually (with the risk of an off-by-one mistake, especially if you later add or remove elements in the initialization list); you just add the delimiter as the last element. BTW, it is the way we use lines in text files... But anyway, the sizeof(array)/sizeof(array[0]) idiom lets you get the size easily and automatically...
The NUL-terminated idiom comes from the beginning of microprocessors, where code was close to the hardware for performance reasons: comparison to 0 was the fastest test, and memory was expensive. Programmers began to end their constant strings with a NUL character for that reason: only one byte of overhead, even if the string was longer than 255 characters. You find references to this ASCIIZ idiom in MS-DOS 2 manuals, but it had been made popular by the pair of Unix and the K&R C language since the 70s.
It is still convenient, and still used in C strings, but many higher-level tools like C++ std::string now prefer the counted idiom, which does not require a forbidden value.
For daily programming, the (null) terminated idiom should only be used when an array can only be browsed forward, and when you have no special need for the size. But beware, if you simply want to copy a null terminated array, you have to scan it twice: once for its size and once for its data.
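The double scan can be seen in a sketch like the following, which duplicates a zero-terminated int array and, for comparison, a counted one. The function names dup_terminated and dup_counted, and the choice of 0 as the forbidden value, are assumptions for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a zero-terminated int array (0 is the forbidden value here).
   Returns NULL if allocation fails; the caller frees the result. */
int *dup_terminated(const int *src)
{
    assert(src != NULL);
    size_t n = 0;
    while (src[n] != 0)              /* first scan: find the size */
        n++;
    int *copy = malloc((n + 1) * sizeof *copy);
    if (copy != NULL)                /* second pass: copy data and terminator */
        memcpy(copy, src, (n + 1) * sizeof *copy);
    return copy;
}

/* With an explicit count, a single pass over the data is enough. */
int *dup_counted(const int *src, size_t n)
{
    assert(src != NULL || n == 0);
    int *copy = malloc(n * sizeof *copy);
    if (copy != NULL && n > 0)
        memcpy(copy, src, n * sizeof *copy);
    return copy;
}
```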
Null termination is a convention that can only be used if the null value is excluded from the set of legal values for the entries. For example the string array argv in main has a null pointer at the end because a null pointer cannot be a legal string.
In your case, the array elements are structures with 2 int coordinates. You would need to decide what values to consider invalid for these coordinates. If all values are OK, then you must pass the number of elements explicitly. Passing the array length explicitly is preferred in all cases, as it avoids unnecessary scans. Indeed main also gets the length of the argv array as a separate int argument argc.
Whether to encapsulate the array pointer and the length in a structure is a matter of style and convenience. For complex structures, it is preferable to group all characteristics in a structure, but for a simple array, it may be more convenient to pass a pointer and a size explicitly as it allows you to apply the function to a subset of the array with loadGraph(data + i, j).
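A minimal sketch of the pointer-plus-count signature and the subset call mentioned above. The body of loadGraph is a placeholder invented for this example (it just sums the y values); the question does not say what the real function does or returns.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int x;
    int y;
} GraphData_t;

/* Placeholder body: a real loadGraph would do the actual loading.
   Here it only returns the sum of the y values it was given, so the
   effect of passing a sub-range is easy to observe. */
long loadGraph(const GraphData_t *data, int count)
{
    assert(data != NULL || count == 0);
    long sum = 0;
    for (int i = 0; i < count; i++)
        sum += data[i].y;
    return sum;
}
```

With an array pts of four points, loadGraph(pts, 4) processes all of them, while loadGraph(pts + 1, 2) processes only the second and third. No copy and no terminator are needed, and every int value remains a legal coordinate.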
While all approaches can of course get the job done, there are some differences which may or may not be relevant for your use-case.
Null-termination is very convenient if the user needs or wants to use hard-coded arrays, because then they can just add or delete entries, without needing to worry about possibly breaking the application (unless they remove the terminator of course).
Since the size is unknown, almost every function working with a null-terminated array needs to iterate over the whole thing. This might be a problem if the array is large, and many functions usually wouldn't actually need to access all entries (or not in the order they are stored).
The terminator itself obviously needs to be a value that can never occur in your actual data. So, depending on your data, there might not be an obvious candidate to use as terminator (or even none at all).
There are probably more subtle differences which might influence your decision, but these are the first ones that came to my mind.

Trimming a char array in C

I am working on an assignment and I have run into a problem in my code. It is not clear to me how to tackle it, probably due to a lack of sleep, but anyway. I need to trim a char array of its white spaces for this assignment.
The solution I thought of involved a second char array: simply copy the non-white-space characters to that array and I'm done. But how can I create a char array without knowing its size? At that moment I do not yet know the size. I still need to trim the input in order to know how many characters need to be copied to the new array, and that varies in the assignment.
I know there are a lot of good questions out here on Stack Overflow, but I think this has more to do with the thought process than with the correct syntax.
My second problem is how to perform an fscanf/fgetc on a char array, since those functions need a stream. Is it sufficient to give them a pointer rather than a stream?
If making the change in place, simply shift every character after a space back, and repeat until the end of the array. This is very inefficient.
If making a new copy, make a new array of the same length, and then do as you were doing (copy all the non-space characters). If you copy the \0 character as well, then there will be no string termination issue. This is much more efficient.
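The copy approach might be sketched as below. The function name strip_spaces_copy is an assumption, and the sketch treats anything isspace() reports as white space, which may be broader than the assignment requires.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of s with all white space removed,
   or NULL if allocation fails. The caller frees the result. */
char *strip_spaces_copy(const char *s)
{
    assert(s != NULL);
    /* The result can never be longer than the input, so a buffer of
       the same length (plus the terminator) is always enough. */
    char *out = malloc(strlen(s) + 1);
    if (out == NULL)
        return NULL;
    size_t j = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (!isspace((unsigned char)s[i]))
            out[j++] = s[i];
    }
    out[j] = '\0';
    return out;
}
```

This sidesteps the "I don't know the size yet" problem by over-allocating slightly instead of counting first.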
Going by your comments, it appears you may have the option to input the array in any form you wish. I would then recommend that instead of doing text manipulations later on, just input the string in the form you need.
You can simply use scanf or fscanf repeatedly, to input the separate words into the same array. This will take care of all the whitespaces.
Here is one partial idea: You can make a first pass on the char array and count the blanks, then take the string length minus the blanks for the second array, then perform your copy across skipping the blanks.
You could also create a pass through the array:
Test until end of array:
Is my (Current/Index) position blank? (A space)
If so, grab next available non-blank value and put it there.
then index++
If not, index++
Not sure on the second, will do some checking and see if I can find a good answer there too.
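The in-place pass outlined above can also be written with two indices, one for reading and one for writing. This is a sketch of a variant, not the exact steps listed: it never looks ahead for the next non-blank, it just copies each non-blank character down to the write position. The function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Remove all white space from s in place, in a single pass. */
void strip_spaces_in_place(char *s)
{
    assert(s != NULL);
    size_t write = 0;
    for (size_t read = 0; s[read] != '\0'; read++) {
        if (!isspace((unsigned char)s[read]))
            s[write++] = s[read];   /* keep this character */
    }
    s[write] = '\0';                /* terminate the shortened string */
}
```

Each character is read once and written at most once, so this avoids the repeated shifting that makes the naive in-place approach slow. The argument must be a writable array, not a string literal.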

How does the Length() function in Delphi work?

In other languages like C++, you have to keep track of the array length yourself - how does Delphi know the length of my array? Is there an internal, hidden integer?
Is it better, for performance-critical parts, to not use Length() but a direct integer managed by me?
There are three kinds of arrays, and Length works differently for each:
Dynamic arrays: These are implemented as pointers. The pointer points to the first array element, but "behind" that element (at a negative offset from the start of the array) are two extra integer values that represent the array's length and reference count. Length reads that value. This is the same as for the string type.
Static arrays: The compiler knows the length of the array, so Length is a compile-time constant.
Open arrays: The length of an open array parameter is passed as a separate parameter. The compiler knows where to find that parameter, so it replaces Length with a read of that parameter's value.
Don't forget that the layout of dynamic arrays and the like would change in a 64-bit version of Delphi, so any code that relies on finding the length at a particular offset would break.
I advise just using Length(). If you're working with it in a loop, you might want to cache it, but don't forget that a for loop already caches the terminating bounds of the loop.
Yes, there are in fact two additional fields with dynamic arrays. First is the number of elements in the array at -4 bytes offset to the first element, and at -8 bytes offset there's the reference count. See Rudy's article for a detailed explanation.
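To illustrate the general idea of a hidden header in front of the elements, here is a rough imitation in C. It is not Delphi's runtime code and makes no claim about its exact layout beyond what is described above; the struct and function names are invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical header stored immediately before the first element. */
struct dyn_header {
    int refcount;
    int length;
};

/* Allocate header plus n ints and hand back a pointer to the elements. */
int *dyn_new(int n)
{
    assert(n >= 0);
    struct dyn_header *h = malloc(sizeof *h + (size_t)n * sizeof(int));
    if (h == NULL)
        return NULL;
    h->refcount = 1;
    h->length = n;
    return (int *)(h + 1);   /* the caller sees only the elements */
}

/* Read the length from the header at a negative offset: O(1). */
int dyn_length(const int *a)
{
    assert(a != NULL);
    return ((const struct dyn_header *)a - 1)->length;
}

/* Step back to the start of the allocation before freeing it. */
void dyn_free(int *a)
{
    if (a != NULL)
        free((struct dyn_header *)a - 1);
}
```

The array pointer can be indexed like any int pointer, and the length lookup is one read at a fixed offset. Hand-written code that peeks at such a header is fragile if the layout ever changes.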
For the second question, you'd have to use SetLength for sizing dynamic arrays, so the internal 'length' field would be available anyway. I don't see much use for additional size tracking.
Since Rob Kennedy gave such a good answer to the first part of your question, I'll just address the second one:
Is it better, for performance-critical parts, to not use Length() but a direct integer managed by me?
Absolutely not. First, as Rob mentioned, the compiler does its thing to access the information extremely quickly: by reading a fixed offset before the start of the array in the case of dynamic arrays, by using a compile-time constant in the case of static ones, and by passing a hidden parameter in the case of open arrays. You're not going to gain any improvement in performance.
Secondly, the direct integer managed by you wouldn't be any faster, but would actually use more memory (an additional integer allocated along with the one Delphi already provides for dynamic and open arrays, and an extra integer entirely in the case of static arrays).
Even if you directly read the value Delphi stores already for dynamic arrays, you wouldn't gain any performance over Length(), and would risk your code breaking if the internal representation of that hidden header for arrays changes in the future.
Is there an internal, hidden integer
Yes.
to not use Length() but a direct integer managed by me?
Doesn't matter.
See Dynamic arrays item in Addressing pointers article by Rudy Velthuis.
P.S. You can also hit F1 button.

How can I reset an array of strings in C language?

I have a loop that populates "char array_of_strings[100][100];"
At some point I want to be able to clean it of all the strings added so far and start adding from position 0. How can I clean/reset it in C?
Thanks
You can use the memset function to set all characters back to zero.
P.S. Using a two-dimensional array is fairly unconventional for dealing with strings. Try moving towards dynamic allocation of individual strings.
That data structure really has no inherent structure. You basically reset it by pretending there's nothing in it.
You have some counter that tells you which of the strings you've populated, right? Set it to zero. Done.
Assuming that you are in fact using the array as strings, then something like this should work:
int i;
for (i = 0; i < 100; i++)
    array_of_strings[i][0] = 0;
Of course, if you aren't treating your data as strings, then you might need to look at something like memset.
memset(array_of_strings, 0, sizeof(array_of_strings));
This is cleaner than putting a magic number for the size in there, and less likely to break down the road when someone changes the size of your strings array. Since you know the size of the array at compile time using sizeof will work.
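Putting the pieces together, a reset could be sketched like this. The counter used and the helpers add_string and reset_strings are hypothetical; only array_of_strings and its dimensions come from the question.

```c
#include <assert.h>
#include <string.h>

#define ROWS 100
#define COLS 100

char array_of_strings[ROWS][COLS];
int used = 0;   /* hypothetical counter of populated rows */

/* Append a string if there is room. Returns 0 on success, -1 otherwise. */
int add_string(const char *s)
{
    assert(s != NULL);
    if (used >= ROWS || strlen(s) >= COLS)
        return -1;
    strcpy(array_of_strings[used++], s);
    return 0;
}

/* Zero the whole array and start again from position 0.
   sizeof works here because the real array, not a pointer, is in scope. */
void reset_strings(void)
{
    memset(array_of_strings, 0, sizeof array_of_strings);
    used = 0;
}
```

If the counter is the only thing that decides which rows are valid, resetting it alone is enough; the memset just makes sure stale text cannot be read back by accident.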
You can also use bzero, which is a lot like memset, but only for zeroing. On some platforms bzero may be faster than memset, but honestly both functions are so fast, splitting hairs here is silly.
bzero(array_of_strings, sizeof(array_of_strings));
bzero requires you to
#include <strings.h>
memset needs
#include <string.h>

Resources