How can I reset an array of strings in C language? - c

I have a loop that populates "char array_of_strings[100][100];"
At some point I want to be able to clear it of all the strings added so far and start adding again from position 0. How can I clean/reset it in C?
Thanks

You can use the memset function to set all characters back to zero.
P.S. Using a two-dimensional array is fairly unconventional for dealing with strings. Try moving towards dynamic allocation of individual strings.
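A rough sketch of that dynamic approach, assuming a fixed maximum of 100 entries; the names add_string and reset_strings are made up for illustration:

#include <stdlib.h>
#include <string.h>

char *strings[100];                 /* each entry is individually malloc'd */
size_t used = 0;                    /* number of valid entries */

void add_string(const char *s) {
    if (used < 100) {
        strings[used] = malloc(strlen(s) + 1);
        if (strings[used] != NULL)
            strcpy(strings[used++], s);
    }
}

void reset_strings(void) {          /* free everything and start over at position 0 */
    while (used > 0)
        free(strings[--used]);
}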

That data structure really has no inherent structure. You basically reset it by pretending there's nothing in it.
You have some counter that tells you which of the strings you've populated, right? Set it to zero. Done.
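A minimal sketch of that idea; the counter name used is hypothetical:

#include <stddef.h>

char array_of_strings[100][100];
size_t used = 0;                    /* how many strings are currently valid */

void reset(void) {
    used = 0;                       /* logical reset: the old bytes are simply ignored */
}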

Assuming that you are in fact using the array as strings, then something like this should work:
int i;
for (i = 0; i < 100; i++)
    array_of_strings[i][0] = 0;   /* make each row an empty string */
Of course, if you aren't treating your data as strings, then you might need to look at something like memset.

memset(array_of_strings, 0, sizeof(array_of_strings));
This is cleaner than putting a magic number for the size in there, and less likely to break down the road when someone changes the size of your strings array. Since you know the size of the array at compile time, using sizeof will work.
You can also use bzero, which is a lot like memset, but only for zeroing. On some platforms bzero may be faster than memset, but honestly both functions are so fast that splitting hairs here is silly. (Note that bzero is a legacy function, removed from POSIX.1-2008, so memset is the more portable choice.)
bzero(array_of_strings, sizeof(array_of_strings));
bzero requires you to
#include <strings.h>
memset needs
#include <string.h>
See the man pages for memset and bzero for details.

Related

Advantages of null termination vs count variable

I need to use an array of graph data, i.e. a struct with x and y integers. This array will be passed through many functions, and I need to decide on the API.
typedef struct {
    int x;
    int y;
} GraphData_t;
How should I choose whether to use NULL-termination for the array or supply a count variable?
I have three approaches for my API:
1: loadGraph(GraphData_t *data, int count);  // use count variable
2: loadGraph(GraphData_t *data);  // use null-termination (or any other termination value)

typedef struct {
    GraphData_t *data;
    int count;
} GraphArray_t;

3: loadGraph(GraphArray_t *data);  // use a struct which has an integrated count variable
So far these seem equal to me. Which one would be the preferable method, and why?
As a rather old dinosaur, I will use history here.
Anyway, the size + pointer idiom is the multi-purpose and bulletproof way. If in doubt, use it.
The delimited way is just more natural for human beings, especially when you want to initialize an array: there is no need to manually count the items (with the risk of an off-by-one mistake, especially if you later add or remove elements from the initialization list); you just add the delimiter as the last element. BTW, it is the way we use lines in text files... But anyway, the sizeof(array)/sizeof(array[0]) idiom lets you easily and automatically get the size...
The NULL-terminated idiom comes from the beginning of microprocessors, when code was kept close to the hardware for performance reasons: comparison with 0 was the fastest test, and memory was expensive. Programmers began to end their constant strings with a NUL character for that reason: only one byte of overhead, even if the string was longer than 256 characters. You find references to this ASCIIZ idiom in MS/DOS 2 manuals, but it had already been made popular by the Unix and K&R C pairing since the 70s.
It is still convenient, and still used for C strings, but many higher-level tools like C++ std::string now prefer the counted idiom, which does not require a forbidden value.
For daily programming, the (null-)terminated idiom should only be used when an array will only be traversed forward and you have no particular need for the size. But beware: if you simply want to copy a null-terminated array, you have to scan it twice, once for its size and once for its data.
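To illustrate the two idioms side by side, a small sketch (the sentinel value -1 is an arbitrary choice and must never occur in the real data):

#include <stdio.h>

int main(void) {
    /* Counted idiom: the compiler computes the element count for you. */
    int counted[] = {10, 20, 30};
    size_t count = sizeof counted / sizeof counted[0];   /* 3 */
    printf("count = %zu\n", count);

    /* Delimited idiom: add or remove initializers freely; the sentinel marks the end. */
    int delimited[] = {10, 20, 30, -1};
    for (size_t i = 0; delimited[i] != -1; i++)
        printf("%d ", delimited[i]);
    printf("\n");
    return 0;
}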
Null termination is a convention that can only be used if the null value is excluded from the set of legal values for the entries. For example the string array argv in main has a null pointer at the end because a null pointer cannot be a legal string.
In your case, the array elements are structures with 2 int coordinates. You would need to decide what values to consider invalid for these coordinates. If all values are OK, then you must pass the number of elements explicitly. Passing the array length explicitly is preferred in all cases, as it avoids unnecessary scans. Indeed main also gets the length of the argv array as a separate int argument argc.
Whether to encapsulate the array pointer and the length in a structure is a matter of style and convenience. For complex structures, it is preferable to group all characteristics in a structure, but for a simple array, it may be more convenient to pass a pointer and a size explicitly as it allows you to apply the function to a subset of the array with loadGraph(data + i, j).
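A short sketch of the explicit pointer-plus-count style and the subset call mentioned above (the loadGraph body is just a placeholder for illustration):

#include <stdio.h>

typedef struct {
    int x;
    int y;
} GraphData_t;

void loadGraph(GraphData_t *data, int count) {   /* placeholder implementation */
    for (int k = 0; k < count; k++)
        printf("(%d, %d)\n", data[k].x, data[k].y);
}

int main(void) {
    GraphData_t points[] = {{0, 0}, {1, 2}, {2, 4}, {3, 6}};
    loadGraph(points, 4);          /* the whole array */
    loadGraph(points + 1, 2);      /* a subset: elements 1 and 2 */
    return 0;
}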
While all approaches can of course get the job done, there are some differences which may or may not be relevant for your use-case.
Null-termination is very convenient if the user needs or wants to use hard-coded arrays, because then they can just add or delete entries, without needing to worry about possibly breaking the application (unless they remove the terminator of course).
Since the size is unknown, almost every function working with a null-terminated array needs to iterate over the whole thing. This might be a problem if the array is large, and many functions usually wouldn't actually need to access all entries (or not in the order they are stored).
The terminator itself obviously needs to be a value that can never occur in your actual data. So, depending on your data, there might not be an obvious candidate to use as terminator (or even none at all).
There are probably more subtle differences which might influence your decision, but these are the first ones that came to my mind.
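If a terminator were used anyway, it would have to be a value that can never occur in real data; a hedged sketch using {INT_MIN, INT_MIN} as an arbitrary sentinel:

#include <limits.h>
#include <stdio.h>

typedef struct {
    int x;
    int y;
} GraphData_t;

int main(void) {
    /* {INT_MIN, INT_MIN} is an arbitrary choice; it only works if real data
       can never contain that pair. */
    GraphData_t points[] = {{1, 2}, {3, 4}, {INT_MIN, INT_MIN}};
    for (const GraphData_t *p = points; !(p->x == INT_MIN && p->y == INT_MIN); p++)
        printf("(%d, %d)\n", p->x, p->y);
    return 0;
}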

Reading the same input multiple times - C

I want to ask if it is possible to read the same input (stdin) multiple times. I am about to receive a really big number, containing thousands of digits (so I am unable to store it in a variable, and I also cannot use folders!). My idea is to put the digits into an int array, but I don't know how big the array should be, because the number of digits in the input may vary. I have to write a general solution.
So my question is how to solve this, and how to find out the number of digits (so I can size the array) before I copy the digits into it. I tried using scanf() multiple times, or scanf() and getchar(), but it is not working. See my code:
int main(){
    int c;
    int amountOfDigits=5;
    while(scanf("%1d",&c)!=' '){ //finding out number of digits with scanf
        if(isdigit(c)==0){
            break;
        }
        amountOfDigits++;
    }
    int digits[amountOfDigits]; //now i know length of array, and initialize it
    for(int i=0;i<amountOfDigits;i++){ //putting digits into array
        digits[i]=getchar();
    }
    for(int i=0;i<amountOfDigits;i++){ //printing array
        printf("%d",digits[i]);
    }
    printf("\n");
    return 0;
}
is it possible to read the same input (stdin) multiple times?
(I am guessing you are a student beginning to learn programming, and you are using Linux; adapt my answer if not)
For your homework, you don't need to read the same input several times. In some cases it is possible (when the standard input is a genuine, seekable file, that is, when you use some redirection in your command). In other cases (e.g. when the standard input is a pipe, as in a command pipeline, or with here documents in your shell command...) it is not possible to read stdin several times (but you don't need to). In general, don't expect stdin to be seekable with fseek or rewind (it usually is not).
(I am not going to do your homework, but here are useful hints)
so I am unable to store it in variable, (and also I can not use folders!)
You could do several things:
(since you mentioned folders....) you might use some more sophisticated ways of storing data on disk (but in your particular case, I don't recommend that...). These could be some direct-access file (ugly), some indexed file à la gdbm, some database à la sqlite, or even an RDBMS server like PostgreSQL.
In your case, you don't need any of these; I'm mentioning it since you mentioned "folders" and you meant "directories"!
You really should use some heap-allocated memory, so read about C dynamic memory allocation and read carefully the documentation of the standard memory management functions malloc, realloc, and free. Your program should probably use all three (don't forget that malloc & realloc can fail).
You probably should keep somehow:
a pointer to heap allocated int-s (actually, you could use char-s)
the allocated size of that pointer
the used length of that thing, that is the actual number of useful digits.
You certainly don't want to grow your array with a realloc at every iteration (that is inefficient). In practice, you would adopt some growth scheme like newsize = 3*oldsize/2 + 10 to avoid reallocating memory at each step of your input loop.
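A minimal sketch of that growth scheme applied to reading digits from stdin (not a complete solution to the exercise; error handling kept deliberately brief):

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t size = 0, used = 0;
    char *digits = NULL;
    int c;

    while ((c = getchar()) != EOF && isdigit(c)) {
        if (used == size) {                        /* grow geometrically, not by 1 */
            size_t newsize = 3 * size / 2 + 10;
            char *tmp = realloc(digits, newsize);  /* realloc(NULL, n) acts like malloc */
            if (tmp == NULL) { free(digits); return 1; }
            digits = tmp;
            size = newsize;
        }
        digits[used++] = (char)c;
    }

    if (used > 0)
        fwrite(digits, 1, used, stdout);           /* print the digits back */
    putchar('\n');
    free(digits);
    return 0;
}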
You should thank your teacher for such a useful exercise, but you should not expect StackOverflow to do your homework!
Also be aware of arbitrary-precision arithmetic (bignums or bigints). It is actually hard to code efficiently, so in real life you would use some library like GMPlib.

How to index arrays using pointers safely

Edit: If you fundamentally disagree with the Fedora guide here, please explain in an objective way why this approach would be worse than classic loops. As far as I know, even the CERT standard doesn't make any statement on using index variables over pointers.
I'm currently reading the Fedora Defensive Coding Guide and it suggests the following:
Always keep track of the size of the array you are working with.
Often, code is more obviously correct when you keep a pointer past the last element of the array, and calculate the number of remaining elements by subtracting the current position from that pointer. The alternative, updating a separate variable every time the position is advanced, is usually less obviously correct.
This means for a given array
int numbers[] = {1, 2, 3, 4, 5};
I should not use the classic
size_t length = 5;
for (size_t i = 0; i < length; ++i) {
    printf("%d ", numbers[i]);
}
but instead this:
int *end = numbers + 5;
for (int *start = numbers; start < end; ++start) {
    printf("%d ", *start);
}
or this:
int *start = numbers;
int *end = numbers + 5;
while (start < end) {
    printf("%d ", *start++);
}
Is my understanding of the recommendation correct?
Is my implementation correct?
Which of the last 2 is safer?
Your understanding of what the text recommends is correct, as is your implementation. But regarding the basis of the recommendation, I think you are confusing safe with correct.
It's not that using a pointer is safer than using an index. The argument is that, in reasoning about the code, it is easier to decide that the logic is correct when using pointers. Safety is about failure modes: what happens if the code is incorrect (references a location outside the array). Correctness is more fundamental: that the algorithm provably does what it sets out to do. We might say that correct code doesn't need safety.
The recommendation might have been influenced by Andrew Koenig's series in Dr. Dobb's a few years ago, How C Makes It Hard To Check Array Bounds. Koenig says,
In addition to being faster in many cases, pointers have another big advantage over arrays: A pointer to an array element is a single value that is enough to identify that element uniquely. [...] Without pointers, we need three parameters to identify the range: the array and two indices. By using pointers, we can get by with only two parameters.
In C, referencing a location outside the array, whether via pointer or index, is equally unsafe. The compiler will not catch you out (absent use of extensions to the standard). Koenig is arguing that with fewer balls in the air, you have a better shot at getting the logic right.
The more complicated the construction, the more obvious it is that he's right. If you want a better illustration of the difference, write strcat(3) both ways. Using indexes, you have two names and two indexes inside the loop. It's possible to use the index for one with the name for the other. Using pointers, that's impossible. All you have are two pointers.
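To make the comparison concrete, a rough sketch of both versions (simplified re-implementations for illustration only; the names my_strcat_index and my_strcat_ptr are made up):

#include <stddef.h>

/* Index version: two names and two indices live inside the loop,
   and nothing stops you from pairing the wrong index with the wrong name. */
char *my_strcat_index(char *dst, const char *src) {
    size_t i = 0, j = 0;
    while (dst[i] != '\0')
        i++;
    while ((dst[i] = src[j]) != '\0') {
        i++;
        j++;
    }
    return dst;
}

/* Pointer version: only two pointers, each of which fully identifies a position. */
char *my_strcat_ptr(char *dst, const char *src) {
    char *d = dst;
    while (*d != '\0')
        d++;
    while ((*d = *src) != '\0') {
        d++;
        src++;
    }
    return dst;
}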
Is my understanding of the recommendation correct?
Is my implementation correct?
Yes, so it seems.
The method for (type_t *start = array; start != end; start++) is sometimes used when you have arrays of more complex items. It is mostly a matter of style.
This style is sometimes used when you already have the start and end pointers available for some reason. Or in cases where you aren't really interested in the size, but just want to repeatedly compare against the end of the array. For example, suppose you have a ring buffer ADT with a start pointer and an end pointer and want to iterate through all items.
This way of writing loops is actually the very reason why C explicitly allows a pointer to point one item past the end of an array: you can set an end pointer to one item past the array without invoking undefined behavior (as long as that item isn't dereferenced).
(It is the very same method as used by STL iterators in C++, although there is more of a rationale in C++, since it has operator overloading. For example, iterator++ in C++ doesn't necessarily give an item allocated in the adjacent memory cell; iterators could be used for iterating through a linked list ADT, where the ++ would translate to node->next behind the scenes.)
However, to claim that this form is always the preferred one is just subjective nonsense. Particularly when you have an array of integers and know the size. Your first example is the most readable form of a loop in C and therefore always preferred whenever possible.
On some compilers/systems, the first form could also give faster code than the second form. Pointer arithmetic might give slower code on some systems. (And I suppose that the first form might give faster data cache access on some systems, though I'd have to verify that assumption with some compiler guru.)
Which of the last 2 is safer?
Neither form is safer than the other. To claim otherwise would be subjective opinions. The statement "...is usually less obviously correct" is nonsense.
Which style to pick vary on case-to-case basis.
Overall, those "Fedora" guidelines you link seem to contain lots of questionable code, questionable rules and blatant opinions. Seems more like someone wanted to show off various C tricks than a serious attempt to write a coding standard. Overall, it smells like the "Linux kernel guidelines", which I would not recommended to read either.
If you want a serious coding standard for/by professionals, use CERT-C or MISRA-C.

OK to use a terminator to manage fixed length arrays?

I'm working in ANSI C with lots of fixed-length arrays. Rather than setting an array-length variable for every array, it seems easier just to add a "NULL" terminator at the end of the array, similar to character strings. For my current app I'm using "999999", which would never occur in the actual arrays. I can execute loops and determine array lengths just by looking for the terminator. Is this a common approach? What are the issues with it? Thanks.
This approach is technically used by main's argument array (argv), where the last value is a terminating NULL pointer, but it is also accompanied by argc, which tells you the size.
Using just terminators sounds like it's more prone to mistakes in the future. What's wrong with storing the size along with the array?
Something like:
struct fixed_array {
    unsigned long len;
    int arr[];
};
This will also be more efficient and less error-prone.
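A brief sketch of how such a struct could be allocated and used, assuming C99 flexible array members (make_fixed_array is a made-up helper name):

#include <stdio.h>
#include <stdlib.h>

struct fixed_array {
    unsigned long len;
    int arr[];                          /* C99 flexible array member */
};

struct fixed_array *make_fixed_array(unsigned long len) {
    struct fixed_array *fa = malloc(sizeof *fa + len * sizeof fa->arr[0]);
    if (fa != NULL)
        fa->len = len;
    return fa;
}

int main(void) {
    struct fixed_array *fa = make_fixed_array(3);
    if (fa == NULL)
        return 1;
    for (unsigned long i = 0; i < fa->len; i++)
        fa->arr[i] = (int)i * 10;
    for (unsigned long i = 0; i < fa->len; i++)
        printf("%d ", fa->arr[i]);      /* no sentinel needed: len says when to stop */
    printf("\n");
    free(fa);
    return 0;
}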
The main problem I can think of is that keeping track of the length can be useful because there are built-in functions in C that take a length as a parameter, and you also need to know the length to know where to add the next element.
In reality it depends on the size of your array: if it is a huge array, then you should keep track of the length. Otherwise, looping through it to determine the length every time you want to append an element would be very expensive: O(n) instead of the O(1) you normally get with arrays.
The main problem with this approach is that you can't know the length in advance without looping to the end of the array - and that can affect the performance quite negatively if you only want to determine the length.
Why don't you just
Initialize it with a const int that you can use later in the code to check the size, or
Use int len = sizeof(my_array) / sizeof(the_type).
Since you're using 2-dimensional arrays to hold a ragged array, you could just use a ragged array: type *my_array[];. Or you could put the length in element 0 of each row and treat the rows as 1-indexed arrays. With some evil trickery you could even put the lengths at element -1 of each row![1]
Left as exercise ;)

Is it possible to read in a string of unknown size in C, without having to put it in a pre-allocated fixed length buffer?

Is it possible to read in a string in C, without allocating an array of fixed size ahead of time?
Every time I declare a char array of some fixed size, I feel like I'm doing it wrong. I'm always taking a guess at what I think would be the maximum for my use case, but this isn't always easy.
Also, I don't like the idea of having a smaller string sitting in a larger container. It doesn't feel right.
Am I missing something? Is there some other way I should be doing this?
At the point that you read the data, your buffer is going to have a fixed size -- that's unavoidable.
What you can do, however, is read the data using fgets, check whether the last character is a '\n' (or you've reached the end of file), and if not, realloc your buffer and read more.
I rarely find that necessary, but do usually allocate a single fixed buffer for the reading, read data into it, and then dynamically allocate space for a copy of it, allocating only as much space as it actually occupies, not the whole size of the buffer I originally used.
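A hedged sketch of that fgets-plus-realloc pattern (read_line is a made-up name; error handling is minimal):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one line of arbitrary length from fp; the caller frees the result.
   Returns NULL on end of file with no data, or on allocation failure. */
char *read_line(FILE *fp) {
    size_t size = 128, used = 0;
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + used, (int)(size - used), fp)) {
        used += strlen(buf + used);
        if (used > 0 && buf[used - 1] == '\n')     /* got the whole line */
            break;
        size *= 2;                                 /* otherwise grow and keep reading */
        char *tmp = realloc(buf, size);
        if (tmp == NULL) { free(buf); return NULL; }
        buf = tmp;
    }
    if (used == 0) { free(buf); return NULL; }     /* EOF before any data */
    return buf;
}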
When you say "ahead of time", do you mean at runtime, or at compile time?
At compile time you do this:
char str[1000];
at runtime you do this:
char *str = malloc(size);
The only way to get exactly the right size is to know how many characters you are going to read in. If you're reading from a file, you can seek to the nearest newline (or some other condition), and then you know exactly how big the array needs to be, i.e.:
int numChars = computeNeededSpace(someFileHandle);
char *readBuffer = malloc(numChars);
fread(readBuffer, 1, numChars, someFileHandle);
There is no other way to do this. Put yourself in the program's perspective: how is it supposed to know how many keys the user is going to press? The best thing you can do is limit the user, or whatever the input is.
There are some more complex approaches, like creating a linked list of buffers, allocating chunks of buffers and then linking them afterwards. But I think that's not the answer you wanted here.
EDIT: Most languages have string/inputbuffer classes that hide this from you.
You must allocate a fixed buffer. If it becomes too small, then realloc() it to a bigger size and continue.
There's no way of determining the string length until you've read it in, so reading it into a fixed-size buffer is pretty much your only choice.
I suppose you have the alternative to read the string in small chunks, but depending on your application that might not give you enough information at a time to work with.
Perhaps the easiest way to handle this dilemma is by defining maximum lengths for certain string input (#define-ing a constant for this value helps). Use a buffer of this pre-determined size whenever you are reading in a string, but make sure to use the strncpy() form of the string commands so you can specify a maximum number of characters to read. Some commonly-used types of strings (for example, filenames or paths) may have system-defined maximum lengths.
There's nothing inherently 'wrong' about declaring a fixed-size array as long as you use it properly (do proper bounds checking and handle the case where input will overflow the array). It may result in unused memory being allocated, but unfortunately the C language doesn't give us much to work with when it comes to strings.
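For completeness, a small sketch of the fixed-limit style described above, using fgets with sizeof for the bounds check rather than the strncpy family (MAX_NAME_LEN is an arbitrary example constant):

#include <stdio.h>
#include <string.h>

#define MAX_NAME_LEN 64                        /* arbitrary, documented limit */

int main(void) {
    char name[MAX_NAME_LEN];

    if (fgets(name, sizeof name, stdin)) {     /* never reads more than the buffer holds */
        name[strcspn(name, "\n")] = '\0';      /* strip the newline, if any */
        printf("read %zu characters\n", strlen(name));
    }
    return 0;
}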
There is the concept of string ropes, which lets you create trees of fixed-size buffers. You still have to have fixed-size buffers, there is no getting around that really, but this is a pretty neat way of building up strings dynamically.
You can use Chuck Falconer's public domain ggets function to do the buffer management and reallocation for you: http://cbfalconer.home.att.net/download/index.htm
Edit:
Chuck Falconer's website is no longer available. archive.org still has a copy, and I'm hosting a copy too.
