Using character vs. pointer for an "array of words" - c

Let's say I have a paragraph and I want to split it into words and put them in an array. Which would be the better way to do it? (For this example, let's assume 100 words, each under 20 characters.)
// character array
char our_array[100][20];
strcpy(our_array[0], "Hello");
strcpy(our_array[1], "Something");
Or:
// string (pointer) array
char *newer_string[100];
newer_string[0] = "Hello";
newer_string[1] = "Something";
Why would one be preferable over the other? And is one more common in practice than the other?

Option 1 allocates a fixed 2D array of 100*20 chars. In this case, the strings are stored in the 2D array itself. This has the following features:
Fixed storage per string. If, e.g., even one word is 50 characters long, every row has to grow, so the array needs to be 100*51 (50 characters plus the terminating null).
The array is mutable, i.e. you can modify the strings in place.
No heap allocation (malloc/calloc) is required.
If you need to sort the words, this method requires copying whole strings around, which is inefficient.
Option 2, as you have shown it, only works for string literals that are known at compile time. If you want to read the strings from a file or from the user, you need to allocate the memory dynamically, something like this:
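char *newer_string[100];
char stringtemp[101]; // longest supported word (100 chars) plus the terminating '\0'
size_t len;
for (int i = 0; i < 100; i++)
{
    if (scanf("%100s", stringtemp) != 1) // the width limit keeps scanf inside stringtemp
        break;                            // stop on end of input or a read error
    len = strlen(stringtemp);
    newer_string[i] = malloc(len + 1);    // + 1 for the terminating '\0'
    if (newer_string[i] == NULL) { /* handle memory error */ }
    strcpy(newer_string[i], stringtemp);
}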
The features here are:
More efficient memory use: if one string is long, only that element takes more memory.
Needs dynamic memory allocation, so you must also free each string when you are done with it.
Easier to sort: a sorting algorithm only needs to swap pointers such as newer_string[i] and newer_string[i+1].
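A minimal sketch of the pointer-swapping point, assuming the strings were allocated with malloc as in the loop above. It uses the standard qsort and strcmp; the helper names compare_words, sort_words and free_words are hypothetical. qsort only rearranges the char * elements, so no characters are copied.

```c
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to two array elements.
   Each element is a char *, so each argument points to a char *. */
static int compare_words(const void *a, const void *b)
{
    char *const *wa = a;
    char *const *wb = b;
    return strcmp(*wa, *wb);
}

/* Sorts the first n entries by swapping pointers only;
   the characters of each string never move. */
void sort_words(char *words[], size_t n)
{
    qsort(words, n, sizeof words[0], compare_words);
}

/* Releases every string that was allocated with malloc. */
void free_words(char *words[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(words[i]);
        words[i] = NULL;
    }
}
```

Call sort_words(newer_string, count) with however many entries were actually filled, and free_words on the same range once the words are no longer needed.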

Version 2 does not occupy all the memory at once (as Version 1 would).
Also, the length of your strings can be arbitrary; you are not bound to a specific length (e.g. 20).
You may get a small memory overhead in Version 2 (due to the pointers you need to store, plus malloc's own bookkeeping), but this only really matters if nearly all 100 slots are used and nearly all words are close to the maximum length.
In general, I would always recommend Version 2 (array of strings/char pointers).
It is also easier to replace strings in this 1D array than in the 2D version.

It depends on what you want to do with that variable. This is definitely not written in stone, and there are few firm guidelines.
The first option...
char our_array[100][20];
strcpy(our_array[0], "Hello");
strcpy(our_array[1], "Something");
...has the advantage that each element of our_array is actually an array of char. So you can modify that data. Those strings are not read-only.
On the other hand, you are limited to strings of 19 characters (plus the terminating null), and it's quite easy to get that wrong. Because you are using strcpy() to fill the array, a string that is too long will not be detected by the compiler; it will silently overflow into the next row or past the end of the array.
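If you stay with the fixed 2D array, one option is to check each length before copying instead of calling strcpy blindly. This is a sketch assuming 20-byte rows; copy_word, WORD_SLOTS and WORD_WIDTH are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

#define WORD_SLOTS 100
#define WORD_WIDTH 20   /* 19 characters + terminating '\0' */

/* Copies src into dst only if it fits, including the terminator.
   Returns false instead of overflowing the row. */
bool copy_word(char dst[WORD_WIDTH], const char *src)
{
    size_t len = strlen(src);
    if (len >= WORD_WIDTH)
        return false;            /* would not fit: dst is left untouched */
    memcpy(dst, src, len + 1);   /* copy the characters and the '\0' */
    return true;
}
```

Alternatively, initializing the rows directly, as in char our_array[100][20] = { "Hello", "Something" };, lets the compiler check the literals: one with more than 20 characters must be diagnosed, although one of exactly 20 characters is accepted and silently loses its terminator.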
The other option...
char *newer_string[100];
newer_string[0] = "Hello";
newer_string[1] = "Something";
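...has the advantage that each element of the array is a pointer. The strings are string literals with static storage duration, typically kept in read-only memory. Each one occupies only as much space as it needs, and you can easily change newer_string[i] to point to something else. However, you cannot modify that data: writing to a string literal is undefined behavior.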
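A small sketch of the pointer version, assuming you only ever store string literals or other strings that outlive the array. Declaring the elements as const char * (instead of char *) makes the compiler reject attempts to write through them; words and replace_word are hypothetical names.

```c
#include <stddef.h>

/* const char * lets the compiler reject accidental writes
   such as words[0][0] = 'h'. Unlisted elements start as NULL. */
const char *words[100] = { "Hello", "Something" };

/* Changing an element only changes which string it points to;
   no characters are copied and no literal is modified. */
void replace_word(size_t i, const char *replacement)
{
    words[i] = replacement;
}
```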

Related

Using strcpy on a 2D char array allocated with malloc

I have a problem, and I cannot figure out the solution. I have to program some code for a µC (microcontroller), but I am not familiar with it.
I have to create an analysis and show its results on the screen of the machine. The analysis is already done and working. But getting the results from the analysis to the screen is my problem.
I have to store all results in a global array. Since the stack is really limited on the machine, I have to bring it to the larger heap. The linker is made that way, that every dynamic allocation ends up on the heap. But this is done in C so I cannot use "new". But everything allocated with malloc ends up on the heap automatically and that is why I need to use malloc, but I haven't used that before, so I have real trouble with it. The problem with the screen is, it accepts only char arrays.
In summary: I have to create a global 2D char array holding the results of up to 100 positions, and I have to allocate the memory for it using malloc.
To make it even more complicated, I have to declare the variable with "extern" in the buffer.h file and define it in the buffer.c file.
So my buffer.h line looks like this:
extern char * g_results[100][10];
In the buffer.c I am using:
g_results[0][0] = malloc ( 100 * 10 );
Each char is 1 byte, so the array should have a size of 1000 bytes to hold 100 results of length 9 plus 1 terminating \0. Right?
Now I try to store the results into this array with the help of strcpy.
I am doing this in a for loop at the end of the analysis.
for (int i = 0; i < 100; i++)
{
// Got to convert it to text 1st, since the display does not accept anything but text.
snprintf(buffer, 9, "%.2f", results[i]);
strcpy(g_results[i][0], buffer);
}
And then I iterate through the g_results_buffer on the screen and display the content. The problem is: it works perfectly for the FIRST result only. Everything is as I wanted it.
But all other lines are empty. I checked the results array, and all values are stored in it, so that is not the cause of the problem. Also, the values are not overwritten; it really is the 1st value.
I cannot see what the problem is here.
My guesses are:
a) allocation with malloc isn't done correctly. Only allocating space for the 1st element? When I remove the [0][0] I get a compiler error: "assignment to expression with array type". But I do not know what that should mean.
b) (totally) wrong usage of the pointers. Is there a way I can declare that array as a non-pointer, but still on the heap?
I really need your help.
How do I store the results from the results-array after the 1st element into the g_results-array?
I have to store all results in a global array. Since the stack is really limited on the machine, I have to bring it to the larger heap.
A “global array“ and “the larger heap” are different things. C does not have a true global name space. It does have objects with static storage duration, for which memory is reserved for the entire execution of the program. People use the “heap” to refer to dynamically allocated memory, which is reserved from the time a program requests it (as with malloc) until the time the program releases it (as with free).
Variables declared outside of functions have file scope for their names, external or internal linkage, and static storage duration. These are different from dynamic memory. So it is not clear what memory you want: static storage duration or dynamic memory?
“Heap” is a misnomer. Properly, that word refers to a type of data structure. You can simply call it “allocated memory.” A “heap” may be used to organize pieces of memory available for allocation, but it can be used for other purposes, and the memory management routines may use other data structures.
The linker is made that way, that every dynamic allocation ends up on the heap.
The linker links object modules together. It has nothing to do with the heap.
But everything allocated with malloc ends up on the heap automatically and that is why I need to use malloc,…
When you allocate memory, it does not end up on the heap. The heap (if it is used for memory management) is where memory that has been freed is kept until it is allocated again. When you allocate memory, it is taken off of the heap.
The problem with the screen is, it accepts only char arrays.
This is unclear. Perhaps you mean there is some display device that you must communicate with by providing strings of characters.
In summary: I have to create a global 2D char array holding the results of up to 100 positions and I have to allocate the memory for it using malloc.
That would have been useful at the beginning of your post.
So my buffer.h line looks like this:
extern char * g_results[100][10];
That declares an array of 100 arrays of 10 pointers to char (that is, 10 char * objects each). So you will have 1,000 pointers to strings (technically 1,000 pointers to the first character of strings, but we generally speak of a pointer to the first character of a string as a pointer to the string). That is not likely what you want. If you want 100 strings of up to 10 characters each (including the terminating null byte in that 10), then a pointer to an array of 100 arrays of 10 characters would suffice. That can be declared with:
extern char (*g_results)[100][10];
However, when working with arrays, we generally just use a pointer to the first element of the array rather than a pointer to the whole array:
extern char (*g_results)[10];
In the buffer.c I am using:
g_results[0][0] = malloc ( 100 * 10 );
Each char is 1 byte, so the array should have a size of 1000 bytes to hold 100 results of length 9 plus 1 terminating \0. Right?
That space does suffice for 100 instances of 10-byte strings. It would not have worked with your original declaration of extern char * g_results[100][10];, which would need space for 1,000 pointers.
However, having changed g_results to extern char (*g_results)[10];, we must now assign the address returned by malloc to g_results, not to g_results[0][0]. We can allocate the required space with:
g_results = malloc(100 * sizeof *g_results);
Alternatively, instead of allocating memory, just use static storage:
char g_results[100][10];
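A rough sketch of how the allocated version could be split between buffer.h and buffer.c, shown here as one file for brevity. RESULT_COUNT, RESULT_WIDTH, results_init and results_release are hypothetical names; the points that matter are the extern declaration in the header, the single definition in one .c file, and allocating with sizeof *g_results.

```c
/* ---- buffer.h (hypothetical layout) ---- */
#include <stddef.h>

#define RESULT_COUNT 100
#define RESULT_WIDTH 10   /* 9 characters + terminating '\0' */

/* Declaration: pointer to the first of RESULT_COUNT rows of RESULT_WIDTH chars. */
extern char (*g_results)[RESULT_WIDTH];

int results_init(void);
void results_release(void);

/* ---- buffer.c ---- */
#include <stdlib.h>

/* Definition: exactly one translation unit provides the object. */
char (*g_results)[RESULT_WIDTH] = NULL;

/* Allocates all rows as one block; returns 0 on success, -1 on failure. */
int results_init(void)
{
    g_results = malloc(RESULT_COUNT * sizeof *g_results);
    return g_results != NULL ? 0 : -1;
}

/* Frees the block and resets the pointer. */
void results_release(void)
{
    free(g_results);
    g_results = NULL;
}
```

Call results_init() once at startup, and check its return value, before the analysis writes into g_results[i].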
Now I try to store the results into this array with the help of strcpy. I am doing this in a for loop at the end of the analysis.
for (int i = 0; i < 100; i++)
{
// Got to convert it to text 1st, since the display does not accept anything but text.
snprintf(buffer, 9, "%.2f", results[i]);
strcpy(g_results[i][0], buffer);
}
There is no need to use buffer; you can send the snprintf results directly to the final memory.
Since g_results is an array of 100 arrays of 10 char, g_results[i] is an array of 10 char. When an array is used as an expression, it is automatically converted to a pointer to its first element, except when it is the operand of sizeof, the operand of unary &, or is a string literal used to initialize an array (in a definition). So you can use g_results[i] to get the address where string i should be written:
snprintf(g_results[i], sizeof g_results[i], "%.2f", results[i]);
Some notes about this:
We see use of the array both with automatic conversion and without. The argument g_results[i] is converted to &g_results[i][0]. In sizeof g_results[i], sizeof gives the size of the array, not a pointer.
The buffer length passed to snprintf does not need to be reduced by 1 to allow for the terminating null character. snprintf handles that by itself. So we pass the full size, sizeof g_results[i].
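A sketch of the truncation behavior described in these notes, assuming 10-byte rows. format_result and RESULT_WIDTH are hypothetical; note that inside the function, row is really a char * (an array parameter is adjusted to a pointer), so sizeof row would give the pointer size and the width has to be passed explicitly.

```c
#include <stdio.h>

#define RESULT_WIDTH 10   /* 9 characters + '\0' */

/* Formats value into one row. snprintf always writes a terminating '\0'
   (for a nonzero size) and returns the length the full text would have
   had, so a return value of RESULT_WIDTH or more means truncation.
   Returns 1 if it fit, 0 if truncated, -1 on an encoding error. */
int format_result(char row[RESULT_WIDTH], double value)
{
    int n = snprintf(row, RESULT_WIDTH, "%.2f", value);
    if (n < 0)
        return -1;
    return n < RESULT_WIDTH ? 1 : 0;
}
```

For example, 3.14159 becomes "3.14" and fits, while 1234567.0 would need "1234567.00" (10 characters plus the terminator), so it is cut to "1234567.0" and the function reports the truncation.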
But all other lines are empty.
That is because your declaration of g_results was wrong. It declared 1,000 pointers, and you stored an address only in g_results[0][0], so all the other pointers were uninitialized.
This is all rather convoluted; you seem to just want:
// array of 100 arrays of 10 chars
char g_results[100][10];
for (int i = 0; i < 100; i++) {
    // No need for snprintf + strcpy: write directly where you want the text.
    // results[i] must be a float or double to match "%.2f".
    // Pass the full row size (10, not 9); snprintf reserves room for the '\0' itself.
    snprintf(g_results[i], sizeof g_results[i], "%.2f", results[i]);
}
Only allocating space for the 1st element?
Yes. You only assigned the result of malloc ( 100 * 10 ) to the first element, g_results[0][0].
wrong usage of the pointers. Is there a way I can declare that array as a non-pointer, but still on the heap?
No. To allocate something on the heap you have to call malloc.
But there is no reason to use the heap here, especially since you are on a microcontroller and you know exactly how many elements you need. The heap is for unknowns; if you know that you want exactly 100 x 10 chars, just define them statically.
Overall, consider reading some C books.
I do not know what that should mean.
You cannot assign to an array as a whole. You can assign to array elements, one by one.

difference between strlen(string) and strlen( *string)

Let's say I have an array of strings that are all of same size.
char strings[][MAX_LENGTH];
what would be the difference between strlen(strings) and strlen(*strings)?
I know that strings by itself would be the address of the first string in the array,
but what is *strings?
First, don't do this. C will allow you to do lots of things that are a bad idea. This doesn't mean you ought to do it. :)
The compiler should warn about strlen(strings), because strings decays to char (*)[MAX_LENGTH] rather than char *, but in practice the two calls behave identically. The reason is that with this definition:
char strings[][MAX_LENGTH];
The storage for this is one contiguous block. Within that block of memory, there are no "structures" or management devices that can be used to identify where individual strings start and stop. This creates an interesting situation.
Effectively, *strings and strings both point to precisely the same memory location, just with different types. This means that calling strlen on either one of them will return the length of the first string, strings[0].
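A sketch that makes the type difference visible, using a small MAX_LENGTH of 16 purely for illustration. same_address and first_length are hypothetical helpers; strlen(*strings), or equivalently strlen(strings[0]), is the well-typed call, so there is no need to rely on strlen(strings) happening to work.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LENGTH 16

/* strings decays to char (*)[MAX_LENGTH], a pointer to the first row;
   *strings is strings[0], which decays to char *, a pointer to its first char.
   Both hold the same address but have different types. */
bool same_address(char strings[][MAX_LENGTH])
{
    return (void *)strings == (void *)*strings;
}

/* The well-typed call: strlen expects a char *, which is what *strings gives. */
size_t first_length(char strings[][MAX_LENGTH])
{
    return strlen(*strings);   /* same as strlen(strings[0]) */
}
```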
However, I must reiterate... Don't do this.

C: Pointer-to-array and array of characters

There are many similar questions regarding this topic, but they do not answer the following question:
Taking a swing
I am going to take a swing first; if you want, go straight to the question under the next heading. Please correct me if I make any wrong assumptions here.
Let's assume I have this string declaration:
char* cpHelloWorld = "Hello World!";
I understand the compiler will make a char* pointing to an anonymous array stored somewhere in memory (by the way: where is it stored?).
If I have this declaration
char cHelloWorld[] = "Hello World!";
There will be no anonymous array, as the compiler will create the array cHelloWorld right away.
The first difference between these two variables is that I can change cpHelloWorld, whereas the array cHelloWorld is read-only, and I would have to redeclare it if I want to change it.
My question is the following:
cpHelloWorld = "This is a pretty long Phrase, compared to the Hello World phrase above.";
How does my application allocate a new, bigger (anonymous) array at runtime? Should I use this approach with the pointer, as it seems easier to use, or are there any cons? On paper, I would have used malloc if I had to work with dynamic arrays.
My guess is that the compiler (or runtime environment?) creates a new anonymous array every time I change the content of my array.
char* cpHelloWorld = "Hello World!";
declares a pointer to a string literal, which is typically stored in read-only memory. You cannot modify the contents of this string (attempting to do so is undefined behavior).
char cHelloWorld[] = "Hello World!";
is an array of char initialized to "Hello World!\0".
(note: where the brackets are placed)
The amount of memory reserved for the literal "This is a pretty long ... phrase above." is fixed at compile time by the literal itself: 1 char for each character in the literal, +1 for the required nul-terminating character. Nothing is allocated at run time; the assignment only changes where cpHelloWorld points.
Whether you use a statically declared array (e.g. char my_str[ ] = "stuff";) or you seek to dynamically allocate storage for the characters, largely depends on whether you know what, and how much, of whatever you wish to store. Obviously, if you know beforehand what the string is, using a string literal or an initialized array of type char is a simple way to go.
However, when you do NOT know what will be stored, or how much, declare a pointer to char (e.g. char *my_string;) and, once you have the data to store, allocate storage for it (e.g. my_string = malloc (len + 1);, where len is the string length and the + 1 is for the nul-terminating character; sizeof *my_string is always 1 for char, so that factor can be omitted). Note: parentheses are required when sizeof is applied to a type, e.g. sizeof (int), but are optional when it is applied to an expression such as a variable.
Then simply free whatever you have allocated when the values are no longer needed.
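A minimal sketch of the allocate, copy, free pattern for a string whose length is only known at run time. copy_to_heap is a hypothetical helper, similar in spirit to strdup (POSIX, and only part of standard C since C23).

```c
#include <stdlib.h>
#include <string.h>

/* Makes a writable heap copy of text. The caller must free() the result.
   Returns NULL if the allocation fails. */
char *copy_to_heap(const char *text)
{
    size_t len = strlen(text);
    char *my_string = malloc(len + 1);   /* + 1 for the terminating nul */
    if (my_string == NULL)
        return NULL;
    memcpy(my_string, text, len + 1);    /* copies the nul as well */
    return my_string;
}
```

Unlike the literal it was copied from, the returned buffer may be modified, and it must be released with free once it is no longer needed.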
In typical implementations, all string literals known to the compiler at compile time are placed in a data segment of the program (often a read-only one). The pointer itself, if declared inside a function, is an automatic variable, usually located on the stack.
There is no memory allocation at run-time, so it is nothing like malloc. There are no performance drawbacks here.
Each of the constant "anonymous" strings used in these contexts exists at its fixed address. The only dynamic part is the actual pointer assignment. You should get the same string address each time you execute a specific pointer assignment from a specific anonymous string (each string has its own address).
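A short sketch of the point above: reassigning the pointer only stores the address of a different, already existing literal, and evaluating the same literal again yields the same address. pick is a hypothetical function name.

```c
/* Returns one of two literals. No memory is allocated at run time:
   the assignment just stores the address of the second literal. */
const char *pick(int longer)
{
    const char *cpHelloWorld = "Hello World!";
    if (longer)
        cpHelloWorld = "This is a pretty long Phrase, compared to the Hello World phrase above.";
    return cpHelloWorld;
}
```

Calling pick(0) twice returns the same address both times, and pick(1) returns a different, equally fixed address.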

Malloc and arrays

I want to create an array of 100 words and each word has 10 max characters but I also want to save in a variable the number of words in this array.
char array[1000];
Is this the most efficient way or can I use malloc() like this:
char *k;
k=(char *)malloc(1000 * sizeof(char));
How can I accomplish my task?
If you have an array of 100 words, you may be better off using a 2-dimensional array, like char array[100][11]. I'm specifying the second dimension as 11 to make room for the null character (a word of at most 10 chars + 1 null), because most string-handling functions in C expect strings to be null-terminated.
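A sketch of the 2D-array suggestion that also keeps the word count the question asks for, assuming at most 100 words of at most 10 characters each. struct word_list and add_word are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 100
#define MAX_WORD_LEN 10   /* longest word, not counting the '\0' */

/* The words plus how many of the rows are in use. */
struct word_list {
    char words[MAX_WORDS][MAX_WORD_LEN + 1];
    size_t count;
};

/* Appends word if there is room and it is short enough.
   Returns 1 on success, 0 otherwise. */
int add_word(struct word_list *list, const char *word)
{
    size_t len = strlen(word);
    if (list->count >= MAX_WORDS || len > MAX_WORD_LEN)
        return 0;
    memcpy(list->words[list->count], word, len + 1);   /* includes the '\0' */
    list->count++;
    return 1;
}
```

Initialize the count to zero, e.g. struct word_list wl = { .count = 0 };, before adding words.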
It depends on how you want to use the stored data. What is its lifetime?
Your first one is ambiguous:
char array[1000] outside a function is static data; inside a function it's on the stack (automatic storage).
The second one puts it on the heap.
You should read up on the differences between the characteristics of these 3 different allocation types (they all have pluses and minuses).
The question "Stack, Static, and Heap in C++" is one place to start.
In general, if you're looking for performance, the declaration char array[1000] inside a function is going to win, because it's allocated on the stack.
When you allocate memory using malloc, it comes from the heap. Also, it's good practice to avoid malloc whenever you can get away with it, because that drastically reduces your chances of a memory leak: automatic (stack) storage is freed automatically when you leave the scope, and static storage lasts for the whole program.

C - is there a way to work with strings which have NULL character in the middle

Is it possible to have strings with a null character ('\0') somewhere other than at the end, and work with them? Like get their size, use strcat, etc.?
I have some ideas:
1) Write my own function for getting the length (or something else) that iterates over the string. If it meets a null char, it checks the next char of the string; if that is not null, it continues counting chars. But this may (and WILL!) eventually lead to a situation where I am reading memory OUTSIDE the char array. So it is a bad idea.
2) Use sizeof(array)/sizeof(type), e.g. sizeof(input)/sizeof(char). I think that is going to work pretty well.
Do you have any other ideas on how this can be done? Maybe there are some function which I am not aware of (C newbie alert :))?
The only really safe method I can think of is to use "Pascal"-type strings (that is, something that has a string header and assorted other data associated with it).
Something like this:
typedef struct {
int len, allocated;
char *data;
} my_string;
You would then have to implement pretty much every string manipulation function yourself. Keeping both the "length of the string" and "the size of the allocation" allows you to have an allocation that's larger than the current contents, this may make repeated string concatenation cheaper (allows an amortized O(1) append).
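A sketch of how one such function might look, assuming a variant of the struct above that uses size_t for the sizes. my_string_append and my_string_free are hypothetical names; the key details are using memcpy (which copies zero bytes like any others) rather than strcpy, and doubling the allocation so that appends are amortized O(1) per byte.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t len, allocated;
    char *data;
} my_string;

/* Appends n bytes from src, which may contain '\0' bytes.
   Returns 0 on success, -1 if allocation fails (s is left unchanged). */
int my_string_append(my_string *s, const char *src, size_t n)
{
    if (n == 0)
        return 0;                       /* nothing to do; avoids memcpy on a NULL data pointer */
    if (s->len + n > s->allocated) {
        size_t cap = s->allocated ? s->allocated : 16;
        while (cap < s->len + n)
            cap *= 2;                   /* geometric growth */
        char *p = realloc(s->data, cap);
        if (p == NULL)
            return -1;                  /* old data is still valid */
        s->data = p;
        s->allocated = cap;
    }
    memcpy(s->data + s->len, src, n);   /* copies embedded zeros too */
    s->len += n;
    return 0;
}

/* Releases the buffer and resets the string to empty. */
void my_string_free(my_string *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->allocated = 0;
}
```

Start from an empty value such as my_string s = { 0, 0, NULL };; since there is no terminator, always use s.len rather than strlen to find the end.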
You can have an array of char, either statically or dynamically allocated, that contains a zero byte in the middle, but only the part up to and including the zero can be considered a "string" in the standard C sense. Only that part will be recognized or considered by the standard library's string functions.
You can use a different terminator -- say two zeroes in a row -- and write your own string functions, but that just pushes off the problem. What happens when you need two zeroes in the middle of your string? In any case, you need to exercise even more care in this case than in the ordinary string case to ensure that your custom strings are properly terminated. You also have to be certain to avoid using them with the standard string functions.
If your special strings are stored in char array of known size then you can get the length of the overall array via sizeof, but that doesn't tell you what portion of the array contains meaningful data. It also doesn't help with any of the other string functions you might want to perform, and it does nothing for you if your handle on the pseudo-strings is a char *.
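A quick sketch of that difference, assuming an array initialized with an embedded zero byte. input, whole_array_size and c_string_length are hypothetical names.

```c
#include <string.h>

/* "ab", a zero byte, then "cd"; the literal adds one more '\0' at the end. */
static const char input[] = "ab\0cd";

/* sizeof sees the whole array, including the literal's own terminator. */
size_t whole_array_size(void)
{
    return sizeof input;    /* 6 */
}

/* strlen stops at the first zero byte. */
size_t c_string_length(void)
{
    return strlen(input);   /* 2 */
}
```

Neither number tells you on its own how many bytes are meaningful, which is why storing an explicit length is the more robust approach.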
If you are contemplating custom string functions anyway, then you should consider string objects that have an explicit length stored with them. For example:
struct my_string {
unsigned allocated, length;
char *contents;
};
Your custom functions then handle objects of that type, being certain to do the right thing with the length member. There is no explicit terminator, so these strings can contain any char value. Also, because they are a distinct type, you cannot accidentally mix them up with standard strings.
As long as you store the length of the array of chars then you can have strings with nul characters or even without a terminating nul.
struct MyString
{
int length;
char* buffer;
};
And then you would have to write all your equivalent functions for managing the string.
The bstring library http://bstring.sourceforge.net and Microsoft's BSTR (which uses wide chars) are existing libraries that work in this way and also offer some compatibility with C-style strings.
pros - getting the length of the string is quick
cons - the strings need to be dynamically allocated.

Resources