What happens to strings under the hood in C? [closed] - c

Transitioning myself from Python to C for an algorithms course, it has been really difficult for me to understand how common strings work in this new hell.
From what I've understood:
In C, there are no strings per se, but rather an array of characters.
Variable names of arrays point to the address of the first element in an array (which in memory is lined up), thus lacking the need to point out to every single character.
What confuses me is the following:
char greeting[] = "Hello world";
printf("%s", greeting);
1) How come there is no need to pass an array to greeting[ ] like {"H", "e", "l", "l", "o"} etc but a single string is enough?
2) Why does printf print out the whole message, when it's in actuality a simple array? Does using the string format in printf go through a for loop, printing out each element without a new-line?
char *greeting = "Hello world";
printf("%s", greeting);
3) What? Let me guess this... C takes the inserted string, gets its length, creates an array of characters and then does the point (2) magic? What kind of shenanigans does the pointer variable do? Something something a[ ] == &a AND a[0] == *a???
char *moreGreetings[] = {"Hello", "Greetings", "Good morning"};
printf("%s", moreGreetings[0]); // Returns "Hello"
4) I just can't anymore... why does calling moreGreetings[0] call out the whole array of characters "Hello"???
Unless there is a bunch of shenanigans going on under the hood, I have no idea how any of this makes sense. Could someone PLEASE explain what is going on?

Computers are aliens. They think nothing like we do. Computers don't know what strings are.
Programming languages are human-to-alien translators. Python is like reading an idiomatically translated book. C is like reading a literal translation, and even then it does a lot of work.
1) How come there is no need to pass an array to greeting[ ] like {"H", "e", "l", "l", "o"} etc but a single string is enough?
The compiler takes care of it for you. Also you're missing the null byte at the end. And those aren't characters.
C is the ultimate DIY language. Coming from Python it can be very disorienting. C gives you the bare minimum (yes, I see you Assembly programmers waving your arms in the back, don't complicate things). It does this A) to be very fast and B) to let you build anything. Unfortunately it doesn't always do this in the most obvious way. If you don't understand what's going on under the hood in C, the details of how computer memory works, you're in trouble.
For example, be careful of " vs '. 'H' is the single character H, really just a small integer that fits in 1 byte: 72 in ASCII (the exact number depends on your character set). "H" is a two-element character array, {'H', '\0'}, which is really {72, 0}.
The key thing to understand about strings in C, and all arrays, is they're just a hunk of memory split into 1 byte chunks. That's it. They don't even store their own length, you have to either store that somewhere else (like in a struct) or terminate the list with something.
C strings are a hunk of memory split into 1 byte chunks terminated by a null byte (ie. 0). That's it. These are conceptually equivalent.
const char *string = "Hello";
char string[] = {'H', 'e', 'l', 'l', 'o', '\0'};
Both will contain the same bytes, they differ in how they're stored.
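If you want to convince yourself of that, here is a minimal sketch. The function name same_bytes is made up for illustration; it only compares the bytes and says nothing about where each one is stored.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if both spellings hold the same 6 bytes. */
int same_bytes(void)
{
    const char *literal = "Hello";                   /* pointer to a string literal */
    char array[] = {'H', 'e', 'l', 'l', 'o', '\0'};  /* local, writable array */

    /* sizeof array counts the terminator, so it is 6, not 5. */
    return sizeof array == 6 && memcmp(literal, array, sizeof array) == 0;
}
```

The difference in storage shows up when you try to modify them: the array is yours to change, while writing through a pointer to a literal is not allowed.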
2) Why does printf print out the whole message, when it's in actuality a simple array? Does using the string format in printf go through a for loop, printing out each element without a new-line?
printf is kinda like Python's str. You tell it how to convert the thing into characters, and it'll convert the thing. %s says it's a character array terminated by a null byte. %d says it's an integer. %f says a floating point number. All of these things are represented differently in memory and need different conversions to characters.
How printf actually works is an implementation detail, but it's a good exercise to implement it yourself. And you can do it with a for loop writing out one byte at a time and stopping at the null byte.
for( const char *pos = string; pos[0] != '\0'; pos++ ) {
    putchar(pos[0]);
}
Note that rather than indexing through the array, I'm moving forward where the start of the array is. string is nothing more than a pointer to the start of the array. By copying it to pos I can change that pointer without affecting string. This avoids having to keep an extra integer for the index, and it avoids the extra math of an array lookup. pos[0] just reads the 1 byte that pos currently points at.
And yes, if you forget that null byte it'll just keep on going reading the memory past the end of the string until it happens to see a 0 or the operating system smacks it for going out of the bounds of the process.
3) What? Let me guess this... C takes the inserted string, gets its length, creates an array of characters and then does the point (2) magic? What kind of shenanigans does the pointer variable do? Something something a[ ] == &a AND a[0] == *a???
No, C strings don't store the length. To get the length they'd have to iterate through the whole string, and then iterate through the whole string again to print it. Instead they print to the null byte.
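To make the "no stored length" point concrete, here is roughly what finding the length has to look like. This is a sketch of the idea, not how any particular C library implements strlen, and the name count_to_null is made up.

```c
#include <stddef.h>

/* Hypothetical stand-in for strlen: count bytes until the null byte. */
size_t count_to_null(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

Every call walks the whole string, which is why code that needs the length repeatedly usually stores it once.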
4) I just can't anymore... why does calling moreGreetings[0] call out the whole array of characters "Hello"???
Because moreGreetings is an array of pointers to more character arrays. char *moreGreetings[] is roughly equivalent to char **moreGreetings. It's a pointer to a pointer to characters.
It's an array of strings and you asked for the first one, so you get a string out.
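A small sketch of the two levels of indexing, assuming the same three greetings as in the question. The helper names greeting_at and letter_at are invented for this example.

```c
#include <stddef.h>

static const char *more_greetings[] = {"Hello", "Greetings", "Good morning"};

/* Element i is a pointer to the first character of the i-th string. */
const char *greeting_at(size_t i)
{
    return more_greetings[i];
}

/* Indexing twice reaches a single character inside that string. */
char letter_at(size_t i, size_t j)
{
    return more_greetings[i][j];
}
```

So printf("%s", greeting_at(0)) gets a pointer to the 'H' and prints until the null byte, while letter_at(0, 0) is just that one 'H'.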
Keep in mind, Python is written in C (yeah, there's other implementations now). C is the bottom of the stack (almost). Python, and every other program, eventually has to deal with these same "shenanigans" C does, but really it's dealing with the reality of how computers work.
Often they don't use C strings because they're so ungainly and error prone, they make up their own, but they're still filling fixed sized hunks of memory with numbers and calling them "strings".
The best advice I can give you is to turn on compiler warnings. All of them! C compiler warnings can shine a light on many simple mistakes, but they're off by default. The typical way you turn them on is with -Wall, but that's not all warnings. There's lots and lots of extras. This is the formula I use in my Makefile (have a Makefile).
CFLAGS += -Wall -Wshadow -Wwrite-strings -Wextra -Wconversion -std=c99 -pedantic $(OPTIMIZE)
That turns on "all" warnings, and "extra" warnings, and some additional specific warnings I've found useful. It says I'm using the ISO C standard from 1999 (more on that in a moment) and I want the compiler to be pedantic about following the standard so my code is portable between compilers and environments. I do a lot of Open Source work, but it's also good when you're getting started so you don't get addicted to non-standard compiler extensions.
About the standard. C is quite old and was only standardized in 1990. Many, many people learned to code with non-standard C, and you see that in a lot of C teaching material. Even though there's a 2011 standard, many C programmers write and teach to C90 or even earlier. Even C99 is considered "new" by many. Visual Studio is particularly bad at standards compliance, but they're finally catching up in the latest versions.

1) How come there is no need to pass an array to greeting[ ] like {"H", "e", "l", "l", "o"} etc but a single string is enough?
Because the C syntax allows for "string" literals, which are a shorthand way of representing a C-style string.
Incidentally, {"H", "e", "l", "l", "o"} is an array of strings, not an array of chars. An array of characters would look like this: {'H', 'e', 'l', 'l', 'o'}, but "Hello" actually represents the array { 'H', 'e', 'l', 'l', 'o', '\0' } (strings work by having a string termination character \0 at the end).
2) Why does printf print out the whole message, when it's in actuality a simple array? Does using the string format in printf go through a for loop, printing out each element without a new-line?
The %s token tells printf that you want it to treat the value as a "string", so it handles it as one, printing characters one by one until it encounters the string termination character \0, which is automatically at the end of any "string" you create using the string literal syntax.
3) What? Let me guess this... C takes the inserted string, gets its length, creates an array of characters and then does the point (2) magic? What kind of shenanigans does the pointer variable do? Something something a[ ] == &a AND a[0] == *a???
I have no idea what this question means.
4) I just can't anymore... why does calling moreGreetings[0] call out the whole array of characters "Hello"???
moreGreetings is an array of strings (or an array of pointers to arrays of chars, if you like). So moreGreetings[0] is the first element in that array, which is the "string" "Hello". If you pass that into printf and use %s to tell it to treat the value as a string, then it will.

How come there is no need to pass an array to greeting[ ] like {"H", "e", "l", "l", "o"} etc but a single string is enough?
It is indeed possible to assign "Hello" as an array.
char greetings[] = {'H', 'e', 'l', 'l', 'o', '\0'};
But this initializer is tedious to write, so char greetings[] = "Hello" is available as a shortcut. The two initializations are the same.
Why does printf print out the whole message?
printf has different behaviors depending on the format argument it receives. When you ask printf to print a value in string format %s, it takes a pointer to a character and prints its value as well as its subsequent characters one by one until it reaches the null terminator \0.
Why does calling moreGreetings[0] call out the whole array of characters "Hello"?
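moreGreetings is an array of pointers, and each element points to the first character of one string. So printf("%s", moreGreetings[0]); passes a pointer to the 'H' of "Hello", just as printf("%s", greeting); passes a pointer to the first character of greeting. In both cases printf starts at that address and keeps printing until it reaches the null terminator.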

1) It's a language feature - you can initialize an array of chars using a string literal, and it will do what you meant, i.e. char greeting[] = "foo" will be interpreted as char greeting[] = {'f', 'o', 'o', '\0'}. This comes without a price because otherwise char greeting[] = "foo" would be a compile-time error.
2) Google C array decay. In short, passing an array where a pointer is expected will behave as if the pointer to the first element of the array was passed. This is useful in many contexts, especially with strings.
3) See #2.
4) Because you declared an array of pointers to char (strings), and are passing the first of those pointers to printf. It is equivalent to writing printf("%s", "Hello").
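Array decay is easy to observe directly. The following is a minimal sketch with invented function names; it only relies on sizeof and pointer comparison.

```c
#include <stddef.h>

/* A parameter declared as char * only ever sees an address. */
size_t size_seen_by_callee(char *p)
{
    return sizeof p;   /* size of a pointer, not of the caller's array */
}

/* Returns 1 if the decayed array equals the address of its first element. */
int decays_to_first_element(void)
{
    char greeting[] = "foo";   /* 4 bytes: 'f', 'o', 'o', '\0' */
    char *p = greeting;        /* decay: same as &greeting[0] */
    return p == &greeting[0] && sizeof greeting == 4;
}
```

Inside the function that declares it, the array still knows its size. Once it has been passed to another function, only the address survives.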

1) How come there is no need to pass an array to greeting[ ] like {"H", "e", "l", "l", "o"} etc but a single string is enough?
When you pass an array or a string (a string is just a char array that ends with a null byte), you are giving the memory address of the first element in the array. Because array elements are stored in memory, one after another, all that is needed to access the next element in the array (or character in string) is to increment the memory address that was passed.
2) Why does printf print out the whole message, when it's in actuality a simple array? Does using the string format in printf go through a for loop, printing out each element without a new-line?
Usually, all the system supports is a simple putchar() function call. In order to use more convenient IO function, libraries were created. The printf function probably uses a for loop to print each element in the string.
3) What? Let me guess this... C takes the inserted string, gets its length, creates an array of characters and then does the point (2) magic? What kind of shenanigans does the pointer variable do? Something something a[ ] == &a AND a[0] == *a???
The C compiler does count the length of the string literal. I just want to clarify that this does not happen at runtime; it happens at compile time. During runtime, the string is referred to by its pointer.
The pointer variable is just an ordinary variable. It just contains some memory address, somewhere. In order for the compiler to know how to treat the pointer, the pointer is given a type, i.e. int*, char*.
Note: There is such a thing as a void* without a referencing type.
When the program wants to access a memory location directly next to that pointed by some pointer, let's call it int*p, it just increments the value of p by p++ or p + 1.
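The pointer's type is what decides how far p + 1 actually moves. A small sketch (the function name is made up) that measures the step in bytes:

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes does p + 1 move for an int pointer? */
size_t int_pointer_step(void)
{
    int values[2] = {0, 0};
    int *p = values;

    /* Casting to char * lets us measure the distance in bytes. */
    return (size_t)((char *)(p + 1) - (char *)p);
}
```

The result is sizeof(int), whatever that is on your machine. For a char * the same step would be exactly 1 byte, which is why walking a string one character at a time works.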

Related

C compiler Differences? NUL control character in Char

Below is C code that builds two char arrays, one with an explicitly added null character and one without. The results differ unexpectedly between two compilers, and I'm not sure why we even have to explicitly add the null character?
//
// stringBugorNot.c
//
//
//
#include <string.h>
#include <stdio.h>
int main(void)
{
    char aString[3] = {'a', 'b','c'};        /* no terminator: not a C string */
    char bString[4] = {'a', 'b', 'c', '\0'}; /* properly null-terminated */
    printf("\n");
    printf("len of a is: %zu\n", strlen(aString)); /* strlen returns size_t, hence %zu */
    printf("len of b is: %zu\n", strlen(bString));
    printf("\n");
    //Portion A
    printf("last element of a is: '%c'\n", aString[strlen(aString)]);
    printf("last element of b is: '%c'\n", bString[strlen(bString)]);
    printf("\n");
    //Portion B
    printf("last element of a is: '%c'\n", aString[strlen(aString) - 1]);
    printf("last element of b is: '%c'\n", bString[strlen(bString) - 1]);
}
Comments
+clang gives a runtime error because of the out-of-bounds index on aString, which makes sense.
+gcc gives no error and simply prints "nothing", the null, as expected. But maybe gcc is smarter and adds the null for me? Is the actual memory size different?
Clang OUTPUT ---->
len of a is: 3
len of b is: 3
bugOrNot.c:16:41: runtime error: index 3 out of bounds for type 'char [3]'
last element of a is: ''
last element of b is: ''
last element of a is: 'c'
last element of b is: 'c'
GCC OUTPUT ---->
len of a is: 9
len of b is: 3
last element of a is: ''
last element of b is: ''
last element of a is: ''
last element of b is: 'c'
Unexpected behavior that you see is called undefined behavior (UB) in the C standard:
Calling strlen on aString is UB because there is no null terminator.
Indexing aString with anything other than 0, 1, or 2 is UB, and the index used here comes from a strlen call that was itself UB.
A compiler could leave a zero byte right after aString inadvertently, for example through padding that aligns the next object on a 4-byte boundary. It doesn't change the fact that it's still UB, though.
When you say
char bString[4] = {'a', 'b', 'c', '\0'};
you have properly constructed a null-terminated string. It is precisely as if you had said
char bString[4] = "abc";
Since this is a proper, null-terminated string, it is meaningful and legal to call strlen(bString), and you will get a result of 3.
When you say
char aString[3] = {'a', 'b','c'};
on the other hand, and as I think you know, you have not constructed a proper null-terminated string. Therefore, it is not legal or meaningful to call strlen(aString) -- formally, we say that the result is undefined, meaning that absolutely anything can happen.
You tried the code with two different compilers, and were surprised to get two different results. This is perfectly normal. (It's perfectly normal to get two different results, and it's perfectly normal to be surprised by this, because it is pretty surprising, the first few times you encounter it.)
It is not the case that one compiler is "smarter" than the other, or that it "guessed" that you were trying to construct a string and so automatically supplied the "missing" \0 for you. It was simply a fluke, a random happenstance. (It is also certainly not the case that one compiler or the other has any kind of a bug. Again, there's no right result here, so a compiler can't be wrong, no matter what it does.)
If you want to work with strings in C, make sure that they're all properly null-terminated. If you should ever happen to accidentally do something stringlike with a non-properly-null-terminated string, don't try to interpret the results, don't assume that they mean anything, and especially don't decide that it's the "right" result that you can therefore depend on. You can't. It's likely to change for no reason, like when you use a different compiler next week, or when your customer uses your program on vital data instead of test data.
In C, a string is a sequence of character values including the nul terminator. That terminator is how the various C library routines know where the end of the string is. If you don't terminate a string properly, library routines like strlen and strcpy and printf with %s will all scan past the end of the string into other memory, resulting in garbled output or runtime errors.
The reason you got different results for the length of a with the two different compilers is that in the clang case, the byte immediately following the last element of a contained 0, whereas in the gcc case the bytes immediately following a did not contain 0.
Strictly speaking, the behavior on passing a non-terminated sequence of characters to the string handling routines is undefined - the language specification places no requirements on the compiler or runtime environment to "do the right thing", whatever that would be. You've basically voided the warranty at that point, and pretty much anything can happen.
Note that the C language specification does not require bounds checks on array accesses. The index-out-of-bounds report you got from clang means that build had extra runtime checking enabled (the message looks like the output of a sanitizer such as UndefinedBehaviorSanitizer), which goes beyond what the language standard actually requires.
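If you genuinely have to handle a buffer that might not be terminated, one defensive option is to never look past its known size. This is a sketch under that assumption; bounded_len is a hypothetical helper built on the standard memchr.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text in buf, never reading past cap bytes.
   Returns cap if no terminator is found inside the buffer. */
size_t bounded_len(const char *buf, size_t cap)
{
    const char *end = memchr(buf, '\0', cap);
    return end ? (size_t)(end - buf) : cap;
}
```

With the arrays from the question, bounded_len(aString, sizeof aString) returns 3, which equals the capacity and so signals "no terminator found", while bounded_len(bString, sizeof bString) returns 3 out of a capacity of 4.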

C character array and its length

I am studying now C with "C Programming Absolute Beginner's Guide" (3rd Edition) and there was written that all character arrays should have a size equal to the string length + 1 (which is string-termination zero length). But this code:
#include <stdio.h>
int main(void)
{
    char name[4] = "Givi";
    printf("%s\n",name);
    return 0;
}
outputs Givi and not Giv. Array size is 4 and in that case it should output Giv, because 4 (string length) + 1 (string-termination zero character length) = 5, and the character array size is only 4.
Why does my code output Givi and not Giv?
I am using MinGW 4.9.2 SEH for compilation.
You are hitting what is considered to be undefined behavior. It's working now, but due to chance, not correctness.
In your case, it's because the memory in your program is probably all zeroed out at the beginning. So even though your string is not terminated properly, it just so happens that the memory right after it is zero, so printf knows when to stop.
+-----------------------+
|G|i|v|i|\0|\0|... |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Other languages, such as Java, have safeguards against this sort of situation. Languages like C, however, do less hand holding, which, on the one hand, allows more flexibility, but on the other, gives you many more ways to shoot yourself in the foot with subtle issues such as this one. In other words, if your code compiles, that doesn't mean it's correct, and it may still blow up now, in 5 minutes or in 5 years.
In real life, the memory after your array is almost never conveniently zero, and your string might end up getting stored next to other things, which would then get printed out together with your string. You never want this. Situations like this might lead to crashes, exploits and leaked confidential information.
See the following diagram for an example. Imagine you're working on a web server and the string "secret" (a user's password or key) is stored right next to your harmless string:
+-----------------------+
|G|i|v|i|s|e|c|r|e|t |
+-----------------------+
| your | rest of |
| stuff | memory (stack)|
+-----------------------+
Every time you would output what you would think is "Givi", you'd end up printing out the secret string, which is not what you want.
The byte after the last character always has to be 0, otherwise printf would not know where the string is terminated and would keep reading bytes (or chars) until it finds one that is 0.
As Andrei said, apparently it just happened, that the compiler put at least one byte with the value 0 after your string data, so printf recognized the end of the string.
This can vary from compiler to compiler and thus is undefined behaviour.
There could, for instance, be a chance to have printf accessing an address, which your program is not allowed to. This would result in a crash.
In C text strings are stored as zero terminated arrays of characters. This means that the end of a text string is indicated by a special character, a numeric value of zero (0), to indicate the end of the string.
So the array of text characters to be used to store a C text string must include an array element for each of the characters as well as an additional array element for the end of string.
All of the C text string functions (strcpy(), strcmp(), strcat(), etc.) all expect that the end of a text string is indicated by a value of zero. This includes the printf() family of functions that print or output text to the screen or to a file. Since these functions depend on seeing a zero value to terminate the string, one source of errors when using C text strings is copying too many characters due to a missing zero terminator or copying a long text string into a smaller buffer. This type of error is known as a buffer overflow error.
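One way to avoid that kind of overflow is to always pass the destination's capacity and always write the terminator yourself. The following is a minimal sketch, not a standard library function; the name copy_bounded and its return convention are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical helper: copy src into dst (capacity cap, which must be > 0),
   truncating if needed and always writing a terminator.
   Returns 1 if the whole string fit, 0 if it was cut short. */
int copy_bounded(char *dst, size_t cap, const char *src)
{
    size_t i = 0;
    while (i + 1 < cap && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';              /* the last slot is reserved for this */
    return src[i] == '\0';
}
```

Copying "four" into a 4-byte buffer leaves "fou" plus the terminator and reports truncation; a 5-byte buffer holds all of it.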
The C compiler will perform some types of adjustments for you automatically. For instance:
char *pText = "four"; // pointer to a text string constant, compiler automatically adds zero to an additional array element for the constant "four"
char text[] = "four"; // compiler creates a array with 5 elements and puts the characters four in the first four array elements, a value of 0 in the fifth
char text[5] = "four"; // programmer creates array of 5 elements, compiler puts the characters four in the first four array elements, a value of 0 in the fifth
In the example you provided, a good C compiler can issue a warning, but C (unlike C++) does not treat it as an error: when the array is exactly as long as the text, the characters are stored and the zero terminator is simply left off. The compiler is not truncating the text, and it is not adding an extra array element for the terminator either. So you are getting lucky in that there happens to be a zero value right after the end of the array.
What your book states is basically right, but there is missing the phrase "at least". The array can very well be larger.
You already stated the reason for the min length requirement. So what does that tell you about the example? It is crap!
What it exhibits is called undefined behaviour (UB) and might result in daemons flying out your nose for the printf() - not the initializer. It is just not covered by the C standard (well, the standard actually says this is UB), so the compiler (and your libraries) are not expected to behave correctly.
For such cases, no terminator will be appended, so the string is not properly terminated when passed to printf().
The reason this does not produce an error is likely legacy code that exploited it to save some bytes of memory. So, instead of reporting an error that the implicit trailing '\0' terminator does not fit, the compiler simply does not append it. Silently truncating the string literal would also be a bad idea.
The following line:
char name[4] = "Givi";
May give a warning like:
string for array of chars is too long
Because the behavior is undefined, the compiler may still accept it. But if you debug, you may see something like:
name[0] 'G'
name[1] 'i'
name[2] 'v'
name[3] '\0'
And so the output is
Giv
Not Givi as you mentioned in the question!
I'm using the GCC compiler.
But if you write something like this:
char name[4] = "Giv";
Compiles fine! And output is
Giv

How do I store an actual character into an array?

So I want to know how to store an actual character into an array.
char arr[4] = "sup!";
char backwards[4];
backwards[0] = *(arr + 3);
I guess my second question is: if I do this and then printf backwards[0] using %c, will the actual character appear?
First off, let's fix your array size problem:
C strings are null-terminated: this means that you need to add space for a trailing \0 to terminate your string. You could use char arr[5] on the first line. (Note that if you are ABSOLUTELY CERTAIN you are NEVER going to use this array of characters with any of the C string handling functions, AND you assigned your characters individually as chars instead of as a string, this is not technically required. But save yourself some debugging time and use up the extra byte.) This is probably the source of your "weird error."
It seems like you know, but the other thing to bear in mind is in C, arrays are zero-based. This means that when you declare an array like char arr[4], you really get
arr[0]
arr[1]
arr[2]
arr[3]
C has no qualms about letting you walk off the end of your array and stomp on data or read in bad values. Here be dragons. Be careful.
Now, on to your actual questions:
1) You assign actual characters using single quotes arr[2]='x'; If you use double quotes, you are assigning a C string, which is null-terminated, as discussed.
2) Yes, printf with %c should do the trick.
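Putting both fixes together, here is a sketch of filling the whole backwards array rather than just its first element. The function reverse_into is made up for this example, and it assumes the destination has room for the terminator.

```c
#include <string.h>

/* Hypothetical helper: write src reversed into dst.
   dst must have room for strlen(src) + 1 bytes. */
void reverse_into(char *dst, const char *src)
{
    size_t len = strlen(src);
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[len - 1 - i];   /* single chars, copied one at a time */
    }
    dst[len] = '\0';                 /* make the result a proper string */
}
```

With char arr[5] = "sup!" and char backwards[5], calling reverse_into(backwards, arr) puts '!' in backwards[0], which %c prints as the actual character.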

Why don't numeric arrays end with a '\0' or null character?

Why don't the numeric arrays end with a null character?
For example,
char name[] = {'V', 'I', 'J', 'A', 'Y', '\0'};
But in case of numeric arrays there is no sign of null character at the end...
For example,
int marks[] = {20, 22, 23};
What is the reason behind that?
The question asked contains a hidden assumption, that all char arrays do end with a null character. This is in fact not always the case: this char array does not end with \0:
char no_zero[] = { 'f', 'o', 'o' };
The char arrays that must end with the null character are those meant for use as strings, which indeed require termination.
In your example, the char array only ends with a null character because you made it so. The single place where the compiler will insert the null character for you is when declaring a char array from a string literal, such as:
char name[] = "VIJAY";
// the above is sugar for:
char name[] = { 'V', 'I', 'J', 'A', 'Y', '\0' };
In that case, the null character is inserted automatically to make the resulting array a valid C string. No such requirement exists for arrays of other numeric types, nor can they be initialized from a string literal. In other words, appending a zero to a numeric array would serve no purpose whatsoever, because there is no code out there that uses the zero to look for the array end, since zero is a perfectly valid number.
Arrays of pointers are sometimes terminated with a NULL pointer, which makes sense because a NULL pointer cannot be confused with a valid pointer. The argv array of strings, received by main(), is an example of such an array.
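A sketch of how code typically walks such a NULL-terminated pointer array. The helper name count_strings is invented here; the array it receives has the same layout as argv.

```c
#include <stddef.h>

/* Hypothetical helper: count entries in a NULL-terminated array of strings. */
size_t count_strings(const char *const *list)
{
    size_t n = 0;
    while (list[n] != NULL) {   /* NULL is the sentinel, not a valid string */
        n++;
    }
    return n;
}
```

Given const char *args[] = {"prog", "-v", NULL}, the function stops at the NULL and reports two entries.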
An array can end in anything that is a valid value of the array element type. But only a \0 terminated char array is called a string.
For example
char name[]={'V','I','J','A','Y'};
Valid, but not a string, the limit is that you can't use it in functions expecting a string like strlen etc.
To clarify from OP's comment below, by the C standard, any character literals like 'a', '1', etc, including '\0' are type int. And you can put a '\0' at the end of an int array like this:
int arr[] = {20, 22, 23, '\0'};
But people usually don't do that because it's conventional that '\0' is only used to terminate strings. The above code is equivalent to
int arr[] = {20, 22, 23, 0};
A string ends with a 0 terminator, but a string is not the same thing as an array. We use arrays to store strings, but we also use arrays to store things that are not strings. That's why arrays in general don't automatically have a 0 appended to them.
Besides, in any generic array of int, 0 may be a valid (non-sentinel) value.
You can make an int array end with 0 as well, if you wish:
int iarray[] = {1, 2, 3, 0};
Since '\0' and 0 are exactly the same, you could even replace the 0 above by '\0'.
Your confusion might be due to the automatic insertion of '\0' in a declaration such as:
char s[] = "hello";
In the above, the definition of s is equivalent to char s[] = {'h', 'e', 'l', 'l', 'o', '\0'}. Think of this as a convenient shortcut provided by the C standard. If you want, you can force a char array without the NUL terminator by being explicit about the size:
char s[5] = "hello";
In the above example, s won't be NUL terminated.
Also note that character literals in C are of type int, so '\0' is actually an int. (Also further, char is an integral type.)
There are three, maybe four decent ways of tracking an array's length, only two of which are common in C:
1) Keep track of the length yourself and pass it along with the pointer.
This is how arrays typically work. It doesn't require any special formatting, and makes sub-array views trivial to represent. (Add to the pointer, subtract from the length, and there ya go.)
Any function in the standard library that works with non-string arrays expects this already. And even some functions that mess with strings (like strncat or fgets) do it for safety's sake.
2) Terminate the array with some "sentinel" value.
This is how C strings work. Because nearly every character set/encoding in existence defines '\0' as a non-printable, "do nothing" control character, it's thus not a typical part of text, so using it to terminate a string makes sense.
Note that when you're using a char[] as a byte array, though, you still have to specify a length. That's because bytes aren't characters. Once you're dealing with bytes rather than characters, 0 loses its meaning as a sentinel value and goes back to being plain old data.
The big issue is that with most fundamental types, every possible arrangement of sizeof(type) bytes might represent a valid, useful value. For integral values, zero is particularly common; it is probably one of the most used and most useful numbers in all of computing. I fully expect to be able to put a 0 into an array of integers without appearing to lose half my data.
So then the question becomes, what would be a good sentinel value? What otherwise-legal number should be outlawed in arrays? And that question has no good, universal answer; it depends entirely on your data. So if you want to do such a thing, you're on your own.
Besides the lack of a decent sentinel value, this approach falls flat with non-character types for another reason: it's more complicated to represent subsets of the array. In order for a recursive function to pass part of the array to itself, it would have to insert the sentinel value, call itself, and then restore the old value. Either that, or it could pass a pointer to the start of the range and the length of the range. But wait...isn't that what you are trying to avoid? :P
For completeness, the two other methods:
3) Create a struct that can store the array's length along with a pointer to the data.
This is a more object-oriented approach, and is how arrays work in nearly every modern language (and how vectors work in C++). It works OK in C iff you have an API to manage such structs, and iff you use that API religiously. (Object-oriented languages provide a way to attach the API to the object itself. C doesn't, so it's up to you to stick to the API.) But any function that wasn't designed to work with your structs will need to be passed a pointer (and possibly a length) using one of the above two methods.
4) Pass two pointers.
This is a common way to pass a "range" in C++. You pass a pointer to the start of the array, and a pointer just past the end. It's less common in C, though, because with raw pointers, (start,length) and (start,end) represent the same data -- and C doesn't have the iterators and templates that make this so much more useful.
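The non-sentinel approaches listed above can be sketched side by side. All names here (sum_counted, struct int_slice, sum_slice, sum_range) are hypothetical and only meant to show the shape of each interface.

```c
#include <stddef.h>

/* Pointer plus length. */
int sum_counted(const int *data, size_t len)
{
    int total = 0;
    for (size_t i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}

/* A struct bundling the pointer with its length. */
struct int_slice {
    const int *data;
    size_t len;
};

int sum_slice(struct int_slice s)
{
    return sum_counted(s.data, s.len);
}

/* Two pointers: start and one past the end. */
int sum_range(const int *begin, const int *end)
{
    int total = 0;
    for (const int *p = begin; p != end; p++) {
        total += *p;
    }
    return total;
}
```

A sub-array view is just arithmetic: for int marks[] = {20, 22, 23}, the call sum_counted(marks + 1, 2) covers the last two elements without copying or modifying anything, and a 0 inside the data would be summed like any other value.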
You need to end C strings with '\0' since this is how the library knows where the string ends.
NUL termination is what differentiates a string from a plain char array: a string is a NUL-terminated char array. Most string-manipulating functions rely on the NUL to know when the string is finished (and their job is done), and they won't work with a plain char array (e.g. they will keep working past the boundaries of the array until they find a NUL somewhere in memory, often corrupting memory as they go).
You do not have to have a '\0' char at the end of a character array! This is a wrong assumption. There is no rule which says you do. Characters (char type) are exactly like any other kind of data.
You do have to have a null-terminated char array if you want to print the array using the standard printf-family functions with %s. But that is only because those functions rely on the '\0' char to find the end of the character array.
Functions often have rules concerning the kind of data they expect. String (char[]) functions are no exception. But this is not a language requirement; it's the API you're using that has these requirements.
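To show that the terminator is a requirement of the API rather than of the language, here is a small sketch that formats a char array with no '\0' by telling the printf family how many chars to read. The helper name format_chars is hypothetical; the "%.*s" precision is standard C.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format exactly n chars from data, which need not be
   '\0'-terminated. The "%.*s" precision caps how many bytes are read. */
int format_chars(char *out, size_t out_size, const char *data, int n)
{
    assert(out != NULL && data != NULL && n >= 0);
    return snprintf(out, out_size, "%.*s", n, data);
}
```

Here the length travels as an explicit argument instead of being encoded in the data, so the array itself never needs a terminator.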
A char array ends with the special char '\0' so that it can be treated as a string.
And when you are manipulating a string, there must be some way to tell the length (boundary) of that string.
Look at the function prototype of strcpy:
char * strcpy ( char * destination, const char * source );
How does it know how many chars to copy from source to destination? The answer is by looking at the position of '\0'.
The '\0' char is essential when dealing with a string as a char *. Without '\0' as an end marker you wouldn't be able to treat a char * as a string.
Arrays by themselves do not have to be \0 terminated; it is the use of character arrays in a specific way that requires them to be \0 terminated. The standard library functions that act on character arrays use the \0 to detect the end of the data and hence treat the array as a string, which means users of these functions need to follow the \0 termination precondition. If your character array usage doesn't involve any such functionality, then it doesn't need the \0 terminator.
An array of char does not necessarily end with \0.
It is a C convention that strings are ended with \0.
This is useful to find the end of the string.
But if you are only interested in holding data that is of type char, you can have a \0 at end or not.
If your array of char is intended to be used as a string, you should add \0 at the end of it.
EDIT: What is ended by \0 are the string literals, not the array of char.
The question is ill formulated.
An example of how a \0 terminator would cause confusion in an integer array:
int marks[]={20,22,23,0,93,'\0'};
                      ^
Code that scans for the terminator would assume the marked 0 is the end of the array, which is not true.
\0 is usually used to terminate a string. In a string, \0 is regarded as the end of the string.
In your example you don't need to terminate it with '\0'.
Found a very interesting wiki post:-
At the time C (and the languages that it was derived from) was
developed, memory was extremely limited, so using only one byte of
overhead to store the length of a string was attractive. The only
popular alternative at that time, usually called a "Pascal string"
(though also used by early versions of BASIC), used a leading byte to
store the length of the string. This allows the string to contain NUL
and made finding the length need only one memory access (O(1)
(constant) time). However, C designer Dennis Ritchie chose to follow
the convention of NUL-termination, already established in BCPL, to
avoid the limitation on the length of a string caused by holding the
count in an 8- or 9-bit slot, and partly because maintaining the count
seemed, in our experience, less convenient than using a terminator.
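The two layouts described in the quote can be sketched roughly as follows. The function names are hypothetical, and the length-prefixed layout shown (length in byte 0, text after it) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-prefixed ("Pascal-style") string: byte 0 holds the length. */
size_t pascal_length(const unsigned char *ps)
{
    assert(ps != NULL);
    return ps[0];   /* one memory access, but a single 8-bit byte caps it at 255 */
}

/* NUL-terminated length: walk until the terminator, as strlen does. */
size_t c_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Both layouts spend one extra byte per string; the difference is whether that byte limits the length or excludes one character value from the text.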
Also check a related post:- nul terminating a int array
We have a convention: the special character '\0', with numeric code 0, marks the end of a string.
But if you want to mark the end of an int array, how will you know whether a 0 is a valid array member or the end-of-array mark? So, in general, it is not possible to have such a mark.
In other words:
The character '\0' (not the character '0', which has code 48) has no meaning in the context of a text string (by convention, it is a special character that marks the end), so it can be used as an end-of-array mark.
Integer 0 and '\0' (which are the same value) are valid integers. A 0 can carry meaning, and that is why it cannot be used as an end-of-array mark:
int votesInThisThread[] = { 0, -1, 5, 0, 2, 0 }; // Zeroes here is a valid numbers of upvotes
If you try to detect the end of this example array by searching for 0, you'll get a size of 0.
Is that what the question is about?
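To make the problem concrete, here is a sketch of a strlen-style counter for int arrays. The name sentinel_length is made up for this example; it treats the first 0 as the end mark, exactly as strlen treats '\0'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strlen-style count for ints: stops at the first 0. */
size_t sentinel_length(const int *a)
{
    assert(a != NULL);
    size_t n = 0;
    while (a[n] != 0)
        n++;
    return n;
}
```

Applied to the votesInThisThread array above, it stops at the very first element, while sizeof array / sizeof array[0] (usable only where the array itself is in scope) still gives the true count of 6.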

Strings without a '\0' char?

If, by mistake, I define a char array with no \0 as its last character, what happens then?
I'm asking this because I noticed that if I try to iterate through the array with while(cnt!='\0'), where cnt is an int variable used as an index to the array, and simultaneously print the cnt values to monitor what's happening, the iteration stops at the last character +2. The extra characters are of course random, but I can't see why it has to stop after 2. Does the compiler automatically insert a \0 character? Links to relevant documentation would be appreciated.
To make it clear, here is an example. Let's say that the array str contains the word doh (with no '\0'). Printing the cnt variable at every loop would give me this:
doh+
or doh^
and so on.
EDIT (undefined behaviour)
Accessing array elements outside of the array boundaries is undefined behaviour.
Calling string functions with anything other than a C string is undefined behaviour.
Don't do it!
A C string is a sequence of bytes terminated by and including a '\0' (NUL terminator). All the bytes must belong to the same object.
Anyway, what you see is a coincidence!
But it might happen like this
                        ,------------------ garbage
                        | ,---------------- str[cnt] (when cnt == 4, no bounds-checking)
memory ----> [...|d|o|h|*|0|0|0|4|...]
                  |   |   \_____/ -------- cnt (big-endian, properly 4-byte aligned)
                  \___/ ------------------ str
If you define a char array without the terminating \0 (called a "null terminator"), then your string, well, won't have that terminator. You would do that like so:
char strings[] = {'h', 'e', 'l', 'l', 'o'};
The compiler never automatically inserts a null terminator in this case. The fact that your code stops after "+2" is a coincidence; it could just as easily have stopped at +50 or anywhere else, depending on whether there happened to be a \0 character in the memory following your string.
If you define a string as:
char strings[] = "hello";
Then that will indeed be null-terminated. When you use quotation marks like that in C, then even though you can't physically see it in the text editor, there is a null terminator at the end of the string.
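The difference between the two definitions shows up in sizeof. A minimal sketch follows; the array names and the ends_with_nul helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

const char no_terminator[] = {'h', 'e', 'l', 'l', 'o'};  /* 5 bytes, no '\0' */
const char terminated[]    = "hello";                    /* 6 bytes: the text plus '\0' */

/* Hypothetical helper: does the last byte of a buffer hold '\0'? */
int ends_with_nul(const char *buf, size_t size)
{
    assert(buf != NULL);
    return size > 0 && buf[size - 1] == '\0';
}
```

The string literal form reserves one extra byte for the terminator, which is why sizeof reports 6 for it and 5 for the brace-initialized array.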
There are some C string-related functions that will automatically append a null-terminator. This isn't something the compiler does, but part of the function's specification itself. For example, strncat(), which concatenates one string to another, will add the null terminator at the end.
However, if one of the strings you use doesn't already have that terminator, then that function will not know where the string ends and you'll end up with garbage values (or a segmentation fault.)
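As a small illustration of the strncat behaviour just described, here is a sketch with a hypothetical wrapper named append_n. It assumes dst already holds a properly terminated string and has enough room for the result.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: append at most n chars of src to the string in dst.
   dst must already be '\0'-terminated and have room for the result. */
char *append_n(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    return strncat(dst, src, n);   /* strncat always writes a terminating '\0' */
}
```

For example, starting from char buf[16] = "ab", calling append_n(buf, "cdefgh", 3) leaves the terminated string "abcde" in buf. If buf did not already contain a '\0', strncat would have no way to find where to start appending.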
In C language the term string refers to a zero-terminated array of characters. So, pedantically speaking there's no such thing as "strings without a '\0' char". If it is not zero-terminated, it is not a string.
Now, there's nothing wrong with having a mere array of characters without any zeros in it, as long as you understand that it is not a string. If you ever attempt to work with such character array as if it is a string, the behavior of your program is undefined. Anything can happen. It might appear to "work" for some magical reasons. Or it might crash all the time. It doesn't really matter what such a program will actually do, since if the behavior is undefined, the program is useless.
This would happen if, by coincidence, the byte at *(str + 5) is 0 (as a number, not ASCII)
As far as most string-handling functions are concerned, strings always stop at a '\0' character. If you miss this null-terminator somewhere, one of three things will usually happen:
Your program will continue reading past the end of the string until it finds a '\0' that just happened to be there. There are several ways for such a character to be there, but none of them is usually predictable beforehand: it could be part of another variable, part of the executable code or even part of a larger string that was previously stored in the same buffer. Of course by the time that happens, the program may have processed a significant amount of garbage. If you see lots of garbage produced by a printf(), an unterminated string is a common cause.
Your program will continue reading past the end of the string until it tries to read an address outside its address space, causing a memory error (e.g. the dreaded "Segmentation fault" in Linux systems).
Your program will run out of space when copying over the string and will, again, cause a memory error.
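One way to avoid all three outcomes is to check for the terminator within a known capacity before treating a buffer as a string. The following is a sketch only; bounded_length is a hypothetical helper built on the standard memchr function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string in buf, or -1 if no '\0'
   appears within the first cap bytes. memchr never reads past cap. */
long bounded_length(const char *buf, size_t cap)
{
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', cap);
    return end ? (long)(end - buf) : -1;
}
```

A result of -1 means the buffer is a plain char array rather than a string, so it should not be handed to functions such as strlen or printf with %s.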
And, no, the C compiler will not normally do anything but what you specify in your program - for example it won't terminate a string on its own. This is what makes C so powerful and also so hard to code for.
I bet that an int is defined just after your string and that this int takes only small values such that at least one byte is 0.

Resources