I have a question to better understand how arrays and null bytes work in C.
Let's say I have an int array of 13 cells.
Let's say I want cells number 1, 2, 3 and 10 to have a value. Do the others, which are left as default, automatically get the null char \0 as their value?
My understanding of \0 was that the null byte is always at the end of the array and that its function is to tell the program where the array ends. But that seems to be wrong.
I wrote a simple program to verify this, and it seems to behave like that:
#include <stdio.h>

int main(void) {
    int nums[13] = {1,2,3};
    nums[10] = 69;
    int i;
    for(i=0;i<13;i++) {
        if(nums[i]=='\0') {
            printf("null char found! in position: %d\n",i);
        }
        else {
            printf("element: %d found in position: %d of int array\n",nums[i],i);
        }
    }
    return 0;
}
here is the output:
element: 1 found in position: 0 of int array
element: 2 found in position: 1 of int array
element: 3 found in position: 2 of int array
null char found! in position: 3
null char found! in position: 4
null char found! in position: 5
null char found! in position: 6
null char found! in position: 7
null char found! in position: 8
null char found! in position: 9
element: 69 found in position: 10 of int array
null char found! in position: 11
null char found! in position: 12
| 1 | | 2 | | 3 | | \0 | | \0 | | \0 | | \0 | | \0 | | \0 | | \0 | | 69 | | \0 | | \0 |
So why are the default cells set to the \0 value instead of being left empty, for example?
Shouldn't the null char appear just once, at the end of the entire array?
Thanks
There is no requirement in C that arrays need a \0 at the end. A NUL terminator is only needed for C strings (which usually have the element type char, wchar_t, or another character type). In a C string the \0 byte also doesn't have to be at the end of the array that contains it, but it must be at the end of the string part. It is perfectly valid to have 0's anywhere within an array. But if that array is used as a string, then the standard C string functions will interpret the 0 with the lowest index as the end of the string.
When you declare a variable (nums) in C with an initializer ({1,2,3}) in
int nums[13] = {1,2,3};
all indexes that aren't mentioned in the initializer (3 through 12) have their value initialized to 0. It is not possible to have 'empty' cells in an array. All cells will have a value, it is up to the program(mer) what values to consider empty.
C types correspond to memory, and memory has no real concept of "empty". There are languages where everything (or almost everything) can be made "empty" by assigning some "empty" constant (Python has None, for instance), but C doesn't allow that. One reason not to allow it is that it forces you to have a special universal pattern for the empty state, and this has low-level repercussions. For instance, an unsigned char on a typical platform can take any value from 0 to 255 inclusive, because it occupies 8 bits. If you also wanted an empty state without sacrificing possible character values, you'd need at least one more bit, since all 8 existing bits can be used for legitimate values, and this is undesirable for a lot of reasons.
For your array, the initialization syntax that you're using sets every unspecified element to zero. If you write:
char foo[4] = {1, 2, 3, 4};
then every element has a value (notice that it has no null byte at the end, because arrays don't need to have a null byte at the end; however, if you're using them as strings, then they very much should). If you write:
char foo[4] = {1, 2};
elements 0 and 1 have a specified value, but 2 and 3 don't, and with this syntax C will assume that you want to make them zero. On the other hand, if you write:
char foo[4];
you are not assigning any value to any element, and in this case (for a local variable) C will not initialize the array at all. It would be undefined behavior to read from it; in practice, the elements will usually hold whatever happened to be at their memory locations previously.
NULL is typically defined as (void*)0:
it is zero cast to a generic pointer.
Its numeric value is the same as the ASCII code of the NUL character (\0), namely 0, but NULL is a pointer and \0 is a character.
Arrays do not need to end with any special character/number.
Strings do need to end with a special character, and the reason is simple: it lets functions which operate on strings "know" where the string ends. For example:
char str[100] = {'h','e','l','l','o',0}; // same as {'h','e','l','l','o','\0'}
printf("%s",str);
prints:
hello
If the array contained no NUL at all, printf would keep reading past the 'o', and possibly past the end of the array, until it happened to find a zero byte, because there is no way for the function to know where the string ends. That is undefined behavior and usually shows up as garbage characters after "hello".
The explicit zero in the 6th cell is not strictly required here: when an initializer provides fewer values than the array has elements, C fills the remaining cells with zeroes. So initializing only the "hello" characters is fine too, as long as the array is larger than the text.
First of all, you are confusing C strings with regular arrays. With strings, there is always a \0 at the end of the char array. It signifies the end of the string. For example, say you have this:
char myText[] = "hello";
In this case, the array places look like this:
myText[0] = 'h';
myText[1] = 'e';
myText[2] = 'l';
myText[3] = 'l';
myText[4] = 'o';
myText[5] = '\0';
However, arrays do not terminate with '\0'. Take another example:
int myArray[3] = {1, 2, 3};
According to your rule, since arrays have to terminate with a '\0', this is not a legal statement since we only give the array 3 elements instead of 4, and we would need 4 elements to include a '\0'. However, this is a completely legal statement in C. Clearly, space for the '\0' is not needed in arrays, just at the end of C strings.
Also note that '\0' is equivalent to the integer 0, as Kninnug pointed out in the comments:
\0 (the null character) isn't the same as the NULL-pointer. \0 is a byte with all bits set to 0, which would always compare equal to the int 0.
So, in your program, you could just equally check if:
if(nums[i] == 0)
Now, let's prove why you are getting your output.
Shouldn't the null char be just once at the end of the entire array?
No. Any elements not mentioned in the initializer are initialized to zero. That is why you see this output: every element other than nums[0], nums[1], nums[2], and nums[10] is zero. Since you are checking for \0 (which is also 0), the check matches all of those elements.
As alk pointed out in the comments, the null character and the null pointer constant are different. At the end of C strings, you see the null character (NUL), which is '\0' or 0. However, the null pointer constant (NULL) is a different thing.
I need to initialize a char array with a loop and print it. Just like that:
int main( void )
{
char array[ 10 ];
for( int i = 1; i < 10; ++i ) {
array[ i - 1 ] = i;
}
// array[] contains numbers from 1 to 9 and an uninitialized subscript
printf( "%s", array );
}
I want to know if I need to put the '\0' character in array[ 9 ] or if it is already there.
In other words: once I declared char array[ 10 ]; does the last subscript contains '\0' ?
I searched for similar questions, and the best I could find is one where the array is filled with a loop, but right to the end, leaving no space for the terminating character.
Please tell me the truth.
In other words: once I declared char array[ 10 ]; does the last subscript contains '\0' ?
No.
You define a local variable and do not initialize it. Such variables are not initialized by default but hold indeterminate values.
If you want to have a defined value, you need to initialize or assign it:
char array[ 10 ] = "";
This will define an array with 10 elements.
As there is an initializer, the first element will be set to 0 (=='\0') due to the provided string literal.
Furthermore, all other elements will be set to 0 because you provide fewer initializer values than you have elements in your array.
once I declared char array[10]; does the last subscript contains '\0' ?
The answer is NO: when you define the array as an automatic variable (a local variable in a function), it is uninitialized. Hence none of its elements can be assumed to have any specific value. If you initialize the array, even partially, all elements will be initialized, either explicitly from the values provided in the initializer or implicitly to 0 if there are not enough initializers.
0 and '\0' are equivalent, they are int constants representing the value 0. It is idiomatic to use '\0' to represent the null byte at the end of a char array that makes it a C string. Note that '0' is a different thing: it is the character code for the 0 digit. In ASCII, '0' has the value 48 (or 0x30), but some ancient computers used to use different encodings where '0' had a different value. The C standard mandates that the codes for all 10 digits from 0 to 9 must be consecutive, so the digit n has the code '0' + n.
Note that the loop in your code sets the value of 9 elements of the array to non zero values, and leaves the last entry uninitialized so the array is not null terminated, hence it is not a C string.
If you want to use the char array as a C string, you must null terminate it by setting array[9] to '\0'.
Note also that you can print a char array that is not null terminated by specifying the maximum number of bytes to output as a precision field in the conversion specifier: %.9s.
Finally, be aware that array[0] = 1; does not set a valid character in the first position of array, but a control code that might not be printable. array[0] = '0' + 1; sets the character '1'.
#include <stdio.h>

int main(void) {
    char array[10];
    /* use the element number as the loop index: less error prone */
    for (int i = 0; i < 9; ++i) {
        array[i] = '0' + i + 1;
    }
    // array[] contains the digits '1' to '9' and an uninitialized last element
    printf("%.9s\n", array); // prints up to 9 bytes from `array`
    array[9] = '\0';
    // array[] contains the digits '1' to '9' and a null terminator, a valid C string
    printf("%s\n", array); // produces the same output.
    return 0;
}
once I declared char array[ 10 ]; does the last subscript contains '\0' ?
No. It's uninitialized. It has garbage values in every element from whatever code used the same piece of memory last.
No, array[9] does not already contain '\0'. With a plain char array[10]; you have to put it there yourself.
Why?
Because when an array (char, int, float) is uninitialized, it contains garbage values.
After a partial or full initialization, all remaining elements become 0 (or '\0' in the case of a char array), so only then do you not need to set the terminator explicitly.
Example:
char array[10];
All elements contain garbage values.
After initialization:
char array[10]={'a' ,'b'}; all other elements automatically become '\0'.
This is true for structures as well.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void print_reverse(char *s)
{
    size_t len=strlen(s);
    char *t=s+len-1;
    printf("%s %s\n",t,s);
    while(t>=s){
        printf("%c",*t);
        t=t-1;
    }
    puts("");
}

int main(){
    print_reverse("Hello");
}
Can anyone explain how char *t=s+len-1; and while(t>=s) work? I can't understand how a number can be added to a pointer and how the pointers are compared in the while loop. This program is for reversing a string in C.
Let's do this line by line:
print_reverse("Hello");
void print_reverse(char *s)
Now s points to a string that contains:
- - ----+----+----+----+----+----+----+---- - -
        | H  | e  | l  | l  | o  | \0 |
- - ----+----+----+----+----+----+----+---- - -
          ^
          s
That last character is called the string "NUL" terminator because "NUL" is the name of the character with ASCII value zero (non-printable ASCII characters have short abbreviated names such as NUL).
size_t len=strlen(s);
Now len has a value of five. Notice it does not include the "NUL" terminator so even though the string takes 6 bytes the length is five.
char *t=s+len-1;
Now t has a value of s+4. If you count the memory locations this is what you get:
- - ----+----+----+----+----+----+----+---- - -
        | H  | e  | l  | l  | o  | \0 |
- - ----+----+----+----+----+----+----+---- - -
          ^                   ^
          s                   t
Note that s+strlen(s) would point to the "NUL" terminator.
printf("%s %s\n",t,s);
That printf should print o Hello, because t (which points at the last character) is passed first and s second.
while(t>=s)
This while loop will continue as long as t>=s which means it will do the body of the loop for every character, including the one where s is pointing.
printf("%c",*t);
This prints the contents of the memory that t is pointing at. It starts with the o and continues backwards towards the H.
t=t-1;
That is the part that moves t backwards. Eventually t will be before s and then the loop will end. When the loop finishes it will look like this:
- - ----+----+----+----+----+----+----+---- - -
        | H  | e  | l  | l  | o  | \0 |
- - ----+----+----+----+----+----+----+---- - -
     ^    ^
     t    s
Then there is this one final line:
puts("");
That prints an empty string and a final linefeed - there wasn't a linefeed in the string but we needed one so this is a way to do that.
Pointer Arithmetic
When a pointer points into an array, adding integers to the pointer or subtracting integers from the pointer moves the pointer back and forth within the array.
This function should be passed a char *s that points to a string, which is an array of characters ending in a null character ('\0'). Then size_t len = strlen(s); sets len to the size of this string, and char *t = s+len-1; sets t to point to the last character before the null character.
Then, in the loop t=t-1; moves t backward.
Unfortunately, this loop uses t>=s as its control condition. This is intended to stop when t has been moved to the character before s, meaning it has gone back before the start point. However, the C standard only defines pointer arithmetic for elements within the array plus a special position at the end of the array. If this function is passed an s that points to the beginning of an array, then the loop will eventually make t point before the array, and the C standard does not define the resulting behavior.
Other Things to Know About Pointer Arithmetic
Any object may be treated as an array of one element. If you have some type T and some object T x;, you may set a pointer T *p = &x;, and then it is allowed to advance the pointer by one element, p = p+1;. Dereferencing that pointer with *p is not defined, but you can compare it, as in &x == p, or you can subtract one from it.
If print_reverse were passed a pointer into an array beyond the beginning, then its loop would be okay. However, that is not how it is used in the example code; print_reverse("Hello"); is not good code.
Any object may be treated as an array of characters. You can convert a pointer to any object to a pointer to unsigned char and then examine the bytes that make up an object. This is used for special purposes. You should not use it in general code while you are learning C, but you should be aware it exists.
I'm creating a simple program to see how a string populates an array.
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <stdlib.h>

int main(void)
{
    char string1[100];
    int i;
    printf("Enter sentence.\n");
    fgets(string1, 100, stdin);
    for (i=0; i<=15; i++)
        puts(&string1[i]);
    return 0;
}
I'm having a bit of a problem understanding how the string is populating an array. My expectation is that the string will be completely stored in string1[0] and any further indexes will come up blank. However, when I throw the loop to see if my assumption is true, it turns out that every index has been filled in by the string. Am I misunderstanding how the string is filling the array?
For the string "Hello!", the memory representation would be something like this
+-------+-------+-------+-------+-------+-------+-------+
| 'H' | 'e' | 'l' | 'l' | 'o' | '!' | '\0' |
+-------+-------+-------+-------+-------+-------+-------+
The first cell, at index 0, contains the first character. And each subsequent character is contained in a cell with an increasing index.
Library functions like puts expect you to pass the address of the first character, and then they read the string up to \0.
So if you pass simply string1 or &string1[0], it will resolve to the address of 'H'.
If you pass &string1[1], it will resolve to the address of 'e', and the library function will think that is the first character, because that's the contract C strings are designed with.
Your problem is not the layout of string1 per se but how puts interprets it. Strings are represented by char arrays in C, and their end is marked by a null terminator (a character with code 0):
S e n t e n c e \0
^         ^
string1   &string1[5]
&string1[5] is a pointer to a single character, but since that character is not the null terminator, the memory starting there is interpreted as a string and nce gets printed.
You'll need to use putchar (or putc with an explicit stream) and access individual characters:
putchar(string1[i])
The string is not stored in string1[0]; only the string's first character is stored at string1[0], and the string starts at (string1+0). Here, &string1[0] or (string1+0) can be seen as a pointer to the C string string1.
In that sense, every valid index i of string1 gives you a valid pointer (string1 + i) which points to some part of the C string string1.
In the last for loop you are printing the suffixes of string1 which are pointed to by (string1 + 0), (string1 + 1), (string1 + 2)...
The following code works but I don't quite understand how if (*s == 0) works.
It checks if the string is 0?
Also for return(isnumber(s+1)) what is the logic behind that?
I know s is a string but I can just pass s+1 into a function? How does it even know what character I'm looking for?
int isnumber(char *s) {
    if (*s == 0) {
        return 1; /* Reached end, we've only seen digits so far! */
    }
    if(!isdigit(*s)) {
        printf("The number is invalid\n");
        return 0; /* first character is not a digit, so no go */
    }
    return(isnumber(s+1));
}

int main () {
    char inbuf[LENGTH];
    int i, j;
    printf("Enter a string > ");
    fgets(inbuf, LENGTH-1, stdin); // ignore carriage return
    inbuf[strlen(inbuf)-1] = 0;
    j = isnumber(inbuf);
    ....
}
This is a recursive function that checks whether a string consists only of digits. To understand how the code works, you must understand how C stores strings. If you have the string "123", C stores this string in memory like this:
|-----------------------------------|
| 0x8707 | 0x8708 | 0x8709 | 0x870A |
|--------|--------|--------|--------|
| | | | |
| '1' | '2' | '3' | '\0' |
|-----------------------------------|
What C does is break your string up into characters, store them at some arbitrary location in memory, and add a null character (\0) (ASCII 0) to the end of the string. This null character is how C knows where the string ends.
Your isnumber() function takes a char *s as a parameter. This is called a pointer. Internally, what's going on is that your main() function calls isnumber() and actually passes in the address of your string, not the string itself. This is important:
j = isnumber(inbuf);
The compiler interprets this as: call isnumber(), pass along the address of inbuf, and assign the return value to j.
Now back up at the isnumber() function: it's receiving the address of inbuf and assigning it to s. By placing an asterisk (*) in front of s, you are doing something called dereferencing s. Dereferencing means you want the value contained at the address held in s. So the line that says if (*s == 0) is basically saying "if the value contained at the address in s is equal to 0". Remember that earlier I told you that, in memory, strings always have a terminating null (\0) character? This is how your function knows to end and return.
The next thing to understand is pointer arithmetic. In C, sizeof(char) is 1 by definition, so a char always occupies exactly one byte. When you write (s+1), you are telling the computer to take the memory address held in s and add the size of one char to it. So if s points to 0x8707, then (s+1) is 0x8708 and *(s+1) is the '2' in our string (see my memory block diagram above). Note that s+1 does not change s itself; the recursive call simply receives the new address as its own s. This is how we iterate through each character in the string.
Hopefully this clears up the confusion!
The statement if (*s == 0) checks to see if the char s points to is zero. In other words, it checks to see if s is a zero-length string and returns 1 if so.
The statement return (isnumber(s+1)) passes s+1, which points to the second char in the string, to isnumber(). That recursive call returns 1 if the rest of the string, starting at s[1], consists only of digits.
In C, strings are terminated with a null character.
(*s == 0) is checking for the null terminator.
This code is a little weirder.
return(isnumber(s+1));
Since the current character is a digit, keep going...call the function again starting at the NEXT character. This is a recursive function call and there is really no need when iteration would be simpler.
I have a piece of code that has the following lines:
char* someString = "Blah";
char* someOtherString = someString;
if (someString) {
    while (*someOtherString) {
        printf("%d\n",someOtherString);
        ++someOtherString;
    }
    --someOtherString;
}
This prints out:
4206692
4206693
4206694
4206695
How does it even work? I get that it goes through the if statement as true because someString can be interpreted as a true condition, but how does it exit the while loop?
When I type the while loop by itself, like so, the loop runs forever:
while (*someOtherString) {
printf("%d\n",someOtherString);
++someOtherString;
}
Thanks for the help in advance!
The while loop exits when it finds the NUL character at the end of the "string".
How it works:
In C, a string constant written inside double quotes is a null-terminated string. Memory equal to the number of characters inside the double quotes + 1 (for the trailing NUL character) is allocated. Each of the characters is stored sequentially in the memory block, and a NUL character is automatically placed at the end of the memory block.
Thus "Blah" is stored in memory as
B l a h \0
Initially someOtherString and someString both point to B.
someOtherString is incremented to point to the next character in each iteration of the while loop. So after 4 iterations it ends up pointing to the null character.
A while loop running on uninitialised data could run for a large number of iterations, as there is no guarantee that it will encounter a NUL character (byte = 0x00).
This is just a technique to scan through a string.
When you write char *string1 = "hello", the starting address of the string is stored in the variable string1, not the string itself. One common practice is to assign NULL to a pointer when it is not in use, instead of leaving an old address in it that is no longer valid. Therefore, if at some point we did string1 = NULL, then if (string1) would be false, because NULL evaluates to 0.
            a1    a2    a3    a4    a5    a6
          +-----+-----+-----+-----+-----+-----+
string1 = |  h  |  e  |  l  |  l  |  o  | \0  |
          +-----+-----+-----+-----+-----+-----+
Whereas when we do *string1, it is basically of the form *(string1 + x) with x equal to 0. In general, for *(string1 + x), x elements (whose size depends on the type of the pointer string1) are skipped first, and then * dereferences the location x elements away from the address stored in string1.
            a1    a2    a3    a4    a5    a6
          +-----+-----+-----+-----+-----+-----+
string1 = |  h  |  e  |  l  |  l  |  o  | \0  |
          +-----+-----+-----+-----+-----+-----+
                               ^
                               |
                 +-------------+
                 |
          *(string1 + 3) same as string1[3]
Therefore doing *string1 will fetch the value pointed by the address stored in string1.
Therefore
if (string1)
{
    while (*string1)
    {
        count++;
        string1++;
    }
}
This will enter the if if and only if the address stored in string1 is not NULL, i.e. some valid address is assigned (if we follow the convention of assigning NULL to unused pointers). *string1 will be true as long as the address stored in string1 in a given iteration points to a non-zero value. C strings are terminated with a NUL character, which is '\0' and has the ASCII value 0. In each iteration we do string1++, which increments the stored address, so that after each increment string1 points to the next adjacent element (first a1, then a2, etc.). When it points to the address where '\0' is stored, *string1 will be 0, so while (*string1) is false and the loop ends.
Your string literal is stored in memory like:
{'B', 'l', 'a', 'h', '\0'}
...where '\0' is the null character (i.e. a character with a value of 0 when interpreted as an integer).
So eventually (after 4 increments), *someOtherString will evaluate to 0, which will be interpreted as false and end the loop.
I'm not sure how you're getting the infinite loop. Your cut-down version works just fine in ideone: http://ideone.com/4fTDJb