Null terminated C character arrays - c

1. Which of the following has a null terminator character added at the end?
int main()
{
char arr[]="sample";
char arr2[6]="sample";
char arr3[7]="sample";
char* strarr="sample";
char* strarr1=arr;
char* strarr2=arr2;
char* strarr3=arr3;
return 0;
}
2. Would printf("%s",somestr) fail in case:
somestr is an array of char with no null termination character at end?
somestr is a char* pointing to a continuous location of chars with no null termination character at end?
Edit : Is there a way I can check in gdb if a char* or a char array is null terminated or not?

First, "sample" is called a string literal. It denotes a char array terminated with a null character. In C the array is not const-qualified, but it must be treated as read-only.
Let us go on:
char arr[]="sample";
The right-hand side is a char array of size 7 (6 characters and a '\0'). The dimension of arr is deduced from its initializer and is also 7. The char array is then initialized from the string literal.
char arr2[6]="sample";
arr2 has a declared size of 6. It is initialized from a string literal of size 7: only the 6 declared positions are initialized to {'s', 'a', 'm', 'p', 'l', 'e'} with no terminating null. Nothing is wrong here, except that passing arr2 to a function that expects a null terminated string invokes Undefined Behaviour.
char arr3[7]="sample";
The declared size and the size of the initializing string literal are both 7: it is just an explicit version of the first use case. Rather dangerous, because if you later add one character to the initialization string you will get a char array that is not null terminated.
char* strarr="sample";
Avoid that. You are declaring a non-const char pointer to a string literal, while the standard states explicitly:
If the program attempts to modify such an array, the behavior is undefined.
strarr[3] = 'i' would then invoke Undefined Behaviour with no warning. That being said and provided you never modify the string, you have a nice null terminated string.
char* strarr1=arr;
Ok, you declare a pointer to another string. Or more exactly a pointer to the first character of another string. And it is correctly null terminated.
char* strarr2=arr2;
You have a pointer to the first character of a char array that is not null terminated... You could not pass arr2 to a function expecting a null terminated char array, and you cannot pass strarr2 either.
char* strarr3=arr3;
You have another pointer pointing to a string. Same behaviour as strarr1.
As for how to check in gdb for the terminating null: you cannot see it by printing the string, because gdb knows enough of C strings to automatically stop printing a string on the first null character. But you can always print a single element: p arr[6] shows the last element of arr, which is 0 when the array is null terminated.
For arr2, arr2+6 is one past the array. So it is undefined what lies there, and on a truly bad system, using p arr2[6] could raise a signal because it could be after the end of a memory segment - but I must admit that I have never seen that...

Each of arr and arr3 contains a null terminated string allocated on the stack when the function is called.
strarr points to a null terminated string allocated in the read-only data section of the program.
strarr1 points to a null terminated string allocated on the stack when the function is called.
strarr3 points to a null terminated string allocated on the stack when the function is called.
str points to the same string as strarr1.

Related

Does directly assigning a string of char's to a char pointer on initialization automatically add a null terminator?

For example in this code:
char *ptr = "string";
Is there a null terminator stored at ptr[6]?
When I test this and print a string, it prints "string", and if I print the ptr[6] char I get ''. I wanted to test this further so I did some research and found someone saying that strlen will always crash if there is not a null terminator. I ran this in my code and it returned 6, so does this mean that assigning a string to a char pointer initializes with a null terminator or am I misunderstanding what's happening?
Yes. String literals used as pointers will always end in a NUL byte. String literals used as array initializers will too, unless you specify a length that's too small for it to (e.g., char arr[] = "string"; and char arr[7] = "string"; both will, but char arr[6] = "string"; won't).

Accessing null-terminated character

Inspired by this question.
Code:
#include <stdio.h>
int main()
{
char arr[] = "Hello";
char *ptr = arr + 5;
printf("%s\n",ptr);
}
In the above code, I have accessed the null terminator character.
So, what actually happens when accessing the null terminator of a string? Is it undefined behaviour?
Essentially, you're passing an empty string as the argument, so it should be treated as such.
For %s conversion specifier, with printf() family
[...]Characters from the array are
written up to (but not including) the terminating null character.[...]
In your case, the null terminator happens to appear at the first element in the array, that's it.
Just for clarification, accessing a null-terminator is OK, accessing a NULL pointer is not OK, and they both are different things!!
You are basically still accessing a null-terminated string.
It is just zero characters long, i.e. it does not contain anything to print.
Your code is basically the same as
printf("");
Compare this, not duplicate but similar question:
Effect of "+1" after the format string parameter to printf()
Nothing particular. A pointer to the null character is interpreted as a zero-length string by functions that expect a string.

Why can I store a string in the memory address of a char?

I'm starting to understand pointers and how to dereference them etc. I've been practising with ints but I figured a char would behave similarly. Use the * to dereference, use the & to access the memory address.
But in my example below, the same syntax is used to set the address of a char and to save a string to the same variable. How does this work? I think I'm just generally confused and maybe I'm overthinking it.
int main()
{
char *myCharPointer;
char charMemoryHolder = 'G';
myCharPointer = &charMemoryHolder;
printf("%s\n", myCharPointer);
myCharPointer = "This is a string.";
printf("%s\n", myCharPointer);
return 0;
}
First, you need to understand how "strings" work in C.
"Strings" are stored as an array of characters in memory. Since there is no way of determining how long the string is, a NUL character, '\0', is appended after the string so that we know where it ends.
So for example if you have a string "foo", it may look like this in memory:
--------------------------------------------
| 'f' | 'o' | 'o' | '\0' | 'k' | 'b' | 'x' | ...
--------------------------------------------
The things after '\0' are just stuff that happens to be placed after the string, which may or may not be initialised.
When you assign a "string" to a variable of type char *, what happens is that the variable will point to the beginning of the string, so in the above example it will point to 'f'. (In other words, if you have a string str, then str == &str[0] is always true.) When you assign a string to a variable of type char *, you are actually assigning the address of the zeroth character of the string to the variable.
When you pass this variable to printf(), it starts at the pointed address, then goes through each char one by one until it sees '\0' and stops. For example if we have:
char *str = "foo";
and you pass it to printf(), it will do the following:
Dereference str (which gives 'f')
Dereference (str+1) (which gives 'o')
Dereference (str+2) (which gives another 'o')
Dereference (str+3) (which gives '\0' so the process stops).
This also leads to the conclusion that what you're currently doing is actually wrong. In your code you have:
char charMemoryHolder = 'G';
myCharPointer = &charMemoryHolder;
printf("%s\n", myCharPointer);
When printf() sees the %s specifier, it goes to the address pointed to by myCharPointer, which in this case contains 'G'. It will then try to get the next character after 'G', which is undefined behaviour. It might give you the correct result every now and then (if the next memory location happens to contain '\0'), but in general you should never do this.
Several comments
Static strings in C are treated as a (char *) to a null-terminated array of characters. E.g. "ab" would essentially be a char * to a block of memory with 97 98 0. (97 is 'a', 98 is 'b', and 0 is the null termination.)
Your code myCharPointer = &charMemoryHolder; followed by printf("%s\n", myCharPointer) is not safe. printf should be passed a null-terminated string, and there's no guarantee that the memory immediately following your character charMemoryHolder contains the value 0.
In C, string literals evaluate to pointers to read-only arrays of chars (except when used to initialize char arrays). This is a special case in the C language and does not generalize to other pointer types. A char * variable may hold the address of either a single char variable or the start address of an array of characters. In this case the array is a string of characters which has been stored in a static region of memory.
charMemoryHolder is a variable that has an address in memory.
"This is a string." is a string constant that is stored in memory and also has an address.
Both of these addresses can be stored in myCharPointer and dereferenced to access the first character.
In the case of printf("%s\n", myCharPointer), the pointer will be dereferenced and the character displayed, then the pointer is incremented. It repeats this until it finds a null (value zero) character and stops.
Hopefully you are now wondering what happens when you are pointing to the single 'G' character, which is not null-terminated like a string constant. The answer is "undefined behavior" and will most likely print random garbage until it finds a zero value in memory, but could print exactly the correct value, hence "undefined behavior". Use %c to print the single character.

2-D character array

#include<stdio.h>
void main()
{
char a[10][5] = {"hi", "hello", "fellow"};
printf("%s",a[0]);
}
Why this code printing only hi
#include<stdio.h>
void main()
{
char a[10][5] = {"hi", "hello", "fellow"};
printf("%s",a[1]);
}
While this code is printing "hellofellow"
Why this code printing only hi
You've told printf to print the string stored at a[0], and that string happens to be "hi".
While this code is printing "hellofellow"
This one is by coincidence, in fact your code ought to be rejected by the compiler due to a constraint violation:
No initializer shall attempt to provide a value for an object not contained within the entity being initialized.
The string "fellow", specifically the 'w' at the end of it does not fit within the char[5] being initialised, and this violates the C standard. Perhaps also by coincidence, your compiler provides an extension (making it technically a non-C compiler), and so you don't see the error messages that I do:
prog.c:3:6: error: return type of 'main' is not 'int' [-Werror=main]
void main()
^
prog.c: In function 'main':
prog.c:5:37: error: initializer-string for array of chars is too long [-Werror]
char a[10][5] = {"hi", "hello", "fellow"};
^
Note that the second error message is complaining about "fellow", but not "hello". Your "hello" initialisation is valid by exception:
An array of character type may be initialized by a character string literal or UTF-8 string literal, optionally enclosed in braces. Successive bytes of the string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
The emphasis is mine. What the emphasised section states is that if there isn't enough room for a terminal '\0' character, that won't be used in the initialisation.
Your code:
char a[10][5] = {"hi", "hello", "fellow"};
Allocates 10 char [5]
"hello" takes up 5 so there is no room for the terminating \0, so it runs into "fellow"
If you try it, a [3] should be "w" because "fellow" is too big and the "w" runs over from a[2] to a[3]
Aside from being undefined behavior, it is confusing what you were trying to do
It gives undefined behaviour, because strings must be null-terminated.
And element hello has length of 5.
Declare your array as a[10][7] then you will get intended output.
See here -https://ideone.com/c2zUs0
Why this code printing only hi
Because a[0][2] is null indicating termination thus giving you hi.
This is undefined behavior due to insufficient space to store \0 character.
Please note that the memory allocated is 5 bytes per string in your array of strings. Thus, for a[1] there is not sufficient memory to store the \0 character, as all five bytes are taken up by "hello".
Thus, the subsequent memory is read until the \0 character is found.
Thus, you can change the line:
char a[10][5] = {"hi", "hello", "fellow"};
to
char a[][7] = {"hi", "hello", "fellow"};
Why this code printing only hi
This is because the \0 character is already encountered at a[0][2] and thus the reading of the characters is stopped.
What Your Code Does:
Look at the following statement:
char a[10][5] = {"hi", "hello", "fellow"};
It allocates 10 rows. 5 characters are allocated for each index of a.
What is the Problem:
Strings are null terminated: a null terminator always needs to be stored in addition to the given characters, so the size actually used is numOfCharacters+1, the extra byte being for the null terminator. When you initialize the array with exactly size characters, the null terminator is skipped. Normally the character array is printed up to the first \0 (null terminator). Please also have a look at this.
The Solution:
No need to worry about this problem, all you need to do is just to set the size equal to the numOfCharactersInString + 1. You can use the following statement:
char a[10][7] = {"hi", "hello", "fellow"};
Since the largest string is "fellow" which contains 6 characters, you need to set the size 6 + 1 that is why the statement should use char a[10][7] instead of char a[10][5]
Hope it helps.
When you declare a 2-D character array as
char a[10][5] = {"hi", "hello", "fellow"};
char a[10][5] reserves memory to store 10 strings each of length 5 which means 4 characters + 1 '\0' character. A point to note is that the array elements are stored in contiguous memory locations.
a[0] points to the first string, a[1] to the second and so on.
Also when you initialize an array partially the other uninitialized elements become 0 instead of being garbage values.
Now in your case, after initialization, if you try to visualize the array it would be something like
hi\0\0\0hellofello\0\0...
Now the command
printf("%s",a[0]);
prints characters starting from 'h' of "hi" and stops printing when a '\0' is encountered so "hi" is printed.
Now for the second case,
printf("%s",a[1]);
characters are printed starting from the 'h' of "hello" till a '\0' is encountered. Now the '\0' character is encountered only after printing "hellofello" and hence the output.

Arguments passed to puts function in C

I have only recently started learning C. I was going through the concept of arrays and pointers, when I came across a stumbling block in my understanding of it.
Consider this code -
#include<stdio.h>
int main()
{
char string[]="Hello";
char *ptr;
ptr=string;
puts(*ptr);
return(0);
}
It compiles, but runs into segmentation fault on execution.
The warning that I get is:
type error in argument 1 to `puts'; found 'char' expected 'pointer to char'
Now *ptr does return a character "H" and my initial impression was that it would just accept a char as an input.
Later, I came to understand that puts() expects a pointer to a character array as its input, but my question is: when I pass something like puts("H"), isn't that the same thing as puts(*ptr), given that *ptr does contain the character "H"?
"H" is a string literal that consists of 2 bytes 'H' and '\0'. Whenever you have "H" in your code, a pointer to the memory region with 2 bytes is meant. *ptr simply returns a single char variable.
By doing puts(*ptr), you're dereferencing the ptr variable. This would then try to use the 'H' character as a memory address (since that's what ptr points to), then segfault since it will be an invalid pointer (it will probably fall outside your process' memory). This is because the puts function accepts a pointer as an argument.
What you really want is puts(ptr).
As an aside, the latter example puts("h") populates the string table with "h" at compile time and replaces the definition there with an implicit pointer.
The puts() function takes a pointer to a string and what you are doing is specifying a single character.
Take a look at this Lesson 9: C Strings.
So rather than doing
#include<stdio.h>
int main()
{
char string[]="Hello";
char *ptr;
ptr=string; // store address of first character of the char array into char pointer variable ptr
// ptr=string is same as ptr=&string[0] since string is an array and an
// array variable name is treated like a constant pointer to the first
// element of the array in C.
puts(*ptr); // get character pointed to by pointer ptr and pass to function puts
// *ptr is the same as ptr[0] or string[0] since ptr = &string[0].
return(0);
}
You should instead be doing
#include<stdio.h>
int main()
{
char string[]="Hello";
char *ptr;
ptr=string; // store address of first character of the char array into char pointer variable ptr
puts(ptr); // pass pointer to the string rather than first character of string.
return(0);
}
Whenever you read a string with gets or display it with puts, you have to pass the address of the string, not a character from it.
For example:
char name[] = "Something";
To print it with printf you write printf("%s",name); here name evaluates to the address of the first character of "Something".
To display it with puts you write puts(name); again, the address is what goes in the argument.
No.
'H' is the character literal.
"H" is, in effect, a character array with two elements, those being 'H' and the terminating '\0' null byte.
puts expects a pointer to a string as input, that is, a memory address. In your example you passed the content of that memory instead, which is *ptr. *ptr is the content of the memory at address ptr, which is 'H'.
ptr is a memory address.
*ptr is the content of that memory.
The parameter of puts has an address type, but you provided a char (the content of the address).
puts prints character by character, starting at the address you give it, until it reaches a byte containing 0, and then it stops.
