Shouldn't it be impossible to point directly to text in C? - c

I am learning C and I came across the pointers.
Even though I learned more with this tutorial than from the textbook I still wonder about the char pointers.
If I program this
#include <stdio.h>
int main()
{
char *ptr_str;
ptr_str = "Hello World";
printf(ptr_str);
return 0;
}
The result is
Hello World
I don't understand how there isn't an error while compiling since the pointer ptr_str is pointing directly to the text and not to the first character of the text. I thought that only this would work
#include <stdio.h>
int main()
{
char *ptr_str;
char var_str[] = "Hello World";
ptr_str = var_str;
printf(ptr_str);
return 0;
}
So in the first example how was I pointing directly to the text?

Your code works because string literals are essentially static arrays.
ptr_str = "Hello World";
is treated by the compiler as if it were
static char __tmp_0[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0' };
ptr_str = __tmp_0;
(except trying to modify the contents of a string literal has undefined behavior).
You can even apply sizeof to a string literal and you'll get the size of the array: sizeof "Hello" is 6, for example.
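The array view of a literal can be checked directly. This is a small sketch; the helper names literal_array_size and literal_length are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: size in bytes of the array behind "Hello",
   including the terminating '\0'. */
size_t literal_array_size(void)
{
    return sizeof "Hello";
}

/* Hypothetical helper: length of the same literal viewed as a string,
   not counting the terminator. */
size_t literal_length(void)
{
    const char *p = "Hello";   /* the array decays to a pointer to 'H' */
    assert(*p == 'H');
    return strlen(p);
}
```

The first helper reports the array size (6), the second the string length (5); the difference is the terminating '\0'.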

In the context of assignment to a char pointer, the 'value' of a string literal is the address of its first character.
So
ptr_str = "Hello World";
sets ptr_str to the address of the 'H'.

Why won't the first one work? It will work as you have seen.
String literals are arrays. From §6.4.5p6 C11 Standard N1570
The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.
In the first case the literal's array decays into a pointer to its first element, so the decayed pointer points to 'H'. You assigned that pointer to ptr_str. printf then receives ptr_str as its format string and, because the text contains no conversion specifiers, prints every character until it reaches the '\0'. (The safer call is printf("%s", ptr_str), where the char * is the argument for %s.) That is all that happened, and it is how you ended up "pointing directly to the text".
Note that the second case is quite different from the first: there a copy is made, and the copy can be modified (trying to modify the literal in the first case would be undefined behavior). We are initializing a char array with the contents of the string literal.
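To make the "copy" point concrete, here is a hedged sketch. The function copy_and_modify is a hypothetical helper invented for this example; it changes the array copy and leaves the literal alone.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: builds a modified copy of "Hello World" in dst.
   Returns 0 on success, -1 if dst is too small (12 bytes are needed). */
int copy_and_modify(char *dst, size_t dst_size)
{
    char var_str[] = "Hello World";   /* array initialized with a copy of the literal */

    assert(dst != NULL);
    if (dst_size < sizeof var_str)
        return -1;

    var_str[0] = 'J';                 /* fine: var_str is our own array */
    memcpy(dst, var_str, sizeof var_str);
    return 0;
}
```

Writing through a char * that points at the literal itself would instead be undefined behavior.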

Related

Are C constant character strings always null terminated?

Are C constant character strings always null terminated without exception?
For example, will the following C code always print "true":
const char* s = "abc";
if( *(s + 3) == 0 ){
printf( "true" );
} else {
printf( "false" );
}
A string is only a string if it contains a null character.
A string is a contiguous sequence of characters terminated by and including the first null character. C11 §7.1.1 1
"abc" is a string literal. It also always contains a null character. A string literal may contain more than 1 null character.
"def\0ghi" // 2 null characters.
In the following, though, x is not a string (it is an array of char without a null character). y and z are both arrays of char and both are strings.
char x[3] = "abc";
char y[4] = "abc";
char z[] = "abc";
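One way to tell these arrays apart at run time is to search for the terminator within the array bounds. The helper has_terminator below is an assumption made for illustration, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes of buf contain a '\0',
   i.e. if buf holds a string that ends inside the array. */
int has_terminator(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}
```

Called as has_terminator(x, sizeof x), it returns 0 for the 3-element x and 1 for y and z.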
With OP's code, s points to a string, the string literal "abc", and *(s + 3) and s[3] have the value 0. Attempting to modify s[3] is undefined behavior because 1) s is a const char * and 2) the data pointed to by s is a string literal, and attempting to modify a string literal is undefined behavior.
const char* s = "abc";
Deeper: C does not define "constant character strings".
The language defines a string literal, like "abc" to be a character array of size 4 with the value of 'a', 'b', 'c', '\0'. Attempting to modify these is UB. How this is used depends on context.
The standard C library defines string.
With const char* s = "abc";, s is a pointer to data of type char. As a const some_type * pointer, using s to modify data is UB. s is initialized to point to the string literal "abc". s itself is not a string. The memory s initially points to is a string.
In short, yes. A string constant is of course a string and a string is by definition 0-terminated.
If you use a string constant as an array initializer like this:
char x[5] = "hello";
you won't have a 0 terminator in x simply because there's no room for it.
But with
char x[] = "hello";
it will be there and the size of x is 6.
A string is defined as a sequence of characters terminated by a zero character. It does not matter whether the sequence is modifiable, that is, whether the corresponding declaration has the qualifier const.
For example, string literals in C have non-const character array types. So you may write, for example,
char *s = "Hello world";
In this declaration the identifier s points to the first character of the string.
You can initialize a character array yourself by a string using a string literal. For example
char s[] = "Hello world";
This declaration is equivalent to
char s[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };
However in C you may exclude the terminating zero from an initialization of a character array.
For example
char s[11] = "Hello world";
Though the string literal used as the initializer contains the terminating zero, it is excluded from the initialization. As a result, the character array s does not contain a string.
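An array like this can still be printed safely if the number of characters is passed explicitly. The sketch below uses the standard "%.*s" precision form; format_chars is a hypothetical wrapper written only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats exactly n characters of a possibly
   unterminated array into out. Returns what snprintf returns. */
int format_chars(char *out, size_t out_size, const char *chars, size_t n)
{
    assert(out != NULL && chars != NULL);
    /* With a precision, %s reads at most n bytes, so chars needs no '\0'. */
    return snprintf(out, out_size, "%.*s", (int)n, chars);
}
```

For the 11-element s, format_chars(out, sizeof out, s, sizeof s) produces the terminated string "Hello world" in out.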
In C, there isn't really a "string" datatype like in C++ and Java.
Important principle that every competent computer science degree program should mention: Information is symbols plus interpretation.
A "string" is defined conventionally as any sequence of characters ending in a null byte ('\0').
The "gotcha" that's being posted (character/byte arrays with the value 0 in the middle of them) is only a difference of interpretation. Treating a byte array as a string versus treating it as bytes (numbers in [0, 255]) has different applications. Obviously if you're printing to the terminal you might want to print characters until you reach a null byte. If you're saving a file or running an encryption algorithm on blocks of data you will need to support 0's in byte arrays.
It's also valid to take a "string" and optionally interpret as a byte array.
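The two interpretations give different lengths for the same bytes. A minimal sketch, with made-up helper names, using a literal that has a zero in the middle:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 8 bytes: 'd' 'e' 'f' '\0' 'g' 'h' 'i' '\0' */
static const char data[] = "def\0ghi";

/* Interpreted as a string: stops at the first '\0'. */
size_t length_as_string(void)
{
    return strlen(data);
}

/* Interpreted as a byte array: every element counts. */
size_t length_as_bytes(void)
{
    return sizeof data;
}

/* Hypothetical accessor with a bounds check. */
char byte_at(size_t i)
{
    assert(i < sizeof data);
    return data[i];
}
```

Here the string view sees 3 characters while the byte view sees 8, and the bytes after the first zero are still there.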

C String manipulation pointer vs array notation [duplicate]

Why does the first version make the program crash, while the second one doesn't? Aren't they the same thing?
Pointer Notation
char *shift = "mondo";
shift[3] = shift[2];
Array Notation
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
shift[3] = shift[2];
MWE
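int main( void )
{
/* Pointer notation: writes to a string literal (undefined behavior, crashes here) */
char *shift_ptr = "mondo";
shift_ptr[3] = shift_ptr[2];
/* Array notation: writes to a modifiable array */
char shift_arr[] = {'m', 'o', 'n', 'd', 'o', '\0'};
shift_arr[3] = shift_arr[2];
return 0;
}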
No! This is one of the important issues in C. In the first, you create a pointer to memory that is typically read-only, i.e. you cannot change it, only read it. The second makes an array of characters, i.e. a contiguous block of memory to which you have both read and write access, meaning you can both read and change the values of the array.
The first one points to a string literal (usually stored in a read-only section of the program; it should really be const char *, but you can get away with char * for historical reasons).
The second one creates an array and then populates that array.
Therefore they are not the same.
The first typically places the characters in a read-only section of the executable (such as .rodata, or the .text segment on some platforms), while the second allocates a modifiable array: on the stack for a local variable, or in the initialized data section for a static one. Memory in a read-only section is, effectively, const:
char *string = "AAAA";
This creates what is effectively a const char *, since the string literal's storage will typically be marked read-only; an attempt to write to it will generate an access violation or segmentation fault.
You want to do this:
char string[] = "AAAA";
This will work as expected: it allocates an array holding four capital As plus the terminating '\0', and in most expressions the name string converts to a pointer to its first element.
This creates a pointer to an existing string:
char *shift = "mondo";
This creates a new array of characters:
char shift[] = {'m', 'o', 'n', 'd', 'o', '\0'};
In the second case, you are allowed to modify the characters because they are the ones that you just created.
In the first case, you are just pointing to an existing string, which should never be modified. The details of where the string is stored are up to the particular compiler. For example, it can store the string in unmodifiable memory. The compiler is also allowed to do tricks to save space. For example:
char *s1 = "hello there";
char *s2 = "there";
s2 might actually point to the same letter 't' that is at the seventh position of the string that s1 points to.
To avoid confusion, prefer to use pointers to const with string literals:
const char *shift = "mondo";
This way, the compiler will let you know if you accidentally try to modify it.
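When a modifiable version is needed, one option is to keep the literal behind a const char * and work on a copy. This is a sketch under that assumption; modifiable_copy is a hypothetical helper, not a standard function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s that the caller may modify
   and must release with free(). Returns NULL if allocation fails. */
char *modifiable_copy(const char *s)
{
    assert(s != NULL);
    size_t n = strlen(s) + 1;      /* include the terminator */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

With char *w = modifiable_copy("mondo");, the assignment w[3] = w[2]; is well defined and turns the copy into "monno", while the literal stays untouched.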
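Whenever you define a string using
char * str = "hello";
the compiler treats the literal roughly as if you had written
const char * str= "hello";
so the characters typically go to a read-only location of program memory (the declared type of str is still char *, which is why the compiler does not have to complain).
In the case of an array, the array name behaves like a constant pointer, roughly
char *const array;
That's why the compiler complains when the user tries to change the base address of the array, while the elements themselves stay modifiable.
This is done implicitly by the compiler.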

How to terminate a character pointer at a certain location in c?

I'm trying to terminate a character pointer in c, at a specific location by setting the null terminator to it.
For example, if I have a char pointer
char *hi="hello";
I want it to be "hell" by setting the o to null.
I have tried doing this with strcpy with something like
strcpy(hi+4, "\0");
But it is not working.
"hello" is a string literal, so it cannot be modified, and in your code hi points to the first element of that literal. Any attempt to modify the thing it points to is undefined behaviour.
However, if you create your own char array, you can insert a null terminator at will. For example,
char hi[] = "hello"; // hi is array with {'h', 'e', 'l', 'l', 'o', '\0'}
hi[4] = '\0';
Here, hi is a length 6 array of char which you own and whose contents you can modify. After setting the 5th element, it contains {'h', 'e', 'l', 'l', '\0', '\0'}, and printing it would yield hell.
Point 1:
In your code
char *hi="hello";
hi is a pointer to a string literal, which may not be modifiable. You have to use a char array instead and initialize it with the same string literal. Then you can modify the contents of that array as you want.
Point 2:
You don't need strcpy() to copy a single char. You can simply assign the value using the assignment operator =.
Note: You don't terminate a pointer, you terminate a char array with a null terminator to make it a string.
If the string is a literal you can't modify it. Otherwise:
To terminate a C string after 4 characters you could use:
*(he+4) = 0;
or
he[4] = 0;
he[4] = '\0';
or, since strcpy() copies all the characters specified and then appends a '\0' character:
strcpy(he+4, "");
but this is rather obfuscated.
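Put together, truncation can be wrapped in a small function that refuses to write past the current end of the string. The name truncate_at is an assumption made for this sketch, and the caller must pass modifiable storage, never a string literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shortens the string in s to at most n characters.
   s must point to modifiable storage. Returns the resulting length. */
size_t truncate_at(char *s, size_t n)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (n < len) {
        s[n] = '\0';   /* same effect as *(s + n) = 0 */
        len = n;
    }
    return len;
}
```

With char hi[] = "hello";, calling truncate_at(hi, 4) leaves "hell" in hi; a larger n leaves the string unchanged.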

In a C function what's the difference between char and string?

I'm using a template from my teacher and at the beginning of the code it says:
#include "lab8.h"
void main(void)
{
int response;
int count;
string words[MAX_COUNT];
Later on in the function, a whole lot of words get put inside the words string. So I was like looking at that last line and got confused. I thought char declared strings? What does that last line even do? I also noticed in a couple of function parameter lists later on, there was entered "string words" instead of what I expected that mention char or something.
EDIT:
typedef char string[MAX_LENGTH];
had been written in the .h file; I didn't see it.
C does not have a basic data type called string.
Check the lab8.h file carefully. string is most likely a typedef there, for example of a char array or of char *.
Essentially, string words[MAX_COUNT]; defines an array named words with MAX_COUNT elements, each of type string.
C does not have a dedicated string data type. In C, a string is a sequence of character values followed by a zero-valued byte. Strings are stored as arrays of char, but not all arrays of char contain strings.
For example,
char word[] = { 'h', 'e', 'l', 'l', 'o', 0 };
stores the string "hello" in the array variable word. The array size is taken from the size of the initializer, which is 6 (5 characters plus the 0 terminator). The zero-valued byte serves as a sentinel value for string handling functions like strlen, strcpy, strcat, and for arguments to printf and scanf that use the %s and %[ conversion specifiers.
By contrast,
char arr[] = { 'h', 'e', 'l', 'l', 'o' };
stores a sequence of character values, but since there's no terminating 0-valued byte, this sequence is not considered a string, and you would not want to use it as an argument to any string-handling function (since there's no terminator, the function has no way of knowing where the string ends and will wind up attempting to access memory outside of the array, which can lead to anything from garbage output to a crash).
Without seeing the contents of lab8.h, I'm going to speculate that the string type is a typedef for an array of char, something like
#define MAX_STRING_LENGTH 20 // or some other value
typedef char string[MAX_STRING_LENGTH];
Thus, an array of string is an array of arrays of char; it would be equivalent to
char words[MAX_COUNT][MAX_STRING_LENGTH];
So each words[i] is a MAX_STRING_LENGTH-element array of char.
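The equivalence can be checked with sizeof. In this sketch the values of MAX_COUNT and MAX_STRING_LENGTH are assumed (the real ones live in lab8.h), and set_word is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COUNT 3            /* assumed value, for illustration only */
#define MAX_STRING_LENGTH 20   /* assumed value, for illustration only */

typedef char string[MAX_STRING_LENGTH];

/* Hypothetical helper: stores word in slot i of words.
   Returns 0 on success, -1 if the slot or the word does not fit. */
int set_word(string words[], size_t i, const char *word)
{
    assert(words != NULL && word != NULL);
    if (i >= MAX_COUNT || strlen(word) >= MAX_STRING_LENGTH)
        return -1;
    strcpy(words[i], word);    /* safe: length was checked above */
    return 0;
}
```

Given string words[MAX_COUNT];, sizeof words is MAX_COUNT * MAX_STRING_LENGTH bytes and sizeof words[0] is MAX_STRING_LENGTH, exactly as for the two-dimensional char array.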

strtok - char array versus char pointer [duplicate]

When using the strtok function, using a char * instead of a char [] results in a segmentation fault.
This runs properly:
char string[] = "hello world";
char *result = strtok(string, " ");
This causes a segmentation fault:
char *string = "hello world";
char *result = strtok(string, " ");
Can anyone explain what causes this difference in behaviour?
char string[] = "hello world";
This line initializes string to be a big-enough array of characters (in this case char[12]). It copies those characters into your local array as though you had written out
char string[] = { 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };
The other line:
char* string = "hello world";
does not initialize a local array, it just initializes a local pointer. The compiler is allowed to set it to a pointer to an array which you're not allowed to change, as though the code were
const char literal_string[] = "hello world";
char* string = (char*) literal_string;
The reason C allows this without a cast is mainly to let ancient code continue compiling. You should pretend that the type of a string literal in your source code is const char[], which can convert to const char*, but never convert it to a char*.
In the second example:
char *string = "hello world";
char *result = strtok(string, " ");
the pointer string is pointing to a string literal, which cannot be modified (as strtok() would like to do).
You could do something along the lines of:
char *string = strdup("hello world");
char *result = strtok(string, " ");
so that string is pointing to a modifiable copy of the literal.
strtok modifies the string you pass to it (or tries to anyway). In your first code, you're passing the address of an array that's been initialized to a particular value -- but since it's a normal array of char, modifying it is allowed.
In the second code, you're passing the address of a string literal. Attempting to modify a string literal gives undefined behavior.
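A function that wants to tokenize arbitrary input, including literals, can run strtok on a private copy. The following is a sketch of that idea; count_tokens is a made-up name, and the copy is made with malloc and memcpy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in text without modifying it.
   Returns -1 if the temporary copy cannot be allocated. */
int count_tokens(const char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t n = strlen(text) + 1;
    char *copy = malloc(n);          /* strtok needs writable storage */
    if (copy == NULL)
        return -1;
    memcpy(copy, text, n);

    int count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;

    free(copy);
    return count;
}
```

Because only the copy is modified, count_tokens("hello world", " ") is safe to call with a literal.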
In the second case (char *), the string is in read-only memory. The correct type of string constants is const char *, and if you used that type to declare the variable you would get warned by the compiler when you tried to modify it. For historical reasons, you're allowed to use string constants to initialize variables of type char * even though they can't be modified. (Some compilers let you turn this historic license off, e.g. with gcc's -Wwrite-strings.)
The first case creates a (non const) char array that is big enough to hold the string and initializes it with the contents of the string. The second case creates a char pointer and initializes it to point at the string literal, which is probably stored in read only memory.
Since strtok wants to modify the memory pointed at by the argument you pass it, the latter case causes undefined behavior (you're passing in a pointer that points at a (const) string literal), so it's unsurprising that it crashes.
Because the second one declares a pointer (that can change) to a constant string...
So depending on your compiler / platform / OS / memory map... the "hello world" string will be stored as a constant (in an embedded system, it may be stored in ROM) and trying to modify it will cause that error.

Resources