Is this undefined behaviour (working with a string literal) - c

#include<stdio.h>
int main()
{
char *s = "Abc";
while(*s)
printf("%c", *s++);
return 0;
}
I have seen this (on a site) presented as correct code, but I feel this is undefined behavior.
My reasoning:
Here s stores the address of the string literal Abc. So while traversing through the while loop :
Iteration - 1:
Here *(s++) increments the address stored in s by 1 and returns the non-incremented address (i.e the previous/original value of s). So, no problem everything works fine and Abc is printed.
Iteration - 2:
Now s points to a completely different address (which may be either valid or not). Now when trying to perform while(*s) isn't it undefined behavior ?
Any help would be really appreciated!

No. There's no undefined behaviour here.
*s++ is evaluated as *(s++) because the postfix increment operator has higher precedence than the dereference operator. So the loop simply iterates over the string, prints each byte, and stops when it sees the null byte.
Now s points to a completely different address (which may be either valid or not). Now when trying to perform while(*s) isn't it undefined behavior ?
No. In the first iteration s points to the char A, in the next to b, and in the next to c. The loop terminates when s reaches the null byte at the end of the string (i.e. *s is 0).
Basically, there's no modification of the string literal. The loop is functionally equivalent to:
while(*s) {
printf("%c", *s);
s++;
}

Iteration - 1:
Here *(s++) increments the address stored in s by 1 and returns the non-incremented address (i.e the previous/original value of s). So, no problem everything works fine and Abc is printed.
No, “Abc” is not printed. %c tells printf to expect a character value and print that. It prints a single character, not a string. Initially, s points to the first character of "Abc". s++ increments it to point to the next character.
Iteration - 2:
Now s points to a completely different address (which may be either valid or not). Now when trying to perform while(*s) isn't it undefined behavior ?
In iteration 2, s is pointing to “b”.
You may have been thinking of some char **p for which *p had been set to a pointer to "abc". In that case, incrementing p would change it to point to a different pointer (or to uncontrolled memory), and there would be a problem. That is not the case; for char *s, s points to a single character, and incrementing it adjusts it to point to the next character.

Now s points to a completely different address
Indeed, it is a completely different but well-defined address. s now references the next char of the string literal; the increment just adds 1 to the pointer.
Because a string literal is nul (zero) terminated, the while loop stops when s references the terminator.
There is no UB.

Related

How to understand strings with pointers

I have been studying the C language for the last few months. I'm using a book and I have this exercise:
char vector[N_STRINGS][20] = {"ola", "antonio", "susana"};
char (*ptr)[20] = vector;
char *p;
while(ptr-vector<N_STRINGS)
{
p = *ptr;
while(*p)
putchar(*p++);
putchar('\n');
ptr++;
}
I understand everything except the while(*p)! I can't figure out what the while(*p) is doing.
The variable p in your code is defined as a pointer to a char. To get the actual char value that p points to, you need to dereference the pointer, using the * operator.
So the expression in your while loop, *p, evaluates (at the beginning of each iteration) to the char that p is currently pointing to. Inside the loop, the putchar call also dereferences p, and the ++ operator then increments the pointer so that, after the character is sent to the output, p points to the next character in the string.
Conventionally (in fact, virtually always), character strings in C are NUL-terminated, meaning that the end of the string is signalled by having a character with the value of zero at the end of the string.
When the while loop in your code reaches this NUL terminator, the value of the expression *p will thus be ZERO. And, as ZERO is equivalent to a logical "false" in C (any non-zero value is considered "true"), the while loop will end.
Feel free to ask for further clarification and/or explanation.
From the C Standard (6.8.5 Iteration statements)
4 An iteration statement causes a statement called the loop body to be
executed repeatedly until the controlling expression compares equal to
0.
In this part of the program
p = *ptr;
while(*p)
//…
the pointer p points to the first character of the current string. A string in C is a sequence of characters terminated by the zero character '\0'.
So, for example, suppose the pointer initially points to the first character of the string "ola". The string is represented in the corresponding character array like
{ 'o', 'l', 'a', '\0' }
The condition in the loop
while(*p)
may be rewritten like
while(*p != 0 )
So the loop body is executed for every character of the string except the terminating zero character, and the first three characters of the string are output.
Note that (6.5.9 Equality operators)
3 The == (equal to) and != (not equal to) operators are analogous to
the relational operators except for their lower precedence. Each
of the operators yields 1 if the specified relation is true and 0 if it
is false. The result has type int. For any pair of operands, exactly
one of the relations is true.

Clarification about precedence of operators

I got this snippet from some exercises, along with the question: what is the output of the following code?
main()
{
char *p = "ayqm";
printf("%c", ++*(p++));
}
My expected answer was z but the actual answer was in fact b. How is that possible?
Later edit: the snippet is taken as is from an exercise, which did not focus on the string literal or on syntax issues outside the printf() line.
As posted, the program has multiple problems:
it tries to modify the string constant "ayqm", which is described as undefined behavior in the C Standard.
it uses printf without a proper declaration, again producing undefined behavior.
its output is not terminated with a newline, causing implementation defined behavior.
the prototype for main without a return type is obsolete, no longer supported by the C Standard.
incrementing characters produces implementation defined behavior. If the execution character set is ASCII, 'a'+1 does produce 'b', but it is not guaranteed by the C Standard. Indeed, in the EBCDIC character set still used on some mainframe computers, letters are not in a single contiguous sequence (i.e. 'a'+1 == 'b' but 'i'+1 != 'j' in this character set).
Here is a corrected version:
#include <stdio.h>
int main(void) {
char str[] = "ayqm";
char *p = str;
printf("%c\n", ++*(p++));
return 0;
}
p is post-incremented, which means the current value of p is used for the * operator and the value of p is incremented before the next sequence point, namely the call to the printf function. The character read through p, 'a' is then incremented, which may or may not produce 'b' depending on the execution character set.
After printf returns to the main function, p points to str[1] and str contains the string "byqm".
Your program has undefined behavior because it tries to modify the string literal "ayqm". Per the standard, attempting to modify a string literal results in undefined behavior; the literal may be stored in read-only storage.
The pointer p points to the string literal "ayqm". This expression
printf ("%c", ++*(p++));
ends up attempting to modify the string literal that p points to.
Undefined behavior means the program may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

Post increment with pointers in while loop in C

I am writing the strcat function
/*appends source string to destination string*/
#include <stdio.h>
int main()
{
char srcstr[100], deststr[100];
char *psrcstr, *pdeststr;
printf("\n Enter source string: ");
gets(srcstr);
printf("\n Enter destination string: ");
gets(deststr);
pdeststr = deststr;
psrcstr = srcstr;
while(*pdeststr++)
;
while(*pdeststr++ = *psrcstr++)
;
printf("%s", deststr);
return 0;
}
For srcstr = " world" and deststr = "hello" I get hello, when I expect to see hello world, which is what I see if I change the first while to
while(*pdeststr)
pdeststr++;
Why can't I write it all in one line in the first while, just like in the second while?
Your one line loop
while(*pdeststr++);
is equivalent to
while(*pdeststr)
pdeststr++;
pdeststr++;
This is because the post-increment takes effect before the condition is tested, but after the value used for the test has been determined.
So you could cater for this with
while(*pdeststr++);
pdeststr--;
Mandatory introduction: do not use gets(); use fgets() instead!
Your problem is here:
while(*pdeststr++)
;
The side effect of incrementing is carried out in your last iteration step (when pdeststr points to the NUL terminator), so after this loop, pdeststr points one after your NUL terminator. Write it like this instead:
while(*pdeststr) ++pdeststr;
The boolean value for the while condition is computed before the ++ post-increment.
So when your while loop exits, the post-increment operator is executed one last time, hence pdeststr is pointing right after the null terminator char that follows the word "hello".
Then the rest of the program appends more data after that null char. You end up with the string "hello\0 world\0". The print function thinks the string ends at the first null char it encounters.
You have an extra increment that leaves the pointer just after the null char, so in the end only the first string is printed.
The postfix increment operator (ptr++) has higher precedence than the indirection (dereference) operator (*ptr). Therefore
while(*pdeststr++);
always increments pdeststr and tests the character it pointed to before the increment. As a result, when that character is 0, pdeststr already points to the next element, so there will be a null terminator ('\0') between your concatenated words.
As a one-line solution with a while loop, you can use short-circuit evaluation as follows:
while(*pdeststr && pdeststr++);
The code snippet above stops when *pdeststr is 0 and does not evaluate the pdeststr++ part.
My conclusion is that at the end, deststr = "hello\0 world\0", and because of that printf("%s", deststr); finds the first \0 and outputs hello.

String Concatenation in C with Pointers

So this is the standard string concatenation code in C:
char *stringcat(char *dest, const char *src){
char *save=dest;
while(*save !='\0'){
save++;
}
while(*src!='\0'){
*save=*src;
save++;
src++;
}
*save='\0';
return dest;
}
My question is why when we replace the first while loop with the following:
while(*save++){};
It does not work, but, when replaced with:
while(*++save){};
It does work. In the first two instances, save points to the null terminator at the end of dest at the end, which is then overwritten by the first character in src. However, in the third instance, it seems like save will be pointing to the character after the null terminator, which is weird.
If you make it while (*save++) {}, the operation you are repeating is: load byte, increment pointer, check whether byte was zero. Therefore, what will happen at the end of the string is: load null byte, increment pointer, check whether byte was zero, see it was, exit loop. So save will be pointing just after the null byte. But you want to start copying the second string on top of that null byte, not after it.
(Is it possible that you have the meanings of ++save and save++ interchanged in your head? If so, here's a useful mnemonic: ++save means increment, then load the value; save++ means load the value, then increment. So the order in which the variable name and the ++ appear corresponds to the order of operations.)
while(*save++)
First dereference save and compare that value to 0, then increment save.
while(*save !='\0'){
save++;
}
First dereference save, compare to 0 and only increment if non-zero.
See the difference?

How do you explain the output from this function-like macro `slice` in C?

#include <stdio.h>
#define slice(bare_string,start_index) #bare_string+start_index
#define arcane_slice(bare_string,start_index) "ARCANE" #bare_string+start_index
int main(){
printf("slice(FIRSTA,0)==> `%s`\n",slice(FIRSTA,0));
printf("slice(SECOND,2)==> `%s`\n",slice(SECOND,2));
printf("slice(THIRDA,5)==> `%s`\n",slice(THIRDA,5));
printf("slice(FOURTH,6)==> `%s`\n",slice(FOURTH,6));
printf("slice(FIFTHA,7)==> `%s`\n",slice(FIFTHA,7));
printf("arcane_slice(FIRSTA,0)==> `%s`\n",arcane_slice(FIRST,0));
printf("arcane_slice(SECOND,2)==> `%s`\n",arcane_slice(SECOND,2));
printf("arcane_slice(THIRDA,5)==> `%s`\n",arcane_slice(THIRDA,5));
printf("arcane_slice(FOURTH,6)==> `%s`\n",arcane_slice(FOURTH,6));
printf("arcane_slice(FIFTHA,7)==> `%s`\n",arcane_slice(FIFTHA,7));
return 0;
}
OUTPUT:
slice(FIRSTA,0)==> `FIRSTA`
slice(SECOND,2)==> `COND`
slice(THIRDA,5)==> `A`
slice(FOURTH,6)==> ``
slice(FIFTHA,7)==> `slice(FIFTHA,7)==> `%s`
`
arcane_slice(FIRSTA,0)==> `ARCANEFIRST`
arcane_slice(SECOND,2)==> `CANESECOND`
arcane_slice(THIRDA,5)==> `ETHIRDA`
arcane_slice(FOURTH,6)==> `FOURTH`
arcane_slice(FIFTHA,7)==> `IFTHA`
I have the above C code that I need help on. I am getting weird behaviour from
the function-like macro slice that is supposed to 'slice' from a passed index
to the end of the string. It does not slice in the real sense but passes
a pointer from a certain point to printf which starts printing from that
address. I have managed to figure out that in arcane_slice the strings
are concatenated first then 'sliced'. I also have figured out that when start_index
is equal to 6 printf starts printing from the null byte and that is why
you get the 'empty' string. The strange part is when start_index is 7. It prints
the first argument to printf (interpolator string) concatenated with the passed bare string in both
arcane_slice and slice (as shown in the 5th and 10th lines in the output).
Why is that so?
My wildest guess is that when the start_index exceeds the length of the strings,
the pointer points to the start of the data segment in the program's address space. But
then you could counter that with "why didn't it start printing from FIRSTA"
Not the start of any "data segment", and not the stack either. String literals have static storage duration, and a compiler typically places them one after another in a read-only data section; the order is entirely up to the implementation. Here the literal produced by the macro apparently ended up right before the format string, thus:
Memory:
"FIFTHA\0slice(FIFTHA,7)==> `%s`\n\0"
Arguments:
<pointer-to-"FIFTHA"> <pointer-to-"slice...">
Since you overincrement the first one it skips the '\0' character and points at the format as well.
Try to experiment with this with more placeholders, like
printf("1: %s, 2: %s\n", slice(FIFTHA,7), slice(FIFTHA,6));
slice(bare_string,start_index) #bare_string+start_index
The # operator turns bare_string into a string literal, which decays to a pointer to its first character. Adding start_index then yields a pointer start_index characters further along, i.e. #bare_string+start_index.
char str[6]="Hello";
char *ptr =str;
printf("%s\n",str);//prints Hello
printf("%s\n",str+1);//prints ello
printf("%s\n",str+2);//prints llo
printf("%s\n",str+3);//prints lo
printf("%s\n",str+4);//prints o
printf("%s %c=%d \n",str+5,*(str+5),*(str+5));//prints an empty string, the null character and 0
printf("%s %c=%d \n",str+6,*(str+6),*(str+6));//undefined behavior: reads one past the end of str
printf("%s %c=%d \n",str+7,*(str+7),*(str+7));//undefined behavior: str+7 is outside the array
The same scenario is happening in your case.
Test Code:
#include<stdio.h>
int main(void)
{
char str[6]="Hello";
char *ptr =str;
printf("%s\n",str);//prints Hello
printf("%s\n",str+1);//prints ello
printf("%s\n",str+2);//prints llo
printf("%s\n",str+3);//prints lo
printf("%s\n",str+4);//prints o
printf("%s %c=%d \n",str+5,*(str+5),*(str+5));//prints an empty string, the null character and 0
printf("%s %c=%d \n",str+6,*(str+6),*(str+6));//undefined behavior: reads one past the end of str
printf("%s %c=%d \n",str+7,*(str+7),*(str+7));//undefined behavior: str+7 is outside the array
return 0;
}
You have answered your question yourself. "FIFTHA"+7 gives you a pointer one past the end of the string object, and reading through it (as printf does for %s) is undefined behavior in C.
There's no easy way to get a more Python-like behavior for such "slices" in C. You could make it work for indexes up to a certain upper limit by adding a suffix to your string, full of zero bytes:
#define slice(bare_string,start_index) ((#bare_string "\0\0\0\0\0\0\0")+(start_index))
Also, when using macros, it's good practice (and avoids bugs too) to use parentheses liberally.
#define slice(bare_string,start_index) ((#bare_string)+(start_index))
#define arcane_slice(bare_string,start_index) (("ARCANE" #bare_string)+(start_index))

Resources