Is it a bug in str_replace? - c

I searched for a string replacement function and found this question
What is the function to replace string in C?
If I use the code from the answer, it works but it looks wrong and gives a warning:
/home/dac/osh/util.c: In function ‘str_replace’:
/home/dac/osh/util.c:867:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
for (count = 0; tmp = strstr(ins, rep); ++count) {
It looks like it's maybe a bug with = and == . Is it a bug or did I misunderstand? Should it be == instead?

No, it's not. In this case, the value of tmp is actually intended to be used as the condition.
The return value of strstr:
char *strstr(const char *str1, const char *str2);
Returns a pointer to the first occurrence of str2 in str1, or a null pointer if str2 is not part of str1.
To remove the warning, try this:
for (count = 0; (tmp = strstr(ins, rep)) != NULL; ++count) {

No, it's not a bug. As per the body of the loop:
for (count = 0; tmp = strstr(ins, rep); ++count) {
ins = tmp + len_rep;
}
it actually uses tmp for something. The continuation condition in that for statement will assign the result of strstr() to tmp then execute the body as long as it's non-zero (i.e., as long as it found the string). That's because strstr() returns NULL only if the string cannot be found.
I suspect this is just gcc being paranoid: it realises that the continuation condition (the middle bit) of a for statement usually tends to be a comparison, and that you may have accidentally used = rather than ==.
That's why the diagnostic states warning: suggest ... rather than error: what the heck? :-) If you want to get rid of the warning (which isn't a bad position to take), simply do what it suggests and surround the entire continuation condition with parentheses.

The strstr reference states that the return value of char *strstr(const char *str1, const char *str2) is:
A pointer to the first occurrence in str1 of the entire sequence of
characters specified in str2, or a null pointer if the sequence is not
present in str1.
Now for a little more C terminology, when you do :
tmp = strstr(ins, rep)
C evaluates the expression as a whole, and its value is the return value of strstr(ins, rep). Assigning that value to tmp is the side effect. The warning:
suggest parentheses around assignment used as truth value
is gcc's way of helping you avoid a careless mistake, such as typing a=b instead of a==b. Note that in the first case the value of b is used as the truth value, whereas in the second case the result of the comparison "is a equal to b" is used as the truth value. By putting () around tmp = strstr(ins, rep) you tell the compiler that the assignment is intentional and that its value is meant to be used as the truth value.
Side note:
The controlling expression of a for statement is a full expression, with or without the extra parentheses, and there is a sequence point at the end of every full expression. A sequence point is a point in program execution at which all side effects of earlier evaluations are complete before the next step begins.

Related

user defined concatenation function

I'm learning C and I came across the below code which implements concatenation but I am struggling to understand the second portion despite recapping pointer/increment precedence and associativity.
I have run examples of all the different combinations of dereferencing and post/pre increment and now recall that pre-increment with dereferencing is right to left associative e.g *++q where the inc would occur first. The page I learned from stated that post-increment with the dereference operator has the increment as higher precedence and goes on to say that the associativity of this example is left to right. I don't particularly know why it mentions the associativity as I understand precedence is regarded before it e.g *p++ would increment before its dereference.
The code below starts by recursively calling itself with a pre-increment on dest to get to the end of the destination string, which is detected when dereferencing dest yields the null byte. At this point I lose track: I would have thought it would simply assign the dereferenced src to the dereferenced dest. Instead it post-increments them, which, based on what I've learned, would increment to the memory location after the null byte, dereference it, and proceed to assign the string "eeksfor" to the position above the null byte in dest. Not only does this confuse me, but the program goes on to call itself in another ternary operator, or return 0, for seemingly no reason.
Thanks
#include <stdio.h>

/* my_strcat(dest, src) copies data of src to dest. To do so, it first reaches end of the string dest using recursive calls my_strcat(++dest, src). Once end of dest is reached, data is copied using
(*dest++ = *src++)? my_strcat(dest, src). */
void my_strcat(char *dest, char *src)
{
(*dest)? my_strcat(++dest, src): (*dest++ = *src++)? my_strcat(dest, src): 0 ;
}
/* driver function to test above function */
int main()
{
char dest[100] = "geeksfor";
char *src = "geeks";
my_strcat(dest, src);
printf(" %s ", dest);
getchar();
}
I ran the program and it does what it is expected to do. That is it returns the string "geeksforgeeks" so clearly I'm just not understanding something
OK, so you understand the first recursive bit fine (keep incrementing dest to find the end).
Once you've found the end, it's time to start copying. The code it's using to copy one byte is: (*dest++ = *src++)
It might help your understanding to expand that code to something like: *dest = *src; dest += 1; src += 1
(because they are post-increment operators)
Now, a normal 'copy' function would repeat that statement in a while or for loop, but because we are cleverly showing off, we use a second ternary operator and recursion. The final '0' doesn't really do anything, but it needs to be there as the "else" part of the ternary operator.
While this may not be the worst implementation of strcat ever written, it's certainly a strong contender.

C language: find output of given program

So i have this main:
#define NUM 5
int main()
{
int a[NUM]={20,-90,450,-37,87};
int *p;
for (p=a; (char *)p < ((char *)a + sizeof(int) * NUM); ) //same meaning: for (p=a; p<a+NUM;)
*p++ = ++*p < 60 ? *p : 0; //same meaning: *(p++)=++(*p)<60?*p:0;
for(p=a; (char *)p < ((char *)a + sizeof(int) * NUM); )
printf("\n %d ", *p++);
return 0;
}
And I need to find the output.
After trying to work it out without success, I ran it, and this is the output:
21
-89
0
-36
0
I would be glad of an explanation of how to solve this kind of question (I have an exam soon and will probably see this type of question).
EDIT:
To begin with, I want to understand what the first for statement is doing:
Does this advance by one integer? And what is going on inside the block?
And what is the difference between *p++ and ++*p?
The question is similar to Why are these constructs (using ++) undefined behavior in C? although not an exact duplicate due to the (subtle) sequence point inside the ?: operator.
There is no predictable output since the program contains undefined behavior.
While the sub-expression ++*p is sequenced in a well-defined way compared to *p because of the internal sequence point of the ?: operator, this is not true for the other combinations of sub-expressions. Most notably, the order of evaluation of the operands to = is not specified:
C11 6.5.16/3:
The evaluations of the operands are unsequenced.
*p++ is not sequenced in relation to ++*p. The order of evaluation of the sub-expressions is unspecified, and since the side effect on p (from p++) is unsequenced relative to a value computation that uses p (in ++*p), the behavior is undefined.
Similarly, *p++ is not sequenced in relation to *p. This also leads to undefined behavior.
Summary: the code is broken and full of bugs. Anything can happen. Whoever gave you the assignment is incompetent.
To begin with, I want to understand what the first for statement is doing
This is what one would call code obfuscation... The difficult part is obviously this one:
(char *)p < ((char *)a+sizeof(int)*NUM);
OK, we convert p to a pointer to char, then compare it to another pointer retrieved from array a that points to the first element past a: sizeof(int)*NUM is the size of the array - which we could have gotten much more easily by just having sizeof(a), so (char*)p < (char*)a + sizeof(a)
Be aware that comparing pointers other than with (in-)equality is undefined behaviour if the pointers do not point into the same array or one past the end of the latter (they do, in this example, though).
Typically, one would have this comparison as p < a + sizeof(a)/sizeof(*a) (or sizeof(a)/sizeof(a[0]), if you prefer).
*p++ dereferences the pointer's current value and increments the pointer afterwards; *p++ = ... is short for *p = ...; p = p + 1. ++*p, on the other hand, first dereferences the pointer and then increments the value it points to (note the difference from *++p, yet another variant; can you work it out yourself?), i.e. it is equivalent to *p = *p + 1.
The entire line *p++ = ++*p<60 ? *p : 0; then shall do the following:
increment the value of *p
if the result is less than 60, use it, otherwise use 0
assign this to *p
increment p
However, this is undefined behaviour as there is no sequence point in between read and write access of p; you do not know if the left or the right side of the assignment is evaluated first, in the former case we would assign a[i] = ++a[i + 1], in the latter case, a[i] = ++a[i]! You might have gotten different output with another compiler!!!
However, these are only the two most likely outcomes. Once you fall into undefined behaviour, anything might happen: the compiler might just ignore the piece of code in question, decide not to do anything at all (just exit from main right at the first instruction), the program might crash, or it could even switch off the sun...
Be aware that one single location with undefined behaviour results in the whole program itself having undefined behaviour!
Short answer: because of this line
*p++ = ++*p<60 ? *p : 0;
it is impossible to say how the program behaves. When we access *p on the right-hand side, does it use the old or the new value of p, that is, before or after the p++ on the left-hand side gets to it? There is no rule in C to tell us. What there is instead is a rule that says that for this reason the code is undefined.
Unfortunately the person setting the question didn't understand this, and thinks that a "tricky" line of code like this is something to make a puzzle about, instead of something to be avoided at all costs.
The only way to really understand this kind of stuff (memory management, pointer behaviour, etc.) is to experiment yourself. Anyway, I smell someone is trying to seem clever fooling students, so I will try to clarify a few things.
int a[NUM]={20,-90,450,-37,87};
int *p;
This creates a vector of five int, so far, so good. The obvious move, given that data, is to run over the elements of a using p. You would do the following:
for(p = a; p < (a + NUM); ++p) {
printf("%d ", *p);
}
However, the first change to notice is that both loops convert the pointers to char. So, they would be:
for (p=a;(char *)p<((char *)a+sizeof(int)*NUM); ++p) {
printf("%d ", *p);
}
Instead of pointing to a with a pointer to int, the code converts p to a pointer to char. Say your machine is a 32-bit one. Then an int will probably occupy four bytes. With p being a pointer to int, when you do ++p you effectively go to the next element in a, since your compiler will transparently jump four bytes. If you convert the int pointer to a char pointer instead, then you cannot add NUM and assume that you are at the end of the array anymore: a char is stored in one byte, so ((char *)p) + 5 will point to the second byte of the second element of a, provided p was pointing at the beginning of a before. That is why you have to take sizeof(int) and multiply it by NUM in order to get the end of the array.
And finally, the infamous *p++ = ++*p<60 ? *p : 0;. This is something unfair to face students with, since as others have already pointed out, the behaviour of that code is undefined. Lets go expression by expression.
++*p means "access the value p points to and add 1 to it". If p is pointing to the first position of a, then the result would be 21. ++*p not only returns 21, but also stores 21 in memory in the place where you had 20. If you only wanted to return 21, you would write *p + 1.
++*p<60 ? *p : 0 means "if the result of permanently adding 1 to the value pointed to by p is less than 60, then return that result, otherwise return 0".
*p++ = x means "store the value of x in the memory address pointed to by p, and then increment p". That's why you don't find ++p or p++ in the increment part of the for loop.
Now about the whole instruction (*p++ = ++*p<60 ? *p : 0;): it is undefined behaviour (check @Lundin's answer for more details). In summary, the most obvious problem is that you don't know which side of the assignment operator (the left or the right one) is going to be evaluated first. The ?: operator does sequence ++*p before the *p that follows it, but that does not help, because the p++ on the left is unsequenced relative to both.
How could you fix this? It would actually be very simple:
for (p=a;(char *)p<((char *)a+sizeof(int)*NUM); ++p) {
*p = (*p + 1) <60 ? (*p + 1) : 0;
}
And much more readable. Even better:
for (p = a; p < (a + NUM); ++p) {
*p = (*p + 1) <60 ? (*p + 1) : 0;
}
Hope this helps.

If a string is given in an if condition it is treated as true, but what does it return?

Why exactly is a string literal in an if-condition treated as true?
if("whatiamreturning")
//this is true. I want to know why.
Based on the above, what happens here?
#include <stdio.h>
void main() {
static int i;
for(;;) { //infinite loop
if(i+++"The Matrix")
// what is happening in the above line?
printf("Memento");
else
break;
}
}
if("whatiamreturning")
is equivalent to
if (1)
This is because "whatiamreturning" is a char [] that decays into a non-NULL char * inside the if(). Any non-NULL pointer evaluates to true in the context of a boolean expression.
The line
if(i+++"The Matrix")
can be simplified to:
if( (i++) + "The Matrix")
In the first iteration of the loop, the value of i is 0. Hence, the (i++) + "The Matrix" evaluates to "The Matrix".
In the second iteration of the loop, the value of i is 1. Hence, the (i++) + "The Matrix" evaluates to "he Matrix".
However, the loop never ends and goes into the territory of undefined behavior since (i++) + "The Matrix" never evaluates to 0 and the value of i keeps on increasing.
Perhaps they meant to use:
if(i++["The Matrix"])
which will allow the expression inside if() to be 0 after 10 iterations.
Update
If you are following somebody else's code, stay away from anything else that they have written. The main function can be cleaned up to:
int main() {
char name[] = "The Matrix";
int i = 0;
for( ; name[i] != '\0'; ++i )
{
printf("Memento\n");
}
}
if(i+++"The Matrix") // what is happening here please help here to understand
This will take the value of i, add the pointer value of the location of the string "The Matrix" in memory and compare it to zero. After that it will increase the value of i by one.
It's not very useful, since the pointer value could be basically any random number (it depends on architecture, OS, etc). And thus the whole program amounts to printing Memento a random number of times (likely the same number each run though).
Perhaps you meant to write if(*(i+++"The Matrix")). That would loop 10 times until i+"The Matrix" evaluates to the address pointing to the NUL byte at the end of the string, and *(i+"The Matrix") will thus return 0.
Btw, spaces are a nice way to make your code more readable.
It will return the address of first element of the string whatiamreturning.
Basically when you assign a string literal to a char pointer
char *p;
p = "whatiamreturning";
the assignment doesn't copy the characters in whatiamreturning; instead it makes p point to the first character of the string, and that's why string literals can be subscripted
char ch = "whatiamreturning"[1];
ch will have the character h now. This works because the compiler treats whatiamreturning as a char * and calculates the base address of the literal.
if(i+++"The Matrix") is equivalent to
if( i++ + "The Matrix")
or it can be rewritten as
if(&("The Matrix"[i++]))
which will be true for every i and results in an infinite loop. Ultimately, the code will suffer from undefined behavior due to integer overflow for variable i.
Why exactly is a string literal in an if-condition treated as true?
if("whatiamreturning")
The string literal "whatiamreturning" is a constant of type char[].
In nearly all contexts, including this one, arrays decay to pointers to their first element.
In a boolean context, like the condition of an if-statement, all non-zero values are true.
As the pointer points to an object, it is not the null-pointer, and thus is true.
Based on the above, what happens here?
#include <stdio.h>
void main() {
The above is your first instance of Undefined Behavior, whatever happens, it is right.
We will now pretend the error is corrected by substituting int for void.
Now, your loop:
static int i;
Static variables are default initialized, so i starts with value 0.
for(;;) { //infinite loop
if(i+++"The Matrix")
// what is happening in the above line?
printf("Memento");
else
break;
}
This loop has Undefined Behavior as well.
The condition takes i and adds it to the string literal "The Matrix", which decayed to a pointer as in the previous example, interprets the resulting pointer in a boolean context, and as a side effect increments i.
As long as i is no more than strlen("The Matrix")+1 on entry, everything is ok, the pointer points to an element of the string literal or one past, and the standard guarantees that's not a null pointer.
The moment it is though, all hell breaks loose because calculating such a pointer is Undefined Behavior.
Well, now that we know the loop is UB, let's ignore the loop too.
The rest of the program is:
}
Which is ok, because even though main has a return type of int, there's a special rule which states that if control reaches the end of main without executing a return-statement, an implicit return 0; is added.
Side-note: If an execution of a program encounters Undefined Behavior anywhere, the whole program is undefined, not only from that point on:
Undefined behavior can result in time travel (among other things, but time travel is the funkiest)

Evaluating a postfix Expression in C

I'm trying to write a program that evaluates a postfix arithmetic expression. The program sends a character string to my function evaluatePostfix, which proceeds to identify operands and operators and come up with an integer solution. I am manipulating stacks in this program by pushing the scanned character as it is identified and of course doing the appropriate pop functions when needing to evaluate. Right now though, I'm having a problem with the program hanging in what appears to be an infinite loop. I guess I'm not really sure how to tell the function to proceed to the next character in the string after it has evaluated the first character. Another thing to note is that the user puts a space in-between each operand and operator. Here is my function:
int evaluatePostfix(char *postfixStr)
{
stack * s;
int x, y;
stackInit(&s);
do {
if(isOperand(postfixStr) == 1) {
stackPush(&s, postfixStr);
}
if(isOperator(postfixStr) == 1) {
y = atoi(stackPop(s));
x = atoi(stackPop(s));
char *str = malloc(10 * sizeof(char));
sprintf(str, "%d", applyOperator(x, y, postfixStr));
stackPush(&s, str);
}
} while (postfixStr != NULL);
return stackPop(s);
}
I know the functions that manipulate the stack are correct as they were provided by my instructor. Could someone perhaps give me a clue as to what I'm missing?
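You could change the while condition to while (*++postfixStr != '\0') to advance the pointer to the next character in postfixStr. Note that the test has to look at the character the pointer refers to: incrementing a valid pointer never makes it compare equal to NULL, so postfixStr != NULL stays true forever, which is why the loop hangs.
This increment is done using the prefix notation (++var vs var++) so that the next character, not the current one, is compared to the terminator. I'm not familiar with the behavior of the stack functions you're using, but I would recommend changing the do { ... } while (*++postfixStr != '\0'); loop to a while (*postfixStr != '\0') { ... } loop, and incrementing postfixStr at the end of that while loop's block, so that an empty string is handled as well.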
The safest thing to do is add a string length parameter to your function:
int evaluatePostfix(char *postfixStr, int strLength)
You would then use a loop that explicitly steps from the beginning of the string at index 0 to index strLength - 1, which would safely handle empty strings and strings that are not NUL-terminated.

Copying a string in C

I am confused about this code: (http://www.joelonsoftware.com/articles/CollegeAdvice.html)
while (*s++ = *t++);
What is the order of execution? Is *s = *t first done, and then are they each incremented? Or other way around?
Thanks.
EDIT: And what if it was:
while(*(s++) = *(t++));
and
while(++*s = ++*t);
while (*s++ = *t++);
From the precedence table you can see that ++ has higher precedence than *. But ++ is used here as a post-increment operator, so each pointer is dereferenced at its old value and the increment takes effect afterwards. So *s = *t happens first, then s and t are incremented.
EDIT:
while(*(s++) = *(t++));
Is same as above. You are making it more explicit with the use of parenthesis. But remember ++ is still a post increment.
while(++*s = ++*t);
There is just one operator next to s. So * is applied first and on that result ++ is applied which results in the lvalue required error.
while(*++s = *++t);
Again there is just one operator next to s and t. So the incrementation happens first, followed by the copy. So we are effectively skipping the copy of the first char from t to s.
You are right. *s = *t is done first, and then they are incremented.
The increment is a post-increment. Post not just because it comes after the variable being incremented, but also because it comes after the expression is evaluated. So the order of execution is
*s = *t
then s++ and t++
EDIT::
@chrisgoyal
Order of execution is an ambiguous term. There are two different things here: the syntactical order, and the semantics of the expression.
Syntactically, the operator ++ is applied first. If the * were applied to s first, then the following would be equivalent to what @Hogan said:
(*s)++ = (*t)++
Which is very different from Joel's sample.
The semantics of the operator ++ is that it is executed after the expression.
Hope that clarifies what I meant.
Actually, s++ and t++ are applied first. Don't forget that the post-fix operator is executed after the expression is done. Basically the operator ++ is applied for both, then *s = *t is executed.
In a post-increment operation the variable's value is used first, and the variable is modified afterwards.
So there are two forms of increment
++s // increment before using value
s++ // increment after using value
And the result of these can be dereferenced:
*++s // or...
*s++
This worked out really well on one of the very first machines for C to run on, the PDP-11, which had a register-indirect addressing mode that incremented the register afterwards. The following ops were available in hardware:
*--s // or
*s++
You could do either
*x++ = *y++; // or
*--x = *--y; // or some combination
And if you did, the whole line happened in a single instruction. Since // comments were introduced by C99, however, you couldn't actually get away with my comment syntax.
The code: while (*s++ = *t++); is roughly equivalent to:
while (*s = *t) {
++s;
++t;
}
The second is exactly the same -- the extra parens don't change anything (in this case). For the parens to do anything, they'd have to be like: while ((*s)++ = (*t)++);. This would do roughly the same as your third example (covered in the paragraph below).
The last example: while(++*s = ++*t); is completely different. Since the dereference (*) is closer to the operand, this dereferences the operand, and increments the result of the dereference, which means it increments what the pointer points AT, instead of incrementing the pointer itself. As a result, this would copy the first character, then increment that character, then check whether that character was non-zero and continue the same until it was zero. The result would be both the source and the destination becoming empty strings (since the first character of both would now be a zero, which is used to terminate strings).

Resources