Finding end of string: *s++ vs. *s then s++

I'm writing a simple string concatenation program.
The program works the way I have posted it. However, I first wrote it using the following code to find the end of the string:
while (*s++)
;
However, that method didn't work. The strings I passed to it weren't copied correctly. Specifically, I tried to copy "abc" to a char[] variable that held "\0".
From reading K&R, it looks like it should work. The compact form should take the following steps:
*s is compared with '\0'
s points to the next address
So why doesn't it work? I am compiling with gcc on Debian.
I found that this version does work:
char *strncat(char *s, const char *t, int n)
{
    char *s_start = s;
    while (*s)          /* stop on the '\0' of s */
        s++;
    for ( ; n > 0 && *t; n--, s++, t++)
        *s = *t;        /* copy at most n characters of t */
    *s = '\0';          /* terminate the result */
    return s_start;
}
Thanks in advance.

After the end of while (*s++);, s points to the character after the null terminator. Take that into account in the code that follows.

The problem is that
while (*s++)
;
always increments s, even when *s is zero (false), whereas
while (*s)
s++;
only increments s when *s is nonzero.
So the first one leaves s pointing to the first character after the first '\0', while the second one leaves s pointing to the first '\0'.
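To make the off-by-one concrete, here is a small sketch that reports where each loop leaves the pointer, as an offset from the start of the string. The two function names are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Offset where s ends after the compact loop: while (*s++) ; */
ptrdiff_t end_after_postincrement(const char *str)
{
    const char *s = str;
    while (*s++)
        ;               /* *s is tested, then s is incremented, even when *s is '\0' */
    return s - str;     /* one past the terminator */
}

/* Offset where s ends after: while (*s) s++; */
ptrdiff_t end_after_check_then_increment(const char *str)
{
    const char *s = str;
    while (*s)
        s++;            /* the body runs only while *s is nonzero */
    return s - str;     /* exactly on the terminator */
}
```

For "abc", the first function returns 4 and the second returns 3; for the empty string they return 1 and 0.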

There is a difference. In the first case, s ends up pointing to the position after '\0', while in the second it stops right at '\0'.

As John Knoeller said, at the end of the loop s will point to the location after the null terminator. But there is no need to sacrifice performance for a correct solution. Take a look for yourself:
while (*s++); --s;
should do the trick.
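As a sketch of the `while (*s++); --s;` idea applied to the question's function, here is a strncat-style routine. The name `my_strncat` and the `size_t` count are assumptions (the name avoids clashing with the standard `strncat`); the caller must make sure `s` has room for the result. It also covers the case from the question where the destination holds only "\0".

```c
#include <stddef.h>

/* Hypothetical strncat-style function: appends at most n characters of t
 * to s and terminates the result. s must have enough room. */
char *my_strncat(char *s, const char *t, size_t n)
{
    char *s_start = s;

    while (*s++)        /* stops one past the terminator ... */
        ;
    --s;                /* ... so step back onto the '\0' */

    for ( ; n > 0 && *t; n--, s++, t++)
        *s = *t;        /* the first copy overwrites the old '\0' */
    *s = '\0';          /* terminate the result */
    return s_start;
}
```

Appending "defgh" with n = 3 to a buffer holding "abc" gives "abcdef"; appending "xyz" to an empty buffer gives "xyz".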

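In addition to what has been said, note that in C, dereferencing a pointer that does not point into a valid object is undefined behavior, and even computing a pointer more than one element past the end of an array is undefined. After while (*s++);, s can point one past the end of the array, and writing through it then overruns the buffer. So be sure to fix your program, even if it appears to work.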
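When the terminator is the last element of the array, the compact loop leaves s exactly one past the end. The sketch below uses a hypothetical helper, `ends_one_past_array`, to show that case. Forming and comparing that pointer is allowed, but writing through it, as the concatenation does, would be out of bounds.

```c
#include <stddef.h>

/* Runs the compact loop over buf and reports whether the resulting pointer
 * is one past the end of the array. Assumes buf contains a '\0'. */
int ends_one_past_array(char *buf, size_t size)
{
    char *s = buf;
    while (*s++)
        ;
    /* Comparing with buf + size is fine: a one-past-the-end pointer may be
     * formed and compared, but it must never be dereferenced. */
    return s == buf + size;
}
```

With `char full[4] = "abc";` the function returns 1, because the '\0' is the last element; with `char roomy[8] = "abc";` it returns 0.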

Related

Why isn't this pointing to the null character in array? ('\0')

Sorry about the poorly worded question, I couldn't think of a better name.
I am learning C, have just moved on to pointers, and have written a function, strcat(char *s, char *t), which adds t to the end of s:
void strcat(char *s, char *t) //add t to the end of s
{
while(*s++) //get to the end of s
;
*s--; //unsure why I need this
while(*s++ = *t++) //copy t to the end of s
;
return;
}
Now the question I have is why do I need the line:
*s--;
When I originally added it I thought it made sense until I went through the code.
I would have thought the following was true though:
1) The first loop increments continually, and when *s is 0 (the null character) it moves on, so s now points to the null character of the array.
2) So all I should have to do is implement the second loop. The original null character of s will be replaced by the first character of t, until we get to t's null character, at which point we exit the second loop and return.
Clearly I am missing something, as the code doesn't work without it!
After the first loop, s points one position beyond '\0', but why?
Thanks in advance :)
First *s is evaluated then s is incremented.
So when reaching s's 0-terminator the loop ends, but s still is incremented one more time.
Also there is no need to do:
*s--;
Doing
--s;
or
s--;
would be enough. There is no need to de-reference s here.
Or simply do
while (*s)
++s;
to get rid of the need for --s entirely.
You incremented the pointer after checking the value of the location it was pointing at. Functionally, this is what happens in while( *s++ ):
while( *s )
++s;
++s; // the increment still happens once more when *s is '\0'
Change your first while to:
if (*s) {
while(*(++s)) //get to the end of s
;
}
In your code, each iteration checks whether s points to '\0' and then increments s, so even on the iteration that finds '\0' the increment still happens, and s ends up one past it. Note that with pre-increment, the loop never checks the character s initially points to, so you need to check it before the while.
Note that your code (post-increment and a decrement after the while) might be faster on most platforms (a branch is usually slower than a decrement); the code in this answer is just to help you understand the problem.
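The pre-increment approach from this answer can be wrapped in a small function to check that it lands on the terminator for both an empty and a non-empty string. The name `find_end_preincrement` is hypothetical.

```c
/* Finds the terminator using pre-increment. Returns a pointer to the
 * '\0' of s. */
char *find_end_preincrement(char *s)
{
    if (*s) {               /* pre-increment skips the first character, so test it here */
        while (*(++s))      /* advance first, then test the new position */
            ;
    }
    return s;               /* points at '\0' in both cases */
}
```

For "abc" it returns a pointer to index 3; for "" it returns the pointer it was given, which already points at '\0'.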
The ++ operator after the variable name does postincrement, which means it increments by one, but the result of the operator is the value before the increment. If you used ++s, it would be different.
If s is 4, then s will be 5 after x=++s as well as after x=s++. But the result (the value of x) is 5 in the first case and 4 in the second.
So in your while (*s++), when s points to the '\0', you increment it, then take the old, un-incremented pointer, dereference it, see the '\0', and stop the loop.
By the way, your *s-- should be s--, because you don't need the character the pointer points to there.
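The s = 4 example can be checked directly with integers. The helper `increment_result` is hypothetical: it returns the value x receives and reports the final value of s.

```c
/* Returns the value x receives from x = ++s (pre != 0) or x = s++ (pre == 0),
 * and stores the final value of s in *s_after. */
int increment_result(int s, int pre, int *s_after)
{
    int x;
    if (pre)
        x = ++s;    /* increment first; x gets the new value */
    else
        x = s++;    /* x gets the old value; s is incremented as well */
    *s_after = s;
    return x;
}
```

Starting from 4, both calls leave s at 5, but x is 5 for pre-increment and 4 for post-increment.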

String copy in C with and without pointer

There are two versions of a string copy function written in C. My question is why version1 needs "!= '\0'" but version2 doesn't. What if I have a character 0 to be copied using version2: will that '0' terminate the copying process?
void version1(char to[], char from[])
{
int i;
i = 0;
while ((to[i] = from[i]) != '\0')
++i;
}
char *version2(char *dest, const char *src)
{
char *addr = dest;
while (*dest++ = *src++);
return addr;
}
In addition, why doesn't an input like "1230456" terminate the copying, since '0' appears in the middle of the string?
This is because in C, comparison to zero is optional. When you use an expression in a context that requires a logical expression, C implicitly compares it to zero for you.
You can rewrite the first function as follows without changing the semantic:
while ((to[i] = from[i]))
++i;
Moreover, you can rewrite the second function as follows:
while ((*dest++ = *src++) != '\0');
There is exactly the same != 0 test in the second version, but it's implicit: the result of the expression *dest++ = *src++ becomes the value checked by the while, and in C, all tests boil down to a comparison with zero.
By the same token, in the first example, the while line could be rewritten:
while (to[i] = from[i])
and not change the meaning.
Both versions make the same check, in the second version you just don't see it. You can try to remove != '\0' from the first version, it should still work.
version1 does not NEED the != '\0', but including it is arguably better practice because it makes the test explicit. The C standard guarantees that '\0' has the value zero, so version2 works on every conforming implementation.
One thing perhaps not mentioned yet: the terminating zero gets copied in the second example because the assignment *dest++ = *src++ is performed first, and its result (the character just copied) is what the loop tests. So the '\0' is stored before the test on it ends the loop.
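On the "1230456" question: the digit character '0' is not the null character. It is an ordinary nonzero character (48 in ASCII), and only '\0' has the value 0. The sketch below copies that input with version2 (written with the explicit test, which behaves the same) and counts the copied characters with a hypothetical helper, `copied_length`.

```c
#include <stddef.h>

/* version2 from the question, with the implicit test spelled out */
char *version2(char *dest, const char *src)
{
    char *addr = dest;
    while ((*dest++ = *src++) != '\0')   /* copy, then test the copied char */
        ;
    return addr;
}

/* Counts characters before the terminator, used to check the copy */
size_t copied_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}
```

Copying "1230456" yields all 7 characters, with dest[3] holding the digit '0'.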

C pointers: difference between while(*s++) { ;} and while(*s) { s++;}

I'm going through K & R, and am having difficulty with incrementing pointers. Exercise 5.3 (p. 107) asks you to write a strcat function using pointers.
In pseudocode, the function does the following:
Takes 2 strings as inputs.
Finds the end of string one.
Copies string two onto the end of string one.
I got a working answer:
void strcats(char *s, char *t)
{
while (*s) /* finds end of s*/
s++;
while ((*s++ = *t++)) /* copies t to end of s*/
;
}
But I don't understand why this code doesn't also work:
void strcats(char *s, char *t)
{
while (*s++)
;
while ((*s++ = *t++))
;
}
Clearly, I'm missing something about how pointer incrementation works. I thought the two forms of incrementing s were equivalent. But the second version only prints the original string s.
I tried a dummy variable, i, to check whether the function went through both loops. It did. I read over sections 5.4 and 5.5 of K & R, but I couldn't find anything that sheds light on this.
Can anyone help me figure out why the second version of my function isn't doing what I would like it to? Thanks!
edit: Thanks everyone. It's incredible how long you can stare at a relatively simple error without noticing it. Sometimes there's no better remedy than having someone else glance at it.
This:
while(*s++)
;
due to post-increment, locates the nul byte at the end of the string, then increments s once more before exiting the loop. t is copied after the nul:
scontents␀tcontents␀
Printing s will stop at the first nul.
This:
while(*s)
s++;
breaks from the loop when the 0 is found, so you are left pointing at the nul byte. t is copied over the nul:
scontentstcontents␀
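The first layout can be checked byte by byte. This sketch runs the non-working version from the question (renamed `strcats_postincrement` for illustration) on a buffer that is assumed large enough for both strings plus two terminators.

```c
/* The non-working version from the question: the terminator of s survives,
 * and t is copied after it. */
void strcats_postincrement(char *s, const char *t)
{
    while (*s++)            /* leaves s one past the original '\0' */
        ;
    while ((*s++ = *t++))   /* copies t, including its '\0', after that */
        ;
}
```

With `char buf[8] = "ab";` and t = "cd", the buffer ends up as 'a' 'b' '\0' 'c' 'd' '\0', so anything that prints buf as a string stops at buf[2] and shows only "ab".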
It's an off-by-one issue. Your second version increments the pointer every time the test is evaluated. The original increments one fewer time: the last time, when the test evaluates to 0, the increment isn't done. Therefore, in the second version the new string is appended after the original terminating \0, while in the first version the first character of the new string overwrites that \0.
This:
while (*s)
s++;
stops as soon as *s is '\0', at which point it leaves s there (because it doesn't execute the body of the loop).
This:
while (*s++)
;
stops as soon as *s is '\0', but still executes the postincrement ++, so s ends up pointing right after the '\0'. So the string-terminating '\0' never gets overwritten, and it still terminates the string.
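There's one fewer increment with while (*s) ++s;: when *s is zero, the loop breaks, whereas the form while (*s++) breaks but still increments s one last time.
Strictly speaking, the latter form can lead you into undefined behavior. If the terminator is the last element of the array, s ends up one past the end; forming that pointer is still allowed, but dereferencing it (or writing through it, as the concatenation does) is UB. This is contrived, of course, but here's an example: char x = 0, *p = &x; while (*p++) { } leaves p one past x, where it must not be dereferenced.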
Independent of that, it's best to write clean, readable and deliberate code rather than trying to outsmart yourself. Sometimes you can write nifty code in C that is actually elegant, and other times it's better to spell something out properly. Use your judgement, and ask someone else for feedback (or watch their faces as they look at your code).
Let's assume the following characters in memory:
Address 0x00 0x01 0x02 0x03
------- ---- ---- ---- ----
0x8000 'a' 'b' 'c' 0
0x8004 ...
While executing the loop (with s starting at 0x8000), the following happens in memory:
1. *s = 'a'
2. s = 0x8001
3. *s = 'b'
4. s = 0x8002
5. *s = 'c'
6. s = 0x8003
7. *s = 0;
8. s = 0x8004
9. end loop
While evaluating, *s++ advances the pointer even if the value of *s is 0.
// move s forward until it points one past a 0 character
while (*s++);
It doesn't work because s ends up one past the '\0' (at 0x8004 above) instead of on it.
As a result, the concatenation fails: the copied text is written after the original '\0', so the target string still ends at that '\0'. This happens because the while loop oversteps the '\0' by one position.
You can avoid this by using the code below, which I think is efficient:
while (*s)
s++;
In terms of memory, it executes as follows:
1. *s = 'a'
2. s = 0x8001
3. *s = 'b'
4. s = 0x8002
5. *s = 'c'
6. s = 0x8003
7. *s = 0
8. end loop

understanding strlen function in C

I am learning C, and I came across this function that finds the length of a string:
#include <stddef.h> /* size_t */
size_t strlen(const char *str)
{
    size_t len = 0U;
    while(*(str++))
        ++len;
    return len;
}
Now, when does the loop exit? I am confused, since str++ always increments the pointer.
while(*(str++)) ++len;
is, as far as len is concerned, the same as:
while(*str) {
++len;
++str;
}
which is the same as:
while(*str != '\0') {
++len;
++str;
}
So now you see when str points to the null char at the end of the string, the test condition fails and you stop looping.
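Putting the compact and expanded forms side by side shows they return the same length; the only difference is that the compact one's local copy of str ends one past the terminator, which does not matter because it is discarded. The function names are hypothetical renames, since strlen itself is a reserved name.

```c
#include <stddef.h>

/* The compact form from the question */
size_t strlen_compact(const char *str)
{
    size_t len = 0U;
    while (*(str++))    /* test the current char, then advance, even past '\0' */
        ++len;          /* counted only for nonzero characters */
    return len;
}

/* The expanded form from the answer */
size_t strlen_expanded(const char *str)
{
    size_t len = 0U;
    while (*str != '\0') {
        ++len;
        ++str;
    }
    return len;
}
```

Both return 0 for "" and 5 for "hello".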
C strings are terminated by the NUL character, which has the value 0.
0 is false in C and anything else is true.
So we keep incrementing the pointer into the string and the length until we find a NUL and then return.
You need to understand two notions to grasp the idea of the function:
1) A C string is an array of characters.
2) In C, an array name used in an expression decays to a pointer to the array's first element.
So what does strlen do? It uses pointer arithmetic to walk through the array (++ on a pointer means: next element) until it reaches the terminator ('\0').
Once *(str++) yields 0, the loop exits. This happens when str (before the increment) points to the terminating '\0', just after the last character of the string (because strings in C are 0-terminated).
Correct, str++ increments the pointer and yields its previous value. The asterisk (*) dereferences that previous pointer, i.e. it gives you the character value.
C strings end with a zero byte. The while loop exits when the conditional is no longer true, which means when it is zero.
So the while loop runs until it encounters a zero byte in the string.
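A common alternative, swapped in here for comparison and not the function from the question, drops the counter and uses pointer subtraction instead: walk a second pointer onto the terminator and return the distance. The name `strlen_by_subtraction` is hypothetical.

```c
#include <stddef.h>

/* Walks p onto the '\0' and returns its distance from the start. */
size_t strlen_by_subtraction(const char *str)
{
    const char *p = str;
    while (*p)                  /* stops ON the '\0', so no correction is needed */
        ++p;
    return (size_t)(p - str);   /* number of characters before '\0' */
}
```

Because p stops on the terminator rather than one past it, the difference is exactly the length.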

Stuck with C syntax

I am trying to remove spaces from the end of a char array (string).
This is the pseudo code of what I am doing, but it keeps deleting the whole string:
if(string length - 1 != a space)
return
Otherwise, it must equal a space, so
while *mypointer-- != a space
//This should loop back to where there is a character.
Outside of the while loop, I now add one to the pointer by doing
mypointer ++;
and then set *mypointer to '\0' to signal the end of the string.
I am doing something fundamentally wrong inside of my while, but I cannot seem to figure it out. What could it be?
A little clairvoyance:
while(*mypointer == ' ') {
    --mypointer;
}
mypointer[1] = '\0';
Notice that it is == ' ', not != ' '.
Edit:
If I were going to write this, I'd probably really use something like:
#include <string.h>
...
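char *mypointer = strrchr(string, ' ');
if (mypointer) *mypointer = '\0'; /* note: cuts at the LAST space anywhere in the string, so this only works when the string ends in exactly one space */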
You're probably assigning '\0' to mypointer itself, which makes it a null pointer. You need to set mypointer[1] to '\0'. Also, make sure you do the right thing for the following edge cases:
when the string is all spaces,
when the string length is 0.
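One way to cover both edge cases is to work with an index instead of decrementing a pointer, so nothing ever steps before the start of the array. This is a sketch under the assumption that only ' ' counts as trailing whitespace; `rtrim_spaces` is a hypothetical name, and a variant could use isspace from <ctype.h> instead.

```c
#include <string.h>

/* Hypothetical helper: removes trailing ' ' characters from string in place.
 * Works for the empty string and for a string made only of spaces. */
void rtrim_spaces(char *string)
{
    size_t len = strlen(string);

    /* Shrink len while the last remaining character is a space; checking
     * len > 0 first keeps the index inside the array. */
    while (len > 0 && string[len - 1] == ' ')
        --len;

    string[len] = '\0';     /* cut the string after the last non-space */
}
```

"hello  " becomes "hello", "   " becomes "", and "hello world" is left unchanged, unlike the strrchr approach, which would cut it to "hello".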
Without the code we can just guess:
How is mypointer initialized?
Are you sure about the while loop condition? Should it be while 'last character is space'?
A slight improvement perhaps:
while(*mypointer == ' ') {
    *mypointer-- = '\0';
}
Have a look at this code, which does exactly this job of stripping spaces from the end of a string. It is from the Snippets archive; stepping through it in a debugger is educational, and the archive is a very useful collection of C code as well.
