Trying to re-implement strcat and getting wrong results

Why can't I put *a++ in this while loop and get what I want? (I saw in a C book that this form can be used.) I got something else in the output.
void strcat(char *a, char *b)
{
while( *a != '\0'){
a++;
}
for ( ;*b != '\0' ; *a++ = *b++);
}
When I checked the current value of *a after this while loop, both ways (the one above and the one below) print the same value, 0. But when I print the result, it is correct only for the way above.
Why can't I do something like this?
while( *a++ != '\0');

while( *a != '\0'){
a++;
}
and
while( *a++ != '\0');
are not identical.
The first one increments a as long as it does not point to the terminator,
the second increments a and repeats that as long as a did not point to the terminator before the increment.
The difference is exactly one increment of a, which makes the second version an off-by-one error.
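To make the one-increment difference concrete, here is a small sketch that measures how far each loop moves the pointer. The function names advance_checked and advance_post_increment are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

/* First loop: the test fails while a still points AT the terminator. */
size_t advance_checked(const char *a)
{
    const char *start = a;
    while (*a != '\0')
        a++;
    return (size_t)(a - start);
}

/* Second loop: the post-increment also runs on the final, failing test,
   so a ends up one position PAST the terminator. */
size_t advance_post_increment(const char *a)
{
    const char *start = a;
    while (*a++ != '\0')
        {}
    return (size_t)(a - start);
}
```

For "abc", advance_checked returns 3 (a points at the '\0') while advance_post_increment returns 4 (a points one past it), so anything appended from there lands after the original terminator and never shows up when the string is printed.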
You have a similar problem with the second loop:
for ( ;*b != '\0' ; *a++ = *b++);
It checks whether it reached the terminator, and otherwise copies one element from b to a.
Thus, it does not copy the terminator!
Change to:
while((*a++ = *b++)) {}
(Double-parentheses to suppress compiler-warning about possibly erroneous assignment in conditional expression.)
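The difference between the two copy loops shows up when copying into a buffer pre-filled with non-zero characters. This is a minimal sketch; copy_for_loop and copy_while_assign are hypothetical names for the question's loop and the suggested one.

```c
/* The question's loop: the condition sees '\0' and stops before copying it. */
void copy_for_loop(char *a, const char *b)
{
    for ( ; *b != '\0'; *a++ = *b++)
        {}
}

/* The suggested loop: the assignment happens first, so '\0' is copied,
   and only then does the condition see the zero and stop. */
void copy_while_assign(char *a, const char *b)
{
    while ((*a++ = *b++))
        {}
}
```

Copying "abc" into a buffer filled with 'X' leaves buf[3] == 'X' after copy_for_loop (no terminator), but buf[3] == '\0' after copy_while_assign.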
Additional tip:
Make intentional empty statements more obvious by using {}.
Also, when you re-implement the standard library, consider following its definition and return the result string.
Final code:
char* strcat(char *a, const char *b)
{
char* ret = a;
while(*a)
a++;
while((*a++ = *b++))
{}
return ret;
}

Related

How exactly does pointer value incrementing work?

So I was working on writing my own string-concatenation function in C. One solution that was provided to me was:
char *_strcat(char *dest, char *src)
{
int c, c2;
c = 0;
while (dest[c])
c++;
for (c2 = 0; src[c2] ; c2++)
dest[c++] = src[c2];
return (dest);
}
The part that confuses me is while (dest[c]) and other similar parts. I've already gone through pointers in various resources, but I can't seem to understand this part. A good explanation would be much appreciated.

For starters, the function is incorrect. It does not build a concatenated string because it does not append the terminating zero character '\0' to the result (dest) string in this for loop:
for (c2 = 0; src[c2] ; c2++)
dest[c++] = src[c2];
Also the function should be declared like
char * _strcat( char *dest, const char *src );
because the appended string (src) is not changed.
This while loop
while (dest[c])
c++;
is equivalent to
while (dest[c] != '\0' )
c++;
and this for loop
for (c2 = 0; src[c2] ; c2++)
dest[c++] = src[c2];
is equivalent to
for (c2 = 0; src[c2] != '\0' ; c2++)
dest[c++] = src[c2];
That is, the loops continue iterating until the terminating zero character '\0' is encountered: in the while loop, in the string dest (to find its end), and in the second loop, in the string src (to find its end).
A non-zero scalar expression is evaluated as a logical true in conditions.
And the variables c and c2 should have the unsigned type size_t instead of int, because an int may not be large enough to store string lengths.
Also, you should not define names starting with an underscore; such names are reserved for the implementation at file scope.
As for your question
How exactly does pointer value incrementing work?
the pointers themselves are not incremented in this function. It uses expressions with the subscript operator to access elements of the strings, for example dest[c], dest[c++] or src[c2].
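Putting the fixes above together (size_t indices, a const source string, and the missing terminator), an index-based version might look like this. It is a sketch: the name my_strcat_idx is hypothetical, and as with strcat, dest is assumed to have room for the combined string.

```c
#include <stddef.h>

/* Index-based concatenation: no pointer is incremented, only the indices. */
char *my_strcat_idx(char *dest, const char *src)
{
    size_t c = 0, c2;

    while (dest[c] != '\0')       /* find the end of dest */
        c++;

    for (c2 = 0; src[c2] != '\0'; c2++)
        dest[c++] = src[c2];      /* append src one character at a time */

    dest[c] = '\0';               /* the step the original version was missing */
    return dest;
}
```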
The function can be defined the following way
char * my_strcat( char *dest, const char *src )
{
char *p = dest;
while ( *p != '\0' ) ++p;
while ( ( *p++ = *src++ ) != '\0' );
return dest;
}
In this function, the pointers p and src are indeed incremented, and no expression with the subscript operator is used.

char *dest is a pointer to char that points to the first character of the string. The following loop then moves the offset c, not the pointer itself, to the end of the string:
c = 0;
while (dest[c])
c++;

Replicating the strcmp() function from string.h library

I am trying to replicate the strcmp() function from the string.h library and here is my code
/**
* string_compare - this function compares two strings pointed
* by s1 and s2. Is a replica of the strcmp from the string.h library
* #s1: The first string to be compared
* #s2: The second string to be compared
*
* Return: On success, it returns:
* 0 if s1 is equal to s2
* negative value if s1 is less than s2
* positive value if s1 is greater than s2
*/
int string_compare(char *s1, char *s2)
{
int sum = 0, i;
for (i = 0; s1[i] != '\0' && s2[i] != '\0'; i++)
sum += (s1[i] - s2[i]);
for ( ; s1[i] != '\0'; i++)
sum += (s1[i] - 0);
for ( ; s2[i] != '\0'; i++)
sum += (0 - s2[i]);
return (sum);
}
I tried my function using this sample code:
#include <stdio.h>
int main(void)
{
char s1[] = "Hello";
char s2[] = "World!";
printf("%d\n", string_compare(s1, s2));
printf("%d\n", string_compare(s2, s1));
printf("%d\n", string_compare(s1, s1));
return (0);
}
And I get the following output,
-53
-500
0
But I should be getting:
-15
15
0
Why am I getting such a result??

This approach is incorrect.
Let's assume that the first string is "B" and the second string is "AB".
It is evident that the first string is greater than the second string in the lexicographical order.
But the result will be negative due to this for loop
for ( ; s2[i] != '\0'; i++)
sum += (0 - s2[i]);
even though the function should return a positive value.
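Tracing the "B" / "AB" example makes the failure concrete. The sketch below reproduces the question's loops under the hypothetical name sum_compare; assuming ASCII, the first loop adds 'B' - 'A' = 1 and the third loop subtracts 'B' (66), giving -65 where a positive value is required.

```c
/* The question's summing approach, reproduced only to show the sign problem. */
int sum_compare(const char *s1, const char *s2)
{
    int sum = 0, i;
    for (i = 0; s1[i] != '\0' && s2[i] != '\0'; i++)
        sum += s1[i] - s2[i];      /* differences over the common prefix */
    for ( ; s1[i] != '\0'; i++)
        sum += s1[i];              /* leftover characters of s1 */
    for ( ; s2[i] != '\0'; i++)
        sum -= s2[i];              /* leftover characters of s2 */
    return sum;
}
```

Note also that when s1 is the longer string, the second loop leaves i past the terminator of s2, so the third loop then reads s2 out of bounds. That is undefined behavior, and it is a likely source of the odd -500 in the question's output.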
Moreover, the variable sum of type int can overflow.
Also the function should be declared at least like
int string_compare( const char *s1, const char *s2);
because passed strings are not changed within the function.
The function can be defined the following way
int string_compare( const char *s1, const char *s2 )
{
while ( *s1 && *s1 == *s2 )
{
++s1;
++s2;
}
return ( unsigned char )*s1 - ( unsigned char )*s2;
}

You are overcomplicating a very simple function.
#define UC unsigned char
int mystrcmp(const char *s1, const char *s2)
{
int result;
while(!(result = (UC)*s1 - (UC)*s2++) && *s1++);
return result;
}

Strings in C are arrays of characters terminated with a null character (\0).
When you pass a string to a function, you are passing a pointer to its first element. That pointer is passed by value. You can modify that pointer within the function without any side-effects on the string it points to, as long as you don't dereference and assign to the address it points to.
That's why the pointer math from
0___________'s answer works.
int mystrcmp1(const char *s1, const char *s2) {
int result = 0;
while(!(result = *s1 - *s2++) && *s1++);
return result;
}
*s1++ could be rewritten as *(s1++) to disambiguate. s1++ yields the current value of the pointer (on the first iteration, the beginning of the first string) and then increments the pointer so it points to the next character. The old value is what gets dereferenced to give us the character. The same happens with the s2 pointer.
Then we compare the characters by subtraction, and the difference is assigned to result. If they're the same, we get 0, which in C is false in a boolean context, so !(result = ...) is true.
We can now see that the loop continues while corresponding characters in the two strings are equal and while dereferencing s1 does not give us the null terminator.
When the loop stops, it means there was either a difference or we reached the end of the first string.
The difference will be stored in result, which the function returns.
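The one-line loop in mystrcmp can be unrolled into an equivalent, more explicit form, which may make the order of the steps easier to follow. This is a sketch: the name mystrcmp_explicit is hypothetical, and it keeps the unsigned char casts from mystrcmp.

```c
/* Step-by-step equivalent of:
   while(!(result = (UC)*s1 - (UC)*s2++) && *s1++); */
int mystrcmp_explicit(const char *s1, const char *s2)
{
    int result;
    for (;;) {
        result = (unsigned char)*s1 - (unsigned char)*s2;
        s2++;                  /* s2++ always happens */
        if (result != 0)       /* characters differ: stop */
            break;
        if (*s1 == '\0')       /* both strings ended together: equal */
            break;
        s1++;                  /* equal, non-zero characters: keep going */
    }
    return result;
}
```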

C strlen using pointers

I have seen the standard implementation of strlen using pointers:
int strlen(char * s) {
char *p = s;
while (*p!='\0')
p++;
return p-s;
}
I get that this works, but I tried 3 other ways (I'm learning pointer arithmetic right now), and I would like to know what's wrong with them.
1) This is somewhat similar to what the book does. Is this wrong?
int strlen(char * s) {
char *p = s;
while (*p)
p++;
return p-s;
}
2) I thought this would be wrong if I pass an empty string, but it still gives me 0, which is confusing since p is pre-incremented (and now it's returning 5):
int strlen(char * s) {
char *p = s;
while (*++p)
;
return p-s;
}
3) Figured this one out: it does the post-increment and returns one more than the length.
int strlen(char * s) {
char *p = s;
while (*p++)
;
return p-s;
}

1) Looks fine to me. I personally prefer the explicit comparison against '\0' so that it's clear you didn't mean to (for example) compare p to the NULL pointer in situations where it's not clear from context.
2) Pre-incrementing means the first character is never tested: *++p moves p to s + 1 before the first check. For a non-empty string such as "hello" that happens not to matter, and you get 5. If the string has length 0, it is like char s[1] = {0}: pre-incrementing looks at the byte immediately after the \0, which is outside the string, so you're reading memory you don't own. That is undefined behavior, and getting 0 back is pure luck. Here be dragons!
3) I don't think there's a question there :) As you see, it always returns one more than the correct answer.
Addendum: You can also write this using a for-loop, if you prefer this style:
size_t strlen(char * s) {
char *p = s;
for (; *p != '\0'; p++) {}
return p - s;
}
Or, in a more error-prone style:
size_t strlen(char * s) {
char *p = s;
for (; *p != '\0'; p++);
return p - s;
}
Also, strlen can't return a negative number, so you should use an unsigned value. size_t is even better.

Version 1 is fine - while (*p != '\0') is equivalent to while (*p != 0), which is equivalent to while (*p).
In the original code and version 1, the pointer p is advanced if and only if *p is not 0 (IOW, you're not at the end of the string).
Versions 2 and 3 advance p whether or not *p is 0. *p++ evaluates to the character p points to, and as a side effect advances p. *++p advances p first and then evaluates to the character it now points to, i.e. the character following the original one. Therefore version 3 always leaves p one past the terminator, so its result is one too large, and version 2 never tests the first character, so for an empty string it steps over the terminator and reads past the end of the string.
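Following that reasoning, version 3 can be repaired by compensating for the extra step, while version 1 needs no change. A minimal sketch with the hypothetical names len_v1 and len_v3_fixed, returning size_t; version 2 is left out because on an empty string it reads past the end, which no adjustment of the return value can make safe.

```c
#include <stddef.h>

/* Version 1: p stops ON the terminator, so p - s is the length. */
size_t len_v1(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Version 3 repaired: the post-increment always moves p one past the
   terminator, so subtract that extra step. */
size_t len_v3_fixed(const char *s)
{
    const char *p = s;
    while (*p++)
        {}
    return (size_t)(p - s - 1);
}
```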

One issue you will run into when you compare the performance of strlen replacement functions is that, for long strings, they will be slower than the library strlen. Why? Library implementations of strlen typically process more than one byte per iteration when searching for the end of the string. How can you implement a more efficient replacement?
It's not that difficult. The basic approach is to look at 4 bytes per iteration and adjust the return value based on where within those 4 bytes the nul byte is found. You could do something like the following (using array indexing):
size_t strsz_idx (const char *s) {
size_t len = 0;
for(;;) {
if (s[0] == 0) return len;
if (s[1] == 0) return len + 1;
if (s[2] == 0) return len + 2;
if (s[3] == 0) return len + 3;
s += 4, len += 4;
}
}
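You can do the same thing using pointers and masks. This version assumes a little-endian machine with a 32-bit unsigned int. Because it reads 4 bytes at a time, it can read up to 3 bytes past the terminator, which is only safe if the buffer actually extends that far (otherwise it is undefined behavior):
#include <string.h>
size_t strsz (const char *s) {
size_t len = 0;
for(;;) {
unsigned x;
memcpy(&x, s, sizeof x); /* avoids the misaligned-access and aliasing problems of *(unsigned*)s */
if((x & 0xff) == 0) return len; /* lowest byte = s[0] on little-endian */
if((x & 0xff00) == 0) return len + 1;
if((x & 0xff0000) == 0) return len + 2;
if((x & 0xff000000) == 0) return len + 3;
s += 4, len += 4;
}
}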
Either way, checking 4 bytes per iteration should bring you closer to the performance of strlen itself, although how close depends on your compiler and platform.

How does strcmp() work?

I've been looking around a fair bit for an answer. I'm going to make a series of my own string functions like my_strcmp(), my_strcat(), etc.
Does strcmp() go through each index of the two character arrays and, if the ASCII value at the same index is smaller in one string, treat the other string as alphabetically greater and return 0, 1 or 2 accordingly? I guess what I'm asking is: does it use the ASCII values of characters to return these results?
Any help would be greatly appreciated.
[REVISED]
OK, so I have come up with this... it works for all cases except when the second string is greater than the first.
Any tips?
int my_strcmp(char s1[], char s2[])
{
int i = 0;
while ( s1[i] != '\0' )
{
if( s2[i] == '\0' ) { return 1; }
else if( s1[i] < s2[i] ) { return -1; }
else if( s1[i] > s2[i] ) { return 1; }
i++;
}
return 0;
}
int main (int argc, char *argv[])
{
int result = my_strcmp(argv[1], argv[2]);
printf("Value: %d \n", result);
return 0;
}

The pseudo-code "implementation" of strcmp would go something like:
define strcmp (s1, s2):
    p1 = address of first character of s1
    p2 = address of first character of s2
    while contents of p1 not equal to null:
        if contents of p2 equal to null:
            return 1
        if contents of p2 greater than contents of p1:
            return -1
        if contents of p1 greater than contents of p2:
            return 1
        advance p1
        advance p2
    if contents of p2 not equal to null:
        return -1
    return 0
That's basically it. Each character is compared in turn and a decision is made as to whether the first or second string is greater, based on that character.
Only if the characters are identical do you move to the next character and, if all the characters were identical, zero is returned.
Note that you may not necessarily get 1 and -1, the specs say that any positive or negative value will suffice, so you should always check the return value with < 0, > 0 or == 0.
Turning that into real C would be relatively simple:
int myStrCmp (const char *s1, const char *s2) {
const unsigned char *p1 = (const unsigned char *)s1;
const unsigned char *p2 = (const unsigned char *)s2;
while (*p1 != '\0') {
if (*p2 == '\0') return 1;
if (*p2 > *p1) return -1;
if (*p1 > *p2) return 1;
p1++;
p2++;
}
if (*p2 != '\0') return -1;
return 0;
}
Also keep in mind that "greater" in the context of characters is not necessarily based on simple ASCII ordering for all string functions.
C has a concept called 'locales' which specify (among other things) collation, or ordering of the underlying character set and you may find, for example, that the characters a, á, à and ä are all considered identical. This will happen for functions like strcoll.

Here is the BSD implementation:
int
strcmp(s1, s2)
register const char *s1, *s2;
{
while (*s1 == *s2++)
if (*s1++ == 0)
return (0);
return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}
Once there is a mismatch between two characters, it just returns the difference between those two characters.

It uses the byte values of the characters, returning a negative value if the first string appears before the second (ordered by byte values), zero if they are equal, and a positive value if the first appears after the second. Since it operates on bytes, it is not encoding-aware.
For example:
strcmp("abc", "def") < 0
strcmp("abc", "abcd") < 0 // null character is less than 'd'
strcmp("abc", "ABC") > 0 // 'a' > 'A' in ASCII
strcmp("abc", "abc") == 0
More precisely, as described in the strcmp Open Group specification:
The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared.
Note that the return value may not be equal to this difference, but it will carry the same sign.
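Because only the sign is specified, portable code should test the result against zero rather than expecting -1 or 1. A small sketch: sign_of and strcmp_sign are hypothetical helpers that collapse any strcmp result to -1, 0 or 1, exercised on the examples above (the "ABC" case assumes ASCII).

```c
#include <string.h>

/* Collapse any int to -1, 0 or +1. */
int sign_of(int r)
{
    return (r > 0) - (r < 0);
}

/* strcmp only guarantees the sign of its result, so normalize it
   before comparing against specific values. */
int strcmp_sign(const char *a, const char *b)
{
    return sign_of(strcmp(a, b));
}
```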

This, from the masters themselves (K&R, 2nd ed., pg. 106):
// strcmp: return < 0 if s < t, 0 if s == t, > 0 if s > t
int strcmp(char *s, char *t)
{
int i;
for (i = 0; s[i] == t[i]; i++)
if (s[i] == '\0')
return 0;
return s[i] - t[i];
}

Here is my version, written for small microcontroller applications, MISRA-C compliant.
The main aim with this code was to write readable code, instead of the one-line goo found in most compiler libs.
int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
while ( (*s1 != '\0') && (*s1 == *s2) )
{
s1++;
s2++;
}
return (int8_t)( (int16_t)*s1 - (int16_t)*s2 );
}
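Note: the code assumes a 16-bit int type. Also, for byte values above 127 the difference can fall outside the range of int8_t (for example 200 - 0 = 200), in which case the cast can change its sign.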

This code is equivalent, shorter, and more readable:
int8_t strcmp (const uint8_t* s1, const uint8_t* s2)
{
while( (*s1!='\0') && (*s1==*s2) ){
s1++;
s2++;
}
return (int8_t)*s1 - (int8_t)*s2;
}
We only need to test for end of s1, because if we reach the end of s2 before end of s1, the loop will terminate (since *s2 != *s1).
The return expression calculates the correct value in every case, provided we are only using 7-bit (pure ASCII) characters. Careful thought is needed to produce correct code for 8-bit characters, because of the risk of integer overflow.
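One way to avoid that overflow for 8-bit characters, sketched under the same uint8_t/int8_t conventions: compare the unsigned bytes and return only -1, 0 or +1, which always fits in int8_t. The name strcmp_u8 is hypothetical (it also avoids clashing with the standard strcmp).

```c
#include <stdint.h>

int8_t strcmp_u8(const uint8_t *s1, const uint8_t *s2)
{
    /* Only the end of s1 needs testing: a shorter s2 makes *s1 != *s2. */
    while ((*s1 != 0u) && (*s1 == *s2))
    {
        s1++;
        s2++;
    }
    /* (a > b) - (a < b) is -1, 0 or +1, so the cast cannot overflow. */
    return (int8_t)((*s1 > *s2) - (*s1 < *s2));
}
```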

I found this on the web:
http://www.opensource.apple.com/source/Libc/Libc-262/ppc/gen/strcmp.c
int strcmp(const char *s1, const char *s2)
{
for ( ; *s1 == *s2; s1++, s2++)
if (*s1 == '\0')
return 0;
return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
}

This is how I implemented my strcmp. It works like this:
it compares the first letter of the two strings, and if they are identical, it continues to the next letter. If not, it returns the corresponding value. It is very simple and easy to understand:
#include <stdio.h>
//function declaration:
int strcmp(char string1[], char string2[]);
int main()
{
char string1[]=" The San Antonio spurs";
char string2[]=" will be champions again!";
//calling the function- strcmp
printf("\n number returned by the strcmp function: %d", strcmp(string1, string2));
getchar();
return(0);
}
/**This function calculates the dictionary value of the string and compares it to another string.
it returns a number bigger than 0 if the first string is bigger than the second
it returns a number smaller than 0 if the second string is bigger than the first
input: string1, string2
output: value- can be 1, 0 or -1 according to the case*/
int strcmp(char string1[], char string2[])
{
int i=0;
int value=2; //this initialization value could be any number but the numbers that can be returned by the function
while(value==2)
{
if (string1[i]>string2[i])
{
value=1;
}
else if (string1[i]<string2[i])
{
value=-1;
}
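else if (string1[i] == '\0')
{
value=0; //both strings ended at the same position, so they are equal
}
else
{
i++;
}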
}
return(value);
}

It's just this:
int strcmp(char *str1, char *str2){
while( (*str1 == *str2) && (*str1 != 0) ){
++str1;
++str2;
}
return (*str1-*str2);
}
If you want it faster, you can add "register " before the parameter type, like this:
register char
then, like this:
int strcmp(register char *str1, register char *str2){
while( (*str1 == *str2) && (*str1 != 0) ){
++str1;
++str2;
}
return (*str1-*str2);
}
This asks the compiler to keep the pointers in CPU registers if possible, although modern optimizing compilers usually make that decision themselves, so the hint rarely changes anything.

Why can't I copy an array by using `=`?

I'm starting to learn C by reading K&R and going through some of the exercises. After some struggling, I was finally able to complete exercise 1-19 with the code below:
/* reverse: reverse the character string s */
void reverse(char s[], int slen)
{
char tmp[slen];
int i, j;
i = 0;
j = slen - 2; /* skip '\0' and \n */
tmp[i] = s[j];
while (i <= slen) {
++i;
--j;
tmp[i] = s[j];
}
/* code from copy function p 29 */
i = 0;
while ((s[i] = tmp[i]) != '\0')
++i;
}
My question is regarding that last bit of code where the tmp char array is copied to s. Why doesn't a simple s = tmp; work instead? Why does one have to iterate through the array copying index by index?

Maybe I'm just old and grumpy, but the other answers I've seen seem to miss the point completely.
C does not do array assignments, period. You cannot assign one array to another array by a simple assignment, unlike some other languages (PL/1, for instance; Pascal and many of its descendants too - Ada, Modula, Oberon, etc.). Nor does C really have a string type. It only has arrays of characters, and you can't copy arrays of characters (any more than you can copy arrays of any other type) without using a loop or a function call. [String literals don't really count as a string type.]
The only time arrays are copied is when the array is embedded in a structure and you do a structure assignment.
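A small sketch of both ways to copy: struct line, copy_line and copy_chars are hypothetical names. With two struct line objects, *dst = *src copies the whole embedded array; with bare char arrays the same = would not compile, so memcpy (a function call, as described above) does the work.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a char array wrapped in a struct. */
struct line {
    char text[16];
};

/* Structure assignment copies the embedded array along with the struct. */
void copy_line(struct line *dst, const struct line *src)
{
    *dst = *src;
}

/* Bare arrays cannot be assigned with '=', so a library call does the copy. */
void copy_chars(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}
```

After copy_line(&b, &a), changing a.text leaves b.text untouched: the two structs hold separate copies of the array.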
In my copy of K&R 2nd Edition, exercise 1-19 asks for a function reverse(s); in my copy of K&R 1st Edition, it was exercise 1-17 instead of 1-19, but the same question was asked.
Since pointers have not been covered at this stage, the solution should use indexes instead of pointers. I believe that leads to:
#include <string.h>
void reverse(char s[])
{
int i = 0;
int j = strlen(s) - 1;
while (i < j)
{
char c = s[i];
s[i++] = s[j];
s[j--] = c;
}
}
#ifdef TEST
#include <stdio.h>
int main(void)
{
char buffer[256];
while (fgets(buffer, sizeof(buffer), stdin) != 0)
{
int len = strlen(buffer);
if (len == 0)
break;
buffer[len-1] = '\0'; /* Zap newline */
printf("In: <<%s>>\n", buffer);
reverse(buffer);
printf("Out: <<%s>>\n", buffer);
}
return(0);
}
#endif /* TEST */
Compile this with -DTEST to include the test program and without to have just the function reverse() defined.
With the function signature given in the question, you avoid calling strlen() twice per line of input. Note the use of fgets() — even in test programs, it is a bad idea to use gets(). The downside of fgets() compared to gets() is that fgets() does not remove the trailing newline where gets() does. The upsides of fgets() are that you don't get array overflows and you can tell whether the program found a newline or whether it ran out of space (or data) before encountering a newline.

Your tmp array was declared on the stack, so when your function returns, the memory used to hold its values is released because tmp goes out of scope.
s = tmp means that s should point to the same memory location as tmp. This means that when tmp is released, s will still be pointing to a possibly invalid, freed memory location.
This type of error is referred to as a dangling pointer.
Edit: This isn't a dangling pointer, as pointed out in the comments on this answer. The issue is that s = tmp only changes what the parameter s points to, not the contents of the array that was actually passed in.
Also, you could perform your reverse with a single pass and without allocating a whole array in memory by just swapping the values in place one by one:
void reverse(char s[], int slen) {
int i = 0; // First char
int j = slen - 2; // Last char minus \n\0
char tmp = 0; // Temp for the value being swapped
// Iterate over the array from the start until the two indexes collide.
while(i < j) {
tmp = s[i]; // Save the earlier char
s[i] = s[j]; // Replace it with the later char
s[j] = tmp; // Place the earlier char in the later char's spot
i++; // Move forwards with the early char
j--; // Move backwards with the later char
}
}

Because both s and tmp are memory addresses. If you do s = tmp, both pointers would point to the same array.
Suppose that we have
char s[] ="ab";
/*
* Only for explanatory purposes.
*
*/
void foo(char s[]){
char tmp [] = "cd";
s= tmp;
}
foo(s);
after s= tmp you would have
s[0] : 'c'
s[1] : 'd'
s[2] : '\0'
Even though both arrays have the same data, a change in tmp will affect both of them, because both arrays are actually the same. They both contain data that's at the same memory address. So by changing any position of the tmp array, or destroying the tmp array, s would be affected in the same way.
By looping over the array, what you are doing is moving a piece of data from one memory address to another.
In my copy of K & R, pointers are explained in chapter 5. A quick glance through the first pages may be of help.

To round out the discussion, here are two other possible ways to reverse a string:
void reverse(char string1[], char string2[])
{
int i = 0, len = 0;
while(string2[len] != '\0') // get the length of the string
len++;
while(len > 0)
{
string1[i] = string2[len-1]; // copy the elements in reverse
i++;
len--;
}
string1[i] = '\0'; // terminate the copied string
}
Or recursively:
void reverse (const char *const sPtr)
{
//if end of string
if (sPtr[0] == '\0')
{
return;
}
else //not end of the string...
{
reverse(&sPtr[1]); //recursive step
putchar(sPtr[0]); //display character
}
}

Because in s = tmp, tmp is converted to a pointer to its first element, and you need a copy of the contents, not a "link".

In the case of s=tmp, the value of tmp, which is also the beginning address of the array, would get copied to s.
That way both s and tmp will point to the same address in memory, which I think is not the purpose.
cheers

Try experimenting and see what happens when you do things like this:
#include <stdio.h>
void modifyArrayValues(char x[], int len)
{
for (int i = 0; i < len; ++i)
x[i] = i;
}
void attemptModifyArray(char x[], int len)
{
char y[10];
for (int i = 0; i < len; ++i)
y[i] = i;
x = y;
}
int main()
{
int i = 0;
char x[10];
for (i = 0; i < 10; ++i)
x[i] = 0;
attemptModifyArray(x, 10);
for (i=0; i < 10; ++i)
printf("%d\n", x[i]); // x is still all 0's
modifyArrayValues(x, 10);
for (i=0; i < 10; ++i)
printf("%d\n", x[i]); // now x has 0-9 in it
}
When you assign to x in attemptModifyArray, you are just overwriting a local copy of the address of the array x. When you return, the original address is still in main's copy of x.
When you modify the values in the array in modifyArrayValues, you are modifying the actual array itself, whose address is stored in modifyArrayValues' local copy of x. When you return, x still refers to the same array, but you have modified the values in that array.

There's an interesting sub-thread in this thread about arrays and pointers.
I found this peculiar code snippet on Wikipedia, showing just how 'plasticine' C can be!
/* x designates an array */
x[i] = 1;
*(x + i) = 1;
*(i + x) = 1;
i[x] = 1; /* strange, but correct: i[x] is equivalent to *(i + x) */
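The equivalence can be checked directly by comparing addresses: x[i] is defined as *(x + i), and because addition commutes, i[x] means *(i + x), the same element. subscript_forms_agree is a hypothetical helper for this illustration.

```c
#include <stdbool.h>

/* Returns true if every spelling designates the same element of x. */
bool subscript_forms_agree(const int *x, int i)
{
    const int *a = &x[i];      /* ordinary subscript */
    const int *b = &*(x + i);  /* pointer arithmetic */
    const int *c = &*(i + x);  /* operands swapped */
    const int *d = &i[x];      /* the "strange, but correct" form */
    return a == b && b == c && c == d;
}
```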
Of course what's even more confusing in C is that I can do this:
unsigned int someval = 0xDEADD00D;
char *p = (char *)&someval;
p[2] = (char)0xF0;
So the interchangeability of pointers and arrays seems so deep-set in the C language as to be almost intentional.
What does everyone else think?
---Original Post---
Here s is a pointer (array parameters are adjusted to pointers) and tmp is converted to a pointer to its first element, so doing s = tmp will simply make s point at the address where tmp lives in memory.
Another problem with what you outlined is that tmp is a local variable, so it will become 'undefined' when it goes out of scope, i.e. when the function returns.
Make sure you thoroughly grasp these three concepts and you won't go far wrong
Scope
The difference between the stack and the heap
Pointers
Hope that helps and keep going!

A very straightforward answer would be:
in s = tmp, both s and tmp act as pointers to a memory location, not as the arrays themselves.
In other words, they give the memory addresses where the array values are stored, not the values themselves.
And one of the common ways to access these array values is by using indices like s[0] or tmp[0].
Now, if you try to simply copy with s = tmp, the memory address of the tmp array will be copied over to s. This means s no longer refers to the original array; it now points to the tmp array.
You will understand these concepts well with due time so keep going through the book.
I hope this elementary explanation helps.
