C strlen using pointers

I have seen the standard implementation of strlen using pointers as:
int strlen(char * s) {
    char *p = s;
    while (*p!='\0')
        p++;
    return p-s;
}
I get that this works, but I tried to do it 3 more ways (I'm learning pointer arithmetic right now), and I would like to know what's wrong with them.
This is somewhat similar to what the book does. Is this wrong?
int strlen(char * s) {
    char *p = s;
    while (*p)
        p++;
    return p-s;
}
I thought this one would be wrong if I passed an empty string, but it still gives me 0. That's confusing, since p is pre-incremented (and now it's returning 5):
int strlen(char * s) {
    char *p = s;
    while (*++p)
        ;
    return p-s;
}
Figured this one out: it does the post-increment and returns one more than the length.
int strlen(char * s) {
    char *p = s;
    while (*p++)
        ;
    return p-s;
}

1) Looks fine to me. I personally prefer the explicit comparison against '\0' so that it's clear you didn't mean to (for example) compare p to the NULL pointer in situations where it's not clear from context.
2) Pre-incrementing skips the first character without ever testing it. For a non-empty string that is harmless, because s[0] is a real character. For an empty string, however, s[0] is already the '\0', so *++p reads s[1], the byte just past the terminator. That byte is outside the string (for an array like char s[1] = {0} it is outside the object entirely), so reading it is undefined behavior: you may get 0, 5, garbage, or a crash, depending on whatever happens to be in memory there. Here be dragons!
3) I don't think there's a question there :) As you see, it always returns one more than the correct answer.
Addendum: You can also write this using a for-loop, if you prefer this style:
size_t strlen(char * s) {
    char *p = s;
    for (; *p != '\0'; p++) {}
    return p - s;
}
Or (more error-prone, because the lone semicolon is easy to overlook or delete by accident):
size_t strlen(char * s) {
    char *p = s;
    for (; *p != '\0'; p++);
    return p - s;
}
Also, a string's length can't be negative, so you should use an unsigned type; size_t, which is what the standard strlen returns, is even better.
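The suggestions above (a size_t result, plus a const parameter since the function never writes to the string) can be combined into one version. One more wrinkle: every standard library function name with external linkage, strlen included, is reserved, so defining your own function called strlen is technically undefined behavior. A minimal sketch using the hypothetical name my_strlen:

```c
#include <stddef.h>

/* my_strlen is a hypothetical name chosen to avoid the reserved name strlen. */
size_t my_strlen(const char *s)
{
    const char *p = s;          /* a second pointer walks; s keeps the start */
    while (*p != '\0')          /* test before advancing, so p never passes the '\0' */
        p++;
    return (size_t)(p - s);     /* distance walked == number of characters */
}
```

Like the library function, it stops at the first '\0', so my_strlen("a\0bc") returns 1.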

Version 1 is fine - while (*p != '\0') is equivalent to while (*p != 0), which is equivalent to while (*p).
In the original code and version 1, the pointer p is advanced if and only if *p is not 0 (in other words, if you're not at the end of the string).
Versions 2 and 3 advance p regardless of whether *p is 0 or not. *p++ evaluates to the character p points to, and as a side effect advances p. *++p evaluates to the character following the character p points to, and as a side effect advances p. Therefore, versions 2 and 3 will always advance p past the end of the string, which is why your values are off.
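Following that explanation, both variants can be repaired by accounting for the extra step. A sketch with hypothetical names: the pre-increment version must test the first character before skipping it, and the post-increment version must subtract the step it takes past the terminator.

```c
#include <stddef.h>

/* Version 2 repaired (hypothetical name): check s[0] first, so an empty
   string never causes a read past its terminator. */
size_t strlen_preinc(const char *s)
{
    const char *p = s;
    if (*p == '\0')
        return 0;
    while (*++p)    /* each step happens only after the current char was non-zero */
        ;
    return (size_t)(p - s);
}

/* Version 3 repaired (hypothetical name): *p++ leaves p one past the '\0',
   so subtract that extra step. */
size_t strlen_postinc(const char *s)
{
    const char *p = s;
    while (*p++)
        ;
    return (size_t)(p - s - 1);
}
```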

One issue you will run into when you compare the performance of strlen replacement functions: for long strings they are slower than the library strlen. Why? Library implementations of strlen typically examine more than one byte per iteration while searching for the end of the string. How can you implement a more efficient replacement?
It's not that difficult. The basic approach is to look at 4 bytes per iteration and adjust the return value based on where within those 4 bytes the nul byte is found. You could do something like the following (using array indexing):
size_t strsz_idx (const char *s) {
    size_t len = 0;
    for(;;) {
        if (s[0] == 0) return len;
        if (s[1] == 0) return len + 1;
        if (s[2] == 0) return len + 2;
        if (s[3] == 0) return len + 3;
        s += 4, len += 4;
    }
}
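You can do the exact same thing using pointers and masks:
#include <stddef.h>
#include <stdint.h>
#include <string.h>
size_t strsz (const char *s) {
    size_t len = 0;
    for(;;) {
        /* Load 4 bytes with memcpy rather than *(unsigned*)s, which may be
           misaligned and breaks the strict-aliasing rules. The masks assume
           little-endian byte order (s[0] is the low byte). The load can also
           read up to 3 bytes past the terminator, which standard C does not
           guarantee to be safe. */
        uint32_t x;
        memcpy(&x, s, sizeof x);
        if((x & 0xff) == 0) return len;
        if((x & 0xff00) == 0) return len + 1;
        if((x & 0xff0000) == 0) return len + 2;
        if((x & 0xff000000) == 0) return len + 3;
        s += 4, len += 4;
    }
}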
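Either way, checking 4 bytes per iteration should bring you much closer to the performance of strlen itself, although library implementations often go further (for example with SIMD instructions), so measure on your own system.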
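The mask version still performs up to four separate tests per word. The classic "does this word contain a zero byte?" bit trick answers that question for all four bytes with a single expression; the sketch below then finds the exact byte with an ordinary scan, which also makes it independent of byte order. The name strsz_haszero is hypothetical. Like strsz, it loads whole words and can read up to 3 bytes past the terminator, which standard C does not permit for arbitrary strings, so treat it as an illustration or use padded buffers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero exactly when at least one of the four bytes of v is zero. */
static int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

size_t strsz_haszero(const char *s)
{
    const char *p = s;
    for (;;) {
        uint32_t x;
        memcpy(&x, p, sizeof x);   /* may read past the '\0' (see above) */
        if (has_zero_byte(x))
            break;                 /* the terminator is somewhere in p[0..3] */
        p += 4;
    }
    while (*p)                     /* pinpoint the '\0' within the last word */
        p++;
    return (size_t)(p - s);
}
```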

Related

C append chars into char array one by one

I have made a function which returns some chars, and all I want to do is append all those returned chars into one string.
#include <stdio.h>
#include <string.h>
char func(int n);
int main()
{
    int i;
    char str[] = "";
    size_t p = strlen(str);
    for (i =0 ; i < 5; i++){
        str[p++] = func(i);
        str[p] = '\0';
        p++;
    }
    printf("%s",str);
    return 0;
}
char func(int n){
    if (n == 0)
        return '1';
    if (n == 1)
        return '2';
    if (n > 1)
        return '3';
}
//EDIT Output for this is 19
char func(n){
    if (n == 0)
        return '1';
    if (n == 1)
        return '2';
    if (n > 1)
        return '3';
}
You should always specify the type for variables and parameters.
Please use something like int n instead of just n (the "implicit int" rule that your version relies on was removed in C99).
It's also bad that all of your returns are conditional; it's better to have a return statement that's guaranteed to be executed no matter what*:
char func(int n) {
    if (n == 0) return '1';
    if (n == 1) return '2';
    return '3';
}
* Because if a function that should return a value reaches its closing } without returning one, and the caller uses the result, the behaviour is undefined.
Now that we have that out of the way, let's have a look at your main():
int main() {
    int i;
    char str[] = "";
    size_t p = strlen(str);
    for (i =0 ; i < 5; i++){
        str[p++] = func(i);
        str[p] = '\0';
        p++;
    }
    printf("%s",str);
    return 0;
}
str[] is not big enough to store all the characters you write to it, resulting in undefined behaviour.
Your loop body is written in a weird way: why are you incrementing p twice?
Here is a very simple program that writes 5 characters into str:
#include <stdio.h>
char func(int n) {
    if (n == 0) return '1';
    if (n == 1) return '2';
    return '3';
}
int main() {
    int i;
    // Allocate 6 bytes on the stack: 5 characters plus the terminating '\0'
    char str[6] = "";
    for (i = 0 ; i < 5; i++) {
        str[i] = func(i);
    }
    // Strings *must* be null-terminated in C
    str[5] = 0;
    printf("%s",str);
    return 0;
}
The size of your str here is "0" (0 according to strlen, and 1 according to the sizeof operator, because sizeof counts the '\0' character), so you cannot add more elements to str; if you try, the behaviour is undefined and the program may crash. So you have two possibilities here: the first is to declare a fixed array size, in which case the number of characters is limited by that size; the second is a dynamic one using malloc. To initialize the buffer to zeros you can use memset (or allocate it with calloc).
Well, the short answer is that everything you did would be right if the array had enough memory to hold those 5 characters plus the \0 (needed if you want to treat it as a string, that is, a NUL-terminated char array).
"" is a string literal containing only the \0. The length of this string is 0. What about the array? Applying sizeof to it reveals that it can hold one character (and that character is the \0).
Now, with your code you definitely accessed positions beyond the end of the array. This is undefined behavior according to the C standard.
The solution is either to use an array large enough to hold the maximum number of characters you will ever want to store, or to use a char* to which you assign the address of a chunk allocated with functions like malloc, realloc etc. The benefit of the latter is that you can grow the memory as much as you need at runtime, depending on the number of characters you want to store.
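For the dynamic option, a minimal sketch: keep the buffer pointer, current length and capacity together, and grow the allocation with realloc whenever the next character plus the terminator would not fit. The struct strbuf type and strbuf_append function are hypothetical names, and error handling is reduced to returning 0 when realloc fails.

```c
#include <stdlib.h>

/* Hypothetical growable string; data is null-terminated once non-NULL. */
struct strbuf {
    char  *data;
    size_t len;   /* characters stored, not counting '\0' */
    size_t cap;   /* bytes allocated */
};

/* Appends one character; returns 1 on success, 0 if realloc fails. */
int strbuf_append(struct strbuf *sb, char c)
{
    if (sb->len + 2 > sb->cap) {                  /* room for c and '\0'? */
        size_t newcap = sb->cap ? sb->cap * 2 : 8;
        char *p = realloc(sb->data, newcap);      /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return 0;                             /* old buffer is still valid */
        sb->data = p;
        sb->cap = newcap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';
    return 1;
}
```

With the question's func, usage would be: start from struct strbuf sb = {NULL, 0, 0};, call strbuf_append(&sb, func(i)) in the loop, print sb.data, and finally free(sb.data). Doubling the capacity keeps the number of realloc calls logarithmic in the final length.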

Head First C string.h related questions

#include <stdio.h>
#include <string.h>
void print_reverse(char *s)
{
    size_t len = strlen(s);
    char *t = s + len - 1;
    while (t >= s)
    {
        printf("%c", *t);
        t = t - 1;
    }
    puts("");
}
Above is a function that displays a string backwards on the screen. But I don't understand the 7th line (char *t = s+ len-1;). Could anybody explain this in plain English, please?
For starters, this function
void print_reverse(char *s)
{
    size_t len = strlen(s);
    char *t = s + len - 1;
    while (t >= s)
    {
        printf("%c", *t);
        t = t - 1;
    }
    puts("");
}
is wrong and has undefined behavior. :)
There are two problems.
The first is that the string passed as the argument can have zero length. In this case this declaration
char *t = s + len - 1;
will look like
char *t = s - 1;
and if s points to the first element of an array (as it usually does), computing s - 1 is itself undefined behavior, because the result would point before that array.
The second problem is that this expression statement
t = t - 1;
has undefined behavior when the pointer t is equal to s, for the same reason.
From the C Standard (6.5.6 Additive operators)
...If both the pointer operand and the result point to elements of the same
array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
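A correct implementation can look like this (the changed parts are marked in comments):
void print_reverse( const char *s)   /* const added */
{
    size_t len = strlen(s);
    const char *t = s + len;          /* one past the last character */
    while (t != s)                    /* != instead of >= */
    {
        printf("%c", *--t);           /* decrement before dereferencing */
    }
    puts("");
}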
As for your question, in this declaration
char *t = s + len - 1;
the pointer t is initialized with the address of the last character of the string, the one just before the terminating zero.
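The corrected print_reverse relies on a common idiom: start one past the last character (a pointer value C explicitly allows you to form) and decrement before each access, so the pointer never goes below s. The same idiom in a function that writes into a caller-supplied buffer instead of printing, which makes it easy to check; reverse_copy is a hypothetical name, and out must have room for strlen(s) + 1 bytes:

```c
#include <string.h>

/* Writes s reversed into out (hypothetical helper). t starts one past the
   end and is decremented before each use, so it never goes below s. */
void reverse_copy(const char *s, char *out)
{
    const char *t = s + strlen(s);
    while (t != s)
        *out++ = *--t;
    *out = '\0';
}
```

An empty string works without any special case: t starts equal to s, so the loop body never runs and out just receives the terminator.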
The main logic behind this function is that this code:
char *t = s+ len-1;
yields a pointer to the last character of the string you pass to the function. The loop prints that character and then moves the pointer back by one:
t = t - 1;
So, in simple words, it prints the string backwards.

C programming - integer/long to string representation

I'm just reading "C Interfaces and Implementations". There are some really interesting concepts described in the book. The code is quite ugly sometimes (in my opinion), but now I have a question regarding the conversion of an integer/long to a string (char array). What is described in the book is:
const char *Atom_int(long n) {
    char str[43];
    char *s = str + sizeof str;
    unsigned long m;
    if (n == LONG_MIN)
        m = LONG_MAX + 1UL;
    else if (n < 0)
        m = -n;
    else
        m = n;
    do
        *--s = m%10 + '0';
    while ((m /= 10) > 0);
    if (n < 0)
        *--s = '-';
    return Atom_new(s, str + sizeof str - s);
}
As there's no explanation of why that function is written the way it is, I wonder why it's not just something simple like:
const char *Atom_int(long n)
{
    char str[43];
    char *s = str;
    sprintf(str, "%ld", n);
    return Atom_new(s, str + sizeof str - s);
}
Is there any difference? Anything I missed about my "simple" approach using sprintf that could cause a different result than the function from the book? I mean, if it's just to show how one could convert a long to a string without using ltoa/sprintf/..., nice. But it's unnecessarily complex if that's the only reason...
There were two major problems with the original code you posted for both functions:
The str array is not '\0' terminated, invoking undefined behavior when passed to printf.
Returning a pointer s to an array with automatic storage str is also incorrect. Dereferencing this return value will invoke undefined behavior as well.
Regarding your questions, the purpose of the first function is to show the implementation of an integer-to-string converter; using sprintf defeats this purpose. Note how the author handles the subtle case of LONG_MIN: computing -n would overflow on two's-complement systems (that is, on virtually all of them), which is undefined behavior, even though the result would usually come out as expected. But complete conformance to the Standard is a difficult art: his solution assumes two's complement and will fail otherwise.
Here is an improved solution with the same prototype. It is more portable, does not need to special-case LONG_MIN, and performs fewer division and modulo operations.
const char *Atom_int(long n) {
    char str[43];
    char *s = str + sizeof str;
    unsigned long m;
    if (n < 0)
        m = (unsigned long)-(n + 1) + 1;
    else
        m = n;
    while (m >= 10) {
        *--s = m % 10 + '0';
        m /= 10;
    }
    *--s = m + '0';
    if (n < 0)
        *--s = '-';
    return Atom_new(s, str + sizeof str - s);
}
Also note that your proposed alternative is incorrect: you pass the wrong length to Atom_new(). You should pass the number of bytes returned by sprintf or snprintf. Here is an improved version:
const char *Atom_int(long n) {
    char str[43];
    return Atom_new(str, snprintf(str, sizeof str, "%ld", n));
}
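Atom_new belongs to the book's Atom interface, so the functions above cannot be tried on their own. As a sketch, here is the same right-to-left digit technique (including the overflow-free magnitude computation) writing into a caller-supplied buffer instead; long_to_str and LONG_STR_SIZE are hypothetical names, and the 43-byte size simply mirrors the book's buffer.

```c
#define LONG_STR_SIZE 43   /* same buffer size the book uses */

/* Converts n to decimal inside buf (LONG_STR_SIZE bytes) and returns a
   pointer to the first character of the null-terminated result, which
   lies somewhere inside buf. */
char *long_to_str(long n, char *buf)
{
    char *s = buf + LONG_STR_SIZE;
    unsigned long m;

    *--s = '\0';                           /* build right to left */
    if (n < 0)
        m = (unsigned long)-(n + 1) + 1;   /* |n| without overflow, even for LONG_MIN */
    else
        m = (unsigned long)n;
    while (m >= 10) {
        *--s = (char)('0' + m % 10);
        m /= 10;
    }
    *--s = (char)('0' + m);                /* most significant digit */
    if (n < 0)
        *--s = '-';
    return s;
}
```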

C substrings / C string slicing?

Hi everybody!
I am trying to write a program that checks whether a given string of text is a palindrome (for this I made a function called is_palindrome that works) and whether any of its substrings is a palindrome, and I can't figure out the optimal way to do this.
For example, for the string s = "abcdefg" it should first check "a", then "ab", "abc", "abcd" and so on, for each character.
In Python this is the equivalent of
s[:1], s[:2], ... (a, ab, ...)
s[1:2], s[1:3] ... (b, bc, ...)
What function/method is there that I can use in a similar way in C?
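This is the short helper I use to get a slice of a string in C.
#include <string.h>
void slice(const char *str, char *result, size_t start, size_t end)
{
    strncpy(result, str + start, end - start);
    result[end - start] = '\0';   /* strncpy adds no '\0' when it copies a full end - start characters */
}
Pretty straightforward.
This assumes you've checked the boundaries, made sure end > start, and that result has room for end - start + 1 bytes.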
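If you want something that behaves more like Python's s[start:end], one option is to clamp out-of-range indices the way Python does instead of requiring the caller to check them. A sketch under those assumptions (py_slice is a hypothetical name; negative indices are not supported; result needs at most strlen(s) + 1 bytes):

```c
#include <string.h>

/* Copies s[start:end] (end exclusive, like Python) into result and
   terminates it. Indices past the end are clamped; start >= end gives "". */
char *py_slice(const char *s, size_t start, size_t end, char *result)
{
    size_t len = strlen(s);
    if (end > len)
        end = len;
    if (start > end)
        start = end;
    memcpy(result, s + start, end - start);
    result[end - start] = '\0';
    return result;
}
```

For example, py_slice("abcdefg", 1, 3, buf) leaves "bc" in buf, matching Python's "abcdefg"[1:3].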
This slice_str() function will do the trick, with end actually being the end character, rather than one-past-the-end as in Python slicing:
#include <stdio.h>
#include <string.h>
void slice_str(const char * str, char * buffer, size_t start, size_t end)
{
    size_t j = 0;
    for ( size_t i = start; i <= end; ++i ) {
        buffer[j++] = str[i];
    }
    buffer[j] = 0;
}
int main(void) {
    const char * str = "Polly";
    const size_t len = strlen(str);
    char buffer[len + 1];
    for ( size_t start = 0; start < len; ++start ) {
        for ( int end = len - 1; end >= (int) start; --end ) {
            slice_str(str, buffer, start, end);
            printf("%s\n", buffer);
        }
    }
    return 0;
}
which, when used from the above main() function, outputs:
paul#horus:~/src/sandbox$ ./allsubstr
Polly
Poll
Pol
Po
P
olly
oll
ol
o
lly
ll
l
ly
l
y
paul#horus:~/src/sandbox$
There isn't; you'll have to write your own.
To check a substring for a palindrome, you need to supply the number of characters to check:
int palindrome(char* str, int len)
{
    if (len < 2 )
    {
        return 0;
    }
    // position p and q on the first and last character
    char* p = str;
    char* q = str + len - 1;
    // compare start char with end char
    for ( ; p < str + len / 2; ++p, --q )
    {
        if (*p != *q)
        {
            return 0;
        }
    }
    return 1;
}
Now you would need to call the function above for each substring (as you described it, i.e. always starting from the beginning), e.g.
char candidate[] = "wasitaratisaw";
int n = (int) strlen(candidate);
for (int len = 1; len <= n; ++len)   /* prefixes of length 1..n, including the whole string */
{
    if (palindrome(candidate, len))
    {
        /* ... */
    }
}
disclaimer: not compiled.
Honestly, you don't need a string slicing function just to check for palindromes within substrings:
/* start: Pointer to first character in the string to check.
 * end: Pointer to one byte beyond the last character to check.
 *
 * Return:
 * -1 if start >= end; this is considered an error
 * 0 if the substring is not a palindrome
 * 1 if the substring is a palindrome
 */
int
ispalin (const char *start, const char *end)
{
  if (start >= end)
    return -1;
  for (; start < end; ++start)
    if (*start != *--end)
      return 0;
  return 1;
}
With that, you can create the following:
#include <stdio.h>
#include <string.h>
int
main ()
{
  const char *s = "madam";
  /* i: index of first character in substring
   * n: number of characters in substring
   */
  size_t i, n;
  size_t len = strlen (s);
  for (i = 0; i < len; ++i)
    {
      for (n = 1; n <= len - i; ++n)
        {
          /* Start of substring. */
          const char *start = s + i;
          /* ispalin(s[i:i+n]) in Python */
          switch (ispalin (start, start + n))
            {
            case -1:
              fprintf (stderr, "error: %p >= %p\n", (void *) start, (void *) (start + n));
              break;
            case 0:
              printf ("Not a palindrome: %.*s\n", (int) n, start);
              break;
            case 1:
              printf ("Palindrome: %.*s\n", (int) n, start);
              break;
            } /* switch (ispalin) */
        } /* for (n) */
    } /* for (i) */
}
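The same double loop can return a count instead of printing, which is handy when you only need to know how many palindromic substrings exist. A sketch with hypothetical names, reusing the start/end pointer-range check from ispalin; note that here single characters count as palindromes, unlike in the palindrome() answer above, which rejects lengths below 2.

```c
#include <stddef.h>
#include <string.h>

/* Same idea as ispalin: start points at the first character, end one past
   the last. Returns 1 for a palindrome, 0 otherwise (empty range counts as 1). */
static int is_palin_range(const char *start, const char *end)
{
    while (start < end)
        if (*start++ != *--end)
            return 0;
    return 1;
}

/* Hypothetical helper: counts the substrings s[i:i+n] (n >= 1) that are
   palindromes, without copying any of them. */
size_t count_palindromic_substrings(const char *s)
{
    size_t len = strlen(s), count = 0;
    for (size_t i = 0; i < len; ++i)
        for (size_t n = 1; n <= len - i; ++n)
            count += (size_t)is_palin_range(s + i, s + i + n);
    return count;
}
```

For "madam" this counts the five single letters plus "ada" and "madam", giving 7.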
Of course, if you really wanted a string slicing function merely for output (since you technically shouldn't cast a size_t to int), and you still want to be able to format the output easily, the answer by Paul Griffiths should suffice quite well, or you can use mine or even one of strncpy or the nonstandard strlcpy, though they all have their strengths and weaknesses:
/* dest must have
 * 1 + min(strlen(src), n)
 * bytes available and must not overlap with src.
 */
char *
strslice (char *dest, const char *src, size_t n)
{
  char *destp = dest;
  /* memcpy here would be ideal, but that would mean walking the string twice:
   * once by calling strlen to determine the minimum number of bytes to copy
   * and once for actually copying the substring.
   */
  for (; n != 0 && *src != 0; --n)
    *destp++ = *src++;
  *destp = 0;
  return dest;
}
strslice actually works like a combination of strncpy and the nonstandard strlcpy, though there are differences between these three functions:
strlcpy will cut the copied string short to add a null terminator at dest[n - 1], so copying exactly n bytes before adding a null terminator requires you to pass n + 1 as the buffer size.
strncpy may not terminate the string at all, leaving dest[n - 1] equal to src[n - 1], so you would need to add a null terminator yourself just in case. If n is greater than the src string length, dest will be padded with null terminators until n bytes have been written.
strslice will copy up to n bytes if necessary, like strncpy, and will require an extra byte for the null terminator, meaning a maximum of n+1 bytes are necessary. It doesn't waste time writing unnecessary null terminators as strncpy does. This can be thought of as a "lightweight strlcpy" with a small difference in what n means and can be used where the resulting string length won't matter.
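Given the strncpy caveat above, a common idiom is to bound the copy at one less than the buffer size and then write the terminator yourself. A minimal sketch with the hypothetical name copy_truncated, assuming size is at least 1 and the buffers do not overlap:

```c
#include <string.h>

/* Copies at most size - 1 characters of src into dest and always
   null-terminates, which strncpy alone does not guarantee. */
char *copy_truncated(char *dest, const char *src, size_t size)
{
    strncpy(dest, src, size - 1);   /* may leave dest unterminated... */
    dest[size - 1] = '\0';          /* ...so terminate explicitly */
    return dest;
}
```

Unlike strslice, this still inherits strncpy's zero padding when src is shorter than size - 1.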
You could also create a memslice function if you wanted, which would allow for embedded null bytes, but it already exists as memcpy.
There is no built-in function for this in the standard C library. However, you can write your own function to do the same.

Why can't I copy an array by using `=`?

I'm starting to learn C by reading K&R and going through some of the exercises. After some struggling, I was finally able to complete exercise 1-19 with the code below:
/* reverse: reverse the character string s */
void reverse(char s[], int slen)
{
    char tmp[slen];
    int i, j;
    i = 0;
    j = slen - 2; /* skip '\0' and \n */
    tmp[i] = s[j];
    while (i <= slen) {
        ++i;
        --j;
        tmp[i] = s[j];
    }
    /* code from copy function p 29 */
    i = 0;
    while ((s[i] = tmp[i]) != '\0')
        ++i;
}
My question is regarding that last bit of code where the tmp char array is copied to s. Why doesn't a simple s = tmp; work instead? Why does one have to iterate through the array copying index by index?
Maybe I'm just old and grumpy, but the other answers I've seen seem to miss the point completely.
C does not do array assignments, period. You cannot assign one array to another array by a simple assignment, unlike some other languages (PL/1, for instance; Pascal and many of its descendants too - Ada, Modula, Oberon, etc.). Nor does C really have a string type. It only has arrays of characters, and you can't copy arrays of characters (any more than you can copy arrays of any other type) without using a loop or a function call. [String literals don't really count as a string type.]
The only time arrays are copied is when the array is embedded in a structure and you do a structure assignment.
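A small sketch of that structure-assignment exception, assuming a hypothetical struct line type that wraps a fixed-size char array: the = on the struct copies every element of the embedded array, so later changes to the source do not show up in the copy.

```c
/* Hypothetical wrapper type: an array inside a struct is copied by =. */
struct line {
    char text[16];
};

/* Returns a copy made purely by struct assignment: no loop, no memcpy. */
struct line copy_line(const struct line *src)
{
    struct line dst;
    dst = *src;          /* copies all 16 bytes of text */
    return dst;
}
```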
In my copy of K&R 2nd Edition, exercise 1-19 asks for a function reverse(s); in my copy of K&R 1st Edition, it was exercise 1-17 instead of 1-19, but the same question was asked.
Since pointers have not been covered at this stage, the solution should use indexes instead of pointers. I believe that leads to:
#include <string.h>
void reverse(char s[])
{
    int i = 0;
    int j = strlen(s) - 1;
    while (i < j)
    {
        char c = s[i];
        s[i++] = s[j];
        s[j--] = c;
    }
}
#ifdef TEST
#include <stdio.h>
int main(void)
{
    char buffer[256];
    while (fgets(buffer, sizeof(buffer), stdin) != 0)
    {
        int len = strlen(buffer);
        if (len == 0)
            break;
        buffer[len-1] = '\0'; /* Zap newline */
        printf("In: <<%s>>\n", buffer);
        reverse(buffer);
        printf("Out: <<%s>>\n", buffer);
    }
    return(0);
}
#endif /* TEST */
Compile this with -DTEST to include the test program and without to have just the function reverse() defined.
With the function signature given in the question, you avoid calling strlen() twice per line of input. Note the use of fgets() — even in test programs, it is a bad idea to use gets(). The downside of fgets() compared to gets() is that fgets() does not remove the trailing newline where gets() does. The upsides of fgets() are that you don't get array overflows and you can tell whether the program found a newline or whether it ran out of space (or data) before encountering a newline.
Your tmp array was declared on the stack, so when your function returns, the memory used to hold its values is released because tmp goes out of scope.
s = tmp means that s should point to the same memory location as tmp. This means that when tmp is released, s will still be pointing to a now possibly invalid memory location.
This type of error is referred to as a dangling pointer.
Edit: As pointed out in the comments on this answer, this isn't actually a dangling pointer. The issue is that s = tmp only changes what the parameter points to, not the contents of the array that was passed in.
Also, you could perform your reverse with a single pass and without allocating a whole array in memory by just swapping the values in place one by one:
void reverse(char s[], int slen) {
int i = 0; // First char
int j = slen - 2; // Last char minus \n\0
char tmp = 0; // Temp for the value being swapped
// Iterate over the array from the start until the two indexes collide.
while(i < j) {
tmp = s[i]; // Save the eariler char
s[i] = s[j]; // Replace it with the later char
s[j] = tmp; // Place the earlier char in the later char's spot
i++; // Move forwards with the early char
j--; // Move backwards with the later char
}
}
Because both s and tmp behave as memory addresses here (s is a pointer parameter, and tmp decays to a pointer to its first element). If you write s = tmp, both would point to the same array.
Suppose that we have
char s[] ="ab";
/*
 * Only for explanatory purposes.
 *
 */
void foo(char s[]){
    char tmp [] = "cd";
    s= tmp;
}
foo(s);
after s= tmp you would have
s[0] : 'c'
s[1] : 'd'
s[2] : '\0'
Even though both would show the same data, a change in tmp will be visible through s as well, because after the assignment s and tmp refer to the same memory. So by changing any position of the tmp array, or by destroying the tmp array, s would be affected in the same way.
By looping over the array, what you are doing is moving a piece of data from one memory address to another.
In K&R, pointers are explained in chapter 5 ("Pointers and Arrays"). A quick glance through its first pages may be of help.
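If you want to keep the question's temporary-array approach, the final copy back into s can be a single memcpy instead of a loop (s = tmp would only repoint the local parameter). A sketch with the hypothetical name reverse_via_tmp; unlike the question's slen, slen here is assumed to be strlen(s), so there is no newline to skip, and a C99 variable-length array is used just as in the question.

```c
#include <string.h>

/* Reverses the first slen characters of s via a temporary array, then
   copies the bytes back; the '\0' at s[slen] is left untouched. */
void reverse_via_tmp(char s[], size_t slen)
{
    if (slen == 0)
        return;                       /* a zero-length VLA is not allowed */
    char tmp[slen];
    for (size_t i = 0; i < slen; ++i)
        tmp[i] = s[slen - 1 - i];     /* last character goes first */
    memcpy(s, tmp, slen);             /* copy the contents, not the address */
}
```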
To round out the discussion, here are two other possible ways to reverse a string:
void reverse(char string1[], char string2[])
{
    int i = 0, len = 0;
    while(string2[len] != '\0') // get the length of the string
        len++;
    while(len > 0)
    {
        string1[i] = string2[len-1]; // copy the elements in reverse
        i++;
        len--;
    }
    string1[i] = '\0'; // terminate the copied string
}
Or recursively (this version prints the string in reverse rather than storing the result):
#include <stdio.h>
void reverse (const char *const sPtr)
{
    //if end of string
    if (sPtr[0] == '\0')
    {
        return;
    }
    else //not end of the string...
    {
        reverse(&sPtr[1]); //recursive step
        putchar(sPtr[0]); //display character
    }
}
Because tmp is used as a pointer here, and you need a copy, not a "link".
With s = tmp, the value of tmp, which is also the beginning address of the array, would get copied to s.
That way both s and tmp will point to the same address in memory, which I think is not the purpose.
cheers
Try experimenting and see what happens when you do things like this:
#include <stdio.h>
void modifyArrayValues(char x[], int len)
{
    for (int i = 0; i < len; ++i)
        x[i] = i;
}
void attemptModifyArray(char x[], int len)
{
    char y[10];
    for (int i = 0; i < len; ++i)
        y[i] = i;
    x = y;
}
int main()
{
    int i = 0;
    char x[10];
    for (i = 0; i < 10; ++i)
        x[i] = 0;
    attemptModifyArray(x, 10);
    for (i=0; i < 10; ++i)
        printf("%d\n", x[i]); // x is still all 0's
    modifyArrayValues(x, 10);
    for (i=0; i < 10; ++i)
        printf("%d\n", x[i]); // now x has 0-9 in it
}
When you assign to x inside attemptModifyArray, you are just overwriting a local copy of the address of the array x. When you return, the original address is still in main's copy of x.
When you modify the values in the array in modifyArrayValues, you are modifying the actual array itself, whose address is stored in modifyArrayValues's local copy of x. When you return, x still refers to the same array, but you have modified the values in that array.
There's an interesting sub-thread in this thread about arrays and pointers.
I found this link on Wikipedia with a peculiar code snippet showing just how 'plasticine' C can be!
/* x designates an array */
x[i] = 1;
*(x + i) = 1;
*(i + x) = 1;
i[x] = 1; /* strange, but correct: i[x] is equivalent to *(i + x) */
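That last line works because the C standard defines the subscript expression E1[E2] as (*((E1)+(E2))), and addition is commutative. A tiny sketch (index_forms_agree is a hypothetical name) that checks all four forms name the same element:

```c
/* Returns 1 if x[i], *(x + i), *(i + x) and i[x] all read the same element. */
int index_forms_agree(void)
{
    char x[4] = {'a', 'b', 'c', 'd'};
    int i = 2;
    return x[i] == *(x + i) && x[i] == *(i + x) && x[i] == i[x];
}
```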
Of course what's even more confusing in C is that I can do this:
unsigned int someval = 0xDEADD00D;
char *p = (char *)&someval;
p[2] = (char)0xF0;
So the interchangeability of pointers and arrays seems so deep-set in the C language as to be almost intentional.
What does everyone else think?
---Original Post---
s and tmp are both pointers so doing s = tmp will simply make s point at the address where tmp lives in memory.
Another problem with what you outlined is that tmp is a local variable so will become 'undefined' when it goes out of scope i.e when the function returns.
Make sure you thoroughly grasp these three concepts and you won't go far wrong:
Scope
The difference between the stack and the heap
Pointers
Hope that helps and keep going!
A very straightforward answer would be:
both s and tmp are pointers to a memory location, not the arrays themselves.
In other words, s and tmp are memory addresses where the array values are stored, not the values themselves.
And one of the common ways to access these array values is by using indices like s[0] or tmp[0].
Now, if you simply try to copy with s = tmp, the memory address of the tmp array will be copied over to s. This means that the original s array will be lost, and s will now point to the tmp array.
You will understand these concepts well with due time so keep going through the book.
I hope this elementary explanation helps.

Resources