Not able to understand this C program

I am new to programming, not getting the code below. This program checks if a character c is in the string s.
int is_in(char *s, char c){
    while(*s){
        if(*s==c) return 1;
        else s++;
    }
    return 0;
}
The main thing confusing me is, how the while loop will stop, as, I think s++ will go through all over the memory, after the end of string also. Can anyone explain this please? Please correct me if I am wrong.

The loop stops when *s is 0, i.e. at the end of the NUL-terminated string.
The idiomatic way of modelling strings in C is to terminate them with 0. Note that if s is not formed in this way, then the behaviour of your function is undefined.
Personally I'd prefer the function to be int is_in(const char *s, char c) to signify to the caller that the function doesn't modify the string.

Your intuition that the pointer s will continue to loop indefinitely would be correct were it not for two things:
C strings are terminated by a null-terminator (the character '\0'). This acts as a sentinel value for functions that process strings; this is necessary since when an array is passed to a function it decays to a pointer to its first element, losing length information.
The loop condition while(*s) will be false when the null terminator is reached.
In fact, while(*s) { loop-body; s++; } is a well-known idiom in C for processing strings.

The string char *s is supposed to end with a terminating NUL character. The value of NUL is zero, so *s eventually evaluates to zero and the loop condition becomes false.

Related

How to write my_strchr() in C

Right now I hope to write my own my_strchr() in the C language.
I checked that the answer should be like this:
char *my_strchr(const char *src1, int c) {
while (*src1 != '\0') { //when the string goes to the last, jump out
if (c == *src1) {
return src1;
}
src1++;
}
return NULL;
}
I'm quite confused by:
why we use *src1 in the while loop condition (*src1 != '\0'), is *src1 a pointer to the const char*? Can we just use src1 instead?
When we return value and src1++, we do not have that *src1, why?
For this function, it in fact prints the whole string after the required characters, but why is that? Why does it not print only the character that we need to find?
src1 is the pointer to the character, but we need the character itself. It's the same reason as in point 2, but the other way round.
If you write return *src1; you simply return the character you've found, that's always c, so your function would be pointless. You want to return the pointer to that char.
Because that's what the function is supposed to do. It returns the pointer to the first character c found. So printing the string pointed by that pointer prints the rest of the string.
It's important here to remember that in C a string is a series of characters that ends with a null ('\0') character. We reference the string in our code using a character pointer that points to the beginning of the string. When we pass a string as a parameter to a function what we're really getting is a pointer to the first character in the string.
Because of this fact, we can use pointer math to increment through a string. The pattern:
while (*src1 != '\0') {
//do stuff
src1++;
}
is a very common idiom in C. We might phrase it in English as:
While the value of the character in the string we are looking at (dereference src1 with the * operator) is not (inequality operator !=) the end of string indicator (null byte, 0 or '\0'), do some stuff, then move the pointer to point to the next character in the string (increment operator ++).
We often use the same kind of code structure to process other arrays or linked lists of things, comparing pointers to the NULL pointer.
To question #2, we're returning the value of the pointer src1 from this function, and not the value of what it points to (*src1), because the question that this function should answer is "Where is the first instance of character c in the string that starts at the location pointed to by src1?"
Question #3 implies that the code that calls this function is printing a string that starts from the pointer returned from this function. My guess is that the code looks something like this:
printf("%s", my_strchr(string, 'a'));
printf() and friends will print from the location provided in the argument list that matches up with the %s format specifier and then keep printing until it gets to the end of string character ('\0', the null terminator).
In C, a string is basically an array of char; when an array is passed to a function it decays to a pointer to its first element, and a pointer holds the memory address of a value. Therefore:
You can use *src1 to get the value that it is pointing to.
src1++ means to +1 on the address, so you are basically moving where the pointer is pointing at.
Since you are returning a pointer, it is essentially equal to a string (char array).
In addition to Jabberwocky's answer, please note that the code in the question has 2 bugs:
c must be converted to char for the comparison with *src1: strchr("ABC", 'A' + 256) returns a pointer to the string literal unless char has more than 8 bits.
Furthermore, if c converted to a char has the value 0, a pointer to the null terminator should be returned, not NULL as in the posted code.
Here is a modified version:
char *my_strchr(const char *s, int c) {
    for (;;) {
        if ((char)c == *s) {
            return (char *)s; /* cast away const, as strchr does */
        }
        if (*s++ == '\0') {
            return NULL;
        }
    }
}

Why C function strlen() returns a wrong length of a char?

My C codes are listed below:
char s="MIDSH"[3];
printf("%d\n",strlen(&s));
The result of running is 2, which is wrong because char s is just an 'S'.
Does anybody know why and how to solve this problem?
That's actually quite an interesting question. Let's break it up:
"MIDSH"[3]
String literals have array types. So the above applies the subscript operator to the array and evaluates to the 4th character 'S'. It then assigns it to the single character variable s.
printf("%d\n",strlen(&s));
Since s is a single character, and not part of an actual string, the behavior is undefined for the above code.
Signature of strlen is:
size_t strlen(const char *s);
/* The strlen() function calculates the
length of the string s, excluding the
terminating null byte ('\0'). */
strlen expects its const char * argument to point to a null-terminated array. When you pass the address of a single automatic variable, you can't guarantee this, and thus your program has undefined behavior.
Does anybody know why and how to solve this problem?
sizeof(char) is guaranteed to be 1. So use sizeof or 1.
The statement
printf("%d\n",strlen(&s));
makes no sense in this case. strlen expects a null-terminated string, but s is of type char, and &s does not necessarily point to a string. What you are getting is one possible result of the undefined behavior of the program.
To get the size of s you can use sizeof operator
printf("%zu\n", sizeof(s));
The strlen function treats its argument as a pointer to a sequence of characters, where the sequence is terminated by the '\0' character.
By passing a pointer to the single character variable s you effectively say that &s is the first character in such a sequence, but it's not. That means strlen will continue to search in memory under false premises and you will have undefined behavior.
When you write "char s = ..." you create a single char variable on the stack. Its address is not the start of an array, so strlen cannot legitimately step past it looking for a '\0', and the result is meaningless.
You should call strlen with the address of a char that is part of a NUL-terminated array, like:
const char* s = "MIDSH";
printf("%zu\n", strlen(s)); //print 5
s++;
printf("%zu\n", strlen(s)); //print 4

How to store '\0' in a char array

Is it possible to store the char '\0' inside a char array and then store different characters after? For example
char* tmp = "My\0name\0is\0\0";
I was taught that is actually called a string list in C, but when I tried to print the above (using printf("%s\n", tmp)), it only printed
"My".
Yes, it is certainly possible. However, you cannot use that array as a string and get at the content stored after the first '\0'.
By definition, a string is a char array terminated by the null character, '\0'. All string-related functions stop at the first null byte (for example, printf() with the %s format specifier stops there even if the argument contains more data after it).
Quoting C11, chapter §7.1.1, Definitions of terms
A string is a contiguous sequence of characters terminated by and including the first null
character. [...]
However, for byte-by-byte processing, you're good to go as long as you stay within the allocated memory region.
The problem you are having is with the function you are using to print tmp. Functions like printf assume that the string is null terminated, so they stop when they see the first \0.
If you try the following code you will see more of the value in tmp
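#include <unistd.h> /* write() is POSIX, not standard C */
int main(void){
    const char* tmp = "My\0name\0is\0\0";
    write(1,tmp,12); /* writes all 12 bytes, including the embedded '\0' bytes */
    return 0;
}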

How does this C copy function exit its loop?

void copy (char *source, char *dest) {
while (*dest++ = *source++);
}
The char that is represented by *source is copied to the field *dest points to. For the next iteration, each char pointer points to the next field in memory, is that correct?
When does this loop actually stop? The only condition I can think of is that there's no space left in memory, but then the function must terminate with an error, shouldn't it?
I'm completely new to C, so forgive me the simple questions.
chars are integral types. Integral types are interpreted as conditionals in the following way:
0 -> false
Anything else -> true
Since "strings" in C are null-terminated (meaning 0 or '\0') when it reaches the end of the string it stops.
The 'result' of an assignment is the right-hand value. So x=1; actually returns a value; in this case, '1'.
Your code copies characters until it encountered the terminating 0 at the end of the source string.
Your interpretation of the copy is correct. The loop stops when what dest is pointing to is zero, i.e., the '\0' character. See http://en.wikipedia.org/wiki/Null-terminated_string

How does "while(*s++ = *t++)" copy a string?

My question is, what does this code do (from http://www.joelonsoftware.com/articles/CollegeAdvice.html):
while (*s++ = *t++);
the website says that the code above copies a string but I don't understand why...
does it have to do with pointers?
It is equivalent to this:
while (*t) {
*s = *t;
s++;
t++;
}
*s = *t;
When the char that t points to is '\0', the while loop will terminate. Until then, it will copy the char that t is pointing to to the char that s is pointing to, then increment s and t to point to the next char in their arrays.
This has so much going on under the covers:
while (*s++ = *t++);
The s and t variables are pointers (almost certainly to characters), s being the destination. The following steps illustrate what's happening:
the contents of t (*t) are copied to s (*s), one character.
s and t are both incremented (++).
the assignment (copy) returns the character that was copied (to the while).
the while continues until that character is zero (end of string in C).
Effectively, it's:
while (*t != 0) {
*s = *t;
s++;
t++;
}
*s = *t;
s++;
t++;
but written out in a much more compact way.
Let's assume s and t are char *s that point to strings (and assume s is at least as large as t). In C, strings all end in 0 (ASCII "NUL"), correct? So what does this do:
*s++ = *t++;
First, it does *s = *t, copying the value at *t to *s. Then, it does s++, so s now points to the next character. And then it does t++, so t points to the next character. This has to do with operator precedence and prefix vs. postfix increment/decrement.
Operator precedence is the order in which operators are resolved. For a simple example, look:
4 + 2 * 3
Is this 4 + (2 * 3) or (4 + 2) * 3? Well, we know it is the first one because of precedence - the binary * (multiplication operator) has higher precedence than the binary + (addition operator), and is resolved first.
In *s++, we have unary * (pointer dereference operator) and unary ++ (postfix increment operator). In this case, ++ has higher precedence (also said to "bind tighter") than *. If we had said ++*s, we would increment the value at *s rather than the address pointed to by s because prefix increment has lower precedence* as dereference, but we used postfix increment, which has higher precedence. If we had wanted to use prefix increment, we could have done *(++s), since the parenthesis would have overridden all lower precedences and forced ++s to come first, but this would have the undesirable side effect of leaving an empty character at the beginning of the string.
Note that just because it has higher precedence doesn't mean it happens first. Postfix increment specifically happens after the value has been used, which is why *s = *t happens before s++.
So now you understand *s++ = *t++. But they put it in a loop:
while(*s++ = *t++);
This loop does nothing - the action is all in the condition. But check out that condition - it returns "false" if *s is ever 0, which means *t was 0, which means they were at the end of the string (yay for ASCII "NUL"). So this loop loops as long as there are characters in t, and copies them dutifully into s, incrementing s and t all the way. When this loop exits, s has been NUL-terminated, and is a proper string. The only problem is, s points to the end. Keep another pointer handy that points to the beginning of s (i.e. s before the while() loop) - that will be your copied string:
char *string = s; // s must already point to a large enough buffer; remember its start
while(*s++ = *t++);
printf("%s", string); // prints the string that was in *t
Alternatively, check this out:
size_t i = strlen(t);
while(*s++ = *t++);
s -= i + 1;
printf("%s\n", s); // prints the string that was in *t
We started by getting the length, so when we ended, we did more pointer arithmetic to put s back at the beginning, where it started.
Of course, this code fragment (and all my code fragments) ignore buffer issues for simplicity. The better version is this:
size_t i = strlen(t);
char *s = malloc(i + 1); // room for the characters plus the terminating NUL
while(*s++ = *t++);
s -= i + 1;
printf("%s\n", s); // prints the string that was in *t
free(s);
But you knew that already, or you'll soon ask a question on everyone's favorite website about it. ;)
* Actually, they have the same precedence, but that's resolved by different rules. They effectively have lower precedence in this situation.
while(*s++ = *t++);
Why do people think it is equivalent to:
while (*t) {
*s = *t;
s++;
t++;
}
*s = *t; /* if *t was 0 at the beginning s and t are not incremented */
when it obviously isn't.
char tmp = 0;
do {
tmp = *t;
*s = tmp;
s++;
t++;
} while(tmp);
is more like it
EDIT: Corrected a compilation error. The tmp variable must be declared outside of the loop.
The aspect that is mysterious about this is the order of operations. If you look up the C language spec, it states that in this context, the order of operations is as follows:
1. * operator
2. = (assignment) operator
3. ++ operator
So the while loop then becomes, in English:
while (some condition):
Take what is at address "t" and copy it over to location at address "s".
Increment "s" by one address location.
Increment "t" by one address location.
Now, what is "some condition"? The C lang specification also says that the value of an assignment expression is the assigned value itself, which in this case is *t.
So "some condition" is "t points to something that is non-zero", or in a simpler way, "while the character at location t is not the NUL character ('\0')".
The C Programming Language (K&R) by Brian W. Kernighan and Dennis M. Ritchie gives a detailed explanation of this.
Second Edition, Page 104:
5.5 Character Pointers and Functions
A string constant, written as
"I am a string"
is an array of characters. In the internal representation, the array is terminated with the null character '\0' so that programs can find the end. The length in storage is thus one more than the number of characters between the double quotes.
Perhaps the most common occurrence of string constants is as arguments to functions, as in
printf("hello, world\n");
Where a character string like this appears in a program, access to it is through a character pointer; printf receives a pointer to the beginning of the character array. That is, a string constant is accessed by a pointer to its first element.
String constants need not be function arguments. If pmessage is declared as
char *pmessage;
then the statement
pmessage = "now is the time";
assigns to pmessage a pointer to the character array. This is not a string copy; only pointers are involved. C does not provide any operators for processing an entire string of characters as a unit.
There is an important difference between these definitions:
char amessage[] = "now is the time"; /* an array */
char *pmessage = "now is the time"; /* a pointer */
amessage is an array, just big enough to hold the sequence of characters and '\0' that initializes it. Individual characters within the array may be changed but amessage will always refer to the same storage. On the other hand, pmessage is a pointer, initialized to point to a string constant; the pointer may subsequently be modified to point elsewhere, but the result is undefined if you try to modify the string contents.
          +---+       +--------------------+
pmessage: | o-------->| now is the time \0 |
          +---+       +--------------------+
          +--------------------+
amessage: | now is the time \0 |
          +--------------------+
We will illustrate more aspects of pointers and arrays by studying versions of two useful functions adapted from the standard library. The first function is strcpy(s,t), which copies the string t to the string s. It would be nice just to say s = t but this copies the pointer, not the characters. To copy the characters, we need a loop. The array version is first:
/* strcpy: copy t to s; array subscript version */
void strcpy(char *s, char *t)
{
int i;
i = 0;
while((s[i] = t[i]) != '\0')
i ++;
}
For contrast, here is a version of strcpy with pointers:
/* strcpy: copy t to s; pointer version 1 */
void strcpy(char *s, char *t)
{
while((*s = *t) != '\0')
{
s ++;
t ++;
}
}
Because arguments are passed by value, strcpy can use the parameters s and t in any way it pleases. Here they are conveniently initialized pointers, which are marched along the arrays a character at a time, until the '\0' that terminates t has been copied to s.
In practice, strcpy would not be written as we showed it above. Experienced C programmers would prefer
/* strcpy: copy t to s; pointer version 2 */
void strcpy(char *s, char *t)
{
while((*s++ = *t++) != '\0')
;
}
This moves the increment of s and t into the test part of the loop. The value of *t++ is the character that t pointed to before t was incremented; the postfix ++ doesn't change t until after this character has been fetched. In the same way, the character is stored into the old s position before s is incremented. This character is also the value that is compared against '\0' to control the loop. The net effect is that characters are copied from t to s, up to and including the terminating '\0'.
As the final abbreviation, observe that a comparison against '\0' is redundant, since the question is merely whether the expression is zero. So the function would likely be written as
/* strcpy: copy t to s; pointer version 3 */
void strcpy(char *s, char *t)
{
while(*s++ = *t++);
}
Although this may seem cryptic at first sight, the notational convenience is considerable, and the idiom should be mastered, because you will see it frequently in C programs.
The strcpy in the standard library (<string.h>) returns the target string as its function value.
This is the end of the relevant parts of this section.
PS: If you enjoyed reading this, consider buying a copy of K&R - it is not expensive.
It works by copying characters from the string pointed to by 't' into the string pointed to by 's'. For each character copied, both pointers are incremented. The loop terminates when it finds a NUL character (equal to zero, hence the exit).
HINTS:
What does the operator '=' do?
What is the value of the expression "a = b"? Eg: if you do "c = a = b" what value does c get?
What terminates a C string? Does it evaluate true or false?
In "*s++", which operator has higher precedence?
ADVICE:
Use strncpy() instead.
It copies a string because arrays are passed to functions as pointers to their first element, and a string is just a char array. Basically what is happening is (if I remember the term correctly) pointer arithmetic. Wikipedia's article on C arrays has a bit more information.
You are storing the value that was dereferenced from t in s and then moving to the next index via the ++.
Say you have something like this:
char *someString = "Hello, World!";
someString points to the first character in the string - in this case 'H'.
Now, if you increment the pointer by one:
someString++
someString will now point to 'e'.
while ( *someString++ );
will loop until whatever someString points at is the NUL character ('\0'), which is what signals the end of a string ("NUL-terminated").
And the code:
while (*s++ = *t++);
is roughly equal to:
while ( *t != '\0' ) { // While whatever t points to isn't NUL
    *s = *t; // copy whatever t points to into s
    s++;
    t++;
}
*s = *t; // finally, copy the terminating NUL as well
Yes, it does have to do with pointers.
The way to read the code is this: "the value that is pointed to by the pointer "s" (which gets incremented after this operation) gets the value which is pointed to by the pointer "t" (which gets incremented after this operation); the entire value of this operation evaluates to the value of the character copied; iterate across this operation until that value equals zero". Since the value of the string null terminator is the character value of zero ('\0'), the loop will iterate until a string is copied from the location pointed to by t to the location pointed to by s.
Many adherents of the C language are convinced that "while (*s++ = *t++)"
is genuinely graceful.
The conditional expression of the while loop contains three side effects (advancing one pointer, advancing the second pointer, and the assignment).
As a result, the body of the loop is empty, since all the functionality is placed in the conditional expression.
use for with int i:
char t[]="I am a programmer",s[20];
int i; //declared outside the loop so it is still in scope afterwards
for(i=0;*(t+i)!='\0';i++)
    *(s+i)=*(t+i);
*(s+i)=*(t+i); //the last char in t '\0'
printf("t is:%s\n",t);
printf("s is:%s\n",s);
use for with pointer++:
char t[]="I am a programmer",s[20];
char *p1,*p2;
p1=t,p2=s;
for(;*p1!='\0';p1++,p2++)
*p2 = *p1;
*p2 = *p1;
printf("t is:%s\n",t);
printf("s is:%s\n",s);
use while with pointer++:
char t[]="I am a programmer",s[20];
char *p1,*p2;
p1=t,p2=s;
while(*p2++=*p1++);
printf("t is:%s\n",t);
printf("s is:%s\n",s);
printf("t is:%s\n",p1-18);
printf("s is:%s\n",p2-18);
use array to initialize pointers:
char a[20],*t="I am a programmer",*s;
s=a;
while(*s++=*t++);
printf("t is:%s\n",t-18);
printf("s is:%s\n",s-18);
printf("s is:%s\n",a);
starts a while loop....
*s = *t goes first, this assigns to what t points at to what s points at. ie, it copies a character from t string to s string.
what is being assigned is passed to the while condition... any non zero is "true" so it will continue, and 0 is false, it will stop.... and it just happens the end of a string is also zero.
s++ and t++ they increment the pointers
and it all starts again
so it keeps assigning looping, moving the pointers, until it hits a 0, which is the end of the string
Yes this uses pointers, and also does all the work while evaluating the while condition. C allows conditional expressions to have side-effects.
The "*" operator dereferences pointers s and t.
The increment operator ("++") increments pointers s and t after the assignment.
The loop terminates on condition of a null character, which evaluates as false in C.
One additional comment.... this is not safe code, as it does nothing to ensure s has enough memory allocated.
The question I provided the following answer on was closed as a duplicate of this question, so I am copying the relevant part of the answer here.
The actual semantic explanation of the while loop would be something like:
for (;;) {
char *olds = s; // original s in olds
char *oldt = t; // original t in oldt
char c = *oldt; // original *t in c
s += 1; // complete post increment of s
t += 1; // complete post increment of t
*olds = c; // copy character c into *olds
if (c) continue; // continue if c is not 0
break; // otherwise loop ends
}
The order that s and t are saved, and the order that s and t are incremented may be interchanged. The save of *oldt to c can occur any time after oldt is saved and before c is used. The assignment of c to *olds can occur any time after c and olds are saved. On the back of my envelope, this works out to at least 40 different interpretations.
This works only for char strings that end in '\0'. If there is no '\0', or if the data is, say, an integer array with no zero element, the loop runs past the end of the array into memory that is not part of it. That is undefined behavior: the program may crash, or, if the neighboring memory happens to be accessible (for example, heap memory obtained from malloc), the loop may simply keep going.

Resources