Hi all, I have a question about passing a string to a function in C. I am using the CS50 library, and I know that a string there is a char array (a char pointer to the start of the array), so passing is done by reference. My function receives the array as an argument and returns an array. When I change one element of the array inside the function, the change is reflected in the original string, as I expect. But if I assign a new string to the argument, the function returns another string and the original string is not changed. Can you explain the mechanics behind this behaviour?
#include <stdlib.h>
#include <cs50.h>
#include <stdio.h>
string test(string s);
int main(void)
{
string text = get_string("Text: ");
string new_text = test(text);
printf("newtext: %s\n %s\n", text, new_text);
printf("\n");
return 0;
}
string test(string s)
{
//s[0] = 'A';
s = "Bla";
return s;
}
The first example (the commented-out line) changes the first letter in both text and new_text, but the second example prints text unchanged and new_text as "Bla".
Thanks!
This is going to take a while.
Let's start with the basics. In C, a string is a sequence of character values including a 0-valued terminator. IOW, the string "hello" is represented as the sequence {'h', 'e', 'l', 'l', 'o', 0}. Strings are stored in arrays of char (or wchar_t for "wide" strings, which we won't talk about here). This includes string literals like "Bla" - they're stored in arrays of char such that they are available over the lifetime of the program.
Under most circumstances, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", so most of the time when we're dealing with strings we're actually dealing with expressions of type char *. However, this does not mean that an expression of type char * is a string - a char * may point to the first character of a string, or it may point to the first character in a sequence that isn't a string (no terminator), or it may point to a single character that isn't part of a larger sequence.
A char * may also point to the beginning of a dynamically allocated buffer that has been allocated by malloc, calloc, or realloc.
Another thing to note is that the [] subscript operator is defined in terms of pointer arithmetic - the expression a[i] is defined as *(a + i) - given an address value a (converted from an array type as described above), offset i elements (not bytes) from that address and dereference the result.
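A minimal sketch of that equivalence. The two helper names are made up for this illustration; both read the same element, once with the subscript operator and once with explicit pointer arithmetic.

```c
#include <stddef.h>

/* Both helpers are hypothetical names used only for this illustration. */

/* Reads element i with the subscript operator. */
char char_by_subscript(const char *a, size_t i)
{
    return a[i];
}

/* Reads the same element with explicit pointer arithmetic:
   step i elements past a, then dereference. */
char char_by_pointer(const char *a, size_t i)
{
    return *(a + i);
}
```

For any valid index the two calls should return the same character, including the terminating 0 at the end of a string.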
Another important thing to note is that the = operator is not defined to copy the contents of one array to another. In fact, an array expression cannot be the target of an = operator.
The CS50 string type is actually a typedef (alias) for the type char *. The get_string() function performs a lot of magic behind the scenes to dynamically allocate and manage the memory for the string contents, and makes string processing in C look much higher level than it really is. I and several other people consider this a bad way to teach C, at least with respect to strings. Don't get me wrong, it's an extremely useful utility, it's just that once you don't have cs50.h available and have to start doing your own string processing, you're going to be at sea for a while.
So, what does all that nonsense have to do with your code? Specifically, the line
s = "Bla";
What's happening is that instead of copying the contents of the string literal "Bla" to the memory that s points to, the address of the string literal is being written to s, overwriting the previous pointer value. You cannot use the = operator to copy the contents of one string to another; instead, you'll have to use a library function like strcpy (provided the buffer that s points to is writable and large enough to hold the copy, terminator included):
strcpy( s, "Bla" );
The reason s[0] = 'A' worked as you expected is that the subscript operator [] is defined in terms of pointer arithmetic. The expression a[i] is evaluated as *(a + i): given an address a (either a pointer, or an array expression that has "decayed" to a pointer as described above), offset i elements (not bytes!) from that address and dereference the result. So s[0] designates the first element of the string you read in, and assigning to it modifies that string in place.
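To put the three cases side by side, here is a small sketch. The helper names are hypothetical and not part of the CS50 library, and overwrite assumes the caller's buffer has room for at least 4 chars.

```c
#include <string.h>

/* Hypothetical helpers, not part of the CS50 library. */

/* Overwrites only the local copy of the pointer; the caller's buffer is untouched. */
const char *reassign(char *s)
{
    s = "Bla";          /* s now holds the address of the literal */
    return s;
}

/* Writes through the pointer, so the caller's buffer changes. */
char *change_first(char *s)
{
    s[0] = 'A';         /* same as *(s + 0) = 'A' */
    return s;
}

/* Copies the characters of "Bla" into the caller's buffer.
   Assumption: the buffer holds at least 4 chars ("Bla" plus terminator). */
char *overwrite(char *s)
{
    strcpy(s, "Bla");
    return s;
}
```

Calling reassign on a buffer holding "hello" leaves the buffer as it was and returns a different address; change_first and overwrite both return the same address they were given, with the contents changed.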
This is difficult to answer correctly without a code example. I will make one but it might not match what you are doing.
Let's take this C function:
char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
s[4] = 'X';
}
}
return s;
}
That function accepts a pointer to a character array and, if the pointer is not NULL and the zero-terminated array is longer than 4 characters, replaces the fifth character (index 4) with an 'X'. There are no references in C; what you are calling a reference is a pointer. You get access to the pointed-at value with the dereference operator, *p, or with array syntax like p[0].
Now, this function:
char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
char *new_s = malloc(len+1);
strcpy(new_s, s);
new_s[4] = 'X';
return new_s;
}
}
s = malloc(1);
s[0] = '\0';
return s;
}
That function returns a pointer to a newly allocated copy of the original character array, or a newly allocated empty string. (By doing that, the caller can always print it out and call free on the result.)
It does not change the original character array because new_s does not point to the original character array.
Now you could also do this:
const char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
return "string was longer than 4";
}
}
s = "string was not longer than 4";
return s;
}
Notice that I changed the return type to const char* because a string literal like "string was longer than 4" must not be modified. Trying to modify it is undefined behavior, and on many systems it crashes the program.
Doing an assignment to s inside the function does not change the character array that s used to point to. The pointer s points to or references the original character array and then after s = "string" it points to the character array "string".
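If you actually wanted the caller's pointer to change, one option (a sketch, not something the question's code does) is to pass the address of that pointer. The name retarget is made up for illustration.

```c
/* Hypothetical helper: retargets the caller's pointer by writing through
   a pointer to that pointer. The original character array is not modified. */
void retarget(const char **sp)
{
    *sp = "string";
}
```

After retarget(&p), p points at the literal "string", while the array p used to point to still holds its old contents.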
#include <stdio.h>
#include <string.h>
int main ()
{
const char str[] = "http://www.tutorialspoint.com";
const char ch = '.';
char *ret;
ret = strchr(str, ch);
printf("String after |%c| is - |%s|\n", ch, ret);
return(0);
}
This code is copied from tutorialspoint.
From what I understand, ret is a pointer to a character. To use the value/ what the pointer is pointed to, I do *ret.
However, in this example, just by calling ret,printf() prints out .tutorialspoint.com. Why don't we use *ret to get .tutorialspoint.com since the string is the value in ret, which is accessed by *ret?
Have a look at the properties of the %s conversion specifier. When used in printf(), it expects an argument of type char *, and that is exactly what it gets. All is well.
To quote C11 standard, chapter §7.21.6.1
s If no l length modifier is present, the argument shall be a pointer to the initial
element of an array of character type. Characters from the array are
written up to (but not including) the terminating null character. [...]
In other words, to print out a string using %s, you have to supply the pointer to the start of the string, that is ret, not *ret.
To use the value/ what the pointer is pointed to, I do *ret.
Yes this is useful if code needs to use that value, a single character only. But code wants to print the contents of a string. A string is a sequence (or array) of characters up to and including the null character.
Why don't we use *ret to get ".tutorialspoint.com" since the string is the value in ret, which is accessed by *ret?
*ret resolves to a single character. printf("%s", ... expects a pointer to a sequence or array of characters. By using printf("|%s|\n", ret);, printf() knows the address of the beginning of the string and can then print the initial character '.' and then subsequent ones 't', 'u', 't', ... until encountering a null character.
Better code would have checked that ret != NULL before attempting the printf()
ret = strchr(str, ch);
if (ret != NULL) {
printf("String after |%c| is - |%s|\n", ch, ret);
}
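A small sketch that folds the NULL check into a helper. The name suffix_from is hypothetical, and returning an empty string when nothing is found is just one possible choice.

```c
#include <string.h>

/* Hypothetical wrapper around strchr: returns the part of str that starts
   at the first occurrence of ch, or an empty string if ch is not found. */
const char *suffix_from(const char *str, int ch)
{
    const char *ret = strchr(str, ch);
    return (ret != NULL) ? ret : "";
}
```

The returned pointer is what you hand to %s; dereferencing it with * gives only the single character at that position.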
"since the string is the value in ret, which is accessed by *ret"
No. A string in C is a sequence of chars ending with \0, so *ret is only the first char of the many. There is no single string value as there is in, e.g., Java.
* is the dereference operator. *ret gives you the '.', since * accesses the element pointed to. The %s format of printf takes a pointer to char (the bare name ret) and keeps printing until it encounters \0, the terminating character.
The function strchr from Linux man(3):
The strchr() function returns a pointer to the first occurrence of
the character c in the string s.
You could implement it manually with a while loop that simply increments the pointer. However, you would still need the %s specifier to print the result; the alternative is a tedious for loop with a function call for every char, which is inefficient.
Note: Every for loop can be rewritten as a while loop, so the loop types mentioned above are interchangeable.
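A rough sketch of such a manual implementation. find_char is a made-up name; it is meant to mirror what the man page says about strchr, not to reproduce any particular library's source.

```c
#include <stddef.h>

/* find_char is a made-up name; it mirrors what strchr is documented to do. */
const char *find_char(const char *s, int c)
{
    while (*s != (char)c) {
        if (*s == '\0')
            return NULL;    /* reached the end without a match */
        s++;                /* advance to the next character */
    }
    return s;               /* points at the first match */
}
```

Like strchr, this version also finds the terminator itself when asked to search for '\0'.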
void copy (char *source, char *dest) {
while (*dest++ = *source++);
}
The char that is represented by *source is copied to the field *dest points to. For the next iteration, each char pointer points to the next field in memory, is that correct?
When does this loop actually stop? The only condition I can think of is that there's no space left in memory, but then the function must terminate with an error, shouldn't it?
I'm completely new to C, so forgive me the simple questions.
chars are integral types. Integral types are interpreted as conditionals in the following way:
0 -> false
Anything else -> true
Since "strings" in C are null-terminated (meaning they end in 0 or '\0'), the loop stops when it reaches the end of the string.
The 'result' of an assignment is the value that was assigned. So x=1 is itself an expression with a value; in this case, 1.
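Both points can be seen in a tiny sketch (helper names are hypothetical): an assignment has a value, and that value can drive a loop condition.

```c
/* Hypothetical helpers for illustration. */

/* c receives the value of the inner assignment a = b, which is b's value. */
int chained_assign(int b)
{
    int a;
    int c;
    c = a = b;
    return c;
}

/* Counts loop iterations driven by the value of an assignment:
   the loop runs while the character just stored in ch is non-zero. */
int count_until_zero(const char *p)
{
    int n = 0;
    char ch;
    while ((ch = *p++))
        n++;
    return n;
}
```

The extra parentheses around the assignment in the while condition are only there to tell the compiler that the = is intentional.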
Your code copies characters until it encounters the terminating 0 at the end of the source string.
Your interpretation of the copy is correct. The loop stops when the character just copied to dest is zero, i.e., the '\0' character. See http://en.wikipedia.org/wiki/Null-terminated_string
I have just started learning pointers, and after much adding and removing *s my code for converting an entered string to uppercase finally works..
#include <stdio.h>
#include <string.h>
char* upper(char *word);
int main()
{
char word[100];
printf("Enter a string: ");
gets(word);
printf("\nThe uppercase equivalent is: %s\n",upper(word));
return 0;
}
char* upper(char *word)
{
int i;
for (i=0;i<strlen(word);i++) word[i]=(word[i]>96&&word[i]<123)?word[i]-32:word[i];
return word;
}
My question is, while calling the function I sent word which is a pointer itself, so in char* upper(char *word) why do I need to use *word?
Is it a pointer to a pointer? Also, is the char* there because the function returns a pointer to a character/string?
Please clarify me regarding how this works.
That's because the type you need here simply is "pointer to char", which is denoted as char *, the asterisk (*) is part of the type specification of the parameter. It's not a "pointer to pointer to char", that would be written as char **
Some additional remarks:
It seems you're confusing the dereference operator * (used to access the place a pointer points to) with the asterisk as a pointer sign in type specifications. You're not using a dereference operator anywhere in your code; you're only using the asterisk as part of the type specification! See these examples: to declare a variable as a pointer to char, you'd write:
char * a;
To assign a value to the space where a is pointing to (by using the dereference operator), you'd write:
*a = 'c';
An array (of char) is not exactly equal to a pointer (to char) (see also the question here). However, in most cases, an array (of char) can be converted to a (char) pointer.
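One place where the difference shows is sizeof. A short sketch with hypothetical helper names:

```c
#include <stddef.h>

/* Hypothetical helpers: same storage, different types. */
size_t size_as_array(void)
{
    char word[100];
    return sizeof word;       /* size of the whole array: 100 bytes */
}

size_t size_as_pointer(void)
{
    char word[100];
    char *p = word;           /* the array converts to a pointer to its first element */
    return sizeof p;          /* size of the pointer itself, not of the array */
}
```

The first function returns 100; the second returns whatever the size of a char * is on your platform.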
Your function actually changes the outer char array (and passes back a pointer to it): not only will the uppercase version of what was entered be printed by printf, but the variable word in main will also be modified so that it holds the uppercase version of the entered word. Take good care that such a side effect is actually what you want. If you don't want the function to be able to modify the outside variable, you could write char* upper(char const *word), but then you'd have to change your function definition as well, so that it doesn't directly modify the word variable; otherwise the compiler will complain.
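One way such a non-modifying variant might look, as a sketch only. upper_copy is a hypothetical name, and allocating the result with malloc (so the caller has to free it) is an assumption of this example, not something the question requires.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* upper_copy is a hypothetical name. It returns a newly allocated uppercase
   copy and leaves word untouched; the caller must free() the result.
   Returns NULL if the allocation fails. */
char *upper_copy(const char *word)
{
    size_t len = strlen(word);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i <= len; i++)            /* <= also copies the '\0' */
        copy[i] = (char)toupper((unsigned char)word[i]);
    return copy;
}
```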
char upper(char c) would be a function that takes a character and returns a character. If you want to work with strings the convention is that strings are a sequence of characters terminated by a null character. You cannot pass the complete string to a function so you pass the pointer to the first character, therefore char *upper(char *s). A pointer to a pointer would have two * like in char **pp:
char *str = "my string";
char **ptr_to_ptr = &str;
char c = **ptr_to_ptr; // same as *str, same as str[0], 'm'
upper could also be implemented as void upper(char *str), but it is more convenient to have upper return the passed string. You made use of that in your sample when you printf the string that is returned by upper.
Just as a comment, you can optimize your upper function. You are calling strlen for every i. C strings are always null terminated, so you can replace your i < strlen(word) with word[i] != '\0' (or word[i] != 0). Also the code is better to read if you do not compare against 96 and 123 and subtract 32 but if you check against and calculate with 'a', 'z', 'A', 'Z' or whatever character you have in mind.
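Putting both suggestions together, the function might look roughly like this. upper_ascii is a hypothetical name, and the arithmetic assumes an encoding such as ASCII in which 'a' to 'z' are contiguous.

```c
#include <stddef.h>

/* upper_ascii is a hypothetical name. It assumes an encoding such as ASCII
   where 'a'..'z' are contiguous; toupper from <ctype.h> avoids that assumption. */
char *upper_ascii(char *word)
{
    for (size_t i = 0; word[i] != '\0'; i++) {
        if (word[i] >= 'a' && word[i] <= 'z')
            word[i] = (char)(word[i] - 'a' + 'A');
    }
    return word;
}
```

The loop visits each character once and stops at the terminator, so strlen is never called.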
The parameter word is a pointer, but the array word in main and the pointer word in the function refer to one and the same memory. When the argument is passed, only a copy of the pointer (the address of the array) is handed over, not a copy of the word that was entered, so every operation done through the pointer acts on the original array. In the end we return that pointer, which is why the return type is written as char *.
My question is, what does this code do (from http://www.joelonsoftware.com/articles/CollegeAdvice.html):
while (*s++ = *t++);
the website says that the code above copies a string but I don't understand why...
does it have to do with pointers?
It is equivalent to this:
while (*t) {
*s = *t;
s++;
t++;
}
*s = *t;
When the char that t points to is '\0', the while loop will terminate. Until then, it will copy the char that t is pointing to to the char that s is pointing to, then increment s and t to point to the next char in their arrays.
This has so much going on under the covers:
while (*s++ = *t++);
The s and t variables are pointers (almost certainly pointers to characters), s being the destination. The following steps illustrate what's happening:
the contents of t (*t) are copied to s (*s), one character.
s and t are both incremented (++).
the assignment (copy) returns the character that was copied (to the while).
the while continues until that character is zero (end of string in C).
Effectively, it's:
while (*t != 0) {
*s = *t;
s++;
t++;
}
*s = *t;
s++;
t++;
but written out in a much more compact way.
Let's assume s and t are char *s that point to strings (and assume s is at least as large as t). In C, strings all end in 0 (ASCII "NUL"), correct? So what does this do:
*s++ = *t++;
First, it does *s = *t, copying the value at *t to *s. Then, it does s++, so s now points to the next character. And then it does t++, so t points to the next character. This has to do with operator precedence and prefix vs. postfix increment/decrement.
Operator precedence is the order in which operators are resolved. For a simple example, look:
4 + 2 * 3
Is this 4 + (2 * 3) or (4 + 2) * 3? Well, we know it is the first one because of precedence - the binary * (multiplication operator) has higher precedence than the binary + (addition operator), and is resolved first.
In *s++, we have unary * (the pointer dereference operator) and unary ++ (the postfix increment operator). In this case, ++ has higher precedence (also said to "bind tighter") than *. If we had said ++*s, we would increment the value at *s rather than the pointer s itself, because prefix increment has lower precedence* than dereference here, but we used postfix increment, which has higher precedence. If we had wanted to use prefix increment, we could have written *(++s), where the parentheses make it explicit that ++s comes first, but this would have the undesirable side effect of skipping the first character position, so the first character of the destination would never be written.
Note that just because it has higher precedence doesn't mean it happens first. Postfix increment specifically happens after the value has been used, which is why *s = *t happens before s++.
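The three spellings can be compared in a short sketch. precedence_demo is a hypothetical helper; it uses the digits "12" because C guarantees that the digit characters are consecutive, so incrementing '1' gives '2'.

```c
/* precedence_demo is a hypothetical helper. It applies each form to a fresh
   pointer into "12" and records the value each expression produced. */
void precedence_demo(char results[3])
{
    char buf[] = "12";
    char *p;

    p = buf;
    results[0] = *p++;   /* reads '1', then p moves on to '2' */

    p = buf;
    results[1] = *++p;   /* p moves first, so this reads '2' */

    p = buf;
    results[2] = ++*p;   /* p stays put; the character buf[0] goes from '1' to '2' */
}
```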
So now you understand *s++ = *t++. But they put it in a loop:
while(*s++ = *t++);
This loop does nothing - the action is all in the condition. But check out that condition - it returns "false" if *s is ever 0, which means *t was 0, which means they were at the end of the string (yay for ASCII "NUL"). So this loop loops as long as there are characters in t, and copies them dutifully into s, incrementing s and t all the way. When this loop exits, s has been NUL-terminated, and is a proper string. The only problem is, s points to the end. Keep another pointer handy that points to the beginning of s (i.e. s before the while() loop) - that will be your copied string:
char *string = s; // remember where s starts (s must already point to a large enough buffer)
while(*s++ = *t++);
printf("%s", string); // prints the string that was in *t
Alternatively, check this out:
size_t i = strlen(t);
while(*s++ = *t++);
s -= i + 1;
printf("%s\n", s); // prints the string that was in *t
We started by getting the length, so when we ended, we did more pointer arithmetic to put s back at the beginning, where it started.
Of course, this code fragment (and all my code fragments) ignore buffer issues for simplicity. The better version is this:
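size_t i = strlen(t);
char *c = malloc(i + 1); // room for the characters plus the terminator
char *s = c; // copy into the newly allocated buffer
while(*s++ = *t++);
s -= i + 1;
printf("%s\n", s); // prints the string that was in *t
free(c);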
But you knew that already, or you'll soon ask a question on everyone's favorite website about it. ;)
* Actually, prefix ++ and unary * have the same precedence; because unary operators group right to left, ++*s is read as ++(*s). So prefix increment effectively behaves as if it had lower precedence in this situation.
while(*s++ = *t++);
Why do people think it is equivalent to:
while (*t) {
*s = *t;
s++;
t++;
}
*s = *t; /* if *t was 0 at the beginning s and t are not incremented */
when it obviously isn't.
char tmp = 0;
do {
tmp = *t;
*s = tmp;
s++;
t++;
} while(tmp);
is more like it
EDIT: Corrected a compilation error. The tmp variable must be declared outside of the loop.
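The difference between the two forms is where the pointers end up, which a sketch like this can make visible. The helper names are hypothetical; each returns how far s advanced, and both assume the destination is large enough.

```c
#include <stddef.h>

/* Hypothetical helpers: both copy t into s, but they leave s in different places. */

/* Compact idiom: returns how far s advanced. */
size_t advance_compact(char *s, const char *t)
{
    char *start = s;
    while ((*s++ = *t++))
        ;
    return (size_t)(s - start);
}

/* The while (*t) expansion: the terminator is copied after the loop
   without incrementing the pointers. */
size_t advance_expanded(char *s, const char *t)
{
    char *start = s;
    while (*t) {
        *s = *t;
        s++;
        t++;
    }
    *s = *t;
    return (size_t)(s - start);
}
```

For a three-character source the compact idiom advances s by 4 (one past the terminator), while the expansion advances it by 3. The copied contents are the same either way.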
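The aspect that is mysterious about this is the order of operations. By precedence, postfix ++ binds tighter than *, which in turn binds tighter than =. But the value of s++ is the old value of s, and the increment only has to be complete by the end of the condition, so the observable effect in this context is as if things happened in this order:
1. * operator (applied to the current values of s and t)
2. = (assignment) operator
3. ++ operator (the pointers end up incremented)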
So the while loop then becomes, in english:
while (some condition):
Take what is at address "t" and copy it over to location at address "s".
Increment "s" by one address location.
Increment "t" by one address location.
Now, what is "some condition"? The C lang specification also says that the value of an assignment expression is the assigned value itself, which in this case is *t.
So "some condition" is "t points to something that is non-zero", or in a simpler way, "while the data at location t is not the null character '\0'".
The C Programming Language (K&R) by Brian W. Kernighan and Dennis M. Ritchie gives a detailed explanation of this.
Second Edition, Page 104:
5.5 Character Pointers and Functions
A string constant, written as
"I am a string"
is an array of characters. In the internal representation, the array is terminated with the null character '\0' so that programs can find the end. The length in storage is thus one more than the number of characters between the double quotes.
Perhaps the most common occurrence of string constants is as arguments to functions, as in
printf("hello, world\n");
Where a character string like this appears in a program, access to it is through a character pointer; printf receives a pointer to the beginning of the character array. That is, a string constant is accessed by a pointer to its first element.
String constants need not be function arguments. If pmessage is declared as
char *pmessage;
then the statement
pmessage = "now is the time";
assigns to pmessage a pointer to the character array. This is not a string copy; only pointers are involved. C does not provide any operators for processing an entire string of characters as a unit.
There is an important difference between these definitions:
char amessage[] = "now is the time"; /* an array */
char *pmessage = "now is the time"; /* a pointer */
amessage is an array, just big enough to hold the sequence of characters and '\0' that initializes it. Individual characters within the array may be changed but amessage will always refer to the same storage. On the other hand, pmessage is a pointer, initialized to point to a string constant; the pointer may subsequently be modified to point elsewhere, but the result is undefined if you try to modify the string contents.
          +---+      +--------------------+
pmessage: | o-------->| now is the time \0 |
          +---+      +--------------------+
          +--------------------+
amessage: | now is the time \0 |
          +--------------------+
We will illustrate more aspects of pointers and arrays by studying versions of two useful functions adapted from the standard library. The first function is strcpy(s,t), which copies the string t to the string s. It would be nice just to say s = t but this copies the pointer, not the characters. To copy the characters, we need a loop. The array version is first:
/* strcpy: copy t to s; array subscript version */
void strcpy(char *s, char *t)
{
int i;
i = 0;
while((s[i] = t[i]) != '\0')
i ++;
}
For contrast, here is a version of strcpy with pointers:
/* strcpy: copy t to s; pointer version 1 */
void strcpy(char *s, char *t)
{
while((*s = *t) != '\0')
{
s ++;
t ++;
}
}
Because arguments are passed by value, strcpy can use the parameters s and t in any way it pleases. Here they are conveniently initialized pointers, which are marched along the arrays a character at a time, until the '\0' that terminates t has been copied to s.
In practice, strcpy would not be written as we showed it above. Experienced C programmers would prefer
/* strcpy: copy t to s; pointer version 2 */
void strcpy(char *s, char *t)
{
while((*s++ = *t++) != '\0')
;
}
This moves the increment of s and t into the test part of the loop. The value of *t++ is the character that t pointed to before t was incremented; the postfix ++ doesn't change t until after this character has been fetched. In the same way, the character is stored into the old s position before s is incremented. This character is also the value that is compared against '\0' to control the loop. The net effect is that characters are copied from t to s, up to and including the terminating '\0'.
As the final abbreviation, observe that a comparison against '\0' is redundant, since the question is merely whether the expression is zero. So the function would likely be written as
/* strcpy: copy t to s; pointer version 3 */
void strcpy(char *s, char *t)
{
while(*s++ = *t++);
}
Although this may seem cryptic at first sight, the notational convenience is considerable, and the idiom should be mastered, because you will see it frequently in C programs.
The strcpy in the standard library (<string.h>) returns the target string as its function value.
This is the end of the relevant parts of this section.
PS: If you enjoyed reading this, consider buying a copy of K&R - it is not expensive.
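As a follow-up to the remark about the library version returning the target string, here is a sketch of how the idiom can be combined with a saved start pointer. This is not the book's code: copy_string is a hypothetical name, chosen so it does not clash with the library's strcpy, and it assumes dest has room for src plus its terminator.

```c
/* copy_string is a hypothetical name, chosen so it does not clash with the
   library's strcpy. Assumption: dest has room for src plus its terminator. */
char *copy_string(char *dest, const char *src)
{
    char *start = dest;            /* remember where the destination begins */
    while ((*dest++ = *src++) != '\0')
        ;
    return start;
}
```

Because the start is returned, the call can be used directly as an argument, e.g. to printf with %s.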
It works by copying characters from the string pointed to by 't' into the string pointed to by 's'. For each character copied, both pointers are incremented. The loop terminates when it finds a NUL character (equal to zero, hence the exit).
HINTS:
What does the operator '=' do?
What is the value of the expression "a = b"? Eg: if you do "c = a = b" what value does c get?
What terminates a C string? Does it evaluate true or false?
In "*s++", which operator has higher precedence?
ADVICE:
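Use strncpy() instead, so the copy is bounded by the size of the destination. Note that strncpy() does not add a terminating NUL when the source is too long, so add it yourself.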
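A sketch of what a bounded copy along those lines could look like. bounded_copy is a hypothetical helper, and silently truncating an over-long source is an assumption of this example; real code may prefer to report an error instead.

```c
#include <string.h>

/* bounded_copy is a hypothetical helper. It copies at most size - 1 characters
   and always terminates dest, which strncpy alone does not guarantee. */
char *bounded_copy(char *dest, size_t size, const char *src)
{
    if (size == 0)
        return dest;               /* no room even for the terminator */
    strncpy(dest, src, size - 1);
    dest[size - 1] = '\0';
    return dest;
}
```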
It copies a string because arrays are effectively passed by reference (the function receives a pointer to the first element), and a string is just a char array. Basically, what is happening is (if I remember the term correctly) pointer arithmetic. Here's a bit more information from Wikipedia on C arrays.
You are storing the value that was dereferenced from t in s and then moving to the next index via the ++.
Say you have something like this:
char *someString = "Hello, World!";
someString points to the first character in the string - in this case 'H'.
Now, if you increment the pointer by one:
someString++
someString will now point to 'e'.
while ( *someString++ );
will loop until whatever someString points at is the null character ('\0'), which is what signals the end of a string ("null-terminated").
And the code:
while (*s++ = *t++);
is equal to:
while ( *t != '\0' ) { // While whatever t points to isn't the null character
    *s = *t; // copy whatever t points to into s
    s++;
    t++;
}
*s = *t; // finally, copy the terminating null character as well
Yes, it does have to do with pointers.
The way to read the code is this: "the value pointed to by the pointer s (which gets incremented after this operation) gets the value pointed to by the pointer t (which also gets incremented after this operation); the whole operation evaluates to the character copied; repeat the operation until that value equals zero". Since the string null terminator is the character value zero ('\0'), the loop iterates until a whole string has been copied from the location pointed to by t to the location pointed to by s.
Many adherents of the C language are convinced that "while (*s++ = *t++)" is a thing of genuine grace.
Three side effects are packed into the condition of the while loop (advancing one pointer, advancing the second pointer, and the assignment).
As a result, the body of the loop is empty, since all the functionality is placed in the condition.
use for with int i:
char t[]="I am a programmer",s[20];
int i; // declared outside the loop because it is used after it
for(i=0;*(t+i)!='\0';i++)
    *(s+i)=*(t+i);
*(s+i)=*(t+i); //the last char in t '\0'
printf("t is:%s\n",t);
printf("s is:%s\n",s);
use for with pointer++:
char t[]="I am a programmer",s[20];
char *p1,*p2;
p1=t,p2=s;
for(;*p1!='\0';p1++,p2++)
*p2 = *p1;
*p2 = *p1;
printf("t is:%s\n",t);
printf("s is:%s\n",s);
use while with pointer++:
char t[]="I am a programmer",s[20];
char *p1,*p2;
p1=t,p2=s;
while(*p2++=*p1++);
printf("t is:%s\n",t);
printf("s is:%s\n",s);
printf("t is:%s\n",p1-18);
printf("s is:%s\n",p2-18);
use array to initialize pointers:
char a[20],*t="I am a programmer",*s;
s=a;
while(*s++=*t++);
printf("t is:%s\n",t-18);
printf("s is:%s\n",s-18);
printf("s is:%s\n",a);
It starts a while loop.
*s = *t goes first: this assigns what t points at to what s points at, i.e. it copies a character from the t string to the s string.
The value being assigned is passed to the while condition. Any non-zero value is "true", so the loop continues; 0 is false, so it stops. And it just so happens that the end of a string is also zero.
s++ and t++ increment the pointers,
and it all starts again.
So it keeps assigning, looping, and moving the pointers until it hits a 0, which is the end of the string.
Yes this uses pointers, and also does all the work while evaluating the while condition. C allows conditional expressions to have side-effects.
The "*" operator dereferences pointers s and t.
The increment operator ("++") increments pointers s and t after the assignment.
The loop terminates on condition of a null character, which evaluates as false in C.
One additional comment.... this is not safe code, as it does nothing to ensure s has enough memory allocated.
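One way to address that, sketched under the assumption that allocating the destination is acceptable: size the buffer from the source before running the loop. duplicate_string is a hypothetical helper, and the caller is responsible for freeing the result.

```c
#include <stdlib.h>
#include <string.h>

/* duplicate_string is a hypothetical helper. It sizes the destination from the
   source before copying; the caller must free() the result. Returns NULL if
   the allocation fails. */
char *duplicate_string(const char *t)
{
    char *copy = malloc(strlen(t) + 1);   /* +1 for the terminator */
    if (copy == NULL)
        return NULL;
    char *s = copy;
    while ((*s++ = *t++))
        ;
    return copy;
}
```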
The question I provided the following answer on was closed as a duplicate of this question, so I am copying the relevant part of the answer here.
The actual semantic explanation of the while loop would be something like:
for (;;) {
char *olds = s; // original s in olds
char *oldt = t; // original t in oldt
char c = *oldt; // original *t in c
s += 1; // complete post increment of s
t += 1; // complete post increment of t
*olds = c; // copy character c into *olds
if (c) continue; // continue if c is not 0
break; // otherwise loop ends
}
The order that s and t are saved, and the order that s and t are incremented may be interchanged. The save of *oldt to c can occur any time after oldt is saved and before c is used. The assignment of c to *olds can occur any time after c and olds are saved. On the back of my envelope, this works out to at least 40 different interpretations.
Well, this works only for char strings, because they end in \0. If there is no \0, or if the data is an integer array with no zero element, the loop runs past the end and the program will likely crash, because it reaches addresses that are not part of the array or of the memory the pointer refers to. If the memory beyond it happens to be valid, for example because it was allocated using malloc, the loop will simply keep going through that memory.