Using a string in CS50 library - c

Hi all, I have a question about passing a string to a function in C. I am using the CS50 library, and I know it passes a string as a char array (a char pointer to the start of the array), so passing is done by reference. My function receives the array as an argument and returns an array. When I change, for example, one element of the array inside the function, the change is reflected in the original string, as I expect. But if I assign a new string to the argument, the function returns another string and the original string is not changed. Can you explain the mechanics behind this behaviour?
#include <stdlib.h>
#include <cs50.h>
#include <stdio.h>
string test(string s);
int main(void)
{
string text = get_string("Text: ");
string new_text = test(text);
printf("newtext: %s\n %s\n", text, new_text);
printf("\n");
return 0;
}
string test(string s)
{
//s[0] = 'A';
s = "Bla";
return s;
}
The first example (the commented-out line) changes the first letter in both text and new_text, but the second example prints text unchanged and new_text as "Bla".
Thanks!

This is going to take a while.
Let's start with the basics. In C, a string is a sequence of character values including a 0-valued terminator. IOW, the string "hello" is represented as the sequence {'h', 'e', 'l', 'l', 'o', 0}. Strings are stored in arrays of char (or wchar_t for "wide" strings, which we won't talk about here). This includes string literals like "Bla" - they're stored in arrays of char such that they are available over the lifetime of the program.
Under most circumstances, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", so most of the time when we're dealing with strings we're actually dealing with expressions of type char *. However, this does not mean that an expression of type char * is a string - a char * may point to the first character of a string, or it may point to the first character in a sequence that isn't a string (no terminator), or it may point to a single character that isn't part of a larger sequence.
A char * may also point to the beginning of a dynamically allocated buffer that has been allocated by malloc, calloc, or realloc.
Another thing to note is that the [] subscript operator is defined in terms of pointer arithmetic - the expression a[i] is defined as *(a + i) - given an address value a (converted from an array type as described above), offset i elements (not bytes) from that address and dereference the result.
Another important thing to note is that the = is not defined to copy the contents of one array to another. In fact, an array expression cannot be the target of an = operator.
The CS50 string type is actually a typedef (alias) for the type char *. The get_string() function performs a lot of magic behind the scenes to dynamically allocate and manage the memory for the string contents, and makes string processing in C look much higher level than it really is. I and several other people consider this a bad way to teach C, at least with respect to strings. Don't get me wrong, it's an extremely useful utility, it's just that once you don't have cs50.h available and have to start doing your own string processing, you're going to be at sea for a while.
So, what does all that nonsense have to do with your code? Specifically, the line
s = "Bla";
What's happening is that instead of copying the contents of the string literal "Bla" to the memory that s points to, the address of the string literal is being written to s, overwriting the previous pointer value. You cannot use the = operator to copy the contents of one string to another; instead, you'll have to use a library function like strcpy (and the buffer that s points to must be large enough to hold the copy, terminator included):
strcpy( s, "Bla" );
The reason s[0] = 'A' worked as you expected is that the subscript operator [] is defined in terms of pointer arithmetic. The expression a[i] is evaluated as *(a + i) - given an address a (either a pointer, or an array expression that has "decayed" to a pointer as described above), offset i elements (not bytes!) from that address and dereference the result. So s[0] designates the first element of the string you read in, and assigning to it writes into that original buffer.
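To make the difference concrete, here is a minimal sketch of the two cases. The function names rebind_local and overwrite_contents are made up for illustration, and the second one assumes the caller's buffer has room for at least 4 bytes.

```c
#include <string.h>

/* Rebinds only the function's local copy of the pointer;
   the caller's buffer is left untouched. */
const char *rebind_local(char *s)
{
    s = "Bla";              /* s now holds the address of the literal */
    return s;
}

/* Copies the characters into the caller's buffer.
   Assumption: the buffer holds at least 4 bytes ("Bla" plus terminator). */
char *overwrite_contents(char *s)
{
    strcpy(s, "Bla");
    return s;
}
```

Calling rebind_local on a buffer returns a pointer to a different array and leaves the buffer as it was, while overwrite_contents returns the same pointer it was given, now pointing at changed contents.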

This is difficult to answer correctly without a code example. I will make one but it might not match what you are doing.
Let's take this C function:
char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
s[4] = 'X';
}
}
return s;
}
That function accepts a pointer to a character array, and if the pointer is not NULL and the zero-terminated array is longer than 4 characters, it replaces the fifth character (at index 4) with an 'X'. There are no references in C; what plays that role is always called a pointer. You get access to a pointed-at value with the dereference operator *p, or with array syntax like p[0].
Now, this function:
char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
char *new_s = malloc(len+1);
strcpy(new_s, s);
new_s[4] = 'X';
return new_s;
}
}
s = malloc(1);
s[0] = '\0';
return s;
}
That function returns a pointer to a newly allocated copy of the original character array, or a newly allocated empty string. (By doing that, the caller can always print it out and call free on the result.)
It does not change the original character array because new_s does not point to the original character array.
Now you could also do this:
const char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
return "string was longer than 4";
}
}
s = "string was not longer than 4";
return s;
}
Notice that I changed the return type to const char* because a string literal like "string was longer than 4" must not be modified. Trying to modify it is undefined behavior and will typically crash the program.
Doing an assignment to s inside the function does not change the character array that s used to point to. The pointer s points to or references the original character array and then after s = "string" it points to the character array "string".
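If the goal were to make the caller's own pointer refer to a different string, the function would need the address of that pointer. The following is only a sketch with hypothetical names (point_local_elsewhere, point_caller_elsewhere), contrasting the two cases.

```c
/* Changes only the function's own copy of the pointer. */
void point_local_elsewhere(const char *s)
{
    s = "replacement";
    (void)s;                /* silence "value never used" warnings */
}

/* Changes the caller's pointer, because it receives that pointer's address. */
void point_caller_elsewhere(const char **s)
{
    *s = "replacement";
}
```

After point_local_elsewhere(p) the caller's p is unchanged; after point_caller_elsewhere(&p) it points at the literal "replacement". In neither case are the characters of the original array modified.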

Related

Pointer arithmetic in C when used as a target array for strcat()

When studying string manipulation in C, I've come across an effect that's not quite what I would have expected with strcat(). Take the following little program:
#include <stdio.h>
#include <string.h>
int main()
{
char string[20] = "abcde";
strcat(string + 1, "fghij");
printf("%s", string);
return 0;
}
I would expect this program to print out bcdefghij. My thinking was that in C, strings are arrays of characters, and the name of an array is a pointer to its first element, i.e., the element with index zero. So the variable string is a pointer to a. But if I calculate string + 1 and use that as the destination array for concatenation with strcat(), I get a pointer to a memory address that's one array element (1 * sizeof(char), in this case) away, and hence a pointer to the b. So my thinking was that the target destination is the array starting with b (and ending with the invisible null character), and to that the fghij is concatenated, giving me bcdefghij.
But that's not what I get - the output of the program is abcdefghij. It's the exact same output as I would get with strcat(string, "fghij"); - the addition of the 1 to string is ignored. I also get the same output with an addition of another number, e.g. strcat(string + 4, "fghij");, for that matter.
Can somebody explain to me why this is the case? My best guess is that it has to do with the binding precedence of the + operator, but I'm not sure about this.
Edit: I increased the size of the original array with char string[20] so that it will, in any case, be big enough to hold the concatenated string. Output is still the same, which I think means the array overflow is not key to my question.
You will get an output of abcdefghij, because your call to strcat hasn't changed the address of string (and nor can you change that – it's fixed for the duration of the block in which it is declared, just like the address of any other variable). What you are passing to strcat is the address of the second element of the string array: but that is still interpreted as the address of a nul-terminated string, to which the call appends the second (source) argument. Appending that second argument's content to string, string + 1 or string + n will produce the same result in the string array, so long as there is a nul-terminator at or after the n index.
To print the value of the string that you actually pass to the strcat call (i.e., starting from the 'b' character), you can save the return value of the call and print that:
#include <stdio.h>
#include <string.h>
int main()
{
char string[20] = "abcde";
char* result = strcat(string + 1, "fghij"); // strcat will return the "string + 1" pointer
printf("%s", result); // bcdefghij
return 0;
}
char string[] = "abcde";
strcat(string + 1, "fghij");
Append five characters to a full string array. Booom. Undefined behavior.
Adding an offset to a string array before passing it to strcat is, at most, a performance optimization: it tells strcat that the string is known to be at least that many characters long, so the search for the terminator starts further in.
You seem to believe that a string is a thing of its own and not an array, and strcat is doing something to its first argument. That's not how that works. Strings are arrays*; and strcat is modifying the array contents.
*Somebody's going to come by and claim that heap allocated strings are not arrays. OP is not dealing with heap yet.
Arrays are non-modifiable lvalues. For example you may not write
char string[20] = "abcde";
char string2[] = "fghij";
string = string2; // error: an array cannot be assigned to
Used in expressions arrays with rare exceptions are implicitly converted to pointers to their first elements.
If you will write for example string + 1 then the address of the array will not be changed.
In this call
strcat(string + 1, "fghij");
strcat searches for the terminating null character starting from the second element of the array string, and then overwrites the array from that terminator onward with the appended characters.
In this statement
printf("%s", string);
the whole string is output, starting from the first character of the array (again, the array designator used as an argument is converted to a pointer to its first element).
You could write for example
printf("%s", string + 1);
In this case the string is output starting from the second element of the array.
These are just two pointers to different parts of the same memory inside the same array. There is nothing in your code which creates a second array. "the name of an array is a pointer to its first element" well, not really, it decays into a pointer to its first element whenever used in an expression. So in case of string + 1, this decay first happens to the string operand and then you get pointer arithmetic afterwards. You can actually never do pointer arithmetic on array types, only on decayed pointers. Details here: Do pointers support "array style indexing"?
As for strcat, it basically does two things: call strlen on the destination string to find where it ends, then call strcpy to append the new string at the position where the null terminator was stored. A call strcat(dst, src) is the very same thing as typing strcpy(&dst[strlen(dst)], src);
Therefore it won't matter if you pass string + 1 or string, because in either case strcat will look for the null terminator and nothing else.
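The description above can be sketched as a tiny function. This is not the library's actual implementation, just an illustration of the behavior described; the name append_like_strcat is made up, and the destination is assumed to have enough room for the result.

```c
#include <string.h>

/* Behaves like strcat for illustration purposes:
   find the terminator of dst, copy src there, return dst.
   Assumption: dst has room for strlen(dst) + strlen(src) + 1 bytes. */
char *append_like_strcat(char *dst, const char *src)
{
    strcpy(dst + strlen(dst), src);
    return dst;
}
```

With char string[20] = "abcde";, calling append_like_strcat(string + 1, "fghij") returns string + 1 and leaves the array holding "abcdefghij", which matches what the question observed with strcat.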

c function definition calls for pointer but example does not use pointers

I am relatively new to low-level programming such as C. I am reviewing the strstr() function here. From the function definition char *strstr(const char *str1, const char *str2); I understand that the function will return a pointer or NULL, depending on whether str2 was found in str1.
What I can't understand, though, is this: if the function requires the two inputs to be pointers, why does the example not use pointers?
#include <stdio.h>
#include <string.h>
int main ()
{
char string[55] ="This is a test string for testing";
char *p;
p = strstr (string,"test");
if(p)
{
printf("string found\n" );
printf ("First occurrence of string \"test\" in \"%s\" is"\
" \"%s\"",string, p);
}
else printf("string not found\n" );
return 0;
}
In strstr(string,"test");, string is an array of 55 char. What strstr needs here is a pointer to the first element of string, which we can write as &string[0]. However, as a convenience, C automatically converts string to a pointer to its first element. So the desired value is passed to strstr, and it is a pointer.
This automatic conversion happens whenever an array is used in an expression and is not the operand of sizeof, is not the operand of unary &, and is not a string literal used to initialize an array.
"test" is a string literal. It causes the creation of an array of char initialized with the characters in the string, followed by a terminating null character. The string literal in source code represents that array. Since it is an array used in an expression, it too is converted to a pointer to its first element. So, again, the desired pointer is passed to strstr.
You could instead write &"test"[0], but that would confuse people who are not used to it.
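As a small check of the point above, the following sketch calls strstr once with the array names and once with explicit addresses of the first elements, and reports whether both calls found the same position. The function name decay_matches_explicit_address is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if passing the array name and passing &array[0]
   lead strstr to the same match, 0 otherwise. */
int decay_matches_explicit_address(void)
{
    char text[] = "This is a test string for testing";
    char *via_name = strstr(text, "test");               /* arrays decay */
    char *via_address = strstr(&text[0], &"test"[0]);    /* spelled out  */
    return via_name != NULL && via_name == via_address;
}
```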

Is there any way to get the correct size of a string using sizeof function with pointers in C

#include <stdio.h>
int main() {
char *s = "hello world!";
printf("%zu\n", sizeof(s)); // sizeof yields a size_t, so the format is %zu
}
I know it will return the size of the pointer. But I want to know whether there is any way to get the length of a string using the sizeof operator with pointers.
I don't want to declare s as an array (char s[]); please treat it as a pointer.
Please reply as soon as possible. I also don't want to use strlen().
Pointers do not keep the information whether they point to a single object or a first element of an array.
Consider the following example
char *p;
char c = 'A';
p = "Hello";
p = &c;
As you can see, the same pointer can point to the string literal "Hello" (its first character), which has the type char[6], and to the single character c.
Usually, when a character array is passed to a function, it is implicitly converted to a pointer to its first element. However, if you pass a pointer to the whole array, then you can use the sizeof operator to determine its size.
For example
#include <stdio.h>
void f( char (*p)[6] )
{
printf( "%zu\n", sizeof( *p ) );
}
int main( void )
{
f( &"Hello" );
}
But such an approach has a drawback: in the declaration of the function parameter you have to specify in advance the size of the array that is accepted by reference.
So the general approach is to declare the function with an additional parameter that specifies the size of the passed array, if the size needs to be known within the function.
Take into account that a string can be contained in a character array that has much more size than it is required to store the string. So to determine the size of a string itself you should use the standard string function strlen.
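The "additional size parameter" approach can be sketched as follows. The function name bounded_length is made up for this example; it counts characters up to the terminator without calling strlen and never reads past the capacity the caller passes in.

```c
#include <stddef.h>

/* Returns the number of characters before the first '\0',
   or capacity if no terminator is found within the buffer. */
size_t bounded_length(const char *buf, size_t capacity)
{
    size_t n = 0;
    while (n < capacity && buf[n] != '\0') {
        n++;
    }
    return n;
}
```

A caller that owns the array can pass sizeof on it, for example bounded_length(buf, sizeof buf) for char buf[32] = "hello";. Inside the function only the pointer and the explicit capacity are available, which is why the size has to travel as a separate argument.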
There's only one way:
#include <stdio.h>
int main()
{
char s[] = "hello world!";
printf("%s -> %zu\n", s, sizeof s - 1);
}
You need to declare s as an array, not as a pointer. The sizeof operator, applied to a pointer, gives you the size of a pointer variable, which is not the size of an array (and is not what you want).
As you see, I initialize the array in the declaration, so I don't need to put the size between the brackets (this doesn't work with an assignment statement; it must be a declaration with an initializer). Then I subtract one from the size of the array, because the sizeof operator gives me the size of the array (not the string length), and this includes the space for the \0 character at the end.
When you run it:
$ a.out
hello world! -> 12
$ _

Something I don't get about C strings

A few questions regarding C strings:
Are both char* and char[] pointers?
I've learned about pointers and I can tell that char* is a pointer, but why is it automatically a string and not just a char pointer that points to 1 char; why can it hold strings?
Why is it that, unlike with other pointers, when you assign a new value to a char* pointer you are actually allocating new space in memory to store the new value, instead of just replacing the value stored at the memory address the pointer is pointing at?
A pointer is not a string.
A string is an object having type array of char, with the additional property that the last element of the array is the null character '\0', which, in turn, is an int value (converted to char type) having the integer value 0. A string literal is, in addition, an object that must not be modified.
char* is a pointer, but char[] is not. The type char[] is not a "real" type, but an incomplete type. The C language is specified in such a way that, at the moment you define a concrete variable (object) having array of char type, the size of the array is determined in one way or another. Thus, no variable has type char[], because this is not a complete type for an object.
However, in most expressions an object having type array of N char is automatically converted to char *, that is, a pointer to char pointing to the initial element of the array.
On the other hand, this conversion is not always performed. For example, the sizeof operator gives different results for a char* than for an array of N chars. In the former case it gives the size of a pointer to char, and in the latter case it gives the value N, that is, the size of the array.
The behaviour is different when you declare function parameters as char* and char[]. Since the function cannot know the size of the array, both declarations are equivalent there: a parameter declared as char[] is adjusted to char*.
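A short sketch of the sizeof difference just described. The helper name size_seen_through_pointer is hypothetical; it shows that once an array has been passed to a function, only the size of the pointer is visible there.

```c
#include <stddef.h>

/* Inside the function the argument is just a pointer,
   so sizeof reports the size of the pointer itself. */
size_t size_seen_through_pointer(const char *p)
{
    return sizeof p;
}
```

For char a[10] = "abc";, the expression sizeof a in the caller is 10, whereas size_seen_through_pointer(a) equals sizeof(char *), whatever that happens to be on the platform.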
Actually, you are right here: char * is a pointer to just 1 character object. However, it can be used to access strings, as I will now explain. In point 1 I showed that strings are objects in memory having type array of N chars for some N. This value N is big enough to include the terminating null character (which every string in C is supposed to have).
So, what's the deal here?
The key point for understanding this issue is the concept of an object (in memory).
When you have a string or, more generally, an array of char, this means that you have figured out some manner to hold an array object in memory.
This object determines a portion of RAM memory that you can access safely, because C has assigned enough memory for it.
Thus, when you point to the first byte of this object with a char* variable, actually you have guaranteed access to all the adjacent elements to the "right" of that memory place, because those places are well defined by C as having the bytes of the array above.
Briefly: the adjacent (to the right) bytes of the byte pointed by a char* variable can be accessed, they are valid places to access, so the pointer can be "iterated" to walk through these bytes, up to the end of the string, without "risks", since all the bytes in an array are contiguous well defined positions in memory.
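The idea of "iterating" a char* through the adjacent bytes can be sketched like this. The function name walk_length is made up, and the code assumes p really does point into a null-terminated array; otherwise the walk would leave the object.

```c
#include <stddef.h>

/* Walks a pointer from the first character to the terminator
   and returns how many characters were stepped over.
   Assumption: p points to a null-terminated string. */
size_t walk_length(const char *p)
{
    const char *start = p;
    while (*p != '\0') {
        p++;                /* move to the adjacent char object */
    }
    return (size_t)(p - start);
}
```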
This is a complicated question, but it reveals that you do not yet understand the relationship between pointers, arrays, and string literals in C.
A pointer is just a variable pointing to a position in memory.
A pointer to char points to just 1 object having type char.
If the adjacent bytes of the pointed position correspond to an array of chars, they will be accessible by the pointer, so the pointer can "walk on" the memory bytes occupied by the array object.
A string literal is considered an array of char object, to which an ending byte with value 0 (the null character) is implicitly added.
In any case, an array of T object has a well defined "size".
A string literal has an additional property: it's a constant object.
Try to fit and gather these concepts in your mind to figure out what's going on.
And ask me for clarification.
ADDITIONAL REMARKS:
Consider the following piece of code:
#include <stdio.h>
int main(void)
{
char *s1 = "not modifiable";
char s2[] = "modifiable";
printf("%s ---- %s\n\n", s1, s2);
printf("Size of array s2: %d\n\n", (int)sizeof(s2));
s2[1] = '0', s2[3] = s2[5] = '1', s2[4] = '7',
s2[6] = '4', s2[7] = '8', s2[9] = '3';
printf("New value of s2: %s\n\n",s2);
//s1[0] = 'X'; // Attempting to modify s1
}
In the definition and initialization of s1 we have the string literal "not modifiable", which has constant content and constant address. Its address is assigned to the pointer s1 as initialization.
Any attempt to modify the bytes of the string is undefined behavior and will usually give some kind of error, because the array content is typically stored in read-only memory.
In the definition and initialization of s2, we have the string literal "modifiable", which has, again, constant content and constant address. However, what happens now is that, as part of the initialization, the content of the string is copied to the array of char s2. The size of the array s2 is not specified (the declaration char s2[] gives an incomplete type), but after initialization the size of the array is well determined and defined as the exact size of the copied string (plus 1 character used to hold the null character, or end-of-string mark).
So, the string literal "modifiable" is used to initialize the bytes of the array s2, which is modifiable.
One way to modify it is to change one character at a time, as the code above does.
For handier ways of modifying and assigning strings, use the functions declared in the standard header <string.h>.
char *s is a pointer, char s[] is an array of characters. Ex.
char *s = "hello";
char c[] = "world";
s = c; //Legal
c = s; //Illegal: an array cannot be assigned the address of some other string
char *s is not a string; it holds an address. Ex.
char c[] = "hello";
char *s = &c[3];
Assigning a pointer is not creating memory; you are pointing to memory. Ex.
char *s = "hello";
In this example when you type "hello" you are creating special memory to hold the string "hello" but that has nothing to do with the pointer, the pointer simply points to that spot.

Passing string through a function (C programming)

I have just started learning pointers, and after much adding and removing *s my code for converting an entered string to uppercase finally works..
#include <stdio.h>
#include <string.h>
char* upper(char *word);
int main()
{
char word[100];
printf("Enter a string: ");
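fgets(word, sizeof word, stdin); // gets() is unsafe and was removed in C11; fgets() keeps the trailing newline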
printf("\nThe uppercase equivalent is: %s\n",upper(word));
return 0;
}
char* upper(char *word)
{
int i;
for (i=0;i<strlen(word);i++) word[i]=(word[i]>96&&word[i]<123)?word[i]-32:word[i];
return word;
}
My question is, while calling the function I sent word which is a pointer itself, so in char* upper(char *word) why do I need to use *word?
Is it a pointer to a pointer? Also, is the char* there because the function returns a pointer to a character/string?
Please clarify how this works.
That's because the type you need here simply is "pointer to char", which is denoted as char *, the asterisk (*) is part of the type specification of the parameter. It's not a "pointer to pointer to char", that would be written as char **
Some additional remarks:
It seems you're confusing the dereference operator * (used to access the place a pointer points to) with the asterisk as a pointer sign in type specifications; you're not using a dereference operator anywhere in your code; you're only using the asterisk as part of the type specification! See these examples: to declare a variable as a pointer to char, you'd write:
char * a;
To assign a value to the space where a is pointing to (by using the dereference operator), you'd write:
*a = 'c';
An array (of char) is not exactly equal to a pointer (to char) (see also the question here). However, in most cases, an array (of char) can be converted to a (char) pointer.
Your function actually changes the outer char array (and passes back a pointer to it); not only will the uppercase version of what was entered be printed by printf, but the variable word of the main function will also be modified so that it holds the uppercase version of the entered word. Take good care that such a side effect is actually what you want. If you don't want the function to be able to modify the outside variable, you could write char* upper(char const *word) - but then you'd have to change your function definition as well, so that it doesn't directly modify the word variable, otherwise the compiler will complain.
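One possible shape for such a non-modifying version is sketched below. The name upper_copy is hypothetical, the caller is assumed to free the result, and the letter arithmetic assumes a character set such as ASCII in which 'a' to 'z' are contiguous.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated uppercase copy of word, or NULL if
   allocation fails. The input is left unchanged; the caller frees. */
char *upper_copy(const char *word)
{
    size_t len = strlen(word);
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    for (size_t i = 0; i <= len; i++) {     /* <= also copies the '\0' */
        char c = word[i];
        copy[i] = (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c;
    }
    return copy;
}
```

The trade-off is that the caller now owns a second buffer and has to release it with free, whereas the in-place version needs no extra memory.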
char upper(char c) would be a function that takes a character and returns a character. If you want to work with strings the convention is that strings are a sequence of characters terminated by a null character. You cannot pass the complete string to a function so you pass the pointer to the first character, therefore char *upper(char *s). A pointer to a pointer would have two * like in char **pp:
char *str = "my string";
char **ptr_to_ptr = &str;
char c = **ptr_to_ptr; // same as *str, same as str[0], 'm'
upper could also be implemented as void upper(char *str), but it is more convenient to have upper return the passed string. You made use of that in your sample when you printf the string that is returned by upper.
Just as a comment, you can optimize your upper function. You are calling strlen for every i. C strings are always null terminated, so you can replace your i < strlen(word) with word[i] != '\0' (or word[i] != 0). Also the code is better to read if you do not compare against 96 and 123 and subtract 32 but if you check against and calculate with 'a', 'z', 'A', 'Z' or whatever character you have in mind.
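Putting both suggestions together, a rewritten upper might look like the sketch below. It assumes a character set such as ASCII in which the lowercase and uppercase letters are each contiguous; the standard toupper function from <ctype.h> would be an alternative that avoids that assumption.

```c
/* Converts word to uppercase in place and returns it.
   The loop stops at the terminator instead of calling strlen each time. */
char *upper(char *word)
{
    for (int i = 0; word[i] != '\0'; i++) {
        if (word[i] >= 'a' && word[i] <= 'z') {
            word[i] = (char)(word[i] - 'a' + 'A');
        }
    }
    return word;
}
```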
Even though the parameter word is a pointer, the array word in main and the pointer word in the function refer to one and the same memory. When the argument is passed, only a copy of the pointer is handed to the function, not a copy of the characters it points to, so whatever operation is done through the pointer is done on the word that was entered. In the end the function returns that pointer, which is why the return type is written with a *.

Resources