ANSI C - count the size of the string pointer

Is it possible to create string variables using pointers, so that I don't have to specify the size every time, as in char x[4] = "aaa"?
How can I get the size of such string?
And can I initialize an empty string with a pointer?

Remember that strings in C are ended by the null terminator character, written as \0. If you have a well-formed string stored in your pointer variable, you can therefore determine the length by searching for this character:
char *x = "hello"; // automatically appends a null terminator
int len = 0;
while (x[len] != '\0') {
len++;
}
If your variables are uninitialized or otherwise not well-formed (e.g., by being NULL) you obviously cannot take this approach; however, most functions are generally written under the assumption that the strings are well-formed, because this leads to faster code.
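As a rough sketch of the two attitudes described above, the following pair of helpers shows one function that assumes a well-formed string and one that tolerates NULL. Both names (length_unchecked, length_or_zero) are made up for illustration; neither can detect a non-NULL pointer that lacks a terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: assumes s points to a terminated string. The assert
 * documents that assumption and disappears when NDEBUG is defined. */
size_t length_unchecked(const char *s)
{
    size_t len = 0;

    assert(s != NULL);
    while (s[len] != '\0') {
        len++;
    }
    return len;
}

/* Hypothetical helper: treats NULL as an empty string. */
size_t length_or_zero(const char *s)
{
    return (s == NULL) ? 0 : length_unchecked(s);
}
```

In real code the standard strlen does the job of length_unchecked; the loop is spelled out here only to mirror the example above.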
If you wish to initialize a pointer, you have three options: NULL, a valid address (e.g., char *x = &someCharVar), or a string constant (e.g., char *x = "hello"). Note that if you use a string constant, writing through that pointer is undefined behavior; to get writable storage, re-assign the pointer to the address of a non-constant buffer, such as memory obtained from malloc:
// Get enough space for 24 characters plus null terminator
char *myString = (char*) malloc(25 * sizeof(char));
strcpy(myString, "some text"); // fill the new memory
fgets(myString, 25, stdin); // fill with keyboard input
Note that sizeof(char) is unnecessary here, since a char is always defined to be exactly 1 byte. However, it's a good habit to get into for when you're using other data types, and it helps make your code self-documenting by making your intentions very clear.
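The malloc snippet above leaves out error handling and cleanup. A minimal sketch of how that might look, assuming a made-up helper name (make_buffer) and the same 25-byte capacity:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { CAPACITY = 25 };  /* 24 characters plus the null terminator */

/* Hypothetical helper: allocates CAPACITY bytes and copies text into them,
 * truncating if text does not fit. Returns NULL if allocation fails.
 * The caller owns the result and must free() it. */
char *make_buffer(const char *text)
{
    char *buf;

    assert(text != NULL);
    buf = malloc(CAPACITY * sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }
    strncpy(buf, text, CAPACITY - 1);  /* copies at most 24 characters */
    buf[CAPACITY - 1] = '\0';          /* strncpy may not terminate */
    return buf;
}
```

A caller would check the result against NULL, use the buffer, and pass it to free when done.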

If you're initializing an array of char with a string literal, you don't need to specify the size:
char str[] = "This is a test";
This will create str as a 15-element array of char (the size is taken from the length of the initializer, including the 0 terminator) and copy the contents of the string literal to it.
A string literal is an array expression of type "N-element array of char" (const char in C++). Except when it is being used to initialize an array in a declaration (such as above) or is the operand of the sizeof or unary & operators, an expression of type "array of T" will be converted to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array.
If you write
const char *str = "This is a test";
the expression "This is a test" is converted from type "15-element array of char" to "pointer to char", and the value of the expression is the address of the first character, which is written to the variable str.
The behavior on attempting to modify the contents of a string literal is undefined; some platforms store string literals in read-only memory, some do not. Some map multiple occurrences of the same string literal to a single instance, others don't. It's best to always treat a string literal as unmodifiable, which is why I declared str as const char * instead of just char *.
To get the length of a string, use strlen:
char str[] = "This is a test"; // or const char *str = "This is a test";
size_t len = strlen(str); // or strlen("This is a test");
This will return the number of characters in the string up to (but not including) the 0 terminator; strlen("This is a test") will return 14.
To get the size of the buffer containing the string, you would use the sizeof operator:
char str[] = "This is a test";
size_t len = sizeof str; // or sizeof "This is a test"
Note that this won't give you the size of the buffer if you declared str as a pointer, such as
const char *str = "This is a test";
In that case, sizeof str; only gives you the size of a char *, not the string it points to.
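The three quantities discussed above can be put side by side. This is only an illustrative sketch; the function names are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Size of an array initialized from the literal:
 * 14 characters plus the terminator. */
size_t array_bytes(void)
{
    char str[] = "This is a test";
    return sizeof str;
}

/* Size of a pointer variable that points at the same literal.
 * The value depends on the platform, not on the string. */
size_t pointer_bytes(void)
{
    const char *str = "This is a test";
    return sizeof str;
}

/* Number of characters before the terminator. */
size_t text_length(void)
{
    const char *str = "This is a test";
    return strlen(str);
}
```

Here array_bytes() yields 15 and text_length() yields 14, while pointer_bytes() yields sizeof(char *), commonly 4 or 8.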

Related

Using a string in CS50 library

Hi all, I have a question about passing a string to a function in C. I am using the CS50 library, and I know it passes a string as a char array (a char pointer to the start of the array), so passing is done by reference. My function receives the array as an argument and returns an array. When I change, for example, one element of the array inside the function, the change is reflected in the original string, as I expect. But if I assign a new string to the argument, the function returns another string and the original string is not changed. Can you explain the mechanics behind this behaviour?
#include <stdlib.h>
#include <cs50.h>
#include <stdio.h>
string test(string s);
int main(void)
{
string text = get_string("Text: ");
string new_text = test(text);
printf("newtext: %s\n %s\n", text, new_text);
printf("\n");
return 0;
}
string test(string s)
{
//s[0] = 'A';
s = "Bla";
return s;
}
The first version (with s[0] = 'A' uncommented) changes the first letter in both text and new_text, but the second version (s = "Bla") prints text unchanged and new_text as "Bla".
Thanks!
This is going to take a while.
Let's start with the basics. In C, a string is a sequence of character values including a 0-valued terminator. IOW, the string "hello" is represented as the sequence {'h', 'e', 'l', 'l', 'o', 0}. Strings are stored in arrays of char (or wchar_t for "wide" strings, which we won't talk about here). This includes string literals like "Bla" - they're stored in arrays of char such that they are available over the lifetime of the program.
Under most circumstances, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", so most of the time when we're dealing with strings we're actually dealing with expressions of type char *. However, this does not mean that an expression of type char * is a string - a char * may point to the first character of a string, or it may point to the first character in a sequence that isn't a string (no terminator), or it may point to a single character that isn't part of a larger sequence.
A char * may also point to the beginning of a dynamically allocated buffer that has been allocated by malloc, calloc, or realloc.
Another thing to note is that the [] subscript operator is defined in terms of pointer arithmetic - the expression a[i] is defined as *(a + i) - given an address value a (converted from an array type as described above), offset i elements (not bytes) from that address and dereference the result.
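A small sketch of that equivalence, using invented function names. The int version is included to show that the offset counts elements rather than bytes.

```c
#include <stddef.h>

/* Both functions return the character i elements past s; they are
 * equivalent by the definition of the subscript operator. */
char by_subscript(const char *s, size_t i)
{
    return s[i];
}

char by_arithmetic(const char *s, size_t i)
{
    return *(s + i);
}

/* a + 2 advances by two ints, not by two bytes. */
int third_element(const int *a)
{
    return *(a + 2);
}
```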
Another important thing to note is that the = is not defined to copy the contents of one array to another. In fact, an array expression cannot be the target of an = operator.
The CS50 string type is actually a typedef (alias) for the type char *. The get_string() function performs a lot of magic behind the scenes to dynamically allocate and manage the memory for the string contents, and makes string processing in C look much higher level than it really is. I and several other people consider this a bad way to teach C, at least with respect to strings. Don't get me wrong, it's an extremely useful utility, it's just that once you don't have cs50.h available and have to start doing your own string processing, you're going to be at sea for a while.
So, what does all that nonsense have to do with your code? Specifically, the line
s = "Bla";
What's happening is that instead of copying the contents of the string literal "Bla" to the memory that s points to, the address of the string literal is being written to s, overwriting the previous pointer value. You cannot use the = operator to copy the contents of one string to another; instead, you'll have to use a library function like strcpy (making sure the destination buffer is large enough to hold the new contents):
strcpy( s, "Bla" );
The reason s[0] = 'A' worked as you expected is because the subscript operator [] is defined in terms of pointer arithmetic. The expression a[i] is evaluated as *(a + i) - given an address a (either a pointer, or an array expression that has "decayed" to a pointer as described above), offset i elements (not bytes!) from that address and dereference the result. So s[0] designates the first element of the string you read in.
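The three cases can be sketched without the CS50 header, using plain char * in place of string. The function names below are made up for the illustration.

```c
#include <stddef.h>
#include <string.h>

/* Writes through the pointer: the caller's buffer changes.
 * Assumes s points to a non-empty, writable string. */
void set_first(char *s)
{
    s[0] = 'A';
}

/* Reassigns the local copy of the pointer: the caller's buffer and the
 * caller's own pointer are untouched. */
const char *reassign(const char *s)
{
    s = "Bla";
    return s;
}

/* Copies new contents into the caller's buffer if it has room for
 * "Bla" plus the terminator. Returns 1 on success, 0 otherwise. */
int overwrite(char *s, size_t capacity)
{
    if (capacity < sizeof "Bla") {
        return 0;
    }
    strcpy(s, "Bla");
    return 1;
}
```

Given char text[] = "hello", calling set_first(text) leaves "Aello" in text, reassign(text) returns a pointer to "Bla" without touching text, and overwrite(text, sizeof text) changes text itself to "Bla".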
This is difficult to answer correctly without a code example. I will make one but it might not match what you are doing.
Let's take this C function:
char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
s[4] = 'X';
}
}
return s;
}
That function accepts a pointer to a character array. If the pointer is not NULL and the zero-terminated string is longer than 4 characters, it replaces the fifth character (index 4) with an 'X'. C has no reference types; what is loosely called passing by reference is done with pointers. You access the pointed-at value with the dereference operator, *p, or with array syntax such as p[0].
Now, this function:
char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
char *new_s = malloc(len+1);
strcpy(new_s, s);
new_s[4] = 'X';
return new_s;
}
}
s = malloc(1);
s[0] = '\0';
return s;
}
That function returns a pointer to a newly allocated copy of the original character array, or a newly allocated empty string. (By doing that, the caller can always print it out and call free on the result.)
It does not change the original character array because new_s does not point to the original character array.
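The copying version above does not check whether malloc succeeded. One possible variant with those checks, under a different, made-up name (edit_copy) so as not to suggest it is the answer's own code:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of s with index 4 replaced by 'X' when s
 * is longer than 4 characters, otherwise a newly allocated empty string.
 * Returns NULL if allocation fails. The caller must free() the result. */
char *edit_copy(const char *s)
{
    char *out;
    size_t len = (s != NULL) ? strlen(s) : 0;

    if (len > 4) {
        out = malloc(len + 1);
        if (out != NULL) {
            memcpy(out, s, len + 1);  /* copies the terminator too */
            out[4] = 'X';
        }
        return out;
    }
    out = malloc(1);
    if (out != NULL) {
        out[0] = '\0';
    }
    return out;
}
```

Taking a const char * parameter makes it visible in the signature that the original array is never modified.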
Now you could also do this:
const char* edit_string(char *s) {
if(s) {
size_t len = strlen(s);
if(len > 4) {
return "string was longer than 4";
}
}
s = "string was not longer than 4";
return s;
}
Notice that I changed the return type to const char * because a string literal like "string was longer than 4" must be treated as constant. Trying to modify it is undefined behavior and may crash the program.
Doing an assignment to s inside the function does not change the character array that s used to point to. The pointer s points to or references the original character array and then after s = "string" it points to the character array "string".

The array expression is not converted into a pointer when a string literal is used to initialize an array of characters

From C in a Nutshell:
In most cases, the compiler implicitly converts an expression
with an array type, such as the name of an array, into a pointer to
the array’s first element.
The array expression is not converted into a pointer only in the
following cases:
• When the array is the operand of the sizeof operator
• When the array is the operand of the address operator &
• When a string literal is used to initialize an array of char ,
wchar_t , char16_t , or char32_t
Could you explain what the last bullet means with some positive and
negative examples? I don't find an example in the book for the last
bullet.
Also why is an array of characters, not other element types?
char *ptr = "Hello OP!!";
ptr is a pointer to the first char of the string literal, which is typically stored in a read-only data segment (.rodata). Through ptr you may read the characters, but you must not write to them: modifying a string literal is undefined behavior.
char arr[] = "Hello OP!! How are you my friend?";
In this case:
Space is allocated for the array arr, sized to hold the string literal including its terminating zero.
The string literal is copied into that space.
Here arr names the memory into which the string literal is copied.
You can both read and write the elements of arr.
And now answering the question
sizeof applied to an array yields the size in bytes of all its elements. If the array were converted to a pointer first, the result would be the size of the pointer, which is obviously wrong in this case.
An array is just a contiguous region of memory holding all its elements, so the address of the array is always the address of that region.
The third case I have explained above.
You can see the code here:
https://godbolt.org/g/xVL5cR
** Note to TIM ** String literals are not converted to anything. A string literal is simply stored as a char (wchar_t, ...) array with a NUL (not NULL) terminator at the end, typically in read-only memory.
Why is an array of characters, not other element types?
It's because string literals have static storage duration, and thus exist in memory for the life of the program.
Attempting to modify a string literal (through a pointer to it) results in undefined behavior: literals may be stored in read-only storage (such as .rodata) or combined with other string literals.
No other kind of constant is stored this way, which is why the rule mentions only arrays of characters initialized from string literals.
Could you explain what the last bullet means with some positive and
negative examples? I don't find an example in the book for the last
bullet.
String literal initialization looks like this:
char s[] = "Hello world!"; // initializes a char[]
wchar_t ws[] = L"Hello world!"; // initializes a wchar_t[] (needs <stddef.h> or <wchar.h>)
char s8[] = u8"Hello world!"; // initializes a char[]
char16_t s16[] = u"Hello world!"; // initializes a char16_t[] (needs <uchar.h>)
char32_t s32[] = U"Hello world!"; // initializes a char32_t[] (needs <uchar.h>)
The characters of the string literal are copied into the array. The literal itself has static storage duration, whereas the array has its own storage (automatic, if it is declared inside a function), so it is possible to modify it.
While
char ptr[] = {'H','e','l','l','o',' ','w','o','r','l','d','\0'};
is not a string literal and does not have static storage duration.
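To make the two initialization forms concrete, here is a small sketch with invented function names. It checks that a literal initializer and an element-by-element initializer produce the same array, that the array is writable, and that a wide literal sizes a wchar_t array the same way.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 when an array initialized from a string literal has the same
 * size and contents as one initialized element by element. */
int same_initialization(void)
{
    char from_literal[] = "Hi!";
    char from_braces[] = {'H', 'i', '!', '\0'};

    return sizeof from_literal == sizeof from_braces
        && memcmp(from_literal, from_braces, sizeof from_literal) == 0;
}

/* The array holds its own copy of the characters, so writing is fine. */
char first_after_edit(void)
{
    char text[] = "Hello";

    text[0] = 'J';
    return text[0];
}

/* Element count of a wchar_t array initialized from a wide literal:
 * three characters plus the terminator. */
size_t wide_elements(void)
{
    wchar_t wide[] = L"Hi!";

    return sizeof wide / sizeof wide[0];
}
```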

Getting wrong string length

I am trying to get the length of a string, but I am getting the wrong value: it says the string is only 4 characters long. Why is this? Am I using sizeof() correctly?
#include <stdio.h>
int main(void)
{
char *s;
int len;
s = "hello world";
len = sizeof(s);
printf("%d\n", len);
}
The sizeof operator is returning the size of the pointer. If you want the length of a string, use the strlen function.
Even if you had an array (e.g. char s[] = "hello world") the sizeof operator would return the wrong value, as it would return the length of the array which includes the string terminator character.
Oh and as a side note, if you want a string pointer to point to literal string, you should declare it const char *, as string literals are constant and can't be modified.
You have declared s as a pointer. When applied to a pointer, sizeof() returns the size of the pointer, not the size of the element pointed to. On your system, the size of a pointer to char happens to be four bytes. So you will see 4 as your output.
In addition to strlen(), you can initialize an array of char with the string literal:
char s[] = "hello world";
In this case sizeof returns the size of the array in bytes: 12 here, including one extra byte for the \0 character at the end of the string.
sizeof is evaluated at compile time (except for variable-length arrays), so it costs nothing at run time, i.e. O(1).
strlen() has to scan the string for the terminator, so it is O(n).
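The difference between the two is easiest to see with an array that contains an embedded terminator. A short sketch, with function names chosen only for this example:

```c
#include <stddef.h>
#include <string.h>

/* sizeof reports the whole array: 'a', 'b', '\0', 'c', 'd', and the
 * terminator the compiler appends to the literal. */
size_t buffer_bytes(void)
{
    char s[] = "ab\0cd";
    return sizeof s;
}

/* strlen stops scanning at the first '\0' it meets. */
size_t visible_length(void)
{
    char s[] = "ab\0cd";
    return strlen(s);
}
```

buffer_bytes() yields 6 and visible_length() yields 2, which shows that sizeof measures storage while strlen measures string contents.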

C Program String Literals

When we do
char *p ="house";
p = 'm';
It's not allowed.
But when we do
char p[] = "house";
p[0] = 'm';
printf(p);
It gives O/P as : mouse
I am not able to understand how and where C does memory allocation for string literals?
char p[] = "house";
"house" is a string literal stored in a read only location, but, p is an array of chars placed on stack in which "house" is copied.
However, in char *p = "house";, p actually points to the read-only location which contains the string literal "house", thus modifying it is UB.
A note from the standard 6.7.8 Initialization
14 An array of character type may be initialized by a character string
literal, optionally enclosed in braces. Successive characters of the
character string literal (including the terminating null character if
there is room or if the array is of unknown size) initialize the
elements of the array.
So you basically have an array of characters. If you have used arrays of int, float and so on, it should not be puzzling that this array can be modified.
When you use char *p = "house", the compiler may collect all identical "house" literals and store them once, typically in read-only memory.
When you use char p[] = "house", the compiler creates space for the string as an array in the local scope.
The basic difference is that thousands of pointers could share the first one (which is why you must not modify it), while the second is local to its scope, so it is modifiable as long as it stays within the same size.
char *p = "house"; // const char* p = "house";
String literal "house" resides in read only location and cannot be modified. Now what you are doing is -
*p = 'm' ; // trying to modify read-only location; Missed the dereferencing part
Now,
char p[] = "house";
"house" is copied to the array p. So, its contents are modifiable. So, this actually works.
p[0] = 'm'; // assigning `m` at 0th index.

How to properly initialize a string

How would I go about defining the following string for the following function?
As of now I get the warning:
C4047: '=' : 'const char' differs in levels of indirection from 'char [4]'
and the error:
C2166: l-value specifies const object.
Both in the third line of the code below:
uint8_t *buffer= (uint8_t *) malloc(sizeof(uint32_t));
const char *stringaling= (const char *) malloc(sizeof(uint32_t));
*stringaling = "fun";
newval = protobuf_writeString (buffer, stringaling);
uint32_t protobuf_writeString(uint8_t *out,const char * str)
{
if (str == NULL)
{
out[0] = 0;
return 1;
}
else
{
size_t len = strlen (str);
size_t rv = uint32_pack (len, out);
memcpy (out + rv, str, len);
return rv + len;
}
}
const char *stringaling= (const char *) malloc(sizeof(uint32_t));
*stringaling = "fun";
This is not valid code. You are trying to assign through a pointer to const, which is illegal. You are also trying to assign an array of characters to a single character. And finally, even if you had a non-const array of characters of the right size, you still couldn't assign to it, because arrays are not first-class values in C.
Try using
char *stringaling = malloc(sizeof(uint32_t));
strcpy(stringaling, "fun");
...instead, and see if that doesn't work better. Note, however, that it's pretty much accidental that (at least usually) sizeof(uint32_t) happens to be the right size to hold "fun". You normally don't want to do that.
Alternatively, you may want:
char const *stringaling = "fun";
or:
char stringaling[] = "fun";
The assignment you had won't work though: C has only the most minimal support for strings built into the language; most operations (including copying a string) are normally done via library functions such as strcpy.
"fun" is a string literal, that is, an array of char that in this context is converted to a pointer to its first character (best treated as a const char *).
stringaling is also a const char *, so *stringaling is a const char; your third line is therefore trying to assign a pointer to a const char, which is not going to fly.
If it's a constant string, you can just do this:
const char *stringaling = "fun";
If your input string is dynamic, you can do this:
char *stringaling= (char *) malloc(strlen(inputString)+1);
strcpy(stringaling, inputString);
Obviously, if you malloc it, you need to free it, or feel the wrath of a memory leak.
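The allocate-then-copy pattern can be wrapped in a small helper. The sketch below uses a hypothetical name, dup_string, and adds the allocation check that the snippet above omits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src sized to fit it exactly,
 * or NULL if allocation fails. The caller must free() the result. */
char *dup_string(const char *src)
{
    size_t size;
    char *copy;

    assert(src != NULL);
    size = strlen(src) + 1;          /* +1 for the terminator */
    copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, src, size);     /* copies the terminator as well */
    }
    return copy;
}
```

Because the copy lives in memory obtained from malloc, it can be modified freely, unlike the literal it was copied from.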
If you really want to initialize the char *, you could write this instead:
const char *stringaling = "fun";
Without all of that, you can also simply write:
newval = protobuf_writeString (buffer, "fun" );
First problem:
const char *stringaling= (const char *) malloc(sizeof(uint32_t));
Several problems on this line.
First of all, you don't want to declare stringaling as const char *; you will not be able to modify whatever stringaling points to (IOW, *stringaling will not be writable). This matters since you want to copy the contents of another string to the location pointed to by stringaling. Drop the const keyword.
Secondly, malloc(sizeof(uint32_t)) just happens to allocate enough bytes (4) for this particular string, but it's not clear that you meant to allocate 4 bytes. When allocating memory for an array (and strings are arrays), explicitly indicate the number of elements you intend to allocate.
Finally, casting the result of malloc is considered bad practice in C. The cast will suppress a useful diagnostic message if you forget to include stdlib.h or otherwise don't have a prototype for malloc in scope. As of the 1989 standard, malloc returns void *, which can be assigned to any other object pointer type without needing to cast. This isn't true in C++, so a cast is required there, but if you're writing C++ you should be using new instead of malloc anyway.
So, change that line to read
char *stringaling = malloc(LEN); // or malloc(LEN * sizeof *stringaling), but
// in this case that's redundant since
// sizeof (char) == 1
where LEN is the number of chars you want to allocate.
The general form for a malloc call is
T *p = malloc (N * sizeof *p);
where T is the base type (int, char, float, struct ..., etc.), and N is the number of elements of type T you want to allocate. Since the type of the expression *p is T, sizeof *p == sizeof(T); if you ever change the type of p, you don't have to replicate that change in the malloc call itself.
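As an illustration of that general form with a type other than char, here is a sketch that allocates an array of int. The helper name make_sequence and the overflow guard are additions for the example, not part of the answer above.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n ints and sets each to its index.
 * Returns NULL if n is 0, if n * sizeof *p would overflow, or if
 * allocation fails. The caller must free() the result. */
int *make_sequence(size_t n)
{
    int *p;
    size_t i;

    if (n == 0 || n > (size_t)-1 / sizeof *p) {
        return NULL;
    }
    p = malloc(n * sizeof *p);   /* sizeof *p follows the type of p */
    if (p == NULL) {
        return NULL;
    }
    for (i = 0; i < n; i++) {
        p[i] = (int)i;
    }
    return p;
}
```

If p were later changed to, say, long *, the malloc call would not need editing, which is the point of writing sizeof *p rather than sizeof(int).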
Second problem:
*stringaling = "fun";
Again, there are several issues at play. First, you cannot assign string values using the = operator. String literals are array expressions, and in most contexts array expressions have their type implicitly converted ("decay") from "N-element array of T" to "pointer to T". Instead of copying the contents of the string literal, you would be simply assigning a pointer to the first character in the string.
Which would "work" (see below), except that you're dereferencing stringaling in the assignment; the type of the expression *stringaling is const char (char after making the change I indicated above), which is not compatible for assignment with type char *. If you drop the dereference operator and write
stringaling = "fun";
you'd fix the compile-time error, but now you have another problem; as mentioned above, you haven't copied the contents of the string literal "fun" to the memory block you allocated with malloc; instead, you've simply copied the address of the string literal to the variable stringaling. By doing so, you lose track of the dynamically-allocated block, causing a memory leak.
In order to copy the string contents from one place to another, you'll have to use a library function like strcpy or strncpy or memcpy, like so:
strcpy(stringaling, "fun");
If stringaling doesn't need to live on the heap (for example, you're only using it within a single function and deallocating it before returning), you could avoid memory management completely by declaring it as a regular array of char and initializing it with "fun":
char stringaling[] = "fun";
This is a special case of initializing an array in a declaration, not an assignment expression, so the = does copy the contents of the string literal to the stringaling array. This only works in an array declaration, however. You can later modify the array with other string values (up to 3 characters plus the 0 terminator), but you'd have to use strcpy again:
strcpy(stringaling, "one");
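Since strcpy does not know how large the destination array is, a bounds check has to come from the caller. One way to sketch it, assuming a made-up helper called replace_if_fits:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits in cap bytes,
 * terminator included. Returns 1 on success, 0 (dst untouched) otherwise. */
int replace_if_fits(char *dst, size_t cap, const char *src)
{
    size_t need;

    assert(dst != NULL && src != NULL);
    need = strlen(src) + 1;
    if (need > cap) {
        return 0;
    }
    memcpy(dst, src, need);
    return 1;
}
```

With char stringaling[] = "fun", the call replace_if_fits(stringaling, sizeof stringaling, "one") succeeds, while passing "four" is rejected because it needs 5 bytes and the array has 4.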
If you don't need to modify the contents of stringaling, you could just do
const char *stringaling = "fun";
This copies the address of the string literal "fun" to the variable stringaling. And since attempting to modify the contents of a string literal invokes undefined behavior, we do want to declare stringaling as const char * in this case; that will prevent you from accidentally modifying the string literal.
