I have some doubts in basic C programming.
I have a char array and I have to copy it to a char pointer. So I did the following:
char a[] = {0x3f, 0x4d};
char *p = a;
printf("a = %s\n",a);
printf("p = %s\n",p);
unsigned char str[] = {0x3b, 0x4b};
unsigned char *pstr =str;
memcpy(pstr, str, sizeof str);
printf("str = %s\n",str);
printf("pstr = %s\n",pstr);
My printf statements for pstr and str get appended with the data "a".
If I remove memcpy I get junk. Can some C Guru enlighten me?
Firstly, C strings (the %s in printf) are expected to be NUL-terminated. You're missing the terminators. Try char a[] = {0x3f, 0x4d, 0} (same goes for str).
Secondly, pstr and str point to the same memory, so your memcpy is a no-op. This is a minor point compared to the first one.
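To make both points concrete, here is a minimal sketch. The helper names has_terminator and same_memory are made up for illustration; they are not standard functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first n bytes of buf contain a
   terminating zero byte, 0 otherwise. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Hypothetical helper: returns 1 if p and q refer to the same memory,
   in which case copying from one to the other changes nothing. */
int same_memory(const void *p, const void *q)
{
    return p == q;
}
```

For char a[] = {0x3f, 0x4d}; the call has_terminator(a, sizeof a) returns 0, so handing a to %s reads past the array. For char b[] = {0x3f, 0x4d, 0}; it returns 1 and b is a valid string. After unsigned char *pstr = str; the call same_memory(pstr, str) returns 1, which is why the memcpy changes nothing.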
Add a null terminator, because that's what printf expects:
char a[] = {0x3f, 0x4d, '\0'};
The standard way C strings are represented is that in memory, they are a sequence of non-zero bytes representing the characters, followed by a zero (NUL) byte. You should declare:
char a[] = {0x3f, 0x4d, 0};
When you assign a string pointer (as in unsigned char *pstr = str;) both pointers point to the same memory area, and thus the same characters. There is no need to copy the characters.
When you do need to copy characters, use strlen() to find the length of the string. The sizeof operator returns the number of bytes its argument uses in memory: sizeof applied to a pointer is the number of bytes the pointer itself uses, not the length of the string, and sizeof applied to an array is the size of the whole array. strlen() returns the number of characters before the terminating zero byte, so a copy needs strlen() + 1 bytes. Also, there are standard functions to copy C strings into a separate, sufficiently large buffer. You should rely on those to do the right thing:
strcpy(pstr, str);
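A real copy needs a destination that is distinct from the source and large enough for the terminator. The following is only a sketch; copy_string is a hypothetical helper name, and the caller is assumed to free() the result.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of the string src,
   or NULL if the allocation fails. The caller must free() the result. */
char *copy_string(const char *src)
{
    size_t bytes = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *dst = malloc(bytes);

    if (dst != NULL)
        memcpy(dst, src, bytes);      /* copies the terminator as well */
    return dst;
}
```

Unlike char *p = a;, the pointer returned here refers to its own block of memory, so changing the copy leaves the original untouched.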
printf's %s expects a 0-terminated string, and your strings aren't terminated. The uninitialized memory following your arrays may happen to start with a 0 byte, in which case your code will appear to be correct, but it still isn't.
You're declaring an array "str", then pointing to it with pstr. Note that you have no null-terminating character, so memcpy just copies the block onto itself, still without the null terminator that a string requires. Thus, printf can't find the end of the string and continues printing until it finds a 0 (or '\0' in character terms).
Agreed. You'll have to add a null byte at the end of your array of chars.
char a[] = {0x3f, 0x4d, '\0'};
The reason is that you're creating a string without declaring where it actually ends. memcpy() does not add a null byte for you: it copies exactly the number of bytes you ask for, and here it copies str onto itself. If the output looks right with memcpy() in place, that is only because a zero byte happens to follow the array in memory.
Without a terminator, printf never knows where the string ends, so it reads into subsequent memory addresses and prints whatever values are stored there. When you're creating a string out of characters, always remember to end it with a null byte.
I am actually supposed to dynamically store a string. I have tried the below,
It prints everything, but the output stops as soon as my input contains a space. Can someone explain why?
Also what is the right way to do it :
int i;
char *a;
a=(char *)malloc(sizeof(char));
scanf("%s",a);
for(i=0;*(a+i)!='\0';i++)
printf("%c",*(a+i));
It is printing everything but it terminating ...
Consider your memory allocation statements:
char *a;
a=(char *)malloc(sizeof(char));
By allocating only sizeof(char) bytes to the buffer a, then attempting to write anything more than the null terminator to it, you are invoking undefined behavior. (Note: sizeof(char) in C is by definition equal to 1, always)
C strings are defined as a null terminated character array. You have allocated only one byte. The only legal C string possible is one containing only the null termination byte. But your code attempts to write much more, and in so doing writes to memory locations that were never allocated to your buffer. So in general, when creating strings, follow two simple rules:
Determine the maximum length of the string you need.
Allocate memory for max length + 1 bytes to accommodate the termination byte.
Example: if the max string is x characters long, create the memory for x + 1 characters:
char inputStr[] = {"This string is x char long"};
char *string = malloc(strlen(inputStr) +1); //+1 for null byte
strcpy(string, inputStr);
Note, in C it is not recommended to cast the return of malloc() and family.
There are two problems with your code. Firstly, you only allocate enough space for 1 character, and since strings have to be NUL terminated, the longest string you could have is 0 characters long. Since you don't know how long the text you're going to read in will be, you could start with an arbitrary size (say 1024).
a=malloc(1024);
Secondly, scanf will only read up to the next space when you use "%s". It also isn't constrained by the available space in a. A better way to read in an entire line of text is to use fgets, like this:
fgets(a,1024,stdin);
This will read up to 1023 characters or up to and including the next newline character. It will NUL terminate the string for you as well.
You can then print it as a string.
printf("%s",a);
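Since fgets keeps the newline it reads, it is common to strip it afterwards. Below is a small sketch of that idea; read_line is a hypothetical helper name, and the buffer is assumed to be supplied by the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line (spaces included) from stream into
   buf, which has room for size bytes, and removes the trailing newline.
   Returns buf on success, or NULL on end of file or error. */
char *read_line(char *buf, int size, FILE *stream)
{
    if (size <= 0 || fgets(buf, size, stream) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if there is one */
    return buf;
}
```

Called as read_line(a, 1024, stdin) with the 1024-byte buffer from above, it never writes more than 1024 bytes into a, no matter how long the input line is.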
char *a;
/* Initial memory allocation */
a = (char *) malloc(1024); // your buffer size is 1024
// do something with a
free(a);
Copy the string below into your variable, then print it with "%s" as a string; there is no need to use "%c":
strcpy(a, "this is a string");
printf("String = %s", a);
Don't forget to call free(); if you don't, you will leak memory.
I've studied C programming at university for 4 months. My professor always said that strings don't really exist. Since I finished those 2 small courses, I really started programming (java). I can't remember WHY strings don't really exist. I wasn't concerned about this before, but I'm curious now. Why don't they exist? And do they exist in Java? I know it has to do something with that "under the hood strings are just characters", but does that mean that strings are all saved as multiple characters etc? And doesn't that take more memory?
a string type does not exist in C, but C strings do exist. They are defined as a null terminated character array. For example:
char buffer1[] = "this is a C string";//string literal
creates a C string that looks like this in memory:
|t|h|i|s| |i|s| |a| |C| |s|t|r|i|n|g|\0|?|?|?|
< string >
Note that this is not a string:
char *buffer2;
Until it contains a series of char terminated by a \0, it is just a pointer to char. (char *)
buffer2 = calloc(strlen(buffer1)+1, 1);
strcpy(buffer2, buffer1); //now buffer2 is pointing to a string
Edit:
(to address discussion in comments on strings:)
Based on the following definition: (From here)
Strings are actually one-dimensional array of characters terminated by
a null character '\0'.
First, since null termination is integral to a conversation about C strings, here are some clarifications:
The term NULL is a null pointer constant, typically defined as ((void*)0), or
just 0. It can be, and typically is, used to initialize pointer
variables.
The term '\0' is a character. In C, it means exactly the same
thing as the integer constant 0. (same value 0, same type
int). It is used to initialize char arrays.
Things that are strings:
char string[] = {'\0'}; //zero length or _empty_ string with `sizeof` 1.
In memory:
|\0|
...
char string[10] = {'\0'}; //also zero length or _empty_ with `sizeof` 10.
In memory:
|\0|\0|\0|\0|\0|\0|\0|\0|\0|\0|
...
char string[] = {"string"}; //string of length 6, and `sizeof` 7.
In memory:
|s|t|r|i|n|g|\0|
...
char strings[2][5] = {{0}}; //2 strings, each with zero length, and `sizeof` 5.
In memory:
|0|0|0|0|0|0|0|0|0|0| (note 0 equivalent to \0)
...
char *buf = {"string"};//string literal.
In memory:
|s|t|r|i|n|g|\0|
Things that are not strings:
char buf[6] = {"string"};//space for 6, but "string" requires 7 for null termination.
In Memory:
|s|t|r|i|n|g| //no null terminator
|end of space in memory.
...
char *buf = {0};//pointer to char (`char *`).
In memory:
|0| //null initiated pointer residing at address of `buf` (eg. 0x00123650)
Strings don't exist in C as a data type. There is int, char, float, etc., but no "string".
This means you can declare a variable as an int, but not as a "string" because there is no data type named "string" .
The closest C has to a string is an array of chars, or a char * to a section of memory. The actual string is up to the programmer to define, as a sequence of chars terminated with a \0, or a number of chars with a known upper bound.
I am practicing memory allocation using malloc() with pointers, and one thing I noticed about pointers puzzles me: why can strcpy() accept the str variable without *?
char *str;
str = (char *) malloc(15);
strcpy(str, "Hello");
printf("String = %s, Address = %p\n", str, (void *)str);
But with integers, we need * to give str a value.
int *str;
str = (int *) malloc(15);
*str = 10;
printf("Int = %d, Address = %p\n", *str, (void *)str);
it really confuses me why strcpy() accepts str, because in my own understanding, "Hello" will be passed to the memory location of str that will cause some errors.
In C, a string is (by definition) an array of characters. However (whether we realize it all the time or not) we almost always end up accessing arrays using pointers. So, although C does not have a true "string" type, for most practical purposes, the type pointer-to-char (i.e. char *) serves this purpose. Almost any function that accepts or returns a string will actually use a char *. That's why strlen() and strcpy() accept char *. That's why printf %s expects a char *. In all of these cases, what these functions need is a pointer to the first character of the string. (They then read the rest of the string sequentially, stopping when they find the terminating '\0' character.)
In these cases, you don't use an explicit * character. * would extract just the character pointed to (that is, the first character of the string), but you don't want to extract the first character, you want to hand the whole string (that is, a pointer to the whole string) to strcpy so it can do its job.
In your second example, you weren't working with a string at all. (The fact that you used a variable named str confused me for a moment.) You have a pointer to some ints, and you're working with the first int pointed to. Since you're directly accessing one of the things pointed to, that's why you do need the explicit * character.
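To see what "a pointer to the first character" buys a function, here is a sketch of how a strlen-like function might walk a string. my_strlen is a made-up name for illustration only; it is not how any particular library implements strlen().

```c
#include <stddef.h>

/* Hypothetical re-implementation of strlen(), for illustration only. */
size_t my_strlen(const char *s)
{
    const char *p = s;      /* p starts at the first character */

    while (*p != '\0')      /* *p extracts ONE character */
        p++;                /* step to the next character */
    return (size_t)(p - s); /* characters before the terminator */
}
```

The caller passes str (the pointer), and the function itself applies * as many times as it needs to. Writing my_strlen(*str) would hand over a single char, which is not what the function expects.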
The * is called indirection or dereference operator.
In your second code,
*str = 10;
assigns the value 10 to the memory address pointed by str. This is one value (i.e., a single variable).
OTOH, strcpy() copies the whole string in one call. It accepts two char * parameters, so you don't need the * to dereference anything when passing the arguments.
You can use the dereference operator, without strcpy(), copying element by element, like
char *str;
int i;
str = (char *) malloc(15); //success check TODO
int len = strlen("Hello"); //need string.h header
for (i = 0; i < len; i ++)
*(str+i)= "Hello"[i]; // the * form. as you wanted
str[i] = 0; //null termination
Many string manipulation functions, including strcpy, by convention and design, accept the pointer to the first character of the array, not the pointer to the whole array, even though their values are the same.
This is because their types are different; e.g. a pointer to char[10] has a different type from that of a pointer to char[15], and passing around the pointer to the whole array would be impossible or very clumsy because of this, unless you cast them everywhere or make different functions for different lengths.
For this reason, they have established a convention of passing around a string with the pointer to its first character, not to the whole array, possibly with its length when necessary. Many functions that operate on an array, such as memset, work the same way.
Well, here's what happens in the first snippet :
You first dynamically allocate 15 bytes of memory and store the address in the char pointer, which points to a sequence of 1-byte elements (a string).
Then you call strcpy(), which iterates over the string and copies the characters, byte by byte, into the newly allocated memory space. Each character is a number based on the ASCII table, e.g. character a = 97 (take a look at man ascii).
Then you pass this address to printf(), which reads the string byte by byte and writes it to your terminal.
In the second snippet, the process is the same: you are still allocating 15 bytes, but storing the address in an int * pointer. An int is typically a 4-byte data type.
When you do *str = 10, you are dereferencing the pointer to store the value 10 at the address pointed to by str. Recall what I wrote above: you could have done *str = 'a', and the integer at index 0 would have the value 97 when read as an int. You could even print it.
So why could strcpy() write through an int * (after a cast to char *)? Because it's just a memory area it can write to, byte by byte. You could store "Hell" in one int, then "o!" in the next one.
It's all about ease of use.
See, there is a difference between the = operator and the function strcpy.
* is the dereference operator. When you say *str, it means the value at the memory location pointed to by str.
Also as a good practice, use this
str = (char *) malloc( sizeof(char)*15 )
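This is because the size of a data type might differ between platforms, so use the sizeof operator to obtain its actual size (it is evaluated at compile time, not at run time). For char specifically, sizeof(char) is always 1, so here the multiplication mainly documents intent.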
As the title implies, my question is how to get the size of a string in C. Is it good to use sizeof if I've declared it (the string) in a function without malloc in it? Or, if I've declared it as a pointer? What if I initialized it with malloc? I would like to have an exhaustive response.
You can use strlen. The size is determined by the terminating null character, so the string you pass must be valid (null-terminated).
If you want to get the size of the memory buffer that contains your string, and you have a pointer to it:
If it is a dynamic array (created with malloc), it is impossible to get
its size, since the compiler doesn't know what the pointer is pointing at.
(check this)
If it is a static array, you can use sizeof to get its size.
If you are confused about the difference between dynamic and static arrays, check this.
Use strlen to get the length of a null-terminated string.
sizeof returns the size of the array, not the length of the string. If it's a pointer (char *s), not an array (char s[]), it won't work, since it will return the size of the pointer (usually 4 bytes on 32-bit systems and 8 bytes on 64-bit systems). An array passed to a function decays to a pointer, so inside the function you lose the ability to use sizeof to check the size of the array.
So, only if the string spans the entire array (e.g. char s[] = "stuff"), would using sizeof for a statically defined array return what you want (and be faster as it wouldn't need to loop through to find the null-terminator) (if the last character is a null-terminator, you will need to subtract 1). If it doesn't span the entire array, it won't return what you want.
An alternative to all this is actually storing the size of the string.
While sizeof works for this specific type of string:
char str[] = "content";
int charcount = sizeof str - 1; // -1 to exclude terminating '\0'
It does not work if str is a pointer (sizeof returns the size of the pointer, usually 4 or 8) or an array with a specified length (sizeof returns the byte count matching the specified length, which for char is the same as the element count).
Just use strlen().
If you use sizeof(), then char *str and char str[] give different answers. For char str[] initialized from a string literal, sizeof gives the length of the string including the string terminator, while for char *str it gives the size of the pointer (which depends on the platform).
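The answers above can be summed up in a short sketch. The names size_seen_by_callee and ARRAY_BYTES are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inside a function the parameter is only a pointer,
   so sizeof yields the size of a pointer, whatever array the caller had. */
size_t size_seen_by_callee(const char *s)
{
    return sizeof s;
}

/* Hypothetical macro: meaningful only when applied to a real array that is
   in scope, never to a pointer. */
#define ARRAY_BYTES(arr) (sizeof (arr))
```

For char str[] = "content"; ARRAY_BYTES(str) is 8 and strlen(str) is 7. For char big[32] = "content"; ARRAY_BYTES(big) is 32 while strlen(big) is still 7. In both cases size_seen_by_callee() returns sizeof(char *), which is why strlen() is the only one of the three that works everywhere.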
I like to use:
(strlen(string) + 1 ) * sizeof(char)
This will give you the buffer size in bytes. Using it together with snprintf() may help:
const char* message = "%s, World!";
const char* name = "Hello";
size_t size = (strlen(message) + strlen(name) + 1) * sizeof(char); // room for the formatted result
char* string = (char*)malloc(size);
snprintf(string, size, message, name);
Cheers! For reference, the prototype is: size_t strlen(const char *s)
There are two ways of finding the size of a string in bytes:
1st Solution:
# include <iostream>
# include <cctype>
# include <cstring>
using namespace std;
int main()
{
char str[] = {"A lonely day."};
cout<<"The string bytes for str[] is: "<<strlen(str);
return 0;
}
2nd Solution:
# include <iostream>
# include <cstring>
using namespace std;
int main()
{
char str[] = {"A lonely day."};
cout<<"The string bytes for str[] is: "<<sizeof(str);
return 0;
}
The two solutions produce different outputs. I will explain why after you read these.
The 1st solution uses strlen and based on cplusplus.com,
The length of a C string is determined by the terminating null-character: A C string is as long as the number of characters between the beginning of the string and the terminating null character (without including the terminating null character itself).
That explains why the 1st Solution prints the expected string length (13) while the 2nd Solution prints one byte more. But if you still don't understand, then continue reading.
The 2nd Solution uses sizeof to find the size of the string in bytes. Based on this SO answer (which I modified):
sizeof("f") must return 2: one byte for the 'f' and one for the terminating '\0' (terminating null-character).
That is why the output is 14 bytes: 13 for the characters of the string and one for '\0'.
Conclusion:
To get the correct answer for 2nd Solution, you must do sizeof(str)-1.
References:
Sizeof string literal
https://cplusplus.com/reference/cstring/strlen/?kw=strlen
I have never really done much C but am starting to play around with it. I am writing little snippets like the one below to try to understand the usage and behaviour of key constructs/functions in C. I wrote the one below to understand the difference between char* string and char string[], and how the lengths of strings work. Furthermore, I wanted to see if sprintf could be used to concatenate two strings and store the result in a third string.
What I discovered was that the third string I was using to store the concatenation of the other two had to be set with char string[] syntax or the binary would die with SIGSEGV (Address boundary error). Setting it using the array syntax required a size so I initially started by setting it to the combined size of the other two strings. This seemed to let me perform the concatenation well enough.
Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine.
My question is: Why does this happen, is it invalid or have risks/drawbacks? Furthermore, why is char str3[length3] valid but char str3[7] causes "SIGABRT (Abort)" when sprintf line tries to execute?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main() {
char* str1 = "Sup";
char* str2 = "Dood";
int length1 = strlen(str1);
int length2 = strlen(str2);
int length3 = length1 + length2;
char str3[length3];
//char str3[7];
printf("%s (length %d)\n", str1, length1); // Sup (length 3)
printf("%s (length %d)\n", str2, length2); // Dood (length 4)
printf("total length: %d\n", length3); // total length: 7
printf("str3 length: %d\n", (int)strlen(str3)); // str3 length: 6
sprintf(str3, "%s<-------------------->%s", str1, str2);
printf("%s\n", str3); // Sup<-------------------->Dood
printf("str3 length after sprintf: %d\n", // str3 length after sprintf: 29
(int)strlen(str3));
}
This line is wrong:
char str3[length3];
You're not taking the terminating zero into account. It should be:
char str3[length3+1];
You're also trying to get the length of str3, while it hasn't been set yet.
In addition, this line:
sprintf(str3, "%s<-------------------->%s", str1, str2);
will overflow the buffer you allocated for str3. Make sure you allocate enough space to hold the complete string, including the terminating zero.
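One way to get the size right is to add up everything that will be written, plus one byte for the terminator, and to let snprintf enforce that limit. This is only a sketch under those assumptions; join_strings is a hypothetical helper name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string holding
   left + sep + right, or NULL if the allocation fails.
   The caller must free() the result. */
char *join_strings(const char *left, const char *sep, const char *right)
{
    /* every character that will be written, +1 for the terminating '\0' */
    size_t bytes = strlen(left) + strlen(sep) + strlen(right) + 1;
    char *out = malloc(bytes);

    if (out == NULL)
        return NULL;
    /* snprintf never writes more than `bytes` bytes into out */
    snprintf(out, bytes, "%s%s%s", left, sep, right);
    return out;
}
```

With the strings from the question, join_strings(str1, "<-------------------->", str2) allocates room for all three parts instead of only for str1 and str2.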
void main() {
char* str1 = "Sup"; // a pointer to the statically allocated sequence of characters {'S', 'u', 'p', '\0' }
char* str2 = "Dood"; // a pointer to the statically allocated sequence of characters {'D', 'o', 'o', 'd', '\0' }
int length1 = strlen(str1); // the length of str1 without the terminating \0 == 3
int length2 = strlen(str2); // the length of str2 without the terminating \0 == 4
int length3 = length1 + length2;
char str3[length3]; // declare an array of 7 characters, uninitialized
So far so good. Now:
printf("str3 length: %d\n", (int)strlen(str3)); // What is the length of str3? str3 is uninitialized!
C is a primitive language. It doesn't have strings. What it does have is arrays and pointers. A string is a convention, not a datatype. By convention, people agree that "an array of chars is a string, and the string ends at the first null character". All the C string functions follow this convention, but it is a convention. It is simply assumed that you follow it, or the string functions will break.
So str3 is not a 7-character string. It is an array of 7 characters. If you pass it to a function which expects a string, then that function will look for a '\0' to find the end of the string. str3 was never initialized, so it contains random garbage. In your case, apparently, there was a '\0' after the 6th character so strlen returns 6, but that's not guaranteed. If it hadn't been there, then it would have read past the end of the array.
sprintf(str3, "%s<-------------------->%s", str1, str2);
And here it goes wrong again. You are trying to copy the string "Sup<-------------------->Dood\0" into an array of 7 characters. That won't fit. Of course the C function doesn't know this, it just copies past the end of the array. Undefined behavior, and will probably crash.
printf("%s\n", str3); // Sup<-------------------->Dood
And here you try to print the string stored at str3. printf is a string function. It doesn't care (or know) about the size of your array. It is given a string, and, like all other string functions, determines the length of the string by looking for a '\0'.
Instead of trying to learn C by trial and error, I suggest that you go to your local bookshop and buy an "introduction to C programming" book. You'll end up knowing the language a lot better that way.
There is nothing more dangerous than a programmer who half understands C!
What you have to understand is that C doesn't actually have strings, it has character arrays. Moreover, the character arrays don't have associated length information -- instead, string length is determined by iterating over the characters until a null byte is encountered. This implies, that every char array should be at least strlen + 1 characters in length.
C doesn't perform array bounds checking. This means that the functions you call blindly trust you to have allocated enough space for your strings. When that isn't the case, you may end up writing beyond the bounds of the memory you allocated for your string. For a stack allocated char array, you'll overwrite the values of local variables. For heap-allocated char arrays, you may write beyond the memory area of your application. In either case, the best case is you'll error out immediately, and the worst case is that things appear to be working, but actually aren't.
As for the assignment, you can't write something like this:
char *str;
sprintf(str, ...);
and expect it to work -- str is an uninitialized pointer, so the value is "not defined", which in practice means "garbage". Pointers are memory addresses, so an attempt to write to an uninitialized pointer is an attempt to write to a random memory location. Not a good idea. Instead, what you want to do is something like:
char *str = malloc(sizeof(char) * (n + 1)); // n = length of the string to store
which allocates n+1 characters worth of storage and stores the pointer to that storage in str. Of course, to be safe, you should check whether or not malloc returns null. And when you're done, you need to call free(str).
The reason your code works with the array syntax is because the array, being a local variable, is automatically allocated, so there's actually a free slice of memory there. That's (usually) not the case with an uninitialized pointer.
As for the question of how the size of a string can change, once you understand the bit about null bytes, it becomes obvious: all you need to do to change the size of a string is futz with the null byte. For example:
char str[] = "Foo bar";
str[1] = (char)0; // I'd use the character literal, but this editor won't let me
At this point, the length of the string as reported by strlen will be exactly 1. Or:
char str[] = "Foo bar";
str[7] = '!';
after which strlen has no terminator to find inside the array, so it keeps reading bytes beyond the array boundary. It might encounter a null byte and stop (returning the wrong string length, of course), or it might crash.
I've written all of one C program, so expect this answer to be inaccurate and incomplete in a number of ways, which will undoubtedly be pointed out in the comments. ;-)
Your str3 is too short - you need to add extra byte for null-terminator and the length of "<-------------------->" string literal.
Out of curiosity, though, I tried
expanding the "concatenated" string to
be longer than the size I had
allocated. Much to my surprise, it
still worked and the string size
increased and could be printf'd fine.
The behaviour is undefined so it may or may not segfault.
strlen returns the length of the string without the trailing NUL byte (\0, 0x00), but when you create a variable to hold the combined strings you need to add that 1 character.
char str3[length3 + 1];
…and you should be all set.
C strings are '\0' terminated and require an extra byte for that, so at the very least you need
char str3[length3 + 1]
which has room for the two strings plus the terminator.
In sprintf() you are writing beyond the space allocated for str3. This is undefined behavior, and anything may happen (if you are lucky, it will crash). strlen() just searches for a null character starting from the memory location you specified, and here it happens to find one at the 29th position. It could as well have been the 129th; the behavior is erratic.
A few important points:
Just because it works doesn't mean it's safe. Going past the end of a buffer is always unsafe, and even if it works on your computer, it may fail under a different OS, different compiler, or even a second run.
I suggest you think of a char array as a container and a string as an object that is stored inside the container. In this case, the container must be 1 character longer than the object it holds, since a "null character" is required to indicate the end of the object. The container is a fixed size, and the object can change size (by moving the null character).
The first null character in the array indicates the end of the string. The remainder of the array is unused.
You can store different things in a char array (such as a sequence of numbers). It just depends on how you use it. But string functions such as printf() or strcat() assume that there is a null-terminated string to be found there.
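The container/object picture can be sketched in a few lines. shorten is a hypothetical helper written for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shortens the string stored in s to at most n
   characters by moving the null character. The array that holds the
   string (the "container") keeps its size. */
void shorten(char *s, size_t n)
{
    if (strlen(s) > n)
        s[n] = '\0';
}
```

For char box[16] = "Foo bar"; sizeof box is 16 and strlen(box) is 7. After shorten(box, 3), strlen(box) is 3 and the string is "Foo", while sizeof box is still 16: the object changed size, the container did not.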