allocating memory to pointer with exact character length - c

I am new to C and learning pointers. At the moment, what I know is that a pointer points to the memory address of whatever it points to.
My question: is this how you allocate memory for exactly the length of the entered string, or will it take 50 bytes?
Lets say they entered a title: hunger games
BOOL AddNewDVD(Database* data){
}

I am new to c and learning pointers
Pointers are tough for beginners. Make sure you get a solid foundation.
at the moment what I know is that pointer points to the memory address of whatever it points to.
Though that is in practice correct, that's not how I like to think of it. What you are describing is how pointers are typically implemented, not what they are conceptually. By confusing the implementation with the concept you set yourself up for writing bad code later that makes unwarranted assumptions. There is no requirement that a pointer be a number which is an address in a virtual memory system.
A better way to think of a pointer is not as an address, but rather:
A pointer to t is a value.
Applying the * operator to a pointer to t gives you a variable of type t.
Applying the & operator to a variable of type t gives you a pointer to t.
A variable of type t can fetch or store a value of type t.
An array is a set of variables each identified by an index.
If a pointer references the variable associated with index i in an array then p + x gives you a pointer that references the variable associated with index i + x.
Applying the [i] operator to a pointer is a shorthand for *(p+i).
That is, rather than thinking of a pointer as a number that refers to a location in memory, just think of it as something that you can force to give you a variable.
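The rules above can be sketched in a few lines of C. This is only an illustration; the function names store_through and element_at are made up for the example and do not come from any library.

```c
#include <stddef.h>

/* *p is a variable of type int, so it can be assigned to. */
void store_through(int *p, int v)
{
    *p = v;
}

/* p[i] is shorthand for *(p + i): start at the element p references
   and move i elements further along the array. */
int element_at(const int *p, size_t i)
{
    return *(p + i);
}
```

With int a[3] = {1, 2, 3}, calling store_through(&a[1], 9) changes a[1], and element_at(a + 1, 1) then yields a[2].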
is this how you allocates memory exactly the length of the scanned string or it will take 50 bytes?
char *title = malloc(50 * sizeof(char));
scanf(" %[^\n]s", title);
malloc(50*sizeof(char)) gives you an array of 50 chars.
title is a pointer to char.
When dereferenced, title will give you the variable associated with the first item in the array. (Item zero; remember, the index is the distance from the first item, and the first item has zero distance from the first item.)
scanf fills in the characters typed by the user into your array of 50 chars.
If they type in more than 49 chars (remembering that there will be a zero char placed at the end by convention) then arbitrarily bad things can happen.
As you correctly note, either you are wasting a lot of space or you are possibly overflowing the buffer. The solution is: don't use scanf for any production code. It is far too dangerous. Instead use fgets. See this question for more details:
How to use sscanf correctly and safely

You need to have a buffer to know how long the entered name is. This is your title, which can be filled with at most 49 chars. Then you compute len and see it is only 6 bytes long. You allocate string to have exactly this size + 1.
Of course you can then write the content of title to string, even if title is a 50 byte long buffer, and string only 7 byte long - copying of the content ends with the \0 termination char, and this is guaranteed to be inside capacity of string.
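A minimal sketch of that copy step, assuming the input has already been read into a larger buffer. The helper name copy_exact is hypothetical; string plays the role of the exactly sized destination described above.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated copy of buf that is exactly
   strlen(buf) + 1 bytes long, or NULL if malloc fails.
   The caller is responsible for calling free() on the result. */
char *copy_exact(const char *buf)
{
    size_t len = strlen(buf);        /* characters before the '\0' */
    char *string = malloc(len + 1);  /* + 1 for the terminator */
    if (string == NULL)
        return NULL;
    memcpy(string, buf, len + 1);    /* copies the '\0' as well */
    return string;
}
```

The 50-byte buffer can then be reused for the next input, while each stored title occupies only as many bytes as it needs.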

You cannot use scanf to determine the length of a string and then allocate memory for it. You need to either:
Ask the user the length of the string. Obviously, this is a poor choice.
Create a static buffer that is more than large enough and then create a dynamic string that is the exact length you need. The problem is, determining what the maximum length the string may be. fgets might be what you need. Consider the following code fragment:
#define MAX_STR_LEN (50)
char buf[MAX_STR_LEN] = {0};
char *str, *cPtr;
/* Get User Input */
printf("Enter a string, no longer than %d characters: ", MAX_STR_LEN);
fgets(buf, MAX_STR_LEN, stdin);
/* Remove Newline Character If Present */
cPtr = strstr(buf, "\n");
if(cPtr)
*cPtr = '\0';
/* Allocate Memory For Exact Length Of String */
str = malloc(strlen(buf) + 1);
strcpy(str, buf); /* also copies the terminating '\0', which strncpy(str, buf, strlen(buf)) would leave out */
/* Display Result */
printf("Your string is \"%s\"\n", str);

Related

About pointers and strcpy() in C

I am practicing memory allocation using malloc() with pointers, and one thing I noticed about pointers is this: why can strcpy() accept the str variable without *?
char *str;
str = (char *) malloc(15);
strcpy(str, "Hello");
printf("String = %s, Address = %u\n", str, str);
But with integers, we need * to give str a value.
int *str;
str = (int *) malloc(15);
*str = 10;
printf("Int = %d, Address = %u\n", *str, str);
it really confuses me why strcpy() accepts str, because in my own understanding, "Hello" will be passed to the memory location of str that will cause some errors.
In C, a string is (by definition) an array of characters. However (whether we realize it all the time or not) we almost always end up accessing arrays using pointers. So, although C does not have a true "string" type, for most practical purposes, the type pointer-to-char (i.e. char *) serves this purpose. Almost any function that accepts or returns a string will actually use a char *. That's why strlen() and strcpy() accept char *. That's why printf %s expects a char *. In all of these cases, what these functions need is a pointer to the first character of the string. (They then read the rest of the string sequentially, stopping when they find the terminating '\0' character.)
In these cases, you don't use an explicit * character. * would extract just the character pointed to (that is, the first character of the string), but you don't want to extract the first character, you want to hand the whole string (that is, a pointer to the whole string) to strcpy so it can do its job.
In your second example, you weren't working with a string at all. (The fact that you used a variable named str confused me for a moment.) You have a pointer to some ints, and you're working with the first int pointed to. Since you're directly accessing one of the things pointed to, that's why you do need the explicit * character.
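To make the difference concrete, here is a rough sketch of how a string function consumes the char * it is handed, next to what the explicit * gives you. Both function names are invented for this example.

```c
#include <stddef.h>

/* Walks the string starting at the first character and counts
   until the terminating '\0', which is how functions such as
   strlen() consume the char * they are given. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Dereferencing yields only the single char the pointer refers to. */
char first_char(const char *s)
{
    return *s;
}
```

count_chars("Hello") sees the whole string through the pointer, while first_char("Hello") only ever sees the 'H'.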
The * is called indirection or dereference operator.
In your second code,
*str = 10;
assigns the value 10 to the memory address pointed by str. This is one value (i.e., a single variable).
OTOH, strcpy() copies the whole string in one call. It accepts two pointer-to-char parameters, so you don't need the * to dereference the pointer when passing arguments.
You can use the dereference operator, without strcpy(), copying element by element, like
char *str;
int i;
str = (char *) malloc(15); //success check TODO
int len = strlen("Hello"); //need string.h header
for (i = 0; i < len; i ++)
    *(str+i)= "Hello"[i]; // the * form. as you wanted
str[i] = 0; //null termination
Many string manipulation functions, including strcpy, by convention and design, accept the pointer to the first character of the array, not the pointer to the whole array, even though their values are the same.
This is because their types are different; e.g. a pointer to char[10] has a different type from that of a pointer to char[15], and passing around the pointer to the whole array would be impossible or very clumsy because of this, unless you cast them everywhere or make different functions for different lengths.
For this reason, they have established a convention of passing around a string with the pointer to its first character, not to the whole array, possibly with its length when necessary. Many functions that operate on an array, such as memset, work the same way.
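The type difference can be seen in a small sketch. The two function names below are hypothetical; the first one can only ever be called with the address of a char[10], while the second works for arrays of any length.

```c
#include <stddef.h>

/* Accepts only a pointer to an array of exactly 10 chars. */
size_t size_through_array_pointer(char (*whole)[10])
{
    return sizeof *whole;   /* size of the whole array: 10 */
}

/* Accepts a pointer to the first char of an array of any length. */
size_t size_through_char_pointer(char *first)
{
    return sizeof *first;   /* size of one char: 1 */
}
```

Given char a[10], the calls are size_through_array_pointer(&a) and size_through_char_pointer(a). Both arguments hold the same address, but passing &a for a char[15] to the first function would not compile cleanly.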
Well, here's what happens in the first snippet :
You are first dynamically allocating 15 bytes of memory, storing this address to the char pointer, which is pointer to a 1-byte sequence of data (a string).
Then you call strcpy(), which iterates over the string and copy characters, byte per byte, into the newly allocated memory space. Each character is a number based on the ASCII table, eg. character a = 97 (take a look at man ascii).
Then you pass this address to printf() which reads from the string, byte per byte, then flush it to your terminal.
In the second snippet, the process is the same: you are still allocating 15 bytes and storing the address in an int * pointer. An int is typically a 4-byte data type.
When you do *str = 10, you are dereferencing the pointer to store the value 10 at the address pointed to by str. Recall what I wrote above: you could have done *str = 'a', and the integer at index 0 would have the value 97, even if you read it as an int. You can even print it if you like.
So why can strcpy() write into memory that an int * points to? Because it is just a memory space where it can write, byte by byte (you would need a cast to char * to pass it). With 4-byte ints, you could store "Hell" in one int, then "o!" in the next one.
It's just all about usage easiness.
See there is a difference between = operator and the function strcpy.
* is the dereference operator. When you say *str, it means the value at the memory location pointed to by str.
Also as a good practice, use this
str = (char *) malloc( sizeof(char)*15 );
This matters because the size of a data type can differ between platforms, so let the sizeof operator supply it; it is evaluated at compile time. (sizeof(char) itself is always 1, so the habit pays off for other types such as int.)

How do I Initialize C code while only using words

how do i Initialize my code if all im using are words and no numbers?
I have been trying to just use char * but it is saying that its still not initialized
char *Carson;
printf("Enter a name:\n");
scanf("%s",Name);
printf("%s Hello Carson\n", Carson);
You either have to allocate memory dynamically and assign it to Carson (see e.g. malloc), or make it an array. There's no way around it. And for that, the code must contain a number. The number could be input from the user though, so you won't have any actual numbers in the source.
Remember that in C all strings need an extra terminator character (added automatically by scanf) so remember to add space for it.
A solution without any number, I don't think this must be used for practical applications, just a hack
char Carson[sizeof(long long) * sizeof(long long)];
printf("Size = %zu\n", sizeof Carson);
printf("Enter a name:\n");
scanf("%s",Carson);
printf("%s Hello Carson\n", Carson);
On my system it creates a char array of 64 bytes = 8 * 8. The size of long long is 8 bytes on most systems, although its size depends on your compiler and operating system.
you might like to initialize Carson like this:
char *Carson = malloc(sizeof(char)*200);/* for 200 characters */
scanf adds the \0 terminator for you, but the buffer must have room for it. Also, do not forget to free the memory once you are done using it.
To initialize a variable at its definition, the simplest choice is a constant expression, that is, an expression whose value is known at compile time (this is required for variables with static storage duration).
For integer or floating-point types you can use arithmetic expressions involving only constant operands; the result is still a constant value that can be used in an initialization.
What you call "words" are better called "strings".
In C you are able to use strings that are constant at compile time, also called "string literals".
A string literal has to be indicated surrounded by quotes, like these examples:
"Hello world!"
"Peter & John"
"user#gmail.com"
and so on.
There are some rules that you need to remember: some special characters have "escape sequences" to be used inside a string literal.
Now you can use that string literals in order to initialize a char* variable:
char *name = "Mr. Smith";
char *city = "Amsterdam";
The result of the initialization gives a C string style, that is, an array of char object, whose length is the amount of quoted characters in the string literal, plus 1, because a null character is added at the end. Thus, in memory you have:
char *city ----> |A|m|s|t|e|r|d|a|m|\0|
Thus, city points to an array of 10 chars.
The last character, \0, means "null character", whose ASCII code is 0. Since it corresponds to a non-printable character, it has to be indicated with the escape sequence \0.
For more information, take a look on these websites:
Escape sequences in C
Storage of string literals
If you initialize a pointer to char with a string literal, the compiler reserves the memory automatically for you, so you don't need any malloc() at all.
However, you cannot modify the characters of such a string.
If you want to modify the characters, it is better to use an array of char:
char name[30] = "Schwarzenegger";
The array reserves 30 chars for the string literal "Schwarzenegger".
Only the first 14 are used for the string, plus 1 holding the null character attached at the end.
The remaining elements of the array are initialized to zero, and they are never printed anyway. (The standard library functions always stop processing the string when they encounter a null character.)
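A small sketch can check what the array actually holds. The function name unused_tail_is_zero is made up for this example; it relies on the C rule that array elements not covered by the initializer are set to zero.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if every element after the string (from its '\0' to the
   end of the array) is zero, 0 otherwise. */
int unused_tail_is_zero(void)
{
    char name[30] = "Schwarzenegger";
    size_t i;

    name[0] = 's';   /* allowed: the array is the program's own writable copy */

    for (i = strlen(name); i < sizeof name; i++)
        if (name[i] != '\0')
            return 0;
    return 1;
}
```

The assignment to name[0] is the part that would not be allowed through a char * initialized with the literal.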
EDITED More information.
About your particular error message: "lack of initialization", the problem is that in the definition of the pointer to char object:
char *name;
you only have a "pointer to" an undefined block of memory.
You have to specify the array of char that name will be point to.
If you initialize with a string literal, there is not any problem, because the address of the string literal is passed to name.
But, since you are planning to use name for data input by means of scanf(), you have to allocate enough memory. You can do that as other users have already explained in their answers, that is, by using malloc().
I think there is need to do changes in your code,
char *Carson = NULL;
Carson = (char *)malloc(sizeof(char)*256);
printf("Enter a name:\n");
scanf("%255s",Carson );
printf("%s Hello Carson\n", Carson);
In place of 256 you can use whatever value you want; keep the width in the scanf format one less than the buffer size.
let me know if it works.

pointer related queries

Guys i have few queries in pointers. Kindly help to resolve them
char a[]="this is an array of characters"; // declaration type 1
char *b="this is an array of characters";// declaration type 2
question.1 : what is the difference between these 2 types of declaration ?
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
question.2 : i didn't get how is it working
char *d=malloc(sizeof(char)); // 1)
scanf("%s",d); // 2)
printf("%s",d);// 3)
question.3 how many bytes are being allocated to the pointer c?
when i try to input a string, it takes just a word and not the whole string. why so ?
char c=malloc(sizeof(char)); // 4)
scanf("%c",c); // 5)
printf("%c",c);// 6)
question.4 when i try to input a character why does it throw a segmentation fault?
Thanks in advance.. Waiting for your reply guys..
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
the %s expects a pointer to array of chars.
char *c=malloc(sizeof(char)); // you are allocating only 1 byte aka char, not array of char!
scanf("%s",c); // you need pass a pointer to array, not a pointer to char
printf("%s",c);// you are printing a array of chars, but you are sending a char
you need do this:
int sizeofstring = 200; // max size of buffer
char *c = malloc(sizeof(char)*sizeofstring); //almost equals to declare char c[200]
scanf("%199s",c); // the width leaves room for the terminating '\0'
printf("%s",c);
question.3 how many bytes are being allocated to the pointer c? when i
try to input a string, it takes just a word and not the whole string.
why so ?
In your code, you are allocating only 1 byte because sizeof(char) is 1 byte (8 bits on most platforms). You need to allocate sizeof(char)*N, where N is your "string" size including the terminating '\0'.
char a[]="this is an array of characters"; // declaration type 1
char *b="this is an array of characters";// declaration type 2
Here you are declaring two variables, a and b, and initializing them. "this is an array of characters" is a string literal, which in C has type array of char. a has type array of char. In this specific case, the array does not get converted to a pointer, and a gets initialized with the array "this is an array of characters". b has type pointer to char, the array gets converted to a pointer, and b gets initialized with a pointer to the array "this is an array of characters".
printf("%s",*b); // gives a segmentation fault
printf("%s",b); // displays the string
In an expression, *b dereferences the pointer b, so it evaluates to the char pointed to by b, i.e. 't'. This is not an address (which is what "%s" is expecting), so you get undefined behavior, most probably a crash (but don't try to do this on embedded systems, you could get mysterious behaviour and corrupted data, which is worse than a crash). In the second case, %s expects a pointer to a char, gets it, and can proceed to do its thing.
char *d=malloc(sizeof(char)); // 1)
scanf("%s",d); // 2)
printf("%s",d);// 3)
In C, sizeof returns the size in bytes of an object (= region of storage). In C, a char is defined to be the same as a byte, which has at least 8 bits, but can have more (but some standards put additional restrictions, e.g: POSIX requires 8-bit bytes, i.e: octets). So, you are allocating 1 byte. When you call scanf(), it writes in the memory pointed to by d without restraint, overwriting everything in sight. scanf() allows maximum field widths, so:
Allocate more memory, at least enough for what you want + 1 terminating ASCII NUL.
Tell scanf() to stop, e.g: scanf("%19s") for a maximum 19 characters (you'll need 20 bytes to store that, counting the terminating ASCII NUL).
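Both points can be sketched together. The example below uses sscanf on an in-memory string instead of scanf on stdin, purely so it can be run without typing input; the %19s conversion behaves the same way in both. The function name read_word is hypothetical.

```c
#include <stdio.h>

/* Reads one whitespace-delimited word of at most 19 characters from
   input into out, which must have room for 20 bytes.
   Returns 1 on success, 0 if no word was found. */
int read_word(const char *input, char out[20])
{
    return sscanf(input, "%19s", out) == 1;
}
```

If the input word is longer than 19 characters, only the first 19 are stored, followed by the terminating NUL, so the 20-byte buffer is never overrun.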
And last (if markdown lets me):
char c=malloc(sizeof(char)); // 4)
scanf("%c",c); // 5)
printf("%c",c);// 6)
c is not a pointer, so you are trying to store an address where you shouldn't. In scanf, "%c" expects a pointer to char, which should point to an object (=region of storage) with enough space for the specified field width, 1 by default. Since c is not a pointer, the above may crash in some platforms (and cause worse things on others).
I see several problems in your code.
Question 1: The difference is:
a gets allocated in writable memory, the so-called data segment. Here you can read and write as much as you want. sizeof a is the length of the string plus 1, the so-called string terminator (just a null byte).
b, however, is just a pointer to a string which is located in the rodata. That means, in a data area which is read only. sizeof b is whatever is the pointer size on your system, maybe 4 or 8 on a PC or 2 on many embedded systems.
Question 2: The printf() format wants a pointer to a string. With *b, you dereference the pointer you have and give it the first byte of data, which is a t (ASCII 116). The callee, however, treats it as a pointer, dereferences it and BAM.
With b, however, everything goes fine, as it is exactly the right call.
Question 3: malloc(sizeof(char)) allocates exactly one byte. sizeof(char) is 1 by definition, so the call is effectively malloc(1). The input just takes a word because %s is defined that way.
Question 4:
char c=malloc(sizeof(char)); // 4)
should give you a warning: malloc() returns a pointer which you try to put into a char. ITYM char *...
As you continue, you give that pointer to scanf(), which receives e.g. instead of 0x80043214 a mere 0x14, interprets it as a pointer and BAM again.
The correct way would be
char * c=malloc(1024);
scanf("%1023s", c); /* one less than the buffer size: room for the '\0' */
printf("%s", c);
Why? Well, you want to read a string. 1 byte is too small, better allocate more.
In scanf() you should take care that you don't allow reading more than your buffer can hold - thus the limitation in the format specifier.
and on printing, you should use %s, because you want the whole string to be printed and not only the first character. (At least, I suppose so.)
Ad Q1: The first is an array of chars, and the name a refers to that array directly. sizeof(a) will return 31 here (strlen(a)+1). Trying to assign something to a (like a = b) will fail, since an array cannot be assigned to.
The second is a pointer pointing to an array of char and hence is the sizeof(b) usually 4 on 32-bit or 8 on 64-bit. Assigning something to b will work, since the pointer can take a new value.
Of course, *a or *b work on both.
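The sizeof difference is easy to check with a short sketch. The function names are invented for the example, and the pointer size is deliberately compared against sizeof(char *) rather than a fixed 4 or 8.

```c
#include <stddef.h>

/* sizeof applied to the array counts every char, including the '\0'. */
size_t size_of_array(void)
{
    char a[] = "this is an array of characters";
    return sizeof a;
}

/* sizeof applied to the pointer gives the size of a pointer, whatever
   the length of the string it points to. */
size_t size_of_pointer(void)
{
    const char *b = "this is an array of characters";
    return sizeof b;
}
```

size_of_array() equals strlen of the literal plus 1, while size_of_pointer() stays the same no matter how long the literal is.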
Ad Q2: printf() with the %s argument takes a pointer to a char (those are the "strings" in C). Hence, printf("%s", *b) will crash, since the "pointer" used by printf() will contain the byte value of *b.
What you could do, is printf("%c", *b), but that would only print the first character.
Ad Q3: sizeof(char) is 1 (by definition), hence you allocate 1 byte. The scanf will most likely read more than one byte (remember that each string will be terminated by a null character occupying one char). Hence the scanf will trash memory, which is likely to cause a crash sometime later on.
Ad 4: Maybe that's the trashed memory.
Both declarations give you the same string, although a is an array and b is a pointer.
b point to the first byte so when you say *b it's the first character.
printf("%s", *b)
Will fail as %s accepts a pointer to a string.
char is one byte.

C strings confusion

I'm learning C right now and got a bit confused with character arrays - strings.
char name[15]="Fortran";
No problem with this - its an array that can hold (up to?) 15 chars
char name[]="Fortran";
C counts the number of characters for me so I don't have to - neat!
char* name;
Okay. What now? All I know is that this can hold an big number of characters that are assigned later (e.g.: via user input), but
Why do they call this a char pointer? I know of pointers as references to variables
Is this an "excuse"? Does this find any other use than in char*?
What is this actually? Is it a pointer? How do you use it correctly?
thanks in advance,
lamas
I think this can be explained this way, since a picture is worth a thousand words...
We'll start off with char name[] = "Fortran", which is an array of chars, the length is known at compile time, 7 to be exact, right? Wrong! it is 8, since a '\0' is a nul terminating character, all strings have to have that.
char name[] = "Fortran";
+======+ +-+-+-+-+-+-+-+--+
|0x1234| |F|o|r|t|r|a|n|\0|
+======+ +-+-+-+-+-+-+-+--+
At link time, the compiler and linker gave the symbol name a memory address of 0x1234.
Using the subscript operator, i.e. name[1] for example, the compiler knows how to calculate where in memory is the character at offset, 0x1234 + 1 = 0x1235, and it is indeed 'o'. That is simple enough, furthermore, with the ANSI C standard, the size of a char data type is 1 byte, which can explain how the runtime can obtain the value of this semantic name[cnt++], assuming cnt is an integer and has a value of 3 for example, the runtime steps up by one automatically, and counting from zero, the value of the offset is 't'. This is simple so far so good.
What happens if name[12] was executed? Well, the code will either crash, or you will get garbage, since the boundary of the array is from index/offset 0 (0x1234) up to 7 (0x123B). Anything after that does not belong to the name variable; accessing it is called a buffer overflow!
The address of name in memory is 0x1234, as in the example, if you were to do this:
printf("The address of name is %p\n", &name);
Output would be:
The address of name is 0x00001234
For the sake of brevity and keeping with the example, the memory addresses are 32bit, hence you see the extra 0's. Fair enough? Right, let's move on.
Now on to pointers...
char *name is a pointer to type of char....
Edit:
And we initialize it to NULL as shown Thanks Dan for pointing out the little error...
char *name = (char*)NULL;
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
At compile/link time, the name does not point to anything, but has a compile/link time address for the symbol name (0x5678), in fact it is NULL, the pointer address of name is unknown hence 0x0000.
Now, remember, this is crucial, the address of the symbol is known at compile/link time, but the pointer address is unknown, when dealing with pointers of any type
Suppose we do this:
name = (char *)malloc((20 * sizeof(char)) + 1);
strcpy(name, "Fortran");
We called malloc to allocate a memory block of 21 bytes: room for 20 characters, plus the 1 I added on to the size for the '\0' nul terminating character. Suppose at runtime, the address given was 0x9876,
char *name;
+======+ +======+ +-+-+-+-+-+-+-+--+
|0x5678| -> |0x9876| -> |F|o|r|t|r|a|n|\0|
+======+ +======+ +-+-+-+-+-+-+-+--+
So when you do this:
printf("The value of name is %p\n", (void *)name);
printf("The address of name is %p\n", (void *)&name);
Output would be:
The value of name is 0x00009876
The address of name is 0x00005678
Now, this is where the illusion that 'arrays and pointers are the same' comes into play.
When we do this:
char ch = name[1];
What happens at runtime is this:
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the subscript value of 1 and add it onto the pointer address, i.e. 0x9877 to retrieve the value at that memory address, i.e. 'o' and is assigned to ch.
That above is crucial to understanding this distinction, the difference between arrays and pointers is how the runtime fetches the data, with pointers, there is an extra indirection of fetching.
Remember, an array of type T will always decay into a pointer of the first element of type T.
When we do this:
char ch = *(name + 5);
The address of symbol name is looked up
Fetch the memory address of that symbol, i.e. 0x5678.
At that address, contains another address, a pointer address to memory and fetch it, i.e. 0x9876
Get the offset based on the value of 5 and add it onto the pointer address, i.e. 0x987B to retrieve the value at that memory address, i.e. 'a' and is assigned to ch.
Incidentally, you can also do that to the array of chars also...
Further more, by using subscript operators in the context of an array i.e. char name[] = "..."; and name[subscript_value] is really the same as *(name + subscript_value).
i.e.
name[3] is the same as *(name + 3)
And since the addition inside *(name + subscript_value) is commutative, the operands can be swapped,
*(subscript_value + name) is the same as *(name + subscript_value)
Hence, this explains why in one of the answers above you can write it like this (despite it, the practice is not recommended even though it is quite legitimate!)
3[name]
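As a quick check of these equivalences, the following sketch compares the four spellings for a given index. The function name all_forms_agree is made up for illustration.

```c
#include <stddef.h>

/* Returns 1 if all four spellings select the same element. */
int all_forms_agree(const char *name, size_t i)
{
    return name[i] == *(name + i)
        && *(name + i) == *(i + name)
        && *(i + name) == i[name];
}
```

Calling all_forms_agree("Fortran", 3) compares four ways of reading the same 't'.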
Ok, how do I get the value of the pointer?
That is what the * is used for,
Suppose the pointer name has that pointer memory address of 0x9878, again, referring to the above example, this is how it is achieved:
char ch = *name;
This means, obtain the value that is pointed to by the memory address of 0x9878, now ch will have the value of 'r'. This is called dereferencing. We just dereferenced a name pointer to obtain the value and assign it to ch.
Also, the compiler knows that a sizeof(char) is 1, hence you can do pointer increment/decrement operations like this
*name++;
*name--;
The pointer automatically steps up/down as a result by one.
When we do this, assuming the pointer memory address of 0x9878:
char ch = *name++;
What is the value of ch, and what is the address? The answer is that ch is assigned 'r', the character at 0x9878, and only then is the pointer stepped up, so the pointer memory address is now 0x9879 and *name is 't'.
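The order of events in *name++ can be observed with a small sketch: the expression yields the character at the old position, and the pointer is advanced afterwards. The helper name fetch_then_advance and its after parameter are hypothetical.

```c
/* Returns the character that *p++ evaluates to and stores in *after
   the character p refers to once it has been advanced. */
char fetch_then_advance(const char *p, char *after)
{
    char ch = *p++;   /* ++ binds to p: fetch *p, then move p forward */
    *after = *p;
    return ch;
}
```

With name pointing at "Fortran", passing name + 2 corresponds to the address 0x9878 in the running example: the function returns the character stored there and reports the following one through after.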
This where you have to be careful also, in the same principle and spirit as to what was stated earlier in relation to the memory boundaries in the very first part (see 'What happens if name[12] was executed' in the above) the results will be the same, i.e. code crashes and burns!
Now, what happens if we deallocate the block of memory pointed to by name by calling the C function free with name as the parameter, i.e. free(name), and then set name to NULL (free does not reset the pointer for us):
+======+ +======+
|0x5678| -> |0x0000| -> NULL
+======+ +======+
Yes, the block of memory is freed up and handed back to the runtime environment for use by another upcoming code execution of malloc.
Now, this is where the common notation of Segmentation fault comes into play, since name does not point to anything, what happens when we dereference it i.e.
char ch = *name;
Yes, the code will typically crash and burn with a 'Segmentation fault', this is common under Unix/Linux. Under Windows, a dialog box will appear along the lines of 'Unrecoverable error' or 'An error has occurred with the application, do you wish to send the report to Microsoft?'.... If the pointer has not been malloc'd, any attempt to dereference it is undefined behaviour and is very likely to crash and burn.
Also: remember this, for every malloc there must be a corresponding free; if there is no corresponding free, you have a memory leak in which memory is allocated but not freed up.
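One common way to pair each malloc with a free and avoid leaving a stale pointer behind is a small helper like the one sketched here. The name free_and_clear is an assumption of this example, not a standard function.

```c
#include <stdlib.h>

/* Frees the block *pp refers to and then sets *pp to NULL.
   free() itself never changes the caller's pointer. */
void free_and_clear(char **pp)
{
    free(*pp);    /* free(NULL) is a no-op, so calling this twice is safe */
    *pp = NULL;
}
```

After free_and_clear(&name), a test such as if (name != NULL) reliably tells you whether name still refers to allocated memory.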
And there you have it, that is how pointers work and how arrays are different to pointers, if you are reading a textbook that says they are the same, tear out that page and rip it up! :)
I hope this is of help to you in understanding pointers.
That is a pointer. Which means it is a variable that holds an address in memory. It "points" to another variable.
It actually cannot - by itself - hold large amounts of characters. By itself, it can hold only one address in memory. If you assign characters to it at creation it will allocate space for those characters, and then point to that address. You can do it like this:
char* name = "Mr. Anderson";
That is actually pretty much the same as this:
char name[] = "Mr. Anderson";
The place where character pointers come in handy is dynamic memory. You can assign a string of any length to a char pointer at any time in the program by doing something like this:
char *name;
name = malloc(256*sizeof(char));
strcpy(name, "This is less than 256 characters, so this is fine.");
Alternately, you can assign to it using the strdup() function, like this:
char *name;
name = strdup("This can be as long or short as I want. The function will allocate enough space for the string and assign return a pointer to it. Which then gets assigned to name");
If you use a character pointer this way - and assign memory to it, you have to free the memory contained in name before reassigning it. Like this:
if(name)
free(name);
name = 0;
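free(NULL) is defined to do nothing, so the if is not strictly required, but name must be either NULL or a pointer obtained from malloc (or strdup) that has not already been freed. Setting name to 0 afterwards guards against freeing it twice.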
The reason you see character pointers get used a whole lot in C is because they allow you to reassign the string with a string of a different size. Static character arrays don't do that. They're also easier to pass around.
Also, character pointers are handy because they can be used to point to different statically allocated character arrays. Like this:
char *name;
char joe[] = "joe";
char bob[] = "bob";
name = joe;
printf("%s", name);
name = bob;
printf("%s", name);
This is what often happens when you pass a statically allocated array to a function taking a character pointer. For instance:
char *strcpy(char *str1, const char *str2);
If you then pass that:
char buffer[256];
strcpy(buffer, "This is a string, less than 256 characters.");
It will manipulate both of those through str1 and str2 which are just pointers that point to where buffer and the string literal are stored in memory.
Something to keep in mind when working in a function. If you have a function that returns a character pointer, don't return a pointer to a local (automatic) character array declared in the function. The array ceases to exist when the function returns and you'll have issues. Repeat, don't do this:
char *myFunc() {
char myBuf[64];
strcpy(myBuf, "hi");
return myBuf;
}
That won't work. You have to use a pointer and allocate memory (like shown earlier) in that case. The memory allocated will persist then, even when you pass out of the functions scope. Just don't forget to free it as previously mentioned.
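A working counterpart might look like the sketch below. The name make_greeting is hypothetical, and the 64-byte size simply mirrors the buffer in the broken version.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of "hi", or NULL if malloc fails.
   The block outlives the call; the caller must free() it. */
char *make_greeting(void)
{
    char *buf = malloc(64);
    if (buf == NULL)
        return NULL;
    strcpy(buf, "hi");
    return buf;
}
```

The caller would write char *g = make_greeting();, use g, and finish with free(g).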
This ended up a bit more encyclopedic than I'd intended, hope its helpful.
Edited to remove C++ code. I mix the two so often, I sometimes forget.
char* name is just a pointer. Somewhere along the line memory has to be allocated and the address of that memory stored in name.
It could point to a single byte of memory and be a "true" pointer to a single char.
It could point to a contiguous area of memory which holds a number of characters.
If those characters happen to end with a null terminator, lo and behold you have a pointer to a string.
char *name, on its own, can't hold any characters. This is important.
char *name just declares that name is a pointer (that is, a variable whose value is an address) that will be used to store the address of one or more characters at some point later in the program. It does not, however, allocate any space in memory to actually hold those characters, nor does it guarantee that name even contains a valid address. In the same way, if you have a declaration like int number there is no way to know what the value of number is until you explicitly set it.
Just as you might declare an integer and later set its value (number = 42), after declaring a pointer to char you might later set its value to a valid memory address that contains a character -- or sequence of characters -- that you are interested in.
It is confusing indeed. The important thing to understand is that char name[] declares an array and char* name declares a pointer. The two are different animals.
However, an array in C can be implicitly converted to a pointer to its first element. This lets you perform pointer arithmetic and iterate through array elements (the element type does not matter, char or otherwise). As #which mentioned, you can use either the indexing operator or pointer arithmetic to access array elements. In fact, the indexing operator is just syntactic sugar (another representation of the same expression) for pointer arithmetic.
It is important to distinguish between an array and a pointer to the first element of an array. You can query the size of an array declared as char name[15] using the sizeof operator:
char name[15] = { 0 };
size_t s = sizeof(name);
assert(s == 15);
but if you apply sizeof to char* name you will get the size of a pointer on your platform (e.g. 4 bytes on a typical 32-bit platform, 8 bytes on a typical 64-bit one):
char* name = 0;
size_t s = sizeof(name);
assert(s == 4); // assuming pointer is 4-bytes long on your compiler/machine
Also, the two forms of definitions of arrays of char elements are equivalent:
char letters1[5] = { 'a', 'b', 'c', 'd', '\0' };
char letters2[5] = "abcd"; /* 5th element implicitly gets value of 0 */
Thanks to this dual nature of arrays, the implicit conversion of an array to a pointer to its first element, in C (and also C++) a pointer can be used as an iterator to walk through array elements:
/* skip to 'd' letter */
char* it = letters1;
for (int i = 0; i < 3; i++)
it++;
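The same iterator idea can walk a string until its terminator rather than for a fixed number of steps. The following is only a sketch; count_chars is a hypothetical name chosen for illustration, and it assumes the array really does contain a '\0'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters before the terminating '\0'
   by advancing a pointer through the array, one element at a time. */
size_t count_chars(const char *s) {
    assert(s != NULL);
    const char *it = s;        /* iterator starts at the first element */
    while (*it != '\0') {      /* stop at the terminator */
        it++;
    }
    return (size_t)(it - s);   /* distance walked, in elements */
}
```

Subtracting the two pointers gives the number of elements between them, which is why the result equals the string length.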
In C a string is actually just an array of characters, as you can see by the definition. Superficially, an array behaves like a pointer to its first element; see below for the subtle intricacies. There is no range checking in C: the size you supply in the variable declaration only determines how much memory is allocated for the variable.
a[x] is the same as *(a + x), i.e. dereference of the pointer a incremented by x.
if you used the following:
char foo[] = "foobar";
char bar = *foo;
bar will be set to 'f'
To stave off confusion and avoid misleading people, some extra words on the more intricate differences between pointers and arrays, thanks avakar:
In some cases a pointer is actually semantically different from an array, a (non-exhaustive) list of examples:
// sizeof
sizeof(char*) != sizeof(char[10])
// assignment
char foo[] = "foobar";
char bar[] = "baz";
char* p;
foo = bar; // compile error: an array is not a modifiable lvalue
p = bar; // just fine, p now points to the first element of bar
// multidimensional arrays
int baz[2][2];
int* q = baz; // invalid: baz decays to int (*)[2], which is not compatible with int*
int* r = baz[0]; // just fine, r now points to the first element of the first "row" of baz
int x = baz[1][1];
int y = r[1][1]; // compile error: r[1] is an int, which cannot be subscripted
int z = r[1]; // just fine, z now holds the second element of the first "row" of baz
And finally a fun bit of trivia: since a[x] is equivalent to *(a + x), you can actually use e.g. '3[a]' to access the fourth element of array a. That is, the following is perfectly legal code, and will print 'b', the fourth character of string foo.
#include <stdio.h>
int main(int argc, char** argv) {
char foo[] = "foobar";
printf("%c\n", 3[foo]);
return 0;
}
One is an actual array object and the other is a reference or pointer to such an array object.
The thing that can be confusing is that both have the address of the first character in them, but only because one address is the first character and the other address is a word in memory that contains the address of the character.
The difference can be seen in the value of &name. In the first two cases it is the same value as just name, but in the third case it has a different type, pointer to pointer to char, or char **, and it is the address of the pointer itself. That is, it is a double-indirect pointer.
#include <stdio.h>
char name1[] = "fortran";
char *name2 = "fortran";
int main(void) {
printf("%lx\n%lx %s\n", (unsigned long)name1, (unsigned long)&name1, name1);
printf("%lx\n%lx %s\n", (unsigned long)name2, (unsigned long)&name2, name2);
return 0;
}
Ross-Harveys-MacBook-Pro:so ross$ ./a.out
100001068
100001068 fortran
100000f58
100001070 fortran

I'm new to C, can someone explain why the size of this string can change?

I have never really done much C but am starting to play around with it. I am writing little snippets like the one below to try to understand the usage and behaviour of key constructs/functions in C. The one below I wrote trying to understand the difference between char* string and char string[] and how the lengths of strings work. Furthermore I wanted to see if sprintf could be used to concatenate two strings and set it into a third string.
What I discovered was that the third string I was using to store the concatenation of the other two had to be set with char string[] syntax or the binary would die with SIGSEGV (Address boundary error). Setting it using the array syntax required a size so I initially started by setting it to the combined size of the other two strings. This seemed to let me perform the concatenation well enough.
Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine.
My question is: Why does this happen, is it invalid or have risks/drawbacks? Furthermore, why is char str3[length3] valid but char str3[7] causes "SIGABRT (Abort)" when sprintf line tries to execute?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void main() {
char* str1 = "Sup";
char* str2 = "Dood";
int length1 = strlen(str1);
int length2 = strlen(str2);
int length3 = length1 + length2;
char str3[length3];
//char str3[7];
printf("%s (length %d)\n", str1, length1); // Sup (length 3)
printf("%s (length %d)\n", str2, length2); // Dood (length 4)
printf("total length: %d\n", length3); // total length: 7
printf("str3 length: %d\n", (int)strlen(str3)); // str3 length: 6
sprintf(str3, "%s<-------------------->%s", str1, str2);
printf("%s\n", str3); // Sup<-------------------->Dood
printf("str3 length after sprintf: %d\n", // str3 length after sprintf: 29
(int)strlen(str3));
}
This line is wrong:
char str3[length3];
You're not taking the terminating zero into account. It should be:
char str3[length3+1];
You're also trying to get the length of str3, while it hasn't been set yet.
In addition, this line:
sprintf(str3, "%s<-------------------->%s", str1, str2);
will overflow the buffer you allocated for str3. Make sure you allocate enough space to hold the complete string, including the terminating zero.
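One possible way to size the buffer correctly is sketched below. It assumes heap allocation is acceptable instead of the variable-length array used in the question, and join_with is a hypothetical helper name introduced only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "left" + "sep" + "right" in a freshly
   allocated buffer. Returns NULL if allocation fails; the caller must
   free() the result. */
char *join_with(const char *left, const char *sep, const char *right) {
    assert(left != NULL && sep != NULL && right != NULL);
    /* Every piece counts towards the size, plus one byte for '\0'. */
    size_t size = strlen(left) + strlen(sep) + strlen(right) + 1;
    char *out = malloc(size);
    if (out == NULL) {
        return NULL;
    }
    /* snprintf never writes more than size bytes, terminator included. */
    snprintf(out, size, "%s%s%s", left, sep, right);
    return out;
}
```

Called as join_with(str1, "<-------------------->", str2), this would reserve room for both strings, the separator, and the terminator, which is exactly what the original char str3[length3] did not do.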
void main() {
char* str1 = "Sup"; // a pointer to the statically allocated sequence of characters {'S', 'u', 'p', '\0' }
char* str2 = "Dood"; // a pointer to the statically allocated sequence of characters {'D', 'o', 'o', 'd', '\0' }
int length1 = strlen(str1); // the length of str1 without the terminating \0 == 3
int length2 = strlen(str2); // the length of str2 without the terminating \0 == 4
int length3 = length1 + length2;
char str3[length3]; // declare an array of 7 characters, uninitialized
So far so good. Now:
printf("str3 length: %d\n", (int)strlen(str3)); // What is the length of str3? str3 is uninitialized!
C is a primitive language. It doesn't have strings. What it does have is arrays and pointers. A string is a convention, not a datatype. By convention, people agree that "an array of chars is a string, and the string ends at the first null character". All the C string functions follow this convention, but it is a convention. It is simply assumed that you follow it, or the string functions will break.
So str3 is not a 7-character string. It is an array of 7 characters. If you pass it to a function which expects a string, then that function will look for a '\0' to find the end of the string. str3 was never initialized, so it contains random garbage. In your case, apparently, there was a '\0' after the 6th character so strlen returns 6, but that's not guaranteed. If it hadn't been there, then it would have read past the end of the array.
sprintf(str3, "%s<-------------------->%s", str1, str2);
And here it goes wrong again. You are trying to copy the string "Sup<-------------------->Dood\0" into an array of 7 characters. That won't fit. Of course the C function doesn't know this, it just copies past the end of the array. Undefined behavior, and will probably crash.
printf("%s\n", str3); // Sup<-------------------->Dood
And here you try to print the string stored at str3. printf is a string function. It doesn't care (or know) about the size of your array. It is given a string, and, like all other string functions, determines the length of the string by looking for a '\0'.
Instead of trying to learn C by trial and error, I suggest that you go to your local bookshop and buy an "introduction to C programming" book. You'll end up knowing the language a lot better that way.
There is nothing more dangerous than a programmer who half understands C!
What you have to understand is that C doesn't actually have strings, it has character arrays. Moreover, the character arrays don't have associated length information -- instead, string length is determined by iterating over the characters until a null byte is encountered. This implies that every char array holding a string should be at least strlen + 1 characters in length.
C doesn't perform array bounds checking. This means that the functions you call blindly trust you to have allocated enough space for your strings. When that isn't the case, you may end up writing beyond the bounds of the memory you allocated for your string. For a stack allocated char array, you'll overwrite the values of local variables. For heap-allocated char arrays, you may write beyond the memory area of your application. In either case, the best case is you'll error out immediately, and the worst case is that things appear to be working, but actually aren't.
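Since the language will not check bounds for you, one common approach is to pass the buffer size along and let snprintf truncate rather than overflow. This is a sketch only; format_pair and its fixed "<-->" separator are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: writes a, "<-->", b into dst without ever writing
   more than dst_size bytes. Returns true only if the whole result fit;
   on false, dst holds a truncated but still terminated string. */
bool format_pair(char *dst, size_t dst_size, const char *a, const char *b) {
    assert(dst != NULL && dst_size > 0);
    /* snprintf returns the length the full result would have had. */
    int n = snprintf(dst, dst_size, "%s<-->%s", a, b);
    return n >= 0 && (size_t)n < dst_size;
}
```

The return value lets the caller notice that the buffer was too small instead of silently corrupting neighbouring memory.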
As for the assignment, you can't write something like this:
char *str;
sprintf(str, ...);
and expect it to work -- str is an uninitialized pointer, so the value is "not defined", which in practice means "garbage". Pointers are memory addresses, so an attempt to write to an uninitialized pointer is an attempt to write to a random memory location. Not a good idea. Instead, what you want to do is something like:
char *str = malloc(sizeof(char) * (n + 1)); // n is the length of the string you want to store
which allocates n+1 characters' worth of storage and stores the pointer to that storage in str. Of course, to be safe, you should check whether or not malloc returned NULL. And when you're done, you need to call free(str).
The reason your code works with the array syntax is because the array, being a local variable, is automatically allocated, so there's actually a free slice of memory there. That's (usually) not the case with an uninitialized pointer.
As for the question of how the size of a string can change, once you understand the bit about null bytes, it becomes obvious: all you need to do to change the size of a string is futz with the null byte. For example:
char str[] = "Foo bar";
str[1] = (char)0; // I'd use the character literal, but this editor won't let me
At this point, the length of the string as reported by strlen will be exactly 1. Or:
char str[] = "Foo bar";
str[7] = '!';
after which the array no longer contains a terminator (index 7 held the null byte), so calling strlen on it is undefined behavior: it will keep trying to read more bytes from beyond the array boundary. It might encounter a null byte and then stop (and of course, return the wrong string length), or it might crash.
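The safe direction, shortening a string by moving the null byte, can be sketched as follows. truncate_at is a hypothetical helper, and it assumes buf already holds a properly terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shortens the string in buf to at most new_len
   characters by writing a terminator. It never lengthens the string,
   so it never writes outside the part of the array already in use. */
void truncate_at(char *buf, size_t new_len) {
    assert(buf != NULL);
    if (new_len < strlen(buf)) {
        buf[new_len] = '\0';   /* the string now ends here */
    }
}
```

Note that the array itself keeps its original size; only the position of the first null byte, and therefore the value strlen reports, changes.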
I've written all of one C program, so expect this answer to be inaccurate and incomplete in a number of ways, which will undoubtedly be pointed out in the comments. ;-)
Your str3 is too short: you need to add room for the length of the "<-------------------->" string literal and an extra byte for the null terminator.
Out of curiosity, though, I tried expanding the "concatenated" string to be longer than the size I had allocated. Much to my surprise, it still worked and the string size increased and could be printf'd fine.
The behaviour is undefined so it may or may not segfault.
strlen returns the length of the string without the trailing null byte (\0, 0x00), but when you create a variable to hold the combined strings you need to add that 1 character.
char str3[length3 + 1];
…and you should be all set.
C strings are '\0' terminated and require an extra byte for that, so at the very least
char str3[length3 + 1]
is needed to hold the two strings.
In sprintf() you are writing beyond the space allocated for str3. This is undefined behavior and may cause anything to happen (if you are lucky, it will crash). strlen() just searches for a null character starting from the memory location you specified, and here it happens to find one at the 29th position. It could just as well have been the 129th, i.e. it will behave very erratically.
A few important points:
Just because it works doesn't mean it's safe. Going past the end of a buffer is always unsafe, and even if it works on your computer, it may fail under a different OS, different compiler, or even a second run.
I suggest you think of a char array as a container and a string as an object that is stored inside the container. In this case, the container must be 1 character longer than the object it holds, since a "null character" is required to indicate the end of the object. The container is a fixed size, and the object can change size (by moving the null character).
The first null character in the array indicates the end of the string. The remainder of the array is unused.
You can store different things in a char array (such as a sequence of numbers). It just depends on how you use it. But string functions such as printf() or strcat() assume that there is a null-terminated string to be found there.

Resources