malloc(5) = 5 chars or 5 bytes? - c

String to be returned via function is 5 characters long
via malloc() 5 byte space has been reserved for the string.
char *Function () {
char *data;
data = malloc(5);
strcpy(data, "aaabb");
return data;
}
code below prints aaabb as expected.
char *test;
test = Function();
printf("%s", test);
free(test);
without changing malloc(5)..
change the string to aaabbbccc : 9 characters long.
prints aaabbbccc
but malloc(5) should have only reserved room for 5 characters. not 9.
Question 1 : what is the true meaning of
data = malloc(5);
Question 2: how to simply reserve room for exactly 5 characters ?

You allocated space for 5 bytes (and a char is by definition 1 byte), but wrote more than 5 bytes. C doesn't prevent you from writing outside the bounds of allocated memory or an array. If you do, you invoke undefined behavior.
With undefined behavior, anything can happen. Your code may crash, it may output strange results, or (as in your case) it may appear to work properly. Later on, a seemingly unrelated change such as adding an unused local variable or a printf for debugging can change how undefined behavior manifests itself.
Regarding your specific example, the string "aaabb" actually consists of 6 bytes: 5 for the characters in question, plus one more for the null byte that signals the end of the string. So for this string you would need to malloc(6) to get enough space for it. Similarly with "aaabbbccc", you need 10 bytes allocated instead of 9.
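A minimal sketch of that sizing rule, assuming a helper named make_string (a hypothetical name, not from the question): it allocates one byte per character plus one for the terminator and checks the result of malloc() before writing.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL on failure.
   The caller is responsible for calling free() on the result. */
char *make_string(const char *src)
{
    size_t len = strlen(src);       /* characters, excluding '\0' */
    char *data = malloc(len + 1);   /* +1 for the null terminator */
    if (data == NULL)
        return NULL;
    memcpy(data, src, len + 1);     /* copies the terminator as well */
    return data;
}
```

Called as make_string("aaabb"), this requests 6 bytes, which is what the 5-character string actually needs.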

In the language of the C specification, bytes and chars are the same thing. So the answer to the question in the subject line is: both.
As for your second question, if you attempt to store more than 5 bytes in the allocated space that's only 5 bytes in size, the behavior of your program is undefined. Don't do that. The string "aaabb" is 6 bytes in size, since C strings are null-terminated (the sixth byte is the null character, with value 0).

From the malloc(3) man page:
malloc() allocates size bytes and returns a pointer to the allocated memory. The memory is not cleared. If size is 0, then malloc() returns
either NULL, or a unique pointer value that can later be successfully passed to free().
Bear in mind that in C you can write more bytes into a buffer than were allocated for it. In that case you will overwrite other memory, which may lead to program failure or even more interesting results.
It's well explained in the excellent article "Smashing The Stack For Fun And Profit" by AlephOne.
http://insecure.org/stf/smashstack.html

Normally if you just want a few characters (fewer than 1000 on a modern system) you simply declare them on the stack. malloc() is for large allocations, or storage that needs to persist after the function has returned.

Related

Giving array a bigger value doesn't increase its size?

Here's what I did:
#include <stdio.h>
#include <string.h>
int main() {
char name[] = "longname";
printf("Name = %s \n",name);
strcpy(name,"evenlongername");
printf("Name = %s \n",name);
printf("size of the array is : %d",sizeof(name));
return 0;
}
It works, but how? I thought that once memory is assigned to an array in a program, it is not possible to change it. But, the output of this program is:
Name = longname
Name = evenlongername
size of the array is 9
So the compiler affirms that the size of the array is still 9. How is it able to store the word 'evenlongername' which has a size of 15 bytes (including the string terminator)?
In this case, name is allocated to fit "longname", which is 9 bytes. When you copy "evenlongername" into it, you're writing outside the bounds of that array. Writing outside the bounds is undefined behavior, which means it may or may not work. Sometimes it'll work, other times you'll get a segmentation fault, and yet other times you'll get weird behavior.
So the compiler affirms that the size of the array is still 9. How is it able to store the word 'evenlongername' which has a size of 15 bytes(including the string terminator)?
You are using a dangerous function (see Bugs), strcpy, which blindly copies the source string to the destination buffer without knowing the buffer's size; by copying 15 bytes into a 9-byte buffer, you have overflowed it. Your program may appear to work fine if the memory access is valid and it doesn't overwrite something important.
Because C is a lower-level programming language, a C char[] is a bare-bones mapping of memory, not a "smart" container like C++ std::vector, which automatically manages its size for you as you dynamically add and remove elements. If you are still not clear about the philosophy of C in this, I'd recommend you read "*YOU* are full of bullshit". Very classic and rewarding.
Using sizeof on a char array will return the size of the buffer, not the length of the null-terminated string in the buffer. If you use strcpy to try and overflow the array, and it just happens to work (it's still undefined behavior), sizeof is still going to report the size used at declaration. That never changes.
If what you're interested in is observing how the length of a string changes with different assignments:
Use an adequate buffer to store every string you're going to test.
Use the function strlen in <string.h> which will give you the actual length of the string, and not the length of your buffer, which, once declared, is constant.
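The two points above can be sketched as follows. The helper name store_name and the 32-byte buffer in the usage note are assumptions made for illustration; snprintf() is used here instead of strcpy() so that an oversized string is truncated rather than written out of bounds.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into buf (truncating if it does not fit)
   and returns the length of the string now stored in buf. */
size_t store_name(char *buf, size_t bufsize, const char *src)
{
    snprintf(buf, bufsize, "%s", src);  /* never writes more than bufsize bytes */
    return strlen(buf);                 /* length of the string, not of the buffer */
}
```

With char name[32];, store_name(name, sizeof name, "longname") and store_name(name, sizeof name, "evenlongername") return different lengths, while sizeof name stays 32 throughout.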

Different input types for fscanf [closed]

My understanding of fscanf:
grabs a line from a file and based on format, stores it to a string.
That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).
Some assumptions:
1. fp is a valid FILE pointer.
2. The file has 1 line in it that reads "Something"
A pointer with allocated memory
char* temp = malloc(sizeof(char) * 1); // points to some small part in mem.
int resp = fscanf(fp,"%s", temp);
printf("Trying to print: %s\n",temp); // prints "Something" (that's what's in the file)
An array with predefined length (it's different from the pointer!)
char temp[100]; // this buffer MUST be big enough, or we get segmentation fault
int resp = fscanf(fp,"%s", temp);
printf("Trying to print: %s\n",temp); // prints "Something" (that's what's in the file)
A null pointer
char* temp; // null pointer
int resp = fscanf(fp,"%s", temp);
printf("Trying to print: %s\n",temp); // Crashes, segmentation fault
So a few questions have arisen!
How can a pointer with malloc of 1 contain longer texts?
Since the pointer's content doesn't seem to matter, why does a null pointer crash? I would expect the allocated pointer to crash as well, since it points to a small piece of memory.
Why does the pointer work, but an array (char temp[1];) crashes?
Edit:
I'm well aware that you need to pass a big enough buffer to contain the data from the line, I was wondering why it was still working and not crashing in other situations.
My understanding of fscanf:
grabs a line from a file and based on format, stores it to a string.
No, that contains some serious and important misconceptions. fscanf() reads from a file as directed by the specified format, so as to assign values to some or all of the objects pointed-to by its third and subsequent arguments. It does not necessarily read a whole line, but on the other hand, it may read more than one.
In your particular usage,
int resp = fscanf(fp,"%s", temp);
it attempts to skip any leading whitespace, including but not limited to empty and blank lines, then read characters into the pointed-to character array, up to the first whitespace character or the end of the file. Under no circumstance will it consume the line terminator of the line from which it populates the array contents, but it will not even get that far if there is other whitespace on the line following at least one non-whitespace character (though that is not the case in the particular sample input you describe).
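A small sketch of that %s behavior. The helper name read_token and the 100-byte buffer are assumptions for illustration; the field width 99 is added so that fscanf() cannot write past the buffer.

```c
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited token into buf,
   which must have room for at least 100 bytes. Returns 1 on success. */
int read_token(FILE *fp, char buf[100])
{
    /* Leading whitespace (including blank lines) is skipped; at most
       99 characters are stored, followed by the terminator. */
    return fscanf(fp, "%99s", buf) == 1;
}
```

If the file contains two blank lines followed by "   Something else", the first call stores "Something" and a second call stores "else"; neither call consumes the trailing newline.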
That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).
Strings are not an actual data type in C. Arrays of chars are, but such arrays are not "strings" in the C sense unless they contain at least one null character. Furthermore, in that case, C string functions for the most part operate only on the portions of such arrays up to and including the first null, so it is those portions that are best characterized as "strings".
There is more than one way to obtain storage for character sequences that can be considered strings, but there is only one way to pass them around: by means of a pointer to their first character. Whether you obtain storage by declaring a character array, by a string literal, or by allocating memory for it, the contents are accessed only via pointers. Even when you declare a char array and access elements by applying the index operator, [], to the name of the array variable, you are actually still using a pointer to access the contents.
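To illustrate, here is a sketch with a hypothetical helper, count_chars, that receives only a pointer to the first character. It works the same way whether the storage behind that pointer is a declared array, a string literal, or allocated memory.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters before the terminator. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')   /* s[n] is defined as *(s + n) */
        n++;
    return n;
}
```

Given char arr[] = "abc";, the call count_chars(arr) passes a pointer to arr[0]; the array itself is never copied.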
Why does a pointer with malloc of 1 can contain longer texts?
A pointer does not contain anything but itself. It is the space it points to that contains anything else, such as text. If you allocate only one byte, then the allocated space can contain only one byte. If you overrun that one byte by attempting to write a longer character sequence where the pointer points, then you invoke undefined behavior. In particular, C does not guarantee that an error will be generated, or that the program will fail to behave as you expect, but all manner of havoc can ensue, without limit.
Since the pointer content doesn't seem to matter, why does a null pointer crash, I would expect the allocated pointer to crash as well, since it points to a small piece of memory.
Attempting to dereference an invalid pointer, including, but not limited to a null pointer, also produces undefined behavior. A crash is well within the realm of possible behaviors. C does not guarantee a crash in that case, but that's reliably provided by some implementations.
Why does the pointer work, but an array(char temp[1];) crashes?
You do not demonstrate your 1-character array alternative, but again, overrunning the bounds of the object -- in this case an array -- produces undefined behavior. It is undefined so it is not justified to suppose that the behavior would be the same as for overrunning the bounds of an allocated object, or even that either one of those behaviors would be consistent.
That being said, there are three (seemingly different) ways to pass "strings" around(array of chars).
For passing a C-"string" to scanf() & friends there is just one way: Pass it the address of enough valid memory.
If you don't, the code invokes the infamous Undefined Behaviour, which means anything can happen, from a crash to seemingly running fine.
Why does a pointer with malloc of 1 can contain longer texts?
In theory, it can't without causing undefined behavior. In practice, however, when you allocate a single byte, the allocator gives you a small chunk of memory of the smallest size it supports, which is usually sufficient for 8..10 characters without causing a crash. The additional memory serves as a "padding" that prevents a crash (but it is still undefined behavior).
Since the pointer content doesn't seem to matter, why does a null pointer crash, I would expect the allocated pointer to crash as well, since it points to a small piece of memory.
Null pointer, on the other hand, is not sufficient even for an empty string, because you need space for null terminator. Hence, it's a guaranteed UB, which manifests itself as a crash on most platforms.
Why does the pointer work, but an array(char temp[1]) crashes?
Because arrays are allocated without any extra "padding" memory after them. Note that a crash is not guaranteed, because the array may be followed by unused bytes of memory, which your string could corrupt without any consequences.
Because null pointers aren't allocated with memory.
When you request a small piece of memory with malloc, it is allocated from a region of memory called the "heap". The allocator typically obtains heap memory from the operating system in units of blocks or pages, usually several KB each, and hands out chunks rounded up to some minimum size.
So when you allocate a small piece of memory with malloc (or new in C++), the memory that is actually mapped around it is usually larger than what you requested, and writing (or reading) a little past the end often does not crash. It is never safe, though: it is UB, and it can silently corrupt the allocator's bookkeeping or neighboring allocations.
A null pointer, on typical platforms, points to address 0, an invalid address that can't be read from or written to. So on most systems the program will crash, often with a segmentation fault, although C itself does not guarantee that.
Small arrays may crash more often than new and malloc allocations because they aren't allocated from the heap (local arrays usually live on the stack) and may come without any extra space after them, so it's more dangerous to write over the limit. However, they often precede unused memory areas, so sometimes your program may not crash but end up with corrupted data instead.

C or C++ sprintf and value in struct

code:
sprintf(tmp, "xbitmap_width %d\n", symbol->scale);
Output:
xbitmap_width 1075052544
expected output - value of scale which is 5 so it should be:
xbitmap_width 5
What am i missing??? Why is sprintf taking pointer value?
Update:
If symbol->scale is indeed not a pointer, then also ensure tmp is big enough, to avoid overflow. I hope tmp is at least 18 chars big, but best make it big enough (like 30 or bigger), and if it's allocated on the heap: initialize it to zeroes: memset or calloc(30, sizeof *tmp) would be preferable.
You may also want to ensure that symbol is not a stack value, returned by a function. This, too, would be undefined behaviour. However, given that you say you're using new or malloc (which does not initialize the struct, BTW), that can't be the issue.
The not-initializing bit here (when using malloc) might be, though: malloc merely reserves enough memory to store a given object one or more times. The memory is not initialized, though:
char *str = malloc(100);
Is something like that thing where you give a bunch of monkeys type-writers: eventually one of them might wind up punching in a line of Shakespeare: well, if you malloc strings like this, and print them, eventually one of them might end up containing the string "Don't panic".
Now, this isn't exactly true, but you get the point...
To ensure your struct is initialized, either use calloc or memset those members that are giving you grief.
If your struct looks like this:
struct symbol
{
int *scale;
};
Then you are passing the value of scale to sprintf. This value is a memory address, not an int. An int, as you may know, is guaranteed to be at least 2 bytes in size (most commonly it's 4, though). A pointer is typically 4 or 8 bytes in size, so if you pass a pointer and have sprintf interpret it as an int, you get undefined behaviour.
To print 5 in your case:
struct symbol *symbol = malloc(sizeof *symbol);
int s = 5;
symbol->scale = &s;
printf("%d\n", *(symbol->scale));//dereference the scale pointer
But this is undefined behaviour:
printf("%d\n", symbol->scale);//passing pointer VALUE ==> memory address
//for completeness & good practices' sake:
free(symbol);
Oh, and as stated in the comments: snprintf is to sprintf what strncpy is to strcpy and strncat is to strcat: it's safer to use the function which allows you to specify a maximum number of chars to write.
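A minimal sketch of the snprintf variant. It assumes, for illustration only, that scale is a plain int member rather than a pointer, and the helper name format_width is hypothetical.

```c
#include <stdio.h>

struct symbol {
    int scale;   /* assumption: a plain int here, not a pointer */
};

/* Hypothetical helper: formats the line into tmp without overflowing it.
   Returns 1 if the whole line fit, 0 if it was truncated or an error occurred. */
int format_width(char *tmp, size_t size, const struct symbol *symbol)
{
    /* snprintf returns the length the full output would have had. */
    int n = snprintf(tmp, size, "xbitmap_width %d\n", symbol->scale);
    return n >= 0 && (size_t)n < size;
}
```

With char tmp[30]; and a scale of 5, tmp ends up holding "xbitmap_width 5\n". If the buffer is too small, the output is truncated (and still terminated) and the function returns 0.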

C Language - Malloc unlimited space?

I'm having difficulty learning C language's malloc and pointer:
What I learned so far:
Pointer is memory address pointer.
malloc() allocate memory locations and returns the memory address.
I'm trying to create a program to test malloc and pointer, here's what I have:
#include<stdio.h>
main()
{
char *x;
x = malloc(sizeof(char) * 5);
strcpy(*x, "123456");
printf("%s",*x); //Prints 123456
}
I'm expecting an error since the size I provided to malloc is 5, where I put 6 characters (123456) to the memory location my pointer points to. What is happening here? Please help me.
Update
Where to learn malloc and pointer? I'm confused by the asterisk thing, like when to use asterisk etc. I will not rest till I learn this thing! Thanks!
You are invoking undefined behaviour because you are writing (or trying to write) beyond the bounds of allocated memory.
Other nitpicks:
Because you are using strcpy(), you are copying 7 bytes, not 6 as you claim in the question.
Your call to strcpy() is flawed - you are passing a char instead of a pointer to char as the first argument.
If your compiler is not complaining, you are not using enough warning options. If you're using GCC, you need at least -Wall in your compiler command line.
You need to include both <stdlib.h> for malloc() and <string.h> for strcpy().
You should also explicitly specify int main() (or, better, int main(void)).
Personally, I'm old school enough that I prefer to see an explicit return(0); at the end of main(), even though C99 follows C++98 and allows you to omit it.
You may be unlucky and get away with invoking undefined behaviour for a while, but a tool like valgrind should point out the error of your ways. In practice, many implementations of malloc() allocate a multiple of 8 bytes (and some a multiple of 16 bytes), and given that you delicately do not step over the 8 byte allocation, you may actually get away with it. But a good debugging malloc() or valgrind will point out that you are doing it wrong.
Note that since you don't free() your allocated space before you return from main(), you (relatively harmlessly in this context) leak it. Note too that if your copied string was longer (say as long as the alphabet), and especially if you tried to free() your allocated memory, or tried to allocate other memory chunks after scribbling beyond the end of the first one, then you are more likely to see your code crash.
Undefined behaviour is unconditionally bad. Anything could happen. No system is required to diagnose it. Avoid it!
If you call malloc you get the address of a memory region on the heap.
If it returns e.g. 1000, your memory would look like:
Adr Value
----------
1000 '1'
1001 '2'
1002 '3'
1003 '4'
1004 '5'
1005 '6'
1006 '\0'
after the call to strcpy(). You wrote 7 chars (2 more than allocated).
x == 1000 (pointer address)
*x == '1' (dereferenced the value x points to)
There are no warnings or error messages from the compiler, since C doesn't have any range-checking.
My three cents:
Use x, as (*x) is the value that is stored at x (which is unknown in your case) - you are writing to unknown memory location. It should be:
strcpy(x, "123456");
Secondly - "123456" is not 6 bytes, it's 7. You forgot about trailing zero-terminator.
Your program with its current code might work, but that is not guaranteed.
What I would do:
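#include <stdio.h>
#include <stdlib.h> /* malloc, free */
#include <string.h> /* strcpy */
int main(void)
{
char str[] = "123456";
char *x;
x = malloc(sizeof(str)); /* 7 bytes: 6 characters plus '\0' */
if (x == NULL)
return 1;
strcpy(x, str);
printf("%s",x); //Prints 123456
free(x);
return 0;
}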
Firstly, there is one problem with your code:
x is a pointer to a memory area where you allocated space for 5 characters.
*x is the value of the first character.
You should use strcpy(x, "123456");
Secondly, the memory after your 5 allocated bytes may happen to be valid, in which case you will not receive an error.
#include<stdio.h>
main()
{
char *x;
x = malloc(sizeof(char) * 5);
strcpy(x, "123456");
printf("%s",x); //Prints 123456
}
Use this; it passes the pointer itself to strcpy and printf.
See the difference between your program and mine.
Note that you are still allocating 5 bytes and writing 7 (6 characters plus the terminator), so the 6th and 7th bytes are stored at the next consecutive addresses. The memory manager may hand those extra bytes to some other allocation, so they can be changed at any time by other code; they are not yours because you haven't malloc'd them. That's why this is called undefined behaviour.

How much memory is reserved when i declare a string?

What exactly happens, in terms of memory, when i declare something like:
char arr[4];
How many bytes are reserved for arr?
How is null string accommodated when I 'strcpy' a string of length 4 in arr?
I was writing a socket program, and when I tried to suffix NULL at arr[4] (i.e. the 5th memory location), I ended up replacing the values of some other variables of the program (overflow) and got into a big time mess.
Any descriptions of how compilers (gcc is what I used) manage memory?
sizeof(arr) bytes are saved* (plus any padding the compiler wants to put around it, though that isn't for the array per se). On an implementation with a stack, this just means moving the stack pointer sizeof(arr) bytes down. (That's where the storage comes from. This is also why automatic allocation is fast.)
'\0' isn't accommodated. If you copy "abcd" into it, you get a buffer overrun, because that takes up 5 bytes total, but you only have 4. You enter undefined behavior land, and anything could happen.
In practice you'll corrupt the stack and crash sooner or later, or experience what you did and overwrite nearby variables (because they too are allocated just like the array was.) But nobody can say for certain what happens, because it's undefined.
* Which is sizeof(char) * 4. sizeof(char) is always 1, so 4 bytes.
What exactly happens, in terms of memory, when i declare something like:
char arr[4];
4 * sizeof(char) bytes of stack memory is reserved for the string.
How is null string accommodated when I 'strcpy' a string of length 4 in arr?
You cannot. You can only have 3 characters; the 4th element (i.e. arr[3]) must be the '\0' character for a proper string.
when I tried to suffix NULL at arr[4]
The behavior is undefined, as you are accessing an invalid memory location. In the best case, your program will crash immediately, but it might instead corrupt the stack and crash at a later point in time.
In C, what you ask for is--usually--exactly what you get. char arr[4] is exactly 4 bytes.
But anything in quotes has a 'hidden' null added at the end, so char arr[] = "oops"; reserves 5 bytes.
Thus, if you do this:
char arr[4];
strcpy(arr, "oops");
...you will copy 5 bytes (o o p s \0) when you've only reserved space for 4. Whatever happens next is unpredictable and often catastrophic.
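One way to avoid this is to check the size before copying. The sketch below uses a hypothetical helper, copy_if_fits, which refuses the copy when the destination cannot hold the characters plus the terminator.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, terminator
   included. Returns 1 on success, 0 if dst is too small. */
int copy_if_fits(char *dst, size_t dst_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus '\0' */
    if (needed > dst_size)
        return 0;
    memcpy(dst, src, needed);
    return 1;
}
```

With char arr[4];, copy_if_fits(arr, sizeof arr, "oops") returns 0 because 5 bytes are needed, while copy_if_fits(arr, sizeof arr, "oop") succeeds and leaves '\0' in arr[3].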
When you define a variable like char arr[4], it reserves exactly 4 bytes for that variable. As you've found, writing beyond that point causes what the standard calls "undefined behavior" -- a euphemism for "you screwed up -- don't do that."
The memory management of something like this is pretty simple: if it's a global, it gets allocated in a global memory space. If it's a local, it gets allocated on the stack by subtracting an appropriate amount from the stack pointer. When you return, the stack pointer is restored, so they cease to exist (and when you call another function, will normally get overwritten by parameters and locals for that function).
When you make a declaration like char arr[4];, the compiler allocates as many bytes as you asked for, namely four. The compiler might allocate extra in order to accommodate efficient memory accesses, but as a rule you get exactly what you asked for.
If you then declare another variable in the same function, that variable will generally follow arr in memory, unless the compiler makes certain optimizations again. For that reason, if you try to write to arr but write more characters than were actually allocated for arr, then you can overwrite other variables on the stack.
This is not really a function of gcc. All C compilers work essentially the same way.

Resources