I am relatively new to 'C' and would appreciate some insight on this topic.
Basically, I am trying to create a 16 MB array and check if the memory content is initialized to zero or '\0' or some garbage value for a school project.
Something like this:
char buffer[16*1024*1024];
I know there is a limit on the size of the program stack and obviously I get a segmentation fault. Can this somehow be done using malloc()?
You can initialize the memory with malloc like so:
#define MEM_SIZE_16MB ( 16 * 1024 * 1024 )
char *buffer = malloc(MEM_SIZE_16MB * sizeof(char) );
if (buffer == NULL ) {
// unable to allocate memory. stop here or undefined behavior happens
}
You can then check the values in memory so (note that this will print for a very very long time):
for (int i = 0; i < MEM_SIZE_16MB; i++) {
if( i%16 == 0 ) {
// print a newline and the memory address every 16 bytes so
// it's a little easier to read
printf("\nAddr: %08p: ", &buffer[i]);
}
printf("%02x ", buffer[i]);
}
printf("\n"); // one final newline
Don't forget to free the memory when finished
free(buffer);
Yes, you will probably need to do this using malloc(), and here's why:
When any program (process ... thread ...) is started, it is given a chunk of memory which it uses to store (among other things ...) "local" variables. This area is called "the stack." It most-certainly won't be big enough to store 16 megabytes.
But there's another area of memory which any program can use: its "heap." This area (as the name, "heap," is intended to imply ...) has no inherent structure: it's simply a pool of storage, and it's usually big enough to store many megabytes. You simply malloc() the number of bytes you need, and free() those bytes when you're through.
Simply define a type that corresponds to the structure you need to store, then malloc(sizeof(type)). The storage will come from the heap. (And that, basically, is what the heap is for ...)
Incidentally, there's a library function called calloc() which will reserve an area that is "known zero." Furthermore, it might use clever operating-system tricks to do so very efficiently.
Sure:
int memSize = 16*1024*1024;
char* buffer = malloc( memSize );
if ( buffer != 0 )
{
// check contents
for ( i = 0; i < memSize; i++ )
{
if ( buffer[i] != 0 )
{
// holler
break;
}
}
free( buffer );
}
Strictly speaking, code can not check if buffer is not zeroed without risking undefined behavior. Had the type been unsigned char, then no problem. But char, which may be signed, may have a trap value. Attempting to work with that value leads to UB.
char buffer[16*1024*1024];
// Potential UB
if (buffer[0]) ...
Better to use unsigned char which cannot have trap values.
#define N (16LU*1204*1204)
unsigned char *buffer = malloc(N);
if (buffer) {
for (size_t i = 0; i<N; i++) {
if (buffer[i]) Note_NonZeroValue();
}
}
// Clean-up when done.
free(buffer);
buffer = 0;
The tricky thing about C is that even if char does not have a trap value, some smart compiler could identify code is attempting something that is UB per the spec and then optimize if (buffer[0]) into nothing. Reading uninitialized non-unsigned char data is a no-no.
Related
I know that on your hard drive, if you delete a file, the data is not (instantly) gone. The data is still there until it is overwritten. I was wondering if a similar concept existed in memory. Say I allocate 256 bytes for a string, is that string still floating in memory somewhere after I free() it until it is overwritten?
Your analogy is correct. The data in memory doesn't disappear or anything like that; the values may indeed still be there after a free(), though attempting to read from freed memory is undefined behaviour.
Generally, it does stay around, unless you explicitly overwrite the string before freeing it (like people sometimes do with passwords). Some library implementations automatically overwrite deallocated memory to catch accesses to it, but that is not done in release mode.
The answer depends highly on the implementation. On a good implementation, it's likely that at least the beginning (or the end?) of the memory will be overwritten with bookkeeping information for tracking free chunks of memory that could later be reused. However the details will vary. If your program has any level of concurrency/threads (even in the library implementation you might not see), then such memory could be clobbered asynchronously, perhaps even in such a way that even reading it is dangerous. And of course the implementation of free might completely unmap the address range from the program's virtual address space, in which case attempting to do anything with it will crash your program.
From a standpoint of an application author, you should simply treat free according to the specification and never access freed memory. But from the standpoint of a systems implementor or integrator, it might be useful to know (or design) the implementation, in which case your question is then interesting.
If you want to verify the behaviour for your implementation, the simple program below will do that for you.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* The number of memory bytes to test */
#define MEM_TEST_SIZE 256
void outputMem(unsigned char *mem, int length)
{
int i;
for (i = 0; i < length; i++) {
printf("[%02d]", mem[i] );
}
}
int bytesChanged(unsigned char *mem, int length)
{
int i;
int count = 0;
for (i = 0; i < MEM_TEST_SIZE; i++) {
if (mem[i] != i % 256)
count++;
}
return count;
}
main(void)
{
int i;
unsigned char *mem = (unsigned char *)malloc(MEM_TEST_SIZE);
/* Fill memory with bytes */
for (i = 0; i < MEM_TEST_SIZE; i++) {
mem[i] = i % 256;
}
printf("After malloc and copy to new mem location\n");
printf("mem = %ld\n", mem );
printf("Contents of mem: ");
outputMem(mem, MEM_TEST_SIZE);
free(mem);
printf("\n\nAfter free()\n");
printf("mem = %ld\n", mem );
printf("Bytes changed in memory = %d\n", bytesChanged(mem, MEM_TEST_SIZE) );
printf("Contents of mem: ");
outputMem(mem, MEM_TEST_SIZE);
}
This is a small piece of code that I made while trying to understand how malloc and pointers work.
#include <stdio.h>
#include <stdlib.h>
int *buffer (int count)
{
int *buffer = malloc (count * sizeof(int));
for (int i = 0; 0 <= i && i < count; i++)
{
buffer[i] = 0;
}
return &buffer;
}
int main ()
{
int size = 0;
int i = 0;
scanf ("%d", &size);
int *num = buffer (size);
while (i < size)
{
scanf ("%d", &num[i]);
i++;
}
}
For some reason that I can't understand, I keep getting a segmentation fault. This error repeatedly happens on the last scanf() and I do not know why. I know i have to pass pointer to scan f and num is already a pointer so i thought that i would not need to include the &. But, I received a segmentation fault earlier if i do not. Also, I believe I have allocated the correct amount of space using malloc but I am not sure. Any help with what is happening here would be appreciated.
You returned the pointer to the local variable buffer, which will banish on exiting the function buffer.
You should remove the & used in the return statement and return the pointer to allocated buffer.
Also checking whether malloc() is successful should be added.
There are a couple of issues that I can see, and one of them is definitely a problem.
In function, int *buffer (int count)
return &buffer;
This will return address of buffer which is already a local int * variable.
So when the return happens, variable buffer would no longer be valid. Hence, the address is invalid.
One of the ways to go ahead as of now would be avoiding a function call buffer and using calloc().
Because, subject to availability, calloc() will allocate the memory of requested length, which will be initialized to 0 by default.
Or, the other way would be making the buffer pointer a global variable.
Also, with existing implementation, there needs a piece of code which checks if malloc returned anything or not. That would indicate if the memory was allocated or not.
Something like this would do:
int *buffer = malloc (count * sizeof(int));
if(buffer == NULL)
{
// Some error handling
return 0;
}
Additionally, I see the for loop which looks a bit weird than what it should look like:
for (int i = 0; 0 <= i && i < count; i++)
I take that you are trying to loop the count times and fill a 0 in buffer. This could have been achieved by
for (int i = 0; i < count; i++)
So, a malloc() is followed by en error-check and then followed by a for to fill the allocated memory with zeroes. So, using calloc makes life a lot easier.
Importantly, you allocate memory but you don't seem to have a code that de-allocates (frees) it. There are ample of examples to refer for doing that. I would recommend you to read concepts like Memory Leakage, Dangling Pointers and using valgrind or similar thing to validate the memory usage.
As a side-note and not a rule of thumb, always make sure that the names you use for variables are different than the names you use with functions. That creates a hell a lot of confusion. Going ahead with existing naming habit, you'll have a tough day when the code is reviewed.
I have an array, say, text, that contains strings read in by another function. The length of the strings is unknown and the amount of them is unknown as well. How should I try to allocate memory to an array of strings (and not to the strings themselves, which already exist as separate arrays)?
What I have set up right now seems to read the strings just fine, and seems to do the post-processing I want done correctly (I tried this with a static array). However, when I try to printf the elements of text, I get a segmentation fault. To be more precise, I get a segmentation fault when I try to print out specific elements of text, such as text[3] or text[5]. I assume this means that I'm allocating memory to text incorrectly and all the strings read are not saved to text correctly?
So far I've tried different approaches, such as allocating a set amount of some size_t=k , k*sizeof(char) at first, and then reallocating more memory (with realloc k*sizeof(char)) if cnt == (k-2), where cnt is the index of **text.
I tried to search for this, but the only similar problem I found was with a set amount of strings of unknown length.
I'd like to figure out as much as I can on my own, and didn't post the actual code because of that. However, if none of this makes any sense, I'll post it.
EDIT: Here's the code
int main(void){
char **text;
size_t k=100;
size_t cnt=1;
int ch;
size_t lng;
text=malloc(k*sizeof(char));
printf("Input:\n");
while(1) {
ch = getchar();
if (ch == EOF) {
text[cnt++]='\0';
break;
}
if (cnt == k - 2) {
k *= 2;
text = realloc(text, (k * sizeof(char))); /* I guess at least this is incorrect?*/
}
text[cnt]=readInput(ch); /* read(ch) just reads the line*/
lng=strlen(text[cnt]);
printf("%d,%d\n",lng,cnt);
cnt++;
}
text=realloc(text,cnt*sizeof(char));
print(text); /*prints all the lines*/
return 0;
}
The short answer is you can't directly allocate the memory unless you know how much to allocate.
However, there are various ways of determining how much you need to allocate.
There are two aspects to this. One is knowing how many strings you need to handle. There must be some defined way of knowing; either you're given a count, or there some specific pointer value (usually NULL) that tells you when you've reached the end.
To allocate the array of pointers to pointers, it is probably simplest to count the number of necessary pointers, and then allocate the space. Assuming a null terminated list:
size_t i;
for (i = 0; list[i] != NULL; i++)
;
char **space = malloc(i * sizeof(*space));
...error check allocation...
For each string, you can use strdup(); you assume that the strings are well-formed and hence null terminated. Or you can write your own analogue of strdup().
for (i = 0; list[i] != NULL; i++)
{
space[i] = strdup(list[i]);
...error check allocation...
}
An alternative approach scans the list of pointers once, but uses malloc() and realloc() multiple times. This is probably slower overall.
If you can't reliably tell when the list of strings ends or when the strings themselves end, you are hosed. Completely and utterly hosed.
C don't have strings. It just has pointers to (conventionally null-terminated) sequence of characters, and call them strings.
So just allocate first an array of pointers:
size_t nbelem= 10; /// number of elements
char **arr = calloc(nbelem, sizeof(char*));
You really want calloc because you really want that array to be cleared, so each pointer there is NULL. Of course, you test that calloc succeeded:
if (!arr) perror("calloc failed"), exit(EXIT_FAILURE);
At last, you fill some of the elements of the array:
arr[0] = "hello";
arr[1] = strdup("world");
(Don't forget to free the result of strdup and the result of calloc).
You could grow your array with realloc (but I don't advise doing that, because when realloc fails you could have lost your data). You could simply grow it by allocating a bigger copy, copy it inside, and redefine the pointer, e.g.
{ size_t newnbelem = 3*nbelem/2+10;
char**oldarr = arr;
char**newarr = calloc(newnbelem, sizeof(char*));
if (!newarr) perror("bigger calloc"), exit(EXIT_FAILURE);
memcpy (newarr, oldarr, sizeof(char*)*nbelem);
free (oldarr);
arr = newarr;
}
Don't forget to compile with gcc -Wall -g on Linux (improve your code till no warnings are given), and learn how to use the gdb debugger and the valgrind memory leak detector.
In c you can not allocate an array of string directly. You should stick with pointer to char array to use it as array of string. So use
char* strarr[length];
And to mentain the array of characters
You may take the approach somewhat like this:
Allocate a block of memory through a call to malloc()
Keep track of the size of input
When ever you need a increament in buffer size call realloc(ptr,size)
I know that on your hard drive, if you delete a file, the data is not (instantly) gone. The data is still there until it is overwritten. I was wondering if a similar concept existed in memory. Say I allocate 256 bytes for a string, is that string still floating in memory somewhere after I free() it until it is overwritten?
Your analogy is correct. The data in memory doesn't disappear or anything like that; the values may indeed still be there after a free(), though attempting to read from freed memory is undefined behaviour.
Generally, it does stay around, unless you explicitly overwrite the string before freeing it (like people sometimes do with passwords). Some library implementations automatically overwrite deallocated memory to catch accesses to it, but that is not done in release mode.
The answer depends highly on the implementation. On a good implementation, it's likely that at least the beginning (or the end?) of the memory will be overwritten with bookkeeping information for tracking free chunks of memory that could later be reused. However the details will vary. If your program has any level of concurrency/threads (even in the library implementation you might not see), then such memory could be clobbered asynchronously, perhaps even in such a way that even reading it is dangerous. And of course the implementation of free might completely unmap the address range from the program's virtual address space, in which case attempting to do anything with it will crash your program.
From a standpoint of an application author, you should simply treat free according to the specification and never access freed memory. But from the standpoint of a systems implementor or integrator, it might be useful to know (or design) the implementation, in which case your question is then interesting.
If you want to verify the behaviour for your implementation, the simple program below will do that for you.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* The number of memory bytes to test */
#define MEM_TEST_SIZE 256
void outputMem(unsigned char *mem, int length)
{
int i;
for (i = 0; i < length; i++) {
printf("[%02d]", mem[i] );
}
}
int bytesChanged(unsigned char *mem, int length)
{
int i;
int count = 0;
for (i = 0; i < MEM_TEST_SIZE; i++) {
if (mem[i] != i % 256)
count++;
}
return count;
}
main(void)
{
int i;
unsigned char *mem = (unsigned char *)malloc(MEM_TEST_SIZE);
/* Fill memory with bytes */
for (i = 0; i < MEM_TEST_SIZE; i++) {
mem[i] = i % 256;
}
printf("After malloc and copy to new mem location\n");
printf("mem = %ld\n", mem );
printf("Contents of mem: ");
outputMem(mem, MEM_TEST_SIZE);
free(mem);
printf("\n\nAfter free()\n");
printf("mem = %ld\n", mem );
printf("Bytes changed in memory = %d\n", bytesChanged(mem, MEM_TEST_SIZE) );
printf("Contents of mem: ");
outputMem(mem, MEM_TEST_SIZE);
}
I know it could be done using malloc, but I do not know how to use it yet.
For example, I wanted the user to input several numbers using an infinite loop with a sentinel to put a stop into it (i.e. -1), but since I do not know yet how many he/she will input, I have to declare an array with no initial size, but I'm also aware that it won't work like this int arr[]; at compile time since it has to have a definite number of elements.
Declaring it with an exaggerated size like int arr[1000]; would work but it feels dumb (and waste memory since it would allocate that 1000 integer bytes into the memory) and I would like to know a more elegant way to do this.
This can be done by using a pointer, and allocating memory on the heap using malloc.
Note that there is no way to later ask how big that memory block is. You have to keep track of the array size yourself.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv)
{
/* declare a pointer do an integer */
int *data;
/* we also have to keep track of how big our array is - I use 50 as an example*/
const int datacount = 50;
data = malloc(sizeof(int) * datacount); /* allocate memory for 50 int's */
if (!data) { /* If data == 0 after the call to malloc, allocation failed for some reason */
perror("Error allocating memory");
abort();
}
/* at this point, we know that data points to a valid block of memory.
Remember, however, that this memory is not initialized in any way -- it contains garbage.
Let's start by clearing it. */
memset(data, 0, sizeof(int)*datacount);
/* now our array contains all zeroes. */
data[0] = 1;
data[2] = 15;
data[49] = 66; /* the last element in our array, since we start counting from 0 */
/* Loop through the array, printing out the values (mostly zeroes, but even so) */
for(int i = 0; i < datacount; ++i) {
printf("Element %d: %d\n", i, data[i]);
}
}
That's it. What follows is a more involved explanation of why this works :)
I don't know how well you know C pointers, but array access in C (like array[2]) is actually a shorthand for accessing memory via a pointer. To access the memory pointed to by data, you write *data. This is known as dereferencing the pointer. Since data is of type int *, then *data is of type int. Now to an important piece of information: (data + 2) means "add the byte size of 2 ints to the adress pointed to by data".
An array in C is just a sequence of values in adjacent memory. array[1] is just next to array[0]. So when we allocate a big block of memory and want to use it as an array, we need an easy way of getting the direct adress to every element inside. Luckily, C lets us use the array notation on pointers as well. data[0] means the same thing as *(data+0), namely "access the memory pointed to by data". data[2] means *(data+2), and accesses the third int in the memory block.
The way it's often done is as follows:
allocate an array of some initial (fairly small) size;
read into this array, keeping track of how many elements you've read;
once the array is full, reallocate it, doubling the size and preserving (i.e. copying) the contents;
repeat until done.
I find that this pattern comes up pretty frequently.
What's interesting about this method is that it allows one to insert N elements into an empty array one-by-one in amortized O(N) time without knowing N in advance.
Modern C, aka C99, has variable length arrays, VLA. Unfortunately, not all compilers support this but if yours does this would be an alternative.
Try to implement dynamic data structure such as a linked list
Here's a sample program that reads stdin into a memory buffer that grows as needed. It's simple enough that it should give some insight in how you might handle this kind of thing. One thing that's would probably be done differently in a real program is how must the array grows in each allocation - I kept it small here to help keep things simpler if you wanted to step through in a debugger. A real program would probably use a much larger allocation increment (often, the allocation size is doubled, but if you're going to do that you should probably 'cap' the increment at some reasonable size - it might not make sense to double the allocation when you get into the hundreds of megabytes).
Also, I used indexed access to the buffer here as an example, but in a real program I probably wouldn't do that.
#include <stdlib.h>
#include <stdio.h>
void fatal_error(void);
int main( int argc, char** argv)
{
int buf_size = 0;
int buf_used = 0;
char* buf = NULL;
char* tmp = NULL;
char c;
int i = 0;
while ((c = getchar()) != EOF) {
if (buf_used == buf_size) {
//need more space in the array
buf_size += 20;
tmp = realloc(buf, buf_size); // get a new larger array
if (!tmp) fatal_error();
buf = tmp;
}
buf[buf_used] = c; // pointer can be indexed like an array
++buf_used;
}
puts("\n\n*** Dump of stdin ***\n");
for (i = 0; i < buf_used; ++i) {
putchar(buf[i]);
}
free(buf);
return 0;
}
void fatal_error(void)
{
fputs("fatal error - out of memory\n", stderr);
exit(1);
}
This example combined with examples in other answers should give you an idea of how this kind of thing is handled at a low level.
One way I can imagine is to use a linked list to implement such a scenario, if you need all the numbers entered before the user enters something which indicates the loop termination. (posting as the first option, because have never done this for user input, it just seemed to be interesting. Wasteful but artistic)
Another way is to do buffered input. Allocate a buffer, fill it, re-allocate, if the loop continues (not elegant, but the most rational for the given use-case).
I don't consider the described to be elegant though. Probably, I would change the use-case (the most rational).