This program reads a text file into a string array, line by line. I can't understand the meaning of two lines in the code:
char **words = (char **)malloc(sizeof(char*)*lines_allocated);
...
words = (char **)realloc(words,sizeof(char*)*new_size);
...
Please could you help me understand them?
char **words = (char **)malloc(sizeof(char*)*lines_allocated);
Allocates space for lines_allocated pointers. When you use a pointer to pointers, you need to allocate space for the pointers themselves, and then, for each of those pointers, allocate space for your data, in this case an array of char.
words = (char **)realloc(words,sizeof(char*)*new_size);
This changes the size of the buffer. The number of lines is unknown before you read the file, so you need to increase the number of pointers you have allocated as you go.
words points to a block that stores lines_allocated pointers at first and is then grown to hold new_size pointers when needed.
In your code you also have a line like this:
/* Allocate space for the next line */
words[i] = malloc(max_line_len);
Which will allocate each string separately.
Also, don't cast the result of malloc; see "Do I cast the result of malloc?"
The first line allocates a chunk of dynamic memory (creates space for an array of pointers to char); the second line resizes that chunk.
A better way to write both lines is
char **words = malloc( sizeof *words * lines_allocated); // no cast, operand of sizeof
char **tmp = realloc( words, sizeof *words * new_size );
if ( tmp )
words = tmp;
In C, you don't need to cast the result of either call, and it's considered bad practice to do so. Also, note the operand to sizeof; if you ever change the base type of words (from char to wchar_t, for example), you won't have to change the malloc or realloc calls.
realloc will return NULL if it can't extend the buffer, so it's safer to assign the result to a temporary variable first, otherwise you risk losing your reference to that memory, meaning you won't be able to access or release it.
The first line declares a pointer to a pointer to char and points it at a newly allocated block. A pointer in C can point to the first element of an array of its target type, so here words points to the first element of an array of pointers to char.
sizeof(char*) is the size of a pointer, and multiplying it by lines_allocated means that the number of pointers in the allocated array will be lines_allocated.
The second line reallocates the array of pointers so that it can now hold new_size pointers instead of lines_allocated pointers. If new_size is larger, the new pointers have indeterminate values and must be initialized before being used.
I have a char** which is designed to hold an unknown number of strings of unknown length
I've initially allocated 10 bytes using
char **array = malloc(10);
and similarly, before adding strings to this array, I allocate
array[num] = malloc(strlen(source)+1)
I've noticed that my program crashes upon adding the 6th element to the array
My question is, how does memory with these arrays work? When I allocated 20 bytes, nothing happened, yet when I allocated 30, it suddenly could hold 10 elements. These were all strings of 2-3 characters in size. I'm struggling to think of a condition to realloc memory with, e.g
if condition{
memoryofarray += x amount
realloc(array, memoryofarray)
}
What exactly uses the memory in the char**? I was under the impression that each byte corresponds to how many lines they can hold, i.e. malloc(10) would allow the array to hold 10 strings. I need to know this to establish conditions + to know how much to increment the memory allocated to the array by.
Also, curiously, when I malloced
array[num] = malloc(0)
before assigning a string to that array element, it worked without problems. Don't you need to at least have strlen amount of bytes to store strings? This is confusing me massively
This line:
char **array = malloc(10);
allocates 10 bytes, however, remember that a pointer is not the same size as a byte.
Therefore you need to make sure you allocate an array of sufficient size by using the size of the related type:
char **array = malloc(10 * sizeof(char*));
Now that you have an array of 10 pointers you need to allocate memory for each of the 10 strings, e.g.
array[0] = malloc(25 * sizeof(char));
Here sizeof(char) is not needed but I added it to make it more obvious how malloc works.
If you want to hold 10 strings, then you need to allocate memory for 10 char *'s and then allocate memory for each of those char pointers. You allocated only 10 bytes, which is not enough for 10 char *'s. Allocate like this -
char **array = malloc(10*sizeof(char *)); // allocate memory for 10 char *'s
And then do what you were doing -
array[num] = malloc(strlen(source)+1); // allocate the desired memory for each pointer
Note - take care that num is initialized and never indexes outside the bounds of the array.
This will allocate enough memory for 10 pointers to char (char*) in array
char **array = malloc(10*sizeof(array[0]));
On a typical 64-bit system the size of a char* is 8 bytes = 64 bits. The size of a char is always 1 byte, which is 8 bits on virtually all current platforms.
The advantage of using sizeof(array[0]) instead of sizeof(char*) is that it's easier to change the type of array in the future.
char** is pointer to a pointer to char. It may point to the start of a memory block in the heap with pointers to char. Similarly char* is a pointer to char and it may point to the start of a memory block of char on the heap.
If you write beyond the allocated memory you get undefined behaviour. If you are lucky it may actually behave well! So when you do for example :
array[num] = malloc(0);
you may randomly not get a segmentation fault out of (good) luck.
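Your use of realloc is wrong. realloc may have to move the memory block whose size you want to increase, in which case it returns a new pointer, so you must use its return value. It returns NULL on failure, so assign the result to a temporary first to avoid losing the original block. Use it like this :
if (condition) {
char **tmp = realloc(array, memoryofarray + amount);
if (tmp != NULL) {
array = tmp;
memoryofarray += amount;
}
}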
Rather than allocating memory using the fault-prone style
pointer = malloc(n); // or
pointer = malloc(n * sizeof(type_of_pointer));
Use
pointer = malloc(sizeof *pointer * n);
Then
// Bad: certainly fails to allocate memory for 10 `char *` pointers
// char **array = malloc(10);
// Good
char **array = malloc(sizeof *array * 10);
how does memory with these arrays work?
If insufficient memory is allocated, it does not work. So step 1: allocate sufficient memory.
Concerning array[num] = malloc(0). An allocation of 0 may return NULL or a pointer to no writable memory or a pointer to some writable memory. Writing to that pointer memory is undefined behavior (UB) in any of the 3 cases. Code may crash, may "work", it is simply UB. Code must not attempt writing to that pointer.
To be clear: "worked without problems" does not mean code is correct. C is coding without a net. Should code do something wrong (UB), the language is not obliged to catch that error. So follow safe programming practices.
First allocate an array of pointers:
char* (*array)[n] = malloc( sizeof(*array) );
Then for each item in the array, allocate the variable-length strings individually:
for(size_t i=0; i<n; i++)
{
(*array)[i] = malloc( some_string_length );
}
When I run the below code, I get the given output.
#include <stdio.h> /* printf */
#include <stdlib.h> /* malloc, realloc */
int main()
{
char* characters = (char *) malloc(10 * sizeof(char));
printf("Before: characters at %p, size=%lu\n", (void *) characters, sizeof(characters) / sizeof(characters[0]));
char* temp = (char *) realloc(characters, 100);
if (temp)
{
printf("After realloc, characters size = %lu, temp size = %lu\n", sizeof(characters) / sizeof(characters[0]), sizeof(temp) / sizeof(temp[0]));
printf("After realloc, nums at %p, temp at %p\n", (void *) characters, (void *) temp);
//characters = temp;
free(temp);
}
free(characters);
}
/* Output:
Before: characters at 0x107b00900, size=8
After realloc, characters size = 8, temp size = 8
After realloc, nums at 0x107b00900, temp at 0x107b00910
test(3556) malloc: *** error for object 0x107b00900: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
*/
I'm trying to figure out what is malfunctioning.
I think that malloc sets aside space for ten consecutive characters and gives me a pointer to the first element of that array, and I store that pointer in characters. I then print the size of characters, and I expect 10, but I get 8. That's the first weird thing. Then, I ask realloc to find a new spot in the heap, which has space for 100 consecutive characters and return to me a pointer to the first spot. I put that pointer into temp. When I print temp's size, I get 8 (again), even though I expect 100. When I print the pointers to characters and temp, I get two different locations in the heap. Usually, I would then reassign the characters pointer to point to whatever temp is pointing to. Then I tried to free my pointers, and it told me that it couldn't free 0x107b00900, the exact location characters is, because the object at that point wasn't malloced; however, I did malloc space for characters. Why is this happening? Am I misunderstanding the functionality of malloc, realloc, sizeof, or something else? Thank you for your help.
There is no standard way to find the actual size allocated by malloc/realloc. The trick of dividing the size of characters by the size of *characters works only for arrays; it does not work for pointers, where sizeof yields the size of the pointer itself.
The library does not keep track of the size of the allocated chunk of memory in a way that would be available to the users of the standard library. It may store some information for its own use, but there is no user-facing call to retrieve it.
A common way of working around this issue is to store the size of the allocation along with the pointer to the allocated area.
You can't use sizeof on your pointer to get the amount of memory allocated.
char* characters = (char *) malloc(10 * sizeof(char));
The characters variable does not know how much space it is pointing to. It's your job to keep track of it.
As far as
char* temp = (char *) realloc(characters, 100);
realloc can move the memory block - which is what happens here. When it does that it marks the memory originally pointed to by characters as unallocated. Thus, when you free characters on the last line, you get an error because you are freeing memory that the system has marked as unallocated.
I've read various tutorials on pointers, and I now come with a question,
is this:
char *input = malloc(sizeof(char)*24);
the same as
char *input[24];
I was under the impression that the malloc will also create my space on the heap with 24 slots. Usually, I see char input[24], but the char *input[24] I figured was a simpler way than mallocing.
Thanks!
No, they are not the same.
char *input = malloc(sizeof(char)*24);
will allocate a block of 24 char's on the heap and assign a pointer to the start of that block to input. (technically you are just telling it to allocate x number of bytes where x is 24 times the size, in bytes, of each char)
char *input[24];
will create an array of 24 char pointers on the stack. As written, these pointers are uninitialized: they hold indeterminate (garbage) values and do not point to anything usable yet.
For the second example, you could then take each pointer in the array input and allocate something for it to point to on the heap. Ex:
char *input[NUM_STRS];
for( int i = 0; i < NUM_STRS; i++ )
{
input[i] = malloc( MAX_STR_LEN * sizeof(char) );
}
Then you would have an array of character pointers on the stack. Each one of these pointers would point to a block of characters on the heap.
Keep in mind, however, that things on the stack will be popped off when the function exits and that variable goes out of scope. If you malloc something, that pointer will be valid until it is freed, but the same is not true of an array created on the stack.
EDIT:
Based on your comment, here is an example of making 24 character pointers on the heap and allocating space for them to point to:
#define NUM_STRS 24
#define MAX_STR_LEN 32
char **input = malloc( sizeof(char *) * NUM_STRS );
for( int i = 0; i < NUM_STRS; i++ )
{
input[i] = malloc( sizeof(char) * MAX_STR_LEN );
}
Please keep in mind that with this example you will have to free each pointer in input, and then input itself at the appropriate time to avoid leaking memory.
These are not the same at all.
char *input = malloc(sizeof(char)*24);
This allocates enough memory to hold 24 char, and assigns the address to input (a pointer). This memory is dynamically-allocated, so it needs to be released at some point with an appropriate call to free().
char *input[24];
This declares input to be an array of 24 pointers. This has automatic storage, which means you do not need to free it. (However, you may need to free the things being pointed to by each of the pointers, but that's a different matter!)
Fundamentally, the types of the two variables are different. In the first case, you declare a pointer to char that points to memory dynamically allocated by malloc (which you are morally obligated to free at a later instant). In the second case, you declare an array of pointers to char.
Couple of observations:
sizeof(char) is one by definition, so you can leave that out. (No, it does not serve a documenting purpose. You are better off rewriting it as char *input = malloc( 24 * sizeof *input );)
Very seldom will the call to malloc have an integer literal. If it does (24 in your example), then usually one would prefer to have an array to hold that (there are some considerations regarding stack usage that I am side-stepping here).
hope this helps,
You can compare it better to char input[24]; (note no *). With that you can use input in the same way but the memory is on the stack instead of on the heap.
In this code, the "array" is an array of pointers to chars? Or something else?
struct tmep{
char (*array) [SIZE];
}
Thanks in advance :)
It's a pointer to an array of SIZE chars.
Declaration mimics use, so you evaluate the parenthesis first, (*array) gives you a char[SIZE].
To allocate, the stable version is as usual
array = malloc(num_elements * sizeof *array);
to specify the size of each object (char[SIZE] here) in the block by taking the sizeof the dereferenced pointer. You don't need to change that allocation if the type changes e.g. to int (*)[SIZE].
If you want to specify the type,
array = malloc(num_elements * sizeof(char[SIZE]));
This allocates - if malloc succeeds - a block large enough for num_elements arrays of SIZE chars, each of these arrays is accessed with
array[i]
and the chars in the arrays in the block with
array[i][j]
What form is correct in allocating string in C?
char *sample;
sample = malloc ( length * sizeof(char) );
or
sample = malloc ( length * sizeof(char*) );
Why does char* take 4 bytes when char takes 1 byte?
Assuming the goal is to store a string of length characters, the correct allocation is:
sample = malloc(length + 1);
Notes:
Don't use sizeof (char), since it's always 1 it doesn't add any value.
Remember the terminator, I assumed (based on name) that length is the length in visible characters of the string, i.e. the return of strlen() will be length.
I know you didn't, but it's worth pointing out that there should be no cast of the return value from malloc(), either.
The reason char * is larger is that it's a pointer type, and pointers are almost always larger than a single character. On many systems (such as yours, it seems) they are 32 bit, while characters are just 8 bits. The larger size is needed since the pointer needs to be able to represent any address in the machine's memory. On 64-bit computers, pointers are often 64 bits, i.e. 8 characters.
Why does char* take 4 bytes when char takes 1 byte?
Because you are on a 32-bit system, meaning that pointers take four bytes; char* is a pointer.
char always takes exactly one byte, so you do not need to multiply by sizeof(char):
sample = malloc (length);
I am assuming that length is already padded for null termination.
sample = malloc ( length * sizeof(char) );
First is the correct one if you want to allocate memory for length number of characters.
char* is of type pointer which happens to be 4 bytes on your platform. So sizeof(char*) returns 4.
But sizeof(char) is always 1, and this is guaranteed by the C standard.
In the given cases you are doing two different things:
In the first case : sample = malloc ( length * sizeof(char) );
You are allocating length multiplied by the size of type char which is 1 byte
While in the second case : sample = malloc ( length * sizeof(char*) );
You are allocating length multiplied by the size of a pointer to char, which is 4 bytes on your machine.
Note that the size in the first case is the same on every platform, while in the second case it depends on the platform's pointer size.
sample = malloc(length);
is the right one
char* is a pointer, a pointer uses 4 bytes (say on a 32-bit platform)
char is a char, a char uses 1 byte
In your case, you want to alloc an array of length characters. You will store in sample a pointer to an array of length times the size of what you point to. The sizeof(char*) is the size of a pointer to char. Not the size of a char.
A good practice is
sample = malloc(length * sizeof(*sample));
Using that, you will reserve length times the size of what you want to point to. This gives you the ability to change the data type at any time, simply by declaring sample with another type.
int *sample;
sample = malloc(length * sizeof(*sample)); // length * sizeof(int), typically length * 4
char *sample;
sample = malloc(length * sizeof(*sample)); // length * 1
Provided the length already accounts for the nul terminator, I would write either:
sample = malloc(length);
or:
sample = malloc(length * sizeof(*sample));
sizeof(char*) is the size of the pointer, and it is completely irrelevant to the size that the allocated buffer needs to be. So definitely don't use that.
My first snippet is IMO good enough for string-manipulation code. C programmers know that memory and string lengths in C are both measured in multiples of sizeof(char). There's no real need to put a conversion factor in there that everybody knows is always 1.
My second snippet is the One True Way to write allocations in general. So if you want all your allocations to look consistent, then string allocations should use it too. I can think of two possible reasons to make all your allocations look consistent (both fairly weak IMO, but not actually wrong):
some people will find it easier to read them that way, only one visual pattern to recognise.
you might want to use the code in future as the basis for code that handles wide strings, and a consistent form would remind you to get the allocation right when the length is no longer measured in bytes but in wide chars. Using sizeof(*sample) as the consistent form means you don't need to change that line of code at all, assuming that you update the type of sample at the same time as the units in which length is measured.
Other options include:
sample = calloc(length, 1);
sample = calloc(length, sizeof(char));
sample = calloc(length, sizeof(*sample));
They're probably fairly pointless here, but as well as the trifling secondary effect of zeroing the memory, calloc has an interesting difference from malloc that it explicitly separates the number and size of objects that you're planning to use, whereas malloc just wants the total size.
For any type T, the usual form is
T *p = malloc(N * sizeof *p);
or
T *p;
...
p = malloc(N * sizeof *p);
where N is the number of elements of type T you wish to allocate. The expression *p has type T, so sizeof *p is equivalent to sizeof (T).
Note that sizeof is an operator like & or *, not a library function; parentheses are only necessary if the operand is a type name like int or char *.
Please visit this link, https://www.codesdope.com/c-dynamic-memory/, to understand how memory is allocated dynamically at run time. It may help you understand the concept of malloc and how it allocates memory for a variable.
In your example;
char *sample;
sample = malloc ( length * sizeof(char) );
Here, you declare a pointer to char named sample without saying how much memory it needs. In the next line, malloc allocates length * sizeof(char) bytes and the address of that block is assigned to sample. If a (char*) cast is written in front of malloc, it converts the void * returned by malloc to a pointer to char; in C that conversion happens implicitly, so the cast is optional.