Which pointer value is the max for a malloc call - c

Fairly simple question regarding malloc. What is the max that I can set within the allocated area. For instance:
char *buffer;
buffer = malloc(20);
buffer[19] = 'a'; //Is this the highest spot I can set?
buffer[20] = 'a'; //Or is this the highest spot I can set?
free(buffer);

The phrasing of your question is a bit off. You mean "what is the maximum index I can use for an allocated block of memory". The answer is the same as for arrays.
If you are reading or writing the memory, you may safely use indices between (and including) 0 and one less than the size of the block (in your case, that means index 19). All up, that means you can access the 20 values that you asked for.
If you are simply obtaining the pointer for comparison with other pointers inside the same block (and you are not going to read or write to it), you may additionally obtain the pointer one-past-the-end (in your case that means index 20).
To clarify these things with examples:
Yes, buffer[19] = 'a'; is the very last value you may access in a read or write capacity. Don't forget that if you want to store a string in this memory, and hand it to functions that expect a null-terminated string, this slot is your last chance to put that value of '\0'.
You are allowed to take the address of buffer[20] (without reading or writing through it) in the following manner:
char *p;
for( p = &buffer[0]; p != &buffer[20]; ++p )
{
putc( *p, stdout );
}
This is useful because of the way we tend to iterate over memory and store sizes. It would make our code much less readable if we had to subtract 1 all over the place.
Oh, and it gives you the neat trick:
size_t buf_size = 20;
char *buffer = malloc(buf_size);
char *start = buffer;
char *end = buffer + buf_size;
size_t oops_i_forgot_the_size = end - start;
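As a rough sketch of the rules above, the two helpers below walk a block with a one-past-the-end pointer and recover its length from the two pointers. The names fill_block and block_length are made up for this illustration; they are not part of any library.

```c
#include <stddef.h>

/* Fill every valid element of a block of n chars. */
void fill_block(char *block, size_t n, char value)
{
    char *end = block + n;   /* one past the end: may be formed and compared, never dereferenced */
    for (char *p = block; p != end; ++p)
        *p = value;          /* p only ever visits indices 0 .. n-1 */
}

/* Recover the element count from a start pointer and a one-past-the-end pointer. */
size_t block_length(const char *start, const char *end)
{
    return (size_t)(end - start);
}
```

With buffer = malloc(20), calling fill_block(buffer, 20, 'a') writes buffer[0] through buffer[19] and never touches buffer[20].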

malloc(x) will allocate x bytes.
So by accessing buffer[0] you access the first byte, by accessing buffer[1] you access the second.
e.g
char * buffer = (char *) malloc(1);
buffer[0] = 0; // legal
buffer[1] = 0; // illegal

Related

Memory, pointers, and pointers to pointers

I am working on a short program that reads a .txt file. Initially, I was playing around in the main function, and I had gotten my code to work just fine. Later, I decided to abstract it into a function. Now, I cannot seem to get my code to work, and I have been hung up on this problem for quite some time.
I think my biggest issue is that I don't really understand what is going on at a memory/hardware level. I understand that a pointer simply holds a memory address, and a pointer to a pointer simply holds the memory address of another memory address, a short breadcrumb trail to what we really want.
Yet, now that I am introducing malloc() to expand the amount of memory allocated, I seem to lose sight of what's going on. In fact, I am not really sure how to think of memory at all anymore.
So, a char takes up a single byte, correct?
If I understand correctly, then by default a char* takes up a single byte of memory?
If we were to have a:
char* str = "hello"
Would it be safe to say that it takes up 6 bytes of memory (including the null character)?
And, if we wanted to allocate memory for some "size" unknown at compile time, then we would need to dynamically allocate memory.
int size = determine_size();
char* str = NULL;
str = (char*)malloc(size * sizeof(char));
Is this syntactically correct so far?
Now, if you would judge my interpretation. We are telling the compiler that we need "size" number of contiguous memory reserved for chars. If size was equal to 10, then str* would point to the first address of 10 memory addresses, correct?
Now, if we could go one step further.
int size = determine_size();
char* str = NULL;
file_read("filename.txt", size, &str);
This is where my feet start to leave the ground. My interpretation is that file_read() looks something like this:
int file_read(char* filename, int size, char** buffer) {
// Set up FILE stream
// Allocate memory to buffer
buffer = malloc(size * sizeof(char));
// Add characters to buffer
int i = 0;
char c;
while((c=fgetc(file))!=EOF){
*(buffer + i) = (char)c;
i++;
}
Allocating the memory and adding the characters to the buffer is what I cannot seem to wrap my head around.
If **buffer is pointing to *str which is equal to null, then how do I allocate memory to *str and add characters to it?
I understand that this is lengthy, but I appreciate the time you all are taking to read this! Let me know if I can clarify anything.
EDIT:
Whoa, my code is working now, thanks so much!
Although, I don't know why this works:
*((*buffer) + i) = (char)c;
So, a char takes up a single byte, correct?
Yes.
If I understand correctly, by default a char* takes up a single byte of memory.
Your wording is somewhat ambiguous. A char takes up a single byte of memory. A char * can point to one char, i.e. one byte of memory, or a char array, i.e. multiple bytes of memory.
The pointer itself takes up more than a single byte. The exact size is implementation-defined, usually 4 bytes (32-bit) or 8 bytes (64-bit). You can check the exact value with printf( "%zu\n", sizeof(char *) ).
If we were to have a char* str = "hello", would it be say safe to assume that it takes up 6 bytes of memory (including the null character)?
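Yes. The string literal occupies 6 bytes; the pointer str that refers to it occupies its own sizeof(char *) bytes in addition.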
And, if we wanted to allocate memory for some "size" unknown at compile time, then we would need to dynamically allocate memory.
int size = determine_size();
char* str = NULL;
str = (char*)malloc(size * sizeof(char));
Is this syntactically correct so far?
Do not cast the result of malloc. And sizeof(char) is by definition always 1, so malloc(size) is enough.
If size was equal to 10, then str* would point to the first address of 10 memory addresses, correct?
Yes. Well, almost. str* makes no sense, and it's 10 chars, not 10 memory addresses. But str would point to the first of the 10 chars, yes.
Now, if we could go one step further.
int size = determine_size();
char* str = NULL;
file_read("filename.txt", size, &str);
This is where my feet start to leave the ground. My interpretation is that file_read() looks something like this:
int file_read(char* filename, int size, char** buffer) {
// Set up FILE stream
// Allocate memory to buffer
buffer = malloc(size * sizeof(char));
No. You would write *buffer = malloc( size );. The idea is that the memory you are allocating inside the function can be addressed by the caller of the function. So the pointer provided by the caller -- str, which is NULL at the point of the call -- needs to be changed. That is why the caller passes the address of str, so you can write the pointer returned by malloc() to that address. After your function returns, the caller's str will no longer be NULL, but contain the address returned by malloc().
buffer is the address of str, passed to the function by value. Allocating to buffer would only change that (local) pointer value.
Allocating to *buffer, on the other hand, is the same as allocating to str. The caller will "see" the change to str after your file_read() returns.
Although, I don't know why this works: *((*buffer) + i) = (char)c;
buffer is the address of str.
*buffer is, basically, the same as str -- a pointer to char (array).
(*buffer) + i is pointer arithmetic -- the pointer *buffer plus i means a pointer to the ith element of the array.
*((*buffer) + i) is dereferencing that pointer to the ith element -- a single char.
to which you are then assigning (char)c.
A simpler expression doing the same thing would be:
(*buffer)[i] = (char)c;
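Putting these pieces together, a complete file_read() might look roughly like the sketch below. The choices to treat size as the buffer capacity including the terminating '\0', and to return -1 on failure, are assumptions made for this example and not something the question specifies.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads at most size - 1 characters from filename into a newly allocated,
   null-terminated buffer and stores its address in *buffer.
   Returns the number of characters read, or -1 on failure. */
int file_read(const char *filename, int size, char **buffer)
{
    if (size <= 0)
        return -1;

    FILE *file = fopen(filename, "r");
    if (file == NULL)
        return -1;

    *buffer = malloc((size_t)size);   /* writes the new address into the caller's pointer */
    if (*buffer == NULL) {
        fclose(file);
        return -1;
    }

    int i = 0;
    int c;                            /* int, so that EOF can be told apart from a real char */
    while (i < size - 1 && (c = fgetc(file)) != EOF) {
        (*buffer)[i] = (char)c;
        i++;
    }
    (*buffer)[i] = '\0';

    fclose(file);
    return i;
}
```

The caller passes &str exactly as in the question, and is responsible for calling free(str) once a non-negative value has been returned.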
With char **buffer, buffer stands for the pointer to the pointer to the char, *buffer accesses the pointer to a char, and **buffer accesses the char value itself.
To pass back a pointer to a new array of chars, write *buffer = malloc(size).
To write values into the char array, write *((*buffer) + i) = c, or (probably simpler) (*buffer)[i] = c
See the following snippet demonstrating what's going on:
#include <stdio.h>
#include <stdlib.h>
void generate0to9(char** buffer) {
*buffer = malloc(11); // *buffer dereferences the pointer to the pointer buffer one time, i.e. it writes a (new) pointer value into the address passed in by `buffer`
for (int i=0;i<=9;i++) {
//*((*buffer)+i) = '0' + i;
(*buffer)[i] = '0' + i;
}
(*buffer)[10]='\0';
}
int main(void) {
char *b = NULL;
generate0to9(&b); // pass a pointer to the pointer b, such that the pointer`s value can be changed in the function
printf("b: %s\n", b);
free(b);
return 0;
}
Output:
b: 0123456789

store characters in character pointer

I have a thread which parses incoming characters/bytes one by one.
I would like to store the sequence of bytes in a byte pointer, and in the end when the sequence of "\r\n" is found it should print the full message out.
unsigned char byte;
unsigned char *bytes = NULL;
while (true){ // thread which is running on the side
byte = get(); // gets 1 byte from I/O
bytes = byte; //
*bytes++;
if (byte == 'x'){ // for now instead of "\r\n" i use the char 'x'
printf( "Your message: %s", bytes);
bytes = NULL; // or {0}?
}
}
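You should define bytes as an array whose size is the maximum message length, not as a pointer.
unsigned char byte;
unsigned char arr[10]; // 10 for example: up to 9 characters plus the terminating '\0'
size_t i = 0;
while (true){
byte = get();
if (byte == 'x'){
arr[i] = '\0'; // terminate the string before printing it with %s
printf( "Your message: %s", arr);
i = 0; // start the next message at the beginning of the array
} else if (i < sizeof arr - 1){ // leave room for '\0'; bytes that do not fit are dropped
arr[i] = byte;
i++;
}
}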
When you define bytes as a pointer and set it to NULL, it points to nothing, and writing through it may overwrite other data in your program or crash. You can make it an array, or allocate space for it at run time using malloc.
Your Code
unsigned char byte;
unsigned char *bytes = NULL;
while (true){
Nothing wrong here, but some things must be cleared:
Did you alloc memory for your bytes buffer? That is, using malloc() family functions?
If so, did you check malloc()'s return value and make sure the pointer is ok?
Did you include stdbool.h to use true and false?
Moving on...
byte = get();
bytes = byte;
*bytes++;
I'm assuming get() returns an unsigned char, since you didn't give the code.
Problem: bytes = byte. You're assigning an unsigned char to an unsigned char *. That's bad because unsigned char * is expecting a memory address (aka pointer) and you're giving it a character (which translates into a really bad memory address, cause you're giving addresses up to 255, which your program isn't allowed to access), and your compiler certainly complained about that assignment...
*bytes++ has two "problems" (not being really problems): one, you don't need the * (dereferencing) operator to just increment the pointer, you could've done bytes++; two, it would have been shorter and easier to understand if you had merged this line and the previous one (bytes = byte) into *bytes++ = byte. If you don't know what this statement does, I suggest reading up on operator precedence and assignment operators.
Then we have...
if (byte == 'x'){
printf( "Your message: %s", bytes);
bytes = NULL;
}
if's alright.
printf() is messed up because you've been incrementing your bytes pointer the whole time while you were get()ting those characters. This means that the current location pointed by bytes is the end of your string (or message). To correct this, you can do one of two things: one, have a counter on the number of bytes read and then use that to decrement the bytes pointer and get the correct address; or two, use a secondary auxiliary pointer (which I prefer, cause it's easier to understand).
bytes = NULL. If you did malloc() for your bytes buffer, here you're destroying that reference, because you're making an assignment that replaces the address held in the pointer with NULL. Anyway, what you need to clear that buffer is memset(). Read more about it in the manual.
Another subtle (but serious) problem is the end-of-string character, which you forgot to put in that string altogether. Without it, printf() will start printing really weird things past your message until a Segmentation Fault or the like happens. To add it, you can use your already incremented bytes pointer and do *bytes = 0x0 or *bytes = '\0'. The null terminating byte is used in a string so that functions know where the string ends. Without it, it would be really hard to manipulate strings.
Code
unsigned char byte;
unsigned char *bytes = NULL;
unsigned char *bytes_aux;
bytes = malloc(500);
if (!bytes) return;
bytes_aux = bytes;
while (true) { /* could use while(1)... */
byte = get();
*bytes++ = byte;
if (byte == 'x') {
*(bytes - 1) = 0x0;
bytes = bytes_aux;
printf("Your message: %s\n", bytes);
memset(bytes, 0, 500);
}
}
if ((*bytes++ = get()) == 'x') is a compound version of the three byte = get(); *bytes++ = byte; if (byte == 'x'). Refer to that assignment link I told you about! This is a neat way of writing it and will make you look super cool at parties!
*(bytes - 1) = 0x0; The -1 bit is to exclude the x character which was saved in the string. With one step we exclude the x and set the null terminating byte.
bytes = bytes_aux; This restores bytes to its default state - now it correctly points to the beginning of the message.
memset(bytes, 0, 500) The function I told you about to reset your string.
Using memset is not necessary in this particular case. On every loop repetition we're saving characters from the beginning of the bytes buffer forward. Then, we set a null terminating byte and restore the pointer to its original position, effectively overwriting all other data. The null byte will take care of preventing printf() from printing whatever lies after the end of the current message. So the memset() part can be skipped and precious CPU time saved!
Somewhere when you get out of that loop (if you do), remember to free() the bytes pointer! You don't want that memory leaking...
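One thing the loop above does not do is stop writing when more than 500 bytes arrive before the terminator. As a hedged sketch of one way to add that check, the helper below keeps an explicit length next to a fixed buffer. The struct message type, the MSG_CAPACITY constant and the message_push function are names invented for this example, and dropping the bytes that do not fit is just one possible policy.

```c
#include <stddef.h>

#define MSG_CAPACITY 64   /* hypothetical maximum message size, including the '\0' */

struct message {
    char   data[MSG_CAPACITY];
    size_t len;           /* number of characters stored so far */
};

/* Feed one received byte into the message.
   Returns 1 when the terminator arrives and data holds a complete,
   null-terminated message; returns 0 otherwise. */
int message_push(struct message *m, unsigned char byte, unsigned char terminator)
{
    if (byte == terminator) {
        m->data[m->len] = '\0';      /* terminate in place of the terminator byte */
        m->len = 0;                  /* the next message starts at the beginning */
        return 1;
    }
    if (m->len < MSG_CAPACITY - 1)   /* leave room for '\0'; extra bytes are dropped */
        m->data[m->len++] = (char)byte;
    return 0;
}
```

In the receiving loop this would be used as if (message_push(&m, get(), 'x')) printf("Your message: %s\n", m.data);, with m initialised so that m.len is 0.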

Using calloc and manually inputting chars results in a crash

I have this code right here.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int *size;
int i = 0;
char buf[] = "Thomas was alone";
size = (int*)calloc(1, sizeof(buf)+1);
for(i=0; i<strlen(buf); i++)
{
*(size+i) = buf[i];
printf("%c", *(size+i));
}
free(size);
}
To my understanding calloc reserves a memspace the size of the first arg multiplied by the second, in this case 18. The length of buf is 17 and thus the for loop should not have any problems at all.
Running this program results in the expected results ( It prints Thomas was alone ), however it crashes immediately too. This persists unless I crank up the size of calloc ( like multiplied by ten ).
Am I perhaps understanding something wrongly?
Should I use a function to prevent this from happening?
int *size means you need:
size = calloc(sizeof(int), sizeof(buf));
You allocated enough space for an array of char, but not an array of int (unless you're on an odd system where sizeof(char) == sizeof(int), which is a theoretical possibility rather than a practical one). That means your code writes well beyond the end of the allocated memory, which is what leads to the crashing. Or you can use char *size in which case the original call to calloc() is OK.
Note that sizeof(buf) includes the terminal null; strlen(buf) does not. That means you overallocate slightly with the +1 term.
You could also perfectly sensibly write size[i] instead of *(size+i).
Change the type of size to char *.
You are using an int *, and when you add to the pointer here, *(size+i), you go out of bounds.
Pointer arithmetic takes account of the type, which in your case is int, not char. sizeof(int) is larger than sizeof(char) on your system.
You allocate space for a char array, not for an int array:
char is 1 byte in memory (always, by definition)
int is 4 bytes in memory (most often)
so you allocate 1 * (sizeof(buf) + 1) = 18 bytes
So, for example, the elements of buf sit at consecutive addresses in memory:
&buf[0] = 0x34523
&buf[1] = 0x34524
&buf[2] = 0x34525
&buf[3] = 0x34526
But when you use *(size + 1) you don't move the pointer by 1 byte but by sizeof(int), so by 4 bytes.
So in memory it will look like:
&size[0] = 0x4560
&size[1] = 0x4564
&size[2] = 0x4568
&size[3] = 0x456C
So after a few iterations you are past the end of the allocated memory.
Change calloc(1, sizeof(buf) + 1); to calloc(sizeof(int), sizeof(buf) + 1); to have enough memory.
Second, I assume this is an example you are using to learn how this works?
My suggestion:
Use the same type for the pointer and the variable.
When you assign between different types of variables, use an explicit conversion, in this example
*(size+i) = (int)buf[i];

How do I make a function return a pointer to a new string in C?

I'm reading K&R and I'm almost through the chapter on pointers. I'm not entirely sure if I'm going about using them the right way. I decided to try implementing itoa(n) using pointers. Is there something glaringly wrong about the way I went about doing it? I don't particularly like that I needed to set aside a large array to work as a string buffer in order to do anything, but then again, I'm not sure if that's actually the correct way to go about it in C.
Are there any general guidelines you like to follow when deciding to use pointers in your code? Is there anything I can improve on in the code below? Is there a way I can work with strings without a static string buffer?
/*Source file: String Functions*/
#include <stdio.h>
static char stringBuffer[500];
static char *strPtr = stringBuffer;
/* Algorithm: n % 10^(n+1) / 10^(n) */
char *intToString(int n){
int p = 1;
int i = 0;
while(n/p != 0)
p*=10, i++;
for(;p != 1; p/=10)
*(strPtr++) = ((n % p)/(p/10)) + '0';
*strPtr++ = '\0';
return strPtr - i - 1;
}
int main(){
char *s[3] = {intToString(123), intToString(456), intToString(78910)};
printf("%s\n",s[2]);
int x = stringToInteger(s[2]);
printf("%d\n", x);
return 0;
}
Lastly, can someone clarify for me what the difference between an array and a pointer is? There's a section in K&R that has me very confused about it; "5.5 - Character Pointers and Functions." I'll quote it here:
"There is an important difference between the definitions:
char amessage[] = "now is the time"; /*an array*/
char *pmessage = "now is the time"; /*a pointer*/
amessage is an array, just big enough to hold the sequence of characters and '\0' that
initializes it. Individual characters within the array may be changed but amessage will
always refer to the same storage. On the other hand, pmessage is a pointer, initialized
to point to a string constant; the pointer may subsequently be modified to point
elsewhere, but the result is undefined if you try to modify the string contents."
What does that even mean?
For itoa, the resulting string can't be longer than the decimal representation of INT_MIN (the digits of INT_MAX plus a minus sign), so you'd be safe with a buffer of that length. The number of digits in a positive number is (int)log10(number) + 1, so you'd need a buffer of (int)log10(INT_MAX) + 3 chars, with space for the minus sign and the terminating \0.
Also, generally it's not a good practice to return pointers to 'black box' buffers from functions. Your best bet here would be to provide a buffer as a pointer argument in intToString, so then you can easily use any type of memory you like (dynamic, allocated on stack, etc.). Here's an example:
char *intToString(int n, char *buffer) {
// ...
char *bufferStart = buffer;
for(;p != 1; p/=10)
*(buffer++) = ((n % p)/(p/10)) + '0';
*buffer++ = '\0';
return bufferStart;
}
Then you can use it as follows:
char *buffer1 = malloc(30);
char buffer2[15];
intToString(10, buffer1); // providing pointer to heap allocated memory as a buffer
intToString(20, &buffer2[0]); // providing pointer to a fixed-size array as a buffer
what the difference between an array and a pointer is?
The answer is in your quote - a pointer can be modified to be pointing to another memory address. Compare:
int a[] = {1, 2, 3};
int b[] = {4, 5, 6};
int *ptrA = &a[0]; // the ptrA now contains pointer to a's first element
ptrA = &b[0]; // now it's b's first element
a = b; // it won't compile
Also, an array's size is fixed when it is defined, while a pointer can refer to memory obtained from any allocation mechanism.
Regarding your code:
You are using a single static buffer for every call to intToString: this is bad because the string produced by the first call to it will be overwritten by the next.
Generally, functions that handle strings in C should either return a new buffer from malloc, or they should write into a buffer provided by the caller. Allocating a new buffer is less prone to problems due to running out of buffer space.
You are also using a static pointer for the location to write into the buffer, and it never rewinds, so that's definitely a problem: enough calls to this function, and you will run off the end of the buffer and crash.
You already have an initial loop that calculates the number of digits in the function. So you should then just make a new buffer that big using malloc, making sure to leave space for the \0, write in to that, and return it.
Also, since i is not just a loop index, rename it to something more obvious, like length.
That is to say: get rid of the global variables, and instead after computing length:
char *s, *result;
// compute length
s = result = malloc(length+1);
if (!s) return NULL; // out of memory
for(;p != 1; p/=10)
*(s++) = ((n % p)/(p/10)) + '0';
*s++ = '\0';
return result;
The caller is responsible for releasing the buffer when they're done with it.
Two other things I'd really recommend while learning about pointers:
Compile with all warnings turned on (-Wall etc) and if you get an error try to understand what caused it; they will have things to teach you about how you're using the language
Run your program under Valgrind or some similar checker, which will make pointer bugs more obvious, rather than causing silent corruption
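For comparison, here is a compact sketch of the "return a new buffer from malloc" approach. Instead of the digit-counting loop from the question, it asks snprintf() (C99) for the required length by passing a NULL buffer of size 0, which also covers zero and negative numbers. The name int_to_string is chosen for this example only.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns a newly allocated decimal string for n, or NULL on failure.
   The caller must free() the result. */
char *int_to_string(int n)
{
    int length = snprintf(NULL, 0, "%d", n);   /* characters needed, excluding '\0' */
    if (length < 0)
        return NULL;

    char *result = malloc((size_t)length + 1); /* +1 for the terminating '\0' */
    if (result == NULL)
        return NULL;

    snprintf(result, (size_t)length + 1, "%d", n);
    return result;
}
```

Because every call returns its own buffer, intToString(123) and intToString(456) can no longer overwrite each other; the price is that each result needs a matching free().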
Regarding your last question:
char amessage[] = "now is the time"; - is an array. An array cannot be reassigned to refer to something else (unlike a pointer); it always occupies the same fixed storage. If the array was defined inside a block, it ceases to exist at the end of the block (meaning you cannot return such an array from a function). You can however fiddle with the data inside the array as much as you like so long as you don't exceed the size of the array.
E.g. this is legal amessage[0] = 'N';
char *pmessage = "now is the time"; - is a pointer. A pointer points to a block in memory, nothing more. "now is the time" is a string literal, meaning it is stored inside the executable in a read only location. You cannot under any circumstances modify the data it is pointing to. You can however reassign the pointer to point to something else.
This is NOT legal: *pmessage = 'N'; will most likely segfault (note that you can use the array syntax with pointers; *pmessage is equivalent to pmessage[0]).
If you compile it with gcc using the -S flag, you can actually see "now is the time" stored in the read-only section of the assembly output.
One other thing to point out is that arrays decay to pointers when passed as arguments to a function. The following two declarations are equivalent:
void foo(char arr[]);
and
void foo(char* arr);
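The array/pointer distinction from the K&R quote can also be observed through sizeof. The sketch below is only an illustration; the helper names array_bytes, pointer_bytes and capitalize_array are invented here, and pmessage is declared const char * so that the compiler rejects accidental writes to the literal.

```c
#include <stddef.h>

char amessage[] = "now is the time";        /* an array: 15 characters plus '\0' */
const char *pmessage = "now is the time";   /* a pointer to a string literal */

/* sizeof applied to the array gives the size of the whole array in bytes. */
size_t array_bytes(void)   { return sizeof amessage; }

/* sizeof applied to the pointer gives the size of the pointer itself. */
size_t pointer_bytes(void) { return sizeof pmessage; }

/* Modifying the array's contents is fine; the storage belongs to the array. */
void capitalize_array(void) { amessage[0] = 'N'; }
```

array_bytes() reports 16 regardless of platform, while pointer_bytes() reports whatever sizeof(char *) is on the system, typically 4 or 8.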
About how to use pointers and the difference between an array and a pointer, I recommend you read "Expert C Programming" (http://www.amazon.com/Expert-Programming-Peter-van-Linden/dp/0131774298/ref=sr_1_1?ie=UTF8&qid=1371439251&sr=8-1&keywords=expert+c+programming).
A better way to return strings from functions is to allocate dynamic memory (using malloc), fill it with the required string, return this pointer to the calling function, and have the caller free it when done.
Sample code :
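#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_NAME_SIZE 20
char * func1(void)
{
char * c1 = NULL;
c1 = malloc(MAX_NAME_SIZE); /* MAX_NAME_SIZE bytes; sizeof(MAX_NAME_SIZE) would only be sizeof(int) */
if (c1 == NULL)
return NULL;
strcpy(c1,"John");
return c1;
}
int main(void)
{
char * c2 = NULL;
c2 = func1();
if (c2 == NULL)
return 1;
printf("%s \n",c2);
free(c2);
return 0;
}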
And this works without the static strings.

Allocating an array of an unknown size

Context: What I'm trying to do is make a program which would take text as input and store it in a character array. Then I would print each element of the array as a decimal. E.g. "Hello World" would be converted to 72, 101, etc. I would use this as a quick ASCII2DEC converter. I know there are online converters but I'm trying to make this one on my own.
Problem: how can I allocate an array whose size is unknown at compile-time and make it the exact same size as the text I enter? So when I enter "Hello World" it would dynamically make an array with the exact size required to store just "Hello World". I have searched the web but couldn't find anything that I could make use of.
I see that you're using C. You could do something like this:
#define INC_SIZE 10
char *buf = (char*) malloc(INC_SIZE),*temp;
int size = INC_SIZE,len = 0;
int c; // int rather than char, so that EOF can be detected
while ((c = getchar()) != '\n' && c != EOF) { // I assume you want to read a line of input
if (len == size) {
size += INC_SIZE;
temp = (char*) realloc(buf,size);
if (temp == NULL) {
// not enough memory; buf is still valid, so stop reading and keep what we have
break;
}
buf = temp;
}
buf[len++] = c;
}
// done, note that the character array has no '\0' terminator and the length is represented by `len` variable
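The same idea can be packaged as a function that returns a null-terminated string of exactly the right size. This is only a sketch under a few assumptions: the name read_line is made up, the buffer grows by doubling rather than by a fixed increment, and a final realloc() trims it to the length actually used.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line from in into a growing heap buffer.
   Returns a null-terminated string without the '\n' (the caller frees it),
   or NULL if memory runs out. */
char *read_line(FILE *in)
{
    size_t size = 16, len = 0;
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 == size) {               /* keep one slot free for '\0' */
            char *temp = realloc(buf, size * 2);
            if (temp == NULL) {
                free(buf);                   /* the old block is still ours to free */
                return NULL;
            }
            buf = temp;
            size *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';

    char *exact = realloc(buf, len + 1);     /* shrink to the exact size needed */
    return exact != NULL ? exact : buf;
}
```

Calling read_line(stdin) and then printing each character with %d gives the decimal codes the question asks for.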
Typically, on environments like a PC where there are no great memory constraints, I would just dynamically allocate, (language-dependent) an array/string/whatever of, say, 64K and keep an index/pointer/whatever to the current end point plus one - ie. the next index/location to place any new data.
If you use C++, you can use std::string to store the input characters and access each character with operator[], like the following code:
#include <iostream>
#include <string>
std::string input;
std::getline(std::cin, input); // reads the whole line, including spaces
I'm going to guess you mean C, as that's one of the commonest compiled languages where you would have this problem.
Variables that you declare in a function are stored on the stack. This is nice and efficient, gets cleaned up when your function exits, etc. The only problem is that the size of the stack slot for each function is fixed and cannot change while the function is running.
The second place you can allocate memory is the heap. This is a free-for-all that you can allocate and deallocate memory from at runtime. You allocate with malloc(), and when finished, you call free() on it (this is important to avoid memory leaks).
With heap allocations you must know the size at allocation time, but it's better than having it stored in fixed stack space that you cannot grow if needed.
This is a simple and stupid function to decode a string to its ASCII codes using a dynamically-allocated buffer:
char* str_to_ascii_codes(char* str)
{
size_t i;
size_t str_length = strlen(str);
char* ascii_codes = malloc(str_length*4+1);
for(i = 0; i<str_length; i++)
snprintf(ascii_codes+i*4, 5, "%03d ", str[i]);
return ascii_codes;
}
Edit: You mentioned in a comment wanting to get the buffer just right. I cut corners with the above example by making each entry in the string a known length, and not trimming the result's extra space character. This is a smarter version that fixes both of those issues:
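char* str_to_ascii_codes(char* str)
{
size_t i;
int written;
size_t str_length = strlen(str), ascii_codes_length = 0;
char* ascii_codes = malloc(str_length*4+1);
if (ascii_codes == NULL)
return NULL;
ascii_codes[0] = '\0'; /* so that an empty input yields an empty string */
for(i = 0; i<str_length; i++)
{
/* snprintf returns the number of characters written, excluding the '\0';
the cast keeps every code in 0..255, i.e. at most 3 digits plus a space */
written = snprintf(ascii_codes+ascii_codes_length, 5, "%d ", (unsigned char)str[i]);
ascii_codes_length = ascii_codes_length + written;
}
if (ascii_codes_length > 0)
{
/* Overwrite the trailing space char with the end-of-string marker */
ascii_codes[ascii_codes_length-1] = '\0';
/* Shrink to the exact size; if realloc fails, the original block is still valid */
char* shrunk = realloc(ascii_codes, ascii_codes_length);
if (shrunk != NULL)
ascii_codes = shrunk;
}
return ascii_codes;
}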

Resources