I'm new to C. I just have a question about character arrays (or strings) in C: when I want to create a character array, do I have to give its size at the same time?
Because we may not know the size we actually need. For example, in a client-server program, if we want to declare a character array for the server program to receive a message from the client program, but we don't know the size of the message, we could do it like this:
char buffer[1000];
recv(fd,buffer, 1000, 0);
But what if the actual message is only 10 characters long? Will that waste a lot of memory?
Yes, you have to decide the size in advance, even if you use malloc.
When you read from sockets, as in the example, you usually use a buffer of a reasonable size and move the data into another structure as soon as you consume it. In any case, 1000 bytes is not much memory to waste, and it is certainly faster than asking a memory manager for one byte at a time :)
Yes, you have to give the size if you are not initializing the char array at the time of declaration. A better approach for your problem is to determine the right buffer size at run time and allocate the memory dynamically.
What you're asking about is how to dynamically size a buffer. This is done with a dynamic allocation such as using malloc() -- a memory allocator. Using it gives you an important responsibility though: when you're done using the buffer you must return it to the system yourself. If using malloc() [or calloc()], you return it with free().
For example:
char *buffer; // pointer to a buffer -- essentially an unsized array
buffer = (char *)malloc(size);
// use the buffer ...
free(buffer); // return the buffer -- do NOT use it any more!
The only problem left to solve is how to determine the size you'll need. If you're recv()'ing data that hints at the size, you'll need to break the communication into two recv() calls: first getting the minimum size all packets will have, then allocating the full buffer, then recv'ing the rest.
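For example, if the protocol begins every message with a fixed-size length field, the two-step read could look roughly like the sketch below. The 4-byte network-order prefix and the recv_all() helper are assumptions made for this illustration, not part of any standard API.

#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>   /* recv() on POSIX systems */
#include <arpa/inet.h>    /* ntohl() */

/* Hypothetical helper: keep calling recv() until 'len' bytes have arrived. */
static int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;            /* error or peer closed the connection */
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sketch: read a 4-byte network-order length prefix, then the payload. */
char *recv_message(int fd, size_t *out_len)
{
    uint32_t netlen;
    if (recv_all(fd, &netlen, sizeof netlen) != 0)
        return NULL;

    size_t len = ntohl(netlen);
    char *payload = malloc(len + 1);   /* +1 so the caller can treat text as a C string */
    if (payload == NULL)
        return NULL;

    if (recv_all(fd, payload, len) != 0) {
        free(payload);
        return NULL;
    }
    payload[len] = '\0';
    *out_len = len;
    return payload;
}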
When you don't know the exact amount of input data, do the following (a sketch of these steps is shown after the list):

1. Create a small buffer.
2. Allocate some memory for a "storage" (e.g. twice the buffer size).
3. Fill the buffer with data from the input stream (e.g. a socket, a file, etc.).
4. Copy the data from the buffer to the storage.
   4.1. If there is not enough room in the storage, reallocate the memory (e.g. with twice its current size).
5. Repeat steps 3 and 4 until the end of the stream.

Your storage now contains the data.
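A rough sketch of those steps, reading from a file stream with a small fixed buffer and a growing heap "storage" (the function name, buffer size, and doubling factor are arbitrary choices for this illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: read an entire stream into a dynamically grown block. */
char *read_stream(FILE *in, size_t *out_size)
{
    char buffer[256];                 /* 1. small fixed buffer */
    size_t cap = 2 * sizeof buffer;   /* 2. storage starts at twice the buffer size */
    size_t used = 0;
    char *storage = malloc(cap);
    if (storage == NULL)
        return NULL;

    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {   /* 3. fill the buffer */
        if (used + n > cap) {                                 /* 4.1 grow the storage */
            cap *= 2;
            char *tmp = realloc(storage, cap);
            if (tmp == NULL) { free(storage); return NULL; }
            storage = tmp;
        }
        memcpy(storage + used, buffer, n);                    /* 4. copy buffer -> storage */
        used += n;
    }                                                         /* 5. repeat until end of stream */

    *out_size = used;
    return storage;                                           /* the storage holds the data */
}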
If you don't know the size a-priori, then you have no choice but to create it dynamically using malloc (or whatever equivalent mechanism in your language of choice.)
size_t buffer_size = ...; /* read from a DEFINE or from a config file */
char * buffer = malloc( sizeof( char ) * (buffer_size + 1) );
Creating a buffer of size m but only receiving an input string of size n, with n < m, is not a waste of memory but an engineering compromise.
If you create your buffer with a size close to the intended input, you risk having to refill it many, many times in the cases where the input is much larger than the buffer. Iterations over the buffer are typically tied to I/O operations, so you might be saving a few bytes (which is next to nothing on today's hardware) at the expense of creating problems somewhere else, especially for client-server apps. If we were talking about resource-constrained embedded systems, that would be another matter.
You should be worrying about getting your algorithms right and solid. Then you worry, if you can, about shaving off a few bytes here and there.
For me, I'd rather create a buffer 2 to 10 times larger than the average input (not the smallest input, as in your case, but the average), assuming my input sizes have a low standard deviation. Otherwise, I'd go 20 times the size or more (especially if memory is cheap and doing so minimizes hits to the disk or the NIC).
At the most basic setup, one typically gets the buffer size as a configuration item read from a file (or passed as an argument), defaulting to a compile-time value if none is provided. Then you can adjust the size of your buffers according to the observed input sizes.
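For instance, something like the following (DEFAULT_BUFFER_SIZE and the command-line convention are purely illustrative assumptions):

#include <stdlib.h>

#define DEFAULT_BUFFER_SIZE 4096   /* compile-time fallback */

int main(int argc, char **argv)
{
    size_t buffer_size = DEFAULT_BUFFER_SIZE;

    /* Allow the size to be overridden from the command line. */
    if (argc > 1) {
        unsigned long requested = strtoul(argv[1], NULL, 10);
        if (requested > 0)
            buffer_size = (size_t)requested;
    }

    char *buffer = malloc(buffer_size);
    if (buffer == NULL)
        return 1;

    /* ... use the buffer ... */

    free(buffer);
    return 0;
}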
More elaborate algorithms (say TCP) adjust the size of their buffers at run-time to better accommodate input whose size might/will change over time.
Even if you use malloc, you still must define the size first! So instead you give a large number that is capable of holding the message, like:
char buffer[2000];
If the message turns out to be smaller or larger, you can then reallocate a dynamically allocated buffer to release the unused locations or to grab more.
Example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *str;

    /* Initial memory allocation */
    str = malloc(15);
    strcpy(str, "tutorialspoint");
    printf("String = %s, Address = %p\n", str, (void *)str);

    /* Reallocating memory */
    str = realloc(str, 25);
    strcat(str, ".com");
    printf("String = %s, Address = %p\n", str, (void *)str);

    free(str);
    return 0;
}

Note: make sure to include <stdlib.h> (as well as <stdio.h> and <string.h>, as above).
Related
What's the way to store a string when I don't know its size?
I do it like this:
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>

int main () {
    char * str;
    str = (char *)malloc(sizeof(char) + 1);
    str[1] = '\0';
    int i = 0;
    int c = '\0';

    do {
        c = getche();
        if (c != '\r') {
            str[i] = c;
            str[i + 1] = '\0';
            i++;
            str = (char *)realloc(str, sizeof(char) + i + 2);
        }
    } while (c != '\r');

    printf("\n%s\n", str);
    free(str);
    return 0;
}
I found this page:
Dynamically prompt for string without knowing string size
Is it correct? If it is, then:
Is there any better way?
Is there more efficient way?
Is it correct?
No
The main problem is the use of realloc. It is just wrong. When using realloc never directly assign to the pointer that points to the already allocated memory - always use a temporary to take the return value. Like:
char * temp;

temp = realloc(str, 1 + i + 2);
if (temp == NULL)
{
    // out of memory
    // ... add error handling
}
str = temp;
The reason for this is that realloc may fail in which case it will return NULL. So if you assign directly to str and realloc fails, you have lost the pointer to the allocated memory (aka the string).
Besides that:
1) Don't cast malloc and realloc
2) sizeof(char) is always 1 - so you don't need to use it - just put 1
Is there any better way?
Is there more efficient way?
Instead of reallocating by 1 in each loop - which is pretty expensive performance wise - it is in many cases better to (re)allocate a bigger chunk.
One strategy is to double the allocation whenever calling realloc. So if you have allocated 128 bytes the next allocation should be 2*128=256. Another strategy is to let it grow with some fixed size which is significantly bigger than 1 - for instance you could let it grow with 1024 each time.
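For example, a version of the read loop that doubles the capacity instead of reallocating one byte at a time might look like this (using the standard getchar() instead of getche(); the starting capacity of 16 is an arbitrary choice):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t cap = 16;                 /* arbitrary starting capacity */
    size_t len = 0;
    char *str = malloc(cap);
    if (str == NULL)
        return 1;

    int c;
    while ((c = getchar()) != EOF && c != '\n') {
        if (len + 2 > cap) {         /* room needed for this char plus the '\0' */
            char *tmp = realloc(str, cap * 2);
            if (tmp == NULL) {       /* keep the old pointer on failure */
                free(str);
                return 1;
            }
            str = tmp;
            cap *= 2;
        }
        str[len++] = (char)c;
    }
    str[len] = '\0';

    printf("%s\n", str);
    free(str);
    return 0;
}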
I suggest using a buffer to avoid repeated realloc calls. Create a buffer of arbitrary size, e.g. 1024; when it fills up, realloc more space for your dynamically allocated storage and memmove the buffer's contents into it.
The key to answering this question is to clarify the term "without knowing the size".
We may not know what amount of data we're going to get, but we may know what we're going to do with it.
Let us consider the following use cases:
We have restrictions on the data we need, for example: a person's name, an address, the title of a book. I guess we are good with 1k or a maximum of 16k of space.
We obtain a continuous flow of data, for example: some sensor or other equipment sends us data every second. In this case, we could process the data in chunks.
Answer:
We need to make an educated guess about the size we intend to process and allocate space accordingly.
We have to process data on the fly and we need to release the space that is no longer required.
Note:
It is important to note that we can't allocate an unlimited amount of memory. At some point we have to implement error handling and/or store the data on disk or somewhere else.
Note II:
In case a more memory-efficient solution is needed, realloc is not recommended, since it can temporarily double the allocated footprint while running (if the system cannot simply grow the allocated block in place, it first allocates a new block and copies the current contents over). Instead, an application-specific memory structure would be required, but I assume that is beyond the scope of the original question.
Is it correct?
Sort of.
We don't cast the result of malloc() in C.
Is there any better way?
That's primarily opinion-based.
Is there more efficient way?
With regards to time or space?
If you are asking about space, no.
If you are asking about time, yes.
You could dynamically allocate memory for an array with a small size that holds the string for a while. Then, when the array can no longer hold the string, you reallocate that memory and double its size, and so on, until the whole string has been read. When you are done, you can reallocate the memory once more and shrink it to exactly the size your string needs.
You see, calling realloc() is costly in time, since it may have to move the whole memory block: the allocation must be contiguous, and there might not be enough room left to grow it in place.
Note: Of course, a fixed sized array, statically created would be better in terms of time, but worse in terms of memory. Everything is a trade off, that's where you come into play and decide what best suits your application.
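The final shrink-to-fit step mentioned above is just one more realloc() call, for example (a small sketch; the function name is made up):

#include <stdlib.h>

/* Shrink a grown buffer down to exactly len characters plus the '\0'. */
char *shrink_to_fit(char *str, size_t len)
{
    char *tmp = realloc(str, len + 1);
    return tmp != NULL ? tmp : str;   /* on failure the original block is still valid */
}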
How about this? (Note that asprintf() is a GNU/BSD extension rather than standard C.)
char *string_name;
asprintf(&string_name, "Hello World, my name is %s & I'm %d years old!", "James Bond", 27);
printf("string is %s", string_name);
free(string_name);
I want to store 5 names without wasting a single byte, so how can I allocate the memory using malloc?
That's impossible for all practical purposes: malloc will more often than not return a block of memory bigger than requested.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n, i, c;
    char *p[5];                                  /* pointers to the 5 names */

    for (i = 0; i < 5; i++)
    {
        n = 0;
        printf("please enter the name\n");       /* the name is typed once just to count its length */
        while ((c = getchar()) != '\n')
            n++;                                 /* count the total number of characters in the name */

        p[i] = malloc(n + 1);                    /* allocate exactly enough for the name plus its '\0' */

        printf("please enter the same name again\n");
        scanf("%s", p[i]);                       /* typed a second time to store it; a longer name would overflow */
        getchar();                               /* consume the newline scanf leaves behind */
    }
    return 0;
}
If you know the cumulative length of the five names, let's call it length_names, you could do a
void *pNameBlock = malloc(length_names + 5);
Then you could store the names, null terminated (the +5 is for the null termination), one right after the other in the memory pointed to by pNameBlock.
char *pName1 = (char *) pNameBlock;
Store the name data at the location pName1 points to. Maybe via
char *p = pName1; You can then write byte by byte (the following is pseudo-code-ish).
*p++ = byte1;
*p++ = byte2;
etc.
End with a null termination:
*p++ = '\0';
Now set
char *pName2 = p;
and write the second name using p, as above.
Doing things this way will still waste some memory. Malloc will internally get itself more memory than you are asking for, but it will waste that memory only once, on this one operation, getting this one block, with no overhead beyond this once.
Be very careful, though, because under this way of doing things, you can't free() the char *s, such as pName1, for the names. You can only free that one pointer you got that one time, pNameBlock.
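Putting those pieces together, a minimal sketch might look like this (the hard-coded names and the loops are just for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *names[5] = { "Alice", "Bob", "Carol", "Dave", "Eve" };

    /* Total length of all five names, plus 5 bytes for the terminators. */
    size_t length_names = 0;
    for (int i = 0; i < 5; i++)
        length_names += strlen(names[i]);

    char *pNameBlock = malloc(length_names + 5);
    if (pNameBlock == NULL)
        return 1;

    /* Copy the names back to back and remember where each one starts. */
    char *pName[5];
    char *p = pNameBlock;
    for (int i = 0; i < 5; i++) {
        pName[i] = p;
        strcpy(p, names[i]);
        p += strlen(names[i]) + 1;   /* skip past the '\0' we just wrote */
    }

    for (int i = 0; i < 5; i++)
        printf("%s\n", pName[i]);

    free(pNameBlock);                /* free only the block, never the pName[i] pointers */
    return 0;
}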
If you are asking this question out of interest, OK. But if you are really this memory-constrained, you're going to have a very, very hard time. malloc does waste some memory, but not a lot. Working with C under constraints this tight is hard: you'd almost have to write your own super-lightweight memory manager (do you really want to do that?). Otherwise you'd be better off working in assembly if you can't afford to waste even a byte.
I have a hard time imagining what kind of super-cramped embedded system imposes this kind of limit on memory usage.
If you don't want to waste a single byte storing the names, you can dynamically allocate an array of strings in C.
An array of strings can be implemented as a pointer to a list of pointers (a char **).
char **name; // Allocate space for a pointer, pointing to a pointer (the beginning of an array in C)
name = (char **) malloc (sizeof(char *) * 5); // Allocate space for the pointer array, for 5 names
name[0] = (char *) malloc (sizeof(char) * (lengthOfName1 + 1)); // Allocate space for the first name plus its terminating '\0'; same for the other names
name[1] = (char *) malloc (sizeof(char) * (lengthOfName2 + 1));
....
Now you can save the name to its corresponding position in the array without allocating more space, even though names might have different lengths.
You can take the double-pointer approach and store each name character by character, incrementing the pointer as you go; that way you can save all 5 names while using as little memory as possible.
But as a programmer you shouldn't have to do this kind of tedious work: use an array of pointers to store the names and allocate memory for each one step by step.
This covers the concept of storing a few names, but if you are dealing with a large amount of data you should use a linked list to store it all.
When you malloc a block, it actually allocates a bit more memory than you asked for. This extra memory is used to store information such as the size of the allocated block.
Encode the names in binary and store them in a byte array.
What is "memory waste"? If you can define it clearly, then a solution can be found.
For example, the null in a null terminated string might be considered "wasted memory" because the null isn't printed; however, another person might not consider it memory waste because without it, you need to store a second item (string length).
When I use a byte, the byte is fully used. Only if you can show me how it might be done without that byte will I consider your claims of memory waste valid. I use the nulls at the ends of my strings. If I declare an array of strings, I use the array too. Build what you need, and then, if you find you can rearrange those items to use less memory, conclude that the other way wasted some. Until then, you're chasing a goal you haven't fully defined.
If these five "names" are assembly jump points, you don't need a full string's worth of memory to hold them. If the five "names" are block scoped variables, perhaps they won't need any more memory than the registers already provide. If they are strings, then perhaps you can combine and overlay strings; but, until you come up with a solution, and a second solution to compare the first against, you don't have a case for wasted / saved memory.
Is there a max size for a char buffer? I have a program that is collecting strings into a char buffer and writing them to a proc file. After a certain point it appears to stop writing things - is there too much in there? What is that max size, so I can work around this?
Here is code. This is an LKM - is limits.h available from kernel space?
Foremost:
const char* input = "hooloo\n";
Next:
int read_info( char *page, char **start, off_t off, int count, int *eof, void *data )
{
    unsigned int mem;
    char answer_buf[strlen(input) + 1 + 14];

    name_added = vmalloc(strlen(input) + 1 + 14);
    strcpy(name_added, input);
    strcat(name_added, extension);

    mem = sprintf(answer_buf, "%s\n", name_added);
    memcpy(page, answer_buf, mem);

    return strlen(answer_buf) + 1;
}
Throughout my code there are things like this that re-allocate the buffer and add to it. Also, that read_info is the handler for the proc file. The issue is that I keep adding to that buffer with the code above over and over and over - eventually I cat my proc file and the text cuts off - it doesn't go on forever like I want )-=.
There's no concrete maximum size "in C" specifically. The theoretical (or "potential") maximum size of any object on a C platform is determined by the implementation and is usually derived from the properties of the underlying machine and OS.
On platforms with a flat memory model it will typically be limited by the size of the address space in theory, and by the amount of available free memory (of the relevant kind) in practice.
On platforms with a segmented memory model it might be limited by the segment size, which is smaller than the address space. Implementations are free to breach that limit, though, by "emulating" a flat memory model in the generated code; for that reason, on such platforms the maximum object size can also depend on compilation settings.
The only maximum size for a dynamically allocated char buffer will be available system memory.
A buffer on the stack will have its size constrained by maximum stack size. This will vary greatly depending on host OS.
When writing data to file, are you checking the size returned by fwrite and calling it repeatedly to write the remainder of the buffer if necessary?
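That check usually looks something like the following sketch (write_all() is a made-up helper name, not a standard function):

#include <stdio.h>

/* Keep calling fwrite() until the whole buffer has been written (or an error occurs). */
int write_all(FILE *out, const char *buf, size_t len)
{
    size_t written = 0;
    while (written < len) {
        size_t n = fwrite(buf + written, 1, len - written, out);
        if (n == 0)
            return -1;       /* write error */
        written += n;
    }
    return 0;
}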
You have a memory leak in your code!
The following memory is never freed:
name_added = vmalloc(strlen(input) + 1 + 14);
I don't understand why you allocate memory for the output at all.
And you do it twice, both on the stack and on the heap.
The caller has provided a buffer for the output. Use it!
Don't create copies!
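In other words, something along these lines would avoid both copies. This is only a sketch against the old-style procfs handler shown in the question; it assumes input and extension are NUL-terminated strings, and uses the kernel's scnprintf(), which never writes more than count bytes (terminator included).

int read_info(char *page, char **start, off_t off, int count, int *eof, void *data)
{
    int len;

    /* Format straight into the buffer the caller provided. */
    len = scnprintf(page, count, "%s%s\n", input, extension);

    *eof = 1;
    return len;
}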
I'd say it's at least able to handle 1,000 unique characters
I am working on an embedded system (ARM Cortex M3) where I do not have access to any sort of "standard library". In particular, I do not have access to malloc.
I have a function void doStuff(uint8_t *buffer) that accepts a pointer to a 512-bit (64-byte) buffer. I have tried doing the following:
uint8_t buffer[64] = {0};
doStuff((uint8_t *) &buffer));
but I'm not getting the expected results. Am I doing something wrong? Is there any alternative approach?
doStuff(buffer) will be fine, since buffer already decays to a uint8_t *.
Aside from this, you're closing one bracket too many after &buffer in your example.
If buffer is of variable size, you should pass the size into doStuff too; if it's of constant size, I'd still pass the size, just in case you change it one day.
This being said, you should do it the following way:
uint8_t buffer[64] = {0};
int len = 64;
doStuff(buffer, len);
A simple malloc(): have a char mem[MAXMEM]; and a struct freetable. Then write your own simplemalloc() that finds a big enough chunk of memory in the freetable and returns the offset into mem. simplefree() would then adjust the freetable.
EDIT:
If you need a lot of malloc()s, you may even divide your static mem into different chunks for different tasks (one chunk for exactly 100-byte allocations, one for the size of your favorite struct, and so on); this will speed up finding free memory.
If you are short on memory, you should implement a best-match search in simplemalloc(), which as a side effect slows execution down.
If you have enough memory, you can implement a debug version which "allocates" a bit more memory and puts XXX before the start and after the end of the simplemalloc()ed block. On free() you can check whether this XXX has been overwritten, so you know you have a buffer overflow or underflow to be aware of.
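A bare-bones sketch of that idea, with a fixed pool and a trivial allocation table (all names and sizes here are made up; a real version would need alignment handling and coalescing of freed blocks):

#include <stddef.h>

#define MAXMEM   4096      /* size of the static pool            */
#define MAXALLOC 32        /* maximum number of tracked blocks   */

static char mem[MAXMEM];

/* One entry per allocation: offset into mem[], size, and whether it is in use. */
static struct { size_t off; size_t size; int used; } freetable[MAXALLOC];
static size_t ntracked;    /* entries of freetable used so far */

/* First-fit allocator over the static pool (no alignment, no coalescing). */
void *simplemalloc(size_t size)
{
    size_t off = 0;                         /* first free offset past all tracked blocks */

    for (size_t i = 0; i < ntracked; i++) {
        if (!freetable[i].used && freetable[i].size >= size) {
            freetable[i].used = 1;          /* reuse a previously freed slot that is big enough */
            return mem + freetable[i].off;
        }
        size_t end = freetable[i].off + freetable[i].size;
        if (end > off)
            off = end;
    }

    if (ntracked == MAXALLOC || off + size > MAXMEM)
        return NULL;                        /* table full or pool exhausted */

    freetable[ntracked].off  = off;
    freetable[ntracked].size = size;
    freetable[ntracked].used = 1;
    ntracked++;
    return mem + off;
}

void simplefree(void *p)
{
    for (size_t i = 0; i < ntracked; i++)
        if (mem + freetable[i].off == (char *)p)
            freetable[i].used = 0;
}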
#define HUGE_NUMBER ???
char string[HUGE_NUMBER];
do_something_with_the_string(string);
I was wondering what would be the maximum number that I could add to a char array without risking any potential memory problems, buffer overflows or the like. I wanted to get user input into it, and possibly the maximum possible.
See this response by Jack Klein (see original post):
The original C standard (ANSI 1989/ISO 1990) required that a compiler successfully translate at least one program containing at least one example of a set of environmental limits. One of those limits was being able to create an object of at least 32,767 bytes.
This minimum limit was raised in the 1999 update to the C standard to at least 65,535 bytes.
No C implementation is required to provide for objects greater than that size, which means that they don't need to allow for an array of ints greater than (int)(65535 / sizeof(int)).
In very practical terms, on modern computers, it is not possible to say in advance how large an array can be created. It can depend on things like the amount of physical memory installed in the computer, the amount of virtual memory provided by the OS, and the number of other tasks, drivers, and programs already running and how much memory they are using. So your program may be able to use more or less memory running today than it could use yesterday or will be able to use tomorrow.
Many platforms place their strictest limits on automatic objects, that is, those defined inside of a function without the use of the 'static' keyword. On some platforms you can create larger arrays if they are static or by dynamic allocation.
Now, to provide a slightly more tailored answer, DO NOT DECLARE HUGE ARRAYS TO AVOID BUFFER OVERFLOWS. That's close to the worst practice one can think of in C. Rather, spend some time writing good code, and carefully make sure that no buffer overflow will occur. Also, if you do not know the size of your array in advance, look at malloc, it might come in handy :P
It depends on where char string[HUGE_NUMBER]; is placed.
Is it inside a function? Then the array will be on the stack, and whether and how fast your OS can grow stacks depends on the OS. So here is the general rule: don't place huge arrays on the stack.
Is it outside a function? Then it is global (process memory); if the OS cannot allocate that much memory when it tries to load your program, your program will crash and will have no chance to notice that (so the following is better):
Large arrays should be malloc'ed. With malloc, the OS will return a null pointer if the allocation failed. Depending on the OS and its paging and memory-mapping scheme, this will fail either when 1) there is no contiguous region of free memory large enough for the array, or 2) the OS cannot map enough regions of free physical memory to memory that appears to your process as contiguous.
So, with large arrays do this:
char* largeArray = malloc(HUGE_NUMBER);
if (!largeArray) { /* do error recovery and display msg to user */ }
Declaring arbitrarily huge arrays to avoid buffer overflows is bad practice. If you really don't know in advance how large a buffer needs to be, use malloc or realloc to dynamically allocate and extend the buffer as necessary, possibly using a smaller, fixed-sized buffer as an intermediary.
Example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 1024 // 1K buffer; you can make this larger or smaller
/**
* Read up to the next newline character from the specified stream.
* Dynamically allocate and extend a buffer as necessary to hold
* the line contents.
*
* The final size of the generated buffer is written to bufferSize.
*
* Returns NULL if the buffer cannot be allocated or if extending it
* fails.
*/
char *getNextLine(FILE *stream, size_t *bufferSize)
{
    char input[PAGE_SIZE]; // fixed-size intermediate buffer on the stack
    int done = 0;
    char *targetBuffer = NULL;

    *bufferSize = 0;
    while (!done)
    {
        if (fgets(input, sizeof input, stream) != NULL)
        {
            char *tmp;
            char *newline = strchr(input, '\n');

            if (newline != NULL)
            {
                done = 1;
                *newline = 0;
            }

            // +1 leaves room for the terminating '\0'
            tmp = realloc(targetBuffer, sizeof *tmp * (*bufferSize + strlen(input) + 1));
            if (tmp)
            {
                if (*bufferSize == 0)
                    tmp[0] = '\0'; // make the freshly allocated buffer a valid empty string
                targetBuffer = tmp;
                strcat(targetBuffer, input);
                *bufferSize += strlen(input);
            }
            else
            {
                free(targetBuffer);
                targetBuffer = NULL;
                *bufferSize = 0;
                fprintf(stderr, "Unable to allocate or extend input buffer\n");
                done = 1;
            }
        }
        else
        {
            done = 1; // EOF or read error; return whatever has been collected
        }
    }
    return targetBuffer;
}
If the array is going to be allocated on the stack, then you are limited by the stack size (typically 1MB on Windows, some of it will be used so you have even less). Otherwise I imagine the limit would be quite large.
However, making the array really big is not a solution to buffer overflow issues. Don't do it. Use functions that have a mechanism for limiting the amount of buffer they use to make sure you don't overstep your buffer, and make the size something more reasonable (1K for example).
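fgets() is a typical example of such a function: it is told exactly how much room it has and never writes past it.

#include <stdio.h>

int main(void)
{
    char line[1024]; /* "something more reasonable (1K for example)" */

    /* fgets() reads at most sizeof line - 1 characters and always NUL-terminates. */
    if (fgets(line, sizeof line, stdin) != NULL)
        printf("read: %s", line);

    return 0;
}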
You can use malloc() to get larger portions of memory than normally an array could handle.
Well, a buffer overflow wouldn't be caused by too large a value for HUGE_NUMBER so much as too small compared to what was written to it (write to index HUGE_NUMBER or higher, and you've overflown the buffer).
Aside from that it will depend upon the machine. There are certainly systems that could handle several millions in the heap, and a million or so on the stack (depending on other pressures), but there are also certainly some that couldn't handle more than a few hundred (small embedded devices would be an obvious example). While 65,535 is a standard-specified minimum, a really small device could specify that the standard was deliberately departed from for this reason.
In real terms, on a large machine, long before you actually run out of memory, you are needlessly putting pressure on the memory in a way that would affect performance. You would be better off dynamically sizing an array to an appropriate size.