C - varying size text - c

For my assignment I have to write a program that consists of agents and a central server daemon. It will be a distributed shell: every command issued from the server will also be performed on every agent (the output will be sent back from every agent to the central server).
I will have to deal with command output (like ls -la /home/user/dir1), which may vary in size on each agent. The output of "find /" will also be BIG, but I have to somehow take into account that something like that can happen. What is the desired way of handling varying-size output in C and operating on it (saving it to a variable, sending it over a socket)?

The way to deal with data of arbitrary size is to use dynamic allocation, i.e. the functions malloc(), realloc() and free(). You allocate and possibly grow the memory needed to store the command output.
Reading command output (assuming a Unix-like OS) is best done with popen().
Read the manuals of each of the mentioned functions for details.
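As a rough illustration of combining popen() with a growing buffer, here is a minimal sketch. The helper name read_command_output and the 4096-byte starting capacity are assumptions made for this example, and a POSIX system is assumed for popen()/pclose().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: runs cmd through popen() and returns its whole output
 * as a malloc'd, NUL-terminated string that the caller must free().
 * Returns NULL if the command cannot be started or memory runs out. */
char *read_command_output(const char *cmd, size_t *out_len)
{
    assert(cmd != NULL);

    FILE *fp = popen(cmd, "r");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        pclose(fp);
        return NULL;
    }

    for (;;) {
        /* always keep one byte free for the terminating '\0' */
        size_t n = fread(buf + len, 1, cap - len - 1, fp);
        len += n;
        if (n == 0)
            break;                      /* end of output (or read error) */
        if (cap - len - 1 == 0) {       /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                pclose(fp);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    pclose(fp);
    buf[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

The result of realloc() goes into a temporary first, so the old buffer can still be freed if growing fails.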

Dynamic Memory Allocation
To hold your "variable length" strings, you should use dynamic memory allocation: the malloc family of functions.
#include <stdlib.h>
void *malloc(size_t size);
void free(void *ptr);
void *calloc(size_t nmemb, size_t size);
void *realloc(void *ptr, size_t size);
So, suppose you have your data stored in a variable char *ag_str. I suggest you malloc and then realloc the size of the buffer in blocks. Calling malloc and then realloc a thousand times to readjust the block size after each character is very costly.
So, you might do something like this:
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define BLOCK_SIZE 4096
struct mem_block {
    size_t current_block_size; /* bytes allocated for ag_str */
    size_t current_str_size;   /* bytes of data stored so far */
    char *ag_str;
};
struct mem_block *new_chunk(void)
{
    struct mem_block *p = malloc(sizeof *p);
    p->ag_str = malloc(BLOCK_SIZE);
    p->ag_str[0] = '\0';
    p->current_block_size = BLOCK_SIZE;
    p->current_str_size = 0;
    return p;
}
void realloc_chunk(struct mem_block *chunk)
{
    size_t ns = chunk->current_block_size + BLOCK_SIZE;
    chunk->ag_str = realloc(chunk->ag_str, ns);
    chunk->current_block_size = ns;
}
void cat_ag_str(struct mem_block *chunk, const char *ag_str, size_t ag_len)
{
    /* grow until the new data plus a terminating '\0' fits */
    while (chunk->current_str_size + ag_len + 1 > chunk->current_block_size)
        realloc_chunk(chunk);
    /* data read from a socket is not NUL-terminated, so copy by length */
    memcpy(chunk->ag_str + chunk->current_str_size, ag_str, ag_len);
    chunk->current_str_size += ag_len;
    chunk->ag_str[chunk->current_str_size] = '\0';
}
void receive_from_agent(...)
{
    struct mem_block *chunk = new_chunk();
    ssize_t c; // Linux read/recv return
    char buff[BLOCK_SIZE];
    while ((c = read(your_fd, buff, BLOCK_SIZE)) > 0) // or probably recv()
        cat_ag_str(chunk, buff, (size_t)c);
    if (c < 0) ... // read error
    (...)
}
Note that this code was not tested and is just an idea for you. (Error checking was omitted)
struct mem_block: This will keep information about your current memory block.
new_chunk: function to create a new chunk handler for you.
realloc_chunk: anytime the amount of characters that must be written exceeds the amount of characters available in the chunk, we get one more block.
cat_ag_str: this will append what you just read to the memory block you have, effectively transforming chunks of data into one coherent big buffer.
receive_from_agent: this is the entry point of your receiving loop. You may use read or recv, I don't know which you use, but both return the amount of bytes read, which you'll use to pass to cat_ag_str.
It's important to note that you're reading in the same sized blocks as you realloc. (You can read in smaller chunks too, but never bigger).
You can do roughly the same for sending, but you don't need all that workaround for memory. You can just use a fixed sized buffer and copy data from your big string to it in fixed sizes, then you send the fixed-sized buffer.
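A minimal sketch of the sending side, assuming a POSIX file descriptor (a connected socket or a pipe). The name send_all is hypothetical. Since write() may accept fewer bytes than requested, the loop advances an offset into the big string instead of copying into a separate fixed-size buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: writes all len bytes of data to fd.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int send_all(int fd, const char *data, size_t len)
{
    assert(data != NULL || len == 0);

    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, data + sent, len - sent);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t)n;      /* partial write: continue from the offset */
    }
    return 0;
}
```

The receiver still needs some way to know where one command's output ends, for example a length sent before the data; that framing is not shown here.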

Related

How to initialize array size in a library in C?

I'm creating a C library with .h and .c files for a ring buffer. Ideally, you would initialize this ring buffer library in the main project with something like ringbuff_init(int buff_size); and the size that is passed will be the size of the buffer. How can I do this when arrays in C need to be sized statically?
I have already tried dynamically allocating arrays, but I did not get it to work. Surely this task is possible somehow?
What I would like to do is something like this:
int buffSize[];
int main(void)
{
ringbuffer_init(100); // initialize buffer size to 100
}
void ringbuffer_init(int buff_size)
{
buffSize[buff_size];
}
This obviously doesn't compile because the array's size must be given at its declaration. So my question is really: when you make a library for something like a buffer, how can you initialize it from the main program so that, in the .h/.c files of the buffer library, the buffer size is set to the wanted size?
You want to use dynamic memory allocation. A direct translation of your initial attempt would look like this:
size_t buffSize;
int * buffer;
int main(void)
{
ringbuffer_init(100); // initialize buffer size to 100
}
void ringbuffer_init(size_t buff_size)
{
buffSize = buff_size;
buffer = malloc(buff_size * sizeof(int));
}
This solution here is however extremely bad. Let me list the problems here:
There is no check of the result of malloc. It could return NULL if the allocation fails.
Buffer size needs to be stored along with the buffer, otherwise there's no way to know its size from your library code. It isn't exactly clean to keep these global variables around.
Speaking of which, these global variables are absolutely not thread-safe. If several threads call functions of your library, the results are unpredictable. You might want to store your buffer and its size in a struct that would be returned from your init function.
Nothing keeps you from calling the init function several times in a row, meaning that the buffer pointer will be overwritten each time, causing memory leaks.
Allocated memory must be eventually freed using the free function.
In conclusion, you need to think very carefully about the API you expose in your library, and the implementation while not extremely complicated, will not be trivial.
Something more correct would look like:
#include <stdlib.h>
typedef struct {
    size_t buffSize;
    int * buffer;
} RingBuffer;
int ringbuffer_init(size_t buff_size, RingBuffer * buf)
{
    if (buf == NULL)
        return 0;
    buf->buffSize = buff_size;
    buf->buffer = malloc(buff_size * sizeof(int));
    return buf->buffer != NULL;
}
void ringbuffer_free(RingBuffer * buf)
{
    free(buf->buffer);
}
int main(void)
{
    RingBuffer buf;
    int ok = ringbuffer_init(100, &buf); // initialize buffer size to 100
    // ...
    ringbuffer_free(&buf);
}
Even this is not without problems, as there is still a potential memory leak if the init function is called several times for the same buffer, and the client of your library must not forget to call the free function.
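One way to reduce the double-init problem is to let the library allocate the handle itself, so that every successful create call yields a fresh object. This is only a sketch: the names ringbuffer_create and ringbuffer_destroy are hypothetical, and the client still has to remember to call the destroy function.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    size_t capacity;   /* number of int slots */
    int *data;
} RingBuffer;

/* Hypothetical constructor: returns a new buffer, or NULL on failure. */
RingBuffer *ringbuffer_create(size_t capacity)
{
    if (capacity == 0)
        return NULL;

    RingBuffer *rb = malloc(sizeof *rb);
    if (rb == NULL)
        return NULL;

    rb->data = calloc(capacity, sizeof *rb->data);  /* zero-initialized slots */
    if (rb->data == NULL) {
        free(rb);
        return NULL;
    }
    rb->capacity = capacity;
    return rb;
}

/* Hypothetical destructor: safe to call with NULL. */
void ringbuffer_destroy(RingBuffer *rb)
{
    if (rb == NULL)
        return;
    free(rb->data);
    free(rb);
}
```

In a real library the struct definition could live only in the .c file, with the header exposing just an incomplete type, so client code cannot touch the fields directly.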
Static/global arrays can't have dynamic sizes.
If you must have a global dynamic array, declare a global pointer instead and initialize it with a malloc/calloc/realloc call.
You might want to also store its size in an accompanying integer variable as sizeof applied to a pointer won't give you the size of the block the pointer might be pointing to.
#include <assert.h>
#include <stdlib.h>
int *buffer;
int buffer_nelems;
int *ringbuffer_init(int buff_size)
{
    assert(buff_size > 0);
    if ( (buffer = malloc(buff_size*sizeof(*buffer)) ) )
        buffer_nelems = buff_size;
    return buffer;
}
You should use malloc function for a dynamic memory allocation.
It is used to dynamically allocate a single large block of memory with the specified size. It returns a pointer of type void *, which in C converts implicitly to any object pointer type, so the cast in the example is optional.
Example:
// Dynamically allocate memory using malloc()
buffSize= (int*)malloc(n * sizeof(int));
// Initialize the elements of the array
for (i = 0; i < n; ++i) {
buffSize[i] = i + 1;
}
// Print the elements of the array
for (i = 0; i < n; ++i) {
printf("%d, ", buffSize[i]);
}
I know I'm three years late to the party, but I feel I have an acceptable solution for cases where you need to do this without dynamic allocation for whatever reason (I have a similar issue in an embedded environment, and would like to avoid it).
You can do the following:
Library:
int * buffSize;
int buffSizeLength;
void ringbuffer_init(int buff_size, int * bufferAddress)
{
buffSize = bufferAddress;
buffSizeLength = buff_size;
}
Main :
#define BUFFER_SIZE 100
int LibraryBuffer[BUFFER_SIZE];
int main(void)
{
ringbuffer_init(BUFFER_SIZE, LibraryBuffer); // initialize buffer size to 100
}
I have been using this trick for a while now, and it's greatly simplified some parts of working with a library.
One drawback: you can technically mess with the variable in your own code, breaking the library. I don't have a solution to that yet. If anyone has one, I would love to hear it. Basically good discipline is required for now.
You can also combine this with @SirDarius's typedef for the ring buffer above. I would in fact recommend it.
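A minimal sketch of that combination: the caller owns the storage array, and the library only keeps a pointer to it inside a struct. The names StaticRing, static_ring_init, static_ring_push and static_ring_pop are hypothetical, and overwriting the oldest element when full is just one possible policy.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int *data;        /* caller-provided storage */
    size_t capacity;  /* number of slots in data */
    size_t head;      /* index of the next slot to write */
    size_t count;     /* number of valid elements */
} StaticRing;

void static_ring_init(StaticRing *rb, int *storage, size_t capacity)
{
    assert(rb != NULL && storage != NULL && capacity > 0);
    rb->data = storage;
    rb->capacity = capacity;
    rb->head = 0;
    rb->count = 0;
}

/* Stores value; when the buffer is full the oldest element is overwritten. */
void static_ring_push(StaticRing *rb, int value)
{
    rb->data[rb->head] = value;
    rb->head = (rb->head + 1) % rb->capacity;
    if (rb->count < rb->capacity)
        rb->count++;
}

/* Removes the oldest element into *out. Returns 1 on success, 0 if empty. */
int static_ring_pop(StaticRing *rb, int *out)
{
    if (rb->count == 0)
        return 0;
    size_t tail = (rb->head + rb->capacity - rb->count) % rb->capacity;
    *out = rb->data[tail];
    rb->count--;
    return 1;
}
```

Because no global state is involved, several independent buffers can coexist, each backed by its own static array.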

Is it possible to increase char array while using it, WITHOUT malloc?

I have a char array, and we know that the size of a char is 1 byte. Now I have to collect some chars (with getchar(), of course) and simultaneously increase the array by 1 byte (without malloc; the only library allowed is stdio.h).
My suggestion would be, pointing to the array and somehow increase that array by 1 till there are no more chars to get OR you run out of Memory...
Is it possible to increase char array while using it, WITHOUT malloc?
No.
You cannot increase the size of a fixed size array.
For that you need realloc() from <stdlib.h>, which it seems you are not "allowed" to use.
Is it possible to increase char array while using it, WITHOUT malloc?
Quick answer: No it is not possible to increase the size of an array without reallocating it.
Fun answer: Don't use malloc(), use realloc().
Long answer:
If the char array has static or automatic storage class, it is most likely impossible to increase its size at runtime, because keeping it at the same address would require objects that are present at higher addresses to be moved or reallocated elsewhere.
If the array was obtained by malloc, it might be possible to extend its size if no other objects have been allocated after it in memory. Indeed realloc() to a larger size might return the same address. The problem is it is impossible to predict and if realloc returns a different address, the current space has been freed so pointers to it are now invalid.
The efficient way to proceed with this reallocation is to increase the size geometrically, by a constant factor each time (2x, 1.5x, 1.625x ...), which minimizes the number of reallocations and keeps the total copying time linear in the final size of the array. You would use separate variables for the allocated size of the array and for the number of characters that you have stored into it.
Here is an example:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
    char *a = NULL;
    size_t size = 0;   /* allocated size of a */
    size_t count = 0;  /* number of characters stored in a */
    int c;
    while ((c = getchar()) != EOF && c != '\n') {
        if (count >= size) {
            /* reallocate the buffer to 1.5x size */
            size_t new_size = size + size / 2 + 16;
            char *new_a = realloc(a, new_size);
            if (new_a == NULL) {
                fprintf(stderr, "out of memory for %zu bytes\n", new_size);
                free(a);
                return 1;
            }
            a = new_a;
            size = new_size;
        }
        a[count++] = c;
    }
    for (size_t i = 0; i < count; i++) {
        putchar(a[i]);
    }
    free(a);
    return 0;
}
There are two ways to create space for the string without using dynamic memory allocation (malloc...). You can use a static array or an array with automatic storage duration. In both cases you need to specify a maximum size, which you might never reach, but you must always check against it.
#define BUFFER_SIZE 0x10000
Static
static char buffer[BUFFER_SIZE];
Or automatic (You need to ensure BUFFER_SIZE is smaller than the stack size)
int main() {
char buffer[BUFFER_SIZE];
...
}
There are also optimizations done by the operating system. It might allocate the whole (static/automatic) buffer lazily, so that only the used part is in physical memory. (This also applies to the dynamic memory allocation functions.) I found out that calloc (for big chunks) just allocates the virtual memory for the program; memory pages are cleared only when they are accessed (probably through some interrupt raised by the CPU). I compared it to an allocation with malloc and memset. The memset does unnecessary work if not all bytes/pages of the buffer are accessed by the program.
If you cannot allocate a buffer with malloc..., create a static/automatic array with enough size and let the operating system allocate it for you. It does not occupy the same space in the binary, because it is just stored as a size.
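The "always check against the maximum" rule can be sketched as follows. The function name read_line_bounded is hypothetical. It reads from a FILE * rather than calling getchar() directly so it can be tried on any stream, and <assert.h> is used only for the sanity check at the top.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one line from fp into buf, which has room for
 * cap bytes including the terminating '\0'.
 * Returns the number of characters stored, or -1 if the line did not fit
 * (buf then holds the part that was read so far). */
long read_line_bounded(FILE *fp, char *buf, size_t cap)
{
    assert(fp != NULL && buf != NULL && cap > 0);

    size_t count = 0;
    int c;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (count + 1 >= cap) {     /* no room left for c plus '\0' */
            buf[count] = '\0';
            return -1;
        }
        buf[count++] = (char)c;
    }
    buf[count] = '\0';
    return (long)count;
}
```

With a static array, the call would look like read_line_bounded(stdin, buffer, sizeof buffer).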

Obtain size of array via write permission check

To obtain the length of a null-terminated string, we simply write len = strlen(str). However, I often see posts here on SO saying that to get the size of an int array, for example, you need to keep track of it on your own, and that's what I normally do. But I have a question: could we obtain the size by using some sort of write permission check that tests whether we have write permission to a block of memory? For example:
#include <stdio.h>
int getSize(int *arr);
bool permissionTo(int *ptr);
int main(void)
{
int arr[3] = {1,2,3};
int size = getSize(arr) * sizeof(int);
}
int getSize(int *arr)
{
int *ptr = arr;
int size = 0;
while( permissionTo(ptr) )
{
size++;
ptr++;
}
return size;
}
bool permissionTo(int *ptr)
{
/*............*/
}
No, you can't. Memory permissions don't have this granularity on most, if not all, architectures.
Almost all CPU architectures manage memory in pages. On most things you'll run into today one page is 4kB. There's no practical way to control permissions on anything smaller than that.
Most memory management is done by your libc allocating a largish chunk of memory from the kernel and then handing out smaller chunks of it to individual malloc calls. This is done for performance (among other things), because creating, removing or modifying a memory mapping is an expensive operation, especially on multiprocessor systems.
For the stack (as in your example), allocations are even simpler. The kernel knows that "this large area of memory will be used by the stack", and memory accesses to it simply cause the necessary pages to be allocated to back it. The only tracking your program does of stack allocations is one register.
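To make the granularity point concrete, here is a small sketch for POSIX systems. The helper names page_size and same_page are hypothetical, and the mask arithmetic assumes the page size is a power of two. An int arr[3] and the address one past its end will usually fall on the same page, so a permission check could not tell them apart.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Returns the system page size in bytes (commonly 4096). */
long page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Returns 1 if both addresses lie within the same memory page, else 0.
 * Assumes the page size is a power of two. */
int same_page(const void *a, const void *b)
{
    uintptr_t mask = ~((uintptr_t)page_size() - 1);
    return ((uintptr_t)a & mask) == ((uintptr_t)b & mask);
}
```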
If what you are trying to achieve is an allocation that is comfortable to use because it carries its own size around, then do this:
Wrap malloc and free by prefixing the memory with its size internally (written from memory, not tested yet):
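#include <stdlib.h>
/* Note: the returned pointer is offset by sizeof(long), so it may not be
   suitably aligned for types with stricter alignment (e.g. long double). */
void* myMalloc(long numBytes) {
    char* mem = malloc(numBytes+sizeof(long));
    if (mem == NULL)
        return NULL;
    ((long*)mem)[0] = numBytes;
    return mem+sizeof(long);
}
void myFree(void* memory) {
    if (memory == NULL)
        return;
    char* mem = (char*)memory-sizeof(long);
    free(mem);
}
long memlen(void* memory) {
    char* mem = (char*)memory-sizeof(long);
    return ((long*)mem)[0];
}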

malloc alternative for memory allocation as a stack

I am looking for a malloc alternative for c that will only ever be used as a stack. Something more like alloca but not limited in space by the stack size. It is for coding a math algorithm.
I will work with large amounts of memory (possibly hundreds of megabytes in use in the middle of the algorithm)
memory is accessed in a stack-like order. What I mean is that the next memory to be freed is always the memory that was most recently allocated.
would like to be able to run on a variety of systems (Windows and Unix-like)
as an extra, something that can be used with threading, where the stack-like allocate and free order applies just to individual threads. (ie ideally each thread has its own "pool" for memory allocation)
My question is, is there anything like this, or is this something that would be easy to implement?
This sounds like a perfect use for Obstack.
I've never used it myself since the API is really confusing, and I can't dig up an example right now. But it supports all the operations you want, and additionally supports streaming creation of the "current" object.
Edit: whipped up a quick example. The Obstack API shows signs of age, but the principle is sound at least.
You will probably want to look into tuning the align/block settings and likely use obstack_next_free and obstack_object_size if you do any fancy growing.
#include <obstack.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void *xmalloc(size_t size)
{
void *rv = malloc(size);
if (rv == NULL)
abort();
return rv;
}
#define obstack_chunk_alloc xmalloc
#define obstack_chunk_free free
const char *cat(struct obstack *obstack_ptr, const char *dir, const char *file)
{
obstack_grow(obstack_ptr, dir, strlen(dir));
obstack_1grow(obstack_ptr, '/');
obstack_grow0(obstack_ptr, file, strlen(file));
return obstack_finish(obstack_ptr);
}
int main()
{
struct obstack main_stack;
obstack_init(&main_stack);
const char *cat1 = cat(&main_stack, "dir1", "file1");
const char *cat2 = cat(&main_stack, "dir1", "file2");
const char *cat3 = cat(&main_stack, "dir2", "file3");
puts(cat1);
puts(cat2);
puts(cat3);
obstack_free(&main_stack, cat2);
// cat2 and cat3 both freed, cat1 still valid
}
As you already found out, as long as it works with malloc you should use it and only come back when you need to squeeze out the last bit of performance.
An idea that fits that case: you could use a list of blocks that you allocate when needed. Using a list makes it possible to eventually swap out data in case you hit the virtual memory limit.
struct block {
size_t size;
char * memory; // char * so that pointer arithmetic is well defined
struct block * next;
};
struct stacklike {
struct block * top;
char * next_alloc; // first free byte in the top block
};
void * allocate (struct stacklike * a, size_t s) {
// add null check for top
if (a->top->size - (a->next_alloc - a->top->memory) < s + sizeof(size_t)) {
// not enough memory left in top block, allocate new one
struct block * nb = malloc(sizeof(*nb));
nb->next = a->top;
a->top = nb;
nb->memory = malloc(/* some size large enough to hold multiple data entities */);
// also set nb size to that size
a->next_alloc = nb->memory;
}
void * place = a->next_alloc;
a->next_alloc += s;
*((size_t *) a->next_alloc) = s; // store size to be able to free
a->next_alloc += sizeof (size_t);
return place;
}
I hope this shows the general idea, for an actual implementation there's much more to consider.
To swap out stuff, you change that to a doubly linked list and keep track of the total allocated bytes. If you hit a limit, write the end to some file.
I have seen a strategy used in an old FORTRAN program that might be what you are looking for. The strategy involves use of a global array that is passed down to each function from main.
char global_buffer[SOME_LARGE_SIZE];
void foo1(char* buffer, ...);
void foo2(char* buffer, ...);
void foo3(char* buffer, ...);
int main()
{
foo1(global_buffer, ....);
}
void foo1(char* buffer, ...)
{
// This function needs to use SIZE1 characters of buffer.
// It can let the functions that it calls use buffer+SIZE1
foo2(buffer+SIZE1, ...);
// When foo2 returns, everything from buffer+SIZE1 is assumed
// to be free for re-use.
}
void foo2(char* buffer, ...)
{
// This function needs to use SIZE2 characters of buffer.
// It can let the functions that it calls use buffer+SIZE2
foo3(buffer+SIZE2, ...);
}
void foo3(char* buffer, ...)
{
// This function needs to use SIZE3 characters of buffer.
// It can let the functions that it calls use buffer+SIZE3
bar1(buffer+SIZE3, ...);
}
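The same "everything above this offset is free again" idea can be wrapped in a small allocator, so that the memory comes from malloc() once and is then handed out in stack order. This is a sketch under assumptions: the names StackArena, arena_init, arena_push, arena_mark, arena_release and arena_destroy are hypothetical, C11 is assumed for max_align_t and _Alignof, and one arena per thread would have to be created to cover the threading requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    char *base;       /* start of the single big block */
    size_t capacity;  /* total bytes in the block */
    size_t used;      /* bytes currently handed out */
} StackArena;

/* Allocates the backing block. Returns 1 on success, 0 on failure. */
int arena_init(StackArena *a, size_t capacity)
{
    a->base = malloc(capacity);
    a->capacity = (a->base != NULL) ? capacity : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Rounds n up to a multiple of the strictest fundamental alignment. */
static size_t align_up(size_t n)
{
    size_t al = _Alignof(max_align_t);
    return (n + al - 1) / al * al;
}

/* Hands out size bytes from the top of the stack, or NULL if it is full. */
void *arena_push(StackArena *a, size_t size)
{
    size_t need = align_up(size);
    if (need < size || need > a->capacity - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += need;
    return p;
}

/* Remembers the current top so it can be restored later. */
size_t arena_mark(const StackArena *a)
{
    return a->used;
}

/* Frees everything that was pushed after the given mark. */
void arena_release(StackArena *a, size_t mark)
{
    assert(mark <= a->used);
    a->used = mark;
}

void arena_destroy(StackArena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = a->used = 0;
}
```

A function would take a mark on entry, push whatever it needs, and release back to the mark before returning, which mirrors the buffer+SIZE convention above.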

Static pointer to dynamically allocated buffer inside function

I have a function in C that dynamically allocates a buffer, which is passed to another function to store its return value. Something like the following dummy example:
void other_function(float in, float *out, int out_len) {
/* Fills 'out' with 'out_len' values calculated from 'in' */
}
void function(float *data, int data_len, float *out) {
float *buf;
int buf_len = 2 * data_len, i;
buf = malloc(sizeof(float) * buf_len);
for (i = 0; i < data_len; i++, data++, out++) {
other_function(*data, buf, buf_len);
/* Do some other stuff with the contents of buf and write to *out */
}
free(buf);
}
function is called by an iterator over a multi-dimensional array (it's a NumPy gufunc kernel, to be precise), so it gets called millions of times with the same value for data_len. It seems wasteful to be creating and destroying the buffer over and over again. I would normally move allocation of the buffer to the function that calls function, and pass a pointer to it, but I don't directly control that, so that is not possible. Instead, I am considering doing the following:
void function(float *data, int data_len, float *out) {
static float *buf = NULL;
static int buf_len = 0;
int i;
if (buf_len != 2 * data_len) {
buf_len = 2 * data_len;
buf = realloc(buf, sizeof(float) * buf_len); /* same as malloc if buf == NULL */
}
for (i = 0; i < data_len; i++, data++, out++) {
other_function(*data, buf, buf_len);
/* Do some other stuff with the contents of buf and write to *out */
}
}
That means that I never directly free the memory I allocate: it gets reused in subsequent calls, and then lingers there until my program exits. It doesn't seem like the right thing to do, but not too bad either, as the amount of memory allocated is always going to be small. Am I over worrying? Is there a better approach to this?
This approach is legitimate (but see below), although tools like valgrind will incorrectly flag it as a "leak". (It's not a leak, as a leak is an unbounded increase in memory usage.) You might want to benchmark exactly how much time is lost on malloc and free compared to other things the function is doing.
If you can use C99 or gcc, and if your buffer is not overly large, you should also consider variable-length arrays, which are as fast as (or faster than) a static buffer, and create no fragmentation. If you're on another compiler, you can look into the non-standard (but widely supported) alloca extension.
You do need to be aware that using a static buffer makes your function:
Thread-unsafe - if it is called from multiple threads simultaneously, it will destroy the data of the other instance. If the function is called from Python through NumPy, this is probably not a problem, as threads will be effectively serialized by the GIL.
Non-reentrant - if other_function calls some Python code which ends up calling function - for whatever reason - before function finishes, your function will again destroy its own data.
If you don't need true parallel execution and reentrancy, this use of static variables is fine, and a lot of C code uses them that way.
This is a fine approach, and something like this is likely used internally by many libraries. The memory will be freed automatically when the program exits.
You might want to round buf_len up to a multiple of some block size, so you don't realloc() every time data_len changes a small bit. But if data_len is almost always the same size, this isn't necessary.
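The rounding could look roughly like this. The helper names round_up_to_block and ensure_capacity, and the block size of 256 elements, are assumptions for the sketch. This version only ever grows the buffer, which is a slightly different policy from reallocating whenever the size differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Rounds n up to the next multiple of block (block must be non-zero). */
size_t round_up_to_block(size_t n, size_t block)
{
    assert(block > 0);
    size_t rem = n % block;
    return rem == 0 ? n : n + (block - rem);
}

/* Makes sure *buf can hold at least needed floats, growing it in steps of
 * 256 elements. Returns 1 on success, 0 if realloc() fails, in which case
 * the old buffer is left untouched. */
int ensure_capacity(float **buf, size_t *cap, size_t needed)
{
    if (needed <= *cap)
        return 1;                       /* already big enough: no realloc */

    size_t new_cap = round_up_to_block(needed, 256);
    float *tmp = realloc(*buf, new_cap * sizeof **buf);
    if (tmp == NULL)
        return 0;
    *buf = tmp;
    *cap = new_cap;
    return 1;
}
```

Inside function, the static pointer and its capacity would be passed by address, e.g. ensure_capacity(&buf, &cap, 2 * (size_t)data_len).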

Resources