If I am trying to scan a string of an unknown length, would it be a good approach to scan the input one char at a time and build a linked list of chars to create the string? The only problem I am currently facing is I'm not sure how to handle the string one char at a time without asking the user to enter the string one char at a time, which would be unreasonable. Is there a better approach? I would like to avoid mallocing an arbitrarily large size char array just to accommodate most strings.
In my opinion, a linked list of chars would be a very bad idea, as it would consume far too much memory for a single string.
Instead, allocate a nominally sized buffer (say 128 bytes) and keep reading characters into it. Once the buffer is almost full, allocate another buffer of double the current size, copy the contents of the first buffer into the second, and free the first buffer. You can continue like this until the string has been read completely.
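A minimal sketch of this idea (read_line is a made-up helper name; realloc is used here because it does the allocate-copy-free step for you):

    #include <stdio.h>
    #include <stdlib.h>

    /* Read one line of unknown length from `in` into a growing buffer.
     * Returns a NUL-terminated string the caller must free(), or NULL on error. */
    char *read_line(FILE *in)
    {
        size_t cap = 128, len = 0;              /* nominal initial size */
        char *buf = malloc(cap);
        if (!buf)
            return NULL;

        int c;
        while ((c = fgetc(in)) != EOF && c != '\n') {
            if (len + 1 >= cap) {               /* buffer almost full: double it */
                char *tmp = realloc(buf, cap * 2);  /* allocates, copies, frees old block */
                if (!tmp) { free(buf); return NULL; }
                buf = tmp;
                cap *= 2;
            }
            buf[len++] = (char)c;
        }
        buf[len] = '\0';
        return buf;
    }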
Also, most programs I have written or seen maintain an upper limit for the string size, and if the input appears to exceed that size, the program returns an error. The upper limit is determined by the application context. For example, if the string you are reading is a name, it generally cannot exceed 32 (or some other fixed number of) characters; if it does, the name is truncated to fit the buffer. This way the buffer can be allocated at the upper-limit size right from the start.
This is just one idea; there are many other ways to address this besides a linked list.
Setting aside the overkill memory usage of a node-per-char linked list for a moment, suppose you actually built it and got your string into it. Can you actually work with it?
For example:
The non-contiguous buffer means many of the standard functions (e.g. printf(), strlen(), fwrite()) would simply be incompatible with it, or at least a pain to use.
Non-sequential access to the string would be extremely inefficient.
As for a better approach: it really depends on what you're going to do with the string. (For example, if you can process the string as it comes in, maybe you don't even need to hold the entire thing in memory.)
Store it in an array. Start with a fixed-size array and, while reading the input, store the characters in it. When the array is full and new input arrives, create a larger array of double the size and copy the old array into the new one. Then keep adding new chars to this array, and repeat the process until you have read all the data. You can optimize the copying of chars from the old array to the new array with the following approach (see the sketch after this list):
1) Initialize a variable old_idx to 0.
2) When a new char arrives (after the old array is full), create a new array of double the size of the old one and copy the new char into the first free slot (index old_size). Also copy the element at index old_idx of the old array into index old_idx of the new array.
3) Increment old_idx.
At the end, check whether old_idx < old_array_size; if so, copy the rest of the old data.
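A rough sketch of that incremental-copy scheme; the names (growbuf, growbuf_push, growbuf_finish) are invented for illustration, and until growbuf_finish runs, un-migrated elements still live only in old_data:

    #include <stdlib.h>

    /* usage: struct growbuf g = {0}; growbuf_push(&g, c); ... growbuf_finish(&g); */
    struct growbuf {
        char  *old_data;   /* previous, smaller array still being migrated */
        size_t old_size;   /* number of elements in old_data */
        size_t old_idx;    /* how many of them have been migrated so far */
        char  *data;       /* current array */
        size_t size;       /* number of elements stored logically */
        size_t cap;        /* capacity of data */
    };

    /* Append one char, migrating at most one old element per call,
     * so no single append ever copies the whole array. */
    int growbuf_push(struct growbuf *g, char c)
    {
        if (g->size == g->cap) {
            /* Current array is full; by now every old element has been
             * migrated, so the current array becomes the new "old" one. */
            size_t newcap = g->cap ? g->cap * 2 : 16;
            char *n = malloc(newcap);
            if (!n)
                return -1;
            free(g->old_data);
            g->old_data = g->data;
            g->old_size = g->size;
            g->old_idx  = 0;
            g->data     = n;
            g->cap      = newcap;
        }
        g->data[g->size++] = c;                  /* store the new element */
        if (g->old_idx < g->old_size) {          /* migrate one old element */
            g->data[g->old_idx] = g->old_data[g->old_idx];
            g->old_idx++;
        }
        return 0;
    }

    /* Call once at the end: if input stopped before migration finished,
     * copy the rest of the old data (the check from step 3 above). */
    void growbuf_finish(struct growbuf *g)
    {
        while (g->old_idx < g->old_size) {
            g->data[g->old_idx] = g->old_data[g->old_idx];
            g->old_idx++;
        }
        free(g->old_data);
        g->old_data = NULL;
        g->old_size = 0;
    }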
The amortized cost of the whole process is quite low; this is also how ArrayList in Java works.
The advantages of an array over a linked list are obvious:
1) Smaller memory footprint.
2) Faster linear access (in an array, all the data lives in one contiguous allocation).
How can I read a line from the console in C without declaring some fixed-size array like int buf[30]? I would like to allocate the buffer once, at exactly the required length, i.e. know the number of input characters before reading them...
Is it possible in C?
There is no way to know the number of characters available on standard input before reading them. You can, however, use getline(3), which will read until \n and then return the data in a dynamically allocated buffer (along with the size of that buffer). You must free the buffer when you're done with it.
You should be aware that this routine will block until it reads a newline. It's also difficult to use this routine safely, as malformed inputs are not handled well. (What if the input has no newline?) This is one of the reasons many applications read fixed-length input instead.
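For example, a minimal use of getline(3), with error handling kept to the bare minimum:

    #define _POSIX_C_SOURCE 200809L   /* getline is POSIX.1-2008 */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *line = NULL;            /* getline allocates and grows this buffer */
        size_t cap = 0;               /* its current size, updated by getline */

        ssize_t len = getline(&line, &cap, stdin);
        if (len != -1)
            printf("read %zd characters\n", len);   /* includes the '\n', if present */

        free(line);                   /* you must free the buffer yourself */
        return 0;
    }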
I suspect that what you are asking about is dynamic memory.
With dynamic memory we can create arrays whose capacity can vary at run time, so the number of slots can change while the program runs. That way, you don't need to decide the size of a particular array while coding.
To create that kind of dynamic array you will need a pointer that refers to a space in memory.
int *array;
Once we have that connection between memory and a variable, we need to set how much memory we want (how many slots in the array).
array = (int *)malloc(sizeof(int) * numberOfSlots);
The malloc function is provided by the C standard library and declared in the header stdlib.h.
It requests a block of memory from the system. The size of that block is given inside the parentheses (), where you specify the number of bytes you want to request.
If what we want is an array of integers, we multiply the size of an integer by the number of slots we need.
To access or modify data inside an array, you can keep it simple by using [], like this:
array[0] = 1;
Important note: never access or modify data inside an array without having requested the memory first!
To count the number of characters on a line, you can simply use a loop and read character by character until you find the '\n' character.
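A tiny sketch of that loop (note that getchar consumes the characters as it counts them):

    #include <stdio.h>

    int main(void)
    {
        int c, count = 0;

        /* read (and consume) one character at a time until end of line */
        while ((c = getchar()) != EOF && c != '\n')
            count++;

        printf("the line had %d characters\n", count);
        return 0;
    }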
I'm considering writing a function which creates an array to house data (in the form of characters for now) from a file using calloc.
My two most obvious options, as I understand it, are these. The first is to read all the characters to get the total size needed, use calloc to allocate the needed space, then use fseek to get back to the beginning of the file and fill the array before returning the pointer to it.
The second option would be to create a small initial array and continuously realloc to add chunks as needed while copying; in case of realloc failure, transfer all the data into a new calloc'd block of the new, larger size before freeing the old one, and once everything is read, return the array pointer.
The question is really: how likely is realloc to fail with large data sets? If it is unlikely, I imagine the second approach will be advantageous in such cases.
As far as I understand, creating a large array and then shrinking it as needed would be trickier, so I haven't listed it as an option; if I am wrong about that, please say so.
Whether you're judging quality by the likelihood of running out of memory or by performance, the second and third cases need not be considered, because the first case is a clear winner. Except: don't read every character to get the total size needed. Use a binary file and seek to the end; get the position (that'll be the length) and then seek back to the start. This will be essentially instantaneous in every conceivable scenario, and certainly no worse than reading every character. However effective realloc is, it can't beat allocating only once. And if you were judging quality by performance, you could have already tested it by now.
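A sketch of that seek-to-end approach (read_file is a made-up name; it assumes a seekable binary stream so ftell gives the byte count):

    #include <stdio.h>
    #include <stdlib.h>

    char *read_file(const char *path, long *out_len)
    {
        FILE *f = fopen(path, "rb");
        if (!f) return NULL;

        if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
        long len = ftell(f);                 /* position at the end == file length */
        if (len < 0) { fclose(f); return NULL; }
        if (fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

        char *buf = malloc((size_t)len + 1); /* a single allocation, no realloc */
        if (!buf) { fclose(f); return NULL; }

        size_t got = fread(buf, 1, (size_t)len, f);
        fclose(f);
        buf[got] = '\0';
        if (out_len) *out_len = (long)got;
        return buf;                          /* caller frees */
    }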
I am having difficulty finding a solution, so I decided to post my question. I am writing a program in C, and:
I am generating a huge array containing a lot of pointers to ints; it is allocated dynamically and filled at runtime, so beforehand I don't know which pointers will be added or how many. The problem is that there are just too many of them, so I need to shrink the space somehow.
Is there any package or tool available which could encode my entries or change their representation so that I save space?
Another question: I also thought about writing my information to a file. Is the file then kept in memory the whole time, or only when I open it again?
It seems like you are looking for a simple dynamic array (the abstract data type "dynamic array", that is). There are many implementations of this out there; you can simply start with a small dynamic array and push new items onto the back, just like you would with a vector in C++ or Java. One such implementation is GArray. You will only allocate the memory you need.
If you have to (or want to) do it manually, the usual method is to store the capacity and the size of the array along with the data pointer in a struct, and to call realloc() from within push_back() whenever you need more space. Usually you should grow the array by a factor of 1.3 to 1.4, but a factor of 2 will do if you're not expecting a HUGE array. If you call remove and the size drops below a certain threshold (e.g. capacity/2), you shrink the array again with realloc().
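A bare-bones sketch of that manual approach (the names int_vec and int_vec_push_back are made up):

    #include <stdlib.h>

    struct int_vec {
        int   *data;
        size_t size;
        size_t cap;
    };

    int int_vec_push_back(struct int_vec *v, int x)
    {
        if (v->size == v->cap) {
            /* grow by a factor (2 here; 1.3-1.4 also works, as noted above) */
            size_t newcap = v->cap ? v->cap * 2 : 8;
            int *tmp = realloc(v->data, newcap * sizeof *tmp);
            if (!tmp) return -1;              /* old block is still valid */
            v->data = tmp;
            v->cap  = newcap;
        }
        v->data[v->size++] = x;
        return 0;
    }

Shrinking after removals works the same way: call realloc() with a smaller capacity once the size drops below your chosen threshold.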
I have a program that counts word occurrences in a text file and stores them in an array. So far I'm using a fixed array and everything works fine, but now I'd like to change that to a dynamic array so that no more memory than required is ever wasted. I understand that malloc and realloc must be used to accomplish this, but I don't really understand how to go about doing it.
My first idea was to simply count the words in the text file and then malloc enough space for all of them, but this would leave wasted space, since duplicate words increase a counter but are not added to the array again.
Does this approach make sense, and would it be the best way to accomplish it: first malloc a small array just big enough for one word and its counter, then each time I find a new word that needs adding, realloc just enough to fit another word and counter? If it's a duplicate, no realloc is needed, since an existing counter is simply incremented.
It's generally best (in terms of trading speed vs memory usage) to not aim for 100% memory utilization; especially if your program only runs for a limited amount of time, using a bit more memory than needed really doesn't "cost" a lot, overall.
One typical approach is to make the dynamic array have an initial size, say 8 or 128 or whatever, then double it whenever it fills up.
This reduces the number of re-allocations (which are costly) compared to just incrementing the size by 1 when it fills up. Of course, it wastes some memory.
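A sketch of the doubling approach applied to the word/count case (entry, table, and table_add are hypothetical names; the linear duplicate search just stands in for whatever lookup the program already does):

    #include <stdlib.h>
    #include <string.h>

    struct entry { char *word; int count; };
    struct table { struct entry *items; size_t size, cap; };

    /* Increment the count for word, adding it if it isn't present yet. */
    int table_add(struct table *t, const char *word)
    {
        for (size_t i = 0; i < t->size; i++) {
            if (strcmp(t->items[i].word, word) == 0) {
                t->items[i].count++;         /* duplicate: no allocation needed */
                return 0;
            }
        }
        if (t->size == t->cap) {             /* full: double the capacity */
            size_t newcap = t->cap ? t->cap * 2 : 8;
            struct entry *tmp = realloc(t->items, newcap * sizeof *tmp);
            if (!tmp) return -1;
            t->items = tmp;
            t->cap = newcap;
        }
        size_t n = strlen(word) + 1;         /* copy the word itself */
        char *copy = malloc(n);
        if (!copy) return -1;
        memcpy(copy, word, n);
        t->items[t->size].word = copy;
        t->items[t->size].count = 1;
        t->size++;
        return 0;
    }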
I'm reading a file into memory in C by copying the bytes to a dynamic array. Currently, I realloc() one byte larger each time a new byte comes in. This seems inefficient.
Some suggest (I can't remember where) that doubling the allocation each time more memory is needed is good because it requires only O(log n) allocations, at the cost of, in the worst case, just under half of the memory going unused.
Any recommendations on memory allocation?
If you are loading the whole file into a string you could probably use the method outlined in this question. This way you can get the size of the file in bytes and allocate your string to hold that (Don't forget the extra byte for the null character).
However, if you are dynamically growing a string, it is better to increase its size by some factor larger than a single byte (reallocating a string for every byte is going to be very slow, especially if the string has to be allocated in a new area of memory and then copied over). Since you are reading a file, doubling is probably very reasonable. I've seen people use other methods as well, for example:
I've seen people round up to the next power of 2, for example 2, 4, 8, then 16 bytes (which essentially doubles the size of the buffer each time).
I've also seen people use a value that's better suited to the strings they intend to read, e.g. 100 bytes at a time.
If you over-allocate the string, you can always get that memory back at the end with a final reallocation to the exact size you need.
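Putting the doubling and the final trim together, a sketch might look like this (slurp is a made-up name; byte-at-a-time reading is kept only to match the question):

    #include <stdio.h>
    #include <stdlib.h>

    char *slurp(FILE *f, size_t *out_len)
    {
        size_t cap = 4096, len = 0;
        char *buf = malloc(cap);
        if (!buf) return NULL;

        int c;
        while ((c = fgetc(f)) != EOF) {
            if (len == cap) {                        /* full: double the buffer */
                char *tmp = realloc(buf, cap * 2);
                if (!tmp) { free(buf); return NULL; }
                buf = tmp;
                cap *= 2;
            }
            buf[len++] = (char)c;
        }

        /* final reallocation down to the exact size needed */
        char *tmp = realloc(buf, len ? len : 1);
        if (tmp) buf = tmp;                          /* if shrinking fails, the old block is still fine */
        if (out_len) *out_len = len;
        return buf;                                  /* caller frees */
    }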
Do what some suggest (increase the size of the buffer by a multiplicative factor each time you need more room). I've done this many times and it works well. If you don't like the factor of two, you can use something else. I've used Phi (the golden ratio) to good effect.
I don't have a citation for this in front of me, and it is probably an implementation-specific detail, but I believe power-of-2 resizing is what the C++ STL's string objects use as characters are continually added. (It should be easy to verify this by calling the string::capacity method as characters are added.)