How to allocate memory for a String Dynamically? - c

I am trying to read data from a text file and store it in a structure that has one char pointer and one int member.
While reading the file, I know there will be one string and one integer value to fetch.
I also know the position from which I have to start reading.
What I don't know is the size of the string.
So how can I allocate memory for that string?
Sample code is here :
struct filevalue
{
char *string;
int integer;
} value;
fseek(ptr,18,SEEK_SET);//seeking from start of file to position from where I get String
fscanf(ptr,"%s",value.string);//ptr is file pointer
fseek(ptr,21,SEEK_CUR);//Now seeking from current position
fscanf(ptr,"%d",&value.integer);//%d needs the address of the int
Thanks in advance for your help.

Either:
1) malloc the maximum possible length
2) read that much into the malloc'd block
3) figure out where the real end of the string is
4) write a \0 into your malloc'd block there so it behaves correctly as a nul-terminated string (and/or save the length too in case you need it)
5) optionally realloc your block to the correct size
Or:
1) malloc a reasonable guesstimate N for the length
2) read that much
3) if you can't find the end of the string in that buffer:
   - grow the buffer with realloc to 2N (for example) and read the next N bytes into the end
   - goto 3
4) write a \0 etc. as above
You said in a comment that the max. string length is bounded, so the first approach is probably fine. You haven't said how you figure out where the string ends, but I'm assuming there is some delimiter, or it's right-filled with spaces, or something.
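A minimal sketch of the first approach, assuming the string ends at the first whitespace character and the file is opened so that byte offsets are meaningful (binary mode). The name read_bounded_string and its parameters are made up for illustration; nothing in the question confirms the delimiter.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads a string of at most max_len bytes that starts at
 * byte offset pos and ends at the first whitespace character (an assumption).
 * Returns a malloc'd, nul-terminated copy (caller frees), or NULL on failure.
 * On success the stream is left positioned just after the string. */
char *read_bounded_string(FILE *fp, long pos, size_t max_len)
{
    char *buf = malloc(max_len + 1);            /* 1) maximum length plus '\0' */
    if (buf == NULL)
        return NULL;

    if (fseek(fp, pos, SEEK_SET) != 0) {
        free(buf);
        return NULL;
    }

    size_t got = fread(buf, 1, max_len, fp);    /* 2) read that much */
    buf[got] = '\0';

    size_t len = strcspn(buf, " \t\r\n");       /* 3) find the real end */
    buf[len] = '\0';                            /* 4) terminate it there */

    fseek(fp, pos + (long)len, SEEK_SET);       /* undo the over-read */

    char *shrunk = realloc(buf, len + 1);       /* 5) optionally shrink */
    return shrunk != NULL ? shrunk : buf;
}
```

With the structure from the question this could be called as value.string = read_bounded_string(ptr, 18, 64); where 64 is only a placeholder for the real upper bound.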

Did you mean to use SEEK_CUR in your second fseek()? If so, then you know the length of the string. Use a fixed-size buffer.

If you know the position of the first structure, and the position of the second structure, you also know the total length of the first structure (position of second - position of first). You also know the size of the integer part of the structure, and therefore you can easily calculate the length of the string.
off_t pos1; /* Position of first structure */
off_t pos2; /* Position of second structure */
size_t struct_len = pos2 - pos1;
size_t string_len = struct_len - sizeof(int);

I assume you open the file in binary mode since you use fseek.
You could read from the file using fgetc(). Since you don't know the size, allocate a buffer with some initial size, such as 100, then read character by character, placing each one into the buffer. Check whether the buffer is still large enough to hold the characters, and if not, realloc() it to a larger size.
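A sketch of that fgetc() loop, assuming the string is terminated by whitespace or end of file. The function name read_token is hypothetical, and doubling the capacity is just one common growth policy.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one whitespace-delimited token from the current
 * position of fp, growing the buffer as needed. Caller frees the result. */
char *read_token(FILE *fp)
{
    size_t cap = 100;                 /* initial size, as suggested above */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;                            /* int, so EOF is distinguishable */
    while ((c = fgetc(fp)) != EOF &&
           c != ' ' && c != '\t' && c != '\n' && c != '\r') {
        if (len + 1 >= cap) {         /* always keep one byte for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    return buf;
}
```

Assigning the realloc() result to a temporary first avoids leaking the old block if the reallocation fails.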

Related

Will assigning a large value for length of char string be an issue?

I am reading a line from a file and I do not know how long it is going to be. I know there are ways to do this with pointers, but I am specifically asking about a plain char array. For example, if I declare the string like this:
char string[300]; //(or bigger)
Will having large string values like this be a problem?
Any hard coded number is potentially too small to read the contents of a file. It's best to compute the size at run time, allocate memory for the contents, and then read the contents.
See Read file contents with unknown size.
char string[300]; //(or bigger)
I am not sure which of the two issues you are concerned with, so I will try to address both below:
If the string in the file is larger than 300 bytes and you try to "stick" that string in that buffer without accounting for the maximum length of your array, you will get undefined behaviour because you write past the end of the array.
If you are just asking whether 300 bytes is too much to allocate, then no, it is not a big deal unless you are on a very restricted device. For example, in Visual Studio the default stack size (where that array would be stored) is 1 MB, if I am not wrong. The benefit is clear: you don't need to concern yourself with freeing it, and so on.
PS. So if you are sure the buffer size you specify is enough, this can be a fine approach, as you free yourself from the memory management issues that come with pointers and dynamic memory.
Will having large string values like this be a problem?
Absolutely.
If your application must read the entire line from a file before processing it, then you have two options.
1) Allocate buffer large enough to hold the line of maximum allowed length. For example, the SMTP protocol does not allow lines longer than 998 characters. In that case you can allocate a static buffer of length 1001 (998 + \r + \n + \0). Once you have read a line from a file (or from a client, in the example context) which is longer than the maximum length (that is, you have read 1000 characters and the last one is not \n), you can treat it as a fatal (protocol) error and report it.
2) If there are no limitations on the length of the input line, the only thing you can do to keep your program robust is to allocate buffers dynamically as the input is read. This may involve storing multiple malloc-ed buffers in a linked list, or calling realloc when the buffer is exhausted (this is how the getline function works, although it is not specified in the C standard, only in POSIX.1-2008).
In either case, never use gets to read the line. Call fgets instead.
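A sketch of option 1 with fgets(), assuming a line counts as too long whenever the buffer fills up before a newline is seen. The function name read_limited_line and its return codes are invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (size bytes).
 * Returns 1 if a line was read, 0 at end of file or on a read error,
 * and -1 if the buffer filled up before a newline was found. */
int read_limited_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        return 1;                 /* complete line */
    if (feof(fp))
        return 1;                 /* last line without a trailing newline */
    return -1;                    /* line longer than the buffer allows */
}
```

For the SMTP-style limit above, the caller would pass a char line[1001]; and treat -1 as a protocol error.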
It all depends on how you read the line. For example:
char string[300];
FILE* fp = fopen(filename, "r");
//Error checking omitted
fgets(string, 300, fp);
Taken from tutorialspoint.com
The C library function char *fgets(char *str, int n, FILE *stream) reads a line from the specified stream and stores it into the string pointed to by str. It stops when either (n-1) characters are read, the newline character is read, or the end-of-file is reached, whichever comes first.
That means that this will read 299 characters from the file at most. This will cause only a logical error (because you might not get all the data you need) that won't cause any undefined behavior.
But, if you do:
char string[300];
int i = 0;
FILE* fp = fopen(filename, "r");
do {
    string[i] = fgetc(fp); // no bounds check on i
    i++;
} while (string[i - 1] != '\n');
This causes undefined behavior (typically a segmentation fault), because on lines longer than 300 characters it writes past the end of the array.

How to save a specific length string from a file and work with it in C

So what I'm trying to do is open a file and read it until the end in blocks that are 256 bytes long each time it is called. My dilemma is using fgets() or fread() to do it.
I was using fgets() initially, because it returns a string of the bytes that were read, which is great because I can store that data and work with it. However, in the particular file that I'm reading, the 256 bytes often span more than 2 lines, which is a problem because fgets() stops reading when it hits a newline character or the end of the file.
I then thought of using fread(), but I don't know how to save the line that I'm referring to with it because fread() returns an int referring to the number of elements successfully read (according to its documentation).
I've searched and thought of solutions for a while now and can't find anything that works with my particular scenario. I would like some guidance on how to go about this issue, how would you go about this in my position?
You can use fread() to read each 256 bytes block and keep a lineCount variable to keep track of the number of new line characters you have encountered so far in the input. Since you have to process the blocks already this wouldn't mean much of an overhead in the processing.
To read a block of 256 chars, which is what I think you are doing, you just need to create a buffer of chars that can hold 256 of them, in other words a char array of size 256.
#define BLOCK_SIZE 256
char block[BLOCK_SIZE];
Then if you check the documentation for fread() it shows the following signature:
Following is the declaration for fread() function.
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
Parameters
ptr -- This is the pointer to a block of memory with a minimum size of size*nmemb bytes.
size -- This is the size in bytes of each element to be read.
nmemb -- This is the number of elements, each one with a size of size bytes.
stream -- This is the pointer to a FILE object that specifies an input stream.
So this means it takes a pointer to the buffer where it will write the read information, the size of each element it's supposed to read, the maximum amount of elements you want it to read and the file pointer. In your case it would be:
size_t read = fread(block, sizeof(char), BLOCK_SIZE, file);
This will copy the information from the file to the block array, which you can later process and keep track of the lines. The characters that were read by fread are in the block array, so the first char in the last read block would be block[0], the second block[1] and so on. The returned value in read indicates how many elements (in your case chars) were inserted in the array block when you call fread, this number will be equal to BLOCK_SIZE for every call, unless you reach the end of the file or there's an error.
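Putting those pieces together, a small sketch that reads the file in 256-byte blocks and counts newline characters along the way. The function name count_newlines is hypothetical; the per-block processing would go where the inner loop is.

```c
#include <stdio.h>

#define BLOCK_SIZE 256

/* Hypothetical helper: reads the stream block by block and counts '\n'.
 * Returns the number of newlines, or -1 if a read error occurred. */
long count_newlines(FILE *file)
{
    char block[BLOCK_SIZE];
    size_t nread;
    long line_count = 0;

    /* fread returns fewer than BLOCK_SIZE elements only at EOF or on error */
    while ((nread = fread(block, sizeof(char), BLOCK_SIZE, file)) > 0) {
        for (size_t i = 0; i < nread; i++) {
            if (block[i] == '\n')
                line_count++;
        }
    }
    return ferror(file) ? -1 : line_count;
}
```

Note that block is not nul-terminated, so only the first nread bytes may be examined after each call.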
I suggest you read some documentation for a full example, play a little with the code and do some reading on pointers in C to gain a better understanding of how everything works in general. If you still have questions after that, we can take it from there or you can create a new SO question.

C using fread to read an unknown amount of data

I have a text file called test.txt
Inside it will be a single number, it may be any of the following:
1
2391
32131231
3123121412
I.e. it could be any size of number, from 1 digit up to x digits.
The file will only have 1 thing in it - this number.
I want a bit of code using fread() which will read that number of bytes from the file and put it into an appropriately sized variable.
This is to run on an embedded device; I am concerned about memory usage.
How to solve this problem?
You can simply use:
char buffer[4096];
size_t nbytes = fread(buffer, sizeof(char), sizeof(buffer), fp);
if (nbytes == 0) {
    /* EOF or other error */
} else {
    /* process nbytes of data */
}
Or, in other words, provide yourself with a data space big enough for any valid data and then record how much data was actually read into the string. Note that the string will not be null terminated unless either buffer contained all zeroes before the fread() or the file contained a zero byte. You cannot rely on a local variable being zeroed before use.
It is not clear how you want to create the 'appropriately sized variable'. You might end up using dynamic memory allocation (malloc()) to provide the correct amount of space, and then return that allocated pointer from the function. Remember to check for a null return (out of memory) before using it.
If you want to avoid over-reading, fread is not the right function. You probably want fscanf with a conversion specifier along the lines of %100[0123456789]...
One way to achieve this is to use fseek to move your file stream location to the end of the file:
fseek(file, 0, SEEK_END);
and then using ftell to get the position of the cursor in the file — this returns the position in bytes so you can then use this value to allocate a suitably large buffer and then read the file into that buffer.
I have seen warnings saying this may not always be 100% accurate, but I've used it in several instances without a problem. I think the issues could be dependent on specific implementations of the functions on certain platforms.
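A sketch of the fseek/ftell approach, assuming a regular file on a platform where ftell reports a byte offset (the C standard does not promise this for every stream). The function name slurp_file is made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole stream into a malloc'd,
 * nul-terminated buffer and stores the number of bytes read in *out_len.
 * Returns NULL on failure; the caller frees the buffer. */
char *slurp_file(FILE *fp, size_t *out_len)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);            /* position at the end == size in bytes */
    if (size < 0)
        return NULL;
    rewind(fp);

    char *buf = malloc((size_t)size + 1);
    if (buf == NULL)
        return NULL;

    size_t got = fread(buf, 1, (size_t)size, fp);
    buf[got] = '\0';                  /* terminate at what was actually read */
    *out_len = got;
    return buf;
}
```

Terminating at got rather than size keeps the buffer valid even when fewer bytes arrive than ftell reported.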
Depending on how clever you need to be with the number conversion... If you do not need to be especially clever and fast, you can read it a character at a time with getc(). So,
- start with a variable initialized to 0.
- Read a character, multiply variable by 10 and add new digit.
- Then repeat until done.
Get a bigger sized variable as needed along the way or start with your largest sized variable and then copy it into the smallest size that fits after you finish.
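A sketch of the digit-by-digit idea, starting with the largest standard unsigned type and stopping on overflow. The function name read_unsigned and the choice of uint64_t are assumptions; a smaller type may suit the embedded target better.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: reads decimal digits from fp until a non-digit or EOF.
 * Returns 1 and stores the value in *out on success; returns 0 if no digit
 * was found or the number does not fit in a uint64_t. */
int read_unsigned(FILE *fp, uint64_t *out)
{
    uint64_t value = 0;
    int digits = 0;
    int c;

    while ((c = getc(fp)) != EOF && c >= '0' && c <= '9') {
        uint64_t d = (uint64_t)(c - '0');
        if (value > (UINT64_MAX - d) / 10)
            return 0;                 /* value * 10 + d would overflow */
        value = value * 10 + d;       /* multiply by 10, add the new digit */
        digits++;
    }
    *out = value;
    return digits > 0;
}
```

This needs no buffer at all, which may matter on a memory-constrained device.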

Scanning a file and allocating correct space to hold the file

I am currently using fscanf to get space delimited words. I establish a char[] with a fixed size to hold each of the extracted words. How would I create a char[] with the correct number of spaces to hold the correct number of characters from a word?
Thanks.
Edit: If I do a strdup on a char[1000] and the char[1000] actually only holds 3 characters, will the strdup reserve space on the heap for 1000 or 4 (for the terminating char)?
Here is a solution involving only two allocations and no realloc:
Determine the size of the file by seeking to the end and using ftell.
Allocate a block of memory this size and read the whole file into it using fread.
Count the number of words in this block.
Allocate an array of char * able to hold pointers to this many words.
Loop through the block of text again, assigning to each pointer the address of the beginning of a word, and replacing the word delimiter at the end of the word with 0 (the null character).
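The counting and pointer-assignment steps above could look roughly like this, assuming the file contents are already in a nul-terminated buffer and that words are separated by whitespace. The function name split_words is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: splits text in place into whitespace-delimited words.
 * Returns a malloc'd, NULL-terminated array of pointers into text and stores
 * the word count in *out_count. Only the array itself needs to be freed. */
char **split_words(char *text, size_t *out_count)
{
    size_t count = 0;
    int in_word = 0;

    for (char *p = text; *p != '\0'; p++) {      /* pass 1: count words */
        if (isspace((unsigned char)*p)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;
        }
    }

    char **words = malloc((count + 1) * sizeof *words);
    if (words == NULL)
        return NULL;

    size_t n = 0;
    in_word = 0;
    for (char *p = text; *p != '\0'; p++) {      /* pass 2: point and cut */
        if (isspace((unsigned char)*p)) {
            *p = '\0';                           /* delimiter becomes terminator */
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words[n++] = p;
        }
    }
    words[n] = NULL;
    *out_count = n;
    return words;
}
```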
Also, a slightly philosophical matter: If you think this approach of inserting string terminators in-place and breaking up one gigantic string to use it as many small strings is ugly, hackish, etc. then you should probably forget about programming in C and use Python or some other higher-level language. The ability to do radically-more-efficient data manipulation operations like this while minimizing the potential points of failure is pretty much the only reason anyone should be using C for this kind of computation. If you want to go and allocate each word separately, you're just making life a living hell for yourself by doing it in C; other languages will happily hide this inefficiency (and abundance of possible failure points) behind friendly string operators.
There's no one-and-only way. The idea is to just allocate a string large enough to hold the largest possible string. After you've read it, you can then allocate a buffer of exactly the right size and copy it if needed.
In addition, you can also specify a width in your fscanf format string to limit the number of characters read, to ensure your buffer will never overflow.
But if you allocated a buffer of, say, 250 characters, it's hard to imagine a single word not fitting in that buffer.
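A sketch of that idea: read into a fixed 250-byte buffer with a width limit, then copy into a block of exactly the right size. The function name read_word is hypothetical. The copy is sized from strlen(), not from the array, so a 3-character word costs 4 bytes on the heap; strdup() sizes its copy the same way on systems that provide it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads the next whitespace-delimited word (at most
 * 249 characters) and returns a heap copy of exactly strlen + 1 bytes.
 * Returns NULL at end of input or if allocation fails. Caller frees. */
char *read_word(FILE *fp)
{
    char tmp[250];

    if (fscanf(fp, "%249s", tmp) != 1)   /* width leaves room for the '\0' */
        return NULL;

    size_t len = strlen(tmp);
    char *copy = malloc(len + 1);
    if (copy != NULL)
        memcpy(copy, tmp, len + 1);      /* includes the terminator */
    return copy;
}
```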
char *ptr;
ptr = malloc(size_of_string + 1); /* +1 for the terminating '\0' */
if (ptr != NULL) {
    ptr[0] = '\0'; /* start out as an empty string */
    /* etc. */
}

Text file to string array in plain c?

I want to load a txt file into an array like file() does in PHP. I want to be able to access different lines like array[N] (which should contain the entire line N from the file), and then I need to remove each array element after using it, so the array shrinks until it reaches size 0 and the program finishes. I know how to read the file but I have no idea how to fill a string array to be used like I said. I am using gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) to compile.
How can I achieve this?
Proposed algorithm:
Use fseek, ftell, fseek to seek to end, determine file length, and seek back to beginning.
malloc a buffer big enough for the whole file plus null-termination.
Use fread to read the whole file into the buffer, then write a 0 byte at the end.
Loop through the buffer byte-by-byte and count newlines.
Use malloc to allocate that number + 1 char * pointers.
Loop through the buffer again, assigning the first pointer to point to the beginning of the buffer, and successive pointers to point to the byte after a newline. Replace the newline bytes themselves with 0 (null) bytes in the process.
One optimization: if you don't need random access to the lines (indexing them by line number), do away with the pointer array and just replace all the newlines with 0 bytes. Then s+=strlen(s)+1; advances to the next line. You'll need to add some check to make sure you don't advance past the end (or beginning if you're doing this in reverse) of the buffer.
Either way, this method is very efficient (no memory fragmentation) but has a couple drawbacks:
You can't individually free lines; you can only free the whole buffer once you finish.
You have to overwrite the newlines. Some people prefer to have them kept in the in-memory structure.
If the file ended with a newline, the last "line" in your pointer array will be zero-length. IMO this is the sane interpretation of text files, but some people prefer considering the empty string after the last newline a non-line and considering the last proper line "incomplete" if it doesn't end with a newline.
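A sketch of the counting and splitting steps of the proposed algorithm, assuming the whole file is already in a nul-terminated buffer and lines end with '\n'. The function name split_lines is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: splits buf in place at every '\n'.
 * Returns a malloc'd array of pointers into buf and stores the number of
 * entries (newlines + 1) in *out_count. Free the array, then the buffer. */
char **split_lines(char *buf, size_t *out_count)
{
    size_t newlines = 0;
    for (const char *p = buf; *p != '\0'; p++) {
        if (*p == '\n')
            newlines++;
    }

    char **lines = malloc((newlines + 1) * sizeof *lines);
    if (lines == NULL)
        return NULL;

    size_t n = 0;
    lines[n++] = buf;                   /* first line starts the buffer */
    for (char *p = buf; *p != '\0'; p++) {
        if (*p == '\n') {
            *p = '\0';                  /* newline becomes a terminator */
            lines[n++] = p + 1;         /* next line starts right after it */
        }
    }
    *out_count = n;
    return lines;
}
```

For the input "a\nbb\n" this yields three entries, the last one empty, which matches the interpretation described above.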
I suggest you read your file into an array of pointers to strings which would allow you to index and delete the lines as you have specified. There are efficiency tradeoffs to consider with this approach as to whether you count the number of lines ahead of time or allocate/extend the array as you read each line. I would opt for the former.
Read the file, counting the number of line terminators you see (either \n or \r\n)
Allocate an array of char * of that size
Re-read the file, line by line, using malloc() to allocate a buffer for each and pointed to by the next array index
For your operations:
Indexing is just array[N]
Deleting is just freeing the buffer indexed by array[N] and setting the array[N] entry to NULL
UPDATE:
The more memory efficient approach suggested by @r.. and @marc-van-kempen is a good optimization over malloc()ing each line at a time, that is, slurp the file into a single buffer and replace all the line terminators with '\0'
Assuming you've done that and you have a big buffer as char *filebuf and the number of lines is int num_lines then you can allocate your indexing array something like this:
char **lines = malloc((num_lines + 1) * sizeof *lines); // Allocates array of pointers to strings (needs <stdlib.h>)
lines[num_lines] = NULL; // Terminate the array as another way to stop you running off the end
char *p = filebuf; // I'm assuming the first char of the file is the start of the first line
int n;
for (n = 0; n < num_lines; n++) {
    lines[n] = p;
    while (*p != '\0') p++; // Seek to the terminator that ends this line
    if (n < num_lines - 1) {
        while (*p == '\0') p++; // Skip the terminator(s) to reach the start of the next line
    }
}
With a single buffer approach "deleting" a line is merely a case of setting lines[n] to NULL. There is no free()
Two slightly different ways to achieve this, one is more memory friendly, the other more cpu friendly.
I. Memory friendly
Open the file and get its size (use fstat() and friends) ==> size
Allocate a buffer of that size ==> char buf[size]; and read the whole file into it
Scan through the buffer counting the '\n' (or '\r\n' == DOS or '\r' == old Mac) ==> N
Allocate an array: char *lines[N]
Scan through the buffer again and point lines[0] to &buf[0], scan for the first '\n' or '\r' and set it to '\0' (delimiting the string), set lines[1] to the first character after that which is not '\n' or '\r', etc.
II. CPU friendly
Create a linked list structure (if you don't know how to do this or don't want to, have a look at 'glib' (not glibc!), a utility companion of gtk).
Open the file and start reading the lines using fgets(), malloc'ing each line as you go along.
Keep a linked list of lines ==> list and count the total number of lines
Allocate an array: char *lines[N];
Go through the linked list and assign the pointer to each element to its corresponding array element
Free the linked list (not its elements!)
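A sketch of the linked-list variant, assuming no line is longer than 1023 characters (longer lines would be split across several entries). The struct node type and the function name read_lines_list are invented for this example; a library list such as glib's would work equally well.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node {                 /* hypothetical list node */
    char *line;
    struct node *next;
};

/* Hypothetical helper: reads all lines into a list, then moves them into a
 * malloc'd, NULL-terminated array. Returns NULL if an allocation fails. */
char **read_lines_list(FILE *fp, size_t *out_count)
{
    struct node *head = NULL, *tail = NULL;
    size_t count = 0;
    char tmp[1024];
    int failed = 0;

    while (fgets(tmp, sizeof tmp, fp) != NULL) {
        size_t len = strlen(tmp);
        struct node *nd = malloc(sizeof *nd);
        char *copy = malloc(len + 1);
        if (nd == NULL || copy == NULL) {
            free(nd);
            free(copy);
            failed = 1;
            break;
        }
        memcpy(copy, tmp, len + 1);
        nd->line = copy;
        nd->next = NULL;
        if (tail != NULL)
            tail->next = nd;
        else
            head = nd;
        tail = nd;
        count++;
    }

    char **lines = failed ? NULL : malloc((count + 1) * sizeof *lines);
    size_t n = 0;
    while (head != NULL) {
        struct node *next = head->next;
        if (lines != NULL)
            lines[n++] = head->line;  /* the array takes over the string */
        else
            free(head->line);         /* failure path: release everything */
        free(head);                   /* free the node, not its element */
        head = next;
    }
    if (lines != NULL) {
        lines[n] = NULL;
        *out_count = n;
    }
    return lines;
}
```

Each lines[i] keeps its trailing newline and can be freed individually, which the single-buffer variant does not allow.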
