How to read() from a file continuously into a variable

I am trying to read() a file whose exact size I don't know into a variable so that I can work on the contents later, so I am looping like this:
char buf[BUFSIZE];
char* contentsOfFile;
fd = open(file, O_RDONLY);
while ( (nbytes = read(fd, buf, sizeof(buf)) ) > 0) { // keep reading until the end of file or error
strcat(contentsOfFile, buf);
}
Of course, this explodes unless contentsOfFile is another char array, but I cannot do this as
I could have a bigger file than the number of bytes it could hold.
Is there any other library solution, or should I resort to malloc?

Use malloc. Find the size first (How do you determine the size of a file in C?) then malloc the appropriate number of bytes and do the read.
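A minimal sketch of that approach, assuming a POSIX system and a regular file. The helper name read_whole_file is hypothetical, and the extra byte for a terminating '\0' is an assumption made so the result can be passed to string functions:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of the file,
 * or NULL on failure. The number of bytes read is stored in *out_len. */
char *read_whole_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    /* Ask the kernel for the size; only regular files report a useful one. */
    struct stat st;
    if (fstat(fd, &st) == -1 || !S_ISREG(st.st_mode)) {
        close(fd);
        return NULL;
    }

    size_t size = (size_t)st.st_size;
    char *data = malloc(size + 1);          /* +1 for the terminator */
    if (data == NULL) {
        close(fd);
        return NULL;
    }

    /* read() may return fewer bytes than requested, so loop until done. */
    size_t total = 0;
    while (total < size) {
        ssize_t n = read(fd, data + total, size - total);
        if (n < 0) {
            free(data);
            close(fd);
            return NULL;
        }
        if (n == 0)                         /* file is shorter than reported */
            break;
        total += (size_t)n;
    }
    close(fd);

    data[total] = '\0';
    *out_len = total;
    return data;
}
```

The caller owns the returned buffer and releases it with free().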

This is terrible code:
contentsOfFile is an uninitialized pointer, so dereferencing it invokes undefined behavior (UB)
read returns raw bytes and never adds a terminating null (it is unformatted I/O), but strcat expects null-terminated strings.
Without more context, it is hard to tell you what the correct way is. Possible ways are:
use mmap to map the file content into memory. After that, you can process it transparently and the OS will load and unload pages from the file when required
load everything into memory using malloc and realloc to make sure to have enough allocated memory for next read
load everything into memory using one single malloc and one single read after finding the file size.
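As a rough illustration of the first option in the list, here is a sketch that maps a file read-only with mmap, assuming a POSIX system. The helper name map_file is hypothetical, and treating an empty file as a failure is a simplification chosen for this sketch:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps a file read-only and returns a pointer to its
 * bytes, or NULL on failure (including an empty file, which cannot be mapped).
 * The mapping is NOT NUL-terminated; use *out_len, not strlen(). */
const char *map_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);               /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;

    *out_len = (size_t)st.st_size;
    return p;
}
```

When the data is no longer needed, the caller releases it with munmap((void *)ptr, len) rather than free().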

Related

compression algorithm

I'm working on a compression algorithm wherein we have to write code in C. The program takes a file and removes the most significant bit in every character and stores the compressed text in another file. I wrote a function called compress as shown below. I'm getting a seg fault while freeing out_buf. Any help will be a great pleasure.

You close out_fd twice, so of course the second time it is an invalid file descriptor. But more than that, you need to review your use of sizeof() which is NOT the same as finding the buffer size of a dynamically-allocated buffer (sizeof returns the size of the pointer, not the buffer). You don't show the calling code, but using strcat() on a buffer passed-in is always worth a look too (is the buffer passed by the caller large enough for the result?).
Anyway, that should be enough to get you going again...
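To illustrate the sizeof point: for a pointer obtained from malloc, sizeof yields the size of the pointer itself (commonly 4 or 8 bytes), so the allocation size has to be remembered separately. One possible way, sketched with a hypothetical struct byte_buf that is not part of the original code:

```c
#include <stdlib.h>

/* Hypothetical wrapper: keeps the allocation size next to the pointer,
 * because sizeof cannot recover it later. */
struct byte_buf {
    unsigned char *data;
    size_t size;
};

/* Returns 0 on success, -1 if the allocation fails. */
int byte_buf_alloc(struct byte_buf *b, size_t size)
{
    b->data = malloc(size);
    if (b->data == NULL) {
        b->size = 0;
        return -1;
    }
    b->size = size;   /* use b->size, never sizeof(b->data), as the length */
    return 0;
}
```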

You're closing the same file descriptor twice:
close(out_fd);
if ( close(out_fd) == -1 )
oops("Error closing output file", "");
Just remove the first close(out_fd)
The segmentation fault happens because you moved the out_buf pointer, so free() no longer receives the address that malloc returned.
If you want to put values inside this malloc'd area, use another temporary pointer and move it through the memory area.
Like this:
unsigned char *out_buf = malloc(5400000*7/8);
unsigned char *tmp_buf = out_buf;
Then replace every *out_buf++ with *tmp_buf++;
Change also the out_buf inside the write call with tmp_buf

C: Help finding memory faults

My program is written in C and it is a disk emulator. I finished writing it and it runs when I comment out certain lines of my test program, but I get a memory error when I uncomment them. I suspect the problem is with my char* variables.
The line I comment out (and where the program crashes) is
free(buffer);
where buffer is the char* that the string of bytes from the disk was read into. It was initially allocated 30 bytes using malloc.
char* buffer = (char *) malloc(sizeof(char) * 30);
There is too much to just post it all here, so I am going to put the parts where I am writing/copying to char* 's in the hopes that someone will see what I am doing wrong.
I don't think it is anything too complicated; I am just not familiar enough with C to recognize obvious memory mistakes.
// In the event of a cache miss:
// block_buffer to pass to add_cache_entry
char cMissBuffer[BLOCK_SIZE];
// read content of block from disk
fread(cMissBuffer,sizeof(char),BLOCK_SIZE,diskEntity.pBlockStore);
// add to cache
if(1==add_cache_entry(i,cMissBuffer)) return 1;
.
.
.
// some of what is in add_cache_entry
int add_cache_entry(int v, char *block_buffer)
{
// ...
// construct a new queue element
QueueElement *block_to_cache = (QueueElement*)malloc(sizeof(QueueElement));
block_to_cache->blkidx = v;
block_to_cache->content=(char*)malloc(BLOCK_SIZE);
strcpy(block_to_cache->content,block_buffer);
// ...
}
In the test, BLOCK_SIZE is 5, QueueElement is a struct, content is a char* with BLOCK_BYTES of info.
Here is an excerpt from running the executable (dumping the queue)...I think that the lack of a '\0' could have something to do with the issue...
after adding cache entry (5):
DUMP:
BLOCK 5 FLAG:0 CONTENT:222220000000
BLOCK 4 FLAG:0 CONTENT:222220000000
BLOCK 3 FLAG:0 CONTENT:000000000000
BLOCK 2 FLAG:0 CONTENT:000000000000
BLOCK 1 FLAG:1 CONTENT:11100
I think I get extra space because malloc allocates more space than I require, but I read that is normal.
Any thoughts?

A probable cause for the behaviour is that strcpy() requires the source string to be null terminated, which is not the case here as fread() does not append a null terminator for you (nor could it in this case as fread() is reading the exact buffer size). strcpy() also appends a null terminator which means the strcpy() call will definitely be writing beyond the block_to_cache->content buffer.
If the data is not to be used as a C-style string, use memcpy() to copy the data instead:
memcpy(block_to_cache->content, block_buffer, BLOCK_SIZE);
Other points:
Check the return value of fread(), to ensure it successfully populated the buffer before attempting to use it.
It is unnecessary to cast the return value of malloc() (see Do I cast the result of malloc?).
Check the return value of malloc() to ensure memory was successfully allocated.
sizeof(char) is guaranteed to be one so can be removed from argument to malloc().
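Putting these points together, a sketch of reading one block and copying it without any string functions could look like this. The helper name read_block is hypothetical, and BLOCK_SIZE is set to 5 only to match the test described in the question:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 5

/* Hypothetical helper: reads exactly BLOCK_SIZE raw bytes from fp into a new
 * heap buffer. Returns NULL on a short read or allocation failure.
 * The result is NOT NUL-terminated, so it must not be passed to strcpy()
 * or printed with a plain %s. */
char *read_block(FILE *fp)
{
    char tmp[BLOCK_SIZE];

    /* fread reports how many items it actually read; check it. */
    if (fread(tmp, 1, BLOCK_SIZE, fp) != BLOCK_SIZE)
        return NULL;

    char *copy = malloc(BLOCK_SIZE);
    if (copy == NULL)
        return NULL;

    memcpy(copy, tmp, BLOCK_SIZE);   /* copies exactly BLOCK_SIZE bytes */
    return copy;
}
```

When dumping such a block, a bounded format like printf("%.*s", BLOCK_SIZE, block) avoids reading past the end of the buffer.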

Understanding String assignments in C

Okay, I've read through a massive number of the answers here on SO, and many other places, but I just can't seem to grasp this simple function. Please forgive me for asking something so simple; I haven't written C/C++ code in over 8 years and I'm very much trying to re-learn, so please have patience...
I've tried many different ways to do this from assigning a string through a function param by shifting in the value to just straight returning it, but nothing seems to work within the while. I also get no errors during compile time, but I do get segfaults at runtime. I would very much like to find out why the following function does not work... I just don't understand why the else returns fine as type char *content, but strcat(content, line); does not. Even though the man pages for strcat shows that strcat's definition should be (char *DEST, const char *SRC). As I currently understand it trying to do a cast to a const char on the line variable within the while would just return an integer to the pointer. So I'm stumped here and would like to be educated by those who have some time!
char * getPage(char *filename) {
FILE *pFile;
char *content;
pFile = fopen(filename, "r");
if (pFile != NULL) {
syslog(LOG_INFO,"Reading from:%s",filename);
char line [256];
while (fgets(line, sizeof line, pFile) != NULL) {
syslog(LOG_INFO,">>>>>>>Fail Here<<<<<<<");
strcat(content, line);
}
fclose(pFile);
} else {
content = "<!DOCTYPE html><html lang=\"en-US\"><head><title>Test</title></head><body><h1>Does Work</h1></body></html>";
syslog(LOG_INFO,"Reading from:%s failed, serving static response",filename);
}
return content;
}
Very much appreciate all the great answers in this post. I would give everyone in the discussion a checkmark but unfortunately I can't...

This is pretty simple, but very surprising if you're used to a higher-level language. C does not manage memory for you, and C doesn't really have strings. That content variable is a pointer, not a string. You have to manually allocate the space you need for the string before calling strcat. The correct way to write this code is something like this:
FILE *fp = fopen(filename, "r");
if (!fp) {
    syslog(LOG_INFO, "failed to open %s: %s", filename, strerror(errno));
    return xstrdup("<!DOCTYPE html><html lang=\"en-US\"><head><title>Test</title>"
                   "</head><body><h1>Does Work</h1></body></html>");
} else {
    size_t capacity = 4096, offset = 0, n;
    char *content = xmalloc(capacity);
    while ((n = fread(content + offset, 1, capacity - offset, fp)) > 0) {
        offset += n;
        if (offset == capacity) {
            capacity *= 2;
            content = xrealloc(content, capacity);
        }
    }
    /* fread returns a size_t, which is never negative; ask the stream instead */
    if (ferror(fp))
        syslog(LOG_INFO, "read error from %s: %s", filename, strerror(errno));
    content[offset] = '\0';
    fclose(fp);
    return content;
}
Notes:
Error messages triggered by I/O failures should ALWAYS include strerror(errno).
xmalloc, xrealloc, and xstrdup are wrapper functions around their counterparts with no leading x; they crash the program rather than return NULL. This is almost always less grief than trying to recover from out-of-memory by hand in every single place where it can happen.
I return xstrdup("...") rather than "..." in the failed-to-open case so that the caller can always call free(content). Calling free on a string literal will crash your program.
Gosh, that was a lot of work, wasn't it? This is why people tend to prefer to write web apps in a higher-level language. ;-)
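The x-prefixed wrappers mentioned in the notes are not part of the standard library; many projects define their own. One possible definition, given here only as a sketch (the error message and the choice of abort() are assumptions):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print a message and terminate; called when an allocation fails. */
static void out_of_memory(void)
{
    fputs("fatal: out of memory\n", stderr);
    abort();
}

/* Like malloc, but never returns NULL. */
void *xmalloc(size_t n)
{
    void *p = malloc(n ? n : 1);     /* malloc(0) may legally return NULL */
    if (p == NULL)
        out_of_memory();
    return p;
}

/* Like realloc, but never returns NULL. */
void *xrealloc(void *old, size_t n)
{
    void *p = realloc(old, n ? n : 1);
    if (p == NULL)
        out_of_memory();
    return p;
}

/* Like strdup, but never returns NULL. */
char *xstrdup(const char *s)
{
    size_t n = strlen(s) + 1;        /* include the terminator */
    char *p = xmalloc(n);
    memcpy(p, s, n);
    return p;
}
```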

You need to allocate memory for content. It has to be big enough for the entire file the way you are doing it. You can either allocate a huge buffer up front and hope for the best, or allocate a smaller one and realloc it as needed.
Even better would be rearranging the code to avoid the need for storing the whole file all at once, although if your caller needs a whole web page as a string, that may be hard.
Note also that you need to return the same type of memory from both your code paths. You can't return a static string sometimes and a heap-allocated string other times. That's guaranteed to cause headaches and/or memory leaks. So if you are copying the file contents into a block of memory, you should also copy the static string into the same type of block.

content is just a pointer to a string, not an actual string: it has 0 bytes of space reserved for your string. You need to allocate memory large enough to hold your string. Note that you will have to free it afterwards.
char *content=malloc(256);
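Set content[0] = '\0' before the first strcat, since strcat needs an initialized destination string, and your code should be OK as long as the file fits in the buffer. I also suggest using strncat.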
The 2nd assignment to content worked ok before - because you are setting the pointer to point to your const string. If you change content to a malloc'ed region of memory - then you would also want to strncpy your fixed string into content.
Ideally, if you can use C++, use std::string.

char *foo is only a pointer to some piece of memory holding the characters that form the string. So you cannot use strcat because you don't have any memory to copy to. Inside the if statement you are allocating local memory on the stack with char line[256] that holds the line, but since that memory is local to the function it will disappear once the function returns, so you cannot return line;.
So what you really want is to allocate some persistent memory, e.g. with strdup or malloc, so that you can return it from the function. Note that you cannot mix constants and allocated memory (because the user of your function must free the memory - which is only possible if it is not a constant).
So you could use something like this:
char * getPage(const char *filename) {
FILE *pFile;
char *content;
pFile = fopen(filename, "r");
if (pFile != NULL) {
syslog(LOG_INFO,"Reading from:%s",filename);
/* check the size and allocate memory */
fseek(pFile, 0, SEEK_END);
if (!(content = malloc(ftell(pFile) + 1))) { /* out of memory ... */ }
rewind(pFile);
/* set the content to be empty */
*content = 0;
char line [256];
while (fgets(line, sizeof line, pFile) != NULL) {
syslog(LOG_INFO,">>>>>>>Fail Here<<<<<<<");
strcat(content, line);
}
fclose(pFile);
} else {
content = strdup("<!DOCTYPE html><html lang=\"en-US\"><head><title>Test</title></head><body><h1>Does Work</h1></body></html>");
syslog(LOG_INFO,"Reading from:%s failed, serving static response",filename);
}
return content;
}
It is not the most efficient way of doing this (because strcat has to find the end of the string every time), but it is the smallest modification of your code.

An earlier answer suggested the solution:
char content[256];
This buffer will not be large enough to hold anything but the smallest files and the pointer content goes out of scope when return content; is executed. (Your earlier line, content = "static.."; is fine, because the string is placed in the .rodata data segment and its pointer will always point to the same data, for the entire lifetime of the program.)
If you allocate the memory for content with malloc(3), you can "grow" the space required with realloc(3), but this introduces the potential for a horrible error -- whatever you handed the pointer to must clean up after the memory allocation when it is done with the data (or else you leak memory), and it cannot simply call free(3) because the content pointer might be to statically allocated memory.
So, you have two easy choices:
use strdup(3) to duplicate the static string each time you need it, and use content = malloc(size); for the non-static path
make your caller responsible for providing the memory; every call needs to provide sufficient memory to handle either the contents of the file or the static string.
I would probably prefer the first approach, if only because the size needed for the second approach cannot be known prior to the call.

content is a wild pointer; the variable contains garbage, so it's pointing somewhere into left field. When you copy data to it using strcat, the data goes to some random, probably bad, location. The cure for this is to make content point somewhere good. Since you want it to outlive your function call, it needs to be allocated someplace besides the function's call stack. You need to use malloc() to allocate some space on the heap. Then the caller will own the memory, and should call free() to delete it when it's no longer needed.
You'll need to change the else part that directly assigns to content, as well, to use strcpy, so that the free() will always be valid. You can't free something that you didn't allocate!
Through all of this code, make sure you remember how much space you allocated with malloc(), and don't write more data than you have space, or you'll get more crashes.

How to read a file using offsets in c

How can I read the contents of a file if I have to use the following parameters:
I have to read the file in parts by using "start-value" of the part and length of the part
The start-value and length of the parts will be read from another file
Overall, I am trying to compute the MD5 value of these parts (you can also call them chunks).
The start-value and length of the chunks have been computed and stored in a file.
I tried to use fread() as follows, but it does not give me logical results
char *chunk_buffer;
//chunk_buffer is a pointer to a memory block
while(cur_poly != NULL) {
//cur_poly is a structure which is used to store the start and length of chunks
chunk_buffer = (char*) malloc ((cur_poly->length)*8);
//here I am trying to allocate memory based on the size of each chunk
int x=fread (chunk_buffer,1, cur_poly->length, c_file);
//c_file is the file to be read according to the offsets
char hash[32];
hash=md5(chunk_buffer);
//md5() is a function which can generate the md5 hash values for the chunks
}

I see two potential issues.
What units does cur_poly->length represent? You are mallocing memory as if it is a count of 64-bit words, yet reading the file as if it is bytes. If the field represents length in bytes, then you are reading correctly, but allocating too much memory. However, if the field is length in 64-bit words, then you are allocating the right amount of memory, but only reading 1/8th the data.
The code seems to be ignoring offsets. (Or assuming all chunks must be contiguous). If you want to read from an arbitrary offset, do a fseek(fp, offset, SEEK_SET); before the fread.
If the chunks are supposed to be contiguous, there still may be padding at the ends to force them all to start on an even boundary. You would have to seek over the padding whenever the byte count was odd (.WAV does this, as an example)
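A small sketch of the seek-then-read idea, assuming the offset and length are both expressed in bytes. The helper name read_chunk is hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads `length` bytes starting at byte `offset` of fp.
 * Returns a malloc'd buffer of exactly `length` bytes, or NULL if the seek,
 * the allocation, or the read fails (e.g. the chunk runs past end of file). */
unsigned char *read_chunk(FILE *fp, long offset, size_t length)
{
    /* Position the stream at the start of the chunk. */
    if (fseek(fp, offset, SEEK_SET) != 0)
        return NULL;

    unsigned char *buf = malloc(length ? length : 1);
    if (buf == NULL)
        return NULL;

    /* A short read means the chunk is incomplete; treat it as an error. */
    if (fread(buf, 1, length, fp) != length) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The caller would hash the returned bytes together with the known length, then free() the buffer before moving on to the next chunk.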

I want to note some more issues with that code. You might need to add some more details on these points.
If you want to read consecutive chunks from your file, you usually don't need to modify the get pointer of your file. Just read a chunk, and then read the next one. If you need to read the chunks in random order, you need to use fseek. This way you adjust the start position of the next file operation by an offset (from beginning, or end of the file, or relative to the current position).
You have a char pointer chunk_buffer, that you obviously use to store the data from your file temporarily. That is, it's only valid for the current loop iteration.
If this is the case, I would suggest doing the malloc once, before you enter the loop:
char * chunk_buffer = malloc (MAXIMUM_CHUNK_SIZE);
In the loop you may clear this buffer using memset or just overwrite the data. Also note that malloc()ed memory is not initialized with '\0' values (I don't know whether this is an assumption you rely on ...).
I am not sure why you allocate a buffer of size length*8 and then read only length bytes into it. Probably
int x = fread (chunk_buffer, SIZE_OF_ITEM, THIS_CHUNK_SIZE, c_file);
would fit your needs closer, if your items are indeed larger than a byte.
It is unclear what the md5() function actually does. What value does it return? A pointer to a buffer that is allocated dynamically? A pointer to a local array? In any case, you assign the return value to a local array of chars, which is not valid C because arrays cannot be assigned to. You might not need to allocate 32 bytes for this, but just
char * hash = md5 (chunk_buffer);
Make sure that you keep the pointer to that array somewhere you can find it when the loop moves on to the next iteration. A local (automatic) array declared inside that function cannot, of course, be returned this way.
How does your md5() function know the size of a chunk? It is passed a pointer, but not the size of the valid data (as far as I can see). You might need to adapt this function to take the length of the input array as an additional parameter.
What does the md5() function produce: a C-style string (alphanumeric digits, null-terminated) or an array of byte-sized unsigned integers (uint8_t)?
Make sure that you free() the memory you allocate dynamically. If you want to keep the malloc() inside the loop, make sure the loop always ends with
free (chunk_buffer);
For us to help you any further, you need to define
a) what are logical results for you and
b) what results do you get

Character arrays in C

I'm new to c. Just have a question about the character arrays (or string) in c: When I want to create a character array in C, do I have to give the size at the same time?
Because we may not know the size that we actually need. For example, in a client-server program, if we want to declare a character array for the server program to receive a message from the client program, but we don't know the size of the message, we could do it like this:
char buffer[1000];
recv(fd,buffer, 1000, 0);
But what if the actual message is only of length 10. Will that cause a lot of wasted memory?

Yes, you have to decide the size in advance, even if you use malloc.
When you read from sockets, as in the example, you usually use a buffer of a reasonable size and dispatch the data to another structure as soon as you consume it. In any case, 1000 bytes is not much wasted memory, and it is certainly faster than asking a memory manager for one byte at a time :)

Yes, you have to give the size if you are not initializing the char array at the time of declaration. A better approach for your problem is to determine the appropriate buffer size at run time and allocate the memory dynamically.

What you're asking about is how to dynamically size a buffer. This is done with a dynamic allocation such as using malloc() -- a memory allocator. Using it gives you an important responsibility though: when you're done using the buffer you must return it to the system yourself. If using malloc() [or calloc()], you return it with free().
For example:
char *buffer; // pointer to a buffer -- essentially an unsized array
buffer = (char *)malloc(size);
// use the buffer ...
free(buffer); // return the buffer -- do NOT use it any more!
The only problem left to solve is how to determine the size you'll need. If you're recv()'ing data that hints at the size, you'll need to break the communication into two recv() calls: first getting the minimum size all packets will have, then allocating the full buffer, then recv'ing the rest.
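As a sketch of that two-step receive, assume a made-up wire format in which every message starts with a 4-byte length in network byte order. The format, the helper names recv_exact and recv_message, and the MAX_MESSAGE_LEN limit are all assumptions for illustration, not something the question specifies:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_MESSAGE_LEN (1u << 20)   /* hypothetical sanity limit: 1 MiB */

/* recv() may deliver fewer bytes than asked for; loop until all arrive.
 * Returns 0 on success, -1 on error or if the peer closed the connection. */
static int recv_exact(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Reads one length-prefixed message. Returns a malloc'd, NUL-terminated
 * buffer (caller frees it) and stores the payload length in *out_len. */
char *recv_message(int fd, uint32_t *out_len)
{
    uint32_t net_len;

    /* First recv: the fixed-size header that every message has. */
    if (recv_exact(fd, &net_len, sizeof net_len) != 0)
        return NULL;

    uint32_t len = ntohl(net_len);
    if (len > MAX_MESSAGE_LEN)       /* never trust a length from the wire */
        return NULL;

    /* Now the size is known, so allocate exactly what is needed. */
    char *msg = malloc((size_t)len + 1);
    if (msg == NULL)
        return NULL;

    /* Second recv: the payload itself. */
    if (recv_exact(fd, msg, len) != 0) {
        free(msg);
        return NULL;
    }
    msg[len] = '\0';
    *out_len = len;
    return msg;
}
```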

When you don't know the exact amount of input data, do as follows:
1. Create a small buffer.
2. Allocate some memory for a "storage" (e.g. twice the buffer size).
3. Fill the buffer with data from the input stream (e.g. socket, file, etc.).
4. Copy the data from the buffer to the storage.
4.1. If there is not enough room in the storage, reallocate it (e.g. to twice its current size).
5. Repeat steps 3 and 4 until the end of the stream.
6. Your storage now contains the data.
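The steps above can be sketched roughly as follows for a stdio stream. The helper name slurp_stream and the 512-byte buffer size are arbitrary choices for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads a stream of unknown length into a growing
 * heap buffer. Returns the buffer (caller frees it) or NULL on failure.
 * The result is raw bytes with no NUL terminator; use *out_len. */
char *slurp_stream(FILE *in, size_t *out_len)
{
    char buffer[512];                        /* step 1: small buffer */
    size_t cap = 2 * sizeof buffer;          /* step 2: initial storage */
    size_t len = 0;
    char *storage = malloc(cap);
    if (storage == NULL)
        return NULL;

    size_t n;
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {   /* step 3 */
        if (len + n > cap) {                 /* step 4.1: grow the storage */
            size_t new_cap = cap * 2;
            char *tmp = realloc(storage, new_cap);
            if (tmp == NULL) {               /* old block is still valid */
                free(storage);
                return NULL;
            }
            storage = tmp;
            cap = new_cap;
        }
        memcpy(storage + len, buffer, n);    /* step 4: copy into storage */
        len += n;
    }                                        /* loop ends at end of stream */

    if (ferror(in)) {
        free(storage);
        return NULL;
    }
    *out_len = len;
    return storage;
}
```

Doubling the capacity keeps the number of realloc calls logarithmic in the input size, at the cost of up to half the storage being unused.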

If you don't know the size a priori, then you have no choice but to create it dynamically using malloc (or whatever equivalent mechanism exists in your language of choice).
size_t buffer_size = ...; /* read from a DEFINE or from a config file */
char * buffer = malloc( sizeof( char ) * (buffer_size + 1) );
Creating a buffer of size m, but only receiving an input string of size n with n < m is not a waste of memory, but an engineering compromise.
If you create your buffer with a size close to the intended input, you risk having to refill the buffer many, many times in those cases where n >> m. Typically, iterations over the buffer are tied up with I/O operations, so you might be saving some bytes (which is really nothing on today's hardware) at the expense of potentially increasing the problems at some other end, especially for client-server apps. If we were talking about resource-constrained embedded systems, that would be another matter.
You should be worrying about getting your algorithms right and solid. Then you worry, if you can, about shaving off a few bytes here and there.
For me, I'd rather create a buffer that is 2 to 10 times larger than the average input (not the smallest input as in your case, but the average), assuming my input tends to have a low standard deviation in size. Otherwise, I'd go 20 times the size or more (especially if memory is cheap and doing this minimizes hitting the disk or the NIC).
At the most basic setup, one typically gets the size of the buffer as a configuration item read off a file (or passed as an argument), and defaulting to a default compile time value if none is provided. Then you can adjust the size of your buffers according to the observed input sizes.
More elaborate algorithms (say TCP) adjust the size of their buffers at run-time to better accommodate input whose size might/will change over time.

Even if you use malloc, you must still specify the size first. So instead you pick a size large enough to accept the message, for example:
char buffer[2000];
If the buffer is allocated with malloc rather than declared as an array, you can later call realloc to shrink it and release the unused space, or to grow it when the message is larger.
Example:
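#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *str;
    /* Initial memory allocation: 14 characters plus the terminator */
    str = malloc(15);
    if (str == NULL)
        return 1;
    strcpy(str, "tutorialspoint");
    printf("String = %s, Address = %p\n", str, (void *)str);
    /* Reallocating memory; keep the old pointer until realloc succeeds */
    char *tmp = realloc(str, 25);
    if (tmp == NULL) {
        free(str);
        return 1;
    }
    str = tmp;
    strcat(str, ".com");
    printf("String = %s, Address = %p\n", str, (void *)str);
    free(str);
    return 0;
}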
Note: make sure to include stdlib.h, which declares malloc, realloc, and free.
