Seg fault with open command when trying to open very large file - c

I'm taking a networking class at school and am using C/GDB for the first time. Our assignment is to make a webserver that communicates with a client browser. I am well underway and can open files and send them to the client. Everything goes great till I open a very large file and then I seg fault. I'm not a pro at C/GDB so I'm sorry if that is causing me to ask silly questions and not be able to see the solution myself but when I looked at the dumped core I see my seg fault comes here:
if (-1 == (openfd = open(path, O_RDONLY)))
Specifically, we are tasked with opening the file and then sending it to the client browser. My algorithm goes:
Open/Error catch
Read the file into a buffer/Error catch
Send the file
We were also tasked with making sure that the server doesn't crash when SENDING very large files. But my problem seems to be with opening them. I can send all my smaller files just fine. The file in question is 29.5MB.
The whole algorithm is:
ssize_t send_file(int conn, char *path, int len, int blksize, char *mime) {
    int openfd;       // File descriptor for file we open at path
    int temp;         // Counter for the size of the file that we send
    char buffer[len]; // Buffer to read the file we are opening that is len big

    // Open the file
    if (-1 == (openfd = open(path, O_RDONLY))) {
        send_head(conn, "", 400, strlen(ERROR_400));
        (void) send(conn, ERROR_400, strlen(ERROR_400), 0);
        logwrite(stdout, CANT_OPEN);
        return -1;
    }

    // Read from file
    if (-1 == read(openfd, buffer, len)) {
        send_head(conn, "", 400, strlen(ERROR_400));
        (void) send(conn, ERROR_400, strlen(ERROR_400), 0);
        logwrite(stdout, CANT_OPEN);
        return -1;
    }

    (void) close(openfd);

    // Send the buffer now
    logwrite(stdout, SUC_REQ);
    send_head(conn, mime, 200, len);
    send(conn, &buffer[0], len, 0);

    return len;
}
I don't know if this is just down to my being a Unix/C novice. Sorry if it is. =( But your help is much appreciated.

It's possible I'm just misunderstanding what you meant in your question, but I feel I should point out that in general, it's a bad idea to try to read the entire file at once, in case you deal with something that's just too big for your memory to handle.
It's smarter to allocate a buffer of a specific size, say 8192 bytes (well, that's what I tend to do a lot, anyway), and just always read and send that much, as much as necessary, until your read() operation returns 0 (and no errno set) for end of stream.
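For illustration, a minimal sketch of that kind of loop, reusing the openfd and conn descriptors from the question (error handling kept short; a robust version would also loop on partial send()s):

char chunk[8192];
ssize_t nread;

// Keep going until read() reports end of file (0) or an error (-1)
while ((nread = read(openfd, chunk, sizeof(chunk))) > 0) {
    if (send(conn, chunk, (size_t)nread, 0) == -1) {
        return -1; // client went away or similar
    }
}
if (nread == -1) {
    return -1; // read error
}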

I suspect you have a stackoverflow (I should get bonus points for using that term on this site).
The problem is you are allocating the buffer for the entire file on the stack all at once. For larger files, this buffer is larger than the stack, and the next time you try to call a function (and thus put some parameters for it on the stack) the program crashes.
The crash appears at the open line because allocating the buffer on the stack doesn't actually write any memory; it just moves the stack pointer. When your call to open tries to write its parameters to the stack, the top of the stack has already overflowed, and that causes the crash.
The solution is, as Platinum Azure or dreamlax suggest, to read the file in little bits at a time, or to allocate your buffer on the heap with malloc or new.

Rather than using a variable length array, perhaps try allocating the memory using malloc:
char *buffer = malloc (len);
...
free (buffer);
I just did some simple tests on my system, and when I use variable length arrays of a big size (like the size you're having trouble with), I also get a SEGFAULT.
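Sketched against the send_file() from the question (the identifiers are the question's; the only new parts are checking the malloc return and remembering to free the buffer on every exit path):

char *buffer = malloc(len); // heap allocation instead of a VLA on the stack
if (buffer == NULL) {
    logwrite(stdout, CANT_OPEN); // or a more specific out-of-memory message
    return -1;
}
/* ... open, read and send exactly as before, using buffer ... */
free(buffer); // also needed before each early return above
return len;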

You're allocating the buffer on the stack, and it's way too big.
When you allocate storage on the stack, all the compiler does is decrease the stack pointer enough to make that much room (this keeps stack variable allocation to constant time). It does not try to touch any of this stacked memory. Then, when you call open(), it tries to put the parameters on the stack and discovers it has overflowed the stack and dies.
You need to either operate on the file in chunks, memory-map it (mmap()), or malloc() storage.
Also, path should be declared const char*.
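For the mmap() route, a rough POSIX sketch (assumed to sit inside send_file() after the open() succeeded; fstat() supplies the file size; needs <sys/mman.h> and <sys/stat.h>):

struct stat st;
if (fstat(openfd, &st) == -1)
    return -1;

/* Map the whole file read-only; no large stack or heap buffer needed */
void *filedata = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, openfd, 0);
if (filedata == MAP_FAILED)
    return -1;

send(conn, filedata, (size_t)st.st_size, 0); /* a robust version loops until everything is sent */
munmap(filedata, (size_t)st.st_size);
(void) close(openfd);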


open a temporary C FILE* for input

I have a legacy function in a library that accepts a FILE* pointer. The content I would like to parse is actually in memory, not on disk.
So I came up with the following steps to work around this issue:
the data is in memory at this point
fopen a temporary file (using tmpnam or tmpfile) on disk for writing
fclose the file
fopen the same file again for reading - guaranteed to exist
change the buffer using setvbuf(buffer, size)
do the legacy FILE* stuff
close the file
remove the temporary file
the data can be discarded
On windows, it looks like this:
int bufferSize;
char buffer[bufferSize];
// set up the buffer here
// temporary file name
char tempName [L_tmpnam_s];
tmpnam_s(tempName, L_tmpnam_s);
// open/close/reopen
fopen_s(&fp, tempName,"wb");
fclose(fp);
freopen_s(&fp, tempName,"rb", fp);
// replace the internal buffer
setvbuf(fp, buffer, _IONBF, bufferSize);
fp->_ptr = buffer;
fp->_cnt = bufferSize;
// do the FILE* reading here
// close and remove tmp file
fclose(fp);
remove(tempName);
Works, but quite cumbersome. The main problems, aside from the backwardness of this approach, are:
the temporary name needs to be determined
the temporary file is actually written to disk
the temporary file needs to be removed afterwards
I'd like to keep things portable, so using Windows memory-mapped functions or boost's facilities is not an option. The problem is mainly that, while it is possible to convert a FILE* to an std::fstream, the reverse seems to be impossible, or at least not supported in standard C++.
All suggestions welcome!
Update 1
Using a pipe/fdopen/setvbuf as suggested by Speed8ump and a bit of twiddling seems to work. It no longer creates files on disk, nor does it consume extra memory. One step closer, except that, for some reason, setvbuf is not working as expected. Manually fixing it up is possible, but of course not portable.
// create a pipe for reading, do not allocate memory
int pipefd[2];
_pipe(pipefd, 0, _O_RDONLY | _O_BINARY);
// open the read pipe for binary reading as a file
fp = _fdopen(pipefd[0], "rb");
// try to switch the buffer ptr and size to our buffer, (no buffering)
setvbuf(fp, buffer, _IONBF, bufferSize);
// for some reason, setvbuf does not set the correct ptr/sizes
fp->_ptr = buffer;
fp->_charbuf = fp->_bufsiz = fp->_cnt = bufferSize;
Update 2
Wow. So it seems that unless I dive into the MS-specific implementation CreateNamedPipe / CreateFileMapping, POSIX portability costs us an entire memcopy (of any size!), be it to file or into a pipe. Hopefully the compiler understands that this is just a temporary and optimizes this. Hopefully.
Still, we eliminated the silly device writing intermediate. Yay!
int pipefd[2];
pipe(pipefd, bufferSize, _O_BINARY); // setting internal buffer size
FILE* in = fdopen(pipefd[0], "rb");
FILE* out = fdopen(pipefd[1], "wb");
// the actual copy
fwrite(buffer, 1, bufferSize, out);
fclose(out);
// fread(in), fseek(in), etc..
fclose(in);
You might try using a pipe and fdopen, that seems to be portable, is in-memory, and you might still be able to do the setvbuf trick you are using.
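A plain-POSIX sketch of that idea (pipe()/fdopen() only; buffer and bufferSize are the question's names; note that a pipe has a limited capacity, often 64 KiB, so writing the whole buffer before reading works only for buffers smaller than that - larger ones need a writer thread or interleaved reads and writes):

int fds[2];
if (pipe(fds) == -1) {
    /* handle error */
}

FILE *out = fdopen(fds[1], "wb");
FILE *in  = fdopen(fds[0], "rb");

fwrite(buffer, 1, bufferSize, out); /* blocks if bufferSize exceeds the pipe capacity */
fclose(out);                        /* the reader then sees EOF after the buffered data */

/* hand 'in' to the legacy FILE* code, then fclose(in) */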
Your setvbuf hack is a nice idea, but not portable. C11 (n1570):
7.21.5.6 The setvbuf function
Synopsis
#include <stdio.h>
int setvbuf(FILE * restrict stream,
char * restrict buf,
int mode, size_t size);
Description
[...] If buf is not a null pointer, the array it points to may be used instead of a buffer allocated by the setvbuf function [...] and the argument size specifies the size of the array; otherwise, size may determine the size of a buffer allocated by the setvbuf function. The contents of the array at any time are indeterminate.
There is neither a guarantee that the provided buffer is used at all, nor about what it contains at any point after the setvbuf call until the file is closed or setvbuf is called again (POSIX doesn't give more guarantees).
The easiest portable solution, I think, is using tmpfile, fwrite the data into that file, fseek to the beginning (I'm not sure if temporary files are guaranteed to be seekable, on my Linux system, it appears they are, and I'd expect them to be elsewhere), and pass the FILE pointer to the function. This still requires copying in memory, but I guess usually no writing of the data to the disk (POSIX, unfortunately, implicitly requires a real file to exist). A file obtained by tmpfile is deleted after closing.
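As a sketch of that approach (tmpfile(), fwrite() and rewind() are standard C; buffer/bufferSize are the question's, and legacy_parse() is a hypothetical stand-in for the legacy function):

FILE *tmp = tmpfile();              /* unnamed temporary file, removed automatically on close */
if (tmp == NULL) {
    /* handle error */
}

fwrite(buffer, 1, bufferSize, tmp); /* copy the in-memory data into the file */
rewind(tmp);                        /* back to the beginning before handing it over */

legacy_parse(tmp);                  /* hypothetical legacy FILE* consumer */
fclose(tmp);                        /* the temporary file disappears here */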

Reading all buffers except for the last one (in C)

I want to read all buffers from a pipe except for the last one. This is my current code:
while (read(server_to_client, serverString2, sizeof(serverString2))) {
    printf("Client : PID %d", getpid());
    printf("-Target>>%s<<", clientString2);
    printf(serverString2);
}
The problem with that is it reads everything from the buffer. How can I avoid reading the last buffer?
You can't. The question does not even make sense.
The question supposes that a "buffer" is a meaningful unit of measure for your data, but it is not. In particular, the third argument to read(2) is a maximum number of bytes to read, but the call may actually transfer fewer bytes for a large number of reasons, with reaching the end of the data being only one. Other reasons are in fact a lot more likely to manifest when the file descriptor being read is connected to a pipe, as you say yours is, than when it is connected to a file. Note that this means you must always capture read()'s return value if you intend to examine the data it reads, for otherwise you cannot know how much of the buffer contains valid data.
More generally, you cannot tell from an open file descriptor for a pipe how much data is available to be read from it. You need to include that information in your protocol (for example, HTTP's Content-Length header), or somehow communicate it out-of-band. That still doesn't tell you how much data is available to be read right now, but it can help you determine when to stop trying to read more.
Edited to add:
If you ask because you want to avoid dealing with partially-filled buffers, then you are flat out of luck. At minimum you need to be prepared for a partially-filled buffer when the data are prematurely truncated. Unless the total size of the data to be transferred is certain to be a multiple of the chosen buffer size, you will also have to be prepared to deal with a partial buffer at the end of your data. You can, however, avoid dealing with partial buffers in the middle of your data by repeatedly read()ing until you fill the buffer, perhaps via a wrapper function such as this:
ssize_t read_fully(int fd, void *buf, size_t count) {
    char *byte_buf = buf;
    ssize_t bytes_remaining = count;

    while (1) {
        ssize_t nread = read(fd, byte_buf, bytes_remaining);
        if ((nread <= 0) || ((bytes_remaining -= nread) <= 0)) {
            break;
        }
        byte_buf += nread;
    }

    return count - bytes_remaining;
}
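For example, combined with a length carried in your protocol (content_length here is hypothetical, e.g. parsed from a header), the wrapper might be used like this:

char buf[4096];
size_t remaining = content_length; /* hypothetical: taken from the protocol header */

while (remaining > 0) {
    size_t want = remaining < sizeof(buf) ? remaining : sizeof(buf);
    ssize_t got = read_fully(server_to_client, buf, want);
    if (got <= 0)
        break;                     /* error or premature end of data */
    /* process exactly 'got' bytes of buf here */
    remaining -= (size_t)got;
}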
Alternatively, you can approach the problem altogether differently. Instead of trying to avoid reading certain data, you may be able to read it but avoid processing it. Whether that could be sensible depends on the nature of your program.
Do you really need to avoid reading the last buffer? Or just avoid doing anything with it? Perhaps a different form of loop? Perhaps a check for eof() after reading each buffer?
while(read(server_to_client,serverString2,sizeof(serverString2)))
{
if (! eof(server_to_client))
{
printf("Client : PID %d",getpid());
printf("-Target>>%s<<", clientString2);
printf(serverString2);
}
else
{
// do special stuff for the last buffer here
}
}

Why is my memory dumping so slow?

The idea behind this program is to simply access the ram and download the data from it to a txt file.
Later I'll convert the txt file to jpeg and hopefully it will be readable.
However, when I try and read from the RAM using NEW[] it takes waaaaaay too long to actually copy all the values into the file.
Isn't it supposed to be really fast? I mean, I save pictures every day and it doesn't even take a second.
Is there some other method I can use to dump memory to a file?
#include <stdio.h>
#include <stdlib.h>
#include <hw/pci.h>
#include <hw/inout.h>
#include <sys/mman.h>
main()
{
FILE *fp;
fp = fopen ("test.txt","w+d");
int NumberOfPciCards = 3;
struct pci_dev_info info[NumberOfPciCards];
void *PciDeviceHandler1,*PciDeviceHandler2,*PciDeviceHandler3;
uint32_t *Buffer;
int *BusNumb; //int Buffer;
uint32_t counter =0;
int i;
int r;
int y;
volatile uint32_t *NEW,*NEW2;
uintptr_t iobase;
volatile uint32_t *regbase;
NEW = (uint32_t *)malloc(sizeof(uint32_t));
NEW2 = (uint32_t *)malloc(sizeof(uint32_t));
Buffer = (uint32_t *)malloc(sizeof(uint32_t));
BusNumb = (int*)malloc(sizeof(int));
printf ("\n 1");
for (r=0;r<NumberOfPciCards;r++)
{
memset(&info[r], 0, sizeof(info[r]));
}
printf ("\n 2");
//Here the attach takes place.
for (r=0;r<NumberOfPciCards;r++)
{
(pci_attach(r) < 0) ? FuncPrint(1,r) : FuncPrint(0,r);
}
printf ("\n 3");
info[0].VendorId = 0x8086; //Wont be using this one
info[0].DeviceId = 0x3582; //Or this one
info[1].VendorId = 0x10B5; //WIll only be using this one PLX 9054 chip
info[1].DeviceId = 0x9054; //Also PLX 9054
info[2].VendorId = 0x8086; //Not used
info[2].DeviceId = 0x24cb; //Not used
printf ("\n 4");
//I attached the device and give it a handler and set some setting.
if ((PciDeviceHandler1 = pci_attach_device(0,PCI_SHARE|PCI_INIT_ALL, 0, &info[1])) == 0)
{
perror("pci_attach_device fail");
exit(EXIT_FAILURE);
}
for (i = 0; i < 6; i++)
//This just prints out some details of the card.
{
if (info[1].BaseAddressSize[i] > 0)
printf("Aperture %d: "
"Base 0x%llx Length %d bytes Type %s\n", i,
PCI_IS_MEM(info[1].CpuBaseAddress[i]) ? PCI_MEM_ADDR(info[1].CpuBaseAddress[i]) : PCI_IO_ADDR(info[1].CpuBaseAddress[i]),
info[1].BaseAddressSize[i],PCI_IS_MEM(info[1].CpuBaseAddress[i]) ? "MEM" : "IO");
}
printf("\nEnd of Device random info dump---\n");
printf("\nNEWs Address : %d\n",*(int*)NEW);
//Not sure if this is a legitimate way of memory allocation but I cant see to read the ram any other way.
NEW = mmap_device_memory(NULL, info[1].BaseAddressSize[3],PROT_READ|PROT_WRITE|PROT_NOCACHE, 0,info[1].CpuBaseAddress[3]);
//Here is where things are starting to get messy and REALLY long to just run through all the ram and dump it.
//Is there some other way I can dump the data in the ram into a file?
while (counter!=info[1].BaseAddressSize[3])
{
fprintf(fp, "%x",NEW[counter]);
counter++;
}
fclose(fp);
printf("0x%x",*Buffer);
}
A few issues that I can see:
You are writing blocks of 4 bytes - that's quite inefficient. The stream buffering in the C library may help with that to a degree, but using larger blocks would still be more efficient.
Even worse, you are writing out the memory dump in hexadecimal notation, rather than the bytes themselves. That conversion is very CPU-intensive, not to mention that the size of the output is essentially doubled. You would be better off writing raw binary data using e.g. fwrite().
Depending on the specifics of your system (is this on QNX?), reading from I/O-mapped memory may be slower than reading directly from physical memory, especially if your PCI device has to act as a relay. What exactly is it that you are doing?
In any case I would suggest using a profiler to actually find out where your program is spending most of its time. Even a rudimentary system monitor would allow you to determine if your program is CPU-bound or I/O-bound.
As it is, "waaaaaay too long" is hardly a valid measurement. How much data is being copied? How long does it take? Where is the output file located?
P.S.: I also have some concerns w.r.t. what you are trying to do, but that is slightly off-topic for this question...
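For instance, the stdio version of that suggestion, keeping the question's names (and assuming, as the loop in the question does, that BaseAddressSize[3] is the number of uint32_t words to dump; "dump.bin" is just a placeholder name):

FILE *out = fopen("dump.bin", "wb");            /* binary output instead of hex text */
if (out != NULL) {
    fwrite((const void *)NEW, sizeof(uint32_t), /* one bulk write instead of millions of fprintf calls */
           info[1].BaseAddressSize[3], out);
    fclose(out);
}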
For fastest speed: write the data in binary form and use the open() / write() / close() APIs. Since your data is already available in a contiguous block of (virtual) memory it is a waste to copy it to a temporary buffer (as the fwrite(), fprintf(), etc. APIs do).
The code using write() will be similar to:
int fd = open("filename.bin", O_RDWR|O_CREAT, S_IRWXU);
write(fd, (void*)NEW, 4*info[1].BaseAddressSize[3]);
close(fd);
You will need to add error handling and make sure that the buffer size is specified correctly.
To reiterate, you get the speed-up from:
avoiding the conversion from binary to ASCII (as pointed out by others above)
avoiding many calls to libc
reducing the number of system-calls (from inside libc)
eliminating the overhead of copying data to a temporary buffer inside the fwrite()/fprintf() and related functions (buffering would be useful if your data arrived in small chunks, including the case of converting to ASCII in 4 byte units)
I intentionally ignore commenting on other parts of your code as it is apparently not intended to be production quality yet and your question is focused on how to speed up writing data to a file.

C - writing buffer into a file then FREEing the buffer cause segfault

I'm writing a buffer into a binary file. Code is as in the following :
FILE *outwav = fopen(outwav_path, "wb");
if(!outwav)
{
fprintf(stderr, "Can't open file %s for writing.\n", outwav_path);
exit(1);
}
[...]
//Create sample buffer
short *samples = malloc((loopcount*(blockamount-looppos)+looppos) << 5);
if(!samples)
{
fprintf(stderr, "Error : Can't allocate memory.\n");
exit(1);
}
[...]
fwrite(samples, 2, 16*k, outwav); //write samplebuffer to file
fflush(outwav);
fclose(outwav);
free(samples);
The last free() call causes me random segfaults.
After several headaches I thought it was probably because the fwrite call would execute only after a delay, and then it would read freed memory. So I added the fflush call, yet, the problem STILL occurs.
The only way to get rid of it is to not free the memory and let the OS do it for me. This is supposed to be bad practice though, so I'd rather ask if there is no better solution.
Before anyone asks, yes I check that the file is opened correctly, and yes I test that the memory is allocated properly, and no, I don't touch the returned pointers in any way.
Once fwrite returns you are free to do whatever you want with the buffer. You can remove the fflush call.
It sounds like a buffer overflow error in a totally unrelated part of the program is writing over the book-keeping information that free needs to do its work. Run your program under a tool like valgrind to find out if this is the problem and to find the part of the program that has a buffer overflow.
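A minimal illustration of that failure mode (not the asker's code, just the general pattern):

#include <stdlib.h>

int main(void) {
    short *buf = malloc(16 * sizeof *buf);
    if (buf == NULL)
        return 1;
    for (int i = 0; i <= 16; i++) /* off-by-one: i == 16 writes past the end of the block */
        buf[i] = 0;
    free(buf);                    /* the corruption often only shows up here */
    return 0;
}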

linux threads and fopen() fclose() fgets()

I'm looking at some legacy Linux code which uses pthreads.
In one thread a file is read via fgets(). The FILE variable is a global variable shared across all threads. (Hey, I didn't write this...)
In another thread every now and again the FILE is closed and reopened with another filename.
For several seconds after this has happened, the thread calling fgets() acts as if it is continuing to read the last record it read from the previous file: almost as if there was an error but fgets() was not returning NULL. Then it sorts itself out and starts reading from the new file.
The code looks a bit like this (snipped for brevity so I hope it's still intelligible):
In one thread:
while(gRunState != S_EXIT){
nanosleep(&timer_delay,0);
flag = fgets(buff, sizeof(buff), gFile);
if (flag != NULL){
// do something with buff...
}
}
In the other thread:
fclose(gFile);
gFile = fopen(newFileName,"r");
There's no lock to make sure that the fgets() is not called at the same time as the fclose()/fopen().
Any thoughts as to failure modes which might cause fgets() to fail but not return NULL?
How the described code goes wrong
The stdio library buffers data, allocating memory to store the buffered data. The GNU C library dynamically allocates file structures (some libraries, notably on Solaris, use pointers to statically allocated file structures, but the buffer is still dynamically allocated unless you set the buffering otherwise).
If your thread works with a copy of a pointer to the global file pointer (because you passed the file pointer to the function as an argument), then it is conceivable that the code would continue to access the data structure that was originally allocated (even though it was freed by the close), and would read data from the buffer that was already present. It would only be when you exit the function, or read beyond the contents of the buffer, that things start going wrong - or when the space that was previously allocated to the file structure is reallocated for a new use.
FILE *global_fp;
void somefunc(FILE *fp, ...)
{
...
while (fgets(buffer, sizeof(buffer), fp) != 0)
...
}
void another_function(...)
{
...
/* Pass global file pointer by value */
somefunc(global_fp, ...);
...
}
Proof of Concept Code
Tested on MacOS X 10.5.8 (Leopard) with GCC 4.0.1:
#include <stdio.h>
#include <stdlib.h>
FILE *global_fp;
const char etc_passwd[] = "/etc/passwd";
static void error(const char *fmt, const char *str)
{
fprintf(stderr, fmt, str);
exit(1);
}
static void abuse(FILE *fp, const char *filename)
{
char buffer1[1024];
char buffer2[1024];
if (fgets(buffer1, sizeof(buffer1), fp) == 0)
error("Failed to read buffer1 from %s\n", filename);
printf("buffer1: %s", buffer1);
/* Dangerous!!! */
fclose(global_fp);
if ((global_fp = fopen(etc_passwd, "r")) == 0)
error("Failed to open file %s\n", etc_passwd);
if (fgets(buffer2, sizeof(buffer2), fp) == 0)
error("Failed to read buffer2 from %s\n", filename);
printf("buffer2: %s", buffer2);
}
int main(int argc, char **argv)
{
if (argc != 2)
error("Usage: %s file\n", argv[0]);
if ((global_fp = fopen(argv[1], "r")) == 0)
error("Failed to open file %s\n", argv[1]);
abuse(global_fp, argv[1]);
return(0);
}
When run on its own source code, the output was:
Osiris JL: ./xx xx.c
buffer1: #include <stdio.h>
buffer2: ##
Osiris JL:
So, empirical proof that on some systems, the scenario I outlined can occur.
How to fix the code
The fix to the code is discussed well in other answers. If you avoid the problem I illustrated (for example, by avoiding global file pointers), that is simplest. Assuming that is not possible, it may be sufficient to compile with the appropriate flags (on many Unix-like systems, the compiler flag '-D_REENTRANT' does the job), and you will end up using thread-safe versions of the basic standard I/O functions. Failing that, you may need to put explicit thread-safe management policies around the access to the file pointers; a mutex or something similar (and modify the code to ensure that the threads use the mutex before using the corresponding file pointer).
A FILE * is just a pointer to the various resources. If the fclose does not zero out those resources, it's possible that the values may make enough sense that fgets does not immediately notice it.
That said, until you add some locking, I would consider this code completely broken.
Umm, you really need to control access to the FILE stream with a mutex, at the minimum. You aren't looking at some clever implementation of lock free methods, you are looking at really bad (and dusty) code.
Using thread local FILE streams is the obvious and most elegant fix; just use locks appropriately to ensure no two threads operate on the same offset of the same file at once. Or, more simply, ensure that threads block (or do other work) while waiting for the file lock to clear. POSIX advisory locks would be best for this, or you're dealing with dynamically growing a tree of mutexes... or initializing a file lock mutex per thread and making each thread check the other's lock (yuck!) (since files can be re-named).
I think you are staring down the barrel of some major fixes... unfortunately (from what you have indicated) there is no choice but to make them. In this case, it's actually easier to debug a threaded program written in this manner than it would be to debug something using forks; consider yourself lucky. :)
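A minimal sketch of mutex protection around the shared stream (reusing the gFile, gRunState and buff names from the question; the mutex itself is the new part):

#include <pthread.h>

pthread_mutex_t gFileLock = PTHREAD_MUTEX_INITIALIZER;

/* Reader thread */
while (gRunState != S_EXIT) {
    nanosleep(&timer_delay, 0);
    pthread_mutex_lock(&gFileLock);
    flag = fgets(buff, sizeof(buff), gFile);
    pthread_mutex_unlock(&gFileLock);
    if (flag != NULL) {
        /* do something with buff... */
    }
}

/* Thread that switches files */
pthread_mutex_lock(&gFileLock);
fclose(gFile);
gFile = fopen(newFileName, "r");
pthread_mutex_unlock(&gFileLock);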
You can also use a condition wait (pthread_cond_wait) instead of the plain nanosleep; it gets signaled when intended, e.g. when a new file gets fopened.
