Write large binary file in C

Write large binary file in C - c

I'm using 64bit mingw to compile c code on windows x64.
I'm using fwrite to create binary files from memory array. I want to write ~20Gb calling this function but it just write until 1.4~1.5gb and then it stops writting (without crashing, just hangs there.... doing nothing).
Is there any solution? Right now I'm writing 20 files and then I merge them.
Opening the file as 'ab' works but I cant read the file properly if I use that mode.
Sample (pseudo)code:
short* dst= malloc(20GB);
*calculations to fill dst*
file=fopen("myfile",'wb');
fwrite(dst, sizeof(short), 20GB/sizeof(short), file);
fclose(file)
That program never ends and file size is never grater than 1.5GB

Write it in smaller chunks. For heaven's sake, don't try to malloc 20gb.

Depending on the environment (operating system, memory model, file system), it might not be possible to create a file greater than 2 GB. This is especially true with MSDOS file systems and of course could be true on any file system if there is insufficient disk space or allocation quota.
If you show your code, we could see if there is any intrinsic flaw in the algorithm and suggest alternatives.

Mingw is a 32 bit environment, there AFAIK does not exist a 64 bit variant.
It may be that fwrite() from mingw is unable to deal with more than 2 GB or 4GB unless mingw is large file aware.
If you can find something similar to truss(1), run your progran under this debugging tool. With the information you provided, it is not possible to give a better advise.

Related

C: reading files which are > 4 GB

I have some kind of reader which only has a handle (FILE*) to a file.
Another process keeps writing to a the same file which I don't have control.
Now, as the other process appends images to that file, it is likely that soon the file size will cross 4 GB limit.
The reader process reads that file using the handle, offset and length of the image file which can be found from some DB.
My question is how would reader be able to read the chunk from the file which will be present after 4GB size.
I'm working on Win32 machine.
EDIT:
I'm working on FreeBSD machine as well.

Just use the standard C API on Windows, fread, fwrite work just fine on large files. You will need _fseeki64 to seek to a 64-bit position.
You can alternatively use the plain WinAPI (ReadFile, etc.) which can also deal with >4 GiB files without problems.
[Edit]: The only thing you really need is a 64-bit seek, which ReadFile provides via the OVERLAPPED structure (as some commenters mentioned.) You can of course also get by using SetFilePointer which is the equivalent of _fseeki64. Reading/Writing is never a problem, no matter the file size, only seeking.

On FreeBSD the stdio API is not limited to 32 bits(4Gb).
You should have no problems reading past 4Gb as long as you use a 64 bit integer to manipulate the offsets and lengths.
If you're seeking in a FILE* , you'll have to use fseeko() and not fseek() if you're on a 32 bit host. fseek() takes a long which is 32 bit on 32 bit machines. fseeko() takes an off_t type which is 64 bits on all FreeBSD architectures.

How to read a large file in UNIX/LINUX with limited RAM?

I have a large text file to be opened (eg- 5GB size). But with a limited RAM (take 1 GB), How can I open and read the file with out any memory error? I am running on a linux terminal with with the basic packages installed.
This was an interview question, hence please do not look into the practicality.
I do not know whether to look at it in System level or programmatic level... It would be great if someone can throw some light into this issue.
Thanks.

Read it character by character... or X bytes by X bytes... it really depends what you want to do with it... As long as you don't need the whole file at once, that works.
(Ellipses are awesome)

What do they want you to do with the file? Are you looking for something? Extracting something? Sorting? This will affect your approach.
It may be sufficient to read the file line by line or character by character if you're looking for something. If you need to jump around the file or analyze sections of it, then most likely want to memory map it. Look up mmap(). Here's an short article on the subject:memory mapped i/o

[just comment]
If you are going to use system calls (open() and read()), then reading character by character will generate a lot of system calls that severely slow down your application. Even with the existence of the buffer cache (or disk file), system calls are expensive.
It is better to read block by block where block size "SHOULD" be more than 1MB. In case of 1MB block size, you will issue 5*1024 system calls.

Does fread fail for large files?

I have to analyze a 16 GB file. I am reading through the file sequentially using fread() and fseek(). Is it feasible? Will fread() work for such a large file?

You don't mention a language, so I'm going to assume C.
I don't see any problems with fread, but fseek and ftell may have issues.
Those functions use long int as the data type to hold the file position, rather than something intelligent like fpos_t or even size_t. This means that they can fail to work on a file over 2 GB, and can certainly fail on a 16 GB file.
You need to see how big long int is on your platform. If it's 64 bits, you're fine. If it's 32, you are likely to have problems when using ftell to measure distance from the start of the file.
Consider using fgetpos and fsetpos instead.

Thanks for the response. I figured out where I was going wrong. fseek() and ftell() do not work for files larger than 4GB. I used _fseeki64() and _ftelli64() and it is working fine now.

If implemented correctly this shouldn't be a problem. I assume by sequentially you mean you're looking at the file in discrete chunks and advancing your file pointer.
Check out http://www.computing.net/answers/programming/using-fread-with-a-large-file-/10254.html
It sounds like he was doing nearly the same thing as you.

It depends on what you want to do. If you want to read the whole 16GB of data in memory, then chances are that you'll run out of memory or application heap space.
Rather read the data chunk by chunk and do processing on those chunks (and free resources when done).
But, besides all this, decide which approach you want to do (using fread() or istream, etc.) and do some test cases to see which works better for you.

If you're on a POSIX-ish system, you'll need to make sure you've built your program with 64-bit file offset support. POSIX mandates (or at least allows, and most systems enforce this) the implementation to deny IO operations on files whose size don't fit in off_t, even if the only IO being performed is sequential with no seeking.
On Linux, this means you need to use -D_FILE_OFFSET_BITS=64 on the gcc command line.

Handling Files greater than 2 GB in MSVC6!

Normal file related functions like fseek, ftell etc in Windows MSVC6 can handle files only upto 2GB (As per my current understanding, Please correct me if I am wrong).
I want to work with files >2GB. How should I go about it? What are the functions available?

I am not sure but the limit is 4 GB, OS API and the standard libraries using theses API and the filesystem used.
The ftell, fseek functions are using 32 bit integers so you won't be able to handle file bigger than 4GB. You would have to use the OS API directly.
So you have to be careful what function you use, for example for getting the file size you have to use the ex function GetFileSizeEx, so you have to make sure you use function that use 64 bit file offset. Same for SetFilePointerEx
Last word you have be aware that some filesystem limit the maximum filesize, the FAT32 won't handle file bigger than 4 GB by design, NTFS would handle any size but the API generally is made for 4 GB or less big file.

The limit probably originates with the file system. FAT32 has a limit of 4GB, whereas NTFS has a much higher limit (in the terabytes).
So the size of the file you can handle will depend on which file system the hard disk has been formatted with, and which operating system you are using, although you are almost certainly using an operating system which can handle the upper limits of NTFS (Windows 2000 or above).

For the most part, you have to ignore all the file functions built into the standard library, and use only functions in the Win32 API -- e.g. instead of fwrite or ostream::write, you'll need to use WriteFile. Likewise, to seek in a file you'll need to use SetFilePointer instead of fseek or seekp. Most of the Win32 API can handle files larger than 4 GB -- and the few that can't have substitutes that can handle larger files.

You can use windows APIs for file handling like CreateFile, ReadFile, WriteFile. This also gives you option to have overlapped and non-overlapped operations.

It's actually 16TB (for anyone finding this in future). I just created a 6710886400 that's 6GB file using overlapped I/O - the following snippet shows how to work with the offsets
OVERLAPPED ol;
__int64 fileOffset;
ol.hEvent = CreateEvent(0, TRUE, FALSE, 0);
fileOffset = __int64(TEST_BUFFER_SIZE) * i;
ol.Offset = (DWORD)fileOffset;
ol.OffsetHigh = (DWORD)(fileOffset >> 32);
printf("[%d %I64d] ", i, fileOffset);
result = WriteFile(hFile, buffer, TEST_BUFFER_SIZE, &written, &ol);
to get the size I can perform...
DWORD dwHigh, dwLow =GetFileSize(hFile, &dwHigh);
__int64 FileSizeInBytes = __int64(dwHigh * (MAXDWORD + 1.0L)) + dwLow;
HINT: If you start getting "invalid parameter" return codes/error from an API, you are probably mucking the math and passing in negative offsets.
(some innocent variables and exception handler policing action was removed from this sample to protect basic byte rights)

Fopen failing for binary file

I have a huge binary file which is 2148181087 bytes (> 2gb)
I am trying to do fopen (file, "r") and it failed with
Can not open: xyz file (Value too
large to be stored in data type)
I read on the man page EOVERFLOW error is received when the file size > 2gb.
The weird thing is, I use a different input file which is also "almost" as big as the first file 2142884400 bytes (also >2gb), fopen works fine with this.
Is there any cutoff on the file size for fopen or is there any alternate way to solve this?

The cutoff is 2GB which, contrary to what you may think, is not 2,000,000,000 (2x10003).
It's 2,147,483,648 (2x10243). So your second file, which works, is actually less than 2GB in size).
2GB, in the computer world, is only 2,000,000,000 in the minds of hard drive manufacturers so they can say their disks are bigger than they really are :-) - it lets them say their disks are actually 2.1GB.

The "alternative way to solve this" depends on which operating system/library you are using.
For the GNU C library, you can use fopen64 as a replacement for fopen; it uses 64-bit file handles (there's also a macro to have fopen use 64-bit file handles).
For Windows, you'll probably have to switch to the Win32 file management API, with which you can use CreateFile.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Write large binary file in C - c

Write it in smaller chunks. For heaven's sake, don't try to malloc 20gb.

Related

C: reading files which are > 4 GB

How to read a large file in UNIX/LINUX with limited RAM?

Does fread fail for large files?

Handling Files greater than 2 GB in MSVC6!

Fopen failing for binary file

Categories

Resources