So, I looked around the internet and a couple questions here and I couldn't find anything that could fix my problem here. I have an assignment for C programming, to write a program that allows user to enter words into a string, add more words, put all words in the string to a text file, delete all words in string, and when they exit it saves the words in a binary, which is loaded upon starting up the program again. I've gotten everything to work except where the binary is concerned.
I made two functions, one that loads the bin file when the program starts, one that saves the bin file when it ends. I don't know in which, or if in both, the problem starts. But basically I know it's not working right because I get garbage in my text file if I save it in a text file after the program loads the bin file into the string. I know for sure that the text file saver is working properly.
Thank you to anyone who takes the time to help me out, it's been an all-day process! lol
Here are the two snippets of my functions. Everything else in my code seems to work, so I don't want to bloat this post with the entire program, but if need be I'll put it up to solve this.
SIZE is a constant of 10000 to meet the program spec of 1000 words. But I couldn't get this to run even when asking for only 10 elements, or 1, just to clear that up.
void loadBin(FILE *myBin, char *stringAll) {
myBin = fopen("myBin.bin", "rb");
if (myBin == NULL) {
saveBin(&myBin, stringAll);
}//if no bin file exists yet
fread(stringAll, sizeof(char), SIZE + 1, myBin);
fclose(myBin); }
void saveBin(FILE *myBin, char *stringAll) {
int stringLength = 0;
myBin = fopen("myBin.bin", "wb");
if (myBin == NULL) {
printf("Problem writing file!\n");
exit(-1);
}
stringLength = strlen(stringAll);
fwrite(&stringAll, sizeof(char), (stringLength + 1), myBin);
fclose(myBin); }
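You are leaving bad values in your myBin FILE*, and passing the & (address) of a pointer. In loadBin, saveBin(&myBin, stringAll) passes the address of the FILE pointer where a FILE * is expected, and myBin is still NULL when fread runs afterwards. In saveBin, fwrite(&stringAll, ...) writes the bytes of the pointer variable itself instead of the characters it points to, which is why you read back garbage.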
Pass the filename, and you can (re) use the functions for other purposes, other files, et al.
char* filename = "myBin.bin";
Pass the filename, buffer pointer, and maximum size to read. You should consider using stat/fstat to discover the file size.
size_t loadBin(char *fn, char *stringAll, size_t size)
{
//since you never use myBin outside this function, keep this FILE* local
FILE* myBin=NULL;
if( NULL == (myBin = fopen(fn, "rb")) ) {
//create the missing file; there is nothing to read yet, so return early
saveBin(fn, stringAll, 0);
return 0;
}//if no bin file exists yet
size_t howmany = fread(stringAll, sizeof(char), size, myBin);
if( howmany < size ) printf("read fewer\n");
fclose(myBin);
return howmany;
}
Pass the file name, buffer pointer, and size to save
size_t saveBin(char *fn, char *stringAll, size_t size)
{
//again, why carry around a FILE* pointer that is only used locally?
FILE* myBin=NULL;
//open the file that was asked for, not a hard-coded name
if( NULL == (myBin = fopen(fn, "wb")) ) {
printf("Problem writing file!\n");
exit(-1);
}
//binary data may have embedded '\0' bytes, cannot use strlen,
//stringLength = strlen(stringAll);
size_t howmany = fwrite(stringAll, sizeof(char), size, myBin);
if( howmany < size ) printf("short write\n");
if(myBin) fclose(myBin);
return howmany;
}
Call these; you are not guaranteed to write & read the same sizes...
size_t buffer_size = SIZE;
char buffer[SIZE]; //fill this with interesting bytes
saveBin(filename, buffer, buffer_size);
size_t readcount = loadBin(filename, buffer, buffer_size);
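For the original use case (one NUL-terminated string rather than arbitrary binary data), a minimal round-trip sketch might look like the following. The names save_string and load_string are made up for this illustration, and it assumes the string plus its terminator fits in the caller's buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the string including its terminating '\0'.
 * Returns the number of bytes written, or 0 if the file cannot be opened or closed. */
size_t save_string(const char *path, const char *s)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return 0;
    size_t n = fwrite(s, 1, strlen(s) + 1, f);
    if (fclose(f) != 0)
        return 0;
    return n;
}

/* Hypothetical helper: reads at most cap - 1 bytes and always terminates buf.
 * Returns the number of bytes read (0 if the file does not exist yet). */
size_t load_string(const char *path, char *buf, size_t cap)
{
    if (cap == 0)
        return 0;
    buf[0] = '\0';
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;               /* no file yet: leave buf as an empty string */
    size_t n = fread(buf, 1, cap - 1, f);
    fclose(f);
    buf[n] = '\0';              /* terminate even if the file was truncated */
    return n;
}
```

A missing file is treated as "nothing saved yet" instead of an error, which matches the first start of the program.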
Related
I need to read a file of ints into an array in C. A sample of the file I need to read is below, though note the files this will process can have thousands or hundreds of thousands of lines.
127
234
97
8723
I've gotten the file open in C, read how many lines there are so I know how many spaces my array needs, but I can't seem to read/parse each line into the array.
FILE *file;
int N = 0;
char filePath[30];
char endFile;
printf("What file should be used?\n");
scanf("%s", filePath);
file = fopen(filePath, "r");
if(file == NULL) {
printf("This file failed to open.\n");
break;
}
for(endFile = getc(file); endFile!=EOF; endFile=getc(file))
if(endFile == '\n') {
N = N+1;
}
int myArray[N];
while(fscanf(file, "%d\n", &a) != EOF) {
fscanf(file, "%d\n", &a); // I'm not sure this line is needed...
printf("%d\n", a);
M[i] = a;
}
From here, I need to read the file contents into myArray, with each line being the corresponding spot in the array (i.e. line zero is myArray[0], line one is myArray[1], etc.). I can't seem to find a way to do this, though I see several methods to do tab-delimited 2d arrays or csv multi-dimensional arrays.
Please also let me know if creating the array/determining the array size can be done in a better way than literally counting new-line characters...
There's no need to first "count the number of lines".
The following code cautiously grows an array of integers (by increments of 10).
#define GROW 10
int *rec = NULL, nRec = 0, sz = 0;
while( fgets( buf, sizeof buf, ifp ) != NULL ) {
if( nRec == sz ) {
rec = realloc( rec, (nRec+GROW) * sizeof *rec );
/*omitting test for failure */
sz += GROW;
}
rec[ nRec++ ] = atoi( buf );
}
This shows what is possible.
Note that realloc() can fail, returning NULL... It's up to you to add a bit of code to handle that condition.
Further, some conventional thought is to double the size of the allocation when needed (because realloc() may not be 'cheap'.) You can decide if you want to grow the array in increments (of 1024?) or grow it exponentially.
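As a rough sketch of the two points above (handling a failed realloc() and doubling the capacity), something like this could work. The function name read_ints and the 64-byte line buffer are assumptions for illustration; lines longer than the buffer would be split, and strtol() is used in place of atoi().

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one integer per line from fp.
 * Returns a malloc'ed array (caller frees) and stores the element count
 * in *count. Returns NULL for empty input or when allocation fails. */
int *read_ints(FILE *fp, size_t *count)
{
    int *rec = NULL;
    size_t n = 0, cap = 0;
    char buf[64];               /* assumed to be long enough for one line */

    *count = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (n == cap) {
            size_t new_cap = cap ? cap * 2 : 16;    /* grow exponentially */
            int *tmp = realloc(rec, new_cap * sizeof *tmp);
            if (tmp == NULL) {  /* the old block is still valid: release it */
                free(rec);
                return NULL;
            }
            rec = tmp;
            cap = new_cap;
        }
        rec[n++] = (int)strtol(buf, NULL, 10);
    }
    *count = n;
    return rec;
}
```

Assigning the result of realloc() to a temporary first keeps the original pointer usable (and freeable) when the call fails.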
I have a zipped text file with 7 very long lines of text containing information for the decoding of a JPEG encoded file.
When I try to read the unzipped File with my C program, line by line with fscanf, I get the first 3 and the last 3 lines correctly, just the 4th line isn't read as a string as expected.
The output of the 4th line is a very long string filled with 1 and 0.
If I look at the input file with Notepad or a hex editor everything looks fine as it should.
If I manually create a text file with the same structure (but with shorter lines) fscanf works fine.
There is no difference if I unzip the File with my program or do it manually.
FILE *tmpdata;
char enc_path[256];
int arrsize;
// Building the absolute Path
sprintf(enc_path, "%s%stmp.txt", dest, src_name);
arrsize = unzip(); // gives back size of the file
// not the best way to create the output strings,
// but I don't know the size of the lines.
char masse[10];
char ytabelle[arrsize / 3];
char cbtabelle[arrsize / 3];
char crtabelle[arrsize / 2];
char ywerte[arrsize /3];
char cbwerte[arrsize / 3];
char crwerte[arrsize / 3];
if ((tmpdata = fopen(enc_path, "r")) == NULL) {
printf("Error: can't read input file\n");
return EXIT_FAILURE;
}
fscanf(tmpdata, "%s %s %s %s %s %s %s", masse, ytabelle, cbtabelle, crtabelle, ywerte, cbwerte, crwerte);
The input file looks like:
512x512
Y{42:110000;13:111000;...;0:0;}
CB{42:110000;13:111000;...;0:0;}
CR{42:110000;13:111000;...;0:0;}
000111010010111001110000111100011...
100011011101110001101000011100110...
100011101110110111011001100111011...
if I print the separate strings:
512x512
Y{42:110000;13:111000;...;0:0;}
CB{42:110000;13:111000;...;0:0;}
111001111111111000110000111111000...
000111010010111001110000111100011...
100011011101110001101000011100110...
100011101110110111011001100111011...
There are multiple reasons for your program to not behave properly:
You may allocate too much data with automatic storage (aka on the stack), causing erratic behavior.
The strings in the file might contain embedded spaces, causing fscanf() to read words instead of lines.
You do not tell fscanf() the size of the destination arrays. fscanf() may store data beyond the end of the destination arrays, overflowing into the next array (which would explain the observed behavior) or causing some other undefined behavior.
It is very cumbersome to pass the size of the destination arrays when they are not simple constants. I suggest you use fgets() instead of fscanf() to read the file contents and allocate the arrays with malloc() to a larger size to avoid problems:
FILE *tmpdata;
char enc_path[256];
size_t arrsize;
// Building the absolute path
snprintf(enc_path, sizeof enc_path, "%s%stmp.txt", dest, src_name);
arrsize = unzip(); // gives back size of the file
// not the best way to create the output strings,
// but I don't know the size of the lines.
char masse[16];
size_t ytabelle_size = arrsize + 2;
size_t cbtabelle_size = arrsize + 2;
size_t crtabelle_size = arrsize + 2;
char *ytabelle = malloc(ytabelle_size);
char *cbtabelle = malloc(cbtabelle_size);
char *crtabelle = malloc(crtabelle_size);
size_t ywerte_size = arrsize + 2;
size_t cbwerte_size = arrsize + 2;
size_t crwerte_size = arrsize + 2;
char *ywerte = malloc(ywerte_size);
char *cbwerte = malloc(cbwerte_size);
char *crwerte = malloc(crwerte_size);
if (!ytabelle ||!cbtabelle ||!crtabelle ||!ywerte ||!cbwerte ||!crwerte) {
printf("Error: cannot allocate memory\n");
return EXIT_FAILURE;
}
if ((tmpdata = fopen(enc_path, "r")) == NULL) {
printf("Error: cannot open input file\n");
return EXIT_FAILURE;
}
if (!fgets(masse, sizeof masse, tmpdata)
|| !fgets(ytabelle, ytabelle_size, tmpdata)
|| !fgets(cbtabelle, cbtabelle_size, tmpdata)
|| !fgets(crtabelle, crtabelle_size, tmpdata)
|| !fgets(ywerte, ywerte_size, tmpdata)
|| !fgets(cbwerte, cbwerte_size, tmpdata)
|| !fgets(crwerte, crwerte_size, tmpdata)) {
printf("Error: cannot read input file\n");
return EXIT_FAILURE;
}
// file contents were read, arrays should have a trailing newline, which
// you should strip or handle in the decoding phase.
...
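One common way to strip the trailing newline that fgets() leaves in the buffer is strcspn(). This is only a small sketch; the helper name strip_newline is invented here.

```c
#include <string.h>

/* Hypothetical helper: cuts the string at the first '\r' or '\n'.
 * Returns the new length. Safe to call on lines without a newline. */
size_t strip_newline(char *s)
{
    size_t n = strcspn(s, "\r\n");
    s[n] = '\0';
    return n;
}
```

It could be applied to each buffer after the fgets() calls succeed, for example strip_newline(ytabelle);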
If you are using the GNU libc or some modern POSIX systems, you could use the m prefix in fscanf() to allocate the space for the words read from the file. Using this allows for a simpler but non-portable solution:
FILE *tmpdata;
char enc_path[256];
size_t arrsize;
// Building the absolute path
snprintf(enc_path, sizeof enc_path, "%s%stmp.txt", dest, src_name);
arrsize = unzip(); // gives back size of the file
// not the best way to create the output strings,
// but I don't know the size of the lines.
char masse[16];
char *ytabelle = NULL;
char *cbtabelle = NULL;
char *crtabelle = NULL;
char *ywerte = NULL;
char *cbwerte = NULL;
char *crwerte = NULL;
if ((tmpdata = fopen(enc_path, "r")) == NULL) {
printf("Error: cannot open input file\n");
return EXIT_FAILURE;
}
// masse is a fixed-size array, so it takes a width limit instead of the m modifier
if (fscanf(tmpdata, "%15s %ms %ms %ms %ms %ms %ms", masse,
&ytabelle, &cbtabelle, &crtabelle,
&ywerte, &cbwerte, &crwerte) != 7) {
printf("Error: cannot read input file\n");
return EXIT_FAILURE;
}
...
PS: Unlike German, the initial letters of nouns are not capitalized in English, except for some exceptions such as language, people and place names.
Maybe avoid stack allocation??
char masse[10];
char *ytabelle = malloc(arrsize/3); if (!ytabelle) exit(EXIT_FAILURE);
char *cbtabelle = malloc(arrsize/3); if (!cbtabelle) exit(EXIT_FAILURE);
char *crtabelle = malloc(arrsize/2); if (!crtabelle) exit(EXIT_FAILURE);
char *ywerte = malloc(arrsize/3); if (!ywerte) exit(EXIT_FAILURE);
char *cbwerte = malloc(arrsize/3); if (!cbwerte) exit(EXIT_FAILURE);
char *crwerte = malloc(arrsize/3); if (!crwerte) exit(EXIT_FAILURE);
/* use as before */
free(ytabelle);
free(cbtabelle);
free(crtabelle);
free(ywerte);
free(cbwerte);
free(crwerte);
I have created a framework to parse text files of reasonable size that fit in RAM, and for now, things are going well. I have no complaints; however, what if I encounter a situation where I have to deal with large files, say, greater than 8GB (which is the size of mine)?
What would be an efficient approach to deal with such large files?
My framework:
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
int Parse(const char *filename,
const char *outputfile);
int main(void)
{
clock_t t1 = clock();
/* ............................................................................................................................. */
Parse("file.txt", NULL);
/* ............................................................................................................................. */
clock_t t2 = clock();
fprintf(stderr, "time elapsed: %.4f\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
fprintf(stderr, "Press any key to continue . . . ");
getchar();
return 0;
}
long GetFileSize(FILE * fp)
{
long f_size;
fseek(fp, 0L, SEEK_END);
f_size = ftell(fp);
fseek(fp, 0L, SEEK_SET);
return f_size;
}
char *dump_file_to_array(FILE *fp,
size_t f_size)
{
char *buf = (char *)calloc(f_size + 1, 1);
if (buf) {
size_t n = 0;
while (fgets(buf + n, INT_MAX, fp)) {
n += strlen(buf + n);
}
}
return buf;
}
int Parse(const char *filename,
const char *outputfile)
{
/* open file for reading in text mode */
FILE *fp = fopen(filename, "r");
if (!fp) {
perror(filename);
return 1;
}
/* store file in dynamic memory and close file */
size_t f_size = GetFileSize(fp);
char *buf = dump_file_to_array(fp, f_size);
fclose(fp);
if (!buf) {
fputs("error: memory allocation failed.\n", stderr);
return 2;
}
/* state machine variables */
// ........
/* array index variables */
size_t x = 0;
size_t y = 0;
/* main loop */
while (buf[x]) {
switch (buf[x]) {
/* ... */
}
x++;
}
/* NUL-terminate array at y */
buf[y] = '\0';
/* write buffer to file and clean up */
outputfile ? fp = fopen(outputfile, "w") :
fp = fopen(filename, "w");
if (!fp) {
outputfile ? perror(outputfile) :
perror(filename);
}
else {
fputs(buf, fp);
fclose(fp);
}
free(buf);
return 0;
}
Pattern deletion function based on the framework:
int delete_pattern_in_file(const char *filename,
const char *pattern, const char *outputfile)
{
/* open file for reading in text mode */
FILE *fp = fopen(filename, "r");
if (!fp) {
perror(filename);
return 1;
}
/* copy file contents to buffer and close file */
size_t f_size = GetFileSize(fp);
char *buf = dump_file_to_array(fp, f_size);
fclose(fp);
if (!buf) {
fputs("error - memory allocation failed", stderr);
return 2;
}
/* delete first match */
size_t n = 0, pattern_len = strlen(pattern);
char *tmp, *ptr = strstr(buf, pattern);
if (!ptr) {
fputs("No match found.\n", stderr);
free(buf);
return -1;
}
else {
n = ptr - buf;
ptr += pattern_len;
tmp = ptr;
}
/* delete the rest */
while (ptr = strstr(ptr, pattern)) {
while (tmp < ptr) {
buf[n++] = *tmp++;
}
ptr += pattern_len;
tmp = ptr;
}
/* copy the rest of the buffer */
strcpy(buf + n, tmp);
/* open file for writing and print the processed buffer to it */
outputfile ? fp = fopen(outputfile, "w") :
fp = fopen(filename, "w");
if (!fp) {
outputfile ? perror(outputfile) :
perror(filename);
}
else {
fputs(buf, fp);
fclose(fp);
}
free(buf);
return 0;
}
If you wish to stick with your current design, an option might be to mmap() the file instead of reading it into a memory buffer.
You could change the function dump_file_to_array to the following (Linux-specific; it needs #include <sys/mman.h>):
char *dump_file_to_array(FILE *fp, size_t f_size) {
char *buf = mmap(NULL, f_size, PROT_READ, MAP_SHARED, fileno(fp), 0);
if (buf == MAP_FAILED)
return NULL;
return buf;
}
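Now you can read over the file; the memory manager will automatically take care to hold only the relevant portions of the file in memory. Note that this mapping is read-only and is not NUL-terminated, so code that modifies buf in place or relies on a terminating '\0' has to be adapted, and free(buf) has to become munmap(buf, f_size).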
For Windows, similar mechanisms exist.
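For a feel of how a read-only mapping is used on a POSIX system, here is a small self-contained sketch that counts the newlines in a file. The function name count_newlines_mmap is made up; it uses fstat() to get the size and assumes the size fits in size_t.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical example: returns the number of '\n' bytes in the file,
 * or -1 if the file cannot be opened, inspected or mapped. */
long count_newlines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0) {             /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '\n')
            count++;

    munmap((void *)data, len);
    return count;
}
```

The pages are faulted in as the loop touches them, so the whole file never has to be copied into a buffer of your own.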
Chances are you are parsing the file line by line. So read in a large block (4k or 16k) and parse all the lines in that. Copy the small remainder to the beginning of the 4k or 16k buffer and read in the rest of the buffer. Rinse and repeat.
For JSON or XML you will need an event-based parser that can accept multiple blocks of input.
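The block-and-remainder idea for line-oriented input could be sketched roughly like this. The names for_each_line, line_fn and count_line are invented for the example, and a line longer than the buffer is simply handed to the callback in pieces.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK 16384

/* Hypothetical callback type: receives one line without its '\n'. */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Example callback: counts lines; ctx points to a size_t counter. */
void count_line(const char *line, size_t len, void *ctx)
{
    (void)line;
    (void)len;
    ++*(size_t *)ctx;
}

/* Reads fp in blocks and calls fn for every line. Returns 0, or -1 on a read error. */
int for_each_line(FILE *fp, line_fn fn, void *ctx)
{
    char buf[CHUNK];
    size_t have = 0;            /* bytes carried over from the previous block */
    size_t got;

    while ((got = fread(buf + have, 1, sizeof buf - have, fp)) > 0) {
        char *start = buf;
        char *end = buf + have + got;
        char *nl;

        while ((nl = memchr(start, '\n', (size_t)(end - start))) != NULL) {
            fn(start, (size_t)(nl - start), ctx);
            start = nl + 1;
        }
        have = (size_t)(end - start);
        if (have == sizeof buf) {   /* no newline in a full buffer */
            fn(buf, have, ctx);
            have = 0;
        } else {
            memmove(buf, start, have);  /* move the partial line to the front */
        }
    }
    if (ferror(fp))
        return -1;
    if (have > 0)               /* last line without a trailing newline */
        fn(buf, have, ctx);
    return 0;
}
```

Memory use stays at one block no matter how large the file is, which is the point of the approach.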
There are multiple issues with your approach.
The concepts of maximum and available memory are not so evident: technically, you are not limited by the RAM size, but by the quantity of memory your environment will let you allocate and use for your program. This depends on various factors:
What ABI you compile for: the maximum memory size accessible to your program is limited to less than 4 GB if you compile for 32-bit code, even if your system has more RAM than that.
What quota the system is configured to let your program use. This may be less than available memory.
What strategy the system uses when more memory is requested than is physically available: most modern systems use virtual memory and share physical memory between processes and system tasks (such as the disk cache) using very advanced algorithms that cannot be described in a few lines. It is possible on some systems for your program to allocate and use more memory than is physically installed on the motherboard, swapping memory pages to disk as more memory is accessed, at a huge cost in lag time.
There are further issues in your code:
The type long might be too small to hold the size of the file: on Windows systems, long is 32-bit even on 64-bit versions where memory can be allocated in chunks larger than 2GB. You must use a different API to request the file size from the system.
You read the file with a series of calls to fgets(). This is inefficient; a single call to fread() would suffice. Furthermore, if the file contains embedded null bytes ('\0' characters), chunks from the file will be missing in memory. However, you could not deal with embedded null bytes anyway if you use string functions such as strstr() and strcpy() to handle your string deletion task.
The condition in while (ptr = strstr(ptr, pattern)) is an assignment. While not strictly incorrect, it is poor style as it confuses readers of your code and prevents life-saving warnings by the compiler where such assignment-conditions are coding errors. You might think that could never happen, but anyone can make a typo, and a missing = in a test is difficult to spot and has dire consequences.
Your shorthand use of the ternary operator in place of if statements is quite confusing too: outputfile ? fp = fopen(outputfile, "w") : fp = fopen(filename, "w");
Rewriting the input file in place is risky too: if anything goes wrong, the input file will be lost.
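One common way to reduce the risk of losing the input file is to write the result to a temporary file first and rename it over the original only when everything succeeded. A minimal sketch, assuming a hypothetical helper name write_file_safely and a ".tmp" suffix next to the target:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes len bytes to "<path>.tmp", then renames it to path.
 * Returns 0 on success, -1 on failure (the original file is left untouched). */
int write_file_safely(const char *path, const char *data, size_t len)
{
    size_t plen = strlen(path);
    char *tmp = malloc(plen + 5);       /* room for ".tmp" and '\0' */
    if (tmp == NULL)
        return -1;
    memcpy(tmp, path, plen);
    memcpy(tmp + plen, ".tmp", 5);

    int rc = -1;
    FILE *fp = fopen(tmp, "wb");
    if (fp != NULL) {
        int ok = fwrite(data, 1, len, fp) == len;
        if (fclose(fp) != 0)            /* a failed close can hide a write error */
            ok = 0;
        if (ok && rename(tmp, path) == 0)
            rc = 0;
        else
            remove(tmp);                /* do not leave a partial file behind */
    }
    free(tmp);
    return rc;
}
```

On POSIX systems rename() replaces an existing target; as far as I know the Windows C runtime refuses to overwrite, so there the old file would have to be removed first or a platform call used.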
Note that you can implement the filtering on the fly, without a buffer, albeit inefficiently:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
if (argc < 2) {
fprintf(stderr, "usage: delpat PATTERN < inputfile > outputfile\n");
return 1;
}
unsigned char *pattern = (unsigned char*)argv[1];
size_t i, j, n = strlen(argv[1]);
size_t skip[n + 1];
int c;
skip[0] = 0;
for (i = j = 1; i < n; i++) {
while (memcmp(pattern, pattern + j, i - j)) {
j++;
}
skip[i] = j;
}
i = 0;
while ((c = getchar()) != EOF) {
for (;;) {
if (i < n && c == pattern[i]) {
if (++i == n) {
i = 0; /* match found, consumed */
}
break;
}
if (i == 0) {
putchar(c);
break;
}
for (j = 0; j < skip[i]; j++) {
putchar(pattern[j]);
}
i -= skip[i];
}
}
for (j = 0; j < i; j++) {
putchar(pattern[j]);
}
return 0;
}
First of all I wouldn't suggest holding such big files in RAM, but instead using streams. This is because buffering is usually done by the library as well as by the kernel.
If you are accessing the file sequentially, which seems to be the case, then you probably know that all modern systems implement read-ahead algorithms, so reading the whole file into RAM ahead of time may in most cases just waste time.
You didn't specify the use-cases you have to cover so I'm going to have to assume that using streams like
std::ifstream
and doing the parsing on the fly will suit your needs. As a side note, also make sure your operations on files that are expected to be large are done in separate threads.
An alternative solution: If you're on a Linux system and you have a decent amount of swap space, just open the whole bad boy up. It will consume your RAM and also consume hard drive space (swap). Thus you can have the entire thing open at once, just not all of it will be in RAM.
Pros
If an unexpected shut down occurred, the memory on the swap space is recoverable.
RAM is expensive, HDDs are cheap, so the application would put less strain on your expensive equipment
Viruses could not harm your computer because there would be no room in RAM for them to run
You'll be taking full advantage of the Linux operating system by using the swap space. Normally the swap space module is not used and all it does is clog up precious RAM.
The additional energy that is needed to utilize the entirety of the RAM can warm the immediate area. Useful during winter time
You can add "Complex and Special Memory Allocation Engineering" to your resume.
Cons
None
Consider treating the file as an external array of lines.
Code can use an array of line indexes. This index array can be kept in memory at a fraction of the size of the large file. Access to any line is accomplished quickly via this lookup, a seek with fsetpos() and an fread()/fgets(). As the lines are edited, the new lines can be saved, in any order, in a temporary text file. Saving of the file reads both the original file and the temp one in sequence to form and write the new file.
typedef struct {
int attributes; // not_yet_read, line_offset/length_determined,
// line_changed/in_other_file, deleted, etc.
fpos_t line_offset; // use with fgetpos() fsetpos()
unsigned line_length; // optional field as code could re-compute as needed.
} line_index;
size_t line_count;
// read some lines
line_index *index = malloc(sizeof *index * line_count);
// read more lines
index = realloc(index, sizeof *index * line_count);
// edit lines, save changes to appended temporary file.
// ...
// Save file -weave the contents of the source file and temp file to the new output file.
Additionally, with enormous files, the array line_index[] itself can be kept on disk too. Access to it is easily computed. In the extreme case, only 1 line of the file needs to be in memory at any time.
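To make the lookup part concrete, here is a rough sketch that records the start position of every line with fgetpos() and later fetches a single line with fsetpos() and fgets(). The names index_lines and read_line_at are invented for this example, and the attributes and length fields of the struct above are left out.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: stores the start position of every line of fp in a
 * malloc'ed array (*out, caller frees) and the line count in *count.
 * Returns 0 on success, -1 on an I/O or allocation error. */
int index_lines(FILE *fp, fpos_t **out, size_t *count)
{
    fpos_t *idx = NULL, pos;
    size_t n = 0, cap = 0;
    int c, at_line_start = 1;

    rewind(fp);
    for (;;) {
        /* remember where the next line would begin before reading its first byte */
        if (at_line_start && fgetpos(fp, &pos) != 0)
            goto fail;
        if ((c = getc(fp)) == EOF)
            break;
        if (at_line_start) {
            if (n == cap) {
                size_t new_cap = cap ? cap * 2 : 64;
                fpos_t *tmp = realloc(idx, new_cap * sizeof *tmp);
                if (tmp == NULL)
                    goto fail;
                idx = tmp;
                cap = new_cap;
            }
            idx[n++] = pos;
            at_line_start = 0;
        }
        if (c == '\n')
            at_line_start = 1;
    }
    if (ferror(fp))
        goto fail;
    *out = idx;
    *count = n;
    return 0;
fail:
    free(idx);
    return -1;
}

/* Hypothetical helper: reads line number `line` (0-based) into buf. */
char *read_line_at(FILE *fp, const fpos_t *idx, size_t line, char *buf, int size)
{
    if (fsetpos(fp, &idx[line]) != 0)
        return NULL;
    return fgets(buf, size, fp);
}
```

Only the index lives in memory; each line is read from the file on demand.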
You mentioned a state machine. Every finite-state automaton can be optimized to have minimal (or no) lookahead.
Is it possible to do this in Lex? It will generate a C output file which you can compile.
If you don't want to use Lex, you can always do the following:
1) Read n chars into a (ring?) buffer, where n is the size of the pattern.
2) Try to match the buffer with the pattern.
3) If it matches, go to 1.
4) Print buffer[0], read one char, go to 2.
Also, for very long patterns and degenerate inputs strstr can be slow. In that case you might want to look into more advanced string matching algorithms.
mmap() is a pretty good way of working on files with large sizes.
It provides you with lot of flexibility but you need to be cautious with page size. Here is a good article which talks about more specifics.
I'm making a program to split a file into N smaller parts of (almost) equal sizes. So here's my code:
FILE * fp = fopen(file,"r");
long aux;
long cursor = 0;
long blockSize = 1024000; //suppose each smaller file will have 1 MB
long bytesLimit = blockSize;
for( i = 0 ; i < n ; i++) {
FILE * fp_aux = fopen( outputs[i] , "w"); //outputs is an array of temporary file names
while(cursor < bytesLimit) { //here occurs the infinite loop
fscanf(fp,"%lu\n",&aux);
fprintf(fp_aux,"%lu\n",aux);
cursor = ftell(fp);
}
fclose(fp_aux);
bytesLimit = bytesLimit + blockSize;
}
//here add some more logic to get the remaining content left in the main file
The code works if I want to split the file into two or three parts, but when I try to split it into 10 parts, fscanf locks on reading the same number and stays on an infinite loop there.
My input file has the format "%lu\n" like below:
1231231
4341342
4564565
...
If splitting a file is the focus, then simplify your method. Because your post indicates you are working with a text file, the assumption is that it contains words with punctuation, numbers, linefeeds etc. With this type of content, it can be parsed into lines using fgets()/fputs(). This will allow you to read lines from one large file, tracking accumulated size as you go, and writing lines to several smaller files...
Some simple steps:
1) determine file size of file to be split
2) Set desired small file size.
3) open large file
4) Use fgets/fputs in a loop, opening and closing files to split contents, using accumulated size as split point.
5) Clean up. (fclose files etc.)
Here is an example that will illustrate these steps. This splits a large text file by size, regardless of text content. (I used a text file of about 130K and split it into segments of 5K.)
#include <stdio.h>
#include <string.h>
#define SEGMENT 5000 //approximate target size of small file
long file_size(char *name);//function prototype; defined below
int main(void)
{
int segments=0, i, len, accum;
FILE *fp1, *fp2;
char filename[260]={"c:\\play\\smallFileName_"};//base name for small files.
char largeFileName[]={"c:\\play\\largeFileName.txt"};//change to your path
char smallFileName[260];
char line[1080];
long sizeFile = file_size(largeFileName);//must come after largeFileName is declared
segments = sizeFile/SEGMENT + 1;//ensure end of file
fp1 = fopen(largeFileName, "r");
if(fp1)
{
for(i=0;i<segments;i++)
{
accum = 0;
sprintf(smallFileName, "%s%d.txt", filename, i);
fp2 = fopen(smallFileName, "w");
if(fp2)
{
//test the size first, so that no line is read and then dropped at a segment boundary
while(accum <= SEGMENT && fgets(line, sizeof line, fp1))
{
accum += strlen(line);//track size of growing file
fputs(line, fp2);
}
fclose(fp2);
}
}
fclose(fp1);
}
return 0;
}
long file_size(char *name)
{
FILE *fp = fopen(name, "rb"); //must be binary read to get bytes
long size=-1;
if(fp)
{
fseek (fp, 0, SEEK_END);
size = ftell(fp)+1;
fclose(fp);
}
return size;
}
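If the file contains data that is not in unsigned long format, fscanf will fail to convert it and will not consume the offending characters, so the file position of fp does not advance. ftell then keeps returning the same offset, the loop condition never becomes false, and the same read is attempted again and again. The same happens once the end of the input file is reached: fscanf returns EOF and the position stops advancing.
To prevent this, you need to check the return value of fscanf to see that it has the expected value (1 here, the number of items converted).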
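A minimal sketch of such a check, assuming a hypothetical helper copy_numbers that replaces the inner while loop of the question:

```c
#include <stdio.h>

/* Hypothetical helper: copies unsigned long values from in to out until the
 * read position reaches limit, the input ends, or a value fails to parse.
 * Returns the number of values copied. */
long copy_numbers(FILE *in, FILE *out, long limit)
{
    unsigned long value;
    long copied = 0;

    /* fscanf() returns the number of converted items: anything other than 1
     * means end of file or malformed input, so the loop must stop. */
    while (ftell(in) < limit && fscanf(in, "%lu", &value) == 1) {
        fprintf(out, "%lu\n", value);
        copied++;
    }
    return copied;
}
```

With this shape the loop ends at the first bad token or at end of file instead of spinning on the same offset.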
If you want to split a file into several parts with a specified maximum file size of each part, why do you use fscanf(..), ftell(..) and fprintf(..)?
This is not the fastest way to achieve your goal...
I recommend doing it in this way:
1) Open input file
2) As long as there is input data (!feof(..))
3) Open output file (if not already open)
4) Read block of input data (fread)
5) Write block of data to output file (fwrite)
6) Track number of bytes written and close output file if maximum file size is reached
7) Go back to step 2.
8) Clean up
If doing so the split files will not exceed a specific maximum file size. Additionally you avoid usage of slow file I/O functions like fprintf.
A possible implementation would look like this:
/*
** splitFile
** Splits an existing input file into multiple output files with a specified
** maximum file size.
**
** Return Value:
** Number of created result files, or 0 in case of bad input data or a negative
** value in case of an error during file splitting.
*/
int splitFile(char *fileIn, size_t maxSize)
{
int result = 0;
FILE *fIn;
FILE *fOut;
char buffer[1024 * 16];
size_t size;
size_t read;
size_t written;
if ((fileIn != NULL) && (maxSize > 0))
{
fIn = fopen(fileIn, "rb");
if (fIn != NULL)
{
fOut = NULL;
result = 1; /* we have at least one part */
while (!feof(fIn))
{
/* initialize (next) output file if no output file opened */
if (fOut == NULL)
{
sprintf(buffer, "%s.%03d", fileIn, result);
fOut = fopen(buffer, "wb");
if (fOut == NULL)
{
result *= -1;
break;
}
size = 0;
}
/* calculate size of data to be read from input file in order to not exceed maxSize */
read = sizeof(buffer);
if ((size + read) > maxSize)
{
read = maxSize - size;
}
/* read data from input file */
read = fread(buffer, 1, read, fIn);
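if (read == 0)
{
/* 0 means end of file or error; only a real error is a failure
** (the last part can be empty if the input ends on a part boundary) */
if (ferror(fIn)) result *= -1;
break;
}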
/* write data to output file */
written = fwrite(buffer, 1, read, fOut);
if (written != read)
{
result *= -1;
break;
}
/* update size counter of current output file */
size += written;
if (size >= maxSize) /* next split? */
{
fclose(fOut);
fOut = NULL;
result++;
}
}
/* clean up */
if (fOut != NULL)
{
fclose(fOut);
}
fclose(fIn);
}
}
return (result);
}
The above code split a test file with a size of 126803945 bytes into 121 1MB parts in about 500ms.
Note that the size of buffer (here: 16KB) affects the speed a file is split. The bigger the buffer the faster a huge file is split. If you want to use really large buffers (>1MB or so) you have to allocate (and free) the buffer on each call (or use a static buffer if you do not need reentrant code).
What is the best way to create an empty text file of a given length in C? Writing spaces or any special char is not an option. I mean it should directly create the file without any iteration up to the file length or something.
It's pretty trivial to do. All you have to do is to seek the intended position and then write something:
#include <stdio.h>
const unsigned int wanted_size = 1048576;
int main(int argc, char **argv) {
FILE *fp = fopen("test.dat", "w+");
if (fp) {
// Now go to the intended end of the file
// (subtract 1 since we're writing a single character).
fseek(fp, wanted_size - 1, SEEK_SET);
// Write at least one byte to extend the file (if necessary).
fwrite("", 1, sizeof(char), fp);
fclose(fp);
}
return 0;
}
The example above will create a file that is 1 MB in length. Whether the disk space is actually allocated right away depends on the file system: many file systems create a sparse file here and only allocate blocks once real data is written.
This will also allow you to allocate files larger than your system memory. With the code above I'm able to instantly (< 1 ms) reserve a 1 GB large file on a Raspberry Pi (which only has 512 MB RAM) without having to use any kind of iteration.
You're also able to use any other way to write data to the position (like fputs()), it's just important that you actually write something. Calling fputs("", fp); won't necessarily extend the file as intended.
On Windows, use SetFilePointer and SetEndOfFile; on Linux, use truncate() or ftruncate(), which can extend a file as well as shrink it.
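On the POSIX side this could look roughly like the sketch below. The function name create_file_of_size is made up, and whether the file ends up sparse depends on the file system.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates (or truncates) path and sets its length to
 * size bytes; the extended part reads back as zero bytes.
 * Returns 0 on success, -1 on failure. */
int create_file_of_size(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    int rc = ftruncate(fd, size);   /* extends the file without writing data */
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

No seek-and-write trick is needed here, because ftruncate() sets the length directly.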
This is what I came up with.
// hello.c
#include <stdio.h>
int CreateFileSetSize(const char *file, int size)
{
FILE *pFile;
pFile = fopen(file, "w");
if (NULL == pFile)
{
return 1;
}
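/* seek to the last byte so that writing one byte yields exactly size bytes (assumes size > 0) */
fseek(pFile, size - 1, SEEK_SET);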
fputc('\n', pFile);
fclose(pFile);
return 0;
}
int main(int argc, const char *argv[])
{
const char *fileName = "MyFile.txt";
int size = 1024;
int ret = 0;
if (3 == argc)
{
fileName = argv[1];
size = atoi(argv[2]);
}
ret = CreateFileSetSize(fileName, size);
return ret;
}
I apparently am not the only one to come up with this solution. I happened to find the following question right here on Stack Overflow.
How to create file of “x” size?