What is the best way to create an empty text file of a given length in C? Writing a space or any special character for every byte is not an option. I mean it should directly create the file, without any iteration up to the file length or anything like that.
It's pretty trivial to do. All you have to do is seek to the intended position and then write something:
#include <stdio.h>
const unsigned int wanted_size = 1048576;
int main(int argc, char **argv) {
FILE *fp = fopen("test.dat", "w+");
if (fp) {
// Now go to the intended end of the file
// (subtract 1 since we're writing a single character).
fseek(fp, wanted_size - 1, SEEK_SET);
// Write at least one byte to extend the file (if necessary).
fwrite("", 1, sizeof(char), fp);
fclose(fp);
}
return 0;
}
The example above will create a file that is 1 MB in length. Just keep in mind that the actual disk space is not necessarily allocated immediately: on most file systems the file is created sparse, and blocks are only materialized once data is actually written to them.
This also allows you to create files larger than your system memory. With the code above I'm able to instantly (< 1 ms) reserve a 1 GB file on a Raspberry Pi (which only has 512 MB of RAM) without having to use any kind of iteration.
You're also able to use any other way of writing data at that position (like fputs()); it's just important that you actually write something. Calling fputs("", fp); won't extend the file as intended, because an empty string contains no bytes to write.
On Windows use SetFilePointer and SetEndOfFile; on Linux use truncate() or ftruncate() (which, despite the name, can also extend a file).
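On POSIX systems you can get the same effect without writing any byte at all. A minimal sketch using ftruncate() (error handling kept to a bare minimum):
#include <stdio.h>
#include <unistd.h>
int main(void)
{
    FILE *fp = fopen("test.dat", "w+");
    if (fp) {
        /* Extend (or shrink) the file to exactly 1 MB. */
        if (ftruncate(fileno(fp), 1048576) != 0)
            perror("ftruncate");
        fclose(fp);
    }
    return 0;
}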
This is what I came up with.
// hello.c
#include <stdio.h>
#include <stdlib.h> /* for atoi() */
int CreateFileSetSize(const char *file, int size)
{
    FILE *pFile;
    pFile = fopen(file, "w");
    if (NULL == pFile)
    {
        return 1;
    }
    // Seek to size - 1 and write a single byte,
    // so the resulting file is exactly `size` bytes long.
    fseek(pFile, size - 1, SEEK_SET);
    fputc('\n', pFile);
    fclose(pFile);
    return 0;
}
int main(int argc, const char *argv[])
{
    const char *fileName = "MyFile.txt";
    int size = 1024;
    int ret = 0;
    if (3 == argc)
    {
        fileName = argv[1];
        size = atoi(argv[2]);
    }
    ret = CreateFileSetSize(fileName, size);
    return ret;
}
I apparently am not the only one to come up with this solution. I happened to find the following question right here on Stack Overflow.
How to create file of “x” size?
So, I looked around the internet and a couple of questions here, and I couldn't find anything that could fix my problem. I have an assignment for C programming: write a program that lets the user enter words into a string, add more words, write all the words in the string to a text file, and delete all the words in the string; on exit it saves the words in a binary file, which is loaded again when the program starts up. I've gotten everything to work except where the binary file is concerned.
I made two functions, one that loads the bin file when the program starts, and one that saves the bin file when it ends. I don't know in which one (or if in both) the problem starts. But basically I know it's not working right because I get garbage in my text file if I save it to a text file after the program loads the bin file into the string. I know for sure that the text file saver is working properly.
Thank you to anyone who takes the time to help me out, it's been an all-day process! lol
Here are the two snippets of my functions. Everything else in my code seems to work, so I don't want to bloat this post with the entire program, but if need be I'll put it up.
SIZE is a constant of 10000, to meet the program spec of 1000 words. But just to clear it up: I couldn't get this to run even asking for only 10 elements, or 1.
void loadBin(FILE *myBin, char *stringAll) {
    myBin = fopen("myBin.bin", "rb");
    if (myBin == NULL) {
        saveBin(&myBin, stringAll);
    } //if no bin file exists yet
    fread(stringAll, sizeof(char), SIZE + 1, myBin);
    fclose(myBin);
}
void saveBin(FILE *myBin, char *stringAll) {
    int stringLength = 0;
    myBin = fopen("myBin.bin", "wb");
    if (myBin == NULL) {
        printf("Problem writing file!\n");
        exit(-1);
    }
    stringLength = strlen(stringAll);
    fwrite(&stringAll, sizeof(char), (stringLength + 1), myBin);
    fclose(myBin);
}
You are leaving bad values in your myBin FILE*, and passing the & (address of) the pointer where a plain FILE* is expected.
Pass the filename instead, and you can (re)use the functions for other purposes, other files, et al.
char* filename = "myBin.bin";
Pass the filename, buffer pointer, and max size to read. You should also consider using stat()/fstat() to discover the file size up front.
size_t loadBin(char *fn, char *stringAll, size_t size)
{
    //since the caller never uses myBin, keep this FILE* local
    FILE *myBin = NULL;
    if( NULL == (myBin = fopen(fn, "rb")) ) {
        //create the missing file, then open it for reading
        saveBin(fn, stringAll, 0);
        if( NULL == (myBin = fopen(fn, "rb")) )
            return 0; //still can't open it
    }//if no bin file exists yet
    size_t howmany = fread(stringAll, sizeof(char), size, myBin);
    if( howmany < size ) printf("read fewer\n");
    fclose(myBin);
    return howmany;
}
Pass the file name, buffer pointer, and size to save
size_t saveBin(char *fn, char *stringAll, size_t size)
{
    //again, why carry around a FILE* that is only used locally?
    FILE *myBin = NULL;
    if( NULL == (myBin = fopen(fn, "wb")) ) { //use the passed-in filename
        printf("Problem writing file!\n");
        exit(-1);
    }
    //binary data may have embedded '\0' bytes, cannot use strlen,
    //stringLength = strlen(stringAll);
    size_t howmany = fwrite(stringAll, sizeof(char), size, myBin);
    if( howmany < size ) printf("short write\n");
    fclose(myBin);
    return howmany;
}
Call these; you are not guaranteed to write & read the same sizes...
size_t buffer_size = SIZE;
char buffer[SIZE]; //fill this with interesting bytes
saveBin(filename, buffer, buffer_size);
size_t readcount = loadBin(filename, buffer, buffer_size);
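As suggested above, stat()/fstat() can report the size before reading. A minimal POSIX sketch (returns -1 on error):
#include <stdio.h>
#include <sys/stat.h>
/* Return the size of an already-open file via fstat(), or -1 on error. */
long file_size(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) == -1)
        return -1;
    return (long)st.st_size;
}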
I have a problem with the fopen() function in C.
I parsed a directory and put all the paths into a char array (char**). After that I try to open all of these files. And...
fopen() returns "No such file or directory" for some files, and I really don't understand why.
All the paths are right. I checked.
I have all the privileges needed to open these files.
If I copy a path to a file from the error log and try to open only this file via my program, it works.
Other programs don't work with these files (I think).
What am I doing wrong?
int main(int argc, char *argv[]){
char** set = malloc(10000*sizeof(char*));
char* path = argv[1];
listdir(path, set); /* Just parse directory. Paths from the root. No problem in this function. all paths in the variable "set" are right */
int i=0;
while(i<files){ /* files is number of paths */
FILE* file = fopen(set[i++],"rb");
fseek(file, 0L, SEEK_END);
int fileSize = ftell(file);
rewind(file);
/*reading bytes from file to some buffer and close current file */
i++;
}
}
You increment 'i' twice. Maybe by mistake?
You can get the file size without opening the file by using stat().
ftell() returns long; don't cast it to int, as the value can be truncated and you lose the correct size.
Try this code:
#include <stdio.h>
#include <stdlib.h> /* for malloc() */
#include <sys/stat.h>
/* example of listdir, replace it with your real one */
int listdir(const char *path, char *set[])
{
set[0] = "0.txt";
set[1] = "1.txt";
set[2] = "2.txt";
set[3] = "3.txt";
set[4] = "4.txt";
return 5;
}
int main(int argc, char *argv[]) {
int files;
char *path = argv[1];
char **set = malloc(1000 * sizeof(char *));
files = listdir(path, set);
for (int i = 0; i < files; i++) {
struct stat st;
if (stat(set[i], &st) == 0)
    printf("FileSize of %s is %lld\n", set[i], (long long)st.st_size);
}
free(set);
}
(I am guessing you are on some POSIX system, hopefully Linux)
Probably your listdir is wrong. FWIW, if you use readdir(3) in it, you need to concatenate the directory name and the file name (with a / in between, perhaps using snprintf(3) or asprintf(3) for that purpose).
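For instance, a readdir(3)-based listdir() would need to build each entry roughly like this (just a sketch: the "."/".." filtering, the strdup() ownership, and the unchecked set[] capacity are assumptions, not your actual code):
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int listdir(const char *path, char *set[])
{
    DIR *dir = opendir(path);
    if (!dir) { perror(path); return 0; }
    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        char buf[4096];
        /* concatenate directory and entry name with a '/' in between */
        snprintf(buf, sizeof buf, "%s/%s", path, ent->d_name);
        set[count++] = strdup(buf); /* caller must free() each path */
    }
    closedir(dir);
    return count;
}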
But surely,
FILE* file = fopen(set[i++],"rb"); ////WRONG
is doubly wrong. First, combined with the i++ at the end of the loop, you are incrementing i twice (and here it is too early). Then, you should read fopen(3) and handle the failure case, at least with:
char* curpath = set[i];
assert (curpath != NULL);
FILE* file = fopen(curpath, "rb");
if (!file) { perror(curpath); exit(EXIT_FAILURE); };
testing the result of fopen against failure is mandatory. Notice that I am passing the same curpath to perror(3) on failure.
You may also want to check that your current working directory is what you expect. Use getcwd(2) for that.
Use also strace(1) (on Linux) to understand what system calls are done by your program.
I have created a framework to parse text files of reasonable size that fit in RAM, and for now things are going well. I have no complaints; however, what if I encounter a situation where I have to deal with large files, say greater than 8 GB (the amount of RAM I have)?
What would be an efficient approach to dealing with such large files?
My framework:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
int Parse(const char *filename,
const char *outputfile);
int main(void)
{
clock_t t1 = clock();
/* ............................................................................................................................. */
Parse("file.txt", NULL);
/* ............................................................................................................................. */
clock_t t2 = clock();
fprintf(stderr, "time elapsed: %.4f\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
fprintf(stderr, "Press any key to continue . . . ");
getchar();
return 0;
}
long GetFileSize(FILE * fp)
{
long f_size;
fseek(fp, 0L, SEEK_END);
f_size = ftell(fp);
fseek(fp, 0L, SEEK_SET);
return f_size;
}
char *dump_file_to_array(FILE *fp,
size_t f_size)
{
char *buf = (char *)calloc(f_size + 1, 1);
if (buf) {
size_t n = 0;
while (fgets(buf + n, INT_MAX, fp)) {
n += strlen(buf + n);
}
}
return buf;
}
int Parse(const char *filename,
const char *outputfile)
{
/* open file for reading in text mode */
FILE *fp = fopen(filename, "r");
if (!fp) {
perror(filename);
return 1;
}
/* store file in dynamic memory and close file */
size_t f_size = GetFileSize(fp);
char *buf = dump_file_to_array(fp, f_size);
fclose(fp);
if (!buf) {
fputs("error: memory allocation failed.\n", stderr);
return 2;
}
/* state machine variables */
// ........
/* array index variables */
size_t x = 0;
size_t y = 0;
/* main loop */
while (buf[x]) {
switch (buf[x]) {
/* ... */
}
x++;
}
/* NUL-terminate array at y */
buf[y] = '\0';
/* write buffer to file and clean up */
outputfile ? fp = fopen(outputfile, "w") :
fp = fopen(filename, "w");
if (!fp) {
outputfile ? perror(outputfile) :
perror(filename);
}
else {
fputs(buf, fp);
fclose(fp);
}
free(buf);
return 0;
}
Pattern deletion function based on the framework:
int delete_pattern_in_file(const char *filename,
const char *pattern, const char *outputfile)
{
/* open file for reading in text mode */
FILE *fp = fopen(filename, "r");
if (!fp) {
perror(filename);
return 1;
}
/* copy file contents to buffer and close file */
size_t f_size = GetFileSize(fp);
char *buf = dump_file_to_array(fp, f_size);
fclose(fp);
if (!buf) {
fputs("error - memory allocation failed", stderr);
return 2;
}
/* delete first match */
size_t n = 0, pattern_len = strlen(pattern);
char *tmp, *ptr = strstr(buf, pattern);
if (!ptr) {
fputs("No match found.\n", stderr);
free(buf);
return -1;
}
else {
n = ptr - buf;
ptr += pattern_len;
tmp = ptr;
}
/* delete the rest */
while (ptr = strstr(ptr, pattern)) {
while (tmp < ptr) {
buf[n++] = *tmp++;
}
ptr += pattern_len;
tmp = ptr;
}
/* copy the rest of the buffer */
strcpy(buf + n, tmp);
/* open file for writing and print the processed buffer to it */
outputfile ? fp = fopen(outputfile, "w") :
fp = fopen(filename, "w");
if (!fp) {
outputfile ? perror(outputfile) :
perror(filename);
}
else {
fputs(buf, fp);
fclose(fp);
}
free(buf);
return 0;
}
If you wish to stick with your current design, an option might be to mmap() the file instead of reading it into a memory buffer.
You could change the function dump_file_to_array to the following (Linux-specific):
char *dump_file_to_array(FILE *fp, size_t f_size) {
    /* map the file read-only; requires #include <sys/mman.h> */
    char *buf = mmap(NULL, f_size, PROT_READ, MAP_SHARED, fileno(fp), 0);
    if (buf == MAP_FAILED)
        return NULL;
    return buf;
}
Now you can read over the file and the memory manager will automatically take care of keeping only the relevant portions of the file in memory.
For Windows, similar mechanisms exist (CreateFileMapping() and MapViewOfFile()).
Chances are you are parsing the file line by line. So read in a large block (4 KB or 16 KB), parse all the complete lines in it, copy the small remainder to the beginning of the buffer, and read the next chunk in after it. Rinse and repeat.
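A minimal sketch of that loop (process_line() is a placeholder for the real parser; a single line longer than CHUNK would need extra handling):
#include <stdio.h>
#include <string.h>
#define CHUNK (16 * 1024)
void parse_in_blocks(FILE *fp)
{
    char buf[CHUNK + 1];
    size_t have = 0; /* bytes carried over from the previous read */
    size_t got;
    while ((got = fread(buf + have, 1, CHUNK - have, fp)) > 0) {
        size_t len = have + got;
        buf[len] = '\0';
        char *line = buf, *nl;
        /* handle every complete line in this block */
        while ((nl = memchr(line, '\n', len - (size_t)(line - buf))) != NULL) {
            *nl = '\0';
            /* process_line(line); */
            line = nl + 1;
        }
        /* move the partial last line to the front, refill behind it */
        have = len - (size_t)(line - buf);
        memmove(buf, line, have);
    }
    /* 'have' bytes of a final unterminated line may remain in buf */
}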
For JSON or XML you will need an event based parser that can accept multiple blocks or input.
There are multiple issues with your approach.
The concepts of maximum and available memory are not so evident: technically, you are not limited by the RAM size, but by the quantity of memory your environment will let you allocate and use for your program. This depends on various factors:
What ABI you compile for: the maximum memory size accessible to your program is limited to less than 4 GB if you compile for 32-bit code, even if your system has more RAM than that.
What quota the system is configured to let your program use. This may be less than available memory.
What strategy the system uses when more memory is requested than is physically available: most modern systems use virtual memory and share physical memory between processes and system tasks (such as the disk cache) using very advanced algorithms that cannot be described in a few lines. It is possible on some systems for your program to allocate and use more memory than is physically installed on the motherboard, swapping memory pages to disk as more memory is accessed, at a huge cost in lag time.
There are further issues in your code:
The type long might be too small to hold the size of the file: on Windows systems, long is 32-bit even in 64-bit builds where memory can be allocated in chunks larger than 2 GB. You must use a different API to request the file size from the system.
You read the file with a series of calls to fgets(). This is inefficient; a single call to fread() would suffice. Furthermore, if the file contains embedded null bytes ('\0' characters), chunks of the file will be missing in memory. However, you could not deal with embedded null bytes anyway if you use string functions such as strstr() and strcpy() for your string deletion task.
the condition in while (ptr = strstr(ptr, pattern)) is an assignment. While not strictly incorrect, it is poor style as it confuses readers of your code and prevents life-saving compiler warnings in the cases where such assignment-conditions are coding errors. You might think that could never happen, but anyone can make a typo, and a missing = in a test is difficult to spot and has dire consequences.
your shorthand use of the ternary operator in place of if statements is quite confusing too: outputfile ? fp = fopen(outputfile, "w") : fp = fopen(filename, "w"); (as written, this should not even compile as C, since the conditional expression is not an lvalue).
rewriting the input file in place is risky too: if anything goes wrong, the input file will be lost. Writing to a temporary file and renaming it afterwards is safer.
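A minimal sketch of that safer pattern (the ".tmp" suffix is just an illustration):
#include <stdio.h>
/* Write the output to a temporary file first, then replace the
   original only once everything has succeeded. */
int safe_rewrite(const char *filename, const char *buf)
{
    char tmpname[FILENAME_MAX];
    snprintf(tmpname, sizeof tmpname, "%s.tmp", filename);
    FILE *fp = fopen(tmpname, "w");
    if (!fp)
        return -1;
    int err = (fputs(buf, fp) == EOF);
    err |= (fclose(fp) == EOF);
    if (err) {
        remove(tmpname);
        return -1;
    }
    return rename(tmpname, filename); /* atomically replaces on POSIX */
}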
Note that you can implement the filtering on the fly, without a buffer, albeit inefficiently:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
if (argc < 2) {
fprintf(stderr, "usage: delpat PATTERN < inputfile > outputfile\n");
return 1;
}
unsigned char *pattern = (unsigned char*)argv[1];
size_t i, j, n = strlen(argv[1]);
size_t skip[n + 1];
int c;
skip[0] = 0;
for (i = j = 1; i < n; i++) {
while (memcmp(pattern, pattern + j, i - j)) {
j++;
}
skip[i] = j;
}
i = 0;
while ((c = getchar()) != EOF) {
for (;;) {
if (i < n && c == pattern[i]) {
if (++i == n) {
i = 0; /* match found, consumed */
}
break;
}
if (i == 0) {
putchar(c);
break;
}
for (j = 0; j < skip[i]; j++) {
putchar(pattern[j]);
}
i -= skip[i];
}
}
for (j = 0; j < i; j++) {
putchar(pattern[j]);
}
return 0;
}
First of all, I wouldn't suggest holding such big files in RAM but instead using streams. This is because buffering is usually done by the library as well as by the kernel.
If you are accessing the file sequentially, which seems to be the case, then you probably know that all modern systems implement read-ahead algorithms, so just reading the whole file ahead of time into RAM will in most cases just waste time.
You didn't specify the use cases you have to cover, so I'm going to have to assume that using streams like std::ifstream and doing the parsing on the fly will suit your needs. As a side note, also make sure that operations on files that are expected to be large are done in separate threads.
An alternative solution: if you're on a Linux system and you have a decent amount of swap space, just open the whole bad boy up. It will consume your RAM and also consume hard drive space (swap). Thus you can have the entire thing open at once, just not all of it will be in RAM.
Pros
If an unexpected shutdown occurs, the memory in the swap space is recoverable.
RAM is expensive, HDDs are cheap, so the application would put less strain on your expensive equipment.
Viruses could not harm your computer because there would be no room in RAM for them to run.
You'll be taking full advantage of the Linux operating system by using the swap space. Normally the swap module is not used, and all it does is clog up precious RAM.
The additional energy needed to utilize the entirety of the RAM can warm the immediate area. Useful during winter time.
You can add "Complex and Special Memory Allocation Engineering" to your resume.
Cons
None
Consider treating the file as an external array of lines.
The code can use an array of line indexes. This index array can be kept in memory at a fraction of the size of the large file. Access to any line is accomplished quickly via this lookup: a seek with fsetpos() and an fread()/fgets(). As lines are edited, the new lines can be saved, in any order, in a temporary text file. Saving the file then reads both the original file and the temp one in sequence to form and write the new output file.
typedef struct {
int attributes; // not_yet_read, line_offset/length_determined,
// line_changed/in_other_file, deleted, etc.
fpos_t line_offset; // use with fgetpos() fsetpos()
unsigned line_length; // optional field as code could re-compute as needed.
} line_index;
size_t line_count;
// read some lines
line_index *index = malloc(sizeof *index * line_count);
// read more lines
index = realloc(index, sizeof *index * line_count);
// edit lines, save changes to appended temporary file.
// ...
// Save file -weave the contents of the source file and temp file to the new output file.
Additionally, with enormous files, the line_index[] array itself can be kept in disk memory too; access to it is easily computed. In the extreme, only one line of the file needs to be in memory at any time.
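Fetching line k under this scheme might look like this (a sketch using the line_index struct above; error checks abbreviated):
#include <stdio.h>
/* Fetch line k of the big file into buf using its saved offset. */
char *get_line(FILE *fp, const line_index *index, size_t k,
               char *buf, size_t bufsize)
{
    if (fsetpos(fp, &index[k].line_offset) != 0)
        return NULL;
    return fgets(buf, (int)bufsize, fp);
}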
You mentioned a state machine. Every finite state automaton can be optimized to have minimal (or no) lookahead.
Is it possible to do this in Lex? It will generate an output C file which you can compile.
If you don't want to use Lex, you can always do the following:
1. Read n chars into a (ring?) buffer, where n is the size of the pattern.
2. Try to match the buffer against the pattern.
3. If it matches, go to 1.
4. Print buffer[0], read one char, go to 2.
Also, for very long patterns and degenerate inputs, strstr() can be slow. In that case you might want to look into more advanced string matching algorithms, such as Knuth-Morris-Pratt or Boyer-Moore.
mmap() is a pretty good way of working with large files.
It provides you with a lot of flexibility, but you need to be cautious with the page size. Here is a good article which goes into more of the specifics.
I want to create a function that copies a file to some location. I'm wondering whether it would be beneficial to read it in 64 KB blocks, or whether I should just dynamically allocate the buffer. Or should I just use the system() function to do the copy on the command line?
I mean like this:
int copy_file(const char *source, const char *dest)
{
FILE *fsource, *fdest;
size_t readSize;
unsigned char buffer[64*1024]; //64 KB
fsource = fopen(source, "rb");
if(!fsource)
    return 0;
//open the destination only after the source succeeded,
//so neither handle can leak
fdest = fopen(dest, "wb");
if(!fdest)
{
    fclose(fsource);
    return 0;
}
while(1)
{
readSize = fread(buffer, 1, sizeof(buffer), fsource);
if(!readSize)
break;
fwrite(buffer, 1, readSize, fdest);
}
fclose(fsource);
fclose(fdest);
return 1;
}
The optimal read size is going to be very platform dependent. A power of 2 is definitely a good idea, but without testing, it would be hard to say which size would be best.
If you want to see how cp copies files, you can look at the bleeding-edge source code.
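If you want to measure which buffer size wins on your platform, a rough clock() harness like this can help (the file names are placeholders):
#include <stdio.h>
#include <time.h>
int copy_file(const char *source, const char *dest); /* from the question above */
int main(void)
{
    clock_t t0 = clock();
    if (!copy_file("bigfile.dat", "copy.dat")) /* placeholder names */
    {
        fprintf(stderr, "copy failed\n");
        return 1;
    }
    printf("copy took %.3f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);
    return 0;
}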
Hi, I have the following code, where I try to get all the lines of a file into an array... For example, if the file data.txt contains the following:
first line
second line
then in the code below I want the data array to end up as:
data[0] = "first line";
data[1] = "second line";
My first question: Currently I am getting "Segmentation fault"... Why?
Exactly i get the following output:
Number of lines is 7475613
Segmentation fault
My second question: Is there any better way to do what I am trying to do?
Thanks!!!
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char* argv[])
{
FILE *f = fopen("data.txt", "rb");
fseek(f, 0, SEEK_END);
long pos = ftell(f);
fseek(f, 0, SEEK_SET);
char *bytes = malloc(pos);
fread(bytes, pos, 1, f);
int i =0;
int counter = 0;
for(; i<pos; i++)
{
if(*(bytes+i)=='\n') counter++;
}
printf("\nNumber of lines is %d\n", counter);
char* data[counter];
int start=0, end=0;
counter = 0;
int length;
for(i=0; i<pos; i++)
{
if(*(bytes+i)=='\n')
{
end = i;
length =end-start;
data[counter]=(char*)malloc(sizeof(char)*(length));
strncpy(data[counter],
bytes+start,
length);
counter = counter+1;
start = end+1;
}
}
free(bytes);
return 0;
}
The first line of data.txt in this case is not '\n'; it is: "23454555 6346346 3463463".
Thanks!
You need to malloc one more char for data[counter], for the terminating NUL.
And after strncpy(), you need to terminate the destination string yourself: strncpy() does not append the NUL when the source portion is at least as long as the count.
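For example, the copy step inside the loop could become (a sketch reusing the variable names from the question):
data[counter] = malloc(length + 1); /* +1 for the terminating NUL */
if (data[counter] == NULL) {
    fprintf(stderr, "out of memory\n");
    exit(1);
}
memcpy(data[counter], bytes + start, length);
data[counter][length] = '\0'; /* strncpy would not add this here */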
Edit after edit of original question
Number of lines is 7475613
Whooooooaaaaaa, that's a bit too much for your computer!
If the size of a char * is 4 bytes, you want to reserve 29902452 bytes (about 30 MB) of automatic (stack) memory for the data array, far more than a typical thread stack allows, hence the segmentation fault.
You can allocate that memory dynamically instead:
/* char *data[counter]; */
char **data = malloc(counter * sizeof *data);
/* don't forget to free the memory when you no longer need it */
Edit: second question
My second question: Is there any better way to do what I am trying to do?
Not really; you're doing it right. But maybe you can structure the code without the need to have all that data in memory at the same time.
Read and deal with a single line at a time.
You also need to free(data[counter]); in a loop ... and free(data); before the "you're doing it right" above is correct :)
And you need to check if each of the several malloc() calls succeeded LOL
First of all, you need to check whether the file was opened correctly:
FILE *f = fopen("data.txt", "rb");
if(!f)
{
fprintf(stderr,"Error opening file");
exit (1);
}
If there is an error opening the file and you don't check for it, you'll get a segfault when you try to fseek() on an invalid file pointer.
Apart from that I see no errors. I tried running the program, printing the values of the data array at the end, and it ran as expected.
One thing to note is that you're opening your file as binary, so line termination conventions may not work as you expect on your platform (UNIX is LF, Windows is CR-LF, and some versions of Mac OS are CR).
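If input files may carry Windows line endings while being read in binary mode, the trailing CR has to be stripped by hand. A small sketch (line is assumed to be a NUL-terminated line):
#include <string.h>
/* Trim trailing '\n' and '\r' from a line read in binary mode. */
void chomp(char *line)
{
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
}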