Do I need to malloc when creating a file to write to?
The file will be based on the contents of 2 others, so would I need to malloc space for the writeable file of sizeof( file a ) + sizeof( file b) + 1?
Sorry if this makes no sense; if it doesn't then I guess I need to go read some more :D
Essentially, I have 2 txt files and a string sequence - I am writing each line of each file side by side separated by the string sequence.
txt file a
hello stack over
flow this
is a test
txt file b
jump the
gun i am
a novice
separator == xx
output ==
hello stack overxxjump the
flow thisxxgun i am
is a testxxa novice
If you're writing it in order, can't you just use fprintf() or fwrite() whenever you need to write something out, instead of writing the entire file at once?
EDIT: Based on your update, here's basically what you have to do (probably not valid C since I'm not a C programmer):
EDIT2: With some help from msw:
#define BUFSIZE 200
FILE *firstFile = fopen("file1.txt", "r");
FILE *secondFile = fopen("file2.txt", "r");
FILE *outputFile = fopen("output.txt", "w");
const char *separator = "xx";
char firstLine[BUFSIZE], secondLine[BUFSIZE];
// start a loop here
fgets(firstLine, BUFSIZE, firstFile);
fgets(secondLine, BUFSIZE, secondFile);
// Remove the trailing '\n' from each line
fprintf(outputFile, "%s%s%s", firstLine, separator, secondLine);
// end a loop here
fclose(outputFile);
fclose(firstFile);
fclose(secondFile);
You only need to malloc the entire size of a file if you need to hold the entire file in memory (and even then, you can probably use mmap or something instead). Allocate as much memory as you need for the data you intend to work with in memory: no more, no less.
Files are on disk, malloc is for RAM.
You'd only malloc if you needed space in memory to store the data prior to writing it out to the file; otherwise, you'd typically use a stack-allocated buffer of, say, 8 KB and write to the file in chunks.
So taking your question as-is: no, you'd rarely malloc just to write to a file.
If your goal is to keep the entire file in memory, then you'd malloc the size of the file.
If I write
if((fp = fopen(some_path, "wb")))
{
int a = 50000;
fwrite("c",sizeof(char),a,fp);
fclose(fp);
fp = fopen(some_path, "rb");
char arr[50000];
fread(arr, sizeof(char), a, fp);
printf("%s\n", arr);
fclose(fp);
}
it prints "c" of course but the file is ~50kb
My questions are:
How is this actually working?
If I modify var a to 60000 the executable crashes, so I'm thinking there's some internal buffer in fwrite. How do I get its max capacity?
What does fwrite() write to the file in order to get the file to ~50kb of size and still print only "c" (I was expecting some mumbo-jumbo characters here)?
How wrong is this usage of the function, I want to write a blank file of a certain size really fast(with dummy data), would I be wrong exploiting this behaviour in order not to make a big buffer and use up memory to write "real" data but still reduce fwrite calls(I may need to write a 10 gb file for ex.)?
How is this actually working?
I would argue that it isn't working. You did something nonsensical, and it went uncaught. That gave you the impression that it will work in the future. That's a failure.
If I modify var a to 60000 the executable crashes, so I'm thinking there's some internal buffer in fwrite. How do I get its max capacity?
There's no buffer. You are merely accessing whatever is in memory after the c␀ created by "c". When it crashes, it's because you've reached a memory page that can't be read (e.g. hasn't been allocated).
What does fwrite() write to the file in order to get the file to ~50kb of size
Whatever happens to be in memory at the address returned by "c" and beyond.
and still print only "c" (I was expecting some mumbo-jumbo characters here)?
It doesn't print only c. Try something like hexdump -C file or od -c file.
How wrong is this usage of the function
Incredibly. It could crash for any value of a larger than 2.
I want to write a blank file of a certain size really fast(with dummy data)
The docs for truncate say: "If the file previously was shorter, it is extended, and the extended part reads as null bytes ('\0')." So you could use the following:
if (truncate(path, length)) {
perror("truncate");
exit(1);
}
This code correctly reads a file line-by-line, stores each line in line[] and prints it.
int beargit_add(const char* filename) {
FILE* findex = fopen(".beargit/.index", "r");
char line[FILENAME_SIZE];
while(fgets(line, sizeof(line), findex)) {
strtok(line, "\n");
fprintf(stdout, "%s\n", line);
}
fclose(findex);
return 0;
}
However, I am baffled as to why using fgets() in the while loop actually reads the file line-by-line. I am brand new to C after having learned Python and Java.
Since each call to fgets() is independent, where is C remembering which line it is currently on each time you call it? I thought it might have to do with changing the value FILE* index points to, but you are passing the pointer into fgets() by value so it could not be modified.
Any help in understanding the magic of C is greatly appreciated!
It's not fgets that keeps track; findex does that for you.
findex points to a FILE, which holds information about an open file, such as its file descriptor and file offset.
FILE is an encapsulation of the OS's I/O facilities.
More about FILE:
Object type that identifies a stream and contains the information needed to control it, including a pointer to its buffer, its position indicator and all its state indicators.
The offset keeps track of your position in the file: each read starts at the offset and advances it. All of this bookkeeping is done by FILE on your behalf, including for fgets.
More info about the offset: offset wiki
I thought it might have to do with changing the value FILE* index points to
it's not the value of the pointer itself that is changed. The pointer points to a structure (of type FILE) which contains information about the stream associated with that structure/pointer. Presumably, one of the members of that structure contains a "cursor" that points to the next byte to be read.
Or maybe it's just got a file descriptor int (like on many Unices) and I/O functions just call out to the kernel in order to obtain information about the file descriptor, including the current position.
I'm writing an application that deals with very large user-generated input files. The program will copy about 95 percent of the file, effectively duplicating it and switching a few words and values in the copy, and then appending the copy (in chunks) to the original file, such that each block (consisting of between 10 and 50 lines) in the original is followed by the copied and modified block, and then the next original block, and so on. The user-generated input conforms to a certain format, and it is highly unlikely that any line in the original file is longer than 100 characters.
Which would be the better approach?
To use one file pointer and use variables that hold the current position of how much has been read and where to write to, seeking the file pointer back and forth to read and write; or
To use multiple file pointers, one for reading and one for writing.
I am mostly concerned with the efficiency of the program, as the input files will reach up to 25,000 lines, each about 50 characters long.
If you have memory constraints, or you want a generic approach, read bytes into a buffer from one file pointer, make changes, and write out the buffer to a second file pointer when the buffer is full. If you reach EOF on the first pointer, make your changes and just flush whatever is in the buffer to the output pointer. If you intend to replace the original file, copy the output file to the input file and remove the output file. This "atomic" approach lets you check that the copy operation took place correctly before deleting anything.
For example, to deal with generically copying over any number of bytes, say, 1 MiB at a time:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define COPY_BUFFER_MAXSIZE 1048576

/* ... */

unsigned char *buffer = malloc(COPY_BUFFER_MAXSIZE);
if (!buffer)
    exit(EXIT_FAILURE);

FILE *inFp = fopen(inFilename, "rb"); /* binary mode keeps byte counts exact */
fseek(inFp, 0, SEEK_END);
uint64_t fileSize = ftell(inFp);
rewind(inFp);

FILE *outFp = stdout; /* change this if you don't want to write to standard output */
uint64_t outFileSizeCounter = fileSize;
/* we fread() bytes from inFp in COPY_BUFFER_MAXSIZE increments, until there is nothing left to fread() */
do {
if (outFileSizeCounter > COPY_BUFFER_MAXSIZE) {
fread(buffer, 1, (size_t) COPY_BUFFER_MAXSIZE, inFp);
/* -- make changes to buffer contents at this stage
-- if you resize the buffer, then copy the buffer and
change the following statement to fwrite() the number of
bytes in the copy of the buffer */
fwrite(buffer, 1, (size_t) COPY_BUFFER_MAXSIZE, outFp);
outFileSizeCounter -= COPY_BUFFER_MAXSIZE;
}
else {
fread(buffer, 1, (size_t) outFileSizeCounter, inFp);
/* -- make changes to buffer contents at this stage
-- again, make a copy of buffer if it needs resizing,
and adjust the fwrite() statement to change the number
of bytes that need writing */
fwrite(buffer, 1, (size_t) outFileSizeCounter, outFp);
outFileSizeCounter = 0ULL;
}
} while (outFileSizeCounter > 0);
free(buffer);
An efficient way to deal with a resized buffer is to keep a second pointer, say, unsigned char *copyBuffer, which is realloc()-ed to twice the size, if necessary, to deal with accumulated edits. That way, you keep expensive realloc() calls to a minimum.
Not sure why this got downvoted, but it's a pretty solid approach for doing things with a generic amount of data. Hope this helps someone who comes across this question, in any case.
25000 lines * 100 characters = 2.5MB, that's not really a huge file. The fastest will probably be to read the whole file in memory and write your results to a new file and replace the original with that.
Is there a simple way to insert something at the beginning of a text file using file streams? Because the only way I can think of, is to load a file into a buffer, write text-to-append and then write the buffer. I want to know if it is possible to do it without my buffer.
No, it's not possible. You'd have to rewrite the file to insert text at the beginning.
EDIT: You could avoid reading the whole file into memory if you used a temporary file ie:
Write the value you want inserted at the beginning of the file
Read X bytes from the old file
Write those X bytes to the new file
Repeat 2,3 until you are done reading the old file
Copy the new file to the old file.
There is no simple way, because the actual operation is not simple. When the file is stored on the disk, there are no empty available bytes before the beginning of the file, so you can't just put data there. There isn't an ideal generic solution to this -- usually, it means copying all of the rest of the data to move it to make room.
Thus, C makes you decide how you want to solve that problem.
Just wanted to counter some of the more absolute claims in here:
There is no way to append data to the beginning of a file.
Incorrect, there are, given certain constraints.
When the file is stored on the disk, there are no empty available bytes before the beginning of the file, so you can't just put data there.
This may be the case when dealing at the abstraction level of files as byte streams. However, file systems most often store files as a sequence of blocks, and some file systems allow a bit more free access at that level.
Linux 4.1+ (XFS) and 4.2+ (XFS, ext4) allow you to insert holes into files using fallocate, given certain offset/length constraints:
Typically, offset and len must be a multiple of the filesystem logical block size, which varies according to the filesystem type and configuration.
Examples on StackExchange sites can be found by web searching for 'fallocate prepend to file'.
There is no way to append data to the beginning of a file.
The questioner also says that the only way they thought of solving the problem was by reading the whole file into memory and writing it out again. Here are other methods.
Write a placeholder of zeros of a known length. You can rewind the file handle and write over this data, so long as you do not exceed the placeholder size.
A simplistic example is writing an unsigned int at the start that represents the count of lines that will follow, which you cannot fill in until you reach the end, rewind the file handle, and rewrite the correct value.
Note: some C implementations on some platforms insist that you finally position the file handle at the end of the file before closing it for this to work correctly.
Write the new data to a new file, then append the old data to it using file streams. Delete the old file and rename the new file to the old file name. Do not use copy; it is a waste of time.
All methods have trade offs of disk size versus memory and CPU usage. It all depends on your application requirements.
Not strictly a solution, but if you're adding a short string to a long file, you can make a looping buffer of the same length you want to add, and sort of roll the extra characters out:
//also serves as the buffer; the null char gives the extra char for the beginning
char insRay[] = "[inserted text]";
printf("inserting:%s size of buffer:%zu\n", insRay, sizeof(insRay));
//indices to read in and out of the file
int iRead = sizeof(insRay)-1;
int iWrite = 0;
//save the input, so we only read once
int in = '\0';
do{
in = fgetc(fp);
//don't go to next char in the file
ungetc(in,fp);
if(in != EOF){
//preserve the original file's char
insRay[iRead] = in;
//loop
iRead++;
if(iRead == sizeof(insRay))
iRead = 0;
//insert or replace chars
fputc(insRay[iWrite],fp);
//loop
iWrite++;
if(iWrite == sizeof(insRay))
iWrite = 0;
}
}while(in != EOF);
//add buffer to the end of file, - the char that was null before
for(int i = 0; i < sizeof(insRay)-1;i++){
fputc(insRay[iWrite],fp);
iWrite++;
if(iWrite == sizeof(insRay))
iWrite = 0;
}
In another question, the accepted answer shows a method for reading the contents of a file into memory.
I have been trying to use this method to read in the content of a text file and then copy it to a new file. When I write the contents of the buffer to the new file, however, there is always some extra garbage at the end of the file. Here is an example of my code:
inputFile = fopen("D:\\input.txt", "r");
outputFile = fopen("D:\\output.txt", "w");
if(inputFile)
{
//Get size of inputFile
fseek(inputFile, 0, SEEK_END);
inputFileLength = ftell(inputFile);
fseek(inputFile, 0, SEEK_SET);
//Allocate memory for inputBuffer
inputBuffer = malloc(inputFileLength);
if(inputBuffer)
{
fread (inputBuffer, 1, inputFileLength, inputFile);
}
fclose(inputFile);
if(inputBuffer)
{
fprintf(outputFile, "%s", inputBuffer);
}
//Cleanup
free(inputBuffer);
fclose(outputFile);
}
The output file always contains an exact copy of the input file, but then has the text "MPUTERNAM2" appended to the end. Can anyone shed some light as to why this might be happening?
You may be happier with
int numBytesRead = 0;
if(inputBuffer)
{
numBytesRead = fread (inputBuffer, 1, inputFileLength, inputFile);
}
fclose(inputFile);
if(inputBuffer)
{
fwrite( inputBuffer, 1, numBytesRead, outputFile );
}
It doesn't need a null-terminated string (and therefore will work properly on binary data containing zeroes)
Because you are writing the buffer as if it were a string. Strings end with a null character; the file you read does not.
You could NULL terminate your string, but a better solution is to use fwrite() instead of fprintf(). This would also let you copy files that contain NULL characters.
Unless you know the input file will always be small, you might consider reading/writing in a loop so that you can copy files larger than memory.
You haven't allocated enough space for the terminating null character in your buffer (and you also forgot to actually set it), so your fprintf is effectively overreading into some other memory. Your buffer is exactly the same size as the file and is filled with its content; however, fprintf reads the parameter looking for the terminating null, which isn't there until a couple of characters later where, coincidentally, there is one.
EDIT
You're actually mixing two types of io, fread (which is paired with fwrite) and fprintf (which is paired with fscanf). You should probably be doing fwrite with the number of bytes to write; or conversely, use fscanf, which would null-terminate your string (although, this wouldn't allow nulls in your string).
Allocating memory to fit the file is actually quite a bad way of doing it, especially the way it's done here. If the malloc() fails, no data is written to the output file (and it fails silently). In other words, you can't copy files greater than a few gigabytes on a 32-bit platform due to the address space limitations.
It's actually far better to use a smaller memory chunk (allocated or on the stack) and read/write the file in chunks. The reads and writes will be buffered anyway and, as long as you make the chunks relatively large, the overhead of function calls to the C runtime libraries is minimal.
You should always copy files in binary mode as well, it's faster since there's no chance of translation.
Something like:
FILE *fin = fopen ("infile","rb"); // make sure you check these for NULL return
FILE *fout = fopen ("outfile","wb");
char buff[1000000]; // or malloc/check-null if you don't have much stack space.
size_t count;
while ((count = fread (buff, 1, sizeof(buff), fin)) > 0) {
    // On a short read, check ferror(fin) here.
    fwrite (buff, 1, count, fout); // and check return value.
}
fclose (fout);
fclose (fin);
This is from memory but provides the general idea of how to do it. And you should always have copious error checking.
fprintf expects inputBuffer to be null-terminated, which it isn't. So it's reading past the end of inputBuffer and printing whatever's there (into your new file) until it finds a null character.
In this case you could malloc an extra byte and put a null as the last character in inputBuffer.
In addition to what other's have said: You should also open your files in binary-mode - otherwise, you might get unexpected results on Windows (or other non-POSIX systems).
You can use
fwrite (inputBuffer , 1 , inputFileLength , outputFile );
instead of fprintf, to avoid the zero-terminated string problem. It also "matches better" with fread :)
Try using fgets instead; it will add the null for you at the end of the string. Also, as was said above, you need one more byte for the null terminator.
i.e.
The string "Davy" is represented as the array that contains D,a,v,y,\0 (without the commas). Basically, your array needs to be at least the string length + 1 to hold the null terminator. Also, fread will not automatically add the terminator, which is why you get garbage even if your file is way shorter than the maximum length.
Note: a lazy alternative is to use calloc, which zero-fills the buffer; allocate inputFileLength + 1 bytes so the last byte remains a null terminator after the fread.