C reading and writing files - c

The program I wrote gets the size of one file, reads partSize bytes from that file and writes partSize bytes to a newly created file. The problem is that it only works for small text files. If I try to run the program with a text file of a few hundred lines or a picture, I get a segmentation fault and significantly fewer than partSize bytes are stored in the new file.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>
int main()
{
int createDescriptor;
int openDescriptorOriginal;
int closeCreateDescriptor;
// char fileNameOriginal[] = "picture.jpg";
char fileNameOriginal[] = "myFile.txt";
int parts;
int partSize;
parts=2;
int bytesRemaining;
int partNumber;
char BUFFER[512];
int readDescriptor;
int buffer[1];
void *pbuffer = &buffer;
int bytes, infile, outfile;
if ((openDescriptorOriginal = open(fileNameOriginal, O_RDONLY )) == -1)
{
printf("Error opening %s", fileNameOriginal);
exit(EXIT_FAILURE);
}
struct stat buf;
int r = fstat(openDescriptorOriginal, &buf);
if (r)
{
fprintf(stderr, "error: fstat: %s\n", (char *) strerror(errno));
exit(1);
}
int originalFileSize = buf.st_size;
printf("The file is %.9f bytes large.\n",(double)originalFileSize);
partSize = ((originalFileSize + parts) - 1)/parts;
printf("Part size: %.9f bytes large\n",(double)partSize);
umask(0000);
//create and open new file
if ( (outfile = open("NewPicture.jpg", O_CREAT|O_WRONLY,0777))==-1 )
{
printf("ERROR %s\n", "NewPicture.jpg");
}
ssize_t count, total;
total = 0;
char *bufff = BUFFER;
while (partSize) {
count = read(openDescriptorOriginal, bufff, partSize);
if (count < 0) {
break;
}
if (count == 0)
break;
bufff += count;
total += count;
partSize -= count;
}
write (outfile, BUFFER, total);
printf("\n");
return 0;
}

You are using a buffer of only 512 bytes:
char BUFFER[512];
If you read more than 512 bytes into it, the buffer overflows and a segmentation fault can occur.

Your buffer is too small. You need a larger buffer variable. If partSize is more than 512 bytes, read() writes past the end of BUFFER and you will get a segfault.
Ideally, you should read from the file in fixed chunks. That is, read maybe 30-40 bytes, or any constant number of bytes, in every read and then write them to the new file. Repeat until the complete file has been read.
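A minimal sketch of that chunked loop, using the same POSIX read()/write() calls as the question. The helper name copy_bytes and the 512-byte chunk size are assumptions made for illustration, not part of the original program:

```c
#include <assert.h>
#include <unistd.h>

/* Copy up to `limit` bytes from descriptor `in` to descriptor `out`
 * through a fixed 512-byte buffer. Returns the number of bytes copied,
 * or -1 on error. copy_bytes is a hypothetical helper name. */
static long copy_bytes(int in, int out, long limit)
{
    char chunk[512];
    long total = 0;

    assert(limit >= 0);
    while (total < limit) {
        size_t want = sizeof chunk;
        if ((long)want > limit - total)
            want = (size_t)(limit - total);   /* never ask for more than remains */

        ssize_t got = read(in, chunk, want);
        if (got < 0)
            return -1;                        /* read error */
        if (got == 0)
            break;                            /* end of file */

        /* write() may accept fewer bytes than requested, so loop until done */
        ssize_t done = 0;
        while (done < got) {
            ssize_t n = write(out, chunk + done, (size_t)(got - done));
            if (n < 0)
                return -1;                    /* write error */
            done += n;
        }
        total += got;
    }
    return total;
}
```

In the question's program this would be called as copy_bytes(openDescriptorOriginal, outfile, partSize). Because the buffer is reused for every chunk, its size no longer depends on the file size.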

count = read(openDescriptorOriginal, bufff, partSize); In this line the 3rd argument is wrong.
In your code you have defined char BUFFER[512]; so read at most 512 bytes at a time into BUFFER:
count = read(openDescriptorOriginal, BUFFER, 512);
Why it does not work with big files:
If partSize is greater than 512, a buffer overrun (buffer overflow) can happen. read() attempts to read partSize bytes from the file associated with the open file descriptor openDescriptorOriginal into the buffer pointed to by BUFFER, which is only 512 bytes long. This buffer overrun is the cause of the segmentation fault in your program.
If the file is small, the code works.
I have corrected your code to some extent:
ssize_t count = 0, total = 0;
char *bufff = calloc(partSize + 1, sizeof(char));
if (bufff == NULL)
    exit(EXIT_FAILURE);
char *b = bufff;
while (partSize > 0) {
    /* never ask for more than the space left in bufff */
    count = read(openDescriptorOriginal, b, partSize < 512 ? partSize : 512);
    if (count <= 0) /* error or end of file */
        break;
    b = b + count;
    total = total + count;
    partSize = partSize - count;
}
write(outfile, bufff, total);
free(bufff);
close(openDescriptorOriginal);
close(outfile);

This expression does not fit with the rest of the code:
partSize = ((originalFileSize + parts) - 1)/parts;
You have initialized parts to two, so this divides the original file size by two, rounding up, even though at the end of the day your buffer size is still 512.
What you need to do is use the buffer size when reading from the file and check how many bytes were actually read. Subtract this value from the original file size, and repeat until the number of bytes actually read is smaller than the buffer size and/or the remaining file size is 0.
It would also probably be better to use buffered file I/O, i.e. fopen/fread/fwrite/fclose, unless you have a special reason to use unbuffered I/O.
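As a rough sketch of the buffered-I/O variant suggested here, the loop below copies one stream to another with fread/fwrite. The name copy_stream and the 512-byte buffer are illustrative assumptions:

```c
#include <assert.h>
#include <stdio.h>

/* Copy everything from `in` to `out` through a fixed-size buffer.
 * Returns the number of bytes copied, or -1 on a read or write error.
 * copy_stream is a hypothetical helper name. */
static long copy_stream(FILE *in, FILE *out)
{
    char buf[512];
    long total = 0;
    size_t got;

    assert(in != NULL && out != NULL);
    /* size 1, count sizeof buf: fread then returns a byte count */
    while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, got, out) != got)
            return -1;            /* short write */
        total += (long)got;
    }
    return ferror(in) ? -1 : total;
}
```

Both files should be opened in binary mode ("rb" and "wb") so that a picture is copied byte for byte.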

Related

Trying to read an unknown string length from a file using fgetc()

So yeah, saw many similar questions to this one, but thought to try solving it my way. Getting huge amount of text blocks after running it (it compiles fine).
I'm trying to get a string of unknown size from a file. I thought about allocating pts with a size of 2 (1 char and the null terminator) and then increasing the size of the char array for every char that exceeds the size of the array.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
char *pts = NULL;
int temp = 0;
pts = malloc(2 * sizeof(char));
FILE *fp = fopen("txtfile", "r");
while (fgetc(fp) != EOF) {
if (strlen(pts) == temp) {
pts = realloc(pts, sizeof(char));
}
pts[temp] = fgetc(fp);
temp++;
}
printf("the full string is a s follows : %s\n", pts);
free(pts);
fclose(fp);
return 0;
}
You probably want something like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define CHUNK_SIZE 1000 // initial buffer size
int main()
{
int ch; // you need int, not char for EOF
int size = CHUNK_SIZE;
char *pts = malloc(CHUNK_SIZE);
FILE* fp = fopen("txtfile", "r");
int i = 0;
while ((ch = fgetc(fp)) != EOF) // read one char until EOF
{
pts[i++] = ch; // add char into buffer
if (i == size) // if buffer full ...
{
size += CHUNK_SIZE; // increase buffer size
pts = realloc(pts, size); // reallocate new size
}
}
pts[i] = 0; // add NUL terminator
printf("the full string is a s follows : %s\n", pts);
free(pts);
fclose(fp);
return 0;
}
Disclaimers:
this is untested code, it may not work, but it shows the idea
there is absolutely no error checking for brevity, you should add this.
there is room for other improvements, it can probably be done even more elegantly
Leaving aside for now the question of if you should do this at all:
You're pretty close on this solution but there are a few mistakes
while (fgetc(fp) != EOF) {
This line is going to read one char from the file and then discard it after comparing it against EOF. You'll need to save that byte to add to your buffer. A type of syntax like while ((tmp=fgetc(fp)) != EOF) should work.
pts = realloc(pts, sizeof(char));
Check the documentation for realloc, you'll need to pass in the new size in the second parameter.
pts = malloc(2 * sizeof(char));
You'll need to zero this memory after acquiring it. You probably also want to zero any memory given to you by realloc, or you may lose the null off the end of your string and strlen will be incorrect.
But as I alluded to earlier, using realloc in a loop like this when you've got a fair idea of the size of the buffer already is generally going to be non-idiomatic C design. Get the size of the file ahead of time and allocate enough space for all the data in your buffer. You can still realloc if you go over the size of the buffer, but do so using chunks of memory instead of one byte at a time.
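The points above (keep the result of fgetc in an int, pass the new total size to realloc, grow in chunks rather than one byte at a time) can be sketched as follows. The helper name read_text, the starting size of 64 and the doubling policy are assumptions chosen for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read all of `fp` into a NUL-terminated heap string, doubling the
 * buffer whenever it fills. Returns NULL on allocation failure.
 * read_text is a hypothetical helper; the caller frees the result. */
static char *read_text(FILE *fp)
{
    size_t size = 64, len = 0;
    char *text = malloc(size);
    int ch;                       /* int, so EOF is distinguishable */

    assert(fp != NULL);
    if (text == NULL)
        return NULL;
    while ((ch = fgetc(fp)) != EOF) {
        if (len + 1 == size) {    /* keep one byte free for the NUL */
            char *grown = realloc(text, size * 2);
            if (grown == NULL) {  /* the old block is still valid: free it */
                free(text);
                return NULL;
            }
            text = grown;
            size *= 2;
        }
        text[len++] = (char)ch;
    }
    text[len] = '\0';
    return text;
}
```

Assigning the result of realloc to a temporary pointer first avoids leaking the old block when the call fails.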
Probably the most efficient way (as mentioned in the comment by Fiddling Bits) is to read the whole file in one go, after first getting the file's size:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/stat.h>
int main()
{
size_t nchars = 0; // Declare here and set to zero...
// ... so we can optionally try using the "stat" function, if the O/S supports it...
struct stat st;
if (stat("txtfile", &st) == 0) nchars = st.st_size;
FILE* fp = fopen("txtfile", "rb"); // Make sure we open in BINARY mode!
if (nchars == 0) // This code will be used if the "stat" function is unavailable or failed ...
{
fseek(fp, 0, SEEK_END); // Go to end of file (NOTE: SEEK_END may not be implemented - but PROBABLY is!)
// while (fgetc(fp) != EOF) {} // If your system doesn't implement SEEK_END, you can do this instead:
nchars = (size_t)(ftell(fp)); // File size in bytes; the calloc below adds one for the NUL terminator
}
char* pts = calloc(nchars + 1, sizeof(char));
if (pts != NULL)
{
fseek(fp, 0, SEEK_SET); // Return to start of file...
fread(pts, sizeof(char), nchars, fp); // ... and read one great big chunk!
printf("the full string is a s follows : %s\n", pts);
free(pts);
}
else
{
printf("the file is too big for me to handle (%zu bytes)!", nchars);
}
fclose(fp);
return 0;
}
On the issue of the use of SEEK_END, see this cppreference page, where it states:
Library implementations are allowed to not meaningfully support SEEK_END (therefore, code using it has no real standard portability).
On whether or not you will be able to use the stat function, see this Wikipedia page. (But it is now available in MSVC on Windows!)

Copy a file with buffers of different sizes for read and write

I have been doing some practice problems for job interviews and I came across a function that I can't wrap my mind around. The idea is to create a function that takes the names of two files, the allowed buffer size for reading from file1 and the allowed buffer size for writing to file2. If the buffer sizes are the same, I know how to approach the question, but I am having problems figuring out how to move data between the buffers when the sizes are different. Part of the constraints is that we always have to fill the write buffer before writing it to the file. If the size of file1 is not a multiple of the write buffer size, we pad the last buffer transfer with zeros.
// input: name of two files made for copy, and their limited buffer sizes
// output: number of bytes copied
int fileCopy(char* file1,char* file2, int bufferSize1, int bufferSize2){
int bytesTransfered=0;
int bytesMoved=0;
char* buffer1, *buffer2;
FILE *fp1, *fp2;
fp1 = fopen(file1, "r");
if (fp1 == NULL) {
printf ("Not able to open this file");
return -1;
}
fp2 = fopen(file2, "w");
if (fp2 == NULL) {
printf ("Not able to open this file");
fclose(fp1);
return -1;
}
buffer1 = (char*) malloc (sizeof(char)*bufferSize1);
if (buffer1 == NULL) {
printf ("Memory error");
return -1;
}
buffer2 = (char*) malloc (sizeof(char)*bufferSize2);
if (buffer2 == NULL) {
printf ("Memory error");
return -1;
}
bytesMoved=fread(buffer1, sizeof(buffer1),1,fp1);
//TODO: Fill buffer2 with maximum amount, either when buffer1 <= buffer2 or buffer1 > buffer2
//How do I iterate trough file1 and ensuring to always fill buffer 2 before writing?
bytesTransfered+=fwrite(buffer2, sizeof(buffer2),1,fp2);
fclose(fp1);
fclose(fp2);
return bytesTransfered;
}
How should I write the while loop for the buffer transfers before the fwrites?
I am having problems figuring how to move data between the buffers when the sizes are of different
Lay out a plan. For "some practice problems for job interviews", a good plan and the ability to justify it are important. Coding, although important, is secondary.
given valid: 2 FILE *, 2 buffers and their sizes
while write active && read active
    while write buffer not full && reading active
        if read buffer empty
            read
            update read active
        append min(read buffer length, write buffer available space) of read to write buffer
    if write buffer not empty
        pad write buffer
        write
        update write active
return file status
Now code it. A more robust solution would use a struct to group the FILE*, buffer, size, offset, length, active variables.
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
// Return true on problem
static bool rw(FILE *in_s, void *in_buf, size_t in_sz, FILE *out_s,
               void *out_buf, size_t out_sz) {
    size_t in_offset = 0;
    size_t in_length = 0;
    bool in_active = true;
    size_t out_length = 0;
    bool out_active = true;
    while (in_active && out_active) {
        // While room for more data
        while (out_length < out_sz && in_active) {
            if (in_length == 0) {
                in_offset = 0;
                // Element size 1, so the return value is a byte count
                in_length = fread(in_buf, 1, in_sz, in_s);
                in_active = in_length > 0;
            }
            // Append a portion of `in` to `out`
            size_t room = out_sz - out_length;
            size_t chunk = in_length < room ? in_length : room;
            memcpy((char*) out_buf + out_length, (char*) in_buf + in_offset, chunk);
            out_length += chunk;
            in_length -= chunk;
            in_offset += chunk;
        }
        if (out_length > 0) {
            // Padding only occurs, maybe, on last write
            memset((char*) out_buf + out_length, 0, out_sz - out_length);
            out_active = fwrite(out_buf, 1, out_sz, out_s) == out_sz;
            out_length = 0;
        }
    }
    return ferror(in_s) || ferror(out_s);
}
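The remark about grouping the FILE*, buffer, size, offset, length and active variables in a struct could look roughly like this. It is only a sketch: struct side and copy_padded are hypothetical names, and the byte count it returns includes the zero padding of the last write:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical grouping of the per-stream state named above. */
struct side {
    FILE  *fp;
    char  *buf;
    size_t size;    /* capacity of buf */
    size_t offset;  /* next unread byte (read side only) */
    size_t length;  /* bytes currently held in buf */
    bool   active;  /* false after end of file or an error */
};

/* Copy r->fp to w->fp; every write is exactly w->size bytes, the last
 * one zero-padded. Returns the number of bytes written, or -1 on error. */
static long copy_padded(struct side *r, struct side *w)
{
    long written = 0;

    assert(r->size > 0 && w->size > 0);
    r->offset = r->length = w->length = 0;
    r->active = w->active = true;

    while (r->active && w->active) {
        while (w->length < w->size && r->active) {
            if (r->length == 0) {               /* refill the read buffer */
                r->offset = 0;
                r->length = fread(r->buf, 1, r->size, r->fp);
                r->active = r->length > 0;
            }
            size_t room = w->size - w->length;
            size_t chunk = r->length < room ? r->length : room;
            memcpy(w->buf + w->length, r->buf + r->offset, chunk);
            w->length += chunk;
            r->length -= chunk;
            r->offset += chunk;
        }
        if (w->length > 0) {
            memset(w->buf + w->length, 0, w->size - w->length);  /* pad */
            w->active = fwrite(w->buf, 1, w->size, w->fp) == w->size;
            if (w->active)
                written += (long)w->size;
            w->length = 0;
        }
    }
    return (ferror(r->fp) || ferror(w->fp)) ? -1 : written;
}
```

With a 3-byte read buffer and a 4-byte write buffer, a 10-byte file is written as three full 4-byte blocks, the last one ending in two zero bytes.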
Other notes:
Casting malloc() results is not needed. @Gerhardh
// buffer1 = (char*) malloc (sizeof(char)*bufferSize1);
buffer1 = malloc (sizeof *buffer1 * bufferSize1);
Use stderr for error messages. @Jonathan Leffler
Open the file in binary.
size_t is more robust for array/buffer sizes than int.
Consider sizeof buffer1 vs. sizeof (buffer1) as parens not needed with sizeof object
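int offset = 0; /* position in buffer1 of the next byte to copy */
while (bytesMoved > 0) {
    for (i = 0; i < bytesMoved && i < bufferSize2; i++)
        buffer2[i] = buffer1[offset + i];
    /* element size 1, so fwrite returns the number of bytes written */
    bytesTransfered += fwrite(buffer2, 1, i, fp2);
    offset += i;
    bytesMoved -= i;
}
If bufferSize1 is smaller than the file size you need an outer loop.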
As the comments to your question have indicated, this solution is not the best way to transfer data from 1 file to another file. However, your case has certain restrictions, which this solution accounts for.
(1) Since you are using a buffer, you do not need to read and write 1 char at a time; instead, you can make as few calls to those functions as possible.
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
From the man page for fread: with size set to 1, nmemb can be bufferSize1.
(2) You will need to check the return value from fread() (i.e. bytesMoved) and compare it with both bufferSize1 and bufferSize2. If (a) bytesMoved is equal to bufferSize1, or (b) bufferSize2 is less than bufferSize1 or less than the return value from fread(), then you know that there is still data that needs to be read (or written). You should therefore begin the next transfer of data and, when it is complete, return to the step where you left off.
Note: fread() and fwrite() continue from where the stream's file position was left, so data larger than the buffer sizes is handled by repeated calls.
PseudoCode:
/* keep reading from file 1 until nothing is left to read */
while ((bytesMoved = fread(buffer1, 1, bufferSize1, fp1)) > 0)
{
    /* hand buffer1 to buffer2 in slices of at most bufferSize2 bytes */
    for (i = 0; i < bytesMoved; i += n)
    {
        n = bytesMoved - i;
        if (n > bufferSize2)
            n = bufferSize2; /* buffer2 is smaller than what is left in buffer1 */
        memcpy(buffer2, buffer1 + i, n);
        bytesTransfered += fwrite(buffer2, 1, n, fp2);
    }
    /* buffer1 is simply overwritten by the next fread(), no reset is needed */
}

C read all input once optimization

First of all, I'm looking for optimization: fast execution time.
I would like to read data from input in C. Here is my code (Linux):
int main(void) {
char command_str[MAX_COMMAND_SIZE];
while (!feof(stdin)) {
fgets(command_str, MAX_COMMAND_SIZE, stdin);
// Parse data
}
return EXIT_SUCCESS;
}
According to this post Read a line of input faster than fgets? read() function seems to be the solution.
The data input is like:
100 C
1884231 B
8978456 Z
...
From a file, so I execute my program like ./myapp < mytext.txt
It is not possible to know how many entries there are; it could be 10, 10000 or even more.
From this post
Drop all the casts on malloc and realloc; they aren't necessary and clutter up the code
So if I use a dynamic array my app will be slower I think.
The idea is:
Read the whole input in one go into a buffer.
Process the lines from that buffer.
That's the fastest possible solution.
If someone would help me. Thanks in advance.
while (!feof(f)) is always wrong. Use this instead:
#include <stdio.h>
#include <stdlib.h> /* for EXIT_SUCCESS */
int main(void) {
char command_str[MAX_COMMAND_SIZE];
while (fgets(command_str, MAX_COMMAND_SIZE, stdin)) {
// Parse data
}
return EXIT_SUCCESS;
}
Reading file contents faster than fgets() is feasible, but seems beyond your skill level. Learn the simple stuff first. There is an awful lot that can be achieved with standard line by line readers... Very few use cases warrant the use of more advanced approaches.
If you want to read the whole input and parse it as a single string, here is a generic solution that should work for all (finite) input types:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
size_t pos = 0, size = 1025, nread;
char *buf0 = malloc(size);
char *buf = buf0;
for (;;) {
if (buf == NULL) {
fprintf(stderr, "not enough memory for %zu bytes\n", size);
free(buf0);
exit(1);
}
nread = fread(buf + pos, 1, size - pos - 1, stdin);
if (nread == 0)
break;
pos += nread;
/* Grow the buffer size exponentially (Fibonacci ratio) */
if (size - pos < size / 2)
size += size / 2 + size / 8;
buf = realloc(buf0 = buf, size);
}
buf[pos] = '\0';
// parse pos bytes of data in buf as a string
printf("read %zu bytes\n", strlen(buf));
free(buf);
return EXIT_SUCCESS;
}
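Once the whole input sits in one NUL-terminated buffer, the "process the lines from that buffer" step could be sketched like this. The line format (a number, one space, one character) is an assumption taken from the sample input in the question, and parse_lines is a hypothetical helper that only counts valid lines and sums their numbers:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Walk a NUL-terminated buffer holding lines such as "100 C\n", count
 * the well-formed ones and add each number to *sum.
 * parse_lines is a hypothetical helper name. */
static size_t parse_lines(const char *buf, long *sum)
{
    size_t lines = 0;
    const char *p = buf;

    assert(buf != NULL && sum != NULL);
    *sum = 0;
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');     /* end of the current line */
        if (eol == NULL)
            eol = p + strlen(p);               /* last line without '\n' */

        char *end;
        long value = strtol(p, &end, 10);      /* parse the leading number */
        /* accept only "number, one space, one character" before end of line */
        if (end != p && eol - end == 2 && end[0] == ' ') {
            *sum += value;
            lines++;
        }
        p = (*eol == '\n') ? eol + 1 : eol;    /* step to the next line */
    }
    return lines;
}
```

No per-line allocation or copying is needed; the parser only moves a pointer through the buffer.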
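Maybe you could use fseek (stdin, 0, SEEK_END) to go to the end of the standard input stream, then use ftell (stdin) to get its size in bytes, then allocate memory to hold all of that in a buffer and then process its contents. Note that this only works when stdin is redirected from a regular file, as in ./myapp < mytext.txt; a pipe or a terminal cannot be seeked.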

Trying to make program that counts number of bytes in a specified file (in C)

I am currently attempting to write a program that will tell its user how many times the specified 8-bit byte appears in the specified file.
I have some ground work laid out, but when it comes to making sure that the file makes it in to an array or buffer or whatever format I should put the file data into to check for the bytes, I feel I'm probably very far off from using the correct methods.
After that, I need to check whatever the file data gets put in to for the byte specified, but I am also unsure how to do this.
I think I may be over-complicating this quite a bit, so explaining anything that needs to be changed or that can just be scrapped completely is greatly appreciated.
Hopefully didn't leave out any important details.
Everything seems to be running (this code compiles), but when I try to printf the final statement at the bottom, it does not spit out the statement.
I have a feeling I just did not set up the final for loop correctly at all..
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
//#define BUFFER_SIZE (4096)
main(int argc, char *argv[]){ //argc = arg count, argv = array of arguements
char buffer[4096];
int readBuffer;
int b;
int byteCount = 0;
b = atoi(argv[2]);
FILE *f = fopen(argv[1], "rb");
unsigned long count = 0;
int ch;
if(argc!=3){ /* required number of args = 3 */
fprintf(stderr,"Too few/many arguements given.\n");
fprintf(stderr, "Proper usage: ./bcount path byte\n");
exit(0);
}
else{ /*open and read file*/
if(f == 0){
fprintf(stderr, "File could not be opened.\n");
exit(0);
}
}
if((b <= -1) || (b >= 256)){ /*checks to see if the byte provided is between 0 & 255*/
fprintf(stderr, "Byte provided must be between 0 and 255.\n");
exit(0);
}
else{
printf("Byte provided fits in range.\n");
}
int i = 0;
int k;
int newFile[i];
fseek(f, 0, SEEK_END);
int lengthOfFile = ftell(f);
for(k = 0; k < sizeof(buffer); k++){
while(fgets(buffer, lengthOfFile, f) != NULL){
newFile[i] = buffer[k];
i++;
}
}
if(newFile[i] = buffer[k]){
printf("same size\n");
}
for(i = 0; i < sizeof(newFile); i++){
if(b == newFile[i]){
byteCount++;
}
printf("Final for loop is working???\n");
}
}
OP is mixing fgets() with binary reads of a file.
fgets() reads a file up to the buffer size provided or until it reaches a \n byte. It is intended for text processing. The typical way to determine how much data was read via fgets() is to look for a final \n, which may or may not be there. The data read could have embedded NUL bytes in it, so it becomes problematic to know whether to stop scanning the buffer on a NUL byte or on a \n.
Fortunately this can all be dispensed with, including the file seek and buffers.
// "rb" should be used when looking at a file in binary. C11 7.21.5.3 3
FILE *f = fopen(argv[1], "rb");
b = atoi(argv[2]);
unsigned long byteCount = 0;
int ch;
while ((ch = fgetc(f)) != EOF) {
if (ch == b) {
byteCount++;
}
}
The OP error checking is good. But the for(k = 0; k < sizeof(buffer); k++){ loop and its contents had various issues. OP had if(b = newFile[i]){ which should have been if(b == newFile[i]){
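If the per-byte fgetc() loop turns out to be too slow for large files, the same count can be done on blocks read with fread(). This is a sketch only; count_byte and the 4096-byte block size are assumptions:

```c
#include <assert.h>
#include <stdio.h>

/* Count how many bytes in `f` equal `target` (0..255), reading the
 * file in 4096-byte blocks. count_byte is a hypothetical helper name. */
static unsigned long count_byte(FILE *f, int target)
{
    unsigned char block[4096];
    unsigned long count = 0;
    size_t got;

    assert(f != NULL && target >= 0 && target <= 255);
    /* fread reports how many bytes it stored, so embedded NUL bytes
     * and missing newlines are not a problem */
    while ((got = fread(block, 1, sizeof block, f)) > 0) {
        for (size_t i = 0; i < got; i++) {
            if (block[i] == target)
                count++;
        }
    }
    return count;
}
```

The buffer is unsigned char so that bytes above 127 compare equal to values in the 0 to 255 range.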
Not really an ANSWER --
Chux corrected the code, this is just more than fits in a comment.
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
struct stat st;
int rc=0;
if(argv[1])
{
rc=stat(argv[1], &st);
if(rc==0)
printf("bytes in file %s: %lld\n", argv[1], (long long) st.st_size);
else
{
perror("Cannot stat file");
exit(EXIT_FAILURE);
}
return EXIT_SUCCESS;
}
return EXIT_FAILURE;
}
The stat() call is handy for getting file size and for determining file existence at the same time.
Applications use stat instead of reading the whole file, which is great for gigantic files.

Size of each record in a binary file

I have a binary file and I will be using fread to read the data from this binary file into an array of structures.
However, I don't know what value to pass to fread as its second argument. I know the file size is 536870912 bits. The binary file was constructed on the basis of being accessed for a 512^3 array. This means each data entry is of type float in the binary file with 4 bytes specified for each data element.
I made an error with the mention of bits. I read what was outputted by a C program finding the size of the file - it outputted 536870912 bits! Apologies to anyone confused.
Here is the code I'm using to read the data from the binary file into my array of structures (a simplified structure - there are 10 other parameters!)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
// Define the model structure
struct model {
float density;
};
// Entry point for the program
int main () {
int counter;
long lSize;
char * buffer;
size_t result;
FILE *pFile;
int i,j,k,ibox; /* Loop indices for the physical grid */
struct model ***mymodel;
pFile = fopen("core1_dens_0107.bin","rb");
if (pFile == NULL) { printf("Unable to open density file!"); exit(1); }
// obtain file size:
fseek (pFile , 0 , SEEK_END);
lSize = ftell (pFile);
rewind (pFile);
printf( "File size : %lu Bits \n", lSize );
for ( j = 0 ; j < 512 ; j++ ) {
for ( k = 0; k < 512; k++ ) {
for ( i = 0; i < 512; i++ ) {
fread(&mymodel[i][j][k].density,4,1,pFile);
printf("%f \n",mymodel[i][j][k].density);
}
}
}
fclose(pFile);
return 0;
}
Supposing you have already opened the file and you have your stream (a FILE *) myStream, it should be as simple as this:
#define MY_DIM 512
/* 512^3 floats take 512 MiB, too much for the stack, so use static storage (or malloc) */
static float buffer[MY_DIM][MY_DIM][MY_DIM];
size_t readCount; /* fread returns the number of elements read, not bytes */
int i,j,k;
for (k = 0; k < MY_DIM; k++)
    for (j = 0; j < MY_DIM; j++) {
        readCount = fread(buffer[k][j], sizeof(float), MY_DIM, myStream); /* no (void*) cast is needed */
        if (readCount < MY_DIM) /* unexpectedly reached the end of the file */
            goto endOfTheLoop; /* without reading all the data needed */
        /* You could also print a warning message */
    }
endOfTheLoop:
//Now close the input file, use fclose or something
//Now that you have read all the data, you have to put it in your array of struct:
for (k = 0; k < MY_DIM; k++)
for (j = 0; j < MY_DIM; j++)
for (i = 0; i < MY_DIM; i++)
mymodel[k][j][i].density = buffer[k][j][i];
You can pass whatever value of the 2nd argument is most convenient for your program. If you want to process the file one structure at a time, do:
nread = fread(&your_struct, 1, sizeof your_struct, stream);
If you have an array of structures, e.g.
struct foo your_struct[STRUCT_COUNT];
you can do:
nread = fread(your_struct, STRUCT_COUNT, sizeof *your_struct, stream);
size_t fread(void *ptr, size_t size, size_t nmemb, FILE * stream );
will attempt to read nmemb blocks of size bytes each. Its return value counts only complete blocks. If your blocks are 4 bits long then I suggest you read them byte by byte; otherwise use the size argument to specify the block size.
For instance
fread(buffer, 1, 1024, stdin);
will attempt to read 1024 bytes, but may stop at any point.
fread(buffer, 4, 256, stdin);
will also attempt to read 1024 bytes, but in blocks of 4 bytes, 256 blocks in total. The return value counts only complete blocks.
fread(buffer, 1024, 1, stdin);
will attempt to read one block of 1024 bytes. If it cannot read the full block, it returns 0.
If you wish to read in the entire file then you can do it in blocks of 4 via:
size_t read = 0, read_now; /* both counted in 4-byte blocks */
size_t blocks = filesize / 4;
while (read < blocks && (read_now = fread((char *) buffer + 4 * read, 4, blocks - read, in)) > 0)
    read += read_now;
or you can attempt to read the whole thing in one go:
fread(buffer, filesize, 1, in);
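For the float grid in the question, a heap-allocated array read in one call might look like the sketch below. The helper name read_floats is hypothetical; passing sizeof(float) as the size argument makes the return value a count of floats:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read exactly `count` floats from `fp` into a newly allocated array.
 * Returns NULL if memory runs out or the file holds fewer values.
 * read_floats is a hypothetical helper; the caller frees the result. */
static float *read_floats(FILE *fp, size_t count)
{
    assert(fp != NULL && count > 0);
    float *data = malloc(count * sizeof *data);
    if (data == NULL)
        return NULL;

    /* size = sizeof(float), nmemb = count: the return value counts floats */
    size_t got = fread(data, sizeof *data, count, fp);
    if (got != count) {
        free(data);
        return NULL;
    }
    return data;
}
```

For the 512^3 file, count would be (size_t)512 * 512 * 512. Assuming the file order matches the question's loops (j outermost, i innermost), the value for indices (i, j, k) is at data[((size_t)j * 512 + k) * 512 + i].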

Resources