I'm a beginner in C and, as an exercise, wanted to write a simple function that reads the contents of a file and returns them as a string.
Here is my solution, which I think works, but are there any obvious bad practices or suboptimal code here? For example, I manually added a \0 at the end of the string, but I don't know if it is really necessary...
#include <stdio.h>
#include <stdlib.h>
char *readFile(char *path)
{
    //open file
    FILE *file = fopen(path, "r");
    //if broken
    if (file == NULL)
    {
        printf("Erreur");
        return NULL;
    }
    //return variable
    char *result;
    //length of the file
    int len;
    fseek(file, 0, SEEK_END);
    len = ftell(file);
    fseek(file, 0, SEEK_SET);
    //initialising return variable
    result = (char*) malloc(sizeof(char) * (len + 1));
    int c;
    int i = 0;
    while (feof(file) == 0)
    {
        c = fgetc(file);
        if (c != EOF)
        {
            printf("%04x -> %c\n", c, c);
            *(result + i) = c;
            i++;
        }
    }
    *(result + i) = '\0';
    printf("len : %i\n", len);
    fclose(file);
    return result;
}
I'd replace this:
int c;
int i = 0;
while (feof(file) == 0)
{
    c = fgetc(file);
    if (c != EOF)
    {
        printf("%04x -> %c\n", c, c);
        *(result + i) = c;
        i++;
    }
}
with this:
size_t i = fread(result, 1, len, file);
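It's much shorter, it's correct, and it's certainly faster. Because i now holds the number of bytes fread actually read, your existing *(result + i) = '\0'; line still terminates the string in the right place.
There is still room for improvement, though: for example, you could add error handling, since fread can fail.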
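A minimal sketch of the function with the error handling suggested above. A few choices here are my own assumptions rather than part of the answer: the file is opened in binary mode ("rb") so that the size reported by ftell() is the byte count fread() delivers, failure is reported by returning NULL instead of printing, and the caller is expected to free() the result.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at `path` into a NUL-terminated heap buffer.
 * Returns NULL on any failure; the caller must free() the result. */
char *readFile(const char *path)
{
    /* binary mode, so the size from ftell() is the byte count fread() delivers */
    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return NULL;

    if (fseek(file, 0, SEEK_END) != 0)
    {
        fclose(file);
        return NULL;
    }
    long len = ftell(file);                     /* -1L on failure */
    if (len < 0 || fseek(file, 0, SEEK_SET) != 0)
    {
        fclose(file);
        return NULL;
    }

    char *result = malloc((size_t)len + 1);     /* +1 for the terminating '\0' */
    if (result == NULL)
    {
        fclose(file);
        return NULL;
    }

    size_t n = fread(result, 1, (size_t)len, file);
    if (n != (size_t)len && ferror(file))       /* short read caused by an I/O error */
    {
        free(result);
        fclose(file);
        return NULL;
    }
    result[n] = '\0';                           /* terminate after the bytes actually read */
    fclose(file);
    return result;
}
```

Typical use would be char *s = readFile("notes.txt"); if (s) { puts(s); free(s); }, where "notes.txt" is just a placeholder name.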
Since you already have the length of the file, you could also read it all at once instead of char by char.
Another implementation of your function, for example:
char *readFile(char *path)
{
    //open file
    FILE *file = fopen(path, "r");
    //if broken
    if (file == NULL)
    {
        printf("Erreur");
        return NULL;
    }
    //return variable
    char *result;
    //length of the file
    int len;
    fseek(file, 0, SEEK_END);
    len = ftell(file);
    fseek(file, 0, SEEK_SET);
    //initialising return variable
    result = (char*) malloc(sizeof(char) * (len + 1));
    //malloc can fail: don't write through a NULL pointer
    if (result == NULL)
    {
        fclose(file);
        return NULL;
    }
    //read everything in one call; i is the number of bytes actually read
    size_t i = fread(result, sizeof(char), len, file);
    *(result + i) = '\0';
    printf("len : %i\n", len);
    fclose(file);
    return result;
}
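On the question of the manual \0: it is needed whenever the result is used as a C string, because malloc() does not zero the memory and functions such as strlen() or printf("%s") keep reading until they hit a zero byte. The terminator does not tell you the length, though, if the file itself contains zero bytes. The sketch below uses a different technique from the fseek()/ftell() approach: it grows the buffer with realloc() and reports the byte count through an out-parameter, so it also works on streams that cannot seek, such as stdin. The name readAll and its signature are my own, not from the thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads everything remaining in `fp` into a NUL-terminated heap buffer,
 * growing it with realloc() instead of asking for the size up front.
 * Stores the number of bytes read in *outLen (the data may contain '\0').
 * Returns NULL on allocation or read failure; the caller must free() it. */
char *readAll(FILE *fp, size_t *outLen)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* always leave one byte spare for the terminator */
    while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0)
    {
        len += n;
        if (len + 1 == cap)                 /* buffer full: double it */
        {
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL)
            {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    if (ferror(fp))
    {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (outLen != NULL)
        *outLen = len;
    return buf;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the file size.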
void test(char *buffer, int size)
{
    int length = strlen(buffer);
    for (int i = 0; i <= length; ++i)
    {
        if (buffer[i] == '"')
        {
            int _size = i + size;
            if (_size > length)
                continue;
            if (buffer[i + size] == '"')
            {
            }
        }
    }
}
This is how I read the file.
FILE *file = NULL;
size_t filesize = 0;
uint8_t *filebuffer = 0;
file = fopen("tokens.txt", "r");
if (file)
{
    fseek(file, 0, SEEK_END);
    filesize = ftell(file);
    fseek(file, 0, SEEK_SET);
    filebuffer = calloc(filesize + 1, 1);
    if (filebuffer)
    {
        fread(filebuffer, 1, filesize, file);
        for (size_t i = 0; i < filesize; i++)
        {
            if (filebuffer[i] == 0)
                filebuffer[i] = '.';
        }
        char array[filesize];
        strncpy(array, filebuffer, filesize);
        array[filesize] = '\0';
        test(array, 59);
    }
}
Here "array" is char array[filesize]; and "filesize" is ftell(file); (the file is valid and not NULL). The content of the file is asd"12345678912345678912345678912345678911111231231231231231232"asdasdasdasdasdasdss
For some reason it reaches the "continue;" even when the condition is not true...
Edit: I printed the values inside the if block and got:
Size: 122
Length: 84
Does anyone have an idea how to solve it?
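array[filesize] = '\0'; // access outside of array boundaries
array has only filesize elements, so its valid indices are 0 to filesize - 1, and this write is undefined behavior. Declare it with one extra element, char array[filesize + 1];, so there is room for the terminator.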
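As an alternative to the variable-length array plus the strncpy() copy, here is a sketch that reads straight into a heap buffer with room for the terminator and keeps the question's replace-NUL-with-'.' step. The helper name loadTokens and its out-parameter are hypothetical, and it opens the file in "rb" mode rather than the question's "r". A heap buffer also avoids putting a large file on the stack, which a VLA would do.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads `path` into a heap buffer one byte larger than the
 * file, replaces embedded NUL bytes with '.', and terminates the result.
 * Stores the number of bytes read in *outSize if outSize is not NULL. */
char *loadTokens(const char *path, size_t *outSize)
{
    FILE *file = fopen(path, "rb");
    if (file == NULL)
        return NULL;
    fseek(file, 0, SEEK_END);
    long filesize = ftell(file);
    fseek(file, 0, SEEK_SET);
    if (filesize < 0)
    {
        fclose(file);
        return NULL;
    }

    /* filesize + 1 elements: valid indices are 0 .. filesize, so the terminator fits */
    char *buffer = calloc((size_t)filesize + 1, 1);
    if (buffer == NULL)
    {
        fclose(file);
        return NULL;
    }
    size_t n = fread(buffer, 1, (size_t)filesize, file);
    fclose(file);

    for (size_t i = 0; i < n; i++)
    {
        if (buffer[i] == '\0')
            buffer[i] = '.';
    }
    buffer[n] = '\0';
    if (outSize != NULL)
        *outSize = n;
    return buffer;
}
```

The result can then be passed to test() directly, e.g. test(buffer, 59);, followed by free(buffer).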
I am trying to copy binary files from src to dst. This function seems to copy all of the bytes, but when I open both files in Hex Workshop I see that the dst file is always missing 3 bytes at the end. These 3 bytes should be 00 00 00, and their absence prevents me from opening the dst file.
void binaryCopy(char **argv) {
    int *buf = 0;
    int elements = 0;
    int size = 0, wantOverwrite = 0;
    FILE *src = fopen(argv[SRC_POS], "rb");
    FILE *dst = fopen(argv[DST_POS], "w+b");
    if (src) {
        if (dst) {
            wantOverwrite = overwrite();
        }
        if (wantOverwrite) {
            fseek(src, 0L, SEEK_END);
            size = ftell(src);
            fseek(src, 0L, SEEK_SET);
            buf = (int *)malloc(size);
            elements = fread(buf, BYTE_SIZE, size / BYTE_SIZE, src);
            fwrite(buf, BYTE_SIZE, elements, dst);
            printf("copy completed");
            free(buf);
        }
    }
    fclose(dst);
    fclose(src);
}
There are several problems in your function as written.
fopen(dstFilename, "w+b"); truncates the file, so your overwrite check later is meaningless.
You're not checking for NULL after malloc, and your buffer should be an unsigned char*, since that is how fread/fwrite treat the data.
At the end, both fclose calls could receive NULL file pointers, likely resulting in a crash. You should move them into the scopes where you know each file was successfully opened.
The big problem, the one that prompted this question, is that you are not handling the case where the size of the file is not an exact multiple of BYTE_SIZE, whatever that is. Since you allocated enough memory for the whole file, you should just read and write the whole file: fread(buf, 1, size, src); and fwrite(buf, 1, size, dst);. In general it is best to make the element size parameter of fread/fwrite 1 and the count the number of bytes you want to read or write. There's no math to go wrong, and you can tell exactly how many bytes were read or written.
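To see why an element size of 1 lets you tell exactly how many bytes were read, here is a small hypothetical helper, freadReport, that compares fread()'s return value with how far the stream position actually moved. fread() returns the number of complete elements, so with 4-byte elements on a 7-byte stream, asking for 2 elements returns 1 even though all 7 bytes are consumed. The C standard says the value of a partially read element is indeterminate, and the return value never mentions those bytes.

```c
#include <stdio.h>

/* Hypothetical helper: calls fread() and also reports, through *bytesConsumed,
 * how far the stream position actually moved. */
size_t freadReport(void *buf, size_t elemSize, size_t count, FILE *fp, long *bytesConsumed)
{
    long before = ftell(fp);
    size_t elements = fread(buf, elemSize, count, fp);  /* counts whole elements only */
    *bytesConsumed = ftell(fp) - before;                /* counts every byte read */
    return elements;
}
```

With elemSize 1 the two numbers always agree, which is the point made above.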
Here's a version of your original function that I've corrected and annotated so it works if nothing goes wrong.
void originalBinaryCopy(const char *srcFilename, const char *dstFilename)
{
    //odd size to ensure remainder
    const size_t BYTE_SIZE = 777;
    int *buf = 0;
    int elements = 0;
    int size = 0, wantOverwrite = 0;
    FILE *src = fopen(srcFilename, "rb");
    //This truncates dst, so the overwrite check is meaningless
    FILE *dst = fopen(dstFilename, "w+b");
    if (src)
    {
        if (dst)
        {
            fseek(src, 0L, SEEK_END);
            size = ftell(src);
            fseek(src, 0L, SEEK_SET);
            //always check for NULL after malloc - This should be a char*
            buf = (int *)malloc(size);
            if (!buf)
            {
                fclose(dst);
                fclose(src);
                return;
            }
            elements = fread(buf, BYTE_SIZE, size / BYTE_SIZE, src);
            fwrite(buf, BYTE_SIZE, elements, dst);
            //added, copy remainder
            elements = fread(buf, 1, size % BYTE_SIZE, src);
            fwrite(buf, 1, size % BYTE_SIZE, dst);
            //end
            printf("copy completed %s -> %s\n", srcFilename, dstFilename);
            free(buf);
        }
    }
    //dst could be NULL here, move inside if(dst) scope above
    fclose(dst);
    //src could be NULL here, move inside if(src) scope above
    fclose(src);
    if (comp(srcFilename, dstFilename) != 0)
    {
        printf("compare failed - %s -> %s\n", srcFilename, dstFilename);
    }
}
Notice how the remainder is handled at the end.
Here is how I would handle copying files along with a test suite to create, copy, and verify a set of files. It shows how to avoid truncating the destination if you don't want to and has quite a bit of error checking in the actual functions. I did not include any specific error checking on the caller side, but for real code I would have enumerated all of the possible errors and used those return values to pass to an error handling function that could print them out and possibly exit the program.
Manipulating files is one thing you want to be VERY careful about, since there's potential for data loss if your code doesn't work, so before you use it on real files, make sure it's 100% solid with test files.
//<malloc.h> is non-standard; malloc/calloc/free come from <stdlib.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#define TEST_FILE_MIN 1024
#define TEST_FILE_MAX (1024 * 1024)
const char *src_pattern = "src_file_%08x.bin";
const char *dst_pattern = "dst_file_%08x.bin";
void createTestFiles(const char *pattern)
{
    char filename[256] = { 0 };
    char buffer[1024];
    for (size_t i = 0; i < sizeof(buffer); ++i)
    {
        buffer[i] = rand();
    }
    for (size_t i = TEST_FILE_MIN; i <= TEST_FILE_MAX; i *= 2)
    {
        //%08x expects an unsigned int, so cast the size_t
        sprintf(filename, pattern, (unsigned)i);
        FILE *dst = fopen(filename, "wb");
        if (dst)
        {
            size_t reps = i / TEST_FILE_MIN;
            for (size_t w = 0; w < reps; ++w)
            {
                fwrite(buffer, 1, sizeof(buffer), dst);
            }
            fclose(dst);
        }
    }
}
int comp(const char *srcFilename, const char *dstFilename)
{
    FILE *src = fopen(srcFilename, "rb");
    if (!src)
    {
        return -1;
    }
    //open for reading to check for existence
    FILE *dst = fopen(dstFilename, "rb");
    if (!dst)
    {
        fclose(src);
        return -2;
    }
    fseek(src, 0, SEEK_END);
    size_t srcSize = ftell(src);
    fseek(src, 0, SEEK_SET);
    fseek(dst, 0, SEEK_END);
    size_t dstSize = ftell(dst);
    fseek(dst, 0, SEEK_SET);
    if (srcSize == 0 || dstSize == 0 || srcSize != dstSize)
    {
        fclose(src);
        fclose(dst);
        return -3;
    }
    unsigned char *srcBuf = (unsigned char *)calloc(1, srcSize);
    unsigned char *dstBuf = (unsigned char *)calloc(1, srcSize);
    int result = 0;
    if (!srcBuf || !dstBuf)
    {
        result = -4;
    }
    else if (fread(srcBuf, 1, srcSize, src) != srcSize)
    {
        result = -5;
    }
    else if (fread(dstBuf, 1, dstSize, dst) != dstSize)
    {
        result = -6;
    }
    else
    {
        //result * 100 to move this error outside the range of the other general errors from this function.
        result = memcmp(srcBuf, dstBuf, srcSize) * 100;
    }
    fclose(src);
    fclose(dst);
    //free(NULL) is a no-op, so the buffers are released on every path
    free(srcBuf);
    free(dstBuf);
    return result;
}
void originalBinaryCopy(const char *srcFilename, const char *dstFilename)
{
    //odd size to ensure remainder
    const size_t BYTE_SIZE = 777;
    int *buf = 0;
    int elements = 0;
    int size = 0, wantOverwrite = 0;
    FILE *src = fopen(srcFilename, "rb");
    //This truncates dst, so the overwrite check is meaningless
    FILE *dst = fopen(dstFilename, "w+b");
    if (src)
    {
        if (dst)
        {
            fseek(src, 0L, SEEK_END);
            size = ftell(src);
            fseek(src, 0L, SEEK_SET);
            //always check for NULL after malloc - This should be a char*
            buf = (int *)malloc(size);
            if (!buf)
            {
                fclose(dst);
                fclose(src);
                return;
            }
            elements = fread(buf, BYTE_SIZE, size / BYTE_SIZE, src);
            fwrite(buf, BYTE_SIZE, elements, dst);
            //added, copy remainder
            elements = fread(buf, 1, size % BYTE_SIZE, src);
            fwrite(buf, 1, size % BYTE_SIZE, dst);
            //end
            printf("copy completed %s -> %s\n", srcFilename, dstFilename);
            free(buf);
        }
    }
    //dst could be NULL here, move inside if(dst) scope above
    fclose(dst);
    //src could be NULL here, move inside if(src) scope above
    fclose(src);
    if (comp(srcFilename, dstFilename) != 0)
    {
        printf("compare failed - %s -> %s\n", srcFilename, dstFilename);
    }
}
int binaryCopy(const char *srcFilename, const char *dstFilename, bool overwrite)
{
    //arbitrary odd size so we can make sure we handle a partial buffer.
    //assuming the code tests successfully I'd use something like 64 * 1024.
    unsigned char buffer[7777] = { 0 };
    FILE *src = fopen(srcFilename, "rb");
    if (!src)
    {
        //Error, source file could not be opened
        return -1;
    }
    //open for reading to check for existence
    FILE *dst = fopen(dstFilename, "rb");
    if (dst)
    {
        if (!overwrite)
        {
            //Error, dest file exists and we can't overwrite it
            fclose(src);
            fclose(dst);
            return -2;
        }
        //reopen dst for writing; if freopen fails it has already closed the
        //original stream, so only keep the NULL and let the check below handle it
        dst = freopen(dstFilename, "wb", dst);
    }
    else
    {
        //it didn't exist, create it.
        dst = fopen(dstFilename, "wb");
    }
    if (!dst)
    {
        //Error, dest file couldn't be opened
        fclose(src);
        return -3;
    }
    //Get the size of the source file for comparison with what we read and write.
    fseek(src, 0, SEEK_END);
    size_t srcSize = ftell(src);
    fseek(src, 0, SEEK_SET);
    size_t totalRead = 0;
    size_t totalWritten = 0;
    size_t bytesRead = 0;
    while ((bytesRead = fread(buffer, 1, sizeof(buffer), src)) > 0)
    {
        totalRead += bytesRead;
        totalWritten += fwrite(buffer, 1, bytesRead, dst);
    }
    fclose(dst);
    fclose(src);
    if (totalRead != srcSize)
    {
        //src read error
        return -4;
    }
    if (totalWritten != srcSize)
    {
        //dst write error
        return -5;
    }
    return 0;
}
int main()
{
    srand((unsigned)time(0));
    createTestFiles(src_pattern);
    for (size_t i = TEST_FILE_MIN; i <= TEST_FILE_MAX; i *= 2)
    {
        char srcName[256];
        char dstName[256];
        //%08x expects an unsigned int, so cast the size_t
        sprintf(srcName, src_pattern, (unsigned)i);
        sprintf(dstName, dst_pattern, (unsigned)i);
        //use my copy to create dest file
        if (binaryCopy(srcName, dstName, true) != 0)
        {
            printf("File: '%s' failed initial copy.\n", srcName);
        }
        originalBinaryCopy(srcName, dstName);
        if (binaryCopy(srcName, dstName, true) != 0)
        {
            printf("File: '%s' failed overwrite copy.\n", srcName);
        }
        if (binaryCopy(srcName, dstName, false) == 0)
        {
            printf("File: '%s' succeeded when file exists and overwrite was not set.\n", srcName);
        }
        //If compare succeeds delete the files, otherwise leave them for external comparison and print an error.
        if (comp(srcName, dstName) == 0)
        {
            if (remove(srcName) != 0)
            {
                perror("Could not remove src.");
            }
            if (remove(dstName) != 0)
            {
                perror("Could not remove dst.");
            }
        }
        else
        {
            printf("File: '%s' did not compare equal to '%s'.\n", srcName, dstName);
        }
    }
    return 0;
}
Hopefully this gives you something to experiment with to make sure your copier is as good as it can be. Also worth noting: I would not distinguish between copying text and binary files. Files are files, and if your goal is to copy them, you should always do it in binary mode so the copy is identical. On operating systems other than Windows it wouldn't matter, but on Windows there are a number of pitfalls you can run into in text mode. Best to avoid those completely if you can.
Good luck!
The most probable cause of your observation is that the file size is not a multiple of BYTE_SIZE: fread(buf, BYTE_SIZE, size / BYTE_SIZE, src); reads a whole number of BYTE_SIZE elements, and the fwrite call writes only the elements that were read.
If BYTE_SIZE is 4, as the type int* buf = 0; seems to indicate, and if the source file has 3 more bytes than a multiple of 4, your observations would be fully explained.
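A hypothetical reproduction of that scenario, copyByElements, which performs the question's read/write step on already-open streams with a configurable element size. It assumes the data fits in a small fixed buffer, which is only there to keep the sketch short. With elemSize 4 and a 7-byte source (4 * 1 + 3), only 4 bytes reach the destination; with elemSize 1, all 7 do.

```c
#include <stdio.h>

/* Hypothetical reproduction of the question's copy step on open streams.
 * Reads fileSize / elemSize elements and writes back only the elements counted,
 * returning the number of bytes that reached dst. Assumes fileSize <= 1024. */
size_t copyByElements(FILE *src, FILE *dst, size_t elemSize, size_t fileSize)
{
    unsigned char buf[1024];
    if (elemSize == 0 || fileSize > sizeof(buf))
        return 0;
    size_t elements = fread(buf, elemSize, fileSize / elemSize, src);
    /* the remainder (fileSize % elemSize bytes) is never read or written */
    return fwrite(buf, elemSize, elements, dst) * elemSize;
}
```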
You can correct the problem by making buf an unsigned char * and changing the code to:
elements = fread(buf, 1, size , src);
fwrite(buf, 1, elements, dst);
Note also that there is no need to open the files in update mode (the + in the mode string), errors are not handled explicitly, and the fclose() calls are misplaced.
Also it seems incorrect to truncate the destination file if overwrite() returns 0.
Here is a corrected version with better error handling:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // strerror()
int binaryCopy(char *argv[]) {
    FILE *src, *dst;
    long file_size;
    size_t size, size_read, size_written;
    int wantOverwrite;
    unsigned char *buf;
    if ((src = fopen(argv[SRC_POS], "rb")) == NULL) {
        printf("cannot open input file %s: %s\n", argv[SRC_POS], strerror(errno));
        return -1;
    }
    wantOverwrite = overwrite();
    if (!wantOverwrite) {
        fclose(src);
        return 0;
    }
    if ((dst = fopen(argv[DST_POS], "wb")) == NULL) {
        printf("cannot open output file %s: %s\n", argv[DST_POS], strerror(errno));
        fclose(src);
        return -1;
    }
    fseek(src, 0L, SEEK_END);
    file_size = ftell(src);
    fseek(src, 0L, SEEK_SET);
    size = (size_t)file_size;
    if ((long)size != file_size) {
        printf("file size too large for a single block: %ld\n", file_size);
        fclose(src);
        fclose(dst);
        return -1;
    }
    buf = malloc(size);
    if (buf == NULL) {
        printf("cannot allocate block of %zu bytes\n", size);
        fclose(src);
        fclose(dst);
        return -1;
    }
    size_read = fread(buf, 1, size, src);
    if (size_read != size) {
        printf("read error: %zu bytes read out of %zu\n", size_read, size);
    }
    size_written = fwrite(buf, 1, size_read, dst);
    if (size_written != size_read) {
        printf("write error: %zu bytes written out of %zu\n", size_written, size_read);
    }
    if (size_written == size) {
        printf("copy completed\n");
    }
    free(buf);
    fclose(dst);
    fclose(src);
    return 0;
}
I am writing a program in C on Ubuntu to split a file into parts.
I get an error when reading the file into a buffer.
Here is my code:
int split(char *filename, unsigned long part) {
    FILE *fp;
    char *buffer;
    size_t result; // bytes read
    off_t fileSize;
    fp = fopen(filename, "rb");
    if (fp == NULL) {
        fprintf(stderr, "Cannot Open %s", filename);
        exit(2);
    }
    // Get Size
    fileSize = get_file_size(filename);
    // Buffer
    buffer = (char*) malloc(sizeof(char) * (fileSize + 1));
    if (buffer == NULL) {
        fputs("Memory error", stderr);
        fclose(fp);
        return 1;
    }
    // Copy file into buffer
    //char buffers[11];
    result = fread(buffer, 1, fileSize, fp);
    buffer[fileSize] = '\0';
    if (result != fileSize) {
        fputs("Reading error", stderr);
        return 1;
    }
    // Split file
    off_t partSize = fileSize / part;
    // Last Part
    off_t lastPartSize = fileSize - partSize * part;
    unsigned long i;
    unsigned long j;
    // create part 1 to n-1
    for (j = 0; j < part; j++) {
        char partName[255];
        char *content;
        char partNumber[3];
        // Content of file part
        // for (i = j; i < partSize * (j + 1); i++) {
        //
        // }
        content = (char*) malloc(sizeof(char) * partSize);
        content = copychar(buffer, j + i, partSize + i);
        i += partSize;
        //copy name
        strcpy(partName, filename);
        // part Number
        sprintf(partNumber, "%d", j);
        // file name with .part1 2 3 4 ....
        strcat(partName, ".part");
        strcat(partName, partNumber);
        // Write to file
        writeFile(partName, content);
        free(content);
    }
    // last part
    char *content;
    content = (char*) malloc(sizeof(char) * (fileSize - partSize * (part - 1)));
    content = copychar(buffer, (part - 1) * partSize + 1, fileSize);
    char lastPartNumber[3];
    char lastPartName[255];
    sprintf(lastPartNumber, "%d", part);
    strcpy(lastPartName, filename);
    strcat(lastPartName, ".part");
    strcat(lastPartName, lastPartNumber);
    writeFile(lastPartName, content);
    free(content);
    free(buffer);
    fclose(fp);
    return 0;
}
Here is the function copychar, which copies from start to end:
char *copychar(char* buffer, unsigned long start, unsigned long end) {
    if (start >= end)
        return NULL;
    char *result;
    result = (char*) malloc(sizeof(char) * (end - start) + 1);
    unsigned long i;
    for (i = start; i <= end; i++)
        result[i] = buffer[i];
    result[end] = '\0';
    return result;
}
Here is the function to get the file size:
off_t get_file_size(char *filename) {
    struct stat st;
    if (stat(filename, &st) == 0)
        return st.st_size;
    fprintf(stderr, "Cannot determine size of %s: %s\n", filename);
    return -1;
}
Here is the function to write a file:
int writeFile(char* filename, char*buffer) {
    if (buffer == NULL || filename == NULL)
        return 1;
    FILE *file;
    file = fopen(filename, "wb");
    fwrite(buffer, sizeof(char), sizeof(buffer) + 1, file);
    fclose(file);
    return 0;
}
When I test it with a 29 MB test file, it dumps core.
When I debug it, fileSize is correct, but the buffer read from the file seems to contain only 135 characters, and copychar crashes.
Breakpoint 1, 0x0000000000400a0b in copychar (buffer=0x7ffff5e3a010 "!<arch>\ndebian-binary 1342169369 0 0 100644 4 `\n2.0\ncontrol.tar.gz 1342169369 0 0 100644 4557 `\n\037\213\b", start=4154703576, end=4164450461) at final.c:43
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400a0b in copychar (buffer=0x7ffff5e3a010 "!<arch>\ndebian-binary 1342169369 0 0 100644 4 `\n2.0\ncontrol.tar.gz 1342169369 0 0 100644 4557 `\n\037\213\b", start=4154703576, end=4164450461) at final.c:43
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
I don't know how to divide the buffer into parts and write each part to its own file when splitting.
Thanks in advance!
As you may have noticed, it's highly impractical to copy files in one big block, and it's not needed.
At the simplest level, you could copy the file byte by byte, like this:
int ch; // int, not char, so that EOF can be told apart from a valid byte
while( ( ch = fgetc(source) ) != EOF ) {
    fputc(ch, target);
}
This works, but it is quite slow. It's better to copy in blocks, like this:
unsigned char buf[4096];
size_t size;
while( (size = fread(buf, 1, sizeof(buf), fpRead) ) > 0) {
    fwrite(buf, 1, size, fpWrite);
}
Notice that the resulting code is much simpler and contains no dynamic memory allocation.
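The block loop, wrapped into a complete function as a sketch. The name copyFile and its 0/-1 return convention are my own; beyond the loop itself, it checks for short writes, for read errors via ferror(), and for a failing fclose() on the output, since buffered data is only flushed to disk at that point.

```c
#include <stdio.h>

/* Copies srcPath to dstPath in fixed-size blocks; no dynamic allocation.
 * Returns 0 on success, -1 on any open/read/write/close error. */
int copyFile(const char *srcPath, const char *dstPath) {
    FILE *fpRead = fopen(srcPath, "rb");
    if (fpRead == NULL)
        return -1;
    FILE *fpWrite = fopen(dstPath, "wb");
    if (fpWrite == NULL) {
        fclose(fpRead);
        return -1;
    }

    unsigned char buf[4096];
    size_t size;
    int status = 0;
    while ((size = fread(buf, 1, sizeof(buf), fpRead)) > 0) {
        if (fwrite(buf, 1, size, fpWrite) != size) {  /* short write, e.g. disk full */
            status = -1;
            break;
        }
    }
    if (ferror(fpRead))            /* fread() returning 0 can mean EOF or an error */
        status = -1;
    fclose(fpRead);
    if (fclose(fpWrite) != 0)      /* remaining buffered data is flushed here */
        status = -1;
    return status;
}
```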
You still need to add the splitting logic of course, but that can be done by tracking the number of bytes written and opening a new write-file before actually writing it.
EDIT: how to handle the multipart aspect, schematically. You still need to implement extra checks for some special cases and test the results of the various calls, of course:
unsigned char buf[4096];
size_t size;
size_t partsize = 100000; // assuming you want to write 100k parts
size_t stilltobewritten = partsize; // bytes remaining to be written in current part
size_t chunksize = sizeof(buf); // first time around we read full buffersize
while( (size = fread(buf, 1, chunksize, fpRead) ) > 0) {
    fwrite(buf, 1, size, fpWrite);
    stilltobewritten -= size; // subtract bytes written from the balance
    if (stilltobewritten == 0) {
        // part is complete, close this part and open next
        fclose(fpWrite);
        fpWrite = fopen(nextpart,"wb");
        // and reinit variables
        stilltobewritten = partsize;
        chunksize = sizeof(buf);
    } else {
        // prep next round on present file - just the special case of the last block
        // to handle
        chunksize = (stilltobewritten > sizeof(buf)) ? sizeof(buf) : stilltobewritten;
    }
}
EDIT 2: the part file name can be built a LOT more simply as well:
snprintf(partName, sizeof(partName), "%s.part%lu", filename, j);
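A sketch of the complete splitting loop under a few stated assumptions: the function name splitFile is hypothetical, parts are named "<filename>.part0", "<filename>.part1", and so on, following the naming above, and a part file is only opened once there is data for it. That last choice handles the special case of a file whose size is an exact multiple of the part size without leaving an empty trailing part.

```c
#include <stdio.h>

/* Splits `filename` into pieces of at most `partSize` bytes named
 * "<filename>.part0", "<filename>.part1", ... (naming is an assumption).
 * Returns the number of parts written, or -1 on error. */
long splitFile(const char *filename, size_t partSize) {
    if (partSize == 0)
        return -1;
    FILE *fpRead = fopen(filename, "rb");
    if (fpRead == NULL)
        return -1;

    unsigned char buf[4096];
    char partName[512];
    FILE *fpWrite = NULL;          /* current part, opened lazily */
    size_t remaining = partSize;   /* room left in the current part */
    long part = 0;

    for (;;) {
        /* never read more than still fits in the current part */
        size_t chunk = remaining < sizeof(buf) ? remaining : sizeof(buf);
        size_t size = fread(buf, 1, chunk, fpRead);
        if (size == 0)
            break;
        if (fpWrite == NULL) {     /* only create a part when there is data for it */
            snprintf(partName, sizeof(partName), "%s.part%ld", filename, part);
            fpWrite = fopen(partName, "wb");
            if (fpWrite == NULL) {
                fclose(fpRead);
                return -1;
            }
            part++;
        }
        if (fwrite(buf, 1, size, fpWrite) != size) {
            fclose(fpWrite);
            fclose(fpRead);
            return -1;
        }
        remaining -= size;
        if (remaining == 0) {      /* part complete: close it, start a new one next time */
            int closeFailed = fclose(fpWrite) != 0;
            fpWrite = NULL;
            remaining = partSize;
            if (closeFailed) {
                fclose(fpRead);
                return -1;
            }
        }
    }
    int readFailed = ferror(fpRead) != 0;
    fclose(fpRead);
    if (fpWrite != NULL && fclose(fpWrite) != 0)
        return -1;
    return readFailed ? -1 : part;
}
```

With a 10-byte file and partSize 4, this writes parts of 4, 4 and 2 bytes and returns 3.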
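Concerning the original code, there's some confusion about start and end in copychar. First, sizeof(char) * (end - start + 1) states the intent more clearly than sizeof(char) * (end - start) + 1 in the malloc (the two only agree because sizeof(char) is 1). Second, you're copying end - start + 1 symbols from the original buffer (for (i = start; i <= end; i++)) and then overwriting the last one with '\0', which probably isn't the intended behavior. Third, the loop writes result[i] with i running from start to end, but result has only end - start + 1 bytes, so it should write result[i - start]; for any start greater than zero it writes past the allocation. In split itself, i is never initialized before it is used in copychar(buffer, j + i, partSize + i), which is consistent with the huge start and end values in the backtrace.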
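A corrected copychar as a sketch, with one change of convention that I'm assuming for simplicity: end is exclusive, so the copy covers buffer[start] up to buffer[end - 1] and is end - start bytes long. For binary parts the terminator does not give you the length, because the data can contain zero bytes (which is also why the debugger showed only the first 135 characters of the buffer: it stops printing at the first zero byte). Pass the length to fwrite() explicitly rather than relying on strlen(), or on sizeof(buffer), which for a char * is the size of the pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Copies buffer[start] .. buffer[end - 1] (half-open range, `end` exclusive)
 * into a new NUL-terminated heap buffer of end - start bytes plus terminator.
 * Returns NULL on an empty range or allocation failure; caller must free(). */
char *copychar(const char *buffer, unsigned long start, unsigned long end) {
    if (start >= end)
        return NULL;
    size_t n = end - start;
    char *result = malloc(n + 1);          /* n bytes + terminator */
    if (result == NULL)
        return NULL;
    memcpy(result, buffer + start, n);     /* result[0] corresponds to buffer[start] */
    result[n] = '\0';
    return result;
}
```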