Copy Function in C not creating matching Checksums - c

I have written a simple copy program that copies a file and generates an MD5 hash. It runs and generates the MD5 correctly.
However, when I verify the file generated by the copy function, its MD5 does not match the source MD5. I can't see any reason for this in my code. Can anyone help?
#include <stdio.h>
#include <openssl/md5.h>
#include <assert.h>
#define BUFFER_SIZE 512
int secure_copy(char *filepath, char *destpath);
int main(int argc, char * argv[]) {
secure_copy(argv[1], argv[2]);
return 0;
}
int secure_copy(char *filepath, char *destpath) {
FILE *src, *dest;
src = fopen(filepath, "r");
assert(src != NULL);
dest = fopen(destpath, "w");
assert(dest != 0);
MD5_CTX c;
char buf[BUFFER_SIZE];
ssize_t bytes, out_writer;
unsigned char out[MD5_DIGEST_LENGTH];
MD5_Init(&c);
while((bytes = fread(buf, 1, BUFFER_SIZE, src)) != 0) {
MD5_Update(&c, buf, bytes);
out_writer = fwrite(buf, 1, BUFFER_SIZE, dest);
assert(out_writer != 0);
}
MD5_Final(out, &c);
printf("MD5: ");
for (int i=0; i < MD5_DIGEST_LENGTH; i++)
{
printf("%02x", out[i]);
}
printf("\n");
fclose(src);
fclose(dest);
return 0;
}
Output
$ ./md5speed doc.txt /home/doc.txt
MD5: 4c55e4b9185eece3cc000c4023f8f6fe
When verifying the copied file with md5sum, I get a completely different hash.
md5sum doc.txt
29cb4da30c3e28fdb81463b5f0a76894 doc.txt
Though the file still opens and content is uncorrupted.

regarding:
while((bytes = fread(buf, 1, BUFFER_SIZE, src)) != 0)
and
out_writer = fwrite(buf, 1, BUFFER_SIZE, dest);
On the last read, the amount read can be less than BUFFER_SIZE, so the code should always use the bytes variable for the number of bytes to write.
Also, errors can occur when calling fread() and/or fwrite(). Both functions return a size_t, so the result is never negative; a problem shows up as a return value smaller than the count requested (the 3rd parameter). For fread() a short count means either end of file or an error, which feof() and ferror() tell apart. For fwrite() a short count is always an error. To be robust, the code must check the returned values (bytes, out_writer) and handle any errors that occur, including EOF.

As stated in the comments, changing the fwrite() call to use bytes instead of BUFFER_SIZE, combined with opening the files in binary mode ("rb" and "wb"), fixed the problem.
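A minimal sketch of the corrected copy loop, leaving out the MD5 part so it needs no library. The function name copy_stream is made up for this illustration, and both streams are assumed to be already open in binary mode.

```c
#include <stdio.h>

#define BUFFER_SIZE 512

/* Copy everything from src to dest. Returns 0 on success, -1 on error.
   Both streams are assumed to be open in binary mode ("rb" / "wb"). */
int copy_stream(FILE *src, FILE *dest)
{
    unsigned char buf[BUFFER_SIZE];
    size_t bytes;

    while ((bytes = fread(buf, 1, sizeof buf, src)) > 0) {
        /* Write only what was read; the last chunk is usually short. */
        if (fwrite(buf, 1, bytes, dest) != bytes)
            return -1;
    }
    /* fread returned 0: distinguish end of file from a read error. */
    return ferror(src) ? -1 : 0;
}
```

In the original program, MD5_Update(&c, buf, bytes) would sit inside the same loop, next to the fwrite call, so the hash and the copy see exactly the same bytes.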

Related

C in UNIX: Reading/combining files based upon number of bytes

I am trying to fix the code below to read only the first N bytes. I would also like to do the same thing for the last N bytes (I assume that would involve just adding a '-' in front of the number of bytes N). I am not sure if using fgets is the correct method for doing so.
I tried changing the 1000 in
while(fgets(buffer, 1000, fp)
however I do not think changing that value will pick up a certain number of bytes, as I have read that it is only a maximum value.
char buffer[1001];
int main(int argc, char** argv) {
bzero(buffer, sizeof(buffer));
for(int x=1; x<argc; x++) {
FILE *fp = fopen(argv[x], "r+");
if (fp) {
while(fgets(buffer, 1000, fp)) {
printf("%s", buffer);
}
} else {
printf("could not open file %s\n", argv[x]);
}
}
}
Assuming that you want the first 1000 bytes and the last 1000 bytes of a file, and largely ignoring problems with files smaller than 2000 bytes (it works, but you might want a different result), you could use:
#include <stdio.h>
enum { NUM_BYTES = 1000 };
int main(int argc, char **argv)
{
for (int x = 1; x < argc; x++)
{
FILE *fp = fopen(argv[x], "r");
if (fp)
{
char buffer[NUM_BYTES];
int nbytes = fread(buffer, 1, NUM_BYTES, fp);
fwrite(buffer, 1, nbytes, stdout);
if (fseek(fp, -NUM_BYTES, SEEK_END) == 0)
{
nbytes = fread(buffer, 1, NUM_BYTES, fp);
fwrite(buffer, 1, nbytes, stdout);
}
fclose(fp);
}
else
{
fprintf(stderr, "%s: could not open file %s\n", argv[0], argv[x]);
}
}
}
This uses fread(), fwrite() and fseek() as suggested in the comments.
It also takes care to close successfully opened files. It does not demand write permissions on the files since it only reads and does not write those files (using "r" instead of "r+" in the call to fopen()).
If the file is smaller than 1000 bytes, the fseek() will fail because it tries to seek to a negative offset. If that happens, don't bother to read or write another 1000 bytes.
I debated whether to use sizeof(buffer) or NUM_BYTES in the function calls. I decided that NUM_BYTES was better, but the choice is not definitive — there are cogent arguments for using sizeof(buffer) instead.
Note that buffer becomes a local variable. There's no need to zero it; only the entries that are written on by fread() will be written by fwrite(), so there is no problem resolved by bzero(). (There doubly wasn't any point in that when the variable was global; variables with static duration are default initialized to all bytes zero anyway.)
The error message is written to standard error.
The code doesn't check for zero bytes read; arguably, it should.
If the NUM_BYTES becomes a parameter (e.g. you call your program fl19 and use fl19 -n 200 file1 to print the first and last 200 bytes of file1), then you need to do some tidying up as well as command-line argument handling.
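As a rough sketch of that tidying up, the byte count can become a function parameter with a heap buffer of matching size. The name head_and_tail and its signature are assumptions for illustration, and parsing of an option such as -n is left out.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write the first n and the last n bytes of fp to out.
   Returns 0 on success, -1 on allocation, read or write failure.
   fp is assumed to be seekable and opened in binary mode. */
int head_and_tail(FILE *fp, FILE *out, long n)
{
    if (n <= 0)
        return 0;
    unsigned char *buffer = malloc((size_t)n);
    if (buffer == NULL)
        return -1;

    int status = 0;
    size_t nbytes = fread(buffer, 1, (size_t)n, fp);
    if (ferror(fp) || fwrite(buffer, 1, nbytes, out) != nbytes)
        status = -1;

    /* As in the answer above, a file shorter than n makes fseek fail,
       and the tail is then skipped. */
    if (status == 0 && fseek(fp, -n, SEEK_END) == 0) {
        nbytes = fread(buffer, 1, (size_t)n, fp);
        if (ferror(fp) || fwrite(buffer, 1, nbytes, out) != nbytes)
            status = -1;
    }
    free(buffer);
    return status;
}
```

A caller would open each file with fopen(name, "rb"), pass stdout as out, and convert the command-line value with strtol before calling the function.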

Read file block by block in C

I want to copy the contents of file1 to file2 exactly as they are (keeping spaces and newlines). I specifically want to copy these contents one small block of chars at a time(this is a small segment of a larger project so bear with me).
I have attempted the following:
#include <stdio.h>
#include <stdlib.h>
#define MAX 5
int main(int argc, char *argv[]) {
FILE *fin, *fout;
char buffer[MAX];
int length;
char c;
if((fin=fopen(argv[1], "r")) == NULL){
perror("fopen");
exit(EXIT_FAILURE);
}
if((fout=fopen(argv[2], "w")) == NULL){
perror("fopen");
exit(EXIT_FAILURE);
}
while(1){
length = 0;
while((c = fgetc(fin)) != EOF && length < MAX){
buffer[length++] = (char) c;
}
if(length == 0){
break;
}
fprintf(fout, "%s", buffer);
}
fclose(fout);
fclose(fin);
}
However, this causes incorrect output to my file2. Any input would be appreciated.
Your buffer is not zero-terminated. Use fwrite instead of fprintf:
fwrite(buffer, 1, length, fout);
And you should check for errors too. Compare the return value of fwrite to length; if it is smaller, either retry the write of the remaining bytes or print an appropriate error message via perror("fwrite"). The return value is a size_t, so it is never negative.
Additionally, you may consider opening the files in binary mode, which makes a difference on Windows, i.e. pass "rb" and "wb" to fopen.
Last but not least, instead of looping and getting one character at a time, consider using fread instead:
length = fread(buffer, 1, MAX, fin);
Here is a simple example (with no error checking).
You should use fwrite(), since the data you write to the file is not null-terminated. Also note that the "b" mode is specified with fopen(), which opens the file as a binary file.
#include <stdio.h>
#include <stdlib.h>
#define MAX 5
#define FILE_BLOCK_SIZE 50
int _tmain(int argc, _TCHAR* argv[])
{
FILE *fin, *fout;
unsigned char *BufContent = NULL;
BufContent = (unsigned char*) malloc(FILE_BLOCK_SIZE);
size_t BufContentSz;
if((fin=fopen("E:\\aa.txt", "rb")) == NULL){
perror("fopen");
exit(EXIT_FAILURE);
}
if((fout=fopen("E:\\bb.txt", "wb")) == NULL){
perror("fopen");
exit(EXIT_FAILURE);
}
while ((BufContentSz = fread(BufContent, sizeof(unsigned char), FILE_BLOCK_SIZE, fin)) > 0)
{
fwrite(BufContent, sizeof(unsigned char), BufContentSz, fout);
}
fclose(fout);
fclose(fin);
free(BufContent); /* memory from malloc() is released with free(); delete is C++ */
return 0;
}
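First off, change char c; to int c;. fgetc() returns an int so that EOF can be told apart from every valid byte value, and plain char can be either signed or unsigned, depending on your implementation. If it is unsigned, storing EOF in c gives a large positive number that never compares equal to EOF, so the loop never ends. If it is signed, a 0xFF byte in the file compares equal to EOF and the copy stops early. An int is large enough to hold all characters and EOF. Leave buffer as a char array: fwrite() counts in bytes, so an int buffer would not be written correctly.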
Then, change your
fprintf(fout, "%s", buffer);
to
fwrite(buffer, 1, length, fout);
This is because fprintf(fout, "%s", buffer); expects a C-style string, which ends with a '\0', but your buffer isn't zero-terminated. As a result, the program keeps copying whatever follows the buffer on the stack until a '\0' is met, leaving lots of garbage in file2.
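Putting the points from these answers together, a sketch of the fgetc-based loop could look like the following. The function name copy_in_blocks is made up for this example. It also tests length before calling fgetc(), because with the tests in the other order a byte is read and then discarded each time the buffer is already full.

```c
#include <stdio.h>

#define MAX 5

/* Copy fin to fout in blocks of at most MAX bytes, reading with fgetc.
   Returns 0 on success, -1 on a read or write error. */
int copy_in_blocks(FILE *fin, FILE *fout)
{
    char buffer[MAX];
    int c = 0;   /* int, so EOF is distinct from every byte value */

    while (c != EOF) {
        size_t length = 0;
        /* Test the length first, so no byte is read and then dropped
           when the buffer is already full. */
        while (length < MAX && (c = fgetc(fin)) != EOF)
            buffer[length++] = (char)c;
        /* Write exactly the bytes collected; no terminator is needed. */
        if (fwrite(buffer, 1, length, fout) != length)
            return -1;
    }
    return ferror(fin) ? -1 : 0;
}
```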

fread is not reading other file formats

I am still fairly new to C. The program below compiles just fine (using gcc) and it even works with text files, but when I use other file formats, e.g. png, I get nothing. The console spits out ?PNG and nothing else. I don't want the image to print as an image, obviously the program does nothing like that, but I would like the data from the png file to be printed. Why is the program not fread-ing properly? Is it because fread refuses any file other than text?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
FILE *fp;
int main() {
char buffer[1000];
fp=fopen("FILE IN QUESTION HERE", "rb");
if(fp==NULL) {
perror("An error occured while opening the file...");
exit(1);
}
fread(buffer, 1000, 1, fp);
printf("%s\n", buffer);
fclose(fp);
return 0;
}
%s in printf() is for printing a null-terminated string, not binary data, and the PNG header contains a signature to prevent the data from being transferred as text by mistake.
(Actually there is no 0x00 in the PNG signature itself; printf() stopped at the 0x00 contained in the size field of the IHDR chunk.)
Use fwrite() to output binary data, or print the bytes one-by-one via putchar().
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
FILE* fp; /* avoid using global variables unless it is necessary */
char buffer[1000] = {0}; /* initialize to avoid undefined behavior */
size_t n; /* number of bytes actually read */
fp=fopen("FILE IN QUESTION HERE", "rb");
if(fp==NULL) {
perror("An error occurred while opening the file...");
exit(1);
}
n = fread(buffer, 1, sizeof buffer, fp); /* count bytes, not 1000-byte blocks */
fwrite(buffer, 1, n, stdout); /* use fwrite instead of printf; write only what was read */
fclose(fp);
return 0;
}
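To see where printf("%s") gives up on such data, one can look for the first zero byte in the buffer. The helper below is only an illustration; the name bytes_before_nul is made up.

```c
#include <stddef.h>
#include <string.h>

/* Number of bytes before the first zero byte in buf, which is where
   printf("%s") would stop. Returns n if buf contains no zero byte. */
size_t bytes_before_nul(const unsigned char *buf, size_t n)
{
    const unsigned char *nul = memchr(buf, 0, n);
    return nul ? (size_t)(nul - buf) : n;
}
```

A PNG file starts with the bytes 89 50 4E 47 0D 0A 1A 0A 00 00 00 0D (the signature followed by the IHDR length), so for such a buffer the helper returns 8: only the signature is printed, which matches the ?PNG seen on the console.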
fread is not reading other file formats
Code does not check the result of fread(). That is the way to determine if fread() is working.
char buffer[1000];
// fread(buffer, 1000, 1, fp);
size_t sz = fread(buffer, 1000, 1, fp);
if (sz == 0) puts("Did not read an entire block");
fread() returns the number of complete blocks read. In OP's case, the code attempts to read one 1000-byte block, so a file shorter than 1000 bytes yields 0 even though some bytes may have been stored in the buffer. Recommend reading 1000 blocks of 1 char each rather than 1 block of 1000 chars. Further, avoid magic numbers.
for (;;) {
size_t sz = fread(buffer, sizeof buffer[0], sizeof buffer, fp);
if (sz == 0) break;
// Somehow print the buffer.
print_it(buffer, sz);
}
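The difference between the two argument orders can be checked directly. This small sketch (the function name fread_counts is hypothetical) reads the same seekable stream both ways and reports what fread() returns each time.

```c
#include <stdio.h>

/* Read the same stream two ways and report what fread returns.
   The stream is assumed to be seekable. */
void fread_counts(FILE *fp, size_t *as_one_block, size_t *as_bytes)
{
    char buffer[1000];

    rewind(fp);
    /* One item of 1000 bytes: returns 1 only if all 1000 bytes arrive. */
    *as_one_block = fread(buffer, sizeof buffer, 1, fp);

    rewind(fp);
    /* Up to 1000 items of 1 byte: returns the number of bytes read. */
    *as_bytes = fread(buffer, 1, sizeof buffer, fp);
}
```

For a 10-byte file the first call reports 0 and the second reports 10, which is why the byte-wise form is easier to work with.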
OP's call to printf() expects a pointer to a string. A C string is an array of characters up to and including the terminating null character. buffer may or may not contain a null character, and it may hold useful data after a null character.
// Does not work for OP
// printf("%s\n", buffer);
The data of a .png file is mostly binary and will have little textual meaning. A sample print function for mixed binary data and text follows. Most output will appear meaningless until one learns the .png file format. Untested code.
#include <ctype.h>
#include <stdio.h>
#include <string.h>
void print_it(const unsigned char *x, size_t sz) {
    char buf[5];
    unsigned column = 0;
    while (sz > 0) {
        sz--;
        if (isgraph(*x) && *x != '(') {
            sprintf(buf, "%c", *x);       /* printable: show as is */
        } else {
            sprintf(buf, "(%02X)", *x);   /* otherwise show as hex */
        }
        x++;                              /* advance to the next byte */
        column += (unsigned)strlen(buf);
        if (column > 80) {                /* wrap long output lines */
            column = (unsigned)strlen(buf);
            fputc('\n', stdout);
        }
        fputs(buf, stdout);
    }
    if (column > 0) fputc('\n', stdout);
}

How can i select the last line of a text file using C

I am trying to find a way to select the last line of a text file using C (not C++ or C#, just C), and I am having a difficult time finding a way to do this. If anyone could assist me with this problem I would be very grateful, thanks! (By the way, a good example of what I am trying to do is what tail -n 1 does in bash.)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
FILE *fd; // File pointer
char filename[] = "./Makefile"; // file to read
char buff[1024];
if ((fd = fopen(filename, "r")) != NULL) // open file
{
fseek(fd, 0, SEEK_SET); // make sure start from 0
while(!feof(fd))
{
memset(buff, 0x00, 1024); // clean buffer
fscanf(fd, "%1023[^\n]\n", buff); // read one line, at most 1023 chars so buff cannot overflow
}
printf("Last Line :: %s\n", buff);
fclose(fd); // close the file when done
}
}
I'm using Linux.
CMIIW
No direct way, but my preferred method is:
1. Go to the end of the file.
2. Read the last X bytes.
3. If they contain '\n', you have your line: read from that offset to the end of the file.
4. Otherwise, read the X bytes before them.
5. Go back to step 3 until a match is found.
6. If you reach the beginning of the file, the whole file is the last line.
E.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifndef max
#define max(a, b) (((a)>(b))? (a) : (b))
#endif
long GetFileSize(FILE *fp){
long fsize = 0;
fseek(fp,0,SEEK_END);
fsize = ftell(fp);
fseek(fp,0,SEEK_SET);//reset stream position!!
return fsize;
}
char *lastline(char *filepath){
FILE *fp;
char buff[4096+1];
int size,i;
long fsize;
if(NULL==(fp=fopen(filepath, "r"))){
perror("file cannot open at lastline");
return NULL;
}
fsize= -1L*GetFileSize(fp);
if(fseek(fp, max(fsize, -4096L), SEEK_END) != 0){
perror("cannot seek");
exit(1);
}
size=fread(buff, sizeof(char), 4096, fp);
fclose(fp);
buff[size] = '\0';
i=size-1;
if(i >= 0 && buff[i]=='\n'){ /* i is -1 for an empty file */
buff[i] = '\0';
}
while(i >=0 && buff[i] != '\n')
--i;
++i;
return strdup(&buff[i]);
}
int main(void){
char *last;
last = lastline("data.txt");
printf("\"%s\"\n", last);
free(last);
return 0;
}
If you are using a *nix operating system, you can use the 'tail' command (tail -n 1 file prints the last line). See the 'tail' man page for details.
If you want to integrate the functionality into another program, you can run 'tail' from it: system() executes the command, and popen() additionally lets you read its output.
A simple and inefficient way to do it is to read each line into a buffer.
When the last read gives you EOF, you have the last line in the buffer.
Binyamin Sharet's suggestion is more efficient, but just a bit harder to implement.
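The simple line-by-line approach can be sketched with fgets(). The function name last_line and its signature are made up here, and the sketch assumes every line is shorter than 4096 bytes; a longer line would be read in pieces and only its final piece kept.

```c
#include <stdio.h>
#include <string.h>

/* Store the last line of fp (without its newline) in out.
   Returns 1 if a line was found, 0 for an empty stream.
   Assumes every line is shorter than 4096 bytes. */
int last_line(FILE *fp, char *out, size_t outsz)
{
    char line[4096];
    int found = 0;

    if (outsz == 0)
        return 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
        snprintf(out, outsz, "%s", line);   /* keep the most recent line */
        found = 1;
    }
    return found;
}
```

This reads the whole file, so for very large files the seek-from-the-end method is the better choice.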

How to calculate the MD5 hash of a large file in C?

I am writing in C using OpenSSL library.
How can I calculate hash of a large file using md5?
As far as I know, I need to load the whole file into RAM as a char array and then call the hash function. But what if the file is about 4 GB long? Sounds like a bad idea.
SOLVED: Thanks to askovpen, I found my bug. I've used
while ((bytes = fread (data, 1, 1024, inFile)) != 0)
MD5_Update (&mdContext, data, 1024);
not
while ((bytes = fread (data, 1, 1024, inFile)) != 0)
MD5_Update (&mdContext, data, bytes);
example
gcc -g -Wall -o file file.c -lssl -lcrypto
#include <stdio.h>
#include <openssl/md5.h>
int main()
{
unsigned char c[MD5_DIGEST_LENGTH];
char *filename="file.c";
int i;
FILE *inFile = fopen (filename, "rb");
MD5_CTX mdContext;
int bytes;
unsigned char data[1024];
if (inFile == NULL) {
printf ("%s can't be opened.\n", filename);
return 0;
}
MD5_Init (&mdContext);
while ((bytes = fread (data, 1, 1024, inFile)) != 0)
MD5_Update (&mdContext, data, bytes);
MD5_Final (c,&mdContext);
for(i = 0; i < MD5_DIGEST_LENGTH; i++) printf("%02x", c[i]);
printf (" %s\n", filename);
fclose (inFile);
return 0;
}
result:
$ md5sum file.c
25a904b0e512ee546b3f47574703d9fc file.c
$ ./file
25a904b0e512ee546b3f47574703d9fc file.c
First, MD5 is a hashing algorithm. It doesn't encrypt anything.
Anyway, you can read the file in chunks of whatever size you like. Call MD5_Init once, then call MD5_Update with each chunk of data you read from the file. When you're done, call MD5_Final to get the result.
You don't have to load the entire file in memory at once. You can use the functions MD5_Init(), MD5_Update() and MD5_Final() to process it in chunks to produce the hash. If you are worried about making it an "atomic" operation, it may be necessary to lock the file to prevent someone else changing it during the operation.
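The init/update/final pattern can be illustrated without OpenSSL. The sketch below uses the 64-bit FNV-1a hash as a stand-in for MD5, purely because it needs no library; it is not cryptographic, and all names (fnv_ctx, fnv_init, fnv_update, fnv_final, hash_stream) are made up for this example. The point is only the shape: the state is initialized once, updated with each chunk using the number of bytes actually read, and finalized at the end.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* FNV-1a (64-bit) used as a stand-in for MD5; it is NOT cryptographic. */
typedef struct { uint64_t state; } fnv_ctx;

void fnv_init(fnv_ctx *c) { c->state = 0xcbf29ce484222325ULL; }

void fnv_update(fnv_ctx *c, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        c->state ^= data[i];
        c->state *= 0x100000001b3ULL;
    }
}

uint64_t fnv_final(const fnv_ctx *c) { return c->state; }

/* Hash a whole stream, reading chunk bytes at a time.
   Returns 0 on success, -1 on allocation or read failure. */
int hash_stream(FILE *fp, size_t chunk, uint64_t *result)
{
    if (chunk == 0)
        return -1;
    unsigned char *buf = malloc(chunk);
    if (buf == NULL)
        return -1;

    fnv_ctx c;
    size_t bytes;
    fnv_init(&c);
    while ((bytes = fread(buf, 1, chunk, fp)) > 0)
        fnv_update(&c, buf, bytes);   /* bytes read, not the chunk size */
    *result = fnv_final(&c);

    int status = ferror(fp) ? -1 : 0;
    free(buf);
    return status;
}
```

Because each update simply continues from the previous state, hashing the same file with a chunk size of 1, 7 or 1024 gives the same value. MD5_Update() behaves the same way with respect to chunk boundaries.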
The top answer is correct, but one thing is worth spelling out: the value of the hash does not depend on the buffer size. The update function (MD5_Update or EVP_DigestUpdate) treats successive calls as one continuous byte stream, so hashing a file in chunks of 1 byte, 1024 bytes or 1 MB produces the same digest, and that digest matches md5sum and the online hashing websites.
If your result changes with the buffer size, the code has a bug. Typical causes are passing the buffer size instead of the number of bytes actually read, or writing while (bytes = fread(...) != 0) without parentheses around the assignment, which sets bytes to 1 on every iteration. It is perfectly acceptable to use a buffer length of 1 to hash a large file; it will just take longer.
So my rule of thumb is to choose the buffer length for speed, a few kilobytes or more; it has no effect on whether the hash plays nice with other systems.
#include <stdio.h>
#include <stdlib.h>
#include <openssl/evp.h>
#define FILE_BUFFER_LENGTH 4096 /* any size gives the same digest */
int hashTargetFile(FILE* fp, unsigned char** md_value, int *md_len) {
    EVP_MD_CTX *mdctx = EVP_MD_CTX_new();
    const EVP_MD *md = EVP_sha512();
    unsigned int diglen = 0; //digest length
    unsigned char *digest_value = malloc(EVP_MAX_MD_SIZE);
    unsigned char data[FILE_BUFFER_LENGTH];
    size_t bytes; //# of bytes read from file
    int ok = 0;
    if (!mdctx || !digest_value) {
        fprintf(stderr, "Error while creating digest context.\n");
        goto done;
    }
    if (!EVP_DigestInit_ex(mdctx, md, NULL)) {
        fprintf(stderr, "Error while initializing digest context.\n");
        goto done;
    }
    // The assignment must be parenthesized; otherwise bytes receives the result of "!= 0", i.e. 1
    while ((bytes = fread(data, 1, FILE_BUFFER_LENGTH, fp)) != 0) {
        if (!EVP_DigestUpdate(mdctx, data, bytes)) {
            fprintf(stderr, "Error while digesting file.\n");
            goto done;
        }
    }
    if (!EVP_DigestFinal_ex(mdctx, digest_value, &diglen)) {
        fprintf(stderr, "Error while finalizing digest.\n");
        goto done;
    }
    *md_value = digest_value; // caller releases this with free()
    *md_len = (int)diglen;
    ok = 1;
done:
    if (!ok) free(digest_value);
    EVP_MD_CTX_free(mdctx);
    return ok;
}
