trying to read a file - c

I'm trying to write a simple antivirus, but when I read the infected file into a buffer all I get is EOF. The file is a JPG and I have no idea how to fix this.
These are the file functions I'm allowed to use:
fread/fwrite
fgets
fputs
fclose
fopen
fgetc
fputc
fscanf
fprintf
int fullScan(FILE* sign, FILE* infected);
char* getFile(FILE* file);
int main(int argc, char** argv)
{
FILE* sign = fopen("KittenVirusSign", "rb");
FILE* infected = fopen("kitten_frog.jpg", "rb");
int j = 0;
if (infected == NULL)
{
printf("couldn't open the file (suspicious file)");
return -1;
}
if (sign == NULL)
{
printf("couldn't open the file (virus signature)");
return -1;
}
j = fullScan(sign, infected);
return 0;
}
int fullScan(FILE* sign, FILE* infected)
{
char* sign_c = NULL;
char* infec_c = NULL;
int infect_res = -1;
int sign_len = 0;
int infec_len = 0;
int i = 0;
int j = 0;
sign_c = getFile(sign);
infec_c = getFile(infected);
while (1)
{
if (*(infec_c + i) == *(sign_c + j))
{
infect_res = 1;
if (*(sign_c + j) == EOF)
{
break;
}
else if (*(infec_c + i) == EOF)
{
infect_res = -1;
break;
}
i++;
j++;
continue;
}
else if (*(infec_c + i) != *(sign_c + j))
{
if (*(infec_c + i) == EOF || *(sign_c + j) == EOF)
{
break;
}
i++;
j = 0;
infect_res = -1;
}
}
fclose(infected);
free(sign_c);
free(infec_c);
return infect_res;
}
char* getFile(FILE* file)
{
char* buffer;
long filelen;
int i;
fseek(file, 0, SEEK_END);
filelen = ftell(file);
fseek(file, 0, SEEK_SET);
buffer = (char *)malloc((filelen + 1)*sizeof(char));
for (i = 0; i < filelen; i++)
{
fread(buffer + i, sizeof(char), 1, file);
}
return buffer;
}

EOF is a special integer value returned by some input functions to indicate that the end of the file has been reached, but it is not part of the file data. Your fread() will therefore never store an EOF character into the input buffer you provided. However, if your C implementation features signed default chars, as many do, then there is a char value that is numerically equal to EOF (usually -1).
If either file happens to contain that byte, then your code will misinterpret it as designating the end of that file. If it happens to be the first byte in either file then the program will misinterpret the file as being empty.
Since you are analyzing binary files, I recommend using buffers of unsigned char rather than default char.
All possible byte values can appear in the file data, so you cannot identify the end of the data by the value of any byte within it.
getFile() should probably return a struct that contains both a pointer to the buffer and its size.
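A minimal sketch of that idea, assuming a hypothetical struct file_data and a hypothetical helper contains_signature() (neither name comes from the question). It only shows the length-bounded comparison; how the buffers get filled is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a buffer together with its length. */
struct file_data {
    unsigned char *bytes;
    size_t len;
};

/* Return 1 if sig occurs anywhere inside data, 0 otherwise.
   No byte value is treated as an end marker; only the lengths bound the loop. */
int contains_signature(struct file_data data, struct file_data sig)
{
    if (sig.len == 0 || sig.len > data.len)
        return 0;
    for (size_t i = 0; i + sig.len <= data.len; i++) {
        if (memcmp(data.bytes + i, sig.bytes, sig.len) == 0)
            return 1;
    }
    return 0;
}
```

Because the loop is bounded by len, a 0xFF byte inside the JPG is compared like any other byte instead of ending the scan.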

As the other answer suggested, you should also pass the file length along and iterate over that, rather than waiting for an EOF.
Also, in your getFile() function, once you have determined the length of the file you don't have to read it byte by byte; you can just pass filelen to fread() like so:
fread(buffer, sizeof(char), filelen, file);
fread now reads filelen elements, each the size of a char (you can write 1 instead), from the stream file into buffer.
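Here is a rough sketch of a reader built around that single fread() call. The name read_whole_file() and the out_len parameter are my own, and it assumes the stream was opened in binary mode and supports fseek/ftell, as in the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read everything from an already opened binary stream.
   On success returns a malloc'd buffer and stores its size in *out_len;
   on failure returns NULL. The caller frees the buffer. */
unsigned char *read_whole_file(FILE *file, size_t *out_len)
{
    long filelen;
    unsigned char *buffer;

    if (fseek(file, 0, SEEK_END) != 0)
        return NULL;
    filelen = ftell(file);
    if (filelen < 0 || fseek(file, 0, SEEK_SET) != 0)
        return NULL;

    /* Allocate at least one byte so an empty file does not yield malloc(0). */
    buffer = malloc(filelen > 0 ? (size_t)filelen : 1);
    if (buffer == NULL)
        return NULL;

    /* One call instead of a loop; the return value is the element count read. */
    if (fread(buffer, 1, (size_t)filelen, file) != (size_t)filelen) {
        free(buffer);
        return NULL;
    }
    *out_len = (size_t)filelen;
    return buffer;
}
```

Checking the count returned by fread() is what tells you whether the whole file actually arrived in the buffer.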

Related

palindrome is not copied to next file but printed on output screen

I have a file named fp1 containing different names, some being palindromes, and have to read all names from fp1 and check if each name is a palindrome or not. If it's a palindrome then I need to print the name to the screen and copy it to another file named fp.
Here's my program:
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
void main() {
FILE *fp, *fp1;
char m, y[100];
int k = 0, i = 0, t = 1, p = 0;
fp = fopen("C:\\Users\\HP\\Desktop\\New folder\\file 2.txt", "w");
fp1 = fopen("C:\\Users\\HP\\Desktop\\New folder\\file4.txt", "r");
if (fp == NULL) {
printf("error ");
exit(1);
}
if (fp1 == NULL) {
printf("error");
exit(1);
}
k = 0;
m = fgetc(fp1);
while (m != EOF) {
k = 0;
i = 0;
t = 1;
p = 0;
while (m != ' ') {
y[k] = m;
k = k + 1;
m = fgetc(fp1);
}
p = k - 1;
for (i = 0; i <= k - 1; i++) {
if (y[i] != y[p]) t = 0;
p = p - 1;
}
if (t == 1) {
fputs(y, fp);
printf("%s is a pallindrome\n", y);
}
m = fgetc(fp1);
}
fclose(fp);
fclose(fp1);
}
copying palindrome from one file to next file
You are not null terminating your buffer before attempting to use the contents as a string. After placing the last valid character read by fgetc into the buffer, you must place a null terminating character (\0).
A character buffer without a null terminating byte is not a string. Passing such a buffer to fputs, or the printf specifier %s without a length bound, will invoke Undefined Behaviour.
fgetc returns an int, not a char. On systems where char is unsigned, you will not be able to reliably test against the negative value of EOF.
The inner while loop is not checking for EOF. When the file is exhausted, it will repeatedly assign EOF to the buffer, until the buffer overflows.
To that end, in general, the inner while loop does nothing to prevent a buffer overflow for longer inputs.
In a hosted environment, void main() is never the correct signature for main. Use int main(void) or int main(int argc, char **argv).
Note that fputs does not print a trailing newline. As is, you would fill the output file full of strings with no delineation.
The nested while loops are fairly clumsy, and I would suggest moving your palindrome logic to its own function.
Here is a refactored version of your program. This program discards the tails of overly long words ... but the buffer is reasonably large.
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#define BUFFER_SIZE 1024
FILE *open_file_or_die(const char *path, const char *mode)
{
FILE *file = fopen(path, mode);
if (!file) {
perror(path);
exit(EXIT_FAILURE);
}
return file;
}
int is_palindrome(const char *word, size_t len)
{
for (size_t i = 0; i < len / 2; i++)
if (word[i] != word[len - i - 1])
return 0;
return 1;
}
int main(void)
{
/*
FILE *input = open_file_or_die("C:\\Users\\HP\\Desktop\\New folder\\file4.txt", "r");
FILE *output = open_file_or_die("C:\\Users\\HP\\Desktop\\New folder\\file 2.txt", "w");
*/
FILE *input = stdin;
FILE *output = stdout;
char buffer[BUFFER_SIZE];
size_t length = 0;
int ch = 0;
while (EOF != ch) {
ch = fgetc(input);
if (isspace(ch) || EOF == ch) {
buffer[length] = '\0';
if (length && is_palindrome(buffer, length)) {
fputs(buffer, output);
fputc('\n', output);
printf("<%s> is a palindrome.\n", buffer);
}
length = 0;
} else if (length < BUFFER_SIZE - 1)
buffer[length++] = ch;
}
/*
fclose(input);
fclose(output);
*/
}

How to convert a text file from DOS format to UNIX format

I am trying to write a program in C that reads a text file and replaces \r\n with \n in the same file, converting the line endings from DOS to UNIX. I use fgetc and treat the file as a binary file. Thanks in advance.
#include <stdio.h>
int main()
{
FILE *fptr = fopen("textfile.txt", "rb+");
if (fptr == NULL)
{
printf("erro ficheiro \n");
return 0;
}
while((ch = fgetc(fptr)) != EOF) {
if(ch == '\r') {
fprintf(fptr,"%c", '\n');
} else {
fprintf(fptr,"%c", ch);
}
}
fclose(fptr);
}
If we assume the file uses a single-byte character set, we just need to drop all the '\r' characters when converting a text file from DOS to UNIX.
We also assume that the file size does not exceed UINT_MAX.
Both assumptions are there only to keep the example short.
Be aware that the example below overwrites the original file, as you asked. Normally you shouldn't do this, because you can lose the contents of the original file if an error occurs.
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
// Return a negative number on failure and 0 on success.
int main()
{
const char* filename = "textfile.txt";
// Get the file size. We assume the filesize is not bigger than UINT_MAX.
struct stat info;
if (stat(filename, &info) != 0)
return -1;
size_t filesize = (size_t)info.st_size;
// Allocate memory for reading the file
char* content = (char*)malloc(filesize);
if (content == NULL)
return -2;
// Open the file for reading
FILE* fptr = fopen(filename, "rb");
if (fptr == NULL)
return -3;
// Read the file and close it - we assume the filesize is not bigger than UINT_MAX.
size_t count = fread(content, filesize, 1, fptr);
fclose(fptr);
if (count != 1)
return -4;
// Remove all '\r' characters
size_t newsize = 0;
for (size_t i = 0; i < filesize; ++i) {
char ch = content[i];
if (ch != '\r') {
content[newsize] = ch;
++newsize;
}
}
// Test if we found any
if (newsize != filesize) {
// Open the file for writing and truncate it.
FILE* fptr = fopen(filename, "wb");
if (fptr == NULL)
return -5;
// Write the new output to the file. Note that if an error occurs,
// then we will lose the original contents of the file.
if (newsize > 0)
count = fwrite(content, newsize, 1, fptr);
fclose(fptr);
if (newsize > 0 && count != 1)
return -6;
}
// For a console application, we don't need to free the memory allocated
// with malloc(), but normally we should free it.
// Success
return 0;
} // main()
To only remove '\r' followed by '\n' replace the loop with this loop:
// Remove all '\r' characters followed by a '\n' character
size_t newsize = 0;
for (size_t i = 0; i < filesize; ++i) {
char ch = content[i];
char ch2 = (i < filesize - 1) ? content[i + 1] : 0;
if (ch == '\r' && ch2 == '\n') {
ch = '\n';
++i;
}
content[newsize++] = ch;
}

How to convert a text file from UNIX format to DOS format

I am trying to write a program in C that reads a text file and replaces \n with \r\n in the same file, converting the line endings from UNIX to DOS. I am using code from Stack Overflow that converts DOS to UNIX and treats the file as a binary file. My problem is converting from UNIX to DOS. Thanks in advance.
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
// Return a negative number on failure and 0 on success.
int main()
{
const char* filename = "textfile.txt";
// Get the file size. We assume the filesize is not bigger than UINT_MAX.
struct stat info;
if (stat(filename, &info) != 0)
return -1;
size_t filesize = (size_t)info.st_size;
// Allocate memory for reading the file
char* content = (char*)malloc(filesize);
if (content == NULL)
return -2;
// Open the file for reading
FILE* fptr = fopen(filename, "rb");
if (fptr == NULL)
return -3;
// Read the file and close it - we assume the filesize is not bigger than UINT_MAX.
size_t count = fread(content, filesize, 1, fptr);
fclose(fptr);
if (count != 1)
return -4;
// Remove all '\r' characters followed by a '\n' character
size_t newsize = 0;
for (long i = 0; i < filesize; ++i) {
char ch = content[i];
char ch2 = (i < filesize - 1) ? content[i + 1] : 0;
if (ch == '\r' && ch2 == '\n') {
ch = '\n';
++i;
}
content[newsize++] = ch;
}
// Test if we found any
if (newsize != filesize) {
// Open the file for writing and truncate it.
FILE* fptr = fopen(filename, "wb");
if (fptr == NULL)
return -5;
// Write the new output to the file. Note that if an error occurs,
// then we will lose the original contents of the file.
if (newsize > 0)
count = fwrite(content, newsize, 1, fptr);
fclose(fptr);
if (newsize > 0 && count != 1)
return -6;
}
// For a console application, we don't need to free the memory allocated
// with malloc(), but normally we should free it.
// Success
return 0;
} // main()
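The loop above only shrinks the data, so it cannot be reused as is: going from UNIX to DOS makes the content grow, which means the output needs its own, larger buffer. A minimal sketch of just the conversion step, assuming a hypothetical helper lf_to_crlf() (not part of the code above) that would replace the removal loop, with its result written out by fwrite as before:

```c
#include <stddef.h>
#include <stdlib.h>

/* Convert bare '\n' to "\r\n"; existing "\r\n" pairs are left alone.
   Returns a malloc'd buffer (caller frees) and stores its size in *out_len,
   or returns NULL if allocation fails. */
char *lf_to_crlf(const char *in, size_t in_len, size_t *out_len)
{
    /* Worst case: every byte is a bare '\n', doubling the size. */
    char *out = malloc(in_len > 0 ? in_len * 2 : 1);
    size_t n = 0;

    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < in_len; i++) {
        /* Insert '\r' only when the '\n' is not already preceded by one. */
        if (in[i] == '\n' && (i == 0 || in[i - 1] != '\r'))
            out[n++] = '\r';
        out[n++] = in[i];
    }
    *out_len = n;
    return out;
}
```

Leaving existing "\r\n" pairs untouched means running the conversion twice does not produce "\r\r\n".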

Is there any way to shift content of a file without storing it in array in c?

I am trying to replace words in a file. This works fine with words of the same length.
I know it can be done by storing the content in a temporary array and then shifting, but I was wondering if it can be done without using an array.
#include<stdio.h>
#include<string.h>
int main(int argc, char **argv)
{
char s1[20], s2[20];
FILE *fp = fopen(argv[1], "r+");
strcpy(s1, argv[2]);
strcpy(s2, argv[3]);
int l, i;
while(fscanf(fp, "%s", s1)!=EOF){
if(strcmp(s1, argv[2]) == 0){
l = strlen(s2);
fseek(fp, -l, SEEK_CUR);
i=0;
while(l>0){
fputc(argv[3][i], fp);
i++;
l--;
}
}
}
}
Here is my code for replacing same length words, what can I modify here for different lengths?
I am assuming that the OP's goal is to avoid storing the whole content of the file in a byte array (maybe there is not enough memory). Since the OP also said the file's content needs to be "shifted", I further assume a temp file cannot be used for the text replacement (perhaps there is not enough room on the storage device).
Note that copying into a temp file would be the easiest method.
As I see it, the solution needs two algorithms:
Shift to left: replace a text with another of equal or smaller length.
Shift to right: replace a text with a longer one.
Shift to left:
1. Maintain two file position pointers: one for the read position (rdPos) and another for the write position (wrPos).
2. Both start at zero.
3. Read chars from rdPos until oldText is found, writing each one at wrPos (but only if rdPos != wrPos, to avoid unnecessary write operations).
4. Write newText at wrPos.
5. Repeat from step 3 until EOF.
6. If len(oldText) > len(newText), truncate the file.
Shift to right:
1. Maintain two file position pointers (rdPos and wrPos).
2. Scan the whole file to count the occurrences of oldText.
3. Store their file positions in a small array (not strictly needed, but useful to avoid a second, reverse scan for oldText).
4. Set rdPos = EOF-1 (the last char in the file).
5. Set wrPos = EOF+foundCount*(len(newText)-len(oldText)), reserving enough extra space for the shifting.
6. Read chars from rdPos backwards until reaching the next position in the "found" array, writing each char at wrPos.
7. Write newText at wrPos.
8. Repeat from step 6 until BOF.
I wrote the following implementation as an example of the mentioned algorithms, but without caring too much about validations and edge cases.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_ITEMS 100
#define DO_WRITE 0x01
#define DO_FIND 0x02
FILE *fp;
long rdPos = 0L, wrPos = 0L, rdCount=0L, wrCount=0L;
int newLen, oldLen;
char *newText, *oldText;
struct found_t { int len; long pos[MAX_ITEMS];} found;
/* helper functions */
void writeChars(char *buffer, int len){
if(wrPos < rdPos){
long p = ftell(fp);
fseek(fp, wrPos, SEEK_SET);
fwrite(buffer, len, 1, fp);
fseek(fp, p, SEEK_SET);
wrCount += len;
}
wrPos += len;
}
int nextReadChar = -1;
int readChar(){
int c;
if(nextReadChar == EOF) {
if((c = fgetc(fp)) != EOF)
rdCount++;
} else {
c = nextReadChar;
nextReadChar = EOF;
}
return c;
}
int findFirstChar(int action){
int c; char ch;
for(; (c = readChar()) != EOF && c != (int)oldText[0]; rdPos++)
if(action == DO_WRITE) {
ch = (char)c;
writeChars(&ch, 1);
}
return c;
}
int testOldText(int c, int action){
char *cmp;
for(cmp = oldText; *cmp != '\0' && c == (int)*cmp; cmp++)
c = readChar();
nextReadChar = c;
if(*cmp == '\0') { /* found oldText */
if(action == DO_FIND)
found.pos[found.len++] = rdPos;
rdPos += oldLen;
if(action == DO_WRITE){
writeChars(newText, newLen);
found.len++;
}
}
else { /* some chars were equal */
if(action == DO_WRITE)
writeChars(oldText, cmp-oldText);
rdPos += cmp-oldText;
}
return c;
}
void writeReverseBlock(long firstCharPos){
for(;rdPos >= firstCharPos+oldLen; rdPos--, wrPos--, rdCount++, wrCount++){
int c;
fseek(fp, rdPos, SEEK_SET); c = fgetc(fp);
fseek(fp, wrPos, SEEK_SET); fputc(c, fp);
}
rdPos = firstCharPos-1;
wrPos -= newLen-1;
fseek(fp, wrPos--, SEEK_SET);
fwrite(newText, newLen, 1, fp);
wrCount += newLen;
}
void scanFile(int action){
int c;
do {
if( (c = findFirstChar(action)) == EOF ) break;
}while(testOldText(c, action) != EOF);
}
/** Main Algorithms */
void shiftToLeft(){
scanFile(DO_WRITE);
fflush(fp);
ftruncate(fileno(fp), wrPos);
}
void shiftToRight(){
int i;
scanFile(DO_FIND);
wrPos = --rdPos + found.len * (newLen-oldLen); /* reserve space after EOF */
for(i=found.len-1; i>=0; i--)
writeReverseBlock(found.pos[i]);
}
/* MAIN program */
int main(int argc, char **argv){
if(argc != 4){
fprintf(stderr, "Usage: %s file.ext oldText newText\n", argv[0]);
return 1;
}
if(!(fp = fopen(argv[1], "r+b"))) {
fprintf(stderr, "Cannot open file '%s'\n", argv[1]);
return 2;
}
oldLen = strlen(oldText = strdup(argv[2]));
newLen = strlen(newText = strdup(argv[3]));
found.len = 0;
/* which algorithm? */
if(newLen <= oldLen) shiftToLeft();
else shiftToRight();
fclose(fp);
printf("%7d occurrences\n"
"%7ld bytes read\n"
"%7ld bytes written\n", found.len, rdCount, wrCount);
return 0;
}

fgetc misses bytes while reading a file

I use the function fgetc to read each byte of a file, and then write it with printf.
I just noticed that fgetc sometimes misses bytes when I compare my result with a hex editor.
For example, the first mistake starts around the 118th byte, and a lot of other mistakes follow at random ...
Has anybody ever experienced this?
This is the code (Windows)
char main(int argc, char* argv[]) {
FILE* fdIn;
FILE* fdOut;
long size = 0;
long i = 0;
char c = 0;
if (argc == 3) {
if ((fdIn = fopen(argv[1], "rt")) == NULL) {
printf("FAIL\n");
return 0;
}
if ((fdOut = fopen(argv[2], "w+")) == NULL) {
printf("FAIL\n");
return 0;
}
fseek(fdIn, 0L, SEEK_END);
size = ftell(fdIn);
fseek(fdIn, 0L, 0);
fprintf(fdOut, "unsigned char shellcode[%ld] = {", size);
while (i < size) {
c = fgetc(fdIn);
if (!(i % 16))
fprintf(fdOut, "\n\t");
fprintf(fdOut, "0x%02X", (unsigned char)c);
if (i != size - 1)
fprintf(fdOut, ", ");
i++;
}
fprintf(fdOut, "\n};\n");
fclose(fdIn);
fclose(fdOut);
printf("SUCCESS");
system("PAUSE");
}
return 0;
}
Open the file in binary mode.
// if ((fdIn = fopen((char*)argv[1], "rt")) == NULL) {   // old: text mode
if ((fdIn = fopen((char*)argv[1], "rb")) == NULL) {
In text mode, and likely on a Windows-based machine given the "rt", a '\r', '\n' pair is certainly translated into '\n'. In any case, no translations are needed for OP's goal of a hex dump.
Second issue: fgetc() returns an int in the range of unsigned char or EOF. Use type int to distinguish EOF from all data input.
// char c = 0;
int c = 0;
...
c = fgetc(fdIn);
// also add
if (c == EOF) break;
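Putting both fixes together, a minimal sketch of the dump loop could look like the following. The function name dump_hex() is mine, and it assumes the caller opened the input with "rb". It loops until EOF instead of trusting a size computed up front.

```c
#include <stdio.h>

/* Write every byte of in to out as 0xNN, 16 per line.
   Returns the number of bytes dumped. in must be opened in binary mode. */
long dump_hex(FILE *in, FILE *out)
{
    long count = 0;
    int c; /* int, not char, so EOF stays distinct from the byte 0xFF */

    while ((c = fgetc(in)) != EOF) {
        if (count > 0)
            fputs(count % 16 == 0 ? ",\n" : ", ", out);
        fprintf(out, "0x%02X", (unsigned)c);
        count++;
    }
    fputc('\n', out);
    return count;
}
```

Since fgetc() returns the byte as an unsigned char value converted to int, no cast through char is needed before printing.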
