Finding a recurring string in an I/O stream in C

I'm quite new to C. I'm trying to write code that finds a string in an I/O stream, and I don't understand what I'm doing wrong. I know the error is probably in the large while loop (in the code below).
I want the function to return the location, in bytes from the beginning of the stream, of the string, and -1 if it fails for some reason. It just keeps returning -1 for any file I try it on.
long find_string(const char *str, const char *filename, long offset)
{
FILE *f = fopen(filename, "r");
if (!f){
return -1;
}
int s=0,c;
c = fgetc(f);
if(c == EOF){
return -1;
}
char *check = malloc(sizeof(char));
fseek(f, 0L, SEEK_END); // Sees and stores how long the file is
long sz = ftell(f);
fseek(f, 0L, SEEK_SET);
if(fseek(f, offset,SEEK_SET) != 0){ // finds the position of offset
return -1;
}
while(fgetc(f) != EOF){
c = fgetc(f);
if(c == str[0] && ftell(f) < sz){
check[0] = c;
offset = ftell(f);
}
s++;
for (unsigned int r=1; r < (strlen(str));r++){
c = fgetc(f);
if(c == str[s]){
check = realloc(check, sizeof(char)*s);
check[s] = c;
s++;
}
}
if(strcmp(check, str)==0){
free(check);
fclose(f);
break;
}
else{
check = realloc(check, sizeof(char));
offset = -1;
}
}
return offset;}
Any help is greatly appreciated

This would be much easier if you simply memory-mapped the entire file and ran a standard string searching algorithm on it.
For memory mapping, see: Linux - Memory Mapped File
For string searching code, see: strstr() for a string that is NOT null-terminated
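A minimal sketch of that approach, assuming a POSIX system (open, fstat, mmap). The names find_bytes and find_string_mmap are hypothetical, and the search is a plain memcmp scan rather than any particular algorithm from the linked answers:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: offset of the first occurrence of needle[0..nlen)
 * in hay[0..hlen) at or after `start`, or -1 if there is none. */
long find_bytes(const char *hay, size_t hlen,
                const char *needle, size_t nlen, size_t start)
{
    assert(hay != NULL && needle != NULL);
    if (nlen == 0 || nlen > hlen)
        return -1;
    for (size_t i = start; i + nlen <= hlen; i++) {
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long)i;
    }
    return -1;
}

/* Maps the file read-only and searches the mapping.
 * Returns the byte offset of the match, or -1 on any error or no match. */
long find_string_mmap(const char *str, const char *filename, long offset)
{
    if (offset < 0)
        return -1;
    int fd = open(filename, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {  /* mmap rejects length 0 */
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;
    long pos = find_bytes(data, len, str, strlen(str), (size_t)offset);
    munmap(data, len);
    return pos;
}
```

The mapped bytes are not null-terminated, which is why the search takes explicit lengths instead of calling strstr().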

Please check the lines marked with the comment "Updated":
long find_string(const char *str, const char *filename, long offset)
{
FILE *f = fopen(filename, "r");
if (!f){
return -1;
}
int s=0,c;
c = fgetc(f);
if(c == EOF){
return -1;
}
char *check = malloc(sizeof(char));
fseek(f, 0L, SEEK_END); // Sees and stores how long the file is
long sz = ftell(f);
fseek(f, 0L, SEEK_SET);
if(fseek(f, offset,SEEK_SET) != 0){ // finds the position of offset
return -1;
}
c = fgetc(f); // Updated
while(c != EOF){ // Updated
if(c == str[0] && ftell(f) < sz){
check[0] = c;
offset = ftell(f);
}
s++;
for (unsigned int r=1; r < (strlen(str));r++){
c = fgetc(f);
if(c == str[s]){
check = realloc(check, sizeof(char)*s);
check[s] = c;
s++;
}
}
if(strcmp(check, str)==0){
free(check);
fclose(f);
break;
}
else{
check = realloc(check, sizeof(char));
offset = -1;
}
c = fgetc(f); //Updated
}
return offset;}
Since you call fgetc both in the loop condition and at the start of the loop body, you are actually comparing the second character of the file with the first character of str. Update the code as shown and check again.
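For comparison, here is a minimal sketch of a stream search that needs no scratch buffer. It is not the code from either post: find_in_stream is a hypothetical name, it takes an already opened FILE * instead of a filename, and it assumes the stream was opened in binary mode ("rb") so that fseek offsets are plain byte counts. It retries the comparison from every start position, which is simple rather than fast.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the byte offset (from the start of the stream) of the first
 * occurrence of str at or after `offset`, or -1 if not found or on error. */
long find_in_stream(FILE *f, const char *str, long offset)
{
    assert(f != NULL && str != NULL);
    if (offset < 0 || str[0] == '\0')
        return -1;
    for (long start = offset; ; start++) {
        if (fseek(f, start, SEEK_SET) != 0)
            return -1;
        size_t matched = 0;
        int c = EOF;                 /* int, so EOF stays distinguishable */
        while (str[matched] != '\0' &&
               (c = fgetc(f)) != EOF &&
               c == (unsigned char)str[matched])
            matched++;
        if (str[matched] == '\0')
            return start;            /* every character matched */
        if (c == EOF)
            return -1;               /* input ran out: no later start can fit */
    }
}
```

Each character is read exactly once per attempt, so the first character of str is always compared with the character at the candidate start position.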

Related

Am I doing something wrong in my file-reader function?

I'm a beginner in C and, as an exercise, wanted to code a simple function that reads the content of a file and returns it as a string.
Here is my solution, which I think works, but are there any obvious bad practices or suboptimal code here? For example, I manually added a \0 at the end of the string, but I don't know if it is really necessary...
#include <stdio.h>
#include <stdlib.h>
char *readFile(char *path)
{
//open file
FILE *file = fopen(path, "r");
//if broken
if (file == NULL)
{
printf("Erreur");
return NULL;
}
//return variable
char *result;
//length of the file
int len;
fseek(file, 0, SEEK_END);
len = ftell(file);
fseek(file, 0, SEEK_SET);
//initialising return variable
result = (char*) malloc(sizeof(char) * (len + 1));
int c;
int i = 0;
while (feof(file) == 0)
{
c = fgetc(file);
if (c != EOF)
{
printf("%04x -> %c\n", c, c);
*(result + i) = c;
i++;
}
}
*(result + i) = '\0';
printf("len : %i\n", len);
fclose(file);
return result;
}
I'd replace this:
int c;
int i = 0;
while (feof(file) == 0)
{
c = fgetc(file);
if (c != EOF)
{
printf("%04x -> %c\n", c, c);
*(result + i) = c;
i++;
}
}
with this:
size_t i = fread(result, 1, len, file);
It's much shorter
It's correct
It's certainly faster
There is still room for improvement though. For example, you could add error handling, because fread can fail.
Since you already have the length of the file, you can also read it all at once instead of char by char.
Another implementation of your function, for example:
char *readFile(char *path)
{
//open file
FILE *file = fopen(path, "r");
//if broken
if (file == NULL)
{
printf("Erreur");
return NULL;
}
//return variable
char *result;
//length of the file
int len;
fseek(file, 0, SEEK_END);
len = ftell(file);
fseek(file, 0, SEEK_SET);
//initialising return variable
result = (char*) malloc(sizeof(char) * (len + 1));
size_t i = fread(result, sizeof(char), len, file);
*(result + i) = '\0';
printf("len : %i\n", len);
fclose(file);
return result;
}
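The error handling mentioned above could look roughly like the sketch below. It is an illustration, not either answerer's code: read_all is a hypothetical name, it takes an already opened FILE * rather than a path, and it reports the number of bytes actually read through an optional out_len parameter.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole stream into a freshly allocated, null-terminated buffer.
 * Returns NULL on failure; the caller frees the result. */
char *read_all(FILE *file, size_t *out_len)
{
    assert(file != NULL);
    if (fseek(file, 0, SEEK_END) != 0)
        return NULL;
    long len = ftell(file);
    if (len < 0 || fseek(file, 0, SEEK_SET) != 0)
        return NULL;

    char *result = malloc((size_t)len + 1);   /* +1 for the '\0' */
    if (result == NULL)
        return NULL;

    size_t got = fread(result, 1, (size_t)len, file);
    if (got < (size_t)len && ferror(file)) {  /* short read caused by an error */
        free(result);
        return NULL;
    }
    result[got] = '\0';     /* terminate at what was read, not at len */
    if (out_len != NULL)
        *out_len = got;
    return result;
}
```

Terminating at the fread return value matters in text mode on Windows, where fewer characters than ftell reported may be delivered. The manual '\0' is necessary because fread never adds one.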

How to convert a text file from DOS format to UNIX format

I am trying to write a program in C that reads a text file and replaces \r\n with \n in the same file, converting the line endings from DOS to UNIX. I use fgetc and treat the file as a binary file. Thanks in advance.
#include <stdio.h>
int main()
{
FILE *fptr = fopen("textfile.txt", "rb+");
if (fptr == NULL)
{
printf("erro ficheiro \n");
return 0;
}
while((ch = fgetc(fptr)) != EOF) {
if(ch == '\r') {
fprintf(fptr,"%c", '\n');
} else {
fprintf(fptr,"%c", ch);
}
}
fclose(fptr);
}
If we assume the file uses a single-byte character set, we just need to drop all the '\r' characters when converting a text file from DOS to UNIX.
We also assume that the size of the file is no larger than UINT_MAX.
We make these assumptions to keep the example short.
Be aware that the example below overwrites the original file, as you asked. Normally you shouldn't do this, because you can lose the contents of the original file if an error occurs.
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
// Return a negative number on failure and 0 on success.
int main()
{
const char* filename = "textfile.txt";
// Get the file size. We assume the filesize is not bigger than UINT_MAX.
struct stat info;
if (stat(filename, &info) != 0)
return -1;
size_t filesize = (size_t)info.st_size;
// Allocate memory for reading the file
char* content = (char*)malloc(filesize);
if (content == NULL)
return -2;
// Open the file for reading
FILE* fptr = fopen(filename, "rb");
if (fptr == NULL)
return -3;
// Read the file and close it - we assume the filesize is not bigger than UINT_MAX.
size_t count = fread(content, filesize, 1, fptr);
fclose(fptr);
if (count != 1)
return -4;
// Remove all '\r' characters
size_t newsize = 0;
for (size_t i = 0; i < filesize; ++i) {
char ch = content[i];
if (ch != '\r') {
content[newsize] = ch;
++newsize;
}
}
// Test if we found any
if (newsize != filesize) {
// Open the file for writing and truncate it.
FILE* fptr = fopen(filename, "wb");
if (fptr == NULL)
return -5;
// Write the new output to the file. Note that if an error occurs,
// then we will lose the original contents of the file.
if (newsize > 0)
count = fwrite(content, newsize, 1, fptr);
fclose(fptr);
if (newsize > 0 && count != 1)
return -6;
}
// For a console application, we don't need to free the memory allocated
// with malloc(), but normally we should free it.
// Success
return 0;
} // main()
To remove only the '\r' characters that are followed by '\n', replace the loop with this one:
// Remove all '\r' characters followed by a '\n' character
size_t newsize = 0;
for (size_t i = 0; i < filesize; ++i) {
char ch = content[i];
char ch2 = (i + 1 < filesize) ? content[i + 1] : 0;
if (ch == '\r' && ch2 == '\n') {
ch = '\n';
++i;
}
content[newsize++] = ch;
}
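The same conversion can be isolated into a small in-memory helper, which makes it easy to check without touching a file. This is a sketch under the same single-byte assumption; crlf_to_lf is a hypothetical name, not part of the answer above:

```c
#include <assert.h>
#include <stddef.h>

/* Rewrites buf[0..n) in place, turning every "\r\n" into "\n".
 * A lone '\r' is kept. Returns the new length. */
size_t crlf_to_lf(char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\r' && i + 1 < n && buf[i + 1] == '\n')
            continue;       /* skip the '\r'; the '\n' is copied next turn */
        buf[out++] = buf[i];
    }
    return out;
}
```

Because the write index never gets ahead of the read index, no second buffer is needed.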

Is there any way to shift content of a file without storing it in array in c?

I am trying to replace words in a file. This works fine with words of the same length.
I know it can be done by storing the content in a temporary array and then shifting, but I was wondering whether it can be done without using an array.
#include<stdio.h>
#include<string.h>
int main(int argc, char **argv)
{
char s1[20], s2[20];
FILE *fp = fopen(argv[1], "r+");
strcpy(s1, argv[2]);
strcpy(s2, argv[3]);
int l, i;
while(fscanf(fp, "%s", s1)!=EOF){
if(strcmp(s1, argv[2]) == 0){
l = strlen(s2);
fseek(fp, -l, SEEK_CUR);
i=0;
while(l>0){
fputc(argv[3][i], fp);
i++;
l--;
}
}
}
}
Here is my code for replacing words of the same length. What can I modify here to handle different lengths?
I assume the OP's goal is to avoid storing the whole content of the file in a byte array (maybe there is not enough memory). The OP also says the file's content needs to be "shifted", so I assume a temp file cannot be used for the text replacement either (perhaps there is not enough room on the storage device).
Note that copying into a temp file would be the easiest method.
As I see it, the solution has two algorithms:
Shift to left: replace a text with another of equal or smaller length.
Shift to right: replace a text with a longer one.
Shift to left:
1. Maintain 2 file position pointers: one for the read position (rdPos) and another for the write position (wrPos).
2. Both start at zero.
3. Read chars from rdPos until oldText is found, writing each one at wrPos (but only if rdPos != wrPos, to avoid unnecessary write operations).
4. Write newText at wrPos.
5. Repeat from step 3 until EOF.
6. If len(oldText) > len(newText), truncate the file.
Shift to right:
1. Maintain 2 file position pointers (rdPos and wrPos).
2. Scan the whole file to count the occurrences of oldText.
3. Store their file positions in a small array (not strictly needed, but useful to avoid a second, reverse scan for oldText).
4. Set rdPos = EOF-1 (the last char in the file).
5. Set wrPos = EOF+foundCount*(len(newText)-len(oldText)), reserving enough extra space for the shifting.
6. Read chars backwards from rdPos until reaching a position in the "found" array, writing each char at wrPos.
7. Write newText at wrPos.
8. Repeat from step 6 until BOF.
I wrote the following implementation as an example of the mentioned algorithms, but without caring too much about validations and edge cases.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#define MAX_ITEMS 100
#define DO_WRITE 0x01
#define DO_FIND 0x02
FILE *fp;
long rdPos = 0L, wrPos = 0L, rdCount=0L, wrCount=0L;
int newLen, oldLen;
char *newText, *oldText;
struct found_t { int len; long pos[MAX_ITEMS];} found;
/* helper functions */
void writeChars(char *buffer, int len){
if(wrPos < rdPos){
long p = ftell(fp);
fseek(fp, wrPos, SEEK_SET);
fwrite(buffer, len, 1, fp);
fseek(fp, p, SEEK_SET);
wrCount += len;
}
wrPos += len;
}
int nextReadChar = EOF;
int readChar(){
int c;
if(nextReadChar == EOF) {
if((c = fgetc(fp)) != EOF)
rdCount++;
} else {
c = nextReadChar;
nextReadChar = EOF;
}
return c;
}
int findFirstChar(int action){
int c; char ch;
for(; (c = readChar()) != EOF && c != (int)oldText[0]; rdPos++)
if(action == DO_WRITE) {
ch = (char)c;
writeChars(&ch, 1);
}
return c;
}
int testOldText(int c, int action){
char *cmp;
for(cmp = oldText; *cmp != '\0' && c == (int)*cmp; cmp++)
c = readChar();
nextReadChar = c;
if(*cmp == '\0') { /* found oldText */
if(action == DO_FIND)
found.pos[found.len++] = rdPos;
rdPos += oldLen;
if(action == DO_WRITE){
writeChars(newText, newLen);
found.len++;
}
}
else { /* some chars were equal */
if(action == DO_WRITE)
writeChars(oldText, cmp-oldText);
rdPos += cmp-oldText;
}
return c;
}
void writeReverseBlock(long firstCharPos){
for(;rdPos >= firstCharPos+oldLen; rdPos--, wrPos--, rdCount++, wrCount++){
int c;
fseek(fp, rdPos, SEEK_SET); c = fgetc(fp);
fseek(fp, wrPos, SEEK_SET); fputc(c, fp);
}
rdPos = firstCharPos-1;
wrPos -= newLen-1;
fseek(fp, wrPos--, SEEK_SET);
fwrite(newText, newLen, 1, fp);
wrCount += newLen;
}
void scanFile(int action){
int c;
do {
if( (c = findFirstChar(action)) == EOF ) break;
}while(testOldText(c, action) != EOF);
}
/** Main Algorithms */
void shiftToLeft(){
scanFile(DO_WRITE);
fflush(fp);
ftruncate(fileno(fp), wrPos);
}
void shiftToRight(){
int i;
scanFile(DO_FIND);
wrPos = --rdPos + found.len * (newLen-oldLen); /* reserve space after EOF */
for(i=found.len-1; i>=0; i--)
writeReverseBlock(found.pos[i]);
}
/* MAIN program */
int main(int argc, char **argv){
if(argc != 4){
fprintf(stderr, "Usage: %s file.ext oldText newText\n", argv[0]);
return 1;
}
if(!(fp = fopen(argv[1], "r+b"))) {
fprintf(stderr, "Cannot open file '%s'\n", argv[1]);
return 2;
}
oldLen = strlen(oldText = strdup(argv[2]));
newLen = strlen(newText = strdup(argv[3]));
found.len = 0;
/* which algorithm? */
if(newLen <= oldLen) shiftToLeft();
else shiftToRight();
fclose(fp);
printf("%7d occurrences\n"
"%7ld bytes read\n"
"%7ld bytes written\n", found.len, rdCount, wrCount);
return 0;
}
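The shift-to-left idea is easier to see on an in-memory buffer than on a file. The following is only an illustrative sketch, not a replacement for the file-based program above: replace_shift_left is a hypothetical name, the buffer stands in for the file, and the returned length plays the role of the ftruncate() call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replaces every occurrence of oldText with newText inside buf[0..n),
 * in place. Requires strlen(newText) <= strlen(oldText), so the write
 * position never overtakes the read position. Returns the new length. */
size_t replace_shift_left(char *buf, size_t n,
                          const char *oldText, const char *newText)
{
    size_t oldLen = strlen(oldText), newLen = strlen(newText);
    assert(oldLen > 0 && newLen <= oldLen);

    size_t rdPos = 0, wrPos = 0;
    while (rdPos < n) {
        if (n - rdPos >= oldLen && memcmp(buf + rdPos, oldText, oldLen) == 0) {
            memcpy(buf + wrPos, newText, newLen);  /* write the replacement */
            wrPos += newLen;
            rdPos += oldLen;
        } else {
            if (wrPos != rdPos)                    /* skip needless writes */
                buf[wrPos] = buf[rdPos];
            wrPos++;
            rdPos++;
        }
    }
    return wrPos;   /* the caller truncates here, as ftruncate() does for a file */
}
```

The shift-to-right case cannot be done in one forward pass like this, because the write position would run ahead of unread data; that is why the algorithm above works backwards from the end.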

fgetc misses bytes while reading a file

I use the function fgetc to read each byte of a file, and then write it out with fprintf.
I just noticed that fgetc sometimes misses some bytes when I compare my result with a hex editor.
For example, the first mistake starts around the 118th byte, and many other mistakes appear seemingly at random...
Has anybody ever experienced this?
This is the code (Windows):
char main(int argc, char* argv[]) {
FILE* fdIn;
FILE* fdOut;
long size = 0;
long i = 0;
char c = 0;
if (argc == 3) {
if ((fdIn = fopen(argv[1], "rt")) == NULL) {
printf("FAIL\n");
return 0;
}
if ((fdOut = fopen(argv[2], "w+")) == NULL) {
printf("FAIL\n");
return 0;
}
fseek(fdIn, 0L, SEEK_END);
size = ftell(fdIn);
fseek(fdIn, 0L, 0);
fprintf(fdOut, "unsigned char shellcode[%ld] = {", size);
while (i < size) {
c = fgetc(fdIn);
if (!(i % 16))
fprintf(fdOut, "\n\t");
fprintf(fdOut, "0x%02X", (unsigned char)c);
if (i != size - 1)
fprintf(fdOut, ", ");
i++;
}
fprintf(fdOut, "\n};\n");
fclose(fdIn);
fclose(fdOut);
printf("SUCCESS");
system("PAUSE");
}
return 0;
}
Open the file in binary mode.
// if ((fdIn = fopen((char*)argv[1], "rt")) == NULL) {
// >.<
if ((fdIn = fopen((char*)argv[1], "rb")) == NULL) {
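In text mode (and this is likely a Windows machine, given the "rt"), each '\r', '\n' pair is translated into a single '\n', and the Microsoft C runtime also treats a 0x1A byte (Ctrl-Z) as end of file, so the bytes read no longer match the file on disk. In any case, no translation is wanted for the OP's goal of a hex dump.
2nd issue: fgetc() returns an int that is either in the range of unsigned char or equal to EOF. Use type int to distinguish EOF from all data input.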
// char c = 0;
int c = 0;
...
c = fgetc(fdIn);
// also add
if (c == EOF) break;
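Putting both fixes together, the core of the dump could be sketched as below. dump_hex is a hypothetical helper, not the OP's program: it assumes the caller opened the input with "rb", and it loops until EOF instead of trusting a size obtained from ftell.

```c
#include <assert.h>
#include <stdio.h>

/* Writes every byte of `in` to `out` as "0xNN", comma-separated,
 * 16 per line. Returns the number of bytes dumped. */
size_t dump_hex(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);
    size_t count = 0;
    int c;                      /* int, not char: 0xFF must differ from EOF */
    while ((c = fgetc(in)) != EOF) {
        if (count > 0)
            fputs(count % 16 == 0 ? ",\n" : ", ", out);
        fprintf(out, "0x%02X", (unsigned)c);
        count++;
    }
    return count;
}
```

Since the byte count is only known after the loop, a caller that needs it in the array declaration would have to write the header afterwards or leave the array size implicit.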

Unknown symbols when I read file

I read a file, but at the end of the output I get unknown symbols:
int main()
{
char *buffer, ch;
int i = 0, size;
FILE *fp = fopen("file.txt", "r");
if(!fp){
printf("File not found!\n");
exit(1);
}
fseek(fp, 0, SEEK_END);
size = ftell(fp);
printf("%d\n", size);
fseek(fp, 0, SEEK_SET);
buffer = malloc(size * sizeof(*buffer));
while(((ch = fgetc(fp)) != NULL) && (i <= size)){
buffer[i++] = ch;
}
printf(buffer);
fclose(fp);
free(buffer);
getch();
return 0;
}
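You need to add a null character at the end of buffer before you print. The loop must also stop at EOF rather than NULL, and at i < size, so that the terminator still fits in a buffer of size + 1 bytes:
while(((ch = fgetc(fp)) != EOF) && (i < size)){
buffer[i++] = ch;
}
buffer[i] = 0; // add a null char at the end.
printf("%s",buffer); // print using %s format specifier.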
First, you need to allocate size + 1 bytes to make room for the terminating null character:
buffer = malloc((size + 1) * sizeof(*buffer));
Then, before printing, make sure the string is null-terminated: buffer[size] = '\0';
Finally, you're not using printf correctly. It should be
printf("%s", buffer);
See the printf manual.
These two strings walk into a bar:
The first string says, "I think I'll have a beer quag fulk boorg jdk^CjfdLk jk3s d#f67howe%^U r89nvy~~owmc63^Dz x.xvcu"
"Please excuse my friend," the second string says, "He isn't null-terminated."
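You seem to be waiting for a NULL at the end of the file. fgetc never signals the end that way: it returns the special value EOF (end of file), which is not a character stored in the file. Declare ch as int so that EOF can be told apart from real data.
Change this line: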
while(((ch = fgetc(fp)) != NULL)
To this:
while(((ch = fgetc(fp)) != EOF)
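The points from the answers above (room for the terminator, an int for the fgetc result, the EOF test, and a bound that cannot overflow) can be combined in a minimal sketch. read_chars is a hypothetical helper, not the OP's main function:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads at most `size` bytes from fp with fgetc and returns them as a
 * null-terminated string (caller frees), or NULL if allocation fails. */
char *read_chars(FILE *fp, size_t size)
{
    assert(fp != NULL);
    char *buffer = malloc(size + 1);    /* +1 for the terminating '\0' */
    if (buffer == NULL)
        return NULL;
    size_t i = 0;
    int ch;                             /* int so EOF is not confused with a byte */
    while (i < size && (ch = fgetc(fp)) != EOF)
        buffer[i++] = (char)ch;
    buffer[i] = '\0';                   /* terminate at what was actually read */
    return buffer;
}
```

The result can then be printed with printf("%s", buffer) rather than passed as the format string.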
