Trying to find and replace a string from file - c

void replaceString(char* file, char* str, char* replace)
{
FILE* fp = fopen(file,"rt");
char buffer[BUFFER];
while(fgets(buffer,BUFFER,fp)!=NULL)
{
char* s;
s=strstr(buffer,str);
if(s!=NULL)
{
strcpy(s,replace);
printf("%s is replaced by %s\n",str,replace);
}
}
fclose(fp);
}
int main(int argc, char **argv)
{
char* file= "text.txt";
replaceString(file,"is","was");
printFile(file);
return 0;
}
Guys I am new to file operations, trying to find and replace a string by another. please help! I am trying to open the file in "rt" mode. Saw this in some example code. Not sure about the mode. I am guessing that I need to use a temp.txt file to do that! Can it be done in a single file without using any other file?

Here are some of the errors in your algorithm.
You read and look at one BUFFER of chars at a time, with no overlap. What if str appears between buffers? (i.e. the first part of str is at the end of a buffer and the second part is at the start of the next buffer).
You try to overwrite str with replace directly in the buffer using strcpy. What if the two strings have different lengths? If replace is shorter than str, you'd still have the end of str there, and if replace is longer, it will overwrite the text following str.
Even if they are the same length, strcpy adds the final 0 char at the end of the copy (that's how it marks where the string ends), which cuts off the rest of the buffer. You DEFINITELY don't want that. memcpy (or strncpy with the exact length) is a better suggestion here, although it will still not work if the strings aren't the same length.
You replace the strings in the buffer but do nothing with the "corrected" buffer! The buffer is not the file, the content of the file was COPIED into the buffer. So you changed the copy and then nothing. The file will not change. You need to write your changes into a file, preferably a different one.
Writing such a replace isn't as trivial as you might think. I may try and help you, but it might be a bit over your head if you're just trying to learn working with files and are still not fully comfortable with strings.
Doing the replace in a single file is easy if you have enough memory to read the entire file at once (if BUFFER is larger than the file size), but very tricky if not, especially in your case where replace is longer than str.
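As a rough sketch of how the length problem described above can be avoided: build the result in a separate buffer that is sized for the final string, instead of overwriting in place. The function name replace_all_alloc and its NULL-on-error convention are my own assumptions, not something from the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'd copy of src in which every
 * occurrence of orig is replaced by repl, or NULL on error.
 * The caller must free() the result. */
char *replace_all_alloc(const char *src, const char *orig, const char *repl)
{
    size_t olen = strlen(orig);
    size_t rlen = strlen(repl);
    size_t count = 0;
    const char *p;
    char *out, *d;

    if (olen == 0)
        return NULL;            /* an empty search string matches everywhere */

    /* First pass: count the matches so the output can be sized exactly. */
    for (p = src; (p = strstr(p, orig)) != NULL; p += olen)
        count++;

    out = malloc(strlen(src) - count * olen + count * rlen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    d = out;
    while ((p = strstr(src, orig)) != NULL) {
        size_t before = (size_t)(p - src);
        memcpy(d, src, before);
        d += before;
        memcpy(d, repl, rlen);
        d += rlen;
        src = p + olen;
    }
    strcpy(d, src);             /* remaining tail, including the terminator */
    return out;
}
```

If the whole file has been read into one string, the returned string can simply be written back over the original file, which is the "enough memory" case mentioned above.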

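This code replaces all occurrences of the 'orig' text in a string. Note that it writes the result back into buf, so it is only safe when 'replace' is no longer than 'orig'. You can modify it as needed: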
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void
replaceAllString(char *buf, const char *orig, const char *replace)
{
int olen, rlen;
char *s, *d;
char *tmpbuf;
if (!buf || !*buf || !orig || !*orig || !replace)
return;
tmpbuf = malloc(strlen(buf) + 1);
if (tmpbuf == NULL)
return;
olen = strlen(orig);
rlen = strlen(replace);
s = buf;
d = tmpbuf;
while (*s) {
if (strncmp(s, orig, olen) == 0) {
strcpy(d, replace);
s += olen;
d += rlen;
}
else
*d++ = *s++;
}
*d = '\0';
strcpy(buf, tmpbuf);
free(tmpbuf);
}
int
main(int argc, char **argv)
{
char str[] = "malatya istanbul madrid newyork";
replaceAllString(str, "malatya", "ankara");
printf("%s\n", str);
replaceAllString(str, "madrid", "tokyo");
printf("%s\n", str);
return 0;
}

I'd look at using a buffer and work on this.
#include <stdio.h>
#include <string.h>
int main ( ) {
char buff[BUFSIZ]; // the input line
char newbuff[BUFSIZ]; // the results of any editing
char findme[] = "hello";
char replacewith[] = "world";
FILE *in, *out;
in = fopen( "file.txt", "r" );
out= fopen( "new.txt", "w" );
while ( fgets( buff, BUFSIZ, in ) != NULL ) {
if ( strstr( buff, findme ) != NULL ) {
// do 1 or more replacements
// the result should be placed in newbuff
// just watch you dont overflow newbuff...
} else {
// nothing to do - the input line is the output line
strcpy( newbuff, buff );
}
fputs( newbuff, out );
}
fclose( in );
fclose( out );
return 0;
}
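One possible way to fill in the "do 1 or more replacements" step without worrying about newbuff overflowing is to skip the second buffer and write the pieces of each line straight to the output file. This is only a sketch: copy_with_replace and its return convention are hypothetical, and a match that straddles two fgets reads (a line longer than BUFSIZ) would be missed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing every occurrence of
 * `find` with `repl`. Returns the number of replacements, or -1 on error. */
long copy_with_replace(FILE *in, FILE *out, const char *find, const char *repl)
{
    char buff[BUFSIZ];
    size_t flen = strlen(find);
    long count = 0;

    if (flen == 0)
        return -1;              /* an empty search string matches everywhere */

    while (fgets(buff, sizeof buff, in) != NULL) {
        const char *p = buff;
        const char *hit;

        while ((hit = strstr(p, find)) != NULL) {
            size_t before = (size_t)(hit - p);
            /* text before the match, then the replacement */
            if (fwrite(p, 1, before, out) != before || fputs(repl, out) == EOF)
                return -1;
            p = hit + flen;
            count++;
        }
        if (fputs(p, out) == EOF)   /* rest of the line after the last match */
            return -1;
    }
    return ferror(in) ? -1 : count;
}
```

In the program above, the body of the while loop would then collapse into a single call such as copy_with_replace(in, out, findme, replacewith), made once after both files are open.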
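
"rt" opens the file for reading only (the "t" is a non-standard text-mode flag that some C libraries accept). Use "r+" mode, which opens the file for both reading and writing. Note that when you switch between reading and writing on such a stream you need an fseek (or an fflush after a write) in between, and overwriting in place only works cleanly when the replacement has the same length as the original.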

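For the special case where both strings have the same length, a read/write stream is enough to patch the file in place. The following is a minimal sketch under that assumption; replace_same_length is a made-up name, the file is opened in binary mode so that ftell offsets are plain byte counts, and matches that straddle two fgets reads are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: overwrite every occurrence of `find` with `repl`
 * directly in the file. Only valid when both strings have the same length.
 * Returns the number of replacements, or -1 on error. */
long replace_same_length(const char *path, const char *find, const char *repl)
{
    size_t n = strlen(find);
    char line[BUFSIZ];
    long count = 0;
    FILE *fp;

    if (n == 0 || n != strlen(repl))
        return -1;
    fp = fopen(path, "rb+");
    if (fp == NULL)
        return -1;

    for (;;) {
        long start = ftell(fp);     /* file offset of the line being read */
        long next;
        char *hit;

        if (start < 0 || fgets(line, sizeof line, fp) == NULL)
            break;
        next = ftell(fp);           /* where the following read must resume */

        for (hit = line; (hit = strstr(hit, find)) != NULL; hit += n) {
            /* fseek is required when switching from reading to writing */
            if (fseek(fp, start + (long)(hit - line), SEEK_SET) != 0 ||
                fwrite(repl, 1, n, fp) != n) {
                fclose(fp);
                return -1;
            }
            count++;
        }
        /* ...and again before switching back to reading */
        if (fseek(fp, next, SEEK_SET) != 0) {
            fclose(fp);
            return -1;
        }
    }
    return fclose(fp) == 0 ? count : -1;
}
```

Replacing "is" with "was", as in the question, changes the length, so it still needs a temporary file or an in-memory copy of the whole file.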
Related

How to swap STDIO.H library functions with strictly system calls?

A simple program that takes an input file specified at the terminal, and alters the text to be reversed. How can the <stdio.h> functions be converted to only linux system calls? (I assume using only libraries like <unistd.h>)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char* concat(const char *str1, const char *str2)
{
char *answer = malloc(strlen(str1) + strlen(str2) + 1);
strcpy(answer, str1);
strcat(answer, str2);
return answer;
}
int main(int argc, char** argv) {
FILE * fp;
char * line = NULL;
size_t len = 0;
fp = fopen(argv[1], "r");
if (fp == NULL){
perror("\nError ");
exit(1);
}
char *rev1 = "rev ";
char *rev2 = {argv[1]};
char *rev3 = concat(rev1,rev2);
system(rev3);
fclose(fp);
return 0;
}
Thank you for any help. Company only wants me to use system calls for some reason, this internship is not going great!
don't understand how to properly implement [read and write]
Assuming you mean call them (not implement them), the catch with read and write is that they may read or write less than requested, so you have to call them in a loop.
size_t to_write = strlen(str);
while (to_write) {
ssize_t written = write(fd, str, to_write);
if (written < 0) {
perror(NULL);
exit(1);
}
str += written;
to_write -= written;
}
Reading works the same way if you know how much you need to read or if you're trying to read an entire file. (To read the entire file, read chunks until read returns 0. Multiples of 8*1024 are nice chunk sizes.)
Otherwise, it gets far more complicated. How do you know how much to read before you read it? If you want to read a line, for example, you have no idea how long the line is until you encounter the terminating line feed. You could read a character at a time, but that's very inefficient. You could do like stdio does and use a buffer that holds the excess. At which point you might as well use stdio.
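The two loops described above could be wrapped up roughly like this. The names write_all and read_upto are my own, and retrying on EINTR is an extra assumption about how interrupted calls should be treated.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write exactly len bytes.
 * Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: read until cap bytes are stored or end of file.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_upto(int fd, void *buf, size_t cap)
{
    char *p = buf;
    size_t total = 0;

    while (total < cap) {
        ssize_t n = read(fd, p + total, cap - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Both helpers return an error code instead of calling exit, so the caller decides how to report the failure.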
//#include<stdio.h> not used
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<fcntl.h>
char* swap(const char *one, const char *two)
{
char *result = malloc(strlen(one) + strlen(two) + 1);
strcpy(result, one);
strcat(result, two);
return result;
}
int main(int argc, char** argv) {
if (argc != 2){
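const char msg[] = "Error, wrong number of arguments!\n";
write(STDERR_FILENO, msg, sizeof msg - 1); // printf needs <stdio.h>, which is not included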
exit(1);
}
int fd = open(argv[1], O_RDONLY); //<------------------<
char * line = NULL;
size_t len = 0;
char *string1 = "rev ";
char *string2 = {argv[1]};
char *result = swap(string1,string2);
system(result);
return 0;
}
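The version above still hands the real work to the external rev program through system(). If the reversal itself has to be done with system calls only, one possible shape is sketched below. rev_lines and reverse_span are hypothetical names, the whole input is held in memory, and EINTR handling is left out to keep it short.

```c
#include <stdlib.h>
#include <unistd.h>

/* Reverse the bytes in [s, e) in place. */
static void reverse_span(char *s, char *e)
{
    while (e - s > 1) {
        char t = *s;
        *s++ = *--e;
        *e = t;
    }
}

/* Hypothetical helper: read everything from `in`, reverse the characters of
 * each line (as rev does), and write the result to `out`.
 * Returns 0 on success, -1 on error. */
int rev_lines(int in, int out)
{
    size_t cap = 8192, len = 0, start = 0, off = 0, i;
    char *buf = malloc(cap);

    if (buf == NULL)
        return -1;

    for (;;) {                          /* read until read() returns 0 */
        ssize_t n;
        if (len == cap) {               /* buffer full: double it */
            char *t = realloc(buf, cap * 2);
            if (t == NULL) { free(buf); return -1; }
            buf = t;
            cap *= 2;
        }
        n = read(in, buf + len, cap - len);
        if (n < 0) { free(buf); return -1; }
        if (n == 0)
            break;
        len += (size_t)n;
    }

    for (i = 0; i <= len; i++) {        /* reverse each line, keep the '\n' */
        if (i == len || buf[i] == '\n') {
            reverse_span(buf + start, buf + i);
            start = i + 1;
        }
    }

    while (off < len) {                 /* write() may be partial: loop */
        ssize_t n = write(out, buf + off, len - off);
        if (n < 0) { free(buf); return -1; }
        off += (size_t)n;
    }
    free(buf);
    return 0;
}
```

In main this would be called with the descriptor returned by open(argv[1], O_RDONLY) and STDOUT_FILENO, which needs <fcntl.h> in addition to the headers shown.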

How to read a file character by character and then store them into a dynamically allocated array

I have this project in C for school, and I have a problem. I've read pretty much all the topics about this question but none of them answered my question, that's why I'm asking for your help...
I've created a structure in which there is a pointer to an array of character, and I have a file with int and char.
I would like to create a function read_file that would read this file character by character, and then store them into a dynamically allocated array. But first I don't know how to read a file character by character, and second I don't know how to put these characters into the array...
Here is what I've written so far :
char* main(int argc, char *argv[]){
FILE *p_file;
char* code = malloc(1000*sizeof(char));
char* p = code;
p_file = fopen(argv[1],"rb+");
while((ch=getc(p_file)) != EOF){
*p++ = (char)fgetc(p_file);
}
return code;
free(code);
Could you light me up or send me to a link that could help ?
Thanks in advance !
Check the documentation on fread() and/or fgetc() about reading.
As for the array, it depends: if it is an array of char (e.g. char arr[100]), then you can store the characters in it as you read them. Hope this helps.
Tested code below.
Include headers:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
Function:
char* readFile(const char* filename)
{
FILE* fh = fopen(filename, "r");
char* result = NULL;
if (fh != NULL) {
size_t size = 1;
while (getc(fh) != EOF) {
size++;
}
result = (char*) malloc(sizeof(char) * size);
fseek(fh, 0, SEEK_SET); //Reset file pointer to begin
for (size_t i = 0; i < size - 1; i++) {
result[i] = (char) getc(fh);
}
result[size - 1] = '\0';
fclose(fh);
}
return result;
}
Usage:
char* result = readFile("test.txt");
if (result != NULL) {
printf("%s\n", result);
free(result);
}
Check this:
https://en.cppreference.com/w/c/io/fgetc
https://en.cppreference.com/w/c/memory/malloc
Hope this helps.
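The function above reads the file twice (once to count, once to copy). A single-pass alternative is to let the array grow with realloc as the characters arrive. This is just a sketch: read_file_growing, the initial capacity of 64 and the doubling strategy are arbitrary choices of mine.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file character by character into a
 * dynamically grown, null-terminated array. Returns NULL on error.
 * If out_len is not NULL it receives the number of characters read.
 * The caller must free() the result. */
char *read_file_growing(const char *filename, size_t *out_len)
{
    FILE *fh = fopen(filename, "rb");
    size_t cap = 64, len = 0;
    char *buf;
    int ch;                         /* int, so EOF stays distinguishable */

    if (fh == NULL)
        return NULL;
    buf = malloc(cap);
    if (buf == NULL) {
        fclose(fh);
        return NULL;
    }

    while ((ch = fgetc(fh)) != EOF) {
        if (len + 1 == cap) {       /* keep one byte free for the '\0' */
            char *t = realloc(buf, cap * 2);
            if (t == NULL) {
                free(buf);
                fclose(fh);
                return NULL;
            }
            buf = t;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    buf[len] = '\0';
    fclose(fh);

    if (out_len != NULL)
        *out_len = len;
    return buf;
}
```

Usage is the same as for readFile: check the result against NULL, use it, then free it.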

Subtlety in strstr?

I have a file of binary data with various character strings sprinkled throughout. I am trying to write a C code to find the first occurrence of user-specified strings in the file. (I know this can be done with bash but I need a C code for other reasons.) The code as it stands is:
#include <stdio.h>
#include <string.h>
#define CHUNK_SIZE 512
int main(int argc, char **argv) {
char *fname = argv[1];
char *tag = argv[2];
FILE *infile;
char *chunk;
char *taglcn = NULL;
long lcn_in_file = 0;
int back_step;
fpos_t pos;
// allocate chunk
chunk = (char*)malloc((CHUNK_SIZE + 1) * sizeof(char));
// find back_step
back_step = strlen(tag) - 1;
// open file
infile = fopen(fname, "r");
// loop
while (taglcn == NULL) {
// read chunk
memset(chunk, 0, (CHUNK_SIZE + 1) * sizeof(char));
fread(chunk, sizeof(char), CHUNK_SIZE, infile);
printf("Read %c\n", chunk[0]);
// look for tag
taglcn = strstr(chunk, tag);
if (taglcn != NULL) {
// if you find tag, add to location the offset in bytes from beginning of chunk
lcn_in_file += (long)(taglcn - chunk);
printf("HEY I FOUND IT!\n");
} else {
// if you don't find tag, add chunk size minus back_step to location and ...
lcn_in_file += ((CHUNK_SIZE - back_step) * sizeof(char));
// back file pointer up by back_step for next read
fseek(infile, -back_step, SEEK_CUR);
fgetpos(infile, &pos);
printf("%ld\n", pos);
printf("%s\n\n\n", chunk);
}
}
printf("%ld\n", lcn_in_file);
fclose(infile);
free(chunk);
}
If you're wondering, back_step is put in to take care of the unlikely eventuality that the string in question is split by a chunk boundary.
The file I am trying to examine is about 1Gb in size. The problem is that for some reason I can find any string within the first 9000 or so bytes, but beyond that, strstr is somehow not detecting any string. That is, if I look for a string located beyond 9000 or so bytes into the file, strstr does not detect it. The code reads through the entire file and never finds the search string.
I have tried varying CHUNK_SIZE from 128 to 50000, with no change in results. I have tried varying back_step as well. I have even put in diagnostic code to print out chunk character by character when strstr fails to find the string, and sure enough, the string is exactly where it is supposed to be. The diagnostic output of pos is always correct.
Can anyone tell me where I am going wrong? Is strstr the wrong tool to use here?
Since you say your file is binary, strstr() will stop scanning at the first null byte in the file.
If you wish to look for patterns in binary data, then the memmem() function is appropriate, if it is available. It is available on Linux and some other platforms (BSD, macOS, …) but it is not defined as part of standard C or POSIX. It bears roughly the same relation to strstr() that memcpy() bears to strcpy().
Note that your code should detect the number of bytes read by fread() and only search on that.
char *tag = …; // Identify the data to be searched for
size_t taglen = …; // Identify the data's length (maybe strlen(tag))
size_t nbytes; // fread() returns a size_t
while ((nbytes = fread(chunk, 1, (CHUNK_SIZE + 1), infile)) > 0)
{
…
taglcn = memmem(chunk, nbytes, tag, taglen);
if (taglcn != NULL)
…found it…
…
}
It isn't really clear why you have the +1 on the chunk size. The fread() function doesn't add null bytes at the end of the data or anything like that. I've left that aspect unchanged, but would probably not use it in my own code.
It is good that you take care of identifying a tag that spans the boundaries between two chunks.
The most likely reason for strstr to fail in your code is the presence of null bytes in the file. Furthermore, you should open the file in binary mode for the file offsets to be meaningful.
To scan for a sequence of bytes in a block, use the memmem() function. If it is not available on your system, here is a simple implementation:
#include <string.h>
void *memmem(const void *haystack, size_t n1, const void *needle, size_t n2) {
const unsigned char *p1 = haystack;
const unsigned char *p2 = needle;
if (n2 == 0)
return (void*)p1;
if (n2 > n1)
return NULL;
const unsigned char *p3 = p1 + n1 - n2 + 1;
for (const unsigned char *p = p1; (p = memchr(p, *p2, p3 - p)) != NULL; p++) {
if (!memcmp(p, p2, n2))
return (void*)p;
}
return NULL;
}
You would modify your program this way:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void *memmem(const void *haystack, size_t n1, const void *needle, size_t n2);
#define CHUNK_SIZE 65536
int main(int argc, char **argv) {
if (argc < 3) {
fprintf(stderr, "missing parameters\n");
exit(1);
}
// open file
char *fname = argv[1];
FILE *infile = fopen(fname, "rb");
if (infile == NULL) {
fprintf(stderr, "cannot open file %s: %s\n", fname, strerror(errno));
exit(1);
}
char *tag = argv[2];
size_t tag_len = strlen(tag);
size_t overlap_len = 0;
long long pos = 0;
char *chunk = malloc(CHUNK_SIZE + tag_len - 1);
if (chunk == NULL) {
fprintf(stderr, "cannot allocate memory\n");
exit(1);
}
// loop
for (;;) {
// read chunk
size_t chunk_len = overlap_len + fread(chunk + overlap_len, 1,
CHUNK_SIZE, infile);
if (chunk_len < tag_len) {
// end of file or very short file
break;
}
// look for tag
char *tag_location = memmem(chunk, chunk_len, tag, tag_len);
if (tag_location != NULL) {
// if you find tag, add to location the offset in bytes from beginning of chunk
printf("string found at %lld\n", pos + (tag_location - chunk));
break;
} else {
// if you don't find tag, keep the last tag_len - 1 bytes as overlap for the next read
overlap_len = tag_len - 1;
memmove(chunk, chunk + chunk_len - overlap_len, overlap_len);
pos += chunk_len - overlap_len;
}
}
fclose(infile);
free(chunk);
return 0;
}
Note that the file is read in chunks of CHUNK_SIZE bytes, which is optimal if CHUNK_SIZE is a multiple of the file system block size.
For some really simple code, you can use mmap() and memcmp().
Error checking and proper header files are left as an exercise for the reader (there is at least one bug - another exercise for the reader to find):
int main( int argc, char **argv )
{
// put usable names on command-line args
char *fname = argv[ 1 ];
char *tag = argv[ 2 ];
// mmap the entire file
int fd = open( fname, O_RDONLY );
struct stat sb;
fstat( fd, &sb );
char *contents = mmap( NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0 );
close( fd );
size_t tag_len = strlen( tag );
size_t bytes_to_check = 1UL + sb.st_size - tag_len;
for ( size_t ii = 0; ii < bytes_to_check; ii++ )
{
if ( !memcmp( contents + ii, tag, tag_len ) )
{
// match found
// (probably want to check if contents[ ii + tag_len ]
// is a `\0' char to get actual string matches)
}
}
munmap( contents, sb.st_size );
return( 0 );
}
That likely won't be anywhere near the fastest way (in general, mmap() is not going to be anywhere near a performance winner, especially in this use case of simply streaming through a file from beginning to end), but it's simple.
(Note that mmap() also has problems if the file size changes while it's being read. If the file grows, you won't see the additional data. If the file is shortened, you'll get SIGBUS when you try to read the removed data.)
A binary data file is going to contain '\0' bytes acting as string ends. The more that are in there, the shorter the area strstr is going to search will be. Note strstr will consider its work done once it hits a 0 byte.
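You can scan the chunk piecewise, restarting the search after each null byte:
for (char *p = chunk; p < chunk + CHUNK_SIZE; p += strlen (p) + 1)
if ((taglcn = strstr (p, tag)) != NULL)
break;
i.e. restart after a null byte in the chunk as long as you are still within the chunk. This relies on the extra zero byte at chunk[CHUNK_SIZE] that your memset provides.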

Why cannot I free the memory?(Debug Error)

I need remove punctuation from a given string or a word. Here's my code:
void remove_punc(char* *str)
{
char* ps = *str;
char* nstr;
// should be nstr = malloc(sizeof(char) * (1 + strlen(*str)))
nstr = (char *)malloc(sizeof(char) * strlen(*str));
if (nstr == NULL) {
perror("Memory Error in remove_punc function");
exit(1);
}
// should be memset(nstr, 0, sizeof(char) * (1 + strlen(*str)))
memset(nstr, 0, sizeof(char) * strlen(*str));
while(*ps) {
if(! ispunct(*ps)) {
strncat(nstr, ps, 1);
}
++ps;
}
*str = strdup(nstr);
free(nstr);
}
If my main function is the simple one:
int main(void) {
char* str = "Hello, World!:)";
remove_punc(&str);
printf("%s\n", str);
return 0;
}
It works! The output is Hello World.
Now I want to read in a big file and remove punctuation from the file, then output to another file.
Here's another main function:
int main(void) {
FILE* fp = fopen("book.txt", "r");
FILE* fout = fopen("newbook.txt", "w");
char* str = (char *)malloc(sizeof(char) * 1024);
if (str == NULL) {
perror("Error -- allocating memory");
exit(1);
}
memset(str, 0, sizeof(char) * 1024);
while(1) {
if (fscanf(fp, "%s", str) != 1)
break;
remove_punc(&str);
fprintf(fout, "%s ", str);
}
return 0;
}
When I rerun the program in Visual C++, it reports a
Debug Error! DAMAGE: after Normal Block(#54)0x00550B08,
and the program is aborted.
So, I have to debug the code. Everything works until the statement free(nstr) being executed.
I get confused. Anyone can help me?
You forgot to malloc space for the null terminator. Change
nstr = (char *)malloc(sizeof(char) * strlen(*str));
to
nstr = malloc( strlen(*str) + 1 );
Note that casting malloc is a bad idea, and if you are going to malloc and then memset to zero, you could use calloc instead which does just that.
There is another bug later in your program. The remove_punc function changes str to point to a freshly-allocated buffer that is just big enough for the string with no punctuation. However you then loop up to fscanf(fp, "%s", str). This is no longer reading into a 1024-byte buffer, it is reading into just the buffer size of the previous punctuation-free string.
So unless the words in your file happen to come in descending order of length (after punctuation removal), you will cause a buffer overflow here. You'll need to rethink your design of this loop. For example perhaps you could have remove_punc leave the input unchanged, and return a pointer to the freshly-allocated string, which you would free after printing.
If you go with this solution, then use %1023s to avoid a buffer overflow with fscanf (unfortunately there's no simple way to take a variable here instead of hardcoding the length). Using a scanf function with a bare "%s" is just as dangerous as gets.
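A minimal sketch of that suggestion, assuming a new name remove_punc_copy so it is not confused with the original function: the input is left untouched and the caller owns the returned string.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of remove_punc: returns a newly allocated copy of s
 * without punctuation, or NULL if malloc fails. The input is not modified
 * and the caller must free() the result. */
char *remove_punc_copy(const char *s)
{
    char *out = malloc(strlen(s) + 1);  /* +1 for the null terminator */
    char *d = out;

    if (out == NULL)
        return NULL;
    for (; *s != '\0'; s++) {
        /* the cast avoids undefined behavior for negative char values */
        if (!ispunct((unsigned char)*s))
            *d++ = *s;
    }
    *d = '\0';
    return out;
}
```

The loop in main then keeps reading into the same 1024-byte buffer with fscanf(fp, "%1023s", str), prints the string returned by remove_punc_copy(str), and frees that string before the next iteration.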
The answer by @MatMcNabb explains the causes of your problems. I'm going to suggest a couple of ways you can simplify your code and make it less susceptible to memory problems.
If performance is not an issue, read the file character by character and discard the punctuation characters.
int main(void)
{
FILE* fp = fopen("book.txt", "r");
FILE* fout = fopen("newbook.txt", "w");
int c; // int, not char, so that EOF can be told apart from valid characters
while ( (c = fgetc(fp)) != EOF )
{
if ( !ispunct(c) )
{
fputc(c, fout);
}
}
fclose(fout);
fclose(fp);
return 0;
}
Minimize the number of calls to malloc and free by passing in the input string as well as the output string to remove_punc.
void remove_punc(char* inStr, char* outStr)
{
char* ps = inStr;
int index = 0;
while(*ps)
{
if(! ispunct((unsigned char)*ps))
{
outStr[index++] = *ps;
}
++ps;
}
outStr[index] = '\0';
}
and change the way you use remove_punc in main.
int main(void)
{
FILE* fp = fopen("book.txt", "r");
FILE* fout = fopen("newbook.txt", "w");
char inStr[1024];
char outStr[1024];
while (fgets(inStr, 1024, fp) != NULL )
{
remove_punc(inStr, outStr);
fprintf(fout, "%s", outStr);
}
fclose(fout);
fclose(fp);
return 0;
}
In your main you have the following
char* str = (char *)malloc(sizeof(char) * 1024);
...
remove_punc(&str);
...
Your remove_punc() function takes the address of str but when you do this in your remove_punc function
...
*str = strdup(nstr);
...
you are not copying the new string to the previously allocated buffer; you are reassigning str to point to a new buffer that is only as large as the current string! This means that when the next string read from the file is longer than the previous one, you will run into trouble.
You should leave the original buffer alone and instead return the newly allocated buffer containing the new string (e.g. return nstr) and free it when you are done with it. Better yet, just copy the original file byte by byte to the new file and exclude any punctuation. That would be far more efficient.

C - Recursively reverse string from one file into another file

I'm trying to reverse a string from a text file using recursion into another text file. The reversed string will be stored in a char array, and buffer will become that array. buffer will then be fprintf-ed to the new file. This is what I have so far.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int
reverse(char *ch, char *str) //receives "buffer" as argument. str traverses ch
{
char array[20]; //will store the reversed string
if(*str == '\0')
return 0; //arrived at end of string
return(reverse(ch, str+1) + 1); //don't know if this is correct
}
//I want to use the returned number as the index number. For example, if I have
//string "abcd", string[0]='d', string[1]='c', string[2]='b', string[3]='a'. Problem is,
//how do I do it?
int main(int argc, char *argv[]) //argv[1] is input file. argv[2] is output file printed backwards
{
FILE *fp1, *fp2;
char *p, buffer[20]; //p points to buffer
fp1 = fopen("a.txt", "r");
if(fp1 == NULL)
{
printf("The file does not exist.\n");
return 0;
}
p = buffer;
while(fgets(buffer, 20, fp1) != NULL) //reads the first 20 characters of file.txt into buffer
{
reverse(buffer, p); //passes "buffer" as argument
fprintf(fp2, "%s\n", buffer);
}
printf("File %s has been successfully reversed into file %s!\n", argv[1], argv[2]);
fclose(fp1);
fclose(fp2);
return 0;
}
Since I am new to recursion, I only have the faintest idea of how to implement my reverse function.
Reversing a string is easier and faster via an iterative loop, but to make a recursive function you could have the function swap the first and last chars, then repeat the process on the smaller string in between:
abcde
^ ^ first call
^ ^ next call
^ end
---
void reverse(char *s, char *e) {
char tmp = *s;
*s++ = *e;
*e-- = tmp;
if (e > s) reverse (s, e);
}
Here s points to the first char and e to the last char. Note that the initial string must have a length > 0 (or a test could be added to the function).
Example (with <stdio.h> and <string.h> included):
int main () {
char x[] = "abcde";
reverse(x, x+strlen(x)-1);
printf("%s\n", x);
return 0;
}
outputs edcba.
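To use this on lines read with fgets, the empty-string case and the trailing newline both need handling. One way this might look is sketched below; reverse_range and reverse_line are hypothetical names, and the recursion depth is about half the line length, so very long lines are better served by a loop.

```c
#include <string.h>

/* Recursively swap the characters at s and e, moving inward.
 * Safe for s >= e, so it can never step outside the string. */
static void reverse_range(char *s, char *e)
{
    char tmp;

    if (s >= e)
        return;                 /* zero or one character left: done */
    tmp = *s;
    *s = *e;
    *e = tmp;
    reverse_range(s + 1, e - 1);
}

/* Hypothetical wrapper: reverse one line in place, leaving a trailing
 * '\n' (as stored by fgets) where it is. */
void reverse_line(char *line)
{
    size_t len = strcspn(line, "\n");   /* length without the newline */

    if (len > 1)
        reverse_range(line, line + len - 1);
}
```

In the question's loop this would be called as reverse_line(buffer) after each fgets, followed by fputs(buffer, fp2) once fp2 has actually been opened for writing.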
