I need to remove punctuation from a given string or word. Here's my code:
void remove_punc(char* *str)
{
char* ps = *str;
char* nstr;
// should be nstr = malloc(sizeof(char) * (1 + strlen(*str)))
nstr = (char *)malloc(sizeof(char) * strlen(*str));
if (nstr == NULL) {
perror("Memory Error in remove_punc function");
exit(1);
}
// should be memset(nstr, 0, sizeof(char) * (1 + strlen(*str)))
memset(nstr, 0, sizeof(char) * strlen(*str));
while(*ps) {
if(! ispunct(*ps)) {
strncat(nstr, ps, 1);
}
++ps;
}
*str = strdup(nstr);
free(nstr);
}
If my main function is the simple one:
int main(void) {
char* str = "Hello, World!:)";
remove_punc(&str);
printf("%s\n", str);
return 0;
}
It works! The output is Hello World.
Now I want to read in a big file and remove punctuation from the file, then output to another file.
Here's another main function:
int main(void) {
FILE* fp = fopen("book.txt", "r");
FILE* fout = fopen("newbook.txt", "w");
char* str = (char *)malloc(sizeof(char) * 1024);
if (str == NULL) {
perror("Error -- allocating memory");
exit(1);
}
memset(str, 0, sizeof(char) * 1024);
while(1) {
if (fscanf(fp, "%s", str) != 1)
break;
remove_punc(&str);
fprintf(fout, "%s ", str);
}
return 0;
}
When I rerun the program in Visual C++, it reports a
Debug Error! DAMAGE: after Normal Block(#54)0x00550B08,
and the program is aborted.
So I had to debug the code. Everything works until the statement free(nstr) is executed.
I'm confused. Can anyone help me?
You forgot to malloc space for the null terminator. Change
nstr = (char *)malloc(sizeof(char) * strlen(*str));
to
nstr = malloc( strlen(*str) + 1 );
Note that casting malloc is a bad idea, and if you are going to malloc and then memset to zero, you could use calloc instead which does just that.
There is another bug later in your program. The remove_punc function changes str to point to a freshly-allocated buffer that is just big enough for the string with no punctuation. However you then loop up to fscanf(fp, "%s", str). This is no longer reading into a 1024-byte buffer, it is reading into just the buffer size of the previous punctuation-free string.
So unless your file contains lines all in descending order of length (after punctuation removal), you will cause a buffer overflow here. You'll need to rethink your design of this loop. For example perhaps you could have remove_punc leave the input unchanged, and return a pointer to the freshly-allocated string, which you would free after printing.
If you go with this solution, then use %1023s to avoid a buffer overflow with fscanf (unfortunately there's no simple way to take a variable here instead of hardcoding the length). Using a scanf function with a bare "%s" is just as dangerous as gets.
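A minimal sketch of the redesign suggested above, assuming a hypothetical helper named remove_punc_copy (not part of the original code) that leaves its input untouched and returns a freshly allocated string for the caller to free:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of src with the
 * punctuation removed, or NULL if allocation fails. src is not modified;
 * the caller must free the result. */
char *remove_punc_copy(const char *src)
{
    assert(src != NULL);

    /* +1 leaves room for the terminating '\0' */
    char *dst = malloc(strlen(src) + 1);
    if (dst == NULL)
        return NULL;

    size_t n = 0;
    for (; *src != '\0'; ++src) {
        /* ispunct expects a value representable as unsigned char */
        if (!ispunct((unsigned char)*src))
            dst[n++] = *src;
    }
    dst[n] = '\0';
    return dst;
}
```

In the reading loop you would then keep the 1024-byte buffer for fscanf(fp, "%1023s", str), call remove_punc_copy(str), print the result, and free it before the next iteration.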
The answer by @MatMcNabb explains the causes of your problems. I'm going to suggest a couple of ways you can simplify your code and make it less susceptible to memory problems.
If performance is not an issue, read the file character by character and discard the punctuation characters.
int main(void)
{
FILE* fp = fopen("book.txt", "r");
FILE* fout = fopen("newbook.txt", "w");
int c; // int, not char, so that EOF can be told apart from a valid character
while ( (c = fgetc(fp)) != EOF )
{
if ( !ispunct(c) )
{
fputc(c, fout);
}
}
fclose(fout);
fclose(fp);
return 0;
}
Minimize the number of calls to malloc and free by passing in the input string as well as the output string to remove_punc.
void remove_punc(char* inStr, char* outStr)
{
char* ps = inStr;
int index = 0;
while(*ps)
{
if(! ispunct(*ps))
{
outStr[index++] = *ps;
}
++ps;
}
outStr[index] = '\0';
}
and change the way you use remove_punc in main.
int main(void)
{
FILE* fp = fopen("book.txt", "r");
FILE* fout = fopen("newbook.txt", "w");
char inStr[1024];
char outStr[1024];
while (fgets(inStr, 1024, fp) != NULL )
{
remove_punc(inStr, outStr);
fprintf(fout, "%s", outStr);
}
fclose(fout);
fclose(fp);
return 0;
}
In your main you have the following
char* str = (char *)malloc(sizeof(char) * 1024);
...
remove_punc(&str);
...
Your remove_punc() function takes the address of str but when you do this in your remove_punc function
...
*str = strdup(nstr);
...
you are not copying the new string to the previously allocated buffer, you are reassigning str to point to the new line sized buffer! This means that when you read lines from the file and the next line to be read is longer than the previous line you will run into trouble.
You should leave the original buffer alone and instead return the newly allocated buffer containing the new string (e.g. return nstr), then free it when you are done with it. Better yet, just copy the original file byte by byte to the new file and exclude any punctuation. That would be far more efficient.
I have a pointer of pointer to store lines I read from a file;
char **lines;
And I'm assigning them like this :
line_no=0;
*(&lines[line_no++])=buffer;
But it crashes. Why?
According to my logic, the & should give the pointer to the zeroth index, and then *var=value is how to store a value through a pointer. Isn't it?
Here is my current complete code :
void read_file(char const *name,int len)
{
int line_no=0;
FILE* file;
int buffer_length = 1024;
char buffer[buffer_length];
file = fopen(name, "r");
while(fgets(buffer, buffer_length, file)) {
printf("---%s", buffer);
++line_no;
if(line_no==0)
{
lines = (char**)malloc(sizeof(*lines) * line_no);
}
else
{
lines = (char**)realloc(lines,sizeof(*lines) * line_no);
}
lines[line_no-1] = (char*)malloc(sizeof(buffer));
lines[line_no-1]=buffer;
printf("-------%s--------\n", *lines[line_no-1]);
}
fclose(file);
}
You have just a pointer, nothing more. You need to allocate memory using malloc().
Actually, you need first to allocate memory for pointers, then allocate memory for strings.
N lines, each M characters long:
char** lines = malloc(sizeof(*lines) * N);
for (int i = 0; i < N; ++i) {
lines[i] = malloc(sizeof(*(lines[i])) * M);
}
You are also taking an address and then immediately dereferencing it - something like *(&foo) makes little to no sense.
For updated code
Oh, there is so much wrong with that code...
You need to include stdlib.h to use malloc()
lines is undeclared. The char** lines is missing before loop
if in loop checks whether line_no is 0. If it is, then it allocates lines. The problem is, variable line_no is 0 - sizeof(*lines) times 0 is still zero. It allocates no memory.
But! There is ++line_no at the beginning of the loop, therefore line_no is never 0, so malloc() isn't called at all.
lines[line_no-1] = buffer; - it doesn't copy from buffer to lines[line_no-1], it just assigns pointers. To copy strings in C you need to use strcpy()
fgets() adds new line character at the end of buffer - you probably want to remove it: buffer[strcspn(buffer, "\n")] = '\0';
Argument len is never used.
char buffer[buffer_length]; - don't use VLA
It would be better to increment line_no at the end of the loop instead of constantly calculating line_no-1
In C, casting result of malloc() isn't mandatory
There is no check, if opening file failed
You aren't freeing the memory
Considering all of this, I quickly "corrected" it to such state:
void read_file(char const* name)
{
FILE* file = fopen(name, "r");
if (file == NULL) {
return;
}
int buffer_length = 1024;
char buffer[1024];
char** lines = malloc(0);
int line_no = 0;
while (fgets(buffer, buffer_length, file)) {
buffer[strcspn(buffer, "\n")] = '\0';
printf("---%s\n", buffer);
lines = realloc(lines, sizeof (*lines) * (line_no+1));
lines[line_no] = malloc(sizeof (*lines[line_no]) * buffer_length);
strcpy(lines[line_no], buffer);
printf("-------%s--------\n", lines[line_no]);
++line_no;
}
fclose(file);
for (int i = 0; i < line_no; ++i) {
free(lines[i]);
}
free(lines);
}
Ok, you have a couple of errors here:
lines array is not declared
Your allocation is wrong
I don't understand this line, it is pointless to allocate something multiplying it by zero
if( line_no == 0 )
{
lines = (char**)malloc(sizeof(*lines) * line_no);
}
You shouldn't allocate array with just one element and constantly reallocate it. It is a bad practice, time-consuming, and can lead to some bigger problems later.
I recommend you to check this Do I cast the result of malloc? for malloc casting.
You could write something like this:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
void read_file(char const *name)
{
int line_no = 0, arr_size = 10;
int buffer_length = 1024;
char buffer[buffer_length];
char **lines;
FILE* file;
lines = malloc(sizeof(char*) * 10);
file = fopen(name, "r");
while(fgets(buffer, buffer_length, file)) {
buffer[strlen(buffer)-1] = '\0';
printf("---%s", buffer);
++line_no;
if(line_no == arr_size)
{
arr_size += 10;
lines = realloc(lines, sizeof(char*) * arr_size);
}
lines[line_no-1] = malloc(strlen(buffer) + 1);
strcpy(lines[line_no-1], buffer); // copy the text; assigning buffer would only alias it and leak the allocation
printf("-------%s--------\n", lines[line_no-1]);
}
fclose(file);
}
PS, fgets() also takes the '\n' char at the end, in order to prevent this you can write the following line: buffer[strlen(buffer)-1] = '\0';
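One caveat with buffer[strlen(buffer)-1] = '\0': it assumes the line really ends in '\n'. The last line of a file may not, and an empty string would index position -1. A small sketch of a safer variant, using a hypothetical helper named strip_newline and the strcspn idiom mentioned in the other answer:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: cuts the string at the first '\n', if there is one,
 * and returns the resulting length. A string without a newline is left as is. */
size_t strip_newline(char *line)
{
    assert(line != NULL);

    /* strcspn returns the index of the first '\n', or strlen(line) if none */
    size_t len = strcspn(line, "\n");
    line[len] = '\0';
    return len;
}
```

Calling strip_newline(buffer) right after a successful fgets() would replace the strlen-based line.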
I have this code:
#include<stdio.h>
#include<stdlib.h>
int main(void)
{
char **string = malloc(sizeof(char) * 20);
FILE *fp = fopen("input.txt", "r");
fscanf(fp, "%s", *string);
printf("%s\n", *string);
}
This code generates a segmentation fault. However, if I change **string to be a single character pointer and change the *strings to string it works. Why is this? And how can I use fscanf with arrays of pointers?
Thanks.
char **string = malloc(sizeof(char*)); // Pointer to pointer --> Alloc size of a POINTER
*string = malloc(sizeof(char) * 20); // Dereference and then you can malloc chars
When you allocate a pointer to a pointer, you allocate the size of the pointer first. You then dereference the variable and allocate the size of the contents of the pointer, in this case, the number of characters that it points to.
Also, your usage of fscanf is not only unsafe, but totally unnecessary as well.
Use fgets instead:
fgets( *string, 20, fp );
If you want to allocate an array of pointers to chars, then multiply the sizeof char* by the number entries when allocating the pointer-to-pointer. You must also use a for loop to allocate memory for each character pointer as shown above.
// Example code
char **string = malloc(sizeof(char*) * 10); // Allocates an array of 10 character pointers
if (string == 0) {
fprintf(stderr, "Memory allocation failed.");
exit(1);
}
int i = 0;
FILE *fp = fopen("input.txt", "r");
if (fp == 0) {
fprintf(stderr, "Couldn't open input.txt for reading.");
exit(1);
}
for (; i < 10; ++i) {
string[i] = malloc(sizeof(char) * 20); // For each char pointer, allocates enough memory for 20 characters
if (string[i] == 0) {
fprintf(stderr, "Memory allocation failed.");
exit(1);
}
fgets(string[i], 20, fp);
printf("%s\n", string[i]);
}
Use simple char * pointer instead of double pointer:
char *string = malloc(sizeof(char) * 20);
FILE *fp = fopen("input.txt", "r");
fscanf(fp, "%s", string);
printf("%s\n", string);
Double pointer is pointer to a string (or array of strings), and the first pointer is not initialized anywhere in the original code. Additionally, the first malloc would have to look like malloc(sizeof(char *)*20) - that would give array of 20 strings (which would then need to be properly initialized in the loop)...
Also, not specifying the maximum size of the string is prone to buffer overflow errors, so worth looking at the limits, return values and things like that too.
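As a rough sketch of what checking the limits and return values could look like for a 20-byte buffer (the helper name read_word20 is made up for this example):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited word of at most
 * 19 characters into out, which must hold at least 20 bytes.
 * Returns 1 if a word was read, 0 on end of file or error. */
int read_word20(FILE *fp, char out[20])
{
    assert(fp != NULL && out != NULL);

    /* "%19s" stores at most 19 characters plus the terminating '\0';
     * fscanf returns the number of fields converted, so compare with 1 */
    return fscanf(fp, "%19s", out) == 1;
}
```

A word longer than 19 characters is not lost: the remainder is picked up by the next call, so you may want to treat that case specially.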
This solution is for an array of strings, it's also possible to do a realloc for each time we want to add another string to the array.
#include<stdio.h>
#include<stdlib.h>
int main(void)
{
char **string = (char**)malloc(sizeof(char*) * 2); // array with 2 strings
string[0] = (char*)malloc(sizeof(char)*20);
FILE *fp = fopen("input.txt", "r");
fscanf(fp, "%s", string[0]);
fclose(fp); // Remember to close after you don't need file handle anymore
printf("%s\n", string[0]);
string[1] = (char*)malloc(sizeof(char)*20);
FILE *fp2 = fopen("input2.txt", "r");
fscanf(fp2, "%s", string[1]);
fclose(fp2);
printf("%s\n", string[1]);
return 0;
}
FILE *file;
file = fopen(argv[1], "r");
char *match = argv[2];
if (file == NULL) {
printf("File does not exist\n");
return EXIT_FAILURE;
}
int numWords = 0, memLimit = 20;
char** words = (char**) calloc(memLimit, sizeof(char));
printf("Allocated initial array of 20 character pointers.\n");
char string[20];
while (fscanf(file, "%[a-zA-Z]%*[^a-zA-Z]", string) != EOF) {
words[numWords] = malloc(strlen(string) + 1 * sizeof(char));
strcpy(words[numWords], string);
printf("Words: %s\n", words[numWords]);
numWords++; /*keep track of indexes, to realloc*/
if (numWords == memLimit) {
memLimit = 2 * memLimit;
words = (char**) realloc(words, memLimit * sizeof(char*)); /*Fails here*/
printf("Reallocated array of %d character pointers.\n", memLimit);
}
}
Code should open and read a file containing words with punctuation, spaces etc and store in a string, but after 20 tries it throws an error, and I can't seem to get realloc() to work here, which I'm expecting to be the problem. The array is dynamically allocated 20 char pointers, at which when limit is reached, it should realloc by double. How can I get around this?
Two notes. First, you shouldn't ever cast the return value of calloc/malloc/realloc. See this for more information.
Second, as others have pointed out in comments, the first calloc statement uses sizeof(char) and not sizeof(char*) like it should.
words is a pointer to a pointer. The idea is to allocate an array of pointers.
The below is wrong as it allocates for memLimit characters rather than memLimit pointers.
This is the main issue
char** words = (char**) calloc(memLimit, sizeof(char)); // bad
So use an easy idiom: allocate memLimit groups of whatever words points to. It is easier to write, read and maintain.
char** words = calloc(memLimit, sizeof *words);
Avoid the while (scanf() != EOF) hole. Recall that various results can come from scanf() family. It returns the count of successfully scanned fields or EOF. That is typically 1 of at least 3 options. So do not test for one result you do not want, test for the one result you do want.
// while (fscanf(file, "%[a-zA-Z]%*[^a-zA-Z]", string) != EOF) {
while (fscanf(file, "%[a-zA-Z]%*[^a-zA-Z]", string) == 1) {
The above example may not ever return 0, but the one below easily could.
int d;
while (fscanf(file, "%d", &d) == 1) {
@Enzo Ferber rightly suggests using "%s". I further recommend following the above idiom and restricting the input width to 1 less than the size of the buffer.
char string[20];
while (fscanf(file, "%19s", string) == 1) {
Suggest the habit of checking allocation result.
// better to use `size_t` rather than `int `for array sizes.
size_t newLimit = 2u * memLimit;
char** newptr = realloc(words, newLimit * sizeof *newptr);
if (newptr == NULL) {
puts("Out-of-memory");
// Code still can use old `words` pointer of size `memLimit * sizeof *words`
return -1;
}
memLimit = newLimit;
words = newptr;
}
Errors
Don't cast malloc/calloc returns. There's no need for it.
Your first sizeof is wrong. It should be sizeof(char*)
That scanf() format string. %s does the job just fine.
Code
The following code worked for me (printed one word per line):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char *argv[])
{
FILE *file;
file = fopen(argv[1], "r");
char *match = argv[2];
if (file == NULL) {
printf("File does not exist\n");
return EXIT_FAILURE;
}
int numWords = 0, memLimit = 20;
char **words = calloc(memLimit, sizeof(char*));
printf("Allocated initial array of 20 character pointers.\n");
char string[20];
while (fscanf(file, "%s", string) != EOF) {
words[numWords] =
malloc(strlen(string) + 1 * sizeof(char));
strcpy(words[numWords], string);
printf("Words: %s\n", words[numWords]);
numWords++; /*keep track of indexes, to realloc */
if (numWords == memLimit) {
memLimit = 2 * memLimit;
words = realloc(words, memLimit * sizeof(char *));
printf
("Reallocated array of %d character pointers.\n",
memLimit);
}
}
}
Called with ./realloc realloc.c
Hope it helps.
Your first allocation is the problem. You allocate 20 chars and treat them as 20 char pointers. You overrun the allocated buffer and corrupt your memory.
The second allocation fails because the heap is corrupted.
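Putting the points from the answers above together, here is one possible sketch of a growable word array. The struct and function names are hypothetical, and doubling from an initial capacity of 20 simply mirrors the question:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of strings; start with {NULL, 0, 0}. */
struct word_list {
    char **items;
    size_t count;
    size_t capacity;
};

/* Appends a copy of word. Returns 0 on success, -1 if an allocation fails;
 * on failure the words stored so far remain valid. */
int word_list_push(struct word_list *list, const char *word)
{
    assert(list != NULL && word != NULL);

    if (list->count == list->capacity) {
        size_t new_cap = list->capacity ? 2 * list->capacity : 20;
        /* sizeof *new_items is sizeof(char *), not sizeof(char) */
        char **new_items = realloc(list->items, new_cap * sizeof *new_items);
        if (new_items == NULL)
            return -1;          /* list->items is still the old block */
        list->items = new_items;
        list->capacity = new_cap;
    }

    char *copy = malloc(strlen(word) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, word);
    list->items[list->count++] = copy;
    return 0;
}

/* Frees every stored word and the pointer array itself. */
void word_list_free(struct word_list *list)
{
    assert(list != NULL);
    for (size_t i = 0; i < list->count; ++i)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

Inside the reading loop, each successfully scanned string would be passed to word_list_push, and word_list_free would be called once at the end.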
I have a certain txt file (for instance, dic.txt) in which words appear in this order:
hello - ola - hiya \n
chips - fries - frenchfries \n
I need to read the contents of the file into an array of string arrays:
for instance:
array[0] : [hello,ola,hiya]
array[1] : [chips,fries,frenchfries]
I was thinking of using strtok to split each line in the file into strings (after copying the entire file into a string and counting the lines), but I could not figure out how to split each line ("hello - ola - hiya \n") into words and store each array of words in the outer array (an array of strings within an array).
I was considering using malloc to allocate memory for each line of words and storing the pointer to each line's string array in the outer array, but I would be glad to receive any suggestions.
The straightforward way to read lines from a file and then split them into tokens is to read lines with fgets and then use strtok to split each line into tokens:
int main(int argc, char *argv[])
{
// Check for arguments and file pointer omitted
FILE *f = fopen(argv[1], "r");
for (;;) {
char line[80];
char *token;
if (fgets(line, 80, f) == NULL) break;
token = strtok(line, " -\n");
while (token) {
// Do something with token, for example:
printf("'%s' ", token);
token = strtok(NULL, " -\n");
}
}
fclose(f);
return 0;
}
This approach is fine as long as all the lines in your file are shorter than 80 characters. It works for variable numbers of tokens per line.
You have mentioned the issue of handling memory for the lines. The example above assumes that the memory handling is done by the data structure for each word. (It's not part of the example, which just prints the tokens.)
You can malloc memory for each line, which is more flexible than a rigid character limit per line, but you'll end up with a lot of allocations. The benefit is that your words don't need extra memory, they can just be pointers into the lines, but you'll have to take care of properly allocating memory for the lines - and freeing it afterwards.
If you read the whole text file to a contiguous chunk of memory, you're basically done with memory storage, as long as you keep that chunk "alive" as long as your words live:
char *slurp(const char *filename, int *psize)
{
char *buffer;
int size;
FILE *f;
f = fopen(filename, "r");
if (f == NULL) return NULL;
fseek(f, 0, SEEK_END);
size = ftell(f);
fseek(f, 0, SEEK_SET);
buffer = malloc(size + 1);
if (buffer) {
if (fread(buffer, 1, size, f) < (size_t)size) {
free(buffer);
buffer = NULL; // don't return a dangling pointer
} else {
buffer[size] = '\0';
if (psize) *psize = size;
}
}
fclose(f);
return buffer;
}
With that chunk of memory, you can first look for lines by looking for the next newline, and then use strtok as above:
int main(int argc, char *argv[])
{
char *buffer; // contiguous memory chunk
char *next; // pointer to next line or NULL for last line
buffer = slurp(argv[1], NULL);
if (buffer == NULL) return 0;
next = buffer;
while (next) {
char *token;
char *p = next;
// Find beginning of the next line,
// i.e. the char after the next newline
next = strchr(p, '\n');
if (next) {
*next = '\0'; // Null-terminate line
next = next + 1; // Advance past newline
}
token = strtok(p, " -\n");
while (token) {
// Do something with token, for example:
printf("'%s' ", token);
token = strtok(NULL, " -\n");
}
}
free(buffer); // ... and invalidate your words
return 0;
}
If you use fscanf, you always copy the found tokens to a temporary buffer and when you store them away in your dictionary structure, you have to copy them again with strcpy. That's a lot of copying. Here, you read and allocate once and then work with pointers into the chunk. strtok null-terminates the tokens, so your chunk is a chain of C strings.
Reading the whole file into memory is usually not a good solution, but in this case, where the file basically is the data, it makes sense.
(Note: All this discussion about memory does not affect the memory needed for your dictionary structure, the nodes in trees and linked lists or whatever. It is just about storing the strings proper.)
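To get from printed tokens to the array of word arrays the question asks for, the tokens of each line could be collected into an array of pointers. A minimal sketch, assuming a hypothetical helper split_line and a caller-supplied limit on words per line:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: splits line in place on spaces, hyphens and newlines.
 * Stores up to max token pointers in out and returns how many were stored.
 * The pointers refer into line, so line must outlive them. */
size_t split_line(char *line, char **out, size_t max)
{
    assert(line != NULL && out != NULL);

    size_t n = 0;
    char *token = strtok(line, " -\n");
    while (token != NULL && n < max) {
        out[n++] = token;
        token = strtok(NULL, " -\n");
    }
    return n;
}
```

Each line then needs its own out array (for instance one row of a two-dimensional array of pointers) to form the outer array. Note that with these delimiters a hyphenated word would also be split in two.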
Using fgets:
int eol(int c, FILE *stream) //given a char and the file, check if eol included
{
if (c == '\n')
return 1;
if (c == '\r') {
if ((c = getc(stream)) != '\n')
ungetc(c, stream);
return 1;
}
return 0;
}
int charsNumInLine(FILE *stream)
{
int position = ftell(stream);
int c, num_of_chars=0;
while ((c = getc(stream)) != EOF && !eol(c, stream))
num_of_chars++;
fseek(stream,position,SEEK_SET); //get file pointer to where it was before this function call
return num_of_chars;
}
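int main(void)
{
//...
char *buffer;
int size;
for (;;)
{
size = charsNumInLine(stream);
// room for the line's characters, the newline, and the terminating '\0'
buffer = malloc(size + 2);
if (buffer == NULL)
break;
// sizeof(buffer) would only be the size of a pointer, so pass the allocated size
if (fgets(buffer, size + 2, stream) == NULL) {
free(buffer);
break;
}
// use strtok to separate words...
free(buffer);
}
//...
return 0;
}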
Another way is to use fscanf(file,"%s",buff) to read words and then use the above function eol to see when we get to a newline.
I am returning a char pointer from a function. But the caller is unable to see the string.
char* getFileContent(char* file)
{
FILE* fp = fopen("console.txt", "r");
fseek(fp, 0L, SEEK_END);
size_t sz = ftell(fp);
fseek(fp, 0L, SEEK_SET);
char* message = (char*)malloc(sz+1);
char buf[sz+1];
size_t len = 0;
if (fp != NULL)
{
len = fread(buf, sizeof(char), sz, fp);
}
printf("MALLOC SIZE:%d FILE SIZE:%d", sz, len);
strcpy(message,buf); //Modified code. This line fixed the code
message[++len] = '\0';
//printf("MESSAGE:%s", message);
return(message);
}
This is the caller. Output is empty.
int main(int argc, char **argv, char **env)
{
char* msg = getFileContent(imagefile);
if(msg != NULL)
printf("Output:%s \n", msg);
free(msg);
return 0;
}
Please help.
The error is here:
printf("Output:", msg);
You're not printing the string. Try this instead:
printf("Output: %s", msg);
The %s is needed to tell printf() to print msg as a string.
Note that due to buffering, you may also need to add a \n:
printf("Output: %s \n", msg);
Here's another minor error:
message[++len] = '\0';
should be:
message[len] = '\0';
strcpy(message,buf); //Modified code. This line fixed the code
message[++len] = '\0';
I'm not sure it is guaranteed that buf is zeroed when it is allocated. If it is not, buf[len] might not be '\0' after reading, and if there is no '\0' in the text that was read, your strcpy might run outside its bounds. So either use strncpy or change the above two lines to
buf[len++] = '\0';
strcpy(message,buf);
Also, the first version places the '\0' at len+1, which in my opinion is wrong. Imagine you've read 0 bytes: then len=0 and message[1] is set to '\0', which leaves message[0] undefined.
Your code probably ran fine only because there is a good chance that the allocated buffer buf is either filled with zeros or your compiler actively zeroes it. But as far as I know this is not mandatory for C compilers.
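For comparison, a sketch that skips the intermediate buf entirely by reading straight into the malloc'd block and terminating it at the number of bytes actually read. The function name read_all and its FILE* parameter are assumptions made for this example:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file into a newly allocated,
 * null-terminated buffer. Returns NULL on failure; the caller frees it. */
char *read_all(FILE *fp)
{
    assert(fp != NULL);

    if (fseek(fp, 0L, SEEK_END) != 0)
        return NULL;
    long sz = ftell(fp);
    if (sz < 0 || fseek(fp, 0L, SEEK_SET) != 0)
        return NULL;

    char *message = malloc((size_t)sz + 1);
    if (message == NULL)
        return NULL;

    /* fread may return fewer bytes than sz, e.g. in text mode on Windows */
    size_t len = fread(message, 1, (size_t)sz, fp);
    message[len] = '\0';    /* terminate at len, not len + 1 */
    return message;
}
```

Because nothing is copied with strcpy, a missing '\0' in the file contents cannot cause an overrun here.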