I am writing a program that is sorting the lines from the input text file. It does its job, however I get memory leaks using valgrind.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char* getline(FILE * infile)
{
int size = 1024;
char * line = (char*)malloc(size);
int temp;
int i=0;
do
{
(temp = fgetc(infile));
if (temp !=EOF)
line[i++]=(char)temp;
if (i>=size)
{
size*=2;
line = (char*)realloc(line, size);
}
}while (temp != '\n' && temp !=EOF);
if (temp==EOF)
return NULL;
return line;
}
void print_line (char * line)
{
printf("%s", line);
}
int myCompare (const void * a, const void * b )
{
const char *pa = *(const char**)a;
const char *pb = *(const char**)b;
return strcmp(pa,pb);
}
int main (int argc, char* argv[])
{
FILE * infile;
FILE * outfile;
infile = fopen(argv[1], "r");
if (infile==NULL)
{
printf("Error");
exit(3);
}
outfile = fopen(argv[2], "w");
if (outfile==NULL)
{
printf("Error");
exit(3);
}
char * line;
char **all_lines;
int nlines=0;
while((line=getline(infile))!=NULL)
{
print_line(line);
nlines++;
}
all_lines=malloc(nlines*sizeof(char*));
rewind(infile);
int j=0;
printf("%d\n\n", nlines);
while((line=getline(infile))!=NULL)
{
all_lines[j]=line;
j++;
}
qsort(all_lines, nlines, sizeof(char*), myCompare);
for (int i=0; i<nlines;i++)
{
print_line(all_lines[i]);
fprintf(outfile, "%s", all_lines[i]);
}
for(int i =0; i<nlines;i++)
{
free(all_lines[i]);
}
free(all_lines);
fclose(infile);
fclose(outfile);
return 0;
}
Any ideas where they might come from? I loop over all_lines[] and free the content, then free all_lines itself.
UPDATE:
Ok, so I've done the updates you have suggested. However, now valgrind throws error for functions fprintf in my program. Here is what it says:
11 errors in context 2 of 2:
==3646== Conditional jump or move depends on uninitialised value(s)
==3646== at 0x40BA4B1: vfprintf (vfprintf.c:1601)
==3646== by 0x40C0F7F: printf (printf.c:35)
==3646== by 0x80487B8: print_line (sort_lines.c:44)
==3646== by 0x804887D: main (sort_lines.c:77)
==3646== Uninitialised value was created by a heap allocation
==3646== at 0x4024D12: realloc (vg_replace_malloc.c:476)
==3646== by 0x8048766: getline (sort_lines.c:30)
==3646== by 0x804889A: main (sort_lines.c:75)
I would like to know why it reports error on simply fprintf those lines to a text file. I've looked up that it is something concerning gcc optimalization turning fprint into fputs but I dont get this idea
There are multiple problems in your code:
function getline:
the string in the line buffer is not properly '\0' terminated at the end of the do / while loop.
It does not free the line buffer upon end of file, hence memory leak.
It does not return a partial line at end of file if the file does not end with a '\n'.
neither malloc not realloc return values are checked for memory allocation failure.
You should realloc the line to the actual size used to reduce the amount of memory consumed. Currently, you allocate at least 1024 bytes per line. Sorting the dictionary will require 100 times more memory than needed!
function MyCompare:
The lines read include the '\n' at the end. Comparison may yield unexpected results: "Hello\tworld\n" will come before "Hello\n". You should strip the '\n' in getline and modify the printf formats appropriately.
function main:
You do not check if command line arguments are actually provided before trying to open them with fopen, invoking undefined behaviour
The first loop that counts the number of lines does not free the lines returned by getline... big memory leak!
The input stream cannot always be rewinded. If your program is given its input via a pipe, rewind will fail except for very small input sizes.
The number of lines read in the second loop may be different from the one counted in the first loop if the file was modified asynchronously by another process. It the file grew, you invoke undefined behaviour when you load it, if it shrank, you invoke undefined behaviour when you sort the array. You should reallocate the all_lines array as you read the lines in a single loop.
Printing the lines before sort is not very useful and complicates testing.
Related
I have a daemon written in C that reads a file every INTERVAL seconds line by line and does some work on each line if it matches the criteria. I can't leave the file open since the file is being accessed and modified by another API, so this is a snippet of the code to show the issue:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#define INTERVAL 5
int main(int argc, char *argv[])
{
size_t len = 100;
FILE *fp;
char *line = NULL;
line = malloc(len * sizeof(char));
size_t read;
int iteration = 1;
while (1)
{
printf("iteration %d\n", iteration++);
fp = fopen("input.txt", "r+");
if (fp == NULL)
exit(EXIT_FAILURE);
while ((read = getline(&line, &len, fp)) != -1)
{
printf("line is: %s\n", line);
// Do some more work with the data.
// If the password has expired remove it from the file.
}
fclose(fp);
if (line)
{
free(line);
}
sleep(INTERVAL);
}
}
When running this code, it results in this:
iteration 1
line is: 1656070481 qwerty12345
line is: 1656070482 qwerty
iteration 2
line is: 1656070481 qwerty12345
line is: 1656070482 qwerty
daemon(37502,0x1027e4580) malloc: *** error for object 0x600003ca8000: pointer being freed was not allocated
daemon(37502,0x1027e4580) malloc: *** set a breakpoint in malloc_error_break to debug
zsh: abort ./daemon
So the problem is in this line:
if (line)
{
free(line);
}
It looks like somehow the pointer is being deallocated somewhere inside the while loop. So, to solve this, I have to call the malloc function at the start of the while(1) loop to reallocate the memory with each iteration, like so:
while (1)
{
printf("iteration %d\n", iteration++);
line = malloc(len * sizeof(char));
This solves the issue, but I want to understand why the issue happened in the first place. Why is the pointer being deallocated after the second iteration?
have a daemon written in C
Then call daemon(3).
so the problem is in this line:
if (line){
free(line); }
Yes. So you need to clear line and code:
if (line) {
free(line);
line = NULL;
len = 0;
}
On the next loop, line will be NULL and you won't (incorrectly) free it twice.
(I won't bother free-ing inside the loop, but this is just my opinion.)
Of course, before getline you want to allocate line, so before
while ((read = getline(&line, &len, fp)) != -1)
code:
if (!line) {
line = malloc(100);
if (line)
len = 100;
};
Consider using Valgrind and in some cases Boehm conservative GC (or my old Qish GC downloadable here). You may then have to avoid getline and use simpler and lower-level alternatives (e.g., read(2) or simply fgetc(3)...).
Read the GC handbook.
Perhaps use Frama-C on your source code.
(Consider also using Rust, Go, C++, OCaml, or Common Lisp/SBCL to entirely recode your daemon.)
On Linux, see also inotify(7).
I have an exercise in which I have to read a file containing strings and I have to return the content using one/multiple arrays (this is because the second part of this exercise asks for these lines to be reversed, I'm having problems - and therefore ask for help - with the input).
So far, I have this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define LENGTH 1024
int main(int argc, char *argv[]){
char* input[LENGTH];
if(argc==2){
FILE *fp = fopen(argv[1], "rt");
if(fp!=NULL){
int i=0;
while(fgets(input, sizeof(input), fp)!=NULL){
input[i] = (char*)malloc(sizeof(char) * (LENGTH));
fgets(input, sizeof(input), fp);
i++;
}
printf("%s", *input);
free(input);
}
else{
printf("File opening unsuccessful!");
}
}
else{
printf("Enter an argument.");
}
return 0;
}
I also have to check whether or not memory allocation has failed. This program in its' current form returns nothing when run from the command line.
EDIT: I think it's important to mention that I get a number of warnings:
passing argument 1 of 'fgets' from incompatible pointer type [-Wincompatible-pointer-types]|
attempt to free a non-heap object 'input' [-Wfree-nonheap-object]|
EDIT 2:
Example of input:
These
are
strings
... and the expected output:
esehT
era
sgnirts
In the exercise, it's specified that the maximum length of a line is 1024 characters.
You probably want something like this.
Comments are in the code
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define LENGTH 1024
int main(int argc, char* argv[]) {
if (argc == 2) {
FILE* fp = fopen(argv[1], "rt");
if (fp != NULL) {
char** lines = NULL; // pointer to pointers to lines read
int nboflines = 0; // total number of lines read
char input[LENGTH]; // temporary input buffer
while (fgets(input, sizeof(input), fp) != NULL) {
char* newline = malloc(strlen(input) + 1); // allocate memory for line (+1 for null terminator)
strcpy(newline, input); // copy line just read
newline[strcspn(newline, "\n")] = 0; // remove \n if any
nboflines++; // one more line
lines = realloc(lines, nboflines * sizeof(char*)); // reallocate memory for one more line
lines[nboflines - 1] = newline; // store the pointer to the line
}
fclose(fp);
for (int i = 0; i < nboflines; i++) // print the lins we've read
{
printf("%s\n", lines[i]);
}
}
else {
printf("File opening unsuccessful!");
}
}
else {
printf("Enter an argument.");
}
return 0;
}
Explanation about removing the \n left by fgets: Removing trailing newline character from fgets() input
Disclaimers:
there is no error checking for the memory allocation functions
memory is not freed. This is left as an exercise.
the way realloc is used here is not very efficient.
you still need to write the code that reverses each line and displays it.
You probably should decompose this into different functions:
a function that reads the file and returns the pointer to the lines and the number of lines read,
a function that displays the lines read
a function that reverses one line (to be written)
a function that reverses all lines (to be written)
This is left as an exercise.
I am trying to process a 250MB file using a script in C.
The file is basically a dataset and I want to read just some of the columns and (more importantly) break one of them (which is originally a string) into a sequence of characters.
However, even though I have plenty of RAM available, the code is killed by konsole (using KDE Neon) everytime I run it.
The source is available below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
FILE *arquivo;
char *line = NULL;
size_t len = 0;
int i = 0;
int j;
int k;
char *vetor[500];
int acertos[45];
FILE *licmat = fopen("licmat.csv", "w");
//creating the header
fprintf(licmat,"CO_CATEGAD,CO_UF_CURSO,ACERTO09,ACERTO10,ACERTO11,ACERTO12,ACERTO13,ACERTO14,ACERTO15,ACERTO16,ACERTO17,ACERTO18,ACERTO19,ACERTO20,ACERTO21,ACERTO22,ACERTO23,ACERTO24,ACERTO25,ACERTO26,ACERTO27,ACERTO28,ACERTO29,ACERTO30,ACERTO31,ACERTO32,ACERTO33,ACERTO34,ACERTO35\n");
if ((arquivo = fopen("MICRODADOS_ENADE_2017.csv", "r")) == NULL) {
printf ("\nError");
exit(0);
}
//reading one line at a time
while (getline(&line, &len, arquivo)) {
char *ptr = strsep(&line,";");
j=0;
//breaking the line into a vector based on ;
while(ptr != NULL)
{
vetor[j]=ptr;
j=j+1;
ptr = strsep(&line,";");
}
//filtering based on content
if (strcmp(vetor[4],"702")==0 && strcmp(vetor[33],"555")==0) {
//copying some info
fprintf(licmat,"%s,%s,",vetor[2],vetor[8]);
//breaking the string (32) into isolated characters
for (k=0;k<27;k=k+1) {
fprintf(licmat,"%c", vetor[32][k]);
if (k<26) {
fprintf(licmat,",");
}
}
fprintf(licmat,"\n");
}
i=i+1;
}
free(line);
fclose(arquivo);
fclose(licmat);
}
The output is perfect up to the point when the script is killed. The output file is just 640KB long and has about 10000 lines only.
What could be the issue?
It looks to me like you're mishandling the memory buffer managed by getline() - which allocates/reallocates as needed - by the use of strsep(), which seems to manipulate that same pointer value.
Once line has been updated to reflect some other element on the line, it's no longer pointing to the start of allocated memory, and then boom the next time getline() needs to do anything with it.
Use a different variable to pass to strsep():
while (getline(&line, &len, arquivo) > 0) { // use ">=" if you want blank lines
char *parseline = line;
char *ptr = strsep(&parseline,";");
// do the same thing later
The key thing here: you're not allowed to muck with the value of line other than to free() it at the end (which you do), and you can't let any other routine do it either.
Edit: updated to reflect getline() returning <0 on error (h/t to #user3121023)
Am trying to open a file(Myfile.txt) and concatenate each line to a single buffer, but am getting unexpected output. The problem is,my buffer is not getting updated with the last concatenated lines. Any thing missing in my code?
Myfile.txt (The file to open and read)
Good morning line-001:
Good morning line-002:
Good morning line-003:
Good morning line-004:
Good morning line-005:
.
.
.
Mycode.c
#include <stdio.h>
#include <string.h>
int main(int argc, const char * argv[])
{
/* Define a temporary variable */
char Mybuff[100]; // (i dont want to fix this size, any option?)
char *line = NULL;
size_t len=0;
FILE *fp;
fp =fopen("Myfile.txt","r");
if(fp==NULL)
{
printf("the file couldn't exist\n");
return;
}
while (getline(&line, &len, fp) != -1 )
{
//Any function to concatinate the strings, here the "line"
strcat(Mybuff,line);
}
fclose(fp);
printf("Mybuff is: [%s]\n", Mybuff);
return 0;
}
Am expecting my output to be:
Mybuff is: [Good morning line-001:Good morning line-002:Good morning line-003:Good morning line-004:Good morning line-005:]
But, am getting segmentation fault(run time error) and a garbage value. Any think to do? thanks.
Specify MyBuff as a pointer, and use dynamic memory allocation.
#include <stdlib.h> /* for dynamic memory allocation functions */
char *MyBuff = calloc(1,1); /* allocate one character, initialised to zero */
size_t length = 1;
while (getline(&line, &len, fp) != -1 )
{
size_t newlength = length + strlen(line)
char *temp = realloc(MyBuff, newlength);
if (temp == NULL)
{
/* Allocation failed. Have a tantrum or take recovery action */
}
else
{
MyBuff = temp;
length = newlength;
strcat(MyBuff, temp);
}
}
/* Do whatever is needed with MyBuff */
free(MyBuff);
/* Also, don't forget to release memory allocated by getline() */
The above will leave newlines in MyBuff for each line read by getline(). I'll leave removing those as an exercise.
Note: getline() is linux, not standard C. A function like fgets() is available in standard C for reading lines from a file, albeit it doesn't allocate memory like getline() does.
I have tried looking around but, to me files are the hardest thing to understand so far as I am learning C, especially text files, binary files were a bit easier. Basically I have to read in two text files both contains words that are formatted like this "hard, working,smart, works well, etc.." I am suppose to compare the text files and count the keywords. I would show some code but honestly I am lost and the only thing I have down is just nonsense besides this.
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#define SIZE 1000
void resumeRater();
int main()
{
int i;
int counter = 0;
char array[SIZE];
char keyword[SIZE];
FILE *fp1, *fp2;
int ch1, ch2;
errno_t result1 = fopen_s(&fp1, "c:\\myFiles\\resume.txt", "r");
errno_t result2 = fopen_s(&fp2, "c:\\myFiles\\ideal.txt", "r");
if (fp1 == NULL) {
printf("Failed to open");
}
else if (fp2 == NULL) {
printf("Failed to open");
}
else {
result1 = fread(array, sizeof(char), 1, fp1);
result2 = fread(keyword, sizeof(char), 1, fp2);
for (i = 0; i < SIZE; i++)
{
if (array[i] == keyword[i])
{
counter++;
}
}
fclose(fp1);
fclose(fp2);
printf("Character match: %d", counter);
}
system("pause");
}
When you have a situation where you are doing a multiple of something (like reading 2 files), it makes a lot of sense to plan ahead. Rather than muddying the body of main with all the code necessary to read 2 text files, create a function that reads the text file for you and have it return an array containing the lines of the file. This really helps you concentrate on the logic of what your code needs to do with the lines rather than filling space with getting the lines in the first place. Now there is nothing wrong with cramming it all in one long main, but from a readability, maintenance, and program structure standpoint, it makes all more difficult.
If you structure the read function well, you can reduce your main to the following. This reads both text files into character arrays and provides the number of lines read in a total of 4 lines (plus the check to make sure your provided two filenames to read):
int main (int argc, char **argv) {
if (argc < 3 ) {
fprintf (stderr, "error: insufficient input, usage: %s <filename1> <filename2>\n", argv[0]);
return 1;
}
size_t file1_size = 0; /* placeholders to be filled by readtxtfile */
size_t file2_size = 0; /* for general use, not needed to iterate */
/* read each file into an array of strings,
number of lines read, returned in file_size */
char **file1 = readtxtfile (argv[1], &file1_size);
char **file2 = readtxtfile (argv[2], &file2_size);
return 0;
}
At that point you have all your data and you can work on your key word code. Reading from textfiles is a very simple matter. You just have to get comfortable with the tools available. When reading lines of text, the preferred approach is to use line-input to read an entire line at a time into a buffer. You then parse to buffer to get what it is you need. The line-input tools are fgets and getline. Once you have read the line, you then have tools like strtok, strsep or sscanf to separate what you want from the line. Both fgets and getline read the newline at the end of each line as part of their input, so you may need to remove the newline to meet your needs.
Storing each line read is generally done by declaring a pointer to an array of char* pointers. (e.g. char **file1;) You then allocate memory for some initial number of pointers. (NMAX in the example below) You then access the individual lines in the file as file1_array[n] when n is the line index 0 - lastline of the file. If you have a large file and exceed the number of pointers you originally allocated, you simply reallocate additional pointers for your array with realloc. (you can set NMAX to 1 to make this happen for every line)
What you use to allocate memory and how you reallocate can influence how you make use of the arrays in your program. Careful choices of calloc to initially allocate your arrays, and then using memset when you reallocate to set all unused pointers to 0 (null), can really save you time and headache? Why? Because, to iterate over your array, all you need to do is:
n = 0;
while (file1[n]) {
<do something with file1[n]>;
n++;
}
When you reach the first unused pointer (i.e. the first file1[n] that is 0), the loop stops.
Another very useful function when reading text files is strdup (char *line). strdup will automatically allocate space for line using malloc, copy line to the newly allocated memory, and return a pointer to the new block of memory. This means that all you need to do to allocate space for each pointer and copy the line ready by getline to your array is:
file1[n] = strdup (line);
That's pretty much it. you have read your file and filled your array and know how to iterate over each line in the array. What is left is cleaning up and freeing the memory allocated when you no longer need it. By making sure that your unused pointers are 0, this too is a snap. You simply iterate over your file1[n] pointers again, freeing them as you go, and then free (file1) at the end. Your done.
This is a lot to take in, and there are a few more things to it. On the initial read of the file, if you noticed, we also declare a file1_size = 0; variable, and pass its address to the read function:
char **file1 = readtxtfile (argv[1], &file1_size);
Within readtxtfile, the value at the address of file1_size is incremented by 1 each time a line is read. When readtxtfile returns, file1_size contains the number of lines read. As shown, this is not needed to iterate over the file1 array, but you often need to know how many lines you have read.
To put this all together, I created a short example of the functions to read two text files, print the lines in both and free the memory associated with the file arrays. This explanation ended up longer than I anticipated. So take time to understand how it works, and you will be a step closer to handling textfiles easily. The code below will take 2 filenames as arguments (e.g. ./progname file1 file2) Compile it with something similar to gcc -Wall -Wextra -o progname srcfilename.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NMAX 256
char **readtxtfile (char *fn, size_t *idx);
char **realloc_char (char **p, size_t *n);
void prn_chararray (char **ca);
void free_chararray (char **ca);
int main (int argc, char **argv) {
if (argc < 3 ) {
fprintf (stderr, "error: insufficient input, usage: %s <filename1> <filename2>\n", argv[0]);
return 1;
}
size_t file1_size = 0; /* placeholders to be filled by readtxtfile */
size_t file2_size = 0; /* for general use, not needed to iterate */
/* read each file into an array of strings,
number of lines read, returned in file_size */
char **file1 = readtxtfile (argv[1], &file1_size);
char **file2 = readtxtfile (argv[2], &file2_size);
/* simple print function */
if (file1) prn_chararray (file1);
if (file2) prn_chararray (file2);
/* simple free memory function */
if (file1) free_chararray (file1);
if (file2) free_chararray (file2);
return 0;
}
char** readtxtfile (char *fn, size_t *idx)
{
if (!fn) return NULL; /* validate filename provided */
char *ln = NULL; /* NULL forces getline to allocate */
size_t n = 0; /* max chars to read (0 - no limit) */
ssize_t nchr = 0; /* number of chars actually read */
size_t nmax = NMAX; /* check for reallocation */
char **array = NULL; /* array to hold lines read */
FILE *fp = NULL; /* file pointer to open file fn */
/* open / validate file */
if (!(fp = fopen (fn, "r"))) {
fprintf (stderr, "%s() error: file open failed '%s'.", __func__, fn);
return NULL;
}
/* allocate NMAX pointers to char* */
if (!(array = calloc (NMAX, sizeof *array))) {
fprintf (stderr, "%s() error: memory allocation failed.", __func__);
return NULL;
}
/* read each line from fp - dynamicallly allocated */
while ((nchr = getline (&ln, &n, fp)) != -1)
{
/* strip newline or carriage rtn */
while (nchr > 0 && (ln[nchr-1] == '\n' || ln[nchr-1] == '\r'))
ln[--nchr] = 0;
array[*idx] = strdup (ln); /* allocate/copy ln to array */
(*idx)++; /* increment value at index */
if (*idx == nmax) /* if lines exceed nmax, reallocate */
array = realloc_char (array, &nmax);
}
if (ln) free (ln); /* free memory allocated by getline */
if (fp) fclose (fp); /* close open file descriptor */
return array;
}
/* print an array of character pointers. */
void prn_chararray (char **ca)
{
register size_t n = 0;
while (ca[n])
{
printf (" arr[%3zu] %s\n", n, ca[n]);
n++;
}
}
/* free array of char* */
void free_chararray (char **ca)
{
if (!ca) return;
register size_t n = 0;
while (ca[n])
free (ca[n++]);
free (ca);
}
/* realloc an array of pointers to strings setting memory to 0.
* reallocate an array of character arrays setting
* newly allocated memory to 0 to allow iteration
*/
char **realloc_char (char **p, size_t *n)
{
char **tmp = realloc (p, 2 * *n * sizeof *p);
if (!tmp) {
fprintf (stderr, "%s() error: reallocation failure.\n", __func__);
// return NULL;
exit (EXIT_FAILURE);
}
p = tmp;
memset (p + *n, 0, *n * sizeof *p); /* memset new ptrs 0 */
*n *= 2;
return p;
}
valgrind - Don't Forget To Check For Leaks
Lastly, anytime you allocate memory in your code, make sure you use a memory checker such as valgrind to confirm you have no memory errors and to confirm you have no memory leaks (i.e. allocated blocks you have forgotten to free, or that have become unreachable). valgrind is simple to use, just valgrind ./progname [any arguments]. It can provide a wealth of information. For example, on this read example:
$ valgrind ./bin/getline_readfile_fn voidstruct.c wii-u.txt
==14690== Memcheck, a memory error detector
==14690== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==14690== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==14690== Command: ./bin/getline_readfile_fn voidstruct.c wii-u.txt
==14690==
<snip - program output>
==14690==
==14690== HEAP SUMMARY:
==14690== in use at exit: 0 bytes in 0 blocks
==14690== total heap usage: 61 allocs, 61 frees, 6,450 bytes allocated
==14690==
==14690== All heap blocks were freed -- no leaks are possible
==14690==
==14690== For counts of detected and suppressed errors, rerun with: -v
==14690== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
Pay particular attention to the lines:
==14690== All heap blocks were freed -- no leaks are possible
and
==14690== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
You can ignore the (suppressed: 2 from 2) which just indicate I don't have the development files installed for libc.