Using file pointers while looping - c

Alright guys, I'm having some trouble with using my file pointers to traverse through a file via looping. I will have a list of strings in my text file, one per line, and I am testing similarities between them. So my method of going about it is having two file pointers to traverse and compare.
Example: FILE* fp1 will be set on the first line to begin. FILE* fp2 will be set on the second line to begin.
I wish to traverse this way:
Line 1 <-> Line 2
Line 1 <-> Line 3
Line 1 <-> Line 4
Line 1 <-> Line 5
(Here I read the next line via fp1 to get to Line 2, I also attempt to set fp2 to the next line read after fp1)
Line 2 <-> Line 3
Line 2 <-> Line 4
Line 2 <-> Line 5
Etc...
And here is the code... The FILE* fp was passed to the function as (FILE* fp)
FILE* nextfp;
for(i = 1; i <= numStr; i++){
fscanf(fp, "%s", str1);
nextfp = fp;
double str1len = (double)(strlen(str1));
for(j = i + 1; j <= numStr; j++){
fscanf(nextfp, "%s", str2);
double str2len = (double)(strlen(str2));
if((str1len >= str2len) && ((str2len / str1len) >= 0.90000) && (lcsLen(str1, str2) / (double)str2len >= 0.80000))
sim[i][j] = 'H';
else if ((str2len >= str1len) && ((str1len / str2len) >= 0.90000) && (lcsLen(str2, str1) / (double)str1len >= 0.80000))
sim[i][j] = 'H';
}
}
int numStr is the total number of lines with strings
lcsLen(char*, char*) returns length of longest common subsequence
The sim[][] array is where I am labeling my level of similarity. As of right now I only have it programmed to label strings of high similarity.
My results are incomplete and it is due to my fp not going to the next line and just staying on the same string, AND, my inner loop is keeping the nextfp pointing at the last string and not going where it should due to my nextfp = fp line.
Any help's appreciated! Thank you all so much!

You can't treat FILE * like a pointer to memory. It is a pointer to an object of type FILE, which in turn holds the state associated with the file I/O.
Copying a FILE * makes little sense, and certainly doesn't create a copy of the state in question.
Part of that state is the current position in the file, and it doesn't change just because you copy the pointer. Both copies refer to the same stream, so a read through either one advances both.
You should either investigate memory-mapping the file, which would give you the type of access you seem to expect, or just read the entire file once into an array of strings, which you can then iterate over in any way you like.
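The "read the file once into an array of strings" suggestion could be sketched roughly as below. The names read_words, free_words, count_pairs, same_length and MAX_LEN are hypothetical, and the sketch assumes each line holds one whitespace-free string shorter than MAX_LEN characters, as in the question. The predicate same_length is only a stand-in for the question's lcsLen-based similarity test.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 256  /* assumed upper bound on one string, including '\0' */

/* Reads up to max_count whitespace-delimited strings from fp.
   Returns a newly allocated array of newly allocated strings (NULL on
   allocation failure) and stores the number read in *out_count. */
char **read_words(FILE *fp, size_t max_count, size_t *out_count)
{
    char **words = malloc(max_count * sizeof *words);
    char buf[MAX_LEN];
    size_t n = 0;

    *out_count = 0;
    if (words == NULL)
        return NULL;

    /* %255s limits each read to MAX_LEN - 1 characters */
    while (n < max_count && fscanf(fp, "%255s", buf) == 1) {
        words[n] = malloc(strlen(buf) + 1);
        if (words[n] == NULL)
            break;
        strcpy(words[n], buf);
        n++;
    }
    *out_count = n;
    return words;
}

/* Releases everything read_words allocated. */
void free_words(char **words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(words[i]);
    free(words);
}

/* Placeholder predicate: stands in for the real similarity test. */
int same_length(const char *a, const char *b)
{
    return strlen(a) == strlen(b);
}

/* Visits every pair (i, j) with i < j, i.e. 1<->2, 1<->3, ..., 2<->3, ...
   and counts the pairs for which the predicate holds. */
size_t count_pairs(char **words, size_t n,
                   int (*similar)(const char *, const char *))
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (similar(words[i], words[j]))
                hits++;
    return hits;
}
```

With the strings in memory, the nested loop indexes the array directly, so no file position has to be saved or restored.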

After the first inner loop, the file stream is already at end-of-file, and after that you can't use fp to read from the stream. Remember that you are reading from a stream, and a stream doesn't go back on its own. Read man 3 fseek: you can manually set the file offset to some place, but this doesn't address your underlying problem. You should read all the lines into an array, which is easier and faster.

As the other answers state, you should consider just reading the whole file into an array. If your file is larger than several hundred MB, however, your approach might be the right choice.
Use ftell to save the current offset after reading the first line, and set the stream's file position back to that offset with fseek after you have looped through the rest of the lines.
long offset; // ftell returns long
for(i = 1; i <= numStr; i++){
    fscanf(fp, "%s", str1);
    offset = ftell(fp); // save the position just after string i
    double str1len = (double)(strlen(str1));
    for(j = i + 1; j <= numStr; j++){
        fscanf(fp, "%s", str2); // read strings i+1..numStr from the same stream
        double str2len = (double)(strlen(str2));
        if((str1len >= str2len) && ((str2len / str1len) >= 0.90000) && (lcsLen(str1, str2) / (double)str2len >= 0.80000))
            sim[i][j] = 'H';
        else if ((str2len >= str1len) && ((str1len / str2len) >= 0.90000) && (lcsLen(str2, str1) / (double)str1len >= 0.80000))
            sim[i][j] = 'H';
    }
    fseek(fp, offset, SEEK_SET); // move the stream back to the saved position
}

Related

How to read a .xyz file into an array of doubles?

I am new to C, coming from Python. I want to read a .xyz file into a dynamically sized array, to use for various calculations later on in the program. The file is formatted as follows:
Title
Comment
Symbol 0.000 0.000 0.000
Symbol 0.000 0.000 0.000
....
The two first lines are not needed, and should just be skipped. The "Symbol" part of the file are chemical symbols--e.g. H, Au, C, Mn--as the .xyz file format is used for storing 3D coordinates of atoms. They need to be ignored as well. I'm interested in the space separated decimal numbers. I therefore want to:
Skip the first two lines, or just ignore them in some way.
Skip the first part of each line until the first space.
Store the three columns of numbers (coordinates) in an array.
So far I have been able to open a file for reading, and then I've attempted to check how long the file is, in order to have the size of the array change depending on how many coordinate sets needs to be stored.
// Variable declaration
FILE *fp;
long file_size;
// Open file and error checking
fp = fopen ("file_name" , "r");
if(!fp) perror("file_name"), exit(1);
// Check file size
fseek(fp, 0, SEEK_END);
file_size = ftell(fp);
rewind(fp);
// Close file
fclose(fp);
I've been able to skip the first two lines using fscanf(fp, "%*[^\n]"), to skip to the end of the line. But, I haven't been able to figure out how to loop through the rest of the file, while storing only the decimal numbers in an array.
If I understand correctly, I need to allocate memory for the array, using something like malloc() in combination with my file_size and then copy the data into the array using fread().
Here is an example of the contents of an actual .xyz file:
10 atom system
Energy: -914941.6614699
Ag 0.96834 1.51757 0.02281
Ag 0.96758 -1.51824 -0.02206
Ag -1.80329 2.27401 0.03179
Ag -3.58033 0.00046 0.00126
Ag -1.80447 -2.27338 -0.03537
Ag -0.96581 0.02246 -1.51755
Ag -0.96929 -0.02231 1.51463
Ag 1.80613 0.03321 -2.27213
Ag 3.58027 0.00028 0.00206
Ag 1.80086 -0.03407 2.27455
Here is a general approach in C for reading a file into an array of cstrings (pointers to cstrings, so the rough equivalent of a Python list of strings).
int count = 0; // line counter;
int char_count = 0; // char counter;
int max_len = 0; // for storing the longest line length
int c; // for measuring each line length
char **str_ptr_arr; // array of pointers to c-string
//extract characters from the file, looking for newlines; note that
//the EOF check has to come AFTER the getc(fp) to work properly
for (c = getc(fp); c != EOF; c = getc(fp)) {
    char_count += 1;
    if (c == '\n') { //c is an int, so comparing it with '\n' and EOF is safe
        count += 1;
        if (max_len < char_count) {
            max_len = char_count; //gets longest line (its '\n' included)
        }
        char_count = 0;
    }
}
//a final line without a trailing '\n' is not counted by the loop above
if (char_count > 0) {
    count += 1;
    if (max_len < char_count) {
        max_len = char_count;
    }
}
rewind(fp);
So now you have the number of lines and the length of the longest line. (You could use the above loop to exclude lines if you want, but it is probably easier to read the whole thing into an array of c-strings and then process that into an array of doubles.) Now allocate the memory for the array of pointers to c-strings and for the c-strings themselves:
//allocate enough memory to hold all the strings in the file, by first
//allocating the arr of ptrs then a slot for each c-string pointed to:
str_ptr_arr = malloc(count * sizeof(char*)); //size of pointer
for (int i = 0; i < count; i++) {
str_ptr_arr[i] = malloc ((max_len + 1) * sizeof(char)); // +1 for '\0' terminate
}
rewind(fp); //rewind again;
Now we have a problem: how to populate these c-strings (Python is so much easier!). The following works, though I'm not sure it's the expert approach. We read each line into a temporary buffer, then use strcpy to move the contents of the buffer into our allocated array slots:
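for (int i = 0; i < count; i++) {
    char buff[max_len + 1]; //local temporary buffer that can store any line in file
    //fgets reads one whole line ('\n' included); fscanf with "%s" would stop at the first space
    if (fgets(buff, sizeof buff, fp) == NULL) {
        buff[0] = '\0'; //read failed: store an empty string
    }
    strcpy(str_ptr_arr[i], buff);
}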
Note: this is a decent point at which to start excluding lines or removing various substrings from lines, as you can make strcpy conditional on the contents of the buffer by using other c-string functions. I'm fairly new at this myself (learning to write C functions for use in Python programs), but this seems to be a workable approach.
It might also be possible to go directly to a dynamically allocated array of doubles for storing your numerical data without bothering with the c-string array; that could be done in the last loop above. You could split the strings at whitespace, exclude the alphabetical parts, and use the standard library function atof (declared in stdlib.h) to convert each number to a double.
Edit: I should mention all these memory allocations must be manually freed when you are done with them, and this is the approach:
for(int i = 0; i < count; i++) { // free each allocated cstring space
free(str_ptr_arr[i]);
}
free(str_ptr_arr); // free the cstring pointer space
str_ptr_arr = NULL;
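As a rough illustration of the "go directly to numbers" idea mentioned above, one stored line can be parsed with sscanf instead of splitting it by hand. This is only a sketch: parse_xyz_line is a hypothetical helper, and it assumes each data line has the form "Symbol x y z" from the question.

```c
#include <stdio.h>

/* Parses one .xyz data line such as "Ag 0.96834 1.51757 0.02281".
   %*s reads and discards the chemical symbol; the three %lf
   conversions fill xyz[0..2]. Returns 1 on success, 0 otherwise. */
int parse_xyz_line(const char *line, double xyz[3])
{
    return sscanf(line, "%*s %lf %lf %lf", &xyz[0], &xyz[1], &xyz[2]) == 3;
}
```

Because the function reports failure for lines that do not contain three numbers after the first token, the title and comment lines of the sample file are rejected rather than misread.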
Given, for example:
#define STORAGE_INCREMENT 128
typedef struct
{
double x, y, z ;
} sXYZ ;
Then:
int atom_count = 0 ;
int atom_capacity = STORAGE_INCREMENT ;
sXYZ* atoms = malloc( atom_capacity * sizeof(*atoms) ) ;
// While valid triplet, discard symbol, get x,y,z
while( fscanf( fp, "%*s%lf%lf%lf", &atoms[atom_count].x,
&atoms[atom_count].y,
&atoms[atom_count].z ) == 3 )
{
// Increment count
atom_count++ ;
// If capacity exhausted, expand allocation
if( atom_count == atom_capacity )
{
atom_capacity += STORAGE_INCREMENT ;
sXYZ* bigger = realloc( atoms, atom_capacity * sizeof(*atoms) ) ;
if( bigger == NULL )
{
break ;
}
atoms = bigger ;
}
}
This allocates enough space for 128 atoms initially, and if the space is exhausted, it is expanded by a further 128 atoms - indefinitely. A smaller value can be used if the files typically have fewer atoms to be a little more memory efficient. This approach saves you having to first count the number of triplets in the file.
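Put together with the header skipping the question asks for, the growing-array idea might look like the sketch below. The names Atom and read_xyz are hypothetical, the capacity is doubled rather than grown by a fixed increment (a design choice, not something the answer prescribes), and the two header lines are assumed to be shorter than 256 characters each.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct { double x, y, z; } Atom;  /* hypothetical name */

/* Reads all coordinate triplets from an .xyz file, skipping the two
   header lines. Returns a malloc'd array (caller frees) and stores the
   number of atoms in *out_count; returns NULL if the file cannot be
   opened, has fewer than two lines, or memory runs out at the start. */
Atom *read_xyz(const char *path, size_t *out_count)
{
    FILE *fp = fopen(path, "r");
    char header[256];
    size_t count = 0, capacity = 128;
    Atom *atoms, a;

    *out_count = 0;
    if (fp == NULL)
        return NULL;

    /* Skip the title line and the comment line. */
    if (fgets(header, sizeof header, fp) == NULL ||
        fgets(header, sizeof header, fp) == NULL) {
        fclose(fp);
        return NULL;
    }

    atoms = malloc(capacity * sizeof *atoms);
    if (atoms == NULL) {
        fclose(fp);
        return NULL;
    }

    /* %*s discards the symbol; stop at the first line that is not a triplet. */
    while (fscanf(fp, "%*s %lf %lf %lf", &a.x, &a.y, &a.z) == 3) {
        if (count == capacity) {
            Atom *bigger = realloc(atoms, 2 * capacity * sizeof *atoms);
            if (bigger == NULL)
                break;              /* keep what was read so far */
            atoms = bigger;
            capacity *= 2;
        }
        atoms[count++] = a;
    }

    fclose(fp);
    *out_count = count;
    return atoms;
}
```

Reading into a temporary Atom and growing before the store means the array is never written past its capacity, even if realloc fails.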

Allocate memory based on filesize has not the correct number?

I want to store the content of my file in a dynamic string pointer value.
Here is my Code:
char *strPtr = NULL;
char tmpChar = "";
inputFile = fopen(input_file, "r");
fseek(inputFile, 0, SEEK_END); // seek to end of file
fileSize = ftell(inputFile); // get current file pointer
rewind(inputFile);
strPtr = (char*) realloc(strPtr, fileSize * sizeof(char));
int counter = 0;
while ((tmpChar = fgetc(inputFile)) != EOF)
{
strPtr[counter] = tmpChar;
counter++;
if (counter == fileSize)
printf("OK!");
}
printf("Filesize: %d, Counter: %d", fileSize,counter);
Now to my problem: with the last printf I get two different values, for example Filesize 127 and Counter 118.
Additionally, at the END of my strPtr variable there is garbage like "ÍÍÍÍÍÍÍÍÍýýýýüe".
Notepad++ also says that I am at position 127 at the end of the file, so what's the problem with the 118?
If you open the file in text mode (the default) on Windows, the CRT file functions will convert any \r\n to \n. The effect of this is every line you read will be 1 byte shorter than the original with \r\n.
To prevent such conversions, use "binary" mode, by adding a "b" mode modifier, e.g. "rb".
inputFile = fopen("example.txt", "rb");
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=vs-2019
In text mode, carriage return-linefeed combinations are translated into single linefeeds on input, and linefeed characters are translated to carriage return-linefeed combinations on output.
while ((tmpChar = fgetc(inputFile)) != EOF)
{
strPtr[counter] = tmpChar;
counter++;
if (counter == fileSize)
printf("OK!");
}
Additionally, assuming the file does not contain any NUL bytes, this loop will not null-terminate your string. If you later use strPtr in a way that expects a terminator (e.g. printf, strcmp, etc.), the code will read past the valid range, which is where the garbage at the end comes from.
If you do want a null terminator, you need to add one after the loop. To do this you also need to be sure you allocated an extra byte.
strPtr = realloc(strPtr, (fileSize + 1) * sizeof(char));
while (...
strPtr[counter] = '\0'; // Add null terminator at end.
To handle files/strings that might contain nulls you can't use null terminated strings at all (e.g. use memcmp with size instead of strcmp).
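Both points (binary mode and the extra byte for the terminator) can be combined in one small helper. This is a sketch under assumptions: read_whole_file is a hypothetical name, and it uses fread in place of the question's fgetc loop, which also sidesteps the issue that fgetc returns an int that should not be stored in a char before comparing with EOF.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole file in binary mode into a malloc'd, null-terminated
   buffer. Stores the number of bytes read in *out_size (if non-NULL).
   Returns NULL on failure; the caller frees the result. */
char *read_whole_file(const char *path, long *out_size)
{
    FILE *fp = fopen(path, "rb");   /* "b": no \r\n -> \n translation */
    long size;
    size_t got;
    char *buf;

    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0, SEEK_END) != 0 || (size = ftell(fp)) < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    buf = malloc((size_t)size + 1); /* +1 for the terminator */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);

    buf[got] = '\0';                /* terminate after the last byte read */
    if (out_size != NULL)
        *out_size = (long)got;
    return buf;
}
```

Terminating at the number of bytes actually read, not at the size ftell reported, keeps the string valid even if the two ever differ.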

How is my (custom) program leaking memory? I am preparing myself for pset5

I'm trying to understand how memory allocation and pointers work, since I find a CS50 problem set (pset5) too overwhelming.
I made a simple program that reads characters from an array and writes them both to a new text file and to the terminal.
The program works, but it is leaking memory.
Specifically, for each \n encountered in the string, valgrind reports memory lost in 1 more block. And for each character in the string (of char *c), it reports 1 more byte leaked.
What am I doing wrong?
image link of the terminal: https://i.stack.imgur.com/ANtAs.png
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main (void)
{
FILE *fp;
char *c = "One\nTwo\n";
// Open file for writing (reading and writing works too, we can use 'w+' for that).
fp = fopen("file.txt", "w");
// Write data to the file.
fwrite(c, strlen(c), 1, fp);
// Seek to the beginning of the file
fseek(fp, 0, SEEK_SET);
// close file of the file pointer (the text file).
fclose(fp);
// initialize a counter for the amount of characters in the current word that is being read out of the file.
int char_count = 0;
// initialize an address for the first character in a string.
char *buffer_temp_word = NULL;
// Read and display data, using iterations over each character.
// Open the file in read mode.
fp = fopen("file.txt", "r");
// initiate a for loop.
// condition 1: getting a character from the fp stream does not equal reaching the end of the file
// condition 2: the amount of iterations is not above 60 (failsafe against endless loops).
for (int i = 0; fgetc(fp) != EOF && i <= 60 ; i++)
{
//add a counter to the amount of characters currently read.
char_count++;
// seek the pointer 1 place back (the 'IF' function moves the pointer forward 1 place forward for each character).
fseek(fp , -1L, SEEK_CUR);
// get the character value of the current spot that the pointer of the read file points to.
char x = fgetc(fp);
buffer_temp_word = realloc(buffer_temp_word, (sizeof(char)) * char_count);
//the string stores the character on the correct place
//(the first character starts at memory location 0, hence the amount of characters -1)
buffer_temp_word[char_count - 1] = x;
// check for the end of the line (which is the end of the word).
if(x == '\n')
{
//printf("(end of line reached)");
printf("\nusing memory:");
// iterate trough characters in the memory using the pointer + while loop, option 2.
while(*buffer_temp_word != '\n')
{
printf("%c", *buffer_temp_word);
buffer_temp_word++;
}
printf("\nword printed succesfully");
// reset the pointer to the beginning of the buffer_temp_word string (which is an array actually).
buffer_temp_word = NULL;
free(buffer_temp_word);
// reset the amount of characters (for the next word that will be read).
char_count = 0;
}
printf("%c", x);
}
fclose(fp);
free(buffer_temp_word);
return(0);
}
You set buffer_temp_word to NULL before freeing it:
// reset the pointer to the beginning of the buffer_temp_word string (which is an array actually).
buffer_temp_word = NULL;
free(buffer_temp_word);
If you use clang's static analyzer, it can walk you through a path in your code to show your memory leak.
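Also, setting a pointer to NULL does not reset it to the starting position of the array it points to; it sets it to, well, NULL. The buffer_temp_word++ in your while loop causes a related problem: once the pointer has moved, it no longer holds the address that realloc returned, so it must not be passed to realloc or free. Consider using a for-loop instead of your while loop and use the counter to index your array: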
for(int j = 0; buffer_temp_word[j] != '\n'; ++j)
{
printf("%c", buffer_temp_word[j]);
}
And then don't set buffer_temp_word to NULL and don't free it immediately after this loop. The program is already set to realloc it or free it later.
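A compact version of the same idea, with the fixes applied, might look like the sketch below. The function echo_lines is hypothetical and is not the asker's program: it copies each completed line from one stream to another, indexes the buffer without moving the pointer, and frees the buffer exactly once.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copies every complete line from in to out, building each line in a
   heap buffer grown with realloc. Returns the number of lines copied. */
size_t echo_lines(FILE *in, FILE *out)
{
    char *line = NULL;   /* start of the buffer; never advanced */
    size_t len = 0, lines = 0;
    int c;               /* int, so EOF is distinguishable from any char */

    while ((c = fgetc(in)) != EOF) {
        char *bigger = realloc(line, len + 1);
        if (bigger == NULL)
            break;                   /* line still points at the old block */
        line = bigger;
        line[len++] = (char)c;

        if (c == '\n') {
            for (size_t j = 0; j < len; ++j)  /* index; don't move the pointer */
                fputc(line[j], out);
            lines++;
            len = 0;                 /* reuse the same block for the next line */
        }
    }

    free(line);                      /* one free, on the pointer realloc returned */
    return lines;
}
```

Resetting only the length between lines keeps the allocation alive for reuse, so there is nothing to leak and nothing to free twice.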

How to reverse text in a file in C?

I'm trying to get my text to be read back to front and printed in reverse order in that file, but my for loop doesn't seem to be working. Also, my while loop is counting 999 characters even though it should be 800 and something (can't remember exactly). I think it might be because there is an empty line between the two paragraphs, but then again there are no characters there.
Here is my code for the two loops -:
/*Reversing the file*/
char please;
char work[800];
int r, count, characters3;
characters3 = 0;
count = 0;
r = 0;
fgets(work, 800, outputfile);
while (work[count] != NULL)
{
characters3++;
count++;
}
printf("The number of characters to be copied is-: %d", characters3);
for (characters3; characters3 >= 0; characters3--)
{
please = work[characters3];
work[r] = please;
r++;
}
fprintf(outputfile, "%s", work);
/*Closing all the file streams*/
fclose(firstfile);
fclose(secondfile);
fclose(outputfile);
/*Message to direct the user to where the files are*/
printf("\n Merged the first and second files into the output file
and reversed it! \n Check the outputfile text inside the Debug folder!");
There are a couple of huge conceptual flaws in your code.
The very first one is that you state that it "doesn't seem to [be] working" without saying why you think so. Just running your code reveals what the problem is: you do not get any output at all.
Here is why. You reverse your string, and so the terminating zero comes at the start of the new string. You then print that string – and it ends immediately at the first character.
Fix this by decrementing characters3 before the loop starts, so that copying begins at the last real character instead of at the terminating zero.
Next, why not print a few intermediate results? That way you can see what's happening.
string: [This is a test.
]
The number of characters to be copied is-: 15
result: [
.tset aa test.
]
Hey look, there seems to be a problem with the newline (it ends up at the start of the line). That is exactly what should happen, since it is part of the string, but it is most likely not what you intend.
Apart from that, you can clearly see that the reversing itself is not correct!
The problem now is that you are reading and writing from the same string:
please = work[characters3];
work[r] = please;
You write the character at the end into position #0, decrease the end and increase the start, and repeat until done. So, the second half of reading/writing starts copying the end characters back from the start into the end half again!
Two possible fixes: 1. read from one string and write to a new one, or 2. adjust the loop so it stops copying after 'half' is done (since you are doing two swaps per iteration, you only need to loop half the number of characters).
You also need to think more about what swapping means. As it is, your code overwrites a character in the string. To correctly swap two characters, you need to save one first in a temporary variable.
void reverse (FILE *f)
{
char please, why;
char work[800];
int r, count, characters3;
characters3 = 0;
count = 0;
r = 0;
fgets(work, 800, f);
printf ("string: [%s]\n", work);
while (work[count] != 0)
{
characters3++;
count++;
}
characters3--; /* do not count last zero */
characters3--; /* do not count the return */
printf("The number of characters to be copied is-: %d\n", characters3);
for (characters3; characters3 >= (count>>1); characters3--)
{
please = work[characters3];
why = work[r];
work[r] = please;
work[characters3] = why;
r++;
}
printf ("result: [%s]\n", work);
}
As a final note: you do not need to 'manually' count the number of characters, because there is a function for that (strlen, declared in string.h). All that's needed instead of the count loop is this:
characters3 = strlen(work);
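The swap-until-the-middle approach described above can be written more compactly with strlen and two indices. This is a minimal sketch with a hypothetical function name, reverse_line; it assumes the string came from fgets and leaves a trailing newline, if present, at the end.

```c
#include <string.h>

/* Reverses s in place. A trailing '\n' (as left by fgets) stays last. */
void reverse_line(char *s)
{
    size_t len = strlen(s);
    size_t i, j;

    if (len > 0 && s[len - 1] == '\n')
        len--;                 /* exclude the newline from the reversal */
    if (len == 0)
        return;

    /* Swap the outermost pair, then move both indices toward the middle. */
    for (i = 0, j = len - 1; i < j; ++i, --j) {
        char tmp = s[i];       /* save one character before overwriting it */
        s[i] = s[j];
        s[j] = tmp;
    }
}
```

The loop stops when the indices meet, so each character is swapped exactly once and the second half is never copied back over the first.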
Here's a complete and heavily commented function that will take in a filename to an existing file, open it, then reverse the file character-by-character. Several improvements/extensions could include:
Add an argument to adjust the maximum buffer size allowed.
Dynamically increase the buffer size as the input file exceeds the original memory.
Add a strategy for recovering the original contents if something goes wrong when writing the reversed characters back to the file.
#include <stdio.h>
#include <stdint.h> // uint32_t, int64_t
// naming convention of l_ for local variable and p_ for pointers
// Returns 1 on success and 0 on failure
int reverse_file(char *filename) {
FILE *p_file = NULL;
// r+ enables read & write, preserves contents, starts pointer p_file at beginning of file, and will not create a
// new file if one doesn't exist. Consider a nested fopen(filename, "w+") if creation of a new file is desired.
p_file = fopen(filename, "r+");
// Exit with failure value if file was not opened successfully
if(p_file == NULL) {
perror("reverse_file() failed to open file.");
return 0; // nothing to close: fopen() failed, and fclose(NULL) is undefined behavior
}
// Assumes entire file contents can be held in volatile memory using a buffer of size l_buffer_size * sizeof(char)
uint32_t l_buffer_size = 1024;
char l_buffer[l_buffer_size]; // fgetc() returns int; each value is narrowed to char when stored
// Cursor for moving within the l_buffer
int64_t l_buffer_cursor = 0;
// Temporary storage for current char from file
// fgetc() returns the character read as an unsigned char cast to an int or EOF on end of file or error.
int l_temp;
for (l_buffer_cursor = 0; (l_temp = fgetc(p_file)) != EOF; ++l_buffer_cursor) {
// Verify our assumption that the file can completely fit in volatile memory <= l_buffer_size * sizeof(char)
// is still valid BEFORE storing, so we never write past the end of l_buffer. Return an error otherwise.
if (l_buffer_cursor >= l_buffer_size) {
fprintf(stderr, "reverse_file() in memory buffer size of %u char exceeded. %s is too large.\n",
l_buffer_size, filename);
fclose(p_file);
return 0;
}
// Store the current char into our buffer in the original order from the file
l_buffer[l_buffer_cursor] = (char)l_temp; // explicitly narrow l_temp from int to char
}
// At the conclusion of the for loop, l_buffer contains a copy of the file in memory and l_buffer_cursor points
// to the index 1 past the final char read in from the file. Decrement l_buffer_cursor by 1 so it indexes the
// final char. No terminator is written: files have none, and fputc('\0', p_file) would append a stray NUL byte.
--l_buffer_cursor;
// To reverse the file contents, reset the p_file cursor to the beginning of the file then write data to the file by
// reading from l_buffer in reverse order by decrementing l_buffer_cursor.
// NOTE: A less verbose/safe alternative to fseek is: rewind(p_file);
if ( fseek(p_file, 0, SEEK_SET) != 0 ) {
fclose(p_file); // do not leak the stream on the error path
return 0;
}
for (l_temp = 0; l_buffer_cursor >= 0; --l_buffer_cursor) {
l_temp = fputc(l_buffer[l_buffer_cursor], p_file); // write buffered char to the file, advance f_open pointer
if (l_temp == EOF) {
fprintf(stderr, "reverse_file() failed to write %c at index %lld back to the file %s.\n",
l_buffer[l_buffer_cursor], (long long)l_buffer_cursor, filename);
}
}
fclose(p_file);
return 1;
}

Cannot get Call to Function Working

I found this piece of code at Reading a file character by character in C. It compiles and is what I wish to use. My problem is that I cannot get the call to it working properly. The code is as follows:
char *readFile(char *fileName)
{
FILE *file = fopen(fileName, "r");
char *code;
size_t n = 0;
int c;
if (file == NULL)
return NULL; //could not open file
code = malloc(1500);
while ((c = fgetc(file)) != EOF)
{
code[n++] = (char) c;
}
code[n] = '\0';
return code;
}
I am not sure of how to call it. Currently I am using the following code to call it:
.....
char * rly1f[1500];
char * RLY1F; // This is the Input File Name
rly1f[0] = readFile(RLY1F);
if (rly1f[0] == NULL) {
printf ("NULL array); exit;
}
int n = 0;
while (n++ < 1000) {
printf ("%c", rly1f[n]);
}
.....
How do I call the readFile function such that I have an array (rly1f) which is not NULL? The file RLY1F exists and has data in it. I have successfully opened it previously using 'in line code' not a function.
Thanks
The error you're experiencing is that you forgot to pass a valid filename. RLY1F is an uninitialized pointer, so either the program crashes, or fopen tries to open a garbage name and returns NULL.
char * RLY1F; // This is not initialized!
RLY1F = "my_file.txt"; // initialize it!
The next problem you'll have will be in your loop to print the characters.
You have defined an array of pointers char * rly1f[1500];
You read 1 file and store it in the first pointer of the array rly1f[0]
But when you display it you display the pointer values as characters which is not what you want. You should just do:
while (n < 1000) {
printf ("%c", rly1f[0][n]);
n++;
}
note: that would not crash but would print trash if the file read is shorter than 1000.
(BLUEPIXY suggested the post-incrementation fix for n BTW or first character is skipped)
So do it more simply since your string is nul-terminated, pass the array to puts:
puts(rly1f[0]);
EDIT: you have a problem when reading your file too. You malloc 1500 bytes, but you read the file fully. If the file is bigger than 1500 bytes, you get buffer overflow.
You have to compute the length of the file before allocating the memory. For instance like this (using stat would be a better alternative maybe):
char *readFile(char *fileName, unsigned int *size) {
...
fseek(file,0,SEEK_END); // set pos to end of file
*size = ftell(file); // get pos, i.e. size
rewind(file); // set pos to 0
code = malloc(*size+1); // allocate the proper size plus one
notice the extra parameter which allows you to return the size as well as the file data.
Note: on Windows systems, text files use \r\n (CRLF) to delimit lines, so the allocated size will be higher than the number of characters read if you use text mode (each \r\n is converted to \n, so there are fewer chars in your buffer). You could consider a realloc once you know the exact size, to shave off the unused allocated space.
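A different way around the fixed 1500-byte buffer, not the fseek/ftell method shown above, is to grow the buffer while reading, so the size never has to be known in advance and the text-mode mismatch does not matter. The sketch below uses a hypothetical name, read_file_growing, and an arbitrary starting capacity of 1024 bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole text file into a malloc'd, null-terminated string,
   doubling the buffer whenever it fills up. Stores the length in
   *out_len (if non-NULL). Returns NULL on failure; caller frees. */
char *read_file_growing(const char *file_name, size_t *out_len)
{
    FILE *file = fopen(file_name, "r");
    size_t cap = 1024, n = 0;
    char *code;
    int c;

    if (file == NULL)
        return NULL;                 /* could not open file */

    code = malloc(cap);
    if (code == NULL) {
        fclose(file);
        return NULL;
    }

    while ((c = fgetc(file)) != EOF) {
        if (n + 1 == cap) {          /* always keep one byte free for '\0' */
            char *bigger = realloc(code, cap * 2);
            if (bigger == NULL) {
                free(code);
                fclose(file);
                return NULL;
            }
            code = bigger;
            cap *= 2;
        }
        code[n++] = (char)c;
    }

    fclose(file);                    /* the original readFile never closes it */
    code[n] = '\0';
    if (out_len != NULL)
        *out_len = n;
    return code;
}
```

A caller would store the result in a plain char * (not an array of 1500 pointers), check it against NULL, print it with puts, and free it when done.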

Resources