Copying a desired string from a text file in C

I have read all the text from a desired file and it is now stored in buff. I want to copy just the string content after identifier strings such as 'Title'.
Example file below:
"Title: I$_D$-V$_{DS}$ Characteristic Curves (Device 1)
MDate: 2016-03-01
XLabel: Drain voltage V$_{DS}$
YLabel: Drain current I$_D$
CLabel: V$_{GS}$
XUnit: V
... "
for(;;) {
size_t n = fread(buff, 1 , DATAHOLD, inFile);
subString = strstr( buff, "Title");
if( subString != NULL) {
strcpy(graph1.title , (subString + 7));
subString = NULL;
}
....more if statements....
if( n < DATAHOLD) {
break;
}
}
I understand that strstr() returns a pointer to the location of the search string. I added 7 to get just the text that comes after the search string, and this part works fine. The problem is that strcpy() copies the whole rest of the buff character array into graph1.title.
How can I make strcpy() copy only the text on the same line as the substring pointer? Using strtok() maybe?

I agree with ChuckCottrill: it would be better to read and process one line at a time.
Also, since the file you are dealing with is a text file, you could open it in text mode.
FILE *fin = fopen("filename", "r");
Read a line with fgets() into a string str. Note that fgets() stores the trailing '\n' in str.
fgets(str, sizeof(str), fin);
char *substring;
if( (substring = strstr(str, "Title: ")) != NULL )
{
strcpy(graph1.title, substring+strlen("Title: "));
}
At this point, graph1.title will have I$_D$-V$_{DS}$ Characteristic Curves (Device 1) in it, followed by the '\n' that fgets() kept unless you strip it first.

Read and process a single line at a time.
for( ; fgets(line,...); ) {
do stuff on line
}

You could use another strstr to get the position of the end of the line, and then use strncpy which is like strcpy, but accepts a third argument, the number of chars to copy of the input.
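The idea of finding the end of the line and copying only that many characters can be sketched as follows. This is a minimal sketch, not the asker's code: copy_field is a hypothetical helper name, it assumes buf is null-terminated (fread() does not do that for you), and it uses strcspn() plus memcpy() in place of the strstr()/strncpy() pair suggested above so that the destination is always terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text that follows `key` in `buf`, up to
 * but not including the end of that line, into `dst` (capacity `dstsize`).
 * Returns 1 if the key was found, 0 otherwise. */
int copy_field(const char *buf, const char *key, char *dst, size_t dstsize)
{
    const char *start = strstr(buf, key);
    if (start == NULL || dstsize == 0)
        return 0;
    start += strlen(key);                  /* skip past the key itself */
    size_t len = strcspn(start, "\r\n");   /* length up to the line end */
    if (len >= dstsize)
        len = dstsize - 1;                 /* truncate instead of overflowing */
    memcpy(dst, start, len);
    dst[len] = '\0';                       /* always terminate the copy */
    return 1;
}
```

A call such as copy_field(buff, "Title: ", graph1.title, sizeof graph1.title) would then leave only the title line's text in graph1.title, assuming graph1.title is a char array.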

Related

C strings string comparisons always result in false

I am trying to finding a string in a file. I wrote following by modifying code snippet present in man page of getline.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
FILE * fp;
char * line = NULL;
char *fixed_str = "testline4";
size_t len = 0;
ssize_t read;
fp = fopen("test.txt", "r");
if (fp == NULL)
exit(EXIT_FAILURE);
while ((read = getline(&line, &len, fp)) != -1) {
printf("Retrieved line of length %zu:\n", read);
printf("%s", line);
if (strcmp(fixed_str,line)==0)
printf("the match is found\n");
}
//printf("the len of string is %zu\n", strlen(fixed_str));
fclose(fp);
if (line)
free(line);
exit(EXIT_SUCCESS);
}
The problem is that strcmp never reports a match, even though getline is successfully and correctly iterating over all lines in the file.
The length of fixed_str is 9 and that of the equal string in the file is 10 due to the newline character (am I right?). But comparing 9 chars with the help of strncmp still produces the wrong result. I also ruled out differences in capitalization and spaces, so I think I am doing something very wrong.
The test.txt is as below
test line1
test line2
test line3
testline4
string1
string2
string3
first name
I tried all entries but no success
NOTE: In my actual program I have to read fixed_str from another file
From the getline() man page (my emphasis):
getline() reads an entire line from stream, storing the address of
the buffer containing the text into *lineptr. The buffer is null-
terminated and includes the newline character, if one was found.
Your fixed_str has no newline.
Strip any newline character thus (for example):
char* nl = strrchr( line, '\n' ) ;
if(nl != NULL) *nl = '\0' ;
Or more efficiently since getline() returns the line length (in read in your case):
if(line[read - 1] == '\n' ) line[read - 1] = '\0' ;
Adding a '\n' to fixed_str may seem simpler, but is not a good idea because the last (or only) line in a file won't have one but may otherwise be a match.
Using strncmp() as described in your question should have worked, though without seeing the attempt it is hard to comment. It is in any case a flawed solution, since it would match all of the following, for example:
testline4
testline4 and some more
testline4 12345.
Where fixed_str is taken from console or file input rather than a constant, the input method and data source may cause problems, as may the possibility of alternate line-end conventions. To make it more robust you might do:
// Strip any LF or CR+LF line end from fixed_str
char* line_end = strpbrk( fixed_str, "\r\n" ) ;
if( line_end != NULL ) *line_end = '\0' ;
// Strip any LF or CR+LF line end from line
line_end = strpbrk( line, "\r\n" ) ;
if( line_end != NULL ) *line_end = '\0' ;
Or the simpler (i.e. better) solution pointed out by @AndrewHenle:
// Strip any LF or CR+LF line end from fixed_str
fixed_str[strcspn(fixed_str, "\r\n")] = '\0';
// Strip any LF or CR+LF line end from line
line[strcspn(line, "\r\n")] = '\0';
That way the inputs can be compared regardless of whether a line ends in nothing, LF or CR+LF, and the line end may even differ between the two inputs.
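If neither string should be modified (for instance when fixed_str is a string literal, as in the question), the same comparison can be done on lengths alone. The following is a minimal sketch under that assumption; lines_equal is a hypothetical helper name, not something from the question or the answer.

```c
#include <string.h>

/* Hypothetical helper: returns 1 when two lines hold the same text once any
 * trailing LF or CR+LF is ignored, 0 otherwise. Neither input is modified. */
int lines_equal(const char *a, const char *b)
{
    size_t la = strcspn(a, "\r\n");   /* length of a without its line end */
    size_t lb = strcspn(b, "\r\n");   /* length of b without its line end */
    return la == lb && memcmp(a, b, la) == 0;
}
```

Inside the getline() loop, the test would then read if (lines_equal(fixed_str, line)). Unlike the strncmp() approach, a longer line that merely starts with the search text does not match, because the lengths must agree.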

Allocate memory based on filesize has not the correct number?

I want to store the content of my file in a dynamic string pointer value.
Here is my Code:
char *strPtr = NULL;
char tmpChar = "";
inputFile = fopen(input_file, "r");
fseek(inputFile, 0, SEEK_END); // seek to end of file
fileSize = ftell(inputFile); // get current file pointer
rewind(inputFile);
strPtr = (char*) realloc(strPtr, fileSize * sizeof(char));
int counter = 0;
while ((tmpChar = fgetc(inputFile)) != EOF)
{
strPtr[counter] = tmpChar;
counter++;
if (counter == fileSize)
printf("OK!");
}
printf("Filesize: %d, Counter: %d", fileSize,counter);
Now to my problem: with the last printf I get two different values, for example Filesize 127 and Counter 118.
Additionally, at the end of my strPtr variable there is garbage such as "ÍÍÍÍÍÍÍÍÍýýýýüe".
Notepad++ also says that I am at position 127 at the end of the file, so where does the 118 come from?
If you open the file in text mode (the default) on Windows, the CRT file functions will convert any \r\n to \n. The effect of this is every line you read will be 1 byte shorter than the original with \r\n.
To prevent such conversions, use "binary" mode, by adding a "b" mode modifier, e.g. "rb".
inputFile = fopen("example.txt", "rb");
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=vs-2019
In text mode, carriage return-linefeed combinations are translated into single linefeeds on input, and linefeed characters are translated to carriage return-linefeed combinations on output.
while ((tmpChar = fgetc(inputFile)) != EOF)
{
strPtr[counter] = tmpChar;
counter++;
if (counter == fileSize)
printf("OK!");
}
Additionally, assuming the file does not contain any NUL bytes, this loop will not null-terminate your string. If you later use strPtr where a terminator is expected (e.g. printf, strcmp), the function will read past the valid range.
If you do want a null terminator, you need to add one after. To do this you also need to be sure you allocated an extra byte.
strPtr = realloc(strPtr, (fileSize + 1) * sizeof(char)); // keep the returned pointer
while (...
strPtr[counter] = '\0'; // Add null terminator at end.
To handle files/strings that might contain nulls you can't use null terminated strings at all (e.g. use memcmp with size instead of strcmp).
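Putting the two points together (binary mode and one extra byte for the terminator), a whole-file read might look like the sketch below. read_file is a hypothetical helper name, and the code assumes the file fits in memory and that fseek()/ftell() report the size in bytes, which holds for regular files opened in binary mode on common platforms. It terminates the buffer after the number of bytes actually read rather than after the reported size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` into a freshly
 * allocated, null-terminated buffer and stores the byte count in *out_len
 * (if out_len is not NULL). Returns NULL on failure; the caller frees. */
char *read_file(const char *path, size_t *out_len)
{
    FILE *fp = fopen(path, "rb");          /* binary: no \r\n translation */
    if (fp == NULL)
        return NULL;

    char *data = NULL;
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0)
        size = ftell(fp);
    if (size >= 0 && fseek(fp, 0, SEEK_SET) == 0)
        data = malloc((size_t)size + 1);   /* +1 for the terminator */

    if (data != NULL) {
        size_t n = fread(data, 1, (size_t)size, fp);
        data[n] = '\0';                    /* terminate after what was read */
        if (out_len != NULL)
            *out_len = n;
    }
    fclose(fp);
    return data;
}
```

Because the length is returned separately, the buffer can also be handled with memcmp() and friends when the file may contain NUL bytes.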

C how to search string in a file?

I have a problem with my code. I'm trying to search for a string in a file. I can read the file, but when I compare the strings, only the last line of the file is reported as equal to the string entered with scanf().
Suppose my file contains three words, each on its own line:
test
test12
test123
If I enter test or test12 in my scanf(), the comparison fails (the result is != 0). But if I enter test123 it works, apparently because it's the last word in the file, and I don't know why.
char word[26];
char singleLine[26];
FILE *file = fopen("bin/Release/myWords.txt", "a+");
scanf("%26s", word);
if (file != NULL) {
while (!feof(file)) {
fgets(singleLine, 26, file);
compare = strcmp(singleLine, word);
if (compare == 0) {
printf("\n%s\n",word);
}
}
fclose(file);
}
Your program only works in very special cases and has several problems:
scanf("%26s", word); may affect up to 27 bytes in the destination array, which is defined with a length of only 26.
furthermore, you should check the return value to avoid undefined behavior on invalid input.
fopen("bin/Release/myWords.txt", "a+"); opens the file in append mode: is this necessary?
while (!feof(file)) is always wrong, you should instead check the return value of fgets() that returns NULL at end of file.
compare = strcmp(singleLine, word); only compares for an exact match of the full line, which can only happen if the word has 25 characters; otherwise the trailing newline in singleLine will cause the comparison to fail. Furthermore, lines split across two reads may cause unexpected results, as may a file that does not end with a newline.
the reason it matches the last word in the file is that you did not write a trailing newline after the last word in the file, so the last fgets() fills the buffer with the exact word and no trailing newline.
if you search for matches inside the line, you should use a larger buffer and search for a match with strstr.
if you search for an exact match, you should strip the trailing newline before the comparison.
Here is a modified version:
#include <stdio.h>
#include <string.h>
int main() {
char word[27];
char singleLine[256];
FILE *file = fopen("bin/Release/myWords.txt", "r");
if (scanf("%26s", word) != 1)
return 1;
if (file != NULL) {
while (fgets(singleLine, sizeof singleLine, file)) {
singleLine[strcspn(singleLine, "\n")] = '\0'; // strip the newline if any
int compare = strcmp(singleLine, word);
if (compare == 0) {
printf("\n%s\n", word);
}
}
fclose(file);
}
return 0;
}

How to read a text file into matrix form in C

Given the following text file with the following content in it
SpotA B C
SpotB pass D
Spotc A E F
How do I break up the words into tokens and store them in a 10 x 10 matrix?
Note that if the file holds fewer than 10 x 10 entries, I want to fill the remaining positions with the character ~.
So far this is my code:
char *matrix[10][10];
int loadFileToMatrix(char *filename){
FILE *fp;
int row = 0;
int col= 0;
char *tokens;
char buffer[1000];
fp = fopen(filename,"r");
if(fp == NULL){
perror(filename);
return(1);
}
while((fgets(buffer, sizeof(buffer), fp))!= NULL) {
tokens = strtok(buffer," ");
map[row++][col++] = tokens;
}
return(0);
}
If some can help me figure out how to achieve my goal that would be nice. Currently, I am really confused on how to proceed.
Just use fscanf to read tokens from the file into buffer, then copy each token into the matrix. You can use fgetc to detect whether you have reached the end of a line or the end of the file.
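int ch; // int rather than char, so EOF can be told apart from real characters
while (fscanf(fp, "%999s", buffer) == 1) { // stop when no token is left; buffer holds 1000 bytes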
matrix[row][col] = (char *)malloc(sizeof(char) * (strlen(buffer) + 1));
strcpy(matrix[row][col], buffer);
ch = fgetc(fp);
if (ch == ' ') {
col += 1;
}
else if (ch == '\n') {
row += 1;
col = 0;
}
else if (ch == EOF) {
break; // end of file.
}
}
strtok() is a weird function.
The key part of the man page is this:
"On the first call to strtok() the string to be parsed should be specified in str. In each subsequent call that should parse the same string, str should be NULL."
The reason for this is that strtok() alters the string you pass it. It searches through a string until it finds the next character that matches one of the delimiters, and then replaces that delimiter with a null terminator. If the delimiter is found at position n, internally, strtok() saves the position n+1 as the start of the rest of the string.
By calling strtok a second time with a non-null value, you are telling the function to start all over again at the start of that string, and try again to find a delimiter -- which it can never do, because it already found the first one. Instead, your second call to strtok() should pass NULL as the first argument, so each pass can bring out the next token.
If for some reason you need to call strtok() on multiple strings simultaneously, you will overwrite the internally-saved address; only the most recent call is saved properly. The reentrant function strtok_r() is useful in that situation.
If you're ever not sure how to use a function, the man pages are the best resource. You can type man strtok at the command line, or even just google it.
It looks like, in this case, you're using strtok() only once. This will just return the address of the first piece of the buffer, delimited by your delimiters. You need to call strtok() in a loop to get each piece in turn.
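The loop described above, including the ~ padding asked for in the question, might look like this for a single line. It is a minimal sketch: split_row and COLS are hypothetical names, and the stored pointers refer into the line buffer itself, so they are only valid until that buffer is overwritten.

```c
#include <string.h>

#define COLS 10

/* Hypothetical helper: splits `line` in place on spaces, tabs and line
 * ends, storing up to COLS token pointers in `row`. Unused cells point at
 * the string "~". Returns the number of tokens stored. */
int split_row(char *line, const char *row[COLS])
{
    int col = 0;
    /* The first call passes the string; every later call passes NULL. */
    for (char *tok = strtok(line, " \t\r\n");
         tok != NULL && col < COLS;
         tok = strtok(NULL, " \t\r\n"))
        row[col++] = tok;

    int count = col;
    while (col < COLS)
        row[col++] = "~";   /* pad the rest of the row */
    return count;
}
```

Since the question reuses one buffer for every fgets() call, each token would need to be copied (for example with malloc() and strcpy(), as in the fscanf answer) before the next line is read.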

After using fopen to open a text file in C, it has additional characters

I need to read in a table of data in the format x[tab]y[tab]z[tab]\n, so I am using fopen and fgetc to stream characters. The loop ends when c==EOF. (c is a character.)
But I had difficulties with that as it overflows my array. After doing some debugging I realised that the opened file after the last line contains:
Northampton Oxford 68
ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ[...]ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍýýýý««««««««îþîþ
What is that? And why does that not appear in my plain text file? And how do I overcome this problem?
destination = fopen("ukcities.txt", "rt"); // r = read, t=text
if (destination != NULL) {
do {
c = fgetc (destination);
if (c == ' ') {
temp_input[i][n] = '\0';
i++;
n=0;
} else if (c == '\n') {
temp_input[i][n] = '\0';
printf("%s %s %s \n", temp_input[0], temp_input[1], temp_input[2]);
i = 0;
n=0;
} else {
temp_input[i][n] = c;
n++;
}
} while (c != -1);
return 1;
} else {
return 0;
}
Looking into my crystal ball, I see that fread or whatever you're using (apparently that's fgetc which makes it even more true) doesn't null-terminate the data it reads and you're trying to print it as a C-string. Terminate the data with a NUL character (a 0) and then it will print correctly.
That string looks unterminated. In C, strings that don't end with a '\0' character (a.k.a. null character) lead to constant trouble because a lot of the standard library and system libraries expect strings to be null-terminated.
Make sure that when you have finished reading in all the data, that the string is terminated; in some cases it must be done manually. There are a few ways to do this (the below makes all characters of the string null, so as long as you don't overwrite the very last one, the string will always be null terminated):
// (1) declare an array of char, set all characters to null character
char buffer[1000] = {0};
Alternatively, if you are keeping track of where you are in the buffer, you can also do this:
// (2) after reading in all data, add the null character yourself:
size_t n = 0; // number of bytes read so far
char buf[1000];
// read at most sizeof buf - 1 bytes into buf, updating n
buf[n] = '\0'; // n is the index just past the last byte read
In either case, it is important that you don't overstep the end of the buffer. If you've only allocated 1000 bytes, then use only 999 bytes and save 1 byte for the null character.
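Both rules (terminate every string, and never write past the buffer) can be combined in a small field reader for the tab-separated format described in the question. This is a sketch under that assumption; read_field is a hypothetical helper name, and characters that do not fit are silently dropped, which may or may not be what a real program wants.

```c
#include <stdio.h>

/* Hypothetical helper: reads characters from `fp` into `dst` until a tab,
 * a newline or end of file. Never writes more than `dstsize` bytes and
 * terminates `dst` whenever dstsize > 0. Returns the delimiter that ended
 * the field ('\t', '\n' or EOF). */
int read_field(FILE *fp, char *dst, size_t dstsize)
{
    size_t n = 0;
    int c;                                  /* int, so EOF is representable */
    while ((c = fgetc(fp)) != EOF && c != '\t' && c != '\n') {
        if (n + 1 < dstsize)                /* keep one byte for '\0' */
            dst[n++] = (char)c;
    }
    if (dstsize > 0)
        dst[n] = '\0';
    return c;
}
```

Checking the return value against EOF, instead of comparing a char with -1 after storing it, also stops the loop before the end-of-file marker is written into the array.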

Resources