File parsing in c, extract specific line if it exists - c

I have a dynamically updated text file with names of people. I want to parse the file to extract "Caleb" and the string that follows his name. However, his name may not always be in the list, and I want to account for that.
I could do it in Java, but I'm not even sure where to start in C. I could read the text file line by line, but then how would I check if "Caleb" is a substring of the string I just read in, and handle the case when he isn't there? I want to do this without using external libraries. What would be the best method?
Barnabas: Followed by a string
Bart: Followed by a string
Becky: Followed by a string
Bellatrix: Followed by a string
Belle: Followed by a string
Caleb: I want this string
Benjamin: Followed by a string
Beowul: Followed by a string
Brady: Followed by a string
Brick: Followed by a string
returns: "Caleb: I want this string" or "Name not found"

but then how would I check if "Caleb" is a substring of the string
The heart of the question as I read it. strstr does the job.
char *matchloc;
if ((matchloc = strstr(line, "Caleb:"))) {
// You have a match. Code here.
}
However, in this particular case what you really want is "starts with Caleb", so strncmp is the better fit:
if (!strncmp(line, "Caleb:", 6)) {
// You have a match. Code here.
}
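Putting the strncmp check together with a line-by-line read, a minimal sketch could look like the following. The function name find_entry, the 1024-byte line buffer, and the extra check that the name is followed directly by a colon are my own assumptions, not part of the question; lines longer than the buffer would be seen in pieces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 and copies the matching line (without its
 * trailing newline) into out if some line starts with "<name>:",
 * otherwise returns 0. */
int find_entry(FILE *fp, const char *name, char *out, size_t outsz)
{
    char line[1024];                    /* assumed maximum line length */
    size_t namelen = strlen(name);

    assert(outsz > 0);
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';       /* drop the newline, if any */
        if (strncmp(line, name, namelen) == 0 && line[namelen] == ':') {
            snprintf(out, outsz, "%s", line);   /* bounded copy */
            return 1;
        }
    }
    return 0;                           /* name not found */
}
```

A caller would print out when the function returns 1 and "Name not found" when it returns 0.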

So if you want to check whether the user Caleb exists, you can simply call strstr on each line you read, and if it matches, use strtok to get only the string after the name.
I don't know how you are opening the file, but you can use getline to read it line by line.
You can do something like this:
#define _POSIX_C_SOURCE 200809L // makes getline visible on strict compilers
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main(void){
    FILE *file;
    const char *fich = "FILE.TXT";
    char *line = NULL;
    char *copy = NULL; //Copy of the matching line
    size_t len = 0;
    ssize_t stringLength;
    const char s[2] = ":"; //Divide string for this
    char *token;
    int check = 0;
    file = fopen(fich, "r");
    if(file == NULL){
        fprintf(stderr, "[ERROR]: cannot open file <%s> ", fich);
        perror("");
        exit(1);
    }
    while((stringLength = getline(&line, &len, file)) != -1){
        if(stringLength > 0 && line[stringLength-1] == '\n'){
            line[stringLength-1] = '\0'; //Removing \n if exists
        }
        if(strstr(line, "Caleb:") != NULL){
            check = 1;
            copy = malloc(strlen(line) + 1); //Room for the line plus \0
            if(copy == NULL){
                perror("malloc");
                break;
            }
            strcpy(copy, line);
            token = strtok(copy, s); //First token: the name
            token = strtok(NULL, s); //Second token: the string after the name
            if(token != NULL){
                printf("%s\n", token);
            }
            break;
        }
    }
    if(check == 0){
        printf("Name not found\n");
    }
    free(copy);
    free(line);
    fclose(file);
    return 0;
}
The code may have some errors, but that is the idea: when the name is found, copy the line and then split it.

Related

How to print a string with its \n characters included?

Let's say we have char* str = "Hello world!\n". Obviously when you print this you will see Hello world!, but I want to make it so it will print Hello world!\n. Is there any way to print a string with its line break characters included?
Edit: I want to print Hello world!\n without changing the string itself. Obviously I could just do char* str = "Hello world \\n".
Also, the reason I'm asking this question is because I'm using fopen to open a txt file with a ton of line breaks. After making the file into a string, I want to split the string by each of its line breaks so I can modify each line individually.
I think it's a typical case of an XY Problem: you ask about a particular solution without really focusing on the original problem first.
After making the file into a string
Why do you think you need to read the entire file in at once? That's not normally necessary.
I want to split the string by each of its line breaks so I can modify each line individually.
You don't need to print the string to do that (you wanted "to make it so it will print Hello world!\n"). You don't need to modify the string. You just need to read it in line by line! That's what fgets is for:
void printFile(void)
{
FILE *file = fopen("myfile.txt", "r");
if (file) {
char linebuf[1024];
int lineno = 1;
while (fgets(linebuf, sizeof(linebuf), file)) {
// here, linebuf contains each line
char *end = linebuf + strlen(linebuf) - 1;
if (*end == '\n')
*end = '\0'; // remove the '\n'
printf("%5d:%s\\n\n", lineno ++, linebuf);
}
fclose(file);
}
}
I want to make it so it will print Hello world!\n
If you really wanted to do it, you'd have to translate each ASCII LF (that's what \n represents in a string literal) into the two characters \ and n on output, for example like this:
#include <stdio.h>
#include <string.h>
void fprintWithEscapes(FILE *file, const char *str)
{
const char *cr;
while ((cr = strchr(str, '\n'))) {
fprintf(file, "%.*s\\n", (int)(cr - str), str);
str = cr + 1;
}
if (*str) fprintf(file, "%s", str);
}
int main() {
fprintWithEscapes(stdout, "Hello, world!\nA lot is going on.\n");
fprintWithEscapes(stdout, "\nAnd a bit more...");
fprintf(stdout, "\n");
}
Output:
Hello, world!\nA lot is going on.\n\nAnd a bit more...
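If the escaped text is needed as a string rather than written straight to a stream, the same translation can be done into a caller-supplied buffer. This is only a sketch: the name escape_newlines and the choice to truncate silently when the buffer is too small are my assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dst, writing each '\n' as the two
 * characters '\\' and 'n'. Output is truncated to fit dstsz and is always
 * NUL-terminated. Returns the number of characters written, not counting
 * the terminating '\0'. */
size_t escape_newlines(const char *src, char *dst, size_t dstsz)
{
    size_t out = 0;

    assert(dstsz > 0);
    for (; *src != '\0'; src++) {
        size_t need = (*src == '\n') ? 2 : 1;
        if (out + need >= dstsz)        /* keep one byte for '\0' */
            break;
        if (*src == '\n') {
            dst[out++] = '\\';
            dst[out++] = 'n';
        } else {
            dst[out++] = *src;
        }
    }
    dst[out] = '\0';
    return out;
}
```

The original string is left untouched, which is what the edit to the question asked for.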

Splitting Strings from file and putting them into array causes program crash

I am trying to read a file line by line and split it into words. Those words should be saved into an array. However, the program only gets the first line of the text file and when it tries to read the new line, the program crashes.
FILE *inputfile = fopen("file.txt", "r");
char buf [1024];
int i=0;
char fileName [25];
char words [100][100];
char *token;
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {
token = strtok(NULL, " ");
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
After the good answer from xing, I decided to write a full, simple program for your task and explain my solution. The program reads the file given as its argument line by line and appends each line to a buffer.
Code:
#include <assert.h>
#include <errno.h>
#define _WITH_GETLINE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define assert_msg(x) for ( ; !(x) ; assert(x) )
int
main(int argc, char **argv)
{
    FILE *file;
    char *buf, *token;
    size_t length, size;
    ssize_t read; /* getline returns ssize_t: -1 on EOF or error */
    assert(argc == 2);
    file = fopen(argv[1], "r");
    assert_msg(file != NULL) {
        fprintf(stderr, "Error occurred: %s\n", strerror(errno));
    }
    buf = token = NULL; /* realloc(NULL, n) behaves like malloc(n) */
    length = 0;
    size = 1; /* room for the terminating \0 */
    while ((read = getline(&token, &length, file)) != -1) {
        if (token[read - 1] == '\n')
            token[read - 1] = ' ';
        buf = realloc(buf, size + read);
        assert(buf != NULL);
        if (size == 1)
            buf[0] = '\0'; /* first line: start from an empty string */
        size += read;
        (void)strncat(buf, token, read);
    }
    if (buf != NULL)
        printf("%s\n", buf);
    fclose(file);
    free(buf);
    free(token);
    return (EXIT_SUCCESS);
}
For file file.txt:
that is a
text
which I
would like to
read
from file.
I got a result:
$ ./program file.txt
that is a text which I would like to read from file.
A few things are worth saying about this solution:
Instead of fgets(3) I used getline(3), because it reports the length of the line it read (the read variable) and allocates memory for the string (token) automatically. It is important to remember to free(3) that memory. On some Unix-like systems getline(3) is not declared by default, to avoid compatibility problems, so #define _WITH_GETLINE is placed before the <stdio.h> header to make the function available.
buf holds only as much space as is needed to store the string. After each line is read from the file, buf is extended by the required amount with realloc(3). This makes the solution a bit more "universal". It is important to remember to free objects allocated on the heap.
I also used strncat(3), which ensures that no more than read characters (the length of token) are appended to buf. This is still not the best way to use strncat(3), because we should also check for string truncation. In general, though, it is better than plain strcat(3), which is discouraged because an unchecked copy can let a malicious user change a running program's behavior through a buffer overflow attack. Both strcat(3) and strncat(3) add the terminating \0.
getline(3) returns the line with its newline character, so I replaced the newline with a space (the goal here is to build a sentence from the words in the file). I should also remove the trailing space, but I did not want to complicate the source code.
As an optional extra, I defined my own macro assert_msg(x), which runs assert(3) and also prints an error message. It is only a convenience, but thanks to it we can see the error message when opening the file fails.
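Since getline(3) already reports how many bytes it read, the append step can also be written with explicit length tracking instead of strncat(3). The helper below is a sketch of that idea only; the name append_bytes and its return convention are my assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: appends n bytes of s to the heap string *buf, whose
 * current length is *len. *buf may be NULL on the first call.
 * Returns 0 on success, -1 if realloc fails (*buf is then left untouched). */
int append_bytes(char **buf, size_t *len, const char *s, size_t n)
{
    char *grown;

    assert(buf != NULL && len != NULL && s != NULL);
    grown = realloc(*buf, *len + n + 1);    /* +1 for the terminating '\0' */
    if (grown == NULL)
        return -1;
    memcpy(grown + *len, s, n);             /* copy right after the old text */
    *len += n;
    grown[*len] = '\0';
    *buf = grown;
    return 0;
}
```

Because the current length is kept in a variable, each append avoids rescanning the buffer for its end, which strncat(3) has to do on every call.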
The problem is getting the next token in the inner while loop and passing the result to strcpy without any check for a NULL result.
while(fgets(buf,sizeof(buf),inputfile)!=NULL){
token = strtok(buf, " ");
strcpy(words[0], token);
printf("%s\n", words[0]);
while (token != NULL) {//not at the end of the line. yet!
token = strtok(NULL, " ");//get next token. but token == NULL at end of line
//passing NULL to strcpy is a problem
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
}
By incorporating the check into the while condition, passing NULL as the second argument to strcpy is avoided.
while ( ( token = strtok ( NULL, " ")) != NULL) {//get next token != NULL
//if token == NULL the while block is not executed
strcpy(words[i],token);
printf("%s\n",words[i]);
i++;
}
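The corrected loop still trusts that there are never more than 100 words and that no word is longer than 99 characters. A bounded variant could look like the sketch below; the function name split_words, the WORD_LEN constant, and the choice to also treat '\n' as a delimiter (so the newline kept by fgets does not end up inside the last word) are my assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WORD_LEN 100

/* Hypothetical helper: splits line in place on spaces and newlines and copies
 * at most max words into words[]. Words longer than WORD_LEN - 1 characters
 * are truncated. Returns the number of words stored. */
int split_words(char *line, char words[][WORD_LEN], int max)
{
    int count = 0;
    char *token;

    assert(line != NULL && max >= 0);
    for (token = strtok(line, " \n");
         token != NULL && count < max;
         token = strtok(NULL, " \n")) {
        snprintf(words[count], WORD_LEN, "%s", token);  /* bounded copy */
        count++;
    }
    return count;
}
```

To fill one array across several lines, pass words + i and the remaining capacity for each line, then add the return value to i.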
Sanitize your loops, and don't repeat yourself:
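#include <stdio.h>
#include <string.h>
int main(void)
{
    FILE *inputfile = fopen("file.txt", "r");
    char buf [1024];
    int i=0;
    char words [100][100];
    char *token;
    if (inputfile == NULL) {
        perror("file.txt");
        return 1;
    }
    /* stop once words[] is full */
    while (i < 100 && fgets(buf,sizeof(buf),inputfile)) {
        for(token = strtok(buf, " "); token != NULL && i < 100; token = strtok(NULL, " ")){
            /* bounded copy: a token longer than 99 chars is truncated */
            snprintf(words[i++], sizeof words[0], "%s", token);
        }
    }
    fclose(inputfile);
    return 0;
}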

Read in individual words from text file and translate - C

I am writing a program (for a class assignment) to translate normal words into their pirate equivalents (hi = ahoy).
I have created the dictionary using two arrays of strings and am now trying to translate an input.txt file and put it into an output.txt file. I am able to write to the output file, but it only writes the translated first word over and over on a new line.
I've done a lot of reading/scouring and from what I can tell, using fscanf() to read my input file isn't ideal, but I cannot figure out what would be a better function to use. I need to read the file word by word (separated by space) and also read in each punctuation mark.
Input File:
Hi, excuse me sir, can you help
me find the nearest hotel? I
would like to take a nap and
use the restroom. Then I need
to find a nearby bank and make
a withdrawal.
Miss, how far is it to a local
restaurant or pub?
Output: ahoy (46 times, each on a separate line)
Translate Function:
void Translate(char inputFile[], char outputFile[], char eng[][20], char pir[][20]){
char currentWord[40] = {[0 ... 39] = '\0'};
char word;
FILE *inFile;
FILE *outFile;
int i = 0;
bool match = false;
//open input file
inFile = fopen(inputFile, "r");
//open output file
outFile = fopen(outputFile, "w");
while(fscanf(inFile, "%s1023", currentWord) == 1){
if( ispunct(currentWord) == 0){
while( match != true){
if( strcasecmp(currentWord, eng[i]) == 0 || i<28){ //Finds word in English array
fprintf(outFile, pir[i]); //Puts pirate word corresponding to English word in output file
match = true;
}
else {i++;}
}
match = false;
i=0;
}
else{
fprintf(outFile, &word);//Attempt to handle punctuation which should carry over to output
}
}
}
As you start matching against different english words, i<28 is initially true. Hence the expression <anything> || i<28 is also immediately true and correspondingly the code will behave as though a match was found on the first word in your dictionary.
To avoid this you should handle the "found a match at index i" and the "no match found" conditions separately. This can be achieved as follows:
if (i >= dictionary_size) {
// No pirate equivalent, print English word
fprintf(outFile, "%s", currentWord);
break; // stop matching
}
else if (strcasecmp(currentWord, eng[i]) == 0){
...
}
else {i++;}
where dictionary_size would be 28 in your case (based on your attempt at a stop condition with i<28).
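Another way to keep the two conditions apart is to move the search into its own function that returns either the translation or NULL. This is a sketch under assumptions: the name lookup_pirate is made up, and strcasecmp comes from the POSIX header <strings.h>, as in the question's code.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper: returns the pirate word for an English word, or NULL
 * when the dictionary has no entry. n is the number of dictionary entries. */
const char *lookup_pirate(const char *word, char eng[][20], char pir[][20], size_t n)
{
    size_t i;

    assert(word != NULL);
    for (i = 0; i < n; i++) {
        if (strcasecmp(word, eng[i]) == 0)
            return pir[i];      /* match found at index i */
    }
    return NULL;                /* no match: the caller keeps the English word */
}
```

The caller then writes the returned word with fprintf(outFile, "%s", ...) when the result is not NULL, and currentWord otherwise.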
Here's a code snippet that I use to parse things out. Here's what it does:
Given this input:
hi, excuse me sir, how are you.
It puts each word into an array of strings based on the DELIMS constant, and deletes any char in the DELIMS const. This will destroy your original input string though. I simply print out the array of strings:
[hi][excuse][me][sir][how][are][you]
Now this is taking input from stdin, but you can change it around to take it from a file stream. You also might want to consider input limits and such.
#include <stdio.h>
#include <string.h>
#define CHAR_LENGTH 100
const char *DELIMS = " ,.\n";
int parse(char *inputLine, char *arguments[], const char *delimiters)
{
    char *p;
    int count = 0;
    // stop at CHAR_LENGTH tokens so arguments[] cannot overflow
    for (p = strtok(inputLine, delimiters); p != NULL && count < CHAR_LENGTH; p = strtok(NULL, delimiters))
    {
        arguments[count] = p;
        count++;
    }
    return count;
}
int main()
{
    char line[1024];
    char *args[CHAR_LENGTH];
    int i;
    if (fgets(line, sizeof line, stdin) == NULL)
        return 1;
    int count = parse(line, args, DELIMS);
    for (i = 0; i < count; i++){ // args[count] is never set, so stop before it
        printf("[%s]", args[i]);
    }
    printf("\n");
    return 0;
}

C - Read non-alphabetic chars as word boundary

I'm trying to parse in a text file, and add each distinct word into a hashtable, with the words as keys, and their frequencies as values. The problem is proving to be the reading part: the file is a very large file of "normal" text, in that it has punctuation and special characters. I want to treat all non-alphabetical chars read in as word-boundaries. I have something basic going with this:
char buffer[128];
while(fscanf(fp, "%127[A-Za-z]%*c", buffer) == 1) {
printf("%s\n", buffer);
memset(buffer, 0, 128);
}
However, that chokes whenever it actually hits a non-alphabetical char preceded by whitespace (e.g., "the,cat was (brown)" would be read in as "the cat was"). I know what the issue is with that code, but I'm not sure how to get around it. Would I be better off just reading in an entire line and doing the parsing manually? I'm trying scanf because I felt that this was a pretty good candidate for the mini-regex thing that you can do with the format string.
Suggest use of isalpha(), fgetc() and a simple state-machine.
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
int AdamRead(FILE *inf, char *dest, size_t n) {
int ch;
do {
ch = fgetc(inf);
if (ch == EOF) return EOF;
} while (!isalpha(ch));
assert(n > 1);
n--; // save room for \0
while (n-- > 0) {
*dest++ = ch;
ch = fgetc(inf);
if (!isalpha(ch)) break;
}
ungetc(ch, inf); // Put back the character that ended the word, in case something else needs to parse `inf`.
*dest = '\0';
return 1;
}
char buffer[128];
while(AdamRead(fp, buffer, sizeof buffer) == 1) {
printf("%s\n", buffer);
}
Note: If you want to go the "%127[A-Za-z]%*[^A-Za-z]" route, the code may need to start with a one-time fscanf(fp, "%*[^A-Za-z]"); to deal with leading non-letters.
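For comparison, here is a rough sketch of the scanset route that skips non-letters before every word instead of once at the start. The name next_word is hypothetical, and the A-Z style range inside a scanset is implementation-defined in standard C, although common libraries treat it as a character range.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads the next run of ASCII letters from fp into buf
 * (at most 127 characters; longer words come back in pieces).
 * Returns 1 if a word was read, 0 at end of input. */
int next_word(FILE *fp, char buf[128])
{
    assert(fp != NULL && buf != NULL);
    /* Skip non-letters. When the next character is already a letter this
     * directive matches nothing and fscanf returns 0, which is fine. */
    if (fscanf(fp, "%*[^A-Za-z]") == EOF)
        return 0;
    return fscanf(fp, "%127[A-Za-z]", buf) == 1;
}
```

With the input "the,cat was (brown)" successive calls give the, cat, was and brown, and the call after that returns 0.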
There's another way apart from the one mentioned in the comment. I don't know if it's better, though. You can read lines from the file using fgets and then tokenize each line using the POSIX function strtok_r. Here, r means the function is reentrant, which makes it thread-safe. However, you must know the maximum length a line in the file can have.
#include <stdio.h>
#include <string.h>
#define MAX_LEN 100
// in main
char line[MAX_LEN];
char *str; // argument for strtok_r: line on the first call, then NULL
char *token;
const char *delim = "!##$%^&*"; // all special characters
char *saveptr; // for strtok_r
FILE *fp = fopen("myfile.txt", "r");
if(fp == NULL) {
    perror("myfile.txt");
    return 1;
}
while(fgets(line, MAX_LEN, fp) != NULL) {
    // an array cannot be assigned to, so loop over a pointer instead
    for(str = line; ; str = NULL) {
        token = strtok_r(str, delim, &saveptr);
        if(token == NULL)
            break;
        else {
            // token is a string.
            // process it
        }
    }
}
fclose(fp);
strtok_r modifies its first argument, line, so you should keep a copy of it if it is needed for other purposes.

Search for a string in a text file and parse that line (Linux, C)

This is a "how to parse a config file" question.
Basically I have a text file (/etc/myconfig) that has all kinds of settings. I need to read that file and search for the string:
wants_return=yes
Once I locate that string I need to parse it and return only whatever is after the equal sign.
I've tried using a combination of fgets and strtok but I'm getting confused here.
Does anyone know a function that can perform this?
Code is appreciated.
thanks
This works (note: fgets keeps the newline character in the returned string when the whole line fits in the buffer, which is why the code checks for it):
#include <stdio.h>
#include <string.h>
#define MAXLINE 9999
char const* FCFG="/etc/myconfig";
char const* findkey="wants_return=";
/* skip leading spaces and tabs */
char * skip_ws(char *line)
{
    return line+strspn(line," \t");
}
/* return a pointer to the text after prefix, or NULL if line does not start with it */
char * findval(char *line,char const* prefix,int prelen)
{
    char *p;
    p=skip_ws(line);
    if (strncmp(p,prefix,prelen)==0)
        return p+prelen;
    else
        return NULL;
}
char *findval_slow(char *line,char const* prefix)
{
    return findval(line,prefix,strlen(prefix));
}
int main() {
    FILE *fcfg;
    char line[MAXLINE];
    char *p,*pend;
    int findlen;
    findlen=strlen(findkey);
    fcfg=fopen(FCFG,"r");
    if (fcfg==NULL) {
        perror(FCFG);
        return 1;
    }
    while ((p=fgets(line,MAXLINE,fcfg))) {
        printf("Looking at %s\n",p);
        if ((p=findval(line,findkey,findlen))) {
            pend=p+strlen(p)-1; /* check last char for newline terminator */
            if (*pend=='\n') *pend=0;
            printf("Found %s\n",p); /* process/parse the value */
        }
    }
    fclose(fcfg);
    return 0;
}
Here's a quick example using strtok:
char line[256];
FILE* fp = fopen(argv[1], "r");
if (fp == NULL) {
    perror("Error opening file");
} else {
    /* loop on fgets itself; testing feof() first runs the body once too often */
    while (fgets(line, sizeof line, fp)) {
        const char* name = strtok(line, "= \r\n");
        const char* value = strtok(NULL, "= \r\n");
        if (name && value) { /* skip lines without a name and a value */
            printf("%s => %s\n", name, value);
        }
    }
    fclose (fp);
}
Note, you'll need to put some additional error checking around it, but this works to parse the files I threw at it.
From your comment, it looks like you're already getting the appropriate line from the text file using fgets and loading it into a character buffer. You can use strtok to parse the tokens from the line.
If you run it with the string buffer as the first argument, it will return the first token from that string. If you run the same command with the first argument set to NULL it will return subsequent tokens from the same original string.
A quick example of how to retrieve multiple tokens:
#include <stdio.h>
#include <string.h>
int main() {
char buffer[17]="wants_return=yes";
char* tok;
tok = strtok(buffer, "=");
printf("%s\n", tok); /* tok points to "wants_return" */
tok = strtok(NULL, "=");
printf("%s\n", tok); /* tok points to "yes" */
return 0;
}
For the second strtok call, you can replace the "=" with "" to return everything to the end of the string, instead of breaking off at the next equal sign.
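If the value itself may contain an equal sign, splitting at the first '=' with strchr avoids the question of which delimiter to pass to the second strtok call. A minimal sketch, with the hypothetical name split_key_value, that modifies the line in place:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits "key=value" in place at the first '='.
 * Returns 1 on success, 0 if the line has no '='. */
int split_key_value(char *line, char **key, char **value)
{
    char *eq;

    assert(line != NULL && key != NULL && value != NULL);
    line[strcspn(line, "\r\n")] = '\0';   /* drop a trailing newline, if any */
    eq = strchr(line, '=');
    if (eq == NULL)
        return 0;
    *eq = '\0';            /* terminate the key */
    *key = line;
    *value = eq + 1;       /* everything after the first '=' */
    return 1;
}
```

For the line "wants_return=yes\n" this gives the key "wants_return" and the value "yes"; the caller can then compare the key with strcmp.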
With a POSIX shell, I'd use something like:
answer=`egrep 'wants_return[ ]*=' /etc/myconfig | sed 's/^.*=[ ]*//'`
Of course, if you're looking for an answer that uses the C STDIO library, then you really need to review the STDIO documentation.

Resources