strtok string from file to array but missing first line - c

I'm trying to read a .txt file and save all sentences ending with ., ! or ? into an array. I use getline and strtok to do this. When I save the sentences, it seems to work. But when I try to retrieve the data later by index, the first line is missing.
The input is in a file input.txt with content below
The wandering earth! In 2058, the aging Sun? is about to turn into a red .giant and threatens to engulf the Earth's orbit!
Below is my code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main() {
FILE *fp = fopen("input.txt", "r+");
char *line = NULL;
size_t len = 0;
char *sentences[100];
if (fp == NULL) {
perror("Cannot open file!");
exit(1);
}
char delimit[] = ".!?";
int i = 0;
while (getline(&line, &len, fp) != -1) {
char *p = strtok(line, delimit);
while (p != NULL) {
sentences[i] = p;
printf("sentences [%d]=%s\n", i, sentences[i]);
i++;
p = strtok(NULL, delimit);
}
}
for (int k = 0; k < i; k++) {
printf("sentence is ----%s\n", sentences[k]);
}
return 0;
}
output is
sentences [0]=The wandering earth
sentences [1]= In 2058, the aging Sun
sentences [2]= is about to turn into a red
sentences [3]=giant and threatens to engulf the Earth's orbit
sentence is ----
sentence is ---- In 2058, the aging Sun
sentence is ---- is about to turn into a red
sentence is ----giant and threatens to engulf the Earth's orbit
I use strtok to split string directly. It worked fine.

Change mode from "r+" to "r".
Changed the list of delimiters from a variable to a constant DELIMITERS and added '\n'. You may or may not want that '\n' in there, but I would need to see the expected output now that you have supplied input. vim, at least, ends the last line with a '\n', which would generate at least one '\n' token at the end. The other option is to remove leading and trailing white space, and if you end up with an empty string then don't add it as a sentence.
Introduced a constant for number of sentences, and ignore additional sentences beyond what we have space for.
Combined the two strtok() calls (DRY).
Eliminated the two memory leaks.
If your input contains multiple lines, the contents of line will be overwritten. This means the pointers in sentences no longer make sense. The easiest fix is to strdup() each string. Another approach would be to retain an array of line pointers (for subsequent free()) and have getline() allocate a new line each time by resetting len = 0 and line = NULL.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define DELIMITERS ".!?\n"
#define SENTENCES_LEN 100
int main() {
FILE *fp = fopen("input.txt", "r");
if (!fp) {
perror("Cannot open file!");
return 1;
}
char *line = NULL;
size_t len = 0;
char *sentences[SENTENCES_LEN];
int i = 0;
while (getline(&line, &len, fp) != -1) {
char *s = line;
for(; i < SENTENCES_LEN; i++) {
char *sentence = strtok(s, DELIMITERS);
if(!sentence)
break;
sentences[i] = strdup(sentence);
printf("sentences [%d]=%s\n", i, sentences[i]);
s = NULL;
}
}
for (int k = 0; k < i; k++) {
printf("sentence is ----%s\n", sentences[k]);
free(sentences[k]);
}
free(line);
fclose(fp);
}
Using the supplied input file, the matching output is:
sentences [0]=The wandering earth
sentences [1]= In 2058, the aging Sun
sentences [2]= is about to turn into a red
sentences [3]=giant and threatens to engulf the Earth's orbit
sentence is ----The wandering earth
sentence is ---- In 2058, the aging Sun
sentence is ---- is about to turn into a red
sentence is ----giant and threatens to engulf the Earth's orbit
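The whitespace-trimming option mentioned in the list above could be sketched roughly as follows. This is only an illustration: trim_in_place and collect_sentences are hypothetical helper names that do not appear in the code above, and the sketch assumes the caller later frees every stored copy.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: strip leading and trailing whitespace in place
 * and return a pointer to the first non-space character. */
static char *trim_in_place(char *s)
{
    while (*s && isspace((unsigned char)*s))
        s++;
    size_t n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        s[--n] = '\0';
    return s;
}

/* Hypothetical helper: split text on ".!?" (text is modified by strtok),
 * append a heap copy of every non-empty trimmed piece to out, and return
 * the new element count. Stops once max entries are stored. */
size_t collect_sentences(char *text, char **out, size_t count, size_t max)
{
    for (char *tok = strtok(text, ".!?"); tok != NULL && count < max;
         tok = strtok(NULL, ".!?")) {
        char *trimmed = trim_in_place(tok);
        size_t len = strlen(trimmed);
        if (len == 0)
            continue;               /* e.g. the lone "\n" after the last '!' */
        char *copy = malloc(len + 1);
        if (copy == NULL)
            break;                  /* out of memory: keep what we have */
        memcpy(copy, trimmed, len + 1);
        out[count++] = copy;
    }
    return count;
}
```

Inside the getline() loop this would be called as i = collect_sentences(line, sentences, i, SENTENCES_LEN); and '\n' would then no longer be needed in the delimiter list, because a token consisting only of whitespace is skipped.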

Related

How to read specific words from a file?

I have a file that contains words and their synonyms each on a separate line.
I am writing this code that should read the file line by line then display it starting from the second word which is the synonym.
I used the variable count in the first loop in order to be able to count the number of synonyms of each word because the number of synonyms differs from one to another. Moreover I used the condition synonyms[i]==',' because each synonym is separate by a comma.
The purpose of me writing such code is to put them in a binary search tree in order to have a full dictionary.
The code doesn't contain any error yet it is not working.
I have tried to each the loop but that didn't work too.
Sample input from the file:
abruptly - dead, short, suddenly
acquittance - release
adder - common, vipera
Sample expected output:
dead short suddenly
acquittance release
common vipera
Here is the code:
void LoadFile(FILE *fp){
int count;
int i;
char synonyms[50];
char word[50];
while(fgets(synonyms,50,fp)!=NULL){
for (i=0;i<strlen(synonyms);i++)
if (synonyms[i]==',' || synonyms[i]=='\n')
count++;
}
while(fscanf(fp,"%s",word)==1){
for(i=1;i<strlen(synonyms);i++){
( fscanf(fp,"%s",synonyms)==1);
printf("%s",synonyms);
}
}
}
int main(){
char fn[]="C:/Users/CLICK ONCE/Desktop/Semester 4/i2206/Project/Synonyms.txt";
FILE *fp;
fp=fopen(fn,"rt");
if (fp==NULL){
printf("Cannot open this file");
}
else{
LoadFile(fp);
}
return 0;
}
Here is my solution. I have split the work into functions for readability. The actual parsing is done in the parse function. That function takes into account hyphenated compound words such as seventy-two. The word and its synonyms must be separated by a hyphen preceded by at least one space.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
// Trim leading and trailing space characters.
// Warning: string is modified
char* trim(char* s) {
char* p = s;
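size_t l = strlen(p);
// Check l > 0 first so an empty or all-space string never reads p[-1]
while (l > 0 && isspace((unsigned char)p[l - 1])) p[--l] = 0;
while (*p && isspace((unsigned char)*p)) ++p, --l;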
memmove(s, p, l + 1);
return s;
}
// Warning: string is modified
int parse(char* line)
{
char* token;
char* p;
char* word;
if (line == NULL) {
printf("Missing input line\n");
return 0;
}
// First find the word delimiter: a hyphen preceded by a space
p = line;
while (1) {
p = strchr(p, '-');
if (p == NULL) {
printf("Missing hyphen\n");
return 0;
}
if ((p > line) && (p[-1] == ' ')) {
// We found a hyphen preceded by a space
*p = 0; // Replace by nul character (end of string)
break;
}
p++; // Skip hyphen inside hyphenated word
}
word = trim(line);
printf("%s ", word);
// Next find synonyms delimited by a comma
char delim[] = ", ";
token = strtok(p + 1, delim);
while (token != NULL) {
printf("%s ", token);
token = strtok(NULL, delim);
}
printf("\n");
return 1;
}
int LoadFile(FILE* fp)
{
if (fp == NULL) {
printf("File not open\n");
return 0;
}
int ret = 1;
char str[1024]; // Longest allowed line
while (fgets(str, sizeof(str), fp) != NULL) {
str[strcspn(str, "\r\n")] = 0; // Remove ending \n
ret &= parse(str);
}
return ret;
}
int main(int argc, char *argv[])
{
FILE* fp;
char* fn = "Synonyms.txt";
fp = fopen(fn, "rt");
if (fp == NULL) {
perror(fn);
return 1;
}
int ret = LoadFile(fp);
fclose(fp);
return ret ? 0 : 1; // LoadFile returns 1 on success; exit status 0 means success
}
I think the biggest conceptual misunderstanding demonstrated in the code is a failure to understand how fgets and fscanf work.
Consider the following lines of code:
while(fgets(synonyms,50,fp)!=NULL){
...
while(fscanf(fp,"%49s",word)==1){
for(i=1;i<strlen(synonyms);i++){
fscanf(fp,"%49s",synonyms);
printf("%s",synonyms);
}
}
}
The fgets reads one line of the input. (If an input line is longer than 49 characters, that is 48 plus a newline, fgets will only read the first 49 characters. The code should check for that condition and handle it.) The next fscanf then reads a word from the next line of input, so the first line is effectively discarded! If the input is formatted as expected, the second fscanf will read a single - into synonyms. This makes strlen(synonyms) evaluate to 1, so the for loop terminates. The while/fscanf loop then reads another word, and since synonyms still contains a string of length 1, the for loop is never entered. The while/fscanf loop then proceeds to read the rest of the file. The next call to fgets returns NULL (since the fscanf loop has read to the end of the file), so the while/fgets loop terminates after 1 iteration.
I believe the intention was for the scanfs inside the while/fgets to operate on the line read by fgets. To do that, all the fscanf calls should be replaced by sscanf.
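A rough sketch of that sscanf approach is shown here, under the assumption that every line has the shape "word - syn1, syn2, ..." as in the sample input. The function name extract_synonyms and its output-buffer interface are made up for illustration; %n is used to find out how far each sscanf call advanced within the line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the synonyms found in one line read by fgets
 * into out, separated by single spaces. Returns the number of synonyms,
 * or -1 if the line has no "word - " prefix or out is too small. */
int extract_synonyms(const char *line, char *out, size_t outsz)
{
    char word[50], dash[2], syn[50];
    int pos = 0, used = 0, count = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    /* Headword, then the separating '-'; %n stores the offset reached. */
    if (sscanf(line, "%49s %1[-] %n", word, dash, &pos) != 2)
        return -1;

    const char *p = line + pos;
    /* Each synonym runs up to the next comma or newline. */
    while (sscanf(p, " %49[^,\n]%n", syn, &used) == 1) {
        size_t len = strlen(out);
        int written = snprintf(out + len, outsz - len, "%s%s",
                               count ? " " : "", syn);
        if (written < 0 || (size_t)written >= outsz - len)
            return -1;              /* output buffer too small */
        count++;
        p += used;
        if (*p == ',')
            p++;                    /* step over the separator */
    }
    return count;
}
```

In the asker's LoadFile, each buffer filled by fgets would be passed to a function like this instead of calling fscanf on the file again, so no input line is skipped.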

counting the number of strings in a text file containing numbers as well

I wanted to only count the number of strings in a text file, containing numbers as well. But the code below, counts even the numbers in the file as strings. How do I rectify the problem?
int count;
char *temp;
FILE *fp;
fp = fopen("multiplexyz.txt" ,"r" );
while(fscanf(fp,"%s",temp) != EOF )
{
count++;
}
printf("%d ",count);
return 0;
}
Well, first up, using the temp pointer without having backing storage for it is going to cause you a world of pain.
I'd suggest, as a start, using something like char temp[1000] instead, keeping in mind that's still a bit risky if you have words more than a thousand or so characters long (that's a different issue to the one you're asking about so I'll mention it but not spend too much time on fixing it).
Secondly, it appears you want to count words with numbers (like alpha7 or pi/2). If that's the case, you simply need to check temp after reading the "word" and increment count only if it matches a "non-numeric" pattern.
That could be as simple as just not incrementing if the word consists only of digits, or it could be complicated if you want to handle decimals, exponential formats and so on.
But the bottom line remains the same:
while(fscanf(fp,"%s",temp) != EOF )
{
if (! isANumber(temp))
count++;
}
with a suitable definition of isANumber. For example, for unsigned integers only, something like this would be a good start:
int isANumber (char *str) {
// Empty string is not a number.
if (*str == '\0')
return 0;
// Check every character.
while (*str != '\0') {
// If non-digit, it's not a number.
if (! isdigit (*str))
return 0;
str++;
}
// If all characters were digits, it was a number.
return 1;
}
For more complex checking, you can use the strto* calls in C, giving them the temp buffer and ensuring you use the endptr method to ensure the entire string is scanned. Off the top of my head, so not well tested, that would go something like:
int isANumber (char *str) {
// Empty string is not a number.
if (*str == '\0')
return 0;
// Use strtold to parse a long double.
char *endPtr;
long double d = strtold (str, &endPtr);
// Characters unconsumed, not number (things like 42b).
if (*endPtr != '\0')
return 0;
// Was a long double, so number.
return 1;
}
The only thing you need to watch out for there is that certain strings like NaN or +Inf are considered a number by strtold so you may need extra checks for that.
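One possible form of that extra check is sketched below. It assumes that a "number" must start with a digit or a decimal point after an optional sign, which rules out NaN and Inf before strtold ever sees them. The names is_plain_number and count_non_numeric_words are invented for this sketch, and the 1000-byte buffer follows the char temp[1000] suggestion above.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of isANumber: returns 1 only if the whole string
 * parses as a number and does not spell "inf"/"nan". */
int is_plain_number(const char *str)
{
    const char *p = str;
    char *end;

    if (*p == '+' || *p == '-')
        p++;
    /* "inf", "infinity" and "nan" start with a letter, so reject them. */
    if (!isdigit((unsigned char)*p) && *p != '.')
        return 0;
    (void)strtold(str, &end);
    /* Something must be consumed, and nothing may be left over. */
    return end != str && *end == '\0';
}

/* Count whitespace-separated words that are not plain numbers. */
size_t count_non_numeric_words(FILE *fp)
{
    char temp[1000];
    size_t count = 0;

    /* %999s leaves room for the terminating '\0' in temp. */
    while (fscanf(fp, "%999s", temp) == 1)
        if (!is_plain_number(temp))
            count++;
    return count;
}
```

With this check, words such as alpha7, pi/2 and NaN are counted, while 42, 3.5 and 1e5 are not.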
Inside your while loop, loop through the string to check whether any of its characters are digits. Something like:
char *p = temp;
while(*p != '\0'){
if(isdigit((unsigned char)*p))
break;
p++;
}
[don't copy this exact code]
I find strpbrk to be one of the most helpful functions for searching for several needles in a haystack. Your set of needles is the numeric characters "0123456789", and any line read from your file that contains one of them will count as a line. I also prefer POSIX getline for a line count due to its proper handling of files whose last line has a non-POSIX line ending (wc -l leaves the last line out of its count if it does not end with a POSIX line end, '\n'). That said, a small function that searches each line for the characters contained in a string trm passed as a parameter could be written as:
/** open and read each line in 'fn' returning the number of lines
* containing any of the characters in 'trm'.
*/
size_t nlines (char *fn, char *trm)
{
if (!fn) return 0;
size_t lines = 0, n = 0;
char *buf = NULL;
FILE *fp = fopen (fn, "r");
if (!fp) return 0;
while (getline (&buf, &n, fp) != -1)
if (strpbrk (buf, trm))
lines++;
fclose (fp);
free (buf);
return lines;
}
Simply pass the filename of interest and the terms to search for in each line. A short test code with a default term of "0123456789" that takes the filename as the first parameter and the term as the second could be written as follows:
#include <stdio.h> /* printf */
#include <stdlib.h> /* free */
#include <string.h> /* strpbrk */
size_t nlines (char *fn, char *trm);
int main (int argc, char **argv) {
char *fn = argc > 1 ? argv[1] : NULL;
char *srch = argc > 2 ? argv[2] : "0123456789";
if (!fn) return 1;
printf ("%zu %s\n", nlines (fn, srch), fn);
return 0;
}
/** open and read each line in 'fn' returning the number of lines
* containing any of the characters in 'trm'.
*/
size_t nlines (char *fn, char *trm)
{
if (!fn) return 0;
size_t lines = 0, n = 0;
char *buf = NULL;
FILE *fp = fopen (fn, "r");
if (!fp) return 0;
while (getline (&buf, &n, fp) != -1)
if (strpbrk (buf, trm))
lines++;
fclose (fp);
free (buf);
return lines;
}
Give it a try and see if this is what you are expecting, if not, just let me know and I am glad to help further.
Example Input File
$ cat dat/linewno.txt
The quick brown fox
jumps over 3 lazy dogs
who sleep in the sun
with a temp of 101
Example Use/Output
$ ./bin/getline_nlines_nums dat/linewno.txt
2 dat/linewno.txt
$ wc -l dat/linewno.txt
4 dat/linewno.txt

Read in individual words from text file and translate - C

I am writing a program (for a class assignment) to translate normal words into their pirate equivalents (hi = ahoy).
I have created the dictionary using two arrays of strings and am now trying to translate an input.txt file and put it into an output.txt file. I am able to write to the output file, but it only writes the translated first word over and over on a new line.
I've done a lot of reading/scouring and from what I can tell, using fscanf() to read my input file isn't ideal, but I cannot figure out what would be a better function to use. I need to read the file word by word (separated by space) and also read in each punctuation mark.
Input File:
Hi, excuse me sir, can you help
me find the nearest hotel? I
would like to take a nap and
use the restroom. Then I need
to find a nearby bank and make
a withdrawal.
Miss, how far is it to a local
restaurant or pub?
Output: ahoy (46 times, each on a separate line)
Translate Function:
void Translate(char inputFile[], char outputFile[], char eng[][20], char pir[][20]){
char currentWord[40] = {[0 ... 39] = '\0'};
char word;
FILE *inFile;
FILE *outFile;
int i = 0;
bool match = false;
//open input file
inFile = fopen(inputFile, "r");
//open output file
outFile = fopen(outputFile, "w");
while(fscanf(inFile, "%s1023", currentWord) == 1){
if( ispunct(currentWord) == 0){
while( match != true){
if( strcasecmp(currentWord, eng[i]) == 0 || i<28){ //Finds word in English array
fprintf(outFile, pir[i]); //Puts pirate word corresponding to English word in output file
match = true;
}
else {i++;}
}
match = false;
i=0;
}
else{
fprintf(outFile, &word);//Attempt to handle punctuation which should carry over to output
}
}
}
As you start matching against different English words, i<28 is initially true. Hence the expression <anything> || i<28 is also immediately true, and correspondingly the code will behave as though a match was found on the first word in your dictionary.
To avoid this you should handle the "found a match at index i" and the "no match found" conditions separately. This can be achieved as follows:
if (i >= dictionary_size) {
// No pirate equivalent, print English word
fprintf(outFile, "%s", currentWord);
break; // stop matching
}
else if (strcasecmp(currentWord, eng[i]) == 0){
...
}
else {i++;}
where dictionary_size would be 28 in your case (based on your attempt at a stop condition with i<28).
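The same separation can also be packaged as a small lookup function, sketched here under the assumption that eng and pir are parallel arrays as in the question. The name translate_word is hypothetical, and strcasecmp comes from the POSIX header <strings.h>.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper: return the pirate equivalent of word, or word
 * itself when the dictionary has no entry for it. */
const char *translate_word(const char *word, char eng[][20],
                           char pir[][20], size_t dictionary_size)
{
    for (size_t i = 0; i < dictionary_size; i++) {
        /* Match found at index i: the parallel entry is the translation. */
        if (strcasecmp(word, eng[i]) == 0)
            return pir[i];
    }
    /* Loop ran off the end: no match, keep the English word. */
    return word;
}
```

The caller would then write the result with fprintf(outFile, "%s", translate_word(currentWord, eng, pir, 28)); passing the text through "%s" rather than using it as the format string also keeps any '%' in the input from being interpreted.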
Here's a code snippet that I use to parse things out. Here's what it does:
Given this input:
hi, excuse me sir, how are you.
It puts each word into an array of strings based on the DELIMS constant, and deletes any char in the DELIMS const. This will destroy your original input string though. I simply print out the array of strings:
[hi][excuse][me][sir][how][are][you][(null)]
Now this is taking input from stdin, but you can change it around to take it from a file stream. You also might want to consider input limits and such.
#include <stdio.h>
#include <string.h>
#define CHAR_LENGTH 100
const char *DELIMS = " ,.\n";
char *p;
int i;
int parse(char *inputLine, char *arguments[], const char *delimiters)
{
int count = 0;
for (p = strtok(inputLine, delimiters); p != NULL; p = strtok(NULL, delimiters))
{
arguments[count] = p;
count++;
}
return count;
}
int main()
{
char line[1024];
size_t bufferSize = 1024;
char *args[CHAR_LENGTH];
fgets(line, bufferSize, stdin);
int count = parse(line, args, DELIMS);
for (i = 0; i <= count; i++){
printf("[%s]", args[i]);
}
}
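As for the input limits mentioned above, one way to respect them is to pass the capacity of the argument array to the tokenizer. The variant below is a sketch with a made-up name, parse_bounded. Only the first count entries of args are set, so a loop that prints them should run while i < count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded variant of parse(): stores at most max_args
 * tokens in args and returns how many were stored. inputLine is
 * modified by strtok. */
size_t parse_bounded(char *inputLine, char *args[], size_t max_args,
                     const char *delimiters)
{
    size_t count = 0;

    for (char *p = strtok(inputLine, delimiters);
         p != NULL && count < max_args;
         p = strtok(NULL, delimiters)) {
        args[count++] = p;   /* points into inputLine, no copy is made */
    }
    return count;
}
```

To read from a file instead of stdin, the same function can be applied to each buffer returned by fgets(line, sizeof line, fp).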

C - Find longest word in a sentence

Hi I have this program that reads a text file line by line and it's supposed to output the longest word in each sentence. Although it works to a degree, it's overwriting the biggest word with an equally big word which is something I am not sure how to fix. What do I need to think about when editing this program? Thanks
//Program Written and Designed by R.Sharpe
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "memwatch.h"
int main(int argc, char** argv)
{
FILE* file;
file = fopen(argv[1], "r");
char* sentence = (char*)malloc(100*sizeof(char));
while(fgets(sentence, 100, file) != NULL)
{
char* word;
int maxLength = 0;
char* maxWord;
maxWord = (char*)calloc(40, sizeof(char));
word = (char*)calloc(40, sizeof(char));
word = strtok(sentence, " ");
while(word != NULL)
{
//printf("%s\n", word);
if(strlen(word) > maxLength)
{
maxLength = strlen(word);
strcpy(maxWord, word);
}
word = strtok(NULL, " ");
}
printf("%s\n", maxWord);
maxLength = 0; //reset for next sentence;
}
return 0;
}
My textfile that the program is accepting contains this
some line with text
another line of words
Jimmy John took the a apple and something reallyreallylongword it was nonsense
and my output is this
text
another
reallyreallylongword
but I would like the output to be
some
another
reallyreallylongword
EDIT: If anyone plans on using this code, remember when you fix the newline character issue don't forget about the null terminator. This is fixed by setting
sentence[strlen(sentence)-1] = 0, which in effect replaces the newline character with a null terminator.
You get each line by using
fgets(sentence, 100, file)
The problem is, the new line character is stored inside sentence. For instance, the first line is "some line with text\n", which makes the longest word "text\n".
To fix it, remove the new line character every time you get sentence.
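A minimal sketch of that fix follows, assuming words are separated by single spaces as in the question. The helper name longest_word is invented here. It uses strcspn to cut the line at the first '\r' or '\n', which also behaves sensibly when the last line of the file has no newline at all.

```c
#include <string.h>

/* Hypothetical helper: strip the line ending from sentence, then copy
 * its longest word into out. On ties the first word wins because the
 * comparison is strictly greater-than. Returns the word's length.
 * sentence is modified by strtok. */
size_t longest_word(char *sentence, char *out, size_t outsz)
{
    size_t max_len = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';

    /* Remove the newline that fgets stored, if there is one. */
    sentence[strcspn(sentence, "\r\n")] = '\0';

    for (char *w = strtok(sentence, " "); w != NULL; w = strtok(NULL, " ")) {
        size_t len = strlen(w);
        if (len > max_len && len < outsz) {
            max_len = len;
            memcpy(out, w, len + 1);   /* include the terminating '\0' */
        }
    }
    return max_len;
}
```

Called on the first line of the sample file, this yields some rather than text, since text no longer carries a trailing newline that makes it one character longer.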

Reading lines out of a file and putting them into a string array

What I would like to do is be able to read a line out of a file and stick it into an array of strings. This is what I have so far but it does not seem to work.
...
char line [128];
char file [10][128];
plist = fopen("plist1.txt", "r");
while(fgets(line, sizeof line, plist) != NULL){
file[i][0]= line;
i++;
}
I tried doing file[i][0] = *line; and I was able to print out the first character of each line. Is this the best way to do what I am trying to do?
Also, the number of lines in a text file will vary so I would like to have my array be of variable length instead of 10.
EDIT: I have tried two solution listed below. Both give me a segmentation fault dealing with either strcpy or the fgets.
1. while (fgets(file[i], sizeof(file[i]), plist))
i++;
2. while (fgets(line, sizeof(line), plist)){
strcpy(file[i], line);
i++;
}
Try this:
while (fgets(file[i], sizeof(file[i]), plist))
i++;
Alternatively:
while (fgets(line, sizeof(line), plist)) {
strcpy(file[i], line);
i++;
}
Here is a fully working solution, assuming plist1.txt contains a line of text per line. fgets() will by default also include a newline at the end, that you need to get rid of. It is also a good idea to use symbolic constants so that you can change them easily later on if you so want.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define BUF 128 /* can change the buffer size as well */
#define TOT 10 /* change to accommodate other sizes, change ONCE here */
int main(void) {
char line[TOT][BUF];
FILE *plist = NULL;
int i = 0;
int total = 0;
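plist = fopen("plist1.txt", "r");
if (plist == NULL) {
perror("plist1.txt");
return 1;
}
/* stop after TOT lines so the array is never overrun */
while(i < TOT && fgets(line[i], BUF, plist)) {
/* get rid of ending \n from fgets, if there is one */
line[i][strcspn(line[i], "\n")] = '\0';
i++;
}
fclose(plist);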
total = i;
for(i = 0; i < total; ++i)
printf("%s\n", line[i]);
return 0;
}
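For the variable number of lines the question asks about, one option is to grow an array of pointers with realloc as lines arrive. The sketch below is not taken from the answer above. The names read_lines and free_lines are hypothetical, and it assumes lines fit in 127 characters (a longer line would be split across several entries).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read every line of fp into a heap-allocated
 * array of heap-allocated strings. The number of lines is stored in
 * *count. Returns NULL if the file is empty or allocation fails early. */
char **read_lines(FILE *fp, size_t *count)
{
    char buf[128];
    char **lines = NULL;
    size_t n = 0, cap = 0;

    while (fgets(buf, sizeof buf, fp)) {
        buf[strcspn(buf, "\n")] = '\0';      /* drop the newline, if any */
        if (n == cap) {                      /* array full: double it */
            size_t new_cap = cap ? cap * 2 : 8;
            char **tmp = realloc(lines, new_cap * sizeof *tmp);
            if (tmp == NULL)
                break;
            lines = tmp;
            cap = new_cap;
        }
        size_t len = strlen(buf);
        lines[n] = malloc(len + 1);
        if (lines[n] == NULL)
            break;
        memcpy(lines[n], buf, len + 1);
        n++;
    }
    *count = n;
    return lines;
}

/* Release everything allocated by read_lines. */
void free_lines(char **lines, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(lines[i]);
    free(lines);
}
```

After plist has been opened and checked, char **file = read_lines(plist, &total); gives file[0] through file[total - 1], and free_lines(file, total) releases them once they are no longer needed.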
Please look at
http://www.daniweb.com/software-development/c/code/216411
