Reading only letters in C using fscanf - c

Is it possible to use fscanf to read words without symbols from a text file
this function prints one word on a single line but if the word had comma or brackets it will print those too is there anyway to only print letters?
void load(const char *file)
{
FILE *inFile = fopen(file , "r");
if (inFile == NULL )
{
return false;
}
char word[LENGTH];
while (fscanf(inFile, "%s", word) != EOF)
{
printf("%s\n", word);
}
}

Your function will not compile at all as it is returning value and it is declared as void
Read char by char and print only what you want.
#include <stdio.h>
#include <stdbool.h>
#include <ctype.h>
bool load(const char *file)
{
FILE *inFile = fopen(file , "r");
if (inFile == NULL )
{
return false;
}
char word;
int retval;
while ((retval = fscanf(inFile, "%c", &word)) != EOF && retval == 1)
{
if(isalnum((unsigned char)word) || isspace((unsigned char)word))
printf("%c", word);
}
fclose(inFile);
return true;
}

Is it possible to use fscanf to read words without symbols from a text file
Well, yes...
You can use a scanset to tell which characters to match. Like:
%[a-zA-Z]
This will match/accept all upper and lower case letters and reject all other characters.
But... That will give you another problem. When the next character the the file doesn't match, the file pointer won't be advanced and you are kind of stuck. To handle that you need a way to skip non-matching characters, e.g. fgetc to read a single character.
Something like:
char word[32];
while (1)
{
int res = fscanf(inFile, "%31[a-zA-Z]", word); // At most 31 characters and only letters
if (res == EOF) break;
if (res == 1)
{
printf("%s\n", word);
}
else
{
// Didn't match, skip a character
if (fgetc(inFile) == EOF) break;
}
}
With a file like:
Hallo World, having() fun ....
Oh,yes...
the output is
Hallo
World
having
fun
Oh
yes
An alternative that only uses fsanf could be:
char word[32];
while (1)
{
int res = fscanf(inFile, "%31[a-zA-Z]", word); // Read letters
if (res == EOF) break;
if (res == 1)
{
printf("%s\n", word);
}
res = fscanf(inFile, "%31[^a-zA-Z]", word); // Skip non-letters
if (res == EOF) break;
}
Notice the ^ in the second scanset. It changes the meaning of the scanset to be "Don't match ...." So the code alternate between "Read a word consisting of letters" and "Skip everything not being letters" which is likely a better way of doing this than the fgetc method above.
That said, I normally prefer reading the file using fgets and then parse the buffer afterwards (e.g. using sscanf) but that's another story.

Related

How to read specific words from a file?

I have a file that contains words and their synonyms each on a separate line.
I am writing this code that should read the file line by line then display it starting from the second word which is the synonym.
I used the variable count in the first loop in order to be able to count the number of synonyms of each word because the number of synonyms differs from one to another. Moreover I used the condition synonyms[i]==',' because each synonym is separate by a comma.
The purpose of me writing such code is to put them in a binary search tree in order to have a full dictionary.
The code doesn't contain any error yet it is not working.
I have tried to each the loop but that didn't work too.
Sample input from the file:
abruptly - dead, short, suddenly
acquittance - release
adder - common, vipera
Sample expected output:
dead short suddenly
acquittance realse
common vipera
Here is the code:
void LoadFile(FILE *fp){
int count;
int i;
char synonyms[50];
char word[50];
while(fgets(synonyms,50,fp)!=NULL){
for (i=0;i<strlen(synonyms);i++)
if (synonyms[i]==',' || synonyms[i]=='\n')
count++;
}
while(fscanf(fp,"%s",word)==1){
for(i=1;i<strlen(synonyms);i++){
( fscanf(fp,"%s",synonyms)==1);
printf("%s",synonyms);
}
}
}
int main(){
char fn[]="C:/Users/CLICK ONCE/Desktop/Semester 4/i2206/Project/Synonyms.txt";
FILE *fp;
fp=fopen(fn,"rt");
if (fp==NULL){
printf("Cannot open this file");
}
else{
LoadFile(fp);
}
return 0;
}
Here is my solution. I have split the work into functions for readability. The actual parsing is done in parsefunction. That function thakes into account hyphenated compound words such as seventy-two. The word and his synonyms must be separated by an hyphen preceded by at least one space.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
// Trim leading and trailing space characters.
// Warning: string is modified
char* trim(char* s) {
char* p = s;
int l = strlen(p);
while (isspace(p[l - 1])) p[--l] = 0;
while (*p && isspace(*p)) ++p, --l;
memmove(s, p, l + 1);
return s;
}
// Warning: string is modified
int parse(char* line)
{
char* token;
char* p;
char* word;
if (line == NULL) {
printf("Missing input line\n");
return 0;
}
// first find the word delimiter: an hyphen preceded by a space
p = line;
while (1) {
p = strchr(p, '-');
if (p == NULL) {
printf("Missing hypen\n");
return 0;
}
if ((p > line) && (p[-1] == ' ')) {
// We found an hyphen preceded by a space
*p = 0; // Replace by nul character (end of string)
break;
}
p++; // Skip hyphen inside hypheneted word
}
word = trim(line);
printf("%s ", word);
// Next find synonyms delimited by a coma
char delim[] = ", ";
token = strtok(p + 1, delim);
while (token != NULL) {
printf("%s ", token);
token = strtok(NULL, delim);
}
printf("\n");
return 1;
}
int LoadFile(FILE* fp)
{
if (fp == NULL) {
printf("File not open\n");
return 0;
}
int ret = 1;
char str[1024]; // Longest allowed line
while (fgets(str, sizeof(str), fp) != NULL) {
str[strcspn(str, "\r\n")] = 0; // Remove ending \n
ret &= parse(str);
}
return ret;
}
int main(int argc, char *argv[])
{
FILE* fp;
char* fn = "Synonyms.txt";
fp = fopen(fn, "rt");
if (fp == NULL) {
perror(fn);
return 1;
}
int ret = LoadFile(fp);
fclose(fp);
return ret;
}
I think the biggest conceptual misunderstanding demonstrated in the code is a failure to understand how fgets and fscanf work.
Consider the following lines of code:
while(fgets(synonyms,50,fp)!=NULL){
...
while(fscanf(fp,"%49s",word)==1){
for(i=1;i<strlen(synonyms);i++){
fscanf(fp,"%49s",synonyms);
printf("%s",synonyms);
}
}
}
The fgets reads one line of the input. (Unless there is an input line that is greater than 49 characters long (48 + a newline), in which case fgets will only read the first 49 characters. The code should check for that condition and handle it.) The next fscanf then reads a word from the next line of input. The first line is effectively being discarded! If the input is formatted as expected, the 2nd scanf will read a single - into synonyms. This makes strlen(synonyms) evaluate to 1, so the for loop terminates. The while scanf loop then reads another word, and since synonyms still contains a string of length 1, the for loop is never entered. while scanf then proceeds to read the rest of the file. The next call to fgets returns NULL (since the fscanf loop has read to the end of the file) so the while/fgets loop terminates after 1 iteration.
I believe the intention was for the scanfs inside the while/fgets to operate on the line read by fgets. To do that, all the fscanf calls should be replaced by sscanf.

C Reading a file of digits separated by commas

I am trying to read in a file that contains digits operated by commas and store them in an array without the commas present.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
if (isdigit(c)){ //if it is digit
sscanf(&c,"%d",&number); //changing character to number (c)
num=(num*10)+number;
}
else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
list[i]=num;
num=0;
i++;
}
c=fgetc(fp1);
}
But this is having problems if it is a double digit. Does anyone have a better solution? Thank you!
For the data shown with no space before the commas, you could simply use:
while (fscanf(fp1, "%d,", &num) == 1 && i < 300)
list[i++] = num;
This will read the comma after the number if there is one, silently ignoring when there isn't one. If there might be white space before the commas in the data, add a blank before the comma in the format string. The test on i prevents you writing outside the bounds of the list array. The ++ operator comes into its own here.
First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be usable if you have several different types of fields that requires different actions, like a parser. For your specific problem, I recommend Johathan Leffler's answer.
int c=fgetc(f);
while(c!=EOF && i<300) {
if(isdigit(c)) {
fseek(f, -1, SEEK_CUR);
if(fscanf(f, "%d", &list[i++]) != 1) {
// Handle error
}
}
c=fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
back one byte in the file
read number, irregardless of length
else continue
The added condition i<300 is for security reasons. If you really want to check that nothing else than commas and newlines (I did not get the impression that you found that important) you could easily just add an else if (c == ... to handle the error.
Note that you should always check the return value for functions like sscanf, fscanf, scanf etc. Actually, you should also do that for fseek. In this situation it's not as important since this code is very unlikely to fail for that reason, so I left it out for readability. But in production code you SHOULD check it.
My solution is to read the whole line first and then parse it with strtok_r with comma as a delimiter. If you want portable code you should use strtok instead.
A naive implementation of readline would be something like this:
static char *readline(FILE *file)
{
char *line = malloc(sizeof(char));
int index = 0;
int c = fgetc(file);
if (c == EOF) {
free(line);
return NULL;
}
while (c != EOF && c != '\n') {
line[index++] = c;
char *l = realloc(line, (index + 1) * sizeof(char));
if (l == NULL) {
free(line);
return NULL;
}
line = l;
c = fgetc(file);
}
line[index] = '\0';
return line;
}
Then you just need to parse the whole line with strtok_r, so you would end with something like this:
int main(int argc, char **argv)
{
FILE *file = fopen(argv[1], "re");
int list[300];
if (file == NULL) {
return 1;
}
char *line;
int numc = 0;
while((line = readline(file)) != NULL) {
char *saveptr;
// Get the first token
char *tok = strtok_r(line, ",", &saveptr);
// Now start parsing the whole line
while (tok != NULL) {
// Convert the token to a long if possible
long num = strtol(tok, NULL, 0);
if (errno != 0) {
// Handle no value conversion
// ...
// ...
}
list[numc++] = (int) num;
// Get next token
tok = strtok_r(NULL, ",", &saveptr);
}
free(line);
}
fclose(file);
return 0;
}
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
printf("%d ", list[i]);
}
printf("\n");

Jumping to next line with fscanf()

I have two files .csv and I need to read the whole file but it have to be filed by field. I mean, csv files are files with data separated by comma, so I cant use fgets.
I need to read all the data but I don't know how to jump to the next line.
Here is what I've done so far:
int main()
{
FILE *arq_file;
arq_file = fopen("file.csv", "r");
if(arq_file == NULL){
printf("Not possible to read the file.");
exit(0);
}
while( !feof(arq_file) ){
fscanf(arq_file, "%i %lf", &myStruct[i+1].Field1, &myStruct[i+1].Field2);
}
fclose(arq_file);
return 0;
}
It will get in a infinity loop because it never gets the next line.
How could I reach the line below the one I just read?
Update: File 01 Example
1,Alan,123,
2,Alan Harper,321
3,Jose Rendeks,32132
4,Maria da graça,822282
5,Charlie Harper,9999999999
File 02 Example
1,320,123
2,444,321
3,250,123,321
3,3,250,373,451
2,126,621
1,120,320
2,453,1230
3,12345,0432,1830
I think an example is better than giving you hints, this is a combination of fgets() + strtok(), there are other functions that could work for example strchr(), though it's easier this way and since I just wanted to point you in the right direction, well I did it like this
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int
main(void)
{
FILE *file;
char buffer[256];
char *pointer;
size_t line;
file = fopen("data.dat", "r");
if (file == NULL)
{
perror("fopen()");
return -1;
}
line = 0;
while ((pointer = fgets(buffer, sizeof(buffer), file)) != NULL)
{
size_t field;
char *token;
field = 0;
while ((token = strtok(pointer, ",")) != NULL)
{
printf("line %zu, field %zu -> %s\n", line, field, token);
field += 1;
pointer = NULL;
}
line += 1;
}
return 0;
}
I think it's very clear how the code works and I hope you can understand.
If the same code has to handle both data files, then you're stuck with reading the fields into a string, and subsequently converting the string into a number.
It is not clear from your description whether you need to do something special at the end of line or not — but because only one of the data lines ends with a comma, you do have to allow for fields to be separated by a comma or a newline.
Frankly, you'd probably do OK with using getchar() or equivalent; it is simple.
char buffer[4096];
char *bufend = buffer + sizeof(buffer) - 1;
char *curfld = buffer;
int c;
while ((c = getc(arq_file)) != EOF)
{
if (curfld == bufend)
…process overlong field…
else if (c == ',' || c == '\n')
{
*curfld = '\0';
process(buffer);
curfld = buffer;
}
else
*curfld++ = c;
}
if (c == EOF && curfld != buffer)
{
*curfld = '\0';
process(buffer);
}
However, if you want to go with higher level functions, then you do want to use fgets() to read lines (unless you need to worry about deviant line endings, such as DOS vs Unix vs old-style Mac (CR-only) line endings). Or use POSIX
getline() to read arbitrarily long lines. Then split the lines using strtok_r() or equivalent.
char *buffer = 0;
size_t buflen = 0;
while (getline(&buffer, &buflen, arq_file) != -1)
{
char *posn = buffer;
char *epos;
char *token;
while ((token = strtok_r(posn, ",\n", &epos)) != 0)
{
process(token);
posn = 0;
}
/* Do anything special for end of line */
}
free(buffer);
If you think you must use scanf(), then you need to use something like:
char buffer[4096];
char c;
while (fscanf(arq_file, "%4095[^,\n]%c", buffer, &c) == 2)
process(buffer);
The %4095[^,\n] scan set reads up to 4095 characters that are neither comma nor newline into buffer, and then reads the next character (which must, therefore, either be comma or newline — or conceivably EOF, but that causes problems) into c. If the last character in the file is neither comma nor newline, then you will skip the last field.

Strange output from custom string object

I'm working on the second half of a program for class and the objective of the program is simple, but I can't figure out what's causing this output for my program. Basically, we have to read a file, using a function we wrote for the string header. We should then print out all the four-letter words in that file, obviously ignoring punctuation and whitespace. I've got the logic for that down, but what I can't figure out is why, even though I check to see if the length of the string is 4 before printing it, I sometimes get output that's clearly longer than 4. Here is input text from the file I'm using.
This is a test of the program which will only print out the four letter words in this file. Let's see if it works!
And this is the output I'm getting...
This
test
willham
onlyham
fourtam
thissrm
filesrm
Here is the main program: http://pastebin.com/xviETPFm
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include "mystring.h"
int fTerminate(char ch, int * pbDiscardChar);
int main(int argc, char ** argv) {
MYSTRING str;
FILE * in;
if((str = mystring_init_default()) == MYSTRING_STATUS_ERROR) {
printf("Error initializing MYSTRING object.\n");
return -1;
}
if((in = fopen("book.txt", "r")) == NULL) {
printf("Error opening file \"book.txt\". Does the file exist?\n");
return -1;
}
while(mystring_input(str, in, 1, fTerminate) != MYSTRING_STATUS_ERROR) {
if(mystring_size(str) == 4) {
mystring_output(str, stdout);
printf("\n");
}
}
mystring_destroy(&str);
return 0;
}
int fTerminate(char ch, int * pbDiscardChar) {
// Terminate on whitespace characters or non-alpha characters.
return (*pbDiscardChar = ((isspace(ch) || (isalpha(ch) == 0))?1:0));
}
And just in case you need it, here is the input function: http://pastebin.com/vD71hGEt
MyString_Status mystring_input(MYSTRING hString,
FILE * hFile,
int bIgnoreLeadingWhiteSpace,
int (*fTerminate)(char ch, int * pbDiscardChar)) {
char ch = '\0';
int eofCheck = 0;
int t, discard;
mystring_truncate(hString, 0);
if(hFile == NULL) return MYSTRING_STATUS_ERROR;
eofCheck = fscanf(hFile, "%c", &ch);
// If bIgnoreWhiteSpace is true, gobble leading whitespace.
if(bIgnoreLeadingWhiteSpace) {
while(isspace(ch)) {
eofCheck = fscanf(hFile, "%c", &ch);
if(eofCheck == EOF) return MYSTRING_STATUS_ERROR;
}
}
// Add all valid characters to the string, overwriting the old string.
while(eofCheck != EOF) {
t = fTerminate(ch, &discard);
if(discard == 0) mystring_push(hString, ch);
if(t) return MYSTRING_STATUS_SUCCESS;
eofCheck = fscanf(hFile, "%c", &ch);
}
if(eofCheck == EOF) return MYSTRING_STATUS_ERROR;
return MYSTRING_STATUS_SUCCESS;
}
It clearly works for the first two strings, so what happened with the rest of them? Does my computer just like ham?

fscanf help: how to check for formatting

So the current function is supposed to see anything stored between two pound signs (#abc# should give back abc), but if I want to error check to see if there's a pound sign missing, or there's nothing between the pound signs, or that the length of the string in between the two pound signs is greater than a certain number of characters, do I use the fscanf function to do that?
Here's what the fscanf code looks like:
if (fscanf(fp, " %c%[^#]%c", &start, buffer, &end) == 3) {
return strdup(buffer);
} else {
return NULL;
}
do I use the fscanf function to do that?
No, you don't. Using the scanf() family of functions is very rarely a good idea.
char buf[0x1000];
// expect leading `#'
int ch = fgetc(fp);
if (ch != '#') {
puts("Error, missing # at beginning of input");
exit(-1);
}
// read until terminating `#' or until the buffer is exhausted
char *p = buf;
while ((ch = fgetc(fp)) != '#' && p - buf < sizeof(buf)) {
*p++ = ch;
}
// expect terminating `#'
if (ch != '#') {
puts("Error, missing # at end of input");
exit(-1);
}
If you need to deal with the zero-characters case, then probably fscanf() is not the right tool. There's a reasonable argument that fscanf() is seldom the correct tool; you'd do better with fgets() and sscanf().
In this case, I'd collect lines until there's one that's not blank (since that's what the fscanf() does), and then search for the # symbols with strchr():
char line[4096];
while (fgets(line, sizeof(line), fp) != 0)
{
if (strspn(line, " \t\n") == strlen(line))
continue;
char *hash1 = strchr(line, '#');
if (hash1 == 0)
...error no hashes...
else
{
char *hash2 = strchr(hash1+1, '#');
if (hash2 == 0)
...second hash is missing...
if (hash2 - hash1 > MAX_PERMITTED_STRING_LEN)
...too long a string...
*hash2 = '\0';
char *res = strdup(hash1);
}
}

Resources