fscanf help: how to check for formatting - c

The current function is supposed to return whatever is stored between two pound signs (#abc# should give back abc). If I want to check for errors, such as a missing pound sign, nothing between the pound signs, or a string between the pound signs that is longer than a certain number of characters, do I use the fscanf function to do that?
Here's what the fscanf code looks like:
if (fscanf(fp, " %c%[^#]%c", &start, buffer, &end) == 3) {
return strdup(buffer);
} else {
return NULL;
}

do I use the fscanf function to do that?
No, you don't. Using the scanf() family of functions is very rarely a good idea.
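char buf[0x1000];
// expect leading '#'
int ch = fgetc(fp);
if (ch != '#') {
    puts("Error, missing # at beginning of input");
    exit(EXIT_FAILURE);
}
// read until the terminating '#', EOF, or until the buffer is full
// (one byte is kept free for the terminating '\0')
size_t len = 0;
while ((ch = fgetc(fp)) != '#' && ch != EOF && len < sizeof(buf) - 1) {
    buf[len++] = (char)ch;
}
buf[len] = '\0';
// expect terminating '#'; a full buffer also ends up here (string too long)
if (ch != '#') {
    puts("Error, missing # at end of input, or string too long");
    exit(EXIT_FAILURE);
}
// reject "##"
if (len == 0) {
    puts("Error, nothing between the # signs");
    exit(EXIT_FAILURE);
}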

If you need to deal with the zero-characters case, then probably fscanf() is not the right tool. There's a reasonable argument that fscanf() is seldom the correct tool; you'd do better with fgets() and sscanf().
In this case, I'd collect lines until there's one that's not blank (since that's what the fscanf() does), and then search for the # symbols with strchr():
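/* Inside a function returning char *, like the one in the question */
char line[4096];
while (fgets(line, sizeof(line), fp) != 0)
{
    /* skip blank lines, as the leading space in " %c" does */
    if (strspn(line, " \t\n") == strlen(line))
        continue;
    char *hash1 = strchr(line, '#');
    if (hash1 == 0)
        return NULL;                  /* error: no hashes */
    char *hash2 = strchr(hash1 + 1, '#');
    if (hash2 == 0)
        return NULL;                  /* error: second hash is missing */
    if (hash2 == hash1 + 1)
        return NULL;                  /* error: nothing between the hashes */
    if (hash2 - hash1 - 1 > MAX_PERMITTED_STRING_LEN)
        return NULL;                  /* error: string too long */
    *hash2 = '\0';
    return strdup(hash1 + 1);         /* copy the text after the first '#' */
}
return NULL;                          /* EOF before any non-blank line */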
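The answer above recommends fgets() plus sscanf() but then uses strchr(). For comparison, here is a minimal sketch of how sscanf() return values and the %n directive can tell the error cases from the question apart on a line already read with fgets(). The enum names, parse_hashed() and the 31-character limit are assumptions for illustration, not part of the original answers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes, for illustration only. */
enum hash_status {
    HASH_OK,
    HASH_NO_OPEN,          /* first non-blank character is not '#' */
    HASH_EMPTY,            /* nothing between the opening '#' and the next '#' or end of line */
    HASH_UNCLOSED_OR_LONG  /* no closing '#' within MAX_TEXT characters */
};

#define MAX_TEXT 31  /* assumed limit; out must hold MAX_TEXT + 1 bytes */

static enum hash_status parse_hashed(const char *line, char out[MAX_TEXT + 1])
{
    char first = 0;
    int end = 0;

    /* " %c" skips leading white space and reads the first real character. */
    if (sscanf(line, " %c", &first) != 1 || first != '#')
        return HASH_NO_OPEN;

    /* Read 1..31 characters that are neither '#' nor newline, then match
       the closing '#'. The %n directive is only reached (so end is only
       set) if that closing '#' matched. The width 31 must agree with MAX_TEXT. */
    if (sscanf(line, " #%31[^#\n]#%n", out, &end) != 1)
        return HASH_EMPTY;          /* the scanset matched nothing, e.g. "##" */
    if (end == 0)
        return HASH_UNCLOSED_OR_LONG;
    return HASH_OK;
}
```

In a loop over fgets(line, sizeof line, fp), a HASH_OK result would then be returned as strdup(out). Note that %31[...] cannot tell "too long" from "unclosed", because both leave a non-'#' character where the closing '#' should be.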

Related

Is there any cross-platform approach for dealing with new line characters in text files in C language?

I'm reading stdin and there are sometimes unix-style and sometimes windows-style newlines.
How to consume either type of newline?
Assuming you know there will be a newline, the solution is to consume one character, and then decide:
10 - LF ... Unix style newline
13 - CR ... Windows style newline
If it's 13, you have to consume one more character (10)
int x = fgetc(stdin);         // consume LF or CR; int (not char) so EOF stays distinguishable
if (x == '\r') fgetc(stdin);  // CR (13): consume the LF (10) that follows
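The snippet above assumes a newline is known to come next. A slightly more defensive sketch (the helper name consume_newline is hypothetical) keeps the result in an int, uses '\r' and '\n' instead of 13 and 10, accepts a lone CR as well, and pushes back anything that is not part of a newline with ungetc(). It assumes non-interactive input, because peeking after a '\r' can block on a terminal.

```c
#include <stdio.h>

/* Consume one newline sequence: "\n", "\r\n" or a lone "\r".
   Returns 1 if a newline was consumed, 0 otherwise (the character that
   was read is pushed back). Peeking after '\r' may block on a terminal. */
static int consume_newline(FILE *in)
{
    int c = getc(in);
    if (c == '\n')
        return 1;
    if (c == '\r') {
        int next = getc(in);
        if (next != '\n' && next != EOF)
            ungetc(next, in);   /* lone CR (old Mac style): keep the next char */
        return 1;
    }
    if (c != EOF)
        ungetc(c, in);          /* not a newline: leave it for the caller */
    return 0;
}
```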
There are a few more newline conventions than that. In particular, all four combinations of CR (\r) and LF (\n), namely \n, \r, \r\n, and \n\r, are actually encountered in the wild.
For reading text input, possibly interactively, and supporting all of those four newline encodings at the same time, I recommend using a helper function something like the following:
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <ctype.h>
#include <stdio.h>
#include <errno.h>
size_t get_line(char **const lineptr, size_t *const sizeptr, char *const lastptr, FILE *const in)
{
char *line;
size_t size, have;
int c;
if (!lineptr || !sizeptr || !in) {
errno = EINVAL; /* Invalid parameters! */
return 0;
}
if (*lineptr) {
line = *lineptr;
size = *sizeptr;
} else {
line = NULL;
size = 0;
}
have = 0;
if (lastptr) {
if (*lastptr == '\n') {
c = getc(in);
if (c != '\r' && c != EOF)
ungetc(c, in);
} else
if (*lastptr == '\r') {
c = getc(in);
if (c != '\n' && c != EOF)
ungetc(c, in);
}
*lastptr = '\0';
}
while (1) {
if (have + 2 >= size) {
/* Reallocation policy; my personal quirk here.
* You can replace this with e.g. have + 128,
* or (have + 2)*3/2 or whatever you prefer. */
size = (have | 127) + 129;
line = realloc(line, size);
if (!line) {
errno = ENOMEM; /* Out of memory */
return 0;
}
*lineptr = line;
*sizeptr = size;
}
c = getc(in);
if (c == EOF) {
if (lastptr)
*lastptr = '\0';
break;
} else
if (c == '\n') {
if (lastptr)
*lastptr = c;
else {
c = getc(in);
if (c != EOF && c != '\r')
ungetc(c, in);
}
break;
} else
if (c == '\r') {
if (lastptr)
*lastptr = c;
else {
c = getc(in);
if (c != EOF && c != '\n')
ungetc(c, in);
}
break;
}
if (iscntrl(c) && !isspace(c))
continue;
line[have++] = c;
}
if (ferror(in)) {
errno = EIO; /* I/O error */
return 0;
}
line[have] = '\0';
errno = 0; /* No errors, even if have were 0 */
return have;
}
int main(void)
{
char *data = NULL;
size_t size = 0;
size_t len;
char last = '\0';
setlocale(LC_ALL, "");
while (1) {
len = get_line(&data, &size, &last, stdin);
if (errno) {
fprintf(stderr, "Error reading standard input: %s.\n", strerror(errno));
return EXIT_FAILURE;
}
if (!len && feof(stdin))
break;
printf("Read %lu characters: '%s'\n", (unsigned long)len, data);
}
free(data);
data = NULL;
size = 0;
return EXIT_SUCCESS;
}
Except for the errno constants I used (EINVAL, ENOMEM, and EIO), the above code is C89, and should be portable.
The get_line() function dynamically reallocates the line buffer to be long enough when necessary. For interactive inputs, you must accept a newline at the first newline-ish character you encounter (as trying to read the second character would block, if the first character happens to be the only newline character). If specified, the one-character state at lastptr is used to detect and handle correctly any two-character newlines at the start of the next line read. If not specified, the function will attempt to consume the entire newline as part of the current line (which is okay for non-interactive inputs, especially files).
The newline is not stored or counted in the line length. For added ease of use, the function also skips non-whitespace control characters. Especially embedded nul characters (\0) often cause headaches, so having the function skip those altogether is often a robust approach.
As a final touch, the function always sets errno: to zero if no error occurred, or to a nonzero error code otherwise (including ferror() cases), so detecting error conditions is trivial.
The above code snippet includes a main(), which reads and displays input lines, using the current locale for the meaning of "non-whitespace control character" (!isspace(c) && iscntrl(c)).
Although this is definitely not the fastest mechanism to read input, it is not that slow, and it is a very robust one.
Questions?

Reading only letters in C using fscanf

Is it possible to use fscanf to read words without symbols from a text file
This function prints one word per line, but if a word contains commas or brackets it prints those too. Is there any way to print only the letters?
void load(const char *file)
{
FILE *inFile = fopen(file , "r");
if (inFile == NULL )
{
return false;
}
char word[LENGTH];
while (fscanf(inFile, "%s", word) != EOF)
{
printf("%s\n", word);
}
}
Your function will not compile as written: it returns a value, but it is declared void.
Read char by char and print only what you want.
#include <stdio.h>
#include <stdbool.h>
#include <ctype.h>

bool load(const char *file)
{
    FILE *inFile = fopen(file, "r");
    if (inFile == NULL)
    {
        return false;
    }
    int ch;   /* int, not char, so EOF can be detected */
    while ((ch = fgetc(inFile)) != EOF)
    {
        /* Keep letters and white space; use isalnum() to keep digits too. */
        if (isalpha(ch) || isspace(ch))
            putchar(ch);
    }
    fclose(inFile);
    return true;
}
Is it possible to use fscanf to read words without symbols from a text file
Well, yes...
You can use a scanset to specify which characters to match, for example:
%[a-zA-Z]
This will match/accept all upper and lower case letters and reject all other characters.
But... that gives you another problem. When the next character in the file doesn't match, the file position isn't advanced, so every following call fails at the same place and you are stuck. To handle that you need a way to skip non-matching characters, e.g. fgetc to read a single character.
Something like:
char word[32];
while (1)
{
int res = fscanf(inFile, "%31[a-zA-Z]", word); // At most 31 characters and only letters
if (res == EOF) break;
if (res == 1)
{
printf("%s\n", word);
}
else
{
// Didn't match, skip a character
if (fgetc(inFile) == EOF) break;
}
}
With a file like:
Hallo World, having() fun ....
Oh,yes...
the output is
Hallo
World
having
fun
Oh
yes
An alternative that only uses fscanf could be:
char word[32];
while (1)
{
int res = fscanf(inFile, "%31[a-zA-Z]", word); // Read letters
if (res == EOF) break;
if (res == 1)
{
printf("%s\n", word);
}
res = fscanf(inFile, "%31[^a-zA-Z]", word); // Skip non-letters
if (res == EOF) break;
}
Notice the ^ in the second scanset. It changes the meaning of the scanset to "don't match ...". So the code alternates between "read a word consisting of letters" and "skip everything that isn't a letter", which is likely a better way of doing this than the fgetc method above.
That said, I normally prefer reading the file using fgets and then parse the buffer afterwards (e.g. using sscanf) but that's another story.
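A possible refinement of the fscanf-only version (not from the original answer) uses assignment suppression, %*[^a-zA-Z], so the skipped characters are never stored and need no buffer or width limit. The function name read_letter_words and the 32-byte word size are illustrative assumptions. Ranges such as a-z inside a scanset are implementation-defined in ISO C, although widely supported.

```c
#include <stdio.h>
#include <string.h>

/* Collect up to max words made only of ASCII letters from in.
   Each word holds at most 31 letters; longer runs are split. */
static size_t read_letter_words(FILE *in, char words[][32], size_t max)
{
    size_t count = 0;
    while (count < max) {
        int res = fscanf(in, "%31[a-zA-Z]", words[count]);  /* read letters */
        if (res == EOF)
            break;
        if (res == 1)
            count++;
        /* Skip non-letters. The '*' suppresses assignment, so no buffer
           is needed and the result is only ever 0 or EOF. */
        if (fscanf(in, "%*[^a-zA-Z]") == EOF)
            break;
    }
    return count;
}
```

With the sample file from the answer, this collects the same six words (Hallo, World, having, fun, Oh, yes).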

C Reading a file of digits separated by commas

I am trying to read in a file that contains numbers separated by commas and store them in an array without the commas.
For example: processes.txt contains
0,1,3
1,0,5
2,9,8
3,10,6
And an array called numbers should look like:
0 1 3 1 0 5 2 9 8 3 10 6
The code I had so far is:
FILE *fp1;
char c; //declaration of characters
fp1=fopen(argv[1],"r"); //opening the file
int list[300];
c=fgetc(fp1); //taking character from fp1 pointer or file
int i=0,number,num=0;
while(c!=EOF){ //iterate until end of file
if (isdigit(c)){ //if it is digit
sscanf(&c,"%d",&number); //changing character to number (c)
num=(num*10)+number;
}
else if (c==',' || c=='\n') { //if it is new line or ,then it will store the number in list
list[i]=num;
num=0;
i++;
}
c=fgetc(fp1);
}
But this has problems with two-digit numbers. Does anyone have a better solution? Thank you!
For the data shown with no space before the commas, you could simply use:
while (i < 300 && fscanf(fp1, "%d,", &num) == 1)
    list[i++] = num;
This reads the comma after the number if there is one and silently carries on when there isn't. If there might be white space before the commas in the data, add a blank before the comma in the format string ("%d ,"). Testing i before calling fscanf() prevents you from writing outside the bounds of the list array, without first reading a number that would then be thrown away. The ++ operator comes into its own here.
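Wrapped into a function, the same loop might look like the sketch below. The name read_numbers and the max parameter are illustrative; the format uses the "%d ," variant so that optional blanks before a comma are tolerated.

```c
#include <stdio.h>

/* Read comma- or newline-separated integers into list[0..max-1].
   Returns how many were stored. */
static size_t read_numbers(FILE *fp, int list[], size_t max)
{
    size_t i = 0;
    int num;
    /* Check the bound first so no number is read and then thrown away.
       "%d" skips leading white space (including newlines); " ," skips
       optional white space and then matches a comma if one is there. */
    while (i < max && fscanf(fp, "%d ,", &num) == 1)
        list[i++] = num;
    return i;
}
```

For processes.txt this yields 0 1 3 1 0 5 2 9 8 3 10 6, with two-digit values such as 10 handled by %d.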
First, fgetc returns an int, so c needs to be an int.
Other than that, I would use a slightly different approach. I admit that it is slightly overcomplicated. However, this approach may be usable if you have several different types of fields that require different actions, like a parser. For your specific problem, I recommend Jonathan Leffler's answer.
int c = fgetc(f);
while (c != EOF && i < 300) {
    if (isdigit(c)) {
        /* Push the digit back so fscanf() sees the whole number.
           ungetc() is used instead of fseek(f, -1, SEEK_CUR), because
           ISO C does not allow that kind of relative seek on a text stream. */
        ungetc(c, f);
        if (fscanf(f, "%d", &list[i++]) != 1) {
            // Handle error
        }
    }
    c = fgetc(f);
}
Here I don't care about commas and newlines. I take ANYTHING other than a digit as a separator. What I do is basically this:
read next byte
if byte is digit:
    push the byte back onto the stream
    read number, regardless of length
else continue
The added condition i<300 is for security reasons. If you really want to check that there is nothing other than commas and newlines (I did not get the impression that you found that important), you could easily add an else if (c == ... to handle the error.
Note that you should always check the return value of functions like sscanf, fscanf, scanf etc. Strictly speaking, ungetc can fail too, but the standard guarantees one character of pushback, so I left that check out for readability. In production code you SHOULD check return values.
My solution is to read the whole line first and then parse it with strtok_r with comma as a delimiter. If you want portable code you should use strtok instead.
A naive implementation of readline would be something like this:
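#define _POSIX_C_SOURCE 200809L  /* for strtok_r() */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line (without the '\n') into a malloc'd string.
   Returns NULL at EOF or on allocation failure. Grows the buffer one
   byte at a time: simple, but slow for long lines. */
static char *readline(FILE *file)
{
    char *line = malloc(1);
    size_t index = 0;
    if (line == NULL)
        return NULL;
    int c = fgetc(file);
    if (c == EOF) {
        free(line);
        return NULL;
    }
    while (c != EOF && c != '\n') {
        line[index++] = (char)c;
        char *l = realloc(line, index + 1);
        if (l == NULL) {
            free(line);
            return NULL;
        }
        line = l;
        c = fgetc(file);
    }
    line[index] = '\0';
    return line;
}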
Then you just need to parse the whole line with strtok_r, so you would end with something like this:
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    FILE *file = fopen(argv[1], "r");
    int list[300];
    if (file == NULL) {
        perror(argv[1]);
        return 1;
    }
    char *line;
    int numc = 0;
    while ((line = readline(file)) != NULL) {
        char *saveptr;
        // Get the first token
        char *tok = strtok_r(line, ",", &saveptr);
        // Now parse the rest of the line, without overrunning list
        while (tok != NULL && numc < 300) {
            char *end;
            // strtol() only sets errno on error, so clear it first
            errno = 0;
            // Base 10, so values like "08" are not treated as octal
            long num = strtol(tok, &end, 10);
            if (end == tok || errno != 0) {
                // Handle no value conversion
                // ...
            } else {
                list[numc++] = (int) num;
            }
            // Get next token
            tok = strtok_r(NULL, ",", &saveptr);
        }
        free(line);
    }
    fclose(file);
    return 0;
}
And for printing the whole list just use a for loop:
for (int i = 0; i < numc; i++) {
printf("%d ", list[i]);
}
printf("\n");
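Since strtok_r() is POSIX rather than ISO C, and strtok() keeps hidden state, another option is to walk each line with strtol() and its end pointer. This is a different technique from the answer above, sketched with a hypothetical function name, parse_csv_ints; it leaves the line unmodified and also rejects values that do not fit in an int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse comma-separated integers from one line using strtol()'s end
   pointer instead of strtok(). Stops at the first token that is not a
   number. Returns how many values were stored in list. */
static size_t parse_csv_ints(const char *line, int list[], size_t max)
{
    size_t n = 0;
    const char *p = line;
    while (n < max) {
        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);   /* skips leading white space */
        if (end == p || errno != 0 || v < INT_MIN || v > INT_MAX)
            break;                      /* no digits, or out of range */
        list[n++] = (int)v;
        p = end;
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p != ',')
            break;                      /* end of line (or junk): stop */
        p++;                            /* step over the comma */
    }
    return n;
}
```

It could be called once per line returned by readline(), passing list + numc and 300 - numc so that values accumulate across lines.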

Jumping to next line with fscanf()

I have two .csv files and I need to read each whole file, but field by field. CSV files contain data separated by commas, so I can't use fgets.
I need to read all the data but I don't know how to jump to the next line.
Here is what I've done so far:
int main()
{
FILE *arq_file;
arq_file = fopen("file.csv", "r");
if(arq_file == NULL){
printf("Not possible to read the file.");
exit(0);
}
while( !feof(arq_file) ){
fscanf(arq_file, "%i %lf", &myStruct[i+1].Field1, &myStruct[i+1].Field2);
}
fclose(arq_file);
return 0;
}
It gets stuck in an infinite loop because it never gets to the next line.
How could I reach the line below the one I just read?
Update: File 01 Example
1,Alan,123,
2,Alan Harper,321
3,Jose Rendeks,32132
4,Maria da graça,822282
5,Charlie Harper,9999999999
File 02 Example
1,320,123
2,444,321
3,250,123,321
3,3,250,373,451
2,126,621
1,120,320
2,453,1230
3,12345,0432,1830
I think an example is better than giving you hints. This is a combination of fgets() and strtok(). Other functions could work too, for example strchr(), but this way is easier, and since I just wanted to point you in the right direction, I did it like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int
main(void)
{
    FILE *file;
    char buffer[256];
    char *pointer;
    size_t line;

    file = fopen("data.dat", "r");
    if (file == NULL)
    {
        perror("fopen()");
        return EXIT_FAILURE;
    }
    line = 0;
    while ((pointer = fgets(buffer, sizeof(buffer), file)) != NULL)
    {
        size_t field;
        char *token;

        field = 0;
        /* '\r' and '\n' are delimiters too, so the last field on each
           line does not keep the line ending */
        while ((token = strtok(pointer, ",\r\n")) != NULL)
        {
            printf("line %zu, field %zu -> %s\n", line, field, token);
            field += 1;
            pointer = NULL;   /* later strtok() calls continue the same line */
        }
        line += 1;
    }
    fclose(file);
    return 0;
}
I think it's very clear how the code works and I hope you can understand.
If the same code has to handle both data files, then you're stuck with reading the fields into a string, and subsequently converting the string into a number.
It is not clear from your description whether you need to do something special at the end of line or not — but because only one of the data lines ends with a comma, you do have to allow for fields to be separated by a comma or a newline.
Frankly, you'd probably do OK with using getchar() or equivalent; it is simple.
char buffer[4096];
char *bufend = buffer + sizeof(buffer) - 1;
char *curfld = buffer;
int c;
while ((c = getc(arq_file)) != EOF)
{
if (curfld == bufend)
…process overlong field…
else if (c == ',' || c == '\n')
{
*curfld = '\0';
process(buffer);
curfld = buffer;
}
else
*curfld++ = c;
}
if (c == EOF && curfld != buffer)
{
*curfld = '\0';
process(buffer);
}
However, if you want to go with higher-level functions, then you do want to use fgets() to read lines (unless you need to worry about deviant line endings, such as DOS vs Unix vs old-style Mac (CR-only) line endings), or use POSIX getline() to read arbitrarily long lines. Then split the lines using strtok_r() or equivalent.
char *buffer = 0;
size_t buflen = 0;
while (getline(&buffer, &buflen, arq_file) != -1)
{
char *posn = buffer;
char *epos;
char *token;
while ((token = strtok_r(posn, ",\n", &epos)) != 0)
{
process(token);
posn = 0;
}
/* Do anything special for end of line */
}
free(buffer);
If you think you must use scanf(), then you need to use something like:
char buffer[4096];
char c;
while (fscanf(arq_file, "%4095[^,\n]%c", buffer, &c) == 2)
process(buffer);
The %4095[^,\n] scan set reads up to 4095 characters that are neither comma nor newline into buffer, and then reads the next character (which must, therefore, either be comma or newline — or conceivably EOF, but that causes problems) into c. If the last character in the file is neither comma nor newline, then you will skip the last field.
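This fscanf() loop also answers the original "how do I get to the next line" question: the character stored in c tells you why the field ended, so c == '\n' marks the end of a line. A sketch, where count_fields_per_line() and its counts array are hypothetical; like the loop above, it assumes there are no empty fields (a trailing comma, as on the first line of File 01, stops the loop) and that the file ends with a newline.

```c
#include <stdio.h>

/* Read comma-separated fields with fscanf() and use the character after
   each field to detect the end of a line. Stores the number of fields
   on each line in counts[] and returns the number of lines seen. */
static size_t count_fields_per_line(FILE *fp, size_t counts[], size_t max_lines)
{
    char buffer[4096];
    char c;
    size_t lines = 0, fields = 0;

    while (lines < max_lines &&
           fscanf(fp, "%4095[^,\n]%c", buffer, &c) == 2) {
        fields++;                 /* buffer holds one field here */
        if (c == '\n') {          /* this field ended the line */
            counts[lines++] = fields;
            fields = 0;
        }
    }
    return lines;
}
```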

Skip line if is determined character

I'm trying to read the first character of each line of a file: whenever it's equal to '(' I should skip that line, otherwise I get the first character from that line. I'm on a Mac, so I can make use of fgetln.
FILE *file = fopen("test.txt", "r");
char c;
while(fscanf(file, "%s", &c) != EOF) {
if (c != '(')
printf("%c", c);
}
That's my current code. I don't know how to skip lines. I tried getting the whole line and checking only the first char, which would solve the skip problem, but it isn't working: I'm getting strange characters in my console instead of the ones in test.txt. How should I do that?
The problem with using the %s format specifier of fscanf is that it splits on any white space, not only on end-of-line characters. Moreover, reading into a single char produces undefined behavior, because every word %s stores needs at least two bytes: its characters plus the terminating '\0'.
There are several ways to solve this problem, using different APIs:
Replacing %s with " %200[^\n]" (the leading space skips the previous line's newline) and passing a 201-character buffer instead of c,
Using fgets with a properly-sized buffer, and picking the initial character, or
Using a character-based API, and setting a "take next" flag each time that you see a '\n' character:
Here is how you can implement the third approach:
bool takeNext = true;
int ch;
while ((ch = fgetc(file)) != EOF) {
if (takeNext && ch != '(') {
printf("%c", ch);
}
takeNext = (ch == '\n');
}
Here is a slightly longer character-based approach, which conditions on whether the first character in a line is ( or not.
If it is (, then we consume everything up to and including the next newline without outputting.
If it is not, then we do the same thing, but we output the characters as we read them.
#include <stdio.h>

int main(void)
{
    FILE *file = fopen("test.txt", "r");
    if (file == NULL) {
        perror("test.txt");
        return 1;
    }
    int c;
    while ((c = getc(file)) != EOF) {
        if (c == '(') {
            // Skip until the next newline
            do {
                c = getc(file);
            } while (c != EOF && c != '\n');
        } else {
            // Output the rest of the line, including its newline,
            // without passing EOF to putchar()
            putchar(c);
            while (c != '\n' && (c = getc(file)) != EOF)
                putchar(c);
        }
    }
    fclose(file);
    return 0;
}
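Change c to a string, because %s reads a string, and check whether the first character of c is (. If it is not, print the line; otherwise skip it.
Since %s stops at any white space, a scanset that reads up to the end of the line works better here. The loop must also compare the result of fscanf with 1, because at the end of the file it returns EOF, which is nonzero and would keep the loop running forever.
FILE *file = fopen("test.txt", "r");
char c[100];
/* " %99[^\n]" skips leading white space (including the previous line's
   newline) and reads at most 99 characters up to the end of the line */
while (fscanf(file, " %99[^\n]", c) == 1) {
    if (c[0] != '(')
        printf("%s\n", c);
}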
Use fgets to read whole lines. It is also safer than fscanf as it limits the reading to the buffer size.
To check if the first char is '(' you can refer to it directly:
if (buf[0]=='(')
or
if (*buf=='(')
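Putting that suggestion together, here is a sketch of the fgets() approach that also copes with lines longer than the buffer, where fgets() returns one line in several pieces. The function name first_chars, the output buffer and the choice to ignore empty lines are assumptions, not part of the answer.

```c
#include <stdio.h>
#include <string.h>

/* Write the first character of every line that does not start with '('
   into out (NUL-terminated, at most outsz - 1 characters). Empty lines
   are ignored. Returns the number of characters written. Long lines
   arrive in several pieces; only the first piece of a line is checked. */
static size_t first_chars(FILE *in, char *out, size_t outsz)
{
    char buf[256];
    size_t n = 0;
    int at_line_start = 1;

    if (outsz == 0)
        return 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (at_line_start && buf[0] != '(' && buf[0] != '\n'
                && n < outsz - 1)
            out[n++] = buf[0];
        /* The next piece starts a new line only if this one ended with '\n'. */
        at_line_start = strchr(buf, '\n') != NULL;
    }
    out[n] = '\0';
    return n;
}
```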

Resources