I am not new to programming, but I encountered this small problem and I can't seem to get it.
I want to read a file with dates and put them in another file with another format
Input example: 18.08.2015
Output example: 18-08-2015
Here is the code (dat1 has "r" permission and dat2 "w"):
char d[3];
char m[3];
char g[5];
while(fscanf(dat1,"%s.%s.%s\n",&d,&m,&g)==3)
{
fprintf(dat2,"%s-%s-%s\n",d,m,g);
}
On the other hand, this works fine if I use [space] instead of a [dot] in the input file.
(18 08 2015)
What am I missing? The solution has to be as simple as possible and with using fscanf, not fgetc or fgets, to be explained to students that are just beginning to learn C. Thanks.
The %s pattern matches a sequence of non-white-space characters, so the first %s will gobble up the entire string.
Why use char arrays at all, why not int?
int d;
int m;
int g;
while(fscanf(dat1,"%d.%d.%d\n",&d,&m,&g)==3)
{
fprintf(dat2,"%d-%d-%d\n",d,m,g);
}
The %d in fprintf will not output leading zeros though. You'll have to teach your students a little bit extra or leave it for extra credit.
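For the leading zeros, one option is a zero-padded width such as %02d. Below is a minimal sketch that works on strings (sscanf/snprintf) rather than files so it is easy to try; the helper name convert_date is made up for illustration and is not from the question.

```c
#include <stdio.h>

/* Hypothetical helper: converts "dd.mm.yyyy" text into "dd-mm-yyyy",
   keeping leading zeros. Returns 1 on success, 0 on malformed input. */
int convert_date(const char *in, char *out, size_t outsz)
{
    int d, m, g;

    if (sscanf(in, "%d.%d.%d", &d, &m, &g) != 3)
        return 0;
    /* %02d pads with zeros to at least two digits; %04d does the same for four. */
    snprintf(out, outsz, "%02d-%02d-%04d", d, m, g);
    return 1;
}
```

The same format strings should carry over to fscanf and fprintf in the original loop.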
Since the scanf format %s reads up to the next whitespace character, it cannot be used for a string ending with a '.'. Instead use a character class: %2[0-9] or %2[^.]. (Change the 2 to the maximum number of characters you can handle, and don't forget that the [ format code does not skip whitespace, so if you want to do that, put a space before the format code.)
Change
fscanf(dat1,"%s.%s.%s\n",&d,&m,&g)
to
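fscanf(dat1,"%2[^.].%2[^.].%4[^.\n]\n",d,m,g)
(The widths keep each field inside its array, and excluding \n from the last set stops the year from running on into the next line.)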
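Here is a small sketch of the scanset approach with widths, written against a string with sscanf so it can be tested without a file. The helper name convert_date_text is an assumption of mine; the array sizes match the d, m and g arrays from the question.

```c
#include <stdio.h>

/* Hypothetical helper: rewrites one "dd.mm.yyyy" date as "dd-mm-yyyy"
   using scansets, so the fields stay strings and keep their leading zeros.
   Returns 1 on success, 0 if the input does not match. */
int convert_date_text(const char *in, char *out, size_t outsz)
{
    char d[3], m[3], g[5];   /* the widths below leave room for the '\0' */

    if (sscanf(in, "%2[0-9].%2[0-9].%4[0-9]", d, m, g) != 3)
        return 0;
    snprintf(out, outsz, "%s-%s-%s", d, m, g);
    return 1;
}
```

Because the fields are never converted to numbers, there is no leading-zero problem to deal with on output.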
I have run into some code and was wondering what the original developer was up to. Below is a simplified program using this pattern:
#include <stdio.h>
int main() {
char title[80] = "mytitle";
char title2[80] = "mayataiatale";
char mystring[80];
/* huh? */
sscanf(title,"%[^a]",mystring);
printf("%s\n",mystring); /* Output is "mytitle" */
/* huh? */
sscanf(title2,"%[^a]",mystring);
printf("%s\n",mystring); /* Output is "m" */
return 0;
}
The man page for scanf has relevant information, but I'm having trouble reading it. What is the purpose of using this sort of notation? What is it trying to accomplish?
The main reason for the character classes is that the %s notation stops at the first white space character, even if you specify field lengths, and you quite often don't want it to. In that case, the character class notation can be extremely helpful.
Consider this code to read a line of up to 10 characters, discarding any excess, but keeping spaces:
#include <stdio.h>
int main(void)
{
    char buffer[10+1] = "";
    int rc;
    /* Keep the first 10 characters of each line; skip the rest of the line. */
    while ((rc = scanf("%10[^\n]%*[^\n]", buffer)) >= 0)
    {
        getchar(); /* consume the newline left in the input */
        printf("rc = %d\n", rc);
        printf("buffer = <<%s>>\n", buffer);
        buffer[0] = '\0';
    }
    printf("rc = %d\n", rc);
    return 0;
}
This was actually example code for a discussion on comp.lang.c.moderated (circa June 2004) related to getline() variants.
To clear up some of the confusion: the first format specifier, %10[^\n], reads up to 10 non-newline characters and assigns them to buffer, along with a trailing null. The second format specifier, %*[^\n], contains the assignment suppression character (*) and reads zero or more remaining non-newline characters from the input. When the scanf() function completes, the input is pointing at the next newline character. The body of the loop reads that character, so that when the loop restarts, the input is looking at the start of the next line. The process then repeats. If the line is shorter than 10 characters, then those characters are copied to buffer, and the 'zero or more non-newlines' format processes zero non-newlines.
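The behaviour of that format string can also be checked on a string instead of stdin. This is only a sketch; the helper name first_ten is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: copies at most 10 characters of the first line of
   `text` into `buf` (which must hold 11 bytes) and returns what sscanf returned. */
int first_ten(const char *text, char buf[11])
{
    buf[0] = '\0';
    /* %10[^\n] stores up to 10 non-newline characters plus '\0';
       %*[^\n] skips the rest of the line without storing it. */
    return sscanf(text, "%10[^\n]%*[^\n]", buf);
}
```

The return value counts only the stored field, so a non-empty line gives 1 whether or not anything was skipped, and a line that starts with a newline gives 0.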
The constructs like %[a] and %[^a] exist so that scanf() can be used as a kind of lexical analyzer. These are sort of like %s, but instead of collecting a span of as many "stringy" characters as possible, they collect just a span of characters as described by the character class. There might be cases where writing %[a-zA-Z0-9] might make sense, but I'm not sure I see a compelling use case for complementary classes with scanf().
IMHO, scanf() is simply not the right tool for this job. Every time I've set out to use one of its more powerful features, I've eventually ended up ripping it out and implementing the capability in a different way. In some cases that meant using lex to write a real lexical analyzer, but usually doing line-at-a-time I/O and breaking the line coarsely into tokens with strtok() before doing value conversion was sufficient.
Edit: I typically ended up ripping out scanf() because, when faced with users insisting on providing incorrect input, it just isn't good at helping the program give good feedback about the problem, and having an assembler print "Error, terminated." as its sole helpful error message was not going over well with my user. (Me, in that case.)
It's like character sets from regular expressions; [0-9] matches a string of digits, [^aeiou] matches anything that isn't a lowercase vowel, etc.
There are all sorts of uses, such as pulling out numbers, identifiers, chunks of whitespace, etc.
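As a rough illustration of those two sets, here is a sketch that pulls a run of digits and then a run of non-vowels out of a string. The function name and the buffer sizes are my own choices for the example.

```c
#include <stdio.h>

/* Hypothetical helper: splits text such as "42xyz" into its leading digits
   and the run of non-vowel characters that follows. Returns the number of
   fields sscanf stored (0, 1 or 2), or EOF if the input is empty. */
int digits_then_non_vowels(const char *text, char digits[16], char rest[16])
{
    digits[0] = rest[0] = '\0';
    /* %15[0-9]    : up to 15 characters, all of them digits
       %15[^aeiou] : up to 15 characters, none of them a lowercase vowel */
    return sscanf(text, "%15[0-9]%15[^aeiou]", digits, rest);
}
```

Each set must match at least one character, so scanning stops at the first field that finds nothing it accepts.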
You can read about it in the ISO/IEC 9899 standard, available online.
Here is the paragraph about [ that I quote from the document (page 286):
Matches a nonempty sequence of characters from a set of expected
characters.
The conversion specifier includes all subsequent characters in the
format string, up to and including the matching right bracket (]). The
characters between the brackets (the scanlist) compose the scanset,
unless the character after the left bracket is a circumflex (^), in
which case the scanset contains all characters that do not appear in
the scanlist between the circumflex and the right bracket. If the
conversion specifier begins with [] or [^], the right bracket
character is in the scanlist and the next following right bracket
character is the matching right bracket that ends the specification;
otherwise the first following right bracket character is the one that
ends the specification. If a - character is in the scanlist and is not
the first, nor the second where the first character is a ^, nor the
last character, the behavior is implementation-defined.
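The rule about a right bracket at the start of the scanlist is easier to see in use. A small sketch, assuming input of the form "[NAME]"; the helper name bracketed_name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: extracts NAME from text of the form "[NAME]".
   `name` must hold 32 bytes. Returns 1 on success, 0 otherwise. */
int bracketed_name(const char *text, char name[32])
{
    int end = -1;

    /* "["       matches a literal '['.
       "%31[^]]" is a scanset whose scanlist is the single character ']'
                 (a ']' right after '^' belongs to the list), so it reads
                 up to 31 characters that are not ']'.
       "]"       matches the closing bracket.
       "%n"      stores the number of characters consumed so far, and is
                 only reached if the closing bracket matched. */
    if (sscanf(text, "[%31[^]]]%n", name, &end) == 1 && end != -1)
        return 1;
    return 0;
}
```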
I have a file where each line looks like this:
cc ssssssss,n
where the first two 'c's are individual characters, possibly spaces, then a space after that, then the 's's are a string that is 8 or 9 characters long, then there's a comma and then an integer.
I'm really new to C and I'm trying to figure out how to put this into 4 separate variables per line (each of the first two characters, the string, and the number).
Any suggestions? I've looked at fscanf and strtok but I'm not sure how to make them work for this.
Thank you.
I'm assuming this is a C question, as the question suggests, not C++ as the tags perhaps suggest.
Read the whole line in.
Use strchr to find the comma.
Do whatever you want with the first two characters.
Switch the comma for a zero, marking the end of a string.
Call strcpy from the fourth character on to extract the sssssss part.
Call atoi on one character past where the comma was to extract the integer.
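The steps above can be sketched roughly as follows, assuming the line has already been read into a modifiable buffer (for example with fgets). The struct and function names are invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type for one "cc ssssssss,n" line. */
struct record {
    char c1, c2;
    char str[10];   /* up to 9 characters plus '\0' */
    int n;
};

/* Parses `line` in place (the comma is overwritten). Returns 1 on success. */
int parse_line(char *line, struct record *r)
{
    char *comma = strchr(line, ',');

    /* Need "cc " before the string, and a string of 1 to 9 characters. */
    if (comma == NULL || comma - line < 4 || comma - line > 3 + 9)
        return 0;
    r->c1 = line[0];
    r->c2 = line[1];
    *comma = '\0';                /* end the string where the comma was */
    strcpy(r->str, line + 3);     /* fourth character onward */
    r->n = atoi(comma + 1);       /* the integer follows the comma */
    return 1;
}
```

atoi gives no error indication, so strtol would be the thing to reach for if bad numbers need to be detected.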
A string is a sequence of characters that ends at the first '\0'. Keep this in mind. What you have in the file you described isn't a string.
I presume n is an integer that could span multiple digits and could be negative. If that's the case, I believe the format string you require is "%2[^ ] %9[^,\n],%d". Be aware that %2[^ ] fails to match when the first character is a space, which the question allows; "%c%c" copes with that case. You'll want to pass fscanf the following expressions:
Your FILE *,
The format string,
An array of 3 chars silently converted to a pointer,
An array of 10 chars (9 characters plus the terminating '\0') silently converted to a pointer,
... and a pointer to int.
Store the return value of fscanf into an int. If fscanf returns negative, you have a problem such as EOF or some other read error. Otherwise, fscanf tells you how many objects it assigned values into. The "success" value you're looking for in this case is 3. Anything else means incorrectly formed input.
I suggest reading the fscanf manual for more information, and/or for clarification.
The fscanf function is very powerful and can be used to solve your task:
We need to read two chars, so the format starts as "%c%c".
Then skip a space (just add it to the format string): "%c%c ".
Then read a string until we hit a comma. Don't forget to specify the max string size. So the format becomes "%c%c %10[^,]". Here 10 is the max number of chars to read, and [^,] is the set of allowed chars: the ^ negates the set, so it means everything except a comma.
Then skip the comma: "%c%c %10[^,],".
And finally read an integer: "%c%c %10[^,],%d".
The last step is to make sure that all 4 tokens were read, by checking the fscanf return value.
Here is the complete solution:
FILE *f = fopen("input_file", "r");
if (f != NULL)
{
    char c1, c2;
    char str[11];
    int d;
    /* %*c discards the newline so the next %c starts on the next line */
    while (fscanf(f, "%c%c %10[^,],%d%*c", &c1, &c2, str, &d) == 4)
    {
        // successfully got 4 values from the file
    }
    fclose(f);
}
I am having trouble accepting input from a text file. My program is supposed to read in a string specified by the user and the length of that string is determined at runtime. It works fine when the user is running the program (manually inputting the values) but when I run my teacher's text file, it runs into an infinite loop.
For this example, it fails when I am taking in 4 characters and his input in his file is "ABCDy". "ABCD" is what I am supposed to be reading in and 'y' is supposed to be used later to know that I should restart the game. Instead when I used scanf to read in "ABCD", it also reads in the 'y'. Is there a way to get around this using scanf, assuming I won't know how long the string should be until runtime?
Normally, you'd use something like "%4c" or "%4s" to read a maximum of 4 characters (the difference is that "%4c" reads the next 4 characters, regardless, while "%4s" skips leading whitespace and stops at a whitespace if there is one).
To specify the length at run-time, however, you have to get a bit trickier since you can't use a string literal with "4" embedded in it. One alternative is to use sprintf to create the string you'll pass to scanf:
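char buffer[128];
snprintf(buffer, sizeof buffer, "%%%dc", max_length);
scanf(buffer, your_string);
your_string[max_length] = '\0'; /* %c stores no '\0'; assumes room for max_length + 1 chars */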
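Here is a testable sketch of the same trick, with two changes that are my own assumptions rather than part of the snippet above: it uses %s instead of %c so the result comes back null-terminated, and it scans a string with sscanf so it can be tried without an input file. max_length must be at least 1.

```c
#include <stdio.h>

/* Hypothetical helper: reads at most `max_length` non-whitespace characters
   from `input` into `dst`, which must hold max_length + 1 bytes.
   Returns what sscanf returned. */
int read_limited(const char *input, int max_length, char *dst)
{
    char format[32];

    /* "%%" yields a literal '%', so max_length == 4 produces "%4s". */
    snprintf(format, sizeof format, "%%%ds", max_length);
    return sscanf(input, format, dst);
}
```

With the "ABCDy" input from the question and a length of 4, the 'y' is left unread for whatever comes next.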
I should probably add: with printf you can specify the width or precision of a field dynamically by putting an asterisk (*) in the format string, and passing a variable in the appropriate position to specify the width/precision:
int width = 10;
int precision = 7;
double value = 12.345678910;
printf("%*.*f", width, precision, value);
Given that printf and scanf format strings are quite similar, one might think the same would work with scanf. Unfortunately, this is not the case: with scanf, an asterisk in a conversion specification means the input should be scanned but not stored. That is to say, the item must be present in the input, but its value won't be placed in any variable.
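The two meanings of the asterisk can be put side by side. A short sketch with made-up function names:

```c
#include <stdio.h>

/* In printf-style formats, '*' takes the field width and the precision
   from the argument list. */
int format_value(char *out, size_t outsz, int width, int precision, double value)
{
    return snprintf(out, outsz, "%*.*f", width, precision, value);
}

/* In scanf-style formats, '*' suppresses assignment: the first integer is
   read and discarded, and only the second one is stored. */
int second_int(const char *text, int *value)
{
    return sscanf(text, "%*d %d", value);
}
```

Note that the suppressed field does not count toward the return value of sscanf.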
Try
scanf("%4s", str)
You can also use fread, where you can set a read limit:
char string[5]={0};
if( fread(string,(sizeof string)-1,1,stdin) )
printf("\nfully read: %s",string);
else
puts("error");
You might consider simply looping over calls to getc().
I'm reading in a .txt file. I'm using fscanf to get the data as it is formatted.
The line I'm having problems with is this:
result = fscanf(fp, "%s", ap->name);
This is fine until I have a name with a whitespace eg: St Ives
So I use this to read in the white space:
result = fscanf(fp, "%[^\n]s", ap->name);
However, when I try to read in the first name (with no white space), it just doesn't work and messes up the other fscanf calls.
But when I use [^\n] in a different file, it works fine. Not sure what is happening.
If I use fgets in the place of the fscanf above I get "\n" in the variable.
Edit//
Ok, so if I use:
result = fscanf(fp, "%s", ap->name);
result = fscanf(fp, "%[^\n]s", ap->name);
This allows me to read in a string with no white space. But When I get a "name" with whitespace it doesn't work.
One problem with this:
result = fscanf(fp, "%[^\n]s", ap->name);
is that you have an extra s at the end of your format specifier. The entire format specifier should just be %[^\n], which says "read in a string which consists of characters which are not newlines". The extra s is not part of the format specifier, so it's interpreted as a literal: "read the next character from the input; if it's an "s", continue, otherwise fail."
The extra s doesn't actually hurt you, though. You know exactly what the next character of input is: a newline. It doesn't match, and input processing stops there, but it doesn't really matter since it's the end of your format specifier. This would cause problems, though, if you had other format specifiers after this one in the same format string.
The real problem is that you're not consuming the newline: you're only reading in all of the characters up to the newline, but not the newline itself. To fix that, you should do this:
result = fscanf(fp, "%[^\n]%*c", ap->name);
The %*c specifier says to read in a character (c), but don't assign it to any variable (*). If you omitted the *, you would have to pass fscanf() another parameter containing a pointer to a character (a char*), where it would then store the resulting character that it read in.
You could also use %[^\n]\n, but that would also read in any whitespace which followed the newline, which may not be what you want. When fscanf finds whitespace in its format specifier (a space, newline, or tab), it consumes as much whitespace as it can (i.e. you can think of it consuming the longest string that matches the regular expression [ \t\n]*).
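The effect of %*c can be tried on a string holding two lines. This is a sketch only; the helper name two_lines and the 256-byte buffers are assumptions for the example.

```c
#include <stdio.h>

/* Hypothetical helper: reads the first two lines of `text`.
   Each buffer must hold 256 bytes. Returns the number of lines stored. */
int two_lines(const char *text, char first[256], char second[256])
{
    /* %255[^\n] reads a line without its newline; %*c then reads and
       discards that newline so the next %255[^\n] starts on the next line. */
    int rc = sscanf(text, "%255[^\n]%*c%255[^\n]", first, second);

    return rc < 0 ? 0 : rc;
}
```

Without the %*c in the middle, the second scanset would be looking straight at the newline and would match nothing.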
Finally, you should also specify a maximum length to avoid buffer overruns. You can do this by placing the buffer length in between the % and the [. For example, if ap->name is a buffer of 256 characters, you should do this:
result = fscanf(fp, "%255[^\n]%*c", ap->name);
This works great for statically allocated arrays; unfortunately, if the array is dynamically sized at runtime, there's no easy way to pass the buffer size to fscanf. You'll have to create the format string with snprintf, e.g.:
char format[256];
snprintf(format, sizeof(format), "%%%d[^\n]%%*c", buffer_size - 1);
result = fscanf(fp, format, ap->name);
Jumm wrote:
If I use fgets in the place of the fscanf above I get "\n" in the variable.
Which is a far easier problem to solve so go with it:
char *nlptr ;
if( fgets( ap->name, MAX, fp ) != NULL )
{
    nlptr = strrchr ( ap->name, '\n' ) ;
    if( nlptr != NULL )
    {
        *nlptr = '\0' ;
    }
}
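A commonly used alternative for dropping the newline is strcspn, which avoids the explicit NULL check. A minimal sketch; the function name strip_newline is made up for the example.

```c
#include <string.h>

/* Removes the first '\n' in `s`, if any, by ending the string there.
   strcspn returns the length of the initial part of `s` that contains
   no '\n', which is the index of the newline or of the terminating '\0'. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

It would be called on the buffer right after a successful fgets.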
I'm not sure how you mean [^\n] is supposed to work. %[...] is a conversion specifier that accepts a run of characters drawn from the set inside the brackets; a leading ^ inverts the set, so [^\n] accepts everything up to, but not including, the newline. %s with fscanf only reads until it comes across a whitespace delimiter. For strings with spaces in them, use a combination of fgets and sscanf instead, and specify a restriction on the length.
I gather you are trying to use a regular expression in the fscanf function. fscanf does not take regular expressions; the %[...] scanset is the closest thing it offers.
The format specifier for reading a string is %s, it could be that you need to do it this way, %s\n which will pick up the newline.
But for Pete's sake, do not use the old gets family of functions, as in Clifford's answer above; that is where buffer overflows happen. gets was exploited by an infamous worm, the Morris Worm of 1988, through the fingerd daemon, which called gets, and it caused chaos. Fortunately, that has since been patched, and a lot of programmers have had it drilled into them not to use the function.
Even Microsoft has adopted a safe version of gets family of functions, that specifies a parameter to indicate the length of buffer instead.
EDIT
My bad - I did not realize that Clifford indeed has specified the max length for input...Whoops! Sorry! Clifford's answer is correct! So +1 to Clifford's answer.
Thanks Neil for pointing out my error...
Hope this helps,
Best regards,
Tom.
I found the problem.
As Paul Tomblin said, I had an extra new line character in the field above. So using what tommieb75 said I used:
result = fscanf(fp, "%s\n", ap->code);
result = fscanf(fp, "%[^\n]s", ap->name);
And this fixed it!
Thanks for your help.