I have been trying to extract hours, minutes and seconds from an input text using sscanf. After the sscanf call, only the s variable, which holds the seconds, has the right value; h and m, which should hold the hours and minutes, contain only zeros. Please suggest changes to my code below.
char text[20];
if (fgets(text, sizeof text, stdin)!= NULL){
char* newline = strchr(text, '\n');
if (newline != NULL){
*newline = '\0';
}
}
uint8_t s = 0;
uint8_t m = 0;
uint8_t h = 0;
sscanf(text, "%02i:%02i:%02i",&h,&m,&s);
Note in the debugger, text has the right values.
This program:
#include <stdio.h>
int main(void)
{
const char hhmmss[] = "10:32:54";
int hh, mm, ss;
if (sscanf(hhmmss, "%i:%i:%i", &hh, &mm, &ss) != 3)
printf("Failed to scan 3 values from '%s'\n", hhmmss);
else
printf("From <<%s>> hh = %d, mm = %d, ss = %d\n", hhmmss, hh, mm, ss);
return 0;
}
gives this output:
From <<10:32:54>> hh = 10, mm = 32, ss = 54
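The %02i conversions should also work for this input, but the digits are somewhat superfluous: the width is simply 2. Note that %i treats a leading 0 as an octal prefix, so a field such as 08 will not be read as 8; use %d if the fields are always decimal.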
The amended question shows that the variables are of type uint8_t, in which case you must use the correct conversion specifiers from <inttypes.h>:
#include <stdio.h>
#include <inttypes.h>
int main(void)
{
const char hhmmss[] = "10:32:54";
uint8_t s;
uint8_t m;
uint8_t h;
if (sscanf(hhmmss, "%02" SCNi8 ":%02" SCNi8 ":%02" SCNi8, &h, &m, &s) != 3)
printf("Failed to scan 3 values from '%s'\n", hhmmss);
else
printf("From <<%s>> h = %d, m = %d, s = %d\n", hhmmss, h, m, s);
return 0;
}
This produces the same output as before. With any of the scanf() family of functions, it is crucial that your format conversion specifiers match the types of the pointers you are passing into the function. You can get away with quite a lot of mismatches in printf() - certainly by comparison - because of default integer (in particular) promotions, but scanf() is a lot less forgiving.
@Jonathan Leffler's answer is entirely correct, but...
You should never use scanf or fscanf or sscanf for parsing input from file handle or a string, except perhaps in known-to-be-thrown-away-tomorrow code. They are too error-prone and too hard to control. For an exhaustive summary of the various problems with scanf, I recommend this series of articles. A few highlights:
If you need to read single characters, use getchar.
If you want to read a string, scanf has all the buffer overflow problems of gets.
If you want to read numbers, scanf's parsing is error-prone and hard to use. Use strtoul and strtod instead.
If you have more complicated input, everything is just worse.
What to do?
Read your own input using something better than gets, that is, without buffer overflow problems. Do not attempt to combine getting the bytes in with interpreting them.
Use a combination of strcspn, strspn, strtoul and strtod, together with some custom code, to scan the input.
There are times when this too is a drag, but by that time you're typically building some sort of input language that needs more generic techniques anyway.
Related
I have the following code, which reads from a given input file into a struct I have made.
OFFFile ReadOFFFile(OFFFile fileData, FILE* srcFile)
{
int nvert, nfaces;
fscanf(srcFile, "%s\n");
fscanf(srcFile, "%d %d %s\n", &nvert, &nfaces);
fileData.nvert = nvert;
fileData.nfaces = nfaces;
fileData.vertices = (int *) malloc(fileData.nvert * sizeof(float));
fileData.triFaces = (int *) malloc(fileData.nfaces * sizeof(int));
// Print to check correct size was allocated
printf("%d\n", (fileData.nvert * sizeof(float)));
printf("%d\n", (fileData.nfaces * sizeof(int)));
int i;
float ftemp1, ftemp2, ftemp3;
int itemp1, itemp2, itemp3;
fscanf(srcFile, "%f", &ftemp1);
printf("%lf", ftemp1);
fscanf(srcFile, "%f", &ftemp2);
// fscanf(srcFile, " %lf", &ftemp3);
/* for (i = 0; i < nvert; ++i)
{
fscanf(srcFile, "%f %f %f\n", &ftemp1, &ftemp2, &ftemp3);
fileData.vertices[i].x = ftemp1;
fileData.vertices[i].y = ftemp2;
fileData.vertices[i].z = ftemp3;
}
*/
return(fileData);
}
The problem I am having is with the whole last section that is currently commented out (the two fscanf lines above it are my attempt at testing). If I read just one float it works fine, but when I add the second or third, the whole function fails to run, although it still compiles. I believe the negative sign in the input is the cause, but I don't know how to fix it.
The data is in the form
OFF
4000 7000 0
0.8267261981964111 -1.8508968353271484 0.6781123280525208
0.7865174412727356 -1.8490413427352905 0.7289819121360779
With the floats continuing on for 4000 lines (hence for loop). These are the structs I have made
typedef struct
{
float x;
float y;
float z;
} Point3D;
typedef struct
{
int face1;
int face2;
int face3;
} triFace;
typedef struct
{
int nvert;
int nfaces;
Point3D *vertices;
triFace *triFaces;
} OFFFile;
Text dump of another file with a lot less lines, also does not work in the for loop. Only using this for testing. https://justpaste.it/9ohcc
Your main problem is the first line in the ReadOFFFile function:
fscanf(srcFile, "%s\n");
This tries to read a string (presumably the string OFF on the first line of the file), but you don't give fscanf any place to store the string, so it crashes. (As an aside, your compiler really should have warned you about this problem. If it didn't, it's old-fashioned, and there are lots of easy mistakes that it's probably not going to warn you about, and learning C is going to be much harder than it ought to be. Or perhaps you just need to find an option flag or checkbox to enable more warnings.)
You can tell fscanf to read and discard something, without storing it anywhere, using the * modifier. Here's a modified version of your program, that works for me.
void ReadOFFFile(OFFFile *fileData, FILE* srcFile)
{
fscanf(srcFile, "%*s");
if(fscanf(srcFile, "%d %d %*s", &fileData->nvert, &fileData->nfaces) != 2) {
exit(1);
}
fileData->vertices = malloc(fileData->nvert * sizeof(Point3D));
fileData->triFaces = malloc(fileData->nfaces * sizeof(triFace));
int i;
for (i = 0; i < fileData->nvert; ++i)
{
if(fscanf(srcFile, "%f %f %f", &fileData->vertices[i].x,
&fileData->vertices[i].y,
&fileData->vertices[i].z) != 3) {
exit(1);
}
}
}
I have made a few other changes. The other fscanf call, which reads three values but only stores two, also needs a * modifier. I check the return value of fscanf to catch errors (via a crude exit) if the input is not as expected. I got rid of the \n characters in the fscanf calls, since they're not necessary, and potentially misleading. I got rid of some unnecessary temporary variables, and I had the ReadOFFFile function accept a pointer to an OFFFile structure to fill in, rather than passing and returning it.
Here is the main program I tested it with:
int main()
{
OFFFile fd;
FILE *fp = fopen("dat", "r");
ReadOFFFile(&fd, fp);
for (int i = 0; i < fd.nvert; ++i)
printf("%f %f %f\n", fd.vertices[i].x, fd.vertices[i].y, fd.vertices[i].z);
}
This is still an incomplete program: there are several more places where you need to check for errors (opening the file, calling malloc, etc.), and when you do detect an error, you need to at least print a useful error message before exiting or whatever.
One more thing. As I mentioned, those \n characters you had in the fscanf format strings were unnecessary and misleading. To illustrate what I mean, once you get the program working, have it try to read a data file like this:
OFF 2 0
0 0.8267261981964111
-1.8508968353271484 0.6781123280525208
0.7865174412727356 -1.8490413427352905 0.7289819121360779
Totally malformed, but the program reads it without complaint! This is one reason (one of several dozen reasons, actually) why the scanf family of functions is basically useless for most things. These functions claim to "scan formatted data", but their definition of "formatted" is quite loose, in that they actually read free-form input, generally without any regard for line boundaries.
For some advice on graduating beyond scanf and using better, more reliable methods for reading input, see this question. See also this section and this section in some online C programming course notes.
The line:
fscanf(srcFile, "%s\n");
is invoking undefined behavior. The compiler should warn you about that. Once you've invoked UB, there's no point in speculating further about what is happening.
It's not clear to me what you intended that line to do, but if you use %s in a scanf, you need to give it a valid place to write data. You should always check the value returned by scanf to ensure that you have actually read some data, and you should never use "%s" without a width modifier. Perhaps you want something like:
char buf[256];
if( fscanf(srcFile, "%255s ", buf) == 1 ){
/* Do something with the string in buf */
}
From your comment, it seems that you are intending to use that scanf to skip a line. I strongly recommend using a while(fgetc) loop instead of scanf to do that. If you do want to use scanf, you could try something like fscanf(srcFile, "%*s\n"), but beware that it will stop at the first whitespace, and not necessarily consume an entire line. You could also do fscanf(srcFile, "%*[^\n]%*c"); to consume the line, but you are really better off using a fgetc in a while loop.
Addressing title question:
"How do I read multiple floats from one line of a file"
...with suggestions for a non-scanf() approach.
Assuming the file has been opened and a file pointer fp obtained, the first two lines have already been handled, and their values have been converted to ints (say the number of data lines is stored in int lines):
And given your struct definition (modified to use double to accommodate type compatibility in code below):
typedef struct
{
double x;
double y;
double z;
} Point3D;
In a function somewhere here is one way to parse the contents of each data line into the 3 respective struct values using fgets(), strtok() and strtod():
char delim[] = " \n";
char *tok = NULL;
char newLine[100] = {0};
Point3D *point = calloc(lines, sizeof(*point));
if(point)
{
int i = 0;
while(i < lines && fgets(newLine, sizeof newLine, fp)) // never write past the allocated points
{
tok = strtok(newLine, delim);
if(tok)
{
if(parseDbl(tok, &point[i].x))
{
tok = strtok(NULL, delim);
if(tok)
{
if(parseDbl(tok, &point[i].y))
{
tok = strtok(NULL, delim);
if(tok)
{
if(!parseDbl(tok, &point[i].z))
{
;//handle error
}else ;//continue
}else ;//handle error
}else ;//handle error
}else ;//handle error
}else ;//handle error
}else ;//handle error
i++;//increment for next read
}//end of while
}else ;//handle error
Where parseDbl is defined as:
bool parseDbl(const char *str, double *val)
{
char *temp = NULL;
bool rc = true;
errno = 0;
*val = strtod(str, &temp);
if (temp == str)
rc = false;
return rc;
}
I have a file format like this
1.9969199999999998 2.4613199999999997 130.81278270000001 AA
2.4613199999999997 2.5541999999999998 138.59131554109211 BB
2.5541999999999998 2.9953799999999995 146.83238401449094 CC
...........................
I have to read the first three columns as float and the last column as a char array in C. All the columns are tab separated and there is a newline character at the end of each line. Everything works fine with fscanf(fp1, "%f\t%f\t%f\t%s\n", ...) as long as there is some text at the end of each line (the char string part).
There are cases where, instead of AA/BB/CC, I have an empty string in the file. How do I handle that case? I have tried fscanf(fp1, "%f\t%f\t%f\t%s[^\n]\n", ...) and many other things, but I am unable to figure out the right way. Can you please help me out here?
Using float rather than double will throw away about half the digits shown. You get 6-7 decimal digits with float; you get 15+ digits with double.
As to your main question: use fgets() (or POSIX getline()) to read lines and then sscanf() to parse the line that is read. This will avoid confusion. When the input is line-based but not regular enough, don't use fscanf() and family to read the data; the file-reading scanf() functions don't care about newlines, even when you do.
Note that sscanf() will return either 3 or 4, indicating whether there was a string at the end of a line or not (or EOF, 0, 1 or 2 if it is given an empty string, or a string which doesn't start with a number, or a string which only contains one or two numbers). Always test the return value from scanf() and friends — but do so carefully. Look for the number of values that you expect (3 or 4 in this example), rather than 'not EOF'.
This leads to roughly:
#include <stdio.h>
int main(void)
{
double d[3];
char text[20];
char line[4096];
while (fgets(line, sizeof(line), stdin) != 0)
{
int rc = sscanf(line, "%lf %lf %lf %19s", &d[0], &d[1], &d[2], &text[0]);
if (rc == 4)
printf("%13.6f %13.6f %13.6f [%s]\n", d[0], d[1], d[2], text);
else if (rc == 3)
printf("%13.6f %13.6f %13.6f -NA-\n", d[0], d[1], d[2]);
else
printf("Format error: return code %d\n", rc);
}
return 0;
}
If given this file as standard input:
1.9969199999999998 2.4613199999999997 130.81278270000001 AA
2.4613199999999997 2.5541999999999998 138.59131554109211 BB
2.5541999999999998 2.9953799999999995 146.83238401449094 CC
19.20212223242525 29.3031323334353637 3940.41424344454647
19.20212223242525 29.3031323334353637 3940.41424344454647 PolyVinyl-PolySaccharide
the output is:
1.996920 2.461320 130.812783 [AA]
2.461320 2.554200 138.591316 [BB]
2.554200 2.995380 146.832384 [CC]
19.202122 29.303132 3940.414243 -NA-
19.202122 29.303132 3940.414243 [PolyVinyl-PolySacch]
You can tweak the output format to suit yourself. Note that the %19s avoids buffer overflow even when the text is longer than 19 characters.
I've been sitting with this problem for 2 days and I can't figure out what I'm doing wrong. I've tried debugging (kind of? Still kind of new), followed this link: https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ And I've tried Google and all kinds of things. Basically I'm reading from a file with this format:
R1 Fre 17/07/2015 18.00 FCN - SDR 0 - 2 3.211
and I have to make the program read this into a struct, but when I try printing the information it comes out all wrong. My code looks like this:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_INPUT 198
typedef struct game{
char weekday[4],
home_team[4],
away_team[4];
int round,
hour,
minute,
day,
month,
year,
home_goals,
away_goals,
spectators;}game;
game make_game(FILE *superliga);
int main(void){
int input_number,
number_of_games = 198,
i = 0;
game tied[MAX_INPUT];
FILE *superliga;
superliga = fopen("superliga-2015-2016.txt", "r");
for(i = 0; i < number_of_games; ++i){
tied[i] = make_game(superliga);
printf("R%d %s %d/%d/%d %d.%d %s - %s %d - %d %d\n",
tied[i].round, tied[i].weekday, tied[i].day, tied[i].month,
tied[i].year, tied[i].hour, tied[i].minute, tied[i].home_team,
tied[i].away_team, tied[i].home_goals, tied[i].away_goals,
tied[i].spectators);}
fclose(superliga);
return 0;
}
game make_game(FILE *superliga){
double spect;
struct game game_info;
fscanf(superliga, "R%d %s %d/%d/%d %d.%d %s - %s %d - %d %lf\n",
&game_info.round, game_info.weekday, &game_info.day, &game_info.month,
&game_info.year, &game_info.hour, &game_info.minute, game_info.home_team,
game_info.away_team, &game_info.home_goals, &game_info.away_goals,
&spect);
game_info.spectators = spect * 1000;
return game_info;
}
The problem is in your file. It starts with whitespace, not with the R you specified in the control string.
Check the return value of fscanf() and you'll see that it's zero every time.
If you add a leading whitespace to your fscanf() call, your problem will be solved, like this:
fscanf(superliga, " R%d %s %d/%d/%d %d.%d %s - %s %d - %d %lf\n",
&game_info.round, game_info.weekday, &game_info.day, &game_info.month,
&game_info.year, &game_info.hour, &game_info.minute, game_info.home_team,
game_info.away_team, &game_info.home_goals, &game_info.away_goals,
&spect);
If each line in the file is a separate record, you should read each line as a string, then try to parse each string.
(Note that this also has the added feature of speculative parsing: you can try parsing the line in several different formats, and accept the one that parses correctly. I like to use this when I accept e.g. vector inputs, so that the user can use x y z, x, y, z, x/y/z, (x,y,z), [x,y,z], <x y z>, <x,y,z>, and so on, depending on what they like. It's only one additional scanf per format, after all.)
To read lines, you can use fgets() into a local buffer. The local buffer must be long enough. If the program is to run on POSIX.1 machines only (i.e., not on Windows), then you can use getline() instead, which can dynamically reallocate the given buffer as needed, so you're not limited to any specific line length.
To parse the string, use sscanf().
Note that all tabs, spaces, and newlines in the pattern in all of the scanf family of functions are treated exactly the same: they indicate any number of any type of whitespace. In other words, \n does not mean "and then a newline"; it means the same as a space, i.e. "and possibly some whitespace here". However, all conversions except %c and %[ automatically skip any leading whitespace; so, with the exception of a space before one of those two, the spaces in the pattern are only meaningful to us humans, they do not have any functional effect in the scanning.
All scanf family of functions return the number of conversions that were successfully assigned. (The only exception is the "conversion" %n, which yields the number of characters consumed; some implementations include it in the conversion count, and some others do not.) If end of input or a read error occurs prior to the first conversion, the functions will return EOF. If the input merely fails to match the pattern, they return the number of conversions assigned up to that point, which may be 0.
If you suppress saving the result of a conversion (for example, if you have a word in the input you don't need, you can convert but discard it with %*s), it is not counted. So, for example, sscanf(line, " %*d %*s %*d") returns 0 even if the line starts with an integer, followed by a word (any run of non-whitespace characters), followed by an integer. To verify such a pattern, append %n and check whether it was reached.
Rather than have the function return the parsed structure, pass a pointer to the structure (and the file handle to read from), and return a status code. I prefer 0 for success, and nonzero for failure, but feel free to change that.
In other words, I'd suggest you change your read function into
#ifndef GAME_LINE_MAX
#define GAME_LINE_MAX 1022
#endif
int read_game(game *one, FILE *in)
{
char buffer[GAME_LINE_MAX + 2]; /* + '\n' + '\0' */
char *line;
/* Sanity check: no NULL pointers accepted! */
if (!one || !in)
return -1;
/* Paranoid check: Fail if read error has already occurred. */
if (ferror(in))
return -1;
/* Read the line */
line = fgets(buffer, sizeof buffer, in);
if (!line)
return -1;
/* Parse the game; pattern from OP's example: */
if (sscanf(line, "R%d %3s %d/%d/%d %d.%d %3s - %3s %d - %d %d\n",
&(one->round), one->weekday,
&(one->day), &(one->month), &(one->year),
&(one->hour), &(one->minute),
one->home_team,
one->away_team,
&(one->home_goals),
&(one->away_goals),
&(one->spectators)) < 12)
return -1; /* Line not formatted like above */
/* Spectators in the file are in units of 1000; convert: */
one->spectators *= 1000;
/* Success. */
return 0;
}
To use the above function in a loop, reading games one after another from standard input (stdin):
game g;
while (!read_game(&g, stdin)) {
/* Do something with current game stats, g */
}
if (ferror(stdin)) {
/* Read error occurred! */
} else
if (!feof(stdin)) {
/* Not all data was read/parsed! */
}
The two if clauses above are to check if there was a real read error (as in, a problem with the hardware or something like that), and whether there was unread/unparsed data (not at end of file), respectively.
There are two differences in the scanning pattern compared to the OP: First, all strings parsed are limited to 3 characters, because the structure has only room for 3+1 each. The one character is reserved for the end of string '\0', which is not counted in the maximum length for %s. Second, I parse the spectator count directly with %d, and just multiply the field by 1000 if successful. (Note that %d stops at the decimal point, so a value such as 3.211 is read as 3; to keep the fractional part, scan it with %lf into a double as the OP did.)
Also note how I used one->weekday, one->home_team, and one->away_team to refer to the character arrays. This works, because an array variable can be used as if it was a pointer to the first element in that array. (Given char a[5];, a and &a and &(a[0]) can all be used to refer to the first element in the array a). I like to use this "raw form" when scanning, because it makes it easier to match them to %s conversions, and ensure the pattern matches the parameters.
First of all let me ask for your forgiveness if this is too trivial, I am not a C developer, usually I program in Fortran.
I need to read some columnated text files. The problem I have is that some columns can be blank (no value filled in) or only partially filled.
Let me use a short example of the problem. Let's say I have a generator program like:
#include <stdio.h>
#include <stdlib.h>
int main(){
printf("xxxx%4d%4.2f\n",99,3.14);
}
When I execute this program I get:
$ ./t1
xxxx  993.14
If I get it into a text file and try to read using (e.g.) sscanf with the code:
#include <stdio.h>
#include <stdlib.h>
int main() {
char *fmt = "%*4c%4d%4f";
char *line = "xxxx  993.14";
int ival;
float fval;
sscanf(line,fmt,&ival,&fval);
printf(">>>>%d|%f\n",ival,fval);
}
The result is:
$ ./t2
>>>>993|0.140000
What is the problem here? sscanf seems to treat all blank space as meaningless and discard it. The "%*4c" does what it is meant to do: it counts 4 characters without skipping any blank space, and discards them because of the "*". Next, the %4d starts by skipping all blank spaces and only begins counting the 4 characters of the field at the first valid character for the conversion. So the value meant to be 99 becomes 993, and the 3.14 becomes 0.14.
In Fortran the reading code would be:
program t3
implicit none
integer :: ival
real :: fval
character(len=30) :: fmt="(4x,i4,f4.0)"
character(len=30) :: line="xxxx  993.14"
read(line,fmt) ival, fval
write(*,"('>>>>',i4,'|',f4.2)") ival,fval
end program t3
and the result would be:
$ ./t3
>>>>  99|3.14
That is, the format specification states the field width and nothing is discarding in conversion, except if instructed to by the "nX" specification.
Some final remarks to help the helpers:
The format to be read is an international standard and there is no way to change it.
The number of existing files is too big to think of intervention or format change.
It is not a CSV or similar format.
The code has to be in C for integration in a free software package.
Sorry to be too long, trying to state the problem as completely as possible.
The question is: Is there a way to tell sscanf not to skip the blank spaces? If not, is there a simple way to do it in C, or will it be necessary to write a specialized parser for each record type?
Thank you in advance.
When reading fixed-length fields with sscanf, it is best to parse the values as character strings (which you could do a number of ways), and then perform independent conversion of each of the fields. This allows you to handle conversion/error detection on a per-field basis. For example, you could use a format string of:
char *fmt = "%*4s%2[^0-9]%s";
which would read/discard the 4 leading characters, then read 2-chars as your integer, followed by the remainder of line (or up until the next whitespace) as a string containing your float value.
To handle the storage and parsing of line as fixed length fields, you could use temporary character arrays to hold each of the strings and then use sscanf to fill them much as you have attempted to do with the integer and float directly. e.g.:
char istr[8] = {0};
char fstr[16] = {0};
...
sscanf (line,fmt,istr,fstr);
(note: you could use minimum storage of istr[3] and fstr[7] in this given case, adjust the storage length as required, but providing space for the nul-terminating character)
You can then use strtol and strtof to provide conversion with error checking on each value. For example:
errno = 0;
if ((ival = (int)strtol (istr, NULL, 10)) == 0 && errno)
fprintf (stderr, "error: integer conversion failed.\n");
/* underflow/overflow checks omitted */
and
errno = 0;
if ((fval = strtof (fstr, NULL)) == 0 && errno)
fprintf (stderr, "error: float conversion failed.\n");
/* nan and inf checks omitted */
Putting all the pieces together in your example, you could use something like:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
int main() {
char *fmt = "%*4s%2[^0-9]%s";
char *line = "xxxx  993.14";
char istr[8] = {0};
char fstr[16] = {0};
int ival;
float fval;
sscanf (line,fmt,istr,fstr);
errno = 0;
if ((ival = (int)strtol (istr, NULL, 10)) == 0 && errno)
fprintf (stderr, "error: integer conversion failed.\n");
/* underflow/overflow checks omitted */
errno = 0;
if ((fval = strtof (fstr, NULL)) == 0 && errno)
fprintf (stderr, "error: float conversion failed.\n");
/* nan and inf checks omitted */
printf(">>>>%d|%6.2f\n",ival,fval);
return 0;
}
Example/Output
$ >>>>0|993.14
*scanf() is not designed to handle fixed column width with non-intervening white-space.
With sscanf(), to not skip spaces, code must use "%c", "%n", "%[]" as all other specifiers skip leading white-space and those skipped characters do not contribute to a width limit.
To scan the printed line, which is now in buffer, take advantage of the fact that the only '\n' is at the end of the line.
char str_int[5];
char str_float[5];
int n = 0;
sscanf(buffer, "%*4c%4[^\n]%4[^\n]%n", str_int, str_float, &n);
if (n != 12 || buffer[n] != '\n') Fail();
// Now convert str_int, str_float as needed.
Another way to use sscanf() would be to parse buffer as
int ival;
float fval;
if (strlen(buffer) != 13) Fail();
if (sscanf(&buffer[8], "%f", &fval) != 1) Fail();
buffer[8] = '\0';
if (sscanf(&buffer[4], "%d", &ival) != 1) Fail();
Note: The 4s in the format below do not fix the output width at 4 characters; 4 is the minimum width to print.
printf("xxxx%4d%4.2f\n",ival, fval);
Code could use the following to detect problems.
if (13 != printf("xxxx%4d%4.2f\n",ival, fval)) Fail();
Watch out for
printf("xxxx%4d%4.2f\n",123, 9.995000001f); // "xxxx 12310.00\n"
First off, I dunno. There might be some way to wrangle sscanf to recognize the whitespace towards your integer count. But I just don't think scanf was made for this sort of format in mind. The tool's trying to be smart of helpful and it's biting you in the ass.
But if it's columnated data and you know the position of the various fields, there's a really easy work around. Just extract the field you want.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv)
{
char line[] = "xxxx  893.14";
char tmp[100];
int thatDamnNumber;
float myfloatykins;
//Get that field
memcpy(tmp, line+4, 4);
tmp[4] = '\0'; //nul terminate so sscanf sees only the 4-character field
sscanf(tmp, "%d", &thatDamnNumber);
//Kill that field so it doesn't goober-up the float
memset(line+4, ' ', 4);
sscanf(line, "%*4c%f", &myfloatykins);
printf("%d %f\n", thatDamnNumber, myfloatykins);
return 0;
}
If there is a lot of this, you could make some generalized functions: integerExtract(int positionStart, int sizeInCharacters), floatExtract(), etc.
If each element is of fixed width you don't really need scanf(), try this
char copy[5];
const char *line = "xxxx  993.14";
int ival;
float fval;
copy[0] = line[4];
copy[1] = line[5];
copy[2] = line[6];
copy[3] = line[7];
copy[4] = '\0'; // nul terminate for `atoi' to work
ival = atoi(copy);
fval = atof(&line[8]);
fprintf(stdout, "%d -- %f\n", ival, fval);
If you want (probably should) you can use strtol() instead of atoi() and strtof() instead of atof() to check for malformed data.
Both these functions take a parameter to store the unconverted/invalid characters, you can check the passed pointer in order to verify that there was a problem with conversion.
Or if you really want scanf() do the same, capture the integer + whitespaces to a char array and then convert it to int later, like this
char integer[5];
const char *line = "xxxx  993.14";
int ival;
float fval;
if (sscanf(line, "%*4c%4[0-9 ]%f", integer, &fval) != 2)
return -1;
ival = atoi(integer);
fprintf(stdout, "%d -- %f\n", ival, fval);
The format "%*4c%4[0-9 ]%f" will
Skip the first four characters including white spaces.
Scan the next four characters if they consist only of digits or white spaces.
Scan the rest of the input string searching for a matching float value.
I am posting what I think is a final conclusion from the answers I have got so far and from other sources.
What is a very trivial task in Fortran is not so trivial in other languages. I guess (not sure) that the same task could be as easy as in Fortran in some other languages. I think that in Cobol, Pascal, PL/I and others from the time of punched cards it probably would be trivial.
I think that most languages nowadays are more comfortable with different data structure and inherited its I/O structure from C. I think that Java, Python, Perl(?) and others could serve as examples.
From what I saw in this thread there are two main problems to read / convert fixed column length text data with C.
The first problem is that, as Philip said in his answer: “The tool’s trying to be smart of helpful and it’s biting you in the ass.” Quite right! The point is that it seems that C text I/O thinks that “white space” is something like a NULL character and should be thrown away, completely disregarding any information of the start of field. The only exception to that seems to be the %nc that get exactly n chars, even blanks.
The second problem is that the conversion “tag” (how is that called?) %nf will keep converting while it finds a valid character, even if you say stop at the 4th character.
If we join those two problems with a field completely filled with white space, depending on the conversion tool used, it throws an error or keeps going madly looking for something meaningful.
At the end of the day, it seems that the only way is to copy the field, at its fixed length, to another memory area, dynamically allocated or not (we can have an area for each column length), and try to parse this separate area, taking into account the possibility of an all-blank field in order to catch the error.
int x;
printf("hello %n World\n", &x);
printf("%d\n", x);
It's not so useful for printf(), but it can be very useful for sscanf(), especially if you're parsing a string in multiple iterations. fscanf() and scanf() automatically advance their internal pointers by the amount of input read, but sscanf() does not. For example:
char stringToParse[256];
...
char *curPosInString = stringToParse; // start parsing at the beginning
int bytesRead;
while(needsParsing())
{
sscanf(curPosInString, "(format string)%n", ..., &bytesRead); // check the return value here
curPosInString += bytesRead; // Advance read pointer
...
}
It can be used to perform evil deeds.
Depends what you mean by practical. There are always other ways to accomplish it (print into a string buffer with s[n]printf and calculate the length, for example).
However
int len;
char *thing = "label of unknown length";
char *value = "value value value";
char *value2 = "second line of value";
printf ("%s other stuff: %n", thing, &len);
/* pad with len spaces so that value2 lines up under value */
printf ("%s\n%*s%s\n", value, len, "", value2);
should produce
label of unknown length other stuff: value value value
                                     second line of value
(although untested, I'm not near a C compiler)
Which is just about practical as a way of aligning things, but I wouldn't want to see it in code. There are better ways of doing it.
It's fairly esoteric. If you need to replace a placeholder in the generated string later you might want to remember an index into the middle of the string, so that you don't have to either save the original printf parameter or parse the string.
It could possibly be used as a quick way to get the lengths of various substrings.
#include <stdio.h>
int main(int argc, char* argv[])
{
int col10 = (10 - 1);
int col25 = (25 - 1);
int pos1 = 0;
int pos2 = 0;
printf("    5   10   15   20   25   30\n");
printf("%s%n%*s%n%*s\n", "fried",
&pos1, col10 - pos1, "green",
&pos2, col25 - pos2, "tomatos");
printf("    ^    ^    ^    ^    ^    ^\n");
printf("%d %d\n", pos1, pos2);
printf("%d %d\n", col10 - pos1, col25 - pos2);
return 0;
}
I am missing something here for sure. Tomatos is too far to the right.
Here's something from the VS2005 CRT code:
/* if %n is disabled, we skip an arg and print 'n' */
if ( !_get_printf_count_output() )
{
_VALIDATE_RETURN(("'n' format specifier disabled", 0), EINVAL, -1);
break;
}
which brings up this:
(screenshot: http://www.shiny.co.il/shooshx/printfn.png)
for the following line:
printf ("%s other stuff: %n", thing, &len);
I'm guessing this is mainly to avoid what @eJames is talking about.
You can call
int _get_printf_count_output();
to see if %n support is enabled, or use
int _set_printf_count_output( int enable );
to enable or disable support of the %n format.
from MSDN VS2008