For a school assignment, I have to read in a string that has at least one but up to three variables (named command, one, and two). There is always a character at the beginning of the string, but it may or may not be followed by integers. The format could be like any of the following:
i 5 17
i 3
p
d 4
I am using fgets to read the string from the file, but I'm having trouble processing it. I've been trying to use sscanf, but I'm getting segfaults reading in a string that only has one or two variables instead of three.
Is there a different function I should be using?
Or is there a way to format sscanf to do what I need?
I've tried sscanf(buffer, "%c %d %d", command, one, two) and several variations with no luck.
sscanf is probably up to this task, depending on the exact requirements and ranges of inputs.
The key here is that the scanf family of functions returns a useful value which indicates how many conversions were made. This can be less than zero: the value EOF (a negative value) can be returned if the end of the input or an I/O error occurs before the first conversion is even attempted.
Note that the %c conversion specifier doesn't produce a null-terminated string. By default, it reads only one character and stores it through the argument pointer. E.g.
char ch;
sscanf("abc", "%c", &ch);
this will write the character 'a' into ch.
Unless you have an iron-clad assurance that the first field is always one character wide, it's probably better to read it as a string with %s. Always use a maximum width with %s so as not to overflow the destination buffer. For instance:
char field1[64]; /* one larger than field width, for terminating null */
sscanf(..., "%63s", field1, ...);
sscanf doesn't perform any overflow checks on integers. If %d is used to scan a large negative or positive value that doesn't fit into int, the behavior is simply undefined according to ISO C. So, just like with %s, %d is best used with a field width limitation. For instance, %4d for reading a four digit year. Four decimal digits will not overflow int.
As I said, I'm using C and fscanf for this task, but the program crashes every time, so I've surely done something wrong. I haven't dealt much with this type of input, and even after reading several topics here I still can't find the right way. I have this array to read the 2 bytes:
char p[2];
and this line to read them. Of course, fopen was called earlier with file pointer fp; I used "rb" as the read mode but tried other options too when I noticed the crash. I'm just saving space and focusing on the problem itself.
fscanf(fp,"%x%x",p[0],p[1]);
Later, to convert into decimal, I have this line (if we haven't reached EOF):
v = strtol(p, 0, 10);
Here v is just an integer that stores the final value we are after. But the program keeps crashing when fscanf is called, or at least I think that's the case. I'm not compiling a console program, so it's a pity that I can't print what has and hasn't been done, but in the debugger it seems to crash there.
I hope you can help me out with this. I'm a bit lost regarding this type of read and conversion, so any clue will help me greatly. Thanks =).
PS: I forgot to add that this is not homework. A friend wants to do some file conversion for a game, and this code will manipulate the needed files on its own. While I could use any language or environment for this, I always feel more comfortable in C.
char strings in C are really called null-terminated byte strings. That null-terminated part is important, as it means a string of two characters needs space for three characters to include the null-terminator character '\0'. Not having the terminator means string functions will go out of bounds in their search for it, leading to undefined behavior.
Furthermore, the "%x" format reads a hexadecimal integer and stores it in an unsigned int. Mismatching format specifiers and arguments leads to undefined behavior.
Lastly, and probably what's causing the crash: the scanf family of functions expects pointers as arguments. Not providing pointers will again lead to undefined behavior.
There are two solutions to the above problems:
Going with code similar to what you already use, first of all you must make space for the terminator in the array. Then you need to read two characters. Lastly you need to add the terminator:
char p[3] = { 0 }; // String for two characters, initialized to zero
// The initialization means that we don't need to explicitly add the terminator
// Read two characters, skipping possible leading white-space
fscanf(fp, " %c%c", &p[0], &p[1]);
// Now convert the string to an integer value
// The string is in base-16 (two hexadecimal characters)
v = strtol(p, 0, 16);
Read the hexadecimal value into an integer directly:
unsigned int v;
fscanf(fp, "%2x", &v); // Read as hexadecimal
The second alternative is what I would recommend. It reads two characters, parses them as a hexadecimal value, and stores the result in the variable v. It's important to note that the value in v is stored in binary! Hexadecimal, decimal or octal are just presentation formats; internally in the computer it will still be stored in binary ones and zeros (which is true for the first alternative as well). To print it as decimal use e.g.
printf("%u\n", v);
You need to pass to fscanf() the address of the variable(s) to scan into.
Also, the conversion specifier needs to suit the variable provided. In your case those are chars. x expects an int; to scan into a char, use the appropriate length modifier, two times h here:
fscanf(fp, "%hhx%hhx", &p[0], &p[1]);
strtol() expects a C-string as 1st parameter.
What you pass isn't a C-string, as a C-string ought to be 0-terminated, which p isn't.
To fix this you could do the following:
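char p[3];
fscanf(fp, " %c%c", &p[0], &p[1]); /* read the two digit characters */
p[2] = '\0'; /* terminate so strtol() sees a C-string */
long v = strtol(p, 0, 16); /* the two digits are hexadecimal */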
I have this .txt file that contains only:
THN1234 54
How can I take only the number 54, to isolate it from the rest and to use it as an integer variable in my program?
If the input is from standard input, then you could use:
int value;
if (scanf("%*s %d", &value) != 1)
…Oops - incorrectly formatted data…
…use value…
The %*s reads but discards optional leading blanks and a sequence of one or more non-blanks (THN1234); the blank skips more optional blanks; the %d reads the integer, leaving a newline behind in the input buffer. If what follows the blank is not convertible to a number, or if you get EOF, you get to detect it in the if condition and report it in the body of the if.
Hmmm…and I see that BLUEPIXY said basically the same (minus the explanation) in their comment, even down to the choice of integer variable name.
Wow. It's been a long time since I have used C. However, I think the answer is similar for C and C++ in this case. You can use strtok_r to split the string into tokens then take the second token and parse it into an int. See http://www.cplusplus.com/reference/clibrary/cstring/strtok/.
You might also want to look at this question.
The language I am using is C
I am trying to scan data from a file, and the code segment is like:
char lsm;
long unsigned int address;
int objsize;
while(fscanf(mem_trace,"%c %lx,%d\n",&lsm,&address,&objsize)!=EOF){
    printf("%c %lx %d\n",lsm,address,objsize);
}
The file I read from begins with the following lines:
S 00600aa0,1
I 004005b6,5
I 004005bb,5
I 004005c0,5
S 7ff000398,8
The output on stdout is:
8048350 134524916
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398,8
Obviously, the results have an extra line that comes from nowhere. Does anybody know how this could happen?
Thx!
This works for me on the data you supply:
#include <stdio.h>
int main(void)
{
    char lsm[2];
    long unsigned int address;
    int objsize;
    while (scanf("%1s %lx,%d\n", lsm, &address, &objsize) == 3)
        printf("%s %9lx %d\n", lsm, address, objsize);
    return 0;
}
There are multiple changes. The simplest and least consequential is the change from fscanf() to scanf(); that's for my convenience.
One important change is the type of lsm from a single char to an array of two characters. The format string then uses %1s, which reads one character (plus the terminating NUL '\0') into the string, but it also (and this is crucial) skips leading blanks.
Another change is the use of == 3 instead of != EOF in the condition. If something goes wrong, scanf() returns the number of successful matches. Suppose that it managed to read a letter but what followed was not a hex number; it would return 1 (not EOF). Further, it would return 1 on each iteration until it could find something that matched a hex number. Always test for the number of values you expect.
The output format was tidied up with the %9lx. I was testing on a 64-bit system, so the 9-digit hex converts fine. One problem with scanf() is that if you get an overflow on a conversion, the behaviour is undefined.
Output:
S 600aa0 1
I 4005b6 5
I 4005bb 5
I 4005c0 5
S 7ff000398 8
Why did you get the results you got?
The first conversion read a space into lsm, but then failed to convert S into a hex number, so it was left behind for the next cycle. So, you got the left-over garbage printed in the address and object size columns. The second iteration read the S and was then in synchrony with the data until the last line. The newline at the end of the format (like any other white space in the format string) eats white space, which is why the last line worked despite the leading blank.
A directive that is a conversion specification defines a set of
matching input sequences, as described below for each specifier. A
conversion specification is executed in the following steps:
Input white-space characters (as specified by the isspace function)
are skipped, unless the specification includes a [, c, or n specifier.
An input item is read from the stream, unless the specification
includes an n specifier.
[...]
The first time you call fscanf, your %c reads the first blank space in the file. Your white-space character reads zero or more characters of white-space, this time zero of them. Your %lx fails to match the S character in the file, so fscanf returns. You don't check the result. Your variables contain values that they had from earlier operations.
The second time you call fscanf, your %c reads the first S character in the file. From that point on, everything else succeeds too.
Added in editing, here is the simplest change to your format string to solve your problem:
" %c %lx,%d\n"
The space at the beginning will read zero or more characters of white-space and then %c will read the first non-white-space character in the file.
Here is another format string that will also solve your problem:
" %c %lx,%d"
The reason is that if you read and discard zero or more white-space characters twice in a row, the result is the same as doing it just once.
I think that fscanf reads the first character [space] into lsm, then fails to read address and objsize because the format string doesn't match the rest of the line.
Then it prints a space followed by whatever happened to be in address and objsize when they were declared.
EDIT--
fscanf consumes the trailing white space on each call; if you call ftell you'll see it:
printf("%c %lx %d %ld\n",lsm,address,objsize,ftell(mem_trace));
Why does C's printf format string have both %c and %s?
I know that %c represents a single character and %s represents a null-terminated string of characters, but wouldn't the string representation alone be enough?
Probably to distinguish between null terminated string and a character. If they just had %s, then every single character must also be null terminated.
char c = 'a';
In the above case, c must be null terminated. This is my assumption though :)
%s prints out chars until it reaches a 0 (or '\0', same thing).
If you just have a char x;, printing it with printf("%s", &x); - you'd have to provide the address, since %s expects a char* - would yield unexpected results, as &x + 1 might not be 0.
So you couldn't just print a single character unless it was null-terminated (very inefficient).
EDIT: As others have pointed out, the two expect different things in the var args parameters - one a pointer, the other a single char. But that difference is somewhat clear.
The issue mentioned by others, that a single character would have to be null terminated, isn't a real one. It could be dealt with by providing a precision in the format: %.1s would do the trick.
What is more important in my view is that for %s in any of its forms you'd have to provide a pointer to one or several characters. That would mean that you wouldn't be able to print rvalues (computed expressions, function returns etc) or register variables.
Edit: I am really pissed off by the reaction to this answer, so I will probably delete this, this is really not worth it. It seems that people react on this without even having read the question or knowing how to appreciate the technicality of the question.
To make that clear: I don't say that you should prefer %.1s over %c. I only say that the reasons why %c cannot be replaced by it are different from what the other answers claim. Those other answers are just technically wrong. Null termination is not an issue with %s.
The printf function is a variadic function, meaning that it has variable number of arguments. Arguments are pushed on the stack before the function (printf) is called. In order for the function printf to use the stack, it needs to know information about what is in the stack, the format string is used for that purpose.
e.g.
printf( "%c", ch ); tells the function the argument 'ch'
is to be interpreted as a character (passed as an int after promotion)
whereas
printf( "%s", s ); tells the function the argument 's' is a pointer
to a null terminated string, of size sizeof(char*)
it is not possible inside the printf function to otherwise determine stack contents e.g. distinguishing between 'ch' and 's' because in C there is no type checking during runtime.
%s says print all the characters until you find a null (treat the variable as a pointer).
%c says print just one character (treat the variable as a character code)
Using %s for a character doesn't work because the character is going to be treated like a pointer, then it's going to try to print all the characters following that place in memory until it finds a null
Stealing from the other answers to explain it in a different way.
If you wanted to print a character using %s, you could use the following to properly pass it an address of a char and to keep it from writing garbage on the screen until finding a null.
char c = 'c';
printf("%.1s", &c);
For %s, we need to provide the address of the string, not its value.
For %c, we provide the value of characters.
If we used the %s instead of %c, how would we provide a '\0' after the characters?
I'd like to add another point of perspective to this fun question.
Really this comes down to data typing. I have seen answers on here that state that you could provide a pointer to the char, and provide a
"%.1s"
This could indeed be true. But the answer lies in the C designers trying to provide flexibility to the programmer, and indeed an (albeit small) way of decreasing the footprint of your application.
Sometimes a programmer might like to run a series of if-else statements or a switch-case, where the need is simply to output a character based upon the state. For this, hard coding the characters could indeed take less actual space in memory, as single characters are 8 bits versus a pointer, which is 32 bits or 64 bits (on 64-bit computers). A pointer will take up more space in memory.
If you would like to decrease the size through using actual chars versus pointers to chars, then there are two ways one could think to do this within printf types of operators. One would be to key off of the .1s, but how is the routine supposed to know for certain that you are truly providing a char type versus a pointer to a char or pointer to a string (array of chars)? This is why they went with the "%c", as it is different.
Fun Question :-)
C has the %c and %s format specifiers because they handle different types.
A char and a string are about as different as night and 1.
%c expects a char, which is an integer value and prints it according to encoding rules.
%s expects a pointer to a location of memory that contains char values, and prints the characters in that location according to encoding rules until it finds a 0 (null) character.
So you see, under the hood, the two cases while they look alike they have not much in common, as one works with values and the other with pointers. One is instructions for interpreting a specific integer value as an ascii char, and the other is iterating the contents of a memory location char by char and interpreting them until a zero value is encountered.
I have done an experiment with printf("%.1s", &c) and printf("%c", c).
I used the code below to test, and bash's time utility to get the running time.
#include <stdio.h>
int main(){
    char c = 'a';
    int i;
    for(i = 0; i < 40000000; i++){
        //printf("%.1s", &c); get a result of 4.3s
        //printf("%c", c); get a result of 0.67s
    }
    return 0;
}
The result says that using %c is about 6 times faster than %.1s (0.67s versus 4.3s). So, although %s can do the job of %c, %c is still needed for performance.
Since no one has provided an answer with ANY reference whatsoever, here is a printf specification from pubs.opengroup.org, which is similar to the format definition from IBM:
%c
The int argument shall be converted to an unsigned char, and the resulting byte shall be written.
%s
The argument shall be a pointer to an array of char. Bytes from the array shall be written up to (but not including) any terminating null byte. If the precision is specified, no more than that many bytes shall be written. If the precision is not specified or is greater than the size of the array, the application shall ensure that the array contains a null byte.
I am having trouble accepting input from a text file. My program is supposed to read in a string specified by the user and the length of that string is determined at runtime. It works fine when the user is running the program (manually inputting the values) but when I run my teacher's text file, it runs into an infinite loop.
For this example, it fails when I am taking in 4 characters and his input in his file is "ABCDy". "ABCD" is what I am supposed to be reading in and 'y' is supposed to be used later to know that I should restart the game. Instead when I used scanf to read in "ABCD", it also reads in the 'y'. Is there a way to get around this using scanf, assuming I won't know how long the string should be until runtime?
Normally, you'd use something like "%4c" or "%4s" to read a maximum of 4 characters (the difference is that "%4c" reads the next 4 characters, regardless, while "%4s" skips leading whitespace and stops at a whitespace if there is one).
To specify the length at run-time, however, you have to get a bit trickier since you can't use a string literal with "4" embedded in it. One alternative is to use sprintf to create the string you'll pass to scanf:
char buffer[128];
sprintf(buffer, "%%%dc", max_length);
scanf(buffer, your_string);
I should probably add: with printf you can specify the width or precision of a field dynamically by putting an asterisk (*) in the format string, and passing a variable in the appropriate position to specify the width/precision:
int width = 10;
int precision = 7;
double value = 12.345678910;
printf("%*.*f", width, precision, value);
Given that printf and scanf format strings are quite similar, one might think the same would work with scanf. Unfortunately, this is not the case--with scanf an asterisk in the conversion specification indicates a value that should be scanned, but not converted. That is to say, something that must be present in the input, but its value won't be placed in any variable.
Try
scanf("%4s", str)
You can also use fread, where you can set a read limit:
char string[5]={0};
if( fread(string,(sizeof string)-1,1,stdin) )
    printf("\nfully read: %s",string);
else
    puts("error");
You might consider simply looping over calls to getc().