I want to get the following piece of code to work:
#define READIN(a, b) if(scanf('"#%d"', '"&a"') != 1) { printf("ERROR"); return EXIT_FAILURE; }
int main(void)
{
unsigned int stack_size;
printf("Type in size: ");
READIN(d, stack_size);
}
I don't understand how to use the # operator in preprocessor directives. I want to reuse the scanf call with the ERROR message several times, but the '"#%d"' and '"&a"' parts are, I think, completely wrong. Is there any way to get this running? I think a macro is the best solution, or do you disagree?
You should only stringify arguments to the macro, and they must be outside of strings or character constants in the replacement text of the macro. Thus you probably should use:
#define READIN(a, b) do { if (scanf("%" #a, &b) != 1) \
        { fprintf(stderr, "ERROR\n"); return EXIT_FAILURE; } \
    } while (0)

int main(void)
{
    unsigned int stack_size;
    printf("Type in size: ");
    READIN(u, stack_size);
    printf("You entered %u\n", stack_size);
    return(0);
}
There are many changes. The do { ... } while (0) idiom prevents you from getting compilation errors in circumstances such as:
if (i > 10)
    READIN(u, j);
else
    READIN(u, k);
With your macro, you'd get an unexpected keyword 'else' type of message because the semi-colon after the first READIN() would be an empty statement after the embedded if, so the else could not belong to the visible if or the if inside the macro.
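To see the effect of the wrapper in isolation, here is a minimal sketch. SET_IF_POSITIVE and pick() are hypothetical names invented for this illustration; they are not part of the original code.

```c
/* Hypothetical macro for illustration: stores v in dst only when v > 0.
 * The do { ... } while (0) wrapper turns the body into a single statement
 * that still needs the trailing semicolon supplied by the caller. */
#define SET_IF_POSITIVE(dst, v) do { if ((v) > 0) { (dst) = (v); } } while (0)

/* Without the wrapper, the semicolon after the first macro call would be a
 * separate empty statement, and the else below would have no if to attach to. */
int pick(int i, int a, int b)
{
    int result = -1;
    if (i > 10)
        SET_IF_POSITIVE(result, a);
    else
        SET_IF_POSITIVE(result, b);
    return result;
}
```

Removing the do/while wrapper from the macro (leaving only the inner if with its braces) should make pick() fail to compile, which is the error described above.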
The type of stack_size is unsigned int; the correct format specifier, therefore, is u (d is for a signed int).
And, most importantly, the argument a in the macro is stringized correctly (and string concatenation of adjacent string literals, an extremely useful feature of C89, takes care of the rest for you). The argument b in the macro is not embedded in a string either.
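As a small sketch of how stringizing and literal concatenation combine, the following uses sscanf() on a string so it can be tried without typing input. SCAN_FMT and parse_unsigned are hypothetical names used only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper macro: SCAN_FMT(u) expands to "%" "u", which the
 * compiler concatenates into the single string literal "%u". */
#define SCAN_FMT(spec) "%" #spec

/* Parses an unsigned value from text; returns 1 on success, 0 otherwise. */
int parse_unsigned(const char *text, unsigned int *out)
{
    return sscanf(text, SCAN_FMT(u), out) == 1;
}
```

The # operator only works on a macro parameter that appears outside any quotes, which is why the "%" has to be a separate literal.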
The error reporting is done to stderr (the standard stream for reporting errors on), and the message ends with a newline so it will actually appear. I didn't replace return EXIT_FAILURE; with exit(EXIT_FAILURE);, but that would probably be a sensible choice if the macro will be used outside of main(). That assumes that 'terminate on error' is the appropriate behaviour in the first place. It often isn't for interactive programs, but fixing it is a bit harder.
I'm also ignoring my reservations about using scanf() at all; I usually avoid doing so because I find error recovery too hard. I've only been programming in C for about 28 years, and I still find scanf() too hard to control, so I essentially never use it. I typically use fgets() and sscanf() instead. Amongst other merits, I can report on the string that caused the trouble; that's hard to do when scanf() may have gobbled some of it.
My thought with scanf() here is, to only read in positive numbers and no letters. My overall code does create a stack, which the user types in and the type should be only positive, otherwise error. [...] I only wanted to know if there's a better solution to forbid the user to type in something other than positive numbers?
I just tried the code above (with #include <stdlib.h> and #include <stdio.h> added) and entered -2 and got told 4294967294, which isn't what I wanted (the %u format does not reject -2, at least on MacOS X 10.7.2). So, I would go with fgets() and strtoul(), most likely. However, accurately detecting all possible problems with strtoul() is an exercise of some delicacy.
This is the alternative code I came up with:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>
#include <string.h>

int main(void)
{
    unsigned int stack_size = 0;
    char buffer[4096];
    printf("Type in size: ");
    if (fgets(buffer, sizeof(buffer), stdin) == 0)
        printf("EOF or error detected\n");
    else
    {
        char *eos;
        unsigned long u;
        size_t len = strlen(buffer);
        if (len > 0)
            buffer[len - 1] = '\0';   // Zap newline (assuming there is one)
        errno = 0;
        u = strtoul(buffer, &eos, 10);
        if (eos == buffer ||
            (u == 0 && errno != 0) ||
            (u == ULONG_MAX && errno != 0) ||
            (u > UINT_MAX))
        {
            printf("Oops: one of many problems occurred converting <<%s>> to unsigned integer\n", buffer);
        }
        else
            stack_size = u;
        printf("You entered %u\n", stack_size);
    }
    return(0);
}
The specification of strtoul() is given in ISO/IEC 9899:1999 §7.20.1.4:
¶1 [...]
unsigned long int strtoul(const char * restrict nptr,
char ** restrict endptr, int base);
[...]
¶2 [...] First,
they decompose the input string into three parts: an initial, possibly empty, sequence of
white-space characters (as specified by the isspace function), a subject sequence
resembling an integer represented in some radix determined by the value of base, and a
final string of one or more unrecognized characters, including the terminating null
character of the input string. Then, they attempt to convert the subject sequence to an
integer, and return the result.
¶3 [...]
¶4 The subject sequence is defined as the longest initial subsequence of the input string,
starting with the first non-white-space character, that is of the expected form. The subject
sequence contains no characters if the input string is empty or consists entirely of white
space, or if the first non-white-space character is other than a sign or a permissible letter
or digit.
¶5 If the subject sequence has the expected form and the value of base is zero, the sequence
of characters starting with the first digit is interpreted as an integer constant according to
the rules of 6.4.4.1. If the subject sequence has the expected form and the value of base
is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value
as given above. If the subject sequence begins with a minus sign, the value resulting from
the conversion is negated (in the return type). A pointer to the final string is stored in the
object pointed to by endptr, provided that endptr is not a null pointer.
¶6 [...]
¶7 If the subject sequence is empty or does not have the expected form, no conversion is
performed; the value of nptr is stored in the object pointed to by endptr, provided
that endptr is not a null pointer.
Returns
¶8 The strtol, strtoll, strtoul, and strtoull functions return the converted
value, if any. If no conversion could be performed, zero is returned. If the correct value
is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN,
LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type
and sign of the value, if any), and the value of the macro ERANGE is stored in errno.
The error I got was from a 64-bit compilation where -2 was converted to a 64-bit unsigned long, and that was outside the range acceptable to a 32-bit unsigned int (the failing condition was u > UINT_MAX). When I recompiled in 32-bit mode (so sizeof(unsigned int) == sizeof(unsigned long)), then the value -2 was accepted again, interpreted as 4294967294 again. So, even this is not delicate enough...you probably have to do a manual skip of leading blanks and reject a negative sign (and maybe a positive sign too; you'd also need to #include <ctype.h>):
char *bos = buffer;
while (isspace((unsigned char)*bos))   // cast avoids undefined behaviour for negative char values
    bos++;
if (!isdigit((unsigned char)*bos))
    ...error - not a digit...
char *eos;
unsigned long u;
size_t len = strlen(bos);
if (len > 0)
    bos[len - 1] = '\0';   // Zap newline (assuming there is one)
errno = 0;
u = strtoul(bos, &eos, 10);
if (eos == bos ||
    (u == 0 && errno != 0) ||
    (u == ULONG_MAX && errno != 0) ||
    (u > UINT_MAX))
{
    printf("Oops: one of many problems occurred converting <<%s>> to unsigned integer\n", buffer);
}
As I said, the whole process is rather non-trivial.
(Looking at it again, I'm not sure whether the u == 0 && errno != 0 clause would ever catch any errors...maybe not because the eos == buffer (or eos == bos) condition catches the case there's nothing to convert at all.)
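Pulling the checks discussed above into one place, a sketch of a stricter converter might look like the following. The name parse_uint_strict and its return convention are assumptions made for this illustration; it also rejects trailing junk, which the fragments above do not test for.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts a decimal string to unsigned int.
 * Returns 1 on success; 0 on a sign, no digits, trailing junk, or overflow. */
int parse_uint_strict(const char *s, unsigned int *out)
{
    char *end;
    unsigned long u;

    while (isspace((unsigned char)*s))      /* skip leading blanks by hand */
        s++;
    if (!isdigit((unsigned char)*s))        /* rejects '-', '+', and empty input */
        return 0;

    errno = 0;
    u = strtoul(s, &end, 10);
    if (errno == ERANGE || u > UINT_MAX)    /* too big for unsigned long or unsigned int */
        return 0;
    if (*end == '\n')                       /* tolerate the newline left by fgets() */
        end++;
    if (*end != '\0')                       /* anything else after the digits is junk */
        return 0;

    *out = (unsigned int)u;
    return 1;
}
```

Because the sign is rejected before strtoul() is called, the result no longer depends on whether unsigned long is wider than unsigned int.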
Your macro arguments are quoted incorrectly. It should look like this:
#define READIN(a, b) if(scanf("%"#a, &b) != 1) { printf("ERROR"); return EXIT_FAILURE; }
Your use of the stringify operator was also incorrect; it must directly prefix the argument name.
In short, use "%"#a, not '"#%d"', and &b, not '"&a"'.
As a side note, for longish macros like this one, it helps to split them across lines using \, which keeps them readable:
#define READIN(a, b) \
    if(scanf("%"#a, &b) != 1) \
    { \
        printf("ERROR"); \
        return EXIT_FAILURE; \
    }
When doing something like this, one should preferably use a function, something along the lines of this should work:
static inline int readIn(const char *szFormat, void *pDst)
{
    if(scanf(szFormat, pDst) != 1)
    {
        puts("Error");
        return 0;
    }
    return 1;
}
Invoking it would look like this:
if(!readIn("%u", &stack_size))
    return EXIT_FAILURE;
scanf(3) takes a const char * as a first argument. You are passing '"..."', which is not a C "string". C strings are written with the " double quotes. The ' single quotes are for individual characters: 'a' or '\n' etc.
Placing a return statement inside a C preprocessor macro is usually considered very poor form. I've seen goto error; coded inside preprocessor macros before for repetitive error handling code when storing formatted data to and reading data from a file or kernel interface, but these are definitely exceptional circumstances. You would detest debugging this in six months' time. Trust me. Do not hide goto, return, break, or continue inside C preprocessor macros. if is alright so long as it is entirely contained within the macro.
Also, please get in the habit of writing your printf(3) statements like this:
printf("%s", "ERROR");
Format string vulnerabilities are exceedingly easy to write. Your code does not contain any such vulnerability now, but trust me, at some point in the future those strings are inevitably modified to include some user-supplied content, and putting in an explicit format string now will help prevent these in the future. At least you'll think about it in the future if you see this.
It is considered polite to wrap your multi-line macros in do { } while (0) blocks.
Finally, the stringification is not quite done correctly; try this instead:
#define READIN(A, B) do { if (scanf("%" #A, B) != 1) { \
/* error handling */ \
} else { \
/* success case */ \
} } while(0)
Edit: I feel I should re-iterate akappa's advice: Use a function instead. You get better type checking, better backtraces when something goes wrong, and it is far easier to work with. Functions are good.
I am writing a program in C. The incoming string looks like *H1W000500. This is a legitimate string, and I copy the part after *H1W (i.e. 000500) into an integer.
But I want to filter out strings that are not legitimate, for example *H1W..... or *H1W~##$. If the string is not legitimate, the content should not be copied and the string should be skipped. The content should be copied only if the string is legitimate, as described above.
Here is what I am doing, but whenever an irrelevant string arrives, it copies a zero value, which is undesirable.
char ReceivedData[50];
unsigned int Head1Weight;
p = strstr(ReceivedData, "*H1W");
if(p)
{
Head1Weight = strtoul(p+4,&ptr,10);
}
You are close, but if the prefix must appear at the start of receiveddata, the test is better expressed with strncmp on the first 4 chars (if your target string can appear in the middle of receiveddata, then strstr is fine). You also need to provide error checking on your strtoul conversion. Putting those pieces together, you could do something like the following (note: this is shown for a single value; in a loop, change return to continue as noted in the comments):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <errno.h>

/* declare constants, avoid magic number use in code */
enum { PRE = 4, BASE = 10, MAX = 50 };

int main (void) {

    char receiveddata[MAX] = "*H1W000500", *p = NULL;
    unsigned long head1weight;

    if (strncmp (receiveddata, "*H1W", PRE) != 0)   /* cmp 4 chars */
        return 1;   /* you would continue here */

    if (strlen (receiveddata) <= PRE)   /* more chars exist? */
        return 1;   /* you would continue here */

    errno = 0;      /* set errno to known value */
    /* no cast here: narrowing the result would defeat the ULONG_MAX check below */
    head1weight = strtoul (&receiveddata[PRE], &p, BASE);

    /* check for conversion errors */
    if ((errno == ERANGE && (head1weight == ULONG_MAX)) ||
        (errno != 0 && head1weight == 0)) {
        perror ("strtoul");
        return 1;   /* you would continue here */
    }

    if (&receiveddata[PRE] == p) {   /* check if chars converted */
        fprintf (stderr, "No digits were found\n");
        return 1;   /* you would continue here */
    }

    printf ("head1weight : %lu\n", head1weight);

    return 0;
}
Example Use/Output
$ ./bin/parsetounsigned
head1weight : 500
Look it over and let me know if you have further questions.
(note: C generally avoids the use of MixedCase and camelCase variable names in favor of all lower-case, reserving all upper-case for use with constants and macros. It is style, so it is completely up to you...)
From the Linux man page on strtoul.
If there were no digits at all, strtoul() stores the original value of nptr in *endptr (and returns 0).
So if, after the strtoul, ptr is the same as the starting pointer, you know there were no legitimate characters.
char* ptr;
unsigned long Head1Weight = strtoul(p + 4, &ptr, 10);
if (ptr == p + 4)
{
// There were no digits
}
else if (strlen(ptr) > 0)
{
// There were characters in the string after the end of the number
}
OP's p = strstr(ReceivedData, "*H1W"); if(p) { is insufficient as it passes strstr("abcd12*H1W", "*H1W"), although it is a start.
OP has validation goals that are not specific enough. "the contents of the string after *H1W i.e 000500 to an integer type."
"+123", "-123", " 123" evaluate to integers, are they valid?
"123 " can evaluate to an integer, is that valid?
The sample implies the numeric part should be exactly 6 decimal digits, yet that is uncertain.
Sample code uses unsigned, could that be 16-bit? Only "000000" to "065535" valid?
"-123" converts successfully via strtoul(), valid for this goal?
Should this pass "*H1W000500xyz"? Is extra text allowed or to be ignored?
This is common in writing code as the specifications initially have interpretation issues and then tend to evolve.
Code should allow for evolution.
Let us start with *H1W followed by exactly 6 decimal digits with sscanf(). Code below uses "%n" to record the scanning position after checking for digits. This approach needs additional work should PREFIX contain %.
// PREFIX should not contain %
#define PREFIX "*H1W"
#define DIGIT_FMT "%*[0-9]"
#define VALID_LENGTH 10
char ReceivedData[50];
unsigned long Head1Weight = 0;
int n = 0;
sscanf(ReceivedData, PREFIX DIGIT_FMT "%n", &n);
if (n == VALID_LENGTH && ReceivedData[VALID_LENGTH] == '\0') {
Head1Weight = strtoul(ReceivedData + sizeof PREFIX - 1, NULL, 10);
}
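An alternative that avoids the format-string concern is to test the characters directly. The sketch below is only one reading of the requirements (prefix followed by exactly six decimal digits and nothing else); parse_h1w, H1W_PREFIX and H1W_DIGITS are hypothetical names for this illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define H1W_PREFIX "*H1W"   /* prefix taken from the question */
#define H1W_DIGITS 6        /* assumption: exactly six decimal digits */

/* Hypothetical helper: returns 1 and stores the value only for strings of
 * the form "*H1W" followed by exactly H1W_DIGITS digits and nothing else. */
int parse_h1w(const char *s, unsigned long *out)
{
    size_t plen = strlen(H1W_PREFIX);
    size_t i;

    if (strncmp(s, H1W_PREFIX, plen) != 0)
        return 0;
    for (i = 0; i < H1W_DIGITS; i++)
        if (!isdigit((unsigned char)s[plen + i]))   /* also stops at '\0' */
            return 0;
    if (s[plen + H1W_DIGITS] != '\0')               /* no trailing text */
        return 0;

    *out = strtoul(s + plen, NULL, 10);             /* six digits cannot overflow */
    return 1;
}
```

If the specification evolves (a different digit count, optional trailing text), only the constants and the final check need to change.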
#include<stdio.h>
void main()
{
unsigned int a;
printf("Enter a number:");
scanf("%u",&a);
if ( a <= 4294967295) {
printf("Entered no is within limit\n");
}
else {
printf("Entered no is not in the limit");
}
}
Which input number will execute the else block with the above condition?
The if block is always executed, even when the input is greater than the limit, because the value wraps around. Is there any way to detect the wrapping?
This is because the maximum value of unsigned int is 4294967295, which is 32 one bits in binary.
If my input is 4294967296, it becomes a 1 followed by 32 zero bits,
and we can't access that 33rd bit.
Is there any possibility of accessing that 33rd bit?
Which input number will execute the else block with the above condition?
Most likely your system uses 32 bit unsigned int with maximum value 4294967295.
If so, there is no input that can trigger the else block.
As many have commented, the simple solution is to use a variable with more bits (if possible). But as pointed out by @Brendan, it just gives you other problems.
A more robust solution is to write your own function to parse the input instead of using scanf.
Such a function could be something like this:
#include <stdio.h>
#include <limits.h>

// Returns 1 if the string can be converted to unsigned int without overflow
// else return 0
int string2unsigned(const char* s, unsigned int* u)
{
    unsigned int t = 0;
    if (*s > '9' || *s < '0') return 0;            // Check for unexpected char
    while(*s)
    {
        if (*s == '\n') break;                     // Stop if '\n' is found
        if (*s > '9' || *s < '0') return 0;        // Check for unexpected char
        if (t > UINT_MAX/10) return 0;             // Check for overflow
        t = 10 * t;
        if (t > UINT_MAX - (*s - '0')) return 0;   // Check for overflow
        t = t + (*s - '0');
        s++;
    }
    *u = t;
    return 1;
}

int main(void) {
    unsigned int u;
    char s[100];
    if (!fgets(s, 100, stdin))   // Use fgets to read a line
    {
        printf("Failed to read input\n");
    }
    else
    {
        if (string2unsigned(s, &u))
        {
            printf("OK %u\n", u);
        }
        else
        {
            printf("Illegal %s", s);
        }
    }
    return 0;
}
Example:
input: 4294967295
output: OK 4294967295
input: 4294967296
output: Illegal 4294967296
(Thanks to @chux for suggesting a better alternative to my original code)
Using a larger integer type (with more bits) is wrong and broken. All that will do is give you a new problem (e.g. accepting numbers above 18446744073709551615 when it shouldn't).
For user interface design; a single "Entered number is not within the limit" error message is unacceptable. You have to tell the user what the real problem is and also remind the user of what you are expecting.
There are at least 4 different kinds of errors you should handle (and therefore at least 4 different error messages you should be able to display):
No input received (stdin gave you an EOF)
Input contained something that isn't a valid character (e.g. the byte 0x00, malformed UTF-8 multi-byte sequences, ASCII characters above 0x7F, etc)
Input contained an unrecognised/unaccepted character (e.g. the user typed "Bork!", or "0x1234", or "twelve"; or the user typed "12,345.00" and your code can't handle thousand's separators or fractions)
Input is a valid number but is not within a certain range. This might include negative values like "-1234" (and in that case the minus sign should not be treated as unrecognised/unaccepted character causing the wrong error message to be given).
Also note that the limit does not depend on the size of the variable type you use; the size of the variable type you choose depends on the limit. For example, if you're asking someone to enter their age and they type "12345" then that should cause an error message like "Number out of range (age must be between 1 and 150)." regardless of whether it causes an overflow or not. In this case the limit would be 150 (and not something like INT_MAX because nobody is ever likely to be that old). Because the limit is 150, you can't choose to use signed char as your variable type but could choose to use uint8_t (or anything larger).
With all of this in mind; scanf() is never usable for parsing "strings from humans". You must write your own parser or find something (a "non-standard" library) that is suitable.
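As a rough sketch of what such a parser could look like in standard C, the function below reports a distinct status for three of the four error kinds listed above (it does not attempt to detect invalid byte sequences). The enum, the function name and the use of strtol() are assumptions for this illustration, not something the answer prescribes.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes, one per kind of error worth reporting. */
enum parse_status { PARSE_OK, PARSE_EMPTY, PARSE_BAD_CHAR, PARSE_OUT_OF_RANGE };

/* Parses a decimal integer from line and checks it against [lo, hi].
 * The limits come from the application (e.g. 1..150 for an age), not from
 * the width of the variable. */
enum parse_status parse_in_range(const char *line, long lo, long hi, long *out)
{
    char *end;
    long v;

    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0')
        return PARSE_EMPTY;

    errno = 0;
    v = strtol(line, &end, 10);
    if (end == line)                         /* no digits at all, e.g. "Bork!" */
        return PARSE_BAD_CHAR;
    while (isspace((unsigned char)*end))     /* allow trailing newline or blanks */
        end++;
    if (*end != '\0')                        /* e.g. "12,345.00" or "0x1234" */
        return PARSE_BAD_CHAR;
    if (errno == ERANGE || v < lo || v > hi) /* includes negatives like "-1234" */
        return PARSE_OUT_OF_RANGE;

    *out = v;
    return PARSE_OK;
}
```

The caller can then map each status to its own message, for example printing the accepted range only for PARSE_OUT_OF_RANGE.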
First of all, please forgive me if this is too trivial; I am not a C developer and usually program in Fortran.
I need to read some column-formatted text files. The problem is that some columns can be blank (value not filled in) or only partially filled.
Let me use a short example of the problem. Let's say I have a generator program like:
#include <stdio.h>
#include <stdlib.h>
int main(){
printf("xxxx%4d%4.2f\n",99,3.14);
}
When I execute this program I get:
$ ./t1
xxxx  993.14
If I get it into a text file and try to read using (e.g.) sscanf with the code:
#include <stdio.h>
#include <stdlib.h>
int main() {
    char *fmt = "%*4c%4d%4f";
    char *line = "xxxx  993.14";
    int ival;
    float fval;
    sscanf(line,fmt,&ival,&fval);
    printf(">>>>%d|%f\n",ival,fval);
}
The result is:
$ ./t2
>>>>993|0.140000
What is the problem here? sscanf seems to treat all white space as meaningless and discards it. The "%*4c" does what it is meant to do: it counts 4 characters without skipping any blanks and discards them because of the "*". Next, the %4d skips all blanks and only starts counting the 4 characters of the field when it finds the first valid character for the conversion. So the value meant to be 99 becomes 993, and the 3.14 becomes 0.14.
In Fortran the reading code would be:
program t3
  implicit none
  integer :: ival
  real :: fval
  character(len=30) :: fmt="(4x,i4,f4.0)"
  character(len=30) :: line="xxxx  993.14"
  read(line,fmt) ival, fval
  write(*,"('>>>>',i4,'|',f4.2)") ival,fval
end program t3
and the result would be:
$ ./t3
>>>>  99|3.14
That is, the format specification states the field width and nothing is discarded in conversion, unless instructed to by the "nX" specification.
Some final remarks to help the helpers:
The format to be read is an international standard and there is no way to change it.
The number of existing files is too big to think of intervention or format change.
It is not a CSV or similar format.
The code has to be in C for integration in a free software package.
Sorry to be too long, trying to state the problem as completely as possible.
The question is: is there a way to tell sscanf not to skip blanks? If not, is there a simple way to do it in C, or will it be necessary to write a specialized parser for each record type?
Thank you in advance.
When reading fixed-length fields with sscanf, it is best to parse the values as character strings (which you could do a number of ways), and then perform independent conversion of each of the fields. This allows you to handle conversion/error detection on a per-field basis. For example, you could use a format string of:
char *fmt = "%*4s%2[^0-9]%s";
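which would read and discard the 4 leading characters, then read up to 2 characters that are not digits into the first buffer (note that [^0-9] matches the blank padding of the integer field, not its digits), followed by the remainder of the line (or up until the next whitespace) as a string containing the rest of the data.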
To handle the storage and parsing of line as fixed length fields, you could use temporary character arrays to hold each of the strings and then use sscanf to fill them much as you have attempted to do with the integer and float directly. e.g.:
char istr[8] = {0};
char fstr[16] = {0};
...
sscanf (line,fmt,istr,fstr);
(note: you could use minimum storage of istr[3] and fstr[7] in this given case, adjust the storage length as required, but providing space for the nul-terminating character)
You can then use strtol and strtof to provide conversion with error checking on each value. For example:
errno = 0;
if ((ival = (int)strtol (istr, NULL, 10)) == 0 && errno)
fprintf (stderr, "error: integer conversion failed.\n");
/* underflow/overflow checks omitted */
and
errno = 0;
if ((fval = strtof (fstr, NULL)) == 0 && errno)
fprintf (stderr, "error: float conversion failed.\n");
/* nan and inf checks omitted */
Putting all the pieces together in your example, you could use something like:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main() {

    char *fmt = "%*4s%2[^0-9]%s";
    char *line = "xxxx  993.14";
    char istr[8] = {0};
    char fstr[16] = {0};
    int ival;
    float fval;

    sscanf (line,fmt,istr,fstr);

    errno = 0;
    if ((ival = (int)strtol (istr, NULL, 10)) == 0 && errno)
        fprintf (stderr, "error: integer conversion failed.\n");
    /* underflow/overflow checks omitted */

    errno = 0;
    if ((fval = strtof (fstr, NULL)) == 0 && errno)
        fprintf (stderr, "error: float conversion failed.\n");
    /* nan and inf checks omitted */

    printf(">>>>%d|%6.2f\n",ival,fval);

    return 0;
}
Example/Output
$ >>>>0|993.14
*scanf() is not designed to handle fixed column width with non-intervening white-space.
With sscanf(), to not skip spaces, code must use "%c", "%n", "%[]" as all other specifiers skip leading white-space and those skipped characters do not contribute to a width limit.
To scan the printed line, which is now in buffer, take advantage of the fact that the only '\n' is at the end of the line.
char str_int[5];
char str_float[5];
int n = 0;
sscanf(buffer, "%*4c%4[^\n]%4[^\n]%n", str_int, str_float, &n);
if (n != 12 || buffer[n] != '\n') Fail();
// Now convert str_int, str_float as needed.
Another way to use sscanf() would be to parse buffer as
int ival;
float fval;
if (strlen(buffer) != 13) Fail();
if (sscanf(&buffer[8], "%f", &fval) != 1) Fail();
buffer[8] = '\0';
if (sscanf(&buffer[4], "%d", &ival) != 1) Fail();
Note: The 4s in the format below do not fix the output width at 4 characters; 4 is the minimum width to print.
printf("xxxx%4d%4.2f\n",ival, fval);
Code could use the following to detect problems.
if (13 != printf("xxxx%4d%4.2f\n",ival, fval)) Fail();
Watch out for
printf("xxxx%4d%4.2f\n",123, 9.995000001f); // "xxxx 12310.00\n"
First off, I dunno. There might be some way to wrangle sscanf into counting the whitespace towards your integer width. But I just don't think scanf was made with this sort of format in mind. The tool's trying to be smart or helpful and it's biting you in the ass.
But if it's columnated data and you know the position of the various fields, there's a really easy work around. Just extract the field you want.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv)
{
    char line[] = "xxxx  893.14";
    char tmp[100];
    int thatDamnNumber;
    float myfloatykins;

    //Get that field
    memcpy(tmp, line+4, 4);
    tmp[4] = '\0';   //memcpy does not terminate the copy
    sscanf(tmp, "%d", &thatDamnNumber);

    //Kill that field so it doesn't goober-up the float
    memset(line+4, ' ', 4);
    sscanf(line, "%*4c%f", &myfloatykins);

    printf("%d %f\n", thatDamnNumber, myfloatykins);
    return 0;
}
If there is a lot of this, you could make some generalized functions: integerExtract(int positionStart, int sizeInCharacters), floatExtract(), etc.
If each element is of fixed width you don't really need scanf(), try this
char copy[5];
const char *line = "xxxx  993.14";
int ival;
float fval;
copy[0] = line[4];
copy[1] = line[5];
copy[2] = line[6];
copy[3] = line[7];
copy[4] = '\0'; // nul terminate for `atoi' to work
ival = atoi(copy);
fval = atof(&line[8]);
fprintf(stdout, "%d -- %f\n", ival, fval);
If you want (probably should) you can use strtol() instead of atoi() and strtof() instead of atof() to check for malformed data.
Both these functions take a parameter to store the unconverted/invalid characters, you can check the passed pointer in order to verify that there was a problem with conversion.
Or if you really want scanf() do the same, capture the integer + whitespaces to a char array and then convert it to int later, like this
char integer[5];
const char *line = "xxxx  993.14";
int ival;
float fval;
if (sscanf(line, "%*4c%4[0-9 ]%f", integer, &fval) != 2)
return -1;
ival = atoi(integer);
fprintf(stdout, "%d -- %f\n", ival, fval);
The format "%*4c%4[0-9 ]%f" will
Skip the first four characters including white spaces.
Scan the next four characters if they consist only of digits or white spaces.
Scan the rest of the input string searching for a matching float value.
I am posting what I think is a final conclusion from the answers I have got so far and from other sources.
What is a very trivial task in Fortran is not such a trivial task in other languages. I guess, though I am not sure, that the same task could be as easy as in Fortran in some other languages; Cobol, Pascal, PL/I and others from the punched-card era probably make it trivial.
I think that most languages nowadays are more comfortable with different data structures and inherited their I/O model from C. Java, Python, Perl(?) and others could serve as examples.
From what I saw in this thread, there are two main problems in reading and converting fixed-column-width text data with C.
The first problem is that, as Philip said in his answer: “The tool’s trying to be smart or helpful and it’s biting you in the ass.” Quite right! It seems that C text I/O treats white space as something like a NULL character that should be thrown away, completely disregarding any information about where the field starts. The only exception seems to be %nc, which gets exactly n chars, even blanks.
The second problem is that the conversion “tag” (what is it called?) %nf will keep converting while it finds a valid character, even if you say stop at the 4th character.
If we combine those two problems with a field completely filled with white space, then depending on the conversion tool used, it either throws an error or keeps going madly looking for something meaningful.
At the end of the day, it seems that the only way is to copy the field to another memory area, dynamically allocated or not (we can have an area for each column width), and parse this separate area, taking into account the possibility of an all-blank field in order to catch the error.
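A sketch of that conclusion, for illustration only: copy each field into a small buffer, detect the all-blank case, then convert with strtol() or strtod(). The function names, the FIELD_MAX limit and the return convention (1 = value, 0 = error, -1 = blank field) are assumptions made here, not part of any answer in the thread.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define FIELD_MAX 32   /* assumption: no field is wider than 31 characters */

/* Hypothetical helper: copies width characters starting at column start
 * (0-based) into buf and nul-terminates it. Returns 0 if the line is too
 * short or the field too wide. */
static int copy_field(const char *line, size_t start, size_t width, char *buf)
{
    if (width >= FIELD_MAX || strlen(line) < start + width)
        return 0;
    memcpy(buf, line + start, width);
    buf[width] = '\0';
    return 1;
}

/* Returns 1 and stores the value; 0 on error; -1 if the field is all blanks. */
int field_to_long(const char *line, size_t start, size_t width, long *out)
{
    char buf[FIELD_MAX], *end;
    long v;

    if (!copy_field(line, start, width, buf))
        return 0;
    if (strspn(buf, " ") == width)          /* unfilled field */
        return -1;
    errno = 0;
    v = strtol(buf, &end, 10);
    while (*end == ' ')                     /* allow trailing blanks */
        end++;
    if (end == buf || *end != '\0' || errno != 0)
        return 0;
    *out = v;
    return 1;
}

/* Same contract as field_to_long(), for floating-point fields. */
int field_to_double(const char *line, size_t start, size_t width, double *out)
{
    char buf[FIELD_MAX], *end;
    double v;

    if (!copy_field(line, start, width, buf))
        return 0;
    if (strspn(buf, " ") == width)
        return -1;
    errno = 0;
    v = strtod(buf, &end);
    while (*end == ' ')
        end++;
    if (end == buf || *end != '\0' || errno != 0)
        return 0;
    *out = v;
    return 1;
}
```

For the sample line, field_to_long(line, 4, 4, &ival) and field_to_double(line, 8, 4, &fval) would read the two columns independently, so the width of one field can no longer leak into the next.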
I tried
sscanf(str, "%016llX", &int64 );
but it does not seem safe. Is there a fast and safe way to do the conversion?
Thanks~
Don't bother with functions in the scanf family. They're nearly impossible to use robustly. Here's a general safe use of strtoull:
char *str, *end;
unsigned long long result;
errno = 0;
result = strtoull(str, &end, 16);
if (result == 0 && end == str) {
/* str was not a number */
} else if (result == ULLONG_MAX && errno) {
/* the value of str does not fit in unsigned long long */
} else if (*end) {
/* str began with a number but has junk left over at the end */
}
Note that strtoull accepts an optional 0x prefix on the string, as well as optional initial whitespace and a sign character (+ or -). If you want to reject these, you should perform a test before calling strtoull, for instance:
if (!isxdigit(str[0]) || (str[1] && !isxdigit(str[1])))
If you also wish to disallow overly long representations of numbers (leading zeros), you could check the following condition before calling strtoull:
if (str[0]=='0' && str[1])
One more thing to keep in mind is that "negative numbers" are not considered outside the range of conversion; instead, a prefix of - is treated the same as the unary negation operator in C applied to an unsigned value, so for example strtoull("-2", 0, 16) will return ULLONG_MAX-1 (without setting errno).
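Combining these checks, a stricter wrapper might look like the sketch below. The name parse_hex64 and the rule "1 to 16 hex digits and nothing else" are assumptions chosen for this illustration.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: accepts 1 to 16 hex digits and nothing else
 * (no whitespace, no sign, no 0x prefix). Returns 1 on success. */
int parse_hex64(const char *s, uint64_t *out)
{
    size_t n = 0;
    char *end;
    unsigned long long v;

    while (isxdigit((unsigned char)s[n]))   /* count leading hex digits */
        n++;
    if (n == 0 || n > 16 || s[n] != '\0')   /* empty, too long, or junk */
        return 0;

    errno = 0;
    v = strtoull(s, &end, 16);
    if (errno != 0 || *end != '\0')
        return 0;

    *out = (uint64_t)v;
    return 1;
}
```

Since the whole string is verified to be hex digits before the call, strtoull() never sees a sign or a 0x prefix, so the negation behaviour described above cannot occur.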
Your title (at present) contradicts the code you provided. If you want to do what your title was originally (convert a string to an integer), then you can use this answer.
You could use the strtoull function, which unlike sscanf is a function specifically geared towards reading textual representations of numbers.
const char *test = "123456789abcdef0";
errno = 0;
unsigned long long result = strtoull(test, NULL, 16);
if (errno == EINVAL)
{
    // not a valid number (POSIX allows EINVAL here; ISO C does not require it)
}
else if (errno == ERANGE)
{
    // does not fit in an unsigned long long
}
At the time I wrote this answer, your title suggested you'd want to write an uint64_t into a string, while your code did the opposite (reading a hex string into an uint64_t). I answered "both ways":
The <inttypes.h> header has conversion macros to handle the ..._t types safely:
#include <stdio.h>
#include <inttypes.h>
sprintf( str, "%016" PRIx64, uint64 );
Or (if that is indeed what you're trying to do), the other way round:
#include <stdio.h>
#include <inttypes.h>
sscanf( str, "%" SCNx64, &uint64 );
Note that you cannot enforce widths etc. with the scanf() function family. It parses what it gets, which can yield undesired results when the input does not adhere to expected formatting. Oh, and <inttypes.h> only provides the lowercase SCNx64 for scanning; there is no uppercase SCNX64 counterpart to PRIX64.
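A small round-trip sketch using the two macros follows. The helper names u64_to_hex and hex_to_u64 are hypothetical, and the "%n" check for trailing junk is an addition for this illustration; the sscanf() side still accepts leading whitespace, a sign and a 0x prefix, and overly long input is not detected.

```c
#include <inttypes.h>
#include <stdio.h>

/* Formats value as exactly 16 lowercase hex digits; buf needs 17 bytes.
 * Returns 1 on success, 0 if the buffer is too small. */
int u64_to_hex(uint64_t value, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%016" PRIx64, value);
    return n == 16 && (size_t)n < size;
}

/* Reads hex digits back with the matching scan macro; "%n" records how much
 * of the string was consumed so trailing junk can be rejected. */
int hex_to_u64(const char *str, uint64_t *value)
{
    int used = 0;
    if (sscanf(str, "%" SCNx64 "%n", value, &used) != 1)
        return 0;
    return str[used] == '\0';
}
```

Using the macros keeps the format strings correct whether uint64_t is unsigned long or unsigned long long on a given platform.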
gcc 4.4.4 c89
Which is the better way to convert a string to an integer value?
I have tried 2 different methods atoi and sscanf. Both work as expected.
char digits[3] = "34";
int device_num = 0;
if(sscanf(digits, "%d", &device_num) == EOF) {
fprintf(stderr, "WARNING: Incorrect value for device\n");
return FALSE;
}
or using atoi
device_num = atoi(digits);
I was thinking that sscanf would be better, as you can check for errors, whereas atoi doesn't do any checking.
You have 3 choices:
atoi
This is probably the fastest if you're using it in performance-critical code, but it does no error reporting. If the string does not begin with an integer, it will return 0. If the string contains junk after the integer, it will convert the initial part and ignore the rest. If the number is too big to fit in int, the behaviour is undefined.
sscanf
Some error reporting, and you have a lot of flexibility for what type to store (signed/unsigned versions of char/short/int/long/long long/size_t/ptrdiff_t/intmax_t).
The return value is the number of conversions that succeed, so scanning for "%d" will return 0 if the string does not begin with an integer. You can use "%d%n" to store the index of the first character after the integer in another variable, and thereby check whether the entire string was converted or whether there's junk afterwards. However, like atoi, behaviour on integer overflow is undefined.
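The "%d%n" check could look roughly like this. It is only a sketch; parse_int_sscanf is a made-up name:

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the whole of str is one integer
   (stored in *out), 0 otherwise. Note that *out may already have been
   written when trailing junk is detected. */
static int parse_int_sscanf(const char *str, int *out)
{
    int end = 0;

    /* %n stores the number of characters consumed so far and does
       not count towards the return value. */
    if (sscanf(str, "%d%n", out, &end) != 1)
        return 0;                 /* nothing converted */
    return str[end] == '\0';      /* reject junk after the number */
}
```

This accepts "34" but rejects "34abc", "abc" and the empty string. Out-of-range input is still not handled safely.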
strtol and family
Robust error reporting, provided you set errno to 0 before making the call. Return values are specified on overflow and errno will be set. You can choose any number base from 2 to 36, or specify 0 as the base to auto-interpret leading 0x and 0 as hex and octal, respectively. Choices of type to convert to are signed/unsigned versions of long/long long/intmax_t.
If you need a smaller type you can always store the result in a temporary long or unsigned long variable and check for overflow yourself.
Since these functions take a pointer to pointer argument, you also get a pointer to the first character following the converted integer, for free, so you can tell if the entire string was an integer or parse subsequent data in the string if needed.
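As a rough illustration of base 0 and of using the end pointer to carry on parsing, here is a sketch that reads a comma-separated list. The function name parse_list and the list format are assumptions made up for the example:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses "10,0x1F,017" style lists into out[].
   Returns the number of values stored, or -1 on any error. */
static int parse_list(const char *s, long *out, int max)
{
    int count = 0;

    while (count < max) {
        char *end;
        long v;

        errno = 0;
        v = strtol(s, &end, 0);   /* base 0: 0x.. is hex, 0.. is octal */
        if (end == s || errno == ERANGE)
            return -1;            /* no digits, or out of range for long */
        out[count++] = v;
        if (*end == '\0')
            return count;         /* whole string consumed */
        if (*end != ',')
            return -1;            /* unexpected character after a number */
        s = end + 1;              /* continue after the comma */
    }
    return -1;                    /* more values than out[] can hold */
}
```

For "10,0x1F,017" this stores 10, 31 and 15.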
Personally, I would recommend the strtol family for most purposes. If you're doing something quick-and-dirty, atoi might meet your needs.
As an aside, sometimes I find I need to parse numbers where leading whitespace, sign, etc. are not supposed to be accepted. In this case it's pretty damn easy to roll your own for loop, e.g.:
for (x=0; (unsigned)*s-'0'<10; s++)
x=10*x+(*s-'0');
Or you can use (for robustness):
if (isdigit((unsigned char)*s)) /* cast avoids undefined behaviour for negative char values */
x=strtol(s, &s, 10);
else { /* error */ }
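If overflow matters, the hand-rolled loop can be extended with a range check. The following is only a sketch under the assumption that the entire string must consist of digits; parse_digits is a made-up name:

```c
#include <limits.h>

/* Hypothetical helper: accepts only a non-empty run of decimal digits
   (no whitespace, no sign). Returns 0 on success, -1 otherwise. */
static int parse_digits(const char *s, int *out)
{
    int x = 0;

    if (*s == '\0')
        return -1;                      /* empty string */
    for (; *s != '\0'; s++) {
        int d;

        if (*s < '0' || *s > '9')
            return -1;                  /* not a digit */
        d = *s - '0';
        if (x > (INT_MAX - d) / 10)
            return -1;                  /* 10*x + d would overflow int */
        x = 10 * x + d;
    }
    *out = x;
    return 0;
}
```

Unlike strtol, this rejects " 34", "+34" and "34abc" outright.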
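The *scanf() family of functions returns the number of values converted, so you should check that sscanf() returns 1 in your case. EOF is returned only for an "input failure" before the first conversion, which for sscanf() means the string ended first (for example, an empty string). A string such as "abc" yields 0, not EOF, so comparing against EOF will not catch it.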
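The possible outcomes for a single "%d" conversion can be told apart like this. It is a sketch only; the enum and the function name scan_int are invented here. Note the empty-string case, where the string ends before any conversion and sscanf() reports EOF:

```c
#include <stdio.h>

enum scan_result { SCAN_OK, SCAN_NO_MATCH, SCAN_NO_INPUT };

/* Hypothetical helper: classifies the result of one "%d" conversion. */
static enum scan_result scan_int(const char *str, int *out)
{
    int rc = sscanf(str, "%d", out);

    if (rc == 1)
        return SCAN_OK;          /* one value converted */
    if (rc == EOF)
        return SCAN_NO_INPUT;    /* string ended before any conversion */
    return SCAN_NO_MATCH;        /* rc == 0: input did not match %d */
}
```

So a test of the form `!= 1` covers both failure cases, while a test of the form `== EOF` covers only one of them.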
For sscanf(), the function has to parse the format string, and then decode an integer. atoi() doesn't have that overhead. Both suffer from the problem that out-of-range values result in undefined behavior.
You should use strtol() or strtoul() functions, which provide much better error-detection and checking. They also let you know if the whole string was consumed.
If you want an int, you can always use strtol(), and then check the returned value to see if it lies between INT_MIN and INT_MAX.
To @R..: I think it's not enough to check errno for error detection in a strtol call.
long strtol (const char *String, char **EndPointer, int Base)
You'll also need to check EndPointer for errors.
Combining R..'s and PickBoy's answers, for brevity:
long strtol (const char *String, char **EndPointer, int Base)
// examples
strtol(s, NULL, 10);
strtol(s, &s, 10);
When there is no concern about invalid string input or range issues, use the simplest: atoi()
Otherwise, the method with best error/range detection is neither atoi(), nor sscanf().
This good answer already details the lack of error checking with atoi() and the limited error checking with sscanf().
strtol() is the most stringent function for converting a string to int, yet it is only a start. The detailed examples below show proper usage, which is the reason for adding this answer after the accepted one.
// Over-simplified use
int strtoi(const char *nptr) {
int i = (int) strtol(nptr, (char **)NULL, 10);
return i;
}
This is like atoi() and neglects to use the error detection features of strtol().
To fully use strtol(), there are various features to consider:
Detection of no conversion: Examples: "xyz", or "" or "--0"? In these cases, endptr will match nptr.
char *endptr;
int i = (int)strtol(nptr, &endptr, 10);
if (nptr == endptr) return FAIL_NO_CONVERT;
Should the whole string convert or just the leading portion: Is "123xyz" OK?
char *endptr;
int i = (int)strtol(nptr, &endptr, 10);
if (*endptr != '\0') return FAIL_EXTRA_JUNK;
Detect if the value was so big that the result is not representable as a long, like "999999999999999999999999999999".
errno = 0;
long L = strtol(nptr, &endptr, 10);
if (errno == ERANGE) return FAIL_OVERFLOW;
Detect if the value was outside the range of int, but not of long. If int and long have the same range, this test is not needed.
long L = strtol(nptr, &endptr, 10);
if (L < INT_MIN || L > INT_MAX) return FAIL_INT_OVERFLOW;
Some implementations go beyond the C standard and set errno for additional reasons, such as setting errno to EINVAL when no conversion was performed or when the value of the base parameter is not valid. The best time to test for these errno values is implementation dependent.
Putting this all together: (Adjust to your needs)
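#include <errno.h>
#include <limits.h>
#include <stdlib.h>
/* FAIL_* codes as used above; the numeric values here are arbitrary */
enum { FAIL_NO_CONVERT = 1, FAIL_EXTRA_JUNK, FAIL_OVERFLOW,
       FAIL_INT_OVERFLOW, FAIL_IMPLEMENTATION_REASON };
/* *error_code is written only on failure, so set it to 0 before the call */
int strtoi(const char *nptr, int *error_code) {
  char *endptr;
  errno = 0;
  long i = strtol(nptr, &endptr, 10);
  #if LONG_MIN < INT_MIN || LONG_MAX > INT_MAX
  if (errno == ERANGE || i > INT_MAX || i < INT_MIN) {
    errno = ERANGE;
    i = i > 0 ? INT_MAX : INT_MIN;  /* clamp to the int range */
    *error_code = FAIL_INT_OVERFLOW;
  }
  #else
  if (errno == ERANGE) {
    *error_code = FAIL_OVERFLOW;
  }
  #endif
  else if (endptr == nptr) {
    *error_code = FAIL_NO_CONVERT;
  } else if (*endptr != '\0') {
    *error_code = FAIL_EXTRA_JUNK;
  } else if (errno) {
    *error_code = FAIL_IMPLEMENTATION_REASON;
  }
  return (int) i;
}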
Note: All functions mentioned allow leading spaces, an optional leading sign character and are affected by locale change. Additional code is required for a more restrictive conversion.
Note: Non-OP title change skewed emphasis. This answer applies better to original title "convert string to integer sscanf or atoi"
If the user enters 34abc and you pass it to atoi, it will return 34.
If you want to validate the value entered, you have to call isdigit on each character of the entered string in turn.
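Such a check might look like the following sketch, where all_digits is a made-up name. It only validates the characters; it does not protect atoi against values too large for int:

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if str is non-empty and consists
   only of decimal digits, 0 otherwise. */
static int all_digits(const char *str)
{
    if (*str == '\0')
        return 0;                             /* empty string */
    for (; *str != '\0'; str++) {
        /* the cast avoids undefined behaviour for negative char values */
        if (!isdigit((unsigned char)*str))
            return 0;
    }
    return 1;
}
```

With this, "34" passes and "34abc" is rejected before atoi ever sees it. A leading sign such as "-34" is rejected too, which may or may not be what you want.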