C: fscanf() on a hexadecimal giving me incorrect values


I've looked everywhere trying to figure out what I'm doing wrong and can't seem to figure out WHY this block of code isn't handling hexadecimal numbers correctly.
FILE *listFile = fopen(argv[1], "r");
unsigned int hexIndex;
if(listFile == NULL){
printf("error\n");
return 0;
}
while(!feof(listFile))
{
fscanf(listFile, "%x\n", &hexIndex);
printf("hexindex %d\n", hexIndex);
if(hashInsert(hexIndex) == 0)
{
printf("uniques: %d\n", uniques);
uniques++;
}
}
For example, given a file with the following 3 hexadecimal addresses:
0xFFFFFFFFFF
0x7f1a91026b00
0x7f1a91026b03
The program prints out:
hexindex -1
hexindex -1862112512
hexindex -1862112509
which I'm 100% sure is incorrect. I've spent hours trying to figure out what I've done wrong, and I feel that I might be overlooking something simple. I've tried using different integer types like size_t, longs, etc. but run into the same exact output every time.
Can anybody give me some insight as to what I might be doing wrong?

You should use %u or %x to printf an unsigned int. Currently you are using %d which causes undefined behaviour.
A second thing to check is that the intended values will fit into unsigned int. Some of your sample values are 48 bits long. If your system has 32-bit unsigned int then you cannot use unsigned int for this purpose.
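The printed numbers match what you get if only the low 32 bits of each input survive. Strictly speaking, fscanf() storing a value that does not fit in the object is undefined behaviour, so the truncation is just what this implementation appears to have done. Here is a minimal sketch of that arithmetic, assuming a 32-bit unsigned int. The helper name low32_as_signed is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 32 bits of v and interpret them as a
   two's-complement 32-bit value, which is what "%d" ended up showing. */
int64_t low32_as_signed(uint64_t v)
{
    uint32_t low = (uint32_t)v;   /* well-defined: reduces modulo 2^32 */

    if (low & UINT32_C(0x80000000))                  /* top bit set:   */
        return (int64_t)low - INT64_C(4294967296);   /* negative value */
    return (int64_t)low;
}
```

For example, 0x7f1a91026b00 keeps only 0x91026b00 (2432854784), and 2432854784 - 4294967296 = -1862112512, which is the second line of the output. 0xFFFFFFFFFF keeps 0xFFFFFFFF, which becomes -1.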
The portable way to go here is to use uint64_t as the variable type, and the scanf hex specifier is SCNx64 and the printf specifier is PRIu64 or PRIx64. For example:
#include <inttypes.h>
uint64_t hexIndex;
// ...
while( 1 == fscanf(listFile, "%" SCNx64, &hexIndex) )
{
printf("hexindex %" PRIu64 "\n", hexIndex);
if(hashInsert(hexIndex) == 0)
{
printf("uniques: %d\n", uniques);
uniques++;
}
}
(note: don't use while (!feof(...)); loop on the return value of fscanf() instead, as above)
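Putting those pieces together, the read loop could look like the following minimal sketch. It is written as a function that takes an already-open FILE * and fills a caller-supplied array. The name read_hex_values and the fixed-capacity array are assumptions for illustration. hashInsert() and uniques from the question are left out.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read up to max hex values (with or without a 0x
   prefix) from fp into out. Stops at end of file or at the first token that
   does not parse as hex, and returns how many values were stored. */
size_t read_hex_values(FILE *fp, uint64_t *out, size_t max)
{
    size_t n = 0;
    uint64_t value;

    /* Loop on fscanf's return value rather than feof(): it returns 1 only
       when a value was actually converted and stored. */
    while (n < max && fscanf(fp, "%" SCNx64, &value) == 1)
        out[n++] = value;
    return n;
}
```

Each stored value can then be printed with "%" PRIu64 (decimal) or "%" PRIx64 (hex).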

Related

how to correctly sscanf a decimal to a char

PVS-Studio gave me a warning about this :
char c;
sscanf(line, "%d", &c);
I changed %d to %c but this created a bug because "c" now contains the ASCII value of the number and not the decimal one, so I went back to "%d".
So what's the correct specifier to use? Is there another solution?

c is a char. You asked to scan an int. PVS-Studio did right in warning you. Change the type of c to int and scan for a %d.

There are multiple solutions for your problem:
you can specify the correct destination type:
char c;
if (sscanf(line, "%hhd", &c) == 1) {
/* successful conversion */
...
}
you can use an intermediary variable:
char c;
int cc;
if (sscanf(line, "%d", &cc) == 1) {
/* successful conversion */
c = cc;
...
}
you can use different conversion function:
#include <stdlib.h>
...
char c;
c = atoi(line); // no error handling, return 0 if not a number
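Note, however, that in all cases the result is not portable if the numeric value is outside the range of type char. With %hhd the behavior is undefined. With atoi() it is also undefined when the number does not even fit in an int. Converting an out-of-range int to a signed char (as in c = cc) is implementation-defined. Most current systems will just keep the low-order byte of the conversion result, but the C Standard does not guarantee it.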
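If you want both error handling and a range check, one option not listed above is strtol() from <stdlib.h>. It converts into a long first, so you can store the value in the char only once you know it fits. Here is a minimal sketch. The function name parse_char_value is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal number at the start of line and store
   it in *out. Returns 1 on success, 0 if no number was found or the value does
   not fit in a char (CHAR_MIN..CHAR_MAX). Trailing text is ignored, as with
   sscanf(line, "%d", ...). */
int parse_char_value(const char *line, char *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(line, &end, 10);
    if (end == line)                     /* no digits at all */
        return 0;
    if (errno == ERANGE || value < CHAR_MIN || value > CHAR_MAX)
        return 0;                        /* would not fit in a char */
    *out = (char)value;                  /* safe: value is in range */
    return 1;
}
```

Because the range check uses CHAR_MIN and CHAR_MAX, it adapts to whether plain char is signed or unsigned on the target.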

correct use of malloc in function with passed uint32_t array pointer

I'm having difficulty using malloc in a function that does the following:
- reads a binary file with 4-byte unsigned integers
- frees the passed array reference
- mallocs it again with the new size
- tries to access members of the array

I think the problem is due to the uint32_t type, as the array seems to be treated as an array of 8-byte integers rather than 4-byte ones. I'm not exactly sure where I am going wrong. It might be the use of malloc (i.e. maybe I need to instruct it to create uint32_t types in a different way to what I am doing), or maybe it's something else. Code:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <malloc.h>
int filecheck (uint32_t **sched, int * count) {
int v, size = 0;
FILE *f = fopen("/home/pi/schedule/default", "rb");
if (f == NULL) {
return 0;
}
fseek(f, 0, SEEK_END);
size = ftell(f);
fseek(f, 0, SEEK_SET);
int schedsize = sizeof(uint32_t);
int elementcount = size / schedsize;
free(*sched);
*sched = malloc(size);
if (elementcount != fread(sched, schedsize, elementcount, f)) {
free(*sched);
return 0;
}
fclose(f);
// This works correctly and prints data as expected
for (v=0;v<elementcount;v++) {
printf("Method 1 %02d %u \n", v, ((uint32_t*)sched)[v]);
}
// This skips every other byte but does not print any numbers > 32 bit unsigned
for (v=0;v<elementcount;v++) {
printf("Method 2 %02d %u \n", v, sched[v]);
}
// This treats the binary file as if it was 64 bit uints, printing 64 bit numbers
for (v=0;v<elementcount;v++) {
printf("Method 3 %02d %lu \n", v, sched[v]);
}
*count = elementcount;
return 1;
}
int main (){
uint32_t *sched = NULL;
int i, count = 0;
if (filecheck(&sched, &count)) {
for (i=0;i<count;i++) { // At the next line I get a segmentation fault
printf("Method 4 %02d %u\n", i, sched[i]);
}
} else {
printf("Error\n");
}
return 0;
}
I added the file reading code as requested. Trying to access the array sched in main() the way I am doing will print the data I am reading as if it was 8 byte integers. So I guess sched is being seen as an array of 64 bit integers rather than 32 bit as I have defined it as. I guess this is because I freed it and used malloc on it again. But I believe I instructed malloc that the type it should create is a uint32_t so I am confused as to why the data is not being treated as such.
Edit: Found a way to make it work, but not sure if it's right (method 1). Is this the only and cleanest way to get this array treated as uint32_t type? Surely the compiler should know what type I am dealing with without me having to cast it every time I use it.
Edit: Actually when I try to access it in main, I get a seg fault. I added a comment in the code to reflect that. I didn't notice this before as I was using the for print loop in several places to trace how the data was being viewed by the compiler.
Edit: Not sure if there is a way I can add the binary data file. I am making it by hand in a hex editor and its exact content is not that important. In the bin file, FF FF FF FF FF FF FF FF should be read as 2 uint32_t values, rather than as 1 64-bit integer. As far as I know, fread does not care about this; it just fills the buffer. I stand to be corrected if that's wrong.
TIA, Pete

// This works correctly and prints data as expected
for (v=0;v<elementcount;v++) {
printf("Method 1 %02d %u \n", v, ((uint32_t*)sched)[v]);
}
This is not quite right, because sched is a uint32_t**, and you allocated to *sched. And also because %u is not how you should print uint32_t.
First dereference sched and then apply the array index access.
You want printf("%02d %" PRIu32 "\n", v, (*sched)[v]); with PRIu32 coming from <inttypes.h>.
// This skips every other byte but does not print any numbers > 32 bit unsigned
for (v=0;v<elementcount;v++) {
printf("Method 2 %02d %u \n", v, sched[v]);
}
Yeah, that's because sched[v] is a uint32_t*. Since it's a pointer type and you're probably running on a 64-bit machine, it's probably 64 bits. So you're iterating with 8-byte increments instead of 4-byte ones and trying to print pointers as %u. It also goes out of bounds: sched points to the single pointer variable sched in main(), so any sched[v] with v > 0 reads past it, which is likely to cause a segmentation fault.
// This treats the binary file as if it was 64 bit uints, printing 64 bit numbers
for (v=0;v<elementcount;v++) {
printf("Method 3 %02d %lu \n", v, sched[v]);
}
This is like the second example only you're printing with the marginally more appropriate but still incorrect %lu.
You need to fread() into *sched instead of sched, as well, or you will cause UB and this will very likely cause a segmentation fault when you try to read the array.
Also, make sure filecheck() sets *count on every successful path. If it didn't, count in main would still be zero and the loop there would never run (in the absence of UB, at least).
Other comments:
Don't cast the result of malloc() in C; a void * converts to any object pointer type implicitly.
main() should return a value. 0 if everything went fine.
You can write some uint32_t to a file as bytes using fwrite(), of course.
Check the result of malloc() to see if it succeeded.
Use size_t for size_t values instead of int, and print accordingly (%zu).
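Here is a minimal sketch of filecheck() with the fixes from this answer applied:
- fread() into *sched
- index as (*sched)[v]
- check malloc()
- print with PRIu32 and %zu
- always set *count

To keep it testable, it takes an already-open FILE * (the caller opens and closes it) instead of the hard-coded path. That change and the name load_u32_file are assumptions for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of filecheck(): read the whole stream f as an array of
   uint32_t. On success *sched points to a new array, *count holds its length,
   and 1 is returned. On failure (including an empty file) 0 is returned and
   *sched is NULL. */
int load_u32_file(FILE *f, uint32_t **sched, size_t *count)
{
    long size;
    size_t elementcount;

    free(*sched);                 /* free(NULL) is a harmless no-op */
    *sched = NULL;
    *count = 0;

    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0)
        return 0;

    elementcount = (size_t)size / sizeof(uint32_t);
    if (elementcount == 0)
        return 0;

    *sched = malloc(elementcount * sizeof **sched);   /* no cast needed in C */
    if (*sched == NULL)
        return 0;

    /* Read into *sched (the array), not sched (the caller's pointer variable). */
    if (fread(*sched, sizeof **sched, elementcount, f) != elementcount) {
        free(*sched);
        *sched = NULL;
        return 0;
    }

    for (size_t v = 0; v < elementcount; v++)   /* dereference first, then index */
        printf("%02zu %" PRIu32 "\n", v, (*sched)[v]);

    *count = elementcount;
    return 1;
}
```

In main(), the caller keeps uint32_t *sched = NULL, passes &sched, and then indexes sched[i] directly, since there it is already a uint32_t *.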

getc return value stored in a char variable

On this Wikipedia page there is a sample C program reading and printing first 5 bytes from a file:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char buffer[5] = {0}; /* initialized to zeroes */
int i;
FILE *fp = fopen("myfile", "rb");
if (fp == NULL) {
perror("Failed to open file \"myfile\"");
return EXIT_FAILURE;
}
for (i = 0; i < 5; i++) {
int rc = getc(fp);
if (rc == EOF) {
fputs("An error occurred while reading the file.\n", stderr);
return EXIT_FAILURE;
}
buffer[i] = rc;
}
fclose(fp);
printf("The bytes read were... %x %x %x %x %x\n", buffer[0], buffer[1], buffer[2], buffer[3], buffer[4]);
return EXIT_SUCCESS;
}
The part I don’t understand is that it uses the getc function, which returns an int, and stores its result in an array of chars. How is it possible to store ints in a char array?

Technically, C allows you to "shorten" a value by assigning it to a variable that is smaller than itself. The specification doesn't say exactly what happens when the value does not fit in a signed destination; the result is implementation-defined because of technicalities in some machines where slightly weird things happen. In practice, it simply acts as if the "upper" bits of the larger number have been "cut off". That holds on nearly all machines you are likely to use, unless you work on museum pieces or some very special hardware.
In this particular case, getc is specifically designed to return a byte value that fits in an unsigned char (0 to UCHAR_MAX), except when it returns EOF. EOF is a negative value, commonly -1. A plain char can often hold -1 too, but that is not guaranteed. char may be an unsigned type; the C and C++ standards allow char to be either signed or unsigned.

Check this out:
If the integer value returned by getc() is stored into a variable of
type char and then compared against the integer constant EOF, the
comparison may never succeed, because sign-extension of a variable of
type char on widening to integer is implementation-defined.

Yes, getc() returns an int. However, except for the special return value EOF, the returned value will always be within the range of an unsigned char (0 to 255 with 8-bit bytes).
Therefore, after checking for EOF, it is always safe to transfer the value to an unsigned char variable without data loss. Storing it in a plain char also works on common systems. If char is signed, though, values above CHAR_MAX go through an implementation-defined conversion.
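Here is a minimal sketch of the pattern both answers describe: keep getc()'s result in an int, compare it against EOF, and only then store it. Using unsigned char for the buffer, instead of the Wikipedia sample's char, avoids the implementation-defined conversion for bytes above 127. The name read_bytes is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: read up to n bytes from fp into buf.
   Returns the number of bytes actually stored, which is less than n
   if end of file or a read error occurred first. */
size_t read_bytes(FILE *fp, unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int rc = getc(fp);            /* int, so EOF stays distinct from byte 0xFF */
        if (rc == EOF)
            break;
        buf[i] = (unsigned char)rc;   /* rc is in 0..UCHAR_MAX here */
    }
    return i;
}
```

A caller can use ferror(fp) to tell a read error apart from a short file when the return value is less than n.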

Why is the output file corrupt / filled with garbage? (Expecting numbers)

The following line of code writes unsigned int values into a file but the content of the file is not readable.
struct rabin_polynomial
{
uint64_t start;
uint16_t length;
struct rabin_polynomial *next_polynomial;
};
fprintf(out_file, "%llu,%u",poly->start,poly->length);
If I display the output on the command line screen instead, it is readable.
The file "out_file" is not opened in binary mode.
Here is part of the content of the output file:
-ÍÍÍÍÍÍp\y";^æó r\ ÍÍÍÍ- ÍÍÍÍÍÍ
Øâˆ¿»Iðr\ ÍÍÍÍ- wÍÍÍÍÍÍ7OT-OØÚ‚\ ÍÍÍÍ¤* L ÍÍÍÍÍÍî›ùçÉç`‚\ ÍÍÍÍð3 ÍÍÍÍÍÍ
Øâˆ¿»I°‚\ ÍÍÍÍðC ÍÍÍÍÍÍíK¬è‹Ç{ ƒ\ ÍÍÍÍðS • ÍÍÍÍÍÍ-Ló3lJ–ÞPƒ\ ÍÍÍÍ…]
And here is the expected output:
0,2861
2861,4096
6957,3959
10916,2380
13296,4096
17392,4096

If you're not getting the textual values you expect, it's possibly down to the fact that you're using incorrect format specifiers (I'm assuming you've populated the variables you're trying to print here, though you may want to confirm this).
%llu is explicitly for unsigned long long int which is not necessarily the same width as uint64_t.
In C99, inttypes.h has macros for the format specifiers to be used with the exact-width and minimum-width (at-least-as-wide-as) integer types.
For example:
uint64_t xyzzy = 42;
printf ("Number is: %" PRIu64 "\n", xyzzy);
In this case, PRIu64 means the printf format specifier, unsigned decimal output, for a 64-bit exact width variable. There are a wide variety of others for varying output types, plus equivalents for the scanf family as well (starting with SCN).
Section 7.8.1 Macros for format specifiers of C99 lists them in detail.
Based on your update where you're not getting incorrect numbers but are rather getting what could only be described as rubbish, I would say your problems lie elsewhere. Even with corrupt pointers or data, I would not expect fprintf to generate non-numeric data for numeric format strings. It's certainly possible since it's undefined behaviour but very unlikely.
You could get that sort of output for strings but that's not the case here.
In other words, I think you have to look elsewhere in your code for (as an example) memory corruption issues.
One thing you could do to test if the problem lies in the line you think it does, is to change it to:
printf("DEBUG: %llu,%u\n",poly->start,poly->length);
fprintf(out_file, "%llu,%u",poly->start,poly->length);
and see what comes out on the terminal.
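Applying PRIu64 to the line from the question might look like the sketch below. It reuses the struct from the question and uses PRIu16 for the uint16_t length field. It also adds a newline after each record so the file has one record per line, as in the expected output. The helper name write_poly is hypothetical. This only fixes the format specifiers; if the garbage comes from memory corruption elsewhere, as suggested above, it will not help.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

struct rabin_polynomial
{
    uint64_t start;
    uint16_t length;
    struct rabin_polynomial *next_polynomial;
};

/* Hypothetical helper: write one "start,length" line. Returns the number of
   characters written, or a negative value on error (fprintf's own result). */
int write_poly(FILE *out_file, const struct rabin_polynomial *poly)
{
    return fprintf(out_file, "%" PRIu64 ",%" PRIu16 "\n",
                   poly->start, poly->length);
}
```

The matching read side would use "%" SCNu64 ",%" SCNu16 with fscanf().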

You may have to share your read and write parts of your code for us to help you. But looking at the content written into the out file, it looks like an issue with format specifiers in fprintf() and fscanf().
The following program may serve you as a reference.
#include <stdio.h>
#include <stdlib.h>
#define LEN 6
int main(void)
{
unsigned long long start[LEN] = {0, 2861, 6957, 10916, 13296, 17392};
unsigned short int length[LEN] = {2861, 4096, 3959, 2380, 4096, 4096};
int i;
unsigned long long s;
unsigned short int l;
FILE *out_file, *in_file;
if ((out_file = fopen("out_file", "w")) == NULL) {
printf("ERROR: unable to open out_file\n");
return -1;
}
for (i=0; i<LEN; i++)
fprintf(out_file, "%llu,%hu \n",start[i], length[i]);
fclose(out_file);
if ((in_file = fopen("out_file", "r")) == NULL) {
printf("ERROR: unable to open out_file\n");
return -1;
}
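for (i=0; i<LEN; i++) {
/* read from in_file (out_file is already closed) and stop if a line does not match */
if (fscanf(in_file, "%llu,%hu \n", &s, &l) != 2)
break;
printf("start = %llu - length = %hu \n", s, l);
}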
fclose(in_file);
return 0;
}
Please note that the format (2nd argument) in the fscanf() and fprintf() must match when reading and writing the lines respectively.

Please check the content of the file with a hex editor and compare it with the actual values of the poly structure.
You will see what the problem is.

how to convert hex string to unsigned 64bit (uint64_t) integer in a fast and safe way?

I tried
sscanf(str, "%016llX", &int64 );
but seems not safe. Is there a fast and safe way to do the type casting?
Thanks~

Don't bother with functions in the scanf family. They're nearly impossible to use robustly. Here's a general safe use of strtoull:
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
char *str, *end; /* str points to the NUL-terminated input string */
unsigned long long result;
errno = 0;
result = strtoull(str, &end, 16);
if (result == 0 && end == str) {
/* str was not a number */
} else if (result == ULLONG_MAX && errno) {
/* the value of str does not fit in unsigned long long */
} else if (*end) {
/* str began with a number but has junk left over at the end */
}
Note that strtoull accepts an optional 0x prefix on the string, as well as optional initial whitespace and a sign character (+ or -). If you want to reject these, you should perform a test before calling strtoull, for instance:
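/* isxdigit() is from <ctype.h>; the cast avoids undefined behaviour for negative char values */
if (!isxdigit((unsigned char)str[0]) || (str[1] && !isxdigit((unsigned char)str[1])))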
If you also wish to disallow overly long representations of numbers (leading zeros), you could check the following condition before calling strtoull:
if (str[0]=='0' && str[1])
One more thing to keep in mind is that "negative numbers" are not considered outside the range of conversion; instead, a prefix of - is treated the same as the unary negation operator in C applied to an unsigned value, so for example strtoull("-2", 0, 16) will return ULLONG_MAX-1 (without setting errno).
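Combining those checks, here is a minimal sketch of a strict parser. It accepts only 1 to 16 hex digits (no whitespace, sign, or 0x prefix) and stores the result in a uint64_t. The name parse_hex_u64 and the 16-digit limit are choices made for this example. Leading zeros are still allowed here. With at most 16 digits the value always fits in 64 bits, so the ERANGE check is only defensive.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse str as 1..16 hex digits with nothing else around
   them. Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_hex_u64(const char *str, uint64_t *out)
{
    size_t len = strlen(str);
    char *end;
    unsigned long long result;

    if (len == 0 || len > 16)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!isxdigit((unsigned char)str[i]))   /* rejects space, sign, "0x" */
            return 0;

    errno = 0;
    result = strtoull(str, &end, 16);
    if (errno == ERANGE || *end != '\0')
        return 0;
    *out = (uint64_t)result;
    return 1;
}
```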

Your title (at present) contradicts the code you provided. If you want to do what your title was originally (convert a string to an integer), then you can use this answer.
You could use the strtoull function, which unlike sscanf is a function specifically geared towards reading textual representations of numbers.
const char *test = "123456789abcdef0";
char *end;
errno = 0; /* errno is from <errno.h>, strtoull from <stdlib.h> */
unsigned long long result = strtoull(test, &end, 16);
if (end == test)
{
// not a valid number (no hex digits were found)
}
else if (errno == ERANGE)
{
// does not fit in an unsigned long long
}

At the time I wrote this answer, your title suggested you wanted to write a uint64_t into a string, while your code did the opposite (reading a hex string into a uint64_t). I answered "both ways":
The <inttypes.h> header has conversion macros to handle the ..._t types safely:
#include <stdio.h>
#include <inttypes.h>
sprintf( str, "%016" PRIx64, uint64 );
Or (if that is indeed what you're trying to do), the other way round:
#include <stdio.h>
#include <inttypes.h>
sscanf( str, "%" SCNx64, &uint64 );
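Note that you cannot enforce an exact or minimum width with the scanf() function family; a field width only sets the maximum number of characters to read. It parses what it gets, which can yield undesired results when the input does not follow the expected format. Also, <inttypes.h> only provides lowercase SCNx macros (there is no SCNX64). In a plain scanf() format, X is accepted and behaves the same as x.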
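Here is a minimal round-trip sketch of both directions. It uses snprintf() instead of sprintf() so the buffer size is checked. Because scanf() cannot enforce a minimum width, the parse side adds %n, a standard conversion that stores the number of characters consumed so far, to confirm that exactly 16 characters were read. The helper names are hypothetical. Note that %x still accepts leading whitespace, a sign, or 0x, so this is less strict than the strtoull() checks in the first answer.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format value as exactly 16 lowercase hex digits.
   buf needs room for at least 17 chars. Returns 1 on success, 0 otherwise. */
int u64_to_hex16(uint64_t value, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%016" PRIx64, value);
    return n == 16 && (size_t)n < size;
}

/* Hypothetical helper: parse a string produced by u64_to_hex16().
   The field width of 16 caps how many characters %x may consume, and %n
   reports how many were consumed, so shorter input or trailing characters
   are rejected. */
int hex16_to_u64(const char *str, uint64_t *out)
{
    uint64_t value;
    int consumed = 0;

    if (sscanf(str, "%16" SCNx64 "%n", &value, &consumed) != 1)
        return 0;
    if (consumed != 16 || str[consumed] != '\0')
        return 0;
    *out = value;
    return 1;
}
```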
