I am trying to write a C program that reads a binary file and converts it to a given data type. I am generating the binary file with a head command: head -c 40000 /dev/urandom > data40.bin. The program works for the data types int and char but fails for double. Here is the code for the program.
void double_funct(int readFrom, int writeTo){
    double buffer[150];
    int a = read(readFrom,buffer,sizeof(double));
    while(a!=0){
        int size = 1;
        int c=0;
        for(c=0;c<size;c++){
            char temp[100];
            int x = snprintf(temp,100,"%f ", buffer[c]);
            write(writeTo, temp, x);
        }
        a = read(readFrom,buffer,sizeof(double));
    }
}
and this is the char function that works
void char_funct(int readFrom, int writeTo){
    char buffer[150];
    int a = read(readFrom,buffer,sizeof(char));
    while(a!=0){
        int size = 1;
        int c=0;
        for(c=0;c<size;c++){
            char temp[100]=" ";
            snprintf(temp,100,"%d ", buffer[c]);
            write(writeTo, temp, strlen(temp));
        }
        a = read(readFrom,buffer,sizeof(char));
    }
}
The problem is that with char I need to get 40000 words from wc -w file, and I do. With double I should theoretically get 5000 words from 40000 bytes of data, but instead I get a varying amount, somewhere between 4000 and 15000; for char I get 40000 as expected, one byte per character.
I don't know what is wrong; the same code works for int, where I get 10000 words from 40000 bytes of data.
The main problem seems to be that your temp array is not large enough for your printf format and data. IEEE-754 doubles have a decimal exponent range from -308 to +308. You're printing your doubles with format "%f", which produces a plain decimal representation. Since no precision is specified, the default precision of 6 applies. This may require as many as 1 (sign) + 309 (digits) + 1 (decimal point) + 6 (trailing decimal places) + 1 (terminator) chars (a total of 318), but you only have space for 100.
You print to your buffer using snprintf(), and therefore do not overrun the array bounds there, but snprintf() returns the number of bytes that would have been required, less the one required for the terminator. That's the number of bytes you write(), and in many cases that does overrun your buffer. You see the result in your output.
Secondarily, you'll also see a large number of 0.00000 in your output, arising from rounding small numbers to 6-decimal-digit precision.
You would probably have better success if you change the format with which you're printing the numbers. For example, "%.16e " will give you output in exponential format with a total of 17 significant digits (one preceding the decimal point). That will not require excessive space in memory or on disk, and it will accurately convey all numbers, regardless of scale, supposing again that your doubles are represented per IEEE 754. If you wish, you can furthermore eliminate the (pretty safe) assumption of IEEE 754 format by employing the variation suggested by #chux in comments. That would be the safest approach.
One more thing: IEEE floating point supports infinities and multiple not-a-number values. These are very few in number relative to ordinary FP numbers, but it is still possible that you'll occasionally hit on one of these. They'll probably be converted to output just fine, but you may want to consider whether you need to deal specially with them.
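For illustration, here is a corrected sketch of double_funct along those lines (a sketch only, assuming POSIX read()/write() and IEEE-754 doubles; not the only possible fix). It prints with "%.16e", whose output has a small bounded width, and never writes more bytes than the buffer actually holds:

```c
#include <stdio.h>
#include <unistd.h>

/* Reads one double at a time and writes it in exponential notation.
 * "%.16e " needs at most about 26 characters, so temp cannot be
 * overrun, and the write length is clamped against truncation anyway. */
void double_funct(int readFrom, int writeTo)
{
    double value;
    while (read(readFrom, &value, sizeof value) == (ssize_t)sizeof value) {
        char temp[32];
        int x = snprintf(temp, sizeof temp, "%.16e ", value);
        if (x < 0)
            continue;                   /* encoding error: skip this value */
        if ((size_t)x >= sizeof temp)
            x = sizeof temp - 1;        /* clamp: never write past temp */
        write(writeTo, temp, (size_t)x);
    }
}
```

With 40000 bytes of input this emits exactly 5000 space-terminated tokens, one per 8-byte double.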
Implement a program to multiply two numbers, with the mention that the first can have a maximum of 2048 digits, and the second number is less than 100. HINT: multiplication can be done using repeated additions.
Up to a certain point, the program works using long double, but when working with larger numbers, only INF is displayed. Any ideas?
Implement a program to multiply two numbers, with the mention that the first can have a maximum of 2048 digits, and the second number is less than 100.
OK. The nature of multiplication is that if a number with N digits is multiplied by a number with M digits, then the result will have up to N+M digits. In other words, you need to handle a result that has up to 2148 digits.
A long double could be anything; it's implementation dependent. Most likely (on Windows, or on hardware other than 80x86) it's a synonym for double, but sometimes it is larger (e.g. the 80-bit format described on this Wikipedia page). The best you can realistically hope for is a dodgy estimate with lots of precision loss, not a correct result.
The worst case (and the most likely case) is that the exponent isn't big enough either. E.g. for double the (unbiased) exponent has to be in the range -1022 to +1023, so the largest representable value is only about 1.8 * 10^308; attempting to shove a 2048-digit number in there will cause an overflow (an infinity).
What you're actually being asked to do is implement a program that uses "big integers". The idea would be to store the numbers as arrays of integers, like uint32_t result[223]; (2148 decimal digits need about 7136 bits), so that you actually do have enough bits to get a correct result without precision loss or overflow problems.
With this in mind, you want a multiplication algorithm that can work with big integers. Note: I'd recommend something from the Wikipedia page on multiplication algorithms (its "Algorithms for multiplying by hand" section); there are faster/more advanced algorithms that are way too complicated for (what I assume is) a university assignment.
Also, the "HINT: multiplication can be done using repeated additions" is a red herring to distract you. It'd take literally days for a computer do the equivalent of a while(source2 != 0) { result += source1; source2--; } with large numbers.
Here's a few hints.
Multiplying a 2048-digit string by a 100-digit string might yield a string with as many as 2148 digits. That's too many digits for any primitive C type, so you'll have to do all the math the hard way, against "strings". Stay in string space, since your input will most likely be read in as a string anyway.
Let's say you are trying to multiply "123456" x "789".
That's equivalent to 123456 * (700 + 80 + 9),
which is equivalent to 123456 * 700 + 123456 * 80 + 123456 * 9.
Which is equivalent to doing these steps:
result1 = Multiply 123456 by 7 and add two zeros at the end
result2 = Multiply 123456 by 8 and add one zero at the end
result3 = Multiply 123456 by 9
final result = result1+result2+result3
So all you need is a handful of primitives that can take a digit string of arbitrary length and do some math operations on it.
You just need these three functions:
// Returns a new string that is identical to s but with a specific number of
// zeros added to the end.
// e.g. MultiplyByPowerOfTen("123", 3) returns "123000"
char* MultiplyByPowerOfTen(char* s, size_t zerosToAdd)
{
}

// Performs multiplication on the big integer represented by s
// by the specified digit.
// e.g. Multiply("12345", 2) returns "24690"
char* Multiply(char* s, int digit) // where digit is between 0 and 9
{
}

// Performs addition on the big integers represented by s1 and s2.
// e.g. Add("12345", "678") returns "13023"
char* Add(char* s1, char* s2)
{
}
Final hint. Any character at position i in your string can be converted to its integer equivalent like this:
int digit = s[i] - '0';
And any digit can be converted back to a printable char:
char c = '0' + digit;
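Putting the hints together, here is a compact sketch of the three primitives (a sketch only: it assumes well-formed, non-negative digit strings, does no error checking on malloc, and the caller must free each result):

```c
#include <stdlib.h>
#include <string.h>

/* Returns a new string equal to s with zerosToAdd zeros appended.
 * e.g. MultiplyByPowerOfTen("123", 3) returns "123000" */
char* MultiplyByPowerOfTen(const char* s, size_t zerosToAdd)
{
    size_t len = strlen(s);
    char* r = malloc(len + zerosToAdd + 1);
    memcpy(r, s, len);
    memset(r + len, '0', zerosToAdd);
    r[len + zerosToAdd] = '\0';
    return r;
}

/* Multiplies the digit string s by a single digit (0 to 9).
 * e.g. Multiply("12345", 2) returns "24690" */
char* Multiply(const char* s, int digit)
{
    size_t len = strlen(s);
    char* r = malloc(len + 2);          /* at most one extra digit */
    size_t j = len + 1;
    int carry = 0;

    r[j--] = '\0';
    for (size_t i = len; i-- > 0; ) {
        int p = (s[i] - '0') * digit + carry;
        r[j--] = (char)('0' + p % 10);
        carry = p / 10;
    }
    r[0] = (char)('0' + carry);
    if (carry == 0 && len > 0)          /* drop the unused leading zero */
        memmove(r, r + 1, len + 1);
    return r;
}

/* Adds two digit strings.
 * e.g. Add("12345", "678") returns "13023" */
char* Add(const char* s1, const char* s2)
{
    size_t l1 = strlen(s1), l2 = strlen(s2);
    size_t len = (l1 > l2 ? l1 : l2) + 1;   /* room for a final carry */
    char* r = malloc(len + 1);
    int carry = 0;

    r[len] = '\0';
    for (size_t k = len; k-- > 0; ) {
        int d = carry;
        if (l1 > 0) d += s1[--l1] - '0';
        if (l2 > 0) d += s2[--l2] - '0';
        r[k] = (char)('0' + d % 10);
        carry = d / 10;
    }
    if (r[0] == '0' && len > 1)             /* no final carry: drop the zero */
        memmove(r, r + 1, len);
    return r;
}
```

With these, "123456" x "789" is Add(Add(MultiplyByPowerOfTen(Multiply("123456", 7), 2), MultiplyByPowerOfTen(Multiply("123456", 8), 1)), Multiply("123456", 9)), which yields "97406784" (this one-liner leaks the intermediate strings; a real program would free them).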
I am using very basic code to convert a string into a long and into a double. The CAN library I am using requires a double as an input. I am attempting to send the device ID as a double to another device on the CAN network.
If I use an input string that is 6 bytes long, the long and double values are the same. If I add a 7th byte to the string, the values are slightly different.
I do not think I am hitting a max value limit. This code is run with ceedling for an automated test. The same behaviour is seen when sending this data across my CAN communications. In main.c the issue is not observed.
The test is:
void test_can_hal_get_spn_id(void){
    struct dbc_id_info ret;
    memset(&ret, NULL_TERMINATOR, sizeof(struct dbc_id_info));

    char expected_str[8] = "smg123";
    char out_str[8];
    memset(&out_str, 0, 8);

    uint64_t long_val = 0;
    double phys = 0.0;

    memcpy(&long_val, expected_str, 8);
    phys = long_val;

    printf("long %ld \n", long_val);
    printf("phys %f \n", phys);

    uint64_t temp = (uint64_t)phys;
    memcpy(&out_str, &temp, 8);

    printf("%s\n", expected_str);
    printf("%s\n", out_str);
}
With the input = "smg123"
[test_can_hal.c]
- "long 56290670243187 "
- "phys 56290670243187.000000 "
- "smg123"
- "smg123"
With the input "smg1234"
[test_can_hal.c]
- "long 14692989459197299 "
- "phys 14692989459197300.000000 "
- "smg1234"
- "tmg1234"
Is this error just due to how floats are handled and rounded? Is there a way to test for that? Am I doing something fundamentally wrong?
Representing the char array as a double without the intermediate long solved the issue. For clarity I am using DBCPPP. I am using it in C. I should clarify my CAN library comes from NXP, DBCPPP allows my application to read a DBC file and apply the data scales and factors to my raw CAN data. DBCPPP accepts doubles for all data being encoded and returns doubles for all data being decoded.
The CAN library I am using requires a double as an input.
That sounds surprising, but if so, then why are you involving a long as an intermediary between your string and double?
If I use an input string of that is 6 bytes long the long and double values are the same. If I add a 7th byte to the string the values are slightly different.
double is a floating point data type. To be able to represent values with a wide range of magnitudes, some of its bits are used to represent scale, and the rest to represent significant digits. A typical C implementation uses doubles with 53 bits of significand. It cannot exactly represent numbers with more than 53 significant binary digits. That's enough for 6 bytes, but not enough for 7.
I do not think I am hitting a max value limit.
Not a maximum value limit. A precision limit. A 64-bit long has smaller numeric range but more significant digits than an IEEE-754 double.
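That precision limit is easy to demonstrate (a sketch, assuming IEEE-754 doubles; survives_double is a hypothetical helper, not part of any CAN library):

```c
#include <stdint.h>

/* Round-trips a 64-bit integer value through double and reports whether
 * it survives unchanged.  Values that need more than 53 significant
 * bits (such as most 7-byte string payloads) will not. */
static int survives_double(uint64_t v)
{
    double d = (double)v;
    return (uint64_t)d == v;
}
```

With the two values from the question, 56290670243187 (the 6-byte "smg123") survives, while 14692989459197299 (the 7-byte "smg1234") does not: it rounds to 14692989459197300, one greater, which is exactly why one byte of the recovered string changed.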
So again, what role is the long supposed to be playing in your code? If the objective is to get eight bytes of arbitrary data into a double, then why not go directly there? Example:
char expected_str[8] = "smg1234";
char out_str[8] = {0};
double phys = 0.0;
memcpy(&phys, expected_str, 8);
printf("phys %.14e\n", phys);
memcpy(&out_str, &phys, 8);
printf("%s\n", expected_str);
printf("%s\n", out_str);
Do note, however, that there is some risk when (mis)using a double this way. It is possible for the data you put in to constitute a trap representation (a signaling NaN might be such a representation, for example). Handling such a value might cause a trap, or cause the data to be corrupted, or possibly produce other misbehavior. It is also possible to run into numeric issues similar to the one in your original code.
Possibly your library provides some relevant guarantees in that area. I would certainly hope so if doubles are really its sole data type for communication. Otherwise, you could consider using multiple doubles to convey data payloads larger than 53 bits, each of which you could consider loading via your original technique.
If you have a look at the IEEE-754 Wikipedia page, you'll see that the double precision values have a precision of "[a]pproximately 16 decimal digits". And that's roughly where your problem seems to appear.
Specifically, though it's a 64-bit type, it does not have the necessary encoding to provide 2^64 distinct floating point values. There are many bit patterns that map to the same value.
For example, NaN is encoded as an all-ones exponent field (11 bits) with a non-zero fraction (52 bits), regardless of the sign bit. That's 2 * (2^52 - 1) (over nine quadrillion) distinct bit patterns all representing NaN.
So, yes, your "due to how floats are handled and rounded" comment is correct.
In terms of fixing it, you'll either have to limit your strings to values that can be represented by doubles exactly, or find a way to send the strings across the CAN bus.
For example (if you can't send strings), two 32-bit integers could represent an 8-character string value with zero chance of information loss.
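That split is straightforward (a sketch; pack8/unpack8 are hypothetical helper names, assuming the payload is exactly 8 bytes):

```c
#include <stdint.h>
#include <string.h>

/* Splits an 8-byte payload into two 32-bit words.  Unlike a double,
 * the pair carries all 64 bits exactly, so nothing can be rounded. */
static void pack8(const char src[8], uint32_t *lo, uint32_t *hi)
{
    memcpy(lo, src, 4);
    memcpy(hi, src + 4, 4);
}

/* Reassembles the original 8 bytes from the two words. */
static void unpack8(char dst[8], uint32_t lo, uint32_t hi)
{
    memcpy(dst, &lo, 4);
    memcpy(dst + 4, &hi, 4);
}
```

Round-tripping "smg1234" through pack8/unpack8 returns it byte for byte, with no 53-bit precision cliff to fall off.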
#include <stdio.h>

struct real_num
{
    int int_num;
    int frac_num;
};

void main()
{
    struct real_num num1;
    printf("input the number : ");
    scanf("%d.%d", &num1.int_num, &num1.frac_num);
    printf("%d.%d", num1.int_num, num1.frac_num);
}
I input 12.012, but the buffer saves 12.12: I want 012, but it saves just 12.
What should I do? I want to save the 012, using only ints.
Numbers are a matter of arithmetic. 1, 01, 1.0, 1.000, 0x01, 1e0 all describe the same number: whichever representation you use has the same mathematical properties, and behaves identically in calculation (ignoring the matter of computer storage of numbers as int or float or double... which is again another matter entirely).
The representation of a number is a matter of sequences of characters, or strings. Representations of numbers can be formatted differently, and can be in different bases, but can't be calculated with directly by a computer. To store leading zeroes, you need a string, not an int.
You typically convert from number representation to number at input, and from number to number representation at output. You would achieve your stated desire by not converting from number representation to number at input, but leaving it as a string.
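A minimal sketch of that approach (read_real is a hypothetical helper name): convert the integer part to a number, but keep the fractional digits as text, so leading zeros survive.

```c
#include <stdio.h>

/* Parses text like "12.012": the integer part becomes an int, while the
 * fractional digits stay a string, preserving leading zeros.
 * Returns 1 on success, 0 on malformed input. */
int read_real(const char *input, int *ipart, char frac[16])
{
    return sscanf(input, "%d.%15[0-9]", ipart, frac) == 2;
}
```

After read_real("12.012", &i, f), i is 12 and f is the string "012", so printf("%d.%s", i, f) prints the value back exactly as it was entered.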
You don't want to store 012, you want to store 0.012.
The value 0.012 in binary is (approximately):
0.00000011000100100110111010010111b
..and the value 12.012 is (approximately):
1100.00000011000100100110111010010111b
Note that 0.012 is impossible to store precisely in binary because it would consume an infinite number of bits; in the same way that 1/3 can't be written precisely in decimal (0.333333333.....) because you'd need an infinite number of digits.
Let's look at 12.012. In hex it's this:
0x0000000C.03126E97
This makes it easier to see how the number would be stored in a pair of 32-bit integers. The integer part in one 32-bit integer, and the fractional part in another 32-bit integer.
The first problem is that you're using signed 32-bit integers, which means that one of the bits of the fraction is wasted for a sign bit. Essentially, you're using a "sign + 31 bit integer + wasted bit + 31 bit fraction" fixed point format. It'd be easier and better to use an unsigned integer for the fractional bits.
The second problem is that standard C functions don't support fixed point formats. This means that you either have to write your own "string to fixed point" and "fixed point to string" conversion routines, or you have use C's floating point conversion routines and write your own "floating point to fixed point" and "fixed point to floating point" conversion routines.
Note that the latter is harder (floating point is messy), slower, and less precise (double floating point format only supports 53 bits of precision while you can store 62 bits of precision).
A fraction does not consist of a single integer. A fraction consists of 2 integers: numerator/denominator.
Code needs to keep track of the width of the fraction input. "%n" can be used to record the offset reached during the scan.
#include <stdio.h>

struct real_number {
    int ipart;
    int num;
    int den_pow10;
};

int main(void) {
    struct real_number num1;
    printf("input the number : ");
    fflush(stdout);

    int n1 = 0;
    int n2 = 0;
    scanf("%d.%n%d%n", &num1.ipart, &n1, &num1.num, &n2);
    if (n2 == 0) {
        fprintf(stderr, "bad input\n");
        return -1;
    }
    num1.den_pow10 = n2 - n1;

    printf("%d.%0*d\n", num1.ipart, num1.den_pow10, num1.num);
    return 0;
}
Input/Output
input the number : 12.00056
Result 12.00056
I'm learning C from K&R (Second Edition) and am confused by one of the book's early examples. In section 1.5.2, the book first exhibits a character-counting program that looks like this:
#include <stdio.h>

/* count characters in input; 1st version */
main()
{
    long nc;

    nc = 0;
    while (getchar() != EOF)
        ++nc;
    printf("%ld\n", nc);
}
and then remarks:
It may be possible to cope with even bigger numbers by using a double
and exhibits this alternative version of the program:
#include <stdio.h>

/* count characters in input; 2nd version */
main()
{
    double nc;

    for (nc = 0; getchar() != EOF; ++nc)
        ;
    printf("%.0f\n", nc);
}
Does using a double here make any sense? It doesn't seem to; surely a long long would be superior, since it can store bigger integers than a double can (without loss of precision) in the same space and helps readability by conveying at declaration time that the variable is an integer.
Is there some justification for using a double here that I'm missing, or is the K&R example just plain bad code that's been shoehorned in to demonstrate the double type?
double vs. long
Is there any rational reason to use a double to store an integer when precision loss isn't acceptable?
[...] Does using a double here make any sense?
Even in C2011, type long may have as few as 31 value bits, so its range of representable values may be as small as -2^31 to 2^31 - 1 (supposing two's complement representation; slightly narrower with sign/magnitude representation).
C does not specify details of the representation of floating-point values, but IEEE-754 representation is near-universal nowadays. C doubles are almost always represented in IEEE-754 binary double precision format, which provides 53 bits of mantissa. That format can exactly represent all integers from -(2^53 - 1) to 2^53 - 1, and arithmetic involving those numbers will be performed exactly if it is performed according to IEEE specifications and if the mathematical result and all intermediate values are exactly representable integers (and sometimes even when not).
So using double instead of long could indeed yield a much greater numeric range without sacrificing precision.
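That cutoff is easy to check empirically (a sketch, assuming IEEE-754 doubles; counts_exactly_past is a made-up name):

```c
/* Returns 1 while x + 1.0 is still distinguishable from x, i.e. while
 * a double used as a counter at value x would still count correctly. */
static int counts_exactly_past(double x)
{
    return x + 1.0 != x;
}
```

counts_exactly_past(9007199254740991.0), i.e. 2^53 - 1, yields 1, but counts_exactly_past(9007199254740992.0), i.e. 2^53, yields 0: a double-based character counter would silently stop increasing there.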
double vs. long long
surely a long long would be superior [...]
long long has a larger range of (exactly) representable integer values than double, and therefore there is little reason to prefer double over long long for integers if the latter is available. However, as has been observed in comments, long long did not exist in 1978 when the first edition of K&R was published, and it was far from standard even in 1988 when the second edition was published. Therefore, long long was not among the alternatives Kernighan and Ritchie were considering. Indeed, although many C90 compilers eventually supported it as an extension, long long was not standardized until C99.
In any case, I'm inclined to think that the remark that confused you was not so much an endorsement of using double for the purpose, as a sidebar comment about the comparative range of double.
On old 32-bit computers, using "long long" was more expensive than using "double", because each 64-bit integer addition needed to be computed with 2 CPU instructions: "ADD" and "ADC". With "double", a single FPU addition is enough to increment the counter. And per the IEEE-754 standard, "double" has a precision of 53 bits (1 sign bit + 11 exponent bits + (52+1 implicit) mantissa bits), which is enough to represent exactly any integer in the range [-2^53, 2^53], inclusive.
On a 64-bit computer, long long is usually better, but there might still be situations where using "double" performs faster: e.g., if you have hyper-threading enabled, the FPU and the integer unit can be operated by different threads at the same time.
The approach, as such, is quite reasonable for the time of writing. If the largest integer type that is portably available is long, and it can only be relied on to be 32 bits wide, whereas the type double effectively gives you 53-bit integers, it is reasonable to use it.
The problem is that the program makes no checks for corner cases. If we plug the type long into it, it may overflow past LONG_MAX, triggering undefined behavior.
With the floating-point representation, another issue occurs: when the accumulated value becomes large enough, adding another 1.0 results in no change of value: the loop stops counting. This is actually less harmful than the long overflow; though it is still defective. It happens because beyond a certain range, the floating point type becomes too sparse to represent consecutive integers. The value nc + 1 is not exactly representable in the double type, and gets rounded.
The program can be improved by adding a check for this situation: a check that difference between the new and old value is not 1.0, as expected. In this case, the program can emit a diagnostic and terminate, so that it doesn't produce an incorrect count.
A way to improve the program would be to extend its range using two doubles, both initialized to 0.0. When one of the values reaches its counting limit, its value is accumulated into the other, and it is cleared back to 0.0. In this manner, the program can continue to count far beyond the exact-integer range of double, though providing only an approximation of the count (good to about DBL_DIG significant figures).
Another issue with the program is that it doesn't distinguish end-of-file from error. When getchar returns EOF, either condition can be true; the ferror stream accessor can be used to distinguish which. The program will count bytes until an I/O error occurs, and then report that count without mentioning the event.
Lastly, the program neglects to return an exit status from main, so that it has no clear termination status.
The following is a test version of the proposed program, in which the type of the batch and batch_new variables is altered to float.
This allows simple empirical testing of the concept, using reasonably small inputs.
#include <stdio.h>
#include <stdlib.h>
#include <float.h>

/* count characters in input; SO version */
int main(void)
{
    double total;
    float batch; /* change me to double */

    for (total = 0.0, batch = 0.0; getchar() != EOF; ) {
        float batch_new = batch + 1.0; /* me too */
        if (batch_new - batch != 1.0) {
            total += batch;
            batch = 1.0;
        } else {
            batch = batch_new;
        }
    }

    if (ferror(stdin)) {
        printf("I/O error on standard input\n");
        return EXIT_FAILURE;
    }

    if (total == 0.0)
        printf("%.0f (exact)\n", batch);
    else
        printf("%.*g (approx)\n", DBL_DIG, total + batch);

    return 0;
}
Runs:
~/test$ dd if=/dev/zero bs=1024 count=$((15 * 1024)) | ./count
15360+0 records in
15360+0 records out
15728640 bytes (16 MB) copied, 0.293093 s, 53.7 MB/s
15728640 (exact)
~/test$ (dd if=/dev/zero bs=1024 count=$((15 * 1024)) ; echo -n x) | ./count
15360+0 records in
15360+0 records out
15728640 bytes (16 MB) copied, 0.288816 s, 54.5 MB/s
15728641 (exact)
~/test$ (dd if=/dev/zero bs=1024 count=$((16 * 1024))) | ./count
16384+0 records in
16384+0 records out
16777216 bytes (17 MB) copied, 0.343045 s, 48.9 MB/s
16777216 (exact)
~/test$ (dd if=/dev/zero bs=1024 count=$((16 * 1024)) ; echo -n x) | ./count
16384+0 records in
16384+0 records out
16777216 bytes (17 MB) copied, 0.304446 s, 55.1 MB/s
16777217 (approx)
~/test$ (dd if=/dev/zero bs=1024 count=$((16 * 1024)) ; echo -n xx) | ./count
16384+0 records in
16384+0 records out
16777216 bytes (17 MB) copied, 0.300321 s, 55.9 MB/s
16777218 (approx)
(The approx values are still exact because we are nowhere near the limit of the double type.)
I'm converting a double to a char string:
char txt[10];
double num;

num = 45.344322345;
sprintf(txt, "%.1f", num);

and using ".1f" to truncate to one decimal place, the tenths digit.
i.e. txt contains 45.3
I usually use precision in sprintf to ensure the char buffer is not overflowed.
How can I do that here, while also truncating the decimal, without using snprintf?
(i.e. if num = 345694876345.3 for some reason)
Thanks
EDIT If num is > buffer the result no longer matters, just do not want to crash. Not sure what would make the most sense in that case.
EDIT2 I should have made it more clear than in just the tag, that this is a C program.
I am having issues using snprintf in a C program. I don't want to add any 3rd party libraries.
Use snprintf(), which will tell you how many bytes would have been printed. In general, you should size your array to be large enough to handle the longest possible string representation of the value being printed. If that's not known in advance, use malloc() (or asprintf(), which is non-standard, but present on many platforms).
Edit
snprintf() will fail gracefully if the formatted output exceeds the given buffer; it truncates rather than overflowing. If you don't need to handle that case, then simply using it will solve your problem. I can't think of an instance where you would not want to handle that, but then again, I'm not working on whatever you are working on :)
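The usual pattern is to call snprintf() once with a NULL buffer to measure the output, then again to fill an exactly-sized allocation. A sketch (assuming C99 snprintf semantics; format_double is a hypothetical helper name):

```c
#include <stdio.h>
#include <stdlib.h>

/* Formats num as "%.1f" into a freshly allocated string of exactly the
 * required size.  Returns NULL on failure; the caller frees the result. */
char *format_double(double num)
{
    int needed = snprintf(NULL, 0, "%.1f", num);   /* measure only */
    if (needed < 0)
        return NULL;

    char *buf = malloc((size_t)needed + 1);
    if (buf != NULL)
        snprintf(buf, (size_t)needed + 1, "%.1f", num);
    return buf;
}
```

This works for any magnitude, including the 345694876345.3 example, without guessing a buffer size up front.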
Why not just make your buffer big enough to hold the largest possible string representation of a double?
Assuming a 64-bit double using the IEEE standard for floating point arithmetic, which uses 52 bits for a mantissa: 2^52 = 4,503,599,627,370,496. So we need 16 characters to hold all the digits before and after the decimal point; 19 counting the decimal point, sign character and null terminator.
I would just use a buffer size of at least 20 characters and move on.
If you need to print a double using scientific notation, you will need to add enough space for the exponent. Assuming an 11-bit signed exponent, that's another 4 characters for the exponent plus a sign for the exponent and the letter 'E'. I would just go with 30 characters in that case.
If you absolutely must do it on your own, count the digits in the number before trying to convert:
int whole = num;
int wholeDigits = 0;
do {
    ++wholeDigits;
} while (whole /= 10);

double fraction = num - (int) num;
int decimalDigits = 0;
while (fraction > 0) {
    ++decimalDigits;
    fraction *= 10;
    fraction = fraction - (int) fraction;
}

int totalLength = decimalDigits ? wholeDigits + decimalDigits + 1 : wholeDigits;
You should probably verify that this ad-hoc code works as advertised before relying on it to guard against crashes. I recommend that you use snprintf or something similar instead of my code, as others have said.
Why do you want to do it without snprintf? You should be using snprintf regardless of whether your format string contains a double, another string or anything else, really. As far as I can see, there's no reason not to.