Please help me to set output of BigDecimal. For example:
I need output of numbers wich length is less than 8 digits like this: "12345678", and if more then: "1,2345Е+9".
Methods
toPlainString();
toEngineeringString();
toString();
are working not as expected:
public class TestBigDecimalOutput {
public static void main(String[] args) {
BigDecimal d = new BigDecimal("57657657657453646587980887663654676580.24545346565476767645");
String outPlainString = d.toPlainString();
String outEngineeringString = d.toEngineeringString();
String outString = d.toString();
System.out.println(outPlainString);
System.out.println(outEngineeringString);
System.out.println(outString);
}
}
Code below makes next output:
57657657657453646587980887663654676580.24545346565476767645 57657657657453646587980887663654676580.24545346565476767645 57657657657453646587980887663654676580.24545346565476767645
Give me a hint what am I doing wrong?
To the best of my knowledge (and I kind of hope I'm wrong) you can't make a BigDecimal render in a particular way. There's no way (that I know) to make it use scientific or engineering notation, or to control when it does. Engineering notation will only used with the number "needs an exponent," and the circumstances in which it will need an exponent are explained in the documentation for BigInteger.toString(). I find the explanation hard to follow, frankly, and I've never been able to figure out how to make the behaviour follow a useful pattern in an application.
I've found that it's easier just not to try, and to format the number myself. That means extracting the mantissa and exponent in base 10, and then concatenating as strings with an "E" between the components. The mantissa part will need to be rounded, probably. If you know the exponent (in base 10) you can work out from its magnitude whether it exceeds the threshold you want, for using scientific notation. If it doesn't, you can just use BigDecimal.toPlainString(), perhaps with suitable rounding.
Engineering notation can be handled in the same way, but with the exponent rounded to the nearest multiple of three, and the mantissa adjusted to compensate.
There's a useful library 1 that can do the necessary logarithm and power operations, or see this question 2 for some suggestions.
I have some code that does this stuff, which I'm quite happy to share, but it's way to long to include, and I can't promise that it's production-quality, or even readable.
Related
How to check a given napi_value of type napi_number is an integer or decimal (a number with fractional value) by using node.js native N-API function .
Look like there is no isInt() or isDouble() equivalent function in N-API (we don’t want to use V8 function call either). Let us consider a scenario where we are calling a native addon function f1() from JavaScript by passing a JavaScript object as argument as shown in the snippet.
let obj = { n1: 123, n2: 123.45 };
myaddon.f1( obj );
The native function f1() want to extract value associated with the keys n1 and n2 by calling the best fit value extraction N-API function. For example to extract value of n1 it may be best to use one of napi_get_value_int* and similarly for the n2 the double is a better choice.
napi_get_value_double
napi_get_value_int32
napi_get_value_uint32
napi_get_value_int64
Unfortunately I could not find any N-API function to verify the derivative of napi_number property. Have you come across similar situation, if so how did you solve this problem?
https://nodejs.org/api/n-api.html
A request has been opened with node-addon-api team regarding this feature and they provided an answer that sound logical. I thought of sharing the answer with this community that may help similar queries others may have. Here is the answer from node-addon-api team
While handling numbers with JavaScript, it is important to know
that all numbers in JavaScript are double-precision 64 bit IEEE
754 values (despite some engines like v8 might have additional
represents on small integers, there is no such definition in ECMA spec
and no way to determine these types of number in JavaScript).
napi_get_value_{double,int32,uint32,int64} just convert these value to
its desired one. There might be a precision loss in the conversion. If
a determined number is required in the case, use BigInt instead.
EDIT
SOLVED
Solution was to use the long double versions of sin & cos: sinl & cosl.
It is my first post here, so bear with me :).
I come today here to ask for your input on a small problem that I am having with a C application at work. Basically, I am computing an Extended Kalman Filter and one of my formulas (that I store in a variable) has multiple computations of sin and cos, at least 16 in total in the same line. I want to decrease the time it takes for the computation to be done, so the idea is to compute each cos and sin separately, store them in a variable, and then replace the variables back in the formula.
So I did this:
const ComputationType sin_Roll = compute_sin((ComputationType)(Roll));
const ComputationType sin_Pitch = compute_sin((ComputationType)(Pitch));
const ComputationType cos_Pitch = compute_cos((ComputationType)(Pitch));
const ComputationType cos_Roll = compute_cos((ComputationType)(Roll));
Where ComputationType is a macro definition (renaming) of the type Double. I know it looks ugly, a lot of maybe unnecessary castings, but this code is generated in Python and it was specifically designed so....
Also, compute_cos and compute_sin are defined as such:
#define compute_sin(a) sinf(a)
#define compute_cos(a) cosf(a)
My problem is the value I get from the "optimized" formula is different from the value of the original one.
I will post the code of both and I apologise in advance because it is very ugly and hard to follow but the main points where cos and sin have been replaced can be seen. This is my task, to clean it up and optimize it, but I am doing it step by step to make sure I don't introduce a bug.
So, the new value is:
ComputationType newValue = (ComputationType)(((((((ComputationType)-1.0))*(sin_Pitch))+((DT)*((((Dg_y)+((((ComputationType)-1.0))*(Gy)))*(cos_Pitch)*(cos_Roll))+(((Gz)+((((ComputationType)-1.0))*(Dg_z)))*(cos_Pitch)*(sin_Roll)))))*(cos_Pitch)*(cos_Roll))+((((DT)*((((Dg_y)+((((ComputationType)-1.0))*(Gy)))*(cos_Roll)*(sin_Pitch))+(((Gz)+((((ComputationType)-1.0))*(Dg_z)))*(sin_Pitch)*(sin_Roll))))+(cos_Pitch))*(cos_Roll)*(sin_Pitch))+((((ComputationType)-1.0))*(DT)*((((Gz)+((((ComputationType)-1.0))*(Dg_z)))*(cos_Roll))+((((ComputationType)-1.0))*((Dg_y)+((((ComputationType)-1.0))*(Gy)))*(sin_Roll)))*(sin_Roll)));
And the original is:
ComputationType originalValue = (ComputationType)(((((((ComputationType)-1.0))*(compute_sin((ComputationType)(Pitch))))+((DT)*((((Dg_y)+((((ComputationType)-1.0))*(Gy)))*(compute_cos((ComputationType)(Pitch)))*(compute_cos((ComputationType)(Roll))))+(((Gz)+((((ComputationType)-1.0))*(Dg_z)))*(compute_cos((ComputationType)(Pitch)))*(compute_sin((ComputationType)(Roll)))))))*(compute_cos((ComputationType)(Pitch)))*(compute_cos((ComputationType)(Roll))))+((((DT)*((((Dg_y)+((((ComputationType)-1.0))*(Gy)))*(compute_cos((ComputationType)(Roll)))*(compute_sin((ComputationType)(Pitch))))+(((Gz)+((((ComputationType)-1.0))*(Dg_z)))*(compute_sin((ComputationType)(Pitch)))*(compute_sin((ComputationType)(Roll))))))+(compute_cos((ComputationType)(Pitch))))*(compute_cos((ComputationType)(Roll)))*(compute_sin((ComputationType)(Pitch))))+((((ComputationType)-1.0))*(DT)*((((Gz)+((((ComputationType)-1.0))*(Dg_z)))*(compute_cos((ComputationType)(Roll))))+((((ComputationType)-1.0))*((Dg_y)+((((ComputationType)-1.0))*(Gy)))*(compute_sin((ComputationType)(Roll)))))*(compute_sin((ComputationType)(Roll)))));
What I want is to get the same value as in the original formula. To compare them I use memcmp.
Any help is welcome. I kindly thank you in advance :).
EDIT
I will post also the values that I get.
New value : -1.2214615708217025e-005
Original value : -1.2214615708215651e-005
They are similar up to a point, but the application is safety critical and it is necessary to validate the results.
You can not meet your expectation for a couple of reasons.
By altering the code you adjust the machine instructions being used in subtle ways that will impact the final value.
For instance if originally it was using fused multiplies and adds and this is no longer happening it will change the result.
You don't mention the target architecture. Some architectures retain more than 64bits in the floating point pipeline. These extra bits get rounded when forced into 64bit memory. Again altering how this works will have minor effects on the final output.
I am very new to the HM HEVC (and the JEM) reference software, and I am currently trying to understand the source code. I want to add some lines to display for each component: name of Algo (i.e. inter/intra Algos) + length of the bitstream+ position in output bin file.
To know which component cost more bits to code and how codec is working. I want to do same thing for the JEM also after that.
my problem first is that I am unable of understanding a lot of function there, the comment is not sufficient, so is there any references to understand the code??!! (I already read the Manuel ,doesn’t help).
2nd I don’t know where & how exactly to add these lines; is it in TEncGOP, TEncSlice or TEncCU. Ps: I don’t think in TEncGOP.compressGOP so maybe in the 2 other classes.
(I put the answer to comment that #Mourad put four hours ago here, becuase it will be long)
I assume that you could manage to find where the actual encoding after the RDO loop is implemented. As you correctly mentioned, xEncodeCU is the function you need to refer to make sure you are not still in the RDO.
Now you need to find the exact function in xEncodeCU that is responsible for your target codec tool.
For instance, if you want to count the number of bits for coefficient coding, you should be looking into the m_pcEntropyCoder->encodeCoeff() (It's a JEM function and may have different name in the HM). Once you find this line in the xEncodeCU, you may do this and get the number of bits written inside encodeCoeff() function:
UInt b_before = m_pcEntropyCoder->getNumberOfWrittenBits();
m_pcEntropyCoder->encodeCoeff( ... );
UInt b_after = m_pcEntropyCoder->getNumberOfWrittenBits();
UInt writtenBitsCoeff = b_after - b_before;
One important point: as you cas see, the function getNumberOfWrittenBits() gives you integer rates, which is obtained by rounding sum of fractional rates corresponding to all syntax elements coded inside the function encodeCoeff. This error might or might not be acceptable, depending on your problem. For example, if instead of coefficient coding rate, you wanted to know the rate of CBF, then this error would not be acceptable at all. Because, CBF rate is mostly less than one bit. If this is your case, then you would need to calculate the fractional bits one-by-one. It would be totally different and relatively more complicated than this.
Point 1: There is one rule of tumb that logging coding decisions (e.g. pred mode, MV, IPM, block size) is much easier at the decoder side than encoder. This is because of the fact that you have super complicated RDO process at the encoder side that can easily make you get lost in the loops. But at the decoder side, everything appears only once. However, if you insist on doing it at the encoder side, you may find some tips here: Get some information from HEVC reference software
Point 2: Unlike coding decisions, logging rate (i.e. number of written bits for different syntax elements) is more complicated at the decoder side than encoder. This is particularly true for fractional bits associated to anything that is encoded in non-EP mode (i.e. with CABAC contexts). So you may do this part at the ecoder side. But I am afraid it is not easy.
Point 3: I think the best way to understand the code is to read it line-by-line. It's very time-consuming but if you theoritically know the standard(s), you will probably be able to distiguish important parts and ignore the rest.
PS: I think there are too many questions, mostly too general, in your post. It makes it a bit difficult for me to answer them all together. So you I'll wait for you to take your next step and ask more precise questions.
I'm trying to create a Gauss Eliminator in C. For this, from time to time, I need to check whether a matrix is numerically singular: if a certain number (double) is very very very small.
My problem is, that if I try to do this:
if(0 == matrix->items[from]){
fprintf(stderr,"Matrix is still singular after attempting pivot. Exitig.\n");
}
This will never yield true. Because of the inaccuracy of double, it will never be exactly 0. However, when trying to run the program, cases like this fill up the numbers with inf or NaN, depending on whether multiplying or dividing with it and its combinations.
In order to filter these, I would need something like this:
#define EPSILON very_small
// rest of the code
if(matrix->items[from] < EPSILON){
...singular
}
What is the recommended value for this EPSILON? Is it the absolute accuracy of double, or maybe a bit larger value?
By the way, which would be better, defining it as a macro as above, or using it like:
const double EPSILON = ...;
Sorry if I'm not being clear enough, English is not my native language.
Thanks for your replies.
I need to check whether a matrix is numerically singular
Usually this is detected by preventing double overflow.
// Check if 1.0/determinant will overflow.
if (fabs(determinant) <= 1.0/(0.99*DBL_MAX)) {
Handle_Singular_Case()
} else {
one_over_det = 1.0/determinant;
}
Using DBL_EPSILON (example: 2e-16) is usually the wrong solution. double math needs relative comparisons to insure good calculations far way from 1.0 magnitude.
// Rarely the right thing to do.
#define EPSILON DBL_EPSILON
if(fabs(matrix->items[from]) < EPSILON){
Yet this is very context sensitive #Weather Vane.
Yet OP's real problem is certainly here: "when trying to run the program, cases like this fill up the numbers with inf or NaN, depending on whether multiplying or dividing with it and its combinations.". Various techniques can be used to avoid this issue like doing elimination with partial pivoting.
To address that issue, best to post code and sample data.
I'm trying to check my floating point operations in c99.
Should I be doing all of my operations inside of isnormal()? Does this code make sense?
double dTest1 = 0.0;
double dTest2 = 0.0;
double dOutput = 0.0;
dTest1 = 5.0;
dTest2 = 10.3;
dOutput = dTest1 * dTest2;
//add some logic based on output
isnormal(dOutput);
Your use of isnormal does not look like anything idiomatic. I am not sure what you expect exactly from using isnormal this way (it's obviously going to be true for 5.0*10.3, I would expect the compiler to optimize it so), but here are at least some obvious problems assuming you use it for other computations:
Zero is not normal, so you shouldn't use isnormal as a sanity check for a result that can be zero.
isnormal will not tell you if your computation came so close to zero that it lost precision (the subnormal range) and went back into the normal range later.
You might be better served by FPU exceptions: there is one for each possible event for which you might want to know if it happened since you initiated your computations, and the way to use them is sketched out in this existing answer.