Increment decimal position in Solr

Making a field the "float" fieldtype converts the source value 50 to 50.0 in the response.
Is there any way I could make it 50.00 through some tweak in the schema?

This responsibility is better handled in your display layer than in the response from Solr (and as far as I know, you can't define a rounding operation to be performed when a value is retrieved). Floating-point values are inherently inexact, so a number other than 50.0 could end up as 48.99999999999997, etc. The available precision also changes depending on how large your numbers are.
Perform rounding in the display layer before displaying any values to the user.
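For example, if the consuming client happens to be Java, a minimal sketch of display-layer formatting could look like this (the variable names and the sample value are just placeholders, not anything Solr-specific):
import java.util.Locale;

public class DisplayRounding {
    public static void main(String[] args) {
        double raw = 50.0;  // value as returned by Solr for the float field
        String display = String.format(Locale.ROOT, "%.2f", raw);  // round/pad to two decimals
        System.out.println(display);  // prints 50.00
    }
}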

Related

How to round Double's to exact bit-representation?

I'm doing geography calculations, and ultimately end up with a latitude and longitude to store in a Geography::Point object.
Both latitude and longitude can have 7 digits at most (which also gives precision up to 11 mm, which is plenty).
The problem is: if the value of a field cannot be stored exactly in a Double, MS SQL rounds it to the nearest value that can be, which shows up as a bunch of extra digits.
=> e.g. 5.9395772 is stored as 5.9395771999999996
The problem this creates is that [Position].ToString() then exceeds the maximum number of characters allowed for that column (and no, I can't increase that limit).
Since we're dealing with Latitude, Longitude, Altitude and Accuracy, there's space for exactly 11 characters for Latitude and Longitude each:
String.Format(CultureInfo.InvariantCulture, "{0:##0.0######}", num)
I've tried simply Math.Round()ing to 6 digits, but then other numbers run into the same problem (e.g. 6.098163 becomes 6.0981629999999996).
How do I Math.Round towards the nearest 7-digit valid bit representation?
EDIT/ADD
Public Function ToString_LatLon(ByVal num As Double) As String
num = Math.Round(num, 7, MidpointRounding.AwayFromZero)
Return String.Format(CultureInfo.InvariantCulture, "{0:##0.0######}", num)
End Function 'IN = 5.9395772, OUT = 5.9395772
The above code receives a Double and correctly returns the String representation; I've checked it, and it is correct even for the troubling numbers.
It's stored in SQL Server through the framework we use. I think the problem occurs when storing the value.
When I retrieve the value, I get an error in VB, saying the value is wider than the framework allows (max of 50 characters).
If I run a query in SSMS, I find e.g. POINT (X.0981629999999996 XX.664725 NULL 15602.707) (51 characters, anonymized).
EDIT 2
I've done some more research and some calculations. It seems that the stored value 5.9395772 is converted to binary and comes back as 5.9395771999999996; it is stored as a double inside the database (in a binary Geography::Point object, not to worry). Converting the binary 0 10000000001 0111110000100010000010000110100010000100010011011101 back to decimal gives 5.93957719999999955717839839053340256214141845703125, abbreviated at 16 decimals - whereas I would like it abbreviated at 7 decimals.
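As a side note, the bit pattern and its exact decimal expansion can be inspected in a few lines of Java if you want to verify this yourself (my own sketch, not part of the original post):
import java.math.BigDecimal;

public class InspectDouble {
    public static void main(String[] args) {
        double d = 5.9395772;
        // IEEE-754 bit pattern of the double (the leading 0 sign bit is dropped by toBinaryString)
        System.out.println(Long.toBinaryString(Double.doubleToLongBits(d)));
        // Exact decimal value of that bit pattern
        System.out.println(new BigDecimal(d).toPlainString());
    }
}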
Solutions:
1. Round the value down/up to the nearest value where everything from the 8th decimal onward is 0 (or enough zeroes before another nonzero digit is found).
2. Query for only so many decimals.
3. Query the actual (hexadecimal) value, and convert that (instead of the string representation).
4. Keep the string representation, but round the values before storing and after retrieving to the required number of decimals.
Discussions:
1. Both in the office and here (see #RobertBaron's answer): this is quite tricky, could cost a lot of precision, and is basically a lot of work.
2. Perhaps this is possible; I don't know.
3. This would be the cleanest solution, as my colleagues and I agree, but it is a lot of work to develop and test.
4. Instead of requiring the value in memory to be equal to the value in the database, we simply don't care (too much) what the value in the database is.
In the end, after quite a few whiteboard bit-calculations and a lengthy discussion, we went with option 4. After we retrieve the [Position].ToString() (for which we've increased the string limit) from the database, we convert it as we were already doing, and as an additional step we round the value to the required number of decimals before using it anywhere. When writing the value back to the database, we once again round it to that number of decimals and don't care what the database really does with it.
Essentially, this is option 2, but on the program side instead of the database side.
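For readers working in Java rather than VB.NET, a rough equivalent of the rounding helper above plus the extra rounding step might look like this (the class and method names are illustrative, not from the project):
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class LatLonRounding {
    // Round to 7 decimals (half away from zero), then format with at most 7 fraction digits.
    static String toLatLonString(double num) {
        BigDecimal rounded = BigDecimal.valueOf(num).setScale(7, RoundingMode.HALF_UP);
        DecimalFormat fmt = new DecimalFormat("##0.0######",
                DecimalFormatSymbols.getInstance(Locale.ROOT));
        return fmt.format(rounded);
    }

    public static void main(String[] args) {
        System.out.println(toLatLonString(5.9395771999999996)); // 5.9395772
        System.out.println(toLatLonString(6.0981629999999996)); // 6.098163
    }
}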
This is only a partial answer.
If by valid bit representation you mean exact bit representation, then this is possible. The decimal numbers that have an exact bit representation are 1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8, 1/16, 3/16, ...
The challenge is to characterize, among these dyadic fractions, those whose base-10 representation has 7 digits or fewer, and then to round any base-10 number to the closest of these numbers.
I am posting this in the hope that it may get you one step further toward a solution.
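To make the idea concrete, here is a small Java check (my own sketch) that tests whether a decimal string is one of those exactly representable values, i.e. whether it survives the round trip through a double unchanged:
import java.math.BigDecimal;

public class ExactRepresentation {
    // True if the decimal text is represented exactly by the nearest double.
    static boolean isExactInDouble(String decimalText) {
        double nearest = Double.parseDouble(decimalText);
        return new BigDecimal(decimalText).compareTo(new BigDecimal(nearest)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isExactInDouble("0.5"));       // true  (1/2)
        System.out.println(isExactInDouble("0.6875"));    // true  (11/16)
        System.out.println(isExactInDouble("5.9395772")); // false
    }
}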
If you cannot change the data type into a DECIMAL for whatever reasons, you have to cast it into a DECIMAL every time you need the value. It's that simple. And you can either do it on the SQL Server side or in VB.NET, but you need a DECIMAL. DOUBLEs are imprecise.
By the way, it is not SQL Server that rounds towards the nearest number it recognizes by adding a bunch of digits - it's the processor that does it. That's also why you may get slightly different DOUBLE values after restoring your database on another server.
And never ever even think of using them as an ID: I know an application that uses FLOAT values containing a timestamp (<creation day since whatever>.<time as fraction of the day>) as part of the primary key (of nearly every table!). Every 10,000th record or so cannot be addressed directly by its ID, because the value differs by some nanoseconds between the client that sends the query and the server, although the number looks exactly the same in SSMS on both.

Real to Float conversion with no loss of data

I had a table with two columns in which coordinates were stored. These columns were of the REAL datatype, and I noticed that my application was only showing 5 decimals for the coordinates, and positions were not accurate enough.
I decided to change the datatype to FLOAT so I could use more decimals. To my pleasant surprise, when I changed the column data type, the extra decimals suddenly appeared without me having to store all the coordinates again.
Can anyone tell me why this happens? What happens with the decimal precision of the REAL datatype? Isn't the data rounded and truncated when inserted? Why, when I changed the datatype, did the precision come up with no loss of data?
You want to use a Decimal data-type.
Floating-point values are calculated from a value (the mantissa) and an exponent. This allows you to store huge number representations in small amounts of memory. It also means that you don't always get exactly the number you're looking for, just something very, very close. This is why, when you compare floating-point values, you compare them within a certain tolerance.
To my pleasant surprise, when I changed the column data type, the extra decimals suddenly appeared without me having to store all the coordinates again.
Be careful: this doesn't mean that the value that was filled in is the accurate value you're looking for. If you truncated your original calculation, you need to get those numbers again without cutting off any precision. The decimals that appear when you convert from REAL to FLOAT aren't the rest of what you truncated; they are new digits that come from representing the same stored binary value with more precision, not recovered data.
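The effect is easy to reproduce outside SQL Server: widening a single-precision value to double precision (roughly what REAL to FLOAT does) just exposes digits of the stored binary value that were never part of your input. A quick Java illustration, not tied to your table:
public class WidenReal {
    public static void main(String[] args) {
        float real = 5.9395772f;   // roughly what a REAL column can hold (~7 significant digits)
        double widened = real;     // roughly what you see after changing the column to FLOAT
        System.out.println(real);     // prints the short ~7-digit form
        System.out.println(widened);  // prints extra digits that are not recovered data
    }
}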
Here is a good thread that explains the difference in data-types in SQL:
Difference between numeric, float and decimal in SQL Server
Another helpful link:
Bad habits to kick : choosing the wrong data type

ELKI how to increase the precision?

I am using the ELKI MiniGUI for clustering my data points. I have some 1300 GPS data points which I would like to cluster (DBSCAN and OPTICS). As the input file for dbc.in I am using a csv file with only 2 columns (X,Y). The problem is, my X,Y (projected) coordinates are very precise, up to 6 decimal places, but after running the clustering algorithm I am getting lower precision (up to 3 decimal places). How can I increase the precision of the output points?
Also, when it generates the clusters, it automatically assigns some virtual IDs which do not correspond to my actual point IDs (ID, X, Y). However, the ID is not given in the input csv; it comprises only the two columns (X,Y).
ELKI relies on double for representing numbers. If you need a higher precision, you will have to implement your own parser and output modules (it's easy though, as we have a highly modular architecture).
Default output serialization to text is handled by Java. Precision is therefore what you get from Java by default. This should be 15-16 digits of precision, if you are using DoubleVector, and 7-8 digits if you are using FloatVector.
A quick check with groovysh:
new DoubleVector([12345.678901234567890, 3456.109453] as double[]);
===> 12345.678901234567 3456.109453
new FloatVector([12345.678901234567890, 3456.109453] as float[]);
===> 12345.679 3456.1094
yields only the loss to be expected from double and float precision.
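You can see the same default precision in plain Java, without any ELKI classes (same numbers as above; a small illustrative snippet):
public class DefaultPrecision {
    public static void main(String[] args) {
        double d = 12345.678901234567890;
        float f = 12345.678901234567890f;
        System.out.println(Double.toString(d)); // 12345.678901234567 (15-16 significant digits)
        System.out.println(Float.toString(f));  // 12345.679 (7-8 significant digits)
    }
}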
The best way to get row labels is to... add row labels to your data.
Regarding your add-on question in the comments: the default parser will treat a text row at the beginning of your file as column labels. So just put "X Y" into the first line of your file.
A reasonable input format will therefore be:
X Y Label
1 2 Point7
3 4 "Point 8"
The following are not-so-good ideas:
5 6 123shouldwork
7 8 don't do this: 3 parser will retain the 3
The label should be non-numeric, so that the parser will treat it as a label automatically. Otherwise, you have to set the appropriate parameter.
DBIDs are meant for internal handling. Maybe we should not write them to the output at all. FixedDBIDFilter is a hackish work-around; it is meant to be used to get reproducible hashing when using algorithms that need id-based hashing and doing multiple runs in the MiniGUI. Because on multiple runs, DBIDs will be continuously enumerated.

Quantity reference type, etc

I have been working on ADempiere these past few days and I am confused about something.
I created a new column on my database table named Other_Number with the reference type Quantity. Max length is 20.
On my Java source, I used BigDecimal.
Now every time I try to input exactly 20 digits into the Other_Number field, the last 4 digits get rounded. Say I input 12345678901234567891; when I try to save it, it becomes 12345678901234567000.
Other than that, every value that gets saved to the database (PSQL) gets ".000000000000" appended (that's 12 zeros).
Now I need to do something so that when I input 20 digits, the last 4 digits don't get rounded.
Also I need to get rid of that ".000000000000"
Can you please tell me why this is happening?
ADempiere, as financial ERP software, is exacting in how it deals with financial amounts. In the database, the exact BigDecimal value has to maintain its data integrity, and precision and rounding are handled as carefully as possible in the code. Having grown out of the established Compiere ERP project (from which iDempiere and Openbravo are also forks), such financial amount management is already well defined and solved.
Perhaps you need to set precision in its appropriate window http://wiki.idempiere.org/en/Currency_%28Window_ID-115%29
If it's not actually a number you want but rather some kind of reference field that contains only numeric digits, change the definition in the Application Dictionary to be:
Reference: String
Length: 20
Value Format: 00000000000000000000 (i.e. 20 Zeros!)
This will force the input to be numeric only (i.e. alpha characters will be ignored!) and, because it is a String, there will be no rounding.
Adempiere supports amounts/quantities of up to 14 (+5) digits (trillions, in USD currency).
Which currency are you using - is it realistic to use such a large amount/quantity in an ERP system?
If you want to change the logic, you can do so in the getNumberFormat method of the DisplayType.java class.
What is the business scenario?
In the Adempiere Java code, the setScale method is used to round the value.
Example:
BigDecimal len = value;
len = len.setScale(2, RoundingMode.HALF_UP);  // the original code uses setScale(2, 4); 4 is ROUND_HALF_UP (needs java.math.RoundingMode)
setLength(len);
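Both symptoms can be reproduced with plain BigDecimal/double code, which suggests what is going on: a 20-digit value that passes through a double anywhere in the chain keeps only about 15-17 significant digits, and a BigDecimal whose scale is fixed at 12 always prints twelve fraction digits. A small illustrative sketch (not ADempiere code):
import java.math.BigDecimal;

public class QuantityPrecision {
    public static void main(String[] args) {
        // A 20-digit value forced through a double loses its low-order digits.
        double viaDouble = 12345678901234567891d;
        System.out.println(new BigDecimal(viaDouble).toPlainString()); // 12345678901234567168

        // A BigDecimal with scale 12 always shows twelve fraction digits.
        BigDecimal qty = new BigDecimal("50").setScale(12);
        System.out.println(qty.toPlainString());                       // 50.000000000000
        System.out.println(qty.stripTrailingZeros().toPlainString());  // 50
    }
}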

Is there a good reason for storing percentages that are less than 1 as numbers greater than 1?

I inherited a project that uses SQL Server 200x, wherein a column that always holds a percentage in the problem domain stores the value as its greater-than-1 equivalent rather than as a decimal fraction. For example, 70% (literally 0.7) is stored as 70, 100% as 100, etc. Aside from the need to remember to multiply by 0.01 on retrieved values and by 100 before persisting values, it doesn't seem to be a problem in and of itself. It does make my head explode though... so is there a good reason for it that I'm missing? Are there compelling reasons to fix it, given that there is a fair amount of code written to work with the pseudo-percentages?
There are a few cases where values greater than 100% occur, but I don't see why those values wouldn't just be stored as, for example, 1.05.
EDIT: Head feeling better, and slightly smarter. Thanks for all the insights.
There are actually four good reasons I can think of that you might want to store—and calculate with—whole-number percentage values rather than floating-point equivalents:
Depending on the data types chosen, the integer value may take up less space.
Depending on the data type, the floating-point value may lose precision (remember that not all languages have a data type equivalent to SQL Server's decimal type).
If the value will be input from or output to the user very frequently, it may be more convenient to keep it in a more user-friendly format (decision between convert when you display and convert when you calculate ... but see the next point).
If the principal values are also integers, then
principal * integerPercentage / 100
which uses all-integer arithmetic, is usually faster than its floating-point equivalent (and likely significantly faster in the case of a floating-point type equivalent to T-SQL's decimal type).
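For instance, with whole-number principal amounts (the cents-based figures below are just an example), the whole computation stays in integers:
public class PercentageMath {
    public static void main(String[] args) {
        long principalCents = 125_000;   // $1,250.00 expressed in cents
        int percentage = 70;             // stored as 70, meaning 70%
        long resultCents = principalCents * percentage / 100;  // all-integer arithmetic
        System.out.println(resultCents); // 87500, i.e. $875.00
    }
}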
If it's a byte field then it takes up less room in the db than floating-point numbers, but unless you have millions and millions of records, you'll hardly see a difference.
Since floating-point values can't be compared for equality, an integer may have been used to make the SQL simpler.
For example
(0.3==3*.1)
is usually False.
However
abs( 0.3 - 3*.1 )
is a tiny number (5.55e-17). But it's a pain to have to do everything with (column - SomeValue) BETWEEN -0.0001 AND 0.0001 or ABS(column - SomeValue) < 0.0001; you'd rather write column = SomeValue in your WHERE clause.
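The same comparison, run in Java rather than SQL (a quick demonstration, not from the original answer):
public class Compare {
    public static void main(String[] args) {
        System.out.println(0.3 == 3 * 0.1);                    // false
        System.out.println(Math.abs(0.3 - 3 * 0.1));           // 5.551115123125783E-17
        System.out.println(Math.abs(0.3 - 3 * 0.1) < 0.0001);  // true - compare within a tolerance
    }
}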
Floating point numbers are prone to rounding errors and, therefore, can act "funny" in comparisons. If you always want to deal with it as fixed decimal, you could either choose a decimal type, say decimal(5,2), or do the convert and store as int thing that your db does. I'd probably go the decimal route, even though the int would take up less space.
A good guess is that anything you do with integers (storing, calculating, stuffing into an edit field for a user, etc.) is marginally easier and more efficient than doing the same with floating-point numbers. And the rounding issues aren't so obvious when you look at the data.
If these are numbers that end users are likely to see and interact with, percentages are easier to understand than decimals.
This is one of those situations where a notation aid can help; in the program, be consistent in using a prefix (Hungarian) or postfix to specify values that are percentages vs. those that are decimal. If you can extend a naming convention to the database fields themselves, so much the better.
And to add to the data storage issue: if you can use integer arithmetic for whatever processing you are doing, the performance is much better than with floating-point arithmetic... so storing the percentages as integer values may allow the processing logic to utilize integer arithmetic.
If you're actually using them as a coefficient (or expect users of the database to do this sort of thing in reports), there's a case for storing them as a coefficient - particularly if there's a reason to do calculations involving more than one.
However, if you do this you should be consistent - either all percentages or all coefficients.
