Rounding accurately from the last digit after the decimal - sql-server

I am stuck (again) and looking for the smart human beings of planet Earth to help me out.
Background
I have an application which distributes an amount among some users in a given percentage. Say I have $35000 and it is distributed among 3 users (A, B and C) in some ratio, so the amounts distributed are
A - 5691.05459265518
B - 14654.473815207
C - 14654.4715921378
which totals up to $35000
The Problem
I have to present the results on the front end with 2 decimal places instead of the full float. So I use the ROUND function of SQL Server with a precision value of 2 to round these values to 2 decimal places. But the issue is that when I total the rounded values, the sum comes out to be $34999.9999 instead of $35000.
My Findings
I searched a bit and found
If the expression that you are rounding ends with a 5, the Round()
function will round the expression so that the last digit is an even
number. Here are some examples:
Round(34.55, 1) - Result: 34.6 (rounds up)
Round(34.65, 1) - Result: 34.6 (rounds down)
So technically the answer is correct, but I am looking for a function or a way to round the value to exactly what it should have been. I found that this works if I start rounding from the last digit after the decimal point (if the digit is less than 5, leave the previous digit alone, else increment the previous digit by 1) and keep backtracking until I am left with only 2 decimal places.
Please advise.
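A minimal Python sketch of the digit-by-digit backtracking scheme described above (the function name and the use of Python's decimal module are ours, for illustration only):

```python
from decimal import Decimal, ROUND_HALF_UP

def round_from_last_digit(value, places=2):
    """Round one digit at a time, starting from the last decimal digit
    and backtracking until only `places` digits remain."""
    d = Decimal(str(value))
    digits_after_point = -d.as_tuple().exponent
    while digits_after_point > places:
        digits_after_point -= 1
        d = d.quantize(Decimal(1).scaleb(-digits_after_point),
                       rounding=ROUND_HALF_UP)
    return d
```

Note that even this scheme does not guarantee that the rounded shares add back up to the original total; that generally requires adjusting one share by the leftover difference (the "largest remainder" approach).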

Related

Find high & low peak points in cell array MATLAB

I want to find "significant" changes in a cell array in MATLAB for when I have a movement.
E.g. I have YT, which represents movements in yaw for a face interaction. YT can vary in length with the interaction, anywhere from 80x1 up to 400x1. The first few entries might be
YT = {-7 -8 -8 -8 -8 -9 -9 -9 -6 ...}
I would like to record the following
Over the entire cell array:
1) Count the number of high and low peaks
I can do this with findpeaks, but not for low peaks?
2) Measure the difference between each peak -
For this example, the peaks are -9 and -6, so a difference of +3 between them. So report 1 peak change of +3. At the moment I am only interested in changes of +/-3, but this might change, so I will need a threshold?
and then over X number of cells (repeating for the cell array)
3) count the number of changes - for this example, 3 changes
4) count the number of significant changes - for this example, 1 change of +/-3
5) describe the changes - 1 change of -1, 1 change of -1, 1 change of +3
Any help would be appreciated, bit of a MATLAB noob.
Thanks!
1) Finding negative peaks is the same as finding positive ones - all you need to do is multiply the sequence by -1 and then findpeaks again
2) If you simply want the differences, then you could subtract the vectors of the positive and negative peaks (possibly offset by one if you want differences in both directions). Something like pospeaks-negpeaks would do one side. You'd need to identify whether the positive or negative peak was first (use the loc return from findpeaks to determine this), and then do pospeaks(1:end-1)-negpeaks(2:end) or vice versa as appropriate.
[edit]As pointed out in your comment, the above assumes that pospeaks and negpeaks are the same length. I shouldn't have been so lazy! The code might be better written as:
if (length(pospeaks)>length(negpeaks))
    % Starts and ends with a positive peak
    neg_diffs=pospeaks(1:end-1)-negpeaks;
    pos_diffs=negpeaks-pospeaks(2:end);
elseif (length(pospeaks)<length(negpeaks))
    % Starts and ends with a negative peak
    pos_diffs=negpeaks(1:end-1)-pospeaks;
    neg_diffs=pospeaks-negpeaks(2:end);
elseif posloc(1)<negloc(1)
    % Starts with a positive peak, and ends with a negative one
    neg_diffs=pospeaks-negpeaks;
    pos_diffs=pospeaks(2:end)-negpeaks(1:end-1);
else
    % Starts with a negative peak, and ends with a positive one
    pos_diffs=negpeaks-pospeaks;
    neg_diffs=negpeaks(2:end)-pospeaks(1:end-1);
end
I'm sure that could be coded more effectively, but I can't think just now how to write it more compactly. posloc and negloc are the location returns from findpeaks.[/edit]
For (3) to (5) it is easier to record the differences between samples: changes=[YT{2:end}]-[YT{1:end-1}];
3) To count changes, count the number of non-zeros in the difference between adjacent elements: sum(changes~=0)
4) You don't define what you mean by "significant changes", but the test is almost identical to 3) sum(abs(changes)>=3)
5) It is simply changes(changes~=0)
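The diff-and-count recipe in (3) to (5) translates directly to other languages; a Python sketch (function and variable names are ours, for illustration):

```python
def summarize_changes(yt, threshold=3):
    """Differences between adjacent samples, then the three quantities
    described in (3) to (5)."""
    changes = [b - a for a, b in zip(yt, yt[1:])]
    n_changes = sum(1 for c in changes if c != 0)                    # (3)
    n_significant = sum(1 for c in changes if abs(c) >= threshold)   # (4)
    nonzero = [c for c in changes if c != 0]                         # (5)
    return n_changes, n_significant, nonzero
```

For the example YT above this gives 3 changes, 1 significant change, and the list [-1, -1, 3].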
I would suggest diff is the command which can provide the basis of a solution to all your problems (after first converting the cell to an array with cell2mat). It outputs the difference between adjacent values along an array:
1) You'd have to define what a 'peak' is but at a guess:
YT = cell2mat(YT); % convert cell to array
change = diff(YT); % get diffs
highp = sum(change >= 3); % high peak threshold
lowp = sum(change <= -3); % low peak threshold
2) diff(cell2mat(YT)) provides this.
3)
YT = cell2mat(YT); % convert cell to array
change = diff(YT); % get diffs
count = sum(change~=0);
4) Seems to be answered in the other points?

Google: Divide and return result in form of string [duplicate]

This question already has answers here:
convert fraction into string and also insert [] for repeating part
(3 answers)
Closed 8 years ago.
I recently came across an interview question asked by Google and I am not able to find an optimized algorithm to solve this question:
Given 2 numbers a and b, divide a by b and return the result in the form of a string.
Example 1
Input: a=100 , b=3
Output: 33.(3)
Note: 100/3 = 33.33333... Here 3 is in brackets because it repeats continuously.
Example 2
Input: a=5 , b=10
Output: 0.5
Example 3
Input: a=51 , b=7
Output: 7.(285714)
Note: 51/7 = 7.285714285714285714285714285714......... Here 285714 is in brackets because it is repeating.
It would be great if anyone can think of a time-optimized algorithm for this question.
Thank You in advance.
You can simply perform long division by hand, which is O(N) in the number of output digits - it's hard to see how you could do better than that.
The only problem with long division is that it would never terminate if the fraction is a repeating decimal, but you can easily detect this before starting: after reducing the fraction to lowest terms, it is a repeating decimal iff b has any prime factors other than 2 and 5. If it is a repeating decimal, you need to keep a record of the interim remainders you have already seen. When you encounter one that you have seen before, you know that you have just found the end of the repeating period.
You might try keeping track of the last N digits of the quotient (N being the number of digits in the divisor) together with the current remainder; once you hit the same combination (last N digits + remainder), your digit sequence is going to repeat. (I don't have a hard proof for that, but interview questions aren't supposed to be very hard...)
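The remainder-tracking long division described in the first answer can be sketched in Python (the function name is ours; this is an illustration, not a known reference solution):

```python
def divide_to_string(a, b):
    """Long division: a repeated interim remainder marks the start of the
    repeating period, which goes in brackets."""
    sign = '-' if (a < 0) != (b < 0) else ''
    a, b = abs(a), abs(b)
    whole, rem = divmod(a, b)
    if rem == 0:
        return sign + str(whole)
    digits, seen = [], {}   # seen: remainder -> index of the digit it produced
    while rem and rem not in seen:
        seen[rem] = len(digits)
        rem *= 10
        digits.append(str(rem // b))
        rem %= b
    if rem == 0:            # division terminated: no repeating part
        return sign + str(whole) + '.' + ''.join(digits)
    start = seen[rem]       # the cycle begins where this remainder first appeared
    return (sign + str(whole) + '.' + ''.join(digits[:start])
            + '(' + ''.join(digits[start:]) + ')')
```

Each remainder is bounded by b, so the loop runs at most b times before a repeat: the work is linear in the number of digits produced.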

Calculate a series up to five decimal places

I want to write a C program that will calculate a series:
1/x + 1/2*x^2 + 1/3*x^3 + 1/4*x^4 + ...
up to five decimal places.
The program will take x as input and print the f(x) (value of series) up to five decimal places. Can you help me?
For evaluating a polynomial, Horner form generally has better numerical stability than the expanded form. See http://reference.wolfram.com/legacy/v5/Add-onsLinks/StandardPackages/Algebra/Horner.html
If the first term was a typo for x, then try ((((1/4)*x + 1/3)*x + 1/2)*x + 1)*x
Else, if the first term really is 1/x, try (((1/4)*x + 1/3)*x + 1/2)*x*x + 1/x
Of course, you still have to analyze convergence and numerical stability as developed in Eric Postpischil's answer.
Last thing: does the series you submitted as an example really converge to a finite value for some x?
In order to know that the sum you have calculated is within a desired distance to the limit of the series, you need to demonstrate that the sources of error are less than the desired distance.
When evaluating a series numerically, there are two sources of error. One is the limitations of numerical calculation, such as floating-point rounding. The other is the sum of the remaining terms, which have not been added into the partial sum.
The numerical error depends on the calculations done. For each series you want to evaluate, a custom analysis of the error must be performed. For the sample series you show, a crude but sufficient bound on the numerical error could like be calculated without too much effort. Is this the series you are primarily interested in, or are there others?
The sum of the remaining terms also requires a custom analysis. Often, given a series, we can find an expression that can be proven to be at least as large as the sum of all remaining terms but that is more easily calculated.
After you have established bounds on these two errors, you could sum terms of the series until the sum of the two bounds is less than the desired distance.

Decimal (10,9) variable can't hold the number 50 (SQL Server 2008)

This one is pretty straightforward. Why does the code below cause the error below?
declare @dTest decimal(10, 9)
set @dTest = 50
Error:
Msg 8115, Level 16, State 8, Line 3
Arithmetic overflow error converting int to data type numeric.
According to the MSDN documentation on decimal(p, s), p (or 10 in my case) is the "maximum total number of decimal digits that can be stored, both to the left and to the right of the decimal point" whereas s (or 9 in my case) is the "maximum number of decimal digits that can be stored to the right of the decimal point."
My number, 50, has only 2 digits total (which is less than the maximum 10), and 0 digits to the right of the decimal (which is less than the maximum 9), therefore it should work.
I found this question about essentially the same issue, but no one explained why the documentation seems to conflict with the behavior. It seems like the s dimension is actually being interpreted as the fixed number of digits to the right of the decimal, and being subtracted from the p number, which in my case leaves 10 - 9 = only 1 digit remaining to handle the left side.
Can anyone provide a reasonable way to interpret the documentation as written to match the behavior?
EDIT:
I see some explanations below, but they don't address the fundamental problem with the wording of the docs. I would suggest this change in wording:
For "p (precision)" change "The maximum total number of decimal digits that can be stored" to read "The maximum total number of decimal digits that will be stored".
And for "s (scale)" change "The maximum number of decimal digits that can be stored to the right of the decimal point." to "The number of decimal digits that will be stored to the right of the decimal point. This number is subtracted from p to determine the maximum number of digits to the left of the decimal point."
I'm going to submit a bug report to Connect unless some one has a better explanation.
10 - 9 is 1. DECIMAL(10, 9) can hold a number in the format 0.000000000. 50 has two digits before the decimal point, and is therefore out of range. You quoted it yourself:
According to the MSDN documentation on decimal(p, s), p (or 10 in my case) is the "maximum total number of decimal digits that can be stored, both to the left and to the right of the decimal point" whereas s (or 9 in my case) is the "maximum number of decimal digits that can be stored to the right of the decimal point."
I submitted a bug report to Connect: Misleading documentation on the decimal data type
A reasonable way to interpret the documentation is that trailing decimal zero digits are not ignored. So your number has 9 decimal digits to the right of the decimal point, and they all happen to be 0.
DECIMAL(10, 9) is a fixed precision and scale numeric data type. This means that it always stores the same number of digits to the right of the decimal point. So the data type you specified can only store numbers with one digit to the left of the decimal point and 9 digits to the right. Obviously, 50 does not fit in a number of that format.
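As an illustration only (using Python's decimal module, not SQL Server), one can emulate the fixed-scale behaviour described above; the function name is hypothetical:

```python
from decimal import Decimal

def to_decimal_10_9(value):
    """Emulate DECIMAL(10, 9): exactly 9 digits after the decimal point,
    at most 10 significant digits in total."""
    d = Decimal(value).quantize(Decimal('0.000000000'))  # force scale 9
    if len(d.as_tuple().digits) > 10:
        raise OverflowError(f'{value} does not fit DECIMAL(10, 9)')
    return d
```

Here to_decimal_10_9('1.5') yields 1.500000000, while to_decimal_10_9('50') overflows, mirroring the Msg 8115 error above: forcing scale 9 onto 50 would need 11 digits in total.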
Go through the link below.
http://msdn.microsoft.com/en-gb/library/ms190476.aspx
Precision is the number of digits in a number. Scale is the number of digits to the right of the decimal point in a number. For example, the number 123.45 has a precision of 5 and a scale of 2.

Picking a random item based on probabilities

There's a similar question, I know, but it confused me, so I thought it easier to ask in my way.
So I have an array of values, positive and negative. The higher they are, the more probability they have of being chosen.
I'm having trouble actually figuring out how to assign the probabilities and then randomly choose one. I'm guessing the array will need to be sorted first, but then I'm a bit lost after that.
"I have various different sizes of cups of coffee. The larger they are, the more I want to charge for them. I'm having trouble actually figuring out how to assign prices".
This isn't just a programming problem - you've specified that probability increases with value, but you haven't said how it increases with value. Normally, coffee shops don't charge in direct proportion to the amount of coffee. You can't assign probabilities in proportion to value, because some of your values are negative, but probabilities cannot be negative.
Sounds like you need to nail down the problem a bit more before you can write any code.
If you really don't care how probability relates to value, other than that they increase in order of value, then one easy way would be:
sort your array
assign a probability of 1 to the first element, 2 to the second, and so on.
now, your probabilities don't add up to 1, which is a problem. So divide each probability by the total of all the probabilities you have assigned: (1 + 2 + .. + n) = n(n+1)/2. This is called "normalization".
Given your list of probabilities, which add up to 1, the easiest way to repeatedly choose one is generally to calculate the cumulative probabilities, which I will demonstrate with an example:
value (sorted): -12 -3 127 1000000
assigned probability: 0.1 0.2 0.3 0.4
cumulative probability: 0.1 0.3 0.6 1.0
The cumulative probability is defined as the sum of all the probabilities up to that point.
Now, from your random number generator you need a random (floating-point) value between 0 and 1. If it lies between 0 and 0.1, you've picked -12. If it lies between 0.1 and 0.3, you've picked -3, and so on. To figure out which range it lies in, you could walk linearly through your array, or you could do a binary search.
You could skip the normalization step and the use of floating-point, if you wanted. Assign "cumulative probabilities" (1, 3, 6, 10 ...) , but make it understood that the actual probability is the stored integer value divided by n(n+1)/2. Then choose a random integer from 0 to n(n+1)/2 - 1. If it's less than 1, you've selected the first value, else if less than 3 the second, and so on. This may or may not make the code clearer, and your RNG may or may not do well choosing integer values from a large range.
Note that you could have assigned probabilities (0.001, 0.002, 0.003, 0.994) instead of (0.1, 0.2, 0.3, 0.4), and still satisfied your requirement that "the higher the value, the higher the probability".
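The integer cumulative scheme described above can be sketched in Python (bisect does the binary search; names are ours):

```python
import bisect
import random

def weighted_choice(values):
    """Sort, give the i-th smallest value integer weight i+1, then pick via
    cumulative sums (1, 3, 6, 10, ...) and a binary search."""
    values = sorted(values)
    cumulative, total = [], 0
    for rank in range(1, len(values) + 1):
        total += rank                 # weights 1, 2, ..., n
        cumulative.append(total)      # 1, 3, 6, 10, ...
    r = random.randrange(total)       # integer in 0 .. n(n+1)/2 - 1
    return values[bisect.bisect_right(cumulative, r)]
```

Because the weights are integers, no floating-point normalization is needed; the implicit probability of the i-th smallest value is (i+1) divided by n(n+1)/2.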
One way could be
Make all values non-negative (subtract the minimum value from every value)
Normalize the values to sum to 1 (divide each value with the sum of the values)
To randomize a value from the generated distribution now you can
Pick random number on [0,1].
Start summing the probabilities until the sum is greater than or equal to the random value. Choose that index as your random value.
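The steps above, sketched in Python (names are ours; note that with this scheme the minimum value ends up with weight 0, so it is effectively never chosen):

```python
import random

def value_weighted_choice(values):
    """Shift values to be non-negative, then walk the running sum of the
    shifted weights until it reaches a uniform draw on [0, total).
    Drawing on [0, total) instead of [0, 1) skips the explicit
    normalization step; the choice is the same."""
    lo = min(values)
    weights = [v - lo for v in values]
    total = sum(weights)
    r = random.uniform(0, total)
    running = 0.0
    for v, w in zip(values, weights):
        running += w
        if running >= r:
            return v
    return values[-1]  # guard against floating-point rounding at the top end
```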
Following up on Steve Jessop's suggestion: after you've chosen a random integer x from 0 to n(n+1)/2 - 1, you can just take the floor of the triangular root, floor((-1 + sqrt(8*x + 1))/2), to get the index directly.
