Weird SUM result in nested Row Group - sql-server

I have three nested row groups:
The first is based on whether a field in the dataset is true or false for each case; this is where the error is worst. The second is nested in the first and is based on a grouping variable in the cases (one to many). The third is the reference number of the cases.
The sums don't work for a column that is produced by a join on the ID of the second group. The value itself seems right, but it is multiplied by the number of cases. Inside the last nested group (ref #) I can divide by the number of cases to get the right value. I have tried using "Count", Blank, and "Add total after".
If I try to sum the column with "=Sum(ReportItems!Textbox231.Value)", it produces:
The Value expression for the textrun 'Textbox232.Paragraphs[0].TextRuns[0]' uses an aggregate function on a report item. Aggregate functions can be used only on report items contained in page headers and footers.
The sums work fine for the non-joined values in all three nested row groups, but for the joined values they are out by an order of magnitude. Why is this?
[Screenshots: "SUM not working for 3rd column"; "SUM yields weird results"; "SELECT DISTINCT"]

Here is a common reason why this kind of problem happens.
The likely reason for the wrong SUM is that the DISTINCT in your SELECT hides duplicates in the underlying query. Because the SUM is computed before DISTINCT is applied, it also adds up the duplicate rows that DISTINCT then filters out of what you see.
Instead of DISTINCT, use a GROUP BY query. Then you can either build a base query that has no duplicates (so you don't have to hide them with DISTINCT) or, if you can't get rid of the duplicates, aggregate your column with MIN, MAX or AVG before displaying it.
I'd be happy to help more, but there's not enough information in your question to reproduce the problem on my computer.
There are other reasons why a SUM can return unexpected results: typically an implicit cast (SQL Server picks an unexpected data type and rounds the numbers) and, in some situations, a CASE expression that is evaluated either before or after a WHERE condition. But these don't seem to be the problem here.
Example
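DECLARE @T TABLE (ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, NumVal INT)
DECLARE @i INT
SET @i = 1
-- Insert 1..999 once each, and insert every number ending in 7 a second time
WHILE @i < 1000
BEGIN
INSERT INTO @T (NumVal) VALUES (@i)
IF RIGHT(CAST(@i AS VARCHAR(12)), 1) = '7'
BEGIN INSERT INTO @T (NumVal) VALUES (@i) END
SET @i = @i + 1
END
-- Each NumVal appears once in the output, but SUM still counts the duplicate rows
SELECT DISTINCT NumVal, SUM(NumVal) AS [Sum] FROM @T GROUP BY NumVal ORDER BY NumVal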
In the example above, I inserted 999 distinct values into a table, but duplicated every number that ends in 7. The SELECT DISTINCT gives the impression that there are only 999 rows, while SUM adds the numbers ending in 7 twice. Your situation is probably more complicated, but what I want to show here is that duplicates in the underlying data become invisible with DISTINCT and reappear with SUM:
NumVal Sum
1 1
2 2
3 3
4 4
5 5
6 6
7 14
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 34
18 18
19 19
20 20
21 21
22 22
23 23
24 24
25 25
26 26
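As a sketch of the same effect outside SQL Server, the snippet below rebuilds the example table in an in-memory SQLite database through Python's standard sqlite3 module (SQLite standing in for SQL Server is an assumption; the table and column names are illustrative). It also shows the remedy suggested above: collapse the duplicates with GROUP BY and MAX first, then sum.

```python
import sqlite3

# SQLite stands in for SQL Server here; table t mirrors the @T table variable above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num_val INTEGER)")

rows = []
for i in range(1, 1000):
    rows.append((i,))
    if str(i).endswith("7"):  # duplicate every number ending in 7
        rows.append((i,))
conn.executemany("INSERT INTO t (num_val) VALUES (?)", rows)

distinct_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT num_val FROM t)").fetchone()[0]
total_rows = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
per_value = dict(conn.execute(
    "SELECT num_val, SUM(num_val) FROM t GROUP BY num_val").fetchall())

# Summing every row counts the hidden duplicates...
naive_total = conn.execute("SELECT SUM(num_val) FROM t").fetchone()[0]
# ...while collapsing duplicates first (GROUP BY + MAX) sums each value once.
dedup_total = conn.execute(
    "SELECT SUM(v) FROM (SELECT MAX(num_val) AS v FROM t GROUP BY num_val)").fetchone()[0]

print(distinct_count, total_rows)   # 999 1099
print(per_value[7], per_value[17])  # 14 34
print(naive_total, dedup_total)     # 549700 499500
```

The 50200 difference between the two totals is exactly the sum of the 100 duplicated values (7, 17, ..., 997).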

Related

Find power-of-two key in sum (multiple powers of two)

In an application that is about 18 years old, users file "cases", and each case creates a row in a "journal" table in a database (on SQL Server 2000). These cases can be tagged with "descriptors", with a hard-coded limit of 50 set somewhere. The descriptors/tags are stored in a lookup table, and the key for each descriptor is a number from the power-of-two sequence (2^n).
This table looks like this:
key descriptor
1 D 1
2 D 2
4 D 3
8 D 4
16 D 5
There are 50 rows, which means the biggest key is 562,949,953,421,312 (2^49). Each case can have up to 8 descriptors, which are unfortunately stored in a single column in the case journal table. The keys are stored as the sum of all descriptors on that case:
A case with the descriptor D2 has 2 in the journal
A case with the descriptors D2 and D4 has 10
A case with the descriptors D1, D3 and D5 has 21
The journal has 100 million records. Now, for the first time in years, there is a requirement to analyze the journal by descriptor.
What would be a smart (mathematical) way to query the journal and get the results for one descriptor?
Edit:
In answer to the comment from @Squirrel:
jkey jvalue descriptors
1 V 1 0
2 V 2 24
3 V 3 3
4 V 4 12
5 V 5 6
You need to use bitwise operators.
Assuming the column is bigint, then
where yourcolumn & 16 > 0
will find the ones matching D5, for example.
If you run this query with literal values too large to fit into a signed 32-bit int, make sure you cast them to BIGINT, because by default they are interpreted as the numeric data type, which cannot be used with bitwise operators.
WHERE yourcolumn & CAST(562949953421312 AS BIGINT) > 0
You may similarly need to cast yourcolumn if it is in fact numeric rather than bigint.
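The bitwise check can also be tried outside SQL Server. The sketch below uses Python with an in-memory SQLite database (an assumption standing in for SQL Server 2000). It loads the sample journal rows from the edit above, filters for D5 with the same `& 16 > 0` test, and decodes a stored sum back into descriptor names. SQLite integers are 64-bit, so no CAST is needed there; in SQL Server the BIGINT cast advice still applies. The helper names `descriptor_key` and `decode` are hypothetical.

```python
import sqlite3
from typing import List


def descriptor_key(n: int) -> int:
    """Key of descriptor Dn (1-based): D1 = 2**0, D2 = 2**1, ..., D50 = 2**49."""
    return 1 << (n - 1)


def decode(descriptors: int) -> List[str]:
    """Names of the descriptors whose bits are set in a stored sum."""
    return [f"D {n}" for n in range(1, 51) if descriptors & descriptor_key(n)]


# Sample rows from the edit above: (jkey, jvalue, descriptors).
journal = [(1, "V 1", 0), (2, "V 2", 24), (3, "V 3", 3), (4, "V 4", 12), (5, "V 5", 6)]

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
conn.execute("CREATE TABLE journal (jkey INTEGER, jvalue TEXT, descriptors INTEGER)")
conn.executemany("INSERT INTO journal VALUES (?, ?, ?)", journal)

# Same test as the answer: (descriptors & key) > 0 keeps cases tagged with D5.
d5_cases = [row[0] for row in conn.execute(
    "SELECT jkey FROM journal WHERE descriptors & ? > 0 ORDER BY jkey",
    (descriptor_key(5),))]

print(d5_cases)            # [2]
print(decode(24))          # ['D 4', 'D 5']
print(decode(21))          # ['D 1', 'D 3', 'D 5']
print(descriptor_key(50))  # 562949953421312
```

As in T-SQL, `&` binds more tightly than `>` in SQLite, so the filter reads as `(descriptors & 16) > 0`.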

Autoincrement value

I'm working on a local project, and I would like to set up a column in a SQL Server table that I called serial. Its values should have the following format/validation: [A-Z0-9][A-Z0-9][A-Z0-9][A-Z0-9], that is, four characters that are digits or capital letters.
For example, the first record would be AAA0, and the value then increments from right to left. The interesting part is that once it reaches AAA9, I want to continue with letters: after 9 comes A, giving AAAA. The last combination in that position would be AAAZ; then the second position increments, and so on, until ZZZZ is reached.
Serial
------
AAA0
…
AAAZ
AAB0
…
AABZ
AAC0
…
AACZ
…
ZZZZ
I've been able to do something similar, but only incrementing numbers (0000 to 9999) or only letters (AAAA to ZZZZ); see the dummy test below.
Please let me know if you have any comments or if you have seen this scenario before. I would appreciate any help.
I've reviewed some material on the internet and did this small test, but once again, it only changes letters.
CREATE TABLE #MyTable
(
MyHeadID INT IDENTITY(0,1) NOT NULL,
Consecutive AS
CHAR(MyHeadID/17576%26+65) + --26^3
CHAR(MyHeadID/676%26+65) + --26^2
CHAR(MyHeadID/26%26+65) + --26^1
CHAR(MyHeadID%26+65) --26^0
PERSISTED NOT NULL,
UniqueID VARCHAR(36) NOT NULL,
CreatedDate datetime DEFAULT GETDATE()
)
INSERT INTO #MyTable (UniqueID)
(SELECT NEWID())
SELECT Consecutive FROM #MyTable
Let's not talk about the UniqueID and CreatedDate columns; those are just for testing purposes. I created an identity column and then the code for the auto-increment on letters.
I also found a reply here, but it doesn't match my case: there, the value is split into 4 letters and 4 numbers, so it can be resolved with a substring. In my case, I need to intercalate numbers and letters from right to left.
Here's a possible solution: count in base 36. AAA0 is decimal 479880, so the query below starts from an offset of 479879.
Ideally you'd have an identity column on your table starting at 479880. Simulating that with a numbers table generated by a CTE (479879 + ROW_NUMBER(), which starts at 1), the following gives base-36 counting:
with numbers as (
select 479879 + Row_Number() over (order by a.object_id)n
from sys.all_objects a cross join sys.all_objects b
)
select top 10000
Concat(Char(((n/36/36/36) % 36) + case when (n/36/36/36) % 36 between 0 and 9 then 48 else 55 end),
Char(((n/36/36) % 36) + case when (n/36/36) % 36 between 0 and 9 then 48 else 55 end),
Char(((n/36) % 36) + case when (n/36) % 36 between 0 and 9 then 48 else 55 end),
Char((n % 36) + case when n % 36 between 0 and 9 then 48 else 55 end)) Base36
from numbers
order by n
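To double-check the offset, here is a small Python sketch of the same base-36 arithmetic. The helper names are hypothetical; `int(s, 36)` is Python's built-in base conversion. It confirms that AAA0 is 479880, that AAA9 is followed by AAAA, and how many serials fit between AAA0 and ZZZZ.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # 0-9, then A-Z (same order as CHAR(48 + d) / CHAR(55 + d))


def to_base36(n: int, width: int = 4) -> str:
    """Encode n as a fixed-width base-36 string; digits beyond `width` are dropped."""
    chars = []
    for _ in range(width):
        n, d = divmod(n, 36)  # peel off the rightmost digit
        chars.append(DIGITS[d])
    return "".join(reversed(chars))


def from_base36(s: str) -> int:
    return int(s, 36)  # built-in parser accepts 0-9 and A-Z (case-insensitive)


start = from_base36("AAA0")
print(start)                                                 # 479880
print([to_base36(n) for n in range(start + 8, start + 12)])  # ['AAA8', 'AAA9', 'AAAA', 'AAAB']
print(to_base36(from_base36("AAAZ") + 1))                    # AAB0
print(from_base36("ZZZZ") - start + 1)                       # 1199736 serials from AAA0 to ZZZZ
```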
The previous answer is good, but you can also do this and cover all numbers with simple math. Let's also assume your IDENTITY starts at one. There are 36 values for each digit, so, as pointed out, the values are base-36 encoded. The encoding goes 0-9, then A-Z, as you mentioned in your post. To get the individual digits of some number n from right to left (using integer division), the algorithm is:
n mod 36 is the rightmost digit.
n / 36 mod 36 gives the second digit from the right.
n / (36 * 36) mod 36 gives the second digit from the left.
n / (36 * 36 * 36) mod 36 gives the leftmost digit.
To test this logic, we can write a function:
CREATE FUNCTION CustomNumber(@id INT)
RETURNS CHAR(4)
AS BEGIN
-- Digit k (from the right) is (@id-1) / 36^k % 36; add 48 for '0'-'9' or 55 for 'A'-'Z'
RETURN CHAR((@id-1)/ POWER(36, 3)% 36+CASE WHEN (@id-1)/ POWER(36, 3)% 36 BETWEEN 0 AND 9 THEN 48 ELSE 55 END)
+CHAR((@id-1)/ POWER(36, 2)% 36+CASE WHEN (@id-1)/ POWER(36, 2)% 36 BETWEEN 0 AND 9 THEN 48 ELSE 55 END)
+CHAR((@id-1)/ 36% 36+CASE WHEN (@id-1)/ 36% 36 BETWEEN 0 AND 9 THEN 48 ELSE 55 END)
+CHAR((@id-1)% 36+CASE WHEN (@id-1)% 36 BETWEEN 0 AND 9 THEN 48 ELSE 55 END);
END;
We can then test this function by calling it to make sure it works. To call it, let's create some numbers. This is complete overkill, but let's pass it 1 through 999,999 so we can see it in action:
;WITH digits (I)
AS (
SELECT I
FROM (VALUES (0), (1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I) ),
integers (I)
AS (SELECT D1.I + (10 * D2.I) + (100 * D3.I) + (1000 * D4.I) + (10000*D5.I) + (100000*D6.I)
FROM digits AS D1
CROSS JOIN digits AS D2
CROSS JOIN digits AS D3
CROSS JOIN digits AS D4
CROSS JOIN digits AS D5 CROSS JOIN digits AS D6
)
SELECT I, dbo.CustomNumber(I)
FROM integers
WHERE I > 0
ORDER BY I;
If you run that and wait patiently (it takes about 20 seconds on my weak laptop, but you don't have to use a million numbers if you'd rather not), you will see that it produces the result you want. At this point we know the formula is correct, so you have options for adding it to your table.
One option is to use the formula as a PERSISTED computed column, as you did. Another option is to use a trigger. You can keep the formula in a function or put the code in directly. If you need more than 4 characters, you can easily add another by following the pattern (raise 36 to the next POWER and widen the CHAR(4) return type).
As I mentioned, the previous answer is good, I just wanted to show another method and its derivation. I have had to implement custom sequences several times with varying formats and you can use this general technique.
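The derivation above can also be checked outside SQL Server. This Python sketch is a hypothetical port of CustomNumber; its `width` parameter stands in for adding another POWER term. Like the T-SQL version, it assumes the IDENTITY starts at 1, so id 1 maps to 0000.

```python
def custom_number(id_: int, width: int = 4) -> str:
    """Port of CustomNumber: digit k from the right is (id - 1) / 36^k % 36."""
    n = id_ - 1  # IDENTITY assumed to start at 1, so id 1 encodes as all zeros
    chars = []
    for k in reversed(range(width)):  # leftmost digit first, as in the T-SQL
        d = n // 36 ** k % 36
        # Same trick as CHAR(d + 48) / CHAR(d + 55): '0' is 48, 'A' is 65 = 10 + 55
        chars.append(chr(d + (48 if d <= 9 else 55)))
    return "".join(chars)


print(custom_number(1), custom_number(11), custom_number(37))  # 0000 000A 0010
print(custom_number(36 ** 4))          # ZZZZ, the last 4-character value
print(custom_number(36 ** 4 + 1))      # 0000: wraps around with only 4 characters
print(custom_number(36 ** 4 + 1, 5))   # 10000: a fifth character is needed
```

The wrap-around in the third call is the same overflow the 4-character T-SQL function would hit, which is why the answer suggests adding a POWER term when more values are needed.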

SQL Server query problem: example is in an Excel sheet picture

I want to convert this Excel formula to SQL Server. The sheet looks like this:
Row M N
15 1 0
16 3 1
17 5 2
18 8 4
19 9 4
N16 = IF(M16-M15<=1, N15, M16-M15-1+N15)
As per your tags, this can be done with LAG followed by a running total.
For each row, first calculate the difference in M from the previous row (using LAG); I call this Dif_Last_M. This mirrors the 'M16-M15' part of your formula.
If Dif_Last_M is <= 1, add 0 to the running total (effectively making the running total the same as for the previous row)
Else if Dif_Last_M is > 1, add (Dif_Last_M minus 1) to the running total
Here is the code, assuming your source table is called #Temp and has an ID column that defines the row order:
WITH M_info AS
(SELECT ID, M, (M - LAG(M, 1) OVER (ORDER BY ID)) AS Dif_Last_M
FROM #Temp
)
SELECT ID,
M,
SUM(CASE WHEN Dif_Last_M > 1 THEN Dif_Last_M - 1 ELSE 0 END) OVER (ORDER BY ID) AS N
FROM M_info;
And here are the results
ID M N
1 1 0
2 3 1
3 5 2
4 8 4
5 9 4
6 12 6
7 13 6
Here is a db<>fiddle with the above. It also includes additional queries showing:
The result from the CTE
The values used in the running total
Note that while it is possible to do this with recursive CTEs, they tend to have performance problems (fundamentally, they are loops). So it is better, performance-wise, to avoid recursive CTEs where possible.
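The same logic can be checked in a few lines of Python outside the database. In this sketch with hypothetical function names, `running_n` mirrors the LAG plus running SUM query and `excel_n` applies the spreadsheet formula row by row, so the two can be compared on the sample M values from the results above.

```python
from itertools import accumulate


def running_n(m_values):
    """Mirror of the LAG + running SUM query."""
    if not m_values:
        return []
    increments = [0]  # first row: LAG returns NULL, so the CASE falls through to 0
    for prev, cur in zip(m_values, m_values[1:]):
        dif_last_m = cur - prev
        increments.append(dif_last_m - 1 if dif_last_m > 1 else 0)
    return list(accumulate(increments))  # running total, like SUM(...) OVER (ORDER BY ID)


def excel_n(m_values, n_first=0):
    """Row-by-row version of N16 = IF(M16-M15<=1, N15, M16-M15-1+N15)."""
    n = [n_first] if m_values else []
    for prev, cur in zip(m_values, m_values[1:]):
        n.append(n[-1] if cur - prev <= 1 else cur - prev - 1 + n[-1])
    return n


m = [1, 3, 5, 8, 9, 12, 13]
print(running_n(m))  # [0, 1, 2, 4, 4, 6, 6]
print(excel_n(m))    # [0, 1, 2, 4, 4, 6, 6]
```

Both produce the N column shown in the results, which is why the set-based running total can replace the row-by-row spreadsheet formula.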

How can I create a column that is a 0-9 hash of text in another column?

Our application has the following table definition:
CREATE TABLE [dbo].[Phrase] (
[PhraseId] UNIQUEIDENTIFIER DEFAULT (newid()) NOT NULL,
[English] NVARCHAR (250) NOT NULL,
[EnglishHash] AS (CONVERT([bigint],hashbytes('md5',[English])%(5)+(5))) PERSISTED,
PRIMARY KEY CLUSTERED ([PhraseId] ASC)
);
The intention was for the EnglishHash column to hold a value from 0 to 9 (0, 1, 2, 3, 4, 5, 6, 7, 8, or 9).
However, it only gives the values 1, 2, 3, 4, 5, 6, 7, 8, or 9.
Can anyone explain how I can modify this so it gives values 0-9 inclusive?
Note that I tried the suggestion by Sandip. This gives me all ten values, but over 11,000 records the distribution is not what I expected:
Value Count
0 593
1 1932
2 1295
3 1324
4 1325
5 1282
6 1253
7 1293
8 635
9 652
Your formula can only produce 9 distinct values because you take the value modulo 5. Look at the results below: if you keep going, you'll see the remainder is never greater than 4 or less than -4 (this is before the final +5 is added); it just starts repeating. Adding 5 therefore gives 1 through 9. Instead, why not take the absolute value of the hash converted to bigint, modulo 10?
SELECT 0%5
, 1%5
, 2%5
, 3%5
, 4%5
, 5%5
, 6%5
SELECT 0%5
, -1%5
, -2%5
, -3%5
, -4%5
, -5%5
, -6%5
Try using this instead:
ABS(CONVERT(bigint, HASHBYTES('md5',[English])))%10
Here's an example that uses the text of the system error messages in sys.messages as sample input:
SELECT ABS(CONVERT(bigint, HASHBYTES('md5',[text])))%10 AS 'Result'
, COUNT(*) AS 'Distribution'
from sys.messages
GROUP BY ABS(CONVERT(bigint, HASHBYTES('md5',[text])))%10
ORDER BY ABS(CONVERT(bigint, HASHBYTES('md5',[text])))%10
Results:
Result Distribution
0 25326
1 25218
2 25115
3 25322
4 25167
5 25322
6 25278
7 25119
8 25139
9 25158
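The difference between the formulas comes down to the T-SQL `%` operator, whose result takes the sign of the dividend. The Python sketch below simulates the three formulas in this thread over MD5 hashes of made-up strings. It rests on an assumption: `hash_as_int` reads the last 8 digest bytes as a signed integer as a stand-in for the bigint that SQL Server produces, which may not be the exact bytes SQL Server keeps. The helper names are hypothetical.

```python
import hashlib
from collections import Counter


def tsql_mod(a: int, b: int) -> int:
    """T-SQL %: the remainder takes the sign of the dividend (Python's % follows the divisor)."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def hash_as_int(text: str) -> int:
    """Assumed stand-in for CONVERT(bigint, HASHBYTES('md5', ...)):
    the last 8 digest bytes read as a signed 64-bit integer."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    return int.from_bytes(digest[-8:], "big", signed=True)


samples = [hash_as_int(f"phrase {i}") for i in range(18000)]

original = Counter(tsql_mod(x, 5) + 5 for x in samples)     # x % 5 + 5 (the question)
suggested = Counter(abs(x) % 10 for x in samples)           # ABS(x) % 10
sandip = Counter(abs(tsql_mod(x, 9) - 1) for x in samples)  # ABS((x % 9) - 1)

print(tsql_mod(-6, 5), -6 % 5)  # -1 4
print(sorted(original))         # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sorted(suggested))        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(sorted(sandip.items()))   # 1 is the most frequent; 0, 8 and 9 the least
```

In this simulation, the question's formula never yields 0 and lands on 5 roughly twice as often as the other values, because a remainder of 0 comes from both positive and negative hashes. ABS(...) % 10 covers 0-9 roughly evenly. ABS((x % 9) - 1) over-represents 1, while 0, 8 and 9 show up about half as often as 2-7, a skew similar to the distribution reported in the question.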
Try the query below; it gives me a different hash value for each letter, as you require:
--C=0
--B=1
--F=2
--t=3
--D=4
--S=5
--G=6
--A=7
--j=8
--P=9
DECLARE @myText VARCHAR(250) = 'A'
SELECT ABS((HashBytes('md5', @myText) % 9) - 1)

Best way to store list of numbers and to retrieve them

What is the best way to store a list of random numbers (like lotto/bingo numbers) and retrieve them? I'd like to store a number of rows in a database, where each row contains 5-10 numbers ranging from 0 to 90. I will store a large number of these rows. What I'd like to be able to do is retrieve the rows that have at least X numbers in common with a newly generated row.
Example:
[3,4,33,67,85,99]
[55,56,77,89,98,99]
[3,4,23,47,85,91]
Those are on the DB
I will generate this:
[1,2,11,45,47,88] and now I want to get the rows that have at least 1 number in common with this one.
The easiest (and dumbest?) way is to run 6 SELECTs and check for matching results.
I thought about storing the numbers as a long binary string like
000000000000000000000100000000010010110000000000000000000000000 with 99 positions, where each position represents a number from 1 to 99: a 1 at the 44th position means the row contains 44. This method probably shifts the difficult work to the DB, but again, it's not very smart.
Any suggestion?
You should create a table like so:
TicketId Number
1 3
1 4
1 33
1 67
1 85
1 99
2 55
2 56
2 77
etc...
Then your query, at least for X = 1, becomes:
SELECT DISTINCT TicketId FROM Ticket WHERE Number IN (1, 2, 11, 45, 47, 88)
The advantage of this is that you can use an index instead of a full table scan.
For X greater than one (here X = 3), you could do the following:
SELECT TicketId, COUNT(*) AS cnt
FROM Ticket WHERE Number IN (1, 2, 11, 45, 47, 88)
GROUP BY TicketId
HAVING COUNT(*) >= 3
Again this will be able to use the index.
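Here is a runnable sketch of that design, using Python's sqlite3 module with an in-memory database as a stand-in for the real server (an assumption). The composite primary key and the index on Number (its name, `ix_ticket_number`, is hypothetical) are what let the IN filter avoid a full scan. The helper function `tickets_sharing` is also hypothetical; it generalizes the HAVING query to any X.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (ticket, number); the composite key also stops a number repeating on a ticket.
conn.execute("CREATE TABLE Ticket (TicketId INTEGER, Number INTEGER, "
             "PRIMARY KEY (TicketId, Number))")
# Hypothetical index so WHERE Number IN (...) can seek on Number.
conn.execute("CREATE INDEX ix_ticket_number ON Ticket (Number, TicketId)")

tickets = {1: [3, 4, 33, 67, 85, 99], 2: [55, 56, 77, 89, 98, 99], 3: [3, 4, 23, 47, 85, 91]}
conn.executemany("INSERT INTO Ticket (TicketId, Number) VALUES (?, ?)",
                 [(t, n) for t, nums in tickets.items() for n in nums])


def tickets_sharing(numbers, x):
    """TicketIds that have at least x numbers in common with `numbers`."""
    placeholders = ",".join(["?"] * len(numbers))
    sql = (f"SELECT TicketId FROM Ticket WHERE Number IN ({placeholders}) "
           "GROUP BY TicketId HAVING COUNT(*) >= ? ORDER BY TicketId")
    return [row[0] for row in conn.execute(sql, (*numbers, x))]


print(tickets_sharing([1, 2, 11, 45, 47, 88], 1))  # [3] (only ticket 3 has 47)
print(tickets_sharing([1, 2, 3, 4, 85, 99], 3))    # [1, 3]
```

Binding the numbers as parameters rather than formatting them into the SQL keeps the query safe and lets the same statement shape be reused for any generated row.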
