ORDER BY not putting SELECT statement in numerical order - sql-server

I am working on a SELECT statement.
USE SCRUMAPI2
DECLARE @userParam VARCHAR(100)
,@statusParam VARCHAR(100)
SET @userParam = '%'
SET @statusParam = '%'
SELECT ROW_NUMBER() OVER (
ORDER BY PDT.[Name] DESC
) AS 'RowNumber'
,PDT.[Name] AS Project
,(
CASE WHEN (
STY.KanBanProductId IS NOT NULL
AND STY.SprintId IS NULL
) THEN 'KanBan' WHEN (
STY.KanBanProductId IS NULL
AND STY.SprintId IS NOT NULL
) THEN 'Sprint' END
) AS ProjectType
,STY.[Number] StoryNumber
,STY.Title AS StoryTitle
,TSK.[Name] AS Task
,CONVERT(VARCHAR(20), STY.Effort) AS Effort
,CONVERT(VARCHAR(20), TSK.OriginalEstimateHours) AS OriginalEstimateHours
,TSK.STATUS AS STATUS
FROM Task TSK
LEFT JOIN Story STY ON TSK.StoryId = STY.PK_Story
LEFT JOIN Sprint SPT ON STY.SprintId = SPT.PK_Sprint
LEFT JOIN Product PDT ON STY.ProductId = PDT.PK_Product
WHERE TSK.PointPerson LIKE @userParam
AND TSK.STATUS LIKE @statusParam
GROUP BY STY.[Number]
,TSK.STATUS
,STY.Title
,PDT.[Name]
,TSK.CreateDate
,TSK.[Name]
,STY.KanBanProductId
,STY.SprintId
,TSK.OriginalEstimateHours
,STY.Effort
My issue is that although I have the ORDER BY sorting by story number first, the results are not returned in the order I expect (below is column STY.[Number]):
As you can see, it goes from 33 to 4 to 42. I want it in numerical order, so that 4 sits between 3 and 5, not between 33 and 42. How do I achieve this?

Given the structure of your data (with a constant prefix), probably the easiest way to get what you want is:
order by len(STY.[Number]), STY.[Number]
This orders first by the length and then by the number itself.
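To see why the length-first ordering works, here is a small sketch in Python rather than T-SQL. The sample values are made up, and the trick only holds under the assumption stated above: every value shares the same constant prefix.

```python
# Hypothetical story numbers that all share the constant prefix "SUPP-".
story_numbers = ["SUPP-33", "SUPP-4", "SUPP-42", "SUPP-3", "SUPP-5"]

# Plain string sort compares character by character, so "SUPP-33" < "SUPP-4".
lexicographic = sorted(story_numbers)

# Equivalent of ORDER BY LEN(col), col: shorter strings first, ties broken by text.
by_length_then_text = sorted(story_numbers, key=lambda s: (len(s), s))
```

With a fixed prefix, a shorter string always has fewer digits, and strings of equal length compare digit by digit, which matches numeric order.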

Those are strings. Do you really expect SQL Server to identify that there is a number at character 6 in every single row in the result and, instead of ordering by character 6, pretend that, say, SUPP-5 is actually SUPP-05? If that worked for you, people who expect the opposite behavior (to treat the whole string as a string) would be complaining. The real fix is to store this information in two separate columns, since it is clearly two separate pieces of data.
In the meantime, you can hack something, like:
ORDER BY LEFT(col, 4), CONVERT(INT, SUBSTRING(col, 6, 255));
As Martin explained, this should be on the outer query, not just used to generate a ROW_NUMBER() - generating a row number alone doesn't guarantee the results will be ordered by that value. And this will only work with additional checks to ensure that every single row has a value following the dash that can be converted to an int. As soon as you have SUPP-5X this will break.

It's sorting the strings in lexicographic order. To get numerical ordering you need to extract the number from the string (with substring()) and cast it to an integer.
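The extract-and-cast idea can be sketched outside SQL as well. The following Python snippet is only an illustration: the function name and sample values are hypothetical, and it assumes every value looks like PREFIX-number.

```python
def story_sort_key(value: str) -> tuple[str, int]:
    """Split 'PREFIX-123' into ('PREFIX', 123) so the numeric part sorts as a number."""
    prefix, _, number = value.partition("-")
    # int() raises ValueError for values such as 'SUPP-5X', which is the
    # same failure mode the CONVERT(INT, ...) approach has.
    return prefix, int(number)


# Hypothetical sample values with two different prefixes.
story_numbers = ["SUPP-33", "SUPP-4", "SUPP-42", "DEV-10", "DEV-9"]
ordered = sorted(story_numbers, key=story_sort_key)
```

Sorting on the (prefix, number) pair is the in-memory analogue of ordering by two separate columns.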

Related

Is there a way you can produce an output like this in T-SQL

I have a column whose values I translate using a CASE expression, and I get numbers like the ones below. There are multiple columns I need to produce this kind of result for; this is just one of them.
How do you produce the whole output shown below?
12 is the count of the numbers, from top to bottom.
49 is the sum.
4.08 is the average, 49/12.
1 is how many 1's there are in the output list above. As you can see, there is only one 1 in the output above.
8.33% is that count as a percentage: 1/12 * 100.
And so on. Is there a way to produce the output below?
drop table test111
create table test111
(
Q1 nvarchar(max)
);
INSERT INTO TEST111(Q1)
VALUES('Strongly Agree')
,('Agree')
,('Disagree')
,('Strongly Disagree')
,('Strongly Agree')
,('Agree')
,('Disagree')
,('Neutral');
SELECT
CASE WHEN [Q1] = 'Strongly Agree' THEN 5
WHEN [Q1] = 'Agree' THEN 4
WHEN [Q1] = 'Neutral' THEN 3
WHEN [Q1] = 'Disagree' THEN 2
WHEN [Q1] = 'Strongly Disagree' THEN 1
END AS 'Test Q1'
FROM test111
I have to make a few assumptions here, but it looks like you want to treat an output column like a column in a spreadsheet. You have 12 numbers. You then have a blank "separator" row. Then a row with the number 12 (which is the count of how many numbers you have). Then a row with the number 49, which is the sum of those 12 numbers. Then the 4.08 row, which is roughly the average, and so on.
Some of these outputs can be provided by cube or rollup, but neither is a complete solution.
If you wanted to produce this output directly from TSQL, you would need to have multiple select statements and combine the results of all of those statements using union all. First you would have a select just to get the numbers. Then you would have a second select which outputs a "blank". Then another select which is providing a count. Then another select which is providing a sum. And so on.
You would also no longer be able to output actual numbers, since a "blank" is not a number. Visually it's best represented as an empty string. But now your output column has to be of datatype char or varchar.
You also have to make sure rows come out in the correct order for presentation. So you need a column to order by. You would have to add some kind of ordering column "manually" to each of the select statements, so when you union them all together you can tell SQL in what order the output should be provided.
So the answer to "can it be done?" is technically "yes". But if you think this seems like a whole lot of laborious and inefficient TSQL work, you'd be right.
The real solution here is to change your approach. SQL should not be concerned with "output formatting". What you should do is just return the actual data (your 12 numbers) from SQL, and then do all of the additional presentation (like adding a blank row, adding a count row, etc), in the code of the program that is calling SQL to get that data.
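As a rough sketch of what that calling code could look like, here is a Python version. The function name is hypothetical, and the values are what the CASE expression would return for the eight sample rows inserted into test111 in the question (not the twelve values from the asker's screenshot).

```python
# Values the CASE expression yields for the test111 sample rows, in insert order.
scores = [5, 4, 2, 1, 5, 4, 2, 3]


def build_report_column(values: list[int]) -> list[str]:
    """Return the raw values followed by a blank row and the summary rows, as strings."""
    count = len(values)  # assumes at least one value; an empty list would divide by zero
    total = sum(values)
    ones = values.count(1)
    summary = [
        "",                            # blank separator row
        str(count),                    # how many values there are
        str(total),                    # sum of the values
        f"{total / count:.2f}",        # average
        str(ones),                     # how many 1s
        f"{ones / count * 100:.2f}%",  # share of 1s as a percentage
    ]
    return [str(v) for v in values] + summary


report = build_report_column(scores)
```

The query then only has to return the plain numbers, and every presentation decision (blank row, rounding, the percent sign) lives in one place.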
I must say, this is one of the strangest T-SQL requirements I've seen, and is really best left to the presentation layer.
It is possible using GROUPING SETS though. We can use it to get an extra rollup row that aggregates the whole table.
Once you have the rollup, you need to unpivot the totalled row (identified by GROUPING() = 1) to get your final result. We can do this using CROSS APPLY.
This is impossible without a row-identifier. I have added ROW_NUMBER, but any primary or unique key will do.
WITH YourTable AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rn,
CASE WHEN [Q1] = 'Strongly Agree' THEN 5
WHEN [Q1] = 'Agree' THEN 4
WHEN [Q1] = 'Neutral' THEN 3
WHEN [Q1] = 'Disagree' THEN 2
WHEN [Q1] = 'Strongly Disagree' THEN 1
END AS TestQ1
FROM test111
),
RolledUp AS (
SELECT
rn,
TestQ1,
grouping = GROUPING(TestQ1),
count = COUNT(*),
sum = SUM(TestQ1),
avg = AVG(TestQ1 * 1.0),
one = COUNT(CASE WHEN TestQ1 = 1 THEN 1 END),
onePct = COUNT(CASE WHEN TestQ1 = 1 THEN 1 END) * 1.0 / COUNT(*)
FROM YourTable
GROUP BY GROUPING SETS(
(rn, TestQ1),
()
)
)
SELECT v.TestQ1
FROM RolledUp r
CROSS APPLY (
SELECT r.TestQ1, 0 AS ordering
WHERE r.grouping = 0
UNION ALL
SELECT v.value, v.ordering
FROM (VALUES
(NULL , 1),
(r.count , 2),
(r.sum , 3),
(r.avg , 4),
(r.one , 5),
(r.onePct, 6)
) v(value, ordering)
WHERE r.grouping = 1
) v
ORDER BY
v.ordering,
r.rn;

How can I apply a logical OR to a BINARY column so the result is all the correct values?

I have a set of results that are stored in a binary(12) column. I'm looking for all the flags that have been set at various times for a particular condition. It goes something like
Status
Flags
1234
0x000000000000000000002000
5678
0x000000000000000000000000
1234
0x000000000000000000000040
What I would like to do is write a query such as
SELECT Status, OR(Flags)
FROM StatusTable
GROUP BY Status
giving the result
Status
OR(Flags)
1234
0x000000000000000000002040
5678
0x000000000000000000000000
I can find examples that let me manually OR two values but nothing that applies an OR to a result column. I've greatly simplified the example but we're talking thousands of values with thousands of statuses (mostly 0x000000000000000000000000) making it impractical to manually OR them. I suppose a function could be used and a cursor to loop each one but surely there's an out of the box solution to this?
Writing this in TSQL would be hard. If you cannot fix the underlying design, you can write a .NET CLR user-defined aggregate.
C# has a bitwise OR operator (see "Bitwise and shift operators").
You can follow this guide to write the CLR function: CLR User-Defined Aggregates
T-SQL is pretty bad at bit-twiddling, but it does have & and |. You need to break out each flag, aggregate it over the table, then pack it back in.
This query is not made any easier by the lack of binary STRING_AGG.
SELECT
[Status],
CAST(SUM(CASE WHEN bytePos = 1 THEN val END) AS binary(4)) +
CAST(SUM(CASE WHEN bytePos = 5 THEN val END) AS binary(4)) +
CAST(SUM(CASE WHEN bytePos = 9 THEN val END) AS binary(4)) AS OrFlags
FROM (
SELECT [Status], v1.bytePos,
MAX(CASE WHEN CAST(SUBSTRING(Flags, v1.bytePos, 4) AS int) & v2.bitt <> 0 THEN v2.bitt ELSE 0 END) AS val
FROM #StatusTable
CROSS JOIN (VALUES(1),(5),(9)) v1(bytePos)
CROSS JOIN (VALUES
(1),(2),(4),(8),(16),(32),(64),(128),(256),(512),(1024),(2048),(4096),(8192),(16384),(32768),(65536),(131072),(262144),(524288),(1048576),(2097152),(4194304),(8388608),(16777216),(33554432),(67108864),(134217728),(268435456),(536870912),(1073741824),(-2147483648)
) v2(bitt)
GROUP BY [Status], v1.bytePos, v2.bitt
) t
GROUP BY Status
Steps are as follows:
Take the base table, cross join 1,5,9 starting byte numbers for each 32-bit Integer.
Cross-join again all bit-flags
Group up by Status and position/flag
Select out the Status, Integer position, and whether any value has this flag set
Finally conditionally sum up the flags and convert results back into binary
If you want to effect an AND aggregation change MAX to MIN.
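For comparison, this is the aggregation the query is emulating, written as a short Python sketch. It is not a T-SQL or CLR implementation; the function name is hypothetical and the rows are the three sample rows from the question.

```python
from collections import defaultdict

# Sample rows from the question: (Status, Flags), with Flags as 12 raw bytes.
rows = [
    (1234, bytes.fromhex("00" * 10 + "2000")),
    (5678, bytes.fromhex("00" * 12)),
    (1234, bytes.fromhex("00" * 10 + "0040")),
]


def or_flags_by_status(rows, width=12):
    """Bitwise-OR all Flags values that share a Status."""
    combined = defaultdict(int)
    for status, flags in rows:
        # Treat the whole binary value as one big-endian integer and OR it in.
        combined[status] |= int.from_bytes(flags, "big")
    return {status: value.to_bytes(width, "big") for status, value in combined.items()}


result = or_flags_by_status(rows)
```

A CLR user-defined aggregate would carry the same running OR as its state; the T-SQL query has to split the value into three 32-bit integers only because it has no wide integer type to OR directly.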

Is there a good expression for the maximum character value in an SQL collation?

I'm building a simple SQL report that assembles possible product titles from several tables and displays unnamed products last. We have a setup that allows individual locations to override the product title from a master product table, and I thought I could do something like
SELECT a.ProdCode, COALESCE(a.ProdNameOverride, m.ProdName, '') AS ProdName
FROM ProdInventory a
INNER JOIN MasterProdTable m ON a.ProdCode = m.ProdCode
WHERE a.ProdLocation = @ReportProdLocation
ORDER BY COALESCE(a.ProdNameOverride, m.ProdName, char(255)) ASC, a.ProdCode ASC
because, hey, char(255) has to sort after all of the other possible ASCII characters, right?
Well, no. It's a diacritical Y, which in the standard (SQL_Latin1_General_CP1_CI_AS) collation gets sorted before Z.
I eventually just resorted to brute force, finding that char(254) sorted after the conventional alphanumerics, but that got me curious - is there a reliable way to assign something "the last possible value in the relevant collation"?
If you want to find the last character in a varchar collation, you can just create a table with all possible characters and sort it, e.g.:
declare @chars table
(
CodePoint binary(1) primary key,
Character char(1) collate Arabic_CI_AI_KS_WS
)
declare @codePoint binary(1) = 0x0
while (@codePoint < 255)
begin
insert into @chars(CodePoint,Character)
values (@codePoint, cast(@codePoint as char(1)));
set @codePoint += 1;
end
Select *
from @chars
order by Character
Even if it were reliable, sort by what you actually want last.
This report is assuming nobody could ever enter a product override that began with the last possible character. You can't be sure of that, even if char(254) doesn't show up on conventional keyboards.
If you want products to appear last when they have a NULL override and main product name, construct a sort using CASE WHEN around that condition, as:
SELECT a.ProdCode, COALESCE(a.ProdNameOverride, m.ProdName, '') AS ProdName
FROM ProdInventory a
INNER JOIN MasterProdTable m ON a.ProdCode = m.ProdCode
WHERE a.ProdLocation = @ReportProdLocation
ORDER BY CASE WHEN a.ProdNameOverride IS NULL AND m.ProdName IS NULL THEN 1 ELSE 0 END ASC,
COALESCE(a.ProdNameOverride, m.ProdName) ASC, a.ProdCode ASC
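The same idea, sorting on an explicit "is unnamed" flag instead of a sentinel character, can be sketched in Python. The rows and the function name are hypothetical, None stands in for NULL, and Python compares strings by code point rather than by a SQL collation, so only the flag logic carries over.

```python
# Hypothetical rows: (ProdCode, ProdNameOverride, master ProdName); None stands for NULL.
products = [
    ("P3", None, None),
    ("P1", None, "Widget"),
    ("P2", "ÿ Special", "Gadget"),
    ("P4", None, None),
]


def sort_key(row):
    code, override, master = row
    name = override if override is not None else master
    # The first element mirrors the CASE expression: True (1) for unnamed
    # products, False (0) otherwise, so unnamed rows always come last.
    return (name is None, name or "", code)


ordered_codes = [row[0] for row in sorted(products, key=sort_key)]
```

Even a name that starts with a very late character such as "ÿ" still sorts ahead of the unnamed products, because the flag is compared before the name.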

SQL to split a column values into rows in Netezza

I have data in the below way in a column. The data within the column is separated by two spaces.
4EG C6CC C6DE 6MM C6LL L3BC C3
I need to split it into rows as shown below. I tried using REGEXP_SUBSTR to do it, but it looks like it's not in the SQL toolkit. Any suggestions?
1. 4EG
2. C6CC
3. C6DE
4. 6MM
5. C6LL
6. L3BC
7. C3
This has been answered here: http://nz2nz.blogspot.com/2016/09/netezza-transpose-delimited-string-into.html?m=1
Please note the comment at the bottom about the best-performing way to use the array functions. I have measured the use of regexp_extract_all_sp() versus repeated regex matches, and the benefit can be quite large.
The examples from nz2nz.blogspot.com are hard to follow. I was able to piece together this method:
with
n_rows as (--update on your end
select row_number() over(partition by 1 order by some_field) as seq_num
from any_table_with_more_rows_than_delimited_values
)
, find_values as ( -- fake data
select 'A' as id, '10,20,30' as orig_values
union select 'B', '5,4,3,2,1'
)
select
id,
seq_num,
orig_values,
array_split(orig_values, ',') as array_list,
get_value_varchar(array_list, seq_num) as value
from
find_values
cross join n_rows
where
seq_num <= regexp_match_count(orig_values, ',') + 1 -- one row for each value in list
order by
id,
seq_num
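What the query produces is one row per list element, numbered from 1 within each id. A minimal Python sketch of that output shape follows; the function name is hypothetical, the data is the same fake data as above, and this illustrates the result rather than the Netezza functions themselves.

```python
def split_to_rows(row_id: str, delimited: str, delimiter: str = ","):
    """Yield (id, seq_num, value) for each element, numbering from 1."""
    for seq_num, value in enumerate(delimited.split(delimiter), start=1):
        yield row_id, seq_num, value


# Same fake data as the find_values CTE above.
source = [("A", "10,20,30"), ("B", "5,4,3,2,1")]
rows = [r for row_id, values in source for r in split_to_rows(row_id, values)]
```

For the data in the question, the delimiter argument would be the two-space separator instead of a comma.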

Oracle: Select values in date range with days where value is missing

I want to select values from table in range.
Something like this:
SELECT
date_values.date_from,
date_values.date_to,
sum(values.value)
FROM values
inner join date_values on values.id_date = date_values.id
inner join date_units on date_values.id_unit = date_units.id
WHERE
date_values.date_from >= '14.1.2012' AND
date_values.date_to <= '30.1.2012' AND
date_units.id = 4
GROUP BY
date_values.date_from,
date_values.date_to
ORDER BY
date_values.date_from,
date_values.date_to;
But this query only gives me back the day ranges that have a value, like this:
14.01.12 15.01.12 66
15.01.12 16.01.12 4
17.01.12 18.01.12 8
...etc
(The range 16.01.12 to 17.01.12 is missing here.)
But I want to select the ranges with missing values too, like this:
14.01.12 15.01.12 66
15.01.12 16.01.12 4
16.01.12 17.01.12 0
17.01.12 18.01.12 8
...etc
I can't use PL/SQL. If you can suggest a more general solution that I can extend to hours, months and years, that would be great.
I'm going to assume you're providing date_from and date_to. If so, you can generate your list of dates first and then join to it to get the remainder of your result. Alternatively, you can union this query with your date_values table; because union performs a distinct, this removes any duplicated rows.
If this is how the list of dates is generated:
select to_date('14.1.2012','dd.mm.yyyy') + level - 1 as date_from
, to_date('14.1.2012','dd.mm.yyyy') + level as date_to
from dual
connect by level <= to_date('30.1.2012','dd.mm.yyyy')
- to_date('14.1.2012','dd.mm.yyyy')
Your query might become
with the_dates as (
select to_date('14.1.2012','dd.mm.yyyy') + level - 1 as date_from
, to_date('14.1.2012','dd.mm.yyyy') + level as date_to
from dual
connect by level <= to_date('30.1.2012','dd.mm.yyyy')
- to_date('14.1.2012','dd.mm.yyyy')
)
SELECT
the_dates.date_from,
the_dates.date_to,
nvl(sum(values.value), 0) -- 0 for ranges that have no data
FROM the_dates
-- outer joins keep every generated date range, even without matching rows
left outer join date_values
on the_dates.date_from = date_values.date_from
and date_values.id_unit = 4 -- same filter as date_units.id = 4, kept in the join so it does not discard empty ranges
left outer join values on values.id_date = date_values.id
GROUP BY
the_dates.date_from,
the_dates.date_to
ORDER BY
the_dates.date_from,
the_dates.date_to;
The with syntax is known as sub-query factoring and isn't really needed in this case but it makes the code cleaner.
I've also assumed that the date columns in date_values are, well, dates. It isn't obvious as you're doing a string comparison. You should always explicitly convert to a date where applicable and you should always store a date as a date. It saves a lot of hassle in the long run as it's impossible for things to be input incorrectly or to be incorrectly compared.
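The approach of generating every range first and then attaching whatever data exists can also be sketched in Python. The function name is hypothetical, the totals come from the sample output in the question, and the step is one day; passing a different timedelta (or a month-stepping helper) would be how to extend it to other units.

```python
from datetime import date, timedelta


def fill_daily_ranges(start: date, end: date, totals: dict[date, int]):
    """Return (date_from, date_to, total) for every day in [start, end), using 0 where no data exists."""
    step = timedelta(days=1)
    result = []
    current = start
    while current < end:
        # totals.get(..., 0) plays the role of the outer join plus NVL(..., 0).
        result.append((current, current + step, totals.get(current, 0)))
        current += step
    return result


# Totals keyed by date_from, taken from the sample output in the question.
totals = {date(2012, 1, 14): 66, date(2012, 1, 15): 4, date(2012, 1, 17): 8}
filled = fill_daily_ranges(date(2012, 1, 14), date(2012, 1, 18), totals)
```

The generated list of ranges drives the result, so a day without data still appears with a total of 0.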

Resources