SQL to split a column's values into rows in Netezza

I have data in the following form in a single column. The values within the column are separated by two spaces.
4EG C6CC C6DE 6MM C6LL L3BC C3
I need to split it into rows as below. I tried using REGEXP_SUBSTR, but it looks like it's not in the SQL toolkit. Any suggestions?
1. 4EG
2. C6CC
3. C6DE
4. 6MM
5. C6LL
6. L3BC
7. C3

This has been answered here: http://nz2nz.blogspot.com/2016/09/netezza-transpose-delimited-string-into.html?m=1
Please note the comment at the bottom about the best-performing way to use the array functions. I have measured the use of regexp_extract_all_sp() versus repeated regex matches, and the benefit can be quite large.

The examples from nz2nz.blogspot.com are hard to follow. I was able to piece together this method:
with
n_rows as (--update on your end
select row_number() over(partition by 1 order by some_field) as seq_num
from any_table_with_more_rows_than_delimited_values
)
, find_values as ( -- fake data
select 'A' as id, '10,20,30' as orig_values
union select 'B', '5,4,3,2,1'
)
select
id,
seq_num,
orig_values,
array_split(orig_values, ',') as array_list,
get_value_varchar(array_list, seq_num) as value
from
find_values
cross join n_rows
where
seq_num <= regexp_match_count(orig_values, ',') + 1 -- one row for each value in list
order by
id,
seq_num
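To see what the cross join is doing, here is a small Python sketch of the same logic outside the database. It is only an illustration: the function name split_to_rows is made up, and the rows mirror the fake data in the query above.

```python
def split_to_rows(rows, delimiter=","):
    """Yield (id, seq_num, value) for each delimited value, like the cross join above."""
    for row_id, orig_values in rows:
        # Mirrors array_split + get_value_varchar: seq_num is 1-based, as in the SQL
        for seq_num, value in enumerate(orig_values.split(delimiter), start=1):
            yield row_id, seq_num, value


# Same fake data as the find_values CTE
find_values = [("A", "10,20,30"), ("B", "5,4,3,2,1")]

for row in split_to_rows(find_values):
    print(row)
```

For the space-separated string in the question, the same sketch can be called with delimiter=" " (or whatever separator the column really uses).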

Related

Is there a way you can produce an output like this in T-SQL

I have a column whose values I translate using a CASE statement, and I get numbers like the ones below. There are multiple columns I need to produce this result for; this is just one of them.
How do you produce the output as a whole like this below?
12 is the count of the numbers, counting from top to bottom.
49 is the sum.
4.08 is the division 49/12, i.e. the average.
1 is how many 1's there are in the output list above. As you can see, there is only one 1 in the output above.
8.33% is that count as a percentage: 1/12 * 100.
And so on. Is there a way to produce this output below?
drop table test111
create table test111
(
Q1 nvarchar(max)
);
INSERT INTO TEST111(Q1)
VALUES('Strongly Agree')
,('Agree')
,('Disagree')
,('Strongly Disagree')
,('Strongly Agree')
,('Agree')
,('Disagree')
,('Neutral');
SELECT
CASE WHEN [Q1] = 'Strongly Agree' THEN 5
WHEN [Q1] = 'Agree' THEN 4
WHEN [Q1] = 'Neutral' THEN 3
WHEN [Q1] = 'Disagree' THEN 2
WHEN [Q1] = 'Strongly Disagree' THEN 1
END AS 'Test Q1'
FROM test111
I have to make a few assumptions here, but it looks like you want to treat an output column like a column in a spreadsheet. You have 12 numbers. You then have a blank "separator" row. Then a row with the number 12 (which is the count of how many numbers you have). Then a row with the number 49, which is the sum of those 12 numbers. Then the 4.08 row, which is roughly the average, and so on.
Some of these outputs can be provided by cube or rollup, but neither is a complete solution.
If you wanted to produce this output directly from TSQL, you would need to have multiple select statements and combine the results of all of those statements using union all. First you would have a select just to get the numbers. Then you would have a second select which outputs a "blank". Then another select which is providing a count. Then another select which is providing a sum. And so on.
You would also no longer be able to output actual numbers, since a "blank" is not a number. Visually it's best represented as an empty string. But now your output column has to be of datatype char or varchar.
You also have to make sure rows come out in the correct order for presentation. So you need a column to order by. You would have to add some kind of ordering column "manually" to each of the select statements, so when you union them all together you can tell SQL in what order the output should be provided.
So the answer to "can it be done?" is technically "yes". But if you think that seems like a whole lot of laborious and inefficient TSQL work, you'd be right.
The real solution here is to change your approach. SQL should not be concerned with "output formatting". What you should do is just return the actual data (your 12 numbers) from SQL, and then do all of the additional presentation (like adding a blank row, adding a count row, etc), in the code of the program that is calling SQL to get that data.
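As a rough illustration of doing that presentation in the calling program, here is a minimal Python sketch. It assumes the query has already returned the plain scores; the function name summary_rows and the exact formatting are my own choices, and the scores are the eight rows of the test111 sample after the CASE mapping (not the 12 numbers from the question).

```python
def summary_rows(scores):
    """Return the display column: the scores, a blank row, then the summary figures."""
    count = len(scores)
    total = sum(scores)
    ones = scores.count(1)
    return (
        [str(s) for s in scores]
        + [
            "",                            # blank separator row
            str(count),                    # how many scores there are
            str(total),                    # sum of the scores
            f"{total / count:.2f}",        # average
            str(ones),                     # how many 1s
            f"{ones / count * 100:.2f}%",  # share of 1s as a percentage
        ]
    )


# Scores for the eight sample rows in test111, after the CASE mapping
scores = [5, 4, 2, 1, 5, 4, 2, 3]
for line in summary_rows(scores):
    print(line)
```

The "and so on" rows (counts and percentages for 2, 3, 4 and 5) would follow the same pattern as the last two entries.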
I must say, this is one of the strangest T-SQL requirements I've seen, and is really best left to the presentation layer.
It is possible using GROUPING SETS though. We can use it to get an extra rollup row that aggregates the whole table.
Once you have the rollup, you need to unpivot the totalled row (identified by GROUPING() = 1) to get your final result. We can do this using CROSS APPLY.
This is impossible without a row-identifier. I have added ROW_NUMBER, but any primary or unique key will do.
WITH YourTable AS (
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rn,
CASE WHEN [Q1] = 'Strongly Agree' THEN 5
WHEN [Q1] = 'Agree' THEN 4
WHEN [Q1] = 'Neutral' THEN 3
WHEN [Q1] = 'Disagree' THEN 2
WHEN [Q1] = 'Strongly Disagree' THEN 1
END AS TestQ1
FROM test111
),
RolledUp AS (
SELECT
rn,
TestQ1,
grouping = GROUPING(TestQ1),
count = COUNT(*),
sum = SUM(TestQ1),
avg = AVG(TestQ1 * 1.0),
one = COUNT(CASE WHEN TestQ1 = 1 THEN 1 END),
onePct = COUNT(CASE WHEN TestQ1 = 1 THEN 1 END) * 1.0 / COUNT(*)
FROM YourTable
GROUP BY GROUPING SETS(
(rn, TestQ1),
()
)
)
SELECT v.TestQ1
FROM RolledUp r
CROSS APPLY (
SELECT r.TestQ1, 0 AS ordering
WHERE r.grouping = 0
UNION ALL
SELECT v.value, v.ordering
FROM (VALUES
(NULL , 1),
(r.count , 2),
(r.sum , 3),
(r.avg , 4),
(r.one , 5),
(r.onePct, 6)
) v(value, ordering)
WHERE r.grouping = 1
) v
ORDER BY
v.ordering,
r.rn;

how to select data row from a comma separated value field

My question is similar, though not identical, to this question:
How to SELECT parts from a comma-separated field with a LIKE statement
but I have not seen any answer there, so I am posting my question again.
I have the following table:
╔════════════╦═════════════╗
║ VacancyId ║ Media ║
╠════════════╬═════════════╣
║ 1 ║ 32,26,30 ║
║ 2 ║ 31, 25,20 ║
║ 3 ║ 21,32,23 ║
╚════════════╩═════════════╝
I want to select the rows where Media contains 30, 21 or 40.
So in this case the output should return the first and the third rows.
How can I do that?
I have tried Media LIKE '30', but that does not return any rows. Also, I need to search for more than one value in that field.
My database is SQL Server.
Thank you
It's never a good idea to store comma-separated values in a database. If it is feasible, use a separate table to store them, as this is most probably a 1:n relationship.
If that is not feasible, there are the following possible ways to do this.
If the number of values to match is going to stay the same, you might want to use a series of LIKE conditions combined with OR/AND, depending on your requirement.
Ex.-
WHERE
Media LIKE '%21%'
OR Media LIKE '%30%'
OR Media LIKE '%40%'
However, the above query will match every value that contains 21, so rows with values like 1210 or 210 will also be returned. To overcome this you can use the following trick, which hampers performance because it wraps the column in an expression in the WHERE clause, so the predicate is no longer SARGable.
But here it goes:
-- Declare a variable holding the value to match; for multiple values, use multiple variables.
DECLARE @valueSearch VARCHAR(10) = '21'
-- Then do the matching in the WHERE clause
WHERE
(',' + RTRIM(Media) + ',') LIKE '%,' + @valueSearch + ',%'
If the number of values to match is going to change, then you might want to look into a full-text index.
If you decide to go that way, once the full-text index is in place you can do the following to get what you want:
Ex.-
WHERE
CONTAINS(Media, '"21" OR "30" OR "40"')
The best approach I can suggest is to first split the comma-separated values into a table using this link. You will then end up with a table you can query like this:
SELECT * FROM Table
WHERE Media in('30','28')
It will surely work.
You can use this, but the performance is inevitably poor. You should, as others have said, normalise this structure.
WHERE
',' + media + ',' LIKE '%,21,%'
OR ',' + media + ',' LIKE '%,30,%'
Etc, etc...
If you are certain that any Media value containing the string 30 will be one you wish to return, you just need to include wildcards in your LIKE statement:
SELECT *
FROM Table
WHERE Media LIKE '%30%'
Bear in mind though that this would also return a record with a Media value of 298,300,302 for example, so if this is problematic for you, you'll need to consider a more sophisticated method, like:
SELECT *
FROM Table
WHERE Media LIKE '%,30,%'
OR Media LIKE '30,%'
OR Media LIKE '%,30'
OR Media = '30'
If there might be spaces in the strings (as per in your question), you'll also want to strip these out:
SELECT *
FROM Table
WHERE REPLACE(Media,' ','') LIKE '%,30,%'
OR REPLACE(Media,' ','') LIKE '30,%'
OR REPLACE(Media,' ','') LIKE '%,30'
OR REPLACE(Media,' ','') = '30'
Edit: I actually prefer Coder of Code's solution to this:
SELECT *
FROM Table
WHERE ',' + LTRIM(RTRIM(REPLACE(Media,' ',''))) + ',' LIKE '%,30,%'
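To check the reasoning behind wrapping the string in commas, here is a small Python sketch of the same test run against the rows from the question. The helper name has_media is hypothetical; it strips spaces and wraps the value in delimiters just as the query above does.

```python
def has_media(media, wanted):
    """True if any id in `wanted` appears as a whole item in the comma-separated string."""
    wrapped = "," + media.replace(" ", "") + ","  # e.g. ",31,25,20,"
    return any(f",{w}," in wrapped for w in wanted)


rows = [(1, "32,26,30"), (2, "31, 25,20"), (3, "21,32,23")]
matches = [vid for vid, media in rows if has_media(media, [30, 21, 40])]
print(matches)  # [1, 3]

# A plain substring test gives a false positive that the wrapped test avoids
print("30" in "298,300,302")            # True
print(has_media("298,300,302", [30]))   # False
```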
You mention that you wish to search for multiple strings in this field, which is also possible:
SELECT *
FROM Table
WHERE Media LIKE '%30%'
OR Media LIKE '%28%'
SELECT *
FROM Table
WHERE Media LIKE '%30%'
AND Media LIKE '%28%'
I agree that storing comma-separated values like that is not a good idea. But if you have to,
I think using an inline function will give better performance:
Select VacancyId, Media from (
Select 1 as VacancyId, '32,26,30' as Media
union all
Select 2, '31,25,20'
union all
Select 3, '21,32,23'
) asa
CROSS APPLY dbo.udf_StrToTable(Media, ',') tbl
where CAST(tbl.Result as int) in (30,21,40)
Group by VacancyId, Media
The output is:
VacancyId Media
----------- ---------
1 32,26,30
3 21,32,23
And our inline function script is:
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[udf_StrToTable]') and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].udf_StrToTable
GO
CREATE FUNCTION udf_StrToTable (@List NVARCHAR(MAX), @Delimiter NVARCHAR(1))
RETURNS TABLE
With Encryption
AS
RETURN
( WITH Split(stpos,endpos)
AS(
SELECT 0 AS stpos, CHARINDEX(@Delimiter,@List) AS endpos
UNION ALL
SELECT CAST(endpos+1 as int), CHARINDEX(@Delimiter,@List,endpos+1)
FROM Split
WHERE endpos > 0
)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) as inx,
SUBSTRING(@List,stpos,COALESCE(NULLIF(endpos,0),LEN(@List)+1)-stpos) Result
FROM Split
)
GO
This solution uses a RECURSIVE CTE to identify the position of each comma within the string then uses SUBSTRING to return all strings between the commas.
I've left some unnecessary code in place to help you get your head round what it's doing. You can strip it down to provide exactly what you need.
DROP TABLE #TMP
CREATE TABLE #TMP(ID INT, Vals CHAR(100))
INSERT INTO #TMP(ID,VALS)
VALUES
(1,'32,26,30')
,(2,'31, 25,20')
,(3,'21,32,23')
;WITH cte
AS
(
SELECT
ID
,VALS
,0 POS
,CHARINDEX(',',VALS,0) REM
FROM
#TMP
UNION ALL
SELECT ID,VALS,REM,CHARINDEX(',',VALS,REM+1)
FROM
cte c
WHERE CHARINDEX(',',VALS,REM+1) > 0
UNION ALL
SELECT ID,VALS,REM,LEN(VALS)
FROM
cte c
WHERE POS+1 < LEN(VALS) AND CHARINDEX(',',VALS,REM+1) = 0
)
,cte_Clean
AS
(
SELECT ID,CAST(REPLACE(LTRIM(RTRIM(SUBSTRING(VALS,POS+1,REM-POS))),',','') AS INT) AS VAL FROM cte
WHERE POS <> REM
)
SELECT
ID
FROM
cte_Clean
WHERE
VAL = 32
ORDER BY ID

How to select Second Last Row in mySql?

I want to retrieve the 2nd last row result and I have seen this question:
How can I retrieve second last row?
but it uses ORDER BY, which does not work in my case because the Emp_Number column also contains the number of rows and a date/time stamp, which get mixed in with the data if I use ORDER BY.
The rows 22 and 23 contain the total number of rows (excluding row 21 and 22) and the time and day it got entered respectively.
I used this query which returns the required result 21 but if this number increases it will cause an error.
SELECT TOP 1 *
FROM(
SELECT TOP 2 *
FROM DAT_History
ORDER BY Emp_Number ASC
) t
ORDER BY Emp_Number desc
Is there any way to get the 2nd last row value without using the Order By function?
There is no guarantee that the count will be returned in the one-but-last row, as there is no definite order defined. Even if those records were written in the correct order, the engine is free to return the records in any order, unless you specify an order by clause. But apparently you don't have a column to put in that clause to reproduce the intended order.
I propose these solutions:
1. Return the minimum of those values that represent positive integers
select min(Emp_Number * 1)
from DAT_history
where Emp_Number not regexp '[^0-9]'
This will obviously fail when the count is larger than the smallest employee number. But judging by the sample data, that would take a number of records that is probably not expected.
2. Count the records, ignoring the 2 aggregated records
select count(*)-2
from DAT_history
3. Relying on correct order without order by
As explained at the start, you cannot rely on the order, but if for some reason you still want to rely on this, you can use a variable to number the rows in a sub query, and then pick out the one that has been attributed the one-but-last number:
select Emp_Number * 1
from (select Emp_Number,
@rn := @rn + 1 rn
from DAT_history,
(select @rn := 0) init
) numbered
where rn = @rn - 1
The * 1 is added to convert the text to a number data type.
This is not a perfect solution. I am making some assumptions for this. Check if this could work for you.
;WITH cte
AS (SELECT emp_number,
Row_number()
OVER (
ORDER BY emp_number ASC) AS rn
FROM dat_history
WHERE Isdate(emp_number) = 0) --Omit date entries
SELECT emp_number
FROM cte
WHERE rn = 1 -- select the minimum entry, assuming it would be the count and assuming count might not exceed the emp number range of 9888000

Recursive Decaying Average in Sql Server 2012

I need to calculate a decaying average (cumulative moving?) of a set of values. The last value in the series is 50% weight, with the decayed average of all the prior series as the other 50% weight, recursively.
I came up with a CTE query that produces correct results, but it depends on a sequential row number. I'm wondering if there is a better way to do this in SQL 2012, maybe with the new windowing functions for Over(), or something like that?
In the live data, the rows are ordered by time. I can use an SQL view and ROW_NUMBER() to generate the necessary Row field for my CTE approach, but if there is a more efficient way to do this, I would like to keep this as efficient as possible.
I have a sample table with 2 columns: Row int, and Value Float. I have 6 sample data values of 1,2,3,4,4,4. The correct result should be 3.78125.
My solution is:
;WITH items AS (
SELECT TOP 1
Row, Value, Value AS Decayed
FROM Sample Order By Row
UNION ALL
SELECT v.Row, v.Value, Decayed * .5 + v.Value *.5 AS Decayed
FROM Sample v
INNER JOIN items itms ON itms.Row = v.Row-1
)
SELECT top 1 Decayed FROM items order by Row desc
This correctly produces 3.78125 with the test data. My question is: Is there a more efficient and/or simpler way to do this in SQL 2012, or is this about the only way to do it? Thanks.
One possible alternative would be
WITH T AS
(
SELECT
Value * POWER(5E-1, ROW_NUMBER()
OVER (ORDER BY Row DESC)
/* first row decays less so special cased */
-IIF(LEAD(Value) OVER (ORDER BY Row DESC) IS NULL,1,0))
as x
FROM Sample
)
SELECT SUM(x)
FROM T
Or for the updated question using 60%/40%
WITH T AS
(
SELECT IIF(LEAD(Value) OVER (ORDER BY Row DESC) IS NULL, 1,0.6)
* Value
* POWER(4E-1, ROW_NUMBER() OVER (ORDER BY Row DESC) -1)
as x
FROM Sample
)
SELECT SUM(x)
FROM T
Both of the above perform a single pass through the data and can potentially use an index on Row INCLUDE(Value) to avoid a sort.
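To convince yourself that the single-pass weighted sum matches the recursive definition, the arithmetic can be checked outside the database. The Python sketch below is only a verification aid; the function names and the new_weight parameter are my own, with 0.5 standing for the original 50%/50% split and 0.6 for the 60%/40% variant.

```python
def decayed_recursive(values, new_weight=0.5):
    """Walk the series in time order, blending each value into the running average."""
    decayed = values[0]
    for v in values[1:]:
        decayed = decayed * (1 - new_weight) + v * new_weight
    return decayed


def decayed_weighted_sum(values, new_weight=0.5):
    """Single pass: each value times its own weight, as in the windowed queries."""
    old_weight = 1 - new_weight
    total = 0.0
    # rn = 1 for the latest value, matching ROW_NUMBER() OVER (ORDER BY Row DESC)
    for rn, v in enumerate(reversed(values), start=1):
        is_first_row = rn == len(values)
        # The oldest value was never multiplied by new_weight, hence the special case
        factor = 1.0 if is_first_row else new_weight
        total += factor * v * old_weight ** (rn - 1)
    return total


sample = [1, 2, 3, 4, 4, 4]
print(decayed_recursive(sample))     # 3.78125
print(decayed_weighted_sum(sample))  # 3.78125
```

With the sample data the per-row terms for the 50% case are 2, 1, 0.5, 0.1875, 0.0625 and 0.03125, which add up to 3.78125.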

SQL SELECT Query

I have a very simple table of businesses with a column DisplayBiz, a varchar(1) that is either Y or N. I want a script that extracts data from the database, first all the "Y" rows and then all the "N" rows, for a total of ten, ordered by business name.
Is there a way to do this? I am assuming it would be something like this:
SELECT TOP 10 MemberID,
BizName
ORDER BY BizType
but this doesn't take into consideration the DisplayBiz column
Any ideas?
Many thanks..!
You can add more than one column in the ORDER BY clause:
-- ...
ORDER BY DisplayBiz DESC, BizType
Which would put Y rows first, then N rows.
This will get the first 10 alphabetical BizNames that have a 'Y' for DisplayBiz. If there are fewer than 10, it will start over at A for those with 'N'...
SELECT TOP 10 MemberID, BizName, DisplayBiz
FROM dbo.table
ORDER BY
CASE WHEN DisplayBiz = 'Y' THEN 1 ELSE 2 END,
BizName;
You could also use:
ORDER BY
DisplayBiz DESC,
BizName;
But I prefer the CASE: while it is more code, it doesn't rely on the alphabetical order of Y and N. It seems more proper to be explicit.
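The two-level ordering can be sketched in Python as a sort on a tuple key, which plays the same role as the CASE expression followed by BizName. The function name top_businesses and the sample rows are hypothetical.

```python
def top_businesses(rows, limit=10):
    """Y rows first, then N rows, each group ordered by business name."""
    ordered = sorted(
        rows,
        key=lambda r: (0 if r["DisplayBiz"] == "Y" else 1, r["BizName"]),
    )
    return ordered[:limit]


# Hypothetical sample rows
rows = [
    {"MemberID": 1, "BizName": "Zeta Cafe", "DisplayBiz": "Y"},
    {"MemberID": 2, "BizName": "Acme Tools", "DisplayBiz": "N"},
    {"MemberID": 3, "BizName": "Beta Books", "DisplayBiz": "Y"},
]
print([r["BizName"] for r in top_businesses(rows)])
# ['Beta Books', 'Zeta Cafe', 'Acme Tools']
```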

Resources