How to find the max value in a row with changing loop bounds

I have a dataset which has 12 different values for each ID, plus start and end values. What I want to do is take the start value as the beginning bound of a loop and the end value as the last bound, search through the values in that range to find their maximum, and then search through the values again (with respect to the start and end values) to find the longest consecutive occurrence of that maximum.
Below I posted an example dataset:
create table #sample_data(
ID VARCHAR(10), val1 INT, val2 INT, val3 INT, val4 INT, val5 INT,
val6 INT, val7 INT, val8 INT, val9 INT, val10 INT, val11 INT, val12 INT,
startValue INT, endValue INT );
insert into #sample_data values
(1001,3,2,1,0,1,2,3,0,0,0,0,0,1,7),
(1002,1,2,3,4,0,0,0,1,2,3,0,0,1,12),
(1003,0,3,2,1,0,0,0,3,3,0,0,0,1,12),
(1004,0,1,2,4,4,0,0,0,0,0,0,0,3,9),
(1005,1,2,2,1,0,0,2,2,2,1,0,0,1,8);
The result I expect for ID=1001 (start=1, end=7): the max value is 3, and it occurs 2 times, but they aren't consecutive, therefore the final output I'd like to get is 1.
For ID=1002 (start=1, end=12): the max is 4 and it occurs only 1 time, so the final output should be 1.
For ID=1003 (start=1, end=12): the max is 3; it occurs three times, but only 2 of them are consecutive, therefore I expect to get 2.
For ID=1004 (start=3, end=9): the max is 4; it occurs two times consecutively, therefore the output should be 2.
For ID=1005 (start=1, end=8): the max is 2; it occurs 5 times in total, in runs of 2 and 3, and I expect to get 3 as my final output since that is the longest.

If I understand the question correctly, the result for the row with ID 1005 should be 2 and not 3, because the max value (which is 2) appears consecutively in places 2,3 and then again in places 7,8,9 - but the endValue of that row is 8, and therefore the longer consecutive run should not be counted.
Based on that understanding (which might be incorrect, hence the comment I've written on the question), this can be done with a set-based approach (meaning, without any loops), with the help of some nice SQL tricks.
So the first thing you want to do is use CROSS APPLY with a table value constructor to convert the val1...val12 columns to rows. I guess this could also be done using PIVOT, but I never quite got the hang of PIVOT, so I prefer other solutions that get the same result.
In my code, this step is done in the first common table expression (called CTEValues).
Next, you use a trick from Itzik Ben-Gan for handling gaps-and-islands problems to identify the groups of consecutive values within each row. This step is done in the second CTE (CTEGroups).
The third and final CTE, called CTEConsecutive, uses a simple GROUP BY and COUNT to get the number of consecutive max values within each row of the original table, provided their column position is between startValue and EndValue.
The last thing to do is get the max value of that count for each id - and that should give you the desired results.
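To see why the row-number difference in CTEGroups identifies the islands, here is a standalone toy sketch (the table and values are illustrative, not part of the answer's data):

```sql
-- Toy illustration of the row-number-difference (gaps and islands) trick.
-- flag plays the role of IsMax; the values are made up for illustration.
WITH toy AS (
    SELECT *
    FROM (VALUES (1, 0), (2, 1), (3, 1), (4, 0), (5, 1), (6, 1), (7, 1)) t(pos, flag)
)
SELECT pos, flag,
       ROW_NUMBER() OVER(ORDER BY pos) -
       ROW_NUMBER() OVER(PARTITION BY flag ORDER BY pos) AS grp
FROM toy;
-- For flag = 1 the difference is constant within each consecutive run
-- (grp = 1 for positions 2-3, grp = 2 for positions 5-7), so grouping by
-- flag and grp isolates each island, and COUNT(*) gives its length.
```
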
Here's the full code:
WITH CTEValues AS
(
    SELECT ID, startValue, EndValue, Val, ValId,
           IIF(Val = MAX(Val) OVER(PARTITION BY ID), 1, 0) AS IsMax
    FROM #sample_data
    CROSS APPLY
    (
        SELECT *
        FROM (VALUES
            (Val1, 1), (Val2, 2), (Val3, 3), (Val4, 4),
            (Val5, 5), (Val6, 6), (Val7, 7), (Val8, 8),
            (Val9, 9), (Val10, 10), (Val11, 11), (Val12, 12)
        ) V(Val, ValId)
    ) vals
), CTEGroups AS
(
    SELECT ID, startValue, EndValue, Val, ValId, IsMax,
           ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ValId) -
           ROW_NUMBER() OVER(PARTITION BY ID, IsMax ORDER BY ValId) AS Grp
    FROM CTEValues
), CTEConsecutive AS
(
    SELECT ID, COUNT(Val) AS NumOfConsecutiveMaxValues
    FROM CTEGroups
    WHERE IsMax = 1
      AND ValId >= startValue
      AND ValId <= EndValue
    GROUP BY ID, Grp
)
SELECT ID, MAX(NumOfConsecutiveMaxValues) AS NumOfConsecutiveMaxValues
FROM CTEConsecutive
GROUP BY ID
ORDER BY ID
You can see a live demo on rextester.
If, however, I'm wrong in my initial assumption and startValue and endValue are only relevant to the range in which to search for the max value (which would give you the expected results you've posted in the question), you will need another CTE.
WITH CTEValues AS
(
    SELECT ID, startValue, EndValue, Val, ValId
    FROM #sample_data
    CROSS APPLY
    (
        SELECT *
        FROM (VALUES
            (Val1, 1), (Val2, 2), (Val3, 3), (Val4, 4),
            (Val5, 5), (Val6, 6), (Val7, 7), (Val8, 8),
            (Val9, 9), (Val10, 10), (Val11, 11), (Val12, 12)
        ) V(Val, ValId)
    ) vals
), CTEValuesWithMax AS
(
    SELECT ID, startValue, EndValue, Val, ValId,
           IIF(Val = (
               SELECT MAX(Val)
               FROM CTEValues AS T1
               WHERE T0.ID = T1.ID
                 AND T1.ValId >= T1.startValue
                 AND T1.ValId <= T1.EndValue
           ), 1, 0) AS IsMax
    FROM CTEValues AS T0
)
The rest of the code remains the same, except that CTEGroups now selects from CTEValuesWithMax instead of from CTEValues.
You can see a live demo of this as well.


Count 0's between 1's - SQL

I need a query or function to count the 0's between 1's in a string.
For example:
String1 = '10101101' -> Result=3
String2 = '11111001101' -> Result=1
String3 = '01111111111' -> Result=1
I only need to search for the 101 pattern, or the 01 pattern if it's at the beginning of the string.
You may try to decompose the input strings using SUBSTRING() and a number table:
SELECT
String, COUNT(*) AS [101Count]
FROM (
SELECT
v.String,
SUBSTRING(v.String, t.No - 1, 1) AS PreviousChar,
SUBSTRING(v.String, t.No, 1) AS CurrentChar,
SUBSTRING(v.String, t.No + 1, 1) AS NextChar
FROM (VALUES
('10101101'),
('11111001101'),
('01111111111')
) v (String)
CROSS APPLY (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) t (No)
) cte
WHERE
CASE WHEN PreviousChar = '' THEN '1' ELSE PreviousChar END = '1' AND
CurrentChar = '0' AND
NextChar = '1'
GROUP BY String
Result:
String 101Count
10101101 3
11111001101 1
01111111111 1
Notes:
The table with alias v is the source table, the table with alias t is the number table. If the input strings have more than 10 characters, use an appropriate number (tally) table.
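A tally table does not have to be persisted; as a sketch (the alias names are illustrative), 100 sequential numbers can be generated inline by cross-joining two ten-row VALUES lists, which can replace the hard-coded list of 10 numbers above when the strings are longer:

```sql
-- Generates the numbers 1..100 as a derived tally table.
SELECT 10 * tens.n + units.n + 1 AS No
FROM       (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) units (n)
CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) tens  (n);
```
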
-- This converts "111101101010111" to "01101010" and "011101000" to "01110"
regexp_replace(field, '^1*(.*?)1*0*$', '\1')
-- This converts "01101010" to "0000"
regexp_replace(field, '1', '')
-- This counts the string length, returning 4 for '0000':
LENGTH(field)
-- Put it all together:
LENGTH(
    regexp_replace(
        regexp_replace(field, '^1*(.*?)1*0*$', '\1')
        , '1', '')
)
Different or more complicated cases require a modification of the regular expression.
Update
For "zeros between 1s" I see now you mean "101" sequences. This is more complicated because of the possibility of having "10101". Suppose you want to count this as 2:
replace 101 with 11011. Now 10101 will become either 1101101 or 1101111011. In either case, you have the "101" sequence well apart and still only have two of them.
replace all 101s with 'X'. You now have 1X11X1
replace [01] with the empty string. You now have XX.
use LENGTH to count the X's.
Any extra special sequence, like "01" at the beginning, can be converted first to "X1" ("10" at the end would become "1X"), which then folds neatly back into the workflow above.
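The steps above can be sketched as a single expression (a sketch in PostgreSQL syntax, since regexp_replace suggests a non-SQL-Server dialect; in Oracle, regexp_replace replaces all matches by default, so the 'g' flag would be dropped):

```sql
SELECT length(
         regexp_replace(                    -- 3) remove every 0 and 1, keeping only the X markers
           replace(                         -- 2) mark each non-overlapping 101 with X
             replace(s, '101', '11011'),    -- 1) space out overlapping 101s
             '101', 'X'),
           '[01]', '', 'g')
       ) AS cnt
FROM (VALUES ('10101')) v(s);  -- the overlapping case counts as 2
```
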
By using the LIKE operator with % you can decide how to search for a specific string. In this SQL query I am saying that I want every record that starts with 101 or 01:
SELECT ColumnsYouWant FROM TableYouWant
WHERE ColumnYouWant LIKE '101%' OR ColumnYouWant LIKE '01%';
You can simply COUNT the ColumnYouWant, like this:
SELECT COUNT(ColumnYouWant) FROM TableYouWant
WHERE ColumnYouWant LIKE '101%' OR ColumnYouWant LIKE '01%';
Or you can use a method of your backend language to count the results that the first query returns; that count method will depend on the language you are working with.
SQL documentation for LIKE: https://www.w3schools.com/sql/sql_like.asp
SQL documentation for COUNT: https://www.w3schools.com/sql/sql_count_avg_sum.asp
The other solutions do not account for strings of arbitrary length (they only cover the 11 characters of the longest example shown).
Data
drop table if exists #tTEST;
go
select * INTO #tTEST from (values
(1, '10101101'),
(2, '11111001101'),
(3, '01111111111')) V(id, string);
Query
;with
split_cte as (
select id, n, substring(t.string, v.n, 1) subchar
from #tTEST t
cross apply (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) v(n)
where v.n<=len(t.string)),
lead_lag_cte as (
select id, n, lead(subchar, 1, 9) over (partition by id order by n) lead_c, subchar,
lag(subchar, 1, 9) over (partition by id order by n) lag_c
from split_cte)
select id, sum(case when (lead_c=1 and lag_c=9) then 1 else
case when (lead_c=1 and lag_c=1) then 1 else 0 end end) zero_count
from lead_lag_cte
where subchar=0
group by id;
Results
id zero_count
1 3
2 1
3 1
Another way, perhaps quicker:
DECLARE #T TABLE (ID INT, STRING VARCHAR(32));
INSERT INTO #T
VALUES (1, '10101101'),
(2, '11111001101'),
(3, '01111111111');
SELECT *, LEN(STRING) - LEN(REPLACE(STRING, '0', '')) AS NUMBER_OF_ZERO
FROM #T
Result:
ID STRING NUMBER_OF_ZERO
----------- -------------------------------- --------------
1 10101101 3
2 11111001101 3
3 01111111111 1
select (len(replace('1' + x, '101', '11011'))
        - len(replace(replace('1' + x, '101', '11011'), '101', ''))) / 3
from
(
values
('10101101'),
('11111001101'),
('01111111111'),
('01010101010101010101')
) v(x);

Split one row into multiple rows in Oracle

I have a table which contains data something like this:
CREATE TABLE UDA_DATA
( uda VARCHAR2(20),
value_text VARCHAR2(4000)
);
insert into UDA_DATA values('Material_ID','PBL000129 PBL000132 PBL000130 PBL000131 PBL000133');
insert into UDA_DATA values('Material_ID','PBL000134 PBL000138 PBL000135 PBL000136 PBL000137');
insert into UDA_DATA values('Material_ID','PBL000125 PBL000128 PBL000126 PBL000124 PBL000127');
commit;
Now if we select the data from this table:
select * from UDA_DATA;
it gives a result something like this:
However, I am expecting something like this:
That is, it should break value_text into two or more rows if the character length is more than 30. Also, the uda column should have a suffix 1, 2, ..., n.
I'm not sure how to achieve this in a select query.
You could use recursive subquery factoring:
with rcte (uda, value, chunk_num, value_text) as (
select uda,
substr(value_text, 1, 30),
1,
substr(value_text, 31)
from uda_data
union all
select uda,
substr(value_text, 1, 30),
chunk_num + 1,
substr(value_text, 31)
from rcte
where value_text is not null
)
select uda || chunk_num as uda, value
from rcte;
UDA VALUE
-------------------- ----------------------------------------
Material_ID1 PBL000129 PBL000132 PBL000130
Material_ID1 PBL000134 PBL000138 PBL000135
Material_ID1 PBL000125 PBL000128 PBL000126
Material_ID2 PBL000131 PBL000133
Material_ID2 PBL000136 PBL000137
Material_ID2 PBL000124 PBL000127
The anchor member uses substr to get the first 30 characters as the value, and sets a chunk number which is always 1 for the anchor. It also gets the remains of the string after those first 30 characters have been removed, which may be null.
The recursive member does exactly the same, but works from the remainder of the string found by the previous iteration, and increments the chunk number.
Finally the main query just gets all those extracted chunks, and appends the chunk number to the uda string.
You could use a hierarchical query if there is a unique key - the data you've shown doesn't have one though.
Your sample data doesn't have anything useful to order the results by. If your real table has a unique key you can use that by including it in both branches of the recursive CTE and then adding it to the final result as
order by unique_key, chunk_num
If there isn't one, then you can get closer to your expected result by introducing a dummy key in the anchor member, e.g. using row_number() or the simpler rownum:
with rcte (rn, uda, value, chunk_num, value_text) as (
select rownum,
uda,
substr(value_text, 1, 30),
1,
substr(value_text, 31)
from uda_data
union all
select rn,
uda,
substr(value_text, 1, 30),
chunk_num + 1,
substr(value_text, 31)
from rcte
where value_text is not null
)
select uda || chunk_num as uda, value
from rcte
order by rn, chunk_num;
UDA VALUE
-------------------- ----------------------------------------
Material_ID1 PBL000129 PBL000132 PBL000130
Material_ID2 PBL000131 PBL000133
Material_ID1 PBL000134 PBL000138 PBL000135
Material_ID2 PBL000136 PBL000137
Material_ID1 PBL000125 PBL000128 PBL000126
Material_ID2 PBL000124 PBL000127

Window function behaves differently in Subquery/CTE?

I thought the following three SQL statements were semantically the same, and that the database engine would expand the second and third queries into the first one internally.
select ....
from T
where Id = 1
select *
from
(select .... from T) t
where Id = 1
select *
from
(select .... from T where Id = 1) t
However, I found the window function behaves differently. I have the following code.
-- Prepare test data
with t1 as
(
select *
from (values ( 2, null), ( 3, 10), ( 5, -1), ( 7, null), ( 11, null), ( 13, -12), ( 17, null), ( 19, null), ( 23, 1759) ) v ( id, col1 )
)
select *
into #t
from t1
alter table #t add primary key (id)
go
The following query returns all the rows.
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4)))
over (order by id
rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t
id col1 lastval
-------------------
2 NULL NULL
3 10 NULL
5 -1 10
7 NULL -1
11 NULL -1
13 -12 -1
17 NULL -12
19 NULL -12
23 1759 -12
Without CTE/subquery: I then added a condition to return just the row where id = 19.
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t
where
id = 19;
However, lastval returns null?
With CTE/subquery: now the condition is applied to the CTE:
with t as
(
select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding ), 5, 4) as int) as lastval
from
#t)
select *
from t
where id = 19;
-- Subquery
select
*
from
(select
id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from
#t) t
where
id = 19;
Now lastval returns -12 as expected?
The logical order of operations of the SELECT statement is important for understanding the results of your first example. From the Microsoft documentation, the order is, from top to bottom:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Note that the WHERE clause processing happens logically before the SELECT clause.
The query without the CTE is being filtered where id = 19. The order of operations causes the where to process before the window function in the select clause. There is only 1 row with an id of 19. Therefore, the where limits the rows to id = 19 before the window function can process the rows between unbounded preceding and 1 preceding. Since there are no rows for the window function, the lastval is null.
Compare this to the CTE. The outer query's filter has not yet been applied, so the CTE operates on all of the data. The rows between unbounded preceding finds the prior rows. The outer part of the query applies the filter to the intermediate results, returning just row 19, which already has the correct lastval.
You can think of the CTE as creating a temporary #Table with the CTE data in it. All of the data is logically processed into a separate table before returning data to the outer query. The CTE in your example creates a temporary work table with all of the rows that includes the lastval from the prior rows. Then, the filter in the outer query gets applied and limits the results to id 19.
(In reality, the CTE can shortcut and skip generating data, if it can do so to improve performance without affecting the results. Itzik Ben-Gan has a great example of a CTE that skips processing when it has returned enough data to satisfy the query.)
Consider what happens if you put the filter in the CTE. This should behave exactly like the first example query that you provided. There is only 1 row with an id = 19, so the window function does not find any preceding rows:
with t as ( select id, col1,
cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over ( order by id
rows between unbounded preceding and 1 preceding ), 5, 4) as int) as lastval
from #t
where id = 19 -- moved filter inside CTE
)
select *
from t
Window functions operate on your result set, so when you added where id = 19 your result set only had 1 row. Since your window function specifies rows between unbounded preceding and 1 preceding there was no preceding row, and resulted in null.
By using the subquery/cte you are allowing the window function to operate over the unfiltered result set (where the preceding rows exist), then retrieving only those rows from that result set where id = 19.
The queries you are comparing are not equivalent.
select id ,
(... ) as lastval
from #t
where id = 19;
will return only 1 row, so lastval will be NULL because the window function does not find a preceding row.

T-SQL: Any efficient way to insert data in between a range

In my SQL Server database, I have a table like this :
counter, value
12345, 10.1
12370, 10.5
12390, 9.7
12405, 10.1
12510, 12.3
Let's assume that I input a value of 5. I need to fill in the data between the first record and second record by increment of 5 in the counter column.
For example using Record 1 and Record 2, here are the additional data needs to be inserted into the table.
12345, 10.1 --> Record 1
12350, 10.1
12355, 10.1
12360, 10.1
12365, 10.1
12370, 10.5 --> Record 2
Other than using a database cursor to loop through each record in the table and then selecting the MIN counter after Record 1, is there any other way I can achieve this with less I/O overhead? I just need to insert additional counters within the range based on the input parameter.
Thanks for your input.
If you're wanting to compute a weighted average, there's no need to create these rows. You can just work out how many rows you would have added and use that information to calculate the average. E.g.:
declare #t table (counter int not null, value decimal(19,4) not null)
insert into #t(counter, value) values
(12345, 10.1),
(12370, 10.5),
(12390, 9.7 ),
(12405, 10.1),
(12510, 12.3)
declare #gap int
set #gap = 5
;With Numbered as (
select counter,value,ROW_NUMBER() OVER (ORDER BY counter) as rn
from #t
), Paired as (
select n1.counter,n1.value,
(n2.counter - n1.counter)/#gap as Cnt --What do we do for the last row?
from Numbered n1
left join
Numbered n2
on
n1.rn = n2.rn - 1
)
select SUM(value*COALESCE(Cnt,1))/SUM(COALESCE(Cnt,1)) from Paired
As you can (hopefully) see, I've currently decided that the last row counts as just 1, but anything else could be done there too.
Filling gaps with values is usually a problem best answered using a Numbers table (a table with a single int column containing numbers from 1 to some sufficiently large number):
declare @n1 int = 12345, @n2 int = 12370, @step int = 5
select @n1 + (n * @step)
from numbers
where n < (@n2 - @n1) / @step
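If no permanent numbers table is available, one can be generated on the fly; a minimal sketch using a recursive CTE (the 1000-row cap and the names are illustrative):

```sql
declare @n1 int = 12345, @n2 int = 12370, @step int = 5;

with numbers(n) as (
    select 1
    union all
    select n + 1 from numbers where n < 1000
)
select @n1 + (n * @step) as counter
from numbers
where n < (@n2 - @n1) / @step
option (maxrecursion 1000);  -- the default limit of 100 recursions may be too low
```
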
A recursive CTE should work as well:
;WITH
Initial AS (SELECT COUNTER,value FROM yourtable),
maxvalue AS (SELECT MAX(COUNTER) Mvalue FROM Initial),
recur AS (
SELECT COUNTER, value FROM yourtable
UNION ALL
SELECT counter+5,value FROM recur r WHERE COUNTER+5< (SELECT Mvalue FROM maxvalue)
AND NOT EXISTS (SELECT 1 FROM Initial o WHERE o.COUNTER=r.COUNTER+5)
)
SELECT * FROM recur ORDER BY COUNTER
just replace 'yourtable' with the name of your table

SQL Newbie Needs Assistance w/ Query

Below is what I am trying to do in SQL Server 2012. I want to update Table 2 with the % of total that each AMT value is to the total in Table 1, but the denominator for the % should only be the total of the rows that have the same MasterDept. I can use the SELECT query below to get the correct percentages when the table is loaded with only one MasterDept, but I do not know how to do it when there are multiple MasterDepts. The first 2 columns in each table are identical, both in structure and in the data within the columns.
SELECT ABCID,
[AMT%] = ClientSpreadData.AMT/CONVERT(DECIMAL(16,4),(SELECT SUM(ClientSpreadData.AMT)
FROM ClientSpreadData))
FROM ClientSpreadData
Table data
TABLE 1 (MasterDept varchar(4), ABCID varchar(20), AMT INT)
Sample Data (4700, 1, 25),
(4300, 2, 30),
(4700, 3, 50),
(4300, 4, 15)
TABLE 2 (MasterDept varchar(4), ABCID varchar(20), [AMT%] INT)
Sample Data (4700, 1, AMT%)
AMT% should equal AMT / SUM(AMT). SUM(AMT) should only be summing the values where the MasterDept on Table 1 matches the MasterDept from the record on Table 2.
Does that make sense?
You can use a window to get a partitioned SUM():
SELECT MasterDept, ABCID, AMT, SUM(AMT) OVER(PARTITION BY MasterDept)
FROM #Table1
You can use that to get the percentage for each row to update your second table (this assumes 1 row per MasterDept/ABCID combination):
UPDATE A
SET A.[AMT%] = B.[AMT%]
FROM Table2 A
JOIN (SELECT MasterDept
, ABCID
, AMT
, CASE WHEN SUM(AMT) OVER(PARTITION BY MasterDept) = 0 THEN 0
ELSE AMT*1.0/SUM(AMT) OVER(PARTITION BY MasterDept)
END 'AMT%'
FROM #Table1
) B
ON A.MasterDept = B.MasterDept
AND A.ABCID = B.ABCID
As you can see in the subquery, a percent of total can be added to your Table1, so perhaps you don't even need Table2 as it's a bit redundant.
Update: You can use a CASE statement to handle a SUM() of 0.
