Insert entries into a new temp table after performing cell calculations - sql-server

I have a SQL Server table (source table) with an analogy column of type string. What I'm trying to achieve is to calculate the pieces (psc) and insert a new entry when the analogy is "1/1".
More specifically, when the analogy is 1/1:
take productName as it is,
set color to "white",
keep the analogy text the same ("1/1"),
divide the number of psc by 2 on both the current and the new entry.
What I've tried so far is to create a #temp table using SELECT INTO and then write a recursive query that checks the analogy before inserting a new entry into the #temp table. However, the query didn't work and got stuck.
I've used this example by Denis Lukichev, but I'm not sure that approach is suitable here. This approach by Felix Pamittan is closer to what I want, but I don't know how to integrate it into my example.
Any help or reference on how to achieve this will be appreciated.
Source table:
| productName | color | analogy | psc  |
|-------------|-------|---------|------|
| Alpha       | Gray  | 1/1     | 1000 |
| Beta        | Gray  | 1/1     | 1000 |
| Gama        | Gray  | 2/1     | 1500 |
How can I achieve the following result in a new temp table?
| productName | color | analogy | psc  |
|-------------|-------|---------|------|
| Alpha       | Gray  | 1/1     | 500  |
| Alpha       | white | 1/1     | 500  |
| Beta        | Gray  | 1/1     | 500  |
| Beta        | white | 1/1     | 500  |
| Gama        | Gray  | 2/1     | 1000 |
| Gama        | white | 2/1     | 500  |
Moreover, is there any chance of using other analogies and recalculating psc? For example, if the analogy is 2/1, it means 2 slots are for Gray and one slot is for white, so according to the analogy there will be 500 + 500 = 1000 psc for Gray and 500 psc for white.
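As a minimal sketch of that arithmetic (the variable names are illustrative, not from the question), the two parts of an 'a/b' analogy can be split like this in T-SQL:
DECLARE @analogy varchar(50) = '2/1', @psc int = 1500;
DECLARE @a int = CAST(LEFT(@analogy, CHARINDEX('/', @analogy) - 1) AS int),
        @b int = CAST(STUFF(@analogy, 1, CHARINDEX('/', @analogy), '') AS int);

-- 2 of 3 slots keep the source color, 1 of 3 slots goes to white
SELECT @psc * @a / (@a + @b) AS psc_source_color,  -- 1000 (Gray)
       @psc * @b / (@a + @b) AS psc_white;         -- 500  (white)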
UPDATE
After using the helpful suggestion from Dordi, I considered it close to a solution until I used another color.
More specifically, I've added the 'White' and 'Black' colors and the result was not as intended.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE sourceTable (
productName varchar(50),
color varchar(50),
analogy varchar(50),
psc int
);
INSERT INTO sourceTable (productName, color, analogy, psc) VALUES ('Alpha', 'Gray', '1/1',1000);
INSERT INTO sourceTable (productName, color, analogy, psc) VALUES ('Gama', 'Black', '1/2',1500);
INSERT INTO sourceTable (productName, color, analogy, psc) VALUES ('Gama', 'White', '3/0',1500);
Query 1:
SELECT t.productName,
       x.color,
       t.analogy,
       CASE x.color
           WHEN 'Gray'  THEN psc * CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) / (CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) + CAST(RIGHT(analogy,CHARINDEX('/',analogy) - 1) as int))
           WHEN 'Black' THEN psc * CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) / (CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) + CAST(RIGHT(analogy,CHARINDEX('/',analogy) - 1) as int))
           WHEN 'White' THEN psc * CAST(RIGHT(analogy,CHARINDEX('/',analogy) - 1) as int) / (CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) + CAST(RIGHT(analogy,CHARINDEX('/',analogy) - 1) as int))
       END AS psc
FROM sourceTable t
CROSS JOIN (VALUES ('Gray'),('White'),('Black')) AS x(color)
Results:
| productName | color | analogy | psc |
|-------------|-------|---------|------|
| Alpha | Gray | 1/1 | 500 |
| Alpha | White | 1/1 | 500 |
| Alpha | Black | 1/1 | 500 |
| Gama | Gray | 1/2 | 500 |
| Gama | White | 1/2 | 1000 |
| Gama | Black | 1/2 | 500 |
| Gama | Gray | 3/0 | 1500 |
| Gama | White | 3/0 | 0 |
| Gama | Black | 3/0 | 1500 |
But the preferred results are:
| productName | color | analogy | psc |
|-------------|-------|---------|------|
| Alpha | Gray | 1/1 | 500 |
| Alpha | White | 1/1 | 500 |
| Gama | Black | 1/2 | 500 |
| Gama | White | 1/2 | 1000 |
| Gama | White | 3/0 | 1500 |
| Gama | White | 3/0 | 0 |
I was thinking that CROSS JOIN (VALUES ('Gray'),('White'),('Black')) AS x(color) is the issue here; maybe it should take the colors dynamically (SELECT DISTINCT), or there should be another CASE scenario dealing with the color name.
Any thoughts?

A combination of APPLY operator and the appropriate calculations is another option:
SELECT t.productName, a.color, t.analogy, a.psc
FROM (
    SELECT
        productName,
        color,
        analogy,
        psc,
        CONVERT(int, LEFT(analogy, CHARINDEX('/', analogy) - 1)) AS analogy1,
        CONVERT(int, STUFF(analogy, 1, CHARINDEX('/', analogy), '')) AS analogy2
    FROM sourceTable
) t
CROSS APPLY (VALUES
    (t.color, ROUND(t.analogy1 * 1.0 / (t.analogy1 + t.analogy2) * t.psc, 0)),
    ('White', ROUND(t.analogy2 * 1.0 / (t.analogy1 + t.analogy2) * t.psc, 0))
) a (color, psc)
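If the result needs to land in a new temp table, as the question asks, the same query can be materialized with SELECT ... INTO (just a sketch; the temp table name #result is illustrative, not from the answer):
SELECT t.productName, a.color, t.analogy, a.psc
INTO #result   -- hypothetical temp table name
FROM (
    SELECT
        productName,
        color,
        analogy,
        psc,
        CONVERT(int, LEFT(analogy, CHARINDEX('/', analogy) - 1)) AS analogy1,
        CONVERT(int, STUFF(analogy, 1, CHARINDEX('/', analogy), '')) AS analogy2
    FROM sourceTable
) t
CROSS APPLY (VALUES
    (t.color, ROUND(t.analogy1 * 1.0 / (t.analogy1 + t.analogy2) * t.psc, 0)),
    ('White', ROUND(t.analogy2 * 1.0 / (t.analogy1 + t.analogy2) * t.psc, 0))
) a (color, psc);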

Based on your explanation, "analogy" is the distribution of psc over the colors.
Here's another approach to calculate it:
SELECT t.productName,
       x.color,
       t.analogy,
       CASE x.color
           WHEN 'Gray'  THEN psc * CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) / (CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) + CAST(RIGHT(analogy,CHARINDEX('/',analogy) - 1) as int))
           WHEN 'White' THEN psc * CAST(RIGHT(analogy,CHARINDEX('/',analogy) - 1) as int) / (CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) + CAST(RIGHT(analogy,CHARINDEX('/',analogy) - 1) as int))
       END AS psc
FROM #TEMP t
CROSS JOIN (VALUES ('Gray'),('White')) AS x(color)
EDIT
Yes, you can add a DISTINCT if you have multiple colors; your query becomes:
SELECT t.productName,
       x.color,
       t.analogy,
       CASE
           WHEN x.color = 'White' AND x.IsSource = 0 THEN psc * CAST(RIGHT(analogy,CHARINDEX('/',analogy) - 1) as int) / (CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) + CAST(RIGHT(analogy,CHARINDEX('/',analogy) - 1) as int))
           ELSE psc * CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) / (CAST(LEFT(analogy,CHARINDEX('/',analogy) - 1) as int) + CAST(RIGHT(analogy,CHARINDEX('/',analogy) - 1) as int))
       END AS psc
FROM sourceTable t
INNER JOIN (SELECT DISTINCT color AS Id, color AS color, 1 AS IsSource FROM sourceTable
            UNION ALL
            SELECT DISTINCT color AS Id, 'White' AS color, 0 AS IsSource FROM sourceTable
           ) AS x ON t.color = x.Id
Here's a dbfiddle

Related

Logic in SQL Server to create a derived column based on comparing two comma separated columns

I want to create logic in SQL Server to build a derived column based on comparing two comma-separated columns.
Sample table data -
Create table ##table1 (ID INT Identity Primary Key, FulfillmentChannelStatus varchar(255),RoleAlternateSourcingChannel varchar (255))
insert into ##table1 values ('Filled,Open,In-process','Internal,Recruiter,Contractor')
,('Open,In-process,New','Contractor,Internal,Recruiter')
,('New,Filled','Contractor,Recruiter ')
,('Filled','Recruiter')
,('Open,New,Filled','Internal,Recruiter,Contractor')
,('Filled,Filled,Filled','Internal,Contractor,Recruiter')
,('Open ,Filled, In-proces','Contractor,Internal,Recruiter')
,('Filled','Others')
,('Cancelled,Filled','Contractor,Recruiter')
,('Cancelled, Filled, Cancel - In Process','Contractor,Recruiter,Internal')
Logic for the new column:
--select * from ##tble
DECLARE @separator CHAR(1) = ','

SELECT
    [Role Id], [RoleAlternateSourcingChannel], [FulfillmentChannelStatus], [Filled fulfil] = x.value('(/root/r[sql:column("t.pos")]/text())[1]', 'VARCHAR(10)')
INTO ##temp
FROM ##tble
CROSS APPLY (SELECT x = TRY_CAST('<root><r><![CDATA[' +
        REPLACE([FulfillmentChannelStatus], @separator, ']]></r><r><![CDATA[') +
        ']]></r></root>' AS XML)
    .query('
        for $x in /root/r[text()="Filled"][1]
        return count(root/r[. << $x]) + 1
    ').value('text()[1]','INT')) AS t(pos)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
        REPLACE([RoleAlternateSourcingChannel], @separator, ']]></r><r><![CDATA[') +
        ']]></r></root>' AS XML)) AS t2(x)
Scenario: I have two comma-separated columns
1. I need to calculate values only for "Filled" entries (taking the value from the [RoleAlternateSourcingChannel] column).
2. In the 1st row I have a Filled value matching Internal in [RoleAlternateSourcingChannel], so the output column will be Internal.
3. In the 2nd row I don't have any Filled, so the output will be NULL.
4. In the 3rd row I have a Filled value for Recruiter, so the output will be Recruiter.
And so on...
5. In row 6 all values are Filled, so the output will be Recruiter, because of the preference Recruiter > Internal > Contractor.
Other than Recruiter/Internal/Contractor, all Filled values will produce NULL.
The position of the Filled value is not fixed; it can be at the 1st, 2nd, or 3rd position.
Expected output:
+----+------------------------------------------+-------------------------------+---------------+
| ID | FulfillmentChannelStatus                 | RoleAlternateSourcingChannel  | Filled fulfil |
+----+------------------------------------------+-------------------------------+---------------+
| 1  | Filled,Open,In-process                   | Internal,Recruiter,Contractor | Internal      |
| 2  | Open,In-process,New                      | Contractor,Internal,Recruiter | NULL          |
| 3  | New,Filled                               | Contractor,Recruiter          | Recruiter     |
| 4  | Filled                                   | Recruiter                     | Recruiter     |
| 5  | Open,New,Filled                          | Internal,Recruiter,Contractor | Contractor    |
| 6  | Filled,Filled,Filled                     | Internal,Contractor,Recruiter | Recruiter     |
| 7  | Open ,Filled, In-process                 | Contractor,Internal,Recruiter | Internal      |
| 8  | Filled                                   | Others                        | NULL          |
| 9  | Cancelled, Filled, Cancel - In Procecess | Contractor,Internal,Recruiter | Internal      |
| 10 | Cancelled, Filled                        | Internal,Recruiter            | Recruiter     |
+----+------------------------------------------+-------------------------------+---------------+
**Question:** I tried Query2. For all other cases it is working fine now, but for rows 9 and 10 the output is NULL when it should be Internal and Recruiter respectively.
A minimal reproducible example for #1-4 is not provided.
Shooting from the hip.
Please try the following solution based on XQuery.
The XML and XQuery data model is based on ordered sequences, which is exactly what we need.
You moved the goalposts in the middle of the game.
I made just "Recruiter", "Internal", and "Contractor" legitimate values for the RoleAlternateSourcingChannel column. Everything else is filtered out.
I don't see any easy way to handle the Recruiter > Internal > Contractor preference for row #6.
SQL #1
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, FulfillmentChannelStatus VARCHAR(255), RoleAlternateSourcingChannel VARCHAR(255));
INSERT INTO @tbl (FulfillmentChannelStatus, RoleAlternateSourcingChannel) VALUES
('Filled,Open,In-process', 'Internal,Recruiter,Contractor'),
('Open,In-process,New', 'Contractor,Internal,Recruiter'),
('New,Filled', 'Contractor,Recruiter'),
('Filled', 'Recruiter'),
('Open,New,Filled', 'Internal,Recruiter,Contractor'),
('Filled,Filled,Filled', 'Internal,Contractor,Recruiter'),
('Open,Filled,In-process', 'Contractor,Internal,Recruiter'),
('Filled', 'Others');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = ',';

SELECT tbl.*
     , Result = x.value('(/root/r[sql:column("t.pos")]/text())[1]', 'VARCHAR(10)')
FROM @tbl AS tbl
CROSS APPLY (SELECT x = TRY_CAST('<root><r><![CDATA[' +
        REPLACE(FulfillmentChannelStatus, @separator, ']]></r><r><![CDATA[') +
        ']]></r></root>' AS XML)
    .query('
        if (count(/root/r[text()="Filled"]) eq 1) then
            for $x in /root/r[text()="Filled"]
            return count(root/r[. << $x]) + 1
        else ()
    ').value('text()[1]','INT')) AS t(pos)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
        REPLACE(RoleAlternateSourcingChannel, @separator, ']]></r><r><![CDATA[') +
        ']]></r></root>' AS XML).query('<root>
        {
            for $x in /root/r[text()=("Recruiter","Internal","Contractor")]
            return $x
        }
        </root>
    ')) AS t2(x);
Output
+----+--------------------------+-------------------------------+------------+
| ID | FulfillmentChannelStatus | RoleAlternateSourcingChannel | Result |
+----+--------------------------+-------------------------------+------------+
| 1 | Filled,Open,In-process | Internal,Recruiter,Contractor | Internal |
| 2 | Open,In-process,New | Contractor,Internal,Recruiter | NULL |
| 3 | New,Filled | Contractor,Recruiter | Recruiter |
| 4 | Filled | Recruiter | Recruiter |
| 5 | Open,New,Filled | Internal,Recruiter,Contractor | Contractor |
| 6 | Filled,Filled,Filled | Internal,Contractor,Recruiter | NULL |
| 7 | Open,Filled,In-process | Contractor,Internal,Recruiter | Internal |
| 8 | Filled | Others | NULL |
+----+--------------------------+-------------------------------+------------+
SQL #2
DB fiddle
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, FulfillmentChannelStatus VARCHAR(255), RoleAlternateSourcingChannel VARCHAR(255));
INSERT INTO @tbl (FulfillmentChannelStatus, RoleAlternateSourcingChannel) VALUES
('Filled,Open,In-process', 'Internal,Recruiter,Contractor'),
('Open,In-process,New', 'Contractor,Internal,Recruiter'),
('New,Filled', 'Contractor,Recruiter'),
('Filled', 'Recruiter'),
('Open,New,Filled', 'Internal,Recruiter,Contractor'),
('Filled,Filled,Filled', 'Internal,Contractor,Recruiter'),
('Open,Filled,In-process', 'Contractor,Internal,Recruiter'),
('Filled', 'Others'),
('Cancelled,Filled', 'Contractor,Recruiter'),
('Cancelled, Filled, Cancel - In Process', 'Contractor,Recruiter,Internal');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = ',';

;WITH rs AS
(
    SELECT ID, x
    FROM @tbl
    CROSS APPLY (SELECT TRY_CAST('<root>' +
        '<source><r><![CDATA[' + REPLACE(REPLACE(FulfillmentChannelStatus, SPACE(1), ''), @separator, ']]></r><r><![CDATA[') +
        ']]></r></source>' +
        '<target><r><![CDATA[' + REPLACE(REPLACE(RoleAlternateSourcingChannel, SPACE(1), ''), @separator, ']]></r><r><![CDATA[') +
        ']]></r></target>' +
        '</root>' AS XML).query('<root>
        {
            for $x in /root/source/r
            let $pos := count(root/source/r[. << $x]) + 1
            return <r>
                <s>{data($x)}</s><t>{data(/root/target/r[$pos])}</t>
            </r>
        }
        </root>')) AS t(x)
), cte AS
(
    SELECT ID
         , c.value('(s/text())[1]', 'VARCHAR(30)') AS source
         , c.value('(t/text())[1]', 'VARCHAR(30)') AS [target]
    FROM rs
    CROSS APPLY x.nodes('/root/r') AS t(c)
), cte2 AS
(
    SELECT *
         , ROW_NUMBER() OVER (PARTITION BY ID ORDER BY
               CASE [target]
                   WHEN 'Recruiter' THEN 1
                   WHEN 'Internal' THEN 2
                   WHEN 'Contractor' THEN 3
               END) AS seq
    FROM cte
    WHERE source = 'Filled'
      AND [target] IN ('Recruiter','Internal','Contractor')
)
SELECT t.*
     , c.[target] --, c.seq
FROM @tbl AS t
LEFT OUTER JOIN cte2 AS c ON c.ID = t.ID
WHERE c.seq = 1 OR c.seq IS NULL
ORDER BY t.ID;
Output
+----+----------------------------------------+-------------------------------+------------+
| ID | FulfillmentChannelStatus | RoleAlternateSourcingChannel | target |
+----+----------------------------------------+-------------------------------+------------+
| 1 | Filled,Open,In-process | Internal,Recruiter,Contractor | Internal |
| 2 | Open,In-process,New | Contractor,Internal,Recruiter | NULL |
| 3 | New,Filled | Contractor,Recruiter | Recruiter |
| 4 | Filled | Recruiter | Recruiter |
| 5 | Open,New,Filled | Internal,Recruiter,Contractor | Contractor |
| 6 | Filled,Filled,Filled | Internal,Contractor,Recruiter | Recruiter |
| 7 | Open,Filled,In-process | Contractor,Internal,Recruiter | Internal |
| 8 | Filled | Others | NULL |
| 9 | Cancelled,Filled | Contractor,Recruiter | Recruiter |
| 10 | Cancelled, Filled, Cancel - In Process | Contractor,Recruiter,Internal | Recruiter |
+----+----------------------------------------+-------------------------------+------------+
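As a side note (not part of the answer above): on SQL Server 2022 or Azure SQL, where STRING_SPLIT accepts the enable_ordinal argument, the positional pairing can be done without XML. A sketch under that assumption, reusing the @tbl variable and the same preference ranking:
SELECT t.ID, t.FulfillmentChannelStatus, t.RoleAlternateSourcingChannel, p.[target]
FROM @tbl AS t
OUTER APPLY (SELECT TOP (1) c.value AS [target]
             FROM STRING_SPLIT(REPLACE(t.FulfillmentChannelStatus, ' ', ''), ',', 1) AS s
             JOIN STRING_SPLIT(REPLACE(t.RoleAlternateSourcingChannel, ' ', ''), ',', 1) AS c
               ON c.ordinal = s.ordinal
             WHERE s.value = 'Filled'
               AND c.value IN ('Recruiter', 'Internal', 'Contractor')
             -- rank by the stated preference Recruiter > Internal > Contractor
             ORDER BY CASE c.value WHEN 'Recruiter' THEN 1 WHEN 'Internal' THEN 2 ELSE 3 END
            ) AS p
ORDER BY t.ID;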

SQL Server Denominator Pushed to Zero

I am using the following formula to calculate the Pearson correlation in my data. Note: I am using a CASE WHEN to account for a divide-by-zero error. The code below represents just the formula.
( COUNT(*) * SUM(X * Y) - SUM(X) * SUM(Y) )
/ ( SQRT(COUNT(*) * SUM(X * X) - SUM(X) * SUM(X)) * SQRT(COUNT(*) * SUM(Y * Y) - SUM(Y) * SUM(Y)) )
Edit added query:
DROP TABLE IF EXISTS #test;
SELECT year
,product_id
,score_range
,reporting_year
/* used to manually calculate correlation in excel */
,COUNT(*) AS n_count
,COUNT(*) * SUM(1_x * 2_score) - SUM(1_x) * SUM(2_score) AS numerator
,SUM(1_x * 1_x) AS 1_sumprod
,SUM(1_x) AS 1_sum
,SUM(2_score * 2_score) AS 2_sumprod
,SUM(2_score) AS 2_sum
INTO #test
FROM #acct_details
GROUP BY year
,product_id
,score_range
,reporting_year
;
SELECT year
,product_id
,score_range
,reporting_year
,CASE
WHEN ( ( SQRT(n_count * 1_sumprod - 1_sum * 1_sum) * SQRT(n_count * 2_sumprod - 2_sum * 2_sum) ) ) = 0
THEN NULL
ELSE numerator / ( ( SQRT(n_count * 1_sumprod - 1_sum * 1_sum) * SQRT(n_count * 2_sumprod - 2_sum * 2_sum) ) )
END AS sql_corr
,(n_count * 1_sumprod - 1_sum * 1_sum) 1_denom
,( SQRT(n_count * 2_sumprod - 2_sum * 2_sum) ) AS 2_denom
FROM #test
ORDER BY year
,reporting_year
,score_range
;
The output of my data looks like the table below. Note that excel_corr is the correlation manually calculated in Excel, which is my expected output.
The column sql_corr is the result from my SQL code above. The columns from count to the end represent the X and Y values that get plugged into the formula above. My problem is that sql_corr does not match the output from manually calculating the correlation by grouping in Excel.
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| year | product_id | score_range | reporting_year | sql_corr | count | numerator | 1_sumprod | 1_sum | 2_sumprod | 2_sum | excel_corr |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 1-2 | 2016 | NULL | 1 | 0 | 0.000124 | -0.011155 | 195364 | 442 | #DIV/0! |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 3-4 | 2016 | NULL | 1272 | -0.0683 | 4.9E-11 | -0.000007 | 304648060 | 622434 | -0.02911 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 5-6 | 2016 | -0.06416 | 3913 | -11.845 | 2.89E-09 | -0.000459 | 1.089E+09 | 2063948 | -0.06391 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 7-8 | 2016 | 0.00573 | 2593 | 1.63663 | 2.27E-08 | -0.000975 | 848560006 | 1482872 | 0.00573 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 9-10 | 2016 | -0.02106 | 1420 | -3.2855 | 4.13E-08 | -0.00131 | 555096971 | 887587 | -0.02106 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 11-12 | 2016 | 0.05231 | 917 | 6.64768 | 1.06E-07 | -0.000987 | 413059274 | 615312 | 0.052438 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 13-14 | 2016 | 0.006704 | 359 | 0.5064 | 6.18E-07 | 0.000271 | 185781413 | 258205 | 0.006705 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 15-16 | 2016 | 0.017846 | 55 | 0.14095 | 3.79E-06 | 0.000349 | 31849498 | 41850 | 0.017839 |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
| 2016 | 1 | 17-18 | 2016 | NULL | 1 | 0 | 0 | 0 | 641601 | 801 | #DIV/0! |
+------+------------+-------------+----------------+----------+-------+-----------+-----------+-----------+-----------+---------+------------+
For example, in score_range 3-4 the sql_corr value is NULL, but in Excel the value is -0.02911. If we plug the values into the formula manually, -0.02911 is the correct result.
numerator
/ ( ( SQRT(n_count * 1_sumprod - 1_sum * 1_sum) * SQRT(n_count * 2_sumprod - 2_sum * 2_sum) ) )
In SQL Server the denominator is getting pushed to 0. When I calculate this manually in Excel the denominator is 2.344354. Why is my denominator being pushed to 0 in SQL Server when the same data results in a different calculation when done manually?
Edit
The first part of the denominator, SQRT(n_count * 1_sumprod - 1_sum * 1_sum), is being pushed to 0. When the multiplication occurs, the whole denominator becomes 0 in SQL Server, activating the CASE statement and returning NULL. This is incorrect, as confirmed by manual calculation. The output of the two parts of the denominator is 0.000000 and 9394.0387480572, while the actual value of the first part via manual calculation is ~0.00025.
Edit
The value of (n_count * 1_sumprod - 1_sum * 1_sum) is 6.2279E-08 before taking the square root. However, SQL Server is pushing this part of the equation to 0.
I am using SQL Server 2016 v14.0.2037.2. I thought maybe my value was too small, but it appears that values greater than 5E-18 should remain, as confirmed in the documentation here.
Credit to TomPhillips here.
The problem you have is that you have a mix of integers and floats. This causes confusion and conversions. Convert all your values to float to get the value you are expecting. This is what Excel does.
Even though I CAST the initial values going into #test from #acct_details as DECIMAL, I also needed to explicitly CAST the values in #test that feed the formula as DECIMAL. The result kept getting rounded to 0 here: SUM(1_x * 1_x). CASTing the values explicitly resolved this by forcing the necessary precision.
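A minimal illustration (not from the original post) of how the operands' data types drive the result of T-SQL arithmetic:
-- Integer operands give integer arithmetic; decimal/float operands keep the fraction.
SELECT 1 / 3                AS int_division,      -- 0
       1.0 / 3              AS decimal_division,  -- 0.333333
       CAST(1 AS FLOAT) / 3 AS float_division;    -- 0.333333333333333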
DROP TABLE IF EXISTS #test;
SELECT year
,product_id
,score_range
,reporting_year
/* used to manually calculate correlation in excel */
,CAST(COUNT(*) AS DECIMAL(16,9)) AS n_count
,CAST( ( COUNT(*) * SUM(1_x * 2_score) - SUM(1_x) * SUM(2_score) ) AS DECIMAL(16,9)) AS numerator
,CAST(SUM(1_x * 1_x) AS DECIMAL(16,9)) AS 1_sumprod
,CAST(SUM(1_x) AS DECIMAL(16,9)) AS 1_sum
,CAST(SUM(2_score * 2_score) AS DECIMAL(16,9)) AS 2_sumprod
,CAST(SUM(2_score) AS DECIMAL(16,9)) AS 2_sum
INTO #test
FROM #acct_details
GROUP BY year
,product_id
,score_range
,reporting_year
;
Changing them to FLOAT also worked. Documentation here.
DROP TABLE IF EXISTS #test;
SELECT year
,product_id
,score_range
,reporting_year
/* used to manually calculate correlation in excel */
,CAST(COUNT(*) AS FLOAT) AS n_count
,CAST( ( COUNT(*) * SUM(1_x * 2_score) - SUM(1_x) * SUM(2_score) ) AS FLOAT) AS numerator
,CAST(SUM(1_x * 1_x) AS FLOAT) AS 1_sumprod
,CAST(SUM(1_x) AS FLOAT) AS 1_sum
,CAST(SUM(2_score * 2_score) AS FLOAT) AS 2_sumprod
,CAST(SUM(2_score) AS FLOAT) AS 2_sum
INTO #test
FROM #acct_details
GROUP BY year
,product_id
,score_range
,reporting_year
;

Interpolate missing values when joining two tables

I have two tables with data of different density, and I'd like to join them while interpolating the values from the lower-frequency table to fill in the gaps.
I have no idea how to approach this other than that it's a lag/lead thing, but the differences are irregular.
Here is my set up below:
CREATE TABLE #HighFreq
(MD INT NOT NULL,
LOSS float)
INSERT INTO #HighFreq
VALUES
(6710,0.5)
,(6711,0.6)
,(6712,0.6)
,(6713,0.5)
,(6714,0.5)
,(6715,0.4)
,(6716,0.9)
,(6717,0.9)
,(6718,0.9)
,(6719,1)
,(6720,0.8)
,(6721,0.9)
,(6722,0.7)
,(6723,0.7)
,(6724,0.7)
,(6725,0.7)
CREATE TABLE #LowFreq
(MD INT NOT NULL
,X FLOAT
,Y FLOAT)
INSERT INTO #LowFreq
VALUES
(6710,12,1000)
,(6711,8,1001)
,(6718,10,1007)
,(6724,8,1013)
,(6730,11,1028)
And I want my output to look like this:
Here is an approach using a recursive CTE and window functions. The recursive CTE generates the list of MDs from the values available in both tables. Then, the idea is to put adjacent "missing" #LowFreq records into groups, using the gaps-and-islands technique. You can then do the interpolation in the outer query, by projecting values between the first (and only) non-null value in the group and the next one.
with cte as (
select min(coalesce(h.md, l.md)) md, max(coalesce(h.md, l.md)) md_max
from #HighFreq h
full join #LowFreq l on l.md = h.md
union all
select md + 1, md_max from cte where md < md_max
)
select
md,
loss,
coalesce(x, min(x) over(partition by grp)
+ (min(lead_x) over(partition by grp) - min(x) over(partition by grp))
* (row_number() over(partition by grp order by md) - 1)
/ count(*) over(partition by grp)
) x,
coalesce(y, min(y) over(partition by grp)
+ (min(lead_y) over(partition by grp) - min(y) over(partition by grp))
* (row_number() over(partition by grp order by md) - 1)
/ count(*) over(partition by grp)
) y
from (
select
c.md,
h.loss,
l.x,
l.y,
sum(case when l.md is null then 0 else 1 end) over(order by c.md) grp,
lead(l.x) over(order by c.md) lead_x,
lead(l.y) over(order by c.md) lead_y
from cte c
left join #HighFreq h on h.md = c.md
left join #LowFreq l on l.md = c.md
) t
Demo on DB Fiddle:
md | loss | x | y
---: | ---: | ---------------: | ---------------:
6710 | 0.5 | 12 | 1000
6711 | 0.6 | 8 | 1001
6712 | 0.6 | 8.28571428571429 | 1001.85714285714
6713 | 0.5 | 8.57142857142857 | 1002.71428571429
6714 | 0.5 | 8.85714285714286 | 1003.57142857143
6715 | 0.4 | 9.14285714285714 | 1004.42857142857
6716 | 0.9 | 9.42857142857143 | 1005.28571428571
6717 | 0.9 | 9.71428571428571 | 1006.14285714286
6718 | 0.9 | 10 | 1007
6719 | 1 | 9.66666666666667 | 1008
6720 | 0.8 | 9.33333333333333 | 1009
6721 | 0.9 | 9 | 1010
6722 | 0.7 | 8.66666666666667 | 1011
6723 | 0.7 | 8.33333333333333 | 1012
6724 | 0.7 | 8 | 1013
6725 | 0.7 | 8.5 | 1015.5
6726 | null | 9 | 1018
6727 | null | 9.5 | 1020.5
6728 | null | 10 | 1023
6729 | null | 10.5 | 1025.5
6730 | null | 11 | 1028
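To make the projection step concrete: between MD 6711 (x = 8) and MD 6718 (x = 10) there are 7 steps, so each intermediate row adds (10 - 8) / 7 ≈ 0.2857 to x (8.2857, 8.5714, ...), which is exactly what the output above shows.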

SQL Group By a Partition By

This must be accomplished in MS SQL Server. I believe OVER (PARTITION BY) must be used, but all my tries have failed and I end up counting the records for each ID, or something else...
I have this table:
| ID | COLOR |
+------+--------+
| 1 | Red |
| 1 | Green |
| 1 | Blue |
| 2 | Red |
| 2 | Green |
| 2 | Blue |
| 3 | Red |
| 3 | Brown |
| 3 | Orange |
Notice that ID = 1 and ID = 2 have precisely the same values for COLOR, whereas ID = 3 only shares the value COLOR = Red.
I would like to group the table as follows:
| COLOR | COUNT | GROUPING |
+--------+-------+----------+
| Red | 2 | Type 1 |
| Green | 2 | Type 1 |
| Blue | 2 | Type 1 |
| Red | 1 | Type 2 |
| Brown | 1 | Type 2 |
| Orange | 1 | Type 2 |
This would mean that ID = 1 and ID = 2 share the same 3 values for color and are aggregated together as Type 1. Although ID = 3 shares one color value with ID = 1 and ID = 2 ('Red'), the rest of its values are not shared, so it is considered Type 2 (a different grouping).
The tables used are simple examples but are enough to replicate across the entire dataset; in theory each ID can have hundreds of records, with a different color value in each row. They are unique, though: one ID can't have the same color in different rows.
My best attempt:
SELECT
ID,
COLOR,
CONCAT ('TYPE ', COUNT(8) OVER( PARTITION by ID)) AS COLOR_GROUP
FROM
{TABLE};
Result:
| ID | COLOR | GROUPING |
+------+--------+----------+
| 1 | Green | Type 3 |
| 1 | Blue | Type 3 |
| 1 | Red | Type 3 |
| 2 | Green | Type 3 |
| 2 | Blue | Type 3 |
| 2 | Red | Type 3 |
| 3 | Red | Type 3 |
| 3 | Brown | Type 3 |
| 3 | Orange | Type 3 |
Although the results are terrible, I've tried different methods and none of them is better.
Hope I was clear enough.
Thank you for the help!
try the following:
declare @t table (ID int, COLOR varchar(100))
insert into @t select 1, 'Red'
insert into @t select 1, 'Green'
insert into @t select 1, 'Blue'
insert into @t select 2, 'Red'
insert into @t select 2, 'Green'
insert into @t select 2, 'Blue'
insert into @t select 3, 'Red'
insert into @t select 3, 'Brown'
insert into @t select 3, 'Orange'

select *, STUFF((SELECT CHAR(10) + ' ' + COLOR
                 FROM @t t_in where t_in.ID = t.ID
                 order by COLOR
                 FOR XML PATH ('')), 1, 1, '') COLOR_Combined
into #temp
from @t t

select COLOR, count(color) [COUNT], 'TYPE ' + convert(varchar(10), dense_rank() OVER (order by [grouping])) [GROUPING]
from
(
    select id, COLOR, COLOR_Combined, (row_number() over (order by id) - row_number() over (partition by Color_Combined order by id)) [grouping]
    from #temp
) t
group by COLOR, [grouping]

drop table if exists #temp
Please find the db<>fiddle here.
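As a side note (not part of the answer): on SQL Server 2017+ the combined color string can also be built with STRING_AGG instead of STUFF ... FOR XML PATH, and the type derived by ranking that string directly. A sketch reusing the same @t table variable:
;WITH combos AS (
    -- one ordered, comma-separated color list per ID
    SELECT ID, STRING_AGG(COLOR, ',') WITHIN GROUP (ORDER BY COLOR) AS color_combined
    FROM @t
    GROUP BY ID
)
SELECT t.COLOR,
       COUNT(*) AS [COUNT],
       'TYPE ' + CONVERT(varchar(10), DENSE_RANK() OVER (ORDER BY c.color_combined)) AS [GROUPING]
FROM @t AS t
JOIN combos AS c ON c.ID = t.ID
GROUP BY t.COLOR, c.color_combined;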

Iterate through an SQL Server table and insert rows

A table (Table1) has the data below:
+-----------+-----------+-----------+---------+
| AccountNo | OldBranch | NewBranch | Balance |
+-----------+-----------+-----------+---------+
| 785321 | 10 | 20 | -200 |
| 785322 | 10 | 20 | 300 |
+-----------+-----------+-----------+---------+
Using the logic:
if the Balance is negative (i.e. < 0) then NewBranch has to be debited (Dr) and OldBranch has to be credited (Cr);
if the Balance is positive (i.e. > 0) then OldBranch has to be debited (Dr) and NewBranch has to be credited (Cr);
rows as below have to be inserted into another table (Table2):
+------------+------+--------+--------+
| Account NO | DrCr | Branch | Amount |
+------------+------+--------+--------+
| 785321 | Dr | 20 | 200 |
| 785321 | Cr | 10 | 200 |
| 785322 | Cr | 20 | 300 |
| 785322 | Dr | 10 | 300 |
+------------+------+--------+--------+
What are the possible solutions using a Cursor and otherwise?
Thanks,
You did not provide much in the way of details but something like this should be pretty close.
update nb
set Balance = Balance - ABS(t1.Balance)
from NewBranch nb
join Table1 t1 on t1.AccountNo = nb.AccountNo
where nb.Balance < 0
update ob
set Balance = Balance - ABS(t1.Balance)
from OldBranch ob
join Table1 t1 on t1.AccountNo = ob.AccountNo
where ob.Balance > 0
You absolutely don't need a cursor, just a set of INSERT statements:
INSERT INTO Table2 (AccountNo,DrCr,Branch,Amount)
SELECT AccountNo,'Dr',IIF(Balance<0,NewBranch,OldBranch),IIF(balance<0,-1*balance,balance) FROM Table1
UNION ALL
SELECT AccountNo,'Cr',IIF(Balance>0,NewBranch,OldBranch),IIF(balance<0,-1*balance,balance) FROM Table1
declare @t table (Accountno int,
                  OldBranch int,
                  NewBranch int,
                  Balance int)

insert into @t (Accountno, OldBranch, NewBranch, Balance)
values (785321, 10, 20, 200),
       (785322, 10, 20, 300)

select Accountno, Y.CRDR, Y.Branch, Y.Amount
from @t CROSS APPLY
    (Select 'Dr' AS CRDR, OldBranch AS Branch, Balance As Amount
     UNION ALL
     Select 'Cr', NewBranch, Balance) y
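For completeness, since the question also asks about a cursor, here is a minimal sketch of that variant following the stated debit/credit rules (the set-based INSERT ... SELECT shown above is normally preferable):
DECLARE @AccountNo INT, @OldBranch INT, @NewBranch INT, @Balance INT;

DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT AccountNo, OldBranch, NewBranch, Balance FROM Table1;

OPEN cur;
FETCH NEXT FROM cur INTO @AccountNo, @OldBranch, @NewBranch, @Balance;

WHILE @@FETCH_STATUS = 0
BEGIN
    IF @Balance < 0
        -- negative balance: debit the new branch, credit the old branch
        INSERT INTO Table2 (AccountNo, DrCr, Branch, Amount)
        VALUES (@AccountNo, 'Dr', @NewBranch, ABS(@Balance)),
               (@AccountNo, 'Cr', @OldBranch, ABS(@Balance));
    ELSE
        -- positive balance: debit the old branch, credit the new branch
        INSERT INTO Table2 (AccountNo, DrCr, Branch, Amount)
        VALUES (@AccountNo, 'Dr', @OldBranch, @Balance),
               (@AccountNo, 'Cr', @NewBranch, @Balance);

    FETCH NEXT FROM cur INTO @AccountNo, @OldBranch, @NewBranch, @Balance;
END;

CLOSE cur;
DEALLOCATE cur;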
