SQL query to sum multiple columns across multiple rows - sql-server

I have a table in SQL Server that has multiple rows per id. There are values that need to be subtracted in two columns as well. In the sample below, I need to subtract 1033.90 - 1033.90 - 1181.60 to equal -1181.60.
ID
Value1
Value2
1
1033.90
0.00
1
0.00
1033.90
1
1181.60
0.00
I have tried a few different ways found from others' questions but nothing has worked yet. Cross Joins or Unions seemed to be the way but have yet to give the result needed. Can anyone lend any clues?

All you seem to want is a sum(s) with a group by id
DROP TABLE IF EXISTS T;
CREATE TABLE T
(ID INT, Value1 DECIMAL(10,2), Value2 DECIMAL(10,2));
INSERT INTO T VALUES
(1, 1033.90 ,0.00),
(1, 0.00 ,1033.90),
(1, 1181.60 ,0.00);
SELECT ID , SUM(VALUE1) - SUM(VALUE2) AS TOT
FROM T
GROUP BY ID;
+------+---------+
| ID | TOT |
+------+---------+
| 1 | 1181.60 |
+------+---------+
1 row in set (0.001 sec)
and the group by is unnecessary if you have only 1 id. If you want a running total which where I think Lamu is coming from then you would need some way of ordering events

Related

SQL Server Merge Update With Partial Sources

I have a target table for which partial data arrives at different times from 2 departments. The keys they use are the same, but the fields they provide are different. Most of the rows they provide have common keys, but there are some rows that are unique to each department. My question is about the fields, not the rows:
Scenario
the target table has a key and 30 fields.
Dept. 1 provides fields 1-20
Dept. 2 provides fields 21-30
Suppose I loaded Q1 data from Dept. 1, and that created new rows 100-199 and populated fields 1-20. Later, I receive Q1 data from Dept. 2. Can I execute the same merge code I previously used for Dept. 1 to update rows 100-199 and populate fields 21-30 without unintentionally changing fields 1-20? Alternatively, would I have to tailor separate merge code for each Dept.?
In other words, does (or can) "Merge / Update" operate only on target fields that are present in the source table while ignoring target fields that are NOT present in the source table? In this way, Dept. 1 fields would NOT be modified when merging Dept. 2, or vice-versa, in the event I get subsequent corrections to this data from either Dept.
You can use a merge instruction, where you define a source and a target data, and what happens when a registry is found on both, just on the source, just on the target, and even expand it with custom logic, like it's just on the source, and it's older than X, or it's from department Y.
-- I'm skipping the fields 2-20 and 22-30, just to make this shorter.
create table #target (
id int primary key,
field1 varchar(100), -- and so on until 20
field21 varchar(100), -- and so on until 30
)
create table #dept1 (
id int primary key,
field1 varchar(100)
)
create table #dept2 (
id int primary key,
field21 varchar(100)
)
/*
Creates some data to merge into the target.
The expected result is:
| id | field1 | field21 |
| - | - | - |
| 1 | dept1: 1 | dept2: 1 |
| 2 | | dept2: 2 |
| 3 | dept1: 3 | |
| 4 | dept1: 4 | dept2: 4 |
| 5 | | dept2: 5 |
*/
insert into #dept1 values
(1,'dept1: 1'),
--(2,'dept1: 2'),
(3,'dept1: 3'),
(4,'dept1: 4')
insert into #dept2 values
(1,'dept2: 1'),
(2,'dept2: 2'),
--(3,'dept2: 3'),
(4,'dept2: 4'),
(5,'dept2: 5')
-- Inserts the data from the first department. This could be also a merge, it necessary.
insert into #target(id, field1)
select id, field1 from #dept1
merge into #target t
using (select id, field21 from #dept2) as source_data(id, field21)
on (source_data.id = t.id)
when matched then update set field21=source_data.field21
when not matched by source and t.field21 is not null then delete -- you can even use merge to remove some records that match your criteria
when not matched by target then insert (id, field21) values (source_data.id, source_data.field21); -- Every merge statement should end with ;
select * from #target
You can see this code running on this DB Fiddle

T-SQL - Finding records with chronological gaps

This is my first post here. I'm still a novice SQL user at this point though I've been using it for several years now. I am trying to find a solution to the following problem and am looking for some advice, as simple as possible, please.
I have this 'recordTable' with the following columns related to transactions; 'personID', 'recordID', 'item', 'txDate' and 'daySupply'. The recordID is the primary key. Almost every personID should have many distinct recordID's with distinct txDate's.
My focus is on one particular 'item' for all of 2017. It's expected that once the item daySupply has elapsed for a recordID that we would see a newer recordID for that person with a more recent txDate somewhere between five days before and five days after the end of the daySupply.
What I'm trying to uncover are the number of distinct recordID's where there wasn't an expected new recordID during this ten day window. I think this is probably very simple to solve but I am having a lot of difficulty trying to create a query for it, let alone explain it to someone.
My thought thus far is to create two temp tables. The first temp table stores all of the records associated with the desired items and I'm just storing the personID, recordID and txDate columns. The second temp table has the personID, recordID and the two derived columns from the txDate and daySupply; these would represent the five days before and five days after.
I am trying to find some way to determine the number of recordID's from the first table that don't have expected refills for that personID in the second. I thought a simple EXCEPT would do this but I don't think there's anyway of getting around a recursive type statement to answer this and I have never gotten comfortable with recursive queries.
I searched Stackoverflow and elsewhere but couldn't come up with an answer to this one. I would really appreciate some help from some more clever data folks. Here is the code so far. Thanks everyone!
CREATE TABLE #temp1 (personID VARCHAR(20), recordID VARCHAR(10), txDate
DATE)
CREATE TABLE #temp2 (personID VARCHAR(20), recordID VARCHAR(10), startDate
DATE, endDate DATE)
INSERT INTO #temp1
SELECT [personID], [recordID], txDate
FROM recordTable
WHERE item = 'desiredItem'
AND txDate > '12/31/16'
AND txDate < '1/1/18';
INSERT INTO #temp2
SELECT [personID], [recordID], (txDate + (daySupply - 5)), (txDate +
(daySupply + 5))
FROM recordTable
WHERE item = 'desiredItem'
AND txDate > '12/31/16'
AND txDate < '1/1/18';
I agree with mypetlion that you could have been more concise with your question, but I think I can figure out what you are asking.
SQL Window Functions to the rescue!
Here's the basic idea...
CREATE TABLE #fills(
personid INT,
recordid INT,
item NVARCHAR(MAX),
filldate DATE,
dayssupply INT
);
INSERT #fills
VALUES (1, 1, 'item', '1/1/2018', 30),
(1, 2, 'item', '2/1/2018', 30),
(1, 3, 'item', '3/1/2018', 30),
(1, 4, 'item', '5/1/2018', 30),
(1, 5, 'item', '6/1/2018', 30)
;
SELECT *,
ABS(
DATEDIFF(
DAY,
LAG(DATEADD(DAY, dayssupply, filldate)) OVER (PARTITION BY personid, item ORDER BY filldate),
filldate
)
) AS gap
FROM #fills
ORDER BY filldate;
... outputs ...
+----------+----------+------+------------+------------+------+
| personid | recordid | item | filldate | dayssupply | gap |
+----------+----------+------+------------+------------+------+
| 1 | 1 | item | 2018-01-01 | 30 | NULL |
| 1 | 2 | item | 2018-02-01 | 30 | 1 |
| 1 | 3 | item | 2018-03-01 | 30 | 2 |
| 1 | 4 | item | 2018-05-01 | 30 | 31 |
| 1 | 5 | item | 2018-06-01 | 30 | 1 |
+----------+----------+------+------------+------------+------+
You can insert the results into a temp table and pull out only the ones you want (gap > 5), or use the query above as a CTE and pull out the results without the temp table.
This could be stated as follows: "Given a set of orders, return a subset for which there is no order within +/- 5 days of the expected resupply date (defined as txDate + DaysSupply)."
This can be solved simply with NOT EXISTS. Define the range of orders you wish to examine, and this query will find the subset of those orders for which there is no resupply order (NOT EXISTS) within 5 days of either side of the expected resupply date (txDate + daysSupply).
SELECT
gappedOrder.personID
, gappedOrder.recordID
, gappedOrder.item
, gappedOrder.txDate
, gappedOrder.daysSupply
FROM
recordTable as gappedOrder
WHERE
gappedOrder.item = 'desiredItem'
AND gappedOrder.txDate > '12/31/16'
AND gappedOrder.txDate < '1/1/18'
--order not refilled within date range tolerance
AND NOT EXISTS
(
SELECT
1
FROM
recordTable AS refilledOrder
WHERE
refilledOrder.personID = gappedOrder.personID
AND refilledOrder.item = gappedOrder.item
--5 days prior to (txDate + daysSupply)
AND refilledOrder.txtDate >= DATEADD(day, -5, DATEADD(day, gappedOrder.daysSupply, gappedOrder.txDate))
--5 days after (txtDate + daysSupply)
AND refilledOrder.txtDate <= DATEADD(day, 5, DATEADD(day, gappedOrder.daysSupply, gappedOrder.txtDate))
);

Break up the data in a database column of a record into multiple records

Azure SQL Server - we have a table like this:
MyTable:
ID Source ArticleText
-- ------ -----------
1 100 <nvarchar(max) field with unstructured text from media articles>
2 145 "
3 866 "
4 232 "
ID column is the primary key and auto-increments on INSERTS.
I run this query to find the records with the largest data size in the ArticleText column:
SELECT TOP 500
ID, Source, DATALENGTH(ArticleText)/1048576 AS Size_in_MB
FROM
MyTable
ORDER BY
DATALENGTH(ArticleText) DESC
We are finding that for many reasons both technical and practical, the data in the ArticleText column is just too big in certain records. The above query allows me to look at a range of sizes for our largest records, which I'll need to know for what I'm trying to formulate here.
The feat I need to accomplish is, for all existing records in this table, any record whose ArticleText DATALENGTH is greater than X, break that record into X amount of records where each record will then contain the same value in the Source column, but have the data in the ArticleText column split up across those records in smaller chunks.
How would one achieve this if the exact requirement was say, take all records whose ArticleText DATALENGTH is greater than 10MB, and break each into 3 records where the resulting records' Source column value is the same across the 3 records, but the ArticleText data is separated into three chunks.
In essence, we would need to divide the DATALENGTH by 3 and apply the first 1/3 of the text data to the first record, 2nd 1/3 to the 2nd record, and the 3rd 1/3 to the third record.
Is this even possible in SQL Server?
You can use the following code to create a side table with the needed data:
CREATE TABLE #mockup (ID INT IDENTITY, [Source] INT, ArticleText NVARCHAR(MAX));
INSERT INTO #mockup([Source],ArticleText) VALUES
(100,'This is a very long text with many many words and it is still longer and longer and longer, and even longer and longer and longer')
,(200,'A short text')
,(300,'A medium text, just long enough to need a second part');
DECLARE #partSize INT=50;
WITH recCTE AS
(
SELECT ID,[Source]
,1 AS FragmentIndex
,A.Pos
,CASE WHEN A.Pos>0 THEN LEFT(ArticleText,A.Pos) ELSE ArticleText END AS Fragment
,CASE WHEN A.Pos>0 THEN SUBSTRING(ArticleText,A.Pos+2,DATALENGTH(ArticleText)/2) END AS RestString
FROM #mockup
CROSS APPLY(SELECT CASE WHEN DATALENGTH(ArticleText)/2 > #partSize
THEN #partSize - CHARINDEX(' ',REVERSE(LEFT(ArticleText,#partSize)))
ELSE -1 END AS Pos) A
UNION ALL
SELECT r.ID,r.[Source]
,r.FragmentIndex+1
,A.Pos
,CASE WHEN A.Pos>0 THEN LEFT(r.RestString,A.Pos) ELSE r.RestString END
,CASE WHEN A.Pos>0 THEN SUBSTRING(r.RestString,A.Pos+2,DATALENGTH(r.RestString)/2) END AS RestString
FROM recCTE r
CROSS APPLY(SELECT CASE WHEN DATALENGTH(r.RestString)/2 > #partSize
THEN #partSize - CHARINDEX(' ',REVERSE(LEFT(r.RestString,#partSize)))
ELSE -1 END AS Pos) A
WHERE DATALENGTH(r.RestString)>0
)
SELECT ID,[Source],FragmentIndex,Fragment
FROM recCTE
ORDER BY [Source],FragmentIndex;
GO
DROP TABLE #mockup
The result
+----+--------+---------------+---------------------------------------------------+
| ID | Source | FragmentIndex | Fragment |
+----+--------+---------------+---------------------------------------------------+
| 1 | 100 | 1 | This is a very long text with many many words and |
+----+--------+---------------+---------------------------------------------------+
| 1 | 100 | 2 | it is still longer and longer and longer, and |
+----+--------+---------------+---------------------------------------------------+
| 1 | 100 | 3 | even longer and longer and longer |
+----+--------+---------------+---------------------------------------------------+
| 2 | 200 | 1 | A short text |
+----+--------+---------------+---------------------------------------------------+
| 3 | 300 | 1 | A medium text, just long enough to need a second |
+----+--------+---------------+---------------------------------------------------+
| 3 | 300 | 2 | part |
+----+--------+---------------+---------------------------------------------------+
Now you have to update the existing line with the value at FragmentIndex=1, while you have to insert the values of FragmentIndex>1. Do this sorted by FragmentIndex and your IDENTITY ID-column will reflect the correct order.

Between two different range numbers

I have a stored procedure that has a table for a parameter with two columns: From and To. Both int. It is used for searching scores.
The example of the table is
+-----------+-------+----+
| RowNumber | From | To |
+-----------+-------+----+
| 1 | 0 | 30 |
| 2 | 60 | 80 |
+-----------+-------+----+
How can I search a table to have results that include all scores between 0 and 30 and 60 and 80?
I had tried between inside a while loop but nothing.
This is a guess in the absence of a reply, however, maybe...
CREATE TABLE Score (ID int IDENTITY(1,1),
Score int);
INSERT INTO Score
VALUES (65),(17),(97),(14),(34),(79),(37),(87),(65),(63),(15),(75),(05),(25),(38),(28),(88);
GO
CREATE TABLE ScoreRange (ID int IDENTITY(1,1),
[From] int, --Try to avoid keywords, and especially reserved words, for column names
[To] int); --Try to avoid keywords, and especially reserved words, for column names
INSERT INTO ScoreRange
VALUES (0,30),
(60,80);
GO
SELECT *
FROM Score S;
SELECT S.*
FROM Score S
JOIN ScoreRange SR ON S.Score BETWEEN SR.[From] AND SR.[To];
GO
DROP TABLE Score;
DROP TABLE ScoreRange;
It's kinda hard to answer without sample data - but I think you are looking for something like this:
SELECT t.*
FROM YourTable As t
JOIN #TVP As p ON t.Score >= p.[From] AND t.Score <= p.[To]
select * from t
where exists (
select 1 from ranges r
where t.val between r.from and r.to
);

Find "regional" relationships in SQL data using a query, or SSIS

Edit for clarification: I am compiling data weekly, based on Zip_Code, but some Zip_Codes are redundant. I know I should be able to compile a small amount of data, and derive the redundant zip_codes if I can establish relationships.
I want to define a zip code's region by the unique set of items and values that appear in that zip code, in order to create a "Region Table"
I am looking to find relationships by zip code with certain data. Ultimately, I have tables which include similar values for many zip codes.
I have data similar to:
ItemCode |Value | Zip_Code
-----------|-------|-------
1 |10 | 1
2 |15 | 1
3 |5 | 1
1 |10 | 2
2 |15 | 2
3 |5 | 2
1 |10 | 3
2 |10 | 3
3 |15 | 3
Or to simplify the idea, I could even concantenate ItemCode + Value into unique values:
ItemCode+
Value | Zip_Code
A | 1
B | 1
C | 1
A | 2
B | 2
C | 2
A | 3
D | 3
E | 3
As you can see, Zip_Code 1 and 2 have the same distinct ItemCode and Value. Zip_Code 3 however, has different values for certain ItemCodes.
I need to create a table that establishes a relationship between Zip_Codes that contain the same data.
The final table will look something like:
Zip_Code | Region
1 | 1
2 | 1
3 | 2
4 | 2
5 | 1
6 | 3
...etc
This will allow me to collect data only once for each unique Region, and derive the zip_code appropriately.
Things I'm doing now:
I am currently using a query similar to a join, and compares against Zip_Code using something along the lines of:
SELECT a.ItemCode
,a.value
,a.zip_code
,b.ItemCode
,b.value
,b.zip_code
FROM mytable as a, mytable as b -- select from table twice, similar to a join
WHERE a.zip_code = 1 -- left table will have all ItemCode and Value from zip 1
AND b.zip_code = 2 -- right table will have all ItemCode and Value from zip 2
AND a.ItemCode = b.ItemCode -- matches rows on ItemCode
AND a.Value != b.Value
ORDER BY ItemCode
This returns nothing if the two zip codes have exactly the same ItemNum, and Value, and returns a slew of differences between the two zip codes if there are differences.
This needs to move from a manual process to an automated process however, as I am now working with more than 100 zip_codes.
I do not have much programming experience in specific languages, so tools in SSIS are somewhat limited to me. I have some experience using the Fuzzy tools, and feel like there might be something in Fuzzy Grouping that might shine a light on apparent regions, but can't figure out how to set it up.
Does anyone have any suggestions? I have access to SQLServ and its related tools, and Visual Studio. I am trying to avoid writing a program to automate this, as my c# skills are relatively nooby, but will figure it out if necessary.
Sorry for being so verbose: This is my first Question, and the page I agreed to in order to ask a question suggested to explain in detail, and talk about what I've tried...
Thanks in advance for any help I might receive.
Give this a shot (I used the simplified example, but this can easily be expanded). I think the real interesting part of this code is the recursive CTE...
;with matches as (
--Find all pairs of zip_codes that have matching values.
select d1.ZipCode zc1, d2.ZipCode zc2
from data d1
join data d2 on d1.Val=d2.Val
group by d1.ZipCode, d2.ZipCode
having count(*) = (select count(distinct Val) from data where zipcode = d1.Zipcode)
), cte as (
--Trace each zip_code to it's "smallest" matching zip_code id.
select zc1 tempRegionID, zc2 ZipCode
from matches
where zc1<=zc2
UNION ALL
select c.tempRegionID, m.zc2
from cte c
join matches m on c.ZipCode=m.zc1
and c.ZipCode!=m.zc2
where m.zc1<=m.zc2
)
--For each zip_code, use it's smallest matching zip_code as it's region.
select zipCode, min(tempRegionID) as regionID
from cte
group by ZipCode
Demonstrating that there's a use for everything, though normally it makes me cringe: concatenate the values for each zip code into a single field. Store ZipCode and ConcatenatedValues in a lookup table (PK on the one, UQ on the other). Now you can assess which zip codes are in the same region by grouping on ConcatenatedValues.
Here's a simple function to concatenate text data:
CREATE TYPE dbo.List AS TABLE
(
Item VARCHAR(1000)
)
GO
CREATE FUNCTION dbo.Implode (#List dbo.List READONLY, #Separator VARCHAR(10) = ',') RETURNS VARCHAR(MAX)
AS BEGIN
DECLARE #Concat VARCHAR(MAX)
SELECT #Concat = CASE WHEN Item IS NULL THEN #Concat ELSE COALESCE(#Concat + #Separator, '') + Item END FROM #List
RETURN #Concat
END
GO
DECLARE #List AS dbo.List
INSERT INTO #List (Item) VALUES ('A'), ('B'), ('C'), ('D')
SELECT dbo.Implode(#List, ',')

Resources