How to sum particular values in a column in Microsoft SQL Server?

I'm kinda new to this and I have been stuck on this for a while now.
Example:
Col1 Col2 Col3
A | H | 1
A | I | 2
A | J | 3
B | J | 4
B | K | 5
C | L | 6
How can I sum 'Col3', but only for particular values? For example, sum the values in 'Col3' over all rows that share the same letter in 'Col1', so A = 6 (1+2+3), B = 9 (4+5) and C = 6.
So you get this:
Col1 Col2 Col3
A | H | 6
A | I | 6
A | J | 6
B | J | 9
B | K | 9
C | L | 6
This is what I had so far:
SELECT Col1, Col2, SUM(Col3)
FROM Table1
GROUP BY Col1, Col2;
Thanks

Just to elaborate on my comment.
You can use the window function sum() over()
Example
Create Table #YourTable ([Col1] varchar(50),[Col2] varchar(50),[Col3] int)
Insert Into #YourTable Values
('A','H',1)
,('A','I',2)
,('A','J',3)
,('B','J',4)
,('B','K',5)
,('C','L',6)
Select Col1
,Col2
,Col3 = sum(Col3) over (partition by Col1)
From #YourTable
Returns
Col1 Col2 Col3
A H 6
A I 6
A J 6
B J 9
B K 9
C L 6

As another option, you can do this with a join and the SUM (Transact-SQL) function.
create table TestTable (Col1 varchar(5)
, Col2 varchar(5)
, Col3 int)
insert into TestTable Values
('A', 'H', 1),
('A', 'I', 2),
('A', 'J', 3),
('B', 'J', 4),
('B', 'K', 5),
('C', 'L', 6)
SELECT tblA.Col1
,tblA.Col2
,tblB.Col3
FROM (
SELECT Col1
,Col2
FROM TestTable
) tblA
INNER JOIN (
SELECT Col1
,sum(Col3) AS Col3
FROM TestTable
GROUP BY Col1
) tblB ON tblA.Col1 = tblB.Col1

There are a number of ways to write data aggregation queries like this. Which one to use depends on what your final results need to look like. To cover the basics, I'll go over several methods here.
The simplest is to use a WHERE clause:
SELECT Col1, sum(Col3)
from MyTable
where Col1 = 'A'
group by Col1
This will produce a single row of data:
Col1 Col3
A | 6
To produce sums for all of the distinct values in Col1, you would use GROUP BY:
SELECT Col1, sum(Col3)
from MyTable
group by Col1
This will produce three rows of data:
Col1 Col3
A | 6
B | 9
C | 6
The above samples are pretty straightforward and basic SQL examples. It is actually a bit difficult to produce the result set from your example, where you include Col2 and show the summation, because Col2 is not part of the data aggregation. Several ways to do this:
Using a subquery:
SELECT
mt.Col1
,mt.Col2
,sub.SumCol3 Col3
from MyTable mt
inner join (select
Col1
,sum(Col3) SumCol3
from MyTable
group by Col1) sub
on sub.Col1 = mt.Col1
Using a common table expression:
WITH cteSub
as (select
Col1
,sum(Col3) SumCol3
from MyTable
group by Col1)
select
mt.Col1
,mt.Col2
,cteSub.SumCol3 Col3
from MyTable mt
inner join cteSub
on cteSub.Col1 = mt.Col1
And, perhaps the most obscure and obtuse, using aggregate functions with partitioning:
SELECT
Col1
,Col2
,sum(Col3) over (partition by Col1) Col3
from MyTable
Thorough and complete discussions of all the above tactics (better than anything I'd write) can be found online by searching for "SQL" plus the appropriate term (aggregation, subquery, CTE, partitioning functions). Good luck!
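To check that the join-on-aggregate form and the partitioned SUM return the same rows, here is a minimal sketch that runs both against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server. That is an assumption made only so the example is self-contained, and it needs SQLite 3.25 or later for window functions. The table name MyTable follows this answer.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Col1 TEXT, Col2 TEXT, Col3 INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [("A", "H", 1), ("A", "I", 2), ("A", "J", 3),
     ("B", "J", 4), ("B", "K", 5), ("C", "L", 6)],
)

# Variant 1: join each row to its per-Col1 total.
join_rows = conn.execute("""
    SELECT mt.Col1, mt.Col2, sub.SumCol3 AS Col3
    FROM MyTable mt
    INNER JOIN (SELECT Col1, SUM(Col3) AS SumCol3
                FROM MyTable
                GROUP BY Col1) sub
        ON sub.Col1 = mt.Col1
    ORDER BY mt.Col1, mt.Col2
""").fetchall()

# Variant 2: windowed SUM partitioned by Col1.
window_rows = conn.execute("""
    SELECT Col1, Col2, SUM(Col3) OVER (PARTITION BY Col1) AS Col3
    FROM MyTable
    ORDER BY Col1, Col2
""").fetchall()

for row in window_rows:
    print(row)
```

Both variants keep every original row and attach the group total to it, which is why Col2 can appear in the result without being part of the aggregation.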

Related

Import Flat Data with Multiple Delimiters

My imported flat file has been imported into SQL with comma delimiters.
An example of my text file looks like:
Location\Floor\Room,Date,Value
After import:
Column 1 | Column 2 | Column 3
Location\Floor\Room | Date | Value
I would like my table to look as follows:
Column 1 | Column 2 | Column 3 | Column 4 | Column 5
Location | Floor | Room | Date | Value
Is there a way to achieve this?
SSIS (SQL Server Integration Services) can also be used for this use case.
What you basically need is a two-step transformation: first load your input file into an interim table, using the comma as the standard delimiter.
Once the interim table is populated (including the values that contain backslashes), use a Derived Column transformation in SSIS with custom logic based on the SUBSTRING() and FINDSTRING() functions to split the string on the backslash into new columns.
I'm thinking of this solution.
select t2.col1
, t2.col2
, substring(t2.col3, charindex('\', t2.col3, len(t2.col2) + len(t2.col1)) + 1, len(t2.col3) - (len(t2.col2) + len(t2.col1) + 2))
, t2.[value], t2.[date]
from (
select t1.col1, substring(t1.main, len(t1.col1) + 2
, charindex('\', t1.main, len(t1.col1) + 2) - (len(t1.col1) + 2)) as col2
, t1.main as col3, t1.[value], t1.[date]
from (
select substring(column1, 0, charindex('\', column1)) as col1, column1 as main, [date], [value]
from tableA
) t1
) t2
This works for up to 5 values in the undivided string:
val1\val2\val3\val4\val5
select [1] as col1, [2] as col2, [3] as col3, [4] as col4, [5] as col5, col2 as col7, col3 as col8
from (
select ROW_NUMBER() over(partition by col1 order by col1) rowid, col1, col2, col3, value
from <MyTable> s
cross apply string_split(s.col1, '\')
) as tbl
pivot (
max(value) for rowid in ([1], [2], [3], [4], [5])
) as pv
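For comparison, the target layout from the question (the first column split on backslashes, the comma-delimited columns kept as they are) can be sketched outside SQL. This is a minimal Python illustration, not a replacement for the SSIS or T-SQL approaches above. The sample line and the helper name split_row are made up for the example.

```python
def split_row(line):
    """Split 'Location\\Floor\\Room,Date,Value' into five fields."""
    # First split on the comma delimiter used by the import.
    path, date, value = line.split(",")
    # Then split the first field on the backslash.
    location, floor, room = path.split("\\")
    return [location, floor, room, date, value]

# Hypothetical sample line in the shape shown in the question.
print(split_row("Building1\\Floor2\\Room3,2019-01-01,42"))
```

The sketch assumes exactly three backslash-separated parts and three comma-separated fields; rows with a different shape would need extra handling.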

How do I mask certain values and maintain uniqueness while using a case...when statement in MS SQL Server?

Say I have a column in a SQL Server table with the following entries:
+----+-----+
| ids| col1|
+----+-----+
|4 | a |
|4 | b |
|4 | a |
|4 | b |
|5 | a |
+----+-----+
I'd like to mask the ids column given that col1 = a. However, I'd also like to maintain the uniqueness of the ids masking, so the result would look as follows:
+----+-----+
| ids| col1|
+----+-----+
|XX | a |
|4 | b |
|XX | a |
|4 | b |
|YY | a |
+----+-----+
I have used a case...when with SHA2_256 algorithm to maintain uniqueness as in this post:
How do I mask/encrypt data in a view but maintain uniqueness of values?
but the resulting masks are 'Chinese-looking' characters that do not seem machine-readable. Is there a better way?
Would numbers be OK?
First, create and populate sample table (Please save us this step in your future questions)
CREATE TABLE #T
(
ids int,
col1 char(1)
)
INSERT INTO #T VALUES
(4, 'a'),
(4, 'b'),
(4, 'a'),
(4, 'b'),
(5, 'a')
The query:
SELECT CASE WHEN col1 = 'a' THEN CHECKSUM(CAST(Ids as varchar(11))) ELSE ids END As ids,
col1
FROM #T
Results:
ids col1
136 a
4 b
136 a
4 b
137 a
Your suggested masked output values of XX and YY are perhaps misleading, because if you have millions of id values in your table, then two letters won't be able to uniquely/randomly cover all data. One option here might be to use NEWID() to generate a unique UUID for each id group:
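WITH cte AS (
-- one NEWID() per distinct id; SELECT DISTINCT id, NEWID() would return a new mask for every row
SELECT id, NEWID() AS mask
FROM (SELECT DISTINCT id FROM yourTable) d
)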
SELECT t2.mask, t1.col
FROM yourTable t1
INNER JOIN cte t2
ON t1.id = t2.id;
If you don't want to show the entire UUID, because it is too long, then you may instead show a substring of it, e.g. for just the first 5 characters:
SELECT LEFT(t2.mask, 5) AS mask, t1.col
FROM yourTable t1
INNER JOIN cte t2
ON t1.id = t2.id;
But keep in mind that the shorter you make the UUID being displayed, the greater the probability that two different id groups would be rendered with the same mask.
Try this query (replace #test with your actual table name). In the future you may need to mask other characters in addition to 'a'.
The list table below will help you with that.
create table #list
(
col1 varchar(1)
)
insert into #list values ('a')
select case when isnull(b.col1,'0')<>'0' then a.col1+cast ( Dense_rank() OVER(PARTITION BY a.col1 ORDER BY a.ids ASC) as varchar(max)) else cast(a.ids as varchar(max)) end as ids,
a.col1 from #test a
left join #list b
on a.col1 =b.col1
So this is what I ended up doing. Using the example provided by @Zohar Peled, but with the adjustment that the ids column is a varchar, we can create the table as follows:
CREATE TABLE #T
(
ids varchar(150),
col1 char(1)
)
INSERT INTO #T VALUES
(4, 'a'),
(4, 'b'),
(4, 'a'),
(4, 'b'),
(5, 'a')
and then do the following:
SELECT CASE WHEN col1 = 'a' THEN CONVERT(VARCHAR(150),HashBytes('SHA2_256', ids),2) ELSE ids END As ids,
col1
FROM #T
This more closely resembles the initial solution in the link, I believe.
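The idea behind the hash-based mask (equal ids get equal masks, and unmasked rows pass through unchanged) can be sketched in Python with the standard hashlib module. This is only an illustration of the technique. Whether the digest matches the HASHBYTES output byte for byte depends on the column type and encoding, which is not checked here, and the helper name mask_id is made up for the example.

```python
import hashlib

def mask_id(ids, col1):
    """Return a SHA-256 hex digest for rows where col1 == 'a', else the id unchanged."""
    if col1 == "a":
        return hashlib.sha256(ids.encode("ascii")).hexdigest().upper()
    return ids

# Same sample rows as above, with ids stored as strings (varchar).
rows = [("4", "a"), ("4", "b"), ("4", "a"), ("4", "b"), ("5", "a")]
masked = [(mask_id(ids, col1), col1) for ids, col1 in rows]

for ids, col1 in masked:
    print(ids[:12], col1)  # show only a prefix to keep the output short
```

Because the hash is deterministic, both rows with id 4 and col1 = 'a' get the same mask, while id 5 gets a different one.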
You can hide IDs also by integer numbers (don't know if it's secure enough in your case)
CREATE TABLE #t (ids int, col1 char(1));
INSERT INTO #t VALUES
(4, 'a'),
(4, 'b'),
(4, 'a'),
(4, 'b'),
(5, 'a');
Query
SELECT ISNULL(t2.num, t1.ids) AS ids, t1.col1
FROM
#t t1 LEFT JOIN
(
SELECT
ROW_NUMBER() OVER (ORDER BY ids, col1) + (SELECT MAX(ids) FROM #t) AS num,
ids, col1
FROM #t
WHERE col1 = 'a'
GROUP BY ids, col1) t2
ON t1.ids = t2.ids AND t1.col1 = t2.col1;
Result
ids col1
-------------------- ----
6 a
4 b
6 a
4 b
7 a

Display only the top records according to their stage in SQL Server

I have a scenario where a table has the following rows:
Record_id,Record_Stage,Other_Column
1,A,Text1
1,B,Text2
1,C,Text3
1,D,Text4
2,A,SText1
2,B,SText2
My output should be based on record_id:
1) for record_id 1, the record with Stage D
2) for record_id 2, the record with Stage B, as there are no Stage C and Stage D records
Output:
1,D,Text4
2,B,SText2
I am handling this case in a SQL Server view. It would be a great help if someone could help me with this.
It is easy with row_number():
select *
from (
select *, rn = row_number() over (partition by Record_id
order by Record_Stage desc)
from yourtable
) d
where d.rn = 1
Here is a solution:
CREATE TABLE T(
ID INT,
Stage VARCHAR(10),
Other VARCHAR(45)
);
INSERT INTO T VALUES
(1, 'A', 'Text1'),
(1, 'B', 'Text2'),
(1, 'C', 'Text3'),
(1, 'D', 'Text4'),
(2, 'A', 'SText1'),
(2, 'B', 'SText2');
WITH CTE AS
(
SELECT MAX(T.ID) AS ID,
MAX(T.Stage) AS Stage
FROM T
GROUP BY ID
)
SELECT T.*
FROM T INNER JOIN CTE ON T.ID = CTE.ID AND T.Stage = CTE.Stage;
Results:
+----+----+-------+--------+
| | ID | Stage | Other |
+----+----+-------+--------+
| 1 | 1 | D | Text4 |
| 2 | 2 | B | SText2 |
+----+----+-------+--------+
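Both answers rely on the same idea: within each record id, keep the row whose stage sorts highest. A minimal Python sketch of that logic, using the question's sample rows and assuming (as the answers do) that stages order alphabetically:

```python
rows = [
    (1, "A", "Text1"), (1, "B", "Text2"), (1, "C", "Text3"), (1, "D", "Text4"),
    (2, "A", "SText1"), (2, "B", "SText2"),
]

# Keep, for each record id, the row whose stage sorts highest.
latest = {}
for record_id, stage, other in rows:
    if record_id not in latest or stage > latest[record_id][1]:
        latest[record_id] = (record_id, stage, other)

result = [latest[key] for key in sorted(latest)]
print(result)
```

If stages did not sort alphabetically, the comparison would need an explicit ordering, just as the ORDER BY in the row_number() query would.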

SQL concat integers and group them with from to

I'm new to Stack Overflow, and I'm stuck with my query.
I've got a SQL table which looks like this:
+-------+------------+
| col1 | col2 |
+-------+------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 6 |
+-------+------------+
I don't know how to get the following resultset:
+-------+------------+
| col1 |SerialNumber|
+-------|------------+
| 1 | 1 to 4, 6 |
+--------------------+
With FOR XML PATH I can get this:
+-------+------------+
| col1 |SerialNumber|
+-------|------------+
| 1 | 1,2,3,4,6, |
+--------------------+
This is my query for it:
SELECT DISTINCT O.Col1,
(SELECT CAST(P.Col2 As varchar(5)) + ',' AS [text()]
FROM #Test P
WHERE P.Col1 = O.Col1
ORDER BY P.Col1
FOR XML PATH('')) AS 'SerialNumber'
FROM #Test O
I'm sorry if this question has already been asked. I also don't know the right keywords for this topic.
Test data:
CREATE TABLE t(col1 int,col2 int)
INSERT t(col1,col2)VALUES
(1,1),(1,2),(1,3),(1,4),
(1,6),(1,7),(1,8),(1,9),
(1,11),
(1,13),
(2,3),(2,4),(2,5),
(2,7)
A variant with FOR XML PATH:
SELECT col1,col2,outVal
INTO #temp
FROM
(
SELECT
col1,
col2,
outVal,
ISNULL(LEAD(outVal)OVER(PARTITION BY col1 ORDER BY col2),'') nextOutVal
FROM
(
SELECT
col1,
col2,
CASE
WHEN col2-1=LAG(col2)OVER(PARTITION BY col1 ORDER BY col2) AND col2+1=LEAD(col2)OVER(PARTITION BY col1 ORDER BY col2)
THEN 'to'
ELSE CAST(col2 AS varchar(10))
END outVal
FROM t
) q
) q
WHERE outVal<>nextOutVal
ORDER BY col1,col2
SELECT
t1.col1,
REPLACE(STUFF(
(
SELECT ','+t2.outVal
FROM #temp t2
WHERE t2.col1=t1.col1
ORDER BY t2.col2
FOR XML PATH('')
),1,1,''),',to,',' to ') SerialNumber
FROM (SELECT DISTINCT col1 FROM #temp) t1
DROP TABLE #temp
A variant for SQL Server 2017 (with STRING_AGG):
SELECT
col1,
REPLACE(STRING_AGG(outVal,',')WITHIN GROUP(ORDER BY col2),',to,',' to ') SerialNumber
FROM
(
SELECT
col1,
col2,
outVal,
ISNULL(LEAD(outVal)OVER(PARTITION BY col1 ORDER BY col2),'') nextOutVal
FROM
(
SELECT
col1,
col2,
CASE
WHEN col2-1=LAG(col2)OVER(PARTITION BY col1 ORDER BY col2) AND col2+1=LEAD(col2)OVER(PARTITION BY col1 ORDER BY col2)
THEN 'to'
ELSE CAST(col2 AS varchar(10))
END outVal
FROM t
) q
) q
WHERE outVal<>nextOutVal
GROUP BY col1
Result:
col1 SerialNumber
1 1 to 4,6 to 9,11,13
2 3 to 5,7
Solution:
Another possible approach using CTE for start and end values for each sequence and group concatenation:
T-SQL:
-- Table creation
CREATE TABLE #ValuesTable (
Col1 int,
Col2 int
)
INSERT INTO #ValuesTable VALUES (1, 1)
INSERT INTO #ValuesTable VALUES (1, 2)
INSERT INTO #ValuesTable VALUES (1, 3)
INSERT INTO #ValuesTable VALUES (1, 4)
INSERT INTO #ValuesTable VALUES (1, 6)
INSERT INTO #ValuesTable VALUES (2, 1)
INSERT INTO #ValuesTable VALUES (2, 2)
INSERT INTO #ValuesTable VALUES (2, 3)
INSERT INTO #ValuesTable VALUES (2, 4)
INSERT INTO #ValuesTable VALUES (2, 6)
INSERT INTO #ValuesTable VALUES (2, 7);
INSERT INTO #ValuesTable VALUES (2, 10);
-- Find sequences
WITH
TableStart AS (
SELECT t.Col1, t.Col2, ROW_NUMBER() OVER (ORDER BY t.Col1, t.Col2) AS RN
FROM #ValuesTable t
LEFT JOIN #ValuesTable b ON (t.Col1 = b.Col1) AND (t.Col2 = b.Col2 + 1)
WHERE (b.Col2 IS NULL)
),
TableEnd AS (
SELECT t.Col1, t.Col2, ROW_NUMBER() OVER (ORDER BY t.Col1, t.Col2) AS RN
FROM #ValuesTable t
LEFT JOIN #ValuesTable b ON (t.Col1 = b.Col1) AND (t.Col2 = b.Col2 - 1)
WHERE (b.Col2 IS NULL)
),
TableSequences AS (
SELECT
TableStart.Col1 AS Col1,
CASE
WHEN (TableStart.Col2 <> TableEnd.Col2) THEN CONVERT(nvarchar(max), TableStart.Col2) + N' to ' + CONVERT(nvarchar(max), TableEnd.Col2)
ELSE CONVERT(nvarchar(max), TableStart.Col2)
END AS Sequence
FROM TableStart
LEFT JOIN TableEnd ON (TableStart.RN = TableEnd.RN)
)
-- Select with group concatenation
SELECT
t1.Col1,
(
SELECT t2.Sequence + N', '
FROM TableSequences t2
WHERE t2.Col1 = t1.Col1
ORDER BY t2.Col1
FOR XML PATH('')
) SerialNumber
FROM (SELECT DISTINCT Col1 FROM TableSequences) t1
Output:
Col1 SerialNumber
1 1 to 4, 6,
2 1 to 4, 6 to 7, 10,
Notes:
Tested on SQL Server 2005, 2012, 2016.
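The underlying logic in these answers (find where each consecutive run starts and ends, then join the runs) can be sketched in a few lines of Python. This is only an illustration of the technique, not a translation of either query. It follows the CTE answer's convention that a run of two values is also written as "start to end", and the function name collapse_ranges is made up for the example.

```python
def collapse_ranges(values):
    """Collapse integers into 'start to end' runs, e.g. [1, 2, 3, 4, 6] -> '1 to 4, 6'."""
    parts = []
    ordered = sorted(set(values))
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the run while the next value is exactly one greater.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else f"{start} to {end}")
        i += 1
    return ", ".join(parts)

print(collapse_ranges([1, 2, 3, 4, 6]))
print(collapse_ranges([1, 2, 3, 4, 6, 7, 10]))
```

Unlike the FOR XML PATH output above, the joined string has no trailing separator.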

Select all second highest values only from temp table

Using T-SQL (SQL Server 2008 R2), I'm trying to list only the rows with the second highest value in a particular column from a temp table and then place the results into a new temp table. The PK is the ID, which can have increasing version numbers and then unique codes.
Example:
ID | Name| Version | Code
------------------------
1 | A | 1 | 10
1 | A | 2 | 20
1 | A | 3 | NULL
2 | B | 1 | 40
2 | B | 2 | 50
2 | C | 1 | 60
The desired outcome of the query is
ID | Version | Code
------------------------
1 | 2 | 20
2 | 1 | 40
To achieve this I need the below query to be adapted to pull the second highest value as long as the result gives a version number greater than 1. These results come from a temp table and will then be placed into a final results temp table. EDIT: Please note this will be applied over 33000 rows of data so I would prefer something neater than INSERT VALUES. Thanks.
Current query:
SELECT
ID
,Version
,Code
INTO
#table2
FROM
#table1
SELECT *
FROM #table2
WHERE Version > 1
ORDER BY ID asc
DROP TABLE #table1
DROP TABLE #table2
I have tried the where clause WHERE Version < (SELECT MAX(Version) FROM #table2), but this has no effect, presumably due to the unique code values, and in any case it wouldn't work where I have more than 3 versions.
Ideas would be gratefully received.
Thanks in advance.
I have tested the code below, and it gives the desired output:
SELECT ID,Name,[Version],Code
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY [Version] DESC) AS RNK,*
FROM
(
SELECT 1 ID, 'A' Name ,1 [Version] ,10 Code
UNION ALL
SELECT 1, 'A', 2 ,20
UNION ALL
SELECT 1, 'A', 3 ,30
UNION ALL
SELECT 1, 'A', 4 ,NULL
UNION ALL
SELECT 2, 'B', 1 ,40
UNION ALL
SELECT 2, 'B', 2 ,50
UNION ALL
SELECT 2, 'C', 1 ,60
)B
)BASE
WHERE RNK =2
If your primary key is only ID, you have duplicate rows. So I assume your primary key is something else, for example ID, Version, Name. You have two rows with the same ID and the same Version; what rule do you want to apply to them? The lowest number?
I made an example that does kind of what you want:
First declare the necessary tables:
create table #table1 (
Id int,
Name nvarchar(20),
[Version] int,
Code int
)
insert into #table1 values (1,'A',1,10),(1,'A',2,20),(1,'A',3,30),(1,'A',4,NULL)
,(2,'B',1,40),(2,'B',2,50),(2,'C',1,60);
And then the query to get the results:
with HighestVersions (Id, MaxVersion) As
(
select Id, max(version) from #table1 group by Id
)
select
t1.Id,
t1.[Version],
min(t1.Code) as Code
from
#table1 t1
inner join
HighestVersions hv
on
hv.Id = t1.Id
and (hv.MaxVersion-1) = t1.[Version]
group by
t1.Id
,t1.[Version]
I had to do a little dirty trick with the outermost select, this is because of the duplicate 'Id' and 'Version'. Else you would have gotten two rows with ID = 2, Version = 1
If you want to remove the NULL value you can change the WITH part (according to your last edit):
with HighestVersions (Id, MaxVersion) As
(
select Id, max(version) from #table1 where Code is not null group by Id
)
Try this:
CREATE TABLE #List (ID int, Name char(1), Version int, Code int NULL)
INSERT INTO #List
VALUES
(1, 'A', 1, 10),
(1, 'A', 2, 20),
(1, 'A', 3, 30),
(1, 'A', 4, NULL),
(2, 'B', 1, 40),
(2, 'B', 2, 50),
(2, 'C', 1, 60)
SELECT
ID, Name, Version, Code
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY ID, Name ORDER BY Version DESC) Rn
FROM #List
) a
WHERE
a.Rn = 2
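What the ROW_NUMBER() query above selects can be traced in plain Python: group the rows by (ID, Name), sort each group by Version descending, and take the second row where one exists. This is a minimal sketch using the same sample rows as this answer, not an alternative to the T-SQL.

```python
from collections import defaultdict

rows = [
    (1, "A", 1, 10), (1, "A", 2, 20), (1, "A", 3, 30), (1, "A", 4, None),
    (2, "B", 1, 40), (2, "B", 2, 50), (2, "C", 1, 60),
]

# Group rows by (ID, Name), mirroring PARTITION BY ID, Name.
groups = defaultdict(list)
for row in rows:
    groups[(row[0], row[1])].append(row)

second_highest = []
for key in sorted(groups):
    ranked = sorted(groups[key], key=lambda r: r[2], reverse=True)  # Version DESC
    if len(ranked) >= 2:
        second_highest.append(ranked[1])  # the row that would get Rn = 2

print(second_highest)
```

Groups with a single version, such as (2, 'C'), produce no row, which matches the behaviour of filtering on Rn = 2.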

Resources