Import Flat Data with Multiple Delimiters - sql-server

My flat file has been imported into SQL Server using a comma as the delimiter.
An example of my text file looks like:
Location\Floor\Room,Date,Value
After import:
Column 1 | Column 2 | Column 3
Location\Floor\Room | Date | Value
I would like my table to look as follows:
Column 1 | Column 2 | Column 3 | Column 4 | Column 5
Location | Floor | Room | Date | Value
Is there a way to achieve this?

SSIS (SQL Server Integration Services) can also be used for this use case.
What you basically need is a two-step transformation process: first load your input file into an interim table, using the comma as the standard delimiter.
Once the interim table is populated (including the values that contain backslashes), use a Derived Column transformation in SSIS with custom logic based on the SUBSTRING() and FINDSTRING() functions to create new columns that split the string on the backslash.
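If you would rather do the second step in T-SQL than in SSIS, the same split can be sketched with CHARINDEX and SUBSTRING. This is only an illustration: the table variable @Staging, its column names, and the sample row are assumptions, and it assumes every value contains exactly two backslashes.

```sql
-- Hypothetical interim table holding the comma-delimited import.
DECLARE @Staging TABLE (Column1 varchar(200), [Date] date, [Value] int);
INSERT INTO @Staging VALUES ('Building A\2\Room 201', '2020-01-15', 42);

SELECT
    LEFT(s.Column1, p.p1 - 1)                       AS [Location],
    SUBSTRING(s.Column1, p.p1 + 1, p.p2 - p.p1 - 1) AS [Floor],
    SUBSTRING(s.Column1, p.p2 + 1, LEN(s.Column1))  AS [Room],
    s.[Date],
    s.[Value]
FROM @Staging AS s
-- p1 = position of the first backslash
CROSS APPLY (SELECT CHARINDEX('\', s.Column1) AS p1) AS a
-- p2 = position of the second backslash, searched from just after the first
CROSS APPLY (SELECT a.p1, CHARINDEX('\', s.Column1, a.p1 + 1) AS p2) AS p;
```

Rows with fewer than two backslashes would need to be filtered out or handled separately before this query runs.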

I'm thinking of this solution.
select t2.col1
, t2.col2
, substring(t2.col3, charindex('\', t2.col3, len(t2.col2) + len(t2.col1)) + 1, len(t2.col3) - (len(t2.col2) + len(t2.col1) + 2))
, t2.[value], t2.[date]
from (
select t1.col1, substring(t1.main, len(t1.col1) + 2
, charindex('\', t1.main, len(t1.col1) + 2) - (len(t1.col1) + 2)) as col2
, t1.main as col3, t1.[value], t1.[date]
from (
select substring(column1, 0, charindex('\', column1)) as col1, column1 as main, [date], [value]
from tableA
) t1
) t2

This works for up to 5 values in the undivided string, for example:
val1\val2\val3\val4\val5
select [1] as col1, [2] as col2, [3] as col3, [4] as col4, [5] as col5, col2 as col7, col3 as col8
from (
-- note: string_split does not guarantee the order of the returned pieces
select ROW_NUMBER() over(partition by s.col1 order by s.col1) rowid, s.col1, s.col2, s.col3, value
from <MyTable> s
cross apply string_split(s.col1, '\')
) as tbl
pivot (
max(value) for rowid in ([1], [2], [3], [4], [5])
) as pv

Related

How to Sum particular values in a column in Microsoft SQL Server?

I'm kinda new to this and I have been stuck on this for a while now.
Example:
Col1 Col2 Col3
A | H | 1
A | I | 2
A | J | 3
B | J | 4
B | K | 5
C | L | 6
How can I sum 'Col3', but only for particular values? For example, sum the values in 'Col3' across all rows that share the same letter in 'Col1'. So A = 6 (1+2+3), B = 9 (4+5) and C = 6.
So you get this:
Col1 Col2 Col3
A | H | 6
A | I | 6
A | J | 6
B | J | 9
B | K | 9
C | L | 6
This is what I had so far:
SELECT Col1, Col2, SUM(Col3)
FROM Table1
GROUP BY Col1, Col2;
Thanks
Just to elaborate on my comment.
You can use the window function sum() over()
Example
Declare @YourTable Table ([Col1] varchar(50),[Col2] varchar(50),[Col3] int)
Insert Into @YourTable Values
('A','H',1)
,('A','I',2)
,('A','J',3)
,('B','J',4)
,('B','K',5)
,('C','L',6)
Select Col1
,Col2
,Col3 = sum(Col3) over (partition by Col1)
From @YourTable
Returns
Col1 Col2 Col3
A H 6
A I 6
A J 6
B J 9
B K 9
C L 6
Another way to do this is with a join and the SUM (Transact-SQL) function.
create table TestTable (Col1 varchar(5)
, Col2 varchar(5)
, Col3 int)
insert into TestTable Values
('A', 'H', 1),
('A', 'I', 2),
('A', 'J', 3),
('B', 'J', 4),
('B', 'K', 5),
('C', 'L', 6)
SELECT tblA.Col1
,tblA.Col2
,tblB.Col3
FROM (
SELECT Col1
,Col2
FROM TestTable
) tblA
INNER JOIN (
SELECT Col1
,sum(Col3) AS Col3
FROM TestTable
GROUP BY Col1
) tblB ON tblA.Col1 = tblB.Col1
There are a number of ways to write data aggregation queries like this. Which to use depends on what your final results need to look like. Just to go over some basics, I’ll go over several methods here.
The simplest is to use a WHERE clause:
SELECT Col1, sum(Col3)
from MyTable
where Col1 = 'A'
group by Col1
This will produce a single row of data:
Col1 Col3
A | 6
To produce sums for all of the distinct values in Col1, you would use GROUP BY:
SELECT Col1, sum(Col3)
from MyTable
group by Col1
This will produce three rows of data:
Col1 Col3
A | 6
B | 9
C | 6
The above samples are pretty straightforward and basic SQL examples. It is actually a bit difficult to produce the result set from your example, where you include Col2 and show the summation, because Col2 is not part of the data aggregation. Several ways to do this:
Using a subquery:
SELECT
mt.Col1
,mt.Col2
,sub.SumCol3 Col3
from MyTable mt
inner join (select
Col1
,sum(Col3) SumCol3
from MyTable
group by Col1) sub
on sub.Col1 = mt.Col1
Using a common table expression:
WITH cteSub
as (select
Col1
,sum(Col3) SumCol3
from MyTable
group by Col1)
select
mt.Col1
,mt.Col2
,cteSub.SumCol3 Col3
from MyTable mt
inner join cteSub
on ctesub.Col1 = mt.Col1
And, perhaps the most obscure and obtuse, using aggregation functions with partitioning:
SELECT
Col1
,Col2
,sum(Col3) over (partition by Col1) Col3
from MyTable
Thorough and complete discussions of all the above tactics (better than anything I'd write) can be found online by searching for "SQL" plus the appropriate term (aggregation, subquery, CTE, partitioning functions). Good luck!

TSQL -- find records in table with multiples in one column, and at least one specific occurrence of a value in another column

If I have:
ourDB.dbo.ourTable with col1 and col2 and col 3
I want to find occurrences such that
* A value of col1 occurs multiple times
* col2 = 'Val1' in at least one of those rows.
So one would start with:
Select col1, col2, col3
FROM ourDB.dbo.ourTable
WHERE
(col2 = 'Val1')
Group by col1, col2, col3
having count(col1) > 1
Order by col1, col2, col3
This would find rows where col2 always equals 'Val1', but how can it be generalized to col2 having 'Val1' at least once?
You must GROUP BY col1 only and with conditional aggregation you get all the col1 values you need:
SELECT * FROM ourDB.dbo.ourTable
WHERE col1 IN (
SELECT col1
FROM ourDB.dbo.ourTable
GROUP BY col1
HAVING COUNT(*) > 1 AND SUM(CASE WHEN col2 = 'Val1' THEN 1 END) > 0
)
ORDER BY col1, col2, col3
If you want only the rows with col2 = 'Val1':
SELECT * FROM ourDB.dbo.ourTable
WHERE
col2 = 'Val1'
AND
col1 IN (
SELECT col1
FROM ourDB.dbo.ourTable
GROUP BY col1
HAVING COUNT(*) > 1 AND SUM(CASE WHEN col2 = 'Val1' THEN 1 END) > 0
)
ORDER BY col1, col3
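To see what the two HAVING conditions evaluate per group, here is a small sketch. The table variable @ourTable and its sample rows are made up for illustration; only the column names come from the question.

```sql
-- Hypothetical sample data
DECLARE @ourTable TABLE (col1 int, col2 varchar(10), col3 int);
INSERT INTO @ourTable VALUES
 (1, 'Val1', 10), (1, 'Val2', 20),  -- repeated and contains 'Val1': qualifies
 (2, 'Val2', 30), (2, 'Val3', 40),  -- repeated but never 'Val1'
 (3, 'Val1', 50);                   -- 'Val1' but col1 occurs only once

SELECT col1,
       COUNT(*) AS row_count,                                          -- condition 1: must be > 1
       SUM(CASE WHEN col2 = 'Val1' THEN 1 ELSE 0 END) AS val1_count    -- condition 2: must be > 0
FROM @ourTable
GROUP BY col1
ORDER BY col1;
-- col1  row_count  val1_count
-- 1     2          1
-- 2     2          0
-- 3     1          1
```

Only col1 = 1 satisfies both conditions, so only its rows would be returned by the queries above.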
An alternate method below, originally from a colleague and modified somewhat.
NOTE: Not necessarily better than the other (accepted) answer, just a different approach.
-- GENERIC_JOIN_WITH_SPECIFIC_COUNTS_Q_v_0.sql
USE [ourDB]
SELECT COUNT( distinct titleid) -- also could use COUNT(*)
from ourTable AS t
WHERE
(
(1 <
(
-- more than one col1 occurrence; correlated on col1 so the subquery returns a single value
select count(a.col1)
from ourTable AS a
WHERE a.col1 = t.col1
)
)
AND
(0 <
(
-- at least one occurrence of col2 having 'Val1' for the same col1
select count(*) from ourTable AS b
WHERE b.col1 = t.col1 AND b.col2 = 'Val1'
)
)
)

Concatenate every 2 rows in a single column table

This is the code I used to concatenate the first two rows. Each row of data is exactly 6 characters long.
My problem is that it only returns the first concatenation and doesn't affect the following rows.
DECLARE @r_strands VARCHAR(MAX)
SET @r_strands=''
SELECT @r_strands= @r_strands + R_Strands
FROM rs_table
SELECT LEFT(@r_strands, +12) AS text
rs_table
r_strands
thedog
wentto
hisbed
wherei
placed
foodto
eatfor
supper
The above is an example of the single-column table; I want to concatenate every two rows.
EX. result desired
r_strands
thedogwentto
hisbedwherei
placedfoodto
eatforsupper
You can use ROW_NUMBER to make pairs of two consecutive r_strands. Then use FOR XML PATH('') for concatenation:
If you have an Id to determine the order, you can replace the ORDER BY with Id instead of SELECT NULL.
;WITH Cte AS(
SELECT *,
RN = (ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) + 1) / 2
FROM rs_table
)
SELECT
r_strands = (
SELECT '' + r_strands
FROM Cte
WHERE RN = c.RN
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'
)
FROM Cte c
GROUP BY RN
RESULT:
| r_strands |
|--------------|
| thedogwentto |
| hisbedwherei |
| placedfoodto |
| eatforsupper |
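On SQL Server 2017 or later, STRING_AGG may be a simpler alternative to FOR XML PATH. The sketch below is an assumption-laden illustration: it presumes an id column that defines the row order (the original rs_table has none) and uses a table variable @t as sample data.

```sql
-- Hypothetical sample data with an explicit ordering column
DECLARE @t TABLE (id int, r_strands varchar(100));
INSERT INTO @t VALUES
 (1, 'thedog'), (2, 'wentto'), (3, 'hisbed'), (4, 'wherei'),
 (5, 'placed'), (6, 'foodto'), (7, 'eatfor'), (8, 'supper');

-- (id + 1) / 2 uses integer division, so ids 1,2 -> 1; 3,4 -> 2; and so on
SELECT STRING_AGG(r_strands, '') WITHIN GROUP (ORDER BY id) AS r_strands
FROM @t
GROUP BY (id + 1) / 2
ORDER BY MIN(id);
```

WITHIN GROUP (ORDER BY id) controls the order of the pieces inside each pair, which FOR XML PATH only gives you if the inner query has its own ORDER BY.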
Supposing that you have a row-number column (call it "id"), you can self-join adjacent rows on the id and concatenate the strings using the + string concatenation operator. Restricting the left side to odd ids keeps the pairs from overlapping:
select a.r_strands + b.r_strands
from rs_table a
join rs_table b
on a.id = b.id - 1
where a.id % 2 = 1
declare @t table (id int, r_strands varchar(100));
insert into @t values
(1, 'thedog'),
(2, 'wentto'),
(3, 'hisbed'),
(4, 'wherei'),
(5, 'placed'),
(6, 'foodto'),
(7, 'eatfor'),
(8, 'supper');
select r_strands + next_row
from
(
select *,
next_row = lead(r_strands) over(order by id),
is_mod = id % 2
from @t
) x
where is_mod != 0;

Consecutively calculate value between rows in a table

I'm trying to write a T-SQL query that subtracts the datetime value of each row from the datetime value of the row that follows it.
For example:
Col1 Col2
-------------------------------------------------------------------
row 1: | ENTRY_DOOR_CLOSE | 2/12/2014 16:41:40:4140
row 2: | EXIT_DOOR_CLOSE_ENTRY_DOOR_OPEN | 3/12/2014 16:41:40:4140
row 3: | ENTRY_DOOR_CLOSE | 4/12/2014 16:41:40:4140
row 4: | EXIT_DOOR_CLOSE_ENTRY_DOOR_OPEN | 5/12/2014 16:41:40:4140
--------------------------------------------------------------------
Result:
Col1 Col2
---------------------------------------------------------------------
Row 1: | Diff | Row2.DateTime - Row1.DateTime
Row 2: | Diff | Row4.DateTime - Row3.DateTime
---------------------------------------------------------------------
Can anyone suggest an idea to resolve this?
In SQL Server 2012+, you can use the lead() function:
select 'Diff' as col1,
datediff(second, col2, col2_next) as diff_in_seconds
from (select t.*, lead(col2) over (order by col2) as col2_next
from YourTable t -- "table" is a reserved word; substitute your table name
) t
where col1 = 'ENTRY_DOOR_CLOSE';
This assumes that the values are interleaved, as in the question.
I just figured out that a CTE can solve my issue when SQL Server 2012 is not available:
;WITH valuedTable AS (
SELECT
ROW_NUMBER() OVER (PARTITION BY ScxxID, SxxID ORDER BY RecordTime) AS RowID
, ScxxID
, SxxID
, Exxx
, RecordTime
, ProcessName
FROM
database..xxx
WHERE
ProcessName = 'EXIT_DOOR_CLOSE_ENTRY_DOOR_OPEN'
OR
ProcessName = 'ENTRY_DOOR_CLOSE'
)
SELECT
valuedTable.ProcessName
, valuedTable.RecordTime
, nex.ProcessName
, nex.RecordTime
, DATEDIFF(S, valuedTable.RecordTime, nex.RecordTime) DIFF
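FROM
valuedTable
INNER JOIN
valuedTable nex
ON ( nex.RowID = valuedTable.RowID + 1 )
AND
( nex.ProcessName = 'EXIT_DOOR_CLOSE_ENTRY_DOOR_OPEN' )
-- RowID restarts per (ScxxID, SxxID), so match within the same partition
AND nex.ScxxID = valuedTable.ScxxID
AND nex.SxxID = valuedTable.SxxID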
If you use SQL Server 2012, use this one (your table is ordered, but this also works for an unordered table):
;WITH CTE AS (SELECT ROW_NUMBER() OVER (ORDER BY Col2) AS RN, Col1, Col2
FROM YourTable)
SELECT 'Diff' AS Col1, DATEDIFF(HOUR,a.Col2,x.Col2) AS Col2
FROM CTE a
CROSS APPLY (SELECT TOP 1 Col2 FROM CTE b WHERE Col1 = 'EXIT_DOOR_CLOSE_ENTRY_DOOR_OPEN' AND b.RN > a.RN ORDER BY Col2 ASC) x
WHERE Col1 = 'ENTRY_DOOR_CLOSE'
Hope this will help
--CREATE A TEMPORARY TABLE TO HOLD THE GIVEN DATA
DECLARE @Table AS TABLE
(
ID INT IDENTITY(1,1)
,Col1 VARCHAR(50)
,Col2 DATETIMEOFFSET(0)
)
INSERT INTO @Table (COl1,Col2) VALUES ('ENTRY_DOOR_CLOSE', '2014-12-02'),
('EXIT_DOOR_CLOSE_ENTRY_DOOR_OPEN' , '2014-12-03')
,('ENTRY_DOOR_CLOSE','2014-12-04')
,('EXIT_DOOR_CLOSE_ENTRY_DOOR_OPEN' , '2014-12-05')
--Using common table expression do the following
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY ID) AS RowID
,CONVERT(date,Col2) AS DateColumn
FROM @Table
)
SELECT
'DIF' AS Col1
,DATEDIFF(DD,SEcondCTE.DateColumn,FirstCTE.DateColumn)
FROM
CTE FirstCTE
INNER JOIN
CTE SEcondCTE
ON
FirstCTE.RowID = SEcondCTE.RowID + 1
WHERE FirstCTE.RowID % 2 =0

Update to get check_order

I have a table with values,
col1 col2 col3
1 0 ABA
1 0 ABB
1 0 ABC
2 0 BBA
2 0 BBB
2 0 BBC
I am trying to update the table so that col2 numbers the repetitions of each col1 value. Here each col1 value repeats 3 times, so col2 should increment by 1 for each row within a col1 group.
Required output after the update table
col1 col2 col3
1 1 ABA
1 2 ABB
1 3 ABC
2 1 BBA
2 2 BBB
2 3 BBC
A simple row_number() should work:
;with TMP as (
select *, row_number() over (partition by col1 order by col3) as RowNum
from tbl
)
update TMP set col2=RowNum
Where
tbl: is your table name
partition by col1: resets the row numbering for each col1 group
order by col3: is the basis for numbering within a col1 group
Assuming you are intending col3 to be in non-descending order, this should do it:
UPDATE MyTable
SET col2=(SELECT COUNT(*)
FROM MyTable AS T2
WHERE T2.col1=T1.col1 AND T2.col3<=T1.col3)
FROM MyTable AS T1
You will get duplicates in col2, if there are duplicates in col3 for a particular col1 value.
In case you are interested, here is a pretty verbose (and more expensive execution wise) solution using a ranking function. It has the same issue (i.e., the count gets repeated) for duplicates in col1/col3, as the previous solution:
UPDATE MyTable
SET col2=(
-- In the following query, DISTINCT merges rank dups caused by col3 dups
-- SELECT TOP(1) MyRank would also work.
SELECT DISTINCT MyRank
FROM (
SELECT col3,
DENSE_RANK() OVER (PARTITION BY col1 ORDER BY col3) AS MyRank
FROM MyTable
WHERE col1=UpdatedTable.col1
) As RankTable
WHERE RankTable.col3=UpdatedTable.col3)
FROM MyTable AS UpdatedTable
