Move Data from One Table To Another Table With New Layout - sql-server

Please note: I asked this previously, but realized that I left out some very important information and felt it better to remove the original question and post a new one. My apologies to all.
I have a table that has the following columns:
ID
Name
2010/Jan
2010/Jan_pct
2010/Feb
2010/Feb_pct
.....
.....
2017/Nov
2017/Nov_pct
And then a column like that for every month/year combination to the present (hopefully that makes sense). Please note though: it is NOT a given that every month / year combination is present. There might be a gap or a missing month/year. For instance, I know 2017/Jan, 2017/Feb are missing and there could be any number missing. I just didn't want to list out every column but give a general idea of the layout.
Added to that, there isn't just one row per item; there can be multiple rows for a Name/ID, and the ID is not an identity column, so it can be any number.
To give an idea of how the table looks, here is some sample data (mind you, I only added two of the Year/Mon combinations, but there are dozens that do not necessarily have one for each month/year)
ID Name 2010/Jan 2010/Jan_Pct 2010/Feb 2010/Feb_Pct
10 Gold 81 0.00123 79 0.01242
134 Silver 82 0 75 0.21291
678 Iron 987 1.53252 1056 2.9897
As you can imagine, this isn't the best design, as you need to add two new columns every month. So I created a new table with the following definition:
ID - float,
Name - varchar(255),
Month - varchar(3),
Year - int,
Value - int,
Value_Pct - float
I am trying to figure out how to move the existing data from the old table into the new table design.
Any help would be greatly appreciated.

You can work with the unpivot operator to get what you need, with one added step of combining extra rows returned by the unpivot operator.
Sample Data Setup:
Given that the destination table has a value column of int datatype and the value_pct of float data type, I followed the same datatype guidance for the existing data table.
create table dbo.data_table
(
ID float not null
, [Name] varchar(255) not null
, [2010/Jan] int null
, [2010/Jan_Pct] float null
, [2010/Feb] int null
, [2010/Feb_Pct] float null
)
insert into dbo.data_table
values (10, 'Gold', 81, 0.00123, 79, 0.01242)
, (134, 'Silver', 82, 0, 75, 0.21291)
, (678, 'Iron', 987, 1.53252, 1056, 2.9897)
Answer:
--combine what was the "value" row and the "value_pct" row
--into a single row via summation
select a.ID
, a.[Name]
, a.[Month]
, a.[Year]
, sum(a.value) as value
, sum(a.value_pct) as value_pct
from (
--get the data out of the unpivot with one row for value
--and one row for value_pct.
select post.ID
, post.[Name]
, substring(post.col_nm, 6, 3) as [Month]
, cast(substring(post.col_nm, 1, 4) as int) as [Year]
, iif(charindex('pct', post.col_nm, 0) = 0, post.value_prelim, null) as value
, iif(charindex('pct', post.col_nm, 0) > 0, post.value_prelim, null) as value_pct
from (
--cast the columns that are currently INT as Float so that
--all data points can fit in one common data type (will separate back out later)
select db.ID
, db.[Name]
, cast(db.[2010/Jan] as float) as [2010/Jan]
, db.[2010/Jan_Pct]
, cast(db.[2010/Feb] as float) as [2010/Feb]
, db.[2010/Feb_Pct]
from dbo.data_table as db
) as pre
unpivot (value_prelim for col_nm in (
[2010/Jan]
, [2010/Jan_Pct]
, [2010/Feb]
, [2010/Feb_Pct]
--List all the rest of the column names here
)
) as post
) as a
group by a.ID
, a.[Name]
, a.[Month]
, a.[Year]
Final Output:
ID   Name    Month  Year  value  value_pct
10   Gold    Jan    2010  81     0.00123
10   Gold    Feb    2010  79     0.01242
134  Silver  Jan    2010  82     0
134  Silver  Feb    2010  75     0.21291
678  Iron    Jan    2010  987    1.53252
678  Iron    Feb    2010  1056   2.9897
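The reshaping the query performs can be cross-checked outside SQL Server. Below is a minimal Python sketch (not part of the original answer) that performs the same unpivot-and-regroup over the sample rows; the column-name parsing assumes the YYYY/Mon and YYYY/Mon_Pct naming shown above:

```python
# Reproduce the unpivot + regroup: each "YYYY/Mon" column carries value,
# each "YYYY/Mon_Pct" column carries value_pct; pair them per (ID, Name,
# Month, Year), mirroring the SUM / GROUP BY in the T-SQL.
rows = [
    {"ID": 10, "Name": "Gold", "2010/Jan": 81, "2010/Jan_Pct": 0.00123,
     "2010/Feb": 79, "2010/Feb_Pct": 0.01242},
    {"ID": 134, "Name": "Silver", "2010/Jan": 82, "2010/Jan_Pct": 0,
     "2010/Feb": 75, "2010/Feb_Pct": 0.21291},
]

out = {}
for row in rows:
    for col, val in row.items():
        if col in ("ID", "Name"):
            continue
        year, month = int(col[:4]), col[5:8]   # "2010/Jan_Pct" -> 2010, "Jan"
        rec = out.setdefault((row["ID"], row["Name"], month, year),
                             {"value": None, "value_pct": None})
        rec["value_pct" if "pct" in col.lower() else "value"] = val
```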

Related

Data Vault 2 - Hash diff and recurring data changes

I am having issues retrieving the latest value in a satellite table when some data is changed back to a former value.
The database is Snowflake.
As per Data Vault 2.0, I am currently using the hash diff function to assess whether to insert a new record in a satellite table, like this:
INSERT ALL
WHEN (SELECT COUNT(*) FROM SAT_ACCOUNT_DETAILS AD WHERE AD.MD5_HUB_ACCOUNT = MD5_Account AND AD.HASH_DIFF = AccHashDiff) = 0
THEN
INTO SAT_ACCOUNT_DETAILS (MD5_HUB_ACCOUNT
, HASH_DIFF
, ACCOUNT_CODE
, DESCRIPTION
, SOME_DETAIL
, LDTS)
VALUES (MD5_AE_Account
, AccHashDiff
, AccountCode
, Description
, SomeDetail
, LoadDTS)
SELECT DISTINCT
MD5(AccountId) As MD5_Account
, MD5(UPPER(COALESCE(TO_VARCHAR(AccountCode), '')
|| '^' || COALESCE(TO_VARCHAR(Description), '')
|| '^' || COALESCE(TO_VARCHAR(SomeDetail), '')
)) AS AccHashDiff
, AccountCode
, Description
, SomeDetail
, LoadDTS
FROM source_table;
The first time, a new record with AccountCode = '100000' and SomeDetail = 'ABC' is added:
MD5_HUB_ACCOUNT  HASH_DIFF  ACCOUNT_CODE  DESCRIPTION  SOME_DETAIL  LDTS
c81e72...        8d9d43...  100000        An Account   ABC          2021-04-08 10:00
An hour later, an update changes the value of SomeDetail to 'DEF', this is the resulting table:
MD5_HUB_ACCOUNT  HASH_DIFF  ACCOUNT_CODE  DESCRIPTION  SOME_DETAIL  LDTS
c81e72...        8d9d43...  100000        An Account   ABC          2021-04-08 10:00
c81e72...        a458b2...  100000        An Account   DEF          2021-04-08 11:00
A third update sets the value of SomeDetail back to 'ABC', but the record is not inserted in the satellite table, because the value of the hash diff is the same as the first inserted record (i.e. 8d9d43...).
If I query which is the latest record in the satellite table, the LDTS column tells me it's the one with 'DEF' which is not the desired result.
Instead, I should have a record with SomeDetail = 'ABC' and LDTS = '2021-04-08 12:00'.
What is the correct approach to this? If I add LoadDTS to the hash diff, a new record will be created each time an update is pushed, which is not the desired result either.
As you (and also the standard) mentioned, you need to compare to the last effective record.
I'm not an expert with Snowflake, but it might look like this:
INSERT ALL
WHEN (SELECT COUNT(*)
      FROM SAT_ACCOUNT_DETAILS AD
      WHERE AD.MD5_HUB_ACCOUNT = MD5_Account
        AND AD.HASH_DIFF = AccHashDiff
        AND AD.LDTS = (SELECT MAX(LDTS)
                       FROM SAT_ACCOUNT_DETAILS MAD
                       WHERE MAD.MD5_HUB_ACCOUNT = AD.MD5_HUB_ACCOUNT)) = 0
THEN ....
By adding AD.LDTS = (SELECT MAX(LDTS) FROM ...) to the query, you make sure you test against the latest record and not historical data.
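To see why the extra predicate fixes the problem, here is a hypothetical Python sketch (illustrative names, not Snowflake code) of the decision rule: a row is inserted when its hash diff differs from the hash diff of the latest row for that hub key, even if the same hash appeared earlier in the history.

```python
def should_insert(sat_rows, hub_key, new_hash_diff):
    """sat_rows: list of (hub_key, hash_diff, ldts) tuples."""
    mine = [r for r in sat_rows if r[0] == hub_key]
    if not mine:
        return True                              # first row for this hub key
    latest = max(mine, key=lambda r: r[2])       # row with the max LDTS
    return latest[1] != new_hash_diff            # compare to latest only

sat = [
    ("c81e72", "8d9d43", "2021-04-08 10:00"),    # SomeDetail = 'ABC'
    ("c81e72", "a458b2", "2021-04-08 11:00"),    # SomeDetail = 'DEF'
]
# Setting SomeDetail back to 'ABC' reproduces hash 8d9d43; it differs
# from the latest hash (a458b2), so the row is inserted, unlike with
# the history-wide COUNT(*) check.
```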

Is it possible to use STRING_SPLIT in an ORDER BY clause?

I am trying to order values that are going to be inserted into another table based on their order value from a secondary table. We are migrating old data to a new table with a slightly different structure.
I thought I would be able to accomplish this by using the string_split function but I'm not very skilled with SQL and so I am running into some issues.
Here is what I have:
UPDATE lse
SET Options = a.Options
FROM
dbo.LessonStepElement as lse
CROSS JOIN
(
SELECT
tbl1.*
tbl2.Options,
tbl2.QuestionId
FROM
dbo.TrainingQuestionAnswer as tbl1
JOIN (
SELECT
string_agg((CASE
WHEN tqa.CorrectAnswer = 1 THEN REPLACE(tqa.AnswerText, tqa.AnswerText, '*' + tqa.AnswerText)
ELSE tqa.AnswerText
END),
char(10)) as Options,
tq.Id as QuestionId
FROM
dbo.TrainingQuestionAnswer as tqa
INNER JOIN
dbo.TrainingQuestion as tq
on tq.Id = tqa.TrainingQuestionId
INNER JOIN
dbo.Training as t
on t.Id = tq.TrainingId
WHERE
t.IsDeleted = 0
and tq.IsDeleted = 0
and tqa.IsDeleted = 0
GROUP BY
tq.Id,
tqa.AnswerDisplayOrder
ORDER BY
(SELECT [Value] FROM STRING_SPLIT((SELECT AnswerDisplayOrder FROM dbo.TrainingQuestion WHERE Id = tmq.Id), ','))
) as tbl2
on tbl1.TrainingQuestionId = tbl2.QuestionId
) a
WHERE
a.TrainingQuestionId = lse.TrainingQuestionId
The AnswerDisplayOrder that I am using is just an nvarchar comma-separated list of the IDs for the answers to the question.
Here is an example:
I have 3 rows in the TrainingQuestionAnswer table that look like the following.
ID TrainingQuestionId AnswerText
-------------------------------------------
215 100 No
218 100 Yes
220 100 I'm not sure
I have 1 row in the TrainingQuestion table that looks like the following.
ID AnswerDisplayOrder
--------------------------
100 "218,215,220"
Now what I am trying to do is when I update the row in the new table with all of the answers combined, the answers will need to be in the correct order which is dependent on the AnswerDisplayOrder in the TrainingQuestion table. So in essence, the new table would have a row that would look similar to the following.
ID Options
--------------
193 "Yes No I'm not sure"
I'm aware that the way I'm trying to do it might not be even possible at all. I am still learning and would just love some advice or guidance on how to make this work. I also know that string_split does not guarantee order. I'm open to other suggestions that do guarantee the order as well.
I simplified the issue in the question to the following approach, that is a possible solution to your problem. If you want to get the results from the question, you need a splitter, that returns the substrings and the positions of the substrings. STRING_SPLIT() is available from SQL Server 2016, but is not an option here, because (as is mentioned in the documentation) the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.
But you may try to use a JSON-based approach, with a small string manipulation that transforms the answer IDs into a valid JSON array (218,215,220 into [218,215,220]). After that you can easily parse this JSON array with OPENJSON() and the default schema. The result is a table with columns key, value and type, and the key column (again from the documentation) holds the index of the element in the specified array.
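The bracket trick itself is easy to verify outside T-SQL; in this rough Python analogue the json module stands in for OPENJSON, and the element index plays the role of the key column:

```python
import json

# '218,215,220' -> '[218,215,220]' -> a JSON array whose element index
# preserves the display order, like OPENJSON's [key] column does.
display_order = "218,215,220"
indexed = list(enumerate(json.loads("[" + display_order + "]")))
```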
Tables:
CREATE TABLE TrainingQuestionId (
ID int,
TrainingQuestionId int,
AnswerText varchar(1000)
)
INSERT INTO TrainingQuestionId
(ID, TrainingQuestionId, AnswerText)
VALUES
(215, 100, 'No'),
(218, 100, 'Yes'),
(220, 100, 'I''m not sure')
CREATE TABLE TrainingQuestion (
ID int,
AnswerDisplayOrder varchar(1000)
)
INSERT INTO TrainingQuestion
(ID, AnswerDisplayOrder)
VALUES
(100, '218,215,220')
Statement:
SELECT tq.ID, oa.Options
FROM TrainingQuestion tq
OUTER APPLY (
SELECT STRING_AGG(tqi.AnswerText, ' ') WITHIN GROUP (ORDER BY CONVERT(int, j.[key])) AS Options
FROM OPENJSON(CONCAT('[', tq.AnswerDisplayOrder, ']')) j
LEFT JOIN TrainingQuestionId tqi ON TRY_CONVERT(int, j.[value]) = tqi.ID
) oa
Result:
ID Options
100 Yes No I'm not sure
Notes: You need SQL Server 2017+ to use STRING_AGG(). For SQL Server 2016 you need a FOR XML to aggregate strings.
Another option avoids JSON entirely: compute each answer's position with charindex() against the comma-delimited list, then order the aggregation by that position.
declare @TrainingQuestionAnswer table
(
ID int,
TrainingQuestionId int,
AnswerText varchar(20)
);
insert into @TrainingQuestionAnswer(ID, TrainingQuestionId, AnswerText)
values(215, 100, 'No'), (218, 100, 'Yes'), (220, 100, 'I''m not sure');
declare @TrainingQuestiontest table
(
testid int identity,
QuestionId int,
AnswerDisplayOrder varchar(200)
);
insert into @TrainingQuestiontest(QuestionId, AnswerDisplayOrder)
values(100, '218,215,220'), (100, '220,218,215'), (100, '215,218');
select *,
(
select string_agg(pci.AnswerText, '==') within group (order by pci.pos)
from
(
select a.AnswerText,
pos = charindex(concat(',', a.ID, ','), concat(',', q.AnswerDisplayOrder, ','))
from @TrainingQuestionAnswer as a
where a.TrainingQuestionId = q.QuestionId
and charindex(concat(',', a.ID, ','), concat(',', q.AnswerDisplayOrder, ',')) >= 1
) as pci
) as TestAnswerText
from @TrainingQuestiontest as q;
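Both variants above reduce to the same idea: derive a sort position for each answer ID from its place in AnswerDisplayOrder, then aggregate in that order. A rough Python sketch over the question's sample data:

```python
# Sample data from the question: answer texts keyed by ID, plus the
# comma-separated display order stored on the question row.
answers = {215: "No", 218: "Yes", 220: "I'm not sure"}
display_order = "218,215,220"

# The position in the split list is the sort key, so joining in that
# order yields the combined Options string.
options = " ".join(answers[int(i)] for i in display_order.split(","))
```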

bring column name into row in sql server [duplicate]

This question already has answers here:
Unpivot with column name
(3 answers)
Closed 4 years ago.
I have a table with columns column_name_1, column_name_2, column_name_3 and one row with the values 0, 0, 1:
column_name_1 , column_name_2 , column_name_3
--------------,----------------,---------------
0 , 0 , 1
I need output like
column_name_1 | 0
column_name_2 | 0
column_name_3 | 1
Is it possible?
I have checked some unpivot examples, but that's not exactly my case: I need the column names to become column values and the single row to become a column.
1. Unpivot with column name
Name, Maths, Science, English
Tilak, 90, 40, 60
Raj, 30, 20, 10
changed into
Name, Subject, Marks
Tilak, Maths, 90
Tilak, Science, 40
Tilak, English, 60
Clearly, in that example the Name column keeps its position.
2. SQL Query for generating matrix like output querying related table in SQL Server
The above link also has a Customer Name column which remains in place.
But in my case, the input and output do not keep any column in the same position.
So if this is still achievable through pivot, please help with the code.
Clearly UNPIVOT would be more performant, but if you need a dynamic approach without actually using dynamic SQL:
Example
Select C.*
From YourTable A
Cross Apply ( values (cast((Select A.* for XML RAW) as xml))) B(XMLData)
Cross Apply (
Select Item = a.value('local-name(.)','varchar(100)')
,Value = a.value('.','varchar(max)')
From B.XMLData.nodes('/row') as C1(n)
Cross Apply C1.n.nodes('./#*') as C2(a)
Where a.value('local-name(.)','varchar(100)') not in ('Columns','ToExclude')
) C
Returns
Item Value
Column_name_1 0
column_name_2 0
column_name_3 1
You want APPLY:
SELECT tt.cols, tt.colsvalue
FROM table t CROSS APPLY
( VALUES ([column_name_1], 'column_name_1'),
([column_name_2], 'column_name_2'),
([column_name_3], 'column_name_3')
) tt (colsvalue, cols);
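For intuition, what both answers compute is just the row's (column name, value) pairs emitted as rows; a quick Python sketch with a hypothetical one-row table:

```python
# One row with three columns; unpivoting means emitting one output row
# per (column name, value) pair.
row = {"column_name_1": 0, "column_name_2": 0, "column_name_3": 1}
unpivoted = [(name, value) for name, value in row.items()]
```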

TSQL - De-duplication report - grouping

So I'm trying to create a report that ranks a duplicate record, the idea behind this is that the customer wants to merge a whole lot of duplicate records that came about from a migration.
I need the ranking so that my report can show which record should be the "main" record, i.e. the record that will have missing data pulled into it.
The duplicate definition is pretty simple:
If the email addresses are the same then it is always a duplicate, if
the emails do not match, then the first name, surname, and mobile must
match.
The ranking will be based on a whole bunch of columns in the table, so:
email address isn't NULL = 50
phone number isn't NULL = 20
etc. Whichever record gets the highest total in the duplicate group becomes the main record. This is where I am having issues: I can't seem to find a way to get an incremental number for each duplicate set. This is some of the code I have so far:
( I took out some of the rank columns in the temp table and CTE expression to shorten it )
DECLARE #tmp_Duplicates TABLE (
tmp_personID INT
, tmp_Firstname NVARCHAR(100)
, tmp_Surname NVARCHAR(100)
, tmp_HomeEmail NVARCHAR(300)
, tmp_MobileNumber NVARCHAR(100)
--- Ratings
, tmp_HomeEmail_Rating INT
--- Groupings
, tmp_GroupNumber INT
)
;WITH cteDupes AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personID DESC) AS RND,
ROW_NUMBER() OVER(PARTITION BY personHomeEmail ORDER BY personId) AS RNA,
p.personID, p.PersonFirstName, p.PersonSurname, p.PersonHomeEMail
, personMobileTelephone
FROM tblCandidate c INNER JOIN tblPerson p ON c.candidateID = p.personID
)
INSERT INTO #tmp_Duplicates
SELECT PersonID, PersonFirstName, PersonSurname, PersonHomeEMail, personMobileTelephone
, 10, RND
FROM cteDupes
WHERE RNA + RND > 2
ORDER BY personID, PersonFirstName, PersonSurname
SELECT * FROM #tmp_Duplicates
This gives me the results I want, but the group number isn't showing how I need it. What I need is for each group to be an incremental value.

How do I compare two rows from a SQL database table based on DateTime within 3 seconds?

I have a table of DetailRecords containing records that seem to be "duplicates" of other records, but they have a unique primary key [ID]. I would like to delete these "duplicates" from the DetailRecords table and keep the record with the longest/highest Duration. I can tell that they are linked records because their DateTime field is within 3 seconds of another row's DateTime field and the Duration is within 2 seconds of one another. Other data in the row will also be duplicated exactly, such as Number, Rate, or AccountID, but this could be the same for the data that is not "duplicate" or related.
CREATE TABLE #DetailRecords (
[AccountID] INT NOT NULL,
[ID] VARCHAR(100) NULL,
[DateTime] VARCHAR(100) NULL,
[Duration] INT NULL,
[Number] VARCHAR(200) NULL,
[Rate] DECIMAL(8,6) NULL
);
I know that I will most likely have to perform a self join on the table, but how can I find two rows that are similar within a DateTime range of plus or minus 3 seconds, instead of just exactly the same?
I am having the same trouble with the Duration within a range of plus or minus 2 seconds.
The key is taking the absolute value of the difference between the dates and durations. I don't know SQL Server, but here's how I'd do it in SQLite. The technique should be the same; only the specific function names will be different.
SELECT a.id, b.id
FROM DetailRecords a
JOIN DetailRecords b
ON a.id > b.id
WHERE abs(strftime('%s', a.DateTime) - strftime('%s', b.DateTime)) <= 3
AND abs(a.duration - b.duration) <= 2
Taking the absolute value of the difference covers the "plus or minus" part of the range. The self join is on a.id > b.id because a.id = b.id would duplicate every pair.
Given the entries...
ID|DateTime |Duration
1 |2014-01-26T12:00:00|5
2 |2014-01-26T12:00:01|6
3 |2014-01-26T12:00:06|6
4 |2014-01-26T12:00:03|11
5 |2014-01-26T12:00:02|10
6 |2014-01-26T12:00:01|6
I get the pairs...
5|4
2|1
6|1
6|2
And you should really store those dates as DateTime types if you can.
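The pairing rule is easy to check outside the database; this Python sketch over the same six entries applies the plus-or-minus 3 second and 2 second conditions with a.id > b.id and reproduces the listed pairs:

```python
from datetime import datetime

records = [
    (1, "2014-01-26T12:00:00", 5),
    (2, "2014-01-26T12:00:01", 6),
    (3, "2014-01-26T12:00:06", 6),
    (4, "2014-01-26T12:00:03", 11),
    (5, "2014-01-26T12:00:02", 10),
    (6, "2014-01-26T12:00:01", 6),
]

def ts(s):
    # seconds since epoch, mirroring strftime('%s', ...) in SQLite
    return datetime.fromisoformat(s).timestamp()

# a.id > b.id avoids self-pairs and mirrored duplicates
pairs = [(a, b)
         for a, a_dt, a_dur in records
         for b, b_dt, b_dur in records
         if a > b
         and abs(ts(a_dt) - ts(b_dt)) <= 3
         and abs(a_dur - b_dur) <= 2]
```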
You could use a self-referential CTE and compare the DateTime fields.
;WITH CTE AS (
SELECT AccountID,
ID,
DateTime,
rn = ROW_NUMBER() OVER (PARTITION BY AccountID, ID, <insert any other matching keys> ORDER BY AccountID)
FROM table
)
SELECT earliestAccountID = c1.AccountID,
earliestDateTime = c1.DateTime,
recentDateTime = c2.DateTime,
recentAccountID = c2.AccountID
FROM cte c1
INNER JOIN cte c2
ON c1.rn = 1 AND c2.rn = 2 AND c1.DateTime <> c2.DateTime
Edit
I made several assumptions about the data set, so this may not be as relevant as you need. If you're simply looking for difference between possible duplicates, specifically DateTime differences, this will work. However, this does not constrain to your date range, nor does it automatically assume what the DateTime column is used for or how it is set.
