I am trying to copy the value from one column to another within the same table, and struggling to work out the correct way to do this.
Ideally I would do this within the code rather than the SQL; however, I am dealing with a legacy application, and as such updating the code (whilst a new version is already in development) is a no-no.
I have the following table (IncomingData)
id PK
SourceID int
QuestionnaireID int
Q1 varchar(500)
Q2 varchar(500)
..
..
Q50 varchar(500)
So we would have:
id SourceID QuestionnaireID Q1 Q2 Q3 Q4 Q5
1 5000 10 1 2 1 5 3
Data will occasionally come in (via a web service) from particular SourceIDs where Q1 (etc.) needs moving to Q3 in order for it to be processed correctly by our system.
So, to this end, I have created the following mapping table:
id SourceID QuestionnaireID SourceQ DestQ
1 5000 10 Q1 Q3
1 5000 10 Q2 Q4
1 5000 10 Q3 Q5
1 5000 10 Q4 Q7
1 5000 10 Q5 Q8
My idea is to create a temporary table matching IncomingData, copy the values to the correct columns within this temporary table, and then update IncomingData for that row with the values from the correct columns.
Is there an easy way to do this? The only thing I can see is to create numerous dynamic SQL statements within this function and execute them. This concerns me because of injection; however, the SQL to insert into the temp table would only be built from the mapping table, so external variables would not be passed to the dynamic SQL. Would this be an issue?
Update
The (expected) flow of data is:
1) Data appears in IncomingData
2) A stored procedure is scheduled to run that gets the data from IncomingData
3) A function is run within the stored procedure that takes SourceID and QuestionnaireID as parameters and searches the SourceDestXmap table
4) If there are matches (i.e. some columns need changing), then, for each of the rows within SourceDestXmap, the value of SourceQ needs inserting into DestQ of a temporary table.
So, from the example above, the value of IncomingData.Q5 needs to be inserted into the tempData.Q8 column, where the mappings are worked out from SourceDestXmap.
Caveats:
There could be any number of mappings; for example, QuestionnaireID 11 may have a Q8->Q27 mapping as well as a Q16->Q4 mapping.
Update #2
I basically need a more dynamic way of doing something like this:
BEGIN
DECLARE @tmpQ3 varchar(2000)
DECLARE @tmpQ4 varchar(2000)
DECLARE @tmpQ5 varchar(2000)
DECLARE @tmpQ7 varchar(2000)
DECLARE @tmpQ8 varchar(2000)
SET @tmpQ3 = @prmQ1
SET @tmpQ4 = @prmQ2
SET @tmpQ5 = @prmQ3
SET @tmpQ7 = @prmQ4
SET @tmpQ8 = @prmQ5
SET @prmQ1 = NULL
SET @prmQ2 = NULL
SET @prmQ3 = NULL
SET @prmQ4 = NULL
SET @prmQ5 = NULL
SET @prmQ3 = @tmpQ3
SET @prmQ4 = @tmpQ4
SET @prmQ5 = @tmpQ5
SET @prmQ7 = @tmpQ7
SET @prmQ8 = @tmpQ8
END
Surely this is not exactly what you need (I do not fully understand it, actually), but you might get an idea of how you can "shift" your values aside and set new values (update) in one single step:
DECLARE @IncomingData TABLE(ID INT,SourceID INT,QuestionairID INT,Q1 VARCHAR(10),Q2 VARCHAR(10),Q3 VARCHAR(10),Q4 VARCHAR(10),Q5 VARCHAR(10));
INSERT INTO @IncomingData VALUES (1,5000,10,'1','2','3','4','5');
DECLARE @NewData TABLE(ID INT,SourceID INT,QuestionairID INT,Q1 VARCHAR(10),Q2 VARCHAR(10),Q3 VARCHAR(10),Q4 VARCHAR(10),Q5 VARCHAR(10));
INSERT INTO @NewData VALUES (1,5000,10,'new1','new2','new3','new4','new5');
--This will shift all original data "two steps aside"...
--(all assignments in a single UPDATE read the pre-update values)
UPDATE id SET id.Q5=id.Q3,id.Q4=id.Q2,id.Q3=id.Q1
,id.Q2=new.Q2
,id.Q1=new.Q1
FROM @IncomingData AS id
CROSS JOIN @NewData AS new
SELECT * FROM @IncomingData
The result:
ID SourceID QuestionairID Q1 Q2 Q3 Q4 Q5
1 5000 10 new1 new2 1 2 3
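For the mapping-table-driven case in the question, the UPDATE statement could be generated from SourceDestXmap along roughly these lines (an untested sketch, assuming the table and column names shown above; QUOTENAME() keeps the generated column names safe, and because all assignments in a single UPDATE read the pre-update values, no temp table should be needed):
DECLARE @sql NVARCHAR(MAX);
-- Build "Q3 = Q1, Q4 = Q2, ..." from the mapping rows for this source/questionnaire.
SELECT @sql = STUFF((
        SELECT ', ' + QUOTENAME(m.DestQ) + ' = ' + QUOTENAME(m.SourceQ)
        FROM SourceDestXmap AS m
        WHERE m.SourceID = 5000
          AND m.QuestionnaireID = 10
        FOR XML PATH('')
    ), 1, 2, '');
IF @sql IS NOT NULL
BEGIN
    -- NOTE: columns that are mapped away but never mapped into (Q1 and Q2 above)
    -- would still need setting to NULL; that part is omitted here.
    SET @sql = N'UPDATE dbo.IncomingData SET ' + @sql
             + N' WHERE SourceID = 5000 AND QuestionnaireID = 10;';
    EXEC sys.sp_executesql @sql;
END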
I have a table with a computed column that uses a scalar function:
CREATE FUNCTION [dbo].[ConvertToMillimetres]
(
@fromUnitOfMeasure int,
@value decimal(18,4)
)
RETURNS decimal(18,2)
WITH SCHEMABINDING
AS
BEGIN
RETURN
CASE
WHEN @fromUnitOfMeasure=1 THEN @value
WHEN @fromUnitOfMeasure=2 THEN @value * 100
WHEN @fromUnitOfMeasure=3 THEN @value * 1000
WHEN @fromUnitOfMeasure=4 THEN @value * 25.4
ELSE @value
END
END
GO
The table has this column
[LengthInMm] AS (CONVERT([decimal](18,2),[dbo].[ConvertToMillimetres]([LengthUnitOfMeasure],[Length]))) PERSISTED,
Assuming that [Length] in the table is 62.01249 and [LengthUnitOfMeasure] is 4, the LengthInMm computed value comes out as 1575.11, but when I run the function directly, like
SELECT [dbo].[ConvertToMillimetres] (4, 62.01249)
GO
it comes back as 1575.12.
The [Length] column is decimal(18,4), NULL.
Can anyone tell why this happens?
I'm not sure this really counts as an answer, but you've perhaps got a problem with a specific version of SQL Server?
I just had a go at replicating it (on a local SQL 2014 install) and got the following:
create table dbo.Widgets (
Name varchar(20),
Length decimal(18,4),
LengthInMm AS [dbo].[ConvertToMillimetres] (4, Length) PERSISTED
)
insert into dbo.Widgets (Name, Length) values ('Thingy', 62.01249)
select * from dbo.Widgets
Which gives the result:
Name Length LengthInMm
-------------------- --------------------------------------- ---------------------------------------
Thingy 62.0125 1575.12
(1 row(s) affected)
Note that your definition uses [LengthInMm] AS (CONVERT([decimal](18,2),[dbo].[ConvertToMillimetres]([LengthUnitOfMeasure],[Length]))) PERSISTED, but that doesn't seem to make a difference to the result.
I also tried on my PC (Microsoft SQL Server 2012 - 11.0.2100.60 (X64)). Works fine:
CREATE TABLE dbo.data(
LengthUnitOfMeasure INT,
[Length] decimal(18,4),
[LengthInMm] AS (CONVERT([decimal](18,2),[dbo].[ConvertToMillimetres]([LengthUnitOfMeasure],[Length]))) PERSISTED
)
INSERT INTO dbo.data (LengthUnitOfMeasure, [Length])
SELECT 4, 62.01249
SELECT *
FROM dbo.data
/*
RESULT
LengthUnitOfMeasure Length LengthInMm
4 62.0125 1575.12
*/
I think I found the answer.
Let's see what you are saying:
There is a column with the decimal(18,4) data type.
There is a calculated column which depends on this column.
The result differs when you select the calculated field and when you provide the same value manually. (Right?)
Sorry, but the input parameters are not the same:
The column in the table is decimal(18,4). The value you provided manually is decimal(7,5) (62.01249).
Since the column in the table cannot store any value with a scale of 5, the provided values will not be equal. (Furthermore, there is no record in the table with the value of 62.01249 in the Length column.)
What is the output when you query the [Length] column from the table? Is it 62.0124? If yes, then this is the answer. The results cannot be equal since the input values are not equal.
To be a bit more specific: 62.01249 will be cast (an implicit cast) to 62.0125.
ROUND(25.4 * 62.0124, 2) = 1575.11
ROUND(25.4 * 62.0125, 2) = 1575.12
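A quick sketch to see both effects side by side (the implicit conversion into decimal(18,4), and the two roundings):
-- 62.01249 does not fit into a scale of 4, so it is rounded on conversion
SELECT CAST(62.01249 AS decimal(18,4)) AS ConvertedValue;    -- 62.0125
-- The function's arithmetic with each of the two candidate inputs
SELECT ROUND(25.4 * 62.0124, 2) AS IfTableHolds620124,       -- 1575.11
       ROUND(25.4 * 62.0125, 2) AS IfTableHolds620125;       -- 1575.12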
EDIT: Everybody who tried to rebuild the schema made the same mistake (including me). When we (blindly) inserted the values from the original question into our instances, we inserted 62.01249 into the Length column -> the same implicit cast occurred, so we have the value 62.0125 in our tables.
I have a Reservations table with the following columns
Reservation_ID
Res_TotalAmount - money
Res_StartDate - datetime
IsDeleted - bit column with default value - false
So when a user tries to delete his reservation, I've created a trigger that, instead of deleting, just updates the value of the IsDeleted column to true.
So far so good - but this tourist may owe some compensation to the firm; for example, when he has cancelled his reservation between 30 and 20 days before the start date of the reservation, he owes 30% of Res_TotalAmount, and so on.
And here is my trigger
Create Trigger tr_TotalAMountUpdateAfterInsert on RESERVATIONS after Delete
As
Begin
Declare @period int
Declare @oldResAmount int
Declare @newAmount money
Declare @resID int
Select @resID = Reservation_ID from deleted
select @oldResAmount = Res_TotalAmount from deleted
Select @period= datediff (day,Res_StartDate,GETDATE()) from deleted
case
@period is between 30 and 20 then @newAmount=30%*@oldResAmount
@period is between 20 and 10 then @newAmount=50%*@oldResAmount
end
exec sp_NewReservationTotalAmount @newAmount @resID
End
GO
As I have to use both triggers and stored procedures, you can see that at the end of the trigger I call a stored procedure which just updates the Res_TotalAmount column:
Create proc sp_NewReservationTotalAmount(@newTotalAmount money, @resID)
As
declare @resID int
Update RESERVATIONS set Res_TotalAmount=@newTotalAmount where Reservation_ID=resID
GO
So my first problem is that it gives me "incorrect syntax near case".
And my second - I would appreciate suggestions on how to make both the trigger and the stored procedure better.
Your fundamental flaw is that you seem to expect the trigger to be fired once per row - this is NOT the case in SQL Server. Instead, the trigger fires once per statement, and the pseudo table Deleted might contain multiple rows.
Given that that table might contain multiple rows - which one do you expect will be selected here??
Select #resID = Reservation_ID from deleted
select #oldResAmount = Res_TotalAmount from deleted
Select #period= datediff (day,Res_StartDate,GETDATE()) from deleted
It's undefined - you might get the values from arbitrary rows in Deleted.
You need to rewrite your entire trigger with the knowledge that Deleted WILL contain multiple rows! You need to work with set-based operations - don't expect just a single row in Deleted!
Also: the CASE statement in T-SQL is just intended to return an atomic value - it's not a flow-control statement like in other languages, and it cannot be used to execute code. So the CASE statement in your trigger is totally "lost" - it needs to be used in an assignment or something like that.
1) Here is the correct syntax for the CASE statement. Note that:
I changed the order of the comparisons in the CASE; with BETWEEN, the smaller value has to come first.
I have included an "ELSE" case so you don't wind up with an undefined value when @period is not within your given ranges.
SELECT @newAmount =
CASE
WHEN @period between 10 and 20 then 0.5 * @oldResAmount
WHEN @period between 20 and 30 THEN 0.3 * @oldResAmount
ELSE @oldResAmount
END
2) You are going to have an issue with this trigger if ever a delete statement affects more than one row. Your statements like "SELECT @resID = Reservation_ID from deleted;" will simply assign one value from the deleted table at random.
EDIT
Here is an example of a set-based approach to your problem that will still work when multiple rows are "deleted" within the transaction (example code only; not tested):
Create Trigger tr_TotalAMountUpdateAfterInsert on RESERVATIONS after Delete
As
Begin
UPDATE RESERVATIONS
SET Res_TotalAmount =
d.Res_TotalAmount * dbo.ufn_GetCancellationFactor(d.Res_StartDate)
FROM RESERVATIONS r
INNER JOIN deleted d ON r.Reservation_ID = d.Reservation_ID
End
GO
CREATE FUNCTION dbo.ufn_GetCancellationFactor (@scheduledDate DATETIME)
RETURNS FLOAT AS
BEGIN
DECLARE @cancellationFactor FLOAT;
DECLARE @period INT = DATEDIFF (DAY, @scheduledDate, GETDATE());
SELECT @cancellationFactor =
CASE
WHEN @period <= 10 THEN 1.0 -- they owe the full amount (100%)
WHEN @period BETWEEN 11 AND 20 THEN 0.5 -- they owe 50%
WHEN @period BETWEEN 21 AND 30 THEN 0.3 -- they owe 30%
ELSE 0 -- they owe nothing
END
RETURN @cancellationFactor;
END;
GO
About the case:
The syntax is wrong. Even if it worked (see below) you'd be missing the WHENs:
case
WHEN @period is between 30 and 20 then @newAmount=30%*@oldResAmount
WHEN @period is between 20 and 10 then @newAmount=50%*@oldResAmount
end
Yet, the case statement cannot be used this way. In the context where you want it, you need to use if; it's not like the switch statement in C++/C#, for example. You can only use it in queries like:
SELECT
case
WHEN @period between 20 and 30 then value1
WHEN @period between 10 and 20 then value2
end
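For branching logic in the trigger itself, the same rules expressed with IF would look roughly like this (a sketch only, reusing the variables from the question; the boundary handling is taken straight from the question and may need adjusting):
IF @period BETWEEN 20 AND 30
    SET @newAmount = 0.3 * @oldResAmount
ELSE IF @period BETWEEN 10 AND 20
    SET @newAmount = 0.5 * @oldResAmount
ELSE
    SET @newAmount = @oldResAmount   -- same fallback as the ELSE branch of the CASE above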
Having said the above: I didn't actually read all your code. But now that I've read some of it, it is really important to understand how triggers work in SQL Server, as mark_s says.
I have a periodic check of a certain query (which by the way includes multiple tables) to add informational messages to the user if something has changed since the last check (once a day).
I tried to make it work with checksum_agg(binary_checksum(*)), but it does not help, so this question doesn't help much, because I have the following case (oversimplified):
select checksum_agg(binary_checksum(*))
from
(
select 1 as id,
1 as status
union all
select 2 as id,
0 as status
) data
and
select checksum_agg(binary_checksum(*))
from
(
select 1 as id,
0 as status
union all
select 2 as id,
1 as status
) data
Both of the above cases result in the same checksum, 49, even though the data has clearly changed.
This doesn't have to be a simple function or a simple solution, but I need some way to reliably identify differences like these in SQL Server 2000.
checksum_agg appears to simply add the results of binary_checksum together for all rows. Although each row has changed, the sum of the two checksums has not (i.e. 17+32 = 16+33). This is not really the norm for checking for updates, but the recommendations I can come up with are as follows:
1) Instead of using checksum_agg, concatenate the checksums into a delimited string and compare strings, along the lines of SELECT CAST(binary_checksum(*) AS VARCHAR) + ',' FROM MyTable FOR XML PATH(''). It's a much longer string to check and to store, but there will be much less chance of a false positive comparison.
2) Instead of using the built-in checksum routine, use HASHBYTES to calculate MD5 checksums in 8000-byte blocks, and XOR the results together. This will give you a much more resilient checksum, although still not bullet-proof (i.e. it is still possible to get false matches, but very much less likely). I'll paste the HASHBYTES demo code that I wrote below.
3) The last option, and an absolute last resort, is to actually store the table data in XML format, and compare that. This is really the only way you can be absolutely certain of no false matches, but it is not scalable and involves storing and comparing large amounts of data.
Every approach, including the one you started with, has pros and cons, with varying degrees of data size and processing requirements against accuracy. Depending on what level of accuracy you require, use the appropriate option. The only way to get 100% accuracy is to store all of the table data.
Alternatively, you can add a date_modified field to each table, which is set to GETDATE() using AFTER INSERT and UPDATE triggers. You can then do SELECT COUNT(*) FROM #test WHERE date_modified > @date_last_checked. This is a more common way of checking for updates. The downside of this one is that deletions cannot be tracked.
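A minimal sketch of such a trigger, assuming a tracked table called MyTable with a MyTableId key and a date_modified column (all names illustrative):
CREATE TRIGGER tr_MyTable_SetModified ON MyTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Stamp every affected row with the current date/time.
    UPDATE t
    SET date_modified = GETDATE()
    FROM MyTable AS t
    INNER JOIN inserted AS i ON i.MyTableId = t.MyTableId;
END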
Another approach is to create a modified table, with table_name (VARCHAR) and is_modified (BIT) fields, containing one row for each table you wish to track. Using insert, update and delete triggers, the flag against the relevant table is set to True. When you run your schedule, you check and reset the is_modified flag (in the same transaction) - along the lines of UPDATE tblModified SET @is_modified = is_modified, is_modified = 0 WHERE table_name = 'MyTable'.
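Sketched out, the flag-table variant might look like this (names illustrative; one trigger of this shape per tracked table):
CREATE TABLE tblModified (
    table_name  VARCHAR(128) PRIMARY KEY,
    is_modified BIT NOT NULL DEFAULT 0
);
GO
CREATE TRIGGER tr_MyTable_FlagModified ON MyTable
AFTER INSERT, UPDATE, DELETE
AS
    UPDATE tblModified SET is_modified = 1 WHERE table_name = 'MyTable';
GO
-- In the scheduled check: read and clear the flag in a single statement.
DECLARE @is_modified BIT;
UPDATE tblModified
SET @is_modified = is_modified, is_modified = 0
WHERE table_name = 'MyTable';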
The following script generates three result sets, each corresponding to an option in the numbered list earlier in this answer. I have commented which output corresponds with which option, just before the SELECT statement. To see how the output was derived, you can work backwards through the code.
-- Create the test table and populate it
CREATE TABLE #Test (
f1 INT,
f2 INT
)
INSERT INTO #Test VALUES(1, 1)
INSERT INTO #Test VALUES(2, 0)
INSERT INTO #Test VALUES(2, 1)
/*******************
OPTION 1
*******************/
SELECT CAST(binary_checksum(*) AS VARCHAR) + ',' FROM #test FOR XML PATH('')
-- Declaration: Input and output MD5 checksums (@in and @out), input string (@input), and counter (@i)
DECLARE @in VARBINARY(16), @out VARBINARY(16), @input VARCHAR(MAX), @i INT
-- Initialize #input string as the XML dump of the table
-- Use this as your comparison string if you choose to not use the MD5 checksum
SET @input = (SELECT * FROM #Test FOR XML RAW)
/*******************
OPTION 3
*******************/
SELECT @input
-- Initialise counter and output MD5.
SET @i = 1
SET @out = 0x00000000000000000000000000000000
WHILE @i <= LEN(@input)
BEGIN
-- calculate MD5 for this batch
SET @in = HASHBYTES('MD5', SUBSTRING(@input, @i, CASE WHEN LEN(@input) - @i > 8000 THEN 8000 ELSE LEN(@input) - @i END))
-- xor the results with the output
SET @out = CAST(CAST(SUBSTRING(@in, 1, 4) AS INT) ^ CAST(SUBSTRING(@out, 1, 4) AS INT) AS VARBINARY(4)) +
CAST(CAST(SUBSTRING(@in, 5, 4) AS INT) ^ CAST(SUBSTRING(@out, 5, 4) AS INT) AS VARBINARY(4)) +
CAST(CAST(SUBSTRING(@in, 9, 4) AS INT) ^ CAST(SUBSTRING(@out, 9, 4) AS INT) AS VARBINARY(4)) +
CAST(CAST(SUBSTRING(@in, 13, 4) AS INT) ^ CAST(SUBSTRING(@out, 13, 4) AS INT) AS VARBINARY(4))
SET @i = @i + 8000
END
/*******************
OPTION 2
*******************/
SELECT @out
I'm currently working on a data export feature for a survey application. We are using SQL2k8. We store data in a normalized format: QuestionId, RespondentId, Answer. We have a couple other tables that define what the question text is for the QuestionId and demographics for the RespondentId...
Currently I'm using some dynamic SQL to generate a pivot that joins the question table to the answer table and creates an export, and it's working... The problem is that it seems slow, and we don't have that much data (fewer than 50k respondents).
Right now I'm thinking "why am I 'paying' to de-aggregate the data for each query? Why don't I cache that?" The data being exported is based on dynamic criteria. It could be "give me respondents that completed on x date (or range)" or "people that like blue", etc. Because of that, I think I have to cache at the respondent level, find out what respondents are being exported and then select their combined cached de-aggregated data.
To me the quick and dirty fix is a totally flat table, RespondentId, Question1, Question2, etc. The problem is, we have multiple clients and that doesn't scale AND I don't want to have to maintain the flattened table as the survey changes.
So I'm thinking about putting an XML column on the respondent table and caching the results of a SELECT * FROM Data WHERE RespondentId = x FOR XML AUTO. With that in place, I would then be able to get my export with filtering and XML calls into the XML column.
What are you doing to export aggregated data in a flattened format (CSV, Excel, etc)? Does this approach seem ok? I worry about the cost of XML functions on larger result sets (think SELECT RespondentId, XmlCol.value('//data/question_1', 'nvarchar(50)') AS [Why is there air?], XmlCol.RinseAndRepeat)...
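In other words, the cached column would be populated and shredded roughly like this (column and element names are illustrative only):
-- Populate the cache (assumes an XML column Respondents.CachedAnswers)
UPDATE r
SET CachedAnswers = (SELECT d.QuestionId, d.Answer
                     FROM Data AS d
                     WHERE d.RespondentId = r.RespondentId
                     FOR XML RAW('answer'), TYPE)
FROM Respondents AS r;
-- Export by shredding the cached XML per respondent
SELECT r.RespondentId,
       r.CachedAnswers.value('(/answer[@QuestionId="1"]/@Answer)[1]', 'nvarchar(50)') AS [Question 1],
       r.CachedAnswers.value('(/answer[@QuestionId="2"]/@Answer)[1]', 'nvarchar(50)') AS [Question 2]
FROM Respondents AS r;   -- plus whatever dynamic criteria select the respondents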
Is there a better technology/approach for this?
Thanks!
EDIT: SQL Block for testing.
Run steps 1 & 2 to prime the data, test with step 3, clean up with step 4...
At a thousand respondents by one hundred questions, it already seems slower than I'd like.
SET NOCOUNT ON;
-- step 1 - create seed data
CREATE TABLE #Questions (QuestionId INT PRIMARY KEY IDENTITY (1,1), QuestionText VARCHAR(50));
CREATE TABLE #Respondents (RespondentId INT PRIMARY KEY IDENTITY (1,1), Name VARCHAR(50));
CREATE TABLE #Data (QuestionId INT NOT NULL, RespondentId INT NOT NULL, Answer INT);
DECLARE @QuestionTarget INT = 100
,@QuestionCount INT = 0
,@RespondentTarget INT = 1000
,@RespondentCount INT = 0
,@RespondentId INT;
WHILE @QuestionCount < @QuestionTarget BEGIN
INSERT INTO #Questions(QuestionText) VALUES(CAST(NEWID() AS CHAR(36)));
SET @QuestionCount = @QuestionCount + 1;
END;
WHILE @RespondentCount < @RespondentTarget BEGIN
INSERT INTO #Respondents(Name) VALUES(CAST(NEWID() AS CHAR(36)));
SET @RespondentId = SCOPE_IDENTITY();
SET @QuestionCount = 1;
WHILE @QuestionCount <= @QuestionTarget BEGIN
INSERT INTO #Data(QuestionId, RespondentId, Answer)
VALUES(@QuestionCount, @RespondentId, ROUND(((10 - 1 -1) * RAND() + 1), 0));
SET @QuestionCount = @QuestionCount + 1;
END;
SET @RespondentCount = @RespondentCount + 1;
END;
-- step 2 - index seed data
ALTER TABLE #Data ADD CONSTRAINT [PK_Data] PRIMARY KEY CLUSTERED (QuestionId ASC, RespondentId ASC);
CREATE INDEX DataRespondentQuestion ON #Data (RespondentId ASC, QuestionId ASC);
-- step 3 - query data
DECLARE @Columns NVARCHAR(MAX)
,@TemplateSQL NVARCHAR(MAX)
,@RunSQL NVARCHAR(MAX);
SELECT @Columns = STUFF(
(
SELECT DISTINCT '],[' + q.QuestionText
FROM #Questions AS q
ORDER BY '],[' + q.QuestionText
FOR XML PATH('')
), 1, 2, '') + ']';
SET @TemplateSql =
'SELECT *
FROM
(
SELECT r.Name, q.QuestionText, d.Answer
FROM #Respondents AS r
INNER JOIN #Data AS d ON d.RespondentId = r.RespondentId
INNER JOIN #Questions AS q ON q.QuestionId = d.QuestionId
) AS d
PIVOT
(
MAX(d.Answer)
FOR d.QuestionText
IN (xxCOLUMNSxx)
) AS p;';
SET @RunSql = REPLACE(@TemplateSql, 'xxCOLUMNSxx', @Columns)
EXECUTE sys.sp_executesql @RunSql;
-- step 4 - clean up
DROP INDEX DataRespondentQuestion ON #Data;
DROP TABLE #Data;
DROP TABLE #Questions;
DROP TABLE #Respondents;
No, your approach does not seem OK. Keep your normalized data. If you have proper keys, the "cost" to de-aggregate will be minimal. To further optimize your performance, stop using dynamic SQL. Write some clever queries and encapsulate them in stored procedures. This will allow SQL Server to cache the query plans instead of rebuilding them every time.
Before you do any of this, however, check the query plan. It is also possible that you are missing an index on at least one of the fields you are searching on, which will result in a full table scan of the data. You may be able to drastically increase your performance with a few well-placed indexes.
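For example, if the real export filters by respondent, a covering index along these lines (table and column names assumed to match the test script above) is the kind of thing to look for in the plan:
-- Supports per-respondent lookups as index seeks; INCLUDE(Answer) avoids key lookups.
CREATE NONCLUSTERED INDEX IX_Data_Respondent_Question
    ON dbo.Data (RespondentId, QuestionId)
    INCLUDE (Answer);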