SSIS data manipulation - database

I am currently using SSIS to read the data from a table, modify a column, and insert it into a new table.
The modification should occur only if a previously read row has an identical value in a particular column.
My original idea was to use a C# script with a dictionary containing the previously read values and a count of how many times each has been seen.
My problem is that I cannot save a dictionary as an SSIS variable. Is it possible to keep a C# variable inside an SSIS script component, or is there another method I could use to accomplish this?
As an example, the data below
/--------------------------------\
| Unique Column | To be modified |
|--------------------------------|
| X5FG | 0 |
| QFJD | 0 |
| X5FG | 0 |
| X5FG | 0 |
| DFHG | 0 |
| DDFB | 0 |
| DDFB | 0 |
will be transformed into
/--------------------------------\
| Unique Column | To be modified |
|--------------------------------|
| X5FG | 0 |
| QFJD | 0 |
| X5FG | 1 |
| X5FG | 2 |
| DFHG | 0 |
| DDFB | 0 |
| DDFB | 1 |

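Rather than use a cursor, just use a set-based statement.
Assuming SQL Server 2005+ or Oracle, use the ROW_NUMBER function in your source query like so. What's important to note is that PARTITION BY defines your group, that is, when the numbers restart. The ORDER BY clause directs the order in which the numbers are applied within each group (most recent mod date, oldest first, highest salary, etc.). Ordering by the partition column itself, as in this demo, leaves the order inside each group undefined, so order by a column that reflects the sequence you care about.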
SELECT
D.*
, ROW_NUMBER() OVER (PARTITION BY D.unique_column ORDER BY D.unique_column ) -1 AS keeper
FROM
(
SELECT 'X5FG'
UNION ALL SELECT 'QFJD'
UNION ALL SELECT 'X5FG'
UNION ALL SELECT 'X5FG'
UNION ALL SELECT 'DFHG'
UNION ALL SELECT 'DDFB'
UNION ALL SELECT 'DDFB'
) D (unique_column)
Results
unique_column keeper
DDFB 0
DDFB 1
DFHG 0
QFJD 0
X5FG 0
X5FG 1
X5FG 2
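The same windowing idea can be tried without a SQL Server instance. The following is a minimal sketch using Python's built-in sqlite3 module, assuming the bundled SQLite is version 3.25 or later (window function support). The `src` table and its `seq` column are hypothetical: `seq` stands in for whatever column records the order in which rows should be numbered.

```python
import sqlite3

# Hypothetical staging table; "seq" is an assumed column recording arrival order.
rows = [(1, "X5FG"), (2, "QFJD"), (3, "X5FG"), (4, "X5FG"),
        (5, "DFHG"), (6, "DDFB"), (7, "DDFB")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (seq INTEGER PRIMARY KEY, unique_column TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", rows)

# PARTITION BY restarts the numbering for each distinct value;
# ORDER BY seq makes the numbering follow arrival order.
query = """
    SELECT unique_column,
           ROW_NUMBER() OVER (PARTITION BY unique_column ORDER BY seq) - 1 AS keeper
    FROM src
    ORDER BY seq
"""
result = conn.execute(query).fetchall()
conn.close()

for value, keeper in result:
    print(value, keeper)
```

With a deterministic ordering column, the output rows come back in the same order as the question's desired table rather than grouped by value.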

You can create a script component. When given the choice, select the row transformation (instead of source or destination).
In the script, you can create a global variable that you will update in the process row method.
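The idea can be sketched outside SSIS as follows. This is an illustrative Python model, not SSIS code: the class `OccurrenceCounter` and its `process_row` method are hypothetical names standing in for the script component and its per-row method, and the dictionary lives in a field of the object rather than in an SSIS variable.

```python
from collections import defaultdict


class OccurrenceCounter:
    """Stand-in for a script component: state lives in a field of the object."""

    def __init__(self):
        # Maps each value to how many times it has been seen so far.
        self._seen = defaultdict(int)

    def process_row(self, value):
        """Return 0 for the first occurrence of value, 1 for the second, and so on."""
        count = self._seen[value]
        self._seen[value] = count + 1
        return count


counter = OccurrenceCounter()
incoming = ["X5FG", "QFJD", "X5FG", "X5FG", "DFHG", "DDFB", "DDFB"]
modified = [counter.process_row(v) for v in incoming]
print(modified)  # [0, 0, 1, 2, 0, 0, 1]
```

Because the field outlives each call, the count carries over from one row to the next, which is the behaviour the question was trying to get from an SSIS variable.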

Perhaps SSIS isn't the solution for this one task. Using a cursor with a table variable, you would be able to accomplish the same result. I'm not a fan of cursors in most situations, but when you need to iterate through data where each row depends on previous iterations, they can be useful. Here's an example:
DECLARE
    @value varchar(4)
    ,@count int
-- Table variable acting as the dictionary of values seen so far
DECLARE @dictionary TABLE ( value varchar(4), count int )
DECLARE cur CURSOR FOR
    (SELECT UniqueColumn FROM SourceTable s)
OPEN cur;
FETCH NEXT FROM cur INTO @value;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Reset to 0 on every iteration; stays 0 for a first occurrence
    DECLARE @innerCount int = 0
    IF NOT EXISTS (SELECT 1 FROM @dictionary WHERE value = @value)
    BEGIN
        INSERT INTO @dictionary ( value, count )
        VALUES( @value, 0 )
    END
    ELSE
    BEGIN
        SET @innerCount = (SELECT count + 1 FROM @dictionary WHERE value = @value)
        UPDATE @dictionary
        SET count = @innerCount
        WHERE value = @value
    END
    INSERT INTO TargetTable ( value, count )
    VALUES (@value, @innerCount)
    FETCH NEXT FROM cur INTO @value;
END
-- Release the cursor
CLOSE cur;
DEALLOCATE cur;

Related

Delete a row from cursor while using the cursor

I have a cursor that iterates through my temporary table. While it's iterating, I want to check a condition and delete some rows depending on that condition (I will be deleting rows that the iterator has not reached yet).
I tried deleting rows from the table the cursor is iterating over (the temp table), but without success: the deleted rows still show up in the Messages panel (I print each row's name).
Is it possible to delete rows from the table a cursor is iterating over in SQL Server? If not, what are my alternatives?
Basically, the temp table contains tree-like data and, depending on the value of a column, I need to delete a row's children (and grandchildren and so on) if it does not fit a criterion.
DECLARE cursor_name CURSOR
FOR (SELECT * FROM #test) ORDER BY Path
DECLARE
    @Id AS INTEGER,
    @Name AS VARCHAR(MAX),
    @Path AS VARCHAR(MAX)
OPEN cursor_name;
FETCH NEXT FROM cursor_name INTO @Id, @Name, @Path;
PRINT @Name
DELETE FROM #test
WHERE
    Path LIKE '%76939%'
WHILE @@FETCH_STATUS = 0
BEGIN
    FETCH NEXT FROM cursor_name INTO @Id, @Name, @Path;
    PRINT @Name
END;
CLOSE cursor_name;
DEALLOCATE cursor_name;
EDIT
Here is more detail on the problem. We have data structured like a tree list. Every item has multiple columns that specify some characteristics of the row. Those characteristics can be inherited or not (if InheritanceFlag is 1, the value is inherited; if it's 0, it is not).
So, when a user makes a change, we need to propagate the change to the item's children, depending on that flag. If one of its children has InheritanceFlag set to 0, then that child won't change its value and neither will its children. I wanted to remove those rows with the cursor using the path.
Here is the data that I have. ParentID is the ID of its parent. In this case, suppose we are editing the item 76938, thus we are looking at its children. The ToEdit column is what I'm looking to create; with it, I can filter the rows and directly change the characteristic column to the new value.
+-------+----------+-------+-------------------------+-----------------+--------+
| ID | ParentID | Name | Path | InheritanceFlag | ToEdit |
+-------+----------+-------+-------------------------+-----------------+--------+
| 76938 | NULL | 1 | (76938) | 1 | X |
+-------+----------+-------+-------------------------+-----------------+--------+
| 76942 | 76938 | 1.1 | (76938)\(76942) | 1 | 1 |
+-------+----------+-------+-------------------------+-----------------+--------+
| 76952 | 76942 | 1.1.1 | (76938)\(76942)\(76952) | 0 | 0 |
+-------+----------+-------+-------------------------+-----------------+--------+
| 76961 | 76942 | 1.1.2 | (76938)\(76942)\(76961) | 1 | 1 |
+-------+----------+-------+-------------------------+-----------------+--------+
| 76943 | 76938 | 1.2 | (76938)\(76943) | 1 | 1 |
+-------+----------+-------+-------------------------+-----------------+--------+
| 76944 | 76938 | 1.3 | (76938)\(76944) | 0 | 0 |
+-------+----------+-------+-------------------------+-----------------+--------+
| 76946 | 76944 | 1.3.1 | (76938)\(76944)\(76946) | 1 | 0 |
+-------+----------+-------+-------------------------+-----------------+--------+
| 76947 | 76944 | 1.3.2 | (76938)\(76944)\(76947) | 0 | 0 |
+-------+----------+-------+-------------------------+-----------------+--------+
| 76948 | 76944 | 1.3.3 | (76938)\(76944)\(76948) | 1 | 0 |
+-------+----------+-------+-------------------------+-----------------+--------+
| 76945 | 76938 | 1.4 | (76938)\(76945) | 1 | 1 |
+-------+----------+-------+-------------------------+-----------------+--------+
You can delete from the underlying table and have the rows removed from future FETCHes if the cursor is DYNAMIC and the query that defines the cursor doesn't require a spool (a spool effectively turns it into a STATIC cursor).
In your code, sorting by the unindexed VARCHAR(MAX) column prevents the cursor from seeing any changes in the underlying table.
For example, this:
drop table if exists #test
go
create table #test(id integer, name varchar(max), path varchar(1000), index ix_path (path))
insert into #test(id,name,path) values (1,'a','0000000'),(2,'b', '0769391'),(3,'c', '1768391')
DECLARE cursor_name CURSOR DYNAMIC
FOR SELECT * FROM #test ORDER BY path
DECLARE
@Id AS INTEGER,
@Name AS VARCHAR(MAX),
@Path AS VARCHAR(MAX)
OPEN cursor_name;
FETCH NEXT FROM cursor_name INTO @Id, @Name, @Path;
PRINT @Name
print 'deleting'
DELETE FROM #test
WHERE
Path LIKE '%76939%'
WHILE 1=1
BEGIN
FETCH NEXT FROM cursor_name INTO @Id, @Name, @Path;
if @@FETCH_STATUS <> 0 break
PRINT @Name
END;
END;
CLOSE cursor_name;
DEALLOCATE cursor_name;
outputs
(3 rows affected)
a
deleting
(1 row affected)
c

Bounded cumulative sum in SQL

How can I use SQL to compute a cumulative sum over a column so that the cumulative sum always stays within upper and lower bounds? Below is an example with lower bound -2 and upper bound 10, showing the regular cumulative sum and the bounded cumulative sum.
id input
-------------
1 5
2 7
3 -10
4 -10
5 5
6 10
Result:
id cum_sum bounded_cum_sum
---------------------------------
1 5 5
2 12 10
3 2 0
4 -8 -2
5 -3 3
6 7 10
See https://codegolf.stackexchange.com/questions/61684/calculate-the-bounded-cumulative-sum-of-a-vector for some (non SQL) examples of a bounded cumulative sum.
You can (almost) always use a cursor to implement whatever cumulative logic you have. The technique is quite routine, so once you get it you can use it to tackle a variety of problems.
One specific thing to note: Here I update the table in-place, so the [id] column must be uniquely indexed.
(Tested on SQL Server 2017 latest linux docker image)
Test Dataset
use [testdb];
if OBJECT_ID('testdb..test') is not null
drop table testdb..test;
create table test (
[id] int,
[input] int,
);
insert into test (id, input)
values (1,5), (2,7), (3,-10), (4,-10), (5,5), (6,10);
Solution
/* A generic row-by-row cursor solution */
-- First of all, make [id] uniquely indexed to enable "where current of"
create unique index idx_id on test(id);
-- append answer columns
alter table test
add [cum_sum] int,
[bounded_cum_sum] int;
-- storage for each row
declare @id int,
@input int,
@cum_sum int,
@bounded_cum_sum int;
-- record accumulated values
declare @prev_cum_sum int = 0,
@prev_bounded_cum_sum int = 0;
-- open a cursor ordered by [id] and updatable for assigned columns
declare cur CURSOR local
for select [id], [input], [cum_sum], [bounded_cum_sum]
from test
order by id
for update of [cum_sum], [bounded_cum_sum];
open cur;
while 1=1 BEGIN
/* fetch next row and check termination condition */
fetch next from cur
into @id, @input, @cum_sum, @bounded_cum_sum;
if @@FETCH_STATUS <> 0
break;
/* program body */
-- main logic
set @cum_sum = @prev_cum_sum + @input;
set @bounded_cum_sum = @prev_bounded_cum_sum + @input;
if @bounded_cum_sum > 10 set @bounded_cum_sum=10
else if @bounded_cum_sum < -2 set @bounded_cum_sum=-2;
-- write the result back
update test
set [cum_sum] = @cum_sum,
[bounded_cum_sum] = @bounded_cum_sum
where current of cur;
-- setup for next row
set @prev_cum_sum = @cum_sum;
set @prev_bounded_cum_sum = @bounded_cum_sum;
END
-- cleanup
close cur;
deallocate cur;
-- show
select * from test;
Result
| | id | input | cum_sum | bounded_cum_sum |
|---|----|-------|---------|-----------------|
| 1 | 1 | 5 | 5 | 5 |
| 2 | 2 | 7 | 12 | 10 |
| 3 | 3 | -10 | 2 | 0 |
| 4 | 4 | -10 | -8 | -2 |
| 5 | 5 | 5 | -3 | 3 |
| 6 | 6 | 10 | 7 | 10 |
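The per-row logic of the cursor (add the input, then clamp to the bounds) is easy to check outside the database. Here is a minimal Python sketch of the same fold; the function name `bounded_cumsum` is hypothetical, and `accumulate(..., initial=...)` assumes Python 3.8 or later.

```python
from itertools import accumulate


def bounded_cumsum(values, lower, upper):
    """Running sum that is clamped to [lower, upper] after every step."""
    def step(total, value):
        return max(lower, min(upper, total + value))

    # initial=0 seeds the running total; drop that seed from the output.
    return list(accumulate(values, step, initial=0))[1:]


inputs = [5, 7, -10, -10, 5, 10]
plain = list(accumulate(inputs))
bounded = bounded_cumsum(inputs, -2, 10)
print(plain)    # [5, 12, 2, -8, -3, 7]
print(bounded)  # [5, 10, 0, -2, 3, 10]
```

Note that each bounded value depends on the previous bounded value, not on the plain cumulative sum, which is why the cursor tracks the two running totals separately.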

Understanding SQL Merge statement?

I have a source table whose data is identical to my target table. When I try to run a MERGE statement, it fails with the error
merge can't update a target row multiple times.
So my question is: since the tables are identical, why didn't SQL Server succeed with 0 rows affected instead? Please help me understand this.
By the way, my syntax is correct, because the initial insert succeeded; the problem only occurs if I re-run it.
Thank you.
The target table and the source table have the same data.
WHEN MATCHED AND ISNULL(T.VALUE,'') <> ISNULL(S.VALUE,'')
COL1 COL2 COL3 VALUE DATE
1 A TYPE 3 2019-01-02
2 B KIND 4 2019-01-03
1 A COLOR 0 2019-01-02
2 B KIND 0 2019-01-03
MERGE TargetTable T
USING
(
SELECT COL1,
COL2,
COL3,
VALUE,
DATE
FROM SourceTable S
) s
ON
(
S.COL1 = T.COL1
AND S.COL2 = T.COL2
AND S.COL3 = T.COL3
AND S.DATE = T.DATE
)
WHEN MATCHED AND
(
ISNULL(S.VALUE,'') <> ISNULL(T.VALUE,'')
)
THEN UPDATE
SET
T.VALUE = S.VALUE
WHEN NOT MATCHED
THEN INSERT VALUES
(
S.COL1
,S.COL2
,S.COL3
,S.VALUE
,S.DATE
);
For a better understanding of MERGE:
MERGE is a DML (data manipulation language) statement.
It is also called UPSERT (update-insert).
It tries to match a source (table / view / query) to a target (table / updatable view) based on your defined conditions, and then, based on the matching results, it inserts, updates, or deletes rows in the target table.
MERGE (Transact-SQL)
create table src (i int, j int);
create table trg (i int, j int);
insert into src values (1,1),(2,2),(3,3);
insert into trg values (2,20),(3,30),(4,40);
merge into trg
using src
on src.i = trg.i
when not matched by target then insert (i,j) values (src.i,src.j)
when not matched by source then update set trg.j = -1
when matched then update set trg.j = trg.j + src.j
;
select * from trg order by i
+---+----+
| i | j |
+---+----+
| 1 | 1 |
+---+----+
| 2 | 22 |
+---+----+
| 3 | 33 |
+---+----+
| 4 | -1 |
+---+----+
Source: Stack Overflow, SQL Merge
I couldn't reproduce the error, but I found something interesting.
SQL DEMO
As you mention, the first merge runs perfectly, but in my case the second merge reports that 2 rows were updated.
So I modified the second merge to detect which rows were updated.
WHEN MATCHED AND
(
ISNULL(S.VALUE,'') <> ISNULL(T.VALUE,'')
)
THEN UPDATE
SET T.VALUE = S.VALUE + 10
OUTPUT
+------+------+-------+-------+---------------------+
| COL1 | COL2 | COL3 | VALUE | DATE |
+------+------+-------+-------+---------------------+
| 1 | A | TYPE | 3 | 02/01/2019 00:00:00 |
| 2 | B | KIND | 10 | 03/01/2019 00:00:00 |
| 1 | A | COLOR | 0 | 02/01/2019 00:00:00 |
| 2 | B | KIND | 14 | 03/01/2019 00:00:00 |
+------+------+-------+-------+---------------------+
Because you have 2 rows with exactly the same match columns (COL1, COL2, COL3, DATE), the system is telling you that it doesn't know which target row to update with which source row.
But that doesn't explain why my demo works as expected.
So my suggestion is to add a PK to your table to make sure the merge happens on the right rows.
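To see which rows collide on the ON columns, the sample data can be grouped by its join key. This is a small Python sketch over the question's four rows, not a SQL Server feature; `duplicate_keys` is a hypothetical helper name.

```python
from collections import Counter

# (COL1, COL2, COL3, VALUE, DATE) rows copied from the question.
source_rows = [
    (1, "A", "TYPE", 3, "2019-01-02"),
    (2, "B", "KIND", 4, "2019-01-03"),
    (1, "A", "COLOR", 0, "2019-01-02"),
    (2, "B", "KIND", 0, "2019-01-03"),
]


def duplicate_keys(rows):
    """Return the join keys (COL1, COL2, COL3, DATE) that occur more than once."""
    counts = Counter((c1, c2, c3, date) for c1, c2, c3, _value, date in rows)
    return [key for key, n in counts.items() if n > 1]


dupes = duplicate_keys(source_rows)
print(dupes)  # [(2, 'B', 'KIND', '2019-01-03')]
```

The key (2, B, KIND, 2019-01-03) appears twice with different VALUEs, so on a re-run each target row with that key matches two source rows.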

Why is my SQL Server 2017 query returning incorrect results?

Below is some repro code for an issue I am having.
Run it on SQL Server 2017 and you will get a different (and incorrect) result compared with any other SQL Server version. Setting the database to a lower compatibility level on the SQL Server 2017 instance also makes it work correctly.
Why does this happen and how can I fix it without changing the compatibility level?
Actual Result
+--------------+--------------+----------------+---------+-----------+---------+------------+-------+
| IsPriorAfter | IsIdealAfter | IsCurrentAfter | IsPrior | IsCurrent | IsIdeal | SecurityID | PosID |
+--------------+--------------+----------------+---------+-----------+---------+------------+-------+
| 1 | 1 | 1 | 1 | 1 | 1 | 123 | 1 |
| 0 | 0 | 0 | 0 | 1 | 1 | 234 | 2 |
| 0 | 0 | 0 | 1 | 0 | 0 | 234 | 3 |
+--------------+--------------+----------------+---------+-----------+---------+------------+-------+
Expected Result
+--------------+--------------+----------------+---------+-----------+---------+------------+-------+
| IsPriorAfter | IsIdealAfter | IsCurrentAfter | IsPrior | IsCurrent | IsIdeal | SecurityID | PosID |
+--------------+--------------+----------------+---------+-----------+---------+------------+-------+
| 1 | 1 | 1 | 1 | 1 | 1 | 123 | 1 |
| 0 | 1 | 1 | 0 | 1 | 1 | 234 | 2 |
| 1 | 0 | 0 | 1 | 0 | 0 | 234 | 3 |
+--------------+--------------+----------------+---------+-----------+---------+------------+-------+
Repro
if object_id('ForSubQuery') is not null begin
DROP TABLE ForSubQuery
end
Create Table ForSubQuery
(
SecID int
)
INSERT INTO ForSubQuery SELECT 123
INSERT INTO ForSubQuery SELECT 234
GO
SELECT * FROM ForSubQuery
if object_id('MainTable') is not null begin
DROP TABLE MainTable
end
Create Table MainTable
(
IsPrior bit,
IsCurrent bit,
IsIdeal bit,
[SecurityID] int,
PosID int
)
INSERT INTO MainTable SELECT 1,1,1,123,1
INSERT INTO MainTable SELECT 0,1,1,234,2
INSERT INTO MainTable SELECT 1,0,0,234,3
GO
SELECT * FROM MainTable
SELECT
CASE
WHEN
Position.IsPrior = 1
AND Position.[SecurityID] in (SELECT
SecID
FROM ForSubQuery
)
THEN 1
ELSE 0
END AS IsPriorAfter
,CASE
WHEN
Position.IsIdeal = 1
AND [Position].[SecurityID] IN (SELECT
secid
FROM ForSubQuery
)
THEN 1
ELSE 0
END AS IsIdealAfter
,CASE
WHEN
Position.IsCurrent = 1
AND [Position].[SecurityID] IN (SELECT
secid
FROM ForSubQuery
)
THEN 1
ELSE 0
END AS IsCurrentAfter
, Position.*
FROM MainTable [Position]
order by Position.PosID
TLDR
This is a bug that has been fixed in CU8 so installing at least that CU and ideally the most recent one will fix it.
Pre SQL Server 2017
In SQL Server 2016 the plan looks as above. The IN is treated the same as EXISTS so it evaluates the following three columns.
CASE WHEN IsPrior = 1 AND EXISTS (SELECT * FROM ForSubQuery WHERE SecID = MainTable.SecurityID) THEN 1 ELSE 0 END AS IsPriorAfter
CASE WHEN IsIdeal = 1 AND EXISTS (SELECT * FROM ForSubQuery WHERE SecID = MainTable.SecurityID) THEN 1 ELSE 0 END AS IsIdealAfter
CASE WHEN IsCurrent = 1 AND EXISTS (SELECT * FROM ForSubQuery WHERE SecID = MainTable.SecurityID) THEN 1 ELSE 0 END AS IsCurrentAfter
Each subquery instance gets its own operator in the plan and the query returns the correct result but this is sub optimal as the identical subquery may be executed up to three times per row.
However, because each sub query has an AND next to it, SQL Server can skip evaluating the sub query if the expression on the other side of the AND is false. This is achieved by each nested loops operator having a pass through predicate. For example, the one corresponding to the evaluation of IsPriorAfter has a pass through predicate of IsFalseOrNull (IsPrior=1).
IsPrior=1 is a boolean expression that can return false, null, or true. The IsFalseOrNull then inverts the result and returns 1 for false, null and 0 for true. So the pass through predicate evaluates to true/1 if IsPrior is anything other than 1 (including NULL) and would then skip executing the sub query.
SQL Server 2017 RTM
SQL Server 2017 introduces a new optimisation rule CollapseIdenticalScalarSubquery. In the RTM version the execution plan is not correct.
Problem Plan
The sub query is now in a single operator and the pass through predicates are combined
IsFalseOrNull([IsCurrent]=(1)) OR IsFalseOrNull([IsIdeal]=(1)) OR IsFalseOrNull([IsPrior]=(1))
However this condition is not correct! It evaluates to true unless all three of IsPrior, IsIdeal, IsCurrent are 1.
So in your case the sub query is only executed once (for the first row in the table - where all three of the columns are equal to 1).
For the two other rows it should be executed but isn't. The nested loops has a probe column that is set to 1 if the correlated subquery returns a row. (Labelled Expr1016 in the plan). When execution is skipped this probe column is set to NULL
The final compute scalar in the plan has the following expression. When Expr1016 is null this evaluates to 0 for all three of your calculated columns using CASE.
[Expr1005] = Scalar Operator(CASE WHEN [IsPrior]=(1) AND [Expr1016] THEN (1) ELSE (0) END),
[Expr1009] = Scalar Operator(CASE WHEN [IsIdeal]=(1) AND [Expr1016] THEN (1) ELSE (0) END),
[Expr1013] = Scalar Operator(CASE WHEN [IsCurrent]=(1) AND [Expr1016] THEN (1) ELSE (0) END)
SQL Server 2017 patched
The final fixed plan after the CU is applied has the same plan shape as the 2017 RTM plan (with the subquery only appearing once) but the pass through predicate is now
IsFalseOrNull([IsCurrent]=(1)) AND IsFalseOrNull([IsIdeal]=(1)) AND IsFalseOrNull([IsPrior]=(1))
This only evaluates to true if none of those columns have a value of 1 so the sub query is now evaluated exactly when needed.
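The difference between the two combined predicates can be checked with a small truth-table model. This is a Python sketch of the logic described above, not of SQL Server internals: the function names are hypothetical, NULL is modelled as `None`, and the probe column is modelled as `None` when the sub query is skipped.

```python
def is_false_or_null(cond):
    """cond is True, False, or None (SQL NULL); true for False and None."""
    return cond is not True


def equals_one(bit):
    """Three-valued result of bit = 1."""
    return None if bit is None else bit == 1


def evaluate(row, sec_ids, combine):
    """Return (IsPriorAfter, IsIdealAfter, IsCurrentAfter) for one MainTable row."""
    is_prior, is_current, is_ideal, security_id = row
    parts = [is_false_or_null(equals_one(b)) for b in (is_current, is_ideal, is_prior)]
    skip = combine(parts)  # pass through predicate: true means skip the sub query
    probe = None if skip else security_id in sec_ids

    def case(bit):
        return 1 if (bit == 1 and probe is True) else 0

    return (case(is_prior), case(is_ideal), case(is_current))


# (IsPrior, IsCurrent, IsIdeal, SecurityID) from the repro script.
main_table = [(1, 1, 1, 123), (0, 1, 1, 234), (1, 0, 0, 234)]
sec_ids = {123, 234}

rtm = [evaluate(r, sec_ids, any) for r in main_table]      # predicates joined with OR
patched = [evaluate(r, sec_ids, all) for r in main_table]  # predicates joined with AND
print(rtm)      # [(1, 1, 1), (0, 0, 0), (0, 0, 0)]
print(patched)  # [(1, 1, 1), (0, 1, 1), (1, 0, 0)]
```

The OR form reproduces the question's actual result and the AND form reproduces the expected result.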

Inject SQL before Codeigniters $this->db->get() call

Here is what I try to do. I have a table with the following structure, that is supposed to hold translated values of other data in any other table
Translations
| Language id | translation | record_id | column_name | table_name |
====================================================================
| 1 | Hello | 1 | test_column | test_table |
| 2 | Aloha | 1 | test_column | test_table |
| 1 | Test input | 2 | test_column | test_table |
In the code for my views, I have a function that looks up this table and returns the string in the user's language. If the string is not translated into that language, the function returns the string in the application's default language (let's say the one with ID = 1).
It works fine, but I have to go through about 600 view files to apply this... I was wondering if it is possible to inject some SQL in my CodeIgniter models right before the $this->db->get() of the original record, so that the original column is replaced with the translated one.
Something like this:
$this->db->select('column_name, col_2, col_3');
// Injected SQL pseudocode:
// If RECORD EXISTS in table Translations where Language_id = 2 and record_id = 2 AND column_name = test_column AND table_name = test_table
// BEGIN
// SELECT translations.translation as column_name
// WHERE translations.table_name = test_table AND column_name = test_column AND record_id = 2
// END
// ELSE
// BEGIN
// SELECT translations.translation as column_name
// WHERE translations.table_name = test_table AND column_name = test_column AND record_id = 1
// END
$this->db->get('test_table');
Is this possible to be done somehow?
What you're asking for doesn't really make sense. You "inject" by simply making a different query first, then altering your second query based on the results.
The other option (perhaps better) would be to do all of this in a stored procedure, but it is still essentially the same, just with fewer connections and probably quicker processing.
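For illustration, the fallback rule from the question (requested language first, then the default language, then the raw column) can also be folded into a single query. The sketch below uses Python's built-in sqlite3 module rather than CodeIgniter. The schema of `test_table` (`id`, `test_column`) and the raw values are assumptions, and the query assumes at most one translation per language, record, column, and table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: the question does not show test_table's columns.
    CREATE TABLE test_table (id INTEGER PRIMARY KEY, test_column TEXT);
    CREATE TABLE translations (
        language_id INTEGER, translation TEXT, record_id INTEGER,
        column_name TEXT, table_name TEXT
    );
    INSERT INTO test_table VALUES (1, 'raw 1'), (2, 'raw 2');
    INSERT INTO translations VALUES
        (1, 'Hello', 1, 'test_column', 'test_table'),
        (2, 'Aloha', 1, 'test_column', 'test_table'),
        (1, 'Test input', 2, 'test_column', 'test_table');
""")

# COALESCE picks the first non-NULL value: requested language,
# then default language, then the untranslated column.
QUERY = """
    SELECT t.id,
           COALESCE(want.translation, dflt.translation, t.test_column) AS test_column
    FROM test_table AS t
    LEFT JOIN translations AS want
           ON want.record_id = t.id AND want.table_name = 'test_table'
          AND want.column_name = 'test_column' AND want.language_id = :lang
    LEFT JOIN translations AS dflt
           ON dflt.record_id = t.id AND dflt.table_name = 'test_table'
          AND dflt.column_name = 'test_column' AND dflt.language_id = :default_lang
    ORDER BY t.id
"""


def translated_rows(lang, default_lang=1):
    """Return (id, test_column) pairs with the translation fallback applied."""
    params = {"lang": lang, "default_lang": default_lang}
    return conn.execute(QUERY, params).fetchall()


print(translated_rows(2))  # [(1, 'Aloha'), (2, 'Test input')]
```

Record 2 has no translation for language 2, so it falls back to the default-language string, which matches the behaviour the question describes for its view helper.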
