Add incremental number in duplicate records - sql-server

I have an SSIS package that retrieves all records, including duplicates. My question is how to add an incremental value to the duplicate records (duplicates are determined by ID and PropertyID only).
For example:
Records from a Merge Join
ID Name PropertyID Value
1 A 1 123
1 A 1 223
2 B 2 334
3 C 1 22
3 C 1 45
Now I need to append an incremental value to the end of each record, like this:
ID Name PropertyID Value RID
1 A 1 123 1
1 A 1 223 2
2 B 2 334 1
3 C 1 22 1
3 C 1 45 2
Since IDs 1 and 3 are each returned twice, the first record gets RID 1 and the second gets RID 2.
Both ID and PropertyID need to be considered when generating the repeating ID (RID).
How can I do this in SSIS or with a SQL command?
Update #1:
Please correct me if I'm wrong, but since the data is not stored in any table yet, I can't use a SELECT query with ROW_NUMBER(). Is there any way to do it from the Merge Join?

You could use ROW_NUMBER:
SELECT ID,
Name,
PropertyID,
Value,
ROW_NUMBER() OVER(PARTITION BY ID, PropertyID ORDER BY Value) As RID
FROM TableName

This will do the job for you: https://paultebraak.wordpress.com/2013/02/25/rank-partitioning-in-etl-using-ssis/
You will need to write a custom script, something like this:
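public class ScriptMain : UserComponent
{
    // Key of the previous row and the running rank within the current group.
    string _sub_category = "";
    int _row_rank = 1;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (Row.subcategory != _sub_category)
        {
            // New group: restart the counter and remember the new key.
            _row_rank = 1;
            Row.rowrank = _row_rank;
            _sub_category = Row.subcategory;
        }
        else
        {
            // Same group as the previous row: keep counting.
            _row_rank++;
            Row.rowrank = _row_rank;
        }
    }
}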
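The same running-counter idea, keyed on ID and PropertyID as the question asks, can be sketched outside SSIS. This is only an illustration in Python, assuming the rows arrive already sorted by ID and PropertyID so that duplicates are adjacent; the function name add_rid is hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def add_rid(rows: Iterable[tuple]) -> Iterator[tuple]:
    """Append a counter that restarts whenever (ID, PropertyID) changes.

    Assumes each row is (ID, Name, PropertyID, Value) and that the input
    is sorted so rows with the same (ID, PropertyID) are adjacent.
    """
    previous_key: Optional[Tuple[int, int]] = None
    rid = 0
    for row in rows:
        key = (row[0], row[2])  # (ID, PropertyID)
        # Same key as the previous row: keep counting. New key: restart at 1.
        rid = rid + 1 if key == previous_key else 1
        previous_key = key
        yield (*row, rid)


rows = [
    (1, "A", 1, 123),
    (1, "A", 1, 223),
    (2, "B", 2, 334),
    (3, "C", 1, 22),
    (3, "C", 1, 45),
]
result = list(add_rid(rows))
for row in result:
    print(row)
```

If the input is not sorted on the key, the counter restarts too often, which is why the sort is a stated assumption rather than something the sketch enforces.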

Related

Convert EF Core query to raw SQL

Due to EF Core 3.x limitations, I need to convert a LINQ query to raw SQL.
The table looks like this:
Id Name TagList
----------------------------
1 Line1 1;5;8
2 Line2 1;4
3 Line3 5
4 Line4 3;2;8
5 Line5
6 Line6 4
I need to get the lines that match a tag list provided as a parameter; let's call it tagSearch.
The query I used with EF Core 2.x relies on the Any() and Contains() methods:
var result= _myRepository.Where(a => a.TagList.Any(t => tagSearch.Contains(t)))
If tagSearch is a List<string> containing the items 1 and 2, the result is:
Id Name TagList
----------------------------
1 Line1 1;5;8
2 Line2 1;4
4 Line4 3;2;8
I've tried several SQL queries with STRING_SPLIT and the closest is:
DECLARE @tagSearch NVARCHAR(400) = '1;2'
SELECT *
FROM MyTable
WHERE ( @tagSearch = SOME (SELECT value FROM STRING_SPLIT(TagList, ';'))
     or TagList = SOME (SELECT value FROM STRING_SPLIT(@tagSearch, ';'))
)
But the result is not the one expected.
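For checking any SQL rewrite against the expected rows, the intended matching rule can be written down directly: a line matches when its TagList shares at least one tag with tagSearch. The following is a minimal Python sketch of that rule only, not a SQL solution; the function name matches is hypothetical.

```python
from typing import List


def matches(tag_list: str, tag_search: List[str]) -> bool:
    """Return True when the ';'-separated tag_list shares a tag with tag_search."""
    # Split the stored string into individual tags, ignoring empty entries.
    tags = {tag for tag in tag_list.split(";") if tag}
    return not tags.isdisjoint(tag_search)


table = [
    (1, "Line1", "1;5;8"),
    (2, "Line2", "1;4"),
    (3, "Line3", "5"),
    (4, "Line4", "3;2;8"),
    (5, "Line5", ""),
    (6, "Line6", "4"),
]
tag_search = ["1", "2"]
result = [row for row in table if matches(row[2], tag_search)]
for row in result:
    print(row)
```

The comparison is between individual tags on both sides, whereas the attempted query compares the whole @tagSearch string and the whole TagList column against single split values.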

Adding multiple records from a string

I have a string of email addresses, for example "a@a.com; b@a.com; c@a.com".
My table is:
record | flag1 | flag2 | emailaddress
--------------------------------------------------------
1 | 0 | 0 | a@a.com
2 | 0 | 0 | b@a.com
3 | 0 | 0 | c@a.com
What I need to do is parse the string and, if an address is not in the database, add it.
Then, return a string of just the record numbers that correspond to the email addresses.
So, if the call is made with "A@a.com; c@a.com; d@a.com", the routine would add "d@a.com", then return "1, 3, 4", corresponding to the records that match the email addresses.
What I am doing now is calling the database once per email address to look it up and confirm it exists (adding it if it doesn't), then looping through the addresses again, one by one, from my PowerShell app to collect the record numbers.
There has to be a way to just pass all of the addresses to SQL at the same time, right?
I have it working in PowerShell, but slowly.
I'd love a single response from SQL, as shown above, containing just the record number for each email address. That is, "1, 2, 4" etc.
My PowerShell code is:
$EmailList2 = $EmailList.split(";")
# Let's get the ID number for each email address.
foreach($x in $EmailList2)
{
    $data = exec-query "select Record from emailaddresses where emailAddress = @email" -parameter @{email=$x.trim()} -conn $connection
    if ($($data.Tables.record) -gt 0)
    {
        $ResponseNumbers = $ResponseNumbers + "$($data.Tables.record), "
    }
}
$ResponseNumbers = $($ResponseNumbers+"XX").replace(", XX","")
return $ResponseNumbers
You'd have to do this in two steps: first INSERT the new values, then use a SELECT to get the values back. This answer uses DelimitedSplit8K (not DelimitedSplit8K_LEAD) as you're still using SQL Server 2008. On the note of 2008, I strongly suggest looking at upgrade paths soon, as you have about 6 weeks of support left.
You can use the function to split the values and then INSERT/SELECT appropriately:
DECLARE @Emails varchar(8000) = 'a@a.com;b@a.com;c@a.com';
WITH Emails AS(
    SELECT DS.Item AS Email
    FROM dbo.DelimitedSplit8K(@Emails,';') DS)
INSERT INTO dbo.YourTable (emailaddress) --I don't know what the other columns' values should be, so I have excluded them
SELECT E.Email
FROM Emails E
     LEFT JOIN dbo.YourTable YT ON YT.emailaddress = E.Email
WHERE YT.emailaddress IS NULL; --only addresses that are not in the table yet
SELECT YT.record
FROM dbo.YourTable YT
     JOIN dbo.DelimitedSplit8K(@Emails,';') DS ON DS.Item = YT.emailaddress;
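The two-step flow (insert what is missing, then select the record numbers) can be tried out end to end with a small sketch. It runs against an in-memory SQLite database through Python rather than SQL Server, so the splitting happens in Python instead of DelimitedSplit8K. The function name ensure_and_lookup, the table layout, and the case-insensitive match (the question expects "A@a.com" to match record 1) are assumptions made for illustration.

```python
import sqlite3


def ensure_and_lookup(conn: sqlite3.Connection, email_list: str) -> str:
    """Insert any missing addresses, then return their record numbers."""
    emails = [e.strip() for e in email_list.split(";") if e.strip()]
    if not emails:
        return ""
    # Step 1: insert only the addresses that are not in the table yet.
    conn.executemany(
        "INSERT INTO emailaddresses (emailaddress) "
        "SELECT ? WHERE NOT EXISTS ("
        "  SELECT 1 FROM emailaddresses WHERE emailaddress = ? COLLATE NOCASE)",
        [(e, e) for e in emails],
    )
    # Step 2: fetch the record numbers for every requested address.
    placeholders = ",".join("?" * len(emails))
    rows = conn.execute(
        "SELECT record FROM emailaddresses "
        f"WHERE emailaddress COLLATE NOCASE IN ({placeholders}) "
        "ORDER BY record",
        emails,
    ).fetchall()
    return ", ".join(str(row[0]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emailaddresses ("
    "record INTEGER PRIMARY KEY, flag1 INTEGER DEFAULT 0, "
    "flag2 INTEGER DEFAULT 0, emailaddress TEXT)"
)
conn.executemany(
    "INSERT INTO emailaddresses (emailaddress) VALUES (?)",
    [("a@a.com",), ("b@a.com",), ("c@a.com",)],
)
result = ensure_and_lookup(conn, "A@a.com; c@a.com; d@a.com")
print(result)
```

The sketch trims each address before using it, because the example input separates addresses with "; " rather than a bare ";".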

Updating one column based on parameter from another column in MSSQL

I want to update my client database based on the job type:
id Job_type Meal_Ticket
---------------------------
1 x 20
2 2x 12
That is, if I click a button to add 20 meal tickets, the table should update to this:
id Job_type Meal_Ticket
----------------------------
1 x 40
2 2x 52
I tried
UPDATE Staff
SET Rticket = CASE
WHEN Jobtype = 'x' THEN Rticket = SUM(Rticket + 20)
WHEN Jobtype = '2x' THEN Rticket = SUM(Rticket + 2*20)
ELSE Rticket
END
I think you want this:
UPDATE Staff
SET Rticket = CASE WHEN Jobtype = 'x' THEN Rticket + 20
WHEN Jobtype = '2x' THEN Rticket + 40 END
WHERE Jobtype IN ('x', '2x');
The only problem I see with your logic is that you are using SUM to add two quantities, when you should just be using the + operator.
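To check the arithmetic against the table in the question, the same CASE update can be run on the two sample rows. This is a sketch using Python with an in-memory SQLite database instead of SQL Server; the amount is passed as a parameter rather than hard-coded, and the function name add_tickets is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (id INTEGER PRIMARY KEY, Jobtype TEXT, Rticket INTEGER)"
)
conn.executemany(
    "INSERT INTO Staff (id, Jobtype, Rticket) VALUES (?, ?, ?)",
    [(1, "x", 20), (2, "2x", 12)],
)


def add_tickets(conn: sqlite3.Connection, amount: int) -> None:
    """Add amount tickets to 'x' staff and twice that to '2x' staff."""
    conn.execute(
        """
        UPDATE Staff
        SET Rticket = CASE
                WHEN Jobtype = 'x'  THEN Rticket + :amount
                WHEN Jobtype = '2x' THEN Rticket + 2 * :amount
            END
        WHERE Jobtype IN ('x', '2x')
        """,
        {"amount": amount},
    )


add_tickets(conn, 20)
tickets = conn.execute("SELECT id, Rticket FROM Staff ORDER BY id").fetchall()
print(tickets)
```

The WHERE clause matters because the CASE has no ELSE branch: without it, rows with any other job type would have Rticket set to NULL.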

Removing the repeating elements from a row in an SQLite table

Please let me know if there is a query with which I can remove the repeated entries in a row.
For example, I have a table which has a name with 9 telephone numbers:
Name Tel0 Tel1 Tel2 Tel3 Tel4 Tel5 Tel6 Tel7 Tel8
John 1 2 2 2 3 3 4 5 1
The final result should be as shown below:
Name Tel0 Tel1 Tel2 Tel3 Tel4 Tel5 Tel6 Tel7 Tel8
John 1 2 3 4 5
regards
Maddy
I fear that it will be more complicated to keep this format than to split the table in two as I suggested. If you insist on keeping the current schema then I would suggest that you query the row, organise the fields in application code and then perform an update on the database.
You could also try the SQL UNION operator to get a list of the numbers; by default, UNION removes all duplicate rows:
SELECT Name, Tel FROM
(SELECT Name, Tel0 AS Tel FROM Person UNION
SELECT Name, Tel1 FROM Person UNION
SELECT Name, Tel2 FROM Person) ORDER BY Name ;
Which should give you a result set like this:
John|1
John|2
You will then have to step through the result set, saving each number into a separate variable (skipping the variables for which no number exists) until the "Name" field changes.
Tel1 := Null; Tel2 := Null;
Name := ResultSet['Name'];
Tel0 := ResultSet['Tel'];
ResultSet.Next();
if (Name == ResultSet['Name']) {
Tel1 := ResultSet['Tel'];
} else {
UPDATE here.
StartAgain;
}
ResultSet.Next();
if (Name == ResultSet['Name']) {
Tel2 := ResultSet['Tel'];
} else {
UPDATE here.
StartAgain;
}
I am not recommending that you do this, as it is a very bad use of a relational database, but once implemented in a real language and debugged it should work.
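The "organise the fields in application code" route can be sketched as follows. This is only an illustration in Python, assuming the row has already been read into memory and that the cleaned values should be shifted left with the leftover columns set to NULL (None); the function name dedupe_row is hypothetical.

```python
from typing import List, Optional, Tuple


def dedupe_row(name: str, numbers: List[Optional[int]], width: int = 9) -> Tuple:
    """Drop repeated numbers, keep first-seen order, pad with None to width."""
    seen = set()
    unique = []
    for number in numbers:
        # Skip empty columns and anything already kept.
        if number is not None and number not in seen:
            seen.add(number)
            unique.append(number)
    padding = [None] * (width - len(unique))
    return (name, *unique, *padding)


row = dedupe_row("John", [1, 2, 2, 2, 3, 3, 4, 5, 1])
print(row)
```

The resulting tuple lines up with the Name, Tel0 ... Tel8 columns, so it could be passed as the parameters of a single UPDATE for that name.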

Hive query, better option to self join

I am working with a Hive table that is set up like so:
id (Int), mapper (String), mapperId (Int)
Basically, a single id can have multiple mapperIds, one per mapper, as in the example below:
ID (1) mapper(MAP1) mapperId(123)
ID (1) mapper(MAP2) mapperId(1234)
ID (1) mapper(MAP3) mapperId(12345)
ID (2) mapper(MAP2) mapperId(10)
ID (2) mapper(MAP3) mapperId(12)
I want to return the list of mapperIds associated to each unique ID. So for the above example I would want the below returned as a single row.
1, 123, 1234, 12345
2, null, 10, 12
The mapper Strings are known, so I was thinking of doing a self join for every mapper string I am interested in, but I was wondering if there was a more optimal solution?
If the mapper column really is distinct for a given ID, you could collect the mapper and mapperId columns into a map using the Brickhouse collect UDAF. You can clone the Brickhouse repo and build the jar with Maven.
Query:
add jar /complete/path/to/jar/brickhouse-0.7.0-SNAPSHOT.jar;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select id
,id_map['MAP1'] as mapper1
,id_map['MAP2'] as mapper2
,id_map['MAP3'] as mapper3
from (
select id
,collect(mapper, mapperid) as id_map
from some_table
group by id
) x
Output:
| id | mapper1 | mapper2 | mapper3 |
|----|---------|---------|---------|
| 1  | 123     | 1234    | 12345   |
| 2  | NULL    | 10      | 12      |
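What the collect-then-index query computes can be mirrored in a few lines: build one map per id, then look up each known mapper name. This is a Python sketch of the logic only, not Hive code, and the function name pivot is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def pivot(rows: Iterable[Tuple[int, str, int]], mappers: List[str]) -> List[tuple]:
    """Return one row per id with the mapperId of each requested mapper."""
    by_id: Dict[int, Dict[str, int]] = {}
    for id_, mapper, mapper_id in rows:
        # Same role as collect(mapper, mapperid) grouped by id.
        by_id.setdefault(id_, {})[mapper] = mapper_id
    return [
        (id_, *(id_map.get(name) for name in mappers))
        for id_, id_map in sorted(by_id.items())
    ]


rows = [
    (1, "MAP1", 123),
    (1, "MAP2", 1234),
    (1, "MAP3", 12345),
    (2, "MAP2", 10),
    (2, "MAP3", 12),
]
result = pivot(rows, ["MAP1", "MAP2", "MAP3"])
for row in result:
    print(row)
```

A mapper that is missing for an id comes back as None, which corresponds to the null expected for id 2 and MAP1.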
