I need to clean up a set of company names by replacing INC, LTD, LTD., INC., and similar suffixes with an empty string when they are standalone words (preceded by a single space, e.g. "Incoming INC") and not letters that are part of the company name, e.g. "INComing Money".
The logic I tried :
case
when FINDSTRING([Trade Name]," INC",1) > 0 then REPLACE([Trade Name]," INC","")
when FINDSTRING([Trade Name]," LTD",1) > 0 then REPLACE([Trade Name]," LTD","")
ELSE [Trade Name]
I tried this SSIS expression in a Derived Column:
FINDSTRING( [Trade Name] ," INC",1) ? REPLACE([Trade Name]," INC","") :
FINDSTRING([Trade Name]," LTD",1) ? REPLACE([Trade Name]," LTD",""):
The error received:
Error at Data Flow Task [Derived Column [1]]: Attempt to find the
input column named "A" failed with error code 0xC0010009. The input
column specified was not found in the input column collection.
In a case like this it is easier to use a Script Component to clean the column. You can split the value on spaces and re-concatenate the parts that are not equal to INC, as in the method below, or you can simply use the Regex.Replace() method to replace values based on a regular expression:
string value = "";
string[] parts = Row.TradeName.Split(' ');
foreach (string str in parts)
{
    // Keep every token except the standalone suffix
    if (str != "INC")
    {
        value += " " + str;
    }
}
Row.outTradeName = value.TrimStart();
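The regular-expression variant mentioned above can be sketched as follows. This is a minimal illustration in Python rather than an SSIS script; the suffix list and the function name strip_suffixes are assumptions made for the example, and matching is case-sensitive.

```python
import re

# Hypothetical suffix list; extend it with whatever else must be removed.
SUFFIXES = ("INC", "LTD")

# Match a suffix only as a whole token: preceded by whitespace, optionally
# followed by a period, and then followed by whitespace or end of string.
_SUFFIX_RE = re.compile(r"\s+(?:%s)\.?(?=\s|$)" % "|".join(SUFFIXES))


def strip_suffixes(name: str) -> str:
    """Remove standalone suffix tokens, leaving embedded letters untouched."""
    return _SUFFIX_RE.sub("", name).strip()
```

Because the pattern requires whitespace before the suffix, a name such as "INComing Money" is left unchanged, while "Incoming INC" loses its trailing token.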
I have created internal tables, and I want to update the age of each employee in one internal table by calculating it from another table. I have done the arithmetic to get the age, but how can I update it in some way other than MODIFY?
WRITE : / 'FirstName','LastName', ' Age'.
LOOP AT gt_items1 INTO gwa_items1.
READ TABLE gt_header INTO gwa_header WITH KEY empid = gwa_items1-empid.
gwa_items1-age = gv_date+0(4) - gwa_header-bdate+0(4).
MODIFY gt_items1 from gwa_items1 TRANSPORTING age WHERE empid = gwa_items1-empid.
WRITE : / gwa_items1-fname , gwa_items1-lname , gwa_items1-age .
ENDLOOP.
Use field symbols (instead of work areas) when LOOPing over internal tables:
WRITE : / 'FirstName','LastName', ' Age'.
LOOP AT gt_items1
ASSIGNING FIELD-SYMBOL(<ls_item1>).
READ TABLE gt_header
ASSIGNING FIELD-SYMBOL(<ls_header>)
WITH KEY empid = <ls_item1>-empid.
IF sy-subrc EQ 0.
<ls_item1>-age = gv_date+0(4) - <ls_header>-bdate+0(4).
WRITE : / <ls_item1>-fname , <ls_item1>-lname , <ls_item1>-age .
ENDIF.
ENDLOOP.
Field symbols have two advantages:
They modify the internal table directly, so no separate MODIFY is necessary.
They are somewhat faster than work areas.
Besides József Szikszai's answer you could also use references:
write : / 'FirstName','LastName', ' Age'.
sort gt_header by empid. " <------------- Sort for binary search
loop at gt_items1 reference into data(r_item1).
read table gt_header reference into data(r_header)
with key empid = r_item1->empid binary search. " <------------- Faster read
check sy-subrc eq 0.
r_item1->age = gv_date+0(4) - r_header->bdate+0(4).
write : / r_item1->fname , r_item1->lname , r_item1->age .
endloop.
I also added some enhancements to your code.
For more info check this link.
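The idea behind both answers (update each row through a reference, and look up the header row with a binary search on a sorted key) can be sketched outside ABAP. The Python below is only an analogy: the dictionaries stand in for gt_items1 and gt_header, and the function name fill_ages is an assumption.

```python
from bisect import bisect_left


def fill_ages(items, headers, current_year):
    """Set item["age"] in place from the matching header's birth date.

    items and headers are lists of dicts, hypothetical stand-ins for the
    internal tables gt_items1 and gt_header.
    """
    # Sort once so that every lookup can use a binary search.
    headers_sorted = sorted(headers, key=lambda h: h["empid"])
    keys = [h["empid"] for h in headers_sorted]

    for item in items:  # item refers to the dict stored in the list
        pos = bisect_left(keys, item["empid"])
        if pos == len(keys) or keys[pos] != item["empid"]:
            continue  # no header row found, comparable to sy-subrc <> 0
        birth_year = int(headers_sorted[pos]["bdate"][:4])
        item["age"] = current_year - birth_year  # no separate write-back step
    return items
```

As in the ABAP versions, rows without a matching header are skipped and keep their previous age.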
To create a PlaceKey for addresses so that I can link some of my tables, I need to split an address column in Snowflake.
I am not familiar with JavaScript, but I tried a JavaScript UDF in Snowflake. I don't know how to deal with addresses like '123_45ThSt'.
The output of my function is '123_45 Th St', and I am stuck here.
The expected output is '123 45Th St'.
I hope someone can help me out. Much appreciated!
Below is another example and my Snowflake SQL code:
Original address column: 12345NE17ThSt
The expected column: 12345 NE 17Th St
My function's output: 12345 NE17 ST
My function:
CREATE OR REPLACE FUNCTION Split_On_Upper_Case(s string)
RETURNS string
LANGUAGE JAVASCRIPT
AS '
function Split_On_Upper_Case(str){
str=str.split(/(?=[A-Z])/).join(" ")
return str
}
// Now call the function
return Split_On_Upper_Case(S);
'
;
Assuming the street address has the format number + word (ending with a lowercase letter or a digit) + word (starting with an uppercase letter), I have the solution below:
CREATE OR REPLACE FUNCTION Split_On_Upper_Case(s string)
RETURNS string
LANGUAGE JAVASCRIPT
AS $$
var regexp = /([0-9]+)(NE|SE|NW|SW)?(.*[0-9a-z]{1})([A-Z][a-zA-Z0-9]+)/g;
var splits = regexp.exec(S.replace(/_/g, " "));
if (splits && splits.length == 5) {
  // The expression must start on the same line as "return": a bare "return"
  // followed by a line break returns undefined in JavaScript.
  return splits[1].trim() + " " +
    (splits[2] ? splits[2].trim() + " " : "") +
    splits[3].trim() + " " +
    splits[4].trim();
}
return "not found"; // or whatever you want to do
$$;
Then try to run the function:
select Split_On_Upper_Case('12345NE17ThSt');
-- 12345 NE 17Th St
select Split_On_Upper_Case('123_45ThSt');
-- 123 45Th St
select Split_On_Upper_Case('35TestSt');
-- 35 Test St
It returns the expected output for these samples, but more sample inputs would help to validate it.
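To try the pattern against more sample addresses without deploying a UDF, the same regular expression can be exercised locally. This is a rough Python port, not Snowflake code; the name split_address is an assumption, and it returns None where the UDF returns "not found".

```python
import re

# Same pattern as the UDF: house number, optional direction,
# street part ending in a digit or lowercase letter, then a capitalised word.
ADDRESS_RE = re.compile(
    r"([0-9]+)(NE|SE|NW|SW)?(.*[0-9a-z])([A-Z][a-zA-Z0-9]+)"
)


def split_address(raw):
    """Return the address with spaces inserted, or None if it does not match."""
    match = ADDRESS_RE.search(raw.replace("_", " "))
    if match is None:
        return None
    number, direction, street, suffix = match.groups()
    parts = [number, direction, street.strip(), suffix]
    # The direction group is None when absent, so drop empty parts.
    return " ".join(p for p in parts if p)
```

Feeding it a list of real addresses is a quick way to find formats the assumed pattern does not cover.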
In SQL we can use the "WHERE 1=1 hack" to easily generate a WHERE statement in a loop, so we don't need to check if the current iteration is the first one.
Without WHERE 1 = 1:
// I haven't tried the C++ code below; it's just an example to briefly explain the "hack"
string statement = "WHERE "; // trailing space so the first condition is not glued to WHERE
for (int i = 0 ; i < list.size() ; i++)
{
if (i != 0)
{
statement += " AND "; //we don't want to generate "WHERE AND"
}
statement += list[i];
}
the generated statement :
WHERE <something_1>
AND <something_2>
AND <something_3>
With WHERE 1 = 1:
string statement = "WHERE 1 = 1"; // Added "1 = 1"
for (int i = 0 ; i < list.size() ; i++)
{
statement += " AND " + list[i]; // spaces around AND keep the tokens separate
}
the generated statement :
WHERE 1 = 1
AND <something_1>
AND <something_2>
AND <something_3>
My issue : I need to generate an "ORDER BY" statement, and I was wondering if such a hack also exists for the ORDER BY statement.
I could check if the current iteration in the loop is the last one, but there's maybe a better solution.
ORDER BY a DESC,
b DESC,
c DESC,
d DESC,
<dummy statement added at the end, so I don't need to remove the last comma>
From what I've read I cannot use "ORDER BY 1", so does a similar hack actually exist?
You could legally start every ORDER BY with something like:
order by @@spid
But it's likely to raise questions about query execution efficiency.
What you said about not using "1" is only partially true. With a similar pattern, appending "1" at the end would work as long as whatever column is in position 1 isn't also referenced elsewhere in the ORDER BY clause.
I generally do something like this:
string whereClause = (list.Count > 0)
? "WHERE (" + list.StringJoin(") AND (") + ")"
: "";
where StringJoin is a very simple extension method I wrote. Note the added parentheses so that if any of the list elements have an "OR" you don't get into trouble.
Something identical can be done for ORDER BY, just replacing the word WHERE above.
string orderClause = (list.Count > 0)
? "ORDER BY " + list.StringJoin(", ")
: "";
This is C# instead of C++, but a simple version of the extension method is this:
public static string StringJoin(this IEnumerable<string> list, string separator)
{
if (list == null)
return null;
else
return string.Join(separator, list.ToArray());
}
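The same join-based approach carries over to other languages. Below is a minimal Python sketch of it; the helper name build_clause and its wrap parameter are assumptions for illustration, and the fragments are assumed to be trusted SQL text rather than user input.

```python
def build_clause(keyword, parts, separator, wrap=False):
    """Join SQL fragments into one clause, or return "" when there are none.

    wrap=True puts each fragment in parentheses, which keeps an embedded OR
    from changing the meaning of the surrounding AND chain.
    """
    if not parts:
        return ""
    if wrap:
        parts = ["(" + part + ")" for part in parts]
    return keyword + " " + separator.join(parts)
```

For example, build_clause("ORDER BY", ["a DESC", "b DESC"], ", ") yields "ORDER BY a DESC, b DESC", with no trailing comma to remove and no dummy term needed.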
Using SSIS I am bringing in raw text files that contain this in the output:
I use this data later to report on. The Key columns get pivoted. However, I don't want to show all those columns individually, I only want to show the total.
To accomplish this my idea was calculate the Sum on insert using a trigger, and then insert the sum as a new row into the data.
The output would look something like:
Is what I'm trying to do possible? Is there a better way to do this dynamically on pivot? To be clear I'm not just pivoting these rows for a report, there are other ones that don't need the sum calculated.
Using derived column and Script Component
You can achieve this by following these steps:
Add a derived column (name: intValue) with the following expression:
(DT_I4)(RIGHT([Value],2) == "GB" ? SUBSTRING([Value],1,FINDSTRING( [Value], " ", 1)) : "0")
So if the value ends with GB then the number is taken else the result is 0.
After that add a script component, in the Input and Output Properties, click on the Output and set the SynchronousInput property to None
Add 2 Output Columns outKey , outValue
In the Script Editor write the following script (VB.NET)
Private SumValues As Integer = 0
Public Overrides Sub PostExecute()
MyBase.PostExecute()
Output0Buffer.AddRow()
Output0Buffer.outKey = ""
Output0Buffer.outValue = SumValues.ToString & " GB"
End Sub
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Output0Buffer.AddRow()
Output0Buffer.outKey = Row.Key
Output0Buffer.outValue = Row.Value
SumValues += Row.intValue
End Sub
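The logic of the two steps above (take the number from values ending in GB, otherwise 0, then emit one extra total row after all input rows) can be sketched outside SSIS like this. It is an illustrative Python version only; the function names and the sample key "Memory Device.Size" are assumptions.

```python
def gb_value(value):
    """Return the leading integer of values such as "16 GB", otherwise 0."""
    if not value.endswith("GB"):
        return 0
    number = value.split(" ", 1)[0]
    return int(number) if number.isdigit() else 0


def with_total(rows):
    """rows is a list of (key, value) tuples; append a final total row."""
    total = sum(gb_value(value) for _, value in rows)
    # Mirror the script above: empty key, summed value with the unit added.
    return rows + [("", f"{total} GB")]


# Hypothetical sample data
sample = [
    ("Memory Device.Size", "16 GB"),
    ("Memory Device.Type", "DDR4"),
    ("Memory Device.Size", "8 GB"),
]
result = with_total(sample)
```

Rows whose value does not end in GB pass through unchanged and contribute 0 to the total.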
I am going to show you a way, but I don't recommend adding a total to the end of the detail data. If you are going to report on it, show it as a total.
After source add a data transformation:
C#
Add two columns to your data flow: Size int and type string
Select Value as readonly
Here is the code:
string[] splits = Row.value.ToString().Split(' '); //Make sure single quote for char
int goodValue;
if(Int32.TryParse(splits[0], out goodValue))
{
Row.Size = goodValue;
Row.Type = "GB";
}
else
{
Row.Size = 0;
Row.Type="None";
}
Now you have the data with the proper data types to do arithmetic in your table.
If you really want the data in your format, add a multicast and an aggregate with SUM(Size), and then merge the result back into your original flow.
I was able to solve my problem in another way using a trigger.
I used this code:
INSERT INTO [Table] (
[Filename]
, [Type]
, [DeviceSN]
, [Property]
, [Value]
)
SELECT ms.[Filename],
ms.[Type],
ms.[DeviceSN],
'Memory Device.Total' AS [Key],
CAST(SUM(CAST(left(ms.[Value], 2) as INT)) AS VARCHAR) + ' GB' as 'Value'
FROM [Table] ms
JOIN inserted i ON i.Row# = ms.Row#
WHERE ms.[Value] like '%GB'
GROUP BY ms.[filename],
ms.[type],
ms.[devicesn]
I want to select data according to the criteria below in Netezza.
Can someone help me write the SQL?
Case 1: Unique ID has 2 "."s
Deal ID = Parse from UNIQ_ID. Pos 1 to first "."
E.g.
Unique ID = 0000149844.FXFWD.COIBI_I
Deal ID = 0000149844
Case 2: Unique ID has 1 "."s
Deal ID = Parse from UNIQ_ID. First "." to end
E.g
Unique ID = 25808.1234140AT
Deal ID = 1234140AT
Use Netezza's "position" function to find the position of ".", then pass that result to the "substr" function to extract the required part.
For Case 1 :
select substr('0000149844.FXFWD.COIBI_I',1,(position('.' in '0000149844.FXFWD.COIBI_I') - 1));
For Case 2 :
select substr('25808.1234140AT',(position('.' in '25808.1234140AT') + 1));
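The two queries handle each case separately; choosing between them depends on how many dots the value contains. The rule as stated in the question can be sketched as below. This is a Python illustration of the logic rather than Netezza SQL, and the function name deal_id is an assumption.

```python
def deal_id(uniq_id):
    """Apply the two parsing rules from the question based on the dot count."""
    dots = uniq_id.count(".")
    if dots == 2:
        # Case 1: everything before the first "."
        return uniq_id.split(".", 1)[0]
    if dots == 1:
        # Case 2: everything after the first "."
        return uniq_id.split(".", 1)[1]
    # The question does not say what to do with other formats.
    return None
```

In SQL the same branching would typically be a CASE expression around the two substr calls, keyed on the number of dots in the value.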