Properly Using String Functions in Stored Procedures - sql-server

I have an SSIS package that imports data into SQL Server. I have a field that I need to cut everything after and including "-". The following is my script:
SELECT LEFT(PartNumber, CHARINDEX('-', PartNumber) - 1)
FROM ExtensionBase
My question is where in my stored procedure should I use this script so that it cuts before entering data into the ExtensionBase. Can I do this in a Scalar_Value Function?

You have two routes available to you. You can use Derived Columns and the Expressions to generate this value or use a Script Transformation. Generally speaking, reaching for a script first is not a good habit for maintainability and performance in SSIS but the other rule of thumb is that if you can't see the entire expression without scrolling, it's too much.
Dataflow
Here's a sample data flow illustrating both approaches.
OLE_SRC
I used a simple query to simulate your source data. I checked for empty strings, part numbers with no dashes, multiple dashes and NULL.
SELECT
D.PartNumber
FROM
(
VALUES
('ABC')
, ('def-jkl')
, ('mn-opq-rst')
, ('')
, ('Previous line missing')
, (NULL)
) D(PartNumber);
DER Find dash position
I'm going to use FINDSTRING to determine the starting point. FINDSTRING is going to return a zero if the searched item doesn't exist or the input is NULL. I use the ternary operator to either return the position of the first dash, less a space to account for the dash, or the length of the source string.
(FINDSTRING(PartNumber,"-",1)) > 0
? (FINDSTRING(PartNumber,"-",1)) - 1
: LEN(PartNumber)
I find it helpful in these situations to first compute the positions before trying to use them later. That way if you make an error, you don't have to fix multiple formulas.
DER Trim out dash plus
The 2012 release of SSIS provided us with the LEFT function while previous editions users had to make due with SUBSTRING calls.
The LEFT expression would be
LEFT(PartNumber,StartOrdinal)
whilst the SUBSTRING is simply
SUBSTRING(PartNumber,1,StartOrdinal)
SCR use net libraries
This approach is to use the basic string capabilities of .NET to make life easier. The String.Split method is going to return an array of strings, less what we split upon. Since you only want the first thing, the zero-eth element, we will assign that to our SCRPartNumber column that is created in this task. Note that we check whether the PartNumber is null and sets the null flag on our new column.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
if (!Row.PartNumber_IsNull)
{
string[] partSplit = Row.PartNumber.Split('-');
Row.SCRPartNumber = partSplit[0];
}
else
{
Row.SCRPartNumber_IsNull = true;
}
}
Results
You can see the results are the same however you compute them.

Related

Customize Normalization in SQL Server Full Text Search by replacing characters

I want to customize SQL Server FTS to handle language specific features better.
In many language like Persian and Arabic there are similar characters that in a proper search behavior they should consider as identical char like these groups:
['آ' , 'ا' , 'ء' , 'ا']
['ي' , 'ی' , 'ئ']
Currently my best solution is to store duplicate data in new column and replace these characters with a representative member and also normalize search term and perform search in the duplicated column.
Is there any way to tell SQL Server to treat any members of these groups as an identical character?
as far as i understand ,this would be used for suggestioning purposes so the being so accurate is not important. so
in farsi actually none of the character in list above doesn't share same meaning but we can say they do have a shared short form in some writing cases ('آ' != 'اِ' but they both can write as 'ا' )
SCENARIO 1 : THE INPUT TEXT IS IN COMPLETE FORM
imagine "محمّد" is a record in a table formatted (id int,text nvarchar(12))named as 'table'.
after removing special character we can use following command :
select * from [db].[dbo].[table] where text REPLACE(text,' ّ ','') = REPLACE(N'محمد',' ّ ','');
the result would be
SCENARIO 2: THE INPUT IS IN SHORT FORMAT
imagine "محمد" is a record in a table formatted (id int,text nvarchar(12))named as 'table'.
in this scenario we need to do some logical operation on text before we query in data base
for e.g. if "محمد" is input as we know and have a list of this special character ,it should be easily searched in query as :
select * from [db].[dbo].[table] where REPLACE(text,' ّ ','') = 'محمد';
note:
this solution is not exactly a best one because the input should not be affected in client side it, would be better if the sql server configure to handle this.
for people who doesn't understand farsi simply he wanna tell sql that َA =["B","C"] and a have same value these character in the list so :
when a "dad" word searched, if any word "dbd" or "dcd" exist return them too.
add:
some set of characters can have same meaning some of some times not ( ['ي','أ'] are same but ['آ','اِ'] not) so in we got first scenario :
select * from [db].[dbo].[table] where text like N'%هی[أي]ت' and text like N'هی[أي]ت%';

SQL Server: STRING_SPLIT() result in a computed column

I couldn't find good documentation on this, but I have a table that has a long string as one of it's columns. Here's some example data of what it looks like:
Hello:Goodbye:Apple:Orange
Example:Seagull:Cake:Chocolate
I would like to create a new computed column using the STRING_SPLIT() function to return the third value in the string table.
Result #1: "Apple"
Result #2: "Cake"
What is the proper syntax to achieve this?
At this time your answer is not possible.
The output rows might be in any order. The order is not guaranteed to
match the order of the substrings in the input string.
STRING_SPLIT reference
There is no way to guarantee which item was the third item in the list using string_split and the order may change without warning.
If you're willing to build your own, I'd recommend reading up on the work done by
Brent Ozar and Jeff Moden.
You shouldn't be storing data like that in the first place. This points to a potentially serious database design problem. BUT you could convert this string into JSON by replacing : with ",", surround it with [" and "] and retrieve the third array element , eg :
declare #value nvarchar(200)='Example:Seagull:Cake:Chocolate'
select json_value('["' + replace(#value,':','","' )+ '"]','$[2]')
The string manipulations convert the string value to :
["Example","Seagull","Cake","Chocolate"]
After that, JSON_VALUE parses the JSON string and retrieves the 3rd item in the array using a JSON PATH expression.
Needless to say, this will be slow and can't take advantage of indexing. If those values are meant to be read or written individually, they should be stored in separate columns. They'll probably take less space than one long string.
If you have a lot of optional fields but only a subset contain values at any time, you could use sparse columns. This way you could have thousands of rows, only a few of which would contain data at any time

Snowflake:Export data in multiple delimiter format

Requirement:
Need the file to be exported as below format, where gender, age, and interest are columns and value after : is data for that column. Can this be achieved while using Snowflake, if not is it possible to export data using Python
User1234^gender:male;age:18-24;interest:fishing
User2345^gender:female
User3456^age:35-44
User4567^gender:male;interest:fishing,boating
EDIT 1: Solution as given by #demircioglu
It displays as NULL values instead of other column values
Below the EMPLOYEES table data
When I ran below query
SELECT 'EMP_ID'||EMP_ID||'^'||'FIRST_NAME'||':'||FIRST_NAME||';'||'LAST_NAME'||':'||LAST_NAME FROM tempdw.EMPLOYEES ;
Create your SQL with the desired format and write it to a file
COPY INTO #~/stage_data
FROM
(
SELECT 'User'||User||'^'||'gender'||':'||gender||';'||'age'||':'||age||';'||'interest'||':'||interest FROM table
)
file_format = (TYPE=CSV compression='gzip')
File format here is not important because each line will be treated as a field because of your delimiter requirements
Edit:
CONCAT function (aliased with ||) returns NULL if you have a NULL value.
In order to eliminate NULLs you can use NVL2 function
So your SQL will have series of NVL2s
NVL2 checks the first parameter and if it's not NULL returns first expression, if it's NULL returns second expression
So for User column
'User'||User||'^' will turn into
NVL2(User,'User','')||NVL2(User,User,'')||NVL2(User,'^','')
P.S. I am leaving up to you to create the rest of the SQL, because Stackoverflow's function is to help find the solution, not spoon feed the solution.
No, I do not believe multiple delimiters like this are supported in Snowflake at this time. Multiple byte and multiple character delimiters are supported, but they will need to be specified as the same delimiter repeated for either record or line.
Yes, it may be possible to do some post-processing or use Python scripts to achieve this. Or even SQL transformative statements. This is not really my area of expertise so if someone has an example for you, I'll let them add to the discussion.

Remove Duplicate adjacent Sub-String from String in Microsoft SQL Server

I am using SQL Server 2008 and I have a column in a table, which has values like below. It basically shows departure and arrival information.
-->Heathrow/Dublin*Dublin/Heathrow
-->Gatwick/Liverpool*Liverpool/Carlisle *Carlisle/Gatwick
-->Heathrow/Dublin*Liverpool/Heathrow
(The 3rd example shown above is slightly different where the person did not depart from Dublin, instead departed from a Liverpool).
This makes the column too lengthy, and I want to remove only the adjacent duplicates, so the information can be shown like below:
-->Heathrow/Dublin/Heathrow
-->Gatwick/Liverpool/Carlisle/Gatwick
-->Heathrow/Dublin***Liverpool/Heathrow
So, this would still show the correct travel route, but omits only the contiguous duplicates. Also, in the 3rd case, since the departure and arrival information location is not the same, Iwould like to show it as ***.
I found a post here that removes all duplicates (Find and Remove Repeated Substrings) but this is slightly different from the solution that I need.
Could someone share any thoughts please?
The first step is to adapt the process defined in the following link so that it splits based on /:
T-SQL split string
This returns a table which you would then loop through checking if the value contains an *. In that case you would get the text values before and after the * and compare them. Use CHARINDEX to get the position of the *, and SUBSTRING to get the values before and after. Once you have those check both values and append to your output string accordingly.
So you have a database column that contains this text string? Is your concern to display the data to the user in a new format, or to update the data in your database table with a new value?
Do you have access to the original data from which this text string was built? It would probably be easier to re-create the string in the format you desire than it would be to edit the existing string programmatically.
If you don't have access to this data, it would probably be a lot simpler to update your data (or reformat it for display) if you do the string manipulation in a high-level language such as c# or java.
If you're reformatting it for display, write the string manipulation code in whatever language is appropriate, right before displaying it. If you're updating your table, you could write a program to process the table, reading each record, building the replacement string, and updating the record before moving on to the next one.
The bottom line is that T-SQL is just not a good language for doing this sort of string examination and manipulation. If you can build a fresh string from the original data, or do your manipulation in a high-level language, you'll have an easier job of it and end up with more maintainable code.
I wrote a code for the first example you gave. You still need to
improve it for the rest ...
DECLARE #STR VARCHAR(50)='Heathrow/Dublin*Dublin/Heathrow'
IF (SELECT SUBSTRING(#STR,CHARINDEX('/',#STR)+1,CHARINDEX('*',#STR)-CHARINDEX('/',#STR)-1)) =
(SELECT SUBSTRING(#STR,CHARINDEX('*',#STR)+1,LEN(SUBSTRING(#STR,CHARINDEX('/',#STR)+1,CHARINDEX('*',#STR)-CHARINDEX('/',#STR)-1))))
BEGIN
SELECT STUFF(#STR,CHARINDEX('*',#STR),LEN(SUBSTRING(#STR,CHARINDEX('/',#STR)+1,CHARINDEX('*',#STR)-CHARINDEX('/',#STR)-1))+1,'')
END
ELSE
BEGIN
SELECT STUFF(#STR,CHARINDEX('*',#STR),LEN(SUBSTRING(#STR,CHARINDEX('*',#STR)+1,LEN(SUBSTRING(#STR,CHARINDEX('/',#STR)+1,CHARINDEX('*',#STR)-CHARINDEX('/',#STR)-1)))),'***')
END

select first two characters of values in a concatenated string

I am trying to create a formula field that checks a string that is a series of concatenated values separated by a comma. I want to check the first two characters of each comma separated value in the string. For example, the string pattern could be: abcd,efgh,ijkl,mnop,qrst,uvwx
In my formula I'd like to check if the first two characters are 'ab','ef'
If so, I would return true, else false.
Thanks.
To do this properly, you need to use a regular expression. Unfortunately the REGEX function is not available in formula fields. It is, however, available in formulas in Validation Rules and in Workflow Rules. You can, therefore, specify the below formula in either of a Validation or Workflow Rule:
OR(
AND(
NOT(
BEGINS( KXENDev__Languages__c, "ab" )
),
NOT(
BEGINS( KXENDev__Languages__c, "ef" )
)
),
REGEX( KXENDev__Languages__c , ".*,(?!ab|ef).*")
)
If it's a Validation Rule, you're done -- this formula will create an error if any of the entries do not start with "ab" or "ef". If it's a Workflow Rule, then you can add a Field Update to it to update some field with False when this formula is true (if this formula is true then there is at least one item that doesn't start with ab or ef, so that would make your field False).
Some may ask "What's with the BEGINS statements? Couldn't you have done this all with one REGEX?" Yes, I probably could, but that makes for an increasingly complex REGEX statement, and these are quite difficult to debug in Salesforce.com, so I prefer to keep my REGEXes in Salesforce.com as simple as possible.
I suggest you to search for ',ab' and ',ef' using CONTAINS method. But first of all you need to re implement method which composes this string so it puts ',' before first substring. At the end returned string should look like ',abcd,efgh,ijkl,mnop,qrst,uvwx'.
If you are not able to re implement method which compose this string use LEFT([our string goes here],2) method to check first two chars.

Resources