Derived column expression only capture 10 characters - sql-server

I am new to SSIS and I have searched to find the solution to this question. Any help is most appreciated!
I have a flat file with data defined as dt_wstr, to change the datatype I am using a data conversion to set the [column] to dt_str(50)
I am also using a derived column - to add as a new column: The goal is write an expression
I have a [column] which is defined as 11 characters
My question is how do I write an expression to only capture 10 characters, and anything greater than 10 I want to change the [column] to -1 else (dt_I8) [column]
I've tried:
FINDSTRING([Column],"9999999999",1) == 10 ? -1 : (DT_I8)TRIM([Column])
FINDSTRING([Column],"9999999999",1) > 10 ? -1 : (DT_I8)TRIM([Column])
LEN([Column]) == 10 ? -1 : (DT_I8)[column]
SUBSTRING( [Copy of Member ID] ,1,10)
The package runs without errors however the results in the table are not correct, the column with more than 10 characters are not showing up in the table
I am using visual studio 2012
Thank you Dawana

I don't know why your substring attempt didn't work, but this would return the first 10 characters of column:
LEFT(column,10)
https://msdn.microsoft.com/en-us/library/hh231081(v=sql.110).aspx

Related

SQL String are same but case equals method returns false

I am using SQL to compare two columns and return TRUE/FALSE if they are equal.
In some cases, the two columns contain exactly the same string (no spaces or anything) but I am still getting false.
What may the reason for this be?
I am using this code:
CASE WHEN column1 = column2 THEN 0 ELSE 1 END AS [check]
The values are different despite the displayed value.
Using T-SQL, run a query like this to see the exact difference in the underlying raw values:
SELECT
column1
, CAST(column1 AS varbinary(MAX)) AS column1Binary
, column2
, CAST(column2 AS varbinary(MAX)) AS column2Binary
FROM dbo.YourTable;
This will reveal underlying differences like tabs or subtle character differences.
In fact, a likely explanation for what you are seeing is that one/both of the strings has leading and/or trailing whitespace. On SQL Server you may try:
CASE WHEN LTRIM(column1) = LTRIM(column2) THEN 0 ELSE 1 END AS [check]
If the above does not detect the problematical records, then try checking the length:
CASE WHEN LEN(column1) = LEN(column2) THEN 0 ELSE 1 END AS [check2]

SQL Server - How to get last numeric value in the given string

I am trying to get last numeric part in the given string.
For Example, below are the given strings and the result should be last numeric part only
SB124197 --> 124197
287276ACBX92 --> 92
R009321743-16 --> 16
How to achieve this functionality. Please help.
Try this:
select right(#str, patindex('%[^0-9]%',reverse(#str)) - 1)
Explanation:
Using PATINDEX with '%[^0-9]%' as a search pattern you get the starting position of the first occurrence of a character that is not a number.
Using REVERSE you get the position of the first non numeric character starting from the back of the string.
Edit:
To handle the case of strings not containing non numeric characters you can use:
select case
when patindex(#str, '%[^0-9]%') = 0 then #str
else right(#str, patindex('%[^0-9]%',reverse(#str)) - 1)
end
If your data always contains at least one non-numeric character then you can use the first query, otherwise use the second one.
Actual query:
So, if your table is something like this:
mycol
--------------
SB124197
287276ACBX92
R009321743-16
123456
then you can use the following query (works in SQL Server 2012+):
select iif(x.i = 0, mycol, right(mycol, x.i - 1))
from mytable
cross apply (select patindex('%[^0-9]%', reverse(mycol) )) as x(i)
Output:
mynum
------
124197
92
16
123456
Demo here
Here is one way using Patindex
SELECT RIGHT(strg, COALESCE(NULLIF(Patindex('%[^0-9]%', Reverse(strg)), 0) - 1, Len(strg)))
FROM (VALUES ('SB124197'),
('287276ACBX92'),
('R009321743-16')) tc (strg)
After reversing the string, we are finding the position of first non numeric character and extracting the data from that position till the end..
Result :
-----
124197
92
16

T-SQL ORDER BY ignores " '-' + ... " but not " '+' + ... "

So i recently encountered a wierd bug when comparing two values.
My values was a range from -1 to 2.
Sometimes it thought that -1 was bigger than 0, the solution was easy. Apparently was the column set to varchar(50) instead of int.
But this made me think why this happened. Because even if the column was set to varchar(50) the '-' should have a lower char value than '0' (charvalue for '-' is 45 and charvalue for '0' should be 48)
I made some tests and it turns out, what i can find, that '-' is the only character that ORDER BY doesn't care about.
Example:
SELECT
A.x
FROM
(
VALUES
('-5'), ('-4'), ('-3'), ('-2'), ('-1'),
('0'), ('1'), ('2'), ('3'), ('4'), ('5')
) A(x)
ORDER BY
A.x;
SELECT
B.x
FROM
(
VALUES
('+5'), ('+4'), ('+3'), ('+2'), ('+1'),
('0'), ('1'), ('2'), ('3'), ('4'), ('5')
) B(x)
ORDER BY
B.x
Result:
Result of A
0
1
-1
2
-2
3
-3
4
-4
5
-5
Result of B
+1
+2
+3
+4
+5
0
1
2
3
4
5
(+ has a charvalue of 43)
The '+' order by feels right but the '-' seems... wrong
Anyone knows why it is like this?
Additional info
Server version: 12.0.4213
Collation: Finnish_Swedish_CI_AS
No clue what else could skew the result. Ask if you need more information.
Found out why.
TLDR: Non-unicode and unicode collation sorts '-' differently.
"A SQL collation's rules for sorting non-Unicode data are incompatible
with any sort routine that is provided by the Microsoft Windows
operating system; however, the sorting of Unicode data is compatible
with a particular version of the Windows sorting rules. Because the
comparison rules for non-Unicode and Unicode data are different, when
you use a SQL collation you might see different results for
comparisons of the same characters, depending on the underlying data
type. For example, if you are using the SQL collation
"SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less
than the string 'ab' because the hyphen ("-") is sorted as a separate
character that comes before "b". However, if you convert these strings
to Unicode and you perform the same comparison, the Unicode string
N'a-c' is considered to be greater than N'ab' because the Unicode
sorting rules use a "word sort" that ignores the hyphen."
Source: https://support.microsoft.com/en-us/kb/322112

How do I match a substring of variable length?

I am importing data into my SQL database from an Excel spreadsheet.
The imp table is the imported data, the app table is the existing database table.
app.ReceiptId is formatted as "A" followed by some numbers. Formerly it was 4 digits, but now it may be 4 or 5 digits.
Examples:
A1234
A9876
A10001
imp.ref is a free-text reference field from Excel. It consists of some arbitrary length description, then the ReceiptId, followed by an irrelevant reference number in the format " - BZ-0987654321" (which is sometimes cropped short, or even missing entirely).
Examples:
SHORT DESC A1234 - BZ-0987654321
LONGER DESCRIPTION A9876 - BZ-123
REALLY LONG DESCRIPTION A2345 - B
REALLY REALLY LONG DESCRIPTION A23456
The code below works for a 4-digit ReceiptId, but will not correctly capture a 5-digit one.
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId = right(right(rtrim(replace(replace(imp.ref,'-',''),'B','')),5)
+ rtrim(left(imp.ref,charindex(' - BZ-',imp.ref))),5)
How can I change the code so it captures either 4 (A1234) or 5 (A12345) digits?
As ughai rightfully wrote in his comment, it's not recommended to use anything other then columns in the on clause of a join.
The reason for that is that using functions prevents sql server for using any indexes on the columns that it might use without the functions.
Therefor, I would suggest adding another column to imp table that will hold the actual ReceiptId and be calculated during the import process itself.
I think the best way of extracting the ReceiptId from the ref column is using substring with patindex, as demonstrated in this fiddle:
SELECT ref,
RTRIM(SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9]%', ref), 6)) As ReceiptId
FROM imp
Update
After the conversation with t-clausen-dk in the comments, I came up with this:
SELECT ref,
CASE WHEN PATINDEX('%[ ]A[0-9][0-9][0-9][0-9][0-9| ]%', ref) > 0
OR PATINDEX('A[0-9][0-9][0-9][0-9][0-9| ]%', ref) = 1 THEN
SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9][0-9| ]%', ref), 6)
ELSE
NULL
END As ReceiptId
FROM imp
fiddle here
This will return null if there is no match,
when a match is a sub string that contains A followed by 4 or 5 digits, separated by spaces from the rest of the string, and can be found at the start, middle or end of the string.
Try this, it will remove all characters before the A[number][number][number][number] and take the first 6 characters after that:
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId in
(
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 5),
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 6)
)
When using equal, the spaces after is not evaluated

Nonsense Error in SQL Server Management Studio

I have this query that spawns the following error:
SELECT * FROM Quota
WHERE LEFT(QtLabel, LEN(QtLabel)-2) IN (
'1032',
'3300',
'9682'
)
Msg 536, Level 16, State 5, Line 1
Invalid length parameter passed to the SUBSTRING function.
Am I doing something wrong? It tends to show up when I use the LEN() function. Might it be a datatype issue?
Is it possible that LEN(QtLabel) <= 2 ?
Are you sure that each QtLabel field is longer than 2 characters?
LEFT probably uses SUBSTRING internally. What happens if the length of QtLabel is <= 2?
It's because some QtLabel values don't contain > 2 characters, so you end up trying to do a LEFT() with a negative value as the number to restrict to.
In your scenario, you're assuming all QtLabel values are 6 characters, so uou should do:
SELECT * FROM Quota
WHERE LEN(QtLabel) = 6
AND LEFT(QtLabel, LEN(QtLabel)-2) IN (
'1032',
'3300',
'9682'
)
SELECT LEFT('MyText', -2)
will throw the same error

Resources