SQL: Fix for CSV import mistake - sql-server

I have a database that has multiple columns populated with various numeric fields. While trying to populate it from a CSV, I must have mucked up assigning the delimited fields. The end result is a column that contains its correct information, but also the next column's data, separated by a comma.
So instead of column UPC1 containing "958634", it contains "958634,95877456". The "95877456" is supposed to be in the UPC2 column; instead UPC2 is NULL.
Is there a way for me to split on the comma and send the data to UPC2 while keeping the UPC1 data before the comma intact?
Thanks.

You can do this with string functions. To query the values and verify the logic, try this:
SELECT
LEFT(UPC1, CHARINDEX(',', UPC1) - 1),
SUBSTRING(UPC1, CHARINDEX(',', UPC1) + 1, 1000)
FROM myTable
WHERE UPC1 LIKE '%,%';
If the result is what you want, turn it into an update:
UPDATE myTable SET
UPC1 = LEFT(UPC1, CHARINDEX(',', UPC1) - 1),
UPC2 = SUBSTRING(UPC1, CHARINDEX(',', UPC1) + 1, 1000)
WHERE UPC1 LIKE '%,%';
The WHERE clause limits both statements to rows that actually contain a comma. Without it, CHARINDEX returns 0 for a clean row, LEFT receives a length of -1, and the statement fails with an invalid length error.
The expression for UPC1 takes the left side of UPC1 up to one character before the comma.
The expression for UPC2 takes the remainder of the UPC1 string starting one character after the comma.
The third argument to SUBSTRING needs some explaining. It's the number of characters you want to include, counting from the starting position (which in this case is one character after the comma's location). If you specify a value that's longer than the string, SUBSTRING will just return everything up to the end of the string. Using 1000 here is a lot easier than calculating the exact number of characters you need to get to the end.
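One detail worth seeing in action: all assignments in a single UPDATE read the row's original values, so UPC2 still sees the comma-separated string even though UPC1 is overwritten in the same statement. The sketch below tries this on a scratch temp table. The table name #UpcDemo, the VARCHAR(50) column types, and the second sample row are assumptions made for illustration, and the WHERE filter is an added guard so that rows without a comma are left alone.

```sql
-- Hypothetical scratch table standing in for myTable (column types assumed).
CREATE TABLE #UpcDemo (UPC1 VARCHAR(50), UPC2 VARCHAR(50));

INSERT INTO #UpcDemo (UPC1, UPC2) VALUES
    ('958634,95877456', NULL),   -- damaged row from the question
    ('123456', '7891011');       -- made-up clean row that must stay untouched

-- Both expressions read the pre-update value of UPC1.
UPDATE #UpcDemo SET
    UPC1 = LEFT(UPC1, CHARINDEX(',', UPC1) - 1),
    UPC2 = SUBSTRING(UPC1, CHARINDEX(',', UPC1) + 1, 1000)
WHERE UPC1 LIKE '%,%';           -- skip rows that have no comma

SELECT UPC1, UPC2 FROM #UpcDemo;

DROP TABLE #UpcDemo;
```

Running it against a copy of the real table first, or inside a transaction you can roll back, is a cheap way to confirm the split before committing it.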

Related

SQL - Replace string function is not working as intended

I have a simple string, for example '01023201580001'.
I would like to replace the last two characters of this string, '01', with '00'.
I can extract the last two characters with RIGHT(columname, 2) and then use
REPLACE([columname], RIGHT([columname], 2), '00') as newColumnString
But in the result, it replaces the first two characters as well.
Expected result: 01023201580000
Result I get: 00023201580000
What am I doing wrong?
The second argument to the replace() function defines a pattern to match. The function will look for all instances of that pattern in the target string (first argument) and replace them with the replacement text (third argument). Here the pattern is '01', which also occurs at the start of the string, so both occurrences are replaced.
If you know you only need to change the last two characters, you can take the value excluding those characters and then append the characters you want:
select left(columname, len(columname) - 2) + '00' from MyTable;
If you are doing this for an entire column and some of the rows might not end with '01', you can filter those out:
update MyTable
set columname = left(columname, len(columname) - 2) + '00'
where columname like '%01';
You could also use stuff() in a similar way.
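As a rough sketch of the stuff() variant: STUFF(string, start, length, replacement) deletes `length` characters beginning at `start` and inserts the replacement there. The variable @s is only a stand-in for the column value.

```sql
-- @s is a hypothetical stand-in for the column value from the question.
DECLARE @s VARCHAR(20) = '01023201580001';

-- Start one character before the last position, remove 2 characters, insert '00'.
SELECT STUFF(@s, LEN(@s) - 1, 2, '00') AS newColumnString;
```

Unlike REPLACE, this touches only the position you name, so the leading '01' is left alone.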
In SQL Server, you can use SUBSTRING like so:
DECLARE @s NVARCHAR(20) = N'01023201580001';
DECLARE @ReplaceWith NVARCHAR(20) = N'00';
SELECT SUBSTRING(@s, 1, LEN(@s) - 2) + @ReplaceWith;
Output: 01023201580000

Why does the EXCEPT clause trim whitespace at the end of text?

I read through the documentation for the SQL Server EXCEPT operator and I see no mention of explicit trimming of white space at the end of a string. However, when running:
SELECT 'Test'
EXCEPT
SELECT 'Test '
no results are returned. Can anyone explain this behavior or how to avoid it when using EXCEPT?
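ANSI SQL-92 requires strings to be padded to the same length before they are compared, and the pad character is a space. As a result, 'Test' and 'Test ' compare as equal, and EXCEPT treats the two rows as the same.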
See https://support.microsoft.com/en-us/help/316626/inf-how-sql-server-compares-strings-with-trailing-spaces for more information
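A quick way to see the padding rule outside of EXCEPT is to compare the two literals directly. This is a small sketch for SQL Server; the column aliases are arbitrary.

```sql
SELECT
    CASE WHEN 'Test' = 'Test ' THEN 'equal' ELSE 'different' END AS comparison, -- 'equal': shorter side is padded
    LEN('Test ')        AS len_result,        -- 4: LEN ignores trailing spaces
    DATALENGTH('Test ') AS datalength_result; -- 5: the space is still stored
```

The value itself is not trimmed; only the comparison ignores the trailing space.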
From the ANSI standard (section 8.2):
3) The comparison of two character strings is determined as follows:
a) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>.
b) The result of the comparison of X and Y is given by the collating sequence CS.
c) Depending on the collating sequence, two strings may compare as equal even if they are of different lengths or contain different sequences of characters. When the operations MAX, MIN, DISTINCT, references to a grouping column, and the UNION, EXCEPT, and INTERSECT operators refer to character strings, the specific value selected by these operations from a set of such equal values is implementation-dependent.
If you need to avoid this behaviour, you can add a reversed copy of each column to the EXCEPT:
SELECT 'TEST', REVERSE('TEST')
EXCEPT
SELECT 'TEST ', REVERSE('TEST ')
This gives the expected result, because the trailing space becomes a leading space in the reversed copy, and leading spaces are not ignored. It is quite annoying, though, especially if you are dealing with multiple columns.
Another option would be a collation with a different pad character or a NO PAD attribute, though a quick search did not turn one up for T-SQL.
Alternatively, you could terminate each column with a character and then strip it off at the end with SUBSTRING:
SELECT SUBSTRING(col,1,LEN(col) -1) FROM
(
SELECT 'TEST' + '^' as col
EXCEPT
SELECT 'TEST ' + '^'
) results

Oracle: Check if number column contains a value from a formatted string of numbers

In my local table, I am trying to check whether an Oracle NUMBER column called JOB_NUMBER has a value that exists in a string parameter. Technically I am passing in the string as a stored procedure nvarchar2 parameter, but for simplicity I hardcoded the string in my query below:
SELECT FIRST_NAME, JOB_NUMBER
FROM JOBTABLE
WHERE TO_CHAR(JOB_NUMBER) IN ('00052, 00048');
When Oracle runs the query above, it returns no rows even though 00052 is a number value in the JOB_NUMBER column. I think it checks for the whole string ('00052, 00048') in JOB_NUMBER and can't find it, so it returns nothing. The string will contain different values each time, and there will be several numbers (as strings) in it.
Does anyone know how to do this?
The trick is to keep the leading zeroes of the number when comparing it to the string, then to loop through the string's elements and compare each one. Here a CTE simulates a numeric job number and a string to search. The TO_CHAR format '00000' preserves the leading zeroes, and the FM modifier removes the leading space that TO_CHAR reserves for the sign. CONNECT BY generates one row per element (the number of delimiters + 1), with the current element's position in LEVEL. REGEXP_SUBSTR uses LEVEL to pick out each element in turn, and the converted numeric value is compared to it to see whether a match is found. Note that this regular expression allows for NULL elements, should you need to know which item in the list is your match.
SQL> with tbl(job_nbr_in, job_str_in) as (
select 00052, '00052, 00048' from dual
)
select --level element_nbr,
to_char(job_nbr_in, 'FM00000') search_for, job_str_in in_string,
regexp_substr(job_str_in, '(.*?)(, |$)', 1, level, NULL, 1) found
from tbl
where to_char(job_nbr_in, 'FM00000') = regexp_substr(job_str_in, '(.*?)(, |$)', 1, level, NULL, 1)
connect by level <= regexp_count(job_str_in, ',')+1;
SEARCH_FOR IN_STRING FOUND
---------- ------------ ------------
00052 00052, 00048 00052
If you are not sure if you will always have a space after the comma, remove spaces with REPLACE and adjust the delimiter in REGEXP_SUBSTR:
with tbl(job_nbr_in, job_str_in) as (
select 00052, '00052, 00048' from dual
)
select to_char(job_nbr_in, 'FM00000') search_for, job_str_in in_string,
regexp_substr(replace(job_str_in, ' '), '(.*?)(,|$)', 1, level, NULL, 1) found
from tbl
where to_char(job_nbr_in, 'FM00000') = regexp_substr(replace(job_str_in, ' '), '(.*?)(,|$)', 1, level, NULL, 1)
connect by level <= regexp_count(job_str_in, ',')+1;
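Applied to the table from the question, the same idea might look like the sketch below. It is an illustration under assumptions, not a drop-in answer: :job_list is a hypothetical bind variable standing in for the procedure's string parameter, and the 'FM00000' format assumes every job number in the list is zero-padded to five digits. The simpler pattern '[^,]+' is swapped in here; it skips empty elements, which is fine when you only need to test membership.

```sql
-- :job_list is a hypothetical bind variable, e.g. '00052, 00048'.
SELECT FIRST_NAME, JOB_NUMBER
FROM JOBTABLE
WHERE TO_CHAR(JOB_NUMBER, 'FM00000') IN (
    -- Split the list into one row per element, ignoring spaces.
    SELECT REGEXP_SUBSTR(REPLACE(:job_list, ' '), '[^,]+', 1, LEVEL)
    FROM dual
    CONNECT BY LEVEL <= REGEXP_COUNT(:job_list, ',') + 1
);
```

Inside a stored procedure, the bind variable would simply be replaced by the parameter name.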

T-SQL 2008: Parse String

I have the following location as a string:
\\Windows\UnitB\CU1234_001\
I want to return the CU1234_001 part only. The query needs to be dynamic, since this string will change and could be longer or shorter (it will always end in "\").
I've tried something like this, but it just eliminates the last "\" and returns the rest of the string:
select
substring('\\Windows\UnitB\CU1234_001\',
1, (len('\\Windows\UnitB\CU1234_001\') - (Charindex('\',
reverse(rtrim('\\Windows\UnitB\CU1234_001\'))))))
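You can use a combination of string functions to extract what you want. Reverse the string, skip the trailing "\" (now at position 1), take everything up to the next "\", and reverse the result back:
SELECT REVERSE(SUBSTRING(REVERSE(col),
2,
CHARINDEX('\', REVERSE(col), 2) - 2))
FROM yourTable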
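To try the expression without a table, here is a small sketch that runs it against the literal path from the question, searching for the backslash that the path actually uses. The variable name @path and its VARCHAR(260) size are assumptions for illustration.

```sql
-- @path is a hypothetical variable holding the sample value from the question.
DECLARE @path VARCHAR(260) = '\\Windows\UnitB\CU1234_001\';

SELECT REVERSE(
           SUBSTRING(
               REVERSE(@path),
               2,                                       -- skip the trailing '\'
               CHARINDEX('\', REVERSE(@path), 2) - 2    -- length up to the next '\'
           )
       ) AS last_folder;
```

With the sample value, this should return CU1234_001. Note that the arithmetic relies on the string ending in a backslash, as the question states it always does.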

Get element of character varying type by index or convert it to array

I have an SQL function that returns a character varying value. The output is something like this:
'TTFFFFNN'. I need to get these characters by index. How do I convert character varying to an array?
Use string_to_array() with NULL as delimiter (pg 9.1+):
SELECT string_to_array('TTFFFFNN'::text, NULL) AS arr;
Per documentation:
In string_to_array, if the delimiter parameter is NULL, each character
in the input string will become a separate element in the resulting array.
In older versions (pg 9.0-), the call with NULL returned NULL.
To get the 2nd position (example):
SELECT (string_to_array('TTFFFFNN'::text, NULL))[2] AS item2;
Alternatives
For single characters I would use substring() directly, like @a_horse commented:
SELECT substring('TTFFFFNN'::text, 2, 1) AS item2;
For strings with actual delimiters, I suggest split_part(); see the related question "Split comma separated column data into additional columns".
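As a brief sketch of split_part() in PostgreSQL: it takes the string, the delimiter, and the 1-based position of the field you want. The sample value here is made up for illustration.

```sql
-- The sample string is illustrative only.
SELECT split_part('958634,95877456', ',', 1) AS first_part,   -- 958634
       split_part('958634,95877456', ',', 2) AS second_part;  -- 95877456
```

Asking for a position past the last field returns an empty string rather than an error.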
Only use regexp_split_to_array() if you must. Regular expression processing is more expensive.
In addition to the solutions outlined by Erwin and a horse, you can use regexp_split_to_array() with an empty regexp instead:
select regexp_split_to_array('TTFFFFNN'::text, '');
With an index, that becomes:
select (regexp_split_to_array('TTFFFFNN'::text, ''))[2];
