I have received a text file in the format below:
|1842035 |023851-005WA_ABCD|
|1842035 |023851-005WA_ABCD|
|1842035 |023851-005WA_ABCD|
|2270691 |023851-005WA_ABCD|
973186 023851-005WA_ABCD
973186 023851-005WA_ABCD
973186 023851-005WA_ABCD
973186 023851-005WA_ABCD
994087 023851-005WA_ABCD
How do we load this into Snowflake? It seems this is the only format in which we can receive the text files.
If each line in the file is smaller than 16 MB, you can use the delimited (CSV) file format to read each whole line into a single column. From there you can transform the data using the following SQL:
with DATA as
(
select $1 as LINE from values
('|1842035 |023851-005WA_ABCD|'),
('|1842035 |023851-005WA_ABCD|'),
('|1842035 |023851-005WA_ABCD|'),
('|2270691 |023851-005WA_ABCD|'),
(' 973186 023851-005WA_ABCD'),
(' 973186 023851-005WA_ABCD'),
(' 973186 023851-005WA_ABCD'),
(' 973186 023851-005WA_ABCD'),
(' 994087 023851-005WA_ABCD')
), CLEANED as
(
select regexp_replace(trim(replace(LINE, '|', ' ')), '\\s+', ' ') as A from DATA
)
select split_part(A,' ',1)::int as COL_1, split_part(A,' ',2) as COL_2 from CLEANED
;
COL_1    COL_2
1842035  023851-005WA_ABCD
1842035  023851-005WA_ABCD
1842035  023851-005WA_ABCD
2270691  023851-005WA_ABCD
973186   023851-005WA_ABCD
973186   023851-005WA_ABCD
973186   023851-005WA_ABCD
973186   023851-005WA_ABCD
994087   023851-005WA_ABCD
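To check the expected result outside Snowflake, the cleanup logic of the CLEANED step can be sketched in Python. This is only an illustration under assumptions: the function name `parse_line` is hypothetical, and Python's `strip()` removes all surrounding whitespace, whereas the SQL `trim` above removes only spaces.

```python
import re

def parse_line(line: str) -> tuple[int, str]:
    """Replace pipes with spaces, trim, collapse whitespace runs, split in two."""
    cleaned = re.sub(r"\s+", " ", line.replace("|", " ").strip())
    col_1, col_2 = cleaned.split(" ", 1)
    return int(col_1), col_2

# Two of the sample lines from the question, one with pipes and one without.
sample = ["|1842035 |023851-005WA_ABCD|", " 973186 023851-005WA_ABCD"]
rows = [parse_line(s) for s in sample]
print(rows)
```

Both input styles end up as the same two-column shape, which is what the SQL relies on.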
Here is a file format that will read a whole line in a file as a single column. You may need to adjust it slightly for your files. It basically specifies that the row delimiter is a newline and there is no column delimiter:
ALTER FILE FORMAT "TEST"."PUBLIC".READ_LINES SET COMPRESSION = 'AUTO'
FIELD_DELIMITER = 'NONE' RECORD_DELIMITER = '\n' SKIP_HEADER = 0
FIELD_OPTIONALLY_ENCLOSED_BY = 'NONE' TRIM_SPACE = FALSE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE ESCAPE = 'NONE'
ESCAPE_UNENCLOSED_FIELD = '\134' DATE_FORMAT = 'AUTO' TIMESTAMP_FORMAT ='AUTO'
NULL_IF = ('\\N');
After using LTRIM(column_name) or RTRIM(column_name), there are still characters that are not being removed.
I searched the table for the specific 10-character value, but it is not found. There are 2 extra spaces at the end of the word, and the trim functions do not remove them.
Please try the following user-defined function (UDF):
/*
1. All invisible TAB, Carriage Return, and Line Feed characters will be replaced with spaces.
2. Then leading and trailing spaces are removed from the value.
3. Further, contiguous occurrences of more than one space will be replaced with a single space.
*/
CREATE FUNCTION dbo.udf_tokenize(@input VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
RETURN (SELECT CAST('<r><![CDATA[' + @input + ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)'));
END
Test harness
-- DDL and sample data population, start
DECLARE @mockTbl TABLE (ID INT IDENTITY(1,1), col_1 VARCHAR(100), col_2 VARCHAR(100));
INSERT INTO @mockTbl (col_1, col_2)
VALUES (' FL ', ' Miami')
, (' FL ', ' Fort Lauderdale ')
, (' NY ', ' New York ')
, (' NY ', '')
, (' NY ', NULL);
-- DDL and sample data population, end
-- before
SELECT * FROM @mockTbl;
-- remove invisible chars
UPDATE @mockTbl
SET col_1 = dbo.udf_tokenize(col_1)
, col_2 = dbo.udf_tokenize(col_2);
-- after
SELECT *, LEN(col_2) AS [col_2_len] FROM @mockTbl;
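The three steps listed in the UDF's comment can be sketched in Python to make the intended result concrete. This is an illustration of those steps only, not a reimplementation of the XML `xs:token` cast; the function name `tokenize` is hypothetical, and passing `None` through mirrors how a NULL input stays NULL in T-SQL string concatenation.

```python
import re

def tokenize(value):
    """Tabs/CR/LF become spaces, edges are trimmed, space runs collapse to one."""
    if value is None:
        return None
    spaced = re.sub(r"[\t\r\n]", " ", value)        # step 1
    trimmed = spaced.strip(" ")                      # step 2
    return re.sub(r" {2,}", " ", trimmed)            # step 3

print(repr(tokenize("  Fort \t Lauderdale \r\n")))
```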
I'm having difficulty with a T-SQL query that joins 2 tables using a character column. I suspect that there are some whitespace differences causing the problem but have not been able to track them down. In order to test this theory I'd like to strip all of the whitespaces from the joining columns and see if that resolves the issue. Unfortunately, I'm stuck on how to remove all whitespaces in a T-SQL string. Here is a simple example showing what I've tried (see the test columns):
select
str,
test1 = replace(str, '\\s+' , ''),
test2 = replace(str, '[\s]*' , '')
from
(
values
(''),
(' '),
(' xyz'),
('abc '),
('hello world')
) d (str);
Is there a way to get this to work in T-SQL?
Clarification: by white space, I mean to strip out ALL of the following:
\s white space (space, \r, \n, \t, \v, \f)
' ' space
\t (horizontal) tab
\v vertical tab
\b backspace
\r carriage return
\n newline
\f form feed
\u00a0 non-breaking space
This piece of code helped figure out exactly what kind of whitespace was present in the original query that had the join issue:
select distinct
fieldname,
space = iif(charindex(char(32), fieldname) > 0, 1, 0),
horizontal_tab = iif(charindex(char(9), fieldname) > 0, 1, 0),
vertical_tab = iif(charindex(char(11), fieldname) > 0, 1, 0),
backspace = iif(charindex(char(8), fieldname) > 0, 1, 0),
carriage_return = iif(charindex(char(13), fieldname) > 0, 1, 0),
newline = iif(charindex(char(10), fieldname) > 0, 1, 0),
formfeed = iif(charindex(char(12), fieldname) > 0, 1, 0),
nonbreakingspace = iif(charindex(char(160), fieldname) > 0, 1, 0)
from tablename;
It turned out there were carriage returns and line feeds in the data of one of the tables. Using @scsimon's solution, the problem was resolved by changing the join to this:
on REPLACE(REPLACE(a.fieldname, CHAR(10), ''), CHAR(13), '') = b.fieldname
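The same diagnose-then-strip approach can be sketched in Python for checking exported values outside the database. The names `WHITESPACE`, `whitespace_report`, and `strip_all` are hypothetical; the character set is the one listed in the clarification above.

```python
# Characters from the clarification list, keyed by the diagnostic column names.
WHITESPACE = {
    "space": " ",
    "horizontal_tab": "\t",
    "vertical_tab": "\v",
    "backspace": "\b",
    "carriage_return": "\r",
    "newline": "\n",
    "formfeed": "\f",
    "nonbreakingspace": "\u00a0",
}

def whitespace_report(value: str) -> dict[str, bool]:
    """Report which of the listed characters occur in the value."""
    return {name: ch in value for name, ch in WHITESPACE.items()}

def strip_all(value: str) -> str:
    """Delete every listed character, wherever it occurs."""
    return value.translate(str.maketrans("", "", "".join(WHITESPACE.values())))

print(whitespace_report("abc\r\n"))
print(repr(strip_all(" a\tb\r\n\u00a0c")))
```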
Do you mean this? It removes space characters (the kind typed with the spacebar):
Replace(str,' ', '')
I would suggest removing both spaces and tab characters (CHAR(9)):
SELECT REPLACE(REPLACE(str,' ', ''), char(9), '')
Replace(str,' ', '')
worked for me.
Check this out:
select
myString,
test1 = replace(myString, ' ' , ''),
test2 = replace(myString, ' ' , '')
from
(
values
(''),
(' '),
(' xyz'),
('abc '),
('hello world')
) d (myString);
I have a comment column that I need to show in a report.
Sometimes users press Enter multiple times in the comment box. I cannot change the application code, so I need to handle this in SQL only.
So far I have removed the unwanted
1 \r\n
2 \n\n
sequences using
REPLACE(REPLACE([Desc], CHAR(13)+CHAR(10), CHAR(10)),CHAR(10)+CHAR(10), CHAR(10)) as [Desc],
Now I want to remove any \r or \n from the start or end of the string, if present.
Judging by your question, you want to remove CHAR(10) or CHAR(13) from a specific string.
Note: To see the effect in the output, switch the result set output to Results to Text (Ctrl+T).
Use TRIM.
Example: UPDATE tablename SET descriptions = TRIM(TRAILING '<br>' FROM descriptions)
If you want to remove newlines, use something like the following:
SELECT REPLACE(REPLACE(@str, CHAR(13), ''), CHAR(10), '')
or
DECLARE @testString varchar(255)
/*The string ends with a tab character*/
set @testString = 'MY STRING' + CHAR(9)
/*Select the string and try to copy and paste into notepad and tab is still there*/
SELECT testString = @testString
/*Ok, it seems easy, let's try to trim this. Huh, it doesn't work, the same result here.*/
SELECT testStringTrim = RTRIM(@testString)
/*Let's try to get the size*/
SELECT LenOfTestString = LEN(@testString)
/*DATALENGTH counts trailing blank spaces as well, and it counts the tab too*/
SELECT DataLengthOfString= DATALENGTH(@testString)
SELECT ASCIIOfTab = ASCII(CHAR(9))
SELECT CHAR(9)
/*I always use this like a final solution*/
SET @testString = REPLACE(REPLACE(REPLACE(@testString, CHAR(9), ''), CHAR(10), ''), CHAR(13), '')
SELECT @testString
/*
CHAR(9) - Tab
CHAR(10) - New Line
CHAR(13) - Carriage Return
*/
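Note that the question asks to remove \r or \n only at the start or end of the string, while the nested REPLACE calls above remove them everywhere. The difference can be sketched in Python; the function name `strip_edge_newlines` and the sample comment are hypothetical.

```python
def strip_edge_newlines(text: str) -> str:
    """Remove CR/LF characters from both ends only; inner line breaks stay."""
    return text.strip("\r\n")

comment = "\r\n\nfirst line\nsecond line\r\n"

# Edge-only removal keeps the line break between the two lines.
print(repr(strip_edge_newlines(comment)))
# Replacing everywhere, as the REPLACE chain does, joins the lines together.
print(repr(comment.replace("\r", "").replace("\n", "")))
```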
select dbo.trim('abc','c') -- ab
select dbo.trim('abc','a') -- bc
select dbo.trim(' b ',' ') -- b
Create a user-defined function, trim(), that:
trims one character from both sides
trims any single character you pass in: space, \r, \n, etc.
Create FUNCTION Trim
(
@Original varchar(max), @letter char(1)
)
RETURNS varchar(max)
AS
BEGIN
DECLARE @rtrim varchar(max)
SELECT @rtrim = iif(right(@original, 1) = @letter, left(@original,datalength(@original)-1), @original)
return iif( left(@rtrim,1) = @letter, right(@rtrim,datalength(@rtrim)-1),@rtrim)
END
I have a data flow with over 150 columns, many of them of data type string. I need to remove commas and double quotes from the value of every text column because they cause issues when I export the data to CSV. Is there an easy way to do that other than doing it explicitly for every column in a derived column or script component?
In the script generator below, put all your column names from the CSV in the order that you want, and run it.
;With ColumnList as
(
Select 1 Id, 'FirstColumn' as ColumnName
Union Select 2, 'SecondColumn'
Union Select 3, 'ThirdColumn'
Union Select 4, 'FourthColumn'
Union Select 5, 'FifthColumn'
)
Select 'Trim (Replace (Replace (' + ColumnName + ', '','', ''''), ''"'', ''''))'
From ColumnList
Order BY Id
The Id column should contain a proper sequence (I would generate that in Excel). Here is the output:
---------------------------------------------------------
Trim (Replace (Replace (FirstColumn, ',', ''), '"', ''))
Trim (Replace (Replace (SecondColumn, ',', ''), '"', ''))
Trim (Replace (Replace (ThirdColumn, ',', ''), '"', ''))
Trim (Replace (Replace (FourthColumn, ',', ''), '"', ''))
Trim (Replace (Replace (FifthColumn, ',', ''), '"', ''))
(5 row(s) affected)
You could just cut and paste from here into your SSIS dataflow.
I found a way to loop through the columns in a script component; from there I was able to check each column's data type and apply a replace function.
I have columns that contain extra spaces around the data:
example:| fish |
How can I update the column so that my result will be: |fish| ?
In Oracle I can trim the column:
update Example set column1 = trim(column1)
I googled it and noticed that ASE doesn't support trim.
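I found that str_replace(column1, ' ', '') does not actually remove the spaces. As far as I understand, this is because ASE treats the empty string '' as a single space, so each space is replaced with a space.
Switching the '' for null works: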
create table Example (column1 varchar(15))
insert into Example (column1) values ('| fish |')
select * from Example
-- produces "| fish |"
update Example set column1 = str_replace(column1, ' ', null)
select * from Example
-- produces "|fish|"
drop table Example
You can use a combination of rtrim and ltrim
update Example set column1 = rtrim(ltrim(column1))
or str_replace
update Example set column1 = str_replace(column1,' ','')