For example, I have this string in a SQL Server database (viewed in SSMS):
Sent for processing:1 ;DK A/C-WestlySnipes:ACCT NOT FOUND ;DK A/C-SonyaBlade:ACCT NOT FOUND
What I want to be able to do is pull out WestlySnipes and SonyaBlade from within that string and store them in a temp table. The names can differ, and there can be more than one name within any given string.
I've tried using SUBSTRING combined with CHARINDEX, but I can only pull out one value. I'm not sure how to pull out multiple names from the same string when the names can change from string to string.
In SQL Server 2016 and later, STRING_SPLIT(string, delimiter) is a table-valued function. You can split your text string on the ; delimiter like so:
SELECT value FROM STRING_SPLIT(
'Sent for processing:1 ;DK A/C-WestlySnipes:ACCT NOT FOUND ;DK A/C-SonyaBlade:ACCT NOT FOUND',
';')
You get back this:
value
Sent for processing:1
DK A/C-WestlySnipes:ACCT NOT FOUND
DK A/C-SonyaBlade:ACCT NOT FOUND
Then you can use ordinary string functions on value in your query to extract the precise substrings you need. Something like this will work for your specific example:
SELECT
value,
REPLACE(REPLACE(value, 'DK A/C-', ''), ':ACCT NOT FOUND', '') val
FROM STRING_SPLIT(
'Sent for processing:1 ;DK A/C-WestlySnipes:ACCT NOT FOUND ;DK A/C-SonyaBlade:ACCT NOT FOUND',
';')
WHERE value LIKE 'DK %'
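Since the goal is to store the names in a temp table, a minimal sketch could look like the following (the temp table name #ExtractedNames and column size are just assumptions for illustration):
CREATE TABLE #ExtractedNames (Name varchar(100)); -- hypothetical name/size; adjust to your data

INSERT INTO #ExtractedNames (Name)
SELECT REPLACE(REPLACE(value, 'DK A/C-', ''), ':ACCT NOT FOUND', '')
FROM STRING_SPLIT(
'Sent for processing:1 ;DK A/C-WestlySnipes:ACCT NOT FOUND ;DK A/C-SonyaBlade:ACCT NOT FOUND',
';')
WHERE value LIKE 'DK %';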
I read through the Snowflake documentation and the web, and found only one solution to my problem, by Greg Pavlik (https://stackoverflow.com/users/12756381/greg-pavlik), which can be found here: Snowflake JSON to tabular
This doesn't work on data with Russian attribute names and attribute values. What modifications can be made for this to fit my case?
Here is an example:
create or replace table target_json_table(
v variant
);
INSERT INTO target_json_table SELECT parse_json('{
"at": {
"cf": "NV"
},
"pd": {
"мо": "мо",
"ä": "ä",
"retailerName": "retailer",
"productName":"product"
}
}');
call create_view_over_json('target_json_table', 'V', 'MY_VIEW');
ERROR: Encountered an error while creating the view. SQL compilation error: syntax error line 7 at position 7 unexpected 'ä:'. syntax error line 8 at position 7 unexpected 'мо'.
There was a bug in the original SQL used as a basis for the creation of the stored procedure. I have corrected that. You can get an update on the Github page. The changed section is here:
sql =
`
SELECT DISTINCT '"' || array_to_string(split(f.path, '.'), '"."') || '"' AS path_name, -- This generates paths with levels enclosed by double quotes (ex: "path"."to"."element"). It also strips any bracket-enclosed array element references (like "[0]")
DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING') AS attribute_type, -- This generates column datatypes of ARRAY, BOOLEAN, FLOAT, and STRING only
'"' || array_to_string(split(f.path, '.'), '.') || '"' AS alias_name -- This generates column aliases based on the path
FROM
#~TABLE_NAME~#,
LATERAL FLATTEN(#~COL_NAME~#, RECURSIVE=>true) f
WHERE TYPEOF(f.value) != 'OBJECT'
AND NOT contains(f.path, '[') -- This prevents traversal down into arrays
limit ${ROW_SAMPLE_SIZE}
`;
Previously this SQL simply replaced non-ASCII characters with underscores. The updated SQL wraps key names in double quotes, so non-ASCII key names are preserved.
Be sure that's what you want it to do. Also, the keys are nested. I decided that the best way to handle that is to create column names in the view with dot notation; for example, one column name is pd.ä. That requires wrapping the column name in double quotes, such as:
select * from MY_VIEW where "pd.ä" = 'ä';
Final note: the name of your stored procedure is create_view_over_json; however, in the GitHub project the name is create_view_over_variant. When you update, be sure to call the right procedure.
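For example, after updating from the GitHub project, the equivalent call with the arguments from your example would be:
call create_view_over_variant('target_json_table', 'V', 'MY_VIEW');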
I'm having trouble creating a table in Athena that points at files with the following format:
string, string, string, array.
When I wrote the file, I delimited the array items with '|'.
I delimited each line with '\n' and each column with ','.
So, for example, a row in my CSV would look like this:
Garfield, 15, orange, fish|milk|lasagna
In Hive (according to the documentation I read), when creating a table with a delimited row format, you can specify a 'collection items' delimiter that states the delimiter between elements in array columns.
I could not find an equivalent for Presto in the documentation.
Is anyone aware if it's possible? If so, what is the format, or where can I find it?
I tried "guessing" many forms, including 'collection items'; none seem to work.
CREATE EXTERNAL TABLE `cats`(
`name` string,
`age` string,
`color` string,
`foods` array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
COLLECTION ITEMS TERMINATED BY '|'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'some-location'
Would really appreciate any insight. Thanks! :)
According to AWS Athena docs on using SerDe, your guess was 100% correct.
In general, Athena uses the LazySimpleSerDe if you do not specify a ROW FORMAT, or if you specify ROW FORMAT DELIMITED
ROW FORMAT
DELIMITED FIELDS TERMINATED BY ','
ESCAPED BY '\\'
COLLECTION ITEMS TERMINATED BY '|'
MAP KEYS TERMINATED BY ':'
Now, when I simply tried your DDL statement, I got
line 1:8: no viable alternative at input 'create external'
However, by deleting LINES TERMINATED BY '\n', I was able to create the table schema in the meta catalog:
CREATE EXTERNAL TABLE `cats`(
`name` string,
`age` string,
`color` string,
`foods` array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'some-location'
A sample file with lines as shown in your example was parsed correctly, and I was able to UNNEST the foods column:
SELECT *
FROM "cats"
CROSS JOIN UNNEST(foods) as t(food)
which resulted in three rows for the sample line, one per element of foods (fish, milk, lasagna), each alongside Garfield's other columns.
Moreover, it was also enough to simply swap the lines LINES TERMINATED BY '\n' and COLLECTION ITEMS TERMINATED BY '|' for the query to work (although I don't have an explanation for it):
CREATE EXTERNAL TABLE `cats`(
`name` string,
`age` string,
`color` string,
`foods` array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'some-location'
(Note: this answer is applicable to Presto in general, but not to Athena)
Currently you cannot set collection delimiter in Presto.
Please create a feature request at https://github.com/prestosql/presto/issues/
Note, we plan to provide generic support for table properties to address cases like this holistically -- https://github.com/prestosql/presto/issues/954. You can track the issue and associated pull request for updates.
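In the meantime, one possible workaround (a sketch, not an official recommendation) is to declare the array column as a plain varchar and split it at query time with Presto's split function:
-- assumes a hypothetical table cats_raw where foods was declared as varchar
SELECT name, age, color, split(foods, '|') AS foods_array
FROM cats_raw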
I'm using the Presto engine to create a Hive table and setting the delimiter in Presto, for example:
CREATE TABLE IF NOT EXISTS test (
id bigint COMMENT 'ID',
type varchar COMMENT 'TYPE',
content varchar COMMENT 'CONTENT',
create_time timestamp(3) COMMENT 'CREATE TIME',
pt varchar
)
COMMENT 'create time 2021/11/04 11:27:53'
WITH (
format = 'TEXTFILE',
partitioned_by = ARRAY['pt'],
textfile_field_separator = U&'\0001'
)
I recently had a request come through to remove some Agent names from the guest surname field in a client's database.
E.g. 'John Smith -Wotif'
When testing using the following UPDATE statement, the entire field was wiped rather than just the specific string.
UPDATE GUEST
SET SURNAME = REPLACE(' -Wotif',' -Wotif','')
WHERE SURNAME LIKE '% -Wotif'
I've since found that simply using the column name as the string to search makes the full statement work (even though the string to find is already specified in the SET section), but I can't work out how the logic of the original statement effectively says 'wipe these fields entirely'.
Unless specified otherwise, surely the '' replacement only applies to the value contained within the substring, regardless of whether the string and substring match?
The first argument of the REPLACE function is the full string that you want to search, so you should be referencing the SURNAME column rather than passing in part of the string. As written, REPLACE(' -Wotif', ' -Wotif', '') searches the literal ' -Wotif', finds the whole thing, and replaces it with '', so the expression always evaluates to an empty string; that empty string is then assigned to SURNAME for every row matching the WHERE clause, which is why the fields were wiped.
REPLACE(SURNAME,' -Wotif','')
Your UPDATE statement should be like this:
UPDATE GUEST
SET SURNAME = REPLACE(SURNAME, 'FindValue' , 'ReplaceWithValue')
WHERE SURNAME LIKE '% -Wotif'
If you want to find '-Wotif' and replace it with a blank, the UPDATE statement should be like this:
UPDATE GUEST
SET SURNAME = REPLACE(SURNAME, '-Wotif' , '')
WHERE SURNAME LIKE '% -Wotif'
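Before running the UPDATE, it can help to preview what the same REPLACE expression will produce (a sketch reusing the names from the question):
SELECT SURNAME, REPLACE(SURNAME, ' -Wotif', '') AS CleanedSurname
FROM GUEST
WHERE SURNAME LIKE '% -Wotif'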
Context: I'm scraping some XML form descriptions from a Web Services table in hopes of using the form name to identify what the user has inputted as a response. Since this description changes for each step (row) of the process and each product, I want something that can evaluate dynamically.
What I tried: The following was quite useful, but it returns each dynamic attribute query result in its own field, and using a COALESCE to reduce the results to one field would lead to its own complications: Get values from XML tags with dynamically specified data fields
Current Attempt:
I'm using the following code to generate the attribute name that I will use in the next step to query the attribute's value:
case when left([Return], 5) = '<?xml'
then lower(cast([Return] as xml).value('(/response/form/*/@name)[1]','varchar(30)'))
else ''
end as [FormRequest]
And as part of step 2, I have used the STUFF function to try to make the row-level query possible:
case when len(FormRequest)>0
then stuff( ',' + 'cast([tmpFormResponse] as xml).value(''(/wrapper/@' + [FormRequest] + ')[1]'',''varchar(max)'')', 1, 1, '')
else ''
end as [FormResponse]
Instead of seeing 1 returned as my FormResponse field value for the submit attribute, it's returning the query text -- cast([tmpFormResponse] as xml).value('(/wrapper/@submit)[1]','varchar(max)') -- instead of the value that should be queried.
How should I invoke the value() method so that I can dynamically strip out the response per row of XML data in tmpFormResponse, based on the field value in the FormRequest field?
Thanx
You can check this out:
DECLARE @xml XML=
N'<root>
<SomeAttributes a="a" b="b" c="c"/>
<SomeAttributes a="aa" b="bb" c="cc"/>
</root>';
DECLARE @localName NVARCHAR(100)='b';
SELECT sa.value(N'(./@*[local-name()=sql:variable("@localName")])[1]','nvarchar(max)')
FROM @xml.nodes(N'/root/SomeAttributes') AS A(sa)
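If the attribute name lives in a column rather than a variable (as with your FormRequest field), the same pattern should work with sql:column in place of sql:variable. A sketch, assuming a hypothetical table dbo.Forms holding the FormRequest and tmpFormResponse columns from the question:
SELECT t.FormRequest,
CAST(t.tmpFormResponse AS xml).value(
N'(/wrapper/@*[local-name()=sql:column("t.FormRequest")])[1]',
N'nvarchar(max)') AS FormResponse
FROM dbo.Forms AS t -- hypothetical table name; tmpFormResponse holds the raw XML text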
Ended up hacking up a solution to the problem by using PATINDEX and CHARINDEX to look for the value of the [FormRequest] field in the tmpFormResponse field.
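A minimal sketch of that hack (again assuming the hypothetical dbo.Forms table, and that attribute values in the raw XML text are double-quoted):
SELECT
CASE WHEN patindex('%' + FormRequest + '="%', tmpFormResponse) > 0
THEN substring(
tmpFormResponse,
-- start just after name=" (attribute name length plus the two characters =")
patindex('%' + FormRequest + '="%', tmpFormResponse) + len(FormRequest) + 2,
-- run up to the closing double quote
charindex('"', tmpFormResponse,
patindex('%' + FormRequest + '="%', tmpFormResponse) + len(FormRequest) + 2)
- (patindex('%' + FormRequest + '="%', tmpFormResponse) + len(FormRequest) + 2))
ELSE ''
END AS FormResponse
FROM dbo.Forms -- hypothetical table name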
The Problem
I have a number of filename strings that I want to parse into columns using a tilde as the delimiter. The strings take on the static format:
Filepath example C:\My Documents\PDF
Surname example Walker
First Name example Thomas
Birth Date example 19991226
Document Created Datetime example 20180416150322
Document Extension example .pdf
So a full concatenated example would be something like:
C:\My Documents\PDF\Walker~Thomas~19991226~20180416150322.pdf
I want to ignore the file path and extension given in the string and only parse the following values into columns:
Surname, First Name, Birth Date, Document Created Datetime
So something like:
SELECT Surname = --delimitedString[0]
FirstName = --delimitedString[1]
--etc.
What I have tried
I know that I have several tasks to perform in order to split the string: first I would need to trim off the extension and file path so that I'm left with a string delimited by tildes (~).
This is problem one for me; problem two is splitting the new delimited string itself, i.e.
Walker~Thomas~19991226~20180416150322
I've had a good read through this very comprehensive question, and it seems (as I'm using SQL Server 2008 R2) the only options are to use either a function with loops, recursive CTEs, or a very messy attempt using SUBSTRING() with CHARINDEX().
I'm aware that if I had access to SQL Server 2016 I could use STRING_SPLIT, but unfortunately I can't upgrade.
I do have access to SSIS, but I'm very new to it, so I decided to attempt the bulk of the work within a SQL statement.
Here is a way without a splitter that shouldn't be too complicated...
declare @var table (filepath varchar(256))
insert into @var values
('C:\My Documents\PDF\Walker~Thomas~19991226~20180416150322.pdf')
;with string as(
select
x = right(filepath,charindex('\',reverse(filepath))-1)
from @var
)
select
SurName = substring(x,1,charindex('~',x) - 1)
,FirstName = substring(x,charindex('~',x) + 1,charindex('~',x,charindex('~',x) + 1) - charindex('~',x) - 1) -- length runs to the next '~'
from string
I know you mentioned wanting to avoid the charindex() option if at all possible, but I worked it out in a hopefully semi-readable way. I find it somewhat easy to read complex functions like this when I space each parameter on a different line and use indent levels. It's not the most proper looking, but it helps with legibility:
with string as (select 'C:\My Documents\PDF\Walker~Thomas~19991226~20180416150322.pdf' as filepath)
select
substring(
filepath,
len(filepath)-charindex('\',reverse(filepath))+2, --start location, after last '\'
len(filepath)- --length of path
(len(filepath)-charindex('\',reverse(filepath))+2)- --less characters up to last '\'
(len(filepath)-charindex('.',filepath)) --less file extension
)
from string
Fritz already has a great start; my answer just adds on top of it. Replacing the tildes with periods lets PARSENAME split the string: PARSENAME handles up to four period-separated parts, which exactly matches the four values needed here.
with string as (select 'C:\My Documents\PDF\Walker~Thomas~19991226~20180416150322.pdf' as filepath)
, newstr as (
select
REPLACE(substring(
filepath,
len(filepath)-charindex('\',reverse(filepath))+2, --start location, after last '\'
len(filepath)- --length of path
(len(filepath)-charindex('\',reverse(filepath))+2)- --less characters up to last '\'
(len(filepath)-charindex('.',filepath)) --less file extension
) , '~', '.') as new_part
from string
)
SELECT
PARSENAME(new_part,4) as Surname,
PARSENAME(new_part,3) as [First Name],
PARSENAME(new_part,2) as [Birth Date],
PARSENAME(new_part,1) as [Document Created Datetime]
FROM newstr