SQL Server: extract strings between delimiters

I'm querying a column called Description and I need to extract the strings between each of the "-" delimiters, e.g.
Description
---------------------------------
abc#abc.com - Invoice - A12222203
FGH#fgh.com - Credit - C12222333
So ideally I need to extract each segment into three separate columns, e.g.
Email | Doc Type | Ref
------------+----------+----------
abc#abc.com | Invoice | A12222203
FGH#fgh.com | Credit | C12222333
I have managed to extract the email address using
Substring(SL_Reference,0,charindex('-',SL_Reference))Email
Any ideas how I can split the two remaining sections into individual columns (i.e. Doc type and ref)?
Many thanks

There must be hundreds of ways to do this string manipulation; here are a couple.
This uses CROSS APPLY to get the positions of the first and last delimiters, then simple string manipulation to get each part.
with myTable as (
select * from (values('abc#abc.com - Invoice - A12222203'),('FGH#fgh.com - Credit - C12222333'))v(Description)
)
select
Trim(Left(description,h1-1)) Email,
Trim(Substring(description,h1+1,Len(description)-h2-h1-1)) DocType,
Trim(Right(description,h2-1)) Ref
from mytable
cross apply(values(CharIndex('-',description)))v1(h1)
cross apply(values(CharIndex('-',Reverse(description))))v2(h2)
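The position arithmetic above (first delimiter from the left, last delimiter found by running CharIndex on the reversed string) can be sanity-checked outside SQL Server. A minimal Python sketch, assuming the delimiter is "-" and the email part contains no "-"; `str.find`/`str.rfind` stand in for CharIndex and CharIndex-on-Reverse (0-based rather than 1-based), and `split_description` is a hypothetical helper name:

```python
from typing import Tuple


def split_description(description: str, sep: str = "-") -> Tuple[str, str, str]:
    """Split 'email - doc type - ref' at the first and the last separator."""
    h1 = description.find(sep)   # first separator, like CharIndex (but 0-based)
    h2 = description.rfind(sep)  # last separator, what CharIndex on Reverse() locates
    if h1 == -1 or h1 == h2:
        raise ValueError("expected at least two separators")
    email = description[:h1].strip()           # Trim(Left(...))
    doc_type = description[h1 + 1:h2].strip()  # Trim(Substring(...))
    ref = description[h2 + 1:].strip()         # Trim(Right(...))
    return email, doc_type, ref


for row in ["abc#abc.com - Invoice - A12222203", "FGH#fgh.com - Credit - C12222333"]:
    print(split_description(row))
```

Because only the first and last delimiters are used, the middle segment may itself contain "-" (e.g. "x - Credit-Note - y" yields "Credit-Note"), while an email containing "-" would break it, exactly as with the T-SQL version.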
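This splits the string into rows, then conditionally aggregates them back into one row. Note that STRING_SPLIT does not guarantee the order of its output, so numbering the pieces with row_number() over (order by (select null)) is not strictly reliable; on SQL Server 2022 and later you can pass 1 as the third (enable_ordinal) argument and use the returned ordinal column instead.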
with myTable as (
select * from (values('abc#abc.com - Invoice - A12222203'),('FGH#fgh.com - Credit - C12222333'))v(Description)
)
select
max(Iif(rn=1,v,null)) Email,
max(Iif(rn=2,v,null)) Doctype,
max(Iif(rn=3,v,null)) Ref
from mytable
cross apply (
select Trim(value)v,row_number() over(order by (select null)) rn
from String_Split(Description,'-')
)s
group by Description
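SQLite has no STRING_SPLIT, so the sketch below swaps in a recursive CTE that numbers each piece as it is peeled off the string. That makes the position explicit instead of depending on output order; the pieces are then pivoted back with the same max(CASE ...) conditional aggregation. It runs through Python's built-in `sqlite3` module purely so it is executable; the table and column names follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Description TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?)",
    [("abc#abc.com - Invoice - A12222203",), ("FGH#fgh.com - Credit - C12222333",)],
)

query = """
WITH RECURSIVE parts(descr, rn, v, rest) AS (
    -- anchor: no piece yet, whole string (plus a trailing '-') still to split
    SELECT Description, 0, NULL, Description || '-' FROM myTable
    UNION ALL
    -- each step takes the text before the next '-' and gives it the next number
    SELECT descr, rn + 1,
           trim(substr(rest, 1, instr(rest, '-') - 1)),
           substr(rest, instr(rest, '-') + 1)
    FROM parts
    WHERE rest <> ''
)
SELECT max(CASE WHEN rn = 1 THEN v END) AS Email,
       max(CASE WHEN rn = 2 THEN v END) AS DocType,
       max(CASE WHEN rn = 3 THEN v END) AS Ref
FROM parts
GROUP BY descr
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

As in the T-SQL, grouping by the description means two identical descriptions would collapse into one output row; grouping by a key column avoids that.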

Related

Convert SQLite column data into rows in related tables

I have a SQLite database that is currently just one table, imported from a CSV file. Two of the columns are semicolon-separated lists of either categories or tags, imported as TEXT fields. A typical row might look like this:
1 | Article Title | photography;my work | tips;lenses;gear | In this article I'll talk about...
How can I extract the category and tags columns, uniquely insert them into their own respective tables, and then create a relational table to tie them all together? So the end result would be something like:
Content
1 | Article Title | photography;my work | tips;lenses;gear | In this article I will talk about...
Categories
1 | photography
2 | my work
ContentCategories
1 | 1 | 1
2 | 2 | 1
This would effectively convert my one table database into a truly relational database.
I'm hoping this can be done both efficiently and quickly, as this solution will be used on a very large number of rows.
This solution needs to be compatible with SQLite version 3.36 or later.
I believe that the following demonstrates how this can be done. However, it is a two-stage process and covers just the categories; a similar two-stage process could be used for the other columns.
Table/column names may differ.
/* Create Demo Environment */
DROP TABLE IF EXISTS contentcategories;
DROP TABLE IF EXISTS content;
DROP TABLE IF EXISTS category;
CREATE TABLE IF NOT EXISTS content (content_id INTEGER PRIMARY KEY,title TEXT, categories TEXT);
INSERT INTO content (title,categories) VALUES
('Article1','photography;my work;something;another;blah'),
('Article2','photography;their work;not something;not another;not blah'),
('Article3','A;B;C;D;E;F;G;;');
CREATE TABLE IF NOT EXISTS category (category_id INTEGER PRIMARY KEY,category_name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS contentcategories (content_id_map,category_id_map, PRIMARY KEY (content_id_map,category_id_map));
/* Stage 1 populate the category table */
WITH
sep AS (SELECT ';'), /* The value separator */
justincase AS (SELECT 100), /* limiter for the number of iterations */
splt(value,rest) AS
(
SELECT
substr(categories,1,instr(categories,(SELECT * FROM sep))-1),
substr(categories,instr(categories,(SELECT * FROM sep))+1)||(SELECT * FROM sep)
FROM content
UNION ALL SELECT
substr(rest,1,instr(rest,(SELECT * FROM sep))-1),
substr(rest,instr(rest,(SELECT * FROM sep))+1)
FROM splt
WHERE length(rest) > 0
LIMIT (SELECT * FROM justincase) /* just in case limit iterations*/
)
INSERT OR IGNORE INTO category (category_name) SELECT value FROM splt WHERE length(value) > 0;
/* Show the resultant Category table */
SELECT * FROM category;
/* Stage 2 populate the contentcategories mapping table */
WITH
sep AS (SELECT ';'), /* The value separator */
justincase AS (SELECT 100), /* limiter for the number of iterations */
splt(value,rest,contentid,categoryid) AS
(
SELECT
substr(categories,1,instr(categories,(SELECT * FROM sep))-1),
substr(categories,instr(categories,(SELECT * FROM sep))+1)||(SELECT * FROM sep),
content_id,
(SELECT category_id FROM category WHERE category_name = substr(categories,1,instr(categories,(SELECT * FROM sep))-1))
FROM content
UNION ALL SELECT
substr(rest,1,instr(rest,(SELECT * FROM sep))-1),
substr(rest,instr(rest,(SELECT * FROM sep))+1),
contentid,
(SELECT category_id FROM category WHERE category_name = substr(rest,1,instr(rest,(SELECT * FROM sep))-1))
FROM splt
WHERE length(rest) > 0
LIMIT (SELECT * FROM justincase) /* just in case limit iterations */
)
INSERT OR IGNORE INTO contentcategories SELECT contentid,categoryid FROM splt WHERE length(value) > 0;
/* Show the result of content joined via the mapping table with the category table */
SELECT content.*,category.*
FROM content
JOIN contentcategories ON content_id = content_id_map JOIN category ON category_id_map = category_id
;
/* Cleanup Demo Environment */
DROP TABLE IF EXISTS contentcategories;
DROP TABLE IF EXISTS content;
DROP TABLE IF EXISTS category;
So the content table has three rows, each with a varying number of categories.
The first stage uses recursion to split the values, dropping the separators. The separator is coded just once, as a CTE, so it could be passed in; likewise the value that limits the recursion is a CTE and can also be passed in (in SQLite that LIMIT caps the total number of rows the recursive CTE produces across all content rows, so it must be raised, or removed, for a large table).
The resulting CTE (splt) is then used in an INSERT ... SELECT to load the new category table with the extracted categories (INSERT OR IGNORE, together with the UNIQUE constraint on category_name, skips duplicates such as photography).
The second stage then splits the values again, this time looking up the id of each category in the new category table so that the contentcategories mapping table can be loaded.
After each stage a SELECT is used to show the result of the stage (these are included just to demonstrate).
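So when the above is run, the first result (after loading the category table) lists the extracted categories, and the second result shows each content row joined via the mapping table to its categories (result screenshots not included).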
i.e. everything is extracted via the joins as expected (though not thoroughly checked).
Note that the erroneous ;; (i.e. no value between the separators) is discarded by WHERE length(value) > 0.
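If the recursive CTE proves awkward for a very large table, one alternative is to do the splitting in application code and let SQLite handle only the inserts. The sketch below swaps the recursive CTE for Python's `str.split`, reuses the demo schema, and keeps the same two stages (INSERT OR IGNORE into the UNIQUE category table, then a lookup-based insert into the mapping table). It has not been benchmarked; `split_values` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (content_id INTEGER PRIMARY KEY, title TEXT, categories TEXT);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT UNIQUE);
CREATE TABLE contentcategories (
    content_id_map INTEGER, category_id_map INTEGER,
    PRIMARY KEY (content_id_map, category_id_map));
INSERT INTO content (title, categories) VALUES
    ('Article1', 'photography;my work;something;another;blah'),
    ('Article2', 'photography;their work;not something;not another;not blah'),
    ('Article3', 'A;B;C;D;E;F;G;;');
""")


def split_values(text, sep=";"):
    """Split on sep and drop empty pieces (the ';;' case), like length(value) > 0."""
    return [v for v in (text or "").split(sep) if v]


with conn:  # run both stages in one transaction
    # Materialize (content_id, category_name) pairs before inserting anything
    pairs = [
        (content_id, name)
        for content_id, cats in conn.execute("SELECT content_id, categories FROM content")
        for name in split_values(cats)
    ]
    # Stage 1: UNIQUE + OR IGNORE keeps one row per category name
    conn.executemany(
        "INSERT OR IGNORE INTO category (category_name) VALUES (?)",
        [(name,) for _, name in pairs],
    )
    # Stage 2: look up each category id and record the mapping
    conn.executemany(
        "INSERT OR IGNORE INTO contentcategories "
        "SELECT ?, category_id FROM category WHERE category_name = ?",
        pairs,
    )

print(conn.execute("SELECT count(*) FROM category").fetchone()[0])
print(conn.execute("SELECT count(*) FROM contentcategories").fetchone()[0])
```

The same pattern would apply to the tags column with its own tag and mapping tables.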

Max Value with unique values in more than one column

I feel like I'm missing something really obvious here.
Using T-SQL/SQL-Server:
I have unique values in more than one column but want to select the max version based on one particular column.
Dataset:
Example
ID | Name| Version | Code
------------------------
1 | Car | 3 | NULL
1 | Car | 2 | 1000
1 | Car | 1 | 2000
Goal: I want my query to select only the row with the highest version. Running MAX on the Version column still returns all three rows because of the distinct values in the Code column:
SELECT ID
,Name
,MAX(Version)
,Code
FROM Table
GROUP BY ID, Name, Code
The net result is that I get all three entries as per the data set due to the unique values in the Code column, but I only want the top row (Version 3).
Any help would be appreciated.
You need to find the highest version per ID and Name in one query, then use an outer query that joins back to the table to pull out all the fields for that row, like so:
SELECT t.ID, t.Name, GRP.Version, t.Code
FROM (
SELECT ID
,Name
,MAX(Version) as Version
FROM Table
GROUP BY ID, Name
) GRP
INNER JOIN Table t on GRP.ID = t.ID and GRP.Name = t.Name and GRP.Version = t.Version
You can also use row_number() to do this kind of logic, for example like this:
select ID, Name, Version, Code
from (
select *, row_number() over (partition by ID, Name order by Version desc) as RN
from Table1
) X where RN = 1
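Both answers can be compared on data with more than one ID. The sketch below runs them through Python's `sqlite3` (window functions need SQLite 3.25 or later) with a hypothetical second ID added, and partitions row_number() by ID and Name so each group keeps its own top row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, Name TEXT, Version INTEGER, Code INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [(1, "Car", 3, None), (1, "Car", 2, 1000), (1, "Car", 1, 2000),
     (2, "Bike", 1, 500), (2, "Bike", 2, 600)],  # ID 2 is hypothetical extra data
)

# Approach 1: MAX(Version) per (ID, Name), then join back for the other columns
join_back = conn.execute("""
    SELECT t.ID, t.Name, grp.Version, t.Code
    FROM (SELECT ID, Name, MAX(Version) AS Version
          FROM Table1 GROUP BY ID, Name) AS grp
    JOIN Table1 AS t
      ON t.ID = grp.ID AND t.Name = grp.Name AND t.Version = grp.Version
    ORDER BY t.ID
""").fetchall()

# Approach 2: number rows within each (ID, Name) group, highest version first
ranked = conn.execute("""
    SELECT ID, Name, Version, Code
    FROM (SELECT *, row_number() OVER (PARTITION BY ID, Name
                                       ORDER BY Version DESC) AS rn
          FROM Table1) AS x
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

print(join_back)
print(ranked)
```

The two differ only on ties: the join-back query returns every row that shares the maximum version, while row_number() keeps exactly one of them.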
Add TOP 1 to force the return of a single row, and add an ORDER BY clause (this returns one row overall, so it only suits data with a single ID and Name):
SELECT top 1 ID
,Name
,MAX(Version)
,Code
FROM Table
GROUP BY ID, Name, Code
order by max(version) desc

TSQL - View with cross apply and pivot

This is my base table:
docID | rowNumber | Column1 | Column2 | Column3
I use CROSS APPLY and PIVOT to transform the records in Column1 into actual columns and use the values in Column2 and Column3 as records for the new columns. In my fiddle you can see the base and transformed SELECT statements.
I have columns like Plant and Color which are numbered, e.g. Plant1, Plant2, Plant3, Color1, Color2 etc.
For each plant listed across the Plant columns, I want to create a new row with a comma-separated list of colors in one single column.
What I want to achieve is also shown in the screenshot below:
This should become a view to use in Excel. How do I need to modify the view to get to the desired result?
Additional question: the Length column is numeric. Is there any way for a user to switch the decimal separator from within Excel and apply it to this (or all) numeric columns so that Excel recognizes the values as numbers?
I used to have an old PHP web query where I would pass the separator from a dropdown cell in Excel as a parameter.
Thank you.
First off, man, the way your data is stored is a mess. I would recommend reading up on good data structures and fixing yours if you can. Here's a T-SQL query that gets you the data in the correct format.
WITH CTE_no_nums
AS
(
SELECT docID,
CASE
WHEN PATINDEX('%[0-9]%',column1) > 0
THEN SUBSTRING(column1,0,PATINDEX('%[0-9]%',column1))
ELSE column1
END AS cols,
COALESCE(column2,column3) AS vals
FROM miscValues
WHERE column2 IS NOT NULL
OR column3 IS NOT NULL
),
CTE_Pivot
AS
(
SELECT docID,partNumber,prio,[length],material
FROM CTE_no_nums
PIVOT
(
MAX(vals) FOR cols IN (partNumber,prio,[length],material)
) pvt
)
SELECT A.docId + ' # ' + B.vals AS [DocID # Plant],
A.docID,
A.partNumber,
A.prio,
B.vals AS Plant,
A.partNumber + '#' + A.material + '#' + A.[length] AS Identification,
A.[length],
SUBSTRING(CA.colors,0,LEN(CA.colors)) colors --substring removes last comma
FROM CTE_Pivot A
INNER JOIN CTE_no_nums B
ON A.docID = B.docID
AND B.cols = 'Plant'
CROSS APPLY ( SELECT vals + ','
FROM CTE_no_nums C
WHERE cols = 'Color'
AND C.docID = A.docID
FOR XML PATH('')
) CA(colors)
Results:
DocID # Plant docID partNumber prio Plant Identification length colors
---------------- ------ ---------- ---- ---------- ------------------ ------- -------------------------
D0001 # PlantB D0001 X001 1 PlantB X001#MA123#10.87 10.87 white,black,blue
D0001 # PlantC D0001 X001 1 PlantC X001#MA123#10.87 10.87 white,black,blue
D0002 # PlantA D0002 X002 2 PlantA X002#MA456#16.43 16.43 black,yellow
D0002 # PlantC D0002 X002 2 PlantC X002#MA456#16.43 16.43 black,yellow
D0002 # PlantD D0002 X002 2 PlantD X002#MA456#16.43 16.43 black,yellow
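The query does three things: strip the trailing number from Column1 (Plant2 becomes Plant), pivot the single-valued fields, and emit one row per plant with the colors concatenated. A Python sketch of that logic, assuming hypothetical base rows (the fiddle's data is not reproduced here) shaped to match D0001 in the results above; `base_name` and `reshape` are illustrative names:

```python
import re
from collections import defaultdict

# Hypothetical base rows (docID, column1, column2, column3); not from the fiddle.
sample_rows = [
    ("D0001", "partNumber", "X001", None),
    ("D0001", "prio", "1", None),
    ("D0001", "material", "MA123", None),
    ("D0001", "length", None, "10.87"),
    ("D0001", "Plant1", "PlantB", None),
    ("D0001", "Plant2", "PlantC", None),
    ("D0001", "Color1", "white", None),
    ("D0001", "Color2", "black", None),
    ("D0001", "Color3", "blue", None),
]


def base_name(column1):
    # Like SUBSTRING(column1, 0, PATINDEX('%[0-9]%', column1)):
    # keep everything before the first digit ("Plant2" -> "Plant").
    return re.match(r"[^0-9]*", column1).group(0)


def reshape(rows):
    scalars = defaultdict(dict)  # docID -> {column: value}, the PIVOT step
    plants = defaultdict(list)   # docID -> plant names
    colors = defaultdict(list)   # docID -> color names
    for doc_id, column1, column2, column3 in rows:
        val = column2 if column2 is not None else column3  # COALESCE(column2, column3)
        if val is None:
            continue  # WHERE column2 IS NOT NULL OR column3 IS NOT NULL
        col = base_name(column1)
        if col == "Plant":
            plants[doc_id].append(val)
        elif col == "Color":
            colors[doc_id].append(val)
        else:
            scalars[doc_id][col] = val  # last value wins (the T-SQL uses MAX)
    result = []
    for doc_id, plant_list in plants.items():
        s = scalars[doc_id]
        for plant in plant_list:  # one row per plant, like the INNER JOIN on 'Plant'
            result.append({
                "DocID # Plant": f"{doc_id} # {plant}",
                "docID": doc_id,
                "partNumber": s["partNumber"],
                "prio": s["prio"],
                "Plant": plant,
                "Identification": f"{s['partNumber']}#{s['material']}#{s['length']}",
                "length": s["length"],
                "colors": ",".join(colors[doc_id]),  # the FOR XML PATH list
            })
    return result


for row in reshape(sample_rows):
    print(row)
```

One difference worth knowing: FOR XML PATH without an ORDER BY does not guarantee the order of the colors, whereas this sketch keeps input order.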

T-SQL trying to determine the largest string from a set of concatenated strings in a database

I have two tables. One has an Order number, and details about the order:
CREATE TABLE #Order ( OrderID int )
and the second contains comments about the order:
CREATE TABLE #OrderComments ( OrderID int,
Comment VarChar(500) )
Order ID Comments
~~~~~~~~ ~~~~~~~~
1 Loved this item!
1 Could use some work
1 I've had better
2 Try the veal
I'm tasked with determining the maximum length of the output, then returning output like the following:
Order ID Comments Length
~~~~~~~~ ~~~~~~~~ ~~~~~~
1 Loved this item! | Could use some work | I've had better 56
2 Try the veal 12
So, in this example, if this is all of the data, I'm looking for "56".
The main purpose is to determine the maximum length of all comments when appended together, including the | delimiter. This will be used when constructing the table this output will be put into, to determine whether the data fits within the 8,060-byte row size limit or whether we need varchar(max) or text to hold it.
I have tried a couple of subqueries that can generate this output to variables, but I haven't found one yet that could generate the above output. If I could get that, then I could just do a SELECT TOP 1 ... ORDER BY 3 DESC to get the number I'm looking for.
To find the length of the longest string you would get by trimming and concatenating all the (non-null) comments belonging to an OrderId with a three-character delimiter, you can use:
SELECT TOP(1) SUM(LEN(Comment)) + 3* (COUNT(Comment) - 1) AS Length
FROM #OrderComments
GROUP BY OrderId
ORDER BY Length DESC
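The formula works because n comments joined with a three-character delimiter contain exactly n - 1 delimiters. A quick Python check against the sample data in the question, building the real concatenated strings for comparison (note that T-SQL LEN ignores trailing spaces while Python's `len` does not; the sample has none):

```python
from collections import defaultdict

comments = [
    (1, "Loved this item!"),
    (1, "Could use some work"),
    (1, "I've had better"),
    (2, "Try the veal"),
]
DELIM = " | "

lengths = defaultdict(int)  # SUM(LEN(Comment)) per order
counts = defaultdict(int)   # COUNT(Comment) per order
for order_id, comment in comments:
    if comment is not None:
        lengths[order_id] += len(comment)
        counts[order_id] += 1

# Formula: total comment length + delimiter length * (number of comments - 1)
predicted = {oid: lengths[oid] + len(DELIM) * (counts[oid] - 1) for oid in counts}

# Cross-check by actually concatenating
actual = {
    oid: len(DELIM.join(c for o, c in comments if o == oid and c is not None))
    for oid in counts
}

print(predicted)
print(max(predicted.values()))
```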
To actually do the concatenation, you can use FOR XML PATH, as demonstrated in many other answers on this site.
WITH O AS
(
SELECT DISTINCT OrderID
FROM #Order
)
SELECT O.OrderID,
LEFT(y.Comments, LEN(y.Comments) - 1) AS Comments
FROM O
CROSS APPLY (SELECT ltrim(rtrim(Comment)) + ' | '
FROM #OrderComments oc
WHERE oc.OrderID = O.OrderID
AND Comment IS NOT NULL
FOR XML PATH(''), TYPE) x (Comments)
CROSS APPLY (SELECT x.Comments.value('.', 'VARCHAR(MAX)')) y(Comments)
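The same concatenate-then-measure idea in SQLite, where group_concat(X, separator) replaces the FOR XML PATH trick (on SQL Server 2017 and later, STRING_AGG fills the same role). A sketch run through Python's `sqlite3`, with the table name simplified to OrderComments; group_concat does not guarantee piece order, but the length is the same either way:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderComments (OrderID INTEGER, Comment TEXT)")
conn.executemany(
    "INSERT INTO OrderComments VALUES (?, ?)",
    [(1, "Loved this item!"), (1, "Could use some work"),
     (1, "I've had better"), (2, "Try the veal")],
)

rows = conn.execute("""
    SELECT OrderID,
           group_concat(trim(Comment), ' | ') AS Comments,
           length(group_concat(trim(Comment), ' | ')) AS Length
    FROM OrderComments
    WHERE Comment IS NOT NULL
    GROUP BY OrderID
    ORDER BY Length DESC
""").fetchall()
for row in rows:
    print(row)
```

The first row then carries the maximum length the question is after.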
All you need is the STUFF function and FOR XML PATH.
Check out this SQL Fiddle:
http://www.sqlfiddle.com/#!3/65cc6/5
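With STUFF the usual pattern is to emit the delimiter before each comment, then delete the leading one with STUFF(..., 1, 3, ''). That avoids the LEN(...) - 1 trimming, which in T-SQL is complicated by LEN ignoring trailing spaces. A Python sketch of the pattern; `stuff` is a hypothetical stand-in for T-SQL STUFF (1-based start) and does not model its NULL cases:

```python
def stuff(s: str, start: int, length: int, replacement: str) -> str:
    """Delete `length` characters at 1-based `start` and insert `replacement`."""
    return s[:start - 1] + replacement + s[start - 1 + length:]


comments = ["Loved this item!", "Could use some work", "I've had better"]
DELIM = " | "

# FOR XML PATH('') concatenates the rows; each row is emitted as DELIM + comment
concatenated = "".join(DELIM + c for c in comments)
# STUFF(..., 1, LEN(DELIM), '') then removes the leading delimiter
result = stuff(concatenated, 1, len(DELIM), "")
print(result)
```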

Joining a table based on comma separated values

How can I join two tables, where one of the tables has multiple comma-separated values in one column that reference an id in the other table?
1st table
Name | Course Id
====================
Zishan | 1,2,3
Ellen | 2,3,4
2nd table
course id | course name
=======================
1 | java
2 | C++
3 | oracle
4 | dot net
Maybe this ugliness (I have not checked the results):
select names.name, courses.course_name
from names inner join courses
on ',' + names.course_ids + ',' like '%,' + cast(courses.course_id as nvarchar(20)) + ',%'
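Wrapping both the list and the id in commas is what stops id 1 from matching inside a value such as 12. The sketch below runs the same join in SQLite via Python's `sqlite3` (SQLite concatenates with `||` rather than `+`); it assumes the lists contain no spaces, and a leading-wildcard LIKE cannot use an index on course_ids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE names (name TEXT, course_ids TEXT);
CREATE TABLE courses (course_id INTEGER, course_name TEXT);
INSERT INTO names VALUES ('Zishan', '1,2,3'), ('Ellen', '2,3,4');
INSERT INTO courses VALUES (1, 'java'), (2, 'C++'), (3, 'oracle'), (4, 'dot net');
""")

rows = conn.execute("""
    SELECT n.name, c.course_name
    FROM names AS n
    JOIN courses AS c
      -- ',1,2,3,' LIKE '%,2,%' : commas on both sides make the match exact
      ON ',' || n.course_ids || ',' LIKE '%,' || c.course_id || ',%'
    ORDER BY n.name, c.course_id
""").fetchall()
for row in rows:
    print(row)
```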
First of all, your database structure is not normalized, and it should be. Since it is already set up this way, here's how to solve the issue.
You'll need a function to split your string first:
CREATE FUNCTION SPLIT_STRING(str VARCHAR(255), delim VARCHAR(12), pos INT) RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(str, delim, pos),
CHAR_LENGTH(SUBSTRING_INDEX(str, delim, pos-1)) + 1), delim, '');
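To see what the function returns for each position, here is a Python port of the same expression. `substring_index` is a hypothetical stand-in for MySQL SUBSTRING_INDEX that only models non-negative counts (the part before the count-th delimiter, or the whole string when there are fewer delimiters):

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Everything before the count-th occurrence of delim (count >= 0 only)."""
    if count < 0:
        raise ValueError("negative counts (from the right) are not modelled")
    if count == 0:
        return ""
    return delim.join(s.split(delim)[:count])


def split_string(s: str, delim: str, pos: int) -> str:
    """Port of SPLIT_STRING: the pos-th item (1-based), or '' past the end."""
    upto = substring_index(s, delim, pos)        # items 1..pos
    before = substring_index(s, delim, pos - 1)  # items 1..pos-1
    # SUBSTRING(upto, <length of before in characters> + 1), then drop the delimiter
    return upto[len(before):].replace(delim, "")


print([split_string("1,2,3", ",", p) for p in (1, 2, 3, 4)])
```

Asking for a position past the end gives an empty string, which is why a fixed number of output columns needs extra handling when lists vary in length.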
Then you'll need to create a view in order to make up for your structure:
CREATE VIEW database.viewname AS
SELECT Name,
SPLIT_STRING(CourseID, ',', 1) as firstField,
SPLIT_STRING(CourseID, ',', 2) as secondField,
SPLIT_STRING(CourseID, ',', 3) as thirdField
FROM 1stTable;
The third argument is the position of the item in your list.
Now that you have a view that generates your separated fields, you can join to it normally; just use your view like you would use a table:
SELECT *
FROM yourView
JOIN table2 ON table2.field IN (yourView.firstField, yourView.secondField, yourView.thirdField)
However, since you probably won't always have exactly three values in the Course Id field of the first table, you'll need to tweak it a little more.
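One way to avoid a fixed number of fields is to turn each list into rows first (the approach in the first question linked below) and then join normally. A sketch in SQLite via Python's `sqlite3`, using a recursive CTE in place of the MySQL function; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (name TEXT, course_ids TEXT);
CREATE TABLE table2 (course_id INTEGER, course_name TEXT);
INSERT INTO table1 VALUES ('Zishan', '1,2,3'), ('Ellen', '2,3,4');
INSERT INTO table2 VALUES (1, 'java'), (2, 'C++'), (3, 'oracle'), (4, 'dot net');
""")

rows = conn.execute("""
    WITH RECURSIVE split(name, item, rest) AS (
        SELECT name, NULL, course_ids || ',' FROM table1
        UNION ALL
        SELECT name,
               substr(rest, 1, instr(rest, ',') - 1),  -- next item
               substr(rest, instr(rest, ',') + 1)      -- remainder of the list
        FROM split
        WHERE rest <> ''
    )
    SELECT s.name, t.course_name
    FROM split AS s
    JOIN table2 AS t ON t.course_id = CAST(s.item AS INTEGER)
    ORDER BY s.name, t.course_id
""").fetchall()
for row in rows:
    print(row)
```

Lists of any length are handled, since the recursion simply runs once per item.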
Inspiration for my answer came from:
SQL query to split column data into rows
and
Equivalent of explode() to work with strings in MySQL
SELECT f.name, s.course_name FROM table1 AS f
INNER JOIN table2 AS s ON FIND_IN_SET(s.course_id, f.course_id) > 0
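FIND_IN_SET(str, strlist) is MySQL's built-in for testing membership in a comma-separated list: it returns the 1-based position of str, or 0 when it is absent. A Python sketch of that behaviour for non-NULL arguments (ignoring collation and commas inside str), used as a nested-loop join over the sample tables; `find_in_set` is a hypothetical helper:

```python
def find_in_set(needle: str, haystack: str) -> int:
    """1-based position of needle in a comma-separated list, 0 if absent."""
    if haystack == "":
        return 0
    items = haystack.split(",")
    return items.index(needle) + 1 if needle in items else 0


students = [("Zishan", "1,2,3"), ("Ellen", "2,3,4")]
courses = [(1, "java"), (2, "C++"), (3, "oracle"), (4, "dot net")]

# Equivalent of: ON FIND_IN_SET(course_id, course_ids) > 0
joined = [
    (name, course_name)
    for name, ids in students
    for course_id, course_name in courses
    if find_in_set(str(course_id), ids) > 0
]
print(joined)
```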
Use the query below:
Select * from table_2 t2 INNER JOIN table_1 t1 on t1.`Course Id` = t2.`course id`
