Extract Quantity and price from text - sql-server

I have this data:
CREATE TABLE #Items (ID INT , Col VARCHAR(300))
INSERT INTO #Items VALUES
(1, 'Dave sold 10 items are sold to ABC servercies at 2.50 each'),
(2, '21 was sold to Tray Limited 3.90 each'),
(3, 'Consulting ordered 15 at 7.11 per one'),
(4, 'Returns from Murphy 7 at a cost of 6.10 for each item')
From Col I want to extract the quantity and the price.
I have written the query below, which extracts the quantity:
SELECT
ID,
Col,
LEFT(SUBSTRING(Col, PATINDEX('%[0-9]%', Col), LEN(Col)),2) AS Qty
FROM #Items
My difficulty is that I don't know how to extract the price.
Expected output

You were already told that storing values inside a string like this is a real no-go.
But if you have to deal with external input, you might try this:
CREATE TABLE #items (ID INT, Col VARCHAR(300));
INSERT INTO #items VALUES
(1, 'Dave sold 10 items are sold to ABC servercies at 2.50 each'),
(2, '21 was sold to Tray Limited 3.90 each'),
(3, 'Consulting ordered 15 at 7.11 per one'),
(4, 'Returns from Murphy 7 at a cost of 6.10 for each item');
SELECT i.ID
,i.Col
,A.Casted.value('/x[not(empty(. cast as xs:int?))][1]','int') AS firstNumberAsInt
,A.Casted.value('/x[not(empty(. cast as xs:decimal?))][2]','decimal(10,4)') AS SecondNumberAsDecimal
FROM #items i
CROSS APPLY(SELECT CAST('<x>' + REPLACE((SELECT i.Col AS [*] FOR XML PATH('')),' ','</x><x>') + '</x>' AS XML)) A(Casted);
The idea in short:
We use some string methods to transform your string into XML, where each word sits in its own <x> element.
We use XQuery's ability to pick only the nodes that satisfy a predicate.
We use the predicate not(empty(. cast as someType)). This returns an element only when its content can be cast to that type. Any other element is omitted.
The result:
+----+------------------------------------------------------------+------------------+-----------------------+
| ID | Col | firstNumberAsInt | SecondNumberAsDecimal |
+----+------------------------------------------------------------+------------------+-----------------------+
| 1 | Dave sold 10 items are sold to ABC servercies at 2.50 each | 10 | 2.5000 |
+----+------------------------------------------------------------+------------------+-----------------------+
| 2 | 21 was sold to Tray Limited 3.90 each | 21 | 3.9000 |
+----+------------------------------------------------------------+------------------+-----------------------+
| 3 | Consulting ordered 15 at 7.11 per one | 15 | 7.1100 |
+----+------------------------------------------------------------+------------------+-----------------------+
| 4 | Returns from Murphy 7 at a cost of 6.10 for each item | 7 | 6.1000 |
+----+------------------------------------------------------------+------------------+-----------------------+
I'm sure you know that there are millions of cases where this kind of parsing will break...
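For readers who want to see the selection logic outside of XQuery, here is a minimal Python sketch of the same idea. It is only an illustration, not part of the T-SQL solution. The function name extract_qty_price and the two regular expressions are assumptions made for this example: split on spaces, take the first token that looks like an integer, then the second token that looks like a decimal (the quantity itself counts as the first one).

```python
import re
from decimal import Decimal

# Hypothetical helpers for illustration; they stand in for the XQuery casts.
INT_RE = re.compile(r"[+-]?\d+")
DEC_RE = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)")

def extract_qty_price(text):
    """Return (quantity, price) using the 'first int, second decimal' rule."""
    tokens = text.split(" ")
    ints = [t for t in tokens if INT_RE.fullmatch(t)]
    decimals = [t for t in tokens if DEC_RE.fullmatch(t)]
    qty = int(ints[0]) if ints else None
    # The quantity is also a valid decimal, so the price is the second match.
    price = Decimal(decimals[1]) if len(decimals) > 1 else None
    return qty, price

print(extract_qty_price("Consulting ordered 15 at 7.11 per one"))
```

The same sketch shows how fragile the rule is: for "Sold ice creams 1.50 each x 10" it returns 10 for both the quantity and the price.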

First things first: DON'T store things like that in a DB and expect to be able just "extract" data. I can give you a solution given the data you have, but it's going to fall down pretty quickly if anyone enters something silly, for example "Sold ice creams 1.50 each x 10" or "Bought 5 sorbets total 20".
What we will do is use CROSS APPLY in series to calculate the positions of each number.
SELECT
ID,
Col,
CAST(SUBSTRING(Col, FirstNum, EndFirst - 1) AS int) AS Qty,
CAST(SUBSTRING(Col, FirstNum + EndFirst + SecondNum - 2, EndSecond) AS decimal(18,2)) AS Price
FROM #Items
CROSS APPLY (VALUES (PATINDEX('%[0-9]%', Col) ) ) v1(FirstNum)
CROSS APPLY (VALUES (PATINDEX('%[^0-9]%', SUBSTRING(Col, FirstNum, LEN(Col))) ) ) v2(EndFirst)
CROSS APPLY (VALUES (PATINDEX('%[0-9.]%', SUBSTRING(Col, FirstNum + EndFirst - 1, LEN(Col))) ) ) v3(SecondNum)
CROSS APPLY (VALUES (PATINDEX('%[^0-9.]%', SUBSTRING(Col, FirstNum + EndFirst - 1 + SecondNum, LEN(Col))) ) ) v4(EndSecond)
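The offset arithmetic in the chained CROSS APPLY is easy to get wrong, so here is a rough Python sketch that replays it step by step. The helpers patindex and substring are hypothetical stand-ins written for this example (1-based positions, 0 when nothing matches); they only approximate the T-SQL functions for simple character classes.

```python
import re
from decimal import Decimal

def patindex(char_class, s):
    """1-based position of the first character matching char_class, 0 if none."""
    m = re.search(char_class, s)
    return m.start() + 1 if m else 0

def substring(s, start, length):
    """T-SQL style SUBSTRING for a 1-based start (assumes start >= 1)."""
    return s[start - 1:start - 1 + length]

def qty_and_price(col):
    first_num = patindex(r"[0-9]", col)                     # v1
    end_first = patindex(r"[^0-9]", col[first_num - 1:])    # v2
    second_num = patindex(r"[0-9.]", col[first_num + end_first - 2:])                 # v3
    end_second = patindex(r"[^0-9.]", col[first_num + end_first + second_num - 2:])   # v4
    qty = int(substring(col, first_num, end_first - 1))
    price = Decimal(substring(col, first_num + end_first + second_num - 2, end_second))
    return qty, price

print(qty_and_price("21 was sold to Tray Limited 3.90 each"))
```

Like the query, the sketch assumes that some text follows each number; a string ending in the price would give end_second = 0.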

Related

Split string of variable length, variable delimiters

Read the question posed here, but mine is a little more complicated.
I have a string that is variable in length, and the delimiter can sometimes be two dashes, or sometimes it can be just one. Let's say in my table the data that I want to break out is stored in a single column like this:
+------------------------------------------+
| Category |
+------------------------------------------+
| Zoo - Animals - Lions |
| Zoo - Personnel |
| Zoo - Operating Costs - Power / Cooling |
+------------------------------------------+
But I want to output the data string from that single column into three separate columns like this:
+----------+--------------------+-----------------+
| Location | Category | Sub-Category |
+----------+--------------------+-----------------+
| Zoo | Animals | Lions |
| Zoo | Personnel | |
| Zoo | Operating Costs | Power / Cooling |
+----------+--------------------+-----------------+
Hoping for some guidance as the samples I've been finding on Google seem to be simpler than this.
You can also use a string splitter. Here is an excellent one that works with your version. DelimitedSplit8K
Now we need some sample data.
create table #Something
(
Category varchar(100)
)
insert #Something values
('Zoo - Animals - Lions')
, ('Zoo - Personnel')
, ('Zoo - Operating Costs - Power / Cooling')
Now that we have a function and sample data the code for this is quite nice and tidy.
select s.Category
, Location = max(case when x.ItemNumber = 1 then Item end)
, Category = max(case when x.ItemNumber = 2 then Item end)
, SubCategory = max(case when x.ItemNumber = 3 then Item end)
from #Something s
cross apply dbo.DelimitedSplit8K(s.Category, '-') x
group by s.Category
And this will return:
Category |Location|Category |SubCategory
Zoo - Animals - Lions |Zoo |Animals |Lions
Zoo - Operating Costs - Power / Cooling |Zoo |Operating Costs|Power / Cooling
Zoo - Personnel |Zoo |Personnel |NULL
Here is a solution that uses solely string functions:
select
left(
category,
charindex('-', category) - 2
) location,
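substring(
category,
charindex('-', category) + 2,
-- length runs up to the second dash when there is one, otherwise to the end of the string
case when charindex('-', category, charindex('-', category) + 1) > 0
then charindex('-', category, charindex('-', category) + 1) - charindex('-', category) - 3
else len(category)
end
) category,
case when charindex('-', category, charindex('-', category) + 1) > 0
then right(category, charindex('-', reverse(category)) - 2)
end sub_category
from t
Demo on DB Fiddle:
location | category | sub_category
:------- | :--------------- | :--------------
Zoo | Animals | Lions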
Zoo | Personnel | null
Zoo | Operating Costs | Power / Cooling
You've tagged this with [sql-server-2017]. That means, that you can use JSON-support (this was introduced with v2016).
Currently JSON is the best built-in approach for position- and type-safe string splitting:
A mockup, to simulate your issue
CREATE TABLE #mockup (ID INT IDENTITY, Category VARCHAR(MAX))
INSERT INTO #mockup (Category)
VALUES ('Zoo - Animals - Lions')
,('Zoo - Personnel')
,('Zoo - Operating Costs - Power / Cooling');
--The query
SELECT t.ID
,A.[Location]
,A.Category
,A.subCategory
FROM #mockup t
CROSS APPLY OPENJSON(CONCAT('[["',REPLACE(t.Category,'-','","'),'"]]'))
WITH ([Location] VARCHAR(MAX) '$[0]'
,Category VARCHAR(MAX) '$[1]'
,SubCategory VARCHAR(MAX) '$[2]') A;
The result (might need some TRIM()ing)
ID Location Category subCategory
1 Zoo Animals Lions
2 Zoo Personnel NULL
3 Zoo Operating Costs Power / Cooling
The idea in short:
We use some simple string operations to transform your string into a JSON array:
a - b - c => [["a "," b "," c"]]
Now we can use OPENJSON() together with a WITH-clause to return each fragment by its position with a fixed type.
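To make the transformation concrete, here is a small Python sketch of the same trick: build the JSON text with a plain string replace, parse it, and read the fragments by position. The function name split_category and the padding to three columns are assumptions for this example; the strip() call plays the role of the TRIM()ing mentioned above.

```python
import json

def split_category(category):
    """Split 'a - b - c' into exactly three positional values (None when missing)."""
    # Same string surgery as the CONCAT/REPLACE in the query.
    json_text = '[["' + category.replace("-", '","') + '"]]'
    parts = [p.strip() for p in json.loads(json_text)[0]]
    parts += [None] * (3 - len(parts))   # pad short rows, like NULL from OPENJSON
    return tuple(parts[:3])

for row in ("Zoo - Animals - Lions",
            "Zoo - Personnel",
            "Zoo - Operating Costs - Power / Cooling"):
    print(split_category(row))
```

As in the SQL version, a value containing a double quote would produce invalid JSON unless it is escaped first.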
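Bit of a hack, but it works. Note that STRING_SPLIT does not guarantee the order of its output rows, so the ROW_NUMBER() below relies on the fragments coming back in string order: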
CREATE TABLE #t (Category VARCHAR(255))
INSERT #t (Category)
VALUES ('Zoo - Animals - Lions'),('Zoo - Personnel'),('Zoo - Operating Costs - Power / Cooling')
;WITH split_vals AS (
SELECT Category AS Cat,TRIM(Value) AS Value,ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Category) AS RowNum
FROM #t
CROSS APPLY STRING_SPLIT(Category,'-')
), cols AS (
SELECT
Cat,
CASE WHEN RowNum = 1 THEN Value END AS Location,
CASE WHEN RowNum = 2 THEN Value END AS Category,
CASE WHEN RowNum = 3 THEN Value END AS [Sub-Category]
FROM split_vals
)
SELECT STRING_AGG(Location, '') AS Location,
STRING_AGG(Category, '') AS Category,
STRING_AGG([Sub-Category], '') AS [Sub-Category]
FROM cols
GROUP BY Cat;

T-SQL - Finding records with chronological gaps

This is my first post here. I'm still a novice SQL user at this point though I've been using it for several years now. I am trying to find a solution to the following problem and am looking for some advice, as simple as possible, please.
I have this 'recordTable' with the following columns related to transactions; 'personID', 'recordID', 'item', 'txDate' and 'daySupply'. The recordID is the primary key. Almost every personID should have many distinct recordID's with distinct txDate's.
My focus is on one particular 'item' for all of 2017. It's expected that once the item daySupply has elapsed for a recordID that we would see a newer recordID for that person with a more recent txDate somewhere between five days before and five days after the end of the daySupply.
What I'm trying to uncover are the number of distinct recordID's where there wasn't an expected new recordID during this ten day window. I think this is probably very simple to solve but I am having a lot of difficulty trying to create a query for it, let alone explain it to someone.
My thought thus far is to create two temp tables. The first temp table stores all of the records associated with the desired items and I'm just storing the personID, recordID and txDate columns. The second temp table has the personID, recordID and the two derived columns from the txDate and daySupply; these would represent the five days before and five days after.
I am trying to find some way to determine the number of recordIDs from the first table that don't have expected refills for that personID in the second. I thought a simple EXCEPT would do this, but I don't think there's any way of getting around a recursive type of statement to answer this, and I have never gotten comfortable with recursive queries.
I searched Stackoverflow and elsewhere but couldn't come up with an answer to this one. I would really appreciate some help from some more clever data folks. Here is the code so far. Thanks everyone!
CREATE TABLE #temp1 (personID VARCHAR(20), recordID VARCHAR(10), txDate DATE)
CREATE TABLE #temp2 (personID VARCHAR(20), recordID VARCHAR(10), startDate DATE, endDate DATE)
INSERT INTO #temp1
SELECT [personID], [recordID], txDate
FROM recordTable
WHERE item = 'desiredItem'
AND txDate > '12/31/16'
AND txDate < '1/1/18';
INSERT INTO #temp2
SELECT [personID], [recordID], (txDate + (daySupply - 5)), (txDate + (daySupply + 5))
FROM recordTable
WHERE item = 'desiredItem'
AND txDate > '12/31/16'
AND txDate < '1/1/18';
I agree with mypetlion that you could have been more concise with your question, but I think I can figure out what you are asking.
SQL Window Functions to the rescue!
Here's the basic idea...
CREATE TABLE #fills(
personid INT,
recordid INT,
item NVARCHAR(MAX),
filldate DATE,
dayssupply INT
);
INSERT #fills
VALUES (1, 1, 'item', '1/1/2018', 30),
(1, 2, 'item', '2/1/2018', 30),
(1, 3, 'item', '3/1/2018', 30),
(1, 4, 'item', '5/1/2018', 30),
(1, 5, 'item', '6/1/2018', 30)
;
SELECT *,
ABS(
DATEDIFF(
DAY,
LAG(DATEADD(DAY, dayssupply, filldate)) OVER (PARTITION BY personid, item ORDER BY filldate),
filldate
)
) AS gap
FROM #fills
ORDER BY filldate;
... outputs ...
+----------+----------+------+------------+------------+------+
| personid | recordid | item | filldate | dayssupply | gap |
+----------+----------+------+------------+------------+------+
| 1 | 1 | item | 2018-01-01 | 30 | NULL |
| 1 | 2 | item | 2018-02-01 | 30 | 1 |
| 1 | 3 | item | 2018-03-01 | 30 | 2 |
| 1 | 4 | item | 2018-05-01 | 30 | 31 |
| 1 | 5 | item | 2018-06-01 | 30 | 1 |
+----------+----------+------+------------+------------+------+
You can insert the results into a temp table and pull out only the ones you want (gap > 5), or use the query above as a CTE and pull out the results without the temp table.
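If it helps to see what the LAG expression computes, here is a rough Python sketch of the same calculation on the sample rows. The function name gaps and the tuple layout are assumptions for this example: for each person and item, every fill is compared with the previous fill's date plus its days of supply.

```python
from datetime import date, timedelta

# (personid, recordid, item, filldate, dayssupply), same rows as the sample above
fills = [
    (1, 1, "item", date(2018, 1, 1), 30),
    (1, 2, "item", date(2018, 2, 1), 30),
    (1, 3, "item", date(2018, 3, 1), 30),
    (1, 4, "item", date(2018, 5, 1), 30),
    (1, 5, "item", date(2018, 6, 1), 30),
]

def gaps(rows):
    """Map recordid -> absolute days between this fill and the previous expected refill."""
    result = {}
    prev_expected = {}  # (personid, item) -> previous filldate + dayssupply
    for personid, recordid, item, filldate, dayssupply in sorted(
            rows, key=lambda r: (r[0], r[2], r[3])):
        expected = prev_expected.get((personid, item))
        result[recordid] = None if expected is None else abs((filldate - expected).days)
        prev_expected[(personid, item)] = filldate + timedelta(days=dayssupply)
    return result

gap_by_record = gaps(fills)
late = [rid for rid, g in gap_by_record.items() if g is not None and g > 5]
print(gap_by_record, late)
```

Note that the gap lands on the late refill (record 4), not on the record that went unrefilled (record 3).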
This could be stated as follows: "Given a set of orders, return the subset for which there is no order within +/- 5 days of the expected resupply date (defined as txDate + daySupply)."
This can be solved simply with NOT EXISTS. Define the range of orders you wish to examine, and this query will find the subset of those orders for which there is no resupply order (NOT EXISTS) within 5 days on either side of the expected resupply date (txDate + daySupply).
SELECT
gappedOrder.personID
, gappedOrder.recordID
, gappedOrder.item
, gappedOrder.txDate
, gappedOrder.daySupply
FROM
recordTable AS gappedOrder
WHERE
gappedOrder.item = 'desiredItem'
AND gappedOrder.txDate > '12/31/16'
AND gappedOrder.txDate < '1/1/18'
--order not refilled within date range tolerance
AND NOT EXISTS
(
SELECT
1
FROM
recordTable AS refilledOrder
WHERE
refilledOrder.personID = gappedOrder.personID
AND refilledOrder.item = gappedOrder.item
--an order cannot count as its own refill (matters when daySupply <= 5)
AND refilledOrder.recordID <> gappedOrder.recordID
--5 days prior to (txDate + daySupply)
AND refilledOrder.txDate >= DATEADD(day, -5, DATEADD(day, gappedOrder.daySupply, gappedOrder.txDate))
--5 days after (txDate + daySupply)
AND refilledOrder.txDate <= DATEADD(day, 5, DATEADD(day, gappedOrder.daySupply, gappedOrder.txDate))
);

Truncate in SQL Server rounding values

I'm looking for a way to truncate or drop extra decimal places in SQL. I've found a way, but I'm having a problem with values that do not have 3 decimal places.
I have the following data
ProductID | Price | Amount
------------+----------+---------
100 | 50.01 | 1
101 | 25 | 0.789
It's very simple, all I need to do is get the total from each product (Price * Amount).
My query:
select
[ProductID],
[Price],
[Amount],
round(SUM(([Price] * [Amount])),2,1) as 'Total'
from
[Tables]
What I get is:
ProductID | Price | Amount | Total
-----------+-----------+-----------+-----------
100 | 50.01 | 1 | 50 <=======
101 | 25 | 0.789 | 19.72
So, if my calculator is working, the result of this simple operation is:
(50.01 * 1) = 50.01
-
(25 * 0.789) = 19.725
-
Question: SQL does the trick of dropping the 5 from 19.725, but why does (50.01 * 1) equal 50?
I do know that if I use Round((value),2,0) I'll get 50.01, but if I do that 19.725 becomes 19.73 and that is not correct for my application.
What can I do to fix this?
If you cast price and amount to either numeric or decimal data type as shown below, you should arrive at the expected result:
CREATE TABLE #Tables
(
ProductID int,
Price float,
Amount float
);
INSERT #Tables
(ProductID, Price, Amount)
VALUES
(100, 50.01, 1),
(101, 25, 0.789);
SELECT ProductID
,Price
,Amount
,ROUND(SUM((CAST(Price AS decimal(5,2)) * CAST(Amount AS decimal(5,3)))),2,1) AS 'Total'
FROM #Tables
GROUP BY ProductID, Price, Amount;
(2 row(s) affected)
ProductID Price Amount Total
----------- ---------------------- ---------------------- ---------------------------------------
100 50.01 1 50.01000
101 25 0.789 19.72000
(2 row(s) affected)
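The likely reason the cast helps is that 50.01 has no exact binary floating-point representation: the stored value sits slightly below 50.01, so truncating at two decimals drops it to 50.00. The Python sketch below illustrates this with the decimal module; it assumes the columns are float, as in the sample table above, and the helper truncate2 is a hypothetical name for this example.

```python
from decimal import Decimal, ROUND_DOWN

def truncate2(value):
    """Drop everything after the second decimal place (no rounding up)."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_DOWN)

as_float = Decimal(50.01)      # exact value of the binary float nearest to 50.01
as_decimal = Decimal("50.01")  # exact decimal value, like decimal(5,2) in SQL

print(as_float)                # shows the digits that are actually stored
print(truncate2(as_float), truncate2(as_decimal))
print(truncate2(Decimal("25") * Decimal("0.789")))
```

With exact decimal inputs the truncation behaves as the question expects: 50.01 stays 50.01 and 19.725 becomes 19.72.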
SELECT ProductID,
Price,
Amount,
CAST(SUBSTRING(CAST(CAST(Price * Amount AS decimal(18,3)) AS VARCHAR),0, LEN(CAST(CAST(Price * Amount AS decimal(18,3)) AS VARCHAR))) AS DECIMAL(18,2)) AS Total
FROM [Tables]

How can I group / window date ordered events delineated by an arbitrary expression?

I would like to group some data together based on dates and some (potentially arbitrary) indicator:
Date | Ind
================
2016-01-02 | 1
2016-01-03 | 5
2016-03-02 | 10
2016-03-05 | 15
2016-05-10 | 6
2016-05-11 | 2
I would like to group together subsequent (date-ordered) rows but breaking the group after Indicator >= 10:
Date | Ind | Group
========================
2016-01-02 | 1 | 1
2016-01-03 | 5 | 1
2016-03-02 | 10 | 1
2016-03-05 | 15 | 2
2016-05-10 | 6 | 3
2016-05-11 | 2 | 3
I did find a promising technique at the end of a blog post: "Use this Neat Window Function Trick to Calculate Time Differences in a Time Series" (the final subsection, "Extra Bonus"), but the important part of the query uses a keyword (FILTER) that doesn't seem to be supported in SQL Server (and a quick Google later and I'm not sure where it is supported!).
I'm still hopeful a technique using a window function might be the answer. I just need a counter that I can add to every row, (like RANK or ROW_NUMBER does) but that only increments when some arbitrary condition evaluates as true. Is there a way to do this in SQL Server?
Here is the solution:
CREATE TABLE #t ([Date] DATETIME, Ind INT)
INSERT INTO #t
VALUES
('2016-01-02', 1),
('2016-01-03', 5),
('2016-03-02', 10),
('2016-03-05', 15),
('2016-05-10', 6),
('2016-05-11', 2)
SELECT [Date],
Ind,
1 + SUM([Group]) OVER(ORDER BY [Date]) AS [Group]
FROM
(
SELECT *,
CASE WHEN LAG(ind) OVER(ORDER BY [Date]) >= 10
THEN 1
ELSE 0
END AS [Group]
FROM #t
) t
Just mark a row with 1 when the previous row's Ind is greater than or equal to 10, else 0. A running sum over those marks then gives you the desired result.
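The flag-then-running-sum idea can be sketched in a few lines of Python, which may help when adapting it to other conditions. The function name assign_groups and the threshold parameter are assumptions for this example; the rows are the sample data from the question.

```python
from itertools import accumulate

rows = [
    ("2016-01-02", 1),
    ("2016-01-03", 5),
    ("2016-03-02", 10),
    ("2016-03-05", 15),
    ("2016-05-10", 6),
    ("2016-05-11", 2),
]

def assign_groups(rows, threshold=10):
    """Return one group number per row, ordered by date."""
    ordered = sorted(rows)  # ISO date strings sort chronologically
    if not ordered:
        return []
    # Flag a row when the PREVIOUS row's indicator reached the threshold (the LAG step).
    flags = [0] + [1 if prev_ind >= threshold else 0 for _, prev_ind in ordered[:-1]]
    # Running sum of the flags, starting the numbering at 1 (the SUM() OVER step).
    return [1 + total for total in accumulate(flags)]

print(assign_groups(rows))
```

Any other break condition can be swapped in by changing how the flags are computed; the running sum stays the same.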
Giving full credit to Giorgi for the idea, but I've modified his answer (both for my benefit and for future readers).
Just change the CASE statement to see if 30 or more days have lapsed since the last record:
CREATE TABLE #t ([Date] DATETIME)
INSERT INTO #t
VALUES
('2016-01-02'),
('2016-01-03'),
('2016-03-02'),
('2016-03-05'),
('2016-05-10'),
('2016-05-11')
SELECT [Date],
1 + SUM([Group]) OVER(ORDER BY [Date]) AS [Group]
FROM
(
SELECT [Date],
CASE WHEN DATEADD(d, -30, [Date]) >= LAG([Date]) OVER(ORDER BY [Date])
THEN 1
ELSE 0
END AS [Group]
FROM #t
) t

SQL Server SUM based on subsequent records

Microsoft SQL Server 2012 (SP1) - 11.0.3156.0 (X64)
I am not sure of the best way to word this and have tried a few different searches with different combinations of words without success.
I only want to sum the Sequence = 1 rows when rows with Sequence > 1 exist for the same ID; in the table below, the qualifying Sequence = 1 rows are marked with *. I don't care at all about checking that Sequences 2, 3, etc. match the same pattern, because if they exist at all I need to sum them.
I have data that looks like this:
| Sequence | ID | Num | OtherID |
|----------|----|-----|---------|
| 1 | 1 | 10 | 1 |*
| 2 | 1 | 15 | 1 |
| 3 | 1 | 20 | 1 |
| 1 | 2 | 10 | 1 |*
| 2 | 2 | 15 | 1 |
| 1 | 3 | 10 | 1 |
| 1 | 1 | 40 | 3 |
I need to sum the Num column but only when there is more than one sequence. My output would look like this:
Sequence Sum OtherID
1 20 1
2 30 1
3 20 1
I have tried grouping the queries in a bunch of different ways but really by the time I get to the sum, I don't know how to look ahead to make sure there are greater than 1 sequences for an ID.
My query at the moment looks something like this:
select Sequence, Sum(Num) as [Sum], OtherID
from tbl
where ID in (Select ID from tbl where Sequence > 1)
Group by Sequence, OtherID
tbl is a CTE that I wrapped around my query and it partially works, but is not really the filter I wanted.
If this is something that just shouldn't be done or can't be done then I can handle that, but if it's something I am missing I'd like to fix the query.
Edit:
I can't give the full query here but I started with this table/data (to get the above output). The OtherID is there because the data has the same ID/Sequence combinations but that OtherID helps separate them out so the rows are not identical (multiple questions on a form).
Create table #tmpTable (ID int, Sequence int, Num int, OtherID int)
insert into #tmpTable (ID, Sequence, Num, OtherID) values (1, 1, 10, 1)
insert into #tmpTable (ID, Sequence, Num, OtherID) values (1, 2, 15, 1)
insert into #tmpTable (ID, Sequence, Num, OtherID) values (1, 3, 20, 1)
insert into #tmpTable (ID, Sequence, Num, OtherID) values (2, 1, 10, 1)
insert into #tmpTable (ID, Sequence, Num, OtherID) values (2, 2, 15, 1)
insert into #tmpTable (ID, Sequence, Num, OtherID) values (3, 1, 10, 1)
insert into #tmpTable (ID, Sequence, Num, OtherID) values (1, 1, 40, 3)
The following will sum over Sequence and OtherID, but only when either the sequence is greater than 1, or there is another row with the same ID and OtherID but a different sequence.
Query:
select Sequence, Sum(Num) as SumNum, OtherID from #tmpTable a
where Sequence > 1
or exists (select * from #tmpTable b
where a.ID = b.ID
and a.OtherID = b.OtherID
and b.Sequence <> a.Sequence)
group by Sequence, OtherID;
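As a cross-check of the filter, here is a small Python sketch that applies the same rule to the sample rows from the question. The function name sum_multi_sequence is hypothetical; the "sibling" test corresponds to the EXISTS subquery.

```python
from collections import defaultdict

# (ID, Sequence, Num, OtherID), the rows inserted into #tmpTable
rows = [
    (1, 1, 10, 1),
    (1, 2, 15, 1),
    (1, 3, 20, 1),
    (2, 1, 10, 1),
    (2, 2, 15, 1),
    (3, 1, 10, 1),
    (1, 1, 40, 3),
]

def sum_multi_sequence(rows):
    """Sum Num per (Sequence, OtherID), skipping IDs that only have one sequence."""
    totals = defaultdict(int)
    for id_, seq, num, other in rows:
        # EXISTS: another row with the same ID and OtherID but a different Sequence
        has_sibling = any(i == id_ and o == other and s != seq for i, s, _, o in rows)
        if seq > 1 or has_sibling:
            totals[(seq, other)] += num
    return dict(totals)

print(sum_multi_sequence(rows))
```

On this data the sketch gives 20, 30 and 20 for sequences 1, 2 and 3 with OtherID 1, which matches the output requested in the question.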
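It looks like you are trying to sum by Sequence and OtherID if the count of ID is greater than 1, so you could do something like the query below. Note that the HAVING clause also drops any group that contains only one row, such as Sequence = 3 in the sample data: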
select Sequence, Sum(Num) as [Sum], OtherID
from tbl
where ID in (Select ID from tbl where Sequence > 1)
Group by Sequence, OtherID
Having count(id)>1

Resources