Creating formula in SQL server

Maybe some of you could help me create a formula in SQL. I need to calculate the result of the expression for each given formula. The notation of the formula is simple: P (X) means that the expression adds the integer X in parentheses. M (Y) means that the expression subtracts the integer Y in parentheses. The “+” symbol combines the elements of the formula.
Example: given the formula P (10) + M (5) + M (3) + P (1), it translates to 10 - 5 - 3 + 1 = 3.
The result should look like this:

Those easy formulas can be done in Microsoft SQL Server.
Split the formula into its parts with STRING_SPLIT (available from SQL Server 2016), using + as the separator.
Use REPLACE to apply a negative number sign: P(X) --> X and M(X) --> -X.
Use CONVERT to turn the string parts into numbers.
Add everything up with a SUM aggregation and group by clause.
Sample data
create table input
(
formula nvarchar(50)
);
insert into input (formula) values
('P(10)+M(5)+M(3)+P(1)'),
('P(7)+M(3)+M(4)');
Solution
select i.formula,
sum(convert(int, replace(replace(replace(s.value,'P(', ''),'M(','-'),')',''))) as rez
from input i
cross apply string_split(i.formula, '+') s
group by i.formula;
Result
formula rez
-------------------- ---
P(10)+M(5)+M(3)+P(1) 3
P(7)+M(3)+M(4) 0
Fiddle to see everything in action with intermediate steps.
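To sanity-check the expected results outside the database, the same split / replace / convert / sum steps can be mirrored in a few lines of Python. This is only a sketch: the function name evaluate_formula is made up for illustration, and it assumes every term has the form P(n) or M(n), with optional spaces as in the question.

```python
import re

def evaluate_formula(formula: str) -> int:
    """Sum the terms of a formula like 'P(10)+M(5)': P adds, M subtracts."""
    total = 0
    for term in formula.split("+"):                  # same role as STRING_SPLIT
        match = re.fullmatch(r"\s*([PM])\s*\(\s*(\d+)\s*\)\s*", term)
        if match is None:
            raise ValueError(f"unexpected term: {term!r}")
        sign = 1 if match.group(1) == "P" else -1    # same role as the REPLACE calls
        total += sign * int(match.group(2))          # same role as CONVERT and SUM
    return total

print(evaluate_formula("P(10)+M(5)+M(3)+P(1)"))  # 3
print(evaluate_formula("P(7)+M(3)+M(4)"))        # 0
```

Unlike the REPLACE chain, this version rejects malformed terms instead of failing later during conversion, which may or may not be what you want.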

Related

Convert Excel formulae (logic) to SQL Server

I'm looking for help on converting an Excel formula to SQL Server.
=If(AND(N3="A", R3>O3),
R3,If(AND(N3="P",S3>O3),S3,If(N3="D","",If(OR(Q3="P",Q3="A")*AND(P3>TODAY(),P3>O3),P3,O3))))
This is the SQL I tried. Columns N and Q are varchar and the other fields are datetime in SQL Server. In the SQL statement below, I also tried replacing the AND shown in bold with an OR condition. When I use AND (bold) I get the right data in some cases; when I use OR, I get the right data in other cases. Here is the database structure with insert statements.
https://www.db-fiddle.com/f/iHYxufV2NuyXwHeBM832NS/4
create table dbo.test (id int, N varchar(10), O datetime, P datetime, Q varchar(10), R datetime, S datetime)
select case when N='A' and R>O THEN R
when N='P' and S>O then S
when N='D' then ''
when (Q='P' or Q='Á') **and** p>getdate() and P>O then P else O end data
from test
Output for the fiddle above:
id-data
1-2020-11-20 00:00:00
2-2021-02-15 00:00:00
3-2021-04-11 00:00:00
4-2021-04-16 00:00:00
5-2021-04-30 00:00:00

The problem is * is a multiplication operator, but both sides of the expression are boolean rather than numeric. I think what's going on is Excel is converting the boolean true/false values to 1 and 0 for the multiplication operation.
If this is correct, then AND is the correct operator and almost everything else in the translation is correct.
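The claim that OR(...)*AND(...) behaves like a plain AND can be checked exhaustively. The sketch below uses Python booleans, which also multiply as 1 and 0, as a stand-in for Excel's coercion; that equivalence with Excel is an assumption, and the function and parameter names are invented for illustration.

```python
from itertools import product

def excel_style(q_is_p, q_is_a, p_after_today, p_after_o):
    # OR(...) * AND(...): booleans multiply as 1/0, a non-zero result counts as TRUE
    return bool((q_is_p or q_is_a) * (p_after_today and p_after_o))

def sql_style(q_is_p, q_is_a, p_after_today, p_after_o):
    # (Q = 'P' OR Q = 'A') AND P > today AND P > O
    return (q_is_p or q_is_a) and p_after_today and p_after_o

# Compare both versions over all 16 combinations of the four conditions.
mismatches = [
    combo
    for combo in product([False, True], repeat=4)
    if excel_style(*combo) != sql_style(*combo)
]
print(mismatches)  # []
```

An empty list means the multiplication never disagrees with AND, so replacing it with OR changes the logic.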
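There is one other mistake. when N='D' then '' is wrong, because the other result values all appear to be DateTime columns. A CASE expression returns a single data type, so you can't mix strings and dates: SQL Server implicitly converts '' to datetime and you get 1900-01-01 rather than an empty value. Instead, you need when N='D' then NULL.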
CASE WHEN N = 'A' AND R > O THEN R
WHEN N = 'P' AND S > O THEN S
WHEN N = 'D' THEN NULL
WHEN Q IN ('P', 'A') AND P > current_timestamp AND P > O THEN P
ELSE O END
If you really need an empty string, you can convert the result and coalesce to empty string at a different level, but don't do it inside the CASE expression.
It's also worth noting the DateTime/String mismatch could entirely explain the strange results. If you have a sample somewhere for testing with the columns represented as Varchar values instead of Date or DateTime, then the comparisons could be wrong, throwing off the results. For example, O comes before S in the third row of sample data if they are compared as strings instead of dates.

If this is your verbatim code, you have an accented A in this line:
when (Q='P' or Q='Á') and p>getdate() and P>O then P else O end data
Unless the column uses an accent-insensitive collation, A and Á are not equivalent, so that comparison may never be true, and the branch would fail to return values for any Q = 'A' rows that aren't handled further up in the logic.
Other than that your logic looks equivalent. The use of OR(...)*AND(...) is odd but does produce a 1/0 value, and your conversion into SQL has the correct boolean operators to match that logic.

How to get max Value Unit from field Value Unit based on first value before comma separated?

I work on SQL Server 2012 and face an issue: I can't get the row with the maximum Value Unit, based on the first value before the comma.
As an example, take this Value Unit:
1.89, 2.625, 3.465
I take the first value before the comma, 1.89, and if this number is the maximum, I return the full Value Unit it belongs to.
create table #finaltable
(
partid int,
ValueUnit nvarchar(50)
)
insert into #finaltable(partid,ValueUnit)
values
(2532,'1.71, 2.375, 3.135'),
(2532,'1.89, 2.625, 3.465')
select * from #finaltable
How can I get the maximum ValueUnit based on the first value before the comma?
Expected result:
1.89, 2.625, 3.465
Because 1.89 is greater than 1.71, the full value is returned.

I agree with the comments, your design is bad. For more on that, you should also read "Is storing a delimited list in a database column really that bad?".
But well, you can use patindex() to get the position of the comma and then extract the first number representation with left(). convert() it to some decimal, order by it and take the TOP 1 row.
SELECT TOP 1
*
FROM #finaltable
ORDER BY convert(decimal(4, 3), left(valueunit, patindex('%,%', valueunit) - 1)) DESC;
You may need to tweak the conversion to a decimal. I don't know what maximum length and precision you may need.
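As a cross-check of the expected result, here is the same idea (take the text before the first comma, convert it, keep the row with the largest value) as a small Python sketch. The helper name first_value is hypothetical; the rows are the sample data from the question.

```python
from decimal import Decimal

rows = [
    (2532, "1.71, 2.375, 3.135"),
    (2532, "1.89, 2.625, 3.465"),
]

def first_value(value_unit: str) -> Decimal:
    # Text before the first comma; the whole string if there is no comma.
    return Decimal(value_unit.split(",", 1)[0].strip())

# Equivalent of ORDER BY <first value> DESC with TOP 1.
best = max(rows, key=lambda row: first_value(row[1]))
print(best[1])  # 1.89, 2.625, 3.465
```

Note one difference: for a value without any comma, patindex() returns 0 and left(valueunit, -1) raises an error in SQL Server, whereas this sketch falls back to the whole string.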

select integer before a certain character

Hi, I am trying to select the integer value before the character C in my SQL database table, which contains the information below.
240mm2 X 15C WIRING CABLE
150mm2 X 3C flex
10mm2 x 4C swa
So far I have used the query
select left ('C',CHARINDEX ('C',product_name)) from product
and I get 'C' in my results, which is correct. Now I am stuck. Does anyone know how I can modify the above select query to get a result that lists only the integers, e.g.
15
3
4

Two observations: the integer before "C" has a space before it and there is no space between the integer and "C".
If these are generally true, then you can do what you want using substring_index() (a MySQL function):
select substring_index(substring_index(product_name, 'C', 1), ' ', -1) + 0 as thenumber
The + 0 simply converts the value to a number.

If you're doing this in SQL Server you could try the following:
Select Substring(product_name,
PATINDEX('% [0-9]%',product_name) + 1,
PATINDEX('%[0-9]C%',product_name) - PATINDEX('% [0-9]%',product_name)
) as num
from Product
This assumes that there is a space before the number and always a C after the number with no space.
It works out the starting point and then the length based on the start and end and performs a substring with the results.
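The two assumptions (a space before the number, a C directly after it) translate into a single pattern. The following Python sketch, with a hypothetical helper name integer_before_c, is one way to confirm the expected output for the three sample product names before porting the logic to SQL.

```python
import re

def integer_before_c(product_name: str):
    # A space, one or more digits, then an upper-case C with no space in between.
    match = re.search(r" (\d+)C", product_name)
    return int(match.group(1)) if match else None

names = [
    "240mm2 X 15C WIRING CABLE",
    "150mm2 X 3C flex",
    "10mm2 x 4C swa",
]
print([integer_before_c(name) for name in names])  # [15, 3, 4]
```

Names that do not follow the pattern yield None here instead of a wrong substring.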

You could use a combination of INSTR and SUBSTRING.
First get the position of the C.
Then take the substring up to the C.
It goes like this:
SELECT INSTR('foobarbar', 'bar');
= 4
And then you select substring from 1 to 4.

How do I match a substring of variable length?

I am importing data into my SQL database from an Excel spreadsheet.
The imp table is the imported data, the app table is the existing database table.
app.ReceiptId is formatted as "A" followed by some numbers. Formerly it was 4 digits, but now it may be 4 or 5 digits.
Examples:
A1234
A9876
A10001
imp.ref is a free-text reference field from Excel. It consists of some arbitrary length description, then the ReceiptId, followed by an irrelevant reference number in the format " - BZ-0987654321" (which is sometimes cropped short, or even missing entirely).
Examples:
SHORT DESC A1234 - BZ-0987654321
LONGER DESCRIPTION A9876 - BZ-123
REALLY LONG DESCRIPTION A2345 - B
REALLY REALLY LONG DESCRIPTION A23456
The code below works for a 4-digit ReceiptId, but will not correctly capture a 5-digit one.
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId = right(right(rtrim(replace(replace(imp.ref,'-',''),'B','')),5)
+ rtrim(left(imp.ref,charindex(' - BZ-',imp.ref))),5)
How can I change the code so it captures either 4 (A1234) or 5 (A12345) digits?

As ughai rightly wrote in his comment, it's not recommended to use anything other than columns in the ON clause of a join.
The reason is that wrapping a column in a function prevents SQL Server from using any index it could otherwise use on that column.
Therefore, I would suggest adding another column to the imp table that holds the actual ReceiptId and is calculated during the import process itself.
I think the best way of extracting the ReceiptId from the ref column is using substring with patindex, as demonstrated in this fiddle:
SELECT ref,
RTRIM(SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9]%', ref), 6)) As ReceiptId
FROM imp
Update
After the conversation with t-clausen-dk in the comments, I came up with this:
SELECT ref,
-- [0-9 ] matches a fifth digit or a space; PATINDEX brackets have no | alternation
CASE WHEN PATINDEX('%[ ]A[0-9][0-9][0-9][0-9][0-9 ]%', ref) > 0
OR PATINDEX('A[0-9][0-9][0-9][0-9][0-9 ]%', ref) = 1 THEN
SUBSTRING(ref, PATINDEX('%A[0-9][0-9][0-9][0-9][0-9 ]%', ref), 6)
ELSE
NULL
END As ReceiptId
FROM imp
fiddle here
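This returns NULL if there is no match. A match is a substring consisting of A followed by 4 or 5 digits, separated by spaces from the rest of the string; it can be found at the start, middle or end of the string. Note that the pattern needs one more character after the fourth digit, so a 4-digit ID at the very end of ref is only found if a space is appended to ref first.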
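If the ReceiptId is computed during the import, as suggested above, the extraction rule can also be written as a regular expression in the importing code. This is a sketch in Python under that assumption; extract_receipt_id and RECEIPT_ID are hypothetical names, and the sample strings are the ones from the question.

```python
import re

# "A" plus 4 or 5 digits, bounded by whitespace or the start/end of the text.
RECEIPT_ID = re.compile(r"(?<!\S)A\d{4,5}(?!\S)")

def extract_receipt_id(ref: str):
    match = RECEIPT_ID.search(ref)
    return match.group(0) if match else None

samples = [
    "SHORT DESC A1234 - BZ-0987654321",
    "LONGER DESCRIPTION A9876 - BZ-123",
    "REALLY LONG DESCRIPTION A2345 - B",
    "REALLY REALLY LONG DESCRIPTION A23456",
]
print([extract_receipt_id(s) for s in samples])
# ['A1234', 'A9876', 'A2345', 'A23456']
```

The lookarounds reject longer digit runs such as A1234567, which the fixed-length SUBSTRING approach would truncate to six characters instead.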

Try this. It removes all characters before the A[number][number][number][number] and takes the first 5 or 6 characters from there:
UPDATE app
SET
[...]
FROM imp
INNER JOIN app
ON app.ReceiptId in
(
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 5),
left(stuff(ref,1, patindex('%A[0-9][0-9][0-9][0-9][0-9][ ]%', imp.ref + ' ') - 1, ''), 6)
)
When comparing with =, trailing spaces are not evaluated.

Postgres SQL function string_to_array

I have a table:
c1|c2|c3|c4
-----+--+--+----
a b c 10
a a b 20
c a c 10
b b c 10
c b c 30
I want to write a function whose inputs are 3 strings / text values, e.g. ('a b c, b d, c'), that compares every element with every other, finds whether a row exists with this combination, and sums up the numbers of the 4th column (c4). A combination such as b a c or c a b would also match a b c 10. If there is a row like b c c, then there won't be a row like c b b. Every matchup is unique.
I think the best would be to use string_to_array(text, text).
I put together some pseudo code, but no idea how to write it in SQL. Maybe the logic is wrong too.
function (x,y,z)
res = 0
x_array = string_to_array(x, ' ')
y_array = string_to_array(y, ' ')
z_array = string_to_array(z, ' ')
foreach(x_item in x_array)
foreach(y_item in y_array)
foreach(z_item in z_array)
if (c1 = (x_item || y_item || z_item ) && c2 = (x_item || y_item || z_item ) && c3 = (x_item || y_item || z_item ))
res++
EDIT
First of all, there was a mistake in the example table. There was a row a b c and a row c b a. That can't be: a b c = c b a, and each row must be unique.
example: three text inputs a b c | b c | c
each element vs each element: a b c , a c c, b b c, b c c, c b c, c c c
a b c = 10;
a c c (is the same as c a c) = 10;
b b c = 10;
b c c (is the same as c b c) = 30;
c b c = 30;
c c c (no match) = 0; result = 90

I think this might be what you want:
Return the sum of column c4 from all rows where a given set of three tokens matches the columns (c1, c2, c3).
Simple version
Much simpler with the operators @> (contains) and <@ (is contained by):
SELECT sum(c4) AS sum_of_matching_c4
FROM tbl
WHERE ARRAY[c1,c2,c3] <@ ARRAY['b', 'a', 'c'] -- strings in arbitrary order
AND ARRAY[c1,c2,c3] @> ARRAY['b', 'a', 'c'];
Sorry, that would fail for ('b', 'c', 'c') vs. ('c', 'b', 'b').
Slow and sure
WITH i(arr) AS (
SELECT ARRAY(VALUES ('b'), ('c'), ('c') ORDER BY 1) -- input once
) -- in arbitrary order
SELECT sum(c4) AS sum_of_matching_c4
FROM (
SELECT c4, array_agg(x ORDER BY x) AS arr
FROM (
SELECT ctid, c4, unnest(ARRAY[c1,c2,c3]) AS x
FROM tbl t, i
WHERE ARRAY[c1,c2,c3] <@ arr -- optional pre-selection
AND ARRAY[c1,c2,c3] @> arr -- for better performance?
) a
GROUP BY ctid, c4
) b
JOIN i USING (arr)
-> sqlfiddle demo.
The major difficulty is to order the values of the columns within the row.
For your input (3 strings) I achieve this in the WHERE clause with a VALUE expression in the CTE which I order right away and collect it in an array. I use a CTE for convenience, so we have to enter values in one place only.
It's more complicated for the row values. I put the three columns in an array and break that up to rows with unnest(). As you did not provide a primary key, I use the ctid as ad-hoc surrogate primary key instead - which I need for the GROUP BY to stuff the now sorted (c1, c2, c3) into an array.
Finally I sum up all c4 of rows where the now sorted arrays match exactly.
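The core idea, comparing sorted triples, can be tried out in Python against the worked example from the question's EDIT. This sketch goes one step further than the query above and also expands the three space-separated inputs into every combination, which is my reading of the question; the names rows, lookup and sum_matches are made up for illustration.

```python
from itertools import product

# Example table from the question (after its EDIT): c1, c2, c3, c4
rows = [
    ("a", "b", "c", 10),
    ("a", "a", "b", 20),
    ("c", "a", "c", 10),
    ("b", "b", "c", 10),
    ("c", "b", "c", 30),
]

# Key every row by its sorted (c1, c2, c3), like the sorted array in the query.
lookup = {tuple(sorted(row[:3])): row[3] for row in rows}

def sum_matches(x: str, y: str, z: str) -> int:
    total = 0
    # One element from each input, every combination.
    for combo in product(x.split(), y.split(), z.split()):
        total += lookup.get(tuple(sorted(combo)), 0)  # no match adds 0
    return total

print(sum_matches("a b c", "b c", "c"))  # 90
```

The combinations b c c and c b c both hit the row c b c, which is why 30 is counted twice, as in the question's own calculation.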
Note: I expressly do not use string_agg() because that does not produce distinct results. Consider:
'abc' 'cde' 'fgh'
'ab' 'ccdef' 'gh'
.. resulting in the same string if concatenated.
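The ambiguity is easy to reproduce. In this short Python illustration (tuples stand in for arrays, which is an analogy rather than PostgreSQL behavior), the two rows from the example collapse to the same concatenated string but stay distinct as sequences.

```python
first = ("abc", "cde", "fgh")
second = ("ab", "ccdef", "gh")

# Concatenation loses the boundaries between the values ...
print("".join(first) == "".join(second))  # True
# ... while comparing element by element keeps the rows apart.
print(first == second)                    # False
```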
Index / Performance
You might consider saving pre-ordered data to speed up queries. Doing it on the fly is expensive. For example, you could pre-generate the sorted array and save it as a redundant column, which you can then support with an index. That should be faster by several orders of magnitude, at the cost of redundant data storage.
If you are dealing with long strings, a solution similar to what I outlined in this related answer on dba.SE might be the best course of action.
Alternatively (preferred!) guarantee that (c1, c2, c3) are always stored in ascending order. You could use a trigger BEFORE INSERT OR UPDATE to keep values within the row ordered. No redundant storage and you can simply create a multi-column index on the three columns and compare to them one by one (instead of comparing the array like in my example).

You don't need to write a function for that.
First, there are no "strings" in PostgreSQL (SQL); the types are "text" or "varchar".
Second, what you need is an SQL query like this:
-- tbl is a placeholder; the question does not name the table
SELECT c1 || c2 || c3 AS txtcol, SUM(c4) AS rowsum FROM tbl GROUP BY c1 || c2 || c3;
or
SELECT c1 || c2 || c3 AS txtcol, SUM(c4) AS numsum FROM tbl GROUP BY txtcol;
I can't vouch for the exact syntax, so check it against your own table. The point is that you concatenate the varchar columns with a built-in function like CONCAT or the "||" operator, give the resulting column a name, and then sum the numeric column grouped by it.
To be exact, you don't even need to show the concatenated column in the result; you could output just the sums and, for example, the number of rows summarized.
Theoretically you could write an SQL or PL/pgSQL function for that, but I'm sure it's not necessary; your case seems simple enough to achieve the result you want without a function. The built-in summarizing function SUM() is called an "aggregate" function; other examples of aggregate functions are MIN() and MAX().
Note that what you're actually trying to do is group rows by a VARCHAR column produced by concatenating the columns of each row.
EDIT: "Arrays" in SQL or procedural SQL are internally handled arrays; do not confuse them with relations (tables in the database, or tables returned as SELECT results). I think you also don't need SQL arrays for this; the task really isn't as hard as it looks.