Is there another way to write SUM(IIF([CONDITION], 1, 0))?

What SUM(IIF([CONDITION], 1, 0)) does is very simple. It's so simple that what I really want to write is COUNT([CONDITION]), but that's invalid. Is there a shorter or more preferred way to write SUM(IIF([CONDITION], 1, 0))?

You should be able to move your condition to the WHERE clause:
SELECT COUNT(*) FROM [table(s)] WHERE [CONDITION];
This counts only the rows where the condition matches. Providing an example query would go a long way toward finding a better solution.
Best of luck.
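For what it's worth, the three spellings can be compared on a toy table. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the table t, the column x and the condition x > 2 are made up for illustration, and CASE is used in place of IIF because it is portable.

```python
import sqlite3

# In-memory toy table; names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (2,), (3,), (4,), (None,)])

# Conditional sum: the portable spelling of SUM(IIF(cond, 1, 0)).
sum_form = conn.execute(
    "SELECT SUM(CASE WHEN x > 2 THEN 1 ELSE 0 END) FROM t"
).fetchone()[0]

# Conditional count: CASE without ELSE yields NULL, and COUNT skips NULLs.
count_form = conn.execute(
    "SELECT COUNT(CASE WHEN x > 2 THEN 1 END) FROM t"
).fetchone()[0]

# Filter form: fits only when the whole query can share one condition.
where_form = conn.execute("SELECT COUNT(*) FROM t WHERE x > 2").fetchone()[0]

print(sum_form, count_form, where_form)
```

One difference to keep in mind: over an empty input the SUM form returns NULL, while both COUNT forms return 0.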

Related

Sort data with multiple criteria including MATCH

I'm honestly not 100% sure this is the right place to ask, but I have to give it a try. I'm trying to sort by course and then by name from A to Z. I have tried plenty of approaches and never get the correct format, and the answers I found by googling don't apply to this specific case.
As you can see, I made a custom sort for the course, but I still have no idea how or where to add the A to Z sort for the names. Every attempt is rejected as invalid. Any advice?
I have made plenty of attempts at formatting it the right way, but they always end in failure. An example:
Formula: =ARRAYFORMULA(SORT(DATA!A2:BY101,match(DATA!P2:P101,{"1º";"2º";"3º";"4º";"5º";"6º";"7º";"8º";"9º";"10º";"11º";"12º"},0),TRUE))

try:
=ARRAYFORMULA(SORT(DATA!A2:BY101, MATCH(DATA!P2:P101,
{"1º";"2º";"3º";"4º";"5º";"6º";"7º";"8º";"9º";"10º";"11º";"12º"}, 0), 1, 2, 1))
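The formula above sorts first by the position MATCH finds in the custom list, then by the second column of the range. The same two-key idea can be sketched in Python. The rows and names here are hypothetical, and it assumes the names sit in the second column, as the 2 in the formula implies.

```python
# Custom order taken from the formula; everything else is illustrative data.
COURSE_ORDER = ["1º", "2º", "3º", "4º", "5º", "6º",
                "7º", "8º", "9º", "10º", "11º", "12º"]
rank = {course: position for position, course in enumerate(COURSE_ORDER)}

# Hypothetical rows: (id, name, course). The real sheet has more columns.
rows = [
    (1, "Zoe", "2º"),
    (2, "Ana", "10º"),
    (3, "Bob", "2º"),
    (4, "Eva", "1º"),
]

# Primary key: position in the custom list (what MATCH returns).
# Secondary key: the name, ascending (the "2, 1" pair in SORT).
sorted_rows = sorted(rows, key=lambda row: (rank[row[2]], row[1]))

for row in sorted_rows:
    print(row)
```

Note that "10º" lands after "2º" only because of the custom rank; a plain text sort would put it first.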

Optimizing array overlap query using && operator

I have a relatively simple query that attempts to calculate the count of rows I'll have to deal with in a later operation. It looks like:
SELECT COUNT(*)
FROM my_table AS t1
WHERE t1.array_of_ids && ARRAY[cast('1' as bigint)];
The tricky piece is that ARRAY[] portion is determined by the code that invokes the query so instead of having 1 element in this example it could have hundreds or thousands. This makes the query take a decent amount of time to run if a user is actively waiting for the calculation to complete.
Is there anything obvious I'm doing wrong or any obvious improvement that could be made?
Thanks!
Edit:
There are not any indexes on the table. I tried to create one with
CREATE INDEX my_index on my_table(array_of_ids);
and it came back with
ERROR: index row requires 8416 bytes, maximum size is 8191
I'm not very experienced here, unfortunately. Maybe there are simply too many rows for an index to be useful?
I ran an explain on the query and the output essentially looks like:
QUERY PLAN | Filter: ((array_of_ids && '{1, 2, 3, 4, 5, 6, 7 ... n}'::bigint[])
so I guess it is automatically doing the ::bigint[]. I tried this as well and the query takes the same time to execute which I guess makes sense.
I realize I'm only pasting a portion of the response to the explain (analyze, buffers, format text) but I'm doing this in psql and my system often runs out of memory. There are tons of --- in the output, I am not sure if there is a way to not have psql do that.
The plan looks pretty simple so is this basically saying there is no way to optimize this? I have two huge arrays and it just takes time to determine an overlap? Not sure if there is a JOIN solution here, I tried to unnest and do a JOIN on equivalent entries but the query never returned so I'm not sure if I got it wrong or if it is just a far slower approach.

More Efficient Way to Avoid Multiple Calculations #2?

Here is a more difficult version of another post I made earlier today.
More Efficient Way to Avoid Multiple Calculations?
I have a lot of these chains in my sheet. Is there a more efficient way than what I am doing?
Here is an example sheet showing the more difficult formula I need.
https://docs.google.com/spreadsheets/d/1qejqo0WzMYa5K7YCnovW-7ki97DCX2h6gGA7LE7Eeh8/edit?usp=sharing
=INDIRECT("Data!E"&(K8-1))
=INDIRECT("Data!E"&(K9-1))
=INDIRECT("Data!E"&(K10-1))
etc.
Can I modify this to work?
=ARRAYFORMULA(ROW(A1:A20)*????)
Is there a single formula, or a more efficient one, that may reduce calculation time?

you can do it like this:
=ARRAYFORMULA(IFERROR(VLOOKUP(K8:K-1, {ROW(A:A), Data!E:E}, 2, 0)))
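As a rough model of what that single formula computes: for every value k in K8:K it looks up sheet row k-1 and returns Data!E from that row, leaving a blank where the lookup fails. A minimal Python sketch follows; the function name and sample data are made up, and blank cells are represented as None.

```python
def lookup_previous_rows(k_values, data_e):
    """Model of VLOOKUP(K8:K-1, {ROW(A:A), Data!E:E}, 2, 0) wrapped in IFERROR.

    data_e[0] stands for Data!E1, data_e[1] for Data!E2, and so on.
    """
    results = []
    for k in k_values:
        row = k - 1 if k is not None else None  # 1-based sheet row to fetch
        if row is not None and 1 <= row <= len(data_e):
            results.append(data_e[row - 1])
        else:
            results.append("")  # IFERROR(...) leaves the cell blank
    return results


data_e = ["a", "b", "c", "d"]   # hypothetical Data!E1:E4
k_values = [2, 5, None, 1, 9]   # hypothetical K8:K12
print(lookup_previous_rows(k_values, data_e))
```

The point of the array version is that one formula replaces the whole chain of volatile INDIRECT calls.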

Why does a Recursive CTE in Transact-SQL require a UNION ALL and not a UNION?

I get that an anchor is necessary, that makes sense. And I know that a UNION ALL is needed, if your recursive CTE doesn't have one, it just doesn't work... but I can't find a good explanation of why that is the case. All the documentation just states that you need it.
Why can't we use a UNION instead of a UNION ALL in a recursive query? It seems like it would be a good idea not to include duplicates upon deeper recursion, doesn't it? I would think something like that is already happening under the hood.

I presume the reason is that they just haven't considered this a priority feature worth implementing. It looks like Postgres does support both UNION and UNION ALL.
If you have a strong case for this feature you can provide feedback at Connect (or whatever the URL of its replacement will be).
Preventing duplicates being added could be useful as a duplicate row added in a later step to a previous one will nearly always end up causing an infinite loop or exceeding the max recursion limit.
There are quite a few places in the SQL Standards where example code demonstrates UNION in a recursive query.
This article explains how they are implemented in SQL Server. They aren't doing anything like that "under the hood". The stack spool deletes rows as it goes so it wouldn't be possible to know if a later row is a duplicate of a deleted one. Supporting UNION would need a somewhat different approach.
In the meantime, you can quite easily achieve the same in a multi-statement TVF (table-valued function).
Take the silly example below (Postgres Fiddle):
WITH RECURSIVE R
AS (SELECT 0 AS N
    UNION
    SELECT ( N + 1 )%10
    FROM R)
SELECT N
FROM R
Changing the UNION to UNION ALL and adding a DISTINCT at the end won't save you from the infinite recursion.
But you can implement this as
CREATE FUNCTION dbo.F ()
RETURNS @R TABLE(n INT PRIMARY KEY WITH (IGNORE_DUP_KEY = ON))
AS
BEGIN
    INSERT INTO @R
    VALUES (0); --anchor
    WHILE @@ROWCOUNT > 0
    BEGIN
        INSERT INTO @R
        SELECT ( N + 1 )%10
        FROM @R
    END
    RETURN
END
GO
SELECT *
FROM dbo.F ()
The above uses IGNORE_DUP_KEY to discard duplicates. If the column list is too wide to be indexed you would need DISTINCT and NOT EXISTS instead. You'd also probably want a parameter to set the max number of recursions and avoid infinite loops.
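The loop in that function is a fixed-point iteration: keep applying the recursive step, discard anything already seen, and stop when a pass adds nothing. Here is a minimal Python sketch of the same idea, including the iteration cap suggested above. The function name and the max_iterations parameter are made up for illustration; this is not how SQL Server evaluates anything internally.

```python
def recursive_union(anchor, step, max_iterations=100):
    """Emulate a recursive query with UNION (duplicate-discarding) semantics.

    anchor: iterable of starting values (the anchor member).
    step:   function applied to every known value (the recursive member).
    """
    seen = set(anchor)
    for _ in range(max_iterations):
        # Like IGNORE_DUP_KEY: only values not already present are added.
        new_values = {step(n) for n in seen} - seen
        if not new_values:      # plays the role of @@ROWCOUNT = 0
            return seen
        seen |= new_values
    raise RuntimeError("max_iterations reached without converging")


result = recursive_union([0], lambda n: (n + 1) % 10)
print(sorted(result))
```

Because duplicates are dropped as they appear, the cycle 0, 1, ..., 9, 0 stops after the tenth pass instead of running forever.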

This is pure speculation, but I would say that UNION ALL ensures that the result of each iteration can be calculated individually. Essentially, it ensures that one iteration cannot interfere with another.
A UNION would require a sort operation in the background which might modify the result of previous iterations. The program should not change the state of a previous call in the call stack, it should interact with it using input parameters and the result of the subsequent iteration (in a procedural setting). This probably should apply to set based operations, thus to SQL Server's recursive CTEs.
I might be wrong, late night brain-dumps are not 100% reliable :)
Edit (just another thought):
When a recursion starts, you have a call stack. Each level in this stack starts calculating its result, but must wait for the results of all subsequent calls before it can finish and return its own. UNION would try to eliminate duplicates, but you don't have any records until you reach the termination condition (and the final result would be built from the bottom to the top), while the result of each subsequent call is required by the ones above it. The UNION would be reduced to a DISTINCT at the very end.

A good explanation that backs up the previous post's speculation is here: https://sqlite.org/lang_with.html
Optimization note: ...... Very little memory is needed to run the above example. However, if the example had used UNION instead of UNION ALL, then SQLite would have had to keep around all previously generated content in order to check for duplicates. For this reason, programmers should strive to use UNION ALL instead of UNION when feasible.
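Since SQLite accepts both forms, the difference is easy to observe with the silly modulo-10 example from the earlier answer. This is a minimal sketch using Python's built-in sqlite3 module; the LIMIT 25 is only there as an assumed safety stop so the UNION ALL variant terminates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# UNION: duplicates are discarded, so the cycle stops once 0 comes round again.
union_rows = [row[0] for row in conn.execute(
    "WITH RECURSIVE R(N) AS ("
    "  SELECT 0 UNION SELECT (N + 1) % 10 FROM R"
    ") SELECT N FROM R"
)]

# UNION ALL: nothing is discarded, so only the LIMIT stops the recursion.
union_all_rows = [row[0] for row in conn.execute(
    "WITH RECURSIVE R(N) AS ("
    "  SELECT 0 UNION ALL SELECT (N + 1) % 10 FROM R LIMIT 25"
    ") SELECT N FROM R"
)]

print(len(union_rows), len(union_all_rows))
```

The UNION query ends on its own with ten distinct values, at the cost of remembering every row produced so far, which is the memory trade-off the note describes.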

Is this confusing TSQL syntax more efficient? Is it really that confusing?

I would have searched but I don't even know how to search for something like this, so here goes.
I work at a company where I'm looking at lots of older stored procs (we're currently on SQL Server 2012; I don't know when they were written, but likely 3-5+ years ago). I continually come across these little nuggets in SELECT statements, and they are extremely confusing to me. Ultimately they are all math-based, so maybe there's a performance reason for these things? Maybe they are way more common than I thought, and I just haven't been exposed to them?
Here is an example, imagine it is buried in a SELECT clause, selecting from a table that has a numerical "balance" column (and please forgive me if my parens don't match up exactly ... you will get the idea).
SELECT
...
HasBalance = LEFT('Y', 1 * ABS(SIGN(SUM(balance)))) + LEFT('N', 1 * (1 -ABS(SIGN(SUM(balance)))))
...
I mean, it wasn't that hard to figure out that essentially what it's doing is coming up with 'Y' or 'N' by concat'ing 2 strings ('Y' + '') or ('' + 'N'), each of which were generated by using 3 functions to reduce a value (balance) to a 1 or 0 to multiply by 1 to determine how many characters to take in each LEFT function (ultimately, 0 or 1).
But come on.
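To make the arithmetic concrete, here is a minimal Python sketch of the same trick next to the plain test it stands in for. The function names are made up, string slicing plays the role of LEFT, and NULL handling is not modelled.

```python
def sign(value):
    """Return -1, 0 or 1, like T-SQL SIGN for non-NULL numbers."""
    return (value > 0) - (value < 0)


def has_balance_obscure(total):
    # ABS(SIGN(total)) is 1 for any non-zero total and 0 for zero.
    flag = abs(sign(total))
    # LEFT('Y', flag) + LEFT('N', 1 - flag): exactly one side is non-empty.
    return "Y"[: 1 * flag] + "N"[: 1 * (1 - flag)]


def has_balance_plain(total):
    # What a CASE WHEN total <> 0 THEN 'Y' ELSE 'N' END would express.
    return "Y" if total != 0 else "N"


for total in (-5, 0, 3.5):
    print(total, has_balance_obscure(total), has_balance_plain(total))
```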
Here is another example:
sum(x.amount * charindex('someCode', x.code) * charindex(x.code, 'someCode'))
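This one is essentially saying: if x.code equals 'someCode', return x.amount; otherwise return 0. Each string must contain the other, which only happens when they are equal, in which case both charindex calls return 1. If either charindex fails, it throws a *0 into the mix, zeroing out the math.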
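The double charindex can be sketched the same way. This is a rough Python model with made-up function names; str.find stands in for CHARINDEX, and collation, trailing-space and NULL behaviour are not modelled.

```python
def charindex(search, target):
    """Rough stand-in for T-SQL CHARINDEX: 1-based position, 0 if absent."""
    return target.find(search) + 1


def amount_obscure(amount, code, wanted="someCode"):
    # Non-zero only when each string contains the other, i.e. they are equal,
    # and in that case both positions are 1, so the amount passes through.
    return amount * charindex(wanted, code) * charindex(code, wanted)


def amount_plain(amount, code, wanted="someCode"):
    # What a CASE WHEN code = 'someCode' THEN amount ELSE 0 END would express.
    return amount if code == wanted else 0


for code in ("someCode", "XsomeCode", "some", ""):
    print(repr(code), amount_obscure(10, code), amount_plain(10, code))
```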
There are lots of others, they're not impossible to figure out WHAT they're doing ... but I still have to spend time figuring out what they're doing.
Is the obscurity as obscure as I think it is? If so, is it worth the tradeoff (assuming there is a tradeoff)?
Thanks!