rewrite T-SQL bitwise logic - sql-server

How do I rewrite this T-SQL code to produce the same results
SELECT ACC.Title,
ACC.AdvertiserHierarchyId,
1 AS Counter
FROM admanAdvertiserHierarchy_tbl ACC
JOIN dbo.admanAdvertiserObjectType_tbl AOT ON AOT.AdvertiserObjectTypeId = ACC.AdvertiserObjectTypeId
WHERE (EXISTS
(SELECT 1
FROM dbo.admanAdvertiserHierarchy_tbl CAMP
JOIN dbo.admanAdvertiserAdGroup_tbl AG ON CAMP.AdvertiserHierarchyId = AG.AdvertiserHierarchyId
JOIN dbo.admanAdvertiserCreative_tbl AC ON AC.AdvertiserAdGroupId = AG.AdvertiserAdGroupId
AND CAMP.ParentAdvertiserHierarchyId = ACC.AdvertiserHierarchyId
WHERE CAMP.ERROR = 0
AND AC.Dirty & 7 > 0
AND AC.ERROR = 0
AND AG.ERROR = 0 ))
its preventing the optimizer from using indexes efficiently .
trying to achieve the following results
Title AdvertiserHierarchyId Counter
trcom65#travelrepublic.co.uk 15908 1
paul570#travelrepublic.co.uk 37887 1
es88#travelrepublic.co.uk 37383 1
it004#travelrepublic.co.uk 27006 1
011 10526 1
013 10528 1
033 12013 1
062 17380 1
076 20505 1
this is a count of the dirty tinyint column
Dirty total
0 36340607
1 117569
2 873553
3 59
that links to a static reason table
DirtyReasonId Title
0 Nothing
1 Overnight Engine
2 End To End
3 Overnight And End To End
4 Pause Resume
5 Overnight Engine and Paused
6 Overnight Engine E2E and Paused
7 All Three

If you are asking specifically about the use of the BITWISE AND operator, I believe you are correct, and it's unlikely that SQL Server sees that as sargable, at least, not with an index with Dirty as a leading column.
You are showing only the lowest two bits in use (maximum value of Dirty is 3), yet you are testing the lowest three bits.
So, AC.Dirty > 0 would return an equivalent result, given that 3 is largest value of Dirty. But there is a possibility that other (higher-order) bits are set, for example Dirty could be set to 8. So, if the intent is to check ONLY the lowest three bits, then we need to ensure that we test only the three lowest-order bits. This expression would do that, and one of the predicates is sargable:
( AC.Dirty > 0 AND AC.Dirty % 8 > 0 )
This basically tests first whether ANY bits in AC.Dirty are set, and then checks if any of the last three bits are set. (We're using the MODULO division operator to return the remainder of AC.Dirty divided by 8, which will of course return an integer value between 0 and 7. If we get a zero, then we know that none of the lower three bits are set, else we know at least one of the bits is set.
Just to be clear: the predicate on AC.Dirty > 0 is redundant. It's included here in case you are wanting to make sure that database can at least consider using an existing index with Dirty as a leading column.
I will mention that another option to consider would be adding a persisted COMPUTED COLUMN on the expression, and create an index on it. But that seems a bit overkill for what you need here.
If you are asking specifically about getting an index used on table admanAdvertiserCreative_tbl (AC), then likely your best candidate would be covering index on (AdvertiserAdGroupId, Error, Dirty).
The SQL rewrite below should return equivalent results, perhaps with better performance (depending on your data distribution, indexes, et al.)
Basically, replace the EXISTS (correlated subquery) with a JOIN to a subquery. The subquery returns distinct values of CAMP.ParentAdvertiserHierarchyId, which is the column you referenced to correlate the subquery.
This may or may not make use of any indexes, depending on what indexes are available. (It's likely have clustered unique indexes on the primary keys, and have non-clustered indexes on the foreign keys, which should help join performance.)
Untested:
SELECT ACC.Title,
ACC.AdvertiserHierarchyId,
1 AS Counter
FROM admanAdvertiserHierarchy_tbl ACC
JOIN dbo.admanAdvertiserObjectType_tbl AOT
ON AOT.AdvertiserObjectTypeId = ACC.AdvertiserObjectTypeId
JOIN (SELECT CAMP.ParentAdvertiserHierarchyId
FROM dbo.admanAdvertiserHierarchy_tbl CAMP
JOIN dbo.admanAdvertiserAdGroup_tbl AG
ON CAMP.AdvertiserHierarchyId = AG.AdvertiserHierarchyId
JOIN dbo.admanAdvertiserCreative_tbl AC
ON AC.AdvertiserAdGroupId = AG.AdvertiserAdGroupId
WHERE CAMP.ERROR = 0
AND ( AC.Dirty > 0 AND AC.Dirty % 8 > 0 )
AND AC.ERROR = 0
AND AG.ERROR = 0 )
GROUP BY CAMP.ParentAdvertiserHierarchyId
) c
ON c.ParentAdvertiserHierarchyId = ACC.AdvertiserHierarchyId

Related

how to get number containing rows in snowflake

I have tried regexp, regexp_like and like but didn't work
select * from b
where regexp_like(col1, '\d')
where regexp_like(col1, '[0-9]')
....etc
we have this table
Col1
avr100000
adfdsgwr
20170910020359.761
Enterprise
adf56ds76gwr
0+093000
080000
adfdsgwr
output should be these 5 rows
col1
avr100000
20170910020359.761
adf56ds76gwr
0+093000
1080000
Thanks
You can use regexp_instr in the where clause to see if it finds a digit anywhere in the string:
create temp table b(col1 string);
insert into b (col1) values ('avr100000'), ('adfdsgwr'),
('20170910020359.761'),
('Enterprise'),
('adf56ds76gwr'),
('0+093000'),
('080000'),
('adfdsgwr')
;
select col1 from b where regexp_instr(col1, '\\d') > 0;
I'm updating my answer to note that regexp_instr is going to perform about 3.8 times faster than using regexp_count for this requirement.
The reason is that regexp_instr will stop and report the location of the first digit it encounters. In contrast, regexp_count will continue examining the string until it reaches its end. If we only want to know if a digit exists in a string, we can stop as soon as we encounter the first one.
If it is a small data set, this won't matter much. For large data sets, that 3.8 times faster makes a big difference. Here is a mini test harness that shows the performance difference:
create or replace transient table RANDOM_STRINGS as
select RANDSTR(50, random()) as RANDSTR from table(generator (rowcount => 10000000));
alter session set use_cached_result = false;
-- Run these statements multiple times on an X-Small warehouse to test performance
-- Run both to warm the cache, then note the times after the initial runs to warm the cache
-- Average over 10 times with warm cache: 3.315s
select count(*) as ROWS_WITH_NUMBERS
from RANDOM_STRINGS where regexp_count(randstr, '\\d') > 0;
-- Average over 10 times with warm cache: 0.8686s
select count(*) as ROWS_WITH_NUMBERS
from RANDOM_STRINGS where regexp_instr(randstr, '\\d') > 0;
One method is to count how many alpha tokens there are:
select column1 as input
,regexp_count(column1, '[A-Za-z]') as alpha_count
from values
('0-100000'),
('adfdsgwr'),
('20170910020359.761'),
('Enterprise'),
('adfdsgwr'),
('0+093000'),
('1-080000'),
('adfdsgwr')
INPUT
ALPHA_COUNT
0-100000
0
adfdsgwr
8
20170910020359.761
0
Enterprise
10
adfdsgwr
8
0+093000
0
1-080000
0
adfdsgwr
8
and thus exclude those where it is not zero:
select column1 as input
from values
('0-100000'),
('adfdsgwr'),
('20170910020359.761'),
('Enterprise'),
('adfdsgwr'),
('0+093000'),
('1-080000'),
('adfdsgwr')
where regexp_count(column1, '[A-Za-z]') = 0
gives:
INPUT
0-100000
20170910020359.761
0+093000
1-080000
All you need to do is find 1 or more instances of a numeric value:
select
column1 as input
from
values
('avr100000'),
('adfdsgwr'),
('20170910020359.761'),
('Enterprise'),
('adf56ds76gwr'),
('0+093000'),
('1-080000'),
('adfdsgwr')
where
regexp_count (column1, '\\d') > 0;
Results:
avr100000
20170910020359.800
adf56ds76gwr
0+093000
1-080000

SQL Equivalent of Countif or Index Match on row level instead of columnar level

I have a table which comprises of 30 columns, all adjacent to one another. Of these 5 are text fields indicating certain details pertaining to that entry and 25 are value fields. Value fields have the column name as Val00, Val01, Val02 .....upto Val24
Based on a logic appearing elsewhere, these value fields input a value for n amount of columns and then drop to 0 for all of the subsequent fields
e.g.
When n is 5 the output will be
Val00
Val01
Val02
Val03
Val04
Val05
Val06
Val07
Val24
1.5
1.5
1.5
1.5
1.5
0
0
0
0
As can be seen, all values starting from val05 will drop to 0 and all columns from Val 05 to Val24 will be 0.
Given this output, I want to find what this n value is and create a new column ValCount to store this value in it.
In Excel this would be fairly straight forward to achieve with the below formula
=COUNTIF(Val00:Val24,">0")
However I'm not sure how we would go about in terms of SQL. I understand that the count function works on a columnar level and not on a row level.
I found a solution which is rather long but it should do the job so I hope it helps you.
SELECT SUM(SUM(CASE WHEN Val00 >= 1 THEN 1 ELSE 0 END)
+ SUM(CASE WHEN Val01 >= 1 THEN 1 ELSE 0 END)
+ SUM(CASE WHEN Val02 >= 1 THEN 1 ELSE 0 END)) As ValCount

How does the outer WHERE clause affect the way nested query is executed?

Let's say I have a table lines
b | a
-----------
17 7000
17 0
18 6000
18 0
19 5000
19 2500
I want to get positive values of a function: (a1 - a2) \ (b2 - b1) for all elements in cartesian product of lines with different b's. (If you are interested this will result in intersections of lines y1 = b1*x + a1 and y2 = b2*x + a2)
I wrote query1 for that cause
SELECT temp.point FROM
(SELECT DISTINCT ((l1.a - l2.a) / (l2.b - l1.b)) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
WHERE temp.point > 0
It throws a "division by zero" error. I tried the same query without the WHERE clause (query2) and it works just fine
SELECT temp.point FROM
(SELECT DISTINCT ((l1.a - l2.a) / (l2.b - l1.b)) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
as well as the variation with the defined SQL function (query3)
CREATE FUNCTION get_point(#a1 DECIMAL(18, 4), #a2 DECIMAL(18, 4), #b1 INT, #b2 INT)
RETURNS DECIMAL(18, 4)
WITH EXECUTE AS CALLER
AS
BEGIN
RETURN (SELECT (#a1 - #a2) / (#b2 - #b1))
END
GO
SELECT temp.point FROM
(SELECT DISTINCT dbo.get_point(l1.a, l2.a, l1.b, l2.b) AS point
FROM lines AS l1
CROSS JOIN lines AS l2
WHERE l1.b != l2.b
) AS temp
WHERE temp.point > 0
I have an intuitive assumption that the outer SELECT shouldn't affect the way nested SELECT is executed (or at least shouldn't break it). Even if it is not true that wouldn't explain why query3 works when query1 doesn't.
Could someone explain the principle behind this? That would be much appreciated.
If you want to guarantee that the query will always work, you'd need to wrap your calculation in something like a case statement
case when l2.b - l1.b = 0
then null
else (l1.a - l2.a) / (l2.b - l1.b)
end
Technically, the optimizer is perfectly free to evaluate conditions in whatever order it expects will be more efficient. The optimizer is free to evaluate the division before the where clause that filters out rows where the divisor would be 0. It is also free to evaluate the where clause first. Your different queries have different query plans which result in different behavior.
Realistically, though, even though a particular query might have a "good" query plan today, there is no guarantee that the optimizer won't decide in a day, a month, or a year to change the query plan to something that would throw a division by 0 error. I suppose you could decide to use a bunch of hints/ plan guides to force a particular plan with a particular behavior to be used. But that tends to be the sort of thing that bites you in the hind quarters later. Wrapping the calculation in a case (or otherwise preventing the division by 0 error) will be much safer and easier to explain to the next developer.

Check last bit in a hex in SQL

I have a SQL entry that is of type hex (varbinary) and I want to do a SELECT COUNT for all the entries that have this hex value ending in 1.
I was thinking about using CONVERT to make my hex into a char and then use WHERE my_string LIKE "%1". The thing is that varchar is capped at 8000 chars, and my hex is longer than that.
What options do I have?
Varbinary actually works with some string manipulation functions, most notably substring. So you can use eg.:
select substring(yourBinary, 1, 1);
To get the first byte of your binary column. To get the last bit then, you can use this:
select substring(yourBinary, len(yourBinary), 1) & 1;
This will give you zero if the bit is off, or one if it is on.
However, if you really only have to check at most the last 4-8 bytes, you can easily use the bitwise operators on the column directly:
select yourBinary & 1;
As a final note, this is going to be rather slow. So if you plan on doing this often, on large amounts of data, it might be better to simply create another bit column just for that, which you can index. If you're talking about at most a thousand rows or so, or if you don't care about speed, fire away :)
Check last four bits = 0001
SELECT SUM(CASE WHEN MyColumn % 16 IN (-15,1) THEN 1 END) FROM MyTable
Check last bit = 1
SELECT SUM(CASE WHEN MyColumn % 2 IN (-1,1) THEN 1 END) FROM MyTable
If you are wondering why you have to check for negative moduli, try SELECT 0x80000001 % 16
Try using this where
WHERE LEFT(my_string,1) = 1
It it's text values ending in 1 then you want the Right as opposed to the Left
WHERE RIGHT(my_string,1) = 1

TSQL - How to prevent optimisation in this query

I have a query analogous to:
update x
set x.y = (
select sum(x2.y)
from mytable x2
where x2.y < x.y
)
from mytable x
the point being, I'm iterating over rows and updating a field based on a subquery over those fields which are changing.
What I'm seeing is the subquery is being executed for each row before any updates occur, so the changed values for each row are not being picked up.
How can I force the subquery to be re-evaluated for each row of the update?
Is there a suitable table hint or something?
As an aside, I was doing the below and it did work, however since modifying my query somewhat (for logic purposes, not to try and solve this issue) this trick no longer works :(
declare #temp int
update x
set #temp = (
select sum(x2.y)
from mytable x2
where x2.y < x.y
),
x.y = #temp
from mytable x
I'm not particularly concerned about performance, this is a background task run over a few rows
It looks like task is incorrect or other rules should apply.
Let's see on example. Let's say you have values 4, 1, 2, 3, 1, 2
Sql will update rows based on original values. I.e. during single update statement newly calculated values is NOT mixing with original values:
-- only original values used
4 -> 9 (1+2+3+1+2)
1 -> null
2 -> 2 (1+1)
3 -> 6 (1+2+1+2)
1 -> null
2 -> 2 (1+1)
Based on your request you wants that update of each rows will count newly calculated values. (Note, that SQL does not guarantees the sequence in which rows will be processed.)
Let's do this calculation by processing rows from top to bottom:
-- from top
4 -> 9 (1+2+3+1+2)
1 -> null
2 -> 1 (1)
3 -> 4 (1+1+2)
1 -> null
2 -> 1 (1)
Do the same in other sequence - from bottom to top:
-- from bottom
4 -> 3 (2+1)
1 -> null
2 -> 1 (1)
3 -> 5 (2+2+1)
1 -> null
2 -> 2 (1+1)
How you can see your expected result is inconsistent. To make it right you need to correct the calculation rule - for instance define strong sequence of the rows to process (date, id, ...)
Also, if you want to do some recursive processing look at the common_table_expression:
http://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx

Resources