SQLite optimization for MAX query on non-leftmost column of an index

I noticed the online page about SQLite's query optimizer guarantees that queries of the form SELECT MAX(colA) FROM TABLE can be optimized if there is an index whose leftmost column is colA.
However, I'm less clear about what happens when an index is used to narrow the table based on an equality in the WHERE clause, such that the next column in the index is the one that I'm taking a MAX on. Based on the structure of the index, the maximum value should be quickly accessible as the last row in the subset of the index satisfying the WHERE clause. For example, given an index on colA and colB, it should be possible to find SELECT MAX(colB) FROM SillyTable WHERE colA = 1 without scanning all 5 rows associated with colA = 1:
Index of SillyTable on colA, colB:
colA colB rowid
1 1 4
1 2 5
1 4 2
1 5 8
1 6 3 # This is the one
2 1 1
2 5 6
2 8 7
Does SQLite actually optimize a query like this, or will it scan all the rows that satisfy the WHERE clause? If it does a scan, how can I change the query to make it run faster?
My specific use case is similar to the SillyTable example. I created the following table:
CREATE TABLE Product(
ProductTypeID INTEGER NOT NULL,
ProductID INTEGER NOT NULL,
PRIMARY KEY(ProductTypeID, ProductID),
FOREIGN KEY(ProductTypeID)
REFERENCES ProductType(ProductTypeID)
);
ProductTypeID is not particularly selective for the table; I might have many rows with the same ProductTypeID but different ProductID. EXPLAIN QUERY PLAN tells me that my query uses an index automatically built for the composite primary key, but that is true whether it scans or binary-searches the subset of rows found with the index:
EXPLAIN QUERY PLAN SELECT MAX(ProductID) FROM Product
WHERE ProductTypeID = ?;
=>
SEARCH TABLE Product USING COVERING INDEX sqlite_autoindex_Product_1(ProductTypeID=?)

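Yes, SQLite optimizes this query: it seeks directly to the last index entry for the given ProductTypeID instead of scanning all matching rows. This is shown in the EXPLAIN output: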
sqlite> EXPLAIN SELECT MAX(ProductID) FROM Product WHERE ProductTypeID = ?;
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 17 0 00 Start at 17
1 Null 0 1 2 00 r[1..2]=NULL
2 OpenRead 1 3 0 k(2,,) 02 root=3 iDb=0; sqlite_autoindex_Product_1
3 Variable 1 3 0 00 r[3]=parameter(1,)
4 IsNull 3 13 0 00 if r[3]==NULL goto 13
5 Affinity 3 1 0 D 00 affinity(r[3])
6 SeekLE 1 13 3 1 00 key=r[3]
7 IdxLT 1 13 3 1 00 key=r[3]
8 Column 1 1 4 00 r[4]=Product.ProductID
9 CollSeq 0 0 0 (BINARY) 00
10 AggStep0 0 4 1 max(1) 01 accum=r[1] step(r[4])
11 Goto 0 13 0 00 max() by index
12 Prev 1 7 0 00
13 AggFinal 1 1 0 max(1) 00 accum=r[1] N=1
14 Copy 1 5 0 00 r[5]=r[1]
15 ResultRow 5 1 0 00 output=r[5]
16 Halt 0 0 0 00
17 Transaction 0 0 1 0 01 usesStmtJournal=0
18 Goto 0 1 0 00
To make the code generator simpler, SQLite always creates a loop for the aggregation (addresses 6 to 12). However, for max(), this loop aborts after the first successful step (the Goto at address 11).
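As a quick check, here is a minimal sketch using Python's built-in sqlite3 module (Python is only my choice for illustration, it is not part of the question). It rebuilds the Product table without the foreign key, loads the SillyTable rows as assumed sample data, and runs the MAX query together with EXPLAIN QUERY PLAN. The helper names max_product_id and query_plan are made up for this example.

```python
import sqlite3

MAX_SQL = "SELECT MAX(ProductID) FROM Product WHERE ProductTypeID = ?"


def max_product_id(conn, product_type_id):
    """Return the largest ProductID for one ProductTypeID (None if no rows)."""
    return conn.execute(MAX_SQL, (product_type_id,)).fetchone()[0]


def query_plan(conn):
    """Return the 'detail' text of EXPLAIN QUERY PLAN for the MAX query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + MAX_SQL, (1,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # detail is the last column


conn = sqlite3.connect(":memory:")
# Same shape as the table in the question; the foreign key is left out
# so the sketch does not need a ProductType table.
conn.execute(
    "CREATE TABLE Product("
    " ProductTypeID INTEGER NOT NULL,"
    " ProductID INTEGER NOT NULL,"
    " PRIMARY KEY(ProductTypeID, ProductID))"
)
# Assumed sample data: the (colA, colB) pairs from the SillyTable example.
conn.executemany(
    "INSERT INTO Product VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 4), (1, 5), (1, 6), (2, 1), (2, 5), (2, 8)],
)

print(max_product_id(conn, 1))
print(query_plan(conn))
```

The exact wording of the plan text differs between SQLite versions, so it is safer to look only for the index name in it rather than compare the whole string.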

Related

How to aggregate number of notes sent to each user?

Consider the following tables
group (obj_id here is user_id)
group_id obj_id role
--------------------------
100 1 A
100 2 root
100 3 B
100 4 C
notes
obj_id ref_obj_id note note_id
-------------------------------------------
1 2 10
1 3 10
1 0 foobar 10
1 4 20
1 2 20
1 0 barbaz 20
2 0 caszes 30
2 1 30
4 1 70
4 0 taz 70
4 3 70
Note: a note in the system can be assigned to multiple users (for instance: an admin could write "sent warning to 2 users" and link it to 2 user_ids). The first user the note gets linked to is stored differently than the other linked users. The note itself is linked to the first linked user only. Whenever group.obj_id = notes.obj_id then ref_obj_id = 0 and note <> null
I need to make an overview of the notes per user. Normally I would do this by joining on group.obj_id = notes.obj_id, but here this goes wrong because of ref_obj_id being 0 (in which case I should join on notes.obj_id).
There are 4 notes in this system (foobar, barbaz, caszes and taz).
The desired output is:
obj_id user_is_primary notes_primary user_is_linked notes_linked
-------------------------------------------------------------------
1 2 10;20 2 30;70
2 1 30 2 10;20
3 0 2 10;70
4 1 70 1 20
How can I get to this aggregated result?
I hope that I was able to explain the situation clearly; perhaps it is my inexperience but I find the data model not the most straightforward.
Couldn't you simply put this in the ON clause of your join?
case when notes.ref_obj_id = 0 then notes.obj_id else notes.ref_obj_id end = group.obj_id
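To see how that ON clause could lead to the desired overview, here is a rough sketch in Python with the built-in sqlite3 module. The question does not name the database, so SQLite, GROUP_CONCAT, and the conditional COUNT columns are my assumptions, as is the QUERY name; the data is copied from the question.

```python
import sqlite3

# Assumption: SQLite syntax. "group" is quoted because it is a keyword.
QUERY = """
SELECT g.obj_id,
       COUNT(CASE WHEN n.ref_obj_id = 0 THEN 1 END) AS user_is_primary,
       GROUP_CONCAT(CASE WHEN n.ref_obj_id = 0 THEN n.note_id END, ';')
           AS notes_primary,
       COUNT(CASE WHEN n.ref_obj_id <> 0 THEN 1 END) AS user_is_linked,
       GROUP_CONCAT(CASE WHEN n.ref_obj_id <> 0 THEN n.note_id END, ';')
           AS notes_linked
FROM "group" AS g
LEFT JOIN notes AS n
       ON CASE WHEN n.ref_obj_id = 0 THEN n.obj_id
               ELSE n.ref_obj_id END = g.obj_id
GROUP BY g.obj_id
ORDER BY g.obj_id
"""

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "group"(group_id INTEGER, obj_id INTEGER, role TEXT)')
conn.execute(
    "CREATE TABLE notes(obj_id INTEGER, ref_obj_id INTEGER,"
    " note TEXT, note_id INTEGER)"
)
conn.executemany(
    'INSERT INTO "group" VALUES (?, ?, ?)',
    [(100, 1, "A"), (100, 2, "root"), (100, 3, "B"), (100, 4, "C")],
)
conn.executemany(
    "INSERT INTO notes VALUES (?, ?, ?, ?)",
    [
        (1, 2, None, 10), (1, 3, None, 10), (1, 0, "foobar", 10),
        (1, 4, None, 20), (1, 2, None, 20), (1, 0, "barbaz", 20),
        (2, 0, "caszes", 30), (2, 1, None, 30),
        (4, 1, None, 70), (4, 0, "taz", 70), (4, 3, None, 70),
    ],
)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

SQLite does not guarantee the order of the values inside GROUP_CONCAT, so the note ids within one cell may come back in a different order than in the desired output.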

Select rows where count() = n

I'm implementing a search feature where the results page should show, for each result, the main image and up to 3 more thumbnails.
Right now, in the production version, it runs one select per ad to return the images from the database, which is terrible for performance, so I've changed it to a single query that basically does the following:
select * from AdImages order by IsMainImage desc, AdImageId
and returns something like:
AdImageId AdId IsMainImage FilePath
----------- ----------- ----------- ----------------------------------------
1 1 1 9c513f10-5480-4e41-89c6-074b36051999.jpg
5 2 1 f64f9c12-398e-445f-9724-baebe40930b1.jpg
6 4 1 8187d566-b296-4ab0-85e5-b9fc86f293b7.jpg
8 5 1 b8165008-09b3-4258-bf54-043195138344.jpg
10 6 1 86c636ed-f4ed-4f7e-8c7e-fc0b24faa956.jpg
11 7 1 4409a3fd-2bc0-4512-9850-6f5146193e50.jpg
13 8 1 b9b66c48-92b7-479a-a85d-dc6d26b03ebc.jpg
14 9 1 9f3f06ad-4fe1-43a5-8cce-3bb804bb10b7.jpg
16 10 1 016c30dc-5ee8-40d8-9d0f-398f444d7a7b.jpg
19 11 1 e5e56602-1af7-492b-8a8e-b61ac86b751b.jpg
2 1 0 02d44ce1-0de6-4e22-b4ef-043a72e9b5e8.jpg
3 1 0 8c4e19db-faff-44c2-9aab-6a96ab2a8e22.jpg
4 1 0 d8c2464a-277c-40fa-ab43-d2455e819e7e.jpg
7 4 0 d1430ae0-df51-43b7-acea-50d606eee5ba.jpg
9 5 0 b947ae4c-653d-4c27-9edd-567d977e1af3.jpg
12 7 0 3080c947-3769-4762-bb29-f1f9c5303ecd.jpg
15 9 0 d2543ce3-1e65-4a18-80d6-584de0025f1a.jpg
17 10 0 03b26d6a-4e0c-4393-9b5a-d9f2a24d36da.jpg
18 10 0 cde5dacd-3984-4cea-b56f-c3a6c5b82fa0.jpg
20 11 0 9e286ac0-25b1-4a05-af83-26e5d0002c2a.jpg
21 11 0 b1266770-9926-462c-8ec0-e965b21021eb.jpg
22 11 0 0542bd2a-4c4b-41d4-b51b-d311f42f0da9.jpg
23 11 0 b1cc44c9-50c4-4e81-bc9a-a0a4b515e709.jpg
My local db is very small, but I already noticed a very good performance gain. Still, I think it could be better if the query returned at most 4 rows per ad instead of all the rows for each ad, as it does now. To do that I would need something like where count(AdId) == 4, which I'm not sure is possible.
I'm also using Entity Framework here. Any extra advice would be very welcome.
Use a window function:
select AdImageId ,AdId ,IsMainImage ,FilePath
from(
select row_number() over(partition by Adid order by IsMainImage desc, AdImageId) rn,*
from AdImages)a
where rn<=4
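Here is a small sketch of the row_number() approach, run through Python's built-in sqlite3 module purely for illustration (the question is about SQL Server and Entity Framework, so SQLite is an assumption, and it needs SQLite 3.25 or newer for window functions). It uses three of the ads from the question; the file names are shortened placeholders, not the real paths.

```python
import sqlite3

# Same idea as the query above, with an ORDER BY added so the output
# is deterministic.
TOP4_SQL = """
SELECT AdImageId, AdId, IsMainImage, FilePath
FROM (
    SELECT row_number() OVER (
               PARTITION BY AdId
               ORDER BY IsMainImage DESC, AdImageId
           ) AS rn,
           *
    FROM AdImages
) AS a
WHERE rn <= 4
ORDER BY AdId, rn
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AdImages(AdImageId INTEGER PRIMARY KEY, AdId INTEGER,"
    " IsMainImage INTEGER, FilePath TEXT)"
)
# (AdImageId, AdId, IsMainImage) taken from the question for ads 1, 10, 11.
sample = [
    (1, 1, 1), (2, 1, 0), (3, 1, 0), (4, 1, 0),
    (16, 10, 1), (17, 10, 0), (18, 10, 0),
    (19, 11, 1), (20, 11, 0), (21, 11, 0), (22, 11, 0), (23, 11, 0),
]
conn.executemany(
    "INSERT INTO AdImages VALUES (?, ?, ?, ?)",
    [(img, ad, main, f"img{img}.jpg") for img, ad, main in sample],
)

per_ad = {}
for image_id, ad_id, is_main, path in conn.execute(TOP4_SQL):
    per_ad.setdefault(ad_id, []).append(image_id)
print(per_ad)
```

Ad 11 has five images in the sample, so it is the one where the rn <= 4 filter actually drops a row.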
If I am understanding you correctly, you can just return the TOP xx results.
SELECT TOP(3) * from AdImages order by IsMainImage desc, AdImageId;
This will return only the top 3 rows of the whole result set, not the top 3 per ad.

Variable length array extension using SIMD operation

I would like to do the following array extension using SIMD intrinsics.
I have two arrays:
cluster value (v_i): 10, 20, 30, 40
cluster length (l_i): 3, 2, 1, 2
I would like to create a resultant array containing the values: v_i repeated for l_i times, i.e:
result: 10, 10, 10, 20, 20, 30, 40, 40.
How can I compute this using SIMD intrinsics?
This may be optimized by SIMD if input array size is up to 8, output array size up to 16, and bytes as array values. At least SSSE3 is required. Extending this approach to larger arrays/elements is possible but efficiency will quickly degrade.
Compute the prefix sum of the array lengths. This can be done quickly if you reinterpret the byte array of lengths as a single 64-bit (32-bit) word, multiply it by 0x0101010101010100, and store the result in a SIMD register.
Fill array of indexes (in single SIMD register) with average index (half-size of the array of prefix sums).
Perform binary search for proper index for each byte of index register (in parallel). This may be done by extracting appropriate byte of prefix sum register with PSHUFB instruction, comparing extracted prefix value with byte number using PCMPGTB (and optionally with PCMPEQB), then adding/subtracting half of index range.
(Optionally) fill all unused bytes of index register with 0xff.
Use PSHUFB to fill some register with values from cluster value array indexed by the index register.
Main loop of the algorithm (binary search) contains PSHUFB, PCMPGTB, and a few arithmetical and logical operations. It is executed log(input_array_size) times, most likely 2 or 3 times.
Here is an example:
cluster value: 10 20 30 40
cluster length: 3 2 1 2
prefix sum: 0 3 5 6 8
indexes: 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
prefix value: 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
byte number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
mask: ff ff ff ff ff 0 0 0 0 0 0 0 0 0 0 0
indexes: 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3
prefix value: 3 3 3 3 3 6 6 6 6 6 6 6 6 6 6 6
byte number: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
mask: ff ff ff 0 0 ff 0 0 0 0 0 0 0 0 0 0
indexes: 0 0 0 1 1 2 3 3 3 3 3 3 3 3 3 3
length constrained: 0 0 0 1 1 2 3 3 ff ff ff ff ff ff ff ff
cluster value: 10 10 10 20 20 30 40 40 0 0 0 0 0 0 0 0
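The steps above can be modeled in plain Python to check the logic before writing intrinsics. This is only a scalar emulation of the 16 byte lanes, not SIMD code: the pshufb helper mimics the PSHUFB lookup (a set high bit in the index yields 0), the comparison stands in for PCMPGTB, and the names expand_clusters and LANES are invented for the sketch. It assumes at most 8 clusters and at most 16 output bytes.

```python
LANES = 16  # bytes in one SSE register


def pshufb(table, idx):
    """Emulate PSHUFB: per-lane table lookup, 0 when the high bit is set."""
    return [0 if i & 0x80 else table[i & 0x0F] for i in idx]


def expand_clusters(values, lengths):
    count, total = len(values), sum(lengths)
    assert count == len(lengths) and 0 < count <= 8 and total <= LANES

    # Exclusive prefix sums of the lengths with one 64-bit multiply.
    word = int.from_bytes(bytes(lengths).ljust(8, b"\0"), "little")
    product = (word * 0x0101010101010100) & (2**64 - 1)
    prefix = list(product.to_bytes(8, "little")) + [0] * (LANES - 8)

    # Search range: smallest power of two that covers all clusters.
    n = 1
    while n < count:
        n *= 2

    lane = list(range(LANES))   # "byte number" row of the example
    idx = [n // 2] * LANES      # start in the middle of the range
    step = n // 4
    while step >= 1:
        # mask = prefix[idx] > byte number; move down or up by step
        mask = [p > j for p, j in zip(pshufb(prefix, idx), lane)]
        idx = [i - step if m else i + step for i, m in zip(idx, mask)]
        step //= 2
    # Last comparison only moves down by one.
    mask = [p > j for p, j in zip(pshufb(prefix, idx), lane)]
    idx = [i - 1 if m else i for i, m in zip(idx, mask)]

    # Lanes past the total length get 0xff so that PSHUFB writes 0 there.
    idx = [i if j < total else 0xFF for i, j in zip(idx, lane)]
    table = list(values) + [0] * (LANES - count)
    return pshufb(table, idx)


result = expand_clusters([10, 20, 30, 40], [3, 2, 1, 2])
print(result)
```

With the four clusters from the example the search range is 4, so the emulation goes through the same intermediate index rows as the trace above (2, then 1 or 3, then the final indexes).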

SAS, assigning the same numbers to specific observations

I want to assign the same id number to every four observations. For example, if I have the following data
age marital gender id
45 1 0 1
33 1 1 1
68 0 1 1
27 1 0 1
43 0 0 2
37 0 1 2
19 1 1 2
40 1 1 2
25 1 0 3
38 1 1 3
57 0 0 3
50 1 0 3
51 1 1 4
44 0 1 4
69 1 0 4
39 0 1 4
The last column id is something I want to produce.
Plus, the dataset has 500,000+ observations.
Thanks in advance.
Slightly more compact:
id = ceil(_n_/4);
Use the integer function and the built-in _n_ variable (which increments for each observation):
id = int( (_n_-1)/4 )+1;
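The arithmetic behind numbering every four observations can be checked outside SAS. Here is a tiny Python sketch, where n plays the role of the 1-based observation counter and block_id is a made-up helper name; it compares the ceiling form with the integer-division form that subtracts 1 before dividing.

```python
import math


def block_id(n, size=4):
    """Id of the block of `size` rows that 1-based row number n falls in."""
    return (n - 1) // size + 1


ids = [block_id(n) for n in range(1, 17)]
same_as_ceil = all(block_id(n) == math.ceil(n / 4) for n in range(1, 1001))
print(ids, same_as_ceil)
```

Subtracting 1 first is what keeps row 4 in block 1 and moves row 5 into block 2.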

transact SQL, sum each row and insert into another table

For a table on MS SQL Server 2000 containing the following columns and numbers:
S_id J_id Se_id B_id Status Count multiply
63 1000 16 12 1 10 2
64 1001 12 16 1 9 3
65 1002 17 12 1 10 2
66 1003 16 12 1 6 3
67 1004 12 16 1 10 2
I want to write a classic ASP script which will do the following for each row
where status=1:
- multiply -> answer = column 'count' multiplied by column 'multiply'
Then:
count the total answer and sum for each se_id like :
se_id total
12 47
16 38
17 20
and display on screen like
Rank se_id total
1 12 47
2 16 38
3 17 20
Condition:
if there are multiple equal total values then give the lower numbered se_id a priority for
getting a ranking and give the next higher numbered se_id the next number in rank
Any sample code in classic asp or advice is welcome on how to get this accomplished
'score' = source table.
if (EXISTS (select * from INFORMATION_SCHEMA.TABLES where TABLE_NAME = 'result_table'))
begin
drop table result_table;
end
select
rank = IDENTITY(INT,1,1),
se_id, sum(multiply * count) as total
into result_table
from score
where status = 1
group by se_id
order by total desc, se_id;
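To sanity-check the aggregation and the tie-breaking order against the numbers in the question, here is a small sketch in Python with the built-in sqlite3 module. SQLite has no IDENTITY() function, so in this sketch the rank is assigned by enumerating the ordered rows in Python; that substitution, and the choice of Python instead of classic ASP, are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE score(S_id INTEGER, J_id INTEGER, Se_id INTEGER,'
    ' B_id INTEGER, Status INTEGER, "Count" INTEGER, multiply INTEGER)'
)
# Rows copied from the question.
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (63, 1000, 16, 12, 1, 10, 2),
        (64, 1001, 12, 16, 1, 9, 3),
        (65, 1002, 17, 12, 1, 10, 2),
        (66, 1003, 16, 12, 1, 6, 3),
        (67, 1004, 12, 16, 1, 10, 2),
    ],
)

# Sum count * multiply per Se_id; on equal totals the lower Se_id comes first.
rows = conn.execute(
    'SELECT Se_id, SUM(multiply * "Count") AS total'
    " FROM score WHERE Status = 1"
    " GROUP BY Se_id"
    " ORDER BY total DESC, Se_id"
).fetchall()

# Rank is simply the position in the ordered result.
ranked = [(rank, se_id, total) for rank, (se_id, total) in enumerate(rows, 1)]
for line in ranked:
    print(line)
```

Because the ORDER BY already puts the lower Se_id first on ties, numbering the rows in order gives the ranking rule asked for in the question.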
[Edit] Change query as answer on first comment
