I want to optimize this MySQL query further

So I started off with this query:
SELECT * FROM TABLE1 WHERE hash IN (SELECT id FROM temptable);
It took forever, so I ran an explain:
mysql> explain SELECT * FROM TABLE1 WHERE hash IN (SELECT id FROM temptable);
+----+--------------------+-----------+------+---------------+------+---------+------+------------+-------------+
| id | select_type        | table     | type | possible_keys | key  | key_len | ref  | rows       | Extra       |
+----+--------------------+-----------+------+---------------+------+---------+------+------------+-------------+
|  1 | PRIMARY            | TABLE1    | ALL  | NULL          | NULL | NULL    | NULL | 2554388553 | Using where |
|  2 | DEPENDENT SUBQUERY | temptable | ALL  | NULL          | NULL | NULL    | NULL |       1506 | Using where |
+----+--------------------+-----------+------+---------------+------+---------+------+------------+-------------+
2 rows in set (0.01 sec)
It wasn't using an index. So, my second pass:
mysql> explain SELECT * FROM TABLE1 JOIN temptable ON TABLE1.hash=temptable.hash;
+----+-------------+-----------+------+---------------+------+---------+-----------------------+------+-------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref                   | rows | Extra       |
+----+-------------+-----------+------+---------------+------+---------+-----------------------+------+-------------+
|  1 | SIMPLE      | temptable | ALL  | hash          | NULL | NULL    | NULL                  | 1506 |             |
|  1 | SIMPLE      | TABLE1    | ref  | hash          | hash | 5       | testdb.temptable.hash |  527 | Using where |
+----+-------------+-----------+------+---------------+------+---------+-----------------------+------+-------------+
2 rows in set (0.00 sec)
Can I do any other optimization?

You can gain some more speed by using a covering index, at the cost of extra space consumption. A covering index is one which can satisfy all requested columns in a query without performing a further lookup into the clustered index.
First of all, get rid of the SELECT * and explicitly select only the fields you require. Then add all the fields from the SELECT clause to the right-hand side of your composite index. For example, if your query looks like this:
SELECT first_name, last_name, age
FROM table1
JOIN temptable ON table1.hash = temptable.hash;
Then you can have a covering index that looks like this:
CREATE INDEX ix_index ON table1 (hash, first_name, last_name, age);
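To verify that the new index actually covers the query, run EXPLAIN again; when MySQL can serve table1 entirely from the index, the Extra column shows Using index. A quick check, assuming the index and column names used above:
-- Re-check the plan after creating the covering index; a covering plan
-- reports "Using index" in the Extra column for table1 (no row lookup).
EXPLAIN
SELECT first_name, last_name, age
FROM table1
JOIN temptable ON table1.hash = temptable.hash;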

Related

How to identify valid records based on column values in snowflake

I have a table as below
I want output like below
This means I have a few predefined pairs, for example:
if an employee comes from both HR_INTERNAL and HR_EXTERNAL, take only the record from HR_INTERNAL
if an employee comes from both SALES_INTERNAL and SALES_EXTERNAL, take only the record from SALES_INTERNAL
etc.
Is there a way to achieve this?
I used ROW_NUMBER to rank
ROW_NUMBER() OVER(PARTITION BY "EMPID" ORDER BY SOURCESYSTEM ASC) AS RANK_GID
I just put them in a table like this:
create or replace table predefined_pairs ( pairs ARRAY );
insert into predefined_pairs select [ 'HR_INTERNAL', 'HR_EXTERNAL' ] ;
insert into predefined_pairs select [ 'SALES_INTERNAL', 'SALES_EXTERNAL' ] ;
Then I use the following query to produce the output you wanted:
select s.sourcesystem, s.empid,
CASE WHEN COUNT(1) OVER(PARTITION BY EMPID) = 1 THEN 'ValidRecord'
WHEN p.pairs[0] IS NULL THEN 'ValidRecord'
WHEN p.pairs[0] = s.sourcesystem THEN 'ValidRecord'
ELSE 'InvalidRecord'
END RecordValidity
from source s
left join predefined_pairs p on array_contains( s.sourcesystem::VARIANT, p.pairs ) ;
+-------------------+--------+----------------+
| SOURCESYSTEM      | EMPID  | RECORDVALIDITY |
+-------------------+--------+----------------+
| HR_INTERNAL       | EMP001 | ValidRecord    |
| HR_EXTERNAL       | EMP001 | InvalidRecord  |
| SALES_INTERNAL    | EMP002 | ValidRecord    |
| SALES_EXTERNAL    | EMP002 | InvalidRecord  |
| HR_EXTERNAL       | EMP004 | ValidRecord    |
| SALES_INTERNAL    | EMP005 | ValidRecord    |
| PURCHASE_INTERNAL | EMP003 | ValidRecord    |
+-------------------+--------+----------------+
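If you want only the valid rows back rather than a label, one option is to wrap that query and filter on the computed column (a sketch reusing the same tables; the derived-table alias t is just for illustration):
-- Keep only the rows classified as valid.
select t.sourcesystem, t.empid
from (
    select s.sourcesystem, s.empid,
           case when count(1) over (partition by s.empid) = 1 then 'ValidRecord'
                when p.pairs[0] is null                       then 'ValidRecord'
                when p.pairs[0] = s.sourcesystem              then 'ValidRecord'
                else 'InvalidRecord'
           end as recordvalidity
    from source s
    left join predefined_pairs p
           on array_contains(s.sourcesystem::variant, p.pairs)
) t
where t.recordvalidity = 'ValidRecord';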

Extract into multiple columns from JSON with PostgreSQL

I have a column item_id that contains data in a JSON-like structure.
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
| id | item_id |
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
| 56711 | {itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}" |
| 56712 | {itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}" |
| 56721 | {itemID":["2704\/1#1#1356"]}" |
| 56722 | {itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}" |
| 57638 | {itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}" |
| 57638 | {itemID":["109#1#3364","110\/1#1#3364"]}" |
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
I need the last four digits before every comma (where there is one), i.e. the last 4 digits of each array element, deduplicated and split out into individual columns.
The deduplication should also apply across rows with the same id, so only one result row with id 57638 is permitted.
Here is a fiddle with a code draft that is not giving the right answer.
The desired result should look like this:
+-------+-----------+-----------+
| id    | item_id_1 | item_id_2 |
+-------+-----------+-----------+
| 56711 | 1974      |           |
| 56712 | 4220      | 4221      |
| 56721 | 1356      |           |
| 56722 | 3349      |           |
| 57638 | 3364      | 3365      |
+-------+-----------+-----------+
There can be quite a lot of 'item_id_%' columns in the results.
with the_table (id, item_id) as (
values
(56711, '{"itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}'),
(56712, '{"itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}'),
(56721, '{"itemID":["2704\/1#1#1356"]}'),
(56722, '{"itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}'),
(57638, '{"itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}'),
(57638, '{"itemID":["109#1#3365","110\/1#1#3365"]}')
)
select id
,(array_agg(itemid)) [1] itemid_1
,(array_agg(itemid)) [2] itemid_2
from (
select distinct id
,split_part(replace(json_array_elements(item_id::json -> 'itemID')::text, '"', ''), '#', 3)::int itemid
from the_table
order by 1
,2
) t
group by id
DEMO
You can unnest the json array, get the last 4 characters of each element as a number, then do conditional aggregation:
select
id,
max(val) filter(where rn = 1) item_id_1,
max(val) filter(where rn = 2) item_id_2
from (
select
id,
right(val, 4)::int val,
dense_rank() over(partition by id order by right(val, 4)::int) rn
from mytable t
cross join lateral jsonb_array_elements_text(t.item_id -> 'itemID') as x(val)
) t
group by id
You can add more conditional max()s to the outer query to handle more possible values.
Demo on DB Fiddle:
   id | item_id_1 | item_id_2
----: | --------: | --------:
56711 |      1974 |      null
56712 |      4220 |      4221
56721 |      1356 |      null
56722 |      3349 |      null
57638 |      3364 |      3365
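For instance, extending it to a third column only takes one more conditional aggregate (item_id_3 is a hypothetical name here; it simply stays null for ids with fewer than three distinct values):
-- Same query as above, with one extra conditional aggregate added.
select
  id,
  max(val) filter(where rn = 1) item_id_1,
  max(val) filter(where rn = 2) item_id_2,
  max(val) filter(where rn = 3) item_id_3
from (
  select
    id,
    right(val, 4)::int val,
    dense_rank() over(partition by id order by right(val, 4)::int) rn
  from mytable t
  cross join lateral jsonb_array_elements_text(t.item_id -> 'itemID') as x(val)
) t
group by id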

TSQL query parser in TSQL

I would like to have something like a procedure that takes a query definition as input and outputs a set of tables containing the individual elements of the query.
Searching the internet for this yields numerous results in various programming languages, but not in T-SQL itself. Is there such a resource around?
An example in order to illustrate what I mean by parser:
Input example (any query, really):
'select t1.col1,t2.col2
from table1 t1
inner join table2 t2
on t1.t2ref=t2.key'
The output, of course, will be a multitude of data. I mentioned tables, but it could be in any form, e.g. XML. Here is a VERY SIMPLISTIC and arbitrary example of decomposition for the query above:
tables_used:
+----+-----------+--------+------------+
| id | object_id | name   | alias used |
+----+-----------+--------+------------+
|  1 | 43252345  | table1 | t1         |
|  2 | 6542625   | table2 | t2         |
+----+-----------+--------+------------+
columns_used:
+----------+-------------+
| table_id | column name |
+----------+-------------+
|        1 | col1        |
|        1 | t2ref       |
|        2 | key         |
|        2 | col2        |
+----------+-------------+
joins_used:
+-----+-----+-------+-----------------+
| tb1 | tb2 | type  | on              |
+-----+-----+-------+-----------------+
|   1 |   2 | inner | t1.t2ref=t2.key |
+-----+-----+-------+-----------------+
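As far as I know, SQL Server does not expose a full T-SQL parser through T-SQL itself (the managed ScriptDom library is the usual route), but part of this decomposition can be recovered with built-in metadata functions. For example, on SQL Server 2012+ sys.dm_exec_describe_first_result_set can map each output column of an ad-hoc query back to its source table and column when browse information is requested; a minimal sketch, assuming table1 and table2 actually exist:
-- Each output column carries source_schema/source_table/source_column when
-- @include_browse_information = 1, which covers part of the columns_used idea.
SELECT name, source_schema, source_table, source_column
FROM sys.dm_exec_describe_first_result_set(
         N'select t1.col1, t2.col2
           from table1 t1
           inner join table2 t2
           on t1.t2ref = t2.[key]',
         NULL,  -- no parameter declaration
         1);    -- include browse (source) information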

How to mark selected columns of a table for display

I use PostgreSQL 10.1 and:
CREATE TABLE human
(
id ... NOT NULL,
gender ...,
height ...,
weight ...,
eye ...,
hair ...,
...
);
I have an input form through which I insert the data. I would like an elegant and proper way by which I can SELECT which columns are required to be DISPLAYED in that form, something like weight ... DISPLAYED, or eye ... NOT DISPLAYED.
One way is to make NULL correspond to DISPLAYED (when NOT NULL then display it, when NULL then do not display it) and use information_schema, but this correspondence does not make me very happy.
Another way is to:
CREATE TABLE human_column
(
id ... NOT NULL,
characteristic character varying(...),
is_displayed boolean
);
where the characteristic values are the names of the columns of the human table.
Is there a better way to attach such an attribute directly to the columns of a table? (In 51.7. pg_attribute there is a column named attoptions. Could it be used?)
specifying "options" for columns to define if they will be "displayed" or not seems a little overhead. Imagine you keep such list in human_column. To modify it you would need to update it with new is_displayed values. Then you would need to build column list to be selected in query.
When you create a view, you do the same (specify a list of columns to be displayed) and then you can just query the view, without need to dynamically build the query. Also you can always check the current list of included columns from catalog or information_schema.
The only "not cosy" feature of a view - you can't change columns in it, thus you have to drop and create it again.
drop/create view on demand looks cheaper to me then dynamically building query with list of columns on each select still.
demo:
db=# create view v as select oid,datname from pg_database;
CREATE VIEW
db=# select * from v;
oid | datname
-------+-----------
13505 | postgres
16384 | t
1 | template1
13504 | template0
16419 | o
(5 rows)
checking list of columns:
db=# select column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length from information_schema.columns where table_name = 'v';
column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length
-------------+------------------+----------------+-------------+-----------+--------------------------
oid | 1 | | YES | oid |
datname | 2 | | YES | name |
(2 rows)
same for original table:
db=# select column_name,ordinal_position,column_default,is_nullable,data_type,character_maximum_length from information_schema.columns where table_name = 'pg_database';
column_name | ordinal_position | column_default | is_nullable | data_type | character_maximum_length
---------------+------------------+----------------+-------------+-----------+--------------------------
datname | 1 | | NO | name |
datdba | 2 | | NO | oid |
encoding | 3 | | NO | integer |
datcollate | 4 | | NO | name |
datctype | 5 | | NO | name |
datistemplate | 6 | | NO | boolean |
datallowconn | 7 | | NO | boolean |
datconnlimit | 8 | | NO | integer |
datlastsysoid | 9 | | NO | oid |
datfrozenxid | 10 | | NO | xid |
datminmxid | 11 | | NO | xid |
dattablespace | 12 | | NO | oid |
datacl | 13 | | YES | ARRAY |
(13 rows)
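So "changing" which columns are displayed just means recreating the view with the new column list. A sketch against the human table from the question (the view name human_form is made up here):
-- PostgreSQL DDL is transactional, so the view can be swapped atomically.
BEGIN;
DROP VIEW IF EXISTS human_form;
CREATE VIEW human_form AS
    SELECT id, gender, height, eye   -- weight and hair are currently "not displayed"
    FROM human;
COMMIT;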

Join two Select Statements into a single row when one select has n amount of entries?

Is it possible in SQL Server to take two select statements and combine them into a single row without knowing how many entries one of the select statements returns?
I've been looking around at various join solutions, but they all seem to work on the basis that the number of columns is predetermined. I have a case here where one table has a fixed set of columns (t1) and the other table has an undetermined number of entries (t2), all of which use a key that matches one entry in t1.
+----+------+-----+
| id | name | ... |
+----+------+-----+
| 1 | John | ... |
+----+------+-----+
And
+-------------+----------------+
| activity_id | account_number |
+-------------+----------------+
|           1 | 12345467879    |
|           1 | 98765432515    |
|         ... | ...            |
|         ... | ...            |
+-------------+----------------+
The number of account numbers belonging to the first query is unknown.
After the query it would become:
+----+------+-----+----------------+------------------+-----+------------------+
| id | name | ... | account_number | account_number_2 | ... | account_number_n |
+----+------+-----+----------------+------------------+-----+------------------+
|  1 | John | ... | 12345467879    | 98765432515      | ... | ...              |
+----+------+-----+----------------+------------------+-----+------------------+
So I don't know how many account numbers could be associated with the id beforehand.
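Since the number of account numbers per id is only known at runtime, the usual pattern is to number them per key and build the column list with dynamic SQL. A sketch, assuming a person table (id, name) and an account table (activity_id, account_number) where activity_id references person.id, and SQL Server 2017+ for STRING_AGG:
-- Build one conditional aggregate per account position:
-- account_number, account_number_2, ..., account_number_n.
DECLARE @cols nvarchar(max), @sql nvarchar(max);

SELECT @cols = STRING_AGG(
           'MAX(CASE WHEN a.rn = ' + CAST(rn AS varchar(10)) + ' THEN a.account_number END) AS '
           + QUOTENAME(CASE WHEN rn = 1 THEN 'account_number'
                            ELSE 'account_number_' + CAST(rn AS varchar(10)) END),
           ', ') WITHIN GROUP (ORDER BY rn)
FROM (SELECT DISTINCT ROW_NUMBER() OVER (PARTITION BY activity_id ORDER BY account_number) AS rn
      FROM account) x;

SET @sql = N'
SELECT p.id, p.name, ' + @cols + N'
FROM person p
LEFT JOIN (SELECT activity_id, account_number,
                  ROW_NUMBER() OVER (PARTITION BY activity_id ORDER BY account_number) AS rn
           FROM account) a ON a.activity_id = p.id
GROUP BY p.id, p.name;';

EXEC sp_executesql @sql;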
