If I have a table that has an index
col1, col2, col3, col4, createDate
But I would also like an index that goes
col1, col2, col3, col4, closeDate
Can I combine the two dates in one index? I believe datetime columns have to go at the end of an index, since there is little point in ordering by anything after a single point in time, or am I wrong?
Nothing like this is possible, right? Without using INCLUDE:
col1, col2, col3, col4, (createDate, closeDate)
They are both used erratically; I guess it would still do a hash join either way, so it doesn't matter that they are two separate indexes?
The query optimizer seems to pick the wrong plans, so we are trying to force the right plan with WITH (INDEX(index)) hints. But it is hard to get right.
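For reference, the two patterns being compared would look like this in T-SQL; the table and index names are hypothetical:

CREATE INDEX IX_myTable_createDate ON dbo.myTable (col1, col2, col3, col4, createDate);
CREATE INDEX IX_myTable_closeDate ON dbo.myTable (col1, col2, col3, col4, closeDate);

-- or one index that keys on one date and merely carries the other:
CREATE INDEX IX_myTable_dates ON dbo.myTable (col1, col2, col3, col4, createDate) INCLUDE (closeDate);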
Good night/day.
I've been having issues searching for a value regardless of the column. For example, if I need a specific value from col1, col3 or col4, I should only need to type the value in cell B1, whether lowercase or uppercase, but I can't find a way to solve it.
=QUERY(IMPORTRANGE(Links!C2,"Supply!A2:L"),
"SELECT Col1, Col2, Col3, Col4, Col6, Col7, Col11, Col10, Col9, Col8, Col12
WHERE lower('Col1&Col3&Col4') CONTAINS '"&LOWER(B1)&"'",0)
Instead it gives me #N/A (Error: Query completed with an empty output) when searching for a specific value in cell B1.
Thanks and have a good one, whoever can help me out!
try building the concatenated search column outside the query string (QUERY treats lower('Col1&Col3&Col4') as a plain string literal, so the match never succeeds) and referencing it as Col13:
=ARRAYFORMULA(QUERY({IMPORTRANGE(Links!C2,"Supply!A2:L"),
IMPORTRANGE(Links!C2,"Supply!A2:A")&
IMPORTRANGE(Links!C2,"Supply!C2:C")&
IMPORTRANGE(Links!C2,"Supply!D2:D")},
"select Col1,Col2,Col3,Col4,Col6,Col7,Col11,Col10,Col9,Col8,Col12
where lower(Col13) contains '"&LOWER(B1)&"'", 0))
I have one view, and I want to add pagination logic on it. There are over 1.5 million records, and it takes a long time to get results when my WHERE condition selects only the records mapped to one id.
I am thinking of getting only the mapped records from the main table first and then selecting only those records from the view; will this be faster?
Select top 10 col1, col2, col3, ROW_NUMBER() OVER (ORDER BY col4 desc) from vMyView where someid=1
Then
Select top 10 col1, col2, col3 from vMyView where col1 in (Select col1 from tMyTable where someid=1)
FYI, I am not an expert.
Assuming typical cardinality, I tend to write it more like this:
select top 10 v.col1, v.col2, v.col3
from vMyView v
inner join tMyTable t ON t.col1 = v.col1
WHERE t.someid = 1
However, if more than one row in tMyTable can match a given col1 value in vMyView, this join could duplicate rows from vMyView. If duplicate rows are possible, a solution based on ROW_NUMBER() is typically the fastest option.
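A minimal sketch of that ROW_NUMBER() variant, reusing the table and column names from the question (keep one view row per col1 value):

SELECT TOP 10 col1, col2, col3
FROM (
    SELECT v.col1, v.col2, v.col3,
           ROW_NUMBER() OVER (PARTITION BY v.col1 ORDER BY v.col1) AS rn
    FROM vMyView v
    INNER JOIN tMyTable t ON t.col1 = v.col1
    WHERE t.someid = 1
) AS dedup
WHERE rn = 1;  -- rn = 1 keeps a single copy of each otherwise-duplicated view row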
"i want to add pagination logic on this view"
As for paging, you should look into OFFSET/FETCH syntax, rather than TOP n.
SELECT col1, col2, col3
FROM vMyView v
ORDER BY <need an ORDER BY clause for paging to work>
OFFSET <pagenumber * pagesize> ROWS FETCH NEXT <pagesize> ROWS ONLY
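A runnable sketch of that pattern, with hypothetical paging parameters and a tie-breaker column so pages stay stable:

DECLARE @PageNumber int = 1;  -- 1-based page index
DECLARE @PageSize int = 10;

SELECT col1, col2, col3
FROM vMyView
WHERE someid = 1
ORDER BY col4 DESC, col1  -- deterministic order; col1 breaks ties
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;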
I am looking for an effective way to delete duplicated records from my database. First, I used a stored procedure with joins and such, which caused the query to execute very slowly. Now I am trying a different approach. Please consider the following queries:
/* QUERY A */
SELECT *
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value
This query just executed in 12 seconds, with a result of 182,400 records. The row count in the table is currently 420,930,407, and col1 and col3 are indexed.
The next query:
/* QUERY B */
WITH ALL_RECORDS AS
(SELECT id
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value)
SELECT *
FROM ALL_RECORDS
This query took less than 2 seconds, and gives me the ids of all 182,400 records in the table that match the WHERE clause.
Then my last query selects the lowest (first) id of each group of records, grouped on the columns I want to check for duplicates:
/* QUERY C */
SELECT MIN(id)
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value
GROUP BY col1,
col2,
col3,
col4,
col5,
col6
Again, this query executes in less than 2 seconds. The result is 30,400 rows, which means there are 30,400 unique records among the 182,400.
Now, I'd like to delete (or first select, to make sure I have my query right) all records that are not unique. So, I'd like to remove 182,400 - 30,400 = 152,000 records from my_table.
I thought I'd combine the last two queries: get all ids that belong to my dataset according to the WHERE clause on col1, col2 and col3 (query B), and then delete/select all records from that dataset whose id is not in the list of unique record ids (query C).
However, when I select everything from query B where query B's id is NOT IN query C, the query does not take 2, 4 or 12 (or 14 or 16) seconds, but seems to take forever (20,000 records shown after 1 minute, around 40,000 after 2 minutes; I canceled the query, since at that rate it would take about 8 minutes to find all 152,000 records).
WITH ALL_RECORDS AS
(SELECT id
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value)
SELECT id
FROM ALL_RECORDS
WHERE id NOT IN
(SELECT MIN(id)
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value
GROUP BY col1,
col2,
col3,
col4,
col5,
col6)
I know NOT IN is slow, but I can't grasp how it's THIS slow, since both queries without the NOT IN part execute in less than 2 seconds each.
Does anyone have some good advice for me on how to solve this puzzle?
------------------ Additional information ------------------
The previous solution was the following stored procedure. For some reason it executes perfectly on my acceptance environment, but not on my production environment. Currently, we have over 400 million records on production and a little over 2 million records on acceptance, so this might be the reason.
DELETE my_table
FROM my_table
LEFT OUTER JOIN
(SELECT MIN(id) AS RowId,
col1,
col2,
col3,
col4,
col5,
col6
FROM my_table
WHERE col1 = value
AND col2 = value
AND col3 = value
GROUP BY col1,
col2,
col3,
col4,
col5,
col6) AS KeepRows ON my_table.id = KeepRows.RowId
WHERE KeepRows.RowId IS NULL
AND my_table.col1 = value
AND my_table.col2 = value
AND my_table.col3 = value
I based this solution on another answer on Stack Overflow (I can't find it at the moment), but I feel I should be able to create a query based on queries B and C that executes within a few seconds...
with dupl as (
    -- number the rows within each group of potential duplicates; rn = 1 is the row to keep
    select row_number() over (partition by col1, col2, col3, col4, col5, col6 order by id) as rn,
           id, col1, col2, col3, col4, col5, col6
    from my_table
)
delete from dupl where rn > 1;
Combining two 2-second queries together will not, generally, result in a single 4-second query, because queries, unlike their underlying tables, are rarely indexed.
The usual approach for this kind of task is to cache the ids you want to keep in a temporary table, index it accordingly, and then use it in the LEFT JOIN (or NOT IN; I bet the resulting execution plans are practically the same).
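A minimal sketch of that approach, with hypothetical temp-table and index names (value stands for the same placeholders as in the question):

-- cache the ids of the rows to keep
SELECT MIN(id) AS id
INTO #keep
FROM my_table
WHERE col1 = value
  AND col2 = value
  AND col3 = value
GROUP BY col1, col2, col3, col4, col5, col6;

-- index the cached ids so the anti-join can seek instead of scan
CREATE UNIQUE CLUSTERED INDEX ix_keep ON #keep (id);

-- delete everything in the filtered set that is not a keeper
DELETE t
FROM my_table AS t
LEFT JOIN #keep AS k ON k.id = t.id
WHERE k.id IS NULL
  AND t.col1 = value
  AND t.col2 = value
  AND t.col3 = value;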
You can probably get some more performance if you play with the indexes on the main table. For example, I think an index on (col1, col2, col3) should give your code a boost (the columns do not necessarily have to be in this order; the best order usually depends on their cardinalities).
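For example (the index name is illustrative):

CREATE INDEX ix_my_table_filter ON my_table (col1, col2, col3);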
How can I most efficiently move data between similar tables (same number of columns and data types; if they are not the same, I hope it can be achieved with a view) across schemas of the same PostgreSQL database?
EDIT
Sorry for the vagueness. I intend to use the additional schemas as archives for data that is not often needed (to improve performance). To be more precise, data older than 2 years is to be archived. It is okay to take the server offline, but for no more than a day, two at most. It is accounting software for a medium-sized company. By liberal estimates, the number of records in a year won't go near a million.
insert into target_schema.table_one (col1, col2, col3)
select col1, col2, col3
from source_schema.other_table
where <some condition to select the data to be moved>;
If you really want to "move" the data (i.e. delete the rows from the source table), you also need to delete them from the source. TRUNCATE would be an option if you were emptying the whole table, but if the table is the target of a foreign key you cannot use TRUNCATE; in that case you need to use:
delete from source_schema.other_table
where <some condition to select the data to be moved>;
You can combine both steps into a single statement, if you want to:
with deleted_data as (
delete from source_schema.other_table
where <some condition to select the data to be moved>
returning *
)
insert into target_schema.table_one (col1, col2, col3)
select col1, col2, col3
from deleted_data;
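For the archiving case described in the question, the condition might look like this; the created_date column is an assumption for illustration, and select * only works if both tables have identical column order:

with deleted_data as (
  delete from source_schema.other_table
  where created_date < current_date - interval '2 years'  -- created_date is hypothetical
  returning *
)
insert into target_schema.table_one
select *
from deleted_data;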
There are two ways I've found of upserting many rows into a table with SQL Server 2008.
One of them, described at http://technet.microsoft.com/en-us/library/bb522522(v=sql.105).aspx, is to create a temp table, insert the values into it, and finally merge that table with the target table.
This doesn't seem very efficient to me, because you have to create a table, fill it, merge it into the target table, and then delete the temp table.
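For reference, that temp-table route would look roughly like this, reusing the sample rows from below (the column types are assumptions):

-- stage the incoming rows
CREATE TABLE #staging (col1 int, col2 int, col3 varchar(10), col4 varchar(10));

INSERT INTO #staging (col1, col2, col3, col4)
VALUES (12, 13, 'abc', 'zyx'),
       (11, 11, 'def', 'def'),
       (7, 10, 'jfj', 'tub');

MERGE dbo.targettable AS tgt
USING #staging AS new ON tgt.col1 = new.col1
WHEN MATCHED THEN UPDATE SET tgt.col2 = new.col2, tgt.col3 = new.col3, tgt.col4 = new.col4
WHEN NOT MATCHED THEN INSERT (col1, col2, col3, col4)
VALUES (new.col1, new.col2, new.col3, new.col4);

DROP TABLE #staging;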
The only other thing I can think of is as follows...
MERGE dbo.targettable as tgt
USING (
SELECT 12 as col1, 13 as col2, 'abc' as col3, 'zyx' as col4
UNION ALL
SELECT 11 as col1, 11 as col2, 'def' as col3, 'def' as col4
(etc etc)
UNION ALL
SELECT 7 as col1, 10 as col2, 'jfj' as col3, 'tub' as col4)
as new
ON tgt.col1=new.col1
WHEN MATCHED THEN UPDATE SET tgt.col2=new.col2, tgt.col3=new.col3, tgt.col4=new.col4
WHEN NOT MATCHED THEN INSERT (col1, col2, col3, col4)
VALUES(new.col1, new.col2, new.col3, new.col4);
Based on usr's answer I was able to find http://msdn.microsoft.com/en-us/library/bb510625.aspx
I think this is the way to do it. Could someone verify that this syntax appears correct?
MERGE dbo.targettable as tgt
USING (VALUES(12, 13, 'abc', 'zyx'), (11, 11, 'def', 'def'),(7, 10, 'jfj', 'tub'))
AS new (col1, col2, col3, col4)
ON tgt.col1=new.col1
WHEN MATCHED THEN UPDATE SET tgt.col2=new.col2, tgt.col3=new.col3, tgt.col4=new.col4
WHEN NOT MATCHED THEN INSERT (col1, col2, col3, col4)
VALUES(new.col1, new.col2, new.col3, new.col4);
Where does the data to be merged come from?
If it comes from a query, inline the query into the merge.
If it comes from the app, use table-valued parameters.
If it is generated iteratively, use a temp table or table variable.
If it is a constant, as in your example, use the VALUES clause. Don't use UNION ALL: it is more verbose, does not document the semantics as nicely, and increases query compile time, because the optimizer has to convert it to VALUES form.
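A minimal sketch of the table-valued-parameter option; the type, procedure, and parameter names are illustrative, and the column types are assumptions:

-- a table type matching the target's shape
CREATE TYPE dbo.TargetRows AS TABLE
    (col1 int PRIMARY KEY, col2 int, col3 varchar(10), col4 varchar(10));
GO

-- @rows is filled by the application (e.g. from a DataTable via ADO.NET)
CREATE PROCEDURE dbo.UpsertTarget @rows dbo.TargetRows READONLY
AS
MERGE dbo.targettable AS tgt
USING @rows AS new ON tgt.col1 = new.col1
WHEN MATCHED THEN UPDATE SET tgt.col2 = new.col2, tgt.col3 = new.col3, tgt.col4 = new.col4
WHEN NOT MATCHED THEN INSERT (col1, col2, col3, col4)
VALUES (new.col1, new.col2, new.col3, new.col4);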