Teradata query optimization - query-optimization

I run a query in Teradata. After a while the session is terminated with CPU > 100,000 s.
How can I optimize the query?
select a, b, c, d, e from table where (a = '55' or a='055') and date > '20180701'

Indexes are used to find rows with specific column values quickly. Without an index, the database must begin with the first row and then read through the entire table to find the relevant rows. In Teradata you would create a secondary index on a (the MySQL equivalent would be ALTER TABLE table ADD INDEX (a)):
CREATE INDEX (a) ON table;
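Beyond the index, the predicate itself can be tidied up. A minimal sketch, assuming placeholder names my_table and date_col (date itself is a reserved word) and that the column really is a DATE:

-- Collect statistics so the optimizer can cost the new index (hypothetical names)
COLLECT STATISTICS ON my_table COLUMN (a);

-- Rewrite the OR as IN and compare against an explicit date literal rather than a string
SELECT a, b, c, d, e
FROM my_table
WHERE a IN ('55', '055')
  AND date_col > DATE '2018-07-01';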

Related

How to insert data into a table such that possible extra columns in data get added to the parent table?

I'm trying to insert daily imported data into a SQL Server (2017) table. While most of the time the imported data has a fixed amount of columns, sometimes the client wants to add a new column to the data-to-be-imported.
I'm seeking a solution where, when the data gets imported (whether it is from another table, from R or from .csv's, don't mind this), SQL would automatically add the missing (extra) column to the parent table, using the provided column name and assigning NULL to all previous entries.
I've tried with both UNION ALL and BULK INSERT, but both of these require the same # of columns. I'm working with SSMS2017, R3.4.1.
Next, I tried with a staging table and modifying the UNION clause as:
SELECT * FROM Table_new
UNION ALL
SELECT Tp.*, '' FROM Table_parent Tp;
But more often than not the extra column doesn't occur, so the column dimension problem occurs again.
I also thought about running the queries from R with DBI and odbc dbWriteTable() and handling the invalid column error with TryCatch(), parsing the column name from the error message and so on, but this would be the shakiest craft I've ever done and I would prefer not to.
Ultimately I thought of adding an if clause in R and, depending on the number of newly added columns, looping to append the ", ''" parts to the SQL query to create the extra columns. I'm convinced that this is too complex a solution to this problem.
# Pseudo-R: compare the number of columns and pad the shorter side with empty columns
library(DBI)
# con: an existing DBI connection; colnames_new / colnames_parent: column name vectors
diff <- length(colnames_new) - length(colnames_parent)

if (diff == 0) {
  dbExecute(con, "INSERT INTO Table_parent SELECT * FROM Table_new;")
} else if (diff > 0) {
  # the new data has extra columns: pad the parent side of the UNION
  pad <- strrep(", ''", diff)
  dbGetQuery(con, paste0("SELECT * FROM Table_new UNION ALL SELECT Tp.*", pad, " FROM Table_parent Tp;"))
} else {
  # the parent table has extra columns: pad the new side of the UNION
  pad <- strrep(", ''", -diff)
  dbGetQuery(con, paste0("SELECT * FROM Table_parent UNION ALL SELECT Tn.*", pad, " FROM Table_new Tn;"))
}
To summarize: when inserting data to SQL table, how to (automatically) append the columns in the parent table, when necessary? Thanks!
The things in your database such as tables, columns, primary keys, foreign keys and check constraints are all part of the database schema. People design the schema before adding data to the database.
If you want to add new columns, you have to redesign your schema. When you do this you will also have to rewrite some of the CRUD procedures.
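If the new column really is required, the schema change itself is a single statement; a minimal T-SQL sketch, with a hypothetical column name and type:

-- Adding the column as NULLable means every existing row gets NULL automatically
ALTER TABLE dbo.Table_parent
    ADD extra_col VARCHAR(100) NULL;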

Columns of an index contain all PKs. Not efficient?

For example,
TABLE_A has a, b, c columns.
Column a is the PK, indexed as PK_TABLE_A, and there is an index called IDX_TABLE_A that contains (b, a) in that order.
SELECT a, b
FROM TABLE_A
WHERE a = #P1 AND b = #P2
This query will use PK_TABLE_A; the b predicate will not be used for the index lookup.
SELECT a, b
FROM TABLE_A
WHERE b = #P2
This query will use IDX_TABLE_A. But a doesn't have to be a key column; making it an included column would be more efficient.
Are there any reasonable cases where IDX_TABLE_A should have a as a key column?
Including columns in an index that do not help with locating particular rows can still improve performance of a query, by allowing values for those columns to be retrieved directly from the index record, without following a reference from the index record to the table record to obtain them. Queries whose selected columns are all included in one (or more) indexes are called "covered" queries; an index "covers" all the desired columns and the database does not need to access the table rows themselves to build the query results.
The index on (b,a) in TABLE_A might exist to speed up a query that matches on b, or possibly both b and a (these could be exact matches, range matches or other kinds), and wants to quickly return only the values of b and a in the query results, but not the values of column c.
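A minimal sketch of that alternative (SQL Server INCLUDE syntax assumed, since the question uses the term "included column"; the index name is illustrative):

-- b remains the only key column; a is carried in the leaf level purely for covering
CREATE INDEX IDX_TABLE_A_B ON TABLE_A (b) INCLUDE (a);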

Deleting old records from a very big table based on criteria

I have a table (Table-A) that contains 300 million records. I want to do a data retention activity on the basis of some criteria, so I want to delete about 200M records from the table.
Concerning performance, I planned to create a new table (Table-B) with the oldest 10M records from Table-A. Then I can select the records from Table-B which match the criteria and delete them from Table-A.
Extracting 10M records from Table-A and loading into Table-B using SQL Loader takes ~5 hours.
I already created indexes and I use parallel 32 wherever applicable.
What I wanted to know is,
Is there any better way to extract from Table-A and load it into Table-B?
Is there any better approach other than creating a temp table (Table-B)?
DBMS: Oracle 10g, PL/SQL and Shell.
Thanks.
If you want to delete 70% of the records of your table, the best way is to create a new table that contains the remaining 30% of the rows, drop the old table and rename the new table to the name of the old table. One possibility to create the new table is a create-table-as-select statement (CTAS), but there are also approaches that make the impact on the running system much smaller, e.g. you can use materialized views to select the remaining data and convert the materialized view to a table. The details of the approach depend on the requirements.
This reading and writing is much more efficient than deleting the rows of the old table.
If you delete the rows of the old table, it will probably be necessary to reorganize the old table afterwards, which again ends up writing these remaining 30% of the data.
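A minimal sketch of that keep-and-swap approach (object names and the retention criterion are placeholders; indexes, constraints and grants must be recreated before the swap):

-- Keep only the rows to retain (placeholder criterion)
CREATE TABLE table_a_keep NOLOGGING PARALLEL 32 AS
  SELECT *
  FROM   table_a
  WHERE  created_dt >= DATE '2018-01-01';

-- Recreate indexes, constraints and grants on table_a_keep here, then swap the names
DROP TABLE table_a;
RENAME table_a_keep TO table_a;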
Partitioning the table by your criteria may be an option.
Consider a case where the criterion is the month. All January data falls into the Jan partition. All February data falls into the Feb partition...
Then when it comes time to drop all the old January data, you just drop the partition.
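A hedged sketch of what monthly range partitioning could look like (column names and partition boundaries are illustrative only):

CREATE TABLE table_a_part (
  id         NUMBER,
  created_dt DATE,
  payload    VARCHAR2(100)
)
PARTITION BY RANGE (created_dt) (
  PARTITION p_2018_01 VALUES LESS THAN (DATE '2018-02-01'),
  PARTITION p_2018_02 VALUES LESS THAN (DATE '2018-03-01'),
  PARTITION p_max     VALUES LESS THAN (MAXVALUE)
);

-- Dropping a month of old data is then a metadata operation, not millions of row deletes
ALTER TABLE table_a_part DROP PARTITION p_2018_01;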
Using ROWID is the best approach, but an inline cursor can help you even more.
Insert the rows you want to keep into Table-B (INSERT INTO table_b SELECT * FROM table_a WHERE <criteria>), then truncate Table-A.
Is there any better way to extract from Table-A and to load it in Table-B? You can use parallel CTAS: create Table-B as select from Table-A. You can use compression and parallel query in one step.
Is there any better approach other than creating a temp table (Table-B)? A better approach would be partitioning of Table-A.
Probably a better approach would be partitioning of Table-A, but if not, you can try something fast and simple:
declare
  i pls_integer := 0;
begin
  for r in
  ( -- select what you want to move to the second table
    select rowid as rid,
           col1,
           col2,
           col3
      from table_a t
     where t.col < sysdate - 30  -- or other criteria
  )
  loop
    insert /*+ append */ into table_b values (r.col1, r.col2, r.col3);  -- insert it into the second table
    delete from table_a where rowid = r.rid;                            -- and delete it
    if i < 500  -- check your best commit interval
    then
      i := i + 1;
    else
      commit;
      i := 0;
    end if;
  end loop;
  commit;
end;
In the above example you move your records in small transactions of roughly 500 rows. You could optimize it using collections and bulk inserts, but I wanted to keep the code simple.
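For reference, a hedged sketch of that BULK COLLECT / FORALL variant, reusing the assumed column names and 30-day criterion from above (parallel collections are used because Oracle 10g cannot reference individual record fields inside FORALL):

declare
  type t_ridtab  is table of rowid;
  type t_col1tab is table of table_a.col1%type;
  type t_col2tab is table of table_a.col2%type;
  type t_col3tab is table of table_a.col3%type;

  l_rids  t_ridtab;
  l_col1s t_col1tab;
  l_col2s t_col2tab;
  l_col3s t_col3tab;

  cursor c_old is
    select rowid, col1, col2, col3
      from table_a
     where col < sysdate - 30;   -- same placeholder criterion as above
begin
  open c_old;
  loop
    fetch c_old bulk collect into l_rids, l_col1s, l_col2s, l_col3s limit 500;
    exit when l_rids.count = 0;

    forall i in 1 .. l_rids.count
      insert into table_b (col1, col2, col3)
      values (l_col1s(i), l_col2s(i), l_col3s(i));

    forall i in 1 .. l_rids.count
      delete from table_a where rowid = l_rids(i);

    commit;  -- one commit per 500-row batch
  end loop;
  close c_old;
end;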
I was missing an index on one of the columns I was using in the search criteria.
Apart from this, there were some indexes missing on referenced tables too.
#miracle173's answer is also good, but we have some foreign keys that might create problems if we had used that approach.
+1 to #miracle173

SQL index performance: which is better?

Is the following SQL good or bad practice from a performance perspective?
Two queries, searching by a common column:
CREATE INDEX tbl_idx ON tbl (a, b);
SELECT id, a, b
FROM tbl
WHERE a = #a
AND b = #b;
SELECT id, a, b
FROM tbl
WHERE b = #b;
This index
CREATE INDEX tbl_idx ON tbl (a, b);
Will be useful for these queries
where a= and b =
where a= and b>
where a like 'someval%' and b=
but not useful for these queries:
where b=
where a> and b=
where a like '%someval%' and b=
where isnull(a,'')= and b=
In summary, with a multicolumn index, if SQL Server is able to do a seek on the first key column then the index will be useful.
Coming to your question, the first query would benefit from the index you created, whereas the second query will tend to do a scan on this index.
There are many factors which dictate whether a seek is good or bad. In some cases SQL Server may choose not to use the available index at all, for example when the bookmark lookup cost exceeds a threshold.
References:
https://blogs.msdn.microsoft.com/craigfr/2006/07/07/seek-predicates/
https://blogs.msdn.microsoft.com/craigfr/2006/06/26/scans-vs-seeks/
https://www.youtube.com/watch?v=-m426WYclz8
If you reverse the index column order to (b, a), then the index may be useful to both queries. Furthermore, if id is the primary key implemented as a clustered index, the index will cover both queries because the clustering key is implicitly included as the row locator. Otherwise, id could be explicitly added as an included column to provide the best performance:
CREATE INDEX tbl_idx ON tbl (b, a)
INCLUDE(id);

SQL Server index on group of columns doesn't perform well on individual column seek

I have a non-clustered index on a group of columns (a, b, c, d), and I already use this index in a common query we have which filters on all four columns in the WHERE clause.
On the other hand, when I try to search on column (a) alone by simply using:
select count(*)
from table
where a = value
the performance is fine and the execution plan shows it used my index.
But when I try to search on column (d) alone by simply using:
select count(*)
from table
where d = value
the performance is bad; the execution plan shows the same index was used, but it also flags a missing index with 98% impact and suggests creating a new index on column (d).
Just for testing, I created a new index on this column and the performance became very good.
I don't want to get stuck with redundant indexes, as the table is very large (30 GB) and has about 100 million rows.
Any idea why my main index didn't perform well with all columns?
Column a data type is INT
Column d data type is TINYINT
SQL Server version is 2014 Enterprise.
Thanks.
Abed
If you have a composite index on 4 columns (A, B, C, D), then it can be used by queries which filter:
1) WHERE A = ...
2) WHERE A = ... AND B = ...
3) WHERE A = ... AND B = ... AND C = ...
4) WHERE A = ... AND B = ... AND C = ... AND D = ...
You CAN'T skip the leading portion of the index: if you filter like
WHERE B = ... AND C = ... AND D = ... (thus skipping A), performance will be BAD.
TRY creating separate indexes on each column; they are more flexible.
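As a concrete illustration of that last suggestion, a minimal sketch (the index and table names are assumptions, not from the question):

-- A narrow index so a filter on d alone can seek instead of scanning the wide index
CREATE NONCLUSTERED INDEX IX_big_table_d ON dbo.big_table (d);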
