I'm pretty new to databases, so forgive me if this is a silly question.
I have this query:
SELECT * FROM myTable WHERE key1 = value1 AND key2 = value2
I think the time complexity of this query is O(log(n)), with "n" the size of myTable, but I'm not sure.
Is that correct, or am I missing something?
I've got 2 tables,
[Item] with field [name] nvarchar(255)
[Transaction] with field [short_description] nvarchar(3999)
And I need to do this:
Select [Transaction].id, [Item].id
From [Transaction] inner join [Item]
on [Transaction].[short_description] like ('%' + [Item].[name] + '%')
The above works if limited to a handful of items, but unfiltered it just runs past 20 minutes and I cancel it.
I have a NC index on [name], but I cannot index [short_description] due to its length.
[Transaction] has 320,000 rows
[Item] has 42,000.
That's 13,440,000,000 combinations.
Is there a better way to perform this query?
I did poke at full-text, but I'm not really that familiar with it; the answer was not jumping out at me there.
Any advice appreciated!
Starting a comparison string with a wildcard (% or _) will NEVER use an index, and will typically be disastrous for performance. Your query will need to scan indexes rather than seek through them, so indexing won't help.
Ideally, you should have a third table that would allow a many-to-many relationship between Transaction and Item based on IDs. The design is the issue here.
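If a redesign is possible, here's a minimal sketch of that third table (the name TransactionItem and the int key types are my assumptions, not from your schema):

CREATE TABLE [TransactionItem]
(
    TransactionId int NOT NULL REFERENCES [Transaction](id),
    ItemId        int NOT NULL REFERENCES [Item](id),
    PRIMARY KEY (TransactionId, ItemId)
);

-- Once populated (e.g. by a one-off backfill), the lookup becomes a plain indexed join:
SELECT t.id, i.id
FROM [Transaction] t
INNER JOIN [TransactionItem] ti ON ti.TransactionId = t.id
INNER JOIN [Item] i ON i.id = ti.ItemId;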
After some more sleuthing I have utilized some Fulltext features.
sp_fulltext_keymappings
gives me my transaction table id, along with the FT docID
(I found out that 'doc' = text field)
sys.dm_fts_index_keywords_by_document
gives me FT documentId along with the individual keywords within it
Once I had that, the rest was simple.
Although, I do have to look into the term 'keyword' a bit more... seems that definition can be variable.
This only works because the text I am searching for has no white space.
I believe that you could tweak the FTI configuration to work with other scenarios... but I couldn't promise.
I need to look more into Fulltext.
My current 'beta' code below.
CREATE TABLE #keyMap
(
    docid INT PRIMARY KEY,
    [key] varchar(32) NOT NULL
);

DECLARE @db_id int = DB_ID(N'<database name>');
DECLARE @table_id int = OBJECT_ID(N'Transactions');

-- map each full-text document id back to the Transactions table key
INSERT INTO #keyMap
EXEC sp_fulltext_keymappings @table_id;

SELECT km.[key] AS transaction_id, i.[id] AS item_id
FROM sys.dm_fts_index_keywords_by_document(@db_id, @table_id) kbd
INNER JOIN #keyMap km
        ON km.[docid] = kbd.document_id
INNER JOIN [items] i
        ON kbd.[display_term] = i.name;
My actual version of the code includes inserting the data into a final table.
Execution time is coming in at 30 seconds, which serves my needs for now.
I have the below SQL:
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE ( BP.IsDeleted IS NULL
OR BP.IsDeleted = 0 )
GROUP BY BP.BoundProjectId
I already have an index on the BoundProducts table in this column order: (BoundProjectId, IsDeleted)
Currently this query takes around 2-3 seconds to return the result. I am trying to reduce it to zero seconds.
This query returns 25077 rows as of now.
Please provide me any ideas to improve the query.
Looking at this from a bit of a different point of view, I think your OR condition is screwing up your query. Why not rewrite it like this?
SELECT CAST(FORMAT(SUM(COALESCE(BP.InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet
, BP.BoundProjectId AS ProjectId
FROM (
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted IS NULL
UNION ALL
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted = 0
) AS BP
GROUP BY BP.BoundProjectId;
I've had better experience with UNION ALL than with OR.
I think it should work exactly the same. On top of that, I'd create this index:
CREATE NONCLUSTERED INDEX idx_BoundProducts_IsDeleted_BoundProjectId_iInstalledSubTotal
ON dbo.BoundProducts (IsDeleted, BoundProjectId)
INCLUDE (InstalledSubTotal);
It should satisfy your query conditions and allow a proper index seek. I know it's not a good idea to index bit fields, but it's worth trying.
P.S. Why not default your IsDeleted column to 0 and make it NOT NULL? Then a simple WHERE IsDeleted = 0 check would be enough, and that'd boost your query too.
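A sketch of that change, assuming every existing NULL really means "not deleted" (the constraint name is made up):

-- Backfill NULLs, then tighten the column and add the default.
-- Note: any index containing IsDeleted may have to be dropped and
-- recreated around the ALTER COLUMN, since indexed columns can't be altered.
UPDATE dbo.BoundProducts SET IsDeleted = 0 WHERE IsDeleted IS NULL;
ALTER TABLE dbo.BoundProducts ALTER COLUMN IsDeleted bit NOT NULL;
ALTER TABLE dbo.BoundProducts
    ADD CONSTRAINT DF_BoundProducts_IsDeleted DEFAULT (0) FOR IsDeleted;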
If you really want to try an index seek, it should be possible using the FORCESEEK query hint, but I don't think it's going to make it any faster.
The options I suggested last time are still valid: remove FORMAT and/or create an indexed view.
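For the indexed-view route, a minimal sketch (the view and index names are made up; indexed views need SCHEMABINDING, SUM over a non-null expression, and COUNT_BIG(*) when grouping — and if your version rejects the OR predicate, making IsDeleted NOT NULL first as suggested above removes the need for it):

CREATE VIEW dbo.vBoundProductTotals
WITH SCHEMABINDING
AS
SELECT BP.BoundProjectId,
       SUM(ISNULL(BP.InstalledSubtotal, 0)) AS TotalSoldNet,
       COUNT_BIG(*) AS RowCnt  -- required for an indexed view with GROUP BY
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted = 0 OR BP.IsDeleted IS NULL
GROUP BY BP.BoundProjectId;
GO
-- Materializes the aggregate; reads become a simple seek/scan of this index.
CREATE UNIQUE CLUSTERED INDEX IX_vBoundProductTotals
    ON dbo.vBoundProductTotals (BoundProjectId);

Reads then come straight from the view (add WITH (NOEXPAND) on non-Enterprise editions), and FORMAT, if you keep it, is applied cheaply on top of the pre-aggregated rows.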
You should also test if the problem is the query itself or just displaying the results after that, for example trying it with "select ... into #tmp". If that's fast, then the problem is not the query.
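For example, running your original query as a SELECT ... INTO makes that test concrete:

SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
       BP.BoundProjectId AS ProjectId
INTO #tmp  -- materialize server-side instead of streaming rows to the client
FROM BoundProducts BP
WHERE (BP.IsDeleted IS NULL OR BP.IsDeleted = 0)
GROUP BY BP.BoundProjectId;

If this finishes quickly, the time is going into transferring and rendering the 25,000 rows, not into the query itself.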
The index name in the screenshot is not the same as in the CREATE INDEX statement, but I assume that's just a name you changed for the question. If the scan is happening on another index, then you should include that too.
-- Holds last 30 valdates
create table #valdates(
date int
)
insert into #valdates
select distinct top (30) valuation_date
from tbsm.tbl_key_rates_summary
where valuation_date <= 20150529
order by valuation_date desc
select
sum(fv_change), sc_group, valuation_date
from
(select *
from tbsm.tbl_security_scorecards_summary
where valuation_date in (select date from #valdates)) as fact
join
(select *
from tbsm.tbl_security_classification
where sc_book = 'UC' ) as dim on fact.classification_id = dim.classification_id
group by
valuation_date, sc_group
drop table #valdates
This query takes around 40 seconds to return because the fact table has almost 13 million rows. Can I do anything about this?
There's no proper index that supports the fetch, so adding one is probably the easiest (or only) option to really improve the performance. Most likely an index like this would improve the situation a lot:
create index idx_security_scorecards_summary_1 on
tbl_security_scorecards_summary (valuation_date, classification_id)
include (fv_change)
Everything depends, of course, on how good the selectivity of the valuation_date and classification_id fields is (= how big a portion of the table needs to be fetched), and it might work better with the fields in the opposite order. The fv_change field is in the INCLUDE section so that it's part of the index structure and there's no need to fetch it from the base table.
Included fields help if the SQL has to fetch a lot of rows from the table. If the number of rows this touches is small, they might not help at all. As always with indexing, this of course slows down inserts/updates, and it is optimized for this case only, so you should look at the bigger picture too.
The select is written in a slightly strange way; not sure if that makes any difference, but you could also try the normal way to do this:
select
sum(fact.fv_change), dim.sc_group, fact.valuation_date
from
tbsm.tbl_security_scorecards_summary fact
join tbsm.tbl_security_classification dim
on fact.classification_id = dim.classification_id
where
fact.valuation_date in (select date from #valdates) and
dim.sc_book = 'UC'
group by
fact.valuation_date,
dim.sc_group
Looking at "statistics io" output should give you a good idea which table is causing the slowness, and looking at query plan to see if there's any strange operators might help to understand the situation better.
I have tables
Book:
Id | Name | ...
UrlRecord:
Id | EntityId | Entityname | Slug >> stores the id-less URL for many other tables like Category | Book | BookChapter...
So the data is huge.
EntityId => contains the Id from another table, like bookid, categoryid, chapterId...
Id   EntityId   Entityname    Slug
1    2          Category      truyen-tranh
2    2          BookChapter   chapter-one
....
SearchBookDetails stored procedure:
SELECT p.Source,
(SELECT Slug from UrlRecord url where EntityName = 'Category' and EntityId = (SELECT top(1) CategoryId from Book_Category_Mapping bc where bc.BookId = p.Id)
) as CategorySeName
FROM ....
The performance is very slow, up to 22 seconds when I include the CategorySeName clause above, because it's a heavy query.
However, I don't know how to improve the performance while still returning the CategorySeName value like above.
Your problem is the correlated subquery. This is an extremely poor technique that changes your select statement into what is basically a cursor and runs it row-by-agonizing-row. Never use one on a large data set. Use a derived table, a CTE, or a temp table instead.
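A rough sketch of that rewrite with a derived table, assuming the outer table is Book aliased as p (your FROM clause is elided) and using MIN() instead of TOP(1), since TOP(1) without an ORDER BY picks an arbitrary category anyway:

SELECT p.Source,
       url.Slug AS CategorySeName
FROM Book p
LEFT JOIN
(
    -- resolve one category per book in a single pass
    SELECT bc.BookId, MIN(bc.CategoryId) AS CategoryId
    FROM Book_Category_Mapping bc
    GROUP BY bc.BookId
) cat ON cat.BookId = p.Id
LEFT JOIN UrlRecord url
       ON url.EntityName = 'Category'
      AND url.EntityId = cat.CategoryId;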
You use EntityId to point to N other tables, like bookid, categoryid, chapterId.
That table design is wrong: it's actually impossible to set a foreign key, so you cannot enforce referential integrity.
And much worse, this will result in slow query performance, because no supporting index got created along the way, as typically happens when you set up a proper foreign-key relationship.
The query optimizer will thus come up with a very ugly execution plan, which explains why it is that slow.
If you must have an object id, you can create a view and do:
COALESCE(bookid, categoryid, chapterId) AS EntityId
but I very much doubt object_id, or EntityId as you call it, is of any use to you that way.
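A sketch of that view on a corrected design, where each entity type gets its own nullable, foreign-keyed column (all names here are hypothetical, and Category/BookChapter are assumed to have an Id primary key):

CREATE TABLE UrlRecord
(
    Id         int PRIMARY KEY,
    BookId     int NULL REFERENCES Book(Id),
    CategoryId int NULL REFERENCES Category(Id),
    ChapterId  int NULL REFERENCES BookChapter(Id),
    Slug       nvarchar(400) NOT NULL
);
GO
CREATE VIEW UrlRecordWithEntityId
AS
SELECT Id,
       COALESCE(BookId, CategoryId, ChapterId) AS EntityId,  -- the derived object id
       Slug
FROM UrlRecord;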
PS:
String comparison instead of using an id is always a bad idea:
where EntityName = 'Category'
Combining those two antipatterns makes it especially bad.
I have a huge table to work with. I want to check whether there are any records whose parent_id equals the value I pass in.
Currently I implement this with "select count(*) from mytable where parent_id = :id"; if the result is > 0, they exist.
Because this is a very huge table, and I don't care about the exact number of records, just whether any exist, I think count(*) is a bit inefficient.
How do I implement this requirement in the fastest way? I am using Oracle 10.
According to Hibernate Tips & Tricks https://www.hibernate.org/118.html#A2
it suggests writing it like this:
Integer count = (Integer) session.createQuery("select count(*) from ....").uniqueResult();
I don't know what the magic of uniqueResult() is here. Why does it make this fast?
Compared to "select 1 from mytable where parent_id = passingId and rownum < 2", which is more efficient?
An EXISTS query is the one to go for if you're not interested in the number of records:
select 'Y' from dual where exists (select 1 from mytable where parent_id = :id)
This will return 'Y' if a record exists and nothing otherwise.
[In terms of your question on Hibernate's "uniqueResult" - all this does is return a single object when there is only one object to return - instead of a set containing 1 object. If multiple results are returned the method throws an exception.]
There's no real difference between:
select 'y'
from dual
where exists (select 1
from child_table
where parent_key = :somevalue)
and
select 'y'
from mytable
where parent_key = :somevalue
and rownum = 1;
... at least in Oracle10gR2 and up. Oracle's smart enough in that release to do a FAST DUAL operation where it zeroes out any real activity against it. The second query would be easier to port if that's ever a consideration.
The real performance differentiator is whether or not the parent_key column is indexed. If it's not, then you should run something like:
select 'y'
from dual
where exists (select 1
from parent_table
where parent_key = :somevalue)
select count(*) should be lightning fast if you have an index, and if you don't, allowing the database to abort after the first match won't help much.
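That index, if it's missing, is a one-liner (the name is made up):

create index idx_mytable_parent_id on mytable (parent_id);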
But since you asked:
boolean exists = session.createQuery("select parent_id from Entity where parent_id = ?")
        .setParameter(...)
        .setMaxResults(1)
        .uniqueResult() != null;
(Some syntax errors are to be expected, since I don't have Hibernate to test against on this computer.)
For Oracle, maxResults is translated into rownum by Hibernate.
As for what uniqueResult() does, read its JavaDoc! Using uniqueResult instead of list() has no performance impact; if I recall correctly, the implementation of uniqueResult delegates to list().
First of all, you need an index on mytable.parent_id.
That should make your query fast enough, even for big tables (unless there are also a lot of rows with the same parent_id).
If not, you could write
select 1 from mytable where parent_id = :id and rownum < 2
which would return a single row containing 1, or no row at all. It does not need to count the rows, just find one and then quit. But this is Oracle-specific SQL (because of rownum), so you should rather avoid it.
For DB2 there is something like select * from mytable where parent_id = ? fetch first 1 row only. I assume that something similar exists for Oracle.
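For what it's worth, Oracle 12c and later accept the same ANSI syntax, though that doesn't help on Oracle 10:

select * from mytable where parent_id = :id fetch first 1 row only;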
This query will return 1 if any matching record exists and 0 otherwise:
SELECT COUNT(1) FROM (SELECT 1 FROM mytable WHERE parent_id = :id AND ROWNUM < 2);
It can help when you just need to check whether data exists, regardless of table size and without any performance issues.