Postgres index for a conditional join

I have 3 tables (all simplified here)
job (id int, type char(1))
data (job_id int, owner int, creator int, value int)
user (id int)
My query is
select user.id, job.id, sum(data.value)
from job
join data on job.id = data.job_id
join user on case job.type when 'O' then data.owner else data.creator end = user.id
group by user.id, job.id
How do I create an index in Postgres which caters for the case statement in the join?
Job would have maybe a dozen rows, users 1000s and data millions.
Thanks

In your example, you don't need the table "user" to get the results:
SELECT
CASE
WHEN job.type = 'O' THEN data.owner
ELSE data.creator
END AS user_id,
job.id,
SUM(data.value)
FROM
job
JOIN data ON job.id = data.job_id -- indexes on the ids
GROUP BY 1, 2;
Edit: Assumption: There is a foreign key between "data" and "user" that checks if a user exists.
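If you do need the join to "user" (for example, to pull more user columns), one workaround is to avoid the CASE in the join condition entirely by splitting the query on job.type, so each branch joins on a plain column that an ordinary btree index can serve. This is only a sketch against the simplified tables from the question; check with EXPLAIN ANALYZE whether the planner actually uses the new indexes at your row counts:
CREATE INDEX data_owner_idx ON data (owner);
CREATE INDEX data_creator_idx ON data (creator);

SELECT u.id, j.id, SUM(d.value)
FROM job j
JOIN data d ON j.id = d.job_id
JOIN "user" u ON d.owner = u.id -- "user" is a reserved word in Postgres, hence the quotes
WHERE j.type = 'O'
GROUP BY u.id, j.id
UNION ALL
SELECT u.id, j.id, SUM(d.value)
FROM job j
JOIN data d ON j.id = d.job_id
JOIN "user" u ON d.creator = u.id
WHERE j.type <> 'O'
GROUP BY u.id, j.id;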

Seems this isn't possible. I solved my issue by refactoring the code to add a row into data for each type, along with a new "type" column, and then indexing against that.

Related

Database Index when SQL statement includes "IN" clause

I have a SQL statement which takes a really long time to execute, and I really need to improve it somehow.
select * from table where ID=1 and GROUP in
(select group from groupteam where
department= 'marketing' )
My question is: if I create an index on the columns ID and GROUP, would it help?
Or, if not, should I create an index on the second table's DEPARTMENT column?
Or should I create two indexes, one for each table?
The first table has 249,003 rows.
The second table has 900 rows in total, while the query against it returns only 2 rows.
That is why I am surprised that the response is so slow.
Thank you
You can also use EXISTS, depending on your database, like so:
select * from table t
where id = 1
and exists (
select 1 from groupteam
where department = 'marketing'
and group = t.group
)
Create a composite index or individual indexes on groupteam's department and group columns.
Create a composite index or individual indexes on table's id and group columns.
Do an explain/analyze depending on your database to review how indexes are being used by your database engine.
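For example, the indexes could look something like this (a sketch; the names are illustrative, and since table and group are reserved words in most databases you may need to quote them or use your real names):
CREATE INDEX ix_table_id_group ON "table" (ID, "GROUP");
CREATE INDEX ix_groupteam_dept_group ON groupteam (department, "group");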
Try a join instead:
select * from table t
JOIN groupteam gt
ON t.group = gt.group
where ID=1 AND gt.department= 'marketing'
Indexes on table's group and id columns and on groupteam's group column would help too.

Can you cache the results of a subquery that's used repeatedly in a stored procedure?

I have a series of queries being UNION'd together. Each query has a WHERE... IN clause that compares against the same list of IDs.
In a simplified form for example purposes it looks like this:
SELECT * FROM MyTable
WHERE AuthorUserId IN (SELECT UserId FROM Users WHERE TeamId = @teamId)
UNION
SELECT * FROM MyTable
WHERE PublisherUserId IN (SELECT UserId FROM Users WHERE TeamId = @teamId)
UNION...
and so on. @teamId is an int stored procedure parameter.
Is there a way to tell SQL Server to hold on to the result set of
SELECT UserId FROM Users WHERE TeamId = @teamId
so it doesn't fetch it for each SELECT?
One way you could do this is by capturing the results of that query in a table variable and JOINing to those results in your query:
Declare @UserIds Table (UserId Int)
Insert @UserIds (UserId)
SELECT UserId
FROM Users
WHERE TeamId = @teamId
SELECT M.*
FROM MyTable M
JOIN @UserIds U ON M.AuthorUserId = U.UserId
UNION
SELECT M.*
FROM MyTable M
JOIN @UserIds U ON M.PublisherUserId = U.UserId
UNION...
SQL Server is smart enough to reuse the results of the same query if the parameters are the same. If your subquery does not use session variables, you will be fine. You can also put an index on TeamId, which will make this query faster.
If you are still worried about it and are calling the stored procedure from application code, you could fetch that result set into a variable in code and then pass it into the SP.
A second option would be to store it in a temp table and then query it from the temp table.
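A sketch of that second option with a local temp table (the table name is just illustrative):
SELECT UserId
INTO #TeamUsers
FROM Users
WHERE TeamId = @teamId;

SELECT M.*
FROM MyTable M
JOIN #TeamUsers U ON M.AuthorUserId = U.UserId
UNION
SELECT M.*
FROM MyTable M
JOIN #TeamUsers U ON M.PublisherUserId = U.UserId;

DROP TABLE #TeamUsers;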

Unique entries over two tables in a database

I have a problem where I need to ensure that two columns, one in each of two tables in a database, are unique across both tables.
We have a database with barcode columns called uid and rid.
Table 1: T1.uid
And
Table 2: T2.rid
No barcode may appear in both table columns.
How can we ensure that?
If an insertion of a barcode into T1.uid matches an entry in
T2.rid, we want to throw an error.
The tables have been cleaned up and are in a consistent state, where the entries in
T1.uid and T2.rid are unique across both table columns.
It is not possible to insert NULL values into the tables' respective uid and rid columns (T1.uid and T2.rid).
It is not possible to create a new table for all barcodes,
because we don't have full control of the database server.
EDIT 19-02-2015
This solution cannot work for us, because we cannot make a new table
to keep track of the unique names (see table illustration).
We want to have a constraint over two columns in different tables without changing the schema.
Per the illustration, we want to make it impossible for John to exist in
T2 because he already exists in table T1. So an error must be "thrown"
when we try to insert John into T2.Name.
The reason is that we have different suppliers that insert into these tables
in different ways; if we change the schema layout, all suppliers would
need to change their database queries. The total work is just too much
if we force every supplier to make changes.
So we need something unobtrusive that doesn't require the suppliers to change
their code.
An example could be that T1.Name is unique and does not accept NULL values.
If we try to insert an existing name, like "Alan", an exception will occur
because the column only holds unique values.
But we want to check for uniqueness in T2.Name at the same time.
The newly inserted value should be unique across the two tables.
Maybe something like this:
SELECT uid FROM Table1
Where Exists (
SELECT rid FROM Table2
WHERE Table1.uid = rid )
This will show all rows from Table1 where their column uid has an equivalent in column rid of Table2.
The condition before the insertion happens could look like below. @Id is the id you need to insert the data for.
DECLARE @allowed INT;
SELECT @allowed = COUNT(*)
FROM
(
SELECT T1.uid FROM T1 WHERE T1.uid = @Id
UNION ALL
SELECT T2.rid FROM T2 WHERE T2.rid = @Id
) AS existing
WHERE
@Id IS NOT NULL;
IF @allowed = 0
BEGIN
---- insert allowed
SELECT 0;
END
Thanks to all who answered.
I have solved the problem. A trigger has been added to the database:
every time an insert or update is executed, we catch it and
check that the value(s) to be inserted don't already exist in the columns of the two
tables. If that check is successful, we execute the original query.
Otherwise we roll back the query.
http://www.codeproject.com/Articles/25600/Triggers-SQL-Server
Instead Of Triggers
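A minimal sketch of that trigger approach for one of the tables (SQL Server syntax; the trigger name is made up, and it assumes uid is the only column being inserted into T1 - a mirror-image trigger would go on T2):
CREATE TRIGGER trg_T1_check_rid ON T1
INSTEAD OF INSERT
AS
BEGIN
    -- reject the insert if any incoming barcode already exists in T2.rid
    IF EXISTS (SELECT 1 FROM inserted i JOIN T2 ON T2.rid = i.uid)
    BEGIN
        RAISERROR('Barcode already exists in T2.rid', 16, 1);
        RETURN;
    END
    -- otherwise carry out the original insert
    INSERT INTO T1 (uid)
    SELECT uid FROM inserted;
END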

Which query will execute faster, a query which uses table object or a query which uses temporary table in sql server

I have created a stored procedure in which I have used a table object and declared some columns in it. Below is the procedure:
CREATE Procedure [dbo].[usp_Security] (@CredentialsList dbo.Type_UserCredentialsList ReadOnly) As
Begin
Declare
@Result Table
(
IdentityColumn Int NOT NULL Identity (1, 1) PRIMARY KEY,
UserCredentials nVarChar(4000),
UserName nVarChar(100),
UserRole nVarChar(200),
RoleID Int,
Supervisor Char(3),
AcctMaintDecn Char(3),
EditPendInfo Char(3),
ReqInstID Char(3)
)
Insert Into @Result
Select Distinct UserCredentials, 'No', D.RoleName, D.RoleID, 'No', 'No', 'No', 'No' From @CredentialsList A
Join SecurityRepository.dbo.SecurityUsers B On CharIndex(B.DomainAccount, A.UserCredentials) > 0
Join SecurityRepository.dbo.SecurityUserRoles C On C.UserID = B.UserID
Join SecurityRepository.dbo.SecurityRoles D On D.RoleID = C.RoleID
Where D.RoleName Like 'AOT.%' And B.IsActive = 1 And D.IsActive = 1
Update A
Set A.UserName = B.UserName
From @Result A
Join @CredentialsList B On A.UserCredentials = B.UserCredentials
-- "Supervisor" Column
Update A
Set A.Supervisor = 'Yes'
From @Result A
Join SecurityRepository.dbo.SecurityUsers B On CharIndex(B.DomainAccount, A.UserCredentials) > 0
Join SecurityRepository.dbo.SecurityUserRoles C On C.UserID = B.UserID
Join SecurityRepository.dbo.SecurityRoles D On D.RoleID = C.RoleID
Where D.RoleName In ('AOT.Manager', 'AOT.Deps Ops Admin', 'AOT.Fraud Manager', 'AOT.Fulfillment Manager')
And B.IsActive = 1 And D.IsActive = 1
-- Return Result
Select * From @Result Order By UserName, UserRole
End
In the above procedure, I have made use of a table object and created a clustered index on it through the primary key.
However, if I create a temporary table and then process the above info in the SP, will it be faster than using the table object? I tried creating a separate clustered index on a column of the table object, but it does not allow me to create it, as we cannot create a standalone index on a table object.
I want to make use of a temporary table in the above stored procedure, but will it reduce the cost compared to the use of the table object?
It depends! - as always there are a lot of factors that come into play here.
A table variable tends to work best for small numbers of rows - e.g. 10, 20 rows - since it never has statistics, cannot have indices on it, and the SQL Server query optimizer will always assume it has just a single row of data. If you have too many rows in a table variable, this will badly skew the execution plan being determined.
Furthermore, the table variable doesn't participate in transaction handling, which can be a good or a bad thing - so if you insert 10 rows into a table variable inside a transaction and then roll back that transaction - those rows are still in your table variable. Just be aware of that!
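A quick way to see that behaviour for yourself:
DECLARE @t TABLE (i INT);
BEGIN TRANSACTION;
INSERT INTO @t (i) VALUES (1), (2);
ROLLBACK TRANSACTION;
SELECT COUNT(*) FROM @t; -- still returns 2: the rollback did not touch the table variable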
A temporary table works best if you intend to have rather many rows, or if you might even need to index something.
Temporary tables also behave just like regular tables in transactional processing, e.g. a transaction will affect those temporary tables.
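If you want to try the temp-table route, only the declaration really has to change; here is a sketch (the rest of the procedure stays the same apart from replacing @Result with #Result, and the extra index is just an illustration keyed to the final ORDER BY):
CREATE TABLE #Result
(
IdentityColumn Int NOT NULL Identity (1, 1) PRIMARY KEY,
UserCredentials nVarChar(4000),
UserName nVarChar(100),
UserRole nVarChar(200),
RoleID Int,
Supervisor Char(3),
AcctMaintDecn Char(3),
EditPendInfo Char(3),
ReqInstID Char(3)
)
CREATE NONCLUSTERED INDEX IX_Result_Name_Role ON #Result (UserName, UserRole)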
But again: the real way to find out is to try it and measure it - and try again and measure again.

Table Valued Parameter has slow performance because of table scan

I have an application that passes parameters to a procedure in SQL. One of the parameters is a table valued parameter containing items to include in a where clause.
Because the table valued parameter has no statistics attached to it, when I join my TVP to a table that has 2 million rows I get a very slow query.
What alternatives do I have ?
Again, the goal is to pass certain values to a procedure that will be included in a where clause:
select * from table1 where id in
(select id from @mytvp)
or
select * from table1 t1 join @mytvp
tvp on t1.id = tvp.id
Although it looks like it would need to run the query once for each row in table1, EXISTS often optimizes to be more efficient than a JOIN or an IN. So, try this:
select * from table1 t where exists (select 1 from @mytvp p where t.id=p.id)
Also, be sure that t.id is the same datatype as p.id and that t.id has an index.
You can use a temp table with an index to boost performance (assuming you have more than a couple of records in your @mytvp).
Just before you join the table, you could insert the data from the variable @mytvp into a temp table.
Here's sample code to create a temp table with an index. The primary key and unique constraints determine which columns to index on.
CREATE TABLE #temp_employee_v3
(rowID int not null identity(1,1)
,lname varchar (30) not null
,fname varchar (30) not null
,city varchar (20) not null
,state char (2) not null
,PRIMARY KEY (lname, fname, rowID)
,UNIQUE (state, city, rowID) )
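Applied to the TVP from the question, the same idea could look like this (a sketch; it assumes the TVP has a single id column, and #ids is just an illustrative name):
CREATE TABLE #ids (id INT NOT NULL PRIMARY KEY);

INSERT INTO #ids (id)
SELECT id FROM @mytvp;

SELECT t.*
FROM table1 t
JOIN #ids i ON t.id = i.id;

DROP TABLE #ids;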
I had the same issue where table-valued parameters were very slow in my context. I came up with a solution that passed the list of values as a comma separated string to the stored procedure. The procedure then made a PATINDEX(...) > 0 comparison. This was about six times faster.
As mentioned here and explained here, you can have primary key and unique constraints on the table type, e.g.
CREATE TYPE IdList AS TABLE ( Id UNIQUEIDENTIFIER NOT NULL PRIMARY KEY )
However, check whether it improves performance in your case: these indexes now exist while the TVP is being populated, which might have the opposite effect, depending on whether your input is sorted and/or whether you use more than one column.
In common with table variables, table-valued parameters have no statistics (see the section "restrictions"); the query optimiser works on the assumption that they contain only one row, which if your parameter contains a lot of rows is likely to result in an inappropriate query plan.
One way to improve your chances of a better plan is to add a statement level recompile; this should enable the optimiser to take the size of the TVP into account when selecting a plan.
select * from table1 t where exists (select 1 from @mytvp p where t.id=p.id) OPTION (RECOMPILE)
(incorporating KM's suggestion)
