WHERE IN as Subquery in Postgres

WHERE IN as Subquery in Postgres - arrays

I have two table.
Table users has one column category_ids as array of integer integer[] which is storing as {1001,1002}
I am trying to get all categories details but it is not working.
SELECT *
FROM categories
WHERE id IN (Select category_ids from users where id=1)
When I run Select category_ids from users where id=1 I am getting result as {1001,1002}. How to make {1001,1002} as (1001,1002) or what should I change to make work above query?

You could use =ANY():
SELECT *
FROM categories
JOIN users ON categories.id =any(category_ids)
WHERE users.id = 1;
But why are you using an array?

I would use an EXISTS condition:
SELECT *
FROM categories c
WHERE EXISTS (Select *
from users u
where c.id = any(u.category_ids)
and u.id = 1);

This should possibly be the fastest option, but you should EXPLAIN ANALYZE all the answers to make sure.
SELECT *
FROM categories c
WHERE id IN (SELECT unnest(category_ids) from users where id=1)
SELECT *
FROM categories c
JOIN (SELECT unnest(category_ids) AS id from users where id=1) AS foo
ON c.id=foo.id
The first query will remove duplicates in the array due to IN(), the second will not.
I didn't test the syntax so there could be typos. Basically unnest() is a function that expands an array to a set of rows, ie it turns an array into a table that you can then use to join to other tables, or put into an IN() clause, etc. The opposite is array_agg() which is an aggregate function that returns an array of aggregated elements. These two are quite convenient.
If the optimizer turns the =ANY() query into a seq scan with a filter, then unnest() should be much faster as it should unnest the array and use nested loop index scan on categories.
You could also try this:
SELECT *
FROM categories c
WHERE id =ANY(SELECT category_ids from users where id=1)

Related

Return multiple rows with subquery in SQL select clause

Is there a way to return multiple rows in a SQL select clause, when your subquery does not have a join/key field? My query currently looks like this. I'm wanting to return list of users and a list of contracts when there is no key between the users and contracts.
There is a handful of users, but a whole lot of contracts and I'm wanting to generate a list of each contractID next to each userID.
select
userid,
(select contractid from contracts) as contractid
from users
New here, but the suggestion for a cross join did what I wanted. thanks!

You can generate all possible combinations of users with constracts by performing a CROSS JOIN. For example:
select
u.*,
c.*
from users u
cross join contracts c
You can, then, filter the result by appending WHERE <condition> as needed.

compare column with text array in postgressql

We have a PostgreSQL database where I have two tables with column text[] datatype. When I use an inner join to get details I do not see it is matching with any row.
for e.g.
create table test{
names text[],
id
}
create table test_b{
names text[],
id
}
now when I run below query,
SELECT t.* from test t inner join test_b tt
where t.names=tt.names;
I don't see any results. I even tried normal query with
SELECT * FROM test where names='{/test/test}';
It also did not work.
Any suggestions?

I suspect that you want array overlap operator &&. Also, since you don't need to return anything from test_b, using exists with a correlated subquery is more relevant than a join:
select t.*
from test t
where exists (select 1 from test_b b where t.names && b.names)

Postgresql Select from values in array

i am converting multiple rows in to a array using array_agg() function,
and i need to give that array to a select statements where condition.
My query is,
SELECT * FROM table WHERE id =
ALL(SELECT array_agg(id) FROM table WHERE some_condition)
but it gives error, how can i over come it..

the error has been cleared by type casting the array, using my query like this
SELECT * FROM table WHERE id =
ALL((SELECT array_agg(id) FROM table WHERE some_condition)::bigint[])
reference link

It seems like you are over-complicating things. As far as I can tell, your query should be equivalent to simple:
SELECT * FROM table WHERE some_condition
Or, if you are selecting from 2 different tables, use join:
SELECT table1.*
FROM table1
JOIN table2 ON table1.id = table2.id
WHERE some_condition
Not only this is simpler, it is also faster than fiddling with arrays.

T-SQL filtering on dynamic name-value pairs

I'll describe what I am trying to achieve:
I am passing down to a SP an xml with name value pairs that I put into a table variable, let's say #nameValuePairs.
I need to retrieve a list of IDs for expressions (a table) with those exact match of name-value pairs (attributes, another table) associated.
This is my schema:
Expressions table --> (expressionId, attributeId)
Attributes table --> (attributeId, attributeName, attributeValue)
After trying complicated stuff with dynamic SQL and evil cursors (which works but it's painfully slow) this is what I've got now:
--do the magic plz!
-- retrieve number of name-value pairs
SET #noOfAttributes = select count(*) from #nameValuePairs
select distinct
e.expressionId, a.attributeName, a.attributeValue
into
#temp
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
#nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
group by
e.expressionId, a.attributeName, a.attributeValue
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select distinct
expressionId
from
#temp
group by expressionId
having count(*) = #noOfAttributes
Can people please review and see if they can spot any problems? Is there a better way of doing this?
Any help appreciated!

I belive that this would satisfy the requirement you're trying to meet. I'm not sure how much prettier it is, but it should work and wouldn't require a temp table:
SET #noOfAttributes = select count(*) from #nameValuePairs
SELECT e.expressionid
FROM expression e
LEFT JOIN (
SELECT attributeid
FROM attributes a
JOIN #nameValuePairs nvp ON nvp.name = a.Name AND nvp.Value = a.value
) t ON t.attributeid = e.attributeid
GROUP BY e.expressionid
HAVING SUM(CASE WHEN t.attributeid IS NULL THEN (#noOfAttributes + 1) ELSE 1 END) = #noOfAttributes
EDIT: After doing some more evaluation, I found an issue where certain expressions would be included that shouldn't have been. I've modified my query to take that in to account.

One error I see is that you have no table with an alias of b, yet you are using: a.attributeId = b.attributeId.
Try fixing that and see if it works, unless I am missing something.
EDIT: I think you just fixed this in your edit, but is it supposed to be a.attributeId = e.attributeId?

This is not a bad approach, depending on the sizes and indexes of the tables, including #nameValuePairs. If it these row counts are high or it otherwise becomes slow, you may do better to put #namValuePairs into a temp table instead, add appropriate indexes, and use a single query instead of two separate ones.
I do notice that you are putting columns into #temp that you are not using, would be faster to exclude them (though it would mean duplicate rows in #temp). Also, you second query has both a "distinct" and a "group by" on the same columns. You don't need both so I would drop the "distinct" (probably won't affect performance, because the optimizer already figured this out).
Finally, #temp would probably be faster with a clustered non-unique index on expressionid (I am assuming that this is SQL 2005). You could add it after the SELECT..INTO, but it is usually as fast or faster to add it before you load. This would require you to CREATE #temp first, add the clustered and then use INSERT..SELECT to load it instead.
I'll add an example of merging the queries in a mintue... Ok, here's one way to merge them into a single query (this should be 2000-compatible also):
-- retrieve number of name-value pairs
SET #noOfAttributes = select count(*) from #nameValuePairs
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select
expressionId
from
(
select distinct
e.expressionId, a.attributeName, a.attributeValue
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
#nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
) as Temp
group by expressionId
having count(*) = #noOfAttributes

SQL Server 2005 Performance: Distinct or full table in WHERE IN statement

We have two Tables:
Document: id, title, document_type_id, showon_id
DocumentType: id, name
Relationship: DocumentType hasMany Documents. (Document.document_type_id = DocumentType.id)
We wish to retrieve a list of all document types for one given ShowOn_Id.
We see two possiblities:
SELECT DocumentType.*
FROM DocumentType
WHERE DocumentType.id IN (
SELECT DISTINCT Document.document_type_id FROM Document WHERE showon_id = 42
);
SELECT DocumentType.*
FROM DocumentType
WHERE DocumentType.id IN (
SELECT Document.document_type_id FROM Document WHERE showon_id = 42
);
Our question is: when and if is it better to use the DISTINCT to get the smaller record set versus retrieving the whole table and the IN statement walking the table to the first match. (We guess that's what it does ;-))
Is this different for different databases, is there a common answer?
Or is there a better way of doing it? (We are in .NET land)

You can use a join:
SELECT DISTINCT DocumentType.*
FROM DocumentType
INNER JOIN Document
ON DocumentType.id=Document.document_type_id
WHERE Document.showon_id = 42
I think it's the best way to do it.

For the best performance you should use:
SELECT DISTINCT dt.*
FROM
DocumentType dt
INNER JOIN Document d ON dt.id=d.document_type_id and d.showon_id = 42
Joins are very efficient at bridging multiple tables where as the nested query in the Where clause will need to perform a separate result selection that will filter down the From clause results. The join statement is also much more readable.
I would also put an index on showon_id, in addition to the primary keys and foreign key relationship.
My answer differs from wmasm's answer only by moving the showon_id filter up to the inner join. For MS SQL 2k5, I think the interpreter is smart enough to do this automatically, but you always want to work with the smallest result set possible. Bringing your filters up to inner join statements can limit the number of rows the query has to work with when joining many tables together. If you do this though, you should understand that this happens for every row comparison so complex filters (such as like x = '%a' or function calls) are better left for the Where clause so that the inner joins may filter out unnecessary comparisons.

Use an EXISTS. It sometimes is faster, but in my opinion, more readable than a DISTINCT and JOIN. Just for kicks, pls reply with the query plan for this query and the JOIN above, and see if anything is different (they may be optimized down to the same plan). If they are the same, I'd recommend the EXISTS as it is closer to a "plain language" description than a JOIN (because you don't want any of the data from Document, etc.)
SELECT whatever
FROM DocumentType dt
WHERE EXISTS( SELECT *
FROM Document
WHERE dt.id = document_type_id
AND showon_id = 42)
To get the query plan (ref: http://msdn.microsoft.com/en-us/library/ms180765(SQL.90).aspx), do:
SET SHOWPLAN_TEXT ON
GO
SELECT ...
GO

From my point of view it should not make any difference inside SQL Server (but who knows how this is implemented).
Think of it this way: to return the resultset the server needs to go into the Document table and retrieve all document_type_id WHERE showon_id = 42. In the process of retrieving the document_type_ids (e.g. by index seeking) it puts them into a hash table. When this process has finished the hash table will contain distinct values anyway. After that the query execution goes inside the Document_Type table, scans the primary key and probes into the hash table. Note that this depends, e.g. maybe it's more efficient to not use a hash table, when the expected row count from the Document table it low compared to Document_Type, but in general you get the same query plan as for the query wmasm just suggested.

Follow up on Matt's answer:
I've enabled the query plan and tested the following four different queries that have come up so far:
SELECT DocumentType.* FROM DocumentType WHERE DocumentType.id IN (SELECT DISTINCT Document.document_type_id FROM Document WHERE showon_id = 42);
SELECT DocumentType.* FROM DocumentType WHERE DocumentType.id IN (SELECT Document.document_type_id FROM Document WHERE showon_id = 42);
SELECT DISTINCT DocumentType.* FROM DocumentType INNER JOIN Document ON DocumentType.id=Document.document_type_id WHERE Document.showon_id = 42;
SELECT DocumentType.* FROM DocumentType WHERE EXISTS ( SELECT * FROM Document WHERE DocumentType.id=Document.document_type_id AND showon_id = 42);
The query plan for all four queries turned out to be the same:
|--Hash Match(Right Semi Join, HASH:([Document].[document_type_id])=([DocumentType].[Id]))
|--Hash Match(Inner Join, HASH:([Document].[Title], [Uniq1005])=([Document].[Title], [Uniq1005]), RESIDUAL:([Document].[Title] as [Document].[Title] = [Document].[Title] as [Document].[Title] AND [Uniq1005] = [Uniq1005]))
| |--Index Seek(OBJECT:([Document].[IX_Document_3] AS [Document]), SEEK:([Document].[showon_id]=(1)) ORDERED FORWARD)
| |--Index Scan(OBJECT:([Document].[IX_Document_1] AS [Document]))
|--Table Scan(OBJECT:([DocumentType] AS [DocumentType]))
I am not sure what every line and element means, but it seems that from the performance perspective it does not matter how you construct the query for this kind of problem...