I have a jsonb column in a table. Each document in it contains an array of cameraIds. I am trying to join this data with the cameras table, a normal table where the camera ID is a column, and return unique rows from the table with the jsonb column (which is why I use a GROUP BY in my query).
Any advice on how to optimize this query for performance would be greatly appreciated.
JSONB Col Example:
{
"date": {
"end": "2018-11-02T22:00:00.000Z",
"start": "2018-11-02T14:30:00.000Z"
},
"cameraIds": [100, 101],
"networkId": 5,
"filters": [],
"includeUnprocessed": true,
"reason": "some reason",
"vehicleFilter": {
"bodyInfo": "something",
"lpInfo": "something"
}
}
Query:
select ssr.id,
a.name as user_name,
ssr.start_date,
ssr.end_date,
ssr.created_at,
ssr.payload -> 'filters' as pretty_filters,
ssr.payload -> 'reason' as reason,
ssr.payload -> 'includePlates' as include_plates,
ssr.payload -> 'vehicleFilter' -> 'bodyInfo' as vbf,
ssr.payload -> 'vehicleFilter' -> 'lpInfo' as lpInfo,
array_agg(n.name) filter (where n.organization_id = ${orgId}) as network_names,
array_agg(c.name) filter (where n.organization_id = ${orgId}) as camera_names
from
ssr
cross join jsonb_array_elements(ssr.payload -> 'cameraIds') camera_id
inner join cameras as c on c.id = camera_id::int
inner join networks as n on n.id = c.network_id
inner join accounts as a on ssr.account_id = a.id
where n.organization_id = ${someId}
and ssr.created_at between ${startDate} and ${endDat}
group by 1,2,3,4,5,6,7,8,9,10
order BY ssr.created_at desc
OFFSET 0
LIMIT 25;
Your query says:
where n.organization_id = ${someId}
But then the aggregate FILTER says:
where n.organization_id = ${orgId}
... which is a contradiction. The aggregated arrays would always be NULL (no row passes the filter), except where ${orgId} happens to be the same as ${someId}, but then the FILTER clause is useless noise. In other words, the query doesn't seem to make sense as given.
The query might make sense after dropping the aggregate FILTER clauses:
SELECT s.id
, a.name AS user_name
, s.start_date
, s.end_date
, s.created_at
, s.payload ->> 'filters' AS pretty_filters
, s.payload ->> 'reason' AS reason
, s.payload ->> 'includePlates' AS include_plates
, s.payload -> 'vehicleFilter' ->> 'bodyInfo' AS vbf
, s.payload -> 'vehicleFilter' ->> 'lpInfo' AS lpInfo
, cn.camera_names
, cn.network_names
FROM ssr s
JOIN accounts a ON a.id = s.account_id -- assuming referential integrity
CROSS JOIN LATERAL (
SELECT array_agg(c.name) AS camera_names -- sort order?
, array_agg(n.name) AS network_names -- same order? distinct?
FROM jsonb_array_elements_text(s.payload -> 'cameraIds') i(camera_id) -- table alias is s, not ssr
JOIN cameras c ON c.id = i.camera_id::int
JOIN networks n ON n.id = c.network_id
WHERE n.organization_id = ${orgId}
) cn
WHERE s.created_at BETWEEN ${startDate} AND ${endDate} -- ?
ORDER BY s.created_at DESC NULLS LAST
LIMIT 25;
Key is the LATERAL subquery, which avoids duplication of rows from ssr, so we can also drop the outer GROUP BY. Should be considerably faster.
Also note ->> instead of -> and jsonb_array_elements_text(). See:
How to turn JSON array into Postgres array?
I left some question marks at more dubious spots in the query. Notably, BETWEEN is almost always the wrong tool for timestamps. See:
Subtract hours from the now() function
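To illustrate the remark about BETWEEN, here is a minimal sketch that uses SQLite through Python as a stand-in for PostgreSQL. The events table and its values are hypothetical. BETWEEN includes both bounds, so a row sitting exactly on a boundary is counted in two adjacent windows, while a half-open range (>= lower, < upper) counts it once.

```python
import sqlite3

# Hypothetical events table; SQLite stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO events (created_at) VALUES (?)",
    [
        ("2018-11-01 12:00:00",),
        ("2018-11-02 00:00:00",),  # exactly on the boundary between two days
        ("2018-11-02 08:30:00",),
    ],
)


def count_between(lo, hi):
    # BETWEEN is inclusive on both ends.
    sql = "SELECT count(*) FROM events WHERE created_at BETWEEN ? AND ?"
    return conn.execute(sql, (lo, hi)).fetchone()[0]


def count_half_open(lo, hi):
    # Half-open range: lower bound included, upper bound excluded.
    sql = "SELECT count(*) FROM events WHERE created_at >= ? AND created_at < ?"
    return conn.execute(sql, (lo, hi)).fetchone()[0]


day1 = ("2018-11-01 00:00:00", "2018-11-02 00:00:00")
day2 = ("2018-11-02 00:00:00", "2018-11-03 00:00:00")

# Summing per-day counts: BETWEEN counts the boundary row twice.
between_total = count_between(*day1) + count_between(*day2)
half_open_total = count_half_open(*day1) + count_half_open(*day2)
print(between_total, half_open_total)
```

With three rows in the table, only the half-open version adds up to three across the two days.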
How would I filter out a table that it only includes one value for a column (it does not matter which one).
The SQL query used to create the below looks like this :
SELECT DISTINCT
S.Id AS ReferenceID,
M.NewModuleID AS ModuleId,
SM.Compulsory
FROM
Struct S
INNER JOIN
StructModule SM
ON SM.StructId = S.Id
INNER JOIN
ModuleMap M
ON M.StructId = S.Id
AND SM.ModuleId = M.OldModuleId
However, this does not return the values in the way that I need them. The returned table looks like this:
ReferenceID  NewModuleID  Compulsory
1            100          1
1            210          0
2            251          1
2            251          0
However I would like the SQL query to return a unique value for the NewModuleID field. Ideally taking the first occurrence of a value.
The relevant columns of the above tables are as follows:
Struct:
ID (INT)
StructModule:
ID (INT)
StructID (INT)
ModuleID (INT)
Compulsory (BIT)
ModuleMap:
ID (INT)
OldModuleId (INT)
StructID (INT)
NewModuleID (INT)
Your question is not very clear, but going by this statement:
However I would like the SQL query to return a unique value for the
NewModuleID field. Ideally taking the first occurrence of a value
I guess that you are looking for something like the following query.
SELECT * FROM
(
SELECT
S.Id AS ReferenceID,
M.NewModuleID AS ModuleId,
SM.Compulsory ,
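-- Ordering by the partition column would pick an arbitrary row; SM.Id as the meaning of "first" is an assumption
ROW_NUMBER() OVER(PARTITION BY S.Id, M.NewModuleID ORDER BY SM.Id) RN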
FROM
Struct S
INNER JOIN
StructModule SM
ON SM.StructId = S.Id
INNER JOIN
ModuleMap M
ON M.StructId = S.Id
AND SM.ModuleId = M.OldModuleId
)T
WHERE RN=1
Note: You don't need DISTINCT when you filter on RN = 1.
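The ROW_NUMBER() approach can be tried out with a small sketch, here in Python with SQLite standing in for SQL Server (window functions need SQLite 3.25 or later). The joined table is a hypothetical flattening of the three-table join from the question, and its seq column is an assumed tiebreaker that defines what "first occurrence" means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the already-joined rows from the question;
# seq records insertion order and is an assumption, not part of the original schema.
conn.execute(
    "CREATE TABLE joined ("
    "seq INTEGER PRIMARY KEY, ReferenceID INT, ModuleId INT, Compulsory INT)"
)
conn.executemany(
    "INSERT INTO joined (ReferenceID, ModuleId, Compulsory) VALUES (?, ?, ?)",
    [(1, 100, 1), (1, 210, 0), (2, 251, 1), (2, 251, 0)],
)

# Number the rows inside each (ReferenceID, ModuleId) group and keep only the first.
rows = conn.execute(
    """
    SELECT ReferenceID, ModuleId, Compulsory
    FROM (
        SELECT ReferenceID, ModuleId, Compulsory,
               ROW_NUMBER() OVER (
                   PARTITION BY ReferenceID, ModuleId
                   ORDER BY seq
               ) AS RN
        FROM joined
    ) AS T
    WHERE RN = 1
    ORDER BY ReferenceID, ModuleId
    """
).fetchall()
print(rows)
```

The duplicate (2, 251) pair collapses to the row that was inserted first, so no DISTINCT is needed.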
I am trying to run a SELECT query using LEFT JOIN. I get a COUNT on my second table (the table on the right side of the LEFT JOIN). This process becomes heavier as the number of records in the second table goes up. My first and second tables have a one-to-many relationship. The second table's CampaignRunId column is a foreign key to the first table's Id. This is a simplified version of my query:
SELECT a.[Id]
,a.CampaignId
,a.[Inserted] AS 'Date'
,COUNT(b.Id) AS 'Received'
FROM [CampaignRun] AS a
LEFT JOIN [CampaignRecipient] AS b
ON a.Id = b.CampaignRunId
GROUP BY
a.[Id], a.CampaignId,a.[Inserted]
HAVING
a.CampaignId = 637
ORDER BY
a.[Inserted] DESC
The number 637 is just an example for one of the records.
Is there a way to make this query run faster?
Use a sub-select to calculate Received:
SELECT a.[Id]
,a.CampaignId
,a.[Inserted] AS 'Date'
, (SELECT COUNT(*) FROM [CampaignRecipient] AS b
WHERE a.Id = b.CampaignRunId ) AS 'Received'
FROM [CampaignRun] AS a
WHERE a.CampaignId = 637
ORDER BY a.[Inserted] DESC
You have an unneeded HAVING clause here. Move the condition to the WHERE clause so that rows are filtered before they are grouped:
SELECT a.[Id]
,a.CampaignId
,a.[Inserted] AS 'Date'
,COUNT(b.Id) AS 'Received'
FROM [CampaignRun] AS a
LEFT JOIN [CampaignRecipient] AS b
ON a.Id = b.CampaignRunId
WHERE a.CampaignId = 637
GROUP BY a.[Id], a.CampaignId,a.[Inserted]
ORDER BY a.[Inserted] DESC
Also make sure that you have an index on the foreign key column CampaignRunId in the [CampaignRecipient] table. It's considered good practice.
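As a quick check that the two rewrites above return the same rows, here is a minimal sketch in Python with SQLite standing in for SQL Server. The sample data and the index name are made up for illustration, and nothing here measures performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names follow the question; the rows and index name are hypothetical.
conn.executescript(
    """
    CREATE TABLE CampaignRun (Id INTEGER PRIMARY KEY, CampaignId INT, Inserted TEXT);
    CREATE TABLE CampaignRecipient (
        Id INTEGER PRIMARY KEY,
        CampaignRunId INT REFERENCES CampaignRun(Id)
    );
    -- Index on the foreign key column, as suggested above.
    CREATE INDEX IX_CampaignRecipient_CampaignRunId
        ON CampaignRecipient (CampaignRunId);
    INSERT INTO CampaignRun VALUES
        (1, 637, '2020-01-01'), (2, 637, '2020-01-02'), (3, 500, '2020-01-03');
    INSERT INTO CampaignRecipient (CampaignRunId) VALUES (1), (1), (1), (3);
    """
)

# Variant 1: LEFT JOIN + GROUP BY, with the filter in WHERE instead of HAVING.
join_rows = conn.execute(
    """
    SELECT a.Id, a.CampaignId, a.Inserted, COUNT(b.Id) AS Received
    FROM CampaignRun AS a
    LEFT JOIN CampaignRecipient AS b ON a.Id = b.CampaignRunId
    WHERE a.CampaignId = ?
    GROUP BY a.Id, a.CampaignId, a.Inserted
    ORDER BY a.Inserted DESC
    """,
    (637,),
).fetchall()

# Variant 2: correlated sub-select, no GROUP BY needed.
subquery_rows = conn.execute(
    """
    SELECT a.Id, a.CampaignId, a.Inserted,
           (SELECT COUNT(*) FROM CampaignRecipient AS b
            WHERE b.CampaignRunId = a.Id) AS Received
    FROM CampaignRun AS a
    WHERE a.CampaignId = ?
    ORDER BY a.Inserted DESC
    """,
    (637,),
).fetchall()
print(join_rows, subquery_rows)
```

Both variants keep runs without any recipients and report a count of 0 for them.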
I need to select a random record from 3 tables and ensure I am ordering by photoOrder
Select TOP 1(a.id), a.mls_number, a.parcel_name, a.property_type, a.ownership_type, b.filename, b.photoOrder, c.county_Name
From property as a
Inner JOIN
listingPhotos as b on a.id = b.ListingID
LEFT JOIN
counties as C on a.county_name = c.id
WHERE a.isCommercial = 'True'
Order By NEWID()
So this query works, but I need to ensure that the b.filename record is ordered by b.photoOrder and thus the b.photoOrder should always be 1.
The b table (listing photos) has multiple photo files per property and I need to only select the photo that is 1st in the photo order.
Thanks
You could subquery your listingPhotos table and limit to WHERE PhotoOrder = 1:
Select TOP 1(a.id), a.mls_number, a.parcel_name, a.property_type, a.ownership_type, b.filename, b.photoOrder, c.county_Name
From property as a
Inner JOIN
(SELECT ListingID, filename, photoOrder FROM listingPhotos WHERE photoOrder = 1
) as b on a.id = b.ListingID
LEFT JOIN
counties as C on a.county_name = c.id
WHERE a.isCommercial = 'True'
Order By NEWID()
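A small sketch of the same idea, in Python with SQLite standing in for SQL Server: RANDOM() and LIMIT 1 replace NEWID() and TOP 1, and the sample rows are hypothetical. Filtering the photos to photoOrder = 1 before the join means the random pick can only land on a first photo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical versions of the tables from the question.
conn.executescript(
    """
    CREATE TABLE property (id INTEGER PRIMARY KEY, parcel_name TEXT, isCommercial TEXT);
    CREATE TABLE listingPhotos (ListingID INT, filename TEXT, photoOrder INT);
    INSERT INTO property VALUES
        (1, 'Lot A', 'True'), (2, 'Lot B', 'True'), (3, 'Lot C', 'False');
    INSERT INTO listingPhotos VALUES
        (1, 'a1.jpg', 1), (1, 'a2.jpg', 2),
        (2, 'b2.jpg', 2), (2, 'b1.jpg', 1),
        (3, 'c1.jpg', 1);
    """
)


def random_listing():
    # Restrict photos to the first one per listing, then pick one joined row at random.
    return conn.execute(
        """
        SELECT a.id, a.parcel_name, b.filename, b.photoOrder
        FROM property AS a
        INNER JOIN (
            SELECT ListingID, filename, photoOrder
            FROM listingPhotos
            WHERE photoOrder = 1
        ) AS b ON a.id = b.ListingID
        WHERE a.isCommercial = 'True'
        ORDER BY RANDOM()
        LIMIT 1
        """
    ).fetchone()


samples = [random_listing() for _ in range(50)]
print(samples[0])
```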
This query works:
select p.Nombre as Nombre, c.Nombre as Categoria, s.Nombre as Subcategoria FROM Producto as p
inner join Subcategoria as s ON p.IDSubcategoria = s.ID
inner join Categoria as c on s.IDCategoria = c.ID
group by p.Nombre, c.Nombre, s.Nombre
order by p.Nombre
But when I remove the s.Nombre on the group by statement, I get this error:
Msg 8120, Level 16, State 1, Line 1
Column 'Subcategoria.Nombre' is
invalid in the select list because it
is not contained in either an
aggregate function or the GROUP BY
clause.
Can someone explain to me a little bit what the group by function does and why it allows the query to work?
In the interest of learning! Thanks.
When you state group by p.Nombre, you are specifying that there should be exactly 1 row of output for each distinct p.Nombre. Hence, other fields in the select clause must be aggregated (so that if there are multiple rows with the same p.Nombre, they can be 'collapsed' into one value)
By grouping on p.Nombre, c.Nombre, s.Nombre, you are saying that there should be exactly 1 row of output for each distinct tuple. Hence, it works (because the fields displayed are involved in the grouping clause).
If you use a GROUP BY clause, the SELECT list can contain only:
the fields that you already use in the GROUP BY section
aggregates (min, max, count, ...) on other fields
One little example:
MyTable
FieldA FieldB
a 1
a 2
b 3
b 5
Query:
select FieldA, FieldB from MyTable group by FieldA
FieldA FieldB
a ?
b ?
Which value do you want to have in FieldB for the row a?
a -> 1 or a -> 2 or a -> 3 (1+2)
For the first you need the min(FieldB) aggregate function, for the second max(FieldB), and for the third sum(FieldB).
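The three choices can be seen side by side in a minimal sketch, using Python with SQLite as a stand-in engine and the MyTable data from the example above. Note that SQLite is more lenient than SQL Server about non-aggregated columns, so this only demonstrates the aggregates, not the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (FieldA TEXT, FieldB INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 3), ("b", 5)],
)

# One output row per distinct FieldA; each aggregate collapses
# the FieldB values of that group into a single value.
grouped = conn.execute(
    """
    SELECT FieldA, MIN(FieldB), MAX(FieldB), SUM(FieldB)
    FROM MyTable
    GROUP BY FieldA
    ORDER BY FieldA
    """
).fetchall()
print(grouped)
```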
The GROUP BY clause collapses those rows that have the same value in the columns specified in the GROUP BY clause to just one row. For any other columns in your SELECT which are not specified in the GROUP BY clause, the SQL engine needs to know what to do with those columns too by way of an aggregate function, e.g. SUM, MAX, AVG, etc. If you don't specify an aggregate function, the engine raises an error because it doesn't know which of the collapsed values to return.
E.g.
select p.Nombre as Nombre, c.Nombre as Categoria, MAX(s.Nombre) as Subcategoria FROM Producto as p -- MAX rather than SUM, assuming Nombre is a text column
inner join Subcategoria as s ON p.IDSubcategoria = s.ID
inner join Categoria as c on s.IDCategoria = c.ID
group by p.Nombre, c.Nombre
order by p.Nombre
A GROUP BY clause is only needed if you combine aggregate functions like COUNT or MAX with non-aggregated columns. As a side effect it removes duplicate rows. In your case it is simpler to remove duplicates by adding DISTINCT to the select clause and removing the GROUP BY clause altogether.
select DISTINCT p.Nombre as Nombre, c.Nombre as Categoria, s.Nombre as Subcategoria FROM Producto as p
inner join Subcategoria as s ON p.IDSubcategoria = s.ID
inner join Categoria as c on s.IDCategoria = c.ID
order by p.Nombre
How does one compare the text field of one record to all the other records in SQL server to return, for example, the top 5 most related records?
An example of the functionality I'm after are the various Related Posts plugins for Wordpress that produce a list of links to posts related to the post currently being viewed.
Cheers,
Iain
Thanks for these responses. I'm familiar with the referenced functions, but I'm not sure they do what I need. For example:
SELECT P.id, 'Product' AS Type, FT.rank, C.url + '/' + P.url AS url, longTitle, shortTitle, P.description
FROM Products P
INNER JOIN CONTAINSTABLE (Products, (longTitle, shortTitle), '"my text content"') AS FT ON P.id = FT.[key]
LEFT JOIN Product_Categories PC ON P.id = PC.productID
LEFT Join Categories C ON C.id = PC.categoryID
WHERE [primary] = 1
ORDER BY rank DESC
returns only rows with the exact phrase "my text content" - I need rows with only "text" to be returned, but at a lower rank. If I change the query as follows:
SELECT P.id, 'Product' AS Type, FT.rank, C.url + '/' + P.url AS url, longTitle, shortTitle, P.description
FROM Products P
INNER JOIN CONTAINSTABLE (Products, (longTitle, shortTitle), '"my" or "text" or "content"') AS FT ON P.id = FT.[key]
LEFT JOIN Product_Categories PC ON P.id = PC.productID
LEFT Join Categories C ON C.id = PC.categoryID
WHERE [primary] = 1
ORDER BY rank DESC
I get more rows, but rows with all three words don't appear to rank clearly higher than rows with 1 of the words.
Any further thoughts?
Here you go, from the excellent Robert Cain:
http://arcanecode.com/2007/06/28/getting-started-with-sql-server-2005-full-text-searching-part-3-%E2%80%93-using-sql/
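You need to use CONTAINSTABLE, which returns a RANK column you can sort by. It takes the table, the column list (or * for all full-text indexed columns), and the search condition:
SELECT TOP 5 [Key] FROM CONTAINSTABLE ([YourFullText], *, 'SomethingToSearch')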
ORDER BY [RANK] DESC