SQL Server - indexed view with string_agg


I am trying to define an indexed view so that I can create a full-text search index on it.
The view itself is created correctly:
CREATE OR ALTER VIEW dbo.my_view WITH SCHEMABINDING AS
SELECT p.id as protector_id,
p.name as protector_name,
string_agg(cast(c.name as nvarchar(max)), ', ') as crops_names,
count_big(*) as count_big
FROM dbo.protectors p
INNER JOIN dbo.protectors_crops pc on p.id = pc.protector_id
INNER JOIN dbo.crops c on pc.crop_id = c.id
GROUP BY p.id, p.name
But when I try to create an index:
CREATE UNIQUE CLUSTERED INDEX my_view_index ON dbo.my_view (protector_id)
I get an error:
[S0001][10125] Cannot create index on view "dbo.my_view" because it uses aggregate "STRING_AGG". Consider eliminating the aggregate, not indexing the view, or using alternate aggregates. For example, for AVG substitute SUM and COUNT_BIG, or for COUNT, substitute COUNT_BIG.
The documentation doesn't say anything about STRING_AGG, nor can I find any way to replace it.

Although STRING_AGG is not listed as a disallowed element in the current documentation, it is indeed not allowed: the error message calls it out explicitly. Minimal example:
CREATE TABLE dbo.test_agg(
id int
,col varchar(10)
)
GO
CREATE VIEW dbo.vw_test_agg
WITH SCHEMABINDING
AS
SELECT
id
, STRING_AGG(col, ',') AS col
, COUNT_BIG(*) AS CountBig
FROM dbo.test_agg
GROUP BY id;
GO
CREATE UNIQUE CLUSTERED INDEX cdx ON dbo.vw_test_agg (id); -- index name is illustrative
GO
Msg 10125, Level 16, State 1, Line 21 Cannot create index on view
"tempdb.dbo.vw_test_agg" because it uses aggregate "STRING_AGG".
Consider eliminating the aggregate, not indexing the view, or using
alternate aggregates. For example, for AVG substitute SUM and
COUNT_BIG, or for COUNT, substitute COUNT_BIG.
Also, note STRING_AGG is a deterministic function so it's not disallowed for that reason:
SELECT
name
, COLUMNPROPERTY(OBJECT_ID(N'dbo.vw_test_agg'), name, 'IsDeterministic') AS IsDeterministic
FROM sys.columns AS c
WHERE
object_id = OBJECT_ID(N'dbo.vw_test_agg')
AND name = N'col';
name  IsDeterministic
col   1
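For contrast, here is a minimal sketch suggesting that the same view shape can be indexed once STRING_AGG is removed. The view name vw_test_count and the index name are made up for illustration, and the usual indexed-view SET options (for example QUOTED_IDENTIFIER ON) are assumed to be in effect.

```sql
-- Hypothetical variant of dbo.vw_test_agg without STRING_AGG.
CREATE VIEW dbo.vw_test_count
WITH SCHEMABINDING
AS
SELECT
      id
    , COUNT_BIG(*) AS CountBig   -- required when an indexed view uses GROUP BY
FROM dbo.test_agg
GROUP BY id;
GO
-- The index key uses only the GROUP BY column, so this is expected to succeed.
CREATE UNIQUE CLUSTERED INDEX cdx_vw_test_count ON dbo.vw_test_count (id);
GO
```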

Read the documentation again.
If the view definition contains a GROUP BY clause, the key of the
unique clustered index can reference only the columns specified in
the GROUP BY clause
I don't think STRING_AGG is deterministic, so that is likely another issue. I would leave the name out of the view to avoid the extra join and additional overhead. Is name unique as well, or is id the only guaranteed unique column in your first table? As it stands now, the tuple <id, name> is unique for your statement.

Related

Find Child with Parent having specific information

I am trying to find children whose parent has some specific information in different relational tables.
I have four tables as shown below
Search criteria: get every "Section" level whose parent is an "Inventory" level and which has an attached user whose name contains the letter 'a' and whose role is 'employee' (please see the LevelsUser table for the relation).
I tried a CTE (common table expression) approach to find the correct Section level, but there I have to pass the level Id as a hard-coded value, so I cannot search all Sections in the table.
WITH LevelsTree AS
(
SELECT Id, ParentLevelId, Level
FROM Levels
WHERE Level='Section' -- here I need to pass a value
UNION ALL
SELECT ls.Id, ls.ParentLevelId, ls.Level
FROM Levels ls
JOIN LevelsTree lt ON ls.Id = lt.ParentLevelId
)
SELECT * FROM LevelsTree
I need to find all sections that match the above criteria.
Please help me here.

For hierarchical checks you need to select from and then join to the same table Levels. So something like this should help you:
declare @parentLevelName varchar(20) = 'Inventory';
with cte as (
select distinct
l1.id,
l1.Level
from Levels l1
join Levels l2 on l2.id=l1.ParentLevelId
and l2.Level = @parentLevelName -- use variable instead of hardcoded `Inventory`
where l1.Level='Section' -- replace `Section` with @var containing your value
) select * from cte
join LevelUsers lu on lu.LevelId=cte.id
join Users u on u.Id = lu.UserId
and u.UserName like '%a%' -- this letter check is not efficient
join Role r on r.id=lu.RoleId and r.Role='employee'
Note, the above query selects data only from the 4 tables which you have described in the DB schema. However, your original query contains a reference to the HierarchyPosition table, which you haven't described. If you really need to include the HierarchyPosition reference, then specify how it relates to the other 4 tables.
Also note, condition and u.UserName like '%a%' used to satisfy your requirement of User name containing 'a' letter is not efficient because of the leading %, which prevents the use of indexes. Consider changing your requirements if possible to User name starts with 'a' letter. This way and u.UserName like 'a%' will allow the use of index over Users table if it exists.
HTH
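A small sketch of the difference between the two LIKE patterns, assuming a Users table with Id and UserName columns as in the query above. The index name IX_Users_UserName is hypothetical and not part of the original schema.

```sql
-- Hypothetical index on the column being searched.
CREATE NONCLUSTERED INDEX IX_Users_UserName ON Users (UserName);

-- Leading wildcard: the index cannot be used for a seek, so every name is examined.
SELECT Id, UserName FROM Users WHERE UserName LIKE '%a%';

-- Prefix match: the optimizer can typically seek to the range of names starting with 'a'.
SELECT Id, UserName FROM Users WHERE UserName LIKE 'a%';
```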

How does DISTINCT work in SQL Server 2008 R2? Are there other options? [duplicate]

I need to retrieve all rows from a table where 2 columns combined are all different. So I want all the sales that do not have any other sales that happened on the same day for the same price. The sales that are unique based on day and price will get updated to an active status.
So I'm thinking:
UPDATE sales
SET status = 'ACTIVE'
WHERE id IN (SELECT DISTINCT (saleprice, saledate), id, count(id)
FROM sales
HAVING count = 1)
But my brain hurts going any farther than that.

SELECT DISTINCT a,b,c FROM t
is roughly equivalent to:
SELECT a,b,c FROM t GROUP BY a,b,c
It's a good idea to get used to the GROUP BY syntax, as it's more powerful.
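As a rough illustration of why GROUP BY is more flexible, the sketch below keeps the same grouping as the DISTINCT query but adds an aggregate and a filter on it, which DISTINCT alone cannot express. The table t and columns a, b, c are the placeholders used above.

```sql
-- Same groups as SELECT DISTINCT a, b, c, plus a per-group row count.
SELECT a, b, c, COUNT(*) AS n
FROM t
GROUP BY a, b, c
HAVING COUNT(*) > 1;   -- keep only combinations that occur more than once
```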
For your query, I'd do it like this:
UPDATE sales
SET status='ACTIVE'
WHERE id IN
(
SELECT id
FROM sales S
INNER JOIN
(
SELECT saleprice, saledate
FROM sales
GROUP BY saleprice, saledate
HAVING COUNT(*) = 1
) T
ON S.saleprice=T.saleprice AND S.saledate=T.saledate
)

If you put together the answers so far, clean up and improve, you would arrive at this superior query:
UPDATE sales
SET status = 'ACTIVE'
WHERE (saleprice, saledate) IN (
SELECT saleprice, saledate
FROM sales
GROUP BY saleprice, saledate
HAVING count(*) = 1
);
Which is much faster than either of them. Nukes the performance of the currently accepted answer by factor 10 - 15 (in my tests on PostgreSQL 8.4 and 9.1).
But this is still far from optimal. Use a NOT EXISTS (anti-)semi-join for even better performance. EXISTS is standard SQL, has been around forever (at least since PostgreSQL 7.2, long before this question was asked) and fits the presented requirements perfectly:
UPDATE sales s
SET status = 'ACTIVE'
WHERE NOT EXISTS (
SELECT FROM sales s1 -- SELECT list can be empty for EXISTS
WHERE s.saleprice = s1.saleprice
AND s.saledate = s1.saledate
AND s.id <> s1.id -- except for row itself
)
AND s.status IS DISTINCT FROM 'ACTIVE'; -- avoid empty updates. see below
Unique key to identify row
If you don't have a primary or unique key for the table (id in the example), you can substitute with the system column ctid for the purpose of this query (but not for some other purposes):
AND s1.ctid <> s.ctid
Every table should have a primary key. Add one if you didn't have one, yet. I suggest a serial or an IDENTITY column in Postgres 10+.
Related:
In-order sequence generation
Auto increment table column
How is this faster?
The subquery in the EXISTS anti-semi-join can stop evaluating as soon as the first dupe is found (no point in looking further). For a base table with few duplicates this is only mildly more efficient. With lots of duplicates this becomes way more efficient.
Exclude empty updates
For rows that already have status = 'ACTIVE', this update would not change anything but would still insert a new row version at full cost (minor exceptions apply). Normally, you do not want this. Add another WHERE condition, as demonstrated above, to avoid this and make it even faster.
If status is defined NOT NULL, you can simplify to:
AND status <> 'ACTIVE';
The data type of the column must support the <> operator. Some types like json don't. See:
How to query a json column for empty objects?
Subtle difference in NULL handling
This query (unlike the currently accepted answer by Joel) does not treat NULL values as equal. The following two rows for (saleprice, saledate) would qualify as "distinct" (though looking identical to the human eye):
(123, NULL)
(123, NULL)
Also passes in a unique index and almost anywhere else, since NULL values do not compare equal according to the SQL standard. See:
Create unique constraint with null columns
OTOH, GROUP BY, DISTINCT or DISTINCT ON () treat NULL values as equal. Use an appropriate query style depending on what you want to achieve. You can still use this faster query with IS NOT DISTINCT FROM instead of = for any or all comparisons to make NULL compare equal. More:
How to delete duplicate rows without unique identifier
If all columns being compared are defined NOT NULL, there is no room for disagreement.
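The statements above use PostgreSQL syntax. Since the question is about SQL Server 2008 R2, here is a rough T-SQL sketch of the same NOT EXISTS idea, assuming the sales table and columns from the question. IS DISTINCT FROM is not available in that version, so the NULL-safe status check is spelled out by hand.

```sql
UPDATE s
SET status = 'ACTIVE'
FROM sales AS s
WHERE NOT EXISTS (
        SELECT 1
        FROM sales AS s1
        WHERE s1.saleprice = s.saleprice
          AND s1.saledate  = s.saledate
          AND s1.id <> s.id               -- any other row with the same price and date
      )
  AND (s.status <> 'ACTIVE' OR s.status IS NULL);   -- skip rows that are already ACTIVE
```

As in the PostgreSQL version, rows with NULL in saleprice or saledate never compare equal here, so they are treated as unique.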

The problem with your query is that when using a GROUP BY clause (which you essentially do by using distinct) you can only use columns that you group by or aggregate functions. You cannot use the column id because there are potentially different values. In your case there is always only one value because of the HAVING clause, but most RDBMS are not smart enough to recognize that.
This should work however (and doesn't need a join):
UPDATE sales
SET status='ACTIVE'
WHERE id IN (
SELECT MIN(id) FROM sales
GROUP BY saleprice, saledate
HAVING COUNT(id) = 1
)
You could also use MAX or AVG instead of MIN, it is only important to use a function that returns the value of the column if there is only one matching row.

If your DBMS doesn't support distinct with multiple columns like this:
select distinct(col1, col2) from table
A multi-column DISTINCT can generally be written safely as follows:
select distinct * from (select col1, col2 from table ) as x
This should work on most DBMSs, and it is expected to be faster than the GROUP BY solution because it avoids the grouping step.

I want to select the distinct values from one column 'GrondOfLucht' but they should be sorted in the order as given in the column 'sortering'. I cannot get the distinct values of just one column using
Select distinct GrondOfLucht,sortering
from CorWijzeVanAanleg
order by sortering
It will also give the column 'sortering' and because 'GrondOfLucht' AND 'sortering' is not unique, the result will be ALL rows.
Use GROUP BY to select the distinct values of 'GrondOfLucht', ordered by 'sortering':
SELECT GrondOfLucht
FROM dbo.CorWijzeVanAanleg
GROUP BY GrondOfLucht
ORDER BY MIN(sortering)

tuning in sql server - views

I created a view in sql server 2012, such as:
create view myview as
select mytable2.name
from mytable1 t1
join myTable2 t2
on t1.id = t2.id
I want the join between table1 and table2 to use the proper index (on id), but when I run:
select * from myview
where name = 'abcd'
I want that last select to use the index on the column 'name'.
What is the correct syntax in SQL Server, with tuning hints, to get the best run as I have described?
I want to force use of the index on id for the join only, and force the index on name when running:
select name from myview
where name = 'abcd'.
Something like
create view myview as
select mytable2.name
/* index hint name on column name */
from mytable1 t1
join myTable2 t2
/* index hint name on column id - just for join */
on t1.id = t2.id
I don't want to force the end user of the view to add hints when querying it; I just want to hand over the view with the proper index hints built in.
(Or, if that is not possible, how else can I do it?)
Need samples, please.
Thanks :)

I reckon that if you create an index on the name column, a select from the view with the where clause shown above will use it; you don't have to give any explicit query hints to make the view use the index.
Index should be something like...
Index
CREATE NONCLUSTERED INDEX [IX_MyTable2_Name]
ON [dbo].[myTable2] ([name] ASC)
GO
View Definition
CREATE VIEW myview
AS
SELECT t2.name --<-- Use the alias here since you have aliased your table in the FROM clause
FROM mytable1 t1
INNER JOIN myTable2 t2 ON t1.id = t2.id

Indexes in SqlServer are built from two sets of columns.
Create index IX on table B (Filter Columns,Sorting Columns) INCLUDE (Additional columns to be included).
And when selecting from views, the optimizer will incorporate indexes on the referenced tables.
The first set is the index key itself. Best practice is to place the columns by which you filter first, and then the columns by which you sort.
The second set (INCLUDE) holds additional columns you add to the index so that all the data you require is in the index (to prevent key lookups, depending on your table design).
In your case, the order will be
1) Go to MyTable2 by name, and get all of the matching IDs.
2) With the IDs from step 1, find the matching IDs in MyTable1.
Your indexes should be :
1) An index on Table2(Name,ID) or Table2(Name)Include(ID)
2) An index on Table1(ID)
There shouldn't be any hint used in this case.
And in general, you should avoid using hints.
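The two suggested indexes could look roughly like this. The index names and the dbo schema are assumptions; the table and column names follow the question.

```sql
-- 1) Find rows in myTable2 by name; id is included so the join needs no key lookup.
CREATE NONCLUSTERED INDEX IX_myTable2_name
    ON dbo.myTable2 (name) INCLUDE (id);

-- 2) Look up the matching ids in mytable1.
CREATE NONCLUSTERED INDEX IX_mytable1_id
    ON dbo.mytable1 (id);

-- No hint is needed: the optimizer can consider both indexes when the view is queried.
SELECT name FROM myview WHERE name = 'abcd';
```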

SQL WHERE NOT EXISTS (skip duplicates)

Hello I'm struggling to get the query below right. What I want is to return rows with unique names and surnames. What I get is all rows with duplicates
This is my sql
DECLARE @tmp AS TABLE (Name VARCHAR(100), Surname VARCHAR(100))
INSERT INTO @tmp
SELECT CustomerName,CustomerSurname FROM Customers
WHERE
NOT EXISTS
(SELECT Name,Surname
FROM @tmp
WHERE Name=CustomerName
AND Surname=CustomerSurname
GROUP BY Name,Surname )
Please can someone point me in the right direction here.
//Desperate (I tried without GROUP BY as well but get same result)

DISTINCT would do the trick.
SELECT DISTINCT CustomerName, CustomerSurname
FROM Customers
If you only want the records that really don't have duplicates (as opposed to getting duplicates represented as a single record) you could use GROUP BY and HAVING:
SELECT CustomerName, CustomerSurname
FROM Customers
GROUP BY CustomerName, CustomerSurname
HAVING COUNT(*) = 1

At first, I thought that @David's answer was what you wanted. But rereading your comments, perhaps you want all combinations of names and surnames:
SELECT n.CustomerName, s.CustomerSurname
FROM
( SELECT DISTINCT CustomerName
FROM Customers
) AS n
CROSS JOIN
( SELECT DISTINCT CustomerSurname
FROM Customers
) AS s ;

Are you doing that while your @tmp table is still empty?
If so: your entire "select" is fully evaluated before the "insert" statement, it doesn't do "run the query and add one row, insert the row, run the query and get another row, insert the row, etc."
If you want to insert unique Customers only, use that same "Customer" table in your not exists clause
SELECT c.CustomerName,c.CustomerSurname FROM Customers c
WHERE
NOT EXISTS
(SELECT 1
FROM Customers c1
WHERE c.CustomerName = c1.CustomerName
AND c.CustomerSurname = c1.CustomerSurname
AND c.Id <> c1.Id)
If you want to insert a unique set of customers, use "distinct"

Typically, if you're doing a WHERE NOT EXISTS or WHERE EXISTS, or WHERE NOT IN subquery,
you should use what is called a "correlated subquery", as in ypercube's answer above, where table aliases are used for both inside and outside tables (where inside table is joined to outside table). ypercube gave a good example.
And often, NOT EXISTS is preferred over NOT IN (unless the WHERE NOT IN is selecting from a totally unrelated table that you can't join on.)
Sometimes, if you're tempted to do a WHERE EXISTS (SELECT from a small table with no duplicate values in the column), you could do the same thing by joining the main query with that table on the column you want in the EXISTS. That is not always the best or safest solution: it might make the query slower if there are many rows in that table, and it could produce many duplicate rows if there are duplicate values for that column in the joined table. In that case you'd have to add DISTINCT to the main query, which causes it to sort the data on all columns. Not efficient at all.
And, similarly, the WHERE NOT IN or NOT EXISTS correlated subqueries can be accomplished (and give the exact same execution plan) if you LEFT OUTER JOIN the table you were going to subquery and add a WHERE ... IS NULL check on a column of the joined table.
You have to be careful using that, but you don't need a DISTINCT. Frankly, I prefer to use the WHERE NOT IN subqueries or NOT EXISTS correlated subqueries, because the syntax makes the intention clear and it's hard to go wrong.
And you do not need a DISTINCT in the SELECT inside such subqueries (correlated or not). It would be a waste of processing (and for WHERE EXISTS or WHERE IN subqueries, the SQL optimizer would ignore it anyway and just use the first value that matched for each row in the outer query). (Hope that makes sense.)
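To make the two forms concrete, here is a small sketch. The Customers table and its Id column come from the question; the Orders table and its CustomerId column are hypothetical and only serve as the second table.

```sql
-- Customers without any order, as a correlated NOT EXISTS subquery:
SELECT c.CustomerName, c.CustomerSurname
FROM Customers AS c
WHERE NOT EXISTS (
    SELECT 1
    FROM Orders AS o
    WHERE o.CustomerId = c.Id      -- correlation between inner and outer table
);

-- The same rows via LEFT OUTER JOIN with an IS NULL filter:
SELECT c.CustomerName, c.CustomerSurname
FROM Customers AS c
LEFT OUTER JOIN Orders AS o
    ON o.CustomerId = c.Id
WHERE o.CustomerId IS NULL;        -- no matching order was found
```

Neither form needs DISTINCT, because each customer row is returned at most once.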

T-SQL filtering on dynamic name-value pairs

I'll describe what I am trying to achieve:
I am passing an XML document with name-value pairs down to a stored procedure, which I put into a table variable, let's say @nameValuePairs.
I need to retrieve a list of IDs for expressions (a table) with those exact match of name-value pairs (attributes, another table) associated.
This is my schema:
Expressions table --> (expressionId, attributeId)
Attributes table --> (attributeId, attributeName, attributeValue)
After trying complicated stuff with dynamic SQL and evil cursors (which works but it's painfully slow) this is what I've got now:
--do the magic plz!
-- retrieve number of name-value pairs
SET @noOfAttributes = (select count(*) from @nameValuePairs)
select distinct
e.expressionId, a.attributeName, a.attributeValue
into
#temp
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
group by
e.expressionId, a.attributeName, a.attributeValue
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select distinct
expressionId
from
#temp
group by expressionId
having count(*) = @noOfAttributes
Can people please review and see if they can spot any problems? Is there a better way of doing this?
Any help appreciated!

I believe that this would satisfy the requirement you're trying to meet. I'm not sure how much prettier it is, but it should work and wouldn't require a temp table:
SET @noOfAttributes = (select count(*) from @nameValuePairs)
SELECT e.expressionid
FROM expressions e
LEFT JOIN (
SELECT a.attributeid
FROM attributes a
JOIN @nameValuePairs nvp ON nvp.name = a.attributeName AND nvp.value = a.attributeValue
) t ON t.attributeid = e.attributeid
GROUP BY e.expressionid
HAVING SUM(CASE WHEN t.attributeid IS NULL THEN (@noOfAttributes + 1) ELSE 1 END) = @noOfAttributes
EDIT: After doing some more evaluation, I found an issue where certain expressions would be included that shouldn't have been. I've modified my query to take that into account.

One error I see is that you have no table with an alias of b, yet you are using: a.attributeId = b.attributeId.
Try fixing that and see if it works, unless I am missing something.
EDIT: I think you just fixed this in your edit, but is it supposed to be a.attributeId = e.attributeId?

This is not a bad approach, depending on the sizes and indexes of the tables, including @nameValuePairs. If the row counts are high or it otherwise becomes slow, you may do better to put @nameValuePairs into a temp table instead, add appropriate indexes, and use a single query instead of two separate ones.
I do notice that you are putting columns into #temp that you are not using; it would be faster to exclude them (though that would mean duplicate rows in #temp). Also, your second query has both a "distinct" and a "group by" on the same columns. You don't need both, so I would drop the "distinct" (it probably won't affect performance, because the optimizer already figures this out).
Finally, #temp would probably be faster with a clustered non-unique index on expressionId (I am assuming that this is SQL 2005). You could add it after the SELECT..INTO, but it is usually as fast or faster to add it before you load. This would require you to CREATE #temp first, add the clustered index, and then use INSERT..SELECT to load it instead.
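A minimal sketch of that create-index-then-load order might look like this. The column types, the index name, and the shape of @nameValuePairs are assumptions, since the question does not show them.

```sql
-- Assumed shape of the table variable from the question.
DECLARE @nameValuePairs TABLE (name varchar(100), value varchar(100));

-- Create the temp table first (types are guesses; adjust to the real schema).
CREATE TABLE #temp (
    expressionId   int          NOT NULL,
    attributeName  varchar(100) NOT NULL,
    attributeValue varchar(100) NOT NULL
);

-- Non-unique clustered index, added before the load.
CREATE CLUSTERED INDEX IX_temp_expressionId ON #temp (expressionId);

-- Load with INSERT..SELECT instead of SELECT..INTO.
INSERT INTO #temp (expressionId, attributeName, attributeValue)
SELECT DISTINCT e.expressionId, a.attributeName, a.attributeValue
FROM expressions e
JOIN attributes a
    ON e.attributeId = a.attributeId
JOIN @nameValuePairs nvp
    ON a.attributeName = nvp.name AND a.attributeValue = nvp.value;
```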
I'll add an example of merging the queries in a minute... OK, here's one way to merge them into a single query (this should also be compatible with SQL Server 2000):
-- retrieve number of name-value pairs
SET @noOfAttributes = (select count(*) from @nameValuePairs)
-- now select the IDs I need
-- since the derived table below uses select distinct, if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select
expressionId
from
(
select distinct
e.expressionId, a.attributeName, a.attributeValue
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
) as Temp
group by expressionId
having count(*) = @noOfAttributes
