Update row_number value in view joined to different table - sql-server

I created a view in SQL Server that includes a row_number() function. The table referenced in the view contains every record in my database, and the view numbers the records within each group of duplicate composite IDs. For example:
row_number() over (
partition by
composite_id
order by
sample_value) as rownum
The issue is that whenever I join this view to another table (or filter its rows with a WHERE clause), the row number is still the value computed over the full table referenced in the view. Instead, I'd like the row number to be recalculated over the records that end up in the final result set.
For example:
select *
from my_created_view a
where a.sample_value in ('a','b','c')
or
select *
from my_created_view a
inner join subset_of_data b on a.sample_value = b.sample_value
...where either query above would result in a smaller number of records than are contained in the full original table and the resulting set of composite_id would sometimes contain only one instance. In cases where the result set contains only one instance of composite_id, I'd like that row to receive a value of 1.
Is this possible? Or does row numbering within a view create a row number that's tied only to the query within the created view?
Thanks in advance for any light you can shed here!
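One way to picture what is going on: a window function inside a view is evaluated over the rows the view itself produces, and an outer WHERE or JOIN only filters rows that have already been numbered. A minimal sketch, assuming the view exposes composite_id and sample_value as in the question, is to compute a second row number in the outer query, where it is evaluated after the filter. The column name filtered_rownum is hypothetical.

```sql
-- Renumber after filtering: this ROW_NUMBER() runs over the rows that
-- survive the WHERE clause, not over the view's full row set.
SELECT a.*,
       ROW_NUMBER() OVER (
           PARTITION BY a.composite_id
           ORDER BY a.sample_value) AS filtered_rownum  -- hypothetical name
FROM my_created_view AS a
WHERE a.sample_value IN ('a', 'b', 'c');

-- The same idea with a join: numbering happens after the join is applied.
SELECT a.*,
       ROW_NUMBER() OVER (
           PARTITION BY a.composite_id
           ORDER BY a.sample_value) AS filtered_rownum
FROM my_created_view AS a
INNER JOIN subset_of_data AS b
    ON a.sample_value = b.sample_value;
```

With this shape, a composite_id that appears only once in the result gets filtered_rownum = 1, while the view's own rownum column keeps its full-table value.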

Related

Create column to constantly update with rownum in SQL Server

I have a table in a SQL Server database with a column testrownum. I want testrownum to be set to the row number automatically whenever a row is created in the table.
Is there any setting that I can turn on to achieve this?
Perhaps it is best to get a row number at the time you SELECT your data, by using the ROW_NUMBER() function.
As others have pointed out (comments section) the data in a table is essentially unordered. You can however assign a row number for a certain ordering when you select data as follows (suppose a table that only has one column being name):
SELECT
name,
rownr=ROW_NUMBER() OVER(ORDER BY name)
FROM
name_table
You can create a trigger.
A trigger is a routine that is called every time data is inserted into, updated in, or deleted from a table (you specify which of these events fire it when you create the trigger). So you can add two triggers, one for insert and one for delete, and just increment or decrement the value you want.
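A rough sketch of the trigger idea, assuming a hypothetical table dbo.test_table with an identity key id and the testrownum column from the question (the table name, the id column, and the trigger name are assumptions, not from the original post). Rather than incrementing and decrementing, this variant uses a single trigger that renumbers after each insert or delete:

```sql
-- Hypothetical table for illustration; only testrownum comes from the question.
CREATE TABLE dbo.test_table (
    id         INT IDENTITY(1, 1) PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    testrownum INT NULL
);
GO  -- batch separator (SSMS/sqlcmd); CREATE TRIGGER must start its own batch

CREATE TRIGGER trg_test_table_rownum
ON dbo.test_table
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Renumber every row by id; only rows whose number changed are written.
    WITH numbered AS (
        SELECT testrownum,
               ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM dbo.test_table
    )
    UPDATE numbered
    SET testrownum = rn
    WHERE testrownum IS NULL OR testrownum <> rn;
END;
```

Renumbering the whole table on every change gets expensive as the table grows, which is one reason computing ROW_NUMBER() at SELECT time is usually the simpler choice.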

Order BY is not supported in view in sql server

I am trying to create a view in SQL Server:
create view distinct_product as
select distinct name from stg_user_dtlprod_allignmnt_vw order by name;
This raises an error. The error message is:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.
Please help me understand what I am doing wrong.
You could use TOP with a number that is greater than the number of records:
CREATE VIEW [dbo].[distinct_product]
AS
SELECT DISTINCT TOP 10000000 name
FROM stg_user_dtlprod_allignmnt_vw
ORDER BY name
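You cannot use TOP 100 PERCENT since the optimizer recognizes that TOP 100 PERCENT qualifies all rows and does not need to be computed at all, so the ORDER BY wouldn't be guaranteed. Note that even with a large TOP value, the ORDER BY inside the view only determines which rows TOP returns; SQL Server guarantees the order of a result set only when the outermost query has its own ORDER BY.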
A view cannot be sorted with an ORDER BY clause. You need to put the ORDER BY clause into any query that references the view.
A view is not materialized - the data isn't stored, so how could it be sorted? A view is kind of like a stored procedure that just contains a SELECT with no parameters... it doesn't hold data, it just holds the definition of the query. Since different references to the view could need data sorted in different ways, the way that you do this - just like selecting from a table, which is also an unsorted collection of rows, by definition - is to include the order by on the outer query.
As the message states, you can't order a view like that when it's created, unless you follow the other answers from Tim / Raphael. You can, however, order the results selected from a view.
First, create the view:
create view distinct_product as
select distinct name
from stg_user_dtlprod_allignmnt_vw
Then order it when you retrieve data:
select *
from distinct_product
order by name

Hierarchical SQL select-query

I'm using MS SQL Server 2008. I have a table 'Users' with a key field ID of type bigint and a varchar field Parents that encodes the whole chain of the user's parent IDs.
For example:
User table:
ID | Parents
1 | null
2 | ..
3 | ..
4 | 3,2,1
Here user 1 has no parents and user 4 has a chain of parents 3->2->1. I created a function which parses the user's Parents field and returns a result table of user IDs of type bigint.
Now I need a query which will select and join the IDs of some requested users and their parents (the order of the users and their parents is not important). I'm not an SQL expert, so all I could come up with is the following:
WITH CTE AS (
    SELECT
        ID,
        Parents
    FROM
        [Users]
    WHERE
        [Users].Name = 'John'
    UNION ALL
    SELECT
        [Users].Id,
        [Users].Parents
    FROM [Users], CTE
    WHERE
        [Users].ID IN (SELECT * FROM GetUserParents(CTE.ID, CTE.Parents))
)
SELECT * FROM CTE
And basically it works, but the performance of this query is very poor. I believe the WHERE .. IN .. expression is the bottleneck. As I understand it, instead of just joining the first subquery of the CTE (the IDs of the found users) with the results of GetUserParents (the IDs of the users' parents), it has to enumerate all users in the Users table and check whether each of them is part of the function's result. Judging by the execution plan, SQL Server performs a distinct sort of the result to speed up the WHERE .. IN .. check, which is logical in itself but not required for my goal, and this distinct sort takes 70% of the query's execution time. So I wonder how this query could be improved, or whether somebody could suggest another approach to this problem altogether?
Thanks for any help!
The recursive query in the question looks redundant since you already form the list of IDs needed in GetUserParents. Maybe change this into SELECT from Users and GetUserParents() with WHERE/JOIN.
select Users.*
from Users join
    (select ParentId
     from (SELECT * FROM Users where Users.Name='John') as U
         cross apply [GetUserParents](U.ID, U.Parents))
    as gup
    on Users.ID = gup.ParentId
Since GetUserParents expects scalars and select... where produces a table, we need to apply the function to each row of the table (even if we "know" there's only one). That's what apply does.
I used indents to emphasize the conceptual parts of the query. (select...) as gup is the entity Users is join'd with; (select...) as U cross apply fn() is the argument to FROM.
The key to understanding this query is knowing how cross apply works:
- It's part of the FROM clause (quite unexpectedly, so the syntax is documented under FROM (Transact-SQL)).
- It transforms the table expression to its left, and the result becomes the argument for the FROM (I emphasized this with indentation).
The transformation is: for each row, it
- runs the table expression to its right (in this case, a call of a table-valued function), using this row;
- adds to the result set the columns from the row, followed by the columns from the call. (In our case, the table returned from the function has a single column named ParentId.)
So, if the call returns multiple rows, the added records will be the same row from the table, repeated once for each row from the function.
This is a cross apply, so rows will only be added if the function returns anything. With the other flavor, outer apply, a single row would be added anyway, with NULL in the function's column if it returned nothing.
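To make the difference between the two flavors concrete, here is a small sketch that calls the question's GetUserParents(ID, Parents) function both ways. It assumes, as the answer states, that the function returns a single column named ParentId.

```sql
-- CROSS APPLY: a user whose Parents value yields no rows (for example
-- user 1 with NULL parents) disappears from the result entirely.
SELECT U.ID, gup.ParentId
FROM Users AS U
CROSS APPLY GetUserParents(U.ID, U.Parents) AS gup
WHERE U.Name = 'John';

-- OUTER APPLY: the same user is kept, with ParentId returned as NULL.
SELECT U.ID, gup.ParentId
FROM Users AS U
OUTER APPLY GetUserParents(U.ID, U.Parents) AS gup
WHERE U.Name = 'John';
```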
This "parsing" approach violates even 1NF. Make the Parents field contain only the immediate parent (preferably as a foreign key); then an entire subtree can be retrieved with a recursive query.
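As a sketch of what that could look like, assume the Users table gets a hypothetical nullable column ParentId (bigint, referencing Users.ID) in place of the Parents string. The chain of parents for a given user can then be walked with a recursive CTE and no parsing function:

```sql
-- Assumes a hypothetical Users.ParentId column holding only the immediate parent.
WITH Ancestors AS (
    -- Anchor: the requested user(s)
    SELECT u.ID, u.ParentId
    FROM Users AS u
    WHERE u.Name = 'John'

    UNION ALL

    -- Recursive step: move from each row found so far to its parent
    SELECT p.ID, p.ParentId
    FROM Users AS p
    JOIN Ancestors AS a
        ON p.ID = a.ParentId
)
SELECT ID
FROM Ancestors;
```

The recursion stops on its own when it reaches a user whose ParentId is NULL, because the join then finds no further rows.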

SQL Server indexed calculated column that sums another table

I'd like to effectively add a calculated column, which sums a column from selected rows in another table. I need to quickly retrieve and search for values in the calculated column without re-computing the sum.
The calculated column I'd like to add would look like this in Dream-SQL:
ALTER TABLE Invoices ADD Balance
AS SUM(Transactions.Amount) WHERE Transactions.InvoiceId = Invoices.Id
Of course, this doesn't work. My understanding is that you can't add a calculated column that references another table. However, it appears that an indexed view can contain such a column.
The project is based on Entity Framework Code First. The application needs to quickly find non-zero balances.
Assuming an indexed view is the way to go, what is the best approach to integrating it with the Invoices and Transactions tables to make it easy to use with LINQ to Entities? Should the indexed view contain all the columns in the Invoices table or just the Balance (what gets persisted)? A code snippet of the SQL to create the recommended view and index would be helpful.
An indexed view won't work because it would only index expressions in the GROUP BY clause, which means it can't index the sum. A computed column won't work because the sum can't be persisted or indexed.
A trigger works, however:
CREATE TRIGGER UpdateInvoiceBalance ON Transactions AFTER INSERT, UPDATE AS
IF UPDATE(Amount) BEGIN
    SET NOCOUNT ON;
    WITH InvoiceBalances AS (
        SELECT Transactions.InvoiceId, SUM(Transactions.Amount) AS Balance
        FROM Transactions
        -- IN rather than JOIN: a multi-row insert for one invoice would otherwise
        -- duplicate each transaction row and inflate the sum
        WHERE Transactions.InvoiceId IN (SELECT InvoiceId FROM inserted)
        GROUP BY Transactions.InvoiceId)
    UPDATE Invoices
    SET Balance = InvoiceBalances.Balance
    FROM InvoiceBalances
    WHERE Invoices.Id = InvoiceBalances.InvoiceId
END
It also helps to provide a default value of 0 for the Balance column since when you mark it as DatabaseGeneratedOption.Computed, EF won't provide any value for it when adding an Invoice row.
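A minimal sketch of the column setup this implies, assuming Balance is stored as DECIMAL(18, 2) and using hypothetical constraint and index names. Because the trigger maintains Balance as an ordinary stored column, it can be indexed like any other column, and a filtered index is one way to serve the "find non-zero balances" requirement from the question:

```sql
-- Data type and object names are assumptions for illustration.
ALTER TABLE Invoices
    ADD Balance DECIMAL(18, 2) NOT NULL
        CONSTRAINT DF_Invoices_Balance DEFAULT 0;

-- Filtered index: only rows with a non-zero balance are kept in the index.
CREATE NONCLUSTERED INDEX IX_Invoices_Balance_NonZero
    ON Invoices (Balance)
    WHERE Balance <> 0;
```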

How does sql server choose values in an update statement where there are multiple options?

I have an update statement in SQL server where there are four possible values that can be assigned based on the join. It appears that SQL has an algorithm for choosing one value over another, and I'm not sure how that algorithm works.
As an example, say there is a table called Source with two columns (Match and Data) structured as below.
(The Match column contains only 1's; the Data column increments by 1 for every row.)
Match  Data
-----  ----
1      1
1      2
1      3
1      4
That table will update another table called Destination with the same two columns structured as below:
Match  Data
-----  ----
1      NULL
If you want to update the Data field in Destination in the following way:
UPDATE
Destination
SET
Data = Source.Data
FROM
Destination
INNER JOIN
Source
ON
Destination.Match = Source.Match
there are four possible values that Destination.Data could be set to after this query is run. I've found that messing with the indexes of Source will have an impact on what Destination is set to, and it appears that SQL Server just updates the Destination table with the first value it finds that matches.
Is that accurate? Is it possible that SQL Server is updating the Destination with every possible value sequentially and I end up with the same kind of result as if it were updating with the first value it finds? It seems to be possibly problematic that it will seemingly randomly choose one row to update, as opposed to throwing an error when presented with this situation.
Thank you.
P.S. I apologize for the poor formatting. Hopefully, the intent is clear.
It sets Data to each of the results in turn. Which one you end up with after the query depends on the order in which the results are returned (whichever one it sets last).
Since there's no ORDER BY clause, you're left with whatever order Sql Server comes up with. That will normally follow the physical order of the records on disk, and that in turn typically follows the clustered index for a table. But this order isn't set in stone, particularly when joins are involved. If a join matches on a column with an index other than the clustered index, it may well order the results based on that index instead. In the end, unless you give it an ORDER BY clause, Sql Server will return the results in whatever order it thinks it can do fastest.
You can play with this by turning your update query into a select query, so you can see the results. Notice which record comes first and which record comes last in the source table for each record of the destination table. Compare that with the results of your update query. Then play with your indexes again and check the results once more to see what you get.
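As a sketch, the update from the question turned into a select could look like the following. The column aliases are hypothetical and only there to make the output easier to read.

```sql
-- Same FROM/JOIN as the UPDATE, but returning every candidate value
-- that matches each Destination row instead of writing one of them.
SELECT
    Destination.Match,
    Destination.Data AS CurrentData,    -- hypothetical alias
    Source.Data      AS CandidateData   -- hypothetical alias
FROM
    Destination
INNER JOIN
    Source
ON
    Destination.Match = Source.Match;
```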
Of course, it can be tricky here because UPDATE statements are not allowed to use an ORDER BY clause, so regardless of what you find, you should really write the join so it matches the destination table 1:1. You may find the APPLY operator useful in achieving this goal, and you can use it to effectively JOIN to another table and guarantee the join only matches one record.
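A minimal sketch of the APPLY approach against the question's Source and Destination tables. Picking the largest Data value is an arbitrary choice made here for illustration; any ORDER BY that identifies a single row would do.

```sql
-- TOP (1) with ORDER BY inside the APPLY guarantees that each
-- Destination row is matched with exactly one Source row.
UPDATE d
SET d.Data = s.Data
FROM Destination AS d
CROSS APPLY (
    SELECT TOP (1) src.Data
    FROM Source AS src
    WHERE src.Match = d.Match
    ORDER BY src.Data DESC      -- arbitrary rule: take the largest value
) AS s;
```

Because CROSS APPLY is used, Destination rows with no matching Source row are left untouched.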
The choice is not deterministic and it can be any of the source rows.
You can try
DECLARE @Source TABLE(Match INT, Data INT);
INSERT INTO @Source
VALUES
    (1, 1),
    (1, 2),
    (1, 3),
    (1, 4);
DECLARE @Destination TABLE(Match INT, Data INT);
INSERT INTO @Destination
VALUES
    (1, NULL);
UPDATE Destination
SET Data = Source.Data
FROM @Destination Destination
    INNER JOIN @Source Source
        ON Destination.Match = Source.Match;
SELECT *
FROM @Destination;
And look at the actual execution plan. I see the following.
The output columns from @Destination are Bmk1000, Match. Bmk1000 is an internal row identifier (used here due to the lack of a clustered index in this example) and would be different for each row emitted from @Destination (if there was more than one).
The single row is then joined onto the four matching rows in @Source and the resultant four rows are passed into a stream aggregate.
The stream aggregate groups by Bmk1000 and collapses the multiple matching rows down to one. The operation performed by this aggregate is ANY(@Source.[Data]).
The ANY aggregate is an internal aggregate function not available in TSQL itself. No guarantees are made about which of the four source rows will be chosen.
Finally the single row per group feeds into the UPDATE operator to update the row with whatever value the ANY aggregate returned.
If you want deterministic results then you can use an aggregate function yourself...
WITH GroupedSource AS
(
    SELECT Match,
           MAX(Data) AS Data
    FROM @Source
    GROUP BY Match
)
UPDATE Destination
SET Data = Source.Data
FROM @Destination Destination
    INNER JOIN GroupedSource Source
        ON Destination.Match = Source.Match;
Or use ROW_NUMBER...
WITH RankedSource AS
(
    SELECT Match,
           Data,
           ROW_NUMBER() OVER (PARTITION BY Match ORDER BY Data DESC) AS RN
    FROM @Source
)
UPDATE Destination
SET Data = Source.Data
FROM @Destination Destination
    INNER JOIN RankedSource Source
        ON Destination.Match = Source.Match
WHERE RN = 1;
The latter form is generally more useful as in the event you need to set multiple columns this will ensure that all values used are from the same source row. In order to be deterministic the combination of partition by and order by columns should be unique.
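To illustrate both points, here is a self-contained sketch with two hypothetical extra columns that are not in the original example: Id (a unique key used as a tie-breaker) and Note (a second column to copy). Two source rows tie on Data, so the ORDER BY includes Id to make the combination unique.

```sql
-- Id and Note are hypothetical columns added for illustration.
DECLARE @Source TABLE(Id INT PRIMARY KEY, Match INT, Data INT, Note VARCHAR(20));
INSERT INTO @Source
VALUES
    (1, 1, 4, 'first'),
    (2, 1, 4, 'second'),
    (3, 1, 2, 'third');
DECLARE @Destination TABLE(Match INT, Data INT, Note VARCHAR(20));
INSERT INTO @Destination
VALUES
    (1, NULL, NULL);

WITH RankedSource AS
(
    SELECT Match,
           Data,
           Note,
           -- Data alone ties between Id 1 and Id 2; adding Id makes the order unique
           ROW_NUMBER() OVER (PARTITION BY Match ORDER BY Data DESC, Id DESC) AS RN
    FROM @Source
)
UPDATE Destination
SET Data = Source.Data,
    Note = Source.Note       -- both values come from the same source row
FROM @Destination Destination
    INNER JOIN RankedSource Source
        ON Destination.Match = Source.Match
WHERE Source.RN = 1;

SELECT *
FROM @Destination;
```

With this ordering the row with Id 2 ranks first, so Destination ends up with Data = 4 and Note = 'second' on every run.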
