The question is simply, what do I need to change in order to make this query updatable?
For those more comfortable with SQL, this is what the graphic represents:
SELECT tblAnswers.[Case Number], tblAnswers.Set,
tblAnswers.[Q#], tblAnswers.Level, tblAnswers.Position,
tblAnswers.Answer, tblQuestions.Question
FROM (tblAnswers
INNER JOIN tblSets ON
(tblAnswers.Position = tblSets.Position) AND
(tblAnswers.Level = tblSets.QLevel) AND
(tblAnswers.[Q#] = tblSets.QNum) AND
(tblAnswers.Set = tblSets.SetName))
INNER JOIN tblQuestions ON
tblSets.Q_ID = tblQuestions.Q_ID
WHERE (((tblAnswers.[Case Number]) Is Not Null) AND
((tblAnswers.Position)>400))
ORDER BY tblAnswers.[Case Number], tblAnswers.Set, tblAnswers.[Q#], tblAnswers.Position;
As you can probably guess, it's part of a questionnaire-type db app. It needs to display a number of informational fields including, naturally, the Question from tblQuestions but only needs to capture the tblAnswers.Answer.
I'm trying not to cloud the issue by including irrelevant information (noise). The tables shown have the same relationships defined in the relationship window. One possible hint/definite annoyance is that I can't set them to enforce relational integrity. Maybe whatever is preventing that is keeping the query from being updatable.
I have some problems with EF Core. Every time I write a LINQ query in C# to get data from the database, it adds a useless select * from statement. I can't figure out why it does this.
The raw SQL query works pretty quickly - 100ms vs 300ms using linq
This is the method in C#:
return (from pr in _db.ex_DocumentExt1_PR
from doc in _db.ex_Document.Where(doc => doc.DOCID == pr.DOCID).DefaultIfEmpty()
from docAc in _db.ex_DOCAction.Where(docAc => docAc.DOCID == pr.DOCID).DefaultIfEmpty()
from st in _db.ex_Status.Where(st => st.STATUS_ID == doc.DOC_STATUS).DefaultIfEmpty()
from dep in _db.SSO_Entities.Where(dep => dep.Type == SSO_EntityTypes.COMPANY_STRUCTURE && dep.EntityCode == pr.RequestedForDepartamentId.ToString()).DefaultIfEmpty()
where docAc.ISPERFORMED == 1
&& docAc.ACTOR_ID == uid
&& doc.DOC_NUMBER != "YENI"
&& doc.DOC_NUMBER.Contains(searchText)
group new { doc, st, dep, docAc }
by new { doc.DOCID, doc.DOC_NUMBER, st.SHORT_NAME, dep.DisplayName, docAc.ACTION_PERFORMED } into g1
orderby g1.Key.ACTION_PERFORMED descending
select new LastActiveDocumentViewModel
{
DocId = g1.Key.DOCID,
DocNumber = g1.Key.DOC_NUMBER,
DocStatus = g1.Key.SHORT_NAME,
DocType = DocumentType.PR.ToString(),
Supplier = g1.Key.DisplayName,
Date = g1.Max(g => g.docAc.ACTION_PERFORMED)
});
This is the SQL query generated by EF Core:
SELECT TOP (50)
[Project1].[C2] AS [C1],
[Project1].[DOCID] AS [DOCID],
[Project1].[DOC_NUMBER] AS [DOC_NUMBER],
[Project1].[SHORT_NAME] AS [SHORT_NAME],
[Project1].[C3] AS [C2],
[Project1].[DisplayName] AS [DisplayName],
[Project1].[C1] AS [C3]
FROM ( SELECT
[GroupBy1].[A1] AS [C1],
[GroupBy1].[K1] AS [DOCID],
[GroupBy1].[K2] AS [DOC_NUMBER],
[GroupBy1].[K3] AS [ACTION_PERFORMED],
[GroupBy1].[K4] AS [SHORT_NAME],
[GroupBy1].[K5] AS [DisplayName],
1 AS [C2],
N'PR' AS [C3]
FROM ( SELECT
[Filter1].[DOCID1] AS [K1],
[Filter1].[DOC_NUMBER] AS [K2],
[Filter1].[ACTION_PERFORMED] AS [K3],
[Filter1].[SHORT_NAME] AS [K4],
[Extent5].[DisplayName] AS [K5],
MAX([Filter1].[ACTION_PERFORMED]) AS [A1]
FROM (SELECT [Extent1].[RequestedForDepartamentId] AS [RequestedForDepartamentId], [Extent2].[DOCID] AS [DOCID1], [Extent2].[DOC_NUMBER] AS [DOC_NUMBER], [Extent3].[ACTOR_ID] AS [ACTOR_ID], [Extent3].[ACTION_PERFORMED] AS [ACTION_PERFORMED], [Extent4].[SHORT_NAME] AS [SHORT_NAME]
FROM [dbo].[ex_DocumentExt1_PR] AS [Extent1]
LEFT OUTER JOIN [dbo].[ex_Document] AS [Extent2] ON [Extent2].[DOCID] = [Extent1].[DOCID]
INNER JOIN [dbo].[ex_DOCAction] AS [Extent3] ON [Extent3].[DOCID] = CAST( [Extent1].[DOCID] AS bigint)
LEFT OUTER JOIN [dbo].[ex_Status] AS [Extent4] ON [Extent4].[STATUS_ID] = [Extent2].[DOC_STATUS]
WHERE ( NOT (('YENI' = [Extent2].[DOC_NUMBER]) AND ([Extent2].[DOC_NUMBER] IS NOT NULL))) AND (1 = [Extent3].[ISPERFORMED]) ) AS [Filter1]
LEFT OUTER JOIN [dbo].[SSO_Entities] AS [Extent5] ON ('COMPANY_STRUCTURE' = [Extent5].[Type]) AND (([Extent5].[EntityCode] = (CASE WHEN ([Filter1].[RequestedForDepartamentId] IS NULL) THEN N'' ELSE CAST( [Filter1].[RequestedForDepartamentId] AS nvarchar(max)) END)) OR (([Extent5].[EntityCode] IS NULL) AND (CASE WHEN ([Filter1].[RequestedForDepartamentId] IS NULL) THEN N'' ELSE CAST( [Filter1].[RequestedForDepartamentId] AS nvarchar(max)) END IS NULL)))
WHERE ([Filter1].[ACTOR_ID] = 1018) AND ([Filter1].[DOC_NUMBER] LIKE '%%' ESCAPE '~')
GROUP BY [Filter1].[DOCID1], [Filter1].[DOC_NUMBER], [Filter1].[ACTION_PERFORMED], [Filter1].[SHORT_NAME], [Extent5].[DisplayName]
) AS [GroupBy1]
) AS [Project1]
ORDER BY [Project1].[ACTION_PERFORMED] DESC
This is the raw SQL query I wrote that does the same thing as the Linq query:
SELECT TOP(50)
doc.DOCID,
doc.DOC_NUMBER,
'PR',
st.SHORT_NAME,
dep.DisplayName,
MAX(docAc.ACTION_PERFORMED)
FROM ex_DocumentExt1_PR pr
LEFT JOIN ex_Document doc ON doc.DOCID = pr.DOCID
LEFT JOIN ex_DOCAction docAc ON docAc.DOCID = doc.DOCID
LEFT JOIN ex_Status st ON st.STATUS_ID = doc.DOC_STATUS
LEFT JOIN SSO_Entities dep ON dep.Type = 'COMPANY_STRUCTURE' AND dep.EntityCode = pr.RequestedForDepartamentId
WHERE docAc.ISPERFORMED = 1
AND docAc.ACTOR_ID = 1018
AND doc.DOC_NUMBER != 'Yeni'
GROUP BY doc.DOCID, doc.DOC_NUMBER, st.SHORT_NAME, dep.DisplayName
ORDER BY MAX(docAc.ACTION_PERFORMED) DESC
EF is not intended to be a wrapper for SQL. I don't see any "SELECT *" in the generated SQL, though what you will encounter is a range of inner SELECT statements that EF builds to allow it to join tables that normally don't have established references to one another. This is a necessary evil for EF to be able to query across data based on how you want to relate them.
EF's strength is simplifying data access when working with properly normalized data structures where those relationships can be resolved either through convention or configuration. I don't agree that EF doesn't handle multiple tables "well", it can handle them quite quickly provided they are properly related and indexed. The reality though is that many data systems out there in the wild do not follow proper normalization and you end up with the need to query across loosely related data. EF can do it, but it won't be the most efficient at it.
If this is a new project / database, whether leveraging Code First or Schema First, my recommendation would be to establish properly normalized relationships with FKs and indexes/constraints between the tables.
If this is an existing database where you don't have the option to modify the schema then I would recommend employing a View to bind a desired entity model from where you can employ a more directly optimized SQL expression to get the data you want. This would be a distinct set of entities as opposed to the per-table entities that you would use to update data. The goal being larger, open-ended read operations with loose relationships leading to expensive queries can be optimized down, where update operations which should be "touching" far fewer records at a time can be managed via the table-based entities.
The queries don't look identical. For example, your query groups by 4 columns, whilst EF query groups by 5 - it has [Filter1].[ACTION_PERFORMED] in its group by clause, in addition to the other four. Depending on your testing data sample, they might behave similarly, but generally the results will differ.
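A minimal sketch of why the extra grouping column matters, using Python's built-in sqlite3 module as a stand-in for SQL Server. The doc_action table and its rows are made up for illustration and are not taken from the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doc_action (doc_id INTEGER, action_performed TEXT);
    INSERT INTO doc_action VALUES
        (1, '2020-01-01'),
        (1, '2020-02-01'),
        (2, '2020-03-01');
""")

# Grouping by the document only, like the hand-written query:
# one row per document, carrying its latest action date.
by_doc = conn.execute("""
    SELECT doc_id, MAX(action_performed)
    FROM doc_action
    GROUP BY doc_id
    ORDER BY doc_id
""").fetchall()

# Adding action_performed to the GROUP BY, like the generated query:
# every distinct (doc_id, action_performed) pair becomes its own group.
by_doc_and_action = conn.execute("""
    SELECT doc_id, MAX(action_performed)
    FROM doc_action
    GROUP BY doc_id, action_performed
    ORDER BY doc_id, action_performed
""").fetchall()

print(len(by_doc), len(by_doc_and_action))
```

With this sample, document 1 appears once in the first result and twice in the second, so a TOP (50) over the two queries would not return the same set of documents.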
As @allmhuran has noted in the comments, EF has a tendency to generate inefficient queries, especially when more than 2 tables are involved. Personally, when I find myself in such a situation, I create a database view, put the query there, add the view to the DbContext and select directly from it. In extreme scenarios, that might even be a stored procedure. But that's me: I know SQL much better than C#, always use the Database First approach and have my database in an accompanying SSDT project.
If you use EF Code First and employ EF Migrations, adding a view might be a bit of a problem, but it should be possible. This question might be a good start.
I'm using Azure SQL Database and MS SQL Server Management Studio, and I am wondering if it's possible to create a self-referencing table that maintains itself.
I have three tables: Race, Runner, Names. The Race table includes the following columns:
Race_ID (PK)
Race_Date
Race_Distance
Number_of_Runners
The second table is Runner. Runner contains the following columns:
Runner_Id (PK)
Race_ID (Foreign Key)
Name_ID
Finish_Position
Prior_Race_ID
The Names Table includes the following columns:
Full Name
Name_ID
The column of interest is Prior_Race_ID in the Runner table. I'd like to populate this field automatically via a Trigger or Stored Procedure, but I'm not sure if it's possible to do so or how to go about it. The goal would be to get all of a runner's races quickly and easily by traversing the Prior_Race_ID field.
Can anyone point me to a good resource or reference that explains if and how this is achievable? Also, if there is a preferred approach to achieving my objective, please do share that.
Thanks for your input.
Okay, so we want, for each Competitor (better name than Names?), to find their two most recent races. You'd write a query like this:
SELECT
* --TODO - Specific columns
FROM
(SELECT
*, --TODO - Specific columns
ROW_NUMBER() OVER (PARTITION BY n.Name_ID ORDER BY r.Race_Date DESC) rn
FROM
Names n
inner join
Runner rs
on
n.Name_ID = rs.Name_ID
inner join
Race r
on
rs.Race_ID = r.Race_ID
) t
WHERE
t.rn in (1,2)
That should produce two rows per competitor. If needed, you can then PIVOT this data if you want a single row per competitor, but I'd usually leave that up to the presentation layer, rather than do it in SQL.
And so, no, I wouldn't even have a Prior_Race_ID column. As a general rule, don't store data that can be calculated - that just introduces opportunities for that data to be incorrect compared to the base data.
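As a hedged illustration of calculating the prior race instead of storing it, here is a small sketch using Python's built-in sqlite3 module in place of SQL Server. It uses the LAG window function, which is a different technique from the ROW_NUMBER query above; LAG is available in SQL Server 2012 and later and in SQLite 3.25 and later. The sample rows are invented, and only the columns needed for the demonstration are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Race (Race_ID INTEGER PRIMARY KEY, Race_Date TEXT);
    CREATE TABLE Runner (Runner_Id INTEGER PRIMARY KEY, Race_ID INTEGER, Name_ID INTEGER);
    INSERT INTO Race VALUES (10, '2021-01-01'), (11, '2021-02-01'), (12, '2021-03-01');
    INSERT INTO Runner VALUES (1, 10, 100), (2, 11, 100), (3, 12, 100), (4, 11, 200);
""")

# For each competitor (Name_ID), LAG looks one row back in date order,
# which is exactly the race that Prior_Race_ID would have stored.
rows = conn.execute("""
    SELECT rs.Name_ID,
           rs.Race_ID,
           LAG(rs.Race_ID) OVER (PARTITION BY rs.Name_ID ORDER BY r.Race_Date) AS Prior_Race_ID
    FROM Runner rs
    INNER JOIN Race r ON r.Race_ID = rs.Race_ID
    ORDER BY rs.Name_ID, r.Race_Date
""").fetchall()

for row in rows:
    print(row)
```

A competitor's first race gets NULL (None in Python) as its prior race, and nothing has to be kept in sync when races are added or corrected.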
Run the following SQL. For each Runner row it looks up the same competitor's most recent earlier race; TOP (1) guarantees a single value even if a runner has more than one race on the same day:
update r1
set prior_race_id =
(
select top (1) prev.race_id
from runner prev
inner join race pr on pr.race_id = prev.race_id
where prev.name_id = r1.name_id and pr.race_date < cur.race_date
order by pr.race_date desc
)
from runner r1
inner join race cur on cur.race_id = r1.race_id
I have several views and tables which I am trying to connect in one view to create a website GUI for several users (and myself) to use. It is basically an inventory system which is linking together purchase information which has been dumped into a SQL table + internal asset tag + user accessing. Tables are similar to:
Assets - Serial Number, ID
UserAudit - UserName, AssetsID, OfficeID, date/time
Office - location
Orders - Serial Number, detail
Several of the assets are computers, and the UserAudit table is populated by a logon script which records the user's name, the computer's name, and the date/time. I am trying to create a view which links all of the information based upon the Assets list, regardless of whether the related tables have matching data. For the UserAudit side I just want to display the most recent record (date field Desc).
The place I am running into an issue is grabbing just the top record from UserAudit for each ComputerName while still returning all of the columns from the other tables. I tried creating a separate view for UserAudit using 'Top 1'. With an INNER JOIN this limited the main view to only 1 result as well, and with any OUTER join it doesn't display anything from UserAudit.
I did some research in which a Cross Apply looks like it might be relevant, however I have not used this before and attempts have not worked well. The view currently looks like:
SELECT TOP (100) PERCENT
dbo.AssetType.AssetType, dbo.AssetTagInventory.ID, dbo.AssetTagInventory.AssetDetail,
dbo.AssetTagInventory.Name, dbo.AssetTagInventory.Serial, dbo.AssetTagInventory.UserID,
dbo.AssetTagInventory.Age, dbo.AssetTagInventory.Notes,
dbo.vewOffices.Name AS OfficeName,
dbo.AssetTagPurchases.PurchaseDate, dbo.AssetTagPurchases.ProductDescription AS Model,
dbo.AssetTagPurchases.ID AS AECOrderNumber,
dbo.AssetTagPurchases.Vendor, dbo.AssetTagPurchases.QuotedPrice,
dbo.AssetTagPurchases.InvoicedPrice, dbo.AssetTagPurchases.InvoiceNum,
dbo.AssetTagPurchases.VendorOrderNumber, dbo.vewLogonAudit.Username,
dbo.vewLogonAudit.LoginTime
FROM
dbo.AssetTagInventory
INNER JOIN
dbo.vewLogonAudit ON dbo.AssetTagInventory.Name = dbo.vewLogonAudit.ComputerName
LEFT OUTER JOIN
dbo.AssetType ON dbo.AssetTagInventory.AssetTypeID = dbo.AssetType.ID
LEFT OUTER JOIN
dbo.vewOffices ON dbo.AssetTagInventory.OfficeID = dbo.vewOffices.ID
LEFT OUTER JOIN
dbo.AssetTagPurchases ON dbo.AssetTagInventory.Serial = dbo.AssetTagPurchases.Serial
Take a look at OUTER APPLY. It should be just what you need here.
It should work with something like (among/after the other joins):
OUTER APPLY
( SELECT TOP 1 <columns>
FROM dbo.UserAudit
WHERE dbo.AssetTagInventory.<??> = dbo.UserAudit.<??>
ORDER BY <...>
) T_UserAudit
EDIT
Try using a unique alias after the closing brackets:
OUTER APPLY(...) TLA
And then in the Select-Part, use the alias without dbo.
SELECT .., ... , TLA.UserName
Maybe that's the only little bit to get it going.
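For readers who want to see the "latest audit row per asset, but keep every asset" result in runnable form, here is a small sketch using Python's built-in sqlite3 module. SQLite has no OUTER APPLY, so this swaps in a different technique: a LEFT JOIN to a ROW_NUMBER-ranked subquery (SQLite 3.25 or later), which gives the same shape of result. The simplified Assets and UserAudit tables and their rows are assumptions based on the question's description, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Assets (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE UserAudit (ComputerName TEXT, Username TEXT, LoginTime TEXT);
    INSERT INTO Assets VALUES (1, 'PC-1'), (2, 'PC-2'), (3, 'PRINTER-1');
    INSERT INTO UserAudit VALUES
        ('PC-1', 'alice', '2020-01-01 08:00'),
        ('PC-1', 'bob',   '2020-01-02 09:00'),
        ('PC-2', 'carol', '2020-01-01 10:00');
""")

# Rank the audit rows per computer, newest first, then keep only rank 1.
# The rank filter sits in the ON clause, so assets without any audit
# rows (the printer) are still returned, with NULLs.
rows = conn.execute("""
    SELECT a.Name, la.Username, la.LoginTime
    FROM Assets a
    LEFT JOIN (
        SELECT ComputerName, Username, LoginTime,
               ROW_NUMBER() OVER (PARTITION BY ComputerName
                                  ORDER BY LoginTime DESC) AS rn
        FROM UserAudit
    ) la ON la.ComputerName = a.Name AND la.rn = 1
    ORDER BY a.ID
""").fetchall()

for row in rows:
    print(row)
```

The same ranked-subquery pattern also works in SQL Server, where OUTER APPLY with TOP 1 is usually the more direct way to write it.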
I am trying to convert a view from an Oracle RDBMS to SQL Server. The view looks like:
create or replace view user_part_v
as
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id)
Having read about recursive CTEs and also about their use in sub-queries, my best guess at translating the above into SQL Server syntax is:
create view user_part_v
as
with region_structure(region_id, parent_region_id) as (
select region_id
, parent_region_id
from region_relation
where parent_region_id = users.region_id
union all
select r.region_id
, r.parent_region_id
from region_relation r
join region_structure rs on rs.parent_region_id = r.region_id
)
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(select region_id from region_structure)
Obviously this gives me an error about the reference to users.region_id in the CTE definition.
How can I achieve the same result in SQL Server as I get from the Oracle view?
Background
I am working on the conversion of a system from running on an Oracle 11g RDBMS to SQL Server 2008. This system is a relatively large Java EE based system, using JPA (Hibernate) to query the database.
Many of the queries use the above mentioned view to restrict the results returned to those appropriate for the current user. If I cannot convert the view directly then the conversion will be much harder as I will need to change all of the places where we query the database to achieve the same result.
The tables referenced by this view have a structure similar to:
USERS
    ID
    REGION_ID
REGION
    ID
    NAME
REGION_RELATIONSHIP
    PARENT_REGION_ID
    REGION_ID
PART
    ID
    PARTNO
    DESCRIPTION
PART_REGION
    PART_ID
    REGION_ID
So, we have regions, arranged into a hierarchy. A user may be assigned to a region. A part may be assigned to many regions. A user may only see the parts assigned to their region. The regions reference various geographic regions:
World
    Europe
        Germany
        France
        ...
    North America
        Canada
        USA
            New York
            ...
If a part, #123, is assigned to the region USA, and the user is assigned to the region New York, then the user should be able to see that part.
UPDATE: I was able to work around the error by creating a separate view that contained the necessary data, and then having my main view join to this view. This has the system working, but I have not yet done thorough correctness or performance testing. I am still open to suggestions for better solutions.
I reformatted your original query to make it easier for me to read.
create or replace view user_part_v
as
select part_region.part_id, users.id as users_id
from part_region, users
where part_region.region_id in(
select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id
);
Let's examine what's going on in this query.
select part_region.part_id, users.id as users_id
from part_region, users
This is an old-style join where the tables are cartesian joined and then the results are reduced by the subsequent where clause(s).
where part_region.region_id in(
select region_id
from region_relation
start with region_id = users.region_id
connect by parent_region_id = prior region_id
);
The sub-query that's using the connect by statement is using the region_id from the users table in outer query to define the starting point for the recursion.
Then the in clause checks to see if the region_id for the part_region is found in the results of the recursive query.
This recursion follows the parent-child linkages given in the region_relation table.
So the combination of doing an in clause with a sub-query that references the parent and the old-style join means that you have to consider what the query is meant to accomplish and approach it from that direction (rather than just a tweaked re-arrangement of the old query) to be able to translate it into a single recursive CTE.
This query will also return multiple rows if the part is assigned to multiple regions along the same branch of the region hierarchy, e.g. if the part is assigned to both North America and USA, a user assigned to New York will get two rows returned for their users_id with the same part_id.
Given the Oracle view and the background you gave of what the view is supposed to do, I think what you're looking for is something more like this:
create view user_part_v
as
with user_regions(users_id, region_id, parent_region_id) as (
select u.id, u.region_id, rr.parent_region_id
from users u
left join region_relation rr on u.region_id = rr.region_id
union all
select ur.users_id, rr.region_id, rr.parent_region_id
from user_regions ur
inner join region_relation rr on ur.parent_region_id = rr.region_id
)
select pr.part_id, ur.users_id
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id;
Note that I've added the users_id to the output of the recursive CTE, and then just done a simple inner join of the part_region table and the CTE results.
Let me break down the query for you.
select u.id, u.region_id, rr.parent_region_id
from users u
left join region_relation rr on u.region_id = rr.region_id
This is the starting set for our recursion. We're taking the region_relation table and joining it against the users table, to get the starting point for the recursion for every user. That starting point being the region the user is assigned to along with the parent_region_id for that region. A left join is done here and the region_id is pulled from the user table in case the user is assigned to a top-most region (which means there won't be an entry in the region_relation table for that region).
select ur.users_id, rr.region_id, rr.parent_region_id
from user_regions ur
inner join region_relation rr on ur.parent_region_id = rr.region_id
This is the recursive part of the CTE. We take the existing results for each user, then add rows for each user for the parent regions of the existing set. This recursion happens until we run out of parents (i.e. we hit rows whose parent_region_id has no matching region_id in the region_relation table).
select pr.part_id, ur.users_id
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id;
This is the part where we grab our final result set. Assuming (as I do from your description) that each region has only one parent (which would mean that there's only one row in region_relation for each region_id), a simple join will return all the users that should be able to view the part based on the part's region_id. This is because there is exactly one row returned per user for the user's assigned region, and one row per user for each parent region up to the hierarchy root.
NOTE:
Both the original query and this one do have a limitation that I want to make sure you are aware of. If the part is assigned to a region that is lower in the hierarchy than the user (i.e. a region that is a descendant of the user's region, like the part being assigned to New York and the user to USA instead of the other way around), the user won't see that part. The part has to be assigned to either the user's assigned region, or one higher in the region hierarchy.
Another thing is that this query still exhibits the behavior I mentioned above about the original query: if a part is assigned to multiple regions along the same branch of the hierarchy, multiple rows will be returned for the same combination of users_id and part_id. I did this because I wasn't sure if you wanted that behavior changed or not.
If this is actually an issue and you want to eliminate the duplicates, then you can replace the query below the CTE with this one:
select p.id as part_id, u.id as users_id
from part p
cross join users u
where exists (
select 1
from part_region pr
inner join user_regions ur on pr.region_id = ur.region_id
where pr.part_id = p.id
and ur.users_id = u.id
);
This does a cartesian join between the part table and the users table and then only returns rows where the combination of the two has at least one row in the results of the subquery, which are the results that we are trying to de-duplicate.
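To make the behavior concrete, here is a small runnable sketch of the recursive CTE using Python's built-in sqlite3 module as a stand-in for SQL Server. Several things are assumptions of the sketch rather than facts from the question: the numeric region ids, the sample users and part, and the root region being stored in region_relation with a NULL parent. SQLite also spells the CTE as WITH RECURSIVE, where SQL Server uses plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, region_id INTEGER);
    CREATE TABLE part (id INTEGER PRIMARY KEY);
    CREATE TABLE region_relation (region_id INTEGER, parent_region_id INTEGER);
    CREATE TABLE part_region (part_id INTEGER, region_id INTEGER);

    -- 1 = World, 2 = North America, 3 = USA, 4 = New York
    INSERT INTO region_relation VALUES (1, NULL), (2, 1), (3, 2), (4, 3);
    -- user 1 is assigned to New York, user 2 to World
    INSERT INTO users VALUES (1, 4), (2, 1);
    -- part 123 is assigned to both USA and North America
    INSERT INTO part VALUES (123);
    INSERT INTO part_region VALUES (123, 3), (123, 2);
""")

# Each user's own region plus every ancestor region, one row per region.
user_regions_cte = """
    WITH RECURSIVE user_regions(users_id, region_id, parent_region_id) AS (
        SELECT u.id, u.region_id, rr.parent_region_id
        FROM users u
        LEFT JOIN region_relation rr ON u.region_id = rr.region_id
        UNION ALL
        SELECT ur.users_id, rr.region_id, rr.parent_region_id
        FROM user_regions ur
        INNER JOIN region_relation rr ON ur.parent_region_id = rr.region_id
    )
"""

# Plain join: one row per matching region, so duplicates are possible.
joined = conn.execute(user_regions_cte + """
    SELECT pr.part_id, ur.users_id
    FROM part_region pr
    INNER JOIN user_regions ur ON pr.region_id = ur.region_id
    ORDER BY pr.part_id, ur.users_id
""").fetchall()

# EXISTS variant: at most one row per (part, user) combination.
deduped = conn.execute(user_regions_cte + """
    SELECT p.id, u.id
    FROM part p
    CROSS JOIN users u
    WHERE EXISTS (
        SELECT 1
        FROM part_region pr
        INNER JOIN user_regions ur ON pr.region_id = ur.region_id
        WHERE pr.part_id = p.id AND ur.users_id = u.id
    )
""").fetchall()

print(joined)
print(deduped)
```

In this sample the New York user sees part 123 twice with the plain join (once via USA, once via North America) and once with the EXISTS form, while the World user sees nothing because the part is only assigned below their region.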
I use a high level of redundant, denormalized data in my DB designs to improve performance. I'll often store data that would normally need to be joined or calculated. For example, if I have a User table and a Task table, I would store the Username and UserDisplayName redundantly in every Task record. Another example of this is storing aggregates, such as storing the TaskCount in the User table.
User
    UserID
    Username
    UserDisplayName
    TaskCount
Task
    TaskID
    TaskName
    UserID
    UserName
    UserDisplayName
This is great for performance since the app has many more reads than insert, update or delete operations, and since some values like Username change rarely. However, the big drawback is that the integrity has to be enforced via application code or triggers. This can be very cumbersome with updates.
My question is: can this be done automatically in SQL Server 2005/2010, maybe via a persisted/permanent View? Would anyone recommend another possible solution or technology? I've heard document-based DBs such as CouchDB and MongoDB can handle denormalized data more effectively.
You might want to first try an Indexed View before moving to a NoSQL solution:
http://msdn.microsoft.com/en-us/library/ms187864.aspx
and:
http://msdn.microsoft.com/en-us/library/ms191432.aspx
Using an Indexed View would allow you to keep your base data in properly normalized tables and maintain data-integrity while giving you the denormalized "view" of that data. I would not recommend this for highly transactional tables, but you said it was heavier on reads than writes so you might want to see if this works for you.
Based on your two example tables, one option is:
1) Add a column to the User table defined as:
TaskCount INT NOT NULL DEFAULT (0)
2) Add a Trigger on the Task table defined as:
CREATE TRIGGER UpdateUserTaskCount
ON dbo.Task
AFTER INSERT, DELETE
AS
;WITH added AS
(
SELECT ins.UserID, COUNT(*) AS [NumTasks]
FROM INSERTED ins
GROUP BY ins.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount + added.NumTasks)
FROM dbo.[User] usr
INNER JOIN added
ON added.UserID = usr.UserID
;WITH removed AS
(
SELECT del.UserID, COUNT(*) AS [NumTasks]
FROM DELETED del
GROUP BY del.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount - removed.NumTasks)
FROM dbo.[User] usr
INNER JOIN removed
ON removed.UserID = usr.UserID
GO
3) Then do a View that has:
SELECT u.UserID,
u.Username,
u.UserDisplayName,
u.TaskCount,
t.TaskID,
t.TaskName
FROM dbo.[User] u
INNER JOIN dbo.Task t
ON t.UserID = u.UserID
And then follow the recommendations from the links above (WITH SCHEMABINDING, Unique Clustered Index, etc.) to make it "persisted". While it is inefficient to do an aggregation in a subquery in the SELECT as shown above, this specific case is intended to be denormalized in a situation that has higher reads than writes. So doing the Indexed View will keep the entire structure, including the aggregation, physically stored so each read will not recalculate it.
Now, if a LEFT JOIN is needed because some Users do not have any Tasks, then the Indexed View will not work: OUTER joins are one of the many restrictions on creating them. In that case, you can create a real table (UserTask) that is your denormalized structure and have it populated via either a Trigger on just the User table (assuming you use the Trigger I show above, which updates the User table based on changes in the Task table), or you can skip the TaskCount field in the User table and just have Triggers on both tables to populate the UserTask table. In the end, this is basically what an Indexed View does, just without you having to write the synchronization Trigger(s).
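If you want to try the trigger-maintained TaskCount idea without a SQL Server instance, here is a rough sketch using Python's built-in sqlite3 module. It is not a port of the T-SQL trigger above: SQLite triggers fire once per row and use NEW/OLD, whereas SQL Server triggers fire once per statement and read the INSERTED/DELETED tables. The trigger names and sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [User] (
        UserID    INTEGER PRIMARY KEY,
        Username  TEXT,
        TaskCount INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE Task (TaskID INTEGER PRIMARY KEY, TaskName TEXT, UserID INTEGER);

    -- Row-level triggers keep the stored aggregate in step with Task.
    CREATE TRIGGER TaskInserted AFTER INSERT ON Task
    BEGIN
        UPDATE [User] SET TaskCount = TaskCount + 1 WHERE UserID = NEW.UserID;
    END;

    CREATE TRIGGER TaskDeleted AFTER DELETE ON Task
    BEGIN
        UPDATE [User] SET TaskCount = TaskCount - 1 WHERE UserID = OLD.UserID;
    END;

    INSERT INTO [User] (UserID, Username) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Task (TaskName, UserID) VALUES ('a', 1), ('b', 1), ('c', 1), ('d', 2);
    DELETE FROM Task WHERE TaskName = 'a';
""")

# The application never writes TaskCount itself; it only reads it.
counts = conn.execute(
    "SELECT UserID, TaskCount FROM [User] ORDER BY UserID"
).fetchall()
print(counts)
```

Like the T-SQL version, this sketch does not handle a Task being reassigned to another user; that would need an additional AFTER UPDATE trigger.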