I have a simple database with 4 tables:
Customer (cusId)
Newspaper (papId)
SubCost (subId)
Subscription (cusId, papId, subId)
Newspaper has a column to track the number of subscribers, which is updated via a trigger on the Subscription table. It also has a column to track annual revenue, which should be based on the number of subscribers and the cost associated with each subscription type (subId).
I am looking for a trigger to track annual revenue. There are 3 subscription types (subId) with differing weekly costs, and a paper can have more than one type of subscription, so it can't just be (cost * 52 * numSubs).
Can you help me with this logic?
Your best bet is not to use such a column at all. Instead, use a view which computes the result, and index it if necessary:
CREATE OR ALTER VIEW vTotalSubs
WITH SCHEMABINDING AS
SELECT
    n.papid,
    TotalRevenue = SUM(sc.Cost * 52),
    TotalSubscriptions = COUNT_BIG(*) -- you MUST have this column here if aggregating with an index
FROM dbo.Newspaper n
JOIN dbo.Subscription s ON s.papid = n.papid
JOIN dbo.SubCost sc ON sc.subid = s.subid
GROUP BY
    n.papid;
GO
CREATE UNIQUE CLUSTERED INDEX CX_vTotalSubs ON vTotalSubs (papid);
If you decide to index the view, be aware there are many restrictions to indexed views, in particular:
Only INNER JOIN is allowed, no other join types, no subqueries
Must schema-bind, and specify schema on all tables.
If aggregating, you must have COUNT_BIG(*), and the only other aggregation allowed is SUM
Make sure to add the WITH (NOEXPAND) hint when querying; outside Enterprise Edition the view's index will not be used without it, and even on Enterprise it helps avoid performance surprises
The server will automatically maintain the index, you do not need to update it.
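For example, a query against the view with the hint (the papid value here is just illustrative):
SELECT papid, TotalRevenue, TotalSubscriptions
FROM dbo.vTotalSubs WITH (NOEXPAND)
WHERE papid = 1;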
I'm working on a groups project. I have two tables, Groups and GroupMembers.
I can get the number of members for each group by using the COUNT function:
SELECT COUNT(1) AS Counts FROM [Groups].[GroupMembers]
WHERE GroupId=Id;
Or I can add another column to the Groups table for counting, so that every time a new member joins a group this field increases by one. Is it better to use the COUNT function or to add another column for counting? In other words, what are the advantages and disadvantages of each method?
Creating a column to store the count is not recommended at all.
When you want the count of each group, you can use a simple SELECT query:
SELECT G.groupid,
       COUNT(GM.userid) AS Counts
FROM groups G
LEFT OUTER JOIN groupmembers GM
    ON G.groupid = GM.groupid
GROUP BY G.groupid
If you decide to add a new column instead, you will need a Trigger on the GroupMembers table to update the count column in the Groups table whenever a user is added to (or removed from) a group; a sketch follows.
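A minimal sketch of such a trigger, assuming the Groups table lives in the same Groups schema and has an Id key plus a MembersCount column (those names are illustrative):
CREATE TRIGGER trgGroupMembersCount
ON [Groups].[GroupMembers]
AFTER INSERT, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- add the memberships inserted by this statement
    UPDATE g
    SET g.MembersCount = g.MembersCount + i.cnt
    FROM [Groups].[Groups] g
    INNER JOIN (SELECT GroupId, COUNT(*) AS cnt FROM inserted GROUP BY GroupId) i
        ON i.GroupId = g.Id;
    -- subtract the memberships deleted by this statement
    UPDATE g
    SET g.MembersCount = g.MembersCount - d.cnt
    FROM [Groups].[Groups] g
    INNER JOIN (SELECT GroupId, COUNT(*) AS cnt FROM deleted GROUP BY GroupId) d
        ON d.GroupId = g.Id;
END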
It depends on your table engine. If your table engine is MyISAM, it would be much faster because MyISAM stores the exact number of rows and can read it back instead of counting, whereas InnoDB will need to scan an index to count the rows. Note that this shortcut only applies to COUNT(*) without a WHERE clause, not to a filtered count like the one above.
It is not recommended to store a count inside the table itself, so if this is something you're worried about, use the MyISAM engine if possible.
Storing a value in the table would needlessly require an extra UPDATE query on each new/lost membership.
I want to join against a huge partitioned table. The planner probably assumes that the partitioned table is very cheap to scan.
I have the following query:
select *
from (
    select * from users where age < 18 limit 10
) as users
join clicks on users.id = clicks.userid
where clicks.ts between '2015-01-01' and now();
The table clicks is the master table with roughly 40 child tables containing together about 40 million records.
This query performs very slowly. When I look at the plan, Postgres first performs a complete scan of the clicks table and then scans the users table.
However, when I limit the users subquery to 1, the planner first scans users and then clicks.
It seems as if the planner assumes that the clicks table is very lightweight. If I look at the stats in pg_class, the master table clicks has 0 tuples. That is true on the one hand, because it is a master table, but on the other hand the planner should treat it as holding the sum of all its child tables.
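For reference, the pg_class check described above looks like this (reltuples is the planner's row estimate):
SELECT relname, reltuples
FROM pg_class
WHERE relname = 'clicks';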
How can I force the planner to use the cheapest option first?
edit: in simplifying the query I indeed missed out an additional constraint on the date.
The partitioning constraints are on: clicks.ts and clicks.userid
I have indexes on users.age, users.id, clicks.userid and clicks.ts
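For context, under inheritance-based partitioning each of the ~40 child tables presumably looks roughly like this (the name and exact ranges are illustrative):
CREATE TABLE clicks_2015_01 (
    CHECK (ts >= DATE '2015-01-01' AND ts < DATE '2015-02-01')
) INHERITS (clicks);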
Maybe I have to trust the planner. I am just a little unsure because I once had a case where Postgres showed some weird behavior with limits (PostgreSQL query very slow with limit 1).
I'd like to effectively add a calculated column which sums a column from selected rows in another table. I need to quickly retrieve and search for values in the calculated column without re-computing the sum.
The calculated column I'd like to add would look like this in Dream-SQL:
ALTER TABLE Invoices ADD Balance
AS SUM(Transactions.Amount) WHERE Transactions.InvoiceId = Invoices.Id
Of course, this doesn't work. My understanding is that you can't add a calculated column that references another table. However, it appears that an indexed view can contain such a column.
The project is based on Entity Framework Code First. The application needs to quickly find non-zero balances.
Assuming an indexed view is the way to go, what is the best approach to integrating it with the Invoices and Transactions tables to make it easy to use with LINQ to Entities? Should the indexed view contain all the columns in the Invoices table or just the Balance (i.e. what gets persisted)? A code snippet of the SQL to create the recommended view and index would be helpful.
An indexed view won't work because it would only index expressions in the GROUP BY clause, which means it can't index the sum. A computed column won't work because the sum can't be persisted or indexed.
A trigger works, however:
CREATE TRIGGER UpdateInvoiceBalance ON Transactions AFTER INSERT, UPDATE AS
IF UPDATE(Amount) BEGIN
    SET NOCOUNT ON;
    -- recompute the balance of every invoice touched by this statement
    WITH InvoiceBalances AS (
        SELECT Transactions.InvoiceId, SUM(Transactions.Amount) AS Balance
        FROM Transactions
        JOIN inserted ON Transactions.InvoiceId = inserted.InvoiceId
        GROUP BY Transactions.InvoiceId)
    UPDATE Invoices
    SET Balance = InvoiceBalances.Balance
    FROM InvoiceBalances
    WHERE Invoices.Id = InvoiceBalances.InvoiceId
END
-- note: DELETEs on Transactions are not covered here; add DELETE to the trigger
-- events and join against the deleted pseudo-table as well if rows can be removed
It also helps to provide a default value of 0 for the Balance column since, when you mark it as DatabaseGeneratedOption.Computed, EF won't provide any value for it when adding an Invoice row.
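For example (the constraint name is arbitrary):
ALTER TABLE Invoices
ADD CONSTRAINT DF_Invoices_Balance DEFAULT (0) FOR Balance;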
I've been reading documentation and looking at FAQs and haven't found an answer for this one, which probably means it can't be done. My actual situation is a little more complex, but I'll try to simplify it for this question. For each of the past years, I have header/detail tables with a foreign key linking them. The year datum is in the header records! I want to be able to query all tables concatenated across years.
I have set up views that follow a 'SELECT + UNION ALL' format. I've also put check constraints on the header tables to restrict their values to their respective year. This allows the SQL Server query optimizer to only query specific tables when running a query that is restricted with a WHERE clause. Awesome. Up to this point, this information can be found anywhere and everywhere by searching for Partitioned Views.
I want to do the same sort of query optimization with the detail tables but can't figure it out. There is nothing in the detail record that indicates what year it belongs to without joining with the header record; meaning, the foreign key constraint is the only thing I have to go on.
The only solution I've thought of is adding a 'year' column to the detail tables and then adding another WHERE subclause to the queries. Is there anything I can do to create a partitioned view of the detail tables using the existing foreign key constraint?
Here is some DDL for reference:
CREATE TABLE header2008 (
    hid INT PRIMARY KEY,
    dt DATE CHECK ('2008-01-01' <= dt AND dt < '2009-01-01')
)
CREATE TABLE header2009 (
    hid INT PRIMARY KEY,
    dt DATE CHECK ('2009-01-01' <= dt AND dt < '2010-01-01')
)
CREATE TABLE detail2008 (
    did INT PRIMARY KEY,
    hid INT FOREIGN KEY REFERENCES header2008(hid),
    value INT
)
CREATE TABLE detail2009 (
    did INT PRIMARY KEY,
    hid INT FOREIGN KEY REFERENCES header2009(hid),
    value INT
)
GO
CREATE VIEW headerAll AS
SELECT * FROM header2008 UNION ALL
SELECT * FROM header2009
GO
CREATE VIEW detailAll AS
SELECT * FROM detail2008 UNION ALL
SELECT * FROM detail2009
GO
--This only hits the header2008 table (GOOD)
SELECT *
FROM headerAll h
WHERE dt = '2008-04-04'
--This hits the header2008, detail2008, and detail2009 tables. (BAD)
SELECT *
FROM headerAll h
INNER JOIN detailAll d ON h.hid = d.hid
WHERE dt = '2008-04-04'
Since you're not going for partitioned tables, I'm assuming you can't target SQL Server 2005+ Enterprise Edition, where they are supported.
Here is an alternative to adding a new physical column to your tables:
CREATE VIEW detailAll AS
SELECT 2008 AS Year, * FROM detail2008
UNION ALL
SELECT 2009, * FROM detail2009
then,
SELECT *
FROM headerAll h
INNER JOIN detailAll d ON h.hid = d.hid
WHERE dt = '2008-04-04' AND d.Year = 2008
Before you run off and implement this, there is a catch; well, two catches actually.
This solution, like the headerAll view as it's written, cannot accommodate parameters on the partitioning column and still do partition elimination. Using a search predicate of WHERE dt = @date AND d.Year = YEAR(@date) causes table scans across all tables in both views, because the query optimizer assumes @date is an arbitrary value (and there's no way to fix that). This is a recipe for a performance disaster if the view is exposed publicly in your database API: there is no restriction on parameterization in queries, and most query authors and ORMs use parameterized queries wherever possible (which is almost always a good thing!).
To get the views to do partition elimination in a real application, you will have to resort to dynamic string execution. How you accomplish this will depend on your business requirements, data requirements, and application architecture. It will be a bit trickier if you're grabbing data from multiple years.
Note also that using dynamic string execution would allow you to write queries directly against the base tables instead of introducing a UNIONed view for each "table". I don't think there's anything wrong with the latter, but this is an option you may not have considered.
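A minimal sketch of that dynamic-string approach, assuming the date arrives as a parameter; both values are inlined as literals so the optimizer sees constants and can eliminate partitions:
DECLARE @date DATE = '2008-04-04';
DECLARE @sql NVARCHAR(MAX) =
      N'SELECT * '
    + N'FROM headerAll h '
    + N'INNER JOIN detailAll d ON h.hid = d.hid '
    -- inline the date and year as literals, not parameters
    + N'WHERE h.dt = ''' + CONVERT(NVARCHAR(10), @date, 23) + N''' '
    + N'AND d.Year = ' + CAST(YEAR(@date) AS NVARCHAR(4)) + N';';
EXEC (@sql);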
I use a high level of redundant, denormalized data in my DB designs to improve performance. I'll often store data that would normally need to be joined or calculated. For example, if I have a User table and a Task table, I would store the Username and UserDisplayName redundantly in every Task record. Another example of this is storing aggregates, such as storing the TaskCount in the User table.
User
    UserID
    Username
    UserDisplayName
    TaskCount
Task
    TaskID
    TaskName
    UserID
    UserName
    UserDisplayName
This is great for performance since the app has many more reads than inserts, updates, or deletes, and since some values like Username change rarely. However, the big drawback is that integrity has to be enforced via application code or triggers. This can be very cumbersome with updates.
My question is: can this be done automatically in SQL Server 2005/2008, maybe via a persisted/permanent view? Would anyone recommend another possible solution or technology? I've heard document-based DBs such as CouchDB and MongoDB can handle denormalized data more effectively.
You might want to first try an Indexed View before moving to a NoSQL solution:
http://msdn.microsoft.com/en-us/library/ms187864.aspx
and:
http://msdn.microsoft.com/en-us/library/ms191432.aspx
Using an Indexed View would allow you to keep your base data in properly normalized tables and maintain data-integrity while giving you the denormalized "view" of that data. I would not recommend this for highly transactional tables, but you said it was heavier on reads than writes so you might want to see if this works for you.
Based on your two example tables, one option is:
1) Add a column to the User table defined as:
TaskCount INT NOT NULL DEFAULT (0)
2) Add a Trigger on the Task table defined as:
CREATE TRIGGER UpdateUserTaskCount
ON dbo.Task
AFTER INSERT, DELETE
AS
;WITH added AS
(
    SELECT ins.UserID, COUNT(*) AS [NumTasks]
    FROM INSERTED ins
    GROUP BY ins.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount + added.NumTasks)
FROM dbo.[User] usr
INNER JOIN added
    ON added.UserID = usr.UserID

;WITH removed AS
(
    SELECT del.UserID, COUNT(*) AS [NumTasks]
    FROM DELETED del
    GROUP BY del.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount - removed.NumTasks)
FROM dbo.[User] usr
INNER JOIN removed
    ON removed.UserID = usr.UserID
GO
3) Then do a View that has:
SELECT u.UserID,
       u.Username,
       u.UserDisplayName,
       u.TaskCount,
       t.TaskID,
       t.TaskName
FROM dbo.[User] u
INNER JOIN dbo.Task t
    ON t.UserID = u.UserID
And then follow the recommendations from the links above (WITH SCHEMABINDING, a Unique Clustered Index, etc.) to make it "persisted". While maintaining a denormalized aggregate like TaskCount adds work on every write, this specific case is intended to be denormalized in a situation that has more reads than writes, so the Indexed View keeps the entire structure, including the aggregate, physically stored, and reads will not recalculate it.
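Applied to the view above, that might look like the following (the view and index names here are illustrative):
CREATE VIEW dbo.vwUserTasks
WITH SCHEMABINDING
AS
SELECT u.UserID,
       u.Username,
       u.UserDisplayName,
       u.TaskCount,
       t.TaskID,
       t.TaskName
FROM dbo.[User] u
INNER JOIN dbo.Task t
    ON t.UserID = u.UserID;
GO
-- TaskID is the Task table's primary key, so it is unique per row of the view
CREATE UNIQUE CLUSTERED INDEX IX_vwUserTasks ON dbo.vwUserTasks (TaskID);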
Now, if a LEFT JOIN is needed because some Users do not have any Tasks, then the Indexed View will not work due to the many restrictions on creating them (OUTER JOINs are not allowed, for one). In that case, you can create a real table (UserTask) that is your denormalized structure and have it populated either via a trigger on just the User table (assuming you keep the Trigger I show above, which updates the User table based on changes in the Task table), or you can skip the TaskCount field in the User table and just have Triggers on both tables to populate the UserTask table. In the end, this is basically what an Indexed View does, just without you having to write the synchronization Trigger(s). A rough sketch of that table follows.
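Something along these lines (column types are illustrative; the table would be kept in sync by the Trigger(s) described above):
CREATE TABLE dbo.UserTask (
    TaskID INT NOT NULL PRIMARY KEY,
    TaskName NVARCHAR(100) NOT NULL,
    UserID INT NOT NULL,
    Username NVARCHAR(100) NOT NULL,
    UserDisplayName NVARCHAR(100) NOT NULL
);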