I support SQL Servers 2000 thru 2008R2 (all Standard Edition).
Question 1
Please consider the following hypothetical situation. Let's say you have a 5 table view, with the tables inner joined. Let's further assume that I am pulling 10 fields from each table, so I have a View with a total of 50 fields in it. If I selected only 2 fields from the view (one in each of two of the tables), would that query be slower than the equivalent table join script assuming you had all the same tables and joins?
Question 2
What if I created a generic view with a script like this:
Create View SomeView
AS
select * from SomeTable
Go
These tables have fields added to them fairly regularly. Each time I add a field to "SomeTable" the view would automatically work, correct? Is this appreciably slower if SomeTable had 50 fields and I selected two of them from the generic view? If I select two fields from the Select * view, will it only query for those fields?
Question 3
If I create a view like the following
Create View Blah
AS
select (some fields)
from TableA join TableB on TableA.Blah = TableB.Blah
Go
Then I use the view like this:
Select (some fields) from dbo.Blah where SomeDate >= '1/1/2008'
Will I lose any performance? Will the optimizer actually only pull view records that apply or will it pull all and then sub select the result set?
TIA.
Each time I add a field to "SomeTable" the view would automatically work, correct?
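No. A view defined with select * does not pick up new columns automatically, because SQL Server stores the view's column list when the view is created. Run sp_refreshview on the view after changing the underlying table so that it reflects the change.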
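As a minimal sketch of that refresh step (NewCol is a hypothetical column, and the dbo schema is an assumption):

```sql
-- Add a column to the underlying table (NewCol is hypothetical).
ALTER TABLE dbo.SomeTable ADD NewCol int NULL;
GO
-- Refresh the stored metadata of the non-schema-bound view.
EXEC sp_refreshview N'dbo.SomeView';
GO
-- The new column should now be visible through the view.
SELECT NewCol FROM dbo.SomeView;
GO
```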
Will the optimizer actually only pull view records that apply or will it pull all and then sub select the result set?
A view is nothing but a stored query. The optimizer expands it into the outer query and is smart enough to do a range seek or scan on just the data it needs. You can verify this by looking at the execution plan.
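One way to look at the plan without running the query is SET SHOWPLAN_TEXT. A small sketch against the Blah view from the question (the dbo schema and the unambiguous date literal are assumptions):

```sql
-- SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO
-- The query is compiled but not executed; the estimated plan is returned instead.
SELECT * FROM dbo.Blah WHERE SomeDate >= '20080101';
GO
SET SHOWPLAN_TEXT OFF;
GO
```

If the date filter appears as a seek or scan predicate on the base table, rather than as a filter applied after the join, the predicate was pushed down into the view.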
so I have a View with a total of 50 fields in it. If I selected only 2 fields from the view (one in each of two of the tables), would that query be slower than the equivalent table join script assuming you had all the same tables and joins?
Again, run STATISTICS IO or STATISTICS TIME against the query with * and with just 2 columns, and see whether there is a difference in reads and time.
For example:
SET STATISTICS IO ON
GO
SELECT * FROM SomeView
SET STATISTICS IO OFF
GO
SET STATISTICS IO ON
GO
SELECT Col1, Col2 FROM SomeView
SET STATISTICS IO OFF
GO
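The same comparison can be made for elapsed and CPU time. This sketch mirrors the script above, with STATISTICS TIME swapped in for STATISTICS IO:

```sql
SET STATISTICS TIME ON
GO
-- All columns
SELECT * FROM SomeView
GO
-- Only two columns
SELECT Col1, Col2 FROM SomeView
GO
SET STATISTICS TIME OFF
GO
```

The timings are reported on the Messages tab in Management Studio. Run each query more than once, so that a cold cache does not skew the first result.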
When creating ad-hoc queries to look for information in a table I have run into this issue over and over.
Let's say I have a table with a million records with fields id - int, createddatetime - timestamp, category - varchar(50) and content - varchar(max). I want to find all records in the last day that have a certain string in the content field. If I create a query like this...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
it may complete in a second, because in the last day there may only be 100 records, so the LIKE clause only operates on a small number of records.
However if I add one more item to the where clause...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
and category = 'testing'
then it could take many minutes to complete while locking up the table.
It appears to switch from applying all the straightforward WHERE clause items first and then the LIKE on the limited set of records, to applying the LIKE clause first. There are even times when there are multiple LIKE conditions and adding one more causes the query to go from a split second to minutes.
The only solution I've found is to generate an intermediate table (maybe temp tables would work), insert records based on the basic WHERE clause items, then run a separate query to filter by one or more LIKE conditions. I've tried various JOIN and CTE approaches, which usually bring no improvement. Alternatively, CHARINDEX also appears to work, though it is difficult to use when converting the logic of multiple LIKE conditions.
Is there any hint or something that can be placed in the query statement to tell sql server to wait until records are filtered by the basic WHERE clause items before filtering by the LIKE?
I actually just tried this approach and it had the same issue...
select *
from (
select *, charindex('something', content) as found
from bounce
where createddatetime > '2018-1-31'
) t
where found > 0
While the subquery on its own returns in a couple of seconds, the overall query just never returns. Why is this so bad?
Not fancy, but I've had better luck with temp tables than nested select statements... It will isolate the first data set, and then you can select just from that. If you're looking for quick and dirty, which usually serves my purposes for ad-hoc, this may help. If this is a permanent stored proc, the indexing suggestions may serve you better in the long run.
select *
into #like
from table
where createddatetime > '2018-1-31'
and content like '%something%'
select *
from #like
where category = 'testing'
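For the permanent case mentioned above, one possible indexing sketch is shown below. It assumes the table is called dbo.MyTable (a hypothetical name, since "table" itself is a reserved word) and that the date filter is the most selective condition. Whether the optimizer actually uses the index depends on the data and statistics.

```sql
-- Index on the date column so the cheap, selective filter can be a range seek.
-- category is included so that predicate can be checked from the index as well.
CREATE NONCLUSTERED INDEX IX_MyTable_createddatetime
    ON dbo.MyTable (createddatetime)
    INCLUDE (category);
```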
For sync purposes, I am trying to get a subset of the existing objects in a table.
The table has two fields, [Group] and Member, which are both stringified Guids.
All rows together may be too large to fit into a datatable; I already encountered an OutOfMemory exception. But I have to check that everything I need right now is in the datatable. So I take the Guids I want to check (they come in chunks of 1000), and query only for the related objects.
So, instead of filling my datatable once with all
SELECT * FROM Group_Membership
I am running the following SQL query against my SQL database to get related objects for one thousand Guids at a time:
SELECT *
FROM Group_Membership
WHERE
[Group] IN (#Guid0, #Guid1, #Guid2, #Guid3, #Guid4, #Guid5, ..., #Guid999)
The table in question now contains a total of 142 entries, and the query already times out (CommandTimeout = 30 seconds). On other tables, which are not as sparsely populated, similar queries don't time out.
Could someone shed some light on the logic of SQL Server and whether/how I could hint it into the right direction?
I already tried to add a nonclustered index on the column Group, but it didn't help.
I'm not sure whether WHERE IN can make full use of an index on [Group], or use it at all. However, if you had a second table containing the GUID values, and if that column had an index, then a join might perform very fast.
Create a temporary table for the GUIDs and populate it:
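-- Temporary table to hold the GUIDs being checked
CREATE TABLE #Guids (
Guid varchar(255)
);
-- One parenthesized row per GUID (multi-row VALUES needs SQL Server 2008 or later)
INSERT INTO #Guids (Guid)
VALUES
(#Guid0), (#Guid1), (#Guid2), (#Guid3), (#Guid4), ...;
CREATE INDEX Idx_Guid ON #Guids (Guid);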
Now try rephrasing your current query using a join instead of a WHERE IN (...):
SELECT *
FROM Group_Membership t1
INNER JOIN #Guids t2
ON t1.[Group] = t2.Guid;
As a disclaimer, if this doesn't improve the performance, it could be because your table has low cardinality. In such a case, an index might not be very effective.
We have an Oracle database table called StockDailyQuote which contains billions of rows. There are two indexes: on TradeDate (a Date field) and on StockTicker (a string field). This table is actually a partitioned table, and there is an individual table for each TradeDate. So we never query against it using
SELECT * FROM StockDailyQuote
because it is impractical to get any result. Instead we always use a WHERE clause on TradeDate, and the speed is acceptable.
Now we have compiled a view out of the StockDailyQuote table (joined with some other tables to get useful information). Because we can't specify the TradeDate at this stage, our view looks like this (this is a simplified version):
create or replace view MyStockQuoteView as
SELECT t1.StockTicker, t2.CompanyName, t1.TradeDate, t1.ClosePrice
FROM StockDailyQuote t1 join CompanyInstruction t2 on t1.StockTicker = t2.StockTicker
And I always try to query against this view with a WHERE clause on the TradeDate column. I thought that when Oracle sees my query SELECT * FROM MyStockQuoteView where TradeDate = '20170726', it would be smart enough to add the indexed where clause to the SQL and then send it to the engine, so the query should be just as fast as SELECT * from StockDailyQuote where TradeDate = '20170726', assuming my other joins do not take much time. But it doesn't behave like that: it takes a really long time to return values. I can only assume it queries against the whole view and then applies the where to the returned values. This makes the query impractical to use. So how can I solve my problem? One solution is to make a procedure, run it daily and save one day's data to a table. But are there any other options?
It would be nice to see the query plan for SELECT * FROM MyStockQuoteView where TradeDate = '20170726'.
Oracle normally works out which partition it needs to read, even when the table is accessed through a view.
You can try specifying a hint explicitly. Note that there must be no space between /* and +, and that a table inside a view is referenced by the alias it has in the view (t1 here). For example:
SELECT /*+ full(MyStockQuoteView.t1) */ * FROM MyStockQuoteView where TradeDate = '20170726'
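To get the plan asked for above, something along these lines should work in SQL*Plus or SQL Developer. It is only a sketch and uses the query exactly as posted in the question:

```sql
-- Store the optimizer's plan for the query without executing it.
EXPLAIN PLAN FOR
SELECT * FROM MyStockQuoteView WHERE TradeDate = '20170726';

-- Display the plan that was just stored.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

For a partitioned table, the Pstart and Pstop columns of the output show whether Oracle prunes down to a single partition or reads all of them.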
I am very new to SQL and SQL server, would appreciate any help with the following problem.
I am trying to update a share price table with new prices.
The table has three columns: share code, date, price.
The share code + date = PK
As you can imagine, if you have thousands of share codes and 10 years' data for each, the table can get very big. So I have created a separate table called a share ID table, and use a share ID instead in the first table (I was reliably informed this would speed up the query, as searching by integer is faster than string).
So, to summarise, I have two tables as follows:
Table 1 = Share_code_ID (int), Date, Price
Table 2 = Share_code_ID (int), Share_name (string)
So let's say I want to update the table/s with today's price for share ZZZ. I need to:
Look for the Share_code_ID corresponding to 'ZZZ' in table 2
If it is found, update table 1 with the new price for that date, using the Share_code_ID I just found
If the Share_code_ID is not found, update both tables
Let's ignore for now how the Share_code_ID is generated for a new code, I'll worry about that later.
I'm trying to use a merge query loosely based on the following structure, but have no idea what I am doing:
MERGE INTO [Table 1]
USING (VALUES (1,23-May-2013,1000)) AS SOURCE (Share_code_ID,Date,Price)
{ SEEMS LIKE THERE SHOULD BE AN INNER JOIN HERE OR SOMETHING }
ON Table 2 = 'ZZZ'
WHEN MATCHED THEN UPDATE SET Table 1.Price = 1000
WHEN NOT MATCHED THEN INSERT { TO BOTH TABLES }
Any help would be appreciated.
http://msdn.microsoft.com/library/bb510625(v=sql.100).aspx
You use Table1 as the target table and Table2 as the source table.
You want to take an action when a given ID is not found in Table2, that is, in the source table.
In the documentation, which you have already read, that corresponds to the clause
WHEN NOT MATCHED BY SOURCE ... THEN <merge_matched>
and the latter corresponds to
<merge_matched>::=
{ UPDATE SET <set_clause> | DELETE }
Ergo, you cannot insert into the source table there.
You could use triggers for auto-insertion when you insert something into Table1, but a trigger would not be able to insert the proper Share_name, because it just won't know it.
So you have two options, I guess.
1) Write a T-SQL code block; look into stored procedures. I think there is also a construct for executing an anonymous code block in MS SQL, like the EXECUTE BLOCK command in Firebird SQL Server, but I don't know for sure.
2) Create an updatable SQL view that joins Table1 and Table2 to show the most current date, so that when you insert a row into this view, the view's on-insert trigger actually inserts rows into both tables. And when you update the data in the view, the on-update trigger modifies the data.
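A rough sketch of option 1 follows; it is one possible shape, not the only one. All object names are hypothetical: dbo.SharePrice stands in for Table 1 and dbo.ShareCode for Table 2. Share_code_ID is assumed to be an IDENTITY column, and the parameter types are guesses. MERGE and the date type need SQL Server 2008 or later.

```sql
-- Hypothetical names: dbo.ShareCode = "Table 2", dbo.SharePrice = "Table 1".
CREATE PROCEDURE dbo.UpsertSharePrice
    @ShareName  varchar(20),
    @PriceDate  date,
    @Price      decimal(18, 4)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @ShareCodeID int;

    -- Step 1: look up the ID for the share name.
    SELECT @ShareCodeID = Share_code_ID
    FROM dbo.ShareCode
    WHERE Share_name = @ShareName;

    -- Step 2: unknown share, so add it to the lookup table first.
    -- (Assumes Share_code_ID is generated by IDENTITY.)
    IF @ShareCodeID IS NULL
    BEGIN
        INSERT INTO dbo.ShareCode (Share_name) VALUES (@ShareName);
        SET @ShareCodeID = SCOPE_IDENTITY();
    END;

    -- Step 3: update the price for that date, or insert it if it is missing.
    MERGE INTO dbo.SharePrice AS target
    USING (VALUES (@ShareCodeID, @PriceDate, @Price))
          AS source (Share_code_ID, [Date], Price)
       ON target.Share_code_ID = source.Share_code_ID
      AND target.[Date] = source.[Date]
    WHEN MATCHED THEN
        UPDATE SET Price = source.Price
    WHEN NOT MATCHED THEN
        INSERT (Share_code_ID, [Date], Price)
        VALUES (source.Share_code_ID, source.[Date], source.Price);
END;
GO

-- Example call for share ZZZ:
EXEC dbo.UpsertSharePrice @ShareName = 'ZZZ', @PriceDate = '20130523', @Price = 1000;
```

The sketch does not guard against two sessions inserting the same new share name at the same time. A unique constraint on Share_name plus a transaction would be needed for that.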
I use a high level of redundant, denormalized data in my DB designs to improve performance. I'll often store data that would normally need to be joined or calculated. For example, if I have a User table and a Task table, I would store the Username and UserDisplayName redundantly in every Task record. Another example of this is storing aggregates, such as storing the TaskCount in the User table.
User
UserID
Username
UserDisplayName
TaskCount
Task
TaskID
TaskName
UserID
UserName
UserDisplayName
This is great for performance since the app has many more reads than insert, update or delete operations, and since some values like Username change rarely. However, the big drawback is that the integrity has to be enforced via application code or triggers. This can be very cumbersome with updates.
My question is: can this be done automatically in SQL Server 2005/2010, maybe via a persisted/permanent View? Would anyone recommend another possible solution or technology? I've heard document-based DBs such as CouchDB and MongoDB can handle denormalized data more effectively.
You might want to first try an Indexed View before moving to a NoSQL solution:
http://msdn.microsoft.com/en-us/library/ms187864.aspx
and:
http://msdn.microsoft.com/en-us/library/ms191432.aspx
Using an Indexed View would allow you to keep your base data in properly normalized tables and maintain data-integrity while giving you the denormalized "view" of that data. I would not recommend this for highly transactional tables, but you said it was heavier on reads than writes so you might want to see if this works for you.
Based on your two example tables, one option is:
1) Add a column to the User table defined as:
TaskCount INT NOT NULL DEFAULT (0)
2) Add a Trigger on the Task table defined as:
CREATE TRIGGER UpdateUserTaskCount
ON dbo.Task
AFTER INSERT, DELETE
AS
;WITH added AS
(
SELECT ins.UserID, COUNT(*) AS [NumTasks]
FROM INSERTED ins
GROUP BY ins.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount + added.NumTasks)
FROM dbo.[User] usr
INNER JOIN added
ON added.UserID = usr.UserID
;WITH removed AS
(
SELECT del.UserID, COUNT(*) AS [NumTasks]
FROM DELETED del
GROUP BY del.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount - removed.NumTasks)
FROM dbo.[User] usr
INNER JOIN removed
ON removed.UserID = usr.UserID
GO
3) Then do a View that has:
SELECT u.UserID,
u.Username,
u.UserDisplayName,
u.TaskCount,
t.TaskID,
t.TaskName
FROM dbo.[User] u
INNER JOIN dbo.Task t
ON t.UserID = u.UserID
And then follow the recommendations from the links above (WITH SCHEMABINDING, Unique Clustered Index, etc.) to make it "persisted". Because the trigger keeps TaskCount up to date in the User table, the view does not need to aggregate in a subquery in the SELECT, which would be inefficient. This specific case is intended to be denormalized in a situation that has more reads than writes, so the Indexed View will keep the entire structure, including the count, physically stored and each read will not recalculate it.
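Put together, those recommendations might look roughly like this. The view and index names are hypothetical, TaskID is assumed to be unique, and the required SET options (ANSI_NULLS, QUOTED_IDENTIFIER and so on) are assumed to be ON:

```sql
-- Schema-bound view: requires two-part object names and an explicit column list.
CREATE VIEW dbo.UserTaskView
WITH SCHEMABINDING
AS
SELECT u.UserID,
       u.Username,
       u.UserDisplayName,
       u.TaskCount,
       t.TaskID,
       t.TaskName
FROM dbo.[User] u
INNER JOIN dbo.Task t
        ON t.UserID = u.UserID;
GO
-- The unique clustered index is what materializes ("persists") the view.
CREATE UNIQUE CLUSTERED INDEX IX_UserTaskView_TaskID
    ON dbo.UserTaskView (TaskID);
GO
-- On non-Enterprise editions, NOEXPAND is needed for the view's index to be used.
SELECT UserID, Username, TaskName
FROM dbo.UserTaskView WITH (NOEXPAND);
```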
Now, if a LEFT JOIN is needed because some Users do not have any Tasks, then the Indexed View will not work, since outer joins are among the many restrictions on creating them. In that case, you can create a real table (UserTask) that is your denormalized structure and have it populated via a Trigger on just the User Table (assuming you do the Trigger I show above, which updates the User Table based on changes in the Task table), or you can skip the TaskCount field in the User Table and just have Triggers on both tables to populate the UserTask table. In the end, this is basically what an Indexed View does, just without you having to write the synchronization Trigger(s).