Slow insert: select from view

Slow insert: select from view - sql-server

The following statement takes at least 4 seconds:
INSERT INTO [SomeSmallTable]
SELECT * FROM ComplexView
WHERE [Date] = convert(datetime, '23/09/2020',103)
However, if we only run the SELECT part without the INSERT INTO, it takes less than half a second:
SELECT *
FROM ComplexView
WHERE [Date] = convert(datetime, '23/09/2020',103)
The view selects less than 200 rows, and the table called "SomeSmallTable" holds only a few rows. I think this issue started when we updated the view called "ComplexView". ComplexView is based on other views (and some of these views are based on other views itself), as well as some tables.
I tried to refresh all views using sp_refreshview, but to no avail.
How can we determine the cause of this issue and hopefully solve it?
[EDIT]
My reply to some comments:
#Dale K: I can't post the execution plans, I think they way to complex, and not relevant as they are equal for both statements, with or without the INSERT part, except for the Table Insert part. But I did see that the INSERT costs 100%. For some reason SQL has trouble inserting the view results in the table.
#Panagiotis Kanavos: Nobody but me is using the database. It's a copy of our clients database and I'm working on it on my local machine.
#gotqn: SomeSmallTable is a table, so no table variable or temporary table. However, it is created when a user opens a specific form in our application, and deleted then the user closes this form.
#Arvo: SomeSmallTable has no keys and no triggers. The view returns less than 200 rows which are inserted in this table, and before these are inserted the table is empty.
I followed the steps in the accepted answer, and eventually compared the current "ComplexView" with the previous version, and found out what caused this issue.

Checking the execution plan is the first step, as others have said. Given that the INSERT (rather than the query) is causing the delay, you could troubleshoot that further. Here are some things you can try:
Try using Statistics IO to find out more, as answered here.
Attempt an INSERT using static data (e.g. INSERT INTO [SomeSmallTable] VALUES (1, 2, '...etc');). This will tell you if the issue is any INSERT statement, or when inserting from a view specifically.
Check how much data the view is returning. 4s may or may not be reasonable, depending on how many rows are being inserted.
Check the table design to see how it is using primary keys, foreign keys, composite keys, indexes, triggers, etc. Some of these features optimise a table's design for selecting, but make insertion slower as a trade-off. A good answer about this can be found here.
If you know it's not a load issue (because you're the only one using this database), check whether something else might be restricting resources on the machine you're using (other resource-intensive tasks, any other queries happening at the same time, scheduled jobs within SQL Server, etc.) You can use SQL Server Profiler to watch the queries in real time.
If slow performance is not limited to this particular query, then there are other general design considerations you can look into.

Related

How can I optimise Zumero sync queries

I am currently experiencing very long sync times on a zumero synced database (well over a minute), and following some profiling, the culprit appears to be a particular query that is taking 20+ seconds (suitably anonymised):
WITH relevant_rvs AS
(
SELECT rv.z_rv AS rv FROM zumero."mydb_089eb7ec0e2e4772ba0dde90170ee368_mysynceddb$z$rv$271340031" rv
WHERE (rv.txid<=913960)
AND NOT EXISTS (SELECT 1 FROM zumero."mydb_089eb7ec0e2e4772ba0dde90170ee368_mysynceddb$z$dd$271340031" dd WHERE dd.rv=rv.z_rv AND (dd.txid<=913960))
)
INSERT INTO #final_included_271340031_e021cfbe1c97213dd5adbacd667c08439fb8c6 (z_rv)
SELECT z$this.z_rv
FROM zumero."mydb_089eb7ec0e2e4772ba0dde90170ee368_mysynceddb$z$271340031" z$this
WHERE (z$this.z_rv IN (SELECT rv FROM relevant_rvs))
AND MyID = (MyID = XXX AND MyOtherField=XXX)
UNION SELECT z$this.z_rv
FROM zumero."mydb_089eb7ec0e2e4772ba0dde90170ee368_mysynceddb$z$old$271340031" z$this
WHERE (z$this.z_rv IN (SELECT rv FROM relevant_rvs))
AND (MyID = XXX AND MyOtherField=XXX)
I have taken the latter SELECT part of the query and ran it in isolation, which reproduces the same poor performance. Interestingly the execution plan recommends an index be applied, but I'm reluctant to go changing the schema of zumero generated tables, is adding indexes to these tables something that can be attempted safely and is it likely to help?
The source tables have 100,000ish records in them and the filter results in each client syncing 100-1000ish records, so not trivial data volumes but levels I would not expect to be causing major issues in terms of query performance.
Does anyone have any experience optimising Zumero sync performance server side? Do any indexes on source tables propagate to these tables? they don't appear to in this case.

Creating a custom index on the z$old table should be safe. I do hope it helps boost your query performance! (And it would be great to see a comment letting us know if it does or not.)
I believe the only issue such an index may cause would be that it could block certain schema changes on the host table. For example, if you tried to DROP the [MyOtherField] column from the host table, the Zumero triggers would attempt to drop the same column from the z$old table as well, and the transaction would fail with an error (which might be a bit surprising, since the index is not on the table being directly acted on).
Another thing to consider: It might also help to give this new index a name that will be recognized/helpful if it ever appears in an error message. Then (as always) feel free to contact support#zumero.com with any further questions or issues if they come up.

Add DATE column to store when last read

We want to know what rows in a certain table is used frequently, and which are never used. We could add an extra column for this, but then we'd get an UPDATE for every SELECT, which sounds expensive? (The table contains 80k+ rows, some of which are used very often.)
Is there a better and perhaps faster way to do this? We're using some old version of Microsoft's SQL Server.

This kind of logging/tracking is the classical application server's task. If you want to realize your own architecture (there tracking architecture) do it on your own layer.
And in any case you will need application server there. You are not going to update tracking field it in the same transaction with select, isn't it? what about rollbacks? so you have some manager who first run select than write track information. And what is the point to save tracking information together with entity info sending it back to DB? Save it into application server file.

You could either update the column in the table as you suggested, but if it was me I'd log the event to another table, i.e. id of the record, datetime, userid (maybe ip address etc, browser version etc), just about anything else I could capture and that was even possibly relevant. (For example, 6 months from now your manager decides not only does s/he want to know which records were used the most, s/he wants to know which users are using the most records, or what time of day that usage pattern is etc).
This type of information can be useful for things you've never even thought of down the road, and if it starts to grow large you can always roll-up and prune the table to a smaller one if performance becomes an issue. When possible, I log everything I can. You may never use some of this information, but you'll never wish you didn't have it available down the road and will be impossible to re-create historically.
In terms of making sure the application doesn't slow down, you may want to 'select' the data from within a stored procedure, that also issues the logging command, so that the client is not doing two roundtrips (one for the select, one for the update/insert).
Alternatively, if this is a web application, you could use an async ajax call to issue the logging action which wouldn't slow down the users experience at all.

Adding new column to track SELECT is not a practice, because it may affect database performance, and the database performance is one of major critical issue as per Database Server Administration.
So here you can use one very good feature of database called Auditing, this is very easy and put less stress on Database.
Find more info: Here or From Here
Or Search for Database Auditing For Select Statement

Use another table as a key/value pair with two columns(e.g. id_selected, times) for storing the ids of the records you select in your standard table, and increment the times value by 1 every time the records are selected.
To do this you'd have to do a mass insert/update of the selected ids from your select query in the counting table. E.g. as a quick example:
SELECT id, stuff1, stuff2 FROM myTable WHERE stuff1='somevalue';
INSERT INTO countTable(id_selected, times)
SELECT id, 1 FROM myTable mt WHERE mt.stuff1='somevalue' # or just build a list of ids as values from your last result
ON DUPLICATE KEY
UPDATE times=times+1
The ON DUPLICATE KEY is right from the top of my head in MySQL. For conditionally inserting or updating in MSSQL you would need to use MERGE instead

Faster SQL Performance

I have to insert one record per tables across 30 tables. The data coming from some other System. I have to insert data in the tables for the first time, then if any update happened, then I need to update tables in the SQL Server. I have two options:
a) I can check timestamp for individual table rows and update if the timestamp is greater.
b) Everytime I can stateway delete records and insert data.
Which one will be faster in SQL Server Database? Is there any other option to address the situatation?

If you are not changing the index fields of the record, the stategy of trying to update first and then insert is usually faster than drop/insert as you don't force the database into updating a bunch of index info.
If using Sql2008+ you should be using the merge command, as it explictly handles the update/insert condition cleanly and clearly
ADDED
I should also add that is you know the usage pattern in rarely update (i.e., 90% insert), you may have a case when drop/insert in faster than update/insert -- depends on lots of details. Regardless, merge is the clear winner if using 2008+

I generally like drop and re-insert. I find it to be cleaner and easier to code. However, if this is happening very frequently and you're worried about concurrency issues, you're probably better off with option 1.
Also, another thing to factor in is how often does the timestamp check fail (where you don't have to insert nor update). If 99% of data is redundant/outdated data, you're probably better off with option 1 regardless.

How to profile and address insert/update performance issues?

I am trying to insert thousands of rows into a table and performance is not acceptable. Rows on a particular table take 300ms per row to insert.
I know that tools exist to profile queries run against SQL Server (SQL Server Profile, Database Tuning Advisor), but how would I profile insert and update statements to determine slow running inserts? Am I forced to use perfmon while the queries run and deduce the issue with counters?

I would first check the query plan of a single insert to understand the costs associated to that operation - it is not known from the question whether the insert is selecting the data from elsewhere.
I would then check the table indexing for the following:
how many indexes are in place (apart from filtered indexes, each index will be inserted into as well)
whether a clustered index is present or are we inserting into a heap.
if the clustered index key means we will be getting a hotspot benefit on the end of the table or causing a large quantity of page splits.
This is all SQL schema based issues, assuming there is no problems within SQL, you can start checking disk IO counters to check for disk queue lengths and response time. Not forgetting the Log drive response time since each insert will be logged.
These kind of problems are very difficult to nail down as any 1 perscriptive thing / silver bullet you can give advice over, just a range of things you should be checking.

I'm betting that the problem is with the selects and not necessarily the updates. Have you tried profiling the select part of the update statement to make sure there isn't a problem there first?

What should be returned when inserting into SQL?

A few months back, I started using a CRUD script generator for SQL Server. The default insert statement that this generator produces, SELECTs the inserted row at the end of the stored procedure. It does the same for the UPDATE too.
The previous way (and the only other way I have seen online) is to just return the newly inserted Id back to the business object, and then have the business object update the Id of the record.
Having an extra SELECT is obviously an additional database call, and more data is being returned to the application. However, it allows additional flexibility within the stored procedure, and allows the application to reflect the actual data in the table.
The additional SELECT also increases the complexity when wanting to wrap the insert/update statements in a transaction.
I am wondering what people think is better way to do it, and I don't mean the implementation of either method. Just which is better, return just the Id, or return the whole row?

We always return the whole row on both an Insert and Update. We always want to make sure our client apps have a fresh copy of the row that was just inserted or updated. Since triggers and other processes might modify values in columns outside of the actual insert/update statement, and since the client usually needs the new primary key value (assuming it was auto generated), we've found it's best to return the whole row.

The select statement will have some sort of an advantage only if the data is generated in the procedure. Otherwise the data that you have inserted is generally available to you already so no point in selecting and returning again, IMHO. if its for the id then you can have it with SCOPE_IDENTITY(), that will return the last identity value created in the current session for the insert.

Based on my prior experience, my knee-jerk reaction is to just return the freshly generated identity value. Everything else the application is inserting, it already knows--names, dollars, whatever. But a few minutes reflection and reading the prior 6 (hmm, make that 5) replies, leads to a number of “it depends” situations:
At the most basic level, what you inserted is what you’d get – you pass in values, they get written to a row in the table, and you’re done.
Slightly more complex that that is when there are simple default values assigned during an insert statement. “DateCreated” columns that default to the current datetime, or “CreatedBy” that default to the current SQL login, are a prime example. I’d include identity columns here, since not every table will (or should) contain them. These values are generated by the database upon table insertion, so the calling application cannot know what they are. (It is not unknown for web server clocks to not be synchronized with database server clocks. Fun times…) If the application needs to know the values just generated, then yes, you’d need to pass those back.
And then there are are situations where additional processing is done within the database before data is inserted into the table. Such work might be done within stored procedures or triggers. Once again, if the application needs to know the results of such calculations, then the data would need to be returned.
With that said, it seems to me the main issue underlying your decision is: how much control/understanding do you have over the database? You say you are using a tool to automatically generate your CRUD procedures. Ok, that means that you do not have any elaborate processing going on within them, you’re just taking data and loading it on in. Next question: are there triggers (of any kind) present that might modify the data as it is being written to the tables? Extend that to: do you know whether or not such triggers exists? If they’re there and they matter, plan accordingly; if you do not or cannot know, then you might need to “follow up” on the insert to see if changes occurred. Lastly: does the application care? Does it need to be informed of the results of the insert action it just requested, and if so, how much does it need to know? (New identity value, date time it was added, whether or not something changed the Name from “Widget” to “Widget_201001270901”.)
If you have complete understanding and control over the system you are building, I would only put in as much as you need, as extra code that performs no useful function impacts performance and maintainability. On the flip side, if I were writing a tool to be used by others, I’d try to build something that did everything (so as to increase my market share). And if you are building code where you don't really know how and why it will be used (application purpose), or what it will in turn be working with (database design), then I guess you'd have to be paranoid and try to program for everything. (I strongly recommend not doing that. Pare down to do only what needs to be done.)

Quite often the database will have a property that gives you the ID of the last inserted item without having to do an additional select. For example, MS SQL Server has the ##Identity property (see here). You can pass this back to your application as an output parameter of your stored procedure and use it to update your data with the new ID. MySQL has something similar.

INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
With this clause, returning the whole row does not require an extra SELECT and performance-wise is the same as returning only the id.
"Which is better" totally depends on your application needs. If you need the whole row, return the whole row, if you need only the id, return only the id.
You may add an extra setting to your business object which can trigger this option and return the whole row only if the object needs it:
IF #return_whole_row = 1
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.*
VALUES ('value1', 'value2')
ELSE
INSERT
INTO mytable (col1, col2)
OUTPUT INSERTED.id
VALUES ('value1', 'value2')
FI

I don't think I would in general return an entire row, but it could be a useful technique.
If you are code-generating, you could generate two procs (one which calls the other, perhaps) or parametrize a single proc to determine whther to return it over the wire or not. I doubt the DB overhead is significant (single-row, got to have a PK lookup), but the data on the wire from DB to client could be significant when all added up and if it's just discarded in 99% of the cases, I see little value. Having an SP which returns different things with different parameters is a potential problem for clients, of course.
I can see where it would be useful if you have logic in triggers or calculated columns which are managed by the database, in which case, a SELECT is really the only way to get that data back without duplicating the logic in your client or the SP itself. Of course, the place to put any logic should be well thought out.
Putting ANY logic in the database is usually a carefully-thought-out tradeoff which starts with the minimally invasive and maximally useful things like constraints, unique constraints, referential integrity, etc and growing to the more invasive and marginally useful tools like triggers.
Typically, I like logic in the database when you have multi-modal access to the database itself, and you can't force people through your client assemblies, say. In this case, I would still try to force people through views or SPs which minimize the chance of errors, duplication, logic sync issues or misinterpretation of data, thereby providing as clean, consistent and coherent a perimeter as possible.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight