How to handle performance of parallel CRUD in a single table? - sql-server

I have a table named "DailyResult" that contains calculated information for every user in my application. This information is generated by running a calc method for each user; that method reads data from several related tables.
After certain actions in my app I need to recalculate the information in the "DailyResult" table for one user or for all of them.
In that situation I create threads to run the calculation, but my problem is that when the calc method runs for many users, the threads end up waiting for the shared resource (the "DailyResult" table) to be released, and the method takes a long time.
We use MVC 3, SQL Server 2012 Enterprise 64-bit, and the Dapper ORM for inserts, deletes, and updates.
We used NHibernate before but had to replace it with Dapper. Dapper is better, but with more than 2000 concurrent users the inserts, deletes, and updates against the DailyResult table still take too long.
What is the best way to handle this issue and get maximum performance? What do you suggest?

I presume you want to access the table during the "reCalculate" phase? If so, and if you can edit the query your app uses to read from the table, add "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;" at the beginning.
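For example, a minimal sketch (the DailyResult column names here are assumptions, not taken from your schema):

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Readers no longer wait on the locks held by the recalculation threads,
-- at the price of possibly seeing uncommitted ("dirty") rows.
SELECT UserId, CalcDate, Score          -- assumed column names
FROM dbo.DailyResult
WHERE CalcDate = CAST(GETDATE() AS DATE);

The same effect can be had per query with the WITH (NOLOCK) table hint, but be aware that dirty reads may return partially calculated results.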

Related

Having trouble with interface table structures in SQL Server

I'm currently working on a project that involves a third-party database and application. So far we are able to successfully test and interface data between our databases. However, we run into trouble when extracting a large set of data (e.g. 100,000 rows with 10 columns per row) and the transaction suddenly stops in the middle for whatever reason (blackouts, forced exit, etc.); in that scenario we end up with missing or duplicated data.
Can you please give us suggestions for handling these types of scenarios? Thank you!
Here's our current interface structure
OurDB -> Interface DB -> 3rdParty DB
OurDB: we extract records from OurDB (those with the bit column set to false) into the InterfaceDB
InterfaceDB: after inserting the records from OurDB, we set the OurDB bit column to true
3rdPartyDB: they extract and then delete all records from InterfaceDB (they assume every record there is ready for extraction)
Well, you definitely need an ETL tool then, preferably SSIS. First, it will drastically improve your transfer rates while also providing robust error handling. Additionally, you will have to use lookup transforms to ensure duplicates do not enter the system. I would suggest using the Cache Connection Manager to perform the lookups.
In terms of design, if your source system (OurDB) has a primary key, say recId, then add a column, say source_rec_id, to your InterfaceDB table. Say your first run transferred 100 rows; your second run would then pick up from record 101 and move on from there. This gives you a tracking mechanism and a one-to-one correlation between the source and destination systems, so you can tell how many records have been transferred, how many are left, and so on.
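A rough sketch of that incremental pull in plain T-SQL (only recId and source_rec_id come from the description above; the table and other column names are placeholders); the same watermark logic can be built into an SSIS data flow:

DECLARE @last_transferred INT;

-- The highest source key already copied into the interface table acts as the watermark.
SELECT @last_transferred = ISNULL(MAX(source_rec_id), 0)
FROM InterfaceDB.dbo.InterfaceTable;

-- Copy only the rows that have not been transferred yet.
INSERT INTO InterfaceDB.dbo.InterfaceTable (source_rec_id, col1, col2)
SELECT recId, col1, col2
FROM OurDB.dbo.SourceTable
WHERE recId > @last_transferred;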
For a good introduction to SSIS, see Channel 9 - MSDN - SSIS. It is a very helpful resource.

Add DATE column to store when last read

We want to know which rows in a certain table are used frequently, and which are never used. We could add an extra column for this, but then we'd get an UPDATE for every SELECT, which sounds expensive. (The table contains 80k+ rows, some of which are used very often.)
Is there a better and perhaps faster way to do this? We're using some old version of Microsoft's SQL Server.
This kind of logging/tracking is classically the application server's job. If you want to build your own tracking architecture, do it in your own layer.
In any case you will need an application server for this. You are not going to update the tracking field in the same transaction as the SELECT, are you? What about rollbacks? So you need some manager that first runs the SELECT and then writes the tracking information. And what is the point of storing the tracking information together with the entity data by sending it back to the DB? Save it to a file on the application server.
You could update the column in the table as you suggested, but if it were me I'd log the event to another table: the id of the record, a datetime, the user id (maybe also IP address, browser version, etc.), and just about anything else I could capture that might possibly be relevant. (For example, six months from now your manager decides s/he wants to know not only which records were used the most, but also which users are using the most records, or what time of day that usage happens, etc.)
This type of information can be useful for things you've never even thought of down the road, and if the log starts to grow large you can always roll it up and prune it to a smaller table if performance becomes an issue. When possible, I log everything I can. You may never use some of this information, but you will never regret having it, and it is impossible to re-create historically if you didn't capture it.
In terms of making sure the application doesn't slow down, you may want to SELECT the data from within a stored procedure that also issues the logging command, so that the client is not doing two round trips (one for the select, one for the update/insert).
Alternatively, if this is a web application, you could use an async AJAX call to issue the logging action, which wouldn't slow down the user's experience at all.
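A minimal sketch of the stored-procedure variant, with assumed table, column, and parameter names:

CREATE PROCEDURE dbo.GetRecordsAndLog
    @Stuff1Value VARCHAR(50),
    @UserId      INT
AS
BEGIN
    SET NOCOUNT ON;

    -- One extra INSERT into a separate log table; the main table is never updated.
    INSERT INTO dbo.RecordAccessLog (record_id, user_id, accessed_at)
    SELECT id, @UserId, GETDATE()
    FROM dbo.myTable
    WHERE stuff1 = @Stuff1Value;

    -- Return the data the client actually asked for, in the same round trip.
    SELECT id, stuff1, stuff2
    FROM dbo.myTable
    WHERE stuff1 = @Stuff1Value;
END;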
Adding a new column to track SELECTs is not good practice, because it can hurt database performance, and database performance is one of the major concerns in database server administration.
Instead you can use a database feature called auditing; it is easy to set up and puts far less stress on the database.
For more information, search for "database auditing for SELECT statements".
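For instance, a sketch using SQL Server Audit (available from SQL Server 2008; before 2016 SP1 the database-level SELECT audit needs Enterprise edition; all names and the file path below are examples):

-- Server-level audit that writes to a file (run in master).
CREATE SERVER AUDIT SelectUsageAudit
    TO FILE (FILEPATH = 'C:\AuditLogs\');
ALTER SERVER AUDIT SelectUsageAudit WITH (STATE = ON);
GO

-- Database-level specification that records every SELECT against one table (run in the user database).
CREATE DATABASE AUDIT SPECIFICATION SelectUsageSpec
    FOR SERVER AUDIT SelectUsageAudit
    ADD (SELECT ON dbo.myTable BY public)
    WITH (STATE = ON);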
Use another table as a key/value pair with two columns (e.g. id_selected, times) to store the ids of the records you select in your standard table, and increment the times value by 1 every time those records are selected.
To do this you'd have to do a mass insert/update of the selected ids from your select query into the counting table. E.g. as a quick example:
SELECT id, stuff1, stuff2 FROM myTable WHERE stuff1='somevalue';
INSERT INTO countTable (id_selected, times)
SELECT id, 1 FROM myTable mt WHERE mt.stuff1 = 'somevalue'  # or just build a list of ids as values from your last result
ON DUPLICATE KEY UPDATE times = times + 1;
The ON DUPLICATE KEY syntax above is off the top of my head and is MySQL; for conditionally inserting or updating in MSSQL you would need to use MERGE instead.
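A sketch of the MERGE equivalent (same countTable/myTable names as above; MERGE is available from SQL Server 2008):

MERGE dbo.countTable AS target
USING (SELECT id FROM dbo.myTable WHERE stuff1 = 'somevalue') AS src
    ON target.id_selected = src.id
WHEN MATCHED THEN
    UPDATE SET times = target.times + 1
WHEN NOT MATCHED THEN
    INSERT (id_selected, times) VALUES (src.id, 1);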

Incremental table names

I'm currently trying to find some information with a query that will automatically update on a website every 24 hours - this involves a number of columns in a daily backup table that I need to pull information from.
The problem I have is the query has to specifically state the database table, which makes the query rather static - and I need something a bit more dynamic.
I have a daily backup table with the naming system as follows:
daily_backup_130328
daily_backup_130329
daily_backup_130330
daily_backup_130331
daily_backup_130401
daily_backup_130402
So when I state my FROM table, I name one of these - usually the latest one available (so "daily_backup_130402" from the example list). Currently the only way I can get this to update is to manually go in and edit the query every day before the scheduled run.
My question is: is there a way that I can get it to select the latest "daily_backup_??????" table automatically?
Edit: I'm on about bog-standard queries like "SELECT * FROM daily_backup_130402 ORDER BY CheeseType ASC;"
A generic answer: you could wrap the query in a daily_backup stored procedure.
Then, for a specific server/driver, there are catalog and dynamic-query facilities you can use to resolve the latest table name at run time.
There is also a chance that the software that fills daily_backup_NNNNNN could update the stored procedure itself, relieving you entirely of the manual step. Again, that could be done dynamically, depending on server specifics (e.g. triggers on metadata).
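On SQL Server specifically, a sketch of that idea using the catalog views and dynamic SQL (CheeseType is taken from your edit; everything else is generic):

DECLARE @table SYSNAME, @sql NVARCHAR(MAX);

-- Pick the newest table by name; the YYMMDD suffix sorts correctly as text.
SELECT TOP (1) @table = name
FROM sys.tables
WHERE name LIKE 'daily[_]backup[_][0-9][0-9][0-9][0-9][0-9][0-9]'
ORDER BY name DESC;

-- Build and run the query against that table.
SET @sql = N'SELECT * FROM ' + QUOTENAME(@table) + N' ORDER BY CheeseType ASC;';
EXEC sys.sp_executesql @sql;

This can live inside the daily_backup stored procedure mentioned above, so the scheduled job never needs editing.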

Best performance approach to history mechanism?

We are going to create a history mechanism for our changes in the DB (DART in the pic) via triggers.
We have 600 tables.
For each record that is changed, the trigger will insert the deleted version into XXX.
Regarding the XXX:
Option 1: clone each table in the "DART" DB, so each table now has a "sister table"
e.g.:
Table1 will have Table1_History
Problems:
we will have 1200 tables
programmers can make mistakes by working on the wrong tables...
Option 2: create a new DB (DART_2005 in the pic) and put the history tables there
Option 3: use a linked server that stores the DB containing the history tables.
Questions:
1) Which option gives the best performance? (I guess 3 does not, but is it 1 or 2, or are they the same?)
2) Does option 2 act like a "linked server" (i.e. in queries we will need to select from both DBs...)?
3) What is the best-practice approach?
All three approaches are viable and will have similar performance (depending on your network speed), but each one will cause you a lot of headaches on a system with many concurrent users.
Since you will be inserting/updating multiple tables in one transaction with very different access patterns (the source table is random, the history table is sequential), you will end up with blocking and/or deadlocks.
If the existing table schema cannot be changed
If you want a history system driven by your database, ideally you will queue your history updates to prevent blocking problems:
Fire a trigger on update of your table
The trigger will submit a message containing the information from the inserted/deleted tables to a SQL Server Service Broker Queue
An activation stored procedure can pull the information from the queue and write it to the appropriate history table
On failure, a new message is sent to an "error queue" where a retry mechanism can re-submit to the original queue (make sure to include a retry counter in the message)
This way your history updates are non-blocking and cannot get lost. (A minimal sketch of the trigger-to-queue part follows below.)
Note: when working with SQL Server Service Broker, make sure you completely understand the "poison message" concept.
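Here is that minimal sketch for a single table. All Service Broker object names are assumptions, Service Broker must be enabled on the database (ALTER DATABASE ... SET ENABLE_BROKER), and the activation procedure that drains the queue and writes the history rows is omitted:

-- One-time setup: message type, contract, queue and service (assumed names).
CREATE MESSAGE TYPE HistoryMessage VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT HistoryContract (HistoryMessage SENT BY INITIATOR);
CREATE QUEUE dbo.HistoryQueue;
CREATE SERVICE HistoryService ON QUEUE dbo.HistoryQueue (HistoryContract);
GO

-- Trigger that ships the old rows to the queue instead of writing history synchronously.
CREATE TRIGGER trg_Table1_History ON dbo.Table1
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @payload XML =
        (SELECT * FROM deleted FOR XML PATH('row'), ROOT('Table1'), TYPE);
    IF @payload IS NULL RETURN;   -- nothing changed

    DECLARE @h UNIQUEIDENTIFIER;
    BEGIN DIALOG CONVERSATION @h
        FROM SERVICE HistoryService
        TO SERVICE 'HistoryService'
        ON CONTRACT HistoryContract
        WITH ENCRYPTION = OFF;

    SEND ON CONVERSATION @h
        MESSAGE TYPE HistoryMessage (@payload);
END;

An activation stored procedure attached to dbo.HistoryQueue would then RECEIVE the messages and write them to the appropriate history table.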
If the existing table schema can be changed
When this is an option, I recommend a "record versioning" system where every update creates a new record and your application always queries the most recent version of the data. To keep performance acceptable, the table can be partitioned so that the most recent versions live in one partition and the older versions in an archive partition. (I usually add an end_date or expiration_date field that is set to 9999-12-31 for the currently valid record.)
This approach of course requires considerable changes to your data model and the existing application, which might not be very cost-effective.
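For illustration, a minimal sketch of such a versioned table and its update pattern (all names are made up; partitioning is left out):

-- The currently valid row is the one with end_date = '9999-12-31'.
CREATE TABLE dbo.CustomerVersioned (
    customer_id  INT            NOT NULL,
    version_no   INT            NOT NULL,
    name         NVARCHAR(100)  NOT NULL,
    start_date   DATETIME2      NOT NULL,
    end_date     DATETIME2      NOT NULL DEFAULT '9999-12-31',
    CONSTRAINT PK_CustomerVersioned PRIMARY KEY (customer_id, version_no)
);

-- An "update" closes the current version and inserts a new one in a single transaction.
BEGIN TRANSACTION;

UPDATE dbo.CustomerVersioned
SET end_date = SYSUTCDATETIME()
WHERE customer_id = 42 AND end_date = '9999-12-31';

INSERT INTO dbo.CustomerVersioned (customer_id, version_no, name, start_date)
SELECT customer_id, MAX(version_no) + 1, N'New Name', SYSUTCDATETIME()
FROM dbo.CustomerVersioned
WHERE customer_id = 42
GROUP BY customer_id;

COMMIT;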
1 and 2 will have similar performance; option 3 might be faster, if you are currently limited by some resource on the database server (e.g. disk IO), and you have a very fast network available to you.
Option 1 will lead to longer back-up times for your DART database - this may be a concern.
In general, I believe that if your application domain needs the concept of "history", you should build it in as a first-class feature. There are several approaches to this - check out the links in question How to create a point in time architecture in MySQL.
Again, in general, I dislike the use of triggers for this kind of requirement. Your trigger either has to be very simple - in which case it's not always easy to use the data it creates in your history table - or it has to be smart, in which case your trigger does a lot of work, which may make evolving your database schema harder in future.

Creating an index on a view with OpenQuery

SQL Server doesn't allow creating a view with schema binding when the view query uses OPENQUERY, as shown below.
Is there a way or a workaround to create an index on such a view?
The best you could do would be to schedule a periodic export of the AD data you are interested in to a table.
The table could of course then have all the indexes you like. If you ran the export every 10 minutes and the possibility of getting data that is 9 minutes and 59 seconds out of date is not a problem, then your queries will be lightning fast.
The only part of concern would be managing locking and concurrency during the export time. One strategy might be to export the data into a new table and then through renames swap it into place. Another might be to use SYNONYMs (SQL 2005 and up) to do something similar where you just point the SYNONYM to two alternating tables.
The data that feeds the query you're performing comes from a completely different system outside of SQL Server. There's no way SQL Server can create an indexed view on data it does not own. For starters, how would it be notified when something had changed so it could update its indexes? There would have to be some notification and update mechanism, which is implausible because SQL Server could not reasonably maintain ACID guarantees across such a distributed, slow, non-SQL Server transaction to an outside system.
Hence my suggestion to mimic such a thing with your own scheduled jobs that refresh the data every X minutes.
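A sketch of the synonym-swap refresh mentioned above, assuming a linked server named ADSI is already configured (the table, synonym, and attribute names are illustrative):

-- Reload the standby copy of the AD data.
TRUNCATE TABLE dbo.ADUsers_B;

INSERT INTO dbo.ADUsers_B (sAMAccountName, displayName)
SELECT sAMAccountName, displayName
FROM OPENQUERY(ADSI,
    'SELECT sAMAccountName, displayName
     FROM ''LDAP://DC=example,DC=com''
     WHERE objectCategory = ''person'' AND objectClass = ''user''');

-- Repoint the synonym the application queries; the swap is near-instant.
BEGIN TRANSACTION;
IF OBJECT_ID('dbo.ADUsers', 'SN') IS NOT NULL
    DROP SYNONYM dbo.ADUsers;
CREATE SYNONYM dbo.ADUsers FOR dbo.ADUsers_B;
COMMIT;

The next refresh would load dbo.ADUsers_A and point the synonym back, so readers never see a half-loaded table, and the two backing tables can carry whatever indexes you like.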
--Responding to your comment--
You can't tell whether a new user has been added without querying. If Active Directory supports some API that generates events, I've never heard of it.
But, each time you query, you could store the greatest creation time of all the users in a table, then through dynamic SQL, query only for new users with a creation date after that. This query should theoretically be very fast as it would pull very little data across the wire. You would just have to look into what the exact AD field would be for the creation date of the user and the syntax for conditions on that field.
If managing the dynamic SQL was too tough, a very simple VBScript, VB, or .NET application could also query Active Directory for you on a schedule and update the database.
Here are the basics of indexed views and their requirements. Note that what you are trying to do would probably fall into the category of a derived table, so it is not possible to create an indexed view using OPENQUERY. (A minimal example of a view that does satisfy these rules follows the list.)
This list is from http://www.sqlteam.com/article/indexed-views-in-sql-server-2000:
1. View definition must always return the same results from the same underlying data.
2. Views cannot use non-deterministic functions.
3. The first index on a View must be a clustered, UNIQUE index.
4. If you use Group By, you must include the new COUNT_BIG(*) in the select list.
5. View definition cannot contain the following:
a. TOP
b. Text, ntext or image columns
c. DISTINCT
d. MIN, MAX, COUNT, STDEV, VARIANCE, AVG
e. SUM on a nullable expression
f. A derived table
g. Rowset function
h. Another view
i. UNION
j. Subqueries, outer joins, self joins
k. Full-text predicates like CONTAIN or FREETEXT
l. COMPUTE or COMPUTE BY
m. Cannot include order by in view definition
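By contrast, here is a minimal indexed view that does satisfy those rules, built on purely local (hypothetical) tables; note that OPENQUERY or a linked server could not appear anywhere in it:

CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT o.customer_id,
       COUNT_BIG(*) AS order_count,                 -- required with GROUP BY (rule 4)
       SUM(ISNULL(o.amount, 0)) AS total_amount     -- SUM only on a non-nullable expression (rule 5e)
FROM dbo.Orders AS o                                -- two-part name, local table
GROUP BY o.customer_id;
GO

-- The first index must be unique and clustered (rule 3).
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals
    ON dbo.vOrderTotals (customer_id);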
In this case, there is no way for SQL Server to know of any changes (data, schema, whatever) in the remote data source. For a local table, it can use SCHEMABINDING etc. to ensure the underlying table(s) stay the same, and it can track data changes.
If you need to query the view often, then I'd use a local table that is refreshed periodically. In fact, I'd use a table anyway. AD queries aren't the quickest at the best of times...
