I am working on a project where I receive log/data feeds from outside as CSV and DAT files, and we have an SSIS package configured to load them. I want to create a trigger on the table to reconcile the table row count with the file record count. If I create triggers, will that lead to performance issues?
The average CSV/DAT file contains about 2.5 million records.
Yes, triggers will reduce your system performance by holding locks on the tables for longer. It is better to look at alternatives such as CDC, or handling it manually through stored procedures or some other way.
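If you still want that check, a safer pattern is to run it once after the SSIS load completes rather than firing per row in a trigger. A minimal sketch, assuming a hypothetical feed table dbo.FeedData and a load-log table dbo.FileLoadLog that the package populates with each file's record count:
-- Compare the rows actually loaded today against the record count reported for the file.
-- dbo.FeedData, dbo.FileLoadLog and the LoadDate/RecordCount columns are assumed names.
DECLARE @TableCount BIGINT, @FileCount BIGINT;
SELECT @TableCount = COUNT_BIG(*)
FROM dbo.FeedData
WHERE LoadDate = CAST(GETDATE() AS date);
SELECT @FileCount = RecordCount
FROM dbo.FileLoadLog
WHERE LoadDate = CAST(GETDATE() AS date);
IF @TableCount <> @FileCount
    RAISERROR('Row count in the table does not match the file record count.', 16, 1);
This could run as an Execute SQL Task at the end of the package or as a separate Agent job step.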
I am writing some processes to pre-format certain data for another downstream process to consume. The pre-formatting essentially involves gathering data from several permanent tables in one DB, applying some logic, and saving the results into another DB.
The problem I am running into is the volume of data. The resulting data set that I need to commit has about 132.5 million rows, and the commit itself takes almost 2 hours. I can cut that by changing the logging to simple, but it's still quite substantial (seeing as generating the 132.5 million rows into a temp table only takes 9 minutes).
I have been reading up on the best methods to migrate large data sets, but most of the solutions implicitly assume that the source data already resides in a single file/data table (which is not the case here). Some solutions, like using the SSMS task option, make it difficult to embed the logic I need to apply.
I am wondering if anyone here has some solutions.
Assuming you're on SQL Server 2014 or later, the temp table is not flushed to disk immediately, so the difference is probably just disk speed.
Try making the target table a Clustered Columnstore to optimize for compression and minimize IO.
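For example, a sketch assuming SQL Server 2014 or later and a hypothetical target table dbo.TargetTable:
-- Convert the target into a clustered columnstore table; columnstore compression
-- usually shrinks the data several-fold, which cuts the IO needed for the commit.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_TargetTable
    ON dbo.TargetTable;
Loading in large batches (roughly 100,000+ rows per batch) lets rows compress straight into columnstore rowgroups instead of landing in the delta store first.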
I have an SSIS package which, when run, updates a table. It uses a staging table and then a slowly changing dimension to load the data into the warehouse. We have set it up as a SQL Agent job and it runs every two hours.
The isolation level of the package is serializable. The database isolation level is read committed.
The issue is that when this job runs, it blocks the table, and therefore clients cannot run any reports; the reports just come back blank.
So what would be the best option for me to avoid this? Clients need to see that data, and meanwhile we need to update the table every two hours.
Using Microsoft SQL Server 2012 (SP3-GDR) (KB4019092) - 11.0.6251.0 (X64)
Thanks.
You're getting "lock escalation". It's a feature, not a bug. 8-)
SQL Server combines large numbers of smaller locks into a table lock to improve performance.
If INSERT performance isn't an issue, you can do your data load in smaller chunks inside of transactions and commit after each chunk.
https://support.microsoft.com/en-us/help/323630/how-to-resolve-blocking-problems-that-are-caused-by-lock-escalation-in
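A rough sketch of the chunked approach (table, column and batch-size values are hypothetical; tune them to your workload):
-- Insert in key-range batches and commit each one so locks are released between chunks.
DECLARE @BatchSize INT = 50000;
DECLARE @LastId BIGINT = 0, @MaxId BIGINT;
SELECT @MaxId = MAX(Id) FROM dbo.SourceStage;
WHILE @LastId < @MaxId
BEGIN
    BEGIN TRANSACTION;
    INSERT INTO dbo.Warehouse (Id, Col1, Col2)
    SELECT Id, Col1, Col2
    FROM dbo.SourceStage
    WHERE Id > @LastId AND Id <= @LastId + @BatchSize;
    COMMIT TRANSACTION;
    SET @LastId = @LastId + @BatchSize;   -- advance to the next key range
END;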
Another option is to give your clients/reports access to a clone of your warehouse table.
Do your ETL into a table that no one else can read from, and when it is finished, switch the table with the clone.
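One lightweight way to do the switch is to have reports read through a synonym and repoint it when the load finishes (a sketch; dbo.WarehouseReport, dbo.Warehouse_A and dbo.Warehouse_B are made-up names):
-- Reports always query dbo.WarehouseReport, which is just a synonym.
-- The ETL loads into whichever physical table is currently offline,
-- then the synonym is repointed in a quick metadata-only operation.
BEGIN TRANSACTION;
IF OBJECT_ID('dbo.WarehouseReport', 'SN') IS NOT NULL
    DROP SYNONYM dbo.WarehouseReport;
CREATE SYNONYM dbo.WarehouseReport FOR dbo.Warehouse_B;  -- _B holds the fresh load
COMMIT TRANSACTION;
Partition switching (ALTER TABLE ... SWITCH) is another common way to do the same swap when the two tables have identical schemas.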
I have an INSERT statement that is eating a hell of a lot of log space, so much so that the hard drive is actually filling up before the statement completes.
The thing is, I really don't need this to be logged as it is only an intermediate data upload step.
For argument's sake, let's say I have:
Table A: Initial upload table (populated using bcp, so no logging problems)
Table B: Populated using INSERT INTO B from A
Is there a way that I can copy between A and B without anything being written to the log?
P.S. I'm using SQL Server 2008 with simple recovery model.
From Louis Davidson, Microsoft MVP:
There is no way to insert without logging at all. SELECT INTO is the best way to minimize logging in T-SQL; using SSIS you can do the same sort of light logging using Bulk Insert.
From your requirements, I would probably use SSIS: drop all constraints, especially unique and primary key ones, load the data in, and add the constraints back. I load about 100GB in just over an hour like this, with fairly minimal overhead. I am using the BULK LOGGED recovery model, which just logs the existence of new extents during the load, and then you can remove them later.
The key is to start with barebones tables, and it just screams. Building the index once leaves you with no indexes to maintain, just the one index build per index.
If you don't want to use SSIS, the advice still applies: drop all of your constraints and use the BULK_LOGGED recovery model. This greatly reduces the logging done by INSERT INTO statements and should solve your issue.
http://msdn.microsoft.com/en-us/library/ms191244.aspx
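A sketch of what the minimally logged load can look like (table names are stand-ins for A and B from the question; under SIMPLE or BULK_LOGGED recovery, a TABLOCK insert into a heap with no nonclustered indexes is minimally logged):
-- Target B should be a bare heap for the load: drop its constraints and indexes first.
INSERT INTO dbo.B WITH (TABLOCK)   -- the table lock is what enables minimal logging
SELECT *
FROM dbo.A;
-- Alternatively, let SELECT INTO create the bare table for you (also minimally logged):
SELECT *
INTO dbo.B_staging
FROM dbo.A;
Re-create the constraints and indexes once the data is in place.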
Upload the data into tempdb instead of your database, and do all the intermediate transformations in tempdb. Then copy only the final data into the destination database. Use batches to minimize individual transaction size. If you still have problems, look into deploying trace flag 610; see The Data Loading Performance Guide and Prerequisites for Minimal Logging in Bulk Import:
Trace Flag 610
SQL Server 2008 introduces trace flag 610, which controls minimally logged inserts into indexed tables.
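For reference, a hedged sketch of how the trace flag is typically enabled and used (names are hypothetical; check the guide above for the exact prerequisites on your version and index layout):
-- Enable trace flag 610 instance-wide so inserts into indexed tables can be
-- minimally logged for newly allocated pages.
DBCC TRACEON (610, -1);
-- Insert in clustered-key order into fresh key ranges, which is what qualifies
-- the new pages for minimal logging.
INSERT INTO dbo.IndexedTarget (KeyCol, Payload)
SELECT KeyCol, Payload
FROM dbo.Staging
ORDER BY KeyCol;
DBCC TRACEOFF (610, -1);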
We have a number of files generated from a test, each with almost 60,000 lines of data. The requirement is to calculate a number of parameters from the data in these files. There are two ways we could process the data:
Each file is read line-by-line and processed to obtain required parameters
The file data is bulk copied into the database tables and required parameters are calculated with the help of aggregate functions in the stored procedure.
I was trying to figure out the overhead of each method. As a database is meant to handle such situations, I am mostly concerned with the overhead that may become a problem as the database grows larger.
Will it affect the retrieval rate from the tables, consequently making the calculations slower? Would file processing therefore be a better solution once the database size is taken into account? Would database partitioning solve the problem for a large database?
Did you consider using map-reduce (say under Hadoop, maybe with HBase) to perform these tasks? If you're looking for high throughput with big data volumes, this is a very scalable approach. Of course, not every problem can be addressed effectively using this paradigm, and I don't know the details of your calculation.
If you set up indexes correctly you won't suffer performance issues. Additionally, there is nothing stopping you from loading the files into a table, running the calculations, and then moving the data into an archive table or deleting it altogether, as sketched below.
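A minimal sketch of that load-aggregate-archive cycle (all object names are made up for illustration):
-- 1. Aggregate the freshly loaded readings into a small results table.
INSERT INTO dbo.TestResults (TestRunId, AvgValue, MaxValue)
SELECT TestRunId, AVG(Value), MAX(Value)
FROM dbo.StagingReadings
GROUP BY TestRunId;
-- 2. Move the raw rows out of the hot table so it never grows unbounded.
INSERT INTO dbo.ArchiveReadings (TestRunId, Value, RecordedAt)
SELECT TestRunId, Value, RecordedAt
FROM dbo.StagingReadings;
TRUNCATE TABLE dbo.StagingReadings;   -- or DELETE if other writers are active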
You can run a query directly against the text file from SQL:
SELECT * FROM OPENROWSET('MSDASQL',
'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\;',
'SELECT * FROM [text.txt];')
Ad hoc distributed queries need to be enabled to run this.
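Enabling it is a one-time sp_configure change (requires server-level ALTER SETTINGS permission):
-- Allow ad hoc OPENROWSET/OPENDATASOURCE access.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;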
Or, as you mentioned, you can load the data into a table (using SSIS, BCP, the query above, ...). You did not mention what "the database grows larger" means; 60k lines in a table is not much, so it will perform well.
I have an SP that has been worked on by 2 people now and still takes 2 minutes or more to run. Is there a way to have it pre-run and stored in a cache or somewhere else, so when my client needs to look at this data in a web browser he doesn't want to hang himself or me?
I am nowhere near a DBA, so I am kind of at the mercy of whoever I hire to figure this out for me, and having a little knowledge up front would really help me out.
If it truly takes that long to run, you could schedule the process to run using SQL Agent and have the output go to a table, then change the web application to read the table rather than execute the stored procedure. You'd have to decide how often to run the refresh, and deal with requests that come in while it is being refreshed, but that can be handled by keeping two output tables, one live and one for the latest refresh.
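A sketch of what the scheduled job could execute (procedure and table names are hypothetical; the web app would simply SELECT from dbo.ReportCache):
CREATE PROCEDURE dbo.RefreshReportCache
AS
BEGIN
    SET NOCOUNT ON;
    -- Build the new result set off to the side so readers are never blocked for long.
    SELECT *
    INTO #NewResults
    FROM dbo.SlowReportSource;        -- stand-in for the slow query/SP logic
    BEGIN TRANSACTION;
        TRUNCATE TABLE dbo.ReportCache;    -- swap to DELETE if FKs reference the cache
        INSERT INTO dbo.ReportCache
        SELECT * FROM #NewResults;
    COMMIT TRANSACTION;
END;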
But I would take another look at the procedure: look at the execution plan, see where it is slow, and make sure it is not doing full table scans.
Preferred solutions in this order:
Analyze the query and optimize accordingly
Cache it in the application (you can use HttpRuntime.Cache, even if it is not an ASP.NET application)
Cache the SPROC results in a table in the DB and add triggers to invalidate the cache (delete from the table); see the sketch after this list. A call to the SPROC would first look to see if there is any data in the cache table: if not, it runs the real query and stores the result in the cache table; if so, it returns the data from that table. The triggers on the "source" tables for the SPROC would just DELETE FROM CacheTable to "clear the cache". (Depending on what your sproc is doing and its dependencies, you may even be able to partially update the cache table from the trigger, but all of this quickly gets difficult to maintain... sometimes you gotta do what you gotta do.) This approach lets the cache table refresh itself as needed: you always have the latest data, and the SPROC only runs when needed.
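A rough sketch of the read path for option 3 (all names hypothetical; the invalidation triggers on the source tables would just run DELETE FROM dbo.SprocCache):
-- Wrapper the application calls instead of the slow SPROC directly.
CREATE PROCEDURE dbo.GetReportCached
AS
BEGIN
    SET NOCOUNT ON;
    IF NOT EXISTS (SELECT 1 FROM dbo.SprocCache)
    BEGIN
        -- Cache was cleared by a trigger (or never filled): rebuild it.
        INSERT INTO dbo.SprocCache
        EXEC dbo.SlowReportSproc;      -- the existing 2-minute procedure
    END;
    SELECT * FROM dbo.SprocCache;      -- serve from the cache table
END;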
Try "Analyze query in database engine tuning advisor" from the Query menu.
I usually script the procedure to a new window, take out the query definition part and try different combinations of temp tables, regular tables and table variables.
You could cache the result set in the application as opposed to the database, either in memory by keeping an instance of the datatable around, or by serializing it to disk. How many rows does it return?
Is it too long to post the code here?
OK first things first, indexes:
What indexes do you have on the tables and is the execution plan using them?
Do you have indexes on all the foreign key fields?
Second, does the proc use any of the following performance killers:
a cursor
a subquery
a user-defined function
select *
a search criteria that starts with a wildcard
Third, can the WHERE clause be rewritten to be sargable? There is more than one way to write almost everything, and some ways perform better than others.
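For example, a common rewrite (table and column names are made up for illustration):
-- Non-sargable: the function on the column prevents an index seek on OrderDate.
SELECT OrderId
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2017;
-- Sargable: compare the bare column against a range so an index on OrderDate can be used.
SELECT OrderId
FROM dbo.Orders
WHERE OrderDate >= '20170101'
  AND OrderDate <  '20180101';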
I suggest you buy your developers some books on performance tuning.
Likely your proc can be fixed, but without seeing the code, it is hard to guess what the problems might be.