SQL Server 2008: A *Data File IO counter* for SQL Server - sql-server

I am running a stress test against SQL Server 2008 and I want to know how much data flows to tempdb as a result of using temporary tables and table variables.
These statistics are also shown in Activity Monitor.
Is it possible to record this data somehow and analyse it afterwards?
I have 2 cases in mind:
Record a SQL Server performance counter (I don't know its name)
Somehow record the data from Activity Monitor

Writing into a database does not map 1-to-1 to disk IO. Database updates only dirty in-memory pages, which are later copied to disk by the lazy writer or at checkpoint. The only thing written to disk immediately is the Write Ahead Log activity, for which there is a specific per-database counter: Log Bytes Flushed/sec. Note that tempdb has special logging requirements: since it is never recovered, it only needs undo information.
Whenever dirty pages are actually flushed, be it at checkpoint or by the lazy writer, there are specific counters for that too: Checkpoint pages/sec and Lazy writes/sec. These are not per database because the activities themselves are not 'per database'.
Finally there is the virtual file stats DMV, sys.dm_io_virtual_file_stats, which offers the aggregate number of IO operations and number of bytes for each individual file of each individual database, including tempdb.
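For example, a minimal sketch of a query over that DMV, assuming you only care about tempdb (database_id 2) and joining to sys.master_files just to pick up the logical file names:

-- Aggregate IO per tempdb file since the last SQL Server restart.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name                  AS logical_file_name,
       vfs.num_of_reads,
       vfs.num_of_bytes_read,
       vfs.num_of_writes,
       vfs.num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(2, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id;

The numbers are cumulative, so snapshot this on a schedule and diff consecutive samples to get a rate.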
You mention that you want to measure the specific impact of temp tables and table variables, but you won't be able to separate them from the rest of tempdb activities (sort spools, worktables etc). I recommend you go over Working with tempdb in SQL Server 2005, as it still applies to SQL 2008.

If you use Performance Monitor (perfmon.exe) to monitor the SQL Server counters, you can configure it to log to a .csv file to analyse in Excel (for example).
The performance counter in question is Data File(s) Size (KB) under SQLServer:Databases.
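If you would rather sample it from T-SQL than from Perfmon, the same counters are exposed through sys.dm_os_performance_counters. A minimal sketch (the exact counter_name text, including the '(KB)' suffix, is an assumption you should verify on your instance):

-- Current data file size for tempdb, as reported by the Databases counter object.
SELECT object_name, counter_name, instance_name, cntr_value AS value_kb
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Data File(s) Size (KB)%'
  AND instance_name = 'tempdb';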

I would take regular-interval "snapshots" of the following DMVs, loaded into tables, to determine your internal usage of tempdb (see the sketch after this list).
sys.all_objects
sys.dm_db_file_space_usage
sys.dm_db_task_space_usage
sys.dm_db_task_space_usage will break down usage by session (SPID), etc.
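A minimal sketch of such a snapshot, assuming a hypothetical table name dbo.TempdbTaskUsageSnapshot and keeping only a couple of the allocation columns:

-- Snapshot table (created once).
CREATE TABLE dbo.TempdbTaskUsageSnapshot
(
    capture_time                      datetime NOT NULL DEFAULT (GETDATE()),
    session_id                        smallint NOT NULL,
    user_objects_alloc_page_count     bigint   NOT NULL,
    internal_objects_alloc_page_count bigint   NOT NULL
);

-- Run this on a schedule (a SQL Agent job, for instance) to accumulate samples.
INSERT INTO dbo.TempdbTaskUsageSnapshot
    (session_id, user_objects_alloc_page_count, internal_objects_alloc_page_count)
SELECT session_id,
       SUM(user_objects_alloc_page_count),
       SUM(internal_objects_alloc_page_count)
FROM sys.dm_db_task_space_usage
GROUP BY session_id;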

Related

Pulling instead of pushing data from a database

Loading data from my OLTP database (as part of an ETL process) via OPENQUERY or an SSIS Data Flow into another SQL Server database (the warehouse, which runs the SSIS package / OPENQUERY statement) kills it. As I checked in Performance Monitor, I am using resources on the source database, not on the destination. Is it possible to reverse this resource utilization (using SQL Server 2016 or SSIS)?
The problem here is in your destination write operation. If you are using an OLE DB Destination with the fast load access mode, try setting the Rows per batch value to a non-zero value and reduce the Maximum insert commit size to a value that will be easy on your memory and CPU. SSIS will then not have to wait for the default of 2147483647 rows before committing to the destination table, which can otherwise have a large impact on your log file and slow your process down. Please refer to this article for more info on setting these values. All the best.
What does your export query look like? Is it just a simple data dump, or do you have some complex logic in it (e.g. doing some denormalization/aggregation as part of the export)?
If it's just a simple export, check which server your SSIS package runs on and what resources it uses. In any case, you need to read the data from your source system, so expect some disk read operations there.
In general it is better to get the data out of the OLTP system as quickly as possible and then apply other operations in further steps of your ETL process on your ETL/data warehouse server, in order to reduce the impact on your transactional system.
Hope it helps.

SSIS ETL - Is it a good practice to have the destination DB pull data from sources directly

I have an ETL package that moves data from a number of source SQL Server DBs to a single destination SQL Server DB. All these DBs are on the same server. The destination DB contains a large number of views that reference the source DBs. E.g. SELECT * FROM SourceDB1.dbo.Transactions.
So the majority of the data goes directly source DB => destination DB, without passing through the SSIS server. I'm new to SSIS and am wondering whether this is a good thing to do, or whether I should look into changing the process.
Time passes, your company grows. You stand up Server2 and have SourceDBN on there. Now what? Your pattern of SELECT * FROM SourceDB.dbo.Transactions breaks.
SourceDB27's client pays us a lot of money, so they ask us to add column FooBitsWhatsIt to their Transactions table. Now your SELECT * breaks because you have inconsistent columns across your ecosystem.
Someone writes a big query that takes a while to process - the people in the destination database are negatively impacting the ability of the Source databases to do their regular activity. Had the data been copied over to the destination and not merely referenced, there would be isolation between source and destination activities.
Generally speaking, the above risks outweigh the additional development, storage and processing costs of actually copying the data across.
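One small mitigation for the SELECT * fragility (though it does nothing for the isolation problem) is to give each cross-database view an explicit column list. A minimal sketch with hypothetical column names:

-- Hypothetical view: a column added in SourceDB1 no longer changes the view's shape.
CREATE VIEW dbo.Transactions_SourceDB1
AS
SELECT TransactionID,     -- placeholder column names
       TransactionDate,
       Amount
FROM SourceDB1.dbo.Transactions;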
When I started learning about ETL and data migration using SSIS, I was always told that it is best practice to first move the data into a staging database, where you can validate, deduplicate and clean it, and then move it to the destination DB.

What is the purpose of tempdb in SQL Server?

I need some clarification about tempdb in SQL Server, specifically on the following:
What is its purpose?
Can we create our own tempdb, and how do we make our own database refer to it?
FROM MSDN
The tempdb system database is a global resource that is available to all users connected to the instance of SQL Server and is used to hold the following:
Temporary user objects that are explicitly created, such as global or local temporary tables, temporary stored procedures, table variables, or cursors.
Internal objects that are created by the SQL Server Database Engine, for example, work tables to store intermediate results for spools or sorting.
Row versions that are generated by data modification transactions in a database that uses read-committed using row versioning isolation or snapshot isolation transactions.
Row versions that are generated by data modification transactions for features, such as: online index operations, Multiple Active Result Sets (MARS), and AFTER triggers.
Operations within tempdb are minimally logged.
This enables transactions to be rolled back. tempdb is re-created every time SQL Server is started so that the system always starts with a clean copy of the database.
Temporary tables and stored procedures are dropped automatically on disconnect, and no connections are active when the system is shut down. Therefore, there is never anything in tempdb to be saved from one session of SQL Server to another. Backup and restore operations are not allowed on tempdb.
Tempdb is a system database and we can't create system databases. Tempdb is a global resource for all databases, which means temp tables, table variables, and the version store for user databases all use tempdb. This is a pretty basic explanation of tempdb. Refer to the link below for how it is used for other purposes, such as Database Mail:
https://msdn.microsoft.com/en-us/library/ms190768.aspx
1: It is what it says: temporary storage. For example, when you ask for DISTINCT results, SQL Server must remember which rows it has already sent you. Same with a temporary table.
2: Makes no sense. Tempdb is not a per-database thing but a server-level one - there is ONE tempdb regardless of how many databases you have. You can change where it is and how it is laid out (number of files, size), but it is never tied to one database (except, obviously, if you only have one database on the SQL Server instance). Having your own tempdb is NOT how SQL Server works. And while we are at it - there is no need to ever make a backup of tempdb: when SQL Server starts, tempdb is reinitialized as empty.
And, by the way, this would be obvious if you bothered with such things as basic competence - which, for me, includes reading the documentation of every major technology I work with at least once. You should consider adopting that habit, because it is the only way to know what you are doing.
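To make point 2 concrete, here is a minimal sketch showing that a temp table created from any database still lives in tempdb's catalog:

-- Create a local temp table from whatever database you are connected to.
CREATE TABLE #demo (id int);

-- It shows up in tempdb's own catalog, with a uniquified name.
SELECT name, create_date
FROM tempdb.sys.tables
WHERE name LIKE '#demo%';

DROP TABLE #demo;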

read MSSQL tables in disk order

How to take advantage of Disk IO queueing
I need to do exactly this but on Microsoft SQL server tables.
I have a database with 100+ tables. I need to read every record of every table.
Any suggestions? Is it worth writing code, benchmarking and debugging just to gain a few seconds?
It would be really nice if I could tell where each table resides on disk.
And because somebody will ask: Yes, This is a bottleneck in my program.
That other answer is irrelevant for SQL Server. SQL Server does IO its own way.
Some pointers though:
Ensure every table has a clustered index
Perform regular index maintenance
Ensure you have good disks underneath (RAID etc.)
Use Enterprise edition for its read-ahead functionality if it is that critical
Ensure you have plenty of RAM
What does this mean?
If you want to use SQL Server Express on a single workstation disk, then don't bother - you can't optimise this.
Having clustered indexes and index maintenance ensures that data is mostly contiguous on disk (subject to subsequent data changes).
A proxy for how long this will take would be to run DBCC CHECKDB on all tables, or an ALTER INDEX ... REBUILD. Both require all data to be read from disk for all tables.
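A minimal sketch of both proxies ('YourDatabase' and dbo.YourTable are placeholder names):

-- Reads every allocated page in the database (a rough lower bound on full-scan time).
DBCC CHECKDB ('YourDatabase') WITH NO_INFOMSGS;

-- Or rebuild all indexes on one table, which also reads all of its pages.
ALTER INDEX ALL ON dbo.YourTable REBUILD;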

Tracking down data load performance issues in SSIS package

Are there any ways to determine what differences between the databases affect an SSIS package's load performance?
I've got a package which loads and does various bits of processing on ~100k records against the database on my laptop in about 5 minutes.
Try the same package and same data on the test server, which is a reasonable box in both CPU and memory, and it's still running ... about 1 hour so far :-(
I checked the package with a small set of data, and it ran through OK.
I've had similar problems over the past few weeks, and here are several things you could consider, listed in decreasing order of importance according to what made the biggest difference for us:
Don't assume anything about the server.
We found that our production server's RAID was misconfigured (HP sold us disks with firmware mismatches) and the disk write speed was literally a fiftieth of what it should have been. So check out the server metrics with Perfmon.
Check that enough RAM is allocated to SQL Server. Inserts of large datasets often require use of RAM and TempDB for building indices, etc. Ensure that SQL has enough RAM that it doesn't need to swap out to Pagefile.sys.
As per the holy grail of SSIS, avoid manipulating large datasets using T-SQL statements. All T-SQL statements cause changed data to be written to the transaction log, even if you use the Simple recovery model. The main difference between the Simple and Full recovery models is that Simple automatically truncates the log at each checkpoint. This means that large datasets, when manipulated with T-SQL, thrash the log file, killing performance.
For large datasets, do data sorts at the source if possible. The SSIS Sort component chokes on reasonably large datasets, and the only viable alternative (nSort by Ordinal, Inc.) costs $900 for a non-transferable per-CPU license. So... if you absolutely have to sort a large dataset, consider loading it into a staging database as an intermediate step.
Use the SQL Server Destination if you know your package is going to run on the destination server, since it offers roughly 15% performance increase over OLE DB because it shares memory with SQL Server.
Increase the network packet size to 32767 on your database connection managers. This allows large volumes of data to move faster from the source server(s), and can noticeably improve reads on large datasets.
If using Lookup transforms, experiment with cache sizes - between using a Cache connection or Full Cache mode for smaller lookup datasets, and Partial / No Cache for larger datasets. This can free up much needed RAM.
If combining multiple large datasets, use either RAW files or a staging database to hold your transformed datasets, then combine and insert all of a table's data in a single data flow operation, and lock the destination table. Using staging tables or RAW files can also help relieve table-locking contention.
Last but not least, experiment with the DefaultBufferSize and DefaultBufferMaxRows properties. You'll need to monitor your package's "Buffers Spooled" performance counter using Perfmon.exe, and adjust the buffer sizes upwards until you see buffers being spooled (paged to disk), then back off a little.
Point 8 is especially important on very large datasets, since you can only achieve a minimally logged bulk insert operation if:
The destination table is empty, and
The table is locked for the duration of the load operation, and
The database is in the Simple / Bulk-Logged recovery model.
This means that subsequent bulk loads into a table will always be fully logged, so you want to get as much data as possible into the table on the first data load.
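As a minimal sketch of what that first, minimally logged load looks like in T-SQL (table and column names are placeholders; assumes an empty destination and Simple or Bulk-Logged recovery):

-- TABLOCK lets the engine use minimal logging for the initial load.
INSERT INTO dbo.TargetTable WITH (TABLOCK)
       (Col1, Col2)
SELECT Col1, Col2
FROM StagingDB.dbo.SourceTable;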
Finally, if you can partition your destination table and then load the data into each partition in parallel, you can achieve up to 2.5 times faster load times, though this isn't usually a feasible option out in the wild.
If you've ruled out network latency, your most likely culprit (with real quantities of data) is your pipeline organisation. Specifically, what transformations you're doing along the pipeline.
Data transformations come in four flavours:
streaming (entirely in-process/in-memory)
non-blocking (but still using I/O, e.g. lookup, oledb commands)
semi-blocking (blocks a pipeline partially, but not entirely, e.g. merge join)
blocking (blocks a pipeline until it's entirely received, e.g. sort, aggregate)
If you've a few blocking transforms, that will significantly mash your performance on large datasets. Even semi-blocking, on unbalanced inputs, will block for long periods of time.
In my experience the biggest performance factor in SSIS is Network Latency. A package running locally on the server itself runs much faster than anything else on the network. Beyond that I can't think of any reasons why the speed would be drastically different. Running SQL Profiler for a few minutes may yield some clues there.
CozyRoc over at the MSDN forums pointed me in the right direction ...
- used SSMS / Management / Activity Monitor and spotted lots of TRANSACTION entries
- that got me thinking; read up on the OLE DB connector and unchecked Table Lock
- WHAM ... data loads fine :-)
Still don't understand why it works fine on my laptop d/b and stalls on the test server?
- I was the only person using the test d/b, so it's not as if there should have been any contention for the tables ??
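If this happens again, one way to confirm whether the load is blocking itself rather than being blocked by another user is to query sys.dm_exec_requests while the package runs. A minimal sketch:

-- Any rows here are requests that are currently blocked, and by which session.
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;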
