Parallel operations in SQL Server running "sequentially" - sql-server

There are n databases in our Windows application, all with the same schema.
We have a requirement to export the databases as mirrors, but only a subset of the data should be copied.
For this we follow these steps:
1. Create a new empty DB.
2. Run a script which inserts the selected data into the DB based on a key.
3. After the insert, detach the DB.
Steps 1, 2 and 3 are run for each database which needs to be shipped.
The inserts are run in batches. Our plan is to run the export in parallel for each database that needs to be exported.
For a large data set it takes about 6 minutes per database.
If I run the same script for 5 different databases in 5 different sessions simultaneously, the last script finishes after 30-32 minutes. So even though they are running in parallel, the total time is the same as running them sequentially.
We disable all indexes before the load and rebuild them at the end.
The databases are normally in simple recovery mode; we switch them to bulk-logged for the load and back to simple afterwards.
We tried MAXDOP 0 and MAXDOP 1 -- no change.
The NOLOCK hint is used in all SELECT queries.
I want to understand what I should do to get the best performance, because these are 5 different databases being copied into 5 new databases, so all operations should be independent of each other.
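For reference, a minimal sketch of the kind of per-table load described above (TargetDB, SourceDB, dbo.BigTable, IX_BigTable_KeyCol and @ExportKey are placeholder names, not the actual script):

```sql
-- Placeholder names only; the real script differs.
DECLARE @ExportKey int = 42;

ALTER DATABASE TargetDB SET RECOVERY BULK_LOGGED;

-- Disable a nonclustered index on the target before the load.
ALTER INDEX IX_BigTable_KeyCol ON TargetDB.dbo.BigTable DISABLE;

-- TABLOCK allows minimal logging when the target is empty (or a heap)
-- and the database is in BULK_LOGGED or SIMPLE recovery.
INSERT INTO TargetDB.dbo.BigTable WITH (TABLOCK) (Id, KeyCol, Payload)
SELECT s.Id, s.KeyCol, s.Payload
FROM   SourceDB.dbo.BigTable AS s WITH (NOLOCK)
WHERE  s.KeyCol = @ExportKey;

-- Rebuild the index and restore the recovery model afterwards.
ALTER INDEX IX_BigTable_KeyCol ON TargetDB.dbo.BigTable REBUILD;
ALTER DATABASE TargetDB SET RECOVERY SIMPLE;
```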

Related

SQL Server instance hanging randomly

I have a SQL Server Agent job running every 5 minutes with an SSIS package from the SSIS Catalog. That package does the following:
1. DELETE all existing data on OLTP_DB
2. Extract data from the production DB
3. DELETE all existing data on OLAP_DB, and then
4. Extract transformed data from OLTP_DB into OLAP_DB ...
PROBLEM
The job I mentioned above hangs randomly for a reason I don't know.
Using Activity Monitor, I've realized that every time it hangs it shows something like this:
If I try to run any query against that database it does not respond; it just says "Executing..." and nothing happens until I stop the job.
The average running time for the job is 5 or 6 minutes, but when it hangs it can stay running for days if I don't stop it. :(
WHAT I HAVE DONE
Set DelayValidation: True
Improved my queries
No transactions running
No locking or blocking (I guess)
Rebuilt and reorganized indexes
Ran DBCC FREEPROCCACHE
Ran DBCC FREESESSIONCACHE
Etc.
My settings:
Recovery model: Simple
SSIS OLE DB Destination:
1. Keep identity (checked)
2. Keep nulls (checked)
3. Table lock (checked)
4. Check constraints (unchecked)
Rows per batch: (blank)
Maximum insert commit size: 2147483647
Note:
I have another job running a (small) SSIS package on the same instance but against different databases, and when the main ETL mentioned above hangs, this small one sometimes hangs too. That is why I think the problem is with the instance (I guess).
I'm open to provide more information as need it.
Any assistance or help would be really appreciated!
As Jeroen Mostert said, it's showing CXPACKET, which means it's executing some work in parallel.
It's also showing ASYNC_NETWORK_IO, which means it's also transferring data over the network.
There could be many reasons. Just a few more hints:
- Have you checked whether the network connection is slow?
- What is the size of the data being transferred vs. the speed of the network?
- Is there an antivirus running that could slow down the data transfer?
My guess is that there is a lot of data to transfer and that it's simply taking a long time. I would suspect either I/O or the network, but since ASYNC_NETWORK_IO takes most of the cumulative wait time, I would go for the network.
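If you want to dig further while the job is hung, a generic way to see what each session is actually waiting on is to query the waiting-tasks DMVs (nothing here is specific to this package):

```sql
-- Show current waits per session while the job is hung.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.blocking_session_id,
       r.command,
       r.status
FROM sys.dm_os_waiting_tasks AS wt
JOIN sys.dm_exec_requests AS r
  ON r.session_id = wt.session_id
WHERE wt.session_id > 50;   -- ignore most system sessions
```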
As @Jeroen Mostert and @Danielle Paquette-Harvey said, by right-clicking in Activity Monitor I could figure out that I had an object that was executing in parallel (set up that way for some reason in the past). To fix the problem I removed the parallel structure and put everything into one batch.
Now it is working like a charm!!
(Before and after screenshots not included.)

select query runs slow in production but faster in staging - oracle 11g

Here's the issue.
I'm using Oracle 11g Database.
I have a master table that has lakhs (hundreds of thousands) of rows and is in use almost the entire working day. We have a monthly activity where we dump all production data into the staging (testing) DB. Whenever I execute a SELECT query to retrieve a consumer's data based on their address, it takes 3 seconds on the staging DB and 50-60 seconds on the production server. All the necessary indexes are in place. There are no issues in the procedures being called. Stats have been gathered. Indexes have been rebuilt.
I need to know the following:
Is it because the table on production is busy the entire day?
Does it have anything to do with the buffer cache or any other database parameter?
I would also like to mention that we use the LIKE operator in the WHERE condition to search for a consumer based on a particular address, and that we use % as a prefix, e.g. select * from master_table where m_address like '%variable_name%'. I have read that Oracle may skip the index if the % wildcard appears before the search characters. The astonishing part is that, as mentioned, the same query runs perfectly fine in my testing environment.
Please help. Let me know if you need more info.
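One way to check whether the two environments are simply using different execution plans (and whether the leading-wildcard LIKE forces a full table scan in production) is to compare the plans directly. A minimal sketch using the table and column from the question; the Oracle Text index at the end is only an option, assumes the Text feature is installed, and its name is made up:

```sql
-- Compare the execution plan in each environment.
EXPLAIN PLAN FOR
  SELECT * FROM master_table WHERE m_address LIKE '%variable_name%';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- If production shows a full table scan, one common option for
-- "contains"-style searches is an Oracle Text CONTEXT index.
CREATE INDEX ix_master_address_txt ON master_table (m_address)
  INDEXTYPE IS CTXSYS.CONTEXT;

SELECT * FROM master_table
WHERE CONTAINS(m_address, 'variable_name') > 0;
```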

How do I Queue Distributed Computing Jobs at home using SQL?

I'm using an application written in LabVIEW (an engineering software framework/programming language) to run about twenty thousand simulations. Each simulation takes about 5 minutes to complete, and the results are dumped into a database hosted on a laptop on my local network. I'm using SQL Express as my database.
Each simulation job has a set of starting parameters that will be passed to the application. This could be as simple as a string of characters that the application would parse into valid simulation characteristics, but I'm not sure exactly how to structure this.
Because the simulations would take about 3 months to run on one computer, I want to add in the capability for the database computer to be able to "schedule" jobs. That way, I can run the application on any computer in my local network (I have 5 available) for a few simulations, and stop simulations when I need to use it for other things. The database computer will hand out these jobs as they get requested by the application, as well as continuously run jobs itself.
How would I go about setting up this queue from an SQL point of view? The framework I currently have in mind would work something like this: Database has 3 tables in addition to tables used to store simulation data. The tables contain CompletedJobs,RunningJobs, and JobsToRun. The application would request a job from JobsToRun, and place that job's ID into the RunningJobs table. It would then parse the job's ID for relevant information, run the simulation, and if it exits without errors, move the job ID to the CompletedJobs table.
Would this work?
I don't see the need for three tables - why not have one table, Jobs, with a JobStatus field that can take values such as ToRun, Running, Completed, and perhaps Failed (you can probably think of others). When a simulation starts a new job, it changes the status to Running; when it completes the job, it changes the status again to Completed or Failed.
You might want fields for StartTime and EndTime, perhaps ErrorCode if your simulation might fail with different types of error? What does the output of the simulation consist of - should you store a filename of the output file, or even upload the output data itself as a BLOB? Let the database take care of assigning each job a unique ID, which would be the primary key for the database table.
What sort of data actually are the starting parameters? If you can store them in database fields, do that. You could put those in a second table if you wanted, and have your Jobs table refer to the parameter set's ID in the job parameters table.
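A minimal T-SQL sketch of that single-table design (table and column names are only suggestions), including a claim query that lets several worker machines pull jobs without two of them grabbing the same one:

```sql
CREATE TABLE dbo.Jobs (
    JobId      int IDENTITY(1,1) PRIMARY KEY,
    Parameters nvarchar(max) NOT NULL,                    -- the starting-parameter string
    JobStatus  varchar(10)   NOT NULL DEFAULT 'ToRun',    -- ToRun / Running / Completed / Failed
    StartTime  datetime2     NULL,
    EndTime    datetime2     NULL,
    ErrorCode  int           NULL
);

-- Atomically claim the next available job for a worker.
UPDATE TOP (1) j
SET    j.JobStatus = 'Running',
       j.StartTime = SYSUTCDATETIME()
OUTPUT inserted.JobId, inserted.Parameters
FROM   dbo.Jobs AS j WITH (ROWLOCK, READPAST, UPDLOCK)
WHERE  j.JobStatus = 'ToRun';

-- When the simulation finishes (or fails):
DECLARE @JobId int;   -- the JobId returned by the claim query above
UPDATE dbo.Jobs
SET    JobStatus = 'Completed',   -- or 'Failed'
       EndTime   = SYSUTCDATETIME()
WHERE  JobId = @JobId;
```

The READPAST and UPDLOCK hints make the claim safe with several workers: a second worker skips rows another worker is in the middle of claiming instead of blocking on them or taking the same job.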

SQL Server Jobs Running Every 2 minutes...Bad Practice?

We have two servers; one is very slow and geographically far away. Setting up distributed queries is a headache because they do not always work (sometimes we receive a "The semaphore timeout period has expired" error), and even when a query works it can be slow.
One solution was to set up a job that populates temporary tables on the slow server with the data we need and then runs INSERT, UPDATE and DELETE statements against our server's tables from those temporary tables, so we have up-to-date data on our faster server. The job takes about one minute and 30 seconds and is set up to run every 2 minutes. Is this bad practice, and will it hurt our slower SQL Server box?
EDIT
The transactions happen on the slow server's agent (where the job is running), which uses distributed queries to connect to and update our fast server. If the job runs on the fast server instead, we get that timeout error every now and then.
As for the specifics: if the record exists on the faster server we perform an update, if it does not exist we insert, and if the record no longer exists on the slow server we delete... I can post code when I get to a computer.
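That update/insert/delete pattern typically looks something like the following when pushed through a linked server. All names (FastServer, TargetDb, dbo.Customers, dbo.Staging_Customers and the columns) are placeholders, since the real code isn't posted:

```sql
-- 1. Update rows that exist on both sides.
UPDATE tgt
SET    tgt.Name  = src.Name,
       tgt.Email = src.Email
FROM   [FastServer].TargetDb.dbo.Customers AS tgt
JOIN   dbo.Staging_Customers AS src
       ON src.CustomerId = tgt.CustomerId;

-- 2. Insert rows that are missing on the fast server.
INSERT INTO [FastServer].TargetDb.dbo.Customers (CustomerId, Name, Email)
SELECT src.CustomerId, src.Name, src.Email
FROM   dbo.Staging_Customers AS src
WHERE  NOT EXISTS (SELECT 1
                   FROM [FastServer].TargetDb.dbo.Customers AS tgt
                   WHERE tgt.CustomerId = src.CustomerId);

-- 3. Delete rows that no longer exist on the slow server.
DELETE tgt
FROM   [FastServer].TargetDb.dbo.Customers AS tgt
WHERE  NOT EXISTS (SELECT 1
                   FROM dbo.Staging_Customers AS src
                   WHERE src.CustomerId = tgt.CustomerId);
```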

Warehouse PostgreSQL database architecture recommendation

Background:
I am developing an application that allows users to generate lots of different reports. The data is stored in PostgreSQL and has a natural unique group key, so that the data for one group key is totally independent of the data for other group keys. Reports are built for only one group key at a time, so all of the queries use a "WHERE groupKey = X" clause. The data in PostgreSQL is updated intensively by parallel processes which add data to different groups, but I don't need real-time reports; one update every 30 minutes is fine.
Problem:
There are about 4 GB of data already, and I found that some reports take significant time to generate (up to 15 seconds) because they need to query not a single table but 3-4 of them.
What I want to do is reduce the time it takes to create a report without significantly changing the technologies or the overall design of the solution.
Possible solutions
What I was thinking is:
1. Splitting the one database into several databases, one database per group key. Then I can get rid of WHERE groupKey = X (though I have an index on that column in each table), and the number of rows to process each time would be significantly smaller.
2. Creating a slave database for reads only. Then I would have to sync the data with PostgreSQL's replication mechanism, for example once every 15 minutes. (Can I actually do that, or do I have to write custom code?)
I don't want to change the database to NoSQL because I would have to rewrite all the SQL queries, and I don't want to. I might switch to another SQL database with column-store support if it is free and runs on Windows (sorry, I don't have a Linux server, but I might get one if I have to).
Your ideas
What would you recommend as the first simple steps?
Two thoughts immediately come to mind for reporting:
1) Set up some summary (aka "aggregate") tables that hold precomputed results of the queries your users are likely to run, e.g. a table containing the counts and sums grouped by the various dimensions (see the sketch after this list). This can be an automated process: a DB function (or script) is run via your job scheduler of choice and refreshes the data every N minutes.
2) Regarding replication, if you are using streaming replication (PostgreSQL 9+), the changes in the master DB are replicated to the slave databases (hot standby = read only), which you can use for reporting.
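A minimal sketch of the first idea using a materialized view (the sales table, its columns and the view name are made up; on older versions a plain summary table refreshed by a scheduled script works the same way):

```sql
-- Precomputed per-group daily totals for the reports.
CREATE MATERIALIZED VIEW report_sales_summary AS
SELECT groupkey,
       date_trunc('day', created_at) AS day,
       count(*)                      AS order_count,
       sum(amount)                   AS total_amount
FROM   sales
GROUP  BY groupkey, date_trunc('day', created_at);

CREATE INDEX ON report_sales_summary (groupkey, day);

-- Run this from your scheduler (cron, pgAgent, ...) every N minutes.
REFRESH MATERIALIZED VIEW report_sales_summary;
```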
Tune the report queries. Use EXPLAIN (see the example below). Avoid procedures when you can do the work in pure SQL.
Tune the server: memory, disk, processor. Take a look at the server config.
Upgrade the PostgreSQL version.
Run VACUUM.
Of these four suggestions, only the first requires significant changes in the application.
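For the query-tuning suggestion, EXPLAIN ANALYZE is the quickest way to see where the time actually goes. The query below is only an example with made-up table and column names:

```sql
-- BUFFERS shows how much data each plan node reads from cache vs. disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.name, sum(s.amount) AS total
FROM   sales AS s
JOIN   customers AS c ON c.id = s.customer_id
WHERE  s.groupkey = 42
GROUP  BY c.name;
```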

Resources