Snapshot Replication or Backup/Restore - sql-server

A SQL Server instance that serves as a data warehouse (DW) for reporting (OLAP) is also used by users to run ad-hoc queries directly. Because of their number and concurrency, these queries add a lot of extra load on the server, and since this is not its primary role, I am considering copying the relatively large database to another machine and having users run their queries there to offload the DW.
This copy should be refreshed daily.
My question is: which is the faster solution, a backup/restore of the original database or Snapshot Replication?
Is there any other approach to this problem? Any suggestions?
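For the backup/restore option, here is a minimal sketch. It assumes the reporting copy already exists on the second server and that both servers can reach a shared backup location. The Python only builds the T-SQL text. The database names, the share path and the logical file names (`DW`, `DW_log`) are hypothetical placeholders. You would run the backup on the DW server and the restore batch, from `master` on a single connection, on the reporting server, for example from SQL Server Agent jobs.

```python
from typing import Dict, Tuple


def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_str(value: str) -> str:
    """Return an N'...' string literal with embedded single quotes doubled."""
    return "N'" + value.replace("'", "''") + "'"


def build_refresh_script(source_db: str, target_db: str, backup_file: str,
                         file_moves: Dict[str, str]) -> Tuple[str, str]:
    """Return (backup_sql, restore_sql) for a nightly full refresh.

    file_moves maps each logical file name inside the backup to the
    physical path it should get on the reporting server.
    """
    # COPY_ONLY: this extra full backup does not reset the differential base
    # of the DW's regular backups.
    backup_sql = (f"BACKUP DATABASE {quote_name(source_db)} "
                  f"TO DISK = {quote_str(backup_file)} WITH COPY_ONLY, INIT;")
    moves = ", ".join(f"MOVE {quote_str(logical)} TO {quote_str(physical)}"
                      for logical, physical in file_moves.items())
    target = quote_name(target_db)
    restore_sql = "\n".join([
        # RESTORE needs exclusive access, so disconnect report users first.
        f"ALTER DATABASE {target} SET SINGLE_USER WITH ROLLBACK IMMEDIATE;",
        f"RESTORE DATABASE {target} FROM DISK = {quote_str(backup_file)} "
        f"WITH REPLACE, {moves};",
        f"ALTER DATABASE {target} SET MULTI_USER;",
    ])
    return backup_sql, restore_sql


backup_sql, restore_sql = build_refresh_script(
    "DW", "DW_Reporting", r"\\fileshare\backups\DW.bak",
    {"DW": r"D:\Data\DW_Reporting.mdf", "DW_log": r"L:\Log\DW_Reporting_log.ldf"},
)
print(backup_sql)
print(restore_sql)
```

The reporting copy is unavailable while the restore runs, so this fits a nightly window outside reporting hours.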

Related

Transactional replication from Oracle to SQL Server

We have an Oracle OLTP system and a SQL Server reporting solution. We run nightly stored procedures that extract the data through a linked server, but this is very slow because it transfers millions of records. It would be great to have some form of transactional replication to get this data into our SQL Server environment in near real time for reporting. Has anyone tried this? Without buying an expensive piece of software, what would be the best bet?
We've looked into Apache Kafka to capture the changes and apply them to another database, but it seems like a high-maintenance approach.
If we switched on CDC on the Oracle side, could we write a stored procedure to query the CDC table and apply the changes?
Is there a simpler approach that we've overlooked?
We only have read access to the Oracle server but it is managed by a third party that could set up something on our behalf.
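On the CDC idea: whatever capture mechanism the third party sets up, the apply step can be reduced to three actions. Read the change rows in commit order, keep only the last change per key, then upsert or delete. Below is a minimal Python sketch of that logic. The `Change` record layout (a sequence number, an 'I'/'U'/'D' operation code, the primary key and the full new row) is a hypothetical assumption, not Oracle's actual CDC table format. The sketch also assumes updates carry the complete new row. In a SQL Server stored procedure, the same step would typically be a set-based MERGE (or a DELETE plus INSERT) driven by the netted changes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    seq: int              # commit order (hypothetical column)
    op: str               # 'I' insert, 'U' update, 'D' delete (hypothetical codes)
    key: int              # primary key of the changed row
    row: Optional[dict]   # full new column values; None for deletes


def net_changes(changes: Iterable[Change]) -> Dict[int, Change]:
    """Keep only the last change for each key, judged by commit order."""
    latest: Dict[int, Change] = {}
    for change in sorted(changes, key=lambda c: c.seq):
        latest[change.key] = change
    return latest


def apply_changes(target: Dict[int, dict], changes: Iterable[Change]) -> None:
    """Upsert or delete rows so target matches the source after the batch."""
    for key, change in net_changes(changes).items():
        if change.op == "D":
            target.pop(key, None)
        elif change.op in ("I", "U"):
            target[key] = dict(change.row)
        else:
            raise ValueError(f"unknown operation {change.op!r}")


reporting = {1: {"name": "old"}, 2: {"name": "gone"}}
apply_changes(reporting, [
    Change(10, "U", 1, {"name": "new"}),
    Change(11, "D", 2, None),
    Change(12, "I", 3, {"name": "added"}),
    Change(13, "U", 3, {"name": "added v2"}),
])
print(reporting)  # {1: {'name': 'new'}, 3: {'name': 'added v2'}}
```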

Advice on Azure platform to host Data Warehouse

I am a Data Warehouse developer currently looking into using the Azure platform to host a new Data Warehouse.
My experience is with on-premises servers hosting standard SQL Server databases, one for the staging database and one for the data warehouse. Typically I would use a combination of SSIS and stored procedures running in a scheduled SQL Server Agent job for the ETL.
How can I replicate this kind of setup within Azure?
The storage size will be less than 1 TB, so could I just use Azure SQL Database instead of Azure SQL Data Warehouse?
If so, would I need separate databases for staging and the data warehouse, using the elastic pool option?
The data that I will be loading into staging is all on-premises. Will SSIS still be suitable for loading into Azure, or will Azure Data Factory be a better fit?
Any help at all would be greatly appreciated! Thanks.
Leon has lots of good information there. But from a data warehouse perspective, I wouldn't use Data Sync for ETL purposes (it is marked as "not preferred" in the list "When to use Data Sync" on the Data Sync page Leon linked).
For a DW, Azure SQL Database is a good option. Azure SQL Data Warehouse (known as Azure Synapse Analytics nowadays) is a heavy-duty beast for handling DWs. Are you really sure you need that kind of system for less than 1 TB of data? I'd personally leave Azure Synapse aside for now and try Azure SQL Database first. It's a LOT cheaper, and you can upgrade later if necessary.
One thing to note about Azure SQL Database, though: it doesn't support cross-database queries. That's not a deal breaker, since everything can be handled in the same database. I personally use a schema to separate staging from the DW (and of course I use other schemas in the DW as well). Using separate databases is not very difficult, of course, but the boundary between them is much stricter in Azure SQL Database than in on-premises SQL Server or other Azure offerings (Managed Instance, for example).
SSIS is still an option, but the problem is: what do you use to run the packages? There are options like:
continue running them on-premises (all the hard work is still done in the cloud)
rent a VM with SQL Server from Azure, deploy the packages to the VM and run them from there
use Data Factory to run the SSIS packages
None of these is a perfect solution for every use case. The first two options come with quite a heavy cost if running SSIS is the only thing you need them for. Using Data Factory to run SSIS is a bit cumbersome at the moment, but it's an option anyway.
Data Factory itself is a good option as well (I haven't personally tried it, but I have heard good things about it). If you use Data Factory to run your SSIS, why not start using Data Factory without SSIS packages in the first place? Of course, Data Factory has some limitations compared to SSIS, which might be a reason to stay with SSIS, but if your SSIS packages are simple enough, why not give Data Factory a try?
I would suggest using Azure SQL Database. It offers many pricing tiers with different storage sizes, so you can select the most suitable tier for you. Azure SQL Database also supports scaling up or down based on usage.
Ref: Service tiers in the DTU-based purchase model
And as you said, the data that you will be loading into staging is all on-premises.
Azure SQL Database has a feature, Data Sync, that can help you do that:
Data Sync is useful in cases where data needs to be kept updated across several Azure SQL databases or SQL Server databases. Here are the main use cases for Data Sync:
Hybrid Data Synchronization: With Data Sync, you can keep data synchronized between your on-premises databases and Azure SQL databases to enable hybrid applications. This capability may appeal to customers who are considering moving to the cloud and would like to put some of their application in Azure.
Distributed Applications: In many cases, it's beneficial to separate different workloads across different databases. For example, if you have a large production database, but you also need to run a reporting or analytics workload on this data, it's helpful to have a second database for this additional workload. This approach minimizes the performance impact on your production workload. You can use Data Sync to keep these two databases synchronized.
Globally Distributed Applications: Many businesses span several regions and even several countries/regions. To minimize network latency, it's best to have your data in a region close to you. With Data Sync, you can easily keep databases in regions around the world synchronized.
When you create the SQL database, you can migrate the schema or data to Azure with many tools, such as Data Migration Assistant (DMA).
Then set up SQL Data Sync between Azure SQL Database and SQL Server on-premises; it can sync the data automatically every 5 minutes.
Hope this helps.
If you want to start on the less expensive options in Azure, go with a general purpose SQL database and an Azure Data Factory pipeline with a few activities.
Dynamic Resource Scaling ETL
You can scale up the database by issuing an ALTER DATABASE statement and then move on to your stored-procedure-based ETL. I would even use a "master" proc to call the dimension and fact procs to control the execution flow. Then scale down the database with another ALTER DATABASE statement. I even created my own stored proc to issue these scaling statements.
You also cannot predict when the scaling will be completed, so I have a Wait activity. You could get a little more nerdy with a loop that checks the service objective property and proceeds when the change is complete, but it was just easier to wait for 10 minutes. I have only been burned a couple of times when the scaling took longer.
Data Pipeline Activities:
Scale up, proceed if successful
Wait about 10 minutes, proceed always
Execute the ETL, proceed always
Scale down
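The "more nerdy" polling variant of those pipeline steps can be sketched in Python as follows. It issues `ALTER DATABASE ... MODIFY (SERVICE_OBJECTIVE = ...)`, polls `DATABASEPROPERTYEX(..., 'ServiceObjective')` until the new tier shows up, runs the ETL, and always scales back down. The `execute`/`query_scalar` callables stand in for whatever database driver you use. They, the proc name `dbo.usp_RunETL`, the `'S3'`/`'S0'` tier names and the polling interval are hypothetical.

```python
import time
from typing import Callable


def scale_and_run_etl(execute: Callable[[str], None],
                      query_scalar: Callable[[str], str],
                      db: str, etl_objective: str, idle_objective: str,
                      etl_proc: str = "dbo.usp_RunETL",
                      poll_seconds: float = 30.0,
                      timeout_seconds: float = 1800.0,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Scale up, wait until the new service objective is live, run the ETL,
    then scale back down even if the ETL fails ("proceed always")."""

    def set_objective(objective: str) -> None:
        # The scale request is asynchronous; the change completes later.
        execute(f"ALTER DATABASE [{db}] MODIFY (SERVICE_OBJECTIVE = '{objective}');")

    def wait_for(objective: str) -> None:
        waited = 0.0
        probe = f"SELECT DATABASEPROPERTYEX(N'{db}', 'ServiceObjective');"
        while query_scalar(probe) != objective:
            if waited >= timeout_seconds:
                raise TimeoutError(f"{db} did not reach {objective} in time")
            sleep(poll_seconds)
            waited += poll_seconds

    set_objective(etl_objective)
    wait_for(etl_objective)
    try:
        execute(f"EXEC {etl_proc};")  # the "master" proc calling dimension/fact procs
    finally:
        set_objective(idle_objective)


# Usage with in-memory stand-ins instead of a real connection:
log = []
answers = iter(["S0", "S0", "S3"])  # the scale-up shows as done on the third poll
scale_and_run_etl(log.append, lambda sql: next(answers), "DW", "S3", "S0",
                  sleep=lambda seconds: None)
print("\n".join(log))
```

Polling replaces the fixed 10-minute wait, so a slow scale-up delays the ETL instead of letting it start on the old tier.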
Elastic Query
You can query across databases with vertical-partitioning Elastic Query. Performance isn't great, and Microsoft doesn't recommend it for ETL, but it will work. To improve performance, try dumping any large table you need into a temp table and then transforming the data locally.
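Here is a minimal sketch of the temp-table approach. It assumes a database master key and a database-scoped credential already exist in the querying database. The Python only builds the T-SQL. The server, database, credential, data source and table names are hypothetical. With vertical partitioning, the external table's schema, name and columns must match the remote table, or be remapped with the SCHEMA_NAME/OBJECT_NAME options.

```python
from typing import List, Sequence, Tuple


def elastic_query_script(server: str, remote_db: str, credential: str,
                         data_source: str, table: str,
                         columns: Sequence[Tuple[str, str]],
                         temp_table: str) -> List[str]:
    """Statements that expose a remote table and copy it into a local temp table."""
    col_defs = ",\n    ".join(f"[{name}] {sql_type}" for name, sql_type in columns)
    col_list = ", ".join(f"[{name}]" for name, _ in columns)
    return [
        # TYPE = RDBMS is the vertical-partitioning flavour of elastic query.
        f"CREATE EXTERNAL DATA SOURCE [{data_source}] WITH (TYPE = RDBMS, "
        f"LOCATION = '{server}', DATABASE_NAME = '{remote_db}', "
        f"CREDENTIAL = [{credential}]);",
        # Column definitions must match the remote table.
        f"CREATE EXTERNAL TABLE {table} (\n    {col_defs}\n) "
        f"WITH (DATA_SOURCE = [{data_source}]);",
        # Pull the rows across once, then transform the local copy.
        f"SELECT {col_list} INTO {temp_table} FROM {table};",
    ]


for stmt in elastic_query_script(
        "myserver.database.windows.net", "Staging", "ElasticCred", "StagingDb",
        "dbo.Customer", [("CustomerId", "int"), ("Name", "nvarchar(100)")],
        "#Customer"):
    print(stmt)
```

Run the final SELECT and the transformations that read `#Customer` on the same connection, because a #temp table disappears when its session ends.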

db replication vs mirroring

Can anyone explain the differences between a replicated database and a mirrored database server?
I have huge reports to run. I want to use a secondary database server to run my report so I can off load resources from the primary server.
Should I setup a replication server or a mirrored server and why?
For your requirements, replication is the way to go (assuming you're talking about transactional replication). As stated before, mirroring will "mirror" the whole database, but you won't be able to query it unless you create snapshots from it.
The good thing about replication is that you can select which objects to publish and you can also filter them. And since the subscriber database is open, you can delete data that isn't required (just be careful, as this can lead to problems maintaining the replication itself) or create report-specific indexes that are not needed in "production". I maintained this kind of solution for a long time with no issues.
(Assuming you are referring to Transactional Replication)
The biggest differences are: 1) Replication operates on an object-by-object basis whereas mirroring operates on an entire database. 2) You can't query a mirrored database directly - you have to create snapshots based on the mirrored copy.
In my opinion, mirroring is easier to maintain, but the constant creation of snapshots may prove to be a hassle.
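Here is a minimal sketch of that snapshot maintenance, using SQL Server's `CREATE DATABASE ... AS SNAPSHOT OF` run on the mirror server. The Python builds the statements to create a new timestamped snapshot and drop the previous one. The database name, logical file names and snapshot folder are hypothetical. A snapshot needs one `(NAME, FILENAME)` entry per data file of the source database (the log file is not listed).

```python
from datetime import datetime
from typing import List, Optional, Sequence, Tuple


def rotate_snapshot(source_db: str, data_files: Sequence[str], snapshot_dir: str,
                    now: datetime, previous: Optional[str] = None
                    ) -> Tuple[str, List[str]]:
    """Return the new snapshot's name and the statements to create it
    (and to drop the previous snapshot, if one is given)."""
    stamp = now.strftime("%Y%m%d_%H%M")
    snapshot = f"{source_db}_snap_{stamp}"
    files = ", ".join(
        f"(NAME = N'{logical}', FILENAME = N'{snapshot_dir}\\{logical}_{stamp}.ss')"
        for logical in data_files)  # one sparse file per source data file
    statements = [f"CREATE DATABASE [{snapshot}] ON {files} "
                  f"AS SNAPSHOT OF [{source_db}];"]
    if previous:
        statements.append(f"DROP DATABASE [{previous}];")
    return snapshot, statements


name, stmts = rotate_snapshot("Sales", ["Sales"], r"D:\Snapshots",
                              datetime(2024, 1, 2, 6, 0),
                              previous="Sales_snap_20240101_0600")
print(name)
print("\n".join(stmts))
```

Each snapshot is a separate database with its own name. That means reports have to be pointed at the new name after every rotation, which is part of the hassle mentioned above.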
As mentioned here
Database mirroring and database replication are two high data availability techniques for database servers. In replication, data and database objects are copied and distributed from one database to another. It reduces the load from the original database server, and all the servers on which the database was copied are as active as the master server. On the other hand, database mirroring creates copies of a database in two different server instances (principal and mirror). These mirror copies work as standby copies and are not always active like in the case of data replication.
This question may also be helpful, or have a look at the Microsoft documentation.

SQL Server distribution and configuration for best performance

I want to design and implement enterprise software with Silverlight. I use a SQL Server database for this, and many users run SQL queries on it.
How can I configure the SQL Server database for best performance?
How can I distribute the SQL Server database for best performance?
How can I distribute the SQL Server database across several servers for best performance?
And what technologies in SQL Server can I use for best performance?
In addition to replication, you can use mirroring or log shipping for this. Note that I am talking only about scaling out reads, not writes. So reports etc. can be run from the copies of the database, but writes must go to the main copy (unless you are using merge replication, which is frightening to me). There are some caveats, of course.
With database mirroring, you can use the secondary as a read-only reporting source by taking a snapshot. There are limits here to how many databases you can mirror and there is of course maintenance to manage the snapshots. It is not quite true distribution of resources here, but it can be helpful to offload some of the load. In the next version of SQL Server (Denali), you will be able to set secondaries as read-only, so you can avoid the maintenance of snapshots.
With log shipping, you can essentially keep a stale version of the database around for reporting, and replace it periodically by restoring logs to it. You have a lot more flexibility here compared to replication or mirroring, as you can actually define a delay (like every 6 hours or once a day, you refresh the copy) - which can also serve as a "recover from a shoot-yourself-in-the-foot" scenario. The downside is that to restore a new copy of the database you need to kick all the current users out, as the database needs to be in single user mode in order to recover.
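Here is a minimal sketch of the delayed-refresh mechanism. It assumes the log backups are visible to the reporting server as files with known backup times. The file names, times and undo-file path are hypothetical. The Python selects the log backups that are at least `delay` old and not yet restored, and builds RESTORE LOG statements in chronological order. SQL Server's built-in log shipping restore job already handles this, including a restore delay and an option to disconnect users, so the sketch only illustrates the mechanism.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def delayed_restore_script(db: str, log_backups: Dict[str, datetime],
                           last_restored: datetime, now: datetime,
                           delay: timedelta, undo_file: str) -> List[str]:
    """RESTORE LOG statements for every log backup newer than the last one
    restored but older than the configured delay, oldest first."""
    cutoff = now - delay
    due = sorted((taken, path) for path, taken in log_backups.items()
                 if last_restored < taken <= cutoff)
    # STANDBY leaves the copy readable between restores (NORECOVERY would not).
    # Each RESTORE needs exclusive access, so report sessions must be
    # disconnected first; this sketch leaves that step out.
    return [f"RESTORE LOG [{db}] FROM DISK = N'{path}' WITH STANDBY = N'{undo_file}';"
            for _, path in due]


now = datetime(2024, 1, 2, 12, 0)
backups = {
    r"L:\logs\Sales_0400.trn": datetime(2024, 1, 2, 4, 0),
    r"L:\logs\Sales_0500.trn": datetime(2024, 1, 2, 5, 0),
    r"L:\logs\Sales_0700.trn": datetime(2024, 1, 2, 7, 0),  # too recent: held back
}
for stmt in delayed_restore_script("Sales_Reporting", backups,
                                   last_restored=datetime(2024, 1, 2, 0, 0),
                                   now=now, delay=timedelta(hours=6),
                                   undo_file=r"L:\logs\Sales_undo.tuf"):
    print(stmt)
```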
Those are just a couple of ideas for helping scale out reads, but deep down I agree with @gbn: are you solving a problem you don't have yet? It's one thing to design for scalability, but it's very easy to step over that line and completely over-engineer.
Well, SQL Server doesn't really have a load-balancing mechanism in and of itself. What it does support, however, is an active/passive node configuration and also replication.
We are using the replication strategy in one application I support. You can read more about it here:
http://msdn.microsoft.com/en-us/library/ms151198.aspx
In our configuration, we basically have a transactional database and a reporting database. We replicate the data from our transactional DB to the reporting DB. Any reporting is done against this reporting DB, so that we don't slow down work being done on the transactional DB due to some long running report.
Note that the replication isn't truly real time. In other words, it takes some time to replicate the data from the transactional DB to the reporting DB, albeit a very small amount of time. But replication is certainly one strategy you could consider if you are trying to balance the workload.
Other things you might consider are partitioning large tables for better performance.
As gbn pointed out in his comment though, it's better to determine if you actually need these strategies before implementing them, because they add a lot of complexity and maintenance efforts, which may not even be needed. It's important to properly analyze how much data you think you will have, and how much activity will be occurring against that data to determine if strategies such as the ones I just described are even needed.
Also, you can refer to this link for some other helpful information and some links to whitepapers you may find helpful:
http://social.msdn.microsoft.com/Forums/en/sqldisasterrecovery/thread/05cf41b7-c558-44bf-86c6-12f5c2b2ffe2

Database replication. 2 servers, Master database and the 2nd is read-only

Say you have 2 database servers. One database is the 'master' database where all write operations are performed; it is treated as the 'real/original' database. The other server's database is to be a mirror copy of the master database (slave?), which will be used for read-only operations by a certain part of the application.
How do you go about setting up a slave database that mirrors the data on the master database? From what I understand, the slave/read-only database uses the master database's transaction log file to mirror the data, correct?
What options do I have in terms of how often the slave db mirrors the data? (real time/every x minutes?).
What you want is called Transactional Replication in SQL Server 2005. It will replicate changes in near real time as the publisher (i.e. "master") database is updated.
Here is a pretty good walk through of how to set it up.
SQL Server 2008 has three different modes of replication.
Transactional, for one-way read-only replication
Merge, for two-way replication
Snapshot
From what I understand, the slave/readonly database is to use the master db's transaction log file to mirror the data correct?
What options do I have in terms of how often the slave db mirrors the data? (real time/every x minutes?).
This sounds like you're talking about log shipping instead of replication. For what you're planning on doing, though, I'd agree with Jeremy McCollum and say use transactional replication. If you use log shipping, the database won't be available while it is being restored every x minutes.
Here's a good walkthrough of the difference between the two. Sad to say you have to sign up for an account to read it though. =/ http://www.sqlservercentral.com/articles/Replication/logshippingvsreplication/1399/
The answer to this will vary depending on the database server you are using to do this.
Edit: Sorry, maybe I need to learn to look at the tags and not just the question; I can see you tagged this as sqlserver.
Transactional replication is near real time.
If you don't need changes applied to your database continuously and only need to refresh the data, say, once a day, use snapshot replication instead of transactional replication. In snapshot replication, changes are replicated on a schedule the user defines, say once every 24 hours.
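Here is a minimal sketch of a once-a-day snapshot publication. It assumes a Distributor is already configured for the publishing server and that the subscription is created separately (for example with `sp_addsubscription`). The Python only builds the calls to SQL Server's replication stored procedures. The database, publication and table names and the 02:00 start time are hypothetical.

```python
from typing import List, Sequence, Tuple


def snapshot_publication_script(db: str, publication: str,
                                tables: Sequence[Tuple[str, str]],
                                start_hhmmss: int = 20000) -> List[str]:
    """T-SQL that publishes the given (schema, table) pairs with snapshot
    replication and schedules the Snapshot Agent to run once a day."""
    statements = [
        f"USE [{db}];",  # publication procs run in the publication database
        f"EXEC sp_replicationdboption @dbname = N'{db}', "
        f"@optname = N'publish', @value = N'true';",
        # @repl_freq = 'snapshot' makes this a snapshot (not transactional) publication.
        f"EXEC sp_addpublication @publication = N'{publication}', "
        f"@repl_freq = N'snapshot', @status = N'active';",
        # @frequency_type 4 = daily, @frequency_interval 1 = every day,
        # @active_start_time_of_day is HHMMSS (20000 = 02:00:00).
        f"EXEC sp_addpublication_snapshot @publication = N'{publication}', "
        f"@frequency_type = 4, @frequency_interval = 1, "
        f"@active_start_time_of_day = {start_hhmmss};",
    ]
    for schema, table in tables:
        statements.append(
            f"EXEC sp_addarticle @publication = N'{publication}', "
            f"@article = N'{table}', @source_owner = N'{schema}', "
            f"@source_object = N'{table}';")
    return statements


for stmt in snapshot_publication_script("DW", "DW_Daily",
                                        [("dbo", "FactSales"), ("dbo", "DimDate")]):
    print(stmt)
```

Each scheduled run regenerates the whole snapshot and reapplies it at the Subscriber, so for a large database every refresh moves the full data set.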
