Advice on SQL Server database architecture - sql-server

We have a medium-sized web application (multiple instances), querying against a single SQL Server 2014 database.
Not the most robust architecture: no clustering/failover, and we have been getting a few deadlocks recently.
I'm looking at how I can improve the performance and availability of the database, reduce these deadlocks, and put a better backup/failover strategy in place.
I'm not a DBA, so I'm looking for some advice here.
We currently have the following application architecture:
Multiple web servers reading and writing to a single SQL Server DB
Multiple background services reading and writing to the same single SQL Server DB
I'm contemplating making the following changes:
Split the single DB into two DBs, one read-only and one read-write. The read-write DB replicates the data to the read-only DB using SQL Server replication
Web servers connect to whichever DB is appropriate for the operation.
Background servers connect to the read-write DB (most of the writes happen here)
Most of the DB queries on the web servers are reads (and a lot of the writes can be offloaded to the background services), so that's the reason for my thoughts here.
I could then also potentially add clustering to the read-only databases.
Is this a good SQL Server database architecture? Or would the DBAs out there simply suggest a clustering approach?
Goals: performance, scalability, reliability

Without more specific details about your environment, it's tough to give specific advice (for example: what's a medium-sized web application? What are the specs on your database server? What's your I/O latency like? CPU contention? Memory utilization?)
At a high level, deadlocks usually occur for one of two reasons:
Your reads are too slow, and
Your writes are too slow.
There are lots of ways to address both of those issues, but in general:
You can cover a lot of coding sins with good hardware, and
Don't re-architect a solution until you've pursued performance tuning options (including indexing strategies and/or procedure rewrites).
Clustering is generally used as a High Availability/Disaster Recovery strategy, not for performance augmentation (there are always exceptions).
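If it helps while you dig in, here's a rough sketch for pulling recent deadlock graphs out of the built-in system_health Extended Events session (available out of the box on SQL Server 2012 and later), so you can see which statements and indexes are actually colliding before deciding on any re-architecture:

    SELECT
        CAST(event_data AS xml).value('(event/@timestamp)[1]', 'datetime2') AS deadlock_time,
        CAST(event_data AS xml).query('(event/data/value/deadlock)[1]')     AS deadlock_graph
    FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL)
    WHERE object_name = 'xml_deadlock_report'   -- only the deadlock events
    ORDER BY deadlock_time DESC;

Each row's deadlock_graph column is the XML you can open in SSMS to see the victim, the competing statements, and the locked objects.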

Related

Options for a secondary SQL database

I have a VM in Azure running a single SQL Server instance.
I also recently set up Power BI to refresh from this source at 1 AM every morning. Unfortunately, this refresh is causing performance issues, where all queries/operations time out under the load.
What are my options regarding a secondary DB for reporting purposes? Main requirements are ease of maintenance and cost (don't need anything enterprise-level).
Things that come to mind:
Secondary DB on same VM. Use replication to mirror data
Another cheap VM. Use replication
Use SQL Server Always On availability groups, connect to a read-only replica
SQL Data Warehouse
Can anyone provide some guidance, or ask questions that may help find my answer?
Thanks.
I think an Always On availability group with a read-only secondary replica will be best suited to your needs.
Building a separate data warehouse for reporting purposes would be overkill, as your reporting needs are already satisfied by the current database, except for performance.
Transactional replication could also be of help here, but it requires a fair amount of knowledge to set up and maintain.
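If you do go the availability group route, routing read-intent traffic to the secondary is only a few statements. This is just a sketch with made-up replica names (SQLNODE1 as primary, SQLNODE2 as secondary); Power BI or the reporting clients would then add ApplicationIntent=ReadOnly to their connection strings:

    -- Allow read-intent connections on the secondary and give it a routing URL
    ALTER AVAILABILITY GROUP [MyAG]
        MODIFY REPLICA ON N'SQLNODE2' WITH
        (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY,
                         READ_ONLY_ROUTING_URL = N'TCP://SQLNODE2.mydomain.com:1433'));

    -- Tell the primary where to send read-intent connections
    ALTER AVAILABILITY GROUP [MyAG]
        MODIFY REPLICA ON N'SQLNODE1' WITH
        (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQLNODE2')));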
I can think of several options, but in general this sounds like a canonical OLTP vs. OLAP issue, or a call for a data warehouse; since you are on a budget, though, let's consider low-cost options.
Assuming the databases are small (GBs, not TBs), I would separate the operational and reporting instances: either on the same machine if it is pretty beefy, or better, on two VMs so you can manage capacity separately.
I would consider replication from one instance to another.
Can you boost your VM resources during the period of the Power BI refresh only?
That's one of the key benefits of Azure - you can scale up and down and save money. How long does the refresh take? Who is using your DB at 1am?
I guess for a VM it's difficult to do this, so you'd need to migrate to SQL Azure rather than a VM.

SQL Server 2014 In-Memory OLTP vs Redis

Is SQL Server 2014's In-Memory OLTP (Hekaton) the same or similar concept with Redis?
I use Redis for in-memory storage (storage in RAM) and caching, while having a separate SQL Server database (like StackExchange does). Can Hekaton do the same thing?
They're similar in that both are primarily in-memory, but that's about it.
Redis is an in-memory key-value database. It can persist data to disk if you configure it, but it keeps the entire dataset in memory, so you need enough RAM for that. The key-value architecture allows various data types, so you can store a value as a simple string or as lists, sets, hashes, etc. Basically all the data structures you can use inside a programming language are available in Redis natively.
SQL Server Hekaton (In-Memory OLTP) is a new engine designed to run relational tables in memory. All the data for these tables is kept in RAM but also stored to disk so they are fully durable.
Hekaton can take individual tables in a SQL Server database and run them through a different engine using MVCC (instead of pages and locks) and other optimizations, so operations can be many times faster than with the traditional disk-based engine. A lot of research went into this, and the primary use case is to take a table that is under heavy load and switch it to run in-memory to increase performance and scalability.
Hekaton was not meant to run an entire database in memory (although you can do that if you really want to) but rather as a new engine designed to handle specific cases while keeping the interface the same. Everything to the end-user is identical to the rest of SQL Server: you can use SQL, stored procedures, triggers, indexes, atomic operations with ACID properties and you can work seamlessly with data in both regular and in-memory tables.
Because of the performance potential of Hekaton, you can use it to replace Redis if you need the speed and want to model your data within traditional relational tables. If you need the other key-value and data structure features of Redis, you're better off staying with that.
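To make that concrete, here's a minimal sketch of putting one table into Hekaton; the database, filegroup, and table names are all illustrative, and the 2014 restrictions are why the key column uses a BIN2 collation and the payload is fixed-length (LOB types came later, in 2016):

    -- In-Memory OLTP needs a MEMORY_OPTIMIZED_DATA filegroup in the database first
    ALTER DATABASE MyAppDb ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
    ALTER DATABASE MyAppDb ADD FILE
        (NAME = 'imoltp_data', FILENAME = 'D:\Data\imoltp_data')
        TO FILEGROUP imoltp_fg;
    GO

    -- A durable memory-optimized table: kept in RAM, but logged and persisted to disk
    CREATE TABLE dbo.SessionCache
    (
        SessionId NVARCHAR(64) COLLATE Latin1_General_100_BIN2 NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
        Payload   NVARCHAR(2000) NOT NULL,
        ExpiresAt DATETIME2 NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

From the application's point of view it is still just a table you SELECT from and INSERT into, which is the key contrast with Redis's separate API.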
With SQL Server 2016 SP1 and newer, all editions have access to the same programmability features (including In-Memory OLTP); the differences are pricing, support, and capacity limits.
Firstly, you need the Enterprise edition (very expensive) of SQL Server to use Hekaton (In-Memory OLTP). Note that SQL Server is licensed per core, so adding more workload to SQL Server may require more CPU and therefore considerably higher licence costs.
But unlike Redis, you can have a trigger or stored proc update your “in memory cache” as part of the database transaction. You may also find that Hekaton is fast enough that you don’t need a separate set of caches from your main tables.
So yes, Hekaton can do the same job as Redis, but it is unlikely to be sensible to use it that way unless the licensing does not cost you much.
Hekaton comes into its own when it allows you to process a lot more data without having to invest in the programming cost of re-designing your system to make use of caching with Redis or otherwise.

SQL Server distribution and configuration for best performance

I want to design and implement an enterprise application with Silverlight. I use a SQL Server database for this, and many users run SQL queries against it.
How can I configure the SQL Server database for best performance?
How can I distribute the SQL Server database across several servers for best performance?
And what technologies can I use in SQL Server for best performance?
In addition to replication, you can use mirroring or log shipping for this. Note that I am talking only about scaling out reads, not writes. So reports etc. can be run from the copies of the database, but writes must go to the main copy (unless you are using merge replication, which is frightening to me). There are some caveats of course.
With database mirroring, you can use the secondary as a read-only reporting source by taking a snapshot. There are limits here to how many databases you can mirror and there is of course maintenance to manage the snapshots. It is not quite true distribution of resources here, but it can be helpful to offload some of the load. In the next version of SQL Server (Denali), you will be able to set secondaries as read-only, so you can avoid the maintenance of snapshots.
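As a rough illustration (database, logical file, and path names are made up), the reporting snapshot on the mirror is just a CREATE DATABASE ... AS SNAPSHOT, and you would periodically drop and recreate it to refresh the data:

    -- The logical file name (NAME) must match the mirrored database's data file
    CREATE DATABASE SalesDB_Reporting_Snapshot ON
        (NAME = SalesDB_Data, FILENAME = 'E:\Snapshots\SalesDB_Reporting.ss')
    AS SNAPSHOT OF SalesDB;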
With log shipping, you can essentially keep a stale version of the database around for reporting, and replace it periodically by restoring logs to it. You have a lot more flexibility here compared to replication or mirroring, as you can actually define a delay (like every 6 hours or once a day, you refresh the copy) - which can also serve as a "recover from a shoot-yourself-in-the-foot" scenario. The downside is that to restore a new copy of the database you need to kick all the current users out, as the database needs to be in single user mode in order to recover.
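The restore side of that log-shipped copy might look roughly like this (file names and paths are illustrative); WITH STANDBY is what keeps the copy readable between restores, and applying the log is the step that requires kicking everyone out:

    -- Disconnect current readers, then apply the next log backup
    RESTORE LOG SalesDB_Reporting
        FROM DISK = N'\\backupshare\SalesDB_0600.trn'
        WITH STANDBY = N'E:\LogShip\SalesDB_undo.dat';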
Those are just a couple of ideas for helping scale out reads, but deep down I agree with #gbn - are you solving a problem you don't have yet? It's one thing to design for scalability, but it's very easy to step over that line and completely over-engineer.
Well, SQL Server doesn't really have a load-balancing mechanism in and of itself. What it does support, however, is an active/passive node configuration and also replication.
We are using the replication strategy in one application I support. You can read more about it here:
http://msdn.microsoft.com/en-us/library/ms151198.aspx
In our configuration, we basically have a transactional database and a reporting database. We replicate the data from our transactional DB to the reporting DB. Any reporting is done against this reporting DB, so that we don't slow down work being done on the transactional DB due to some long running report.
Note that the replication isn't truly real-time. In other words, there's some time involved in replicating the data from the transactional to the reporting DB, albeit a very small amount. But replication is certainly one strategy you could consider if you are trying to balance workload.
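If you want to keep an eye on that lag, one handy tool is a tracer token; this is just a sketch with a made-up publication name, run at the publisher on the published database:

    -- Post a token into the replication stream...
    EXEC sys.sp_posttracertoken @publication = N'TransactionalToReporting';

    -- ...then list the posted tokens so you can look up their latency history
    EXEC sys.sp_helptracertokens @publication = N'TransactionalToReporting';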
Other things you might consider are partitioning large tables for better performance.
As gbn pointed out in his comment though, it's better to determine if you actually need these strategies before implementing them, because they add a lot of complexity and maintenance efforts, which may not even be needed. It's important to properly analyze how much data you think you will have, and how much activity will be occurring against that data to determine if strategies such as the ones I just described are even needed.
Also, you can refer to this link for some other helpful information and some links to whitepapers you may find helpful:
http://social.msdn.microsoft.com/Forums/en/sqldisasterrecovery/thread/05cf41b7-c558-44bf-86c6-12f5c2b2ffe2

Transactional Replication For Write Heavy Medium Sized Database

We have a decent-sized, write-heavy database that is about 426 GB (including indexes) and about 300 million rows. We currently collect location data from devices that report to our server every couple of minutes, and we serve about 10,000 devices - so lots of writes every second. The location table that stores the location of each device has about 223 million rows. The data is currently archived by year.
Problems occur when users run large reports on this database; the whole database grinds almost to a halt.
I understand I need a reporting database, but my question is whether anyone has experience using SQL Server Transactional Replication on a database of equivalent size, and how that technology worked out for them?
My rough plan is to point all the reports in our application to the Reporting Database, use Transactional Replication to replicate the data over from the master to the slave (Reporting Database).
Anyone have any thoughts on this strategy and the problems I may encounter?
Many thanks!
Transactional replication should work well in this scenario (the only effect the size of the database will have is the time taken to generate the initial snapshot). However, it may not solve your problem.
I think the issue you'll have if you choose transactional replication is that the slave server is going to be under the same load as the master machine as changes are applied - it will still crawl when users run large reports (assuming it's of a similar spec).
Depending on the acceptable latency of reporting data to the live data, this may or may not be OK for your users.
If some latency is acceptable you may get better performance from log shipping, since changes are applied in batches.
Before acquiring a reporting server, another approach would be to investigate the queries that your users are running and look at modifying either their code or the indexing strategy to better match what they're trying to do.
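As a starting point for that, a rough cut at the most read-heavy statements can be pulled from the plan-cache DMVs; this is a standard query with nothing specific to your schema:

    SELECT TOP (20)
        qs.total_logical_reads,
        qs.execution_count,
        SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
            ((CASE qs.statement_end_offset
                  WHEN -1 THEN DATALENGTH(st.text)
                  ELSE qs.statement_end_offset
              END - qs.statement_start_offset) / 2) + 1) AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    ORDER BY qs.total_logical_reads DESC;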
Transactional Replication could work well for you. The things to consider:
The target database tables must be read-only.
The server containing the target database should be stout enough to handle the SELECT traffic from the reporting applications.
Depending on the INSERT/UPDATE traffic, you may need to have a third server act as the Distribution server.
You also have to consider the size of the Distribution database.
Based on what I read here, I'd use a pull subscription from the Reporting server to offload traffic from the OLTP server.
You can skip the torment of a snapshot by initializing the reporting database from a backup of the OLTP database. See https://msdn.microsoft.com/en-us/library/ms151705.aspx
There will be INSERT/UPDATE/DELETE traffic from the Replication into both the Distribution and the Subscriber databases. That requires consideration, but lock/block issues should be no worse (and probably better) than running those reports off of OLTP.
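A hedged sketch of that initialize-from-backup setup (publication, server, database, and path names are all made up): enable the option on the publication, restore the publisher's backup on the reporting server, then create the subscription pointing at that backup:

    -- At the publisher, on the published database
    EXEC sp_changepublication
        @publication = N'LocationsPub',
        @property    = N'allow_initialize_from_backup',
        @value       = N'true';

    -- After restoring the publisher's backup on the reporting server
    EXEC sp_addsubscription
        @publication       = N'LocationsPub',
        @subscriber        = N'REPORTSRV',
        @destination_db    = N'LocationsReporting',
        @subscription_type = N'pull',
        @sync_type         = N'initialize with backup',
        @backupdevicetype  = N'disk',
        @backupdevicename  = N'\\backupshare\Locations_full.bak';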
I am running multiple publications on a 2.6TB database with 2.5GB/day of growth, using both pure transactional to drive reports (to two reporting servers) and Peer-to-Peer Transactional to replicate data in a scale-out for a SaaS offering (to three more servers). Because of this, we have a separate distributor.
Hope this helps.
Thanks
John.

Would it ever be wise to have a SQL server per web server?

I'm wondering if, under the circumstances that
You get lots more reads than writes
Your SQL server of choice is cheap/free and offers a fast mirroring/replication service
Your database isn't insanely large
rather than having separate SQL servers, it would be better to have an instance of SQL Server on each web machine getting instant updates from the master. This way there would be no network latency when doing all the read queries, but there would be a per-box performance hit as the SQL instance has to execute on the same machine. Would this be better overall for performance? Are there any other pros/cons that might come up?
Your SQL Server should always be on a different box to the webserver, of that there is no question.
How many DB servers and webservers you have, and how they mirror (or otherwise) is up to how you scale your application.
You have SQL Server on a different machine because it needs (and deserves) a lot of RAM.
It's quite a common architectural pattern to have read-only replicas of a database. We accept some degree of staleness in them; perhaps they are even only updated once a day.
The general rule is that multiple copies introduce complexity in terms of operations and management, and tend to introduce the possibility of data inconsistency - almost inevitably the copies will not be perfectly in step (or the cost of making them so will be too high).
An example: what happens if your replication processing breaks a bit, so that some, but not all, copies become stale? Now your users start to see radically different views of the world. How much might that matter to you? If it's a site with low-value data (e.g. celebrity sightings in London suburbs) then perhaps that's fine. If it's on-hand inventory, and being out of date means that your customers can't place orders, then maybe you care rather more.
My advice: things that sound simple at a boxes-on-paper sort of level don't always work out that way when you're sitting in an operations room at 3 AM. Be very sure that you can easily operate your solution.
How would your SQL Server be cheap/free? I should have said the licensing costs for this setup would be crippling. At retail prices you're looking at $6,000 per server. See also Jeff's comments about costs. Scale out the web servers by all means, but not your SQL Server until it's pretty much on its knees.
You might instead want to think about a distributed cache like Velocity or NCache.
Either way, run your site first with one SQL server and see how it copes with the load, then think about mirroring/replication across servers, otherwise you're just optimising prematurely. Measure first!
An immediate con is that there is no distributed lock co-ordinator in SQL Server so you can get merge conflicts as updates can change the same row on two different servers at the same time.
Depending on the size of the database and the disks in the web servers, you may well find that the network latency you save is smaller than the disk latency you start suffering, as the web server disks will not usually be as performant as the disk array you give to the database. If you wanted that kind of performance, you would be buying it per web server.
Replication performance is not without latency either; distributing the transactions isn't "free", and careful maintenance of the transaction log would have to be planned to ensure you did not get log fragmentation (too many VLFs within the transaction log), which kills replication performance.
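A quick way to sanity-check that last point, using a long-standing (if undocumented) command: DBCC LOGINFO returns one row per virtual log file, and a row count in the thousands usually means the log needs attention. The database name below is just illustrative:

    USE PublishedDb;  -- the database being replicated
    DBCC LOGINFO;     -- one row per virtual log file (VLF)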
