Using SQL Service Broker vs. queues with a web service [closed] - sql-server

We have a legacy system with a central database (SQL Server) and small clients (kiosks with a local SQL Express database) built as a WPF application. The data sync between the clients and the central DB is done in C# with ADO.NET SQL statements, and it takes a huge toll on performance. We currently have 400 clients and the number will keep increasing; each client sends 100,000 records per day to the central database.
We are planning to rewrite this sync part using SQL Service Broker.
One of the main issues is that the schema differs between the clients and the central DB. The tables were not normalized, and in the worst cases most of the columns use nvarchar datatypes to store datetime and integer data.
I am concerned about using Service Broker because most of the business logic would have to be written in stored procedures.
I would like to get some ideas on whether this technology would be the best fit, or whether we should consider building a REST-based service on top of a message queue.

TL;DR: I would recommend against using Service Broker in this kind of environment.
Detailed answer
While Service Broker is indeed a very lightweight and reliable communication mechanism, it was designed with a different goal in mind. Namely, it works best in a static topology, where administrators set everything up once and then the entire system runs for years with little or no change.
Judging by what I understood from your explanation, your network of connected hosts is much more dynamic, with hosts coming and going on a daily basis. This will incur high maintenance costs on your support staff, because in order to establish communication between two Service Broker endpoints belonging to different SQL Server instances, you will need (among many other things) to generate at least one pair of certificates and exchange their public keys between the participating instances, after which they have to be deployed in both the master and the subject databases on both sides.
This certificate exchange and deployment must happen before any Service Broker messaging is possible, so you will need another communication channel between the servers for the exchange itself. Normally this is done manually by DBAs, due to the high security risks associated with a potential loss of transport-level keys. In your environment, however, there is a good chance that people will simply not be able to keep up, not to mention the potential for human error, which will be quite high given the large amount of repetitive manual work.
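To give a sense of the per-pair setup burden, here is a minimal sketch of the transport-security side alone, with invented names and paths; dialog security then adds yet another set of certificates in the user databases on both ends:

-- Run in master on each instance (names and paths are illustrative).
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

CREATE CERTIFICATE KioskTransportCert
    WITH SUBJECT = 'Service Broker transport certificate';

CREATE ENDPOINT BrokerEndpoint
    STATE = STARTED
    AS TCP (LISTENER_PORT = 4022)
    FOR SERVICE_BROKER (AUTHENTICATION = CERTIFICATE KioskTransportCert,
                        ENCRYPTION = REQUIRED);

BACKUP CERTIFICATE KioskTransportCert TO FILE = 'C:\certs\kiosk_transport.cer';

-- The .cer file must then be copied to the remote instance, where you
-- CREATE LOGIN / CREATE USER, CREATE CERTIFICATE ... FROM FILE, and
-- GRANT CONNECT ON ENDPOINT::BrokerEndpoint -- and then repeat all of this
-- in the other direction. Multiply by 400 kiosks to see the maintenance cost.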
In short, I would recommend looking for something that is easier to deploy and maintain. Change tracking might be a good start; as for transport, you have a full smorgasbord of choices, from WCF to Web API (to whatever else has appeared in the last few years).
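For what it's worth, a minimal change tracking sketch (the table and column names below are made up) might look like this:

-- Enable change tracking on the kiosk database and on each table to sync.
ALTER DATABASE KioskDb
    SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.SalesTransaction ENABLE CHANGE_TRACKING;

-- On each sync run, pull only the rows changed since the last synced version:
DECLARE @last_sync_version bigint = 0;  -- persisted by the sync job between runs
SELECT ct.SYS_CHANGE_OPERATION, ct.TransactionId, t.*
FROM CHANGETABLE(CHANGES dbo.SalesTransaction, @last_sync_version) AS ct
LEFT JOIN dbo.SalesTransaction AS t
    ON t.TransactionId = ct.TransactionId;

SELECT CHANGE_TRACKING_CURRENT_VERSION();  -- store as the next @last_sync_version

The transport that ships those changed rows to the central server can then stay simple and stateless.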

Related

How to sync two databases in a microservice architecture with CQRS and separate read/write stores [closed]

I was asked this question in an interview:
How do you sync data between two databases? There will be time delays, etc. How do we handle that?
The background: I mentioned microservice architecture and using CQRS for performance (a separate read/query database and a separate write/command database).
Now, if the customer enters or modifies data, how will it be replicated/synced into the read database?
I talked about things like Cosmos DB options that prevent dirty reads, and I also mentioned caching. But I am not certain what the various options for syncing are. The interviewer specifically asked how, at the SQL DB level, I would sync between two DBs.
CQRS is a pattern which dictates that the responsibility for Command and Query operations be separated.
There are multiple ways you can synchronize the data between databases. You can use a master-slave configuration, an oplog replication mechanism, or something specific to the database in question.
But what's more important here is to decide which strategy to use. Since you are using the CQRS pattern, you now have more than one data store (a write store and a read store), and there is a fair chance that these data stores are network-partitioned. In that case you have to decide what matters most to you, consistency or availability, which is generally governed by what the business requires.
So in general, which replication strategy to use depends on whether your business requires consistency or availability.
References:
CAP Theorem: https://en.wikipedia.org/wiki/CAP_theorem
Replication (Driven by CAP Theorem): https://www.brianstorti.com/replication/
There are a couple of options for database syncing in SQL server.
1. SQL Server Always On availability groups (SQL Server 2012 onwards) - With this feature you create a primary replica and one or more secondary replicas; once Always On is configured, the secondaries are automatically updated from the primary. It also provides HA/DR: if the primary replica goes down, a secondary becomes active and takes over the primary role. (A rough setup sketch follows after the list.)
https://learn.microsoft.com/en-us/sql/database-engine/availability-groups/windows/overview-of-always-on-availability-groups-sql-server?view=sql-server-2017
2. SQL Server Replication - merge replication, transactional replication, etc.
https://learn.microsoft.com/en-us/sql/relational-databases/replication/types-of-replication?view=sql-server-2017
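As a very rough sketch only (server, database, and endpoint names are invented, and prerequisites such as the WSFC cluster and database mirroring endpoints are assumed to already be in place), creating an availability group in T-SQL looks something like this:

CREATE AVAILABILITY GROUP ReadScaleAG
FOR DATABASE SalesDb
REPLICA ON
    N'SQLNODE1' WITH (
        ENDPOINT_URL = N'TCP://sqlnode1.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC),
    N'SQLNODE2' WITH (
        ENDPOINT_URL = N'TCP://sqlnode2.contoso.local:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = AUTOMATIC,
        SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));
-- The readable secondary can then serve the query (read) side of CQRS
-- while writes go to the primary.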

Multiple Databases Vs Single Database with logically partitioned data [closed]

I am pondering over a database design issue. Any help would be highly appreciated.
We are designing an application which has 20 tables (which may grow to about 30 maximum during new feature development)
The technology stack
MVC 4, .NET 4.x, Entity Framework 5, SQL Server 2012, ASP.NET Membership framework
Number of users
We intend to cater to about 1000 clients who would have on average 20 users.
The Question
Should we design the database and the application in such a way that the tables are logically partitioned, i.e., all clients use the same tables with a partition GUID to separate the data?
OR
Go for multiple databases, which could prove difficult during new feature launches and bug fixing, BUT could potentially allow for scaling?
Caveat: one of the tables has a binary column which stores files (maximum 5 MB per record).
In addition to this, we need to consider the Membership framework tables, which we will be extending with another custom table, logically mapping users to a partition GUID.
You'll wish you had used separate databases:
If you ever want to grant permissions to the databases themselves to clients or superusers.
If you ever want to restore just one client's database without affecting the data of the others.
If there are regulatory concerns governing your data and data breaches, and you belatedly discover that these regulations can only be met by having separate databases. (Update: a little over 4 years after the writing of this answer, GDPR went into effect)
If you ever want to easily move your customer data to multiple database servers or otherwise scale out, or move larger/more important customers to different hardware. In a different part of the world.
If you ever want to easily archive and decommission old customer data.
If your customers care about their data being siloed, and they find out that you did otherwise.
If your data is subpoenaed and it's hard to extract just one customer's data, or the subpoena is overly broad and you have to produce the entire database instead of just the data for the one client.
When you forget to maintain vigilance and just one query slips through that didn't include AND CustomerID = @CustomerID. Hint: use a scripted permissions tool, or schemas, or wrap all tables with views that include WHERE CustomerID = SomeUserReturningFunction(), or some combination of these (see the sketch after this list).
When you get permissions wrong at the application level and customer data is exposed to the wrong customer.
When you want to have different levels of backup and recovery protection for different clients.
Once you realize that building an infrastructure to create, provision, configure, deploy, and otherwise spin up/down new databases is worth the investment because it forces you to get good at it.
When you didn't allow for the possibility of some class of people needing access to multiple customers' data, and you need a layer of abstraction on top of Customer because WHERE CustomerID = @CustomerID won't cut it now.
When hackers target your sites or systems, and you made it easy for them to get all the data of all your customers in one fell swoop after getting admin credentials in just one database.
When your database backup takes 5 hours to run and then fails.
When you have to get the Enterprise edition of your DBMS so you can make compressed backups so that copying the backup file over the network takes less than 5 hours more.
When you have to restore the entire database every day to a test server which takes 5 hours, and run validation scripts that take 2 hours to complete.
When only a few of your customers need replication and you have to apply it to all of your customers instead of just those few.
When you want to take on a government customer and find out that they require you to use a separate server and database, but your ecosystem was built around a single server and database and it's just too hard or will take too long to change.
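For the view-wrapping hint in the list above, a minimal sketch (all object names here are hypothetical) could look like the following, with the application granted access only to the app schema rather than to the base tables:

-- dbo.CustomerUsers maps database logins to tenants (hypothetical table),
-- and the app schema is assumed to exist already.
CREATE VIEW app.Orders
AS
SELECT o.*
FROM dbo.Orders AS o
WHERE o.CustomerID = (SELECT cu.CustomerID
                      FROM dbo.CustomerUsers AS cu
                      WHERE cu.LoginName = SUSER_SNAME());
GO
GRANT SELECT ON app.Orders TO WebAppRole;   -- no direct access to dbo.Orders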
You'll be glad you used separate databases:
When a pilot rollout to one customer completely explodes and the other 999 customers are completely unaffected. And you can restore from backup to fix the problem.
When one of your database backups fails and you can fix just that one in 25 minutes instead of starting the entire 10-hour process over again.
You'll wish you had used a single database:
When you discover a bug that affects all 1000 clients and deploying the fix to 1000 databases is hard.
When you get permissions wrong at the database level and customer data is exposed to the wrong customer.
When you didn't allow for the possibility of some class of people needing access to a subset of all the databases (perhaps two customers merge).
When you didn't think how hard it would be to merge two different databases of data.
When you've merged two different databases of data and realize one was the wrong one, and you didn't plan for recovering from this scenario.
When you try to grow past 32,767 customers/databases on a single server and find out that this is the maximum in SQL Server 2012.
When you realize that managing 1,000+ databases is a bigger nightmare than you ever imagined.
When you realize that you can't onboard a new customer just by adding some data in a table, and you have to run a bunch of scary and complicated scripts to create, populate, and set permissions on a new database.
When you have to run 1000 database backups every day, make sure they all succeed, copy them over the network, restore them all to a test database, and run validation scripts on each single one, reporting any failures in a way that is guaranteed to be seen and that is easily and quickly actionable. And then 150 of these fail in various places and have to be fixed one at a time.
When you find out you have to set up replication for 1000 databases.
Just because I listed more reasons for one doesn't mean it is better.
Some readers may get value from MSDN: Multi-Tenant Data Architecture. Or perhaps SaaS Tenancy App Design Patterns. Or even Developing Multi-tenant Applications for the Cloud, 3rd Edition
If you are referring to your architecture as "multi-tenant", Microsoft has a good article that is worth reading here. It compares the "isolated" (multiple databases) and "shared" (single database) approaches. Generally, shared wins when the number of tenants (clients) is large, but when each tenant's data is large, an isolated approach is recommended.
Those trade-offs, however, can really only be weighed by experienced developers.
Even if you do go with the isolated (multiple database) architecture, you won't get a direct performance benefit while the databases still run on the same instance. And if you use the shared (single database) architecture, consider using int keys instead of GUIDs, or sequential GUIDs if you still need them.
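To illustrate the GUID point, here is a small, invented table definition for the shared (single database) approach using a sequential GUID as the clustered key:

CREATE TABLE dbo.Invoice
(
    InvoiceId  uniqueidentifier NOT NULL
        CONSTRAINT DF_Invoice_Id DEFAULT NEWSEQUENTIALID(),
    TenantId   int              NOT NULL,   -- the partition discriminator
    CreatedAt  datetime2        NOT NULL
        CONSTRAINT DF_Invoice_CreatedAt DEFAULT SYSUTCDATETIME(),
    CONSTRAINT PK_Invoice PRIMARY KEY CLUSTERED (InvoiceId)
);
-- NEWSEQUENTIALID() avoids the page splits and index fragmentation that
-- random NEWID() values cause on a clustered key; plain int/bigint identity
-- keys are smaller still.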

Database geo-replication with low latency [closed]

I have a website which I am currently hosting on a single server in Europe. To improve the latency for non-European users I would like to add local servers in the US and Asia.
Keeping static files in sync is no problem. We add new content only once a day, so a simple rsync cron job will do fine to keep files updated.
What I am totally stuck on is how to handle this on the database side? I would prefer a single master database that holds all user information, so that if a local server ever goes offline we always have the user data in the main server (regardless of backups).
So far we consider 2 options:
A database with Geo Replication support
A database that supports geo-replication out of the box. It should be very easy to set up and should have very low latency for DB writes (i.e., without having to wait for a 'write success' message from the master server).
Programmatic approach with a master and a local database
A user only visits from one region at a time, so we could cook something up that connects to both the master and the local database. At first login, all user information would be pulled from the master database and cached in the local database. All data generated by the user from then on could be stored in the local database and synced back to the master database in the background. This could work, but it seems overly complex and hard to fix if something goes out of sync.
A little more background information on the database
our database does a lot of reads and few writes
database performance is not an issue at all. So we are only looking to improve the user experience (lower latency)
a user does not generate much data (10kb in general, 200kb at maximum)
we are not a bank or stock exchange, if some user data is synched back to the master server a minute or even a few minutes later it's not a big problem.
Our questions
is there a name that describes this specific problem? (so I can Google better)
is there a database that does geo replication out of the box without latency penalty? (Couchbase perhaps?)
would the programmatic approach be doable, or will it be a world of pain?
I would be very thankful for any insights, or perhaps a link to an article that covers something like this. I'm sure there are more small scale websites out there which have run into this problem.

Distributed Database Communication

I am a junior software engineer. I like this site and the people on it, so I want to ask about my problem and see your answers and suggestions. Please don't be mad at me for asking without searching.
My problem is about distributed database communication (MSSQL Server 2008). As in the picture, I need a main server at the center of a star topology and other small servers that hold less data than the center. The small servers' job is a normal web service with small database changes. The main server, on the other hand, should communicate with the others periodically (once an hour, or twice a day) and gather the distributed data changes from the other small databases.
According to this plan, the main server is trusted, secured, and backed up. Here is my question:
I plan to communicate at the web services level. The main server should have methods for controlling and checking the databases. Are there any tools for this? I am looking forward to your suggestions and ideas.
Kind regards and thanks in advance.
Take a look at SQL Server Replication. From the description you have provided, it sounds as though a solution using remote Subscribers that utilise Push updates could provide the functionality you require.
In the first instance, and in the interest of broadening your knowledge, I suggest you familiarise yourself with the varying flavours of solution that are available to you through SQL Server Replication technology and study their corresponding architectures.
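To make that concrete, a push subscription (one of those flavours) is created at the Publisher with the replication stored procedures, roughly like this; the publication, server, and database names are invented placeholders:

-- Run at the Publisher, in the published database.
EXEC sp_addsubscription
    @publication       = N'MainServerPub',
    @subscriber        = N'SMALLSERVER01',
    @destination_db    = N'BranchDb',
    @subscription_type = N'Push',
    @sync_type         = N'automatic';

EXEC sp_addpushsubscription_agent
    @publication    = N'MainServerPub',
    @subscriber     = N'SMALLSERVER01',
    @subscriber_db  = N'BranchDb',
    @frequency_type = 4;   -- scheduled daily; adjust to match "once an hour / twice a day"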

Caching to a local SQL instance on a web server

I run a very high-traffic (10M impressions a day), high-revenue web site built with .NET. The core metadata is stored on a SQL Server. My team and I have a unique caching strategy that involves querying the database for new metadata at regular intervals from a middle-tier server, serializing the data to files, and sending those to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory to use for real-time requests.
The advantage of this model is that it:
Allows the web nodes to cache all data in memory and not incur any IO overhead querying a database.
If the database ever goes down either unexpectedly or for maintenance windows, the web servers will continue to run and generate revenue. You can even fire up a web server without having to retrieve its initial data from the DB because all the data it needs are in files on its own disks.
Allows us to be completely horizontally scalable. If throughput suffers, we can just add a web server.
The disadvantage is that this caching and persistence layer adds complexity to the code that queries the database, packages the data, and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years, and there are probably better ways to tackle it.
One strategy I have been considering is using replication to replicate our master SQL Server database to local database instances installed on each web server. The web server application would use normal SQL/ORM techniques to instantiate objects. Here, we could still sustain a master database outage, and we would not have to write specialized caching code; we could instead use NHibernate to handle the persistence.
This seems like a more elegant solution, and I would like to see what others think, or whether anyone else has alternatives to suggest.
I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.
First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.
Second, implement database mirroring to protect from a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in realtime or a few minutes behind. If you do it in realtime, you can fail over to the mirror automatically inside your app - SQL Server 2005 & above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whatever server's live.
Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.
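For reference, the database-side part of setting up mirroring is only a few statements (server and database names below are placeholders; the database must first be restored on the mirror WITH NORECOVERY):

-- On the mirror instance:
ALTER DATABASE SalesDb SET PARTNER = 'TCP://principal.contoso.local:5022';

-- On the principal instance:
ALTER DATABASE SalesDb SET PARTNER = 'TCP://mirror.contoso.local:5022';

-- Optional witness, needed for automatic failover:
ALTER DATABASE SalesDb SET WITNESS = 'TCP://witness.contoso.local:5022';

-- The application's connection string then adds "Failover Partner=mirror"
-- so ADO.NET redirects automatically when the principal is unavailable.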
My favorite starting point for scaling is using three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.
Have you considered memcached? Since it is:
in memory
can run locally
fully scalable horizontally
prevents the need to re-cache on each web server
It may fit the bill. Check out Google for lots of details and usage stories.
Just some additions to what RickNZ proposed above.
Since the master data you are currently caching doesn't change very frequently, and probably only during some maintenance window, here is what you should do first on the database side:
Create a snapshot replication publication for the master tables you want to cache (a rough sketch of the setup follows below). Adding new entities will be equally easy.
On all the web servers, install SQL Express and subscribe to this publication.
Since this is not frequently changing data, you can rest assured there won't be much of a server resource usage issue beyond the network trips for the master data.
All the caching that was available via the previous mechanism is still available, minus all the headache that comes when you add new entities.
Next, you can leverage the .NET mechanisms suggested above. You won't face a memcached-style cluster failure unless the web server itself goes down. There is a lot available in .NET that a .NET pro can point out after this stage.
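Here is that rough sketch of the snapshot publication setup, using the replication stored procedures with invented names:

-- Run at the publisher, in the database holding the master data.
EXEC sp_replicationdboption
    @dbname = N'CoreMetaDb', @optname = N'publish', @value = N'true';

EXEC sp_addpublication
    @publication = N'MasterDataPub',
    @repl_freq   = N'snapshot',     -- snapshot, not continuous
    @status      = N'active';

EXEC sp_addarticle
    @publication   = N'MasterDataPub',
    @article       = N'Product',
    @source_object = N'Product',
    @source_owner  = N'dbo';
-- Each web server's SQL Express instance then subscribes to MasterDataPub,
-- and the snapshot agent is scheduled to run during the maintenance window.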
It seems to me that Windows Server AppFabric (a.k.a. "Velocity") is exactly what you are looking for. From the introductory documentation:
Windows Server AppFabric provides a distributed in-memory application cache platform for developing scalable, available, and high-performance applications. AppFabric fuses memory across multiple computers to give a single unified cache view to applications. Applications can store any serializable CLR object without worrying about where the object gets stored. Scalability can be achieved by simply adding more computers on demand. The cache also allows for copies of data to be stored across the cluster, thus protecting data against failures. It runs as a service accessed over the network. In addition, Windows Server AppFabric provides seamless integration with ASP.NET that enables ASP.NET session objects to be stored in the distributed cache without having to write to databases. This increases both the performance and scalability of ASP.NET applications.
Have you considered using SqlDependency caching?
You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
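If it helps, the database-side prerequisites for SqlDependency / query notifications are small (the database and user names below are placeholders):

-- Query notifications ride on Service Broker, so it must be enabled:
ALTER DATABASE MetaDb SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;

-- The account the web tier connects with needs this permission:
GRANT SUBSCRIBE QUERY NOTIFICATIONS TO WebAppUser;
-- Note: the watched queries must use two-part table names and explicit
-- column lists (no SELECT *) for the notification subscription to be valid.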
In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.
If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.
In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).
