NServiceBus Database Backup - database

I am looking to set up a system that consist of various autonomous services communicating via NServiceBus. This system will be deployed in various configurations (some services may be excluded, services will be setup differently) at client locations. These locations will often be large warehouses and will not have a massive IT infrastructure. The services may all sit on one machine, or on many different ones; this may be different for every site. Most sites will have one active DB server (usually SQL server) and a back-up server. Sites will make scheduled back-ups of the DB at various intervals - let's say daily.
Each service has it's own data store (this could be a truly separate database or segregated tables in a shared schema). Each service of course also has it's own message queues. Although the services are autonomous they do locally store (referentially in their own DB) information from other services as a type of local read-only cache, this data is derived from received messages.
Here is the question: How do I make a meaningful (i.e. consistent and restorable) back-up of this system?
I have read the following related answer by Udi Dahan on this subject: http://tech.groups.yahoo.com/group/nservicebus/message/12815
My problem with this answer is this: There is no data center, no SAN; no snapshots. There are "normal" sysadmins that are used to backing up DBs on-site and/or off-site.

If your client sites are not going to invest in some kind of infrastructure to support some sort of fail safe for MSMQ, then you may be able to leverage the Audit/Gateway feature of NSB. With this turned on you could have the client messages audited and stored over to some infrastructure that you manage. You would have to work out the details on how to restore the messages, but at least you would have them offsite somewhere.

Related

limitations of server less integration tools (such as "Knime Analytics Platform") for working with databases

I'm searching for a proper data integration tool that can integrates multiple data source types, such as SQL Server and Oracle databases, CSVs, Excels, ... and can manages and organizes data access to end users.
Databases are hosted on different instances on different servers and servers are located in a local network and files in a shared space. I need to integrate data (join, union, ...) on data tables on different databases (on the same or different instances) without transferring to user's machine (Due to the high volume of data in database tables) unless to view, saving locally or other local processes.
I already have experience with "Knime Analytics Platform" that is free as I want. Now, when I referred to it again, I realized that this software does not have the features I want (i.e executing queries on different databases without loading to my machine). For example, on joining two tables from two different databases, shows this warning "different database connections not supported". Some solutions to this problem include:
Create a view/stored procedure/function in one of DB, reading both sources.
Set a linked server for second DB.
Read necessary data from both DB separately to KNIME and make join in KNIME.
The first and second solutions require the end user to engage with the details of infrastructure, which is not possible. The third solution has performance issues. I guess that "Knime Server" can do such operations with proper performance and without engaging end users with such technical details. So the questions are:
Does "Knime Server" resolves the mentioned limitations of "Knime Analytics Platform"?
Does all desktop integration tools (without a server module) have mentioned limitations?
Is there any free integration tool that does not have mentioned limitations? If no, can I build such system with integrating required free modules?
Thanks a lot

Database Architecture, Central and/vs Localized Server

The system in question is for a company with multiple locations. Unreliable internet speeds/availability at some locations have led to the path of a local server at each location off of which a location and a central server.
The role of the local server is for each location to be able to run no matter if it is connected to the outside world or not, or to eliminate high latency if the the connection speed is less than optimal.
The role of the central server is two-fold:
Configuration, policy, user, etc, management. For example, new products, price changes, promotions, user changes, etc, are done on the central server and then distributed to the local servers so they have the most up to date info.
Centralize all data created at each location to run reports, analytics and warehouse data.
The question of how much data to keep on the local server is debatable. For example some processes are dependent upon not just that one location, like customer loyalty, so a query must be run to the central server to check user activity and determine incentives. On the other hand, active customer base should be within the scope of the local servers data.
I lack experience in these types of distributed systems. My question is what database should we use that will facilitate this type of setup, hopefully incorporating the functionality to work automatically without much coding needed to achieve the data syncs to/from central server.
Master-Slave Replication:
In this type of replication one server (the master) accepts writes and will replicate the changes to read replicas(slaves)
Characteristics
Asynchronous
Read Scalability
Master is a point of failure for all the nodes (SPOF)
Master-Master:
In this setup all the database servers accepts read and writes and synchronize together.
Characteristics
Synchronous(hopefully)
Read and Write Scalability
Performance is worse than Master-Slave
No SPOF
Master-Master is harder to setup and maintain. Possibility of id collisions.
Any Popular Database Server these days supports the features above.

Database synchronization

Recently my clients have asked me if they can use they’re application remotely, disconnected from the local network and the company server.
One solution is to place the database in the cloud, but a connection to the database, and the cloud and an internet connection must be always available.
There not always the case.
So my question is - Is there any database sync system, or a synchronization library so that I can work disconnected with local database and when I connect synchronize the changes I have made and receive changes others have made?
Update:
The application is under Windows (7/xp) ( for now )
It's in Delphi 2007 win32
All client need to have Read/Write access
All Clients have internet connection, but not always ON
Security is not critical, but the Sync service should encrypt the communication
When in the presence of the companies network the system should sync and use the Server Database and not the local one.
You have a host of issues with thinking about such a solution. First, there are lots of possible solutions, such as:
Using database replication within a database, to mimic every update (like a "hot" backup)
Building an application to copy the database periodically (every night)
Using a third-party tool (which is what you are asking, I think)
With replication services, the connection does not have to always be up. Changes to the database are logged when the connection is not available and then applied when they can be sent.
However, there are lots of other issues when you leave a corporate network. What about security of the data and access rights? Do you have other options, such as making it easier to access the database from within the network? Do the users need only read-access to the database or read-write access? Would both versions need to be accessed at the same time. Would there be updates to both at the same time?
You may have other options that are more secure than just moving a database to the cloud.
I believe RemObjects DataAbstract allows offline mode and synchronization by using what they call Briefcases. All your other requirements (security, encrypted connections, etc.) are also covered.
This is not a drop-in replacement, thought, and may need extensive rewrite/refactoring of your application. There are lots of upsides, thought; business rules can/should be enforced on the server (real security), scriptable business rules, multiplatform architecture, etc.
There are some products available in the Java world (SymmetricDS lgpl license) - apart from actually being a working system it is documents how it achieved synchronization. Connects to any db with jdbc support. . There is a pro version but the user guide (downloadable pdf) gives you the db schema plus rules on push pull syncing. Useful if you want to build your own.
Btw there is a data replication so tag that would help.
One possibility that is free is the Microsoft Sync Framework: http://msdn.microsoft.com/en-us/sync/bb736753.aspx
It may be possible for you to use it, but you would need to provid some more detail about your application and operating environment to be sure.
IS it possible to share a database like .mdb and work fine? i try but sometimes the file where the databse is changes from DB to DB1 i use delphi Xe4 and Google Drive .
Thank´s

SQL Server Replication, Distributor

I need to implement a SQL Server replication solution. Very simple need for now. I just need to replicate one pretty simple table from 200 remote sites or so to one central server. The data is not really transactional in nature. I just need it moved up to the central server once a day. I can't decide if I should use push or pull, and I'm not sure if the distributor should live on the server side, or on all the clients.
The server and all the remote sites all live on a fairly decent VPN. The server is 2005, and it's not being pushed very hard at the moment. Just a few jobs here and there collecting data (which I want to get away from) and pushing reports/exports to various vendors once a day. The sites are a mix of 2000/2005.
I'd recommend you do some scalability tests first. Replication is very verbose in terms of agent jobs and T-SQL connections for reading and writing data. 200 publications you're talking 200 publisher agents, 200 subscription agents, plus the distributor maintenance. Most sites complain about maintenance problems of having 1 publisher and 1 subscriber... Say you manage to pull this off and operate it successfully, what is going to be your upgrade story? And how are you going to implement a schema change?
The largest replication deployment I heard of (some years ago) had I believe 450 publishers and was implemented by an army of Microsoft field consultants sweating for months to bend the behemoth into shape. Your 200 replication sites project is way more ambitious than you realize.
I suggest you explore some alternatives too. If you need a periodic table snapshot then SSIS can be a good match. If you need a continuous stream of changes then Service Broker can scale way way easier than replication.
If there is need to adjust the replication down the road, having the central server initiate a pull will be much easier to administrate than adjusting 200 sites to accomplish the same thing. Also, that would naturally manage the load, rather than some scheme to prevent, say, 100 remote sites all connecting at once.
Push subscriptions are the way to go here if you wish to centrally manage the data distribution of your application platform.
From what you have described you will need to make a choice between Snapshot Replication and Transactional Replication for your architecture.
Dependent on how much data you are looking to push and also the schedule of your updates will determine the most appropriate Replication Method for you to use. For example, if you looking to update all Subscriptions at the same time then dependent on how much data you need to push Snapshot Replication may not be suitable and you may be better off using Transactional Replication, perhaps pushed at specific determined intervals. Your network may even be able to support near real-time replication however conducting a small test of your environment will determine this for you. For example, setup the Publisher, local Distributor and a handful of Subscribers at geographically different locations on your network in order to test network transfer times and Replication Latency.
Things to consider:
How much data is to be moved across
the network? Size in Kb and record
volume.
Consider the physical location of
your sites
What is the suitability of your
network? Seed, capacity etc.
You may wish to consider using a
dedicated Distributor.

Caching to a local SQL instance on a web server

I run a very high traffic(10m impressions a day)/high revenue generating web site built with .net. The core meta data is stored on a SQL server. My team and I have a unique caching strategy that involves querying the database for new meta data at regular intervals from a middle tier server, serializing the data to files and sending those to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory to use for real time requests.
The advantage of this model is that it:
Allows the web nodes to cache all data in memory and not incur any IO overhead querying a database.
If the database ever goes down either unexpectedly or for maintenance windows, the web servers will continue to run and generate revenue. You can even fire up a web server without having to retrieve its initial data from the DB because all the data it needs are in files on its own disks.
Allows us to be completely horizontally scalable. If throughput suffers, we can just add a web server.
The disadvantages are that this caching and persistense layers adds complexity in the code that queries the database, packages the data and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years and there are probably better ways to tackle this.
One strategy I have been considering is using replication to replicate our master sql server database to local database instances installed on each web server. The web server application would use normal sql/ORM techniques to instantiate objects. Here, we can still sustain a master database outage and we would not have to code up specialized caching code and could instead use nHibernate to handle the persistence.
This seems like a more elegant solution and would like to see what others think or if anyone else has any alternatives to suggest.
I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.
First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.
Second, implement database mirroring to protect from a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in realtime or a few minutes behind. If you do it in realtime, you can fail over to the mirror automatically inside your app - SQL Server 2005 & above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whatever server's live.
Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.
My favorite starting point for scaling is using three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.
Have you considered memcached? Since it is:
in memory
can run locally
fully scalable horizontally
prevents the need to re-cache on each web server
It may fit the bill. Check out Google for lots of details and usage stories.
Just some addition to what RickNZ proposed above..
Since your master data which you are caching currently won't change so frequently and probably over some maintenance window, here is what should you do first on database side:
Create a SNAPSHOT replication for the master tables which you want to cache. Adding new entities will be equally easy.
On all the webservers, install SQL Express and subscribe to this Publication.
Since, this is not a frequently changing data, you can rest assure, no much server resource usage issue minus network trips for master data.
All your caching which was available via previous mechanism is still availbale minus all headache which comes when you add new entities.
Next, you can leverage .NET mechanisms as suggested above. You won't face memcached cluster failure unless your webserver itself goes down. There is a lot availble in .NET which a .NET pro can point out after this stage.
It seems to me that Windows Server AppFabric is exactly what you are looking for. (AKA "Velocity"). From the introductory documentation:
Windows Server AppFabric provides a
distributed in-memory application
cache platform for developing
scalable, available, and
high-performance applications.
AppFabric fuses memory across multiple
computers to give a single unified
cache view to applications.
Applications can store any
serializable CLR object without
worrying about where the object gets
stored. Scalability can be achieved by
simply adding more computers on
demand. The cache also allows for
copies of data to be stored across the
cluster, thus protecting data against
failures. It runs as a service
accessed over the network. In
addition, Windows Server AppFabric
provides seamless integration with
ASP.NET that enables ASP.NET session
objects to be stored in the
distributed cache without having to
write to databases. This increases
both the performance and scalability
of ASP.NET applications.
Have you considered using SqlDependency caching?
You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.
If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.
In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).

Resources