The ways of Database Integration in SOA/ESB? - database

Situation: Some Bank has an old legacy ABS (Automatic bank system).
Bank wants to:
notify old legacy CRM system about client's account changes (Publish operation).
check PIN codes of client cards (Request/Response operation) - in synchronious mode.
ABS is implemented in very old private technologies with StoredProcedures calls. So, I can connect to this system via database only.
Which ways of Java/.Net (ESB) application integration with old/legacy database system do you know?
Write/Publish operation
Any vendor's databse server:
Scan tables for new entries - too low speed.
Trigger (if they're supported) which handles SQL updates and inserts and writes event information to some table. And application listener should be checking this table for events.
Oracle serevr : PL/SQL TRIGGERS + Oracle AQ. And listener for JMS.
Reading operation
Just write result into tables of ABS - dangerous.
...
How to notify legacy database system about responses in synchronious mode??? How to implement Write/Read in synchronious mode???
Again, which ways of Java/.Net (ESB) application integration with old/legacy database system do you know?

Lot of vendors hype about DataServices. I think the most value of these products is when integrating different datasources.
I would consider making a simple "application" that exposes this data as a service

It depends on many factors; particularly read/write throughput and performance sensitivity of the database.
Databases tend to be kinda sensitive things and are often very fragile to general purpose access from arbitrary other systems when they are finely tuned for production use in a specific system; so often folks replicate the database to another read-only slave database that can be then used for doing integration work & querying and so forth.
You can then use triggers/polling/JMS based on whatever you need without impacting the original database.
Depending on the database replication technology used; you can then often install triggers in the replica database (which can afford to get a little behind from the master from time to time) - to minimise impact in the production database

I can propose you to use Mule as ESB in your bank (see also http://www.mulesource.org/display/MULE/Home).
It allows you to communicate to database directly (jdbc level which has to be OK with stored procedures as well as tables/views level). I have positive experience with it for integration core banking system (database level, Oracle) with standalone application (web services level).
Frankly, I din't got all your questions (your can ask me in Russian directly if you are prefere),but IMO Mule is your way - it can consume JMS, JDBC, file level and many others and process syncronouse and asyncornouse events as well (see also http://www.mulesource.org/display/MULE2USER/Available+Transports).
Reagrds.
P.S. To be more clear for English speaking audience, I can propose you use more standard term core banking system instead of ABS (which means the same in xUSSR countries).

Related

Database Architecture, Central and/vs Localized Server

The system in question is for a company with multiple locations. Unreliable internet speeds/availability at some locations have led to the path of a local server at each location off of which a location and a central server.
The role of the local server is for each location to be able to run no matter if it is connected to the outside world or not, or to eliminate high latency if the the connection speed is less than optimal.
The role of the central server is two-fold:
Configuration, policy, user, etc, management. For example, new products, price changes, promotions, user changes, etc, are done on the central server and then distributed to the local servers so they have the most up to date info.
Centralize all data created at each location to run reports, analytics and warehouse data.
The question of how much data to keep on the local server is debatable. For example some processes are dependent upon not just that one location, like customer loyalty, so a query must be run to the central server to check user activity and determine incentives. On the other hand, active customer base should be within the scope of the local servers data.
I lack experience in these types of distributed systems. My question is what database should we use that will facilitate this type of setup, hopefully incorporating the functionality to work automatically without much coding needed to achieve the data syncs to/from central server.
Master-Slave Replication:
In this type of replication one server (the master) accepts writes and will replicate the changes to read replicas(slaves)
Characteristics
Asynchronous
Read Scalability
Master is a point of failure for all the nodes (SPOF)
Master-Master:
In this setup all the database servers accepts read and writes and synchronize together.
Characteristics
Synchronous(hopefully)
Read and Write Scalability
Performance is worse than Master-Slave
No SPOF
Master-Master is harder to setup and maintain. Possibility of id collisions.
Any Popular Database Server these days supports the features above.

What alternatives do I have if I want a distributed multi-master database?

I will build a system where I want to reduce single-point-of-failures, and I need a database. Is there any (free) relational database systems that can handle multi-master setups good (i.e where it is easy to add and remove nodes) or is it better to go with a NoSQL-database?
As what I have understood, a key-value store will handle this better. What database system do you recommend for a multi-master (cluster) setup?
Mysql's NDB Cluster WILL do this. But it's far from easy to set up and has a lot of gotchas.
And also, its performance is generally fairly sucky and it keeps data in memory (yes, I know they sound contradictory).
Essentially, updates need to acquire distributed locks throughout the cluster (or at least in the storage node group where those table(s) are held)
It is not easy to manage, but you can do some level of hot-add.
Unless you require very rapid failover and consistency, I'd recommend against it.
I'd recommend ignoring multi-master, and using a HA MySQL instead (with e.g. InnoDB) which is easy to set up and works very well with typical sub 30-second failover times. This is a master-slave system where the slave cannot even do reads (but you can add read slaves with replication provided you don't need them to be completely up to date)
Key-value stores are not necessarily fault tolerant. They are primarily performance tools. Only when data is stored on more than one server is there any form of fault tolerance. If it is just safety, reducing single point of failure the simplest solution is probably set up a mirroring solution, where you have a mirror that just tracks the master database. When the master somehow fails, you quickly switch over (hopefully automatically).
The complexity of this is much lower as there is no consistency management needed during normal operation. The mirror is read-only and just tracks the master database. When the master fails, the mirror is switched to master and the link broken. After the master gets back up the state between them is inconsistent and you must make sure to update the original master from the mirror now acting as master. Most database systems can handle this scenario, and if you have no insane uptime requirements or a very heavy load it is the most pragmatic solution.
I think Oracle has nailed this concept. However, if you're a mortal without a swiss bank account, then maybe you should look into MySQL's NDB Cluster.

Caching to a local SQL instance on a web server

I run a very high traffic(10m impressions a day)/high revenue generating web site built with .net. The core meta data is stored on a SQL server. My team and I have a unique caching strategy that involves querying the database for new meta data at regular intervals from a middle tier server, serializing the data to files and sending those to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory to use for real time requests.
The advantage of this model is that it:
Allows the web nodes to cache all data in memory and not incur any IO overhead querying a database.
If the database ever goes down either unexpectedly or for maintenance windows, the web servers will continue to run and generate revenue. You can even fire up a web server without having to retrieve its initial data from the DB because all the data it needs are in files on its own disks.
Allows us to be completely horizontally scalable. If throughput suffers, we can just add a web server.
The disadvantages are that this caching and persistense layers adds complexity in the code that queries the database, packages the data and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years and there are probably better ways to tackle this.
One strategy I have been considering is using replication to replicate our master sql server database to local database instances installed on each web server. The web server application would use normal sql/ORM techniques to instantiate objects. Here, we can still sustain a master database outage and we would not have to code up specialized caching code and could instead use nHibernate to handle the persistence.
This seems like a more elegant solution and would like to see what others think or if anyone else has any alternatives to suggest.
I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.
First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.
Second, implement database mirroring to protect from a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in realtime or a few minutes behind. If you do it in realtime, you can fail over to the mirror automatically inside your app - SQL Server 2005 & above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whatever server's live.
Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.
My favorite starting point for scaling is using three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.
Have you considered memcached? Since it is:
in memory
can run locally
fully scalable horizontally
prevents the need to re-cache on each web server
It may fit the bill. Check out Google for lots of details and usage stories.
Just some addition to what RickNZ proposed above..
Since your master data which you are caching currently won't change so frequently and probably over some maintenance window, here is what should you do first on database side:
Create a SNAPSHOT replication for the master tables which you want to cache. Adding new entities will be equally easy.
On all the webservers, install SQL Express and subscribe to this Publication.
Since, this is not a frequently changing data, you can rest assure, no much server resource usage issue minus network trips for master data.
All your caching which was available via previous mechanism is still availbale minus all headache which comes when you add new entities.
Next, you can leverage .NET mechanisms as suggested above. You won't face memcached cluster failure unless your webserver itself goes down. There is a lot availble in .NET which a .NET pro can point out after this stage.
It seems to me that Windows Server AppFabric is exactly what you are looking for. (AKA "Velocity"). From the introductory documentation:
Windows Server AppFabric provides a
distributed in-memory application
cache platform for developing
scalable, available, and
high-performance applications.
AppFabric fuses memory across multiple
computers to give a single unified
cache view to applications.
Applications can store any
serializable CLR object without
worrying about where the object gets
stored. Scalability can be achieved by
simply adding more computers on
demand. The cache also allows for
copies of data to be stored across the
cluster, thus protecting data against
failures. It runs as a service
accessed over the network. In
addition, Windows Server AppFabric
provides seamless integration with
ASP.NET that enables ASP.NET session
objects to be stored in the
distributed cache without having to
write to databases. This increases
both the performance and scalability
of ASP.NET applications.
Have you considered using SqlDependency caching?
You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.
If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.
In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).

Swapping out databases?

It seems like the goal of a lot of ORM tools and custom data access layers (DAO pattern, etc.) is to abstract the database to the point where you could supposedly swap out the entire database system with minimal work.
Following the common DAL patterns is usually a good idea in code, but it seems like it would never be minimal work to swap out a database. (Cost, training, data migration, etc.)
Does anyone have any experience with swapping out one database for another in a large system, and dealing with the implications in code? Is it worth it to worry about abstracting the actual database from your code?
Question 1: Does anyone have any experience with
swapping out one database for another
in a large system, and dealing with
the implications in code?
Yes we tried it. Our customer is using a large MS Access based Delphi client server application. After about five years we considered switching to SQL Server. We analyzed the problem and concluded that swapping the database would be very costly and provide only a few advantages. Customer decided not to swap the database. The application is still running fine and the customer is still happy.
Note that:
MS Access is only being used for data storage and report generation.
The server application ensures that MS Access is only being accessed on the server. Normal multi-user MS Access applications will transfer large chunks of the Access database over the network - resulting in slow and unreliable database functionality. This is not the case for this application. Client <> Server <> MS Access. Only the server application communicates with the MS Access database. Actually the Server has exclusive access to the MS Access database. No other computer can open to the MS Access database. Conclusion: MS Access is being used as a true RDBMS, Relational DataBase Management System - please no flaming about MS Access being inferior and unstable - it has been running fine for more than 10 years.
The most important issues you will have to consider:
SQL statements: (SELECT, UPDATE, DELETE, INSERT, CREATE TABLE) and make sure they would be compatible with the SQL database. It's amazing how much all the RDBMS differ in the details (date formats, number formats, search formats, string formats, join syntax, create table syntax, stored procedures, user defined functions, (auto) primary keys, etc.)
Report generation: Depending on your database you might be using a different reporting tool. Our customer has over 200 complex reports. Converting all these reports is very time consuming.
Performance: all RDBMS have different performances in different environments. Normally performance optimalisations are very much RDBMS dependent.
Costs: the costs of tools, developers, server and user licenses varies greatly. It ranges from free to very expensive. Free does not mean cheap and expensive does not always equate to good. A cost/value comparison will have to be made.
Experience: making the best use of your RDBMS requires experience. If you have to develop for an "unknown" RDBMS your productivity will suffer.
Question 2: Is it worth it to worry about
abstracting the actual database from
your code?
Yes. In an ideal world, swapping a database would just be adjusting the data connection string. In the real world this is not possible because all databases are different. They all have tables and SQL support but the differences are in the details. If you can keep the differences of the databases shielded through abstraction - please do so. Make a list of the databases you need to support. Check the selected database systems for the differences. Provide centralized code to handle the differences. Support one RDBMS and provide stubs for future support of other RDBMS.
I disagree that the purpose is to be able to swap out databases, and I think you are correct in showing some suspicion about ORMs leading towards that goal.
However, I would still use an ORM, as it abstracts away the details of data access. Isn't this the goal of object oriented programming? Keep your concerns separated.
I think the primary use case for database abstraction (via ORM tools) is to be able to ship a product that works with multiple database brands. I believe it's a rarer occurrence for a company to switch between database vendors, but that's still one of the use cases.
I've worked jobs where we started out using MySQL for monetary reasons (think a startup) and, one we started making money, wanted to switch to Oracle. We didn't end up making the switch, but it was nice to have the option.
Still, ORM tools are not a completely leak-less abstractions and I know our migration still would have been painful and costly. It totally depends on what you are building, but it has been my experience that -- for performance reasons, usually -- you end up either working around your ORM solution or exploiting vendor-specific features at some point.
The only time I've seen a database switch was from HSQL during early development to Oracle as the project progressed. The ORM made this easy.
I often use the DAO pattern to swap out data services (from a database to web service or to swap a web service to a test stub).
For ORM I don't think the goal is to enable you to switch databases - it is to hide you from the complexities of different database implementations and removing the need to worry about the fine details of translating from relational to object represenations of your data.
By having someone smart write an ORM that handles caching, only updates fields that have changed, groups updates, etc I don't need to. Although in the cases where I need something special I can still revert to SQL if I want.

Implementing functionality/code directly in database system

RDBMS packages today offer a tremendous amount of functionality beyond standard data storage and retrieval. SQL Server for example can send emails, expose web service methods, and execute CLR code amongst other capabilities. However, I have always tried to limit the amount of processing my database server does to just data storage and retrieval as much as possible, for the following reasons:
A database server is harder to scale than web servers
In a lot of projects I've worked on, the DB server is a lot busier than the web servers, and thus has less spare capacity
It potentially exposes your database server to a security attack (web services for example)
My question is, how do you decide how much functionality or code should be implemented directly on your database server versus other servers in your architecture? What recommendations do you have for people starting new projects?
I know Microsoft SQL Server and Oracle really push using stored procedures for everything, which helps to encapsulate the relational architecture and creates a more procedural interface for the software developers, who typically aren't as facile writing SQL queries.
But then half your application logic is written in PL/SQL (or T-SQL or whatever) and the other half is written in your application language, Java or PHP or C#, etc. The DBA is typically responsible for coding the procedures, and the developers are responsible for everything else. No one has visibility and access to the full application logic. This tends to slow down development, testing, and future revisions to the project.
Also software development tools tend to be poor for stored procedures. Tools and best practices for debugging, source control, and testing all seem to be about 10-15 years behind the state of the art for application languages.
So I tend to stay away from stored procedures and triggers if at all possible. Except in certain cases when a well-placed stored procedure can make a complex SQL operation happen entirely in the server instead of shuffling data back and forth. This can be very effective at eliminating performance bottlenecks.
It's possible to go too far in the other direction as well. People who prefer the application manage data versus metadata, and employ designs like Entity-Attribute-Value or Polymorphic Associations, get themselves into trouble. Let the database manage that. Use referential integrity constraints (foreign keys). Use transactions.
The vendors have one set of best practices. You, however, voice concerns with that.
Years ago I supported a Major Software Product. Major.
They said "The database is relational storage. Nothing more." Every user conference people would ask about stored procedure, triggers, and all that malarkey.
Their architect was firm. As soon as you get away from plain-old-SQL, you've got a support and maintenance nightmare. They did object-relational mapping from the DB into their product, and everything else was in their product.
This scales well. Multiple application servers can easily share a single database server.

Resources