Observer pattern in Oracle - database

Can I set a hook on rows being changed or added in a table and get notified somehow when such an event arises? I've searched the web and only came up with pipes, but there is no way to receive a pipe message immediately when it is sent, only by polling periodically.

Implementing an Observer pattern from a database should generally be avoided.
Why? It relies on vendor-proprietary (non-standard) technology, promotes database vendor lock-in and support risk, and causes a bit of bloat. From an enterprise perspective, if not done in a controlled way, it can look like "skunkworks": implementing, in an unusual way, behaviour commonly covered by application and integration patterns and tools. If implemented at a fine-grained level, it can result in tight coupling to tiny data changes, with a huge amount of unpredictable communication and processing that affects performance. An extra cog in the machine can be an extra breaking point: it might be sensitive to OS, network and security configuration, or there may be security vulnerabilities in the vendor technology.
If you're observing transactional data managed by your app:
implement the Observer pattern in your app. E.g. in Java, the CDI and JavaBeans specs support this directly, and a custom OO design as per the Gang of Four book is a perfect solution (a minimal CDI sketch follows below).
optionally send messages to other apps. Filters/interceptors, MDB messages, CDI events and web services are also useful for notification.
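For the in-app case, a minimal CDI sketch of the idea; the event and bean names below are hypothetical, and in a real project each class would live in its own file:

    import javax.enterprise.event.Event;
    import javax.enterprise.event.Observes;
    import javax.inject.Inject;

    // Hypothetical event payload describing the change.
    class CustomerChanged {
        final long customerId;
        CustomerChanged(long customerId) { this.customerId = customerId; }
    }

    // Producer side: the component that makes the change fires an event.
    class CustomerService {
        @Inject
        Event<CustomerChanged> customerChanged;

        void rename(long id, String newName) {
            // ... persist the change here ...
            customerChanged.fire(new CustomerChanged(id));
        }
    }

    // Observer side: any managed bean can react, with no coupling to the producer.
    class CustomerCacheRefresher {
        void onCustomerChanged(@Observes CustomerChanged event) {
            // refresh caches, notify other components, push a message, etc.
            System.out.println("Customer " + event.customerId + " changed");
        }
    }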
If users are directly modifying master data within the database, then either:
provide a singular admin page within your app to control master data refresh OR
provide a separate master data management app and send messages to dependent apps OR
(best approach) manage master data edits in terms of quality (reviews, testing, etc.) and timing (treat them the same as a code change), promote through environments, deploy and refresh data / restart the app on a managed schedule
If you're observing transactional data managed by another app (shared database integration) OR you use data-level integration such as ETL to provide your application with data:
try to have data entities written by just one app (read-only by others)
poll a staging/ETL control table to understand what changed and when OR
use a JDBC/ODBC-level proprietary extension for notification or polling, as mentioned in Alex Poole's answer OR
refactor overlapping data operations from the two apps into a shared SOA service - this can either avoid the observation requirement or lift it from a data operation to a higher-level SOA/app message
use an ESB or a database adapter to invoke your application for notification or a WS endpoint for bulk data transfer (e.g. Apache Camel, Apache ServiceMix, Mule ESB, Openadaptor)
avoid use of database extension infrastructure such as pipes or advanced queuing
If you use messaging (send or receive), do so from your application(s). Messaging from the DB is a bit of an antipattern. As a last resort, it is possible to use triggers which invoke web services (http://www.oracle.com/technetwork/developer-tools/jdev/dbcalloutws-howto-084195.html), but great care is required to do this in a very coarse fashion, invoking a business (sub-)process when a set of data changes rather than crunching fine-grained CRUD-type operations. It is best to trigger a job and have the job call the web service outside the transaction.

In addition to the other answers, you can look at database change notification. If your application is Java-based there is specific documentation covering JDBC, and similar for .NET here and here; and there's another article here.
You can also look at continuous query notification, which can be used from OCI.
I know link-only answers aren't good but I don't have the experience to write anything up (I have to confess I haven't used either, but I've been meaning to look into DCN for a while now...) and this is too long for a comment *8-)
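For the JDBC route, a rough, untested sketch based on Oracle's Database Change Notification API; the connect string, credentials and table are placeholders, and the database user needs the CHANGE NOTIFICATION privilege:

    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;
    import oracle.jdbc.OracleConnection;
    import oracle.jdbc.OracleStatement;
    import oracle.jdbc.dcn.DatabaseChangeEvent;
    import oracle.jdbc.dcn.DatabaseChangeListener;
    import oracle.jdbc.dcn.DatabaseChangeRegistration;

    public class ChangeNotificationDemo {
        public static void main(String[] args) throws Exception {
            OracleConnection conn = (OracleConnection) DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger"); // placeholders

            Properties options = new Properties();
            options.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");

            DatabaseChangeRegistration registration =
                    conn.registerDatabaseChangeNotification(options);
            registration.addListener(new DatabaseChangeListener() {
                @Override
                public void onDatabaseChangeNotification(DatabaseChangeEvent event) {
                    // Invoked asynchronously by the driver when a watched table changes.
                    System.out.println("Change received: " + event);
                }
            });

            // Any table touched by a query on a statement tied to the registration is watched.
            Statement stmt = conn.createStatement();
            ((OracleStatement) stmt).setDatabaseChangeRegistration(registration);
            ResultSet rs = stmt.executeQuery("SELECT id FROM orders"); // hypothetical table
            while (rs.next()) { /* drain the result set */ }
            rs.close();
            stmt.close();

            // Keep the process alive so the listener keeps receiving events.
            Thread.sleep(Long.MAX_VALUE);
        }
    }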

Within the database itself triggers are what you need. You can run arbitrary PL/SQL when data is inserted, deleted, updated, or any combination thereof.
If you need to have the event propagate outside the database you would need a way to call out to your external application from your PL/SQL trigger. Some possible options are:
DBMS_PIPE - pipes in Oracle are similar to Unix pipes: one session can write and a separate session can read to transfer information. They are also not transactional, so you get the message immediately. One drawback is that the API is poll-based, so I suggest option #2.
Java - PL/SQL can invoke arbitrary Java (assuming you load your class into your database). This opens the door to do just about any type of messaging you'd like, including using JMS to push messages to a message queue. Depending on how you implement this, you can even have it transactionally tied to the INSERT/UPDATE/DELETE statement itself. The listening application would then just listen to the JMS queue and wouldn't be tied to the DB publishing the event at all.
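A rough sketch of option 2, with several assumptions: the JMS provider's client classes (ActiveMQ here, purely as an example) would have to be loaded into the schema with loadjava, the broker URL and queue name are placeholders, and a PL/SQL call spec would expose the static method to your trigger or job:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ChangePublisher {

        // Static so it can be mapped from PL/SQL, e.g.:
        //   CREATE OR REPLACE PROCEDURE publish_change(p_payload VARCHAR2) AS LANGUAGE JAVA
        //   NAME 'ChangePublisher.publish(java.lang.String)';
        public static void publish(String payload) throws Exception {
            ConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://mq-host:61616"); // hypothetical broker
            Connection connection = factory.createConnection();
            try {
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageProducer producer =
                        session.createProducer(session.createQueue("TABLE_CHANGES")); // hypothetical queue
                producer.send(session.createTextMessage(payload));
            } finally {
                connection.close();
            }
        }
    }

The listening application then depends only on the queue, not on the database that published the event.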

Depending on your requirements use triggers or auditing

Look at DBMS_ALERT, DBMS_PIPE or (preferably) AQ (Advanced Queuing), Oracle's internal messaging system. AQ has its own API, but it can also be treated as a Java JMS provider.
There are also techniques like Streams or XStream, but those are quite complex.

Related

Is it possible to produce Kafka messages containing serialized generic (unknown) C# objects and deserialize them by inferring type in the consumer?

EDIT: From the initial comments, it seems I may be barking up the wrong tree. I initially suggested using replication, but according to the networking team that is not possible because the two databases are on different virtual networks. It doesn't seem like this sort of thing would ever be "impossible", but I don't have the in-depth knowledge required to "fight back". Is there anything specific I should look into regarding SQL replication over the internet/different virtual networks, or points that I could bring up?
I've got an application that uses EF to save changes to an MS SQL database. For reasons, I need to be able to capture all changes made through the application and commit those exact DB changes to another, remote MS SQL db, using Kafka as the communication hub. The solution contains a ton of data entities, so ideally I'm looking for a generic way to capture these changes from the application and apply them to the remote DB where appropriate. The consuming application that is reading the messages from Kafka will also be applying the changes via EF (at least for my purposes).
So, is it possible to do something like this without having to strongly type everything for my hundreds of entities? What are the best practices for doing such a thing? Do I produce messages containing the new entity and the change state (added, updated, deleted)? I'm sure it's a bad idea for some reason, but what about serializing the entire EF ChangeManager and sending that, consolidating all of the Add/Update/Deletes and allowing us to make only a single call vs a bunch?
If I need to provide any more information please just let me know, I'll be monitoring this thread closely.
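Purely to illustrate the message shape being discussed (entity type, change state, serialized payload), and not as an endorsement of the design: a sketch using the standard Java Kafka producer, where the topic, key and JSON layout are all assumptions (in the EF/C# producer these values would come from the change tracker instead):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ChangeEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka-host:9092"); // hypothetical broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // One message per changed entity: its type, the change state and the payload.
                String value = "{\"entityType\":\"Customer\",\"state\":\"Updated\","
                             + "\"payload\":{\"id\":42,\"name\":\"Joe\"}}";
                producer.send(new ProducerRecord<>("entity-changes", "Customer:42", value));
            }
        }
    }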

How to design/develop an integration layer or bus for different external services/apps

We are currently looking into replacing one of our apps, possibly with an ESB or some similar tool, and we are looking for some insights into how best to approach this.
We currently have a stand-alone service that consumes/interacts with different external services and data sources; some are delivered through SOAP web services, and for others we just use a DB connection. This service is exposed through SOAP, and we have other apps that consume it but are very tightly coupled to it. Now we also have other apps that need to consume some of the external services, and we would like to replace all of this with an ESB or some sort of SOA platform.
What would be the best way to replace this 'external' services integration layer with an ESB? We were thinking of having a 'global' contract/API in which all of the services we consume are exposed as one single contract, where all the possible operations and data structures that we use are exposed under one single namespace. Would this be the best way of approaching this? And if so, are there any tools that could help us automate this process, or do we basically have to handcraft this contract/API? This would also mean that for any changes to the underlying services/APIs we will have to update this new API as well.
If not, then the other option I see is to basically use the 'ESB' as a 'proxy' layer in which all of our sources are exposed as they are, so we would end up with several different 'contracts' / API endpoints, but I don't really see the value in this.
Also, given the above, what would be the best tool for the job? Is a full-blown ESB overkill, or are we much better off rolling our own using something like Apache Camel or Spring Integration?
A few more details:
We are currently integrating over 5 different external services with more to come in the future.
Only a couple of apps consuming our current app at the moment but several other apps/systems in the future will need to consume some of these external services.
We are currently using a single method of communication (SOAP) between these services but some apps might use pub/sub messaging in the future, although SOAP will still be the main protocol used.
I am new to ESB integration so I apologize in advance if I'm misunderstanding a lot of these technologies and the problems they are meant to solve.
Any help/tips/pointers will be greatly appreciated.
Thanks.
You need to put some design thought into what you want to achieve over time.
There are multiple benefits and potential pitfalls with an ESB introduction.
Here are some typical benefits/use cases
When your applications are hard to change or have very different release cycles, it's convenient to have an ESB in the middle that can absorb the changes quickly. This is very much the case when your organization buys a lot of COTS products and cloud services that might come with an update the next day that breaks the current API.
When you need to adapt data from one master data system to several other systems that might not support the same interfaces, e.g. the CRM system might want data imported via web services as soon as it's available, the ERP wants data through db/staging tables, and the production system wants data every weekend in a flat file delivered via FTP. To keep the master data system clean and easy to maintain, implement one single integration service in the master data system, and adapt that interface to the various other applications within the ESB platform instead.
Aggregation or splitting of data from various sources to protect your sensitive systems might be a use case. Say that you have an old system that can only take small updates of information at a time and it's not worth upgrading it - then an integration solution that can do aggregation, splitting or throttling can be a good fit.
Other benefits and use cases include the ability to track and wire-tap every message passing between systems - which can even be used together with business intelligence tools to gather KPIs.
A conceptual ESB can also introduce a canonical message format that is used by all services that need to communicate. If a lot of applications share the same data with several other applications (not only point to point), then the benefits of a canonical message format can outweigh the cost (which can be high). An ESB server might be useful to deal with canonical data, as it is usually very good at mapping from one format to another.
However, introducing an ESB without a plan for what benefits you are trying to achieve is not really a good thing, since it introduces overhead: you need another server to keep alive, and perhaps another team to understand all the data flows. You need particular knowledge of your integration product. Finally, you need some governance around it so that your ESB initiative does not drift away from the goals/benefits you have foreseen.
You should choose some technology that you are comfortable with - or think you can be comfortable with. Apache Camel is indeed very powerful and my favorite integration engine, but it's not an ESB, as it does not come with a runtime that you can use to deploy/manage/monitor your integration services. You can use it together with most Java EE application servers or, even better, Apache ServiceMix (= Karaf + Camel + ActiveMQ + CXF), which is built for this task.
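To give a feel for Camel's route DSL, a minimal sketch; the endpoint URIs are placeholders, and in ServiceMix/Karaf the route would normally be deployed as a bundle rather than run from main():

    import org.apache.camel.CamelContext;
    import org.apache.camel.builder.RouteBuilder;
    import org.apache.camel.impl.DefaultCamelContext;

    public class IntegrationRoutes {
        public static void main(String[] args) throws Exception {
            CamelContext context = new DefaultCamelContext();
            context.addRoutes(new RouteBuilder() {
                @Override
                public void configure() {
                    from("activemq:queue:orders.in")        // hypothetical input queue
                        .log("Received order ${body}")
                        .to("xslt:canonical-order.xsl")     // map to a canonical format
                        .to("cxf:bean:erpEndpoint");        // hypothetical SOAP endpoint
                }
            });
            context.start();
            Thread.sleep(Long.MAX_VALUE); // keep the routes running
        }
    }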
The same goes for Spring Integration: you need to run it somewhere, in an app server or whatever.
There is a large set of different products, both open source and commercial, that do these things.

Best practices to structure a database to be scaling-ready

I know this is a very generic and subjective question, so feel free to vote to close it if it does not meet the StackOverflow netiquette.. but for me, it's worth trying ;)
I've never built a high-traffic application before now, so I'm not aware (except for some reading on the web) of scaling practices.
How can I design a database so that, when scaling is needed, I don't have to refactor the database structure or the application code?
I know that development (and optimization) should come step by step, optimizing bottlenecks as they happen, and that it is nearly impossible to design the perfect structure when you don't know how many users you'll have and how they would use the database (e.g. the read/write ratio). I'm just looking for a good base to start from.
What are the best practices for making a structure almost ready to be scaled with partitioning and sharding, and what hacks must be absolutely avoided?
Edit: some details about my application:
The application will run as a multisite behavior
I'll have a database for each application version (db_0_0_1, db_0_0_2, etc.)
Every 'site' will have a schema inside a database and a role that can access only its own schemas
Application code will be mostly PHP, with a few things (daemons and maintenance tasks) in Python
The web server will probably be Nginx, with lighttpd or node.js supporting long-polling tasks (e.g. chat)
Caching will be done with memcached (plus APC for things strictly related to the PHP code, since memcached can be used outside PHP)
The question is really generic, but here are a few tips:
Do not use any session variables (pg_backend_pid(), inet_client_addr()) or per-session control (SET ROLE, SET SESSION) in application code.
Do not use explicit transaction control (BEGIN/COMMIT/SET TRANSACTION) in application code. All such logic should be wrapped in UDFs. This enables stateless, statement-mode pooling, which gives the fastest possible DB pooling. (See the pgbouncer docs and the pg wiki for more info.)
Encapsulate all App<->DB communication in a well-defined DB API of UDFs - this will let you use PL/Proxy. If doing this with all SELECTs is too hard, do it at least for all data writes (INSERT/UPDATE/DELETE). Example: instead of INSERT INTO users(name) VALUES('Joe') you need SELECT create_user('Joe') (see the sketch after this list).
Check your DB schema - is it easy to separate all data belonging to a given user? (Most probably this will be the partitioning key.) All that's left is common, shared data, which will need to be replicated to all nodes.
Think of caching before you need it. What will the cache key be? What will the cache timeout be? Will you use memcached?
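For point 3, the shape of the call from the application, shown here in JDBC terms purely for illustration (the app code above is PHP, but the one-liner is the same); the create_user function and the connection details are hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class CreateUserCall {
        // The app never issues raw DML; it calls the DB API function,
        // which PL/Proxy can then route to the right partition.
        static void createUser(Connection conn, String name) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement("SELECT create_user(?)")) {
                ps.setString(1, name);
                ps.execute();
            }
        }

        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://dbhost/app", "app", "secret")) { // placeholder connection
                createUser(conn, "Joe");
            }
        }
    }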

Web services and database concurrency

I'm building a .NET client application (C#, WinForms) that uses a web service for interaction with the database. The client will be run from remote locations using a WAN or VPN, hence the idea of using a web service rather than direct database access.
The issue I'm grappling with right now is how to handle database concurrency. That is, if two people from different locations update the same data, how do I deal with it? I'm considering using timestamps on each database record and having that as part of the update where clauses, but that means that the timestamps have to move back and forth through the web service interface, which seems kind of ugly.
What is the best way to approach this?
I don't think you want your web service to talk directly to the database. You probably want your service to interact with some type of business components which in turn interact with a data access layer. Any concurrency exceptions can be passed from the DAL up to the business layer, where they can be handled so that the web service never has to see the timestamps.
But if you are passing something like a data table up to the client and you want to avoid timestamps, you can do concurrency checking by comparing field by field. The Table Adapter wizards generate this type of concurrency checking by default if you ask for optimistic concurrency checking.
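The underlying technique is the same whatever the stack: make the version (or timestamp) part of the UPDATE's WHERE clause and treat zero affected rows as a conflict. A sketch in JDBC/SQL terms, with a hypothetical customers table carrying a version column:

    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class OptimisticUpdate {
        // Returns false when another user changed the row since it was read.
        static boolean renameCustomer(Connection conn, long id, String newName,
                                      long expectedVersion) throws Exception {
            String sql = "UPDATE customers SET name = ?, version = version + 1 "
                       + "WHERE id = ? AND version = ?"; // hypothetical table/columns
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, newName);
                ps.setLong(2, id);
                ps.setLong(3, expectedVersion);
                return ps.executeUpdate() == 1; // 0 rows updated => concurrency conflict
            }
        }
    }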
If your collisions occur infrequently enough that they can be resolved manually, a simple solution is to add an update trigger that copies a row's pre-update values to an audit table. This way the most recent write is the "winner", but no data is ever lost to an overwrite, and an administrator can restore an earlier row state or even combine them.
This technique has its downsides, and is not a very good solution where frequent overwrites are common.
Also, this is slightly off-topic, but using web services isn't necessarily the way to go just because the clients will be remoting into the network. ASP.NET web services are XML-based and very verbose. If your client application can count on being always connected, you'd be better off not using web services.

How do you keep two related, but separate, systems in sync with each other?

My current development project has two aspects to it. First, there is a public website where external users can submit and update information for various purposes. This information is then saved to a local SQL Server at the colo facility.
The second aspect is an internal application which employees use to manage those same records (conceptually) and provide status updates, approvals, etc. This application is hosted within the corporate firewall with its own local SQL Server database.
The two networks are connected by a hardware VPN solution, which is decent, but obviously not the speediest thing in the world.
The two databases are similar, and share many of the same tables, but they are not 100% the same. Many of the tables on both sides are very specific to either the internal or external application.
So the question is: when a user updates their information or submits a record on the public website, how do you transfer that data to the internal application's database so it can be managed by the internal staff? And vice versa... how do you push updates made by the staff back out to the website?
It is worth mentioning that the more "real time" these updates occur, the better. Not that it has to be instant, just reasonably quick.
So far, I have thought about using the following types of approaches:
Bi-directional replication
Web service interfaces on both sides with code to sync the changes as they are made (in real time).
Web service interfaces on both sides with code to asynchronously sync the changes (using a queueing mechanism).
Any advice? Has anyone run into this problem before? Did you come up with a solution that worked well for you?
This is a pretty common integration scenario, I believe. Personally, I think an asynchronous messaging solution using a queue is ideal.
You should be able to achieve near real time synchronization without the overhead or complexity of something like replication.
Synchronous web services are not ideal because your code will have to be very sophisticated to handle failure scenarios. What happens when one system is restarted while the other continues to publish changes? Does the sending system get timeouts? What does it do with those? Unless you are prepared to lose data, you'll want some sort of transactional queue (like MSMQ) to receive the change notices and take care of making sure they get to the other system. If either system is down, the changes (passed as messages) will just accumulate and as soon as a connection can be established the re-starting server will process all the queued messages and catch up, making system integrity much, much easier to achieve.
There are some open source tools that can really make this easy for you if you are using .NET (especially if you want to use MSMQ).
nServiceBus by Udi Dahan
Mass Transit by Dru Sellers and Chris Patterson
There are commercial products also, and if you are considering a commercial option see here for a list of options on .NET. Of course, WCF can do async messaging using MSMQ bindings, but a tool like nServiceBus or MassTransit will give you a very simple Send/Receive or Pub/Sub API that will make your requirement a very straightforward job.
If you're using Java, there are any number of open source service bus implementations that will make this kind of bi-directional, asynchronous messaging a snap, like Mule or maybe just ActiveMQ.
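As a taste of how small the queued approach can be, a sketch using ActiveMQ's JMS client; the broker URL, queue name and message format are placeholders, and in practice the producer and consumer would live in the two separate applications:

    import javax.jms.Connection;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageListener;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import org.apache.activemq.ActiveMQConnectionFactory;

    public class ChangeSync {
        public static void main(String[] args) throws Exception {
            Connection connection =
                    new ActiveMQConnectionFactory("tcp://broker-host:61616").createConnection();
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

            // Producer side (public web site): publish the change and move on.
            MessageProducer producer = session.createProducer(session.createQueue("record.changes"));
            producer.send(session.createTextMessage("{\"recordId\":42,\"action\":\"updated\"}"));

            // Consumer side (internal app): messages accumulate while it is down
            // and are delivered once it reconnects.
            MessageConsumer consumer = session.createConsumer(session.createQueue("record.changes"));
            consumer.setMessageListener(new MessageListener() {
                @Override
                public void onMessage(Message message) {
                    try {
                        System.out.println("Applying change: " + ((TextMessage) message).getText());
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });

            Thread.sleep(Long.MAX_VALUE); // keep listening
        }
    }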
You may also want to consider reading Udi Dahan's blog, listening to some of his podcasts. Here are some more good resources to get you started.
I'm mid-way through a similar project except I have multiple sites that need to keep in sync over slow connections (dial-up in some cases).
Firstly, you need to track changes. If you can use SQL Server 2008 (even the Express version is enough if the 2 GB limit isn't a problem), this will ease the pain greatly: just turn on Change Tracking on the database and each table. We're using SQL Server 2008 at the head office with the extended schema and SQL Server Express 2008 at each site with a sub-set of data and a limited schema.
Secondly, you need to synchronise those changes; Sync Services does the trick nicely and supports using a WCF gateway into the main database. In this case you will need to use the "Sync using SQL Express Client" sample as a starting point; note that it's based on SQL 2005, so you'll need to update it to take advantage of the Change Tracking features in 2008. By default Sync Services uses SQL CE on the clients, which I'm sure isn't enough in your case. You'll need a service that runs on your web server and periodically (it could be as often as every 10 seconds if you want) runs the Synchronize() method. This will tell your main database about changes made locally and then ask the server for all changes made there. You can set up the get and apply SQL code to call stored procedures, and you can add event handlers to handle conflicts (e.g. Client Update vs Server Update) and resolve them accordingly at each end.
We have a shop as a client, with three stores connected to the same VPN
Two of the shops have a computer running as a "server" for that shop and the third one has the "master database".
To synchronize everything to the master we don't have the best solution, but it works: there is a dedicated PC running an application that checks the timestamp of every record in every table of the two stores and, if it is different from the last time it synchronized, it copies the results.
Note that this works both ways. I.e. if you update a product in the master database, this change will propagate to the other two shops. If you have a new order in one of the shops, it will be transmitted to the "master".
With some optimizations you can have all the shops synchronize in around 20 minutes.
Recently I have had a lot of success with SQL Server Service Broker which offers reliable, persisted asynchronous messaging out of the box with very little implementation pain.
It is quick to set up and as you learn more you can use some of the more advanced features.
Unknown to most, it is also part of the desktop editions so it can be used as a workstation messaging system
If you have existing T-SQL skills they can be leveraged as all the code to read and write messages is done in SQL
It is blindingly fast
It is a vastly under-hyped part of SQL Server and well worth a look.
I'd say just have a job that copies the data in the pub database's input table into a private database pending table. Then, once you update the data on the private side, have it replicated to the public side. If none of the replicated data on the public side is updated there, it should be a fairly easy transactional replication solution.
