I am looking for some initial pointers on how to cluster a ServiceMix solution. Basically what I need is:
having 2 (or more) ServiceMix instances serving my routing needs and sharing the load
if one instance fails, other(s) continue to serve
if the failed one is brought back to life, it joins the party
Searching for information has only confused me, because:
some references (e.g. http://trenaman.blogspot.fi/2010/04/four-things-you-need-to-know-about-new.html) talk about a "JBI cluster engine". I don't want to use JBI, since support for it is deprecated. Is there a separate non-JBI cluster engine, or what is going on?
I see a lot of mentions of "DOSGi". Do I need to worry my simple head with all that if I want to achieve a clustered ServiceMix?
My solution will probably have a few bundles that communicate with each other using JMS queues. Should I in that case just have 2 independent ServiceMix instances (that do not know of each other)? Wouldn't that be the simplest option? I see some support for a failover configuration (http://servicemix.apache.org/docs/4.5.x/users-guide/failover.html), but what benefits would that really give (am I missing something)? Also, this failover configuration does not help with load balancing, since just one instance is serving requests.
From what it sounds like, all you need is two ServiceMix instances running side by side with no failover specifically configured. Failover is there if you want a cluster of instances, only one of which services requests.
Ignore the JBI stuff - it's legacy. Distributed OSGi is a red herring in the use case that you have described.
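For the JMS case in the question, a minimal sketch of what such a route could look like (the ActiveMQ component, queue name and bean name are assumptions, not from the question): deploy the same bundle to both ServiceMix instances and let them compete for messages on the shared queue.

```java
import org.apache.camel.builder.RouteBuilder;

public class OrderRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Deployed unchanged on both ServiceMix instances. Both consume
        // from the same queue, so the broker spreads messages between them
        // (competing consumers): load is shared, and if one instance dies
        // the other simply keeps consuming; when it comes back, it rejoins.
        from("activemq:queue:orders")
            .to("bean:orderService?method=process");
    }
}
```

This covers all three requirements in the question without the instances knowing about each other; the only shared piece is the message broker.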
As boday suggests, Cellar is used to manage the installation of your bundles uniformly across a logical group of Karaf/ServiceMix instances, so you can manage them from one location as opposed to installing new versions on each instance by hand.
Fabric8 (http://fabric8.io/) can do Karaf/ServiceMix clustering and much more out of the box. It also has additional clustered Camel components, such as the master and fabric endpoints:
http://fabric8.io/gitbook/camelEndpointMaster.html
http://fabric8.io/gitbook/camelEndpointFabric.html
There is a clustered Camel example that demonstrates this:
https://github.com/fabric8io/fabric8/tree/master/fabric/fabric8-karaf/src/main/resources/distro/fabric/import/fabric/profiles/example/camel/cluster
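As a hedged sketch of the master endpoint idea (the lock name and the wrapped endpoint below are made up): prefixing a consumer with master: makes all instances compete for a lock, and only the winner actually consumes; if it dies, another instance takes over.

```java
// Inside a RouteBuilder#configure(); assumes the fabric8 master component
// is available. Only the instance holding the "orders-http" lock runs this
// consumer; the others stand by and take over automatically on failure.
from("master:orders-http:jetty:http://0.0.0.0:8181/orders")
    .to("activemq:queue:orders");
```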
Take a look at Apache Karaf Cellar, as it's targeted at these use cases:
http://karaf.apache.org/index/subprojects/cellar.html
Why would I want to have multiple replicas of my DB?
Redundancy: I have more than one replica of my app code. Why? If one node fails, another can take its place when run behind a load balancer.
Load: a load balancer can distribute traffic to multiple instances of the app.
A/B testing: I can have one node serve one version of the app, and another serve a different one.
Maintenance: I can bring down one instance for maintenance and keep the other one up with zero downtime.
So, I assume I'd want to do the same with the backing db if possible too.
I realize that many NoSQL DBs are better suited to running multiple instances, but I am interested in relational DBs.
I've played with operators like this and this, but I have found problems with the docs, have not been able to get them up and running, and found the community a bit lacking. Relying on this kind of thing in production makes me nervous. The MySQL operator even has a note saying it's not for production use.
I see that native k8s StatefulSets support scaling, but those docs aren't specific to DBs at all. I assume the complication is that DBs need to write persistently to disk via a volume, and that data has to be synced and routed somehow if you have more than one instance.
So, is this something that's non-trivial to do myself? Or am I better off having a dev environment that uses a one-replica DB image in the cluster to save on billing, and a prod environment that uses a fully managed DB, something like this, that takes care of the scaling/HA for me? Then I'd use kustomize to manage the YAML variants.
Edit:
I actually found a Postgres operator that worked great: I followed the docs through once and it all worked, and it's from the Postgres docs.
I have created this community wiki answer to summarize the topic and to make pertinent information more visible.
As Turing85 rightly mentioned in the comments:
Do NOT share a pvc to multiple db instances. Even if you use the right backing volume (it must be an object-based storage in order to be read-write many), with enough scaling, performance will take a hit (after all, everything goes to one file system, this will stress the FS). The proper way would be to configure clustering. All major relational databases (mssql, mysql, postgres, oracle, ...) do support clustering. To be on the secure side, however, I would recommend to buy a scalable database "as a service" unless you know exactly what you are doing.
A good solution might be to use a single-replica StatefulSet for development, to avoid billing, and a fully managed cloud-based SQL solution in prod, unless you have the knowledge, or a sufficiently production-ready operator, to deploy a clustered DBMS. A minimal sketch of the development setup follows below.
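As a rough sketch of that dev-only StatefulSet, assuming Postgres (the image tag, secret name and storage size are placeholders):

```yaml
# Single-replica database for development only: no clustering, no HA.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dev-postgres
spec:
  serviceName: dev-postgres
  replicas: 1                 # never scale this up; clustering is not configured
  selector:
    matchLabels:
      app: dev-postgres
  template:
    metadata:
      labels:
        app: dev-postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: dev-postgres
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]  # one pod, one volume; never shared
        resources:
          requests:
            storage: 1Gi
```

In prod you would point the same application manifests (via kustomize) at the managed database instead.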
Another solution may be to use a different operator as Aaron did:
I actually found a postgres operator that worked great. Followed the docs one time through and it all worked, and it's from postgres: https://www.kubegres.io/doc/getting-started.html
See also this similar question.
What if ReplicaSet_B and ReplicaSet_A update the same DB? I had hoped the pods in ReplicaSet_A were stopped after taking a snapshot, but there is no explanation like this in https://kubernetes.io/docs/concepts/workloads/controllers/deployment/. I think it is assumed that the containers in the pods are running online applications. What if they are batch applications? I mean, the old pods belonging to the old ReplicaSet will keep updating the DBs in the old manner. This would also raise a data migration issue.
Yes. ReplicaSets (managed by Deployments) make two assumptions: 1. your workload is stateless, and 2. all pods are identical clones (other than their IP addresses). StatefulSets address some aspects of this; for example, you can assign pods a certain identity (say, leader or follower), but they really only work for specific workloads. The Jobs abstraction in Kubernetes won't help you much with stateful workloads either. What you are likely looking at is a custom controller or operator. We're collecting good practices and tooling via stateful.kubernetes.sh; maybe there's something there that can help.
Would Apache Gora fit when you have to build an application which writes/reads from a set of databases including SQLServer, MongoDB, HBase & Cassandra?
The idea is to develop an application which is capable of performing CRUD operations across databases: Request 1 goes to SQLServer, Request 2 goes to MongoDB, Request 3 goes to HBase, and so on. Each request carries the information about which database the application should hit, and there is a finite list of databases.
Are there any alternatives?
Any pointers?
Let me know if any other information is required.
From your description I would say "yes", except for accessing SQL Server (not supported).
Two things I can tell you as BIG tips to begin:
Create your datastores with the DataStoreFactory#createDataStore() overload that allows you to configure different "gora.properties" content and a different Configuration per store (see the sketch after this list).
Remember that each gora-xxx-mapping.xml is shared between all the connections to the same backend.
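A hedged sketch of tip 1 (the WebPage bean stands in for a class generated by the Gora compiler, and the property key is the documented gora-mongodb one; treat both as assumptions):

```java
import java.util.Properties;

import org.apache.gora.hbase.store.HBaseStore;
import org.apache.gora.mongodb.store.MongoStore;
import org.apache.gora.store.DataStore;
import org.apache.gora.store.DataStoreFactory;
import org.apache.hadoop.conf.Configuration;

public class MultiStoreExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // One Properties object per backend instead of a single shared
        // gora.properties, as tip 1 suggests.
        Properties mongoProps = DataStoreFactory.createProps();
        mongoProps.setProperty("gora.mongodb.servers", "mongo-host:27017");
        Properties hbaseProps = DataStoreFactory.createProps();

        DataStore<String, WebPage> mongoStore = DataStoreFactory.createDataStore(
                MongoStore.class, String.class, WebPage.class, conf, mongoProps);
        DataStore<String, WebPage> hbaseStore = DataStoreFactory.createDataStore(
                HBaseStore.class, String.class, WebPage.class, conf, hbaseProps);

        // The request says which backend to hit (the routing idea from the
        // question); the CRUD API is the same regardless of the backend.
        String backend = args.length > 0 ? args[0] : "mongodb";
        DataStore<String, WebPage> target =
                "mongodb".equals(backend) ? mongoStore : hbaseStore;

        WebPage page = target.newPersistent();
        target.put("http://example.org", page);
        target.close();
    }
}
```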
Alternatives:
Kundera, maybe?
-- Edit from comments:
There is a gora-sql module, but it had to be disabled years ago because of some license issues. If you look at the modules in the pom, you will see that gora-sql is not being compiled. No one has taken up the task of rebuilding it :(
About point 2: there can be an Application1MongoDB and an Application2MongoDB. If they are different applications, they can have a different gora-xxx-mapping.xml in each one's classpath.
If they are datastore instances from calls to #createDataStore() (in the same application), then all the mappings will have to go in the classpath's gora-xxx-mapping.xml. It is just a tip I mention because I found it tricky.
More alternatives:
Hibernate OGM, as mentioned in the comments.
EclipseLink (although it does not support many backends)
DataNucleus
We are currently looking into replacing one of our apps, possibly with an ESB or some similar tool, and are looking for insights into how best to approach this.
We currently have a stand-alone service that consumes/interacts with different external services and data sources, some delivered through SOAP web services and others where we just use a DB connection. This service is exposed through SOAP, and we have other apps that consume it but are very tightly coupled to it. Now we also have other apps that need to consume some of the external services, and we would like to replace this altogether with an ESB or some sort of SOA platform.
What would be the best way to replace this 'external' services integration layer with an ESB? We were thinking of having a 'global' contract/API in which all of the services we consume are exposed as one single contract, with all the possible operations and data structures exposed under one single namespace. Would this be the best way of approaching this? And if so, are there any tools that could help us automate this process, or do we basically have to handcraft this contract/API? This would also mean that for any changes to the underlying services/APIs we would have to update this new API as well.
If not, then the other option I see is to basically use the ESB as a 'proxy' layer in which all of our sources are exposed as they are, so we would end up with several different contracts/API endpoints; but I don't really see the value in this.
Also, given the above, what would be the best tool for the job? Is a full-blown ESB overkill, or are we much better off rolling our own using something like Apache Camel or Spring Integration?
A few more details:
We are currently integrating over 5 different external services with more to come in the future.
Only a couple of apps consuming our current app at the moment but several other apps/systems in the future will need to consume some of these external services.
We are currently using a single method of communication (SOAP) between these services but some apps might use pub/sub messaging in the future, although SOAP will still be the main protocol used.
I am new to ESB integration so I apologize in advance if I'm misunderstanding a lot of these technologies and the problems they are meant to solve.
Any help/tips/pointers will be greatly appreciated.
Thanks.
You need to put some design thought into what you want to achieve over time.
There are multiple benefits and potential pitfalls with an ESB introduction.
Here are some typical benefits/use cases
When your applications are hard to change or have very different release cycles, it's convenient to have an ESB in the middle that can adapt to changes quickly. This is very much the case when your organization buys a lot of COTS products and cloud services that might come with an update the next day that breaks the current API.
When you need to adapt data from one master data system to several other systems that might not support the same interfaces: i.e. the CRM system might want data imported via web services as soon as it's available, the ERP wants data through DB/staging tables, and the production system wants data every weekend in a flat file delivered via FTP. To keep the master data system clean and easy to maintain, implement one single integration service in the master data system, and adapt this interface to the various other applications within the ESB platform instead.
Aggregation or splitting of data from various sources to protect your sensitive systems might be a use case. Say that you have an old system that can only take small updates of information at a time, and it's not worth upgrading it; then an integration solution that can do aggregation, splitting or throttling can be a good fit (see the sketch after this list).
Other benefits and use cases include the ability to track and wire-tap every message passing between systems, which can even be used together with business intelligence tools to gather KPIs.
A conceptual ESB can also introduce a canonical message format that is used for all services that need to communicate. If a lot of applications share the same data with several other applications (not only point to point), then the benefits of a canonical message format can outweigh the cost (which is/can be high). An ESB server might be useful for dealing with canonical data, as it is usually very good at mapping from one format to another.
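As a hedged illustration of the splitting/throttling point above (all endpoint URIs are made up), here is what such a protection route could look like in Camel:

```java
import org.apache.camel.builder.RouteBuilder;

public class LegacyProtectionRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("file:/data/in?noop=true")           // bulk flat file arrives here
            .split(body().tokenize("\n"))         // one message per record
            .throttle(5).timePeriodMillis(1000)   // at most 5 records per second
            .to("http://legacy-host/update");     // the fragile system to protect
    }
}
```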
However, introducing an ESB without a plan for what benefits you are trying to achieve is not really a good thing, since it introduces overhead: you need another server to keep alive, and perhaps another team to understand all the data flows. You need particular knowledge of your integration product. Finally, you need some governance around it so that your ESB initiative does not drift away from the goals/benefits you have foreseen.
You should choose a technology that you are comfortable with, or think you can become comfortable with. Apache Camel is indeed very powerful and my favorite integration engine, but it's not an ESB, as it does not come with a runtime that you can use to deploy/manage/monitor your integration services. You can use it together with most Java EE application servers or, even better, Apache ServiceMix (= Karaf + Camel + ActiveMQ + CXF), which is built for this task.
The same goes for Spring Integration: you need to run it somewhere, on app servers or whatnot.
There is a large set of different products, both open source and commercial, that do these things.
We have a few different applications that store their data, and we need one common service which provides access to all of it.
By the applications I mean, for example, Atlassian Jira, Confluence, SVN, Git, LDAP, a few internal MySQL databases, etc. Some of them offer you a SOAP API, a REST API or various command-line clients; for some you have to access the database directly to get the data.
What we want is a common REST API interface, to access all possible data sources. Of course, we have to solve authentication and authorization, caching and many more tasks.
It seems that something like an ESB (Enterprise Service Bus) with EIP (Enterprise Integration Patterns) is the answer to our needs.
For a start, we are playing with, and actually digging into, Apache Camel. It's not a full EIP stack, it's "just" an integration framework, but I guess it's good enough for us right now.
My question is: what do you think about this solution? Are we on the right track?
Thanks!
Camel has a lot of connectors, so that would be a great start.
If you are afraid it is too thin, then take a look at Apache ServiceMix, which provides an OSGi deployment container for Camel routes (and other things). Camel comes bundled within the standard ServiceMix release out of the box.
The hard task is probably to design the generic API good enough to cover your use cases.
A Git repo and a database are very different; can one API really be that generic? Do you only want to access "text" data, or something like that?
I like the approach with Camel nonetheless, since it's rather generic and flexible in these kinds of scenarios, and that flexibility is something you will need.
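To make that concrete, a hedged sketch of the facade idea (component choices, header name and URIs are assumptions, not recommendations): one HTTP entry point, with a content-based router deciding which backing system to call.

```java
import org.apache.camel.builder.RouteBuilder;

public class FacadeRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jetty:http://0.0.0.0:8280/data")
            .choice()
                .when(header("source").isEqualTo("jira"))
                    // proxy the Jira REST API
                    .to("http://jira-host/rest/api/2/search?bridgeEndpoint=true")
                .when(header("source").isEqualTo("mysql"))
                    // query an internal MySQL database
                    .to("sql:select * from metrics?dataSource=#mysqlDs")
                .otherwise()
                    .setBody(constant("unknown source"))
            .end();
    }
}
```

Authentication, authorization and caching would then live in this one facade route instead of in every consumer.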