How much time a SOAP call takes in Camel

Currently we're using Apache Camel (with Spring Boot) as an integration platform. We have multiple backend systems to integrate; mostly we use Apache CXF and CXF RS to call those systems.
We'd like to log how long we wait for the backend systems, and how much overhead our application adds.
We've created an EventNotifierSupport bean, where we can log the following:
The time between the ExchangeCreatedEvent and the ExchangeCompletedEvent events. I think this is approximately the full time it takes to serve the request (our overhead plus the backend system's time).
And I can log the timeTaken property of the ExchangeSentEvent notification.
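Roughly, the notifier looks like this (a minimal sketch against the Camel 2.x event API; our real bean differs in details):

```java
import java.util.Date;
import java.util.EventObject;

import org.apache.camel.Exchange;
import org.apache.camel.management.event.ExchangeCompletedEvent;
import org.apache.camel.management.event.ExchangeSentEvent;
import org.apache.camel.support.EventNotifierSupport;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TimingEventNotifier extends EventNotifierSupport {

    private static final Logger LOG = LoggerFactory.getLogger(TimingEventNotifier.class);

    @Override
    public void notify(EventObject event) throws Exception {
        if (event instanceof ExchangeSentEvent) {
            ExchangeSentEvent sent = (ExchangeSentEvent) event;
            // Time spent on the call to the endpoint (includes response processing!)
            LOG.info("Sent to {} took {} ms", sent.getEndpoint(), sent.getTimeTaken());
        } else if (event instanceof ExchangeCompletedEvent) {
            Exchange exchange = ((ExchangeCompletedEvent) event).getExchange();
            Date created = exchange.getProperty(Exchange.CREATED_TIMESTAMP, Date.class);
            // Approximate full time to serve the request
            LOG.info("Exchange {} completed in {} ms", exchange.getExchangeId(),
                    System.currentTimeMillis() - created.getTime());
        }
    }

    @Override
    public boolean isEnabled(EventObject event) {
        return event instanceof ExchangeSentEvent || event instanceof ExchangeCompletedEvent;
    }
}
```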
I have a problem with the latter: under high load it takes quite some time for our application to process the SOAP response, and that time is included in the timeTaken property.
What's the proper Camel-way to measure the time we wait for backend systems?

Camel supports traceability via JMX out of the box. You can drill down at various resolutions, such as routes or individual endpoints, in real time, and it can report min/max/mean completion times and more using just JConsole. This should at least give you some insight into the request->response time for backend processes called on individual endpoints.
See the Camel JMX documentation.
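If you'd rather read these numbers programmatically than in JConsole, here's a minimal sketch against the platform MBean server (narrow the ObjectName pattern to your context as needed):

```java
import java.lang.management.ManagementFactory;
import java.util.Set;

import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RouteTimings {
    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // Matches every route MBean Camel registers in the default JMX domain
        Set<ObjectName> routes = mbs.queryNames(
                new ObjectName("org.apache.camel:type=routes,*"), null);
        for (ObjectName route : routes) {
            Long mean = (Long) mbs.getAttribute(route, "MeanProcessingTime");
            Long max = (Long) mbs.getAttribute(route, "MaxProcessingTime");
            System.out.println(route + " mean=" + mean + "ms max=" + max + "ms");
        }
    }
}
```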

For the record: I think I'd be able to measure a more precise time for the SOAP requests if:
I implemented an org.apache.cxf.interceptor.Interceptor (based on the org.apache.cxf.interceptor.MessageSenderInterceptor.MessageSenderEndingInterceptor class; I needed to set the constructor arguments as well!)
And registered this interceptor on the CXF endpoint as mentioned here: http://camel.apache.org/cxf.html
I think the SOAP responses are consumed in a separate thread pool, and under high load it takes a long time to consume a response object from the pool's queue. But if I set up the CXF interceptor as mentioned above, it runs before the response is put into the queue.
I haven't tested this solution or implemented it any further. But if I face this problem in the future (hello, future-googler me!), I'd start looking at these places.
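To sketch the idea for future me (untested; the phase choices and the exchange property name are my assumptions):

```java
import org.apache.cxf.interceptor.Fault;
import org.apache.cxf.message.Message;
import org.apache.cxf.phase.AbstractPhaseInterceptor;
import org.apache.cxf.phase.Phase;

// OUT chain: stamp the CXF exchange just before the request leaves.
public class RequestSentInterceptor extends AbstractPhaseInterceptor<Message> {
    public RequestSentInterceptor() {
        super(Phase.SEND_ENDING);
    }

    @Override
    public void handleMessage(Message message) throws Fault {
        message.getExchange().put("my.soap.sentAt", System.currentTimeMillis());
    }
}

// IN chain: runs before the response is handed over to Camel's thread pool,
// so queue time under high load is not counted.
class ResponseReceivedInterceptor extends AbstractPhaseInterceptor<Message> {
    ResponseReceivedInterceptor() {
        super(Phase.RECEIVE);
    }

    @Override
    public void handleMessage(Message message) throws Fault {
        Long sentAt = (Long) message.getExchange().get("my.soap.sentAt");
        if (sentAt != null) {
            System.out.println("Waited " + (System.currentTimeMillis() - sentAt)
                    + " ms for the backend");
        }
    }
}
```

Both would be registered on the CXF endpoint's outInterceptors and inInterceptors lists, as described on the Camel CXF page linked above.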

Related

Sharing data to client whenever API has results available?

I have the following scenario; can anyone advise on the best approach?
Front End —> Rest API —> SOAP API (Legacy applications)
The legacy applications behave unpredictably: sometimes they're very slow, sometimes fast.
The following is what needs to be achieved:
- As and when data is available to the Rest API, the results should be made available to the client
- Whatever info is available, show the intermediate results.
Can anyone share insights on how to design this system?
You have several options to do that:
Polling from the UI - this requires some changes to the API: the initial call returns a URL where results will be available, and the UI checks it periodically
WebSockets - this also requires changing the API
Server-sent events - essentially keeping the HTTP connection open and pushing new results as they become available; this sounds closest to what you want (see the sketch below)
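For illustration, a minimal server-sent events sketch with Spring MVC's SseEmitter (the endpoint name and the legacy-call placeholder are hypothetical):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;

@RestController
public class ResultsController {

    private final ExecutorService executor = Executors.newCachedThreadPool();

    @GetMapping("/results")
    public SseEmitter results() {
        SseEmitter emitter = new SseEmitter(120_000L); // 2-minute timeout
        executor.submit(() -> {
            try {
                // Call the slow legacy SOAP APIs one by one and push each
                // partial result to the client as soon as it is available.
                for (String partial : callLegacyApis()) {
                    emitter.send(partial);
                }
                emitter.complete();
            } catch (Exception e) {
                emitter.completeWithError(e);
            }
        });
        return emitter;
    }

    // Stand-in for the real SOAP calls
    private Iterable<String> callLegacyApis() {
        return java.util.List.of("partial-1", "partial-2", "final");
    }
}
```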
You want some sort of event-based API that the API consumers can subscribe to.
Event-driven architectures come in many forms: from event notification ('hey, I have new data, come and get it') to message/payload delivery, and from full-on publish/subscribe solutions that allow consumers to subscribe to one or more "topics", with event back-up and replay functionality, to relatively basic ones.
If you don't want a full-on eventing platform, you could look at WebHooks.
A great way to get started is to familiarize yourself with some event-based architecture patterns. That last link is to Chris Richardson's website; he has a lot of great info on such architectures, and it is well worth a look.
In terms of defining the event API: if you're familiar with OpenAPI, there's AsyncAPI, which is the async equivalent.
In terms of solutions, there are a few well-known platforms, including open source ones. The big cloud providers (Azure, GCP and AWS) also have async / event-based services you can use.
For more background there's this Wikipedia page (which I have not read, so I can't speak for its quality, but it does look detailed).
Update: Webhooks
Webhooks are a bit like an iceberg; there's more to them than might appear at first glance. A full-on eventing solution will have a very steep learning curve, but it will solve problems that you'd otherwise have to address separately (write your own code, etc). Two big areas to think about:
Consumer management. How will you onboard new consumers? Is it a small handful of internal systems / URLs that you can manage through some basic config, manually? Or is it external facing for public third parties? If it's the latter, will you need to provide auto-provisioning through a secure developer portal or get them to email/submit details for manual set-up at your end?
Error handling & missed events. Let's say you have an event and you call the subscribing webhook - but there's no response (or an error). What do you do? Do you retry? If so, how often, and for how long? Once the consumer is back up, what do you do - did you save the old events to replay? How many events? How do you track who has received what?
Polling
@Arnon is right to mention polling as an approach, but I'd only do it if you have no other choice, or if you have a very small number of internal systems doing the polling (i.e. it incurs low load) and you control both "ends" of the polling; in such a scenario it's a valid approach.
But if it's for an external API you'll need to implement throttling to protect your systems, as you'll have limited control over who's calling you and how often. Caching is another obvious topic to explore in a polling approach; a minimal sketch of the polling contract follows.
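For illustration, the polling contract mentioned above might look like this (a sketch; the paths and the in-memory store are hypothetical stand-ins):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
public class JobController {

    private final Map<String, String> results = new ConcurrentHashMap<>();

    // Kick off the slow legacy work and immediately hand back a poll URL.
    @PostMapping("/jobs")
    public ResponseEntity<Void> submit(@RequestBody String payload) {
        String id = UUID.randomUUID().toString();
        startLegacyCallAsync(id, payload); // fills `results` when done
        return ResponseEntity.accepted()
                .header("Location", "/jobs/" + id)
                .build();
    }

    // The UI polls the returned URL: 202 while pending, 200 with the body when done.
    @GetMapping("/jobs/{id}")
    public ResponseEntity<String> poll(@PathVariable String id) {
        String result = results.get(id);
        return result == null
                ? ResponseEntity.status(HttpStatus.ACCEPTED).build()
                : ResponseEntity.ok(result);
    }

    // Stand-in for the real asynchronous SOAP call
    private void startLegacyCallAsync(String id, String payload) {
        new Thread(() -> results.put(id, "result-for-" + payload)).start();
    }
}
```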

Synchronous response with Apache Flink

I could not find any answer to my question on the web so far, so I thought it would be good to ask here.
I know that Apache Flink is by design asynchronous, but I was wondering if there is any project or design, which aims to build a synchronous pipeline with Flink.
By a synchronous response I mean, for example, having an API endpoint where I send my data, the processing is done by Flink, and the outcome of the processing is returned (in whatever form) in the body of the response to the API call, e.g. a 200.
I already looked into RabbitMQ RPC, but I was not able to implement it successfully.
I'm happy for any direction or suggestion.
Thanks,
Jon
The closest thing that comes to mind is deploying a Flink job with the TcpSource available in Apache Bahir. You could have an HTTP endpoint that receives some data, calls Flink on the specified address, then processes it and creates a response. The problem is that only TcpSource is available in Bahir, which means you would need to write a large part of the code (the whole sink) yourself.
There can also be other ways of doing that, like assigning an id to each message, waiting for the message with that id to arrive on Kafka, and sending it as the response - but that seems troublesome and error-prone (see the sketch below).
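A sketch of that correlation-id idea, for completeness (the topic names are made up, and this inherits all the caveats above):

```java
import java.time.Duration;
import java.util.List;
import java.util.UUID;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FlinkRequestReply {

    // Send the payload keyed by a fresh correlation id, then poll the result
    // topic until a record with the same key shows up (or we give up).
    static String callFlink(KafkaProducer<String, String> producer,
                            KafkaConsumer<String, String> consumer,
                            String payload, long timeoutMs) {
        String correlationId = UUID.randomUUID().toString();
        producer.send(new ProducerRecord<>("flink-requests", correlationId, payload));

        consumer.subscribe(List.of("flink-results"));
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
            for (ConsumerRecord<String, String> record : records) {
                if (correlationId.equals(record.key())) {
                    return record.value();
                }
            }
        }
        throw new IllegalStateException("Timed out waiting for Flink result");
    }
}
```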
The other way would be to make the response asynchronous (I know the question specifically mentions a sync response, but I mention this for the sake of completeness).
However, I would say that this seems like a misuse of Flink to me. Flink was primarily designed to allow real-time computations on multiple nodes, which doesn't seem to be the case here. I would suggest looking into different streaming libraries that are much more lightweight, easier to compose, and can offer the functionality you want out of the box. You may want to take a look at Akka Streams, for example.

Is it okay to have dataloader and rest api caching?

I'm building a GraphQL server to wrap multiple RESTful APIs. Some of the APIs I will be integrating are third party and some are our own. We use Redis as a caching layer. Is it okay if I implement dataloader caching on GraphQL? Will it have an effect on my existing Redis caching?
Dataloader does not serve only one purpose; in fact, it serves three.
Caching: You mentioned caching. I am assuming that you are building a GraphQL gateway/proxy in front of your REST APIs. Caching in this case means that when you need a specific resource, and later on you need it again, you can reach back to the cached value. This caching happens in the memory of your JavaScript application and usually does not conflict with any other type of caching, e.g. on the network.
Batching: Since queries can be nested quite deeply you will ultimately get to a point where you request multiple values of the same resource type in different parts of your query execution. Dataloader will basically collect them and resolve resources like a cascade. Requests flow into a queue and are kept there until the execution cycle ends. Then they are all "released" at once (and probably can be resolved in batches). Also the delivered Promises are all resolved at once (even if some results come in earlier than others). This allows the next execution level to also happen within one cycle.
Deduplication: Let's say you fetch a list of BlogPost items with a field author of type User. In this list, multiple blog posts have been written by the same author. When the same key is requested twice, it will only be delivered once to the batch function. Dataloader will then take care of delivering the resources back by resolving the respective promises.
The point is that (1) and (3) can be achieved with a decent HTTP client that caches requests (and not only responses; that means it does not fire another request when one is already running for that resource). This means that the interesting question is whether your REST API supports batch requests (e.g. api/user/1,2 in one request instead of api/user/1 and api/user/2). If so, using dataloader can massively improve the performance of your API.
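To make the batching and deduplication behaviour concrete, here's a sketch in Java terms using the java-dataloader library (the semantics match the JavaScript original; the user-fetching function is a stand-in for your REST call):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

import org.dataloader.BatchLoader;
import org.dataloader.DataLoader;

public class UserLoaderExample {
    public static void main(String[] args) {
        // One batch function per resource type: it receives all keys queued
        // during the execution cycle and resolves them in a single request.
        BatchLoader<Integer, String> userBatchLoader = ids ->
                CompletableFuture.supplyAsync(() -> fetchUsers(ids));

        DataLoader<Integer, String> userLoader = DataLoader.newDataLoader(userBatchLoader);

        // Three load() calls, two distinct keys: key 1 is deduplicated.
        CompletableFuture<String> a = userLoader.load(1);
        CompletableFuture<String> b = userLoader.load(2);
        CompletableFuture<String> c = userLoader.load(1);

        userLoader.dispatch(); // fires one batch: fetchUsers([1, 2])

        System.out.println(a.join() + ", " + b.join() + ", " + c.join());
    }

    // Stand-in for e.g. GET api/user/1,2
    static List<String> fetchUsers(List<Integer> ids) {
        return ids.stream().map(id -> "user-" + id).collect(Collectors.toList());
    }
}
```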
Maybe you want to look into what Apollo is building right now with their RESTDatasource: https://www.apollographql.com/docs/apollo-server/v2/features/data-sources.html#REST-Data-Source

Concurrency & Parallelism in AppEngine

I am learning App Engine and have created a Spring-based application with a controller that accepts all incoming requests. There is just one method in the controller, which is used to populate 5 tables in BigQuery. So, I have 5 separate methods to insert data into BigQuery. I am calling each of these methods sequentially, one at a time, in my controller method. But I want to execute these 5 BQ methods in parallel, not in sequence. How can I achieve such parallelism in an App Engine app?
There are two different strategies you can use on GAE - concurrency and deferred approaches. Both have a few flavours.
Concurrency
There are two basic flavours of this, relying on async APIs or creating background threads.
Most of the GAE platform APIs are asynchronous (or can be), and you can invoke several of them at once and then block until they've all resolved. In this case, you could make 5 asynchronous calls to BigQuery using the UrlFetchService, as sketched below.
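For instance, with the URL Fetch async API (a sketch; the insert URLs are placeholders, and real BigQuery calls would need auth headers and payloads):

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;

import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchService;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;

public class ParallelInserts {
    public void insertAll(List<URL> insertUrls) throws Exception {
        URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();

        // Fire all five requests without blocking...
        List<Future<HTTPResponse>> pending = new ArrayList<>();
        for (URL url : insertUrls) {
            pending.add(fetcher.fetchAsync(url));
        }

        // ...then block until every call has resolved.
        for (Future<HTTPResponse> future : pending) {
            HTTPResponse response = future.get();
            if (response.getResponseCode() >= 300) {
                throw new IllegalStateException("Insert failed: " + response.getResponseCode());
            }
        }
    }
}
```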
GAE also allows the creation of background threads for the duration of a request. All threads must complete before the result is returned to the client. This is generally the least idiomatic approach for GAE.
Deferred processing
GAE offers two flavours of task queue, push and pull.
Push queues are basically queued tasks executed against a specified URL at a rate you control. They can participate in transactions, have retry rules, etc. They can be used to ensure a workload is executed, independently of the initiating request. This is the most idiomatic solution for the general problem of 'background work' on GAE; a sketch follows.
Pull queues are queues that wait for an initiating request to slurp some data out for processing, usually in bulk. They're typically triggered by cron jobs.
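For the push-queue route, enqueueing the work looks roughly like this (a sketch; the worker URL and parameter are placeholders):

```java
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class EnqueueInserts {
    public void enqueue(String rowJson) {
        Queue queue = QueueFactory.getDefaultQueue();
        // GAE POSTs to /tasks/bq-insert at the queue's configured rate,
        // retrying on failure, independently of this request.
        queue.add(TaskOptions.Builder
                .withUrl("/tasks/bq-insert")
                .param("row", rowJson));
    }
}
```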
In your case, your best bet is to use async HTTP requests, unless you're using an SDK/API wrapper that doesn't expose them; if so, look to task queues. Almost any app you build will end up using them anyway, and they're very graceful and simple to comprehend.

GWT RPC with Google App Engine (GAE) has delay causing performance problems

We are sending a query from a GWT client by RPC to the GAE Java server. The response is a fairly complex object tree. The RPC implementation on the server takes 900ms from start to finish, but the total HTTP request takes 4-5 seconds. We have checked that the actual transmission time and ping time are negligible. (An RPC with a void response takes 300ms, and the actual transmission time is small.)
I thought maybe the serialization of the response could be taking time, but when we call that explicitly on the server using RPC.encodeResponseForSuccess, it takes just 50ms.
So we have 3-4 seconds of overhead completely unaccounted for, and I'm at a loss how to debug that. We even tried sending the serialized RPC response back using a servlet instead of RPC, and sure enough the very same response took ~1s instead of 5!
Thanks!
You're forgetting the client-side serialization time for the request data and the deserialization time for the response data.
Compile your app with -style PRETTY and run it through the Chrome Dev Tools profiler, the IE Dev Tools profiler, dynaTrace Ajax (http://ajax.dynatrace.com/ajax/en/) or a similar JavaScript profiling tool to see where your time goes.
Very large responses will take a long time to deserialize. If you use a lot of BigDecimal values in your response, it will take even longer because of the really complex nature of the emulation code (this one is a killer).
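If you want to confirm this on the client, here's a sketch using GWT's Duration (the service interface and result type below are stand-ins for your own):

```java
import com.google.gwt.core.client.Duration;
import com.google.gwt.user.client.rpc.AsyncCallback;

// Stand-ins for your generated async service interface and result type.
interface MyResult { }
interface MyServiceAsync {
    void getComplexTree(AsyncCallback<MyResult> callback);
}

public class TimedCall {
    void fetch(MyServiceAsync service) {
        final Duration duration = new Duration();
        service.getComplexTree(new AsyncCallback<MyResult>() {
            @Override
            public void onSuccess(MyResult result) {
                // By the time onSuccess runs, GWT has already deserialized the
                // payload, so this includes client-side deserialization time.
                System.out.println("RPC round trip: " + duration.elapsedMillis() + " ms");
            }

            @Override
            public void onFailure(Throwable caught) {
                System.out.println("RPC failed after " + duration.elapsedMillis() + " ms");
            }
        });
    }
}
```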
