I have a rest endpoint sample.org which returns a json response of the form
{
"response" : "pending"
}
My route looks like this
from("http://sample.org")
.marshal(xmlFormatConverterUtil.getxmlJsonDataFormat()) // to convert into JSON, as I receive the data in XML format and it needs to be converted to JSON
I read about the polling consumer but couldn't find an example of how to keep polling the endpoint until it returns a response of "success".
Should a polling consumer be used? If so, can an example relevant to my case be illustrated? Any other resource on polling REST endpoints would be highly useful.
You need to start from a timer instead and then call the REST endpoint. Then you can check the result and, if it succeeded, stop the route using the control bus. A filter can be used to check whether the response is still pending and, if so, stop routing the current exchange, so the next timer fire will try again.
Something along the lines of this pseudo route:
from timer
to http
marshal
filter (if pending)
stop
end
to something with positive response
to controlbus stop route
You can find more details at
http://camel.apache.org/timer
http://camel.apache.org/controlbus
http://camel.apache.org/how-can-i-stop-a-route-from-a-route.html
http://camel.apache.org/message-filter.html
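Putting the pseudo route together, a minimal sketch could look like this (the route id, the timer period and the simple body().contains() check are assumptions; swap in your own data format and a stricter JSON predicate as needed):
from("timer:pollStatus?period=5000").routeId("pollStatus")
    .to("http://sample.org")
    .marshal(xmlFormatConverterUtil.getxmlJsonDataFormat())
    .filter(body().contains("pending"))
        .stop()                       // still pending: stop this exchange, the next timer fire retries
    .end()
    .to("direct:handleSuccess")       // only non-pending (e.g. "success") responses reach this point
    .to("controlbus:route?routeId=pollStatus&action=stop&async=true");
Note the async=true on the control bus endpoint, so the route is stopped asynchronously rather than from within its own thread.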
I had a similar problem and ended up writing a custom endpoint for polling.
It works as a producer and polls the specified URI until the specified predicate is met or polling reaches the maximum number of tries.
from("direct:start")
.to("poll:http://example.com/status?maxRetries=3&successPredicate=#statusSuccess")
The polling endpoint uses a simple processor that relies on a polling consumer to do the polling.
public class PollProcessor implements Processor {
private static final Logger log = LoggerFactory.getLogger(PollProcessor.class);
private final String uri;
private final long requestTimeoutMs;
private final long period;
private final int maxTries;
private final Predicate<Exchange> successPredicate;
public PollProcessor(String uri, long requestTimeoutMs, long period, int maxTries, Predicate<Exchange> successPredicate) {
Preconditions.checkArgument(maxTries > 0);
Preconditions.checkArgument(period >= 0);
Preconditions.checkNotNull(successPredicate);
this.uri = uri;
this.requestTimeoutMs = requestTimeoutMs;
this.period = period;
this.maxTries = maxTries;
this.successPredicate = successPredicate;
}
@Override
public void process(Exchange exchange) throws Exception {
PollingConsumer consumer = exchange.getContext().getEndpoint(uri).createPollingConsumer();
consumer.start();
try {
for (int tryNumber = 1; tryNumber <= maxTries; ++tryNumber) {
Exchange pollExchange = consumer.receive(requestTimeoutMs);
// receive() may return null on timeout, so guard before testing the predicate
if (pollExchange != null && successPredicate.test(pollExchange)) {
exchange.setOut(pollExchange.getOut());
exchange.setException(pollExchange.getException());
return;
}
log.warn("Polling {} failed try number {}, waiting {} ms for next try...", uri, tryNumber, period);
Thread.sleep(period);
}
} finally {
consumer.stop();
}
throw new RuntimeException("Polling failed maximum allowed number of tries [" + maxTries + "], see log for details.");
}
}
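If you don't want to go as far as wrapping it in a custom component, the same processor can also be used directly in a route, for example (the URI, the timeouts and the predicate are placeholders):
from("direct:start")
    .process(new PollProcessor("http://example.com/status", 5000, 1000, 3,
            ex -> ex.getIn().getBody(String.class).contains("success")));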
Related
We are migrating a Spark job to Flink. We have used pre-shuffle aggregation in Spark. Is there a way to execute a similar operation in Flink? We are consuming data from Apache Kafka. We are using a keyed tumbling window to aggregate the data. We want to aggregate the data in Flink before performing the shuffle.
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
Yes, it is possible, and I will describe three ways. The first is already built into the Flink Table API. For the second you have to build your own pre-aggregate operator. The third is a dynamic pre-aggregate operator which adjusts the number of events to pre-aggregate before the shuffle phase.
Flink Table API
As shown here, you can do MiniBatch Aggregation or Local-Global Aggregation. The second option is better. You basically tell Flink to create mini-batches of 5000 events and pre-aggregate them before the shuffle phase.
// instantiate table environment
TableEnvironment tEnv = ...
// access flink configuration
Configuration configuration = tEnv.getConfig().getConfiguration();
// set low-level key-value options
configuration.setString("table.exec.mini-batch.enabled", "true");
configuration.setString("table.exec.mini-batch.allow-latency", "5 s");
configuration.setString("table.exec.mini-batch.size", "5000");
configuration.setString("table.optimizer.agg-phase-strategy", "TWO_PHASE");
Flink Stream API
This way is more cumbersome because you have to create your own operator using OneInputStreamOperator and call it using doTransform(). Here is an example of the BundleOperator.
public abstract class AbstractMapStreamBundleOperator<K, V, IN, OUT>
extends AbstractUdfStreamOperator<OUT, MapBundleFunction<K, V, IN, OUT>>
implements OneInputStreamOperator<IN, OUT>, BundleTriggerCallback {
@Override
public void processElement(StreamRecord<IN> element) throws Exception {
// get the key and value for the map bundle
final IN input = element.getValue();
final K bundleKey = getKey(input);
final V bundleValue = this.bundle.get(bundleKey);
// get a new value after adding this element to bundle
final V newBundleValue = userFunction.addInput(bundleValue, input);
// update to map bundle
bundle.put(bundleKey, newBundleValue);
numOfElements++;
bundleTrigger.onElement(input);
}
@Override
public void finishBundle() throws Exception {
if (!bundle.isEmpty()) {
numOfElements = 0;
userFunction.finishBundle(bundle, collector);
bundle.clear();
}
bundleTrigger.reset();
}
}
The callback interface defines when you are going to trigger the pre-aggregation. Every time the stream reaches the bundle limit at if (count >= maxCount), your pre-aggregate operator will emit events to the shuffle phase.
public class CountBundleTrigger<T> implements BundleTrigger<T> {
private final long maxCount;
private transient BundleTriggerCallback callback;
private transient long count = 0;
public CountBundleTrigger(long maxCount) {
Preconditions.checkArgument(maxCount > 0, "maxCount must be greater than 0");
this.maxCount = maxCount;
}
@Override
public void registerCallback(BundleTriggerCallback callback) {
this.callback = Preconditions.checkNotNull(callback, "callback is null");
}
@Override
public void onElement(T element) throws Exception {
count++;
if (count >= maxCount) {
callback.finishBundle();
reset();
}
}
@Override
public void reset() {
count = 0;
}
}
Then you call your operator using the doTransform:
myStream.map(....)
.doTransform(metricCombiner, info, new RichMapStreamBundleOperator<>(myMapBundleFunction, bundleTrigger, keyBundleSelector))
.map(...)
.keyBy(...)
.window(TumblingProcessingTimeWindows.of(Time.seconds(20)))
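For completeness, a rough sketch of the MapBundleFunction passed to the operator could look like the following; the word-count style types and the exact method signatures are inferred from the operator code above, so treat them as assumptions:
public class SumBundleFunction extends MapBundleFunction<String, Long, Tuple2<String, Long>, Tuple2<String, Long>> {

    @Override
    public Long addInput(Long value, Tuple2<String, Long> input) {
        // merge the incoming element into the running pre-aggregate for its key
        return value == null ? input.f1 : value + input.f1;
    }

    @Override
    public void finishBundle(Map<String, Long> bundle, Collector<Tuple2<String, Long>> out) {
        // flush one partially aggregated record per key to the shuffle phase
        for (Map.Entry<String, Long> entry : bundle.entrySet()) {
            out.collect(Tuple2.of(entry.getKey(), entry.getValue()));
        }
    }
}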
A dynamic pre-aggregation
In case you wish to have a dynamic pre-aggregate operator, check the AdCom - Adaptive Combiner for stream aggregation. It basically adjusts the pre-aggregation based on backpressure signals, which results in making the best possible use of the shuffle phase.
I have written a small test case code in Flink to sort a datastream. The code is as follows:
public enum StreamSortTest {
;
public static class MyProcessWindowFunction extends ProcessWindowFunction<Long,Long,Integer, TimeWindow> {
@Override
public void process(Integer key, Context ctx, Iterable<Long> input, Collector<Long> out) {
List<Long> sortedList = new ArrayList<>();
for(Long i: input){
sortedList.add(i);
}
Collections.sort(sortedList);
sortedList.forEach(l -> out.collect(l));
}
}
public static void main(final String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(2);
env.getConfig().setExecutionMode(ExecutionMode.PIPELINED);
DataStream<Long> probeSource = env.fromSequence(1, 500).setParallelism(2);
// range partition the stream into two parts based on data value
DataStream<Long> sortOutput =
probeSource
.keyBy(x->{
if(x<250){
return 1;
} else {
return 2;
}
})
.window(TumblingProcessingTimeWindows.of(Time.seconds(20)))
.process(new MyProcessWindowFunction())
;
sortOutput.print();
System.out.println(env.getExecutionPlan());
env.executeAsync();
}
}
However, the code just outputs the execution plan and a few other lines. But it doesn't output the actual sorted numbers. What am I doing wrong?
The main problem I can see is that you are using a processing-time based window with very short input data, which will surely be processed in less than 20 seconds. While Flink is able to detect the end of input (in the case of a stream from a file or a sequence, as in your case) and generate a Long.MAX_VALUE watermark, which closes all open event-time windows and fires all event-time timers, it doesn't do the same thing for processing-time based computations. So in your case you need to make sure yourself that Flink actually runs long enough for your window to close, or switch to a custom trigger or a different time characteristic.
One other thing I am not sure about, since I never used it that much, is whether you should use executeAsync() for local execution, since that's basically meant for situations when you don't want to wait for the result of the job, according to the docs here.
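For what it's worth, here is a minimal sketch of one way to make the windows fire for such a bounded source: switch to event time, so the final Long.MAX_VALUE watermark closes the window, and block with execute(). Reusing the element value as the event timestamp is purely an assumption for this toy example.
DataStream<Long> probeSource = env.fromSequence(1, 500)
        .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Long>forMonotonousTimestamps()
                        .withTimestampAssigner((value, ts) -> value)); // element value used as event timestamp

DataStream<Long> sortOutput = probeSource
        .keyBy(x -> x < 250 ? 1 : 2)
        .window(TumblingEventTimeWindows.of(Time.seconds(20)))
        .process(new MyProcessWindowFunction());

sortOutput.print();
env.execute(); // blocks until the bounded job finishes, so the sorted output is actually printed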
I am implementing Circuit breaker using Hystrix in my Spring boot application, my code is something like below:
@Service
public class MyServiceHandler {
@HystrixCommand(fallbackMethod="fallback")
public String callService() {
// if the remote service is not reachable,
// throw ServiceException
}
public String fallback() {
// return default response
}
}
// In application.properties, I have below properties defined:
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=10000
hystrix.command.default.circuitBreaker.requestVolumeThreshold=3
hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds=30000
hystrix.threadpool.default.coreSize=4
hystrix.threadpool.default.metrics.rollingStats.timeInMilliseconds=200000
I see that fallback() is getting called with each failure of callService(). However, the circuit is not opening after 3 failures. After 3 failures, I was expecting that it would directly call fallback() and skip callService(), but this is not happening. Can someone advise what I am doing wrong here?
Thanks,
B Jagan
Edited on 26th July to add more details below:
Below is the actual code. I played a bit further with this. I see that the circuit opens as expected on repeated failures when I call the remote service directly in the RegistrationHystrix.registerSeller() method. But when I wrap the remote service call within a Spring retry template, it keeps going into the fallback method, yet the circuit never opens.
@Service
public class RegistrationHystrix {
Logger logger = LoggerFactory.getLogger(RegistrationHystrix.class);
private RestTemplate restTemplate;
private RetryTemplate retryTemplate;
public RegistrationHystrix(RestTemplate restTemplate) {
this.restTemplate = restTemplate;
retryTemplate = new RetryTemplate();
FixedBackOffPolicy fixedBackOffPolicy = new FixedBackOffPolicy();
fixedBackOffPolicy.setBackOffPeriod(1000l);
retryTemplate.setBackOffPolicy(fixedBackOffPolicy);
SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy();
retryPolicy.setMaxAttempts(3);
retryTemplate.setRetryPolicy(retryPolicy);
}
@HystrixCommand(fallbackMethod = "fallbackForRegisterSeller", commandKey = "ordermanagement")
public String registerSeller(SellerDto sellerDto) throws Exception {
String response = retryTemplate.execute(new RetryCallback<String, Exception>() {
@Override
public String doWithRetry(RetryContext context) {
logger.info(String.format("Retry count %d", context.getRetryCount()));
return restTemplate.postForObject("/addSeller", sellerDto, String.class);
}
});
return response;
}
public List<SellerDto> getSellersList() {
return restTemplate.getForObject("/sellersList", List.class);
}
public String fallbackForRegisterSeller(SellerDto sellerDto, Throwable t) {
logger.error("Inside fall back, cause - {}", t.toString());
return "Inside fallback method. Some error occured while calling service for seller registration";
}
}
Below is the service class which in turn calls the above Hystrix wrapped service. This class in turn is invoked by a controller.
@Service
public class RegistrationServiceImpl implements RegistrationService {
Logger logger = LoggerFactory.getLogger(RegistrationServiceImpl.class);
private RegistrationHystrix registrationHystrix;
public RegistrationServiceImpl(RegistrationHystrix registrationHystrix) {
this.registrationHystrix = registrationHystrix;
}
@Override
public String registerSeller(SellerDto sellerDto) throws Exception {
long start = System.currentTimeMillis();
String registerSeller = registrationHystrix.registerSeller(sellerDto);
logger.info("add seller call returned in - {}", System.currentTimeMillis() - start);
return registerSeller;
}
}
So, I am trying to understand why the Circuit breaker is not working as expected when using it along with Spring RetryTemplate.
You should look at metrics.healthSnapshot.intervalInMilliseconds while testing. I guess you are executing all 3 requests within the default 500 ms, and hence the circuit isn't getting opened. You can either decrease this interval or put a sleep between the 3 requests.
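For example, on top of the properties from the question, something like the following (the exact values are only illustrative) makes the health snapshot refresh often enough for the breaker to open during a quick manual test:
# take health snapshots every 10 ms instead of the default 500 ms (illustrative value)
hystrix.command.default.metrics.healthSnapshot.intervalInMilliseconds=10
# open the circuit once 3 requests were seen in the rolling window and at least 50% of them failed
hystrix.command.default.circuitBreaker.requestVolumeThreshold=3
hystrix.command.default.circuitBreaker.errorThresholdPercentage=50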
I am writing a component (an endpoint) that will receive the Camel Exchange like this:
from("file|activemq|whatever").to(myEndpoint);
Upon receiving it, I want to pass the exchange to a set of subroutines, which may work asynchronously, and which will eventually decide that they have finished, possibly having composed a response in the Out message of the Exchange. All of this may happen outside the Camel context; I am working only with the Exchange object.
Then my subroutines should invoke something that will tell Camel that it should propagate the response back, do other stuff as per the source and intermediate components' requirements (for example, if it is a file: endpoint, rename the file) and consider the routing of this Exchange completed.
I was thinking that I would invoke the Exchange's Unit of Work done() method.
Unfortunately I am noticing that Camel still tries to end the exchange by itself too, at the wrong time and in the wrong state. For example, for a file source, it fails to rename the file, which has already been removed.
Here is some of my code:
Here I define an endpoint:
_proceeder = new DefaultEndpoint() {
private final String _defaultUri = "rex:producer-" + UUID.randomUUID().toString();
@Override
protected String createEndpointUri() {
return _defaultUri;
}
@Override
public Producer createProducer() throws Exception {
return new DefaultAsyncProducer(this) {
@Override
public boolean process(final Exchange exchange1, final AsyncCallback callback) {
final ExchangeWrapper exchange = new ExchangeWrapper(_uri, exchange1, MessageSystem.this);
_LOG.debug("Got input for {}. Processing...", _uri);
exchange._taken(); // 1. all subsequent will increase by 1
/// some majick....
final boolean done = exchange._released(); // if all were released immediately, itll be 0 and sent back now. otherwise the last to release will send it back.
if (done) {
_LOG.debug("Processed input for {} synchronously", _uri);
//callback.done(true);
} else {
_LOG.debug("Processed input for {} asynchronously, awaiting response", _uri);
//exchange1.addOnCompletion(new Synchronization() {
// @Override
// public void onComplete(Exchange exchange) {
// onFailure(exchange);
// }
//
// @Override
// public void onFailure(Exchange exchange) {
// callback.done(false);
// }
//});
}
return done;
}
};
}
@Override
public Consumer createConsumer(Processor processor) throws Exception {
throw new UnsupportedOperationException("Not supported yet."); //To change body of generated methods, choose Tools | Templates.
}
@Override
public boolean isSingleton() {
return true;
}
};
_proceeder.setCamelContext(context);
Needless to say, I don't understand why I am given an AsyncCallback in my DefaultAsyncProducer.process() method; regardless of whether I call its done() method, the system doesn't see this and still ends the exchange by itself once more. But that is not the question.
Here are the ExchangeWrapper _released() and _done() methods:
private void _done() throws Exception {
UnitOfWork uow = _exchange.getUnitOfWork();
uow.done(_exchange);
//try{
// uow.stop();
//}catch(Exception e){
//
//}
_exchange.setUnitOfWork(null);
}
private boolean _released() {
final boolean ret;
final int cnt;
final int trancnt;
synchronized (_exchange) {
cnt = _exchange.getProperty("rex.takenCount", Integer.class) - 1;
_exchange.setProperty("rex.takenCount", cnt);
trancnt = _exchange.getProperty("rex.takenAsTransient", Integer.class);
}
if (_LOG.isDebugEnabled()) {
_LOG.debug("Input for {} released. {} times left, {} transient", new Object[]{_exchange.getProperty("rex.uri", String.class), cnt, trancnt});
}
if (cnt <= 0 || cnt <= trancnt) {
if (_LOG.isDebugEnabled()) {
_LOG.debug("Message for {} is processed by all non-transient receivers. Setting done...", new Object[]{_exchange.getProperty("rex.uri", String.class)});
}
_done();
ret = true;
if (_LOG.isDebugEnabled()) {
_LOG.debug("Message for {} is set done", new Object[]{_exchange.getProperty("rex.uri", String.class)});
}
} else {
ret = false;
}
return ret;
}
So basically I wrap the Exchange to keep state and decide when the processing should be stopped.
While digging into the Camel internals I've seen some similar counters that keep track of how many times the Exchange has been taken for processing, but I'd like to be in control, thus my own wrapper.
So what should I call instead of
_exchange.getUnitOfWork().done(_exchange);
to tell the Camel Internal Processor and others that there is no need to mark the exchange done because I am doing it?
My latest finding is to call uow.stop() so that it clears all the 'after' processors etc., but I realized that I could spend a long time trying to hack Camel myself, and it's better to ask people who know exactly what to do without trying and guessing.
These are examples of my routes:
RouteBuilder rb = new RouteBuilder(_context) {
@Override
public void configure() throws Exception {
if (_tokenizer != null) {
from(_uri).split().method(_tokenizer, "tokenizeReader").streaming().to(_proceeder);
} else {
from(_uri).to(_proceeder);
}
}
};
If I could avoid building routes, instantiating endpoints and producers, and employ standalone processors instead, I'd happily do so, but I don't want to ditch what the marvelous Camel project has to offer in terms of splitting, streaming, marshalling, etc.; and all of that seems to be built around routes.
Maybe I am not clear on what you are trying to achieve with this, but let me try.
Upon receiving, I want it to pass the exchange to a set of
subroutines, which may work asynchronously, and which will eventually
decide that they have finished
So for this you can write a processor and configure it at the end of your route. Within your processor you can use a thread pool, submit the subroutine tasks to it, wait for their completion and decide if you want to change the message body (the correct way is explained here, with a good diagram of the flow of an exchange through a route), and Camel will automatically take care of returning the response to the caller of the route based on the exchange pattern. For example, in your case, if the route begins from a file/activemq endpoint then it is an event-based/one-way exchange and no response will be sent to the caller, as there is no caller client as such; it is just an event which initiates the exchange.
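A minimal sketch of such a processor (the class name and the subroutine methods are made up for illustration):
public class SubroutinesProcessor implements Processor {

    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    @Override
    public void process(Exchange exchange) throws Exception {
        // run the subroutines on the pool and block until all of them are done
        List<Callable<Void>> tasks = Arrays.asList(
                () -> { subroutineA(exchange); return null; },
                () -> { subroutineB(exchange); return null; });
        for (Future<Void> future : pool.invokeAll(tasks)) {
            future.get(); // rethrows any failure from a subroutine
        }
        // optionally set the reply body; Camel returns it to the caller only for InOut exchanges
        exchange.getIn().setBody("all subroutines finished");
    }

    private void subroutineA(Exchange exchange) { /* ... */ }
    private void subroutineB(Exchange exchange) { /* ... */ }
}
In the route it is simply configured as from(...).process(new SubroutinesProcessor()).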
Update:
To use the async processing feature in Camel for enhanced scalability, take a look at this example with code from the highly recommended Camel in Action book.
I'm new to Flink and I work with DataSet API. After a whole bunch of processing as the last stage I need to normalize one of the values by dividing it by its maximum value. So, I have used the .max() operator to take the max and later I'm passing the result as constructor's argument to the MapFunction.
This works, however all the processing is performed twice: one job is executed to find the max values, and later another job is executed to create the final result (starting execution from the beginning)... Is there any workaround to execute the whole dataflow only once?
final List<Tuple6<...>> maxValues = result.max(2).collect();
assert maxValues.size() == 1;
result.map(new NormalizeAttributes(maxValues.get(0))).writeAsCsv(...)
@FunctionAnnotation.ForwardedFields("f0; f1; f3; f4; f5")
@FunctionAnnotation.ReadFields("f2")
private static class NormalizeAttributes implements MapFunction<Tuple6<...>, Tuple6<...>> {
private final Tuple6<...> maxValues;
public NormalizeAttributes(Tuple6<...> maxValues) {
this.maxValues = maxValues;
}
@Override
public Tuple6<...> map(Tuple6<...> value) throws Exception {
value.f2 /= maxValues.f2;
return value;
}
}
collect() immediately triggers an execution of the program up to the dataset requested by collect(). If you later call env.execute() or collect() again, the program is executed a second time.
Besides the side effect of execution, using collect() to distribute values to a subsequent transformation also has the drawback that the data is transferred to the client and later back into the cluster. Flink offers so-called broadcast variables to ship a DataSet as a side input into another transformation.
Using Broadcast variables in your program would look as follows:
DataSet maxValues = result.max(2);
result
.map(new NormAttrs()).withBroadcastSet(maxValues, "maxValues")
.writeAsCsv(...);
The NormAttrs function would look like this:
private static class NormAttrs extends RichMapFunction<Tuple6<...>, Tuple6<...>> {
private Tuple6<...> maxValues;
@Override
public void open(Configuration config) {
// the broadcast set holds exactly one element: the Tuple6 with the max values
maxValues = (Tuple6<...>) getRuntimeContext().getBroadcastVariable("maxValues").get(0);
}
@Override
public Tuple6<...> map(Tuple6<...> value) throws Exception {
value.f2 /= maxValues.f2;
return value;
}
}
You can find more information about Broadcast variables in the documentation.