I am trying to use a custom HighRepJobPolicy with the App Engine development server. The same class works fine when I use it in my unit tests:
LocalDatastoreServiceTestConfig datastoreConfig = new LocalDatastoreServiceTestConfig();
datastoreConfig.setAlternateHighRepJobPolicyClass(CustomHighRepJobPolicy.class);
But when I try to use this class in the Java development server by adding the JVM flag
-Ddatastore.high_replication_job_policy_class=foo.bar.CustomHighRepJobPolicy
I get a ClassNotFoundException:
Caused by: java.lang.ClassNotFoundException: foo.bar.CustomHighRepJobPolicy
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at com.google.appengine.tools.development.DevAppServerClassLoader.loadClass(DevAppServerClassLoader.java:87)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at com.google.appengine.api.datastore.dev.LocalDatastoreService.initHighRepJobPolicy(LocalDatastoreService.java:426)
... 66 more
Is this expected to work, or has anyone else tried this before? I ask because I could not find any information about this via Google, and the App Engine docs only mention the DefaultHighRepJobPolicy.
Looks like you were early to the party:
public class LocalCustomPolicyHighRepDatastoreTest {
private static final class CustomHighRepJobPolicy implements HighRepJobPolicy {
static int count = 0;
@Override
public boolean shouldApplyNewJob(Key entityGroup) {
// every other new job fails to apply
return count++ % 2 == 0;
}
@Override
public boolean shouldRollForwardExistingJob(Key entityGroup) {
// every other existing job fails to apply
return count++ % 2 == 0;
}
}
private final LocalServiceTestHelper helper =
new LocalServiceTestHelper(new LocalDatastoreServiceTestConfig()
.setAlternateHighRepJobPolicyClass(CustomHighRepJobPolicy.class));
@Before
public void setUp() {
helper.setUp();
}
@After
public void tearDown() {
helper.tearDown();
}
@Test
public void testEventuallyConsistentGlobalQueryResult() {
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
ds.put(new Entity("yam")); // applies
ds.put(new Entity("yam")); // does not apply
// first global query only sees the first Entity
assertEquals(1, ds.prepare(new Query("yam")).countEntities(withLimit(10)));
// second global query sees both Entities because we "groom" (attempt to
// apply unapplied jobs) after every query
assertEquals(2, ds.prepare(new Query("yam")).countEntities(withLimit(10)));
}
}
I am implementing Circuit breaker using Hystrix in my Spring boot application, my code is something like below:
@Service
public class MyServiceHandler {
@HystrixCommand(fallbackMethod="fallback")
public String callService() {
// if the remote service is not reachable,
// throw ServiceException
}
public String fallback() {
// return default response
}
}
// In application.properties, I have below properties defined:
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=10000
hystrix.command.default.circuitBreaker.requestVolumeThreshold=3
hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds=30000
hystrix.threadpool.default.coreSize=4
hystrix.threadpool.default.metrics.rollingStats.timeInMilliseconds=200000
I see that fallback() is getting called on each failure of callService(). However, the circuit is not opening after 3 failures; I expected that after 3 failures it would call fallback() directly and skip callService(), but this is not happening. Can someone advise what I am doing wrong here?
Thanks,
B Jagan
Edited on 26th July to add more details below:
Below is the actual code. I played with this a bit further. I see that the circuit opens as expected on repeated failures when I call the remote service directly in the RegistrationHystrix.registerSeller() method. But when I wrap the remote service call in a Spring RetryTemplate, it keeps going into the fallback method and the circuit never opens.
@Service
public class RegistrationHystrix {
Logger logger = LoggerFactory.getLogger(RegistrationHystrix.class);
private RestTemplate restTemplate;
private RetryTemplate retryTemplate;
public RegistrationHystrix(RestTemplate restTemplate) {
this.restTemplate = restTemplate;
retryTemplate = new RetryTemplate();
FixedBackOffPolicy fixedBackOffPolicy = new FixedBackOffPolicy();
fixedBackOffPolicy.setBackOffPeriod(1000l);
retryTemplate.setBackOffPolicy(fixedBackOffPolicy);
SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy();
retryPolicy.setMaxAttempts(3);
retryTemplate.setRetryPolicy(retryPolicy);
}
@HystrixCommand(fallbackMethod = "fallbackForRegisterSeller", commandKey = "ordermanagement")
public String registerSeller(SellerDto sellerDto) throws Exception {
String response = retryTemplate.execute(new RetryCallback<String, Exception>() {
@Override
public String doWithRetry(RetryContext context) {
logger.info(String.format("Retry count %d", context.getRetryCount()));
return restTemplate.postForObject("/addSeller", sellerDto, String.class);
}
});
return response;
}
public List<SellerDto> getSellersList() {
return restTemplate.getForObject("/sellersList", List.class);
}
public String fallbackForRegisterSeller(SellerDto sellerDto, Throwable t) {
logger.error("Inside fall back, cause - {}", t.toString());
return "Inside fallback method. Some error occured while calling service for seller registration";
}
}
Below is the service class that in turn calls the Hystrix-wrapped service above. This class is in turn invoked by a controller.
@Service
public class RegistrationServiceImpl implements RegistrationService {
Logger logger = LoggerFactory.getLogger(RegistrationServiceImpl.class);
private RegistrationHystrix registrationHystrix;
public RegistrationServiceImpl(RegistrationHystrix registrationHystrix) {
this.registrationHystrix = registrationHystrix;
}
@Override
public String registerSeller(SellerDto sellerDto) throws Exception {
long start = System.currentTimeMillis();
String registerSeller = registrationHystrix.registerSeller(sellerDto);
logger.info("add seller call returned in - {}", System.currentTimeMillis() - start);
return registerSeller;
}
}
So, I am trying to understand why the circuit breaker is not working as expected when using it together with the Spring RetryTemplate.
You should be looking at metrics.healthSnapshot.intervalInMilliseconds while testing. I suspect you are executing all 3 requests within the default 500 ms, which is why the circuit isn't opening. You can either decrease this interval or put a sleep between the 3 requests.
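For example, a minimal sketch of lowering that interval alongside the existing settings in application.properties (the 100 ms value is only an illustration, not a recommendation):
# assumption: recompute circuit-breaker health more often than the default 500 ms
hystrix.command.default.metrics.healthSnapshot.intervalInMilliseconds=100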
Earlier I asked about a simple hello world example for Flink. This gave me some good examples!
However, I would like to ask for a more ‘streaming’ example, where we generate an input value every second. This would ideally be random, but even just the same value each time would be fine.
The objective is to get a stream that ‘moves’ with no/minimal external touch.
Hence my question:
How can I show Flink actually streaming data without external dependencies?
I found how to show this by generating data externally and writing it to Kafka, or by listening to a public source, but I am trying to solve it with minimal dependencies (similar to starting with GenerateFlowFile in NiFi).
Here's an example. It was constructed to show how to make your sources and sinks pluggable: in development you might use a random source and print the results, in tests you might use a hardwired list of input events and collect the results in a list, and in production you'd use the real sources and sinks.
Here's the job:
/*
* Example showing how to make sources and sinks pluggable in your application code so
* you can inject special test sources and test sinks in your tests.
*/
public class TestableStreamingJob {
private SourceFunction<Long> source;
private SinkFunction<Long> sink;
public TestableStreamingJob(SourceFunction<Long> source, SinkFunction<Long> sink) {
this.source = source;
this.sink = sink;
}
public void execute() throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<Long> LongStream =
env.addSource(source)
.returns(TypeInformation.of(Long.class));
LongStream
.map(new IncrementMapFunction())
.addSink(sink);
env.execute();
}
public static void main(String[] args) throws Exception {
TestableStreamingJob job = new TestableStreamingJob(new RandomLongSource(), new PrintSinkFunction<>());
job.execute();
}
// While it's tempting for something this simple, avoid using anonymous classes or lambdas
// for any business logic you might want to unit test.
public static class IncrementMapFunction implements MapFunction<Long, Long> {
@Override
public Long map(Long record) throws Exception {
return record + 1 ;
}
}
}
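Picking up on the comment about testability, a test could inject a hardwired source and a collecting sink. This is only a sketch along those lines; FixedSource and CollectSink are hypothetical names, not part of the original example:
public class TestableStreamingJobTest {

    // Hypothetical source that emits a fixed list of values and then finishes.
    private static class FixedSource implements SourceFunction<Long> {
        @Override
        public void run(SourceContext<Long> ctx) {
            for (long i = 1; i <= 3; i++) {
                ctx.collect(i);
            }
        }
        @Override
        public void cancel() {}
    }

    // Hypothetical sink that collects results into a static list for assertions.
    private static class CollectSink implements SinkFunction<Long> {
        static final List<Long> values = Collections.synchronizedList(new ArrayList<>());
        @Override
        public void invoke(Long value, Context context) {
            values.add(value);
        }
    }

    @Test
    public void incrementsEachRecord() throws Exception {
        CollectSink.values.clear();
        new TestableStreamingJob(new FixedSource(), new CollectSink()).execute();
        assertTrue(CollectSink.values.containsAll(Arrays.asList(2L, 3L, 4L)));
    }
}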
Here's the RandomLongSource:
public class RandomLongSource extends RichParallelSourceFunction<Long> {
private volatile boolean cancelled = false;
private Random random;
@Override
public void open(Configuration parameters) throws Exception {
super.open(parameters);
random = new Random();
}
@Override
public void run(SourceContext<Long> ctx) throws Exception {
while (!cancelled) {
Long nextLong = random.nextLong();
synchronized (ctx.getCheckpointLock()) {
ctx.collect(nextLong);
}
}
}
@Override
public void cancel() {
cancelled = true;
}
}
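The source above emits values as fast as it can. Since the question asks for roughly one value per second, a small tweak (my addition, not part of the original answer) is to sleep between emissions in run():
@Override
public void run(SourceContext<Long> ctx) throws Exception {
    while (!cancelled) {
        Long nextLong = random.nextLong();
        synchronized (ctx.getCheckpointLock()) {
            ctx.collect(nextLong);
        }
        // emit roughly one value per second instead of as fast as possible
        Thread.sleep(1000);
    }
}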
I have a collection that represents a data stream, and I am testing StreamingFileSink to write the stream to S3. The program runs successfully, but there is no data at the given S3 path.
public class S3Sink {
public static void main(String args[]) throws Exception {
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
see.enableCheckpointing(100);
List<String> input = new ArrayList<>();
input.add("test");
DataStream<String> inputStream = see.fromCollection(input);
RollingPolicy<Object, String> rollingPolicy = new CustomRollingPolicy();
StreamingFileSink s3Sink = StreamingFileSink.
forRowFormat(new Path("<S3 Path>"),
new SimpleStringEncoder<>("UTF-8"))
.withRollingPolicy(rollingPolicy)
.build();
inputStream.addSink(s3Sink);
see.execute();
}
}
Checkpointing is enabled as well. Any thoughts on why the sink is not working as expected?
UPDATE:
Based on David's answer, I created a custom source which generates random strings continuously, and I am expecting checkpointing to trigger after the configured interval to write the data to S3.
public class S3SinkCustom {
public static void main(String args[]) throws Exception {
StreamExecutionEnvironment see = StreamExecutionEnvironment.getExecutionEnvironment();
see.enableCheckpointing(1000);
DataStream<String> inputStream = see.addSource(new CustomSource());
RollingPolicy<Object, String> rollingPolicy = new CustomRollingPolicy();
StreamingFileSink s3Sink = StreamingFileSink.
forRowFormat(new Path("s3://mybucket/data/"),
new SimpleStringEncoder<>("UTF-8"))
.build();
//inputStream.print();
inputStream.addSink(s3Sink);
see.execute();
}
static class CustomSource extends RichSourceFunction<String> {
private volatile boolean running = false;
final String[] strings = {"ABC", "XYZ", "DEF"};
@Override
public void open(Configuration parameters){
running = true;
}
@Override
public void run(SourceContext sourceContext) throws Exception {
while (running) {
Random random = new Random();
int index = random.nextInt(strings.length);
sourceContext.collect(strings[index]);
Thread.sleep(1000);
}
}
@Override
public void cancel() {
running = false;
}
}
}
Still, there is no data in S3, and the Flink process does not even validate whether the given S3 bucket is valid; it just keeps running without any issues.
Update:
Below are the custom rolling policy details:
public class CustomRollingPolicy implements RollingPolicy<Object, String> {
@Override
public boolean shouldRollOnCheckpoint(PartFileInfo partFileInfo) throws IOException {
return partFileInfo.getSize() > 1;
}
@Override
public boolean shouldRollOnEvent(PartFileInfo partFileInfo, Object o) throws IOException {
return true;
}
@Override
public boolean shouldRollOnProcessingTime(PartFileInfo partFileInfo, long l) throws IOException {
return true;
}
}
I believe the issue is that the job you've written isn't going to run long enough to actually checkpoint, so the output isn't going to be finalized.
Another potential issue is that the StreamingFileSink only works with the Hadoop-based S3 filesystem (and not the one from Presto).
The above issue was resolved after setting up flink-conf.yaml with the required s3a properties, such as fs.s3a.access.key and fs.s3a.secret.key.
We need to let Flink know about the config location as well.
FileSystem.initialize(GlobalConfiguration.loadConfiguration(""));
With these changes, I was able to run the S3 sink locally, and messages persisted to S3 without any issues.
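For reference, a rough sketch of what that configuration looks like; the key names are the ones mentioned above, while the values and the config directory path are placeholders, not taken from the actual setup:
# flink-conf.yaml (placeholder values)
fs.s3a.access.key: YOUR_ACCESS_KEY
fs.s3a.secret.key: YOUR_SECRET_KEY

// assumption: point Flink at the directory containing flink-conf.yaml
FileSystem.initialize(GlobalConfiguration.loadConfiguration("/path/to/conf-dir"));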
Could anybody help me fix my problem? I get an error when I try to run a jBPM sample in Eclipse. This is the code:
public class ProcessMain {
public static void main(String[] args) {
KieServices ks = KieServices.Factory.get();
KieContainer kContainer = ks.getKieClasspathContainer();
KieBase kbase = kContainer.getKieBase("kbase");
RuntimeManager manager = createRuntimeManager(kbase);
RuntimeEngine engine = manager.getRuntimeEngine(null);
KieSession ksession = engine.getKieSession();
ksession.startProcess("com.sample.bpmn.hello");
manager.disposeRuntimeEngine(engine);
System.exit(0);
}
private static RuntimeManager createRuntimeManager(KieBase kbase) {
JBPMHelper.startH2Server();
JBPMHelper.setupDataSource();
EntityManagerFactory emf = Persistence.createEntityManagerFactory("org.jbpm.persistence.jpa");
RuntimeEnvironmentBuilder builder = RuntimeEnvironmentBuilder.Factory.get()
.newDefaultBuilder().entityManagerFactory(emf)
.knowledgeBase(kbase);
return RuntimeManagerFactory.Factory.get()
.newSingletonRuntimeManager(builder.get(), "com.sample:example:1.0");
}
}
This is the error in the console window:
Exception in thread "main" java.lang.IllegalArgumentException: Driver class name cannot be empty.
at org.kie.test.util.db.internal.DatabaseProvider.fromDriverClassName(DatabaseProvider.java:32)
at org.kie.test.util.db.DataSourceFactory.setupPoolingDataSource(DataSourceFactory.java:57)
at org.kie.test.util.db.DataSourceFactory.setupPoolingDataSource(DataSourceFactory.java:42)
at org.jbpm.test.JBPMHelper.setupDataSource(JBPMHelper.java:103)
at com.sample.ProcessMain.createRuntimeManager(ProcessMain.java:34)
at com.sample.ProcessMain.main(ProcessMain.java:23)
JBPMHelper no longer sets default values for H2; see https://github.com/kiegroup/drools/commit/34293e9675ae4f36f2a3a9e633305bbcc8260d19
We need to use PersistenceUtil.setupPoolingDataSource(); instead of JBPMHelper.setupDataSource();
Also include a datasource.properties file in the resources folder.
datasource.properties -> https://github.com/kiegroup/jbpm/blob/master/jbpm-examples/src/main/resources/datasource.properties
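For H2, that file looks roughly like the following. This is only a sketch based on the linked example; treat the values, especially the JDBC URL, as assumptions and copy the real ones from the file above:
# datasource.properties (illustrative values; see the linked example file)
maxPoolSize=16
allowLocalTransactions=true
driverClassName=org.h2.Driver
className=org.h2.jdbcx.JdbcDataSource
user=sa
password=
url=jdbc:h2:tcp://localhost/~/jbpm-db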
I am running a mapreduce job over 50 million User records.
For each user I read two other Datastore entities and then stream stats for each player to BigQuery.
My first dry run (with streaming to bigquery disabled) failed with the following stacktrace.
/_ah/pipeline/handleTask
com.google.appengine.tools.cloudstorage.NonRetriableException: com.google.apphosting.api.ApiProxy$RequestTooLargeException: The request to API call datastore_v3.Put() was too large.
at com.google.appengine.tools.cloudstorage.RetryHelper.doRetry(RetryHelper.java:121)
at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:166)
at com.google.appengine.tools.cloudstorage.RetryHelper.runWithRetries(RetryHelper.java:157)
at com.google.appengine.tools.pipeline.impl.backend.AppEngineBackEnd.tryFiveTimes(AppEngineBackEnd.java:196)
at com.google.appengine.tools.pipeline.impl.backend.AppEngineBackEnd.saveWithJobStateCheck(AppEngineBackEnd.java:236)
I have googled this error, and the only thing I can find suggests that the Mapper is too big to be serialized, but our Mapper holds hardly any data at all.
/**
* Adds stats for a player via streaming api.
*/
public class PlayerStatsMapper extends Mapper<Entity, Void, Void> {
private static Logger log = Logger.getLogger(PlayerStatsMapper.class.getName());
private static final long serialVersionUID = 1L;
private String dataset;
private String table;
private transient GbqUtils gbq;
public PlayerStatsMapper(String dataset, String table) {
gbq = Davinci.getComponent(GbqUtils.class);
this.dataset = dataset;
this.table = table;
}
private void readObject(java.io.ObjectInputStream in) throws IOException, ClassNotFoundException {
in.defaultReadObject();
log.info("IOC reinitating due to deserialization.");
gbq = Davinci.getComponent(GbqUtils.class);
}
@Override
public void beginShard() {
}
@Override
public void endShard() {
}
@Override
public void map(Entity value) {
if (!value.getKind().equals("User")) {
log.severe("Expected a User but got a " + value.getKind());
return;
}
User user = new User(1, value);
List<Map<String, Object>> rows = new LinkedList<Map<String, Object>>();
List<PlayerStats> playerStats = readPlayerStats(user.getUserId());
addRankings(user.getUserId(), playerStats);
for (PlayerStats ps : playerStats) {
rows.add(ps.asMap());
}
// if (rows.size() > 0)
// gbq.insert(dataset, table, rows);
}
.... private methods only
}
The mapreduce job is started with this code:
MapReduceSettings settings = new MapReduceSettings().setWorkerQueueName("mrworker");
settings.setBucketName(gae.getAppName() + "-playerstats");
// @formatter:off <I, K, V, O, R>
MapReduceSpecification<Entity, Void, Void, Void, Void> spec =
MapReduceSpecification.of("Enque player stats",
new DatastoreInput("User", shardCountMappers),
new PlayerStatsMapper(dataset, "playerstats"),
Marshallers.getVoidMarshaller(),
Marshallers.getVoidMarshaller(),
NoReducer.<Void, Void, Void> create(),
NoOutput.<Void, Void> create(1));
// @formatter:on
String jobId = MapReduceJob.start(spec, settings);
Well, I solved this by reverting to appengine-mapreduce-0.2.jar, which was the one we had used before. The one used above was appengine-mapreduce-0.5.jar, which turned out not to work for us.
After going back to 0.2, the console page _ah/pipeline/list started to work again as well!
Has anyone else encountered a similar problem with 0.5?