Flink Not Emitting Value to Store in Cassandra

I have the following POJO class,
import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.Table;
import java.io.Serializable;
import java.time.LocalDateTime;
@Table(keyspace = "testKey", name = "contact")
public class Person implements Serializable {
private static final long serialVersionUID = 1L;
@Column(name = "name")
private String name;
@Column(name = "timeStamp")
private LocalDateTime timeStamp;
}
and the mapper code is,
DataStream<Person> sideOutput = stream.flatMap(new FlatMapFunction<String, Person>() {
@Override
public void flatMap(String value, Collector<Person> out) throws Exception {
try {
out.collect(objectMapper.readValue(value, Person.class));
} catch (JsonProcessingException e) {
e.printStackTrace();
}
}
}).getSideOutput(new OutputTag<>("contact", TypeInformation.of(Person.class)));
env.execute();
CassandraSink.addSink(sideOutput)
.setHost("localhost")
.setMapperOptions(() -> new Mapper.Option[]{Mapper.Option.saveNullFields(true)})
.build();
It's not working even without the .getSideOutput(new OutputTag<>("contact", TypeInformation.of(Person.class))); call.
The side output is not emitting any values to store in Cassandra. Any idea where I am going wrong?

I would say env.execute(); should be called after the pipeline is built, i.e. after the CassandraSink, and I would get rid of the side output. Something like this should work:
DataStream<Person> ds = stream.flatMap(new FlatMapFunction<String, Person>() {
@Override
public void flatMap(String value, Collector<Person> out) throws Exception {
try {
out.collect(objectMapper.readValue(value, Person.class));
} catch (JsonProcessingException e) {
e.printStackTrace();
}
}
});
CassandraSink.addSink(ds)
.setHost("localhost")
.setMapperOptions(() -> new Mapper.Option[]{Mapper.Option.saveNullFields(true)})
.build();
env.execute();
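If you actually need a side output, note that a plain flatMap cannot write to one; a side output is populated from a function that has access to a Context, such as a ProcessFunction, and is read back with getSideOutput. A minimal sketch of that pattern, assuming objectMapper is available in scope as in the question:

final OutputTag<Person> contactTag = new OutputTag<>("contact", TypeInformation.of(Person.class));

SingleOutputStreamOperator<String> mainStream = stream
        .process(new ProcessFunction<String, String>() {
            @Override
            public void processElement(String value, Context ctx, Collector<String> out) {
                try {
                    // parsed contacts go to the side output
                    ctx.output(contactTag, objectMapper.readValue(value, Person.class));
                } catch (JsonProcessingException e) {
                    // unparsable records stay on the main stream
                    out.collect(value);
                }
            }
        });

DataStream<Person> contacts = mainStream.getSideOutput(contactTag);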

Related

Flink Get the KeyedState State Value and use in Another Stream

I know that keyed state belongs to its key, and that only the current key can access its state value; other keys cannot access a different key's state.
I tried to access the state with the same key, but from a different stream. Is it possible?
If it is not possible, will I end up with 2 duplicate copies of the data?
Note: I need two streams because each of them has a different time window and a different implementation.
Here is the example (I know that keyBy(something) is the same for both stream operations):
public class Sample{
streamA
.keyBy(something)
.timeWindow(Time.seconds(4))
.process(new CustomMyProcessFunction())
.name("CustomMyProcessFunction")
.print();
streamA
.keyBy(something)
.timeWindow(Time.seconds(1))
.process(new CustomMyAnotherProcessFunction())
.name("CustomMyProcessFunction")
.print();
}
public class CustomMyProcessFunction extends ProcessWindowFunction<..>
{
private Logger logger = LoggerFactory.getLogger(CustomMyProcessFunction.class);
private transient ValueState<SimpleEntity> simpleEntityValueState;
private SimpleEntity simpleEntity;
@Override
public void open(Configuration parameters) throws Exception
{
ValueStateDescriptor<SimpleEntity> simpleEntityValueStateDescriptor = new ValueStateDescriptor<SimpleEntity>(
"sample",
TypeInformation.of(SimpleEntity.class)
);
simpleEntityValueState = getRuntimeContext().getState(simpleEntityValueStateDescriptor);
}
@Override
public void process(...) throws Exception
{
SimpleEntity value = simpleEntityValueState.value();
if (value == null)
{
SimpleEntity newVal = new SimpleEntity("sample");
logger.info("New Value put");
simpleEntityValueState.update(newVal);
}
...
}
...
}
public class CustomMyAnotherProcessFunction extends ProcessWindowFunction<..>
{
private transient ValueState<SimpleEntity> simpleEntityValueState;
@Override
public void open(Configuration parameters) throws Exception
{
ValueStateDescriptor<SimpleEntity> simpleEntityValueStateDescriptor = new ValueStateDescriptor<SimpleEntity>(
"sample",
TypeInformation.of(SimpleEntity.class)
);
simpleEntityValueState = getRuntimeContext().getState(simpleEntityValueStateDescriptor);
}
@Override
public void process(...) throws Exception
{
SimpleEntity value = simpleEntityValueState.value();
if (value != null)
logger.info(value.toString()); // I expect that SimpleEntity("sample")
out.collect(...);
}
...
}
As has been pointed out already, state is always local to a single operator instance. It cannot be shared.
What you can do, however, is stream the state updates from the operator holding the state to other operators that need it. With side outputs you can create complex dataflows without needing to share state.
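To illustrate that pattern: the operator that owns the state can publish each state update as an ordinary record, and downstream operators consume those records like any other stream. A minimal sketch with illustrative names (not the OP's classes):

DataStream<Tuple2<Integer, Integer>> stateUpdates = streamA
        .keyBy(Integer::intValue)
        .process(new KeyedProcessFunction<Integer, Integer, Tuple2<Integer, Integer>>() {
            private transient ValueState<Integer> count;

            @Override
            public void open(Configuration parameters) {
                count = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("count", Integer.class));
            }

            @Override
            public void processElement(Integer value, Context ctx, Collector<Tuple2<Integer, Integer>> out) throws Exception {
                int updated = (count.value() == null ? 0 : count.value()) + value;
                count.update(updated);
                // publish the new state value; consumers never touch this operator's state
                out.collect(Tuple2.of(ctx.getCurrentKey(), updated));
            }
        });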
I tried your idea of sharing state between two operators using the same key.
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import java.io.IOException;
public class FlinkReuseState {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(3);
DataStream<Integer> stream1 = env.addSource(new SourceFunction<Integer>() {
@Override
public void run(SourceContext<Integer> sourceContext) throws Exception {
int i = 0;
while (true) {
sourceContext.collect(1);
Thread.sleep(1000);
}
}
@Override
public void cancel() {
}
});
DataStream<Integer> stream2 = env.addSource(new SourceFunction<Integer>() {
@Override
public void run(SourceContext<Integer> sourceContext) throws Exception {
while (true) {
sourceContext.collect(1);
Thread.sleep(1000);
}
}
@Override
public void cancel() {
}
});
DataStream<Integer> windowedStream1 = stream1.keyBy(Integer::intValue)
.timeWindow(Time.seconds(3))
.process(new ProcessWindowFunction<Integer, Integer, Integer, TimeWindow>() {
private ValueState<Integer> value;
@Override
public void open(Configuration parameters) throws Exception {
super.open(parameters);
ValueStateDescriptor<Integer> desc = new ValueStateDescriptor<Integer>("value", Integer.class);
value = getRuntimeContext().getState(desc);
}
@Override
public void process(Integer integer, Context context, Iterable<Integer> iterable, Collector<Integer> collector) throws Exception {
iterable.forEach(x -> {
try {
if (value.value() == null) {
value.update(1);
} else {
value.update(value.value() + 1);
}
} catch (IOException e) {
e.printStackTrace();
}
});
collector.collect(value.value());
}
});
DataStream<String> windowedStream2 = stream2.keyBy(Integer::intValue)
.timeWindow(Time.seconds(3))
.process(new ProcessWindowFunction<Integer, String, Integer, TimeWindow>() {
private ValueState<Integer> value;
@Override
public void open(Configuration parameters) throws Exception {
super.open(parameters);
ValueStateDescriptor<Integer> desc = new ValueStateDescriptor<Integer>("value", Integer.class);
value = getRuntimeContext().getState(desc);
}
@Override
public void process(Integer s, Context context, Iterable<Integer> iterable, Collector<String> collector) throws Exception {
iterable.forEach(x -> {
try {
if (value.value() == null) {
value.update(1);
} else {
value.update(value.value() + 1);
}
} catch (IOException e) {
e.printStackTrace();
}
});
collector.collect(String.valueOf(value.value()));
}
});
windowedStream2.print();
windowedStream1.print();
env.execute();
}
}
It doesn't work; each stream only updates its own value state. The output is listed below.
3> 3
3> 3
3> 6
3> 6
3> 9
3> 9
3> 12
3> 12
3> 15
3> 15
3> 18
3> 18
3> 21
3> 21
3> 24
3> 24
Based on the official docs on keyed state: "Each keyed-state is logically bound to a unique composite of <parallel-operator-instance, key>, and since each key “belongs” to exactly one parallel instance of a keyed operator, we can think of this simply as <operator, key>."
I think it is not possible to share state by giving the same name to states in different operators.
Have you tried a CoProcessFunction? With that you can implement a separate process function for each stream; the only remaining problem would be the time window. Can you provide more details about your processing logic?
Why can't you emit the state as part of a map operation, and then use that stream to connect to the other stream?
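Building on that comment, a sketch of the consuming side, reusing the stateUpdates stream from the earlier sketch: the second stream connects to the published updates and keeps its own local copy of the latest value.

streamB.keyBy(Integer::intValue)
        .connect(stateUpdates.keyBy(t -> t.f0))
        .process(new KeyedCoProcessFunction<Integer, Integer, Tuple2<Integer, Integer>, String>() {
            private transient ValueState<Integer> latest;

            @Override
            public void open(Configuration parameters) {
                latest = getRuntimeContext().getState(
                        new ValueStateDescriptor<>("latest", Integer.class));
            }

            @Override
            public void processElement1(Integer value, Context ctx, Collector<String> out) throws Exception {
                out.collect(value + " seen with latest shared count " + latest.value());
            }

            @Override
            public void processElement2(Tuple2<Integer, Integer> update, Context ctx, Collector<String> out) throws Exception {
                // mirror the other operator's state locally
                latest.update(update.f1);
            }
        });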

Apache Flink read at least 2 record to trigger sink

I am writing my Apache Flink (1.10) job to update records in real time, like this:
public class WalletConsumeRealtimeHandler {
public static void main(String[] args) throws Exception {
walletConsumeHandler();
}
public static void walletConsumeHandler() throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
FlinkUtil.initMQ();
FlinkUtil.initEnv(env);
DataStream<String> dataStreamSource = env.addSource(FlinkUtil.initDatasource("wallet.consume.report.realtime"));
DataStream<ReportWalletConsumeRecord> consumeRecord =
dataStreamSource.map(new MapFunction<String, ReportWalletConsumeRecord>() {
@Override
public ReportWalletConsumeRecord map(String value) throws Exception {
ObjectMapper mapper = new ObjectMapper();
ReportWalletConsumeRecord consumeRecord = mapper.readValue(value, ReportWalletConsumeRecord.class);
consumeRecord.setMergedRecordCount(1);
return consumeRecord;
}
}).assignTimestampsAndWatermarks(new BoundedOutOfOrdernessGenerator());
consumeRecord.keyBy(
new KeySelector<ReportWalletConsumeRecord, Tuple2<String, Long>>() {
@Override
public Tuple2<String, Long> getKey(ReportWalletConsumeRecord value) throws Exception {
return Tuple2.of(value.getConsumeItem(), value.getTenantId());
}
})
.timeWindow(Time.seconds(5))
.reduce(new SumField(), new CollectionWindow())
.addSink(new SinkFunction<List<ReportWalletConsumeRecord>>() {
@Override
public void invoke(List<ReportWalletConsumeRecord> reportPumps, Context context) throws Exception {
WalletConsumeRealtimeHandler.invoke(reportPumps);
}
});
env.execute(WalletConsumeRealtimeHandler.class.getName());
}
private static class CollectionWindow extends ProcessWindowFunction<ReportWalletConsumeRecord,
List<ReportWalletConsumeRecord>,
Tuple2<String, Long>,
TimeWindow> {
public void process(Tuple2<String, Long> key,
Context context,
Iterable<ReportWalletConsumeRecord> minReadings,
Collector<List<ReportWalletConsumeRecord>> out) throws Exception {
ArrayList<ReportWalletConsumeRecord> employees = Lists.newArrayList(minReadings);
if (employees.size() > 0) {
out.collect(employees);
}
}
}
private static class SumField implements ReduceFunction<ReportWalletConsumeRecord> {
public ReportWalletConsumeRecord reduce(ReportWalletConsumeRecord d1, ReportWalletConsumeRecord d2) {
Integer merged1 = d1.getMergedRecordCount() == null ? 1 : d1.getMergedRecordCount();
Integer merged2 = d2.getMergedRecordCount() == null ? 1 : d2.getMergedRecordCount();
d1.setMergedRecordCount(merged1 + merged2);
d1.setConsumeNum(d1.getConsumeNum() + d2.getConsumeNum());
return d1;
}
}
public static void invoke(List<ReportWalletConsumeRecord> records) {
WalletConsumeService service = FlinkUtil.InitRetrofit().create(WalletConsumeService.class);
Call<ResponseBody> call = service.saveRecords(records);
call.enqueue(new Callback<ResponseBody>() {
@Override
public void onResponse(Call<ResponseBody> call, Response<ResponseBody> response) {
}
@Override
public void onFailure(Call<ResponseBody> call, Throwable t) {
t.printStackTrace();
}
});
}
}
and now I have found that the Flink task only triggers the sink after receiving at least 2 records. Does the reduce action require this?
You need two records to trigger the window. Flink only knows when to close a window (and fire the subsequent calculation) when it receives a watermark that is larger than the end timestamp of the window.
In your case, you use a BoundedOutOfOrdernessGenerator, which advances the watermark according to the timestamps of the incoming records. So it generates a second watermark only after having seen the second record.
You can use a different watermark generator. In the troubleshooting training there is a watermark generator that also emits watermarks on timeout.
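A minimal sketch of that idea against Flink 1.10's AssignerWithPeriodicWatermarks API; getTimestamp() on the record and the chosen bounds are assumptions:

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;

public class TimeoutWatermarkGenerator implements AssignerWithPeriodicWatermarks<ReportWalletConsumeRecord> {
    private static final long MAX_OUT_OF_ORDERNESS = 3_000L; // tolerate 3 seconds of lateness
    private static final long IDLE_TIMEOUT = 5_000L;         // fall back to the wall clock after 5 idle seconds

    private long currentMaxTimestamp = Long.MIN_VALUE + MAX_OUT_OF_ORDERNESS; // avoid underflow before the first record
    private long lastRecordSeenAt = System.currentTimeMillis();

    @Override
    public long extractTimestamp(ReportWalletConsumeRecord element, long previousElementTimestamp) {
        currentMaxTimestamp = Math.max(currentMaxTimestamp, element.getTimestamp()); // getTimestamp() is assumed
        lastRecordSeenAt = System.currentTimeMillis();
        return element.getTimestamp();
    }

    @Override
    public Watermark getCurrentWatermark() {
        // If no record has arrived for IDLE_TIMEOUT, advance the watermark from the wall clock
        // so that a window holding a single record can still fire.
        if (System.currentTimeMillis() - lastRecordSeenAt > IDLE_TIMEOUT) {
            return new Watermark(System.currentTimeMillis() - MAX_OUT_OF_ORDERNESS);
        }
        return new Watermark(currentMaxTimestamp - MAX_OUT_OF_ORDERNESS);
    }
}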

[Ljava.lang.Object; cannot be cast to com.lglsys.entity.EntityName

I was trying to get specific data from the database, but every time I get the following error:
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to com.lglsys.entity.TDasProductDownload
This is my QueryService class:
@Dependent
public class QueryService {
List<TDasProductDownload> downloadLink = new ArrayList<>();
final private Logger logger =
LogManager.getLogger(QueryService.class.getName());
@PersistenceContext(unitName="DownloadServices")
EntityManager em;
public QueryService() { super(); }
public List<TDasProductDownload> findAllDownloadLinks() {
try {
downloadLink=
em.createQuery(queryForDownloadLinks,TDasProductDownload.class)
.getResultList();
return downloadLink;
} catch (Exception e) {
logger.info(e.toString());
return null;
}
}
}
The program throws the error in this class, the endpoint class:
public class PreControlWSEndPoint {
private Session session;
final private Logger logger = LogManager.getLogger(PreControlWSEndPoint.class.getName());
List<TDasProductDownload> downloadLink = new ArrayList<>();
@PersistenceContext(unitName="DownloadServices")
EntityManager em;
@Inject
QueryService service;
@OnOpen
public void Open(Session session) throws IOException, InterruptedException {
this.session = session;
this.sendMessage("Connection Opened");
logger.info("EndPoint Opened");
try {
downloadLink = service.findAllDownloadLinks();
logger.info(downloadLink.size());
TDasProductDownload str = downloadLink.get(0);
logger.info(str.getDownloadStatus()); // **Error line!!**
} catch (Exception e) {
logger.info(e.toString() + " .D");
}
}
@OnMessage
public void onMessage(String message) {}
@OnClose
public void Close() {}
}
I can't see what's happening in my code.
I fixed it!
public List<String> findAllDownloadLinks() {
try {
downloadLink=
em.createQuery(queryForDownloadLinks,String.class)
.getResultList();
return downloadLink;
} catch (Exception e) {
logger.info(e.toString());
return null;
}
}
Then I can print it like so (note the bound must be strictly less than size() to avoid an IndexOutOfBoundsException):
for (int temp = 0; temp < downloadLink.size(); temp++) {
logger.info(downloadLink.get(temp));
}
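For context, the original ClassCastException usually means the JPQL query selected individual columns rather than the entity itself, in which case JPA returns a List<Object[]>. A sketch of handling that shape directly, assuming a query such as "SELECT p.downloadStatus, p.downloadLink FROM TDasProductDownload p" (the column names and types are assumptions):

List<Object[]> rows = em.createQuery(
        "SELECT p.downloadStatus, p.downloadLink FROM TDasProductDownload p",
        Object[].class)
    .getResultList();
for (Object[] row : rows) {
    // each array element corresponds to one selected column, in order
    String downloadStatus = (String) row[0];
    String downloadLink = (String) row[1];
    logger.info(downloadStatus + " -> " + downloadLink);
}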

Flink -- get data from Cassandra as generic ResultSet and convert it to DataSet

I have a StreamExecutionEnvironment job that consumes simple CQL select queries from Kafka.
I try to handle these queries asynchronously using the following code:
public class GenericCassandraReader extends RichAsyncFunction {
private static final Logger logger = LoggerFactory.getLogger(GenericCassandraReader.class);
private ExecutorService executorService;
private final Properties props;
private Session client;
public ExecutorService getExecutorService() {
return executorService;
}
public GenericCassandraReader(Properties props, ExecutorService executorService) {
super();
this.props = props;
this.executorService = executorService;
}
@Override
public void open(Configuration parameters) throws Exception {
client = Cluster.builder().addContactPoint(props.getProperty("cqlHost"))
.withPort(Integer.parseInt(props.getProperty("cqlPort"))).build()
.connect(props.getProperty("keyspace"));
}
@Override
public void close() throws Exception {
client.close();
synchronized (GenericCassandraReader.class) {
try {
if (!getExecutorService().awaitTermination(1000, TimeUnit.MILLISECONDS)) {
getExecutorService().shutdownNow();
}
} catch (InterruptedException e) {
getExecutorService().shutdownNow();
}
}
}
@Override
public void asyncInvoke(final UserDefinedType input, final AsyncCollector<ResultSet> asyncCollector) throws Exception {
getExecutorService().submit(new Runnable() {
@Override
public void run() {
ListenableFuture<ResultSet> resultSetFuture = client.executeAsync(input.query);
Futures.addCallback(resultSetFuture, new FutureCallback<ResultSet>() {
public void onSuccess(ResultSet resultSet) {
asyncCollector.collect(Collections.singleton(resultSet));
}
public void onFailure(Throwable t) {
asyncCollector.collect(t);
}
});
}
});
}
}
Each response of this code provides a Cassandra ResultSet with a different number of fields.
Any ideas for handling a Cassandra ResultSet in Flink, or should I use another technique to reach my goal?
Thanks for any help in advance!
The Cassandra ResultSet is not thread-safe, so it is better to use the Flink Cassandra connector, or at least to write your own implementation in a similar way.
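One hedged option along those lines, written against the DataStax 3.x driver API used above: materialize the rows into plain collections inside the callback, so the ResultSet itself never crosses threads, and a variable number of fields per query is handled generically (note the AsyncCollector's element type then becomes List<Map<String, Object>>):

public void onSuccess(ResultSet resultSet) {
    List<Map<String, Object>> rows = new ArrayList<>();
    for (Row row : resultSet) {
        Map<String, Object> fields = new HashMap<>();
        // ColumnDefinitions describes whichever columns this particular query returned
        for (ColumnDefinitions.Definition def : row.getColumnDefinitions()) {
            fields.put(def.getName(), row.getObject(def.getName()));
        }
        rows.add(fields);
    }
    asyncCollector.collect(Collections.singleton(rows));
}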

Process works where transform does not

I am trying to hit a REST endpoint on Camel, convert that data into a class (and, for simplicity and testing, convert that class into a JSON string), and make a POST to a local server. I can get it to do everything but make that final POST; it just seems to hang.
App:
@SpringBootApplication
public class App {
/**
* A main method to start this application.
*/
public static void main(String[] args) {
SpringApplication.run(App.class, args);
}
@Component
public class RestTest extends RouteBuilder {
@Override
public void configure() throws Exception {
restConfiguration().component("restlet").host("localhost").port(8000).bindingMode(RestBindingMode.json);
rest("/test").enableCORS(true)
.post("/post").type(User.class).to("direct:transform");
from("direct:transform")
.transform().method("Test", "alter")
.to("http4:/localhost:8088/ws/v1/camel");
}
}
}
Bean:
#Component("Test")
public class Test {
public void alter (Exchange exchange) {
ObjectMapper mapper = new ObjectMapper();
User body = exchange.getIn().getBody(User.class);
try {
String jsonInString = mapper.writeValueAsString(body);
exchange.getOut().setHeader(Exchange.HTTP_METHOD, constant(HttpMethods.POST));
exchange.getOut().setHeader(Exchange.CONTENT_TYPE, MediaType.APPLICATION_JSON);
exchange.getOut().setBody(jsonInString);
} catch (JsonGenerationException e) {
e.printStackTrace();
} catch (JsonMappingException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
User:
public class User {
@JsonProperty
private String firstName;
@JsonProperty
private String lastName;
public void setFirstName(String firstName) {
this.firstName = firstName;
}
public String getFirstName() {
return firstName;
}
public void setLastName(String lastName) {
this.lastName = lastName;
}
public String getLastName() {
return lastName;
}
}
UPDATE
I was able to get it to work with process instead of transform, but it errors when a response is sent back to Camel from the POST:
from("direct:transform")
.process(new Processor() {
public void process(Exchange exchange) throws Exception {
ObjectMapper mapper = new ObjectMapper();
User body = exchange.getIn().getBody(User.class);
try {
String jsonInString = mapper.writeValueAsString(body);
exchange.getOut().setHeader(Exchange.HTTP_METHOD, constant(HttpMethods.POST));
exchange.getOut().setHeader(Exchange.CONTENT_TYPE, MediaType.APPLICATION_JSON);
exchange.getOut().setBody(jsonInString);
} catch (JsonGenerationException e) {
e.printStackTrace();
} catch (JsonMappingException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
})
.to("http4://0.0.0.0:8088/ws/v1/camel");
Error
com.fasterxml.jackson.databind.JsonMappingException: No serializer found for class org.apache.camel.converter.stream.CachedOutputStream$WrappedInputStream and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationFeature.FAIL_ON_EMPTY_BEANS)
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:284)
at com.fasterxml.jackson.databind.SerializerProvider.mappingException(SerializerProvider.java:1110)
at com.fasterxml.jackson.databind.SerializerProvider.reportMappingProblem(SerializerProvider.java:1135)
at com.fasterxml.jackson.databind.ser.impl.UnknownSerializer.failForEmpty(UnknownSerializer.java:69)
at com.fasterxml.jackson.databind.ser.impl.UnknownSerializer.serialize(UnknownSerializer.java:32)
at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:292)
at com.fasterxml.jackson.databind.ObjectWriter$Prefetch.serialize(ObjectWriter.java:1429)
at com.fasterxml.jackson.databind.ObjectWriter._configAndWriteValue(ObjectWriter.java:1158)
at com.fasterxml.jackson.databind.ObjectWriter.writeValue(ObjectWriter.java:988)
at org.apache.camel.component.jackson.JacksonDataFormat.marshal(JacksonDataFormat.java:155)
at org.apache.camel.processor.MarshalProcessor.process(MarshalProcessor.java:69)
at org.apache.camel.util.AsyncProcessorHelper.process(AsyncProcessorHelper.java:109)
at org.apache.camel.processor.MarshalProcessor.process(MarshalProcessor.java:50)
at org.apache.camel.component.rest.RestConsumerBindingProcessor$RestConsumerBindingMarshalOnCompletion.onAfterRoute(RestConsumerBindingProcessor.java:363)
at org.apache.camel.util.UnitOfWorkHelper.afterRouteSynchronizations(UnitOfWorkHelper.java:154)
at org.apache.camel.impl.DefaultUnitOfWork.afterRoute(DefaultUnitOfWork.java:278)
at org.apache.camel.processor.CamelInternalProcessor$RouteLifecycleAdvice.after(CamelInternalProcessor.java:317)
at org.apache.camel.processor.CamelInternalProcessor$InternalCallback.done(CamelInternalProcessor.java:246)
at org.apache.camel.processor.Pipeline.process(Pipeline.java:109)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:197)
at org.apache.camel.processor.DelegateAsyncProcessor.process(DelegateAsyncProcessor.java:97)
at org.apache.camel.component.restlet.RestletConsumer$1.handle(RestletConsumer.java:68)
at org.apache.camel.component.restlet.MethodBasedRouter.handle(MethodBasedRouter.java:54)
at org.restlet.routing.Filter.doHandle(Filter.java:150)
at org.restlet.routing.Filter.handle(Filter.java:197)
at org.restlet.routing.Router.doHandle(Router.java:422)
at org.restlet.routing.Router.handle(Router.java:639)
at org.restlet.routing.Filter.doHandle(Filter.java:150)
at org.restlet.routing.Filter.handle(Filter.java:197)
at org.restlet.routing.Router.doHandle(Router.java:422)
at org.restlet.routing.Router.handle(Router.java:639)
at org.restlet.routing.Filter.doHandle(Filter.java:150)
at org.restlet.engine.application.StatusFilter.doHandle(StatusFilter.java:140)
at org.restlet.routing.Filter.handle(Filter.java:197)
at org.restlet.routing.Filter.doHandle(Filter.java:150)
at org.restlet.routing.Filter.handle(Filter.java:197)
at org.restlet.engine.CompositeHelper.handle(CompositeHelper.java:202)
at org.restlet.Component.handle(Component.java:408)
at org.restlet.Server.handle(Server.java:507)
at org.restlet.engine.connector.ServerHelper.handle(ServerHelper.java:63)
at org.restlet.engine.adapter.HttpServerHelper.handle(HttpServerHelper.java:143)
at org.restlet.engine.connector.HttpServerHelper$1.handle(HttpServerHelper.java:64)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This leads to the question: what is the fundamental difference between process and transform?
A "Processor" in camel is the lowest level message processing primitive. Under the covers a transform definition is executed as a org.apache.camel.processor.TransformProcessor which is itself a processor. In fact mostly everything is a Processor under the hood so strictly anything you can accomplish with transform you can accomplish with a pure processor:
Your last error occurs because you need to unmarshal the output of your HTTP call to an object before it can be marshalled back to JSON using Jackson. Something like this:
from("direct:transform")
.transform().method("Test", "alter")
.to("http4:/localhost:8088/ws/v1/camel")
.unmarshal().json(JsonLibrary.Jackson, User.class);
