Running a stream on the Flink mini cluster (1.11): AvroKryoSerializerUtils is not working - apache-flink

I'm running into an issue when testing a stream on the flink mini cluster in my integration test.
The stream maps a generated Avro SpecificRecord POJO class (Java).
The stream job is written in Scala.
The Flink runtime crashes because it cannot instantiate org.apache.flink.formats.avro.utils.AvroKryoSerializerUtils.
Here is the stack trace:
stack: java.lang.ClassCastException: class org.apache.flink.formats.avro.utils.AvroKryoSerializerUtils
java.lang.RuntimeException: Could not instantiate org.apache.flink.formats.avro.utils.AvroKryoSerializerUtils.
at org.apache.flink.api.java.typeutils.AvroUtils.getAvroUtils(AvroUtils.java:53)
at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.buildKryoRegistrations(KryoSerializer.java:572)
I think the problem is that Flink is unable to serialize the Avro POJO class because it contains multiple nested Avro POJO classes.
I tried to add the type information for all nested POJO class types, but I'm still running into the same issue.
So now I wonder if anyone has made a Flink job work with a generated Avro POJO class that contains nested Avro POJO classes. All classes implement SpecificRecord and are generated from an Avro schema.
Is there some kind of special serializer that needs to be written? Is there any documentation or example out there for such a serializer that deals with multiple nested POJO classes in Scala or Java?
Or is it a different problem altogether?
Many thanks in advance for any help!

The issue may arise if flink-avro is not on the classpath. If you are using Avro anyway, I'd disable Kryo completely to catch more subtle errors.
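A minimal sketch of that advice (Flink 1.11, Java API; the job body is illustrative and assumes the flink-avro artifact, e.g. org.apache.flink:flink-avro, is on the classpath):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class AvroStreamJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Fail fast instead of silently falling back to Kryo for types Flink
            // cannot analyze; surfaces missing serializer setup during testing.
            env.getConfig().disableGenericTypes();

            // Force the Avro serializer so generated SpecificRecord classes are
            // handled by flink-avro rather than Kryo.
            env.getConfig().enableForceAvro();

            // Placeholder pipeline so the sketch runs end to end.
            env.fromElements("a", "b", "c").print();
            env.execute("avro-specific-record-job");
        }
    }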

I made it work by doing the parsing inside a ProcessFunction.
I had to parse a string to JSON and then into a record class for one specific field of the SpecificRecord class, which should end up in the data sink.
The parsing of the JSON is now implemented within a separate ProcessFunction, and now it works. Before, I had the parsing in a map applied directly to the DataStream.
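A hedged sketch of that approach in Java (MyAvroRecord and parseJson are hypothetical stand-ins for the generated SpecificRecord class and the actual JSON parsing; the original job is in Scala):

    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    public class JsonToRecordFunction extends ProcessFunction<String, MyAvroRecord> {

        @Override
        public void processElement(String rawJson, Context ctx, Collector<MyAvroRecord> out) throws Exception {
            // Do the string -> JSON -> record conversion here instead of in a map()
            // applied directly to the DataStream, so only the generated Avro type
            // leaves this operator.
            MyAvroRecord record = parseJson(rawJson);
            out.collect(record);
        }

        private MyAvroRecord parseJson(String rawJson) {
            // Hypothetical helper: replace with the real JSON-to-Avro conversion.
            throw new UnsupportedOperationException("replace with real parsing");
        }
    }

The function is attached with stream.process(new JsonToRecordFunction()) rather than a map().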

Related

Camel Bean IO use Java annotation instead of Xml Mapping

We are creating an output file that has a different format per line, and this group of lines will repeat. BeanIO provides excellent support for this with "groups" and "records".
We are using Camel for this, and the BeanIO data format in Camel expects a mapping XML file (the BeanIO mapping file).
https://camel.apache.org/components/3.16.x/dataformats/beanio-dataformat.html
Standalone versions of BeanIO provide annotations to create the "groups" and "records". Is there a way to use the annotations in Camel as well, rather than a mapping file?
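For reference, the standalone annotations mentioned above look roughly like the sketch below (BeanIO 2.1+; class, stream, and field names are illustrative). Whether Camel's BeanIO data format can consume a StreamFactory built this way instead of a mapping XML depends on the Camel version, so treat this as the standalone side only.

    import org.beanio.StreamFactory;
    import org.beanio.annotation.Field;
    import org.beanio.annotation.Record;
    import org.beanio.builder.StreamBuilder;

    public class BeanIoAnnotationExample {

        @Record
        public static class OrderLine {
            @Field(at = 0)
            private String sku;

            @Field(at = 1)
            private int quantity;
            // getters/setters omitted
        }

        public static void main(String[] args) {
            // Define the "orders" stream from the annotated class instead of a mapping XML.
            StreamFactory factory = StreamFactory.newInstance();
            factory.define(new StreamBuilder("orders")
                    .format("csv")
                    .addRecord(OrderLine.class));
        }
    }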

Apache Camel Concept Data format vs Data type

I am using Apache Camel. While I have an idea about the following concepts, I would like to get a clearer understanding of them. Yes, I have gone through the Apache Camel documentation.
Data Format conversion
Data Type conversion
Marshalling and Unmarshalling
What I am looking for is a clear conceptual differentiation. Thanks in advance.
These terms have a lot of different meanings in programming and computing in general. Additionally, across Camel components the terms Data Format and Data Type may be used interchangeably.
Data Format -- typically the format of "data on the wire". This is text or binary for file handling and messaging scenarios (.txt, .csv, .bin, JMS, MQTT, STOMP, etc.), or JSON and XML for REST and SOAP web services (generally over HTTP).
Data Type -- totally overloaded. In Camel (I'll risk being flamed for this), it generally means which Java class is used as the input or output of a component. Camel also has a ton of automatic type-conversion routines, so some of the subtle differences go unnoticed by users. For example, consuming from a JMS queue may produce a javax.jms.TextMessage, but the next step may use a java.lang.String. Camel can auto-convert between those types.
Marshalling and Unmarshalling are the steps of converting from Java class -> Data Format and Data Format -> Java class. For example, a JSON payload would be unmarshalled into a com.bobtire.Order Java class and used by a Java processor in Camel. Conversely, after doing some processing, one may need to marshal a com.bobtire.Order Java class to JSON to send to a REST endpoint. These functions are handled by "data format" modules within Camel. Common ones: JSON, JAXB (for XML), Bindy, PGP and JCE (for encryption).
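A minimal Camel Java DSL sketch of that round trip (the com.bobtire.Order class, endpoint URIs, and route body are illustrative):

    import org.apache.camel.builder.RouteBuilder;
    import org.apache.camel.model.dataformat.JsonLibrary;

    public class OrderRoute extends RouteBuilder {
        @Override
        public void configure() {
            from("direct:incomingOrders")
                // Data Format -> Java class: the JSON payload becomes a com.bobtire.Order
                .unmarshal().json(JsonLibrary.Jackson, Order.class)
                .process(exchange -> {
                    Order order = exchange.getIn().getBody(Order.class);
                    // ... business logic on the plain Java object ...
                })
                // Java class -> Data Format: the Order becomes JSON again for the REST endpoint
                .marshal().json(JsonLibrary.Jackson)
                .to("http://orders.example.com/api/orders");
        }
    }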

Extract data from JSON in vanilla Java/Camel/Spring

I am trying to write a Camel route to get JMX data from an ActiveMQ server through the Jolokia REST API. I was able to successfully get the JSON object from the ActiveMQ server, but I am running into an issue where I cannot figure out how to parse the JSON object in my Camel route. Camel is integrated with Jackson, Gson, and XStream, but each of those appears to require an extra library that I do not have. Camel also has support for JSONPath, but it requires another library that I do not have. All of my research so far seems to point to using a new software library, so I am hoping someone knows a solution that might save me from trying several more dead ends.
The big catch is that I am trying to parse JSON with something that comes with Java/Camel/Spring/ActiveMQ/apache-commons. I would prefer a solution that only uses Camel/Spring XML, but another solution using Java would work (maybe JXPath with Apache Commons?).
The reason I am trying to use libraries that I currently have is the long process that our company has for getting new software libraries approved. I can wait several months to get a library approved or I can write my own specialized parser, but I am hoping there is some other way for me to extract some of the information from the JSON object that I am getting from the Jolokia JMX REST API in ActiveMQ.
There is no JSON library out of the box in Java itself, but there is an RFE to maybe add one in a future Java release, maybe Java 9.
So if you want to parse JSON, you need to use a 3rd-party library; you had better get your company to approve one.
camel-core 2.15.x ships a JSON schema parser that we use to parse the component docs' JSON schemas. It's not a general-purpose JSON parser, but it can parse simple schemas.
It's at org.apache.camel.util.JsonSchemaHelper.
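Once a library is approved, a few lines of Jackson (jackson-databind) are enough to pull values out of the Jolokia response; the JSON shape below is illustrative, not the exact Jolokia payload.

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JolokiaJsonExample {
        public static void main(String[] args) throws Exception {
            String json = "{\"value\":{\"QueueSize\":42}}";

            ObjectMapper mapper = new ObjectMapper();
            JsonNode root = mapper.readTree(json);

            // Navigate the tree instead of binding to a POJO class.
            long queueSize = root.path("value").path("QueueSize").asLong();
            System.out.println("QueueSize = " + queueSize);
        }
    }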

Spring AOP - exclude specific aspects?

I'm using Spring 3.0.5 and was wondering if it's possible to somehow exclude aspect classes annotated with the @Aspect stereotype from being loaded, but at the same time include other aspect-annotated classes? It seems to be all or nothing if you're going the annotation route (which I am). I've tried looking around but can't seem to find anything that hints at this.
The reason for this is that I have a central core library which contains aspects, but I may not want to include these aspects in every project I create using this central library.
Thanks.
This is a long time for an answer...
If you are using AspectJ annotations with Spring AOP and not AspectJ runtime or compile-time weaving, then you're in luck. Spring will only pick up @Aspect-annotated classes if they're annotated with something like @Component as well; Spring does not consider @Aspect itself a candidate for component scanning. If you're using XML configuration, simply remove that bean from your config.
I would suggest NOT using component scanning that would hit a core library. For example, if your app is com.example.myapp1 and your core library is com.example.corelibrary.*, make sure your component scanning is looking at com.example.myapp1 ONLY and not com.example.
It is not safe to component scan outside of your app's base package for exactly this reason. Pull in the individual aspects you need with an XML config for the bean.
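If Java config is an option, the same idea looks roughly like this sketch (it requires Spring 3.1+ for @ComponentScan, while the question is on 3.0.5, where the equivalent is a narrowly scoped <context:component-scan> with an exclude-filter in XML; package names are illustrative):

    import org.aspectj.lang.annotation.Aspect;
    import org.springframework.context.annotation.ComponentScan;
    import org.springframework.context.annotation.ComponentScan.Filter;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.context.annotation.FilterType;

    @Configuration
    @ComponentScan(
        // Scan the application package only, not the shared com.example root.
        basePackages = "com.example.myapp1",
        // Skip any component that is also annotated with @Aspect.
        excludeFilters = @Filter(type = FilterType.ANNOTATION, classes = Aspect.class))
    public class AppConfig {
    }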

Google App Engine: JDO does the job, JPA does not

I have set up a project using both JDO and JPA.
I used JPA annotations to declare my entity.
Then I set up my test cases based on LocalTestHelper (from the Google App Engine documentation).
When I run the tests,
a call to makePersistent on the JDO PersistenceManager works perfectly;
a call to persist on the JPA EntityManager raises an error:
java.lang.IllegalArgumentException: Type ("org.seamoo.persistence.jpa.model.ExampleModel") is not that of an entity but needs to be for this operation
at org.datanucleus.jpa.EntityManagerImpl.assertEntity(EntityManagerImpl.java:888)
at org.datanucleus.jpa.EntityManagerImpl.persist(EntityManagerImpl.java:385)
Caused by: org.datanucleus.exceptions.NoPersistenceInformationException: The class "org.seamoo.persistence.jpa.model.ExampleModel" is required to be persistable yet no Meta-Data/Annotations can be found for this class. Please check that the Meta-Data/annotations is defined in a valid file location.
at org.datanucleus.ObjectManagerImpl.assertClassPersistable(ObjectManagerImpl.java:3894)
at org.datanucleus.jpa.EntityManagerImpl.assertEntity(EntityManagerImpl.java:884)
... 27 more
How can that be the case?
Below is the link to the compact source code of the maven projects that reproduce that problem:
Updated: http://seamoo.com/jpa-bug-reproduce-compact.tar.gz
Execute the Maven test goal on the parent POM and you will notice that 3/4 tests from org.seamoo.persistence.jdo.JdoGenericDAOImplTest pass, while all tests from org.seamoo.persistence.jpa.JpaGenericDAOImplTest fail.
So you either haven't enhanced your model classes, or haven't provided persistence metadata (XML, annotations) for them at runtime. The log gives you ample information. And I really don't think that presenting people with a tgz of 3 separate projects and expecting them to find the particular class you're referring to could be called "optimum usage of their time". Cut it down to the actual class, its metadata, and a sample Main.
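For orientation, a minimally annotated JPA entity for the App Engine datastore looks roughly like the sketch below (fields are illustrative; the class name mirrors the one in the stack trace). The class must also be run through the DataNucleus enhancer at build time and be discoverable via META-INF/persistence.xml, otherwise exactly this NoPersistenceInformationException appears.

    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.GenerationType;
    import javax.persistence.Id;

    import com.google.appengine.api.datastore.Key;

    @Entity
    public class ExampleModel {

        // App Engine's JPA support typically uses a datastore Key as the identifier.
        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        private Key key;

        private String name;

        public Key getKey() { return key; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }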
