Registering an Aggregate UDF in Apache Flink

I am trying to follow the steps here to create a basic Flink Aggregate UDF. I've added the dependencies and implemented
public class MyAggregate extends AggregateFunction<Long, TestAgg> {..}
I've implemented the mandatory methods as well as a few others (accumulate, merge, etc.). All of this builds without errors. According to the docs, I should now be able to register it as
StreamExecutionEnvironment sEnv = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment sTableEnv = StreamTableEnvironment.getTableEnvironment(sEnv);
sTableEnv.registerFunction("MyMin", new MyAggregate());
But registerFunction seems to accept only a ScalarFunction as input. I am getting an incompatible type error: The method registerFunction(String, ScalarFunction) in the type TableEnvironment is not applicable for the arguments (String, MyAggregate)
Any help would be great.

You need to import the StreamTableEnvironment for your chosen language, which in your case is org.apache.flink.table.api.java.StreamTableEnvironment.
org.apache.flink.table.api.StreamTableEnvironment is a common abstract class for the Java and Scala variants of StreamTableEnvironment. We've noticed that this part of the API is confusing for users and we will improve it in the future.
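Why the import matters can be shown with a toy model. The sketch below is not Flink code: every class in it (OverloadSketch, BaseTableEnvironment, JavaTableEnvironment and the two empty function classes) is a hypothetical stand-in. It only illustrates that the compiler picks overloads from the declared type of the variable, so an overload added by a language-specific subclass is invisible through the common base type.

```java
// Toy model, not Flink code: all classes here are hypothetical stand-ins.
final class OverloadSketch {

    static class ScalarFunction {}

    static class AggregateFunction {}

    /** Plays the role of the common base environment. */
    static class BaseTableEnvironment {
        String registerFunction(String name, ScalarFunction f) {
            return "scalar:" + name;
        }
    }

    /** Plays the role of the Java-specific environment, which adds an overload. */
    static class JavaTableEnvironment extends BaseTableEnvironment {
        String registerFunction(String name, AggregateFunction f) {
            return "aggregate:" + name;
        }
    }

    public static void main(String[] args) {
        JavaTableEnvironment javaEnv = new JavaTableEnvironment();
        // Declared as the Java-specific type: the aggregate overload is visible.
        System.out.println(javaEnv.registerFunction("MyMin", new AggregateFunction()));

        BaseTableEnvironment baseEnv = javaEnv;
        // baseEnv.registerFunction("MyMin", new AggregateFunction());
        // ^ would not compile: the declared type only offers
        //   registerFunction(String, ScalarFunction).
        System.out.println(baseEnv.registerFunction("MyUpper", new ScalarFunction()));
    }
}
```

Running main prints aggregate:MyMin and then scalar:MyUpper. Uncommenting the marked line reproduces the same kind of "not applicable for the arguments" error as in the question.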

Related

How to convert RowData into Row when using DynamicTableSink

I have a question regarding the new sourceSinks interface in Flink. I currently implement a new custom DynamicTableSinkFactory, DynamicTableSink, SinkFunction and OutputFormat. I use the JDBC Connector as an example and I use Scala.
All data that is fed into the sink has the type Row. So the OutputFormat serialisation is based on the Row Interface:
override def writeRecord(record: Row): Unit = {...}
As stated in the documentation:
records must be accepted as org.apache.flink.table.data.RowData. The
framework provides runtime converters such that a sink can still work
on common data structures and perform a conversion at the beginning.
The goal here is to keep the Row data structure and only convert Row into RowData when inserted into the SinkFunction. So in this way the rest of the code does not need to be changed.
class MySinkFunction(outputFormat: MyOutputFormat) extends RichSinkFunction[RowData] with CheckpointedFunction
So the resulting question is: How to convert RowData into Row when using a DynamicTableSink and OutputFormat? Where should the conversion happen?
links:
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sourceSinks.html
https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-jdbc/src/test/java/org/apache/flink/connector/jdbc
Thanks.

You can obtain a converter instance in the Context provided in org.apache.flink.table.connector.sink.DynamicTableSink#getSinkRuntimeProvider.
// create type information for the DeserializationSchema
final TypeInformation<RowData> producedTypeInfo =
    context.createTypeInformation(producedDataType);
// most of the code in DeserializationSchema will not work on internal data structures
// create a converter for conversion at the end
final DataStructureConverter converter =
    context.createDataStructureConverter(producedDataType);
The instance is Java serializable and can be passed into the sink function. You should also call the converter.open() method in your sink function.
A more complex example can be found here (for sources but sinks work in a similar way). Have a look at SocketDynamicTableSource and ChangelogCsvFormat in the same package.
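As for where the conversion should happen: one way to keep the rest of the code on Row is to convert at the very edge of the sink function and delegate to the existing Row-based writer. The following is a dependency-free sketch of that shape only. Converter, RowWriter, ConvertingSink and ArrayToListConverter are hypothetical stand-ins, not Flink classes; in a real sink the converter would be the instance obtained from the context as shown above, and the method name toExternal is assumed here to mirror it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: none of these types are Flink classes.
final class ConvertingSinkSketch {

    /** Stand-in for the converter obtained from the sink context. */
    interface Converter<I, E> {
        void open();
        E toExternal(I internal);
    }

    /** Stand-in for existing code that only understands the external row type. */
    interface RowWriter<E> {
        void writeRecord(E row);
    }

    /** Accepts internal records and converts each one just before delegating. */
    static final class ConvertingSink<I, E> {
        private final Converter<I, E> converter;
        private final RowWriter<E> writer;
        private boolean opened;

        ConvertingSink(Converter<I, E> converter, RowWriter<E> writer) {
            this.converter = converter;
            this.writer = writer;
        }

        void open() {
            converter.open();
            opened = true;
        }

        void invoke(I internalRecord) {
            if (!opened) {
                throw new IllegalStateException("open() must run before invoke()");
            }
            writer.writeRecord(converter.toExternal(internalRecord));
        }
    }

    /** Toy converter: internal records are Object[], external rows are List<Object>. */
    static final class ArrayToListConverter implements Converter<Object[], List<Object>> {
        @Override
        public void open() {}

        @Override
        public List<Object> toExternal(Object[] internal) {
            return new ArrayList<>(Arrays.asList(internal));
        }
    }

    static List<List<Object>> demo() {
        List<List<Object>> written = new ArrayList<>();
        ConvertingSink<Object[], List<Object>> sink =
                new ConvertingSink<>(new ArrayToListConverter(), written::add);
        sink.open();
        sink.invoke(new Object[] {"flink", 1});
        sink.invoke(new Object[] {"camel", 2});
        return written;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [[flink, 1], [camel, 2]]
    }
}
```

With this arrangement the writer never sees the internal type, which matches the goal stated in the question of leaving the Row-based OutputFormat untouched.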

State Schema Evolution with POJOs

I'm using Flink 1.11 with Scala and I have a question regarding state schema evolution using a POJO.
The documentation states that POJOs are supported for state schema evolution (with some limitations).
Are Scala case classes also considered POJOs and therefore supported?
case class WordCount(word: String, count: Int)
Or do I have to write something like this:
class WordCount(var word: String, var count: Int) {
  def this() {
    this(null, -1)
  }
}

Case classes are not POJOs. In particular, they do not satisfy:
The class has a public no-argument constructor
All non-static, non-transient fields in the class (and all superclasses) are either public (and non-final) or have a public getter- and a setter- method that follows the Java beans naming conventions for getters and setters. (afaik case classes have final fields with getters in the generated JVM class)
You can implement everything that is required in a normal Scala class, but your IDE might not support you well. One option is to create the class in Java, let your IDE generate the bean accessors, and then convert it to Scala (or use it directly).
Another option is to add evolution support for case classes with a custom serializer. Flink will eventually provide this. (You could also go ahead and contribute it.)
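To make the two quoted rules concrete, here is a rough Java sketch that checks them by reflection. It is a simplification written for this answer, not Flink's actual type analysis: the names PojoCheckSketch, looksLikePojo and CaseClassLike are made up, only getX/setX accessors are recognised, and CaseClassLike merely imitates the shape described above (final fields, accessor methods, no no-argument constructor).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Simplified illustration of the two quoted rules; not Flink's real check.
final class PojoCheckSketch {

    /** Shaped to satisfy both rules: public no-arg constructor, public non-final fields. */
    public static class WordCount {
        public String word;
        public int count;

        public WordCount() {}
    }

    /** Imitates the described case-class shape: final fields, no no-arg constructor. */
    public static class CaseClassLike {
        private final String word;
        private final int count;

        public CaseClassLike(String word, int count) {
            this.word = word;
            this.count = count;
        }

        public String word() { return word; }

        public int count() { return count; }
    }

    static boolean hasPublicNoArgConstructor(Class<?> c) {
        try {
            c.getConstructor(); // only finds public constructors
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    static boolean hasGetterAndSetter(Class<?> c, Field f) {
        String suffix = Character.toUpperCase(f.getName().charAt(0)) + f.getName().substring(1);
        try {
            c.getMethod("get" + suffix);
            c.getMethod("set" + suffix, f.getType());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    static boolean fieldsAreAccessible(Class<?> c) {
        // Walk the class and its superclasses, skipping static and transient fields.
        for (Class<?> k = c; k != null && k != Object.class; k = k.getSuperclass()) {
            for (Field f : k.getDeclaredFields()) {
                int m = f.getModifiers();
                if (Modifier.isStatic(m) || Modifier.isTransient(m)) {
                    continue;
                }
                boolean publicMutable = Modifier.isPublic(m) && !Modifier.isFinal(m);
                if (!publicMutable && !hasGetterAndSetter(c, f)) {
                    return false;
                }
            }
        }
        return true;
    }

    static boolean looksLikePojo(Class<?> c) {
        return hasPublicNoArgConstructor(c) && fieldsAreAccessible(c);
    }

    public static void main(String[] args) {
        System.out.println(looksLikePojo(WordCount.class));     // true
        System.out.println(looksLikePojo(CaseClassLike.class)); // false
    }
}
```

The second class fails on both counts, which is the situation the answer describes for case classes.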

Datastore query without model class

I recently encountered a situation where one might want to run a datastore query which includes a kind, but the class of the corresponding model is not available (e.g. if it's defined in a module that hasn't been imported yet).
I couldn't find any out-of-the-box way to do this using the google.appengine.ext.db package, so I ended up using the google.appengine.api.datastore.Query class from the low-level datastore API.
This worked fine for my needs (my query only needed to count the number of results, without returning any model instances), but I was wondering if anyone knows of a better solution.
Another approach I've tried (which also worked) was subclassing db.GqlQuery to bypass its constructor. This might not be the cleanest solution, but if anyone is interested, here is the code:
import logging
from google.appengine.ext import db, gql
class ClasslessGqlQuery(db.GqlQuery):
    """
    This subclass of :class:`db.GqlQuery` uses a modified version of ``db.GqlQuery``'s constructor to suppress any
    :class:`db.KindError` that might be raised by ``db.class_for_kind(kindName)``.
    This allows using the functionality :class:`db.GqlQuery` without requiring that a Model class for the query's kind
    be available in the local environment, which could happen if a module defining that class hasn't been imported yet.
    In that case, no validation of the Model's properties will be performed (will not check whether they're not indexed),
    but otherwise, this class should work the same as :class:`db.GqlQuery`.
    """
    def __init__(self, query_string, *args, **kwds):
        """
        **NOTE**: this is a modified version of :class:`db.GqlQuery`'s constructor, suppressing any :class:`db.KindError`s
        that might be raised by ``db.class_for_kind(kindName)``.
        In that case, no validation of the Model's properties will be performed (will not check whether they're not indexed),
        but otherwise, this class should work the same as :class:`db.GqlQuery`.
        Args:
            query_string: Properly formatted GQL query string.
            *args: Positional arguments used to bind numeric references in the query.
            **kwds: Dictionary-based arguments for named references.
        Raises:
            PropertyError if the query filters or sorts on a property that's not indexed.
        """
        from google.appengine.ext import gql
        app = kwds.pop('_app', None)
        namespace = None
        if isinstance(app, tuple):
            if len(app) != 2:
                raise db.BadArgumentError('_app must have 2 values if type is tuple.')
            app, namespace = app
        self._proto_query = gql.GQL(query_string, _app=app, namespace=namespace)
        kind = self._proto_query._kind
        model_class = None
        try:
            if kind is not None:
                model_class = db.class_for_kind(kind)
        except db.KindError, e:
            logging.warning("%s on %s without a model class", self.__class__.__name__, kind, exc_info=True)
        super(db.GqlQuery, self).__init__(model_class)
        if model_class is not None:
            for property, unused in (self._proto_query.filters().keys() +
                                     self._proto_query.orderings()):
                if property in model_class._unindexed_properties:
                    raise db.PropertyError('Property \'%s\' is not indexed' % property)
        self.bind(*args, **kwds)
(also available as a gist)

You could create a temporary class just to do the query. If you use an Expando model, the properties of the class don't need to match what is actually in the datastore.
class KindName(ndb.Expando):
    pass
You could then do:
KindName.query()
If you need to filter on specific properties, then I suspect you'll have to add them to the temporary class.

Why I am able to re-create java.lang package and classes?

I am just playing with package structure, and to my surprise I can shadow the default classes by creating a package and a class with the same names.
For example:
I created a package called java.lang with a class named Boolean. When I import java.lang.Boolean, it's not the JDK's version of Boolean; it's mine. It only shows the methods of Object, which every Java object has.
Why is that? Why am I allowed to create the package java.lang? And the program runs fine.
Another puzzle: if I create a class named Object and try to run the program, I get an exception:
java.lang.SecurityException: Prohibited package name: java.lang
at java.lang.ClassLoader.preDefineClass(Unknown Source)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
Why does this happen? Is this a bug or normal behaviour?

The restriction on java.lang classes is a runtime restriction, not a compile time one.
The JVM actually specifically provides a mechanism for overriding classes in java.lang. You can do it using the -Xbootclasspath command line flag:
-Xbootclasspath:bootclasspath
Specifies a semicolon-separated list of directories, JAR files, and ZIP archives to search for boot class files. These are used in place of the boot class files included in the Java platform JDK.
Applications that use this option for the purpose of overriding a class in rt.jar should not be deployed because doing so would contravene the Java Runtime Environment binary code license.
-Xbootclasspath/a:path
Specifies a semicolon-separated path of directories, JAR files, and ZIP archives to append to the default bootstrap class path.
-Xbootclasspath/p:path
Specifies a semicolon-separated path of directories, JAR files, and ZIP archives to add in front of the default bootstrap class path.
Do not deploy applications that use this option to override a class in rt.jar because this violates the Java Runtime Environment binary code license.
However, as the option descriptions quoted above warn, doing so is a violation of the Oracle Binary Code License Agreement for Java SE and JavaFX Technologies:
D. JAVA TECHNOLOGY RESTRICTIONS. You may not create, modify, or change the behavior of, or authorize your licensees to create, modify, or change the behavior of, classes, interfaces, or subpackages that are in any way identified as "java", "javax", "javafx", "sun", “oracle” or similar convention as specified by Oracle in any naming convention designation. You shall not redistribute the Software listed on Schedule 1.
Apart from the above, you may add whatever class you want to whatever packages you want; it's specifically discussed in the JLS §13.3:
13.3. Evolution of Packages
A new top level class or interface type may be added to a package without breaking compatibility with pre-existing binaries, provided the new type does not reuse a name previously given to an unrelated type.
If a new type reuses a name previously given to an unrelated type, then a conflict may result, since binaries for both types could not be loaded by the same class loader.
Changes in top level class and interface types that are not public and that are not a superclass or superinterface, respectively, of a public type, affect only types within the package in which they are declared. Such types may be deleted or otherwise changed, even if incompatibilities are otherwise described here, provided that the affected binaries of that package are updated together.

Answer to SecurityException related question:
The SecurityException is thrown when your class loader calls the defineClass method and finds that the name of the class being defined (your custom class) begins with "java.".
This happens because you defined your class in a "java.*" package, which the ClassLoader documentation says is not allowed.
defineClass( )
..
The specified name cannot begin with "java.", since all classes in the "java.*" packages can only be defined by the bootstrap class loader. If name is not null, it must be equal to the binary name of the class specified by the byte array "b", otherwise a NoClassDefFoundError will be thrown.
Throws:
..
SecurityException - If an attempt is made to add this class to a package that contains classes that were signed by a different set of certificates than this class, or if name begins with "java.".
To test this, try creating a java.test package and defining a custom class in it (the name doesn't matter; it could be Object or anything else). You will get the same SecurityException in this case as well.
package java.test;
public class Test {
    public static void main(String[] args) {
        System.out.println("This is Test");
    }
}
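The name check can also be triggered directly, without creating a java.test source folder. The sketch below is only an illustration: ProhibitedPackageSketch, ProbeLoader and rejectionMessage are made-up names. It exposes the protected defineClass method through a small subclass and passes an empty byte array, relying on the loader rejecting the name before it looks at the bytes.

```java
// Illustration only: ProbeLoader and rejectionMessage are hypothetical helper names.
final class ProhibitedPackageSketch {

    /** Exposes the protected defineClass method so the name check can be hit directly. */
    static final class ProbeLoader extends ClassLoader {
        Class<?> tryDefine(String binaryName) {
            // The bytes are deliberately empty: a "java." name is rejected
            // before the class file content is parsed.
            return defineClass(binaryName, new byte[0], 0, 0);
        }
    }

    /** Returns the SecurityException message if the name is rejected, otherwise null. */
    static String rejectionMessage(String binaryName) {
        try {
            new ProbeLoader().tryDefine(binaryName);
            return null;
        } catch (SecurityException e) {
            return e.getMessage();
        } catch (LinkageError e) {
            // The name was accepted; the empty byte array is just not a valid class file.
            return null;
        }
    }

    public static void main(String[] args) {
        // Expected to print a message like the one in the question's stack trace.
        System.out.println(rejectionMessage("java.test.Test"));
    }
}
```

Because the check lives in the class loader rather than in the compiler, the Test class above compiles without complaint and only fails when it is loaded.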

This is not a bug.
The behaviour is caused by the following:
When the Java Virtual Machine (JVM) tries to load our class, it recognizes its package name as invalid, and thus a SecurityException is thrown.
The SecurityException indicates that a security violation has occurred, and thus the application cannot be executed.
public class SecurityException
extends RuntimeException
Thrown by the security manager to indicate a security violation.
Please use a different package name. The restriction is not limited to the java.lang package; it covers every java.* package, so you are not permitted to override Java's built-in classes and packages.
By changing this, we can create or override the same package and class:
a/j2ee.core.utilities/src/org/netbeans/modules/j2ee/core/api/support/java/JavaIdentifiers.java
b/j2ee.core.utilities/src/org/netbeans/modules/j2ee/core/api/support/java/JavaIdentifiers.java
if (packageName.startsWith(".") || packageName.endsWith(".")) {// NOI18N
return false;
}
if(packageName.equals("java") || packageName.startsWith("java.")) {//NOI18N
return false;
}
String[] tokens = packageName.split("\\."); //NOI18N
if (tokens.length == 0) {
return Utilities.isJavaIdentifier(packageName);
a/j2ee.core.utilities/test/unit/src/org/netbeans/modules/j2ee/core/api/support/java/JavaIdentifiersTest.java b/j2ee.core.utilities/test/unit/src/org/netbeans/modules/j2ee/core/api/support/java/JavaIdentifiersTest.java
assertFalse(JavaIdentifiers.isValidPackageName(" "));
assertFalse(JavaIdentifiers.isValidPackageName("public"));
assertFalse(JavaIdentifiers.isValidPackageName("int"));
assertFalse(JavaIdentifiers.isValidPackageName("java"));
assertFalse(JavaIdentifiers.isValidPackageName("java.something"));
}

Why you can replace java.lang.Boolean with your own class, but not Object, is simple to explain.
The Object class is the root of every other class you can find, use, or even create. This means that if you were able to override it, not a single class, method, or anything else you want to use would work, since every one of them depends on that root class.
The Boolean class is not the boolean primitive type, but a wrapper class for it. And since nothing depends on it, it is possible to override it.
A better way to understand this is to look at the class hierarchy at http://docs.oracle.com/javase/7/docs/api/overview-tree.html. You will notice that every class in every package depends on the Object class.
So the security exception you encountered is like a "life saver" for your program.
If I've misunderstood your question, others may have a more appropriate answer. :)

Camel - extend Java DSL?

I've got a repeating pattern in my routes - a certain Processor needs the same 3 Headers set every time I call it, so I've got the following code in my routes about 10+ times:
.whatever()
.setHeader("foo1", "bar1")
.setHeader("foo2", "bar2")
.setHeader("foo3", "bar3")
.processRef("processorBazThatNeedsHeaders")
.whatever()
The headers are populated differently every time, so abstracting this out into a subroute doesn't really buy me anything.
What I'd love to be able to do is subclass RouteDefinition to have another method in my DSL that would allow me to do this:
.whatever()
.bazProcessor("bar1", "bar2", "bar3")
.whatever()
and in 'bazProcessor', set the headers and call the processor.
I've tried to do this but it seems that it's only possible with some serious probably-not-future-proof surgery, and it seems that others have had similar luck.
I need them to be set as headers as opposed to passing them as parameters directly to the processor because the values are also used after the processor for routing.
Is there some hidden facility to achieve something like this?

If you subclass RouteDefinition, your extension will only be visible directly after from(...). This could be a limitation if you would like to use the DSL extension, for example, after the filter(...) DSL.
A simpler approach would be to encapsulate the logic in a class that implements the org.apache.camel.Processor interface, and then call an overload of .process(...) or bean(...) in the route to use it. You will actually be very close to a DSL extension if you use a meaningful name for the Processor instance, or for a method that returns that Processor instance. Here is an example of the suggested approach. In the end, your code could look like:
.whatever()
.process(setTheHeadersForBaz)
.whatever()
Just for reference: if you really need to do a DSL, there is a project that extends the Camel DSL based on Groovy. I guess a Scala way based on the Camel Scala DSL could be also an option.

Though slightly off-topic, here is an example of extending the Scala DSL.
We can add methods to the DSL trait via an implicit class.
object DSLImplicits {
  implicit class RichDSL(val dsl: DSL) {
    def get = dsl.setHeader(Exchange.HTTP_METHOD, _ => HttpMethods.GET.name)
    def post = dsl.setHeader(Exchange.HTTP_METHOD, _ => HttpMethods.POST.name)
  }
}
And use it like this.
import DSLImplicits.RichDSL
//----------------------------
from("someWhere")
//Do some processing
.get.to("http://somewhere.com")
More details:
http://siliconsenthil.in/blog/2013/07/11/apache-camel-with-scala-extending-dsl/

So you only set the headers because you want the Processor to have access to those values?
If so then a simple example using a Factory could look like this:
whatever()
.process(BazProcessorFactory.instance("bar1", "bar2", "bar3"))
.whatever()
Where the BazProcessorFactory is just a wrapper around your Processor:
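import org.apache.camel.Exchange;
import org.apache.camel.Processor;
public class BazProcessorFactory {
    // static, so the route can call BazProcessorFactory.instance(...) directly
    public static Processor instance(final String... vals) {
        return new Processor() {
            @Override
            public void process(Exchange exchange) throws Exception {
                // access your array of values here
                System.out.println("Foo1 = " + vals[0]);
            }
        };
    }
}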
