Is it better to use Row or GenericRowData with DataStream API? - apache-flink

I Am working with flink 1.15.2, should i use Row or GenericRowData that inherit RowData for my own data type?, i mostly use streaming api.
Thanks.
Sig.

In general the DataStream API is very flexible when it comes to record types. POJO types might be the most convenient ones. Basically any Java class can be used but you need to check which TypeInformation is extracted via reflection. Sometimes it is necessary to manually overwrite it.
For Row you will always have to provide the types manually as reflection cannot do much based on class signatures.
GenericRowData should be avoided, it is rather an internal class with many caveats (strings must be StringData and array handling is not straightforward). Also GenericRowData becomes BinaryRowData after deserialization. TLDR This type is meant for the SQL engine.

The docs are actually helpful here, I was confused too.
The section at the top titled "All Known Implementing Classes" lists all the implementations. RowData and GenericRowData are described as internal data structures. If you can use a POJO, then great. But if you need something that implements RowData, take a look at BinaryRowData, BoxedWrapperRowData, ColumnarRowData, NestedRowData, or any of the implementations there that aren't listed as internal.
I'm personally using NestedRowData to map a DataStream[Row] into a DataStream[RowData] and I'm not at all sure that's a good idea :) Especially since I can't seem to add a string attribute

Related

What's the difference between using the Visitor pattern and a separate class?

I would like to know what is the difference between the Visitor pattern and using a static method to execute code in separation.
Let's take a look at an example where I might call the Visitor pattern:
new AnalyticsVisitor.accept(myClass);
and this when called in from myClass for example, would move the work into a visitor to execute. It would even garbage collect faster if it's memory intensive.
Now lets take a look at using a simple method to achieve more or less the same thing:
new AnalyticsManager.execute(myClass);
Have I achieved the same thing?
I have code separation.
I can apply this to several data structures
I can add info to legacy code without changing it.
So why use the Visitor pattern instead of just a class (unless for double dispatch)?
This question is still a little confused. I suspect you haven't understood the goal of the Visitor pattern.
As discussed here the visitor pattern is useful when you have complex data structure (such as a parse tree) that is relatively stable (in terms of development), but you want to be able to keep adding new operations on all of its elements. This is clumsy with standard OO techniques.
The technology the visitor pattern is based on is double-dispatch, so when you say "Why use the Visitor pattern unless for double-dispatch?" you are effectively saying "Why use the visitor pattern?"
Your example code only includes the client, so it isn't clear what your new technique actually offers.
The supplied code appears to be backwards for a real visitor pattern. It should be:
my_datastructure.accept(analytics_visitor);
where analytics_visitor inherits from MyDataStructureVisitor, and supplies individual methods for each of the element types that the data structure can hold.
As for the achievements:
"Code separation" is a vague term. The visitor pattern allows the data structure to be defined without all the the operations (putative methods) to be defined. Instead, they can be defined separately - with a cost of poorer encapsulation.)
It isn't clear what it means to apply a visitor pattern to several data structures. Each visitor class is associated with one data structure.
The goal isn't to add 'info' to legacy code. It is to add operations to legacy code.

Can I use OWL API to enforce specific subject-predicate-object relationships?

I am working on a project using RDF data and I am thinking about implementing a data cleanup method which will run against an RDF triples dataset and flag triples which do not match a certain pattern, based on a custom ontology.
For example, I would like to enforce that class http://myontology/A must denote http://myontology/Busing the predicate http://myontology/denotes. Any instance of Class A which does not denote an instance of Class B should be flagged.
I am wondering if a tool such as the OWLReasoner from OWL-API would have the capability to accomplish something like this, if I designed a custom axiom for the Reasoner. I have reviewed the documentation here: http://owlcs.github.io/owlapi/apidocs_4/org/semanticweb/owlapi/reasoner/OWLReasoner.html
It seems to me that the methods available with the Reasoner might not be up for the purpose which I would like to use them for, but I'm wondering if anyone has experience using OWL-API for this purpose, or knows another tool which could do the trick.
Generally speaking, OWL reasoning is not well suited to finding information that's missing in the input and flagging it up: for example, if you create a class that asserts that an instance of A has exactly one denote relation to an instance of B, and have an instance of A that does not, under Open World assumption the reasoner will just assume that the missing statement is not available, not that you're in violation.
It would be possible to detect incorrect denote uses - if, instead of relating to an instance of B, the relation was to an instance of a class disjoint with B. But this seems a different use case than the one you're after.
You can implement code with the OWL API to do this check, but it likely wouldn't benefit from the ability to reason, and given that you're working at the RDF level I'd think an API like Apache Jena might actually work better for you (you won't need to worry if your input file is not OWL compliant, for example).

Advantages of Custom Collaborative Objects over CollaborativeMap?

From reading the docs, and my experience using the 4 built-in collaborative types, these possible advantages come to mind:
If you prefer to have the realtime functionality mixed into your classes, rather than using composition (class contains a Collaborative* field; this is what I'm doing now).
Some of the usual advantages of constructors, with the initializer hook, to ensure that all objects of the class satisfy some property.
Some of the usual advantages of typed objects over untyped ones. It seems you cannot write to fields that you haven't registered, so no bugs caused by mistyping a CollaborativeMap key, or accidentally assigning to a key that was meant for a different CollaborativeMap of a different informal type. The latter has happened to me. If I understand correctly, one could preclude both such bugs statically when using Typescript or Flow.
The onLoaded hook. It's not clear to me why something like this isn't available for the built-in types. Can it be simulated for the built-in types?
The two functionally equivalent (a custom collaborative object is implemented as a CollaborativeMap under the hood), the primary difference is just in the syntax as you point out.
For the onLoaded hook, you can do similar work for built in types in the document onLoaded function.

Data Contract Serializer mandates super class to know about subclass

I got this problem,
"The deserializer has no knowlege of any type that maps to this contract"
After googling, I reached this post
The deserializer has no knowlege of any type that maps to this contract
where the answer says, the base class have to declare "KnownTypes" like
[DataContract, KnownType(typeof(Subclass)) ...],
If I have to declare this in my parent class, [DataContract, KnownType(typeof(Subclass))], doesn't it break the principles of OO Design that parent class doesn't have to know about subclass?
What is the right way of doing this?
The serializer is designed in a way that, if it serializes an object, it should be able to read it back. If you attempt to serialize an object with a declared type of 'Base', but an actual type of 'Derived' (see example below), if you want to be able to read back from the serialized object an instance of 'Derived', you need to somehow annotate the XML that the instance is not of the type of which it was declared.
[DataContract]
public class MyType
{
[DataMember]
public object obj = new Derived();
}
The serialized version of the type would look something like the XML below:
<MyType>
<obj actualType="Derived">
<!-- fields of the derived type -->
</obj>
</MyType>
When the type is being deserialized, the serializer will look at the "actualType" (not actual name) attribute, and it will have to find that type, initialize it, and set its properties. It's a potential security issue to let the serializer (with in Silverlight lives is a trusted assembly and has more "rights" than the normal user code) to create arbitrary type, so that's one reason for limiting the types which can be deserialized. And based on the design of the serializer (if we can serialize it, we should be able to deserialize it), the serialization fails for that reason as well.
Another problem with that is that the serialized data is often used to communicate between different services, in different computers, and possibly with different languages. It's possible (and often it is the case) that you have a class in a namespace in the client which has a similar data contract to a class in the server side, but they have different names and / or reside in different namespaces. So simply adding the CLR type name in the "actualType" attribute won't work in this scenario either (the [KnownType] attribute helps the serialzier map the data contract name / namespace to the actual CLR type). Also, if you're talking to a service in a different language / platform (i.e., Java), CLR type names don't even make sense.
Another more detailed explanation is given at the post http://www.dotnetconsult.co.uk/weblog2/PermaLink,guid,a3775eb1-b441-43ad-b9f1-e4aaba404235.aspx - it talks about [ServiceKnownType] instead of [KnownType], but the principles are the same.
Finally, about your question: does it break that OO principle? Yes, that principle is broken, that's a price to pay for being able to have lose coupling between the client and services in your distributed (service-oriented) application.
Yes it breaks the principles of OO design. This is because SOA is about sharing contracts (the C in ABC of services) and not types, whereas OO is about type hierarchies. Think like this the client for a service may not be even in an OO language but SOA principles can still be applied. How the mapping is done on server side is an implementation issue.

What are the Pros and Cons of having Multiple Inheritance?

What are the pros and cons of having multiple inheritance?
And why don't we have multiple inheritance in C#?
UPDATE
Ok so it is currently avoided because of the issue with clashes resolving which parent method is being called etc. Surely this is a problem for the programmer to resolve. Or maybe this could be resolve simularly as SQL where there is a conflict more information is required i.e. ID might need to become Sales.ID to resolve a conflict in the query.
Here is a good discussion on the pitfalls of multiple inheritance:
Why should I avoid multiple inheritance in C++?
Here is a discussion from the C# team on why they decided not to allow multiple inheritance:
http://blogs.msdn.com/csharpfaq/archive/2004/03/07/85562.aspx
http://dotnetjunkies.com/WebLog/unknownreference/archive/2003/09/04/1401.aspx
It's just another tool in the toolbox. Sometimes, it is exactly the right tool. If it is, having to find a workaround because the language actually prohibits it is a pain and leads to good opportunities to screw it up.
Pros and cons can only be found for a concrete case. I guess that it's quite rare to actually fit a problem, but who are the language designers to decide how I am to tackle a specific problem?
I will give a pro here based on a C++ report-writer I've been converting to REALbasic (which has interfaces but only single-inheritance).
Multiple inheritance makes it easier to compose classes from small mixin base classes that implement functionality and have properties to remember state. When done right, you can get a lot of reuse of small code without having to copy-and-paste similar code to implement interfaces.
Fortunately, REALbasic has extends methods which are like the extension methods recently added to C# in C# 3.0. These help a bit with the problem, especially as they can be applied to arrays. I still ended up with some class hierarchies being deeper as a result of folding in what were previously multiply-inherited classes.
The main con is that if two classes have a method with the same name, the new subclass doesn't know which one to call.
In C# you can do a form of multiple inheritance by including instances of each parent object within the child.
class MyClass
{
private class1 : Class1;
private class2: Class2;
public MyClass
{
class1 = new Class1;
class2 = new Class2;
}
// Then, expose whatever functionality you need to from there.
}
When you inherit from something you are asserting that your class is of that (base) type in every way except that you may implement something slightly differently or add something to it, its actually extremely rare that your class is 2 things at once. Usually it just has behavour common to 2 or more things, and a better way to describe that generally is to have your class implement multiple interfaces. (or possibly encapsulation, depending on your circumstances)
It's one of those help-me-to-not-shoot-myself-in-the-foot quirks, much like in Java.
Although it is nice to extend fields and methods from multiple sources (imagine a Modern Mobile Phone, which inherits from MP3 Players, Cameras, Sat-Navs, and the humble Old School Mobile Phone), clashes cannot be resolved by the compiler alone.

Resources