DoubleValueSerializer and DoubleSerializer - apache-flink

DoubleSerializer and DoubleValueSerializer are all implemented TypeSerializerSingleton interface, they shared same methods, and in DoubleValue, the document shows that, it is a boxed value of java Double.
My question is, since we have DoubleValueSerializer, why we still need DoubleSerializer, what is the design in here?
Thanks for advance!

The DoubleSerializer and DoubleValueSerializer exist because the former serializes java Doubles and the latter serializes DoubleValue instances. These types are different.
The DoubleValue type represents a java Double which implements the Key and Value interface. These interfaces date back to the time when Flink could not directly handle java primitives. There you always had to wrap them in a Value type. Nowadays, there is no necessity anymore to use them directly. However, they are still used internally for some components.

Related

FloatArray vs Array<Float>: why are they different

Apologies if the answer is obvious, but I don't get it. I have a function that accepts a FloatArray so I passed a Array<Float> to it but it rejects it! I thought FloatArray was just another way of creating Array<Float>. What's the difference?
Short answer: one is an array of primitives, the other an array of references to Float objects.
The difference is mostly hidden from you in Kotlin, so to explain it's probably best to go back to Java…
Java has nine basic types (if I've counted correctly). Eight of them hold a value directly: boolean, byte, short, char, int, long, float, and double — those are called ‘primitives’. The other type is a reference, which can point to an instance of an object or array.
Because there are cases when you need to pass one of those primitive values around as an object, Java also provides some objects which simply wrap a primitive value: java.lang.Boolean, java.lang.Byte, and so on. There's one for each primitive type.
Most code uses primitives directly, but sometimes it's handy to be able to pass an object reference. (For one thing, primitives are not nullable, so if you need to support a null, then you'll need an object reference. For another, generic code such as List and the other classes in the collections framework can handle only object references.)
However, object wrappers are less efficient, because each instance is a full object and takes a certain amount of memory (e.g. 16–32 bytes, depending on the Java runtime) — and that's in addition to the size of references to it (perhaps 8 bytes). The JVM caches commonly-used wrappers (e.g. true and false for booleans, and some small numbers), but for anything else you'll be creating new objects on the heap.
The wrappers are clearly distinguished from the primitive types — they're capitalised (and, in the case of Integer, spelled differently). In early versions of Java, they were not interchangeable; you needed to explicitly wrap (e.g. Int(someValue) and unwrap (e.g. someReference.intValue()) when needed. Java 5 added ‘autoboxing’, where in many cases the compiler would do that for you. This blurs the distinction a bit, but most of the time you still need to be aware of it.
One of the benefits of Kotlin is that it removes some of Java's unnecessary complexity. One of the ways it does this is by hiding that distinction almost completely. The Kotlin language has no primitives: everything looks like an object. However, for reasons of efficiency, compiled Kotlin uses primitives ‘under the hood’ where possible. For example:
var i: Int
That declares an Int value — which will be stored as a primitive field. However:
var i: Int?
That declares a reference to an integer wrapper. (That's because primitives are not nullable, and so a primitive can't store a null value.)
This is an implementation detail: most of the time, when you're writing Kotlin, you don't need to be aware of this. But the distinction is still there at runtime, and arrays are one of the rare times it becomes visible:
FloatArray is an array of primitives. It uses the minimum of memory, and interoperates with Java code that uses a float[] type.
Array<Float> is an array of references to Float objects. It's more flexible, and interoperates with Java code that uses a Float[] type.
So you can see that these are two different types, even though they do similar things.
If you're interoperating with existing code, that will control which one you should use. If you're writing new code, then you have the choice: FloatArray is likely to be more efficient and use less memory — but Array<Float> tends to be better supported in other code (which may be able to process all the relevant types just by accepting a generic Array, instead of having to support FloatArray and IntArray and LongArray and all the others).
Some information about arrays in Kotlin is available here: https://kotlinlang.org/docs/basic-types.html#primitive-type-arrays
Kotlin also has classes that represent arrays of primitive types without boxing overhead: ByteArray, ShortArray, IntArray, and so on. These classes have no inheritance relation to the Array class, but they have the same set of methods and properties.
So FloatArray and Array<Float> are not the same, the difference is that the first has no boxing overhead.
Look at how FloatArray is declared in the documentation. It is just another class, not related to the Array<T> class at all. Sure, they represent very similar things, with the difference being that one of them would box Float values, and the other doesn't, as explained by the other answer. But from the perspective of the type system, they are totally unrelated. It's as if I declared:
class A
class B
and tried to pass an instance of A to a parameter expecting a B.
There are builtin methods to convert between these types though:
floatArrayOf(1f,2f,3f).toTypedArray() // FloatArray to Array<Float>
arrayOf(1f,2f,3f).toFloatArray() // Array<Float> to FloatArray
It's just that there is no implicit conversion between them, because these are unrelated types, unlike if you have subclasses and superclasses for example.

Why does a System.Array object have an Add() method?

I fully understand that a System.Array is immutable.
Given that, why does it have an Add() method?
It does not appear in the output of Get-Member.
$a = #('now', 'then')
$a.Add('whatever')
Yes, I know this fails and I know why it fails. I am not asking for suggestions to use [System.Collections.ArrayList] or [System.Collections.Generic.List[object]].
[System.Array] implements [System.Collections.IList], and the latter has an .Add() method.
That Array implements IList, which is an interface that also covers resizable collections, may be surprising - it sounds like there are historical reasons for it[1]
.
In C#, this surprise is hard to stumble upon, because you need to explicitly cast to IList or use an IList-typed variable in order to even access the .Add() method.
By contrast, since version 3, PowerShell surfaces even a type's explicit interface implementations as direct members of a given type's instance. (Explicit interface implementations are those referencing the interface explicitly in their implementation, such as IList.Add() rather than just .Add(); explicit interface implementations are not a direct part of the implementing type's public interface, which is why C# requires a cast / interface-typed variable to access them).
As a byproduct of this design, in PowerShell the .Add() method can be called directly on System.Array instances, which makes it easier to stumble upon the problem, because you may not realize that you're invoking an interface method. In the case of an array, the IList.Add() implementation (rightfully) throws an exception stating that Collection was of a fixed size; the latter is an exception of type NotSupportedException, which is how types implementing an interface are expected to report non-support for parts of an interface.
What helps is that the Get-Member cmdlet and even just referencing a method without invoking it - simply by omitting () - allow you to inspect a method to determine whether it is native to the type or an interface implementation:
PS> (1, 2).Add # Inspect the definition of a sample array's .Add() method
OverloadDefinitions
-------------------
int IList.Add(System.Object value)
As you can see, the output reveals that the .Add() method belongs to the Ilist interface.
[1] Optional reading: Collection-related interfaces in .NET with respect to mutability
Disclaimer: This is not my area of expertise. If my explanation is incorrect / can stand improvement, do tell us.
The root of the hierarchy of collection-related interfaces is ICollection (non-generic, since v1) and ICollection<T> (generic, since v2).
(They in turn implement IEnumerable / IEnumerable<T>, whose only member is the .GetEnumerator() method.)
While the non-generic ICollection interface commendably makes no assumptions about a collection's mutability, its generic counterpart (ICollection<T>) unfortunately does - it includes methods for modifying the collection (the docs even state the interface's purpose as "to manipulate generic collections" (emphasis added)). In the non-generic v1 world, the same had happened, just one level below: the non-generic IList includes collection-modifying methods.
By including mutation methods in these interfaces, even read-only/fixed-size lists/collections (those whose number and sequence of elements cannot be changed, but their element values may) and fully immutable lists/collections (those that additionally don't allow changing their elements' values) were forced to implement the mutating methods, while indicating non-support for them with NotSupportedException exceptions.
While read-only collection implementations have existed since v1.1 (e.g, ReadOnlyCollectionBase), in terms of interfaces it wasn't until .NET v4.5 that IReadOnlyCollection<T> and IImmutableList<T> were introduced (with the latter, along with all types in the System.Collections.Immutable namespace, only available as a downloadable NuGet package).
However, since interfaces that derive from (implement) other interfaces can never exclude members, neither IReadOnlyCollection<T> nor IImmutableCollection<T> can derive from ICollection<T> and must therefore derive directly from the shared root of enumerables, IEnumerable<T>.
Similarly, more specialized interfaces such as IReadOnlyList<T> that implement IReadOnlyCollection<T> can therefore not implement IList<T> and ICollection<T>.
More fundamentally, starting with a clean slate would offer the following solution, which reverses the current logic:
Make the major collection interfaces mutation-agnostic, which means:
They should neither offer mutation methods,
nor should they make any guarantees with respect to immutability.
Create sub-interfaces that:
add members depending on the specific level of mutability.
make immutability guarantees, if needed.
Using the example of ICollection and IList, we'd get the following interface hierarchy:
IEnumerable<T> # has only a .GetEnumerator() method
ICollection<T> # adds a .Count property (only)
IResizableCollection<T> # adds .Add/Clear/Remove() methods
IList<T> # adds read-only indexed access
IListMutableElements<T> # adds writeable indexed access
IResizableList<T> # must also implement IResizableCollection<T>
IResizableListMutableElements<T> # adds writeable indexed access
IImmutableList<T> # guarantees immutability
Note: Only the salient methods/properties are mentioned in the comments above.
Note that these new ICollection<T> and IList<T> interfaces would offer no mutation methods (no .Add() methods, ..., no assignable indexing).
IImmutableList<T> would differ from IList<T> by guaranteeing full immutability (and, as currently, offer mutation-of-a-copy-only methods).
System.Array could then safely and fully implement IList<T>, without consumers of the interface having to worry about NotSupportedExceptions.
To "Add" to #mklement0's answer: [System.Array] implements [System.Collections.IList] which specifies an Add() method.
But to answer why have an Add() if it doesn't work? Well, we haven't looked at the other properties, i.e. IsFixedSize :
PS > $a = #('now', 'then')
PS > $a.IsFixedSize
True
So, a [System.Array] is just a [System.Collections.IList] that is a Fixed Size. When we look back at the Add() method, it explicitly defines that if the List is Read-Only or Fixed Size, throw NotSupportedException which it does.
I believe the essence is not, "Let's have a function that just throws an error message for no reason", or to expand on it, No other reason than to fulfill an Interface, but it actually is providing a warning that you are legitimately doing something that you shouldn't do.
It's the typical Interface ideas, you can have an IAnimal type, with an GetLeg() method. This method would be used 90% of all animals, which makes it a good reason for implementing into the base Interface, but would throw an error when you use it against a Snake object because you didn't first check the .HasFeet property first.
The Add() method is a really good method for a List Interface, because it is an essential method for Non-Readonly and Non-Fixed length lists. We are the ones being stupid by not checking that the list is not IsFixedSize before calling an Add() method that would not work. i.e. this falls into the category of $null checks before trying to use things.

Passing array of bytes to an ActiveX

I want to pass a array of bytes to ActiveX. I am using delphi 7 and i'm using a InProcess Server (DLL).
I am using a pointer to the array of bytes and the size of the array, passing it to the InProcess Server. It is working well. I did this because I need performance. Does anyone see any trouble in this approach?
I see a post that is very similar: What data type is suitable to handle binary data in ActiveX method? but nobody gave this answer.
Passing the byte array as pointer together with size information is just fine.
However, some programming languages support only a small subset of all possible types. For example, Visual Basic for Application (not VB.NET) can only handle Automation compatible data types (see http://msdn.microsoft.com/en-us/library/cc237562(v=prot.20).aspx), and even not all of them (no support for 16bit unsigned integers, for example). To be on the safe side, I always use SAFEARRAYs whenever there is no good argument against it.
Also note that using non-automation compatible interfaces forces you to provide your own marshalling code in case you wanted to use your component OutProc. Since you mention that you intend to use your component only InProc, this should not worry you.
Regards,
Stuart

TCL/C - when is setFromAnyProc() called

I am creating a new TCL_ObjType and so I need to define the 4 functions, setFromAnyProc, updateStringProc, dupIntRepProc and freeIntRepProc. When it comes to test my code, I see something interesting/mystery.
In my testing code, when I do the following:
Tcl_GetString(p_New_Tcl_obj);
updateStringProc() for the new TCL object is called, I can see it in gdb, this is expected.
The weird thing is when I do the following testing code:
Tcl_SetStringObj(p_New_Tcl_obj, p_str, strlen(p_str));
I expect setFromAnyProc() is called, but it is not!
I am confused. Why it is not called?
The setFromAnyProc is not nearly as useful as you might think. It's role is to convert a value[*] from something with a populated bytes field into something with a populated bytes field and a valid internalRep and typePtr. It's called when something wants a generic conversion to a particular format, and is in particular the core of the Tcl_ConvertToType function. You probably won't have used that; Tcl itself certainly doesn't!
This is because it turns out that the point when you want to do the conversion is in a type-specific accessor or manipulator function (examples from Tcl's API include Tcl_GetIntFromObj and Tcl_ListObjAppendElement, which are respectively an accessor for the int type[**] and a manipulator for the list type). At that point, you're in code that has to know the full details of the internals of that specific type, so using a generic conversion is not really all that useful: you can do the conversion directly if necessary (or factor that out to a conversion function).
Tcl_SetStringObj works by throwing away the internal representation of your object (with the freeIntRepProc callback), disposing of the old bytes string representation (through Tcl_InvalidateStringRep, or rather its internal analog) and then installing the new bytes you've supplied.
I find that I can leave the setFromAnyProc field of a Tcl_ObjType set to NULL with no problems.
[*] The Tcl_Obj type is mis-named for historic reasons. It's a value. Tcl_Value was taken for something else that's now obsolete and virtually unused.
[**] Integers are actually represented by a cluster of internal types, depending on the number of bits required. You don't need to know the details if you're just using them, as the accessor functions completely hide the complexity.

Array vs ArraySeq comparison

This is a bit of a general question but I was wondering if anybody could advise me on what would be advantages of working with Array vs ArraySeq. From what I have seen Array is scala's representation of java Array and there are not too many members in its API whereas ArraySeq seems to contain a much richer API.
There are actually four different classes you could choose from to get mutable array-like functionality.
Array + ArrayOps
WrappedArray
ArraySeq
ArrayBuffer
Array is a plain old Java array. It is by far the best way to go for low-level access to arrays of primitives. There's no overhead. Also it can act like the Scala collections thanks to implicit conversion to ArrayOps, which grabs the underlying array, applies the appropriate method, and, if appropriate, returns a new array. But since ArrayOps is not specialized for primitives, it's slow (as slow as boxing/unboxing always is).
WrappedArray is a plain old Java array, but wrapped in all of Scala's collection goodies. The difference between it and ArrayOps is that WrappedArray returns another WrappedArray--so at least you don't have the overhead of having to re-ArrayOps your Java primitive array over and over again for each operation. It's good to use when you are doing a lot of interop with Java and you need to pass in plain old Java arrays, but on the Scala side you need to manipulate them conveniently.
ArraySeq stores its data in a plain old Java array, but it no longer stores arrays of primitives; everything is an array of objects. This means that primitives get boxed on the way in. That's actually convenient if you want to use the primitives many times; since you've got boxed copies stored, you only have to unbox them, not box and unbox them on every generic operation.
ArrayBuffer acts like an array, but you can add and remove elements from it. If you're going to go all the way to ArraySeq, why not have the added flexibility of changing length while you're at it?
From the scala-lang.org forum:
Array[T] - Benefits: Native, fast -
Limitations: Few methods (only apply,
update, length), need to know T at
compile-time, because Java bytecode
represents (char[] different from
int[] different from Object[])
ArraySeq[T] (the class formerly known
as GenericArray[T]): - Benefits: Still
backed by a native Array, don't need
to know anything about T at
compile-time (new ArraySeq[T] "just
works", even if nothing is known about
T), full suite of SeqLike methods,
subtype of Seq[T] - Limitations: It's
backed by an Array[AnyRef], regardless
of what T is (if T is primitive, then
elements will be boxed/unboxed on
their way in or out of the backing
Array)
ArraySeq[Any] is much faster than
Array[Any] when handling primitives.
In any code you have Array[T], where T
isn't <: AnyRef, you'll get faster
performance out of ArraySeq.
Array is a direct representation of Java's Array, and uses the exact same bytecode on the JVM.
The advantage of Array is that it's the only collection type on the JVM to not undergo type erasure, Arrays are also able to directly hold primitives without boxing, this can make them very fast under some circumstances.
Plus, you get Java's messed up array covariance behaviour. (If you pass e.g. an Array[Int] to some Java class it can be assigned to a variable of type Array[Object] which will then throw an ArrayStoreException on trying to add anything that isn't an int.)
ArraySeq is rarely used nowadays, it's more of a historic artifact from older versions of Scala that treated arrays differently. Seeing as you have to deal with boxing anyway, you're almost certain to find that another collection type is a better fit for your requirements.
Otherwise... Arrays have exactly the same API as ArraySeq, thanks to an implicit conversion from Array to ArrayOps.
Unless you have a specific need for the unique properties of arrays, try to avoid them too.
See This Talk at around 19:30 or This Article for an idea of the sort of problems that Arrays can introduce.
After watching that video, it's interesting to note that Scala uses Seq for varargs :)
As you observed correctly, ArraySeq has a richer API as it is derived from IndexedSeq (and so on) whereas Array is a direct representation of Java arrays.
The relation between the both could be roughly compared to the relation of the ArrayList and arrays in Java.
Due to it's API, I would recommend using the ArraySeq unless there is a specific reason not to do so. Using toArray(), you can convert to an Array any time.

Resources