Why is NFSv3 stateless and NFSv4 stateful? - filesystems

Is there any specific reason why the authors chose to make NFSv3 stateless and NFSv4 stateful?

NFSv4 is explicitly stateful, in contrast with earlier versions of NFS, which are stateless in principle but rely on an auxiliary stateful protocol (NLM) for file locking. Among other things, this means that file-locking operations are part of the NFSv4 protocol proper, eliminating the need for the separate rpc.statd and rpc.lockd daemons.
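As a hedged illustration (the mount path and file name here are hypothetical), here is a client-side lock taken through the standard Java file-locking API. On an NFSv4 mount the kernel carries this as a LOCK operation inside the NFSv4 protocol itself; on NFSv3 the same request would be routed through the separate NLM machinery:

    import java.io.RandomAccessFile;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;

    public class NfsLockDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical path on an NFS mount.
            try (RandomAccessFile file = new RandomAccessFile("/mnt/nfs/shared.dat", "rw");
                 FileChannel channel = file.getChannel();
                 // Blocking exclusive lock; on NFSv4 this travels in the
                 // protocol's own LOCK operation, with no rpc.lockd involved.
                 FileLock lock = channel.lock()) {
                // ... critical section over the shared file ...
            }
        }
    }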

Related

How does Flink handle unused keyed state fields when we update our job?

We have a job in which all user features and information are stored in keyed state. Each user feature corresponds to a state descriptor. But we are evolving our features, so some features are abandoned in the next release/version, because we no longer declare the abandoned feature's state descriptor in our code. My question is: how does Flink take care of that abandoned state? Will it simply no longer restore the abandoned state automatically?
If you are using Flink POJOs or Avro types, then Flink will automatically migrate the types and state for you. Otherwise, it will not, and you could implement a custom serializer instead. Or you could use the State Processor API to clean things up.
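To make the "descriptor no longer declared" scenario concrete, here is a minimal sketch of how such a per-feature state descriptor is typically declared (the "featureA" name is hypothetical). If the next release drops this declaration, the corresponding state has no owner in the new job:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // Hypothetical per-user feature state. If "featureA" is removed from
    // the next release, its entries still exist in old savepoints until
    // migrated or cleaned up (e.g. with the State Processor API).
    public class FeatureFunction extends KeyedProcessFunction<String, String, String> {
        private transient ValueState<Long> featureA;

        @Override
        public void open(Configuration parameters) {
            featureA = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("featureA", Long.class));
        }

        @Override
        public void processElement(String event, Context ctx, Collector<String> out)
                throws Exception {
            Long previous = featureA.value();
            featureA.update(previous == null ? 1L : previous + 1);
            out.collect(event);
        }
    }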

Apache Flink Stateful Functions state scaling

We are able to scale stateless functions as much as we want, but stateful functions are likely to become a bottleneck if we don't scale them along with the stateless ones. Scaling state seems tricky because of how the data is distributed across nodes. If stateful functions become a bottleneck, can we scale them too?
Stateful functions are scaled in exactly the same way as stateless ones.
In StateFun, the remote functions are in fact stateless processes that receive the state just prior to the invocation for a given key; after a successful invocation, the changes to the state are communicated back to the Flink cluster. Therefore scaling stateless and stateful functions is really the same.
To learn more, I'd recommend watching the keynote introducing StateFun 2.0 or visiting the distributed architecture page.
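As a small sketch of why this works (the function and state names are hypothetical, written against the StateFun Java SDK for remote functions): the process holds no state of its own; the value arrives with the invocation and any update is shipped back to the cluster afterwards.

    import java.util.concurrent.CompletableFuture;
    import org.apache.flink.statefun.sdk.java.Context;
    import org.apache.flink.statefun.sdk.java.StatefulFunction;
    import org.apache.flink.statefun.sdk.java.ValueSpec;
    import org.apache.flink.statefun.sdk.java.message.Message;

    // A remote function is a stateless process: StateFun delivers the state
    // for the addressed key along with each invocation and persists any
    // changes afterwards, so it scales like any stateless service.
    public class GreeterFn implements StatefulFunction {
        static final ValueSpec<Integer> SEEN = ValueSpec.named("seen").withIntType();

        @Override
        public CompletableFuture<Void> apply(Context context, Message message) {
            int seen = context.storage().get(SEEN).orElse(0);
            context.storage().set(SEEN, seen + 1);
            return context.done();
        }
    }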

Is global state with multiple workers possible in Flink?

Everywhere in the Flink docs I see that state is local to a map function and a worker. This seems powerful in a standalone setup, but what if Flink runs in a cluster? Can Flink handle a global state where all workers could add data and query it?
From the Flink article on state:
For high throughput and low latency in this setting, network communications among tasks must be minimized. In Flink, network communication for stream processing only happens along the logical edges in the job’s operator graph (vertically), so that the stream data can be transferred from upstream to downstream operators.
However, there is no communication between the parallel instances of an operator (horizontally). To avoid such network communication, data locality is a key principle in Flink and strongly affects how state is stored and accessed.
I think that Flink only supports state on operators and state on keyed streams; if you need some kind of global state, you have to store and recover data in some kind of database/file system/shared memory and combine that data with your stream.
Anyway, in my experience, with a good processing-pipeline design and the right data partitioning, in most cases you should be able to apply divide-and-conquer algorithms or MapReduce strategies to achieve what you need (see the sketch at the end of this answer).
If you introduce some kind of global state into your system, that global state can become a serious bottleneck, so try to avoid it at all costs.
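For instance, a minimal sketch of the partitioning advice (the events stream and getUserId accessor are hypothetical): instead of one global counter, key the stream so each parallel instance owns a disjoint slice of the state and no cross-worker communication is needed.

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // Per-user counts: keyBy partitions the state so each parallel
    // instance maintains only its own keys, avoiding a global bottleneck.
    DataStream<Tuple2<String, Long>> counts = events
            .map(e -> Tuple2.of(e.getUserId(), 1L))
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            .keyBy(t -> t.f0)
            .sum(1);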

Does Apache Flink checkpointing need to be used with stateful functions?

Does the Apache Flink checkpointing feature need to be used with stateful functions?
You don't need to. If your functions do not have state, nothing will be checkpointed. But be aware that certain built-in functions have state of their own, e.g. the FlinkKafkaConsumer.
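For reference, checkpointing is opt-in; a minimal sketch of turning it on (the interval here is an arbitrary choice):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Nothing is snapshotted unless checkpointing is enabled explicitly;
    // here a checkpoint is triggered every 10 seconds.
    env.enableCheckpointing(10_000);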

How to share a static field across workers (JVMs) in Storm?

I have a static field (a counter) in one of my bolts. Now if I run the topology on a cluster with several workers, each JVM will have its own copy of the static field. But I want a field that can be shared across workers. How can I accomplish that?
I know I can persist the counter somewhere, read it in each JVM, and update it (synchronously), but that would be a performance issue. Is there any way to do this through Storm?
The default Storm API only provides an at-least-once processing guarantee. This may or may not be what you want, depending on whether the accuracy of the counter matters (e.g., when a tuple is reprocessed due to a worker failure, the counter is incremented incorrectly).
I think you can look into Trident, a high-level API for Storm that can provide an exactly-once processing guarantee as well as abstractions (e.g., Trident State) for persisting state in databases or in-memory stores; a minimal sketch follows the links below. Cassandra has a column type for counters, which might suit your use case.
Persisting Trident state in memcached: https://github.com/nathanmarz/trident-memcached
Persisting Trident state in Cassandra: https://github.com/Frostman/trident-cassandra
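A minimal Trident sketch (the spout and field names are hypothetical, and the in-memory MemoryMapState stands in for a memcached- or Cassandra-backed state factory): persistentAggregate keeps the counter as managed, exactly-once state instead of a static field.

    import org.apache.storm.trident.TridentTopology;
    import org.apache.storm.trident.operation.builtin.Count;
    import org.apache.storm.trident.testing.MemoryMapState;
    import org.apache.storm.tuple.Fields;

    TridentTopology topology = new TridentTopology();
    topology.newStream("counter-spout", spout)   // spout is assumed
            .groupBy(new Fields("word"))
            // Exactly-once counter state managed by Trident; swap the
            // in-memory factory for a memcached/Cassandra-backed one.
            .persistentAggregate(new MemoryMapState.Factory(), new Count(),
                                 new Fields("count"));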
