Is Redis 6.0 multi-threading supported, with compatibility for Redis 3.0, 4.0, and 5.0 transactions and subscriptions?
I am using Flink 1.13.1 and we plan to ingest FTP-based sources. Since Flink relies on Hadoop dependencies to support such operations, I need to understand whether Flink supports Hadoop 3.x. The documentation doesn't mention anything about this.
Below is a slide about Flink's optimizer from a presentation I watched. I'm particularly confused about the comment that Flink's optimizer decides on parallelism depending on the cardinalities of the provided dataset.
I'm currently going through the Flink 1.4 (the version I'm using) documentation and I can't seem to find any documentation regarding Flink's decision on parallelism. Do I need to provide Flink's optimizer with statistics about the datasets in order to take advantage of this feature?
On a related note, I thought that by specifying a maxParallelism value, this potentially would enable Flink to dynamically determine what level of parallelism would be appropriate for the provided dataset automatically (as detailed above). However, I'm unable to specify max parallelism as specified by the Flink 1.4 documentation, which is why I haven't been able to verify my hypothesis. For some context, I am using the DataSet API. How do I specify max parallelism in Flink?
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
env.setMaxParallelism(20); // can't seem to call this method on env
Not sure where you found this presentation but it is quite old, probably 2014 or early 2015.
The slide discusses the optimizer of Flink's DataSet API. The optimizer is not used to optimize DataStream API programs. On the other hand, the setting of the maximum parallelism is only applicable for DataStream API programs but not for DataSet programs.
The quoted sentence is under the bullet point "Goal: efficient execution plans for data processing plans". Not all of its subpoints have been implemented; automatic configuration of execution parallelism is one of those that has not.
The roadmap of the Flink community includes the plan to integrate the DataSet API into the DataStream API and drop the optimizer. Flink's Table API / SQL will continue to have a cost-based optimizer (based on Apache Calcite) and might also configure the execution parallelism in the future.
Quote from javadoc on StreamExecutionEnvironment.setMaxParallelism:
The maximum degree of parallelism specifies the upper limit for dynamic scaling.
Which dynamic scaling exactly is meant here? I couldn't find any empirical evidence of operator auto scaling: however many free slots you have, however large maxParallelism is, and however many logical partitions there are, the actual parallelism (according to the web UI) is always the one that was set through setParallelism.
Also, according to this accepted and so far unchallenged answer, https://stackoverflow.com/a/43493109/2813148,
there's no such thing as dynamic scaling in Flink.
So is there any? Or is the javadoc misleading (or what does "dynamic" mean there)?
If there's none, are there any plans for this feature?
Flink (in version 1.5.0) does not support dynamic scaling yet.
However, a job can be scaled manually (or by an external service) by taking a savepoint, stopping the running job, and restarting the job with an adjusted (smaller or larger) parallelism. The new parallelism can be at most the previously configured max-parallelism. Once a job has been started, the max-parallelism is baked into the savepoints and cannot be changed anymore.
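To illustrate why the max-parallelism acts as a fixed upper limit for rescaling, here is a minimal sketch of the key-group idea in plain Java. It is a simplified model, not Flink's own code: the class `KeyGroupSketch` and its method names are hypothetical, and the sketch uses a plain `hashCode()` modulo where Flink applies an additional hash. The point is only that keyed state is split into max-parallelism many groups, and a restart with a new parallelism redistributes whole groups across operator instances.

```java
// Simplified, hypothetical model of key groups; these are not Flink classes.
class KeyGroupSketch {

    // Assign a key to one of maxParallelism key groups.
    // Assumption: plain hashCode() modulo; Flink hashes the hashCode again.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Assign a key group to the parallel operator instance that owns it.
    // Each instance receives a contiguous range of key groups.
    static int operatorIndexFor(int keyGroup, int parallelism, int maxParallelism) {
        if (parallelism < 1 || parallelism > maxParallelism) {
            throw new IllegalArgumentException(
                "parallelism must be between 1 and the max parallelism");
        }
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 8; // fixed when the job is first started
        // Restarting with a different parallelism only changes group ownership.
        for (int parallelism : new int[] {2, 4, 8}) {
            StringBuilder owners = new StringBuilder();
            for (int keyGroup = 0; keyGroup < maxParallelism; keyGroup++) {
                owners.append(operatorIndexFor(keyGroup, parallelism, maxParallelism))
                      .append(' ');
            }
            System.out.println("parallelism " + parallelism
                + " -> owner of key groups 0..7: " + owners.toString().trim());
        }
    }
}
```

Because a key's group depends only on the key and the max-parallelism, changing the max-parallelism would move keys between groups and invalidate the state stored in a savepoint, which is consistent with it being fixed once the job has started.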
Support for dynamic scaling is on the roadmap. Since version 1.5.0 (released in May 2018), Flink supports dynamic resource allocation from resource managers such as YARN and Mesos. This is an important step towards dynamic scaling. In fact, an experimental version of this feature was demonstrated at Flink Forward SF 2018 in April 2018.
Both are streaming frameworks that process one event at a time. What are the core architectural differences between these two technologies?
Also, what are some particular use cases where one is more appropriate than the other?
As you mentioned, both are streaming platforms which do in-memory computation in real time. But there are some architectural differences when you take a closer look.
Apex has a YARN-native architecture: it fully utilises YARN for scheduling, security and multi-tenancy, whereas Flink integrates with YARN. Apex can do resource allocation at the operator (container) level with YARN.
Partitioning: Apex supports several sophisticated stream partitioning schemes and also allows controlling operator locality & stream locality. Flink supports simple hash partitions and custom partitions.
Apex allows dynamic changes to topology without having to take down the application. Apex allows the application to be updated at runtime so you can add and remove operators, update properties of operators, or automatically scale the application at runtime. Apache Flink does not support any of these capabilities.
Buffer Server: There is a message bus called the buffer server between operators. Subscribers can connect to the buffer server and fetch data from particular offsets. It is window aware, and holds data until no subscriber needs it anymore.
Fault tolerance: Apex has an incremental recovery model. On failure, only part of the topology needs to be restarted, with no need to go back to the source, whereas Flink goes back to the source.
Apex has a high-level API as well as a low-level API. Flink only has a high-level API.
Apex has a library called Apache Malhar, which has a wide variety of well-tested connectors and processing operators that can be reused easily.
Lastly, Apex is more focused on productizing big data applications, so it has many features that help with easy development and maintenance of applications.
Note: I am a committer to Apache Apex, so I might sound biased to Apex :)
If a client connects to the application over SSL 3.0, I do not want to display any page.
I am a learner of English and GAE.
If my English is hard to understand, I'm sorry.
It seems the Google security team has already taken the necessary steps.
According to their recent security bulletin,
App Engine, Cloud Storage, BigQuery, and CloudSQL customers do not need to take any actions. Google’s servers have been updated and are protected from this vulnerability. Customers of Compute Engine need to update their OS images.
I am not sure exactly what that means, but presumably SSLv3 fallback connections are disabled now.
Update
Okay, apparently they did not disable SSLv3 completely, but provide a more secure fallback. From a blog post:
Disabling SSL 3.0 support, or CBC-mode ciphers with SSL 3.0, is sufficient to mitigate this issue, but presents significant compatibility problems, even today. Therefore our recommended response is to support TLS_FALLBACK_SCSV. This is a mechanism that solves the problems caused by retrying failed connections and thus prevents attackers from inducing browsers to use SSL 3.0. It also prevents downgrades from TLS 1.2 to 1.1 or 1.0 and so may help prevent future attacks.
Google Chrome and our servers have supported TLS_FALLBACK_SCSV since February and thus we have good evidence that it can be used without compatibility problems.
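To make the quoted mechanism a bit more concrete, here is a rough sketch of the server-side decision in plain Java. It is not a real TLS stack and not App Engine code: the class `FallbackScsvSketch` and its method are hypothetical. The cipher suite value 0x5600 and the rule itself follow RFC 7507, where TLS_FALLBACK_SCSV was later standardized. A client that retries a failed handshake with a lower protocol version adds the signalling value to its cipher suite list, and a server that supports a higher version then rejects the handshake instead of silently accepting the downgrade.

```java
// Hypothetical model of the server-side TLS_FALLBACK_SCSV check (RFC 7507).
class FallbackScsvSketch {

    // Signalling cipher suite value assigned to TLS_FALLBACK_SCSV.
    static final int TLS_FALLBACK_SCSV = 0x5600;

    // Protocol versions as they appear on the wire (major << 8 | minor).
    static final int SSL_3_0 = 0x0300;
    static final int TLS_1_0 = 0x0301;
    static final int TLS_1_1 = 0x0302;
    static final int TLS_1_2 = 0x0303;

    // Returns true when the server should abort the handshake with an
    // inappropriate_fallback alert: the client signals a fallback retry,
    // yet the server supports a higher version than the client offered.
    static boolean isInappropriateFallback(int clientVersion,
                                           int[] offeredCipherSuites,
                                           int serverMaxVersion) {
        boolean signalsFallback = false;
        for (int suite : offeredCipherSuites) {
            if (suite == TLS_FALLBACK_SCSV) {
                signalsFallback = true;
                break;
            }
        }
        return signalsFallback && clientVersion < serverMaxVersion;
    }

    public static void main(String[] args) {
        int aesCbcSuite = 0x002F; // TLS_RSA_WITH_AES_128_CBC_SHA

        // Downgraded retry to SSL 3.0 against a TLS 1.2 server.
        System.out.println(isInappropriateFallback(
            SSL_3_0, new int[] {aesCbcSuite, TLS_FALLBACK_SCSV}, TLS_1_2));

        // Genuine SSL 3.0-only client that never signals a fallback.
        System.out.println(isInappropriateFallback(
            SSL_3_0, new int[] {aesCbcSuite}, TLS_1_2));
    }
}
```

Note that a client which only speaks SSL 3.0 and sends no signalling value is still accepted in this model, which matches the quoted point that SSL 3.0 is not disabled outright; only attacker-induced downgrades are blocked.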