I need to use the same HashMap from different threads (the EDT, a Timer, and maybe the network thread). Since Codename One doesn't have a ConcurrentHashMap implementation in its API, I tried to work around the issue like this:
/**
 * ConcurrentHashMap emulation for Codename One.
 */
public class ConcurrentHashMap<K, V> {

    private static final EasyThread thread = EasyThread.start("concurrentHashMap_Thread");
    private final Map<K, V> map = new HashMap<>();

    public V put(K key, V value) {
        return thread.run(() -> map.put(key, value));
    }

    public V get(K key) {
        return thread.run(() -> map.get(key));
    }

    public V remove(K key) {
        return thread.run(() -> map.remove(key));
    }
}
I have two questions:
Is code like the above actually necessary in Codename One when accessing a Map from two or more threads, or is it better to use a HashMap directly?
What do you think about that code? I tried to keep it as simple as possible, but I'm not sure of its correctness.
That code is correct, but it's also very inefficient: you're acquiring locks and hopping to a separate thread just to add an object to a map. You're effectively bringing a tank to a problem that's far simpler.
The TL;DR is this:
map = Collections.synchronizedMap(map);
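For instance, a minimal sketch of a shared map built that way (the class and method names are just for illustration); the only operation that still needs an explicit synchronized block on the wrapper is iteration:
public class SharedCache {

    // One shared map; every put/get/remove locks on the wrapper internally.
    private static final Map<String, Object> MAP =
            Collections.synchronizedMap(new HashMap<>());

    public static void store(String key, Object value) {
        MAP.put(key, value);   // safe from the EDT, a timer, or the network thread
    }

    public static Object fetch(String key) {
        return MAP.get(key);
    }

    public static void forEachEntry() {
        // Iteration still needs an explicit lock on the wrapper.
        synchronized (MAP) {
            for (Map.Entry<String, Object> e : MAP.entrySet()) {
                // ...
            }
        }
    }
}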
Your code could have been written like this and would have been more efficient:
public class ConcurrentHashMap<K, V> {

    private final Map<K, V> map = new HashMap<>();

    public synchronized V put(K key, V value) {
        return map.put(key, value);
    }

    public synchronized V get(K key) {
        return map.get(key);
    }

    public synchronized V remove(K key) {
        return map.remove(key);
    }
}
But you obviously don't need it since we have the synchronizedMap method...
So why don't we have ConcurrentHashMap?
Because we don't run on huge servers. See this article: https://crunchify.com/hashmap-vs-concurrenthashmap-vs-synchronizedmap-how-a-hashmap-can-be-synchronized-in-java/
The TL;DR of that article is that ConcurrentHashMap makes sense when you have a lot of threads and a lot of server cores; it goes the extra mile to limit synchronization for maximum scale. On mobile, that's huge overkill.
We are migrating a Spark job to Flink. We used pre-shuffle aggregation in Spark; is there a way to execute a similar operation in Flink? We are consuming data from Apache Kafka and using a keyed tumbling window to aggregate the data, and we want to aggregate the data in Flink before performing the shuffle.
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
Yes, it is possible, and I will describe three ways: first, what is already built into the Flink Table API; second, building your own pre-aggregate operator; and third, a dynamic pre-aggregate operator which adjusts the number of events to pre-aggregate before the shuffle phase.
Flink Table API
As shown here, you can do MiniBatch Aggregation or Local-Global Aggregation; the second option is better. You basically tell Flink to create mini-batches of 5000 events and pre-aggregate them before the shuffle phase.
// instantiate table environment
TableEnvironment tEnv = ...
// access flink configuration
Configuration configuration = tEnv.getConfig().getConfiguration();
// set low-level key-value options
configuration.setString("table.exec.mini-batch.enabled", "true");
configuration.setString("table.exec.mini-batch.allow-latency", "5 s");
configuration.setString("table.exec.mini-batch.size", "5000");
configuration.setString("table.optimizer.agg-phase-strategy", "TWO_PHASE");
Flink Stream API
This way is more cumbersome because you have to create your own operator using OneInputStreamOperator and call it using doTransform(). Here is an example of the BundleOperator:
public abstract class AbstractMapStreamBundleOperator<K, V, IN, OUT>
        extends AbstractUdfStreamOperator<OUT, MapBundleFunction<K, V, IN, OUT>>
        implements OneInputStreamOperator<IN, OUT>, BundleTriggerCallback {

    @Override
    public void processElement(StreamRecord<IN> element) throws Exception {
        // get the key and value for the map bundle
        final IN input = element.getValue();
        final K bundleKey = getKey(input);
        final V bundleValue = this.bundle.get(bundleKey);

        // get a new value after adding this element to the bundle
        final V newBundleValue = userFunction.addInput(bundleValue, input);

        // update the map bundle
        bundle.put(bundleKey, newBundleValue);

        numOfElements++;
        bundleTrigger.onElement(input);
    }

    @Override
    public void finishBundle() throws Exception {
        if (!bundle.isEmpty()) {
            numOfElements = 0;
            userFunction.finishBundle(bundle, collector);
            bundle.clear();
        }
        bundleTrigger.reset();
    }
}
The callback interface defines when you are going to trigger the pre-aggregate. Every time the stream reaches the bundle limit at if (count >= maxCount), your pre-aggregate operator emits events to the shuffle phase:
public class CountBundleTrigger<T> implements BundleTrigger<T> {

    private final long maxCount;
    private transient BundleTriggerCallback callback;
    private transient long count = 0;

    public CountBundleTrigger(long maxCount) {
        Preconditions.checkArgument(maxCount > 0, "maxCount must be greater than 0");
        this.maxCount = maxCount;
    }

    @Override
    public void registerCallback(BundleTriggerCallback callback) {
        this.callback = Preconditions.checkNotNull(callback, "callback is null");
    }

    @Override
    public void onElement(T element) throws Exception {
        count++;
        if (count >= maxCount) {
            callback.finishBundle();
            reset();
        }
    }

    @Override
    public void reset() {
        count = 0;
    }
}
Then you call your operator using the doTransform:
myStream.map(....)
    .doTransform(metricCombiner, info, new RichMapStreamBundleOperator<>(myMapBundleFunction, bundleTrigger, keyBundleSelector))
    .map(...)
    .keyBy(...)
    .window(TumblingProcessingTimeWindows.of(Time.seconds(20)))
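The myMapBundleFunction argument above is the actual pre-aggregation logic. As a rough sketch, assuming the same MapBundleFunction<K, V, IN, OUT> contract the operator above calls (addInput and finishBundle), a per-key count could look like this (type parameters and names are mine, not from the question):
// Sketch of a per-key counting bundle function; key type and tuple layout are assumptions.
public class CountBundleFunction extends MapBundleFunction<String, Long, Tuple2<String, Long>, Tuple2<String, Long>> {

    @Override
    public Long addInput(Long value, Tuple2<String, Long> input) {
        // Merge the incoming element into the value buffered for its key.
        return value == null ? input.f1 : value + input.f1;
    }

    @Override
    public void finishBundle(Map<String, Long> buffer, Collector<Tuple2<String, Long>> out) {
        // Emit one partial aggregate per key when the trigger fires.
        for (Map.Entry<String, Long> entry : buffer.entrySet()) {
            out.collect(Tuple2.of(entry.getKey(), entry.getValue()));
        }
    }
}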
A dynamic pre-aggregation
In case you wish to have a dynamic pre-aggregate operator, check the AdCom - Adaptive Combiner for stream aggregation. It basically adjusts the pre-aggregation based on backpressure signals, so it makes the most of the pre-aggregation opportunity before the shuffle phase.
I have a keyed stream of data that looks like:
{
    summary: Integer
    uid: String
    key: String
    .....
}
I need to aggregate the summary values over some time range, and once I reach a specific number, flush the summary and all of the UIDs that influenced it to a database/log file.
After the first flush I want to discard all the UIDs from memory and just flush every new item immediately.
So I tried this aggregate function.
public class AggFunc implements AggregateFunction<Item, Acc, Tuple2<Integer, List<String>>> {

    private static final long serialVersionUID = 1L;

    @Override
    public Acc createAccumulator() {
        return new Acc();
    }

    @Override
    public Acc add(Item value, Acc accumulator) {
        accumulator.inc(value.getSummary());
        accumulator.addUid(value.getUid());
        return accumulator;
    }

    @Override
    public Tuple2<Integer, List<String>> getResult(Acc accumulator) {
        List<String> newL = Lists.newArrayList(accumulator.getUids());
        accumulator.setUids(Lists.newArrayList());
        return Tuple2.of(accumulator.getSum(), newL);
    }

    @Override
    public Acc merge(Acc a, Acc b) {
        .....
    }
}
In the aggregate process function I flush the list to state, and if I need to save to the database I clear the state and keep a flag in the state to indicate it.
But this seems crooked to me, and I'm not sure it would work well.
Is there a better solution to this situation?
Work with state inside a rich function: keep adding the uids to your state and flush the values when the trigger fires. This page from the official documentation has an example:
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/state/state.html#using-keyed-state
For your case a ListState will work well.
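For example, here is a sketch of the non-window variant with a KeyedProcessFunction holding a ListState for the uids and a flag for "already flushed". The class, field names, and the THRESHOLD constant are assumptions, not code from the question:
// Sketch: collect uids per key in ListState, flush once the running sum passes a threshold,
// then mark the key as flushed and emit every later item immediately.
public class SummaryFlushFunction extends KeyedProcessFunction<String, Item, Tuple2<Integer, List<String>>> {

    private static final int THRESHOLD = 100; // assumed flush limit

    private transient ValueState<Integer> sum;
    private transient ListState<String> uids;
    private transient ValueState<Boolean> flushed;

    @Override
    public void open(Configuration parameters) {
        sum = getRuntimeContext().getState(new ValueStateDescriptor<>("sum", Integer.class));
        uids = getRuntimeContext().getListState(new ListStateDescriptor<>("uids", String.class));
        flushed = getRuntimeContext().getState(new ValueStateDescriptor<>("flushed", Boolean.class));
    }

    @Override
    public void processElement(Item item, Context ctx, Collector<Tuple2<Integer, List<String>>> out) throws Exception {
        if (Boolean.TRUE.equals(flushed.value())) {
            // After the first flush, forward each item on its own.
            out.collect(Tuple2.of(item.getSummary(), Collections.singletonList(item.getUid())));
            return;
        }
        int current = (sum.value() == null ? 0 : sum.value()) + item.getSummary();
        sum.update(current);
        uids.add(item.getUid());
        if (current >= THRESHOLD) {
            List<String> all = new ArrayList<>();
            Iterable<String> stored = uids.get();
            if (stored != null) {
                stored.forEach(all::add);
            }
            out.collect(Tuple2.of(current, all));
            uids.clear();
            flushed.update(true);
        }
    }
}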
EDIT:
The solution above is for the non-window case. For the window case, simply use the aggregation with an apply function, which can take a rich window function.
I am currently running a comparative experiment for some algorithms running on top of a relational database (PostgreSQL) and a graph one (Neo4j).
I implemented my algorithm as a user-defined procedure for Neo4j, but it doesn't look like it performs any caching out of the box.
Is there a way to configure caching for user-defined procedures in Neo4j?
Thanks
You'll have to implement the caching yourself, if it's relevant for your use case and you have something to cache: probably something not related to the transaction, so no nodes or relationships. Neo4j ids are tricky, since they can be reused, so it's probably best to only cache them for a short duration, or not at all. Application-level ids would be fine, as would beans composed of strings or scalar types.
Suppose you have this procedure defined:
public class MyProcedure {

    @Context
    public GraphDatabaseService db;

    @Procedure
    public Stream<MyBean> doSomething(@Name("uuid") String uuid) {
        int count = 0;
        // ...
        return Stream.of(new MyBean(count));
    }

    public static class MyBean {
        public int count;

        public MyBean(int count) {
            this.count = count;
        }
    }
}
You can add some simple caching using a ConcurrentMap:
public class MyProcedure {

    private static final ConcurrentMap<String, Collection<MyBean>> CACHE =
            new ConcurrentHashMap<>();

    @Context
    public GraphDatabaseService db;

    @Procedure
    public Stream<MyBean> doSomething(@Name("uuid") String uuid) {
        Collection<MyBean> result = CACHE.computeIfAbsent(uuid,
                k -> doSomethingCacheable(k).collect(Collectors.toList()));
        return result.stream();
    }

    private Stream<MyBean> doSomethingCacheable(String uuid) {
        int count = 0;
        // ...
        return Stream.of(new MyBean(count));
    }

    public static class MyBean {
        // ...
    }
}
Note that you can't cache a Stream as it can only be consumed once, so you have to consume it yourself by collecting into an ArrayList (you can also move the collect inside the method, change the return type to Collection<MyBean> and use a method reference). If the procedure takes more than one argument, you'll need to create a proper class for the composite key (immutable if possible, with correct equals and hashCode implementation). The restrictions that apply to cacheable values also apply to the keys.
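For example, a hypothetical composite key for a procedure taking two arguments could be as simple as this (the field names are just for illustration):
// Immutable composite key for a procedure taking a uuid and a depth argument.
public final class CacheKey {

    private final String uuid;
    private final long depth;

    public CacheKey(String uuid, long depth) {
        this.uuid = uuid;
        this.depth = depth;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CacheKey)) return false;
        CacheKey other = (CacheKey) o;
        return depth == other.depth && uuid.equals(other.uuid);
    }

    @Override
    public int hashCode() {
        return java.util.Objects.hash(uuid, depth);
    }
}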
This is an eternal, unbounded cache. If you need more features (expiration, maximum size), I suggest you use a real cache implementation, such as Guava's Cache (or LoadingCache) or Ben Manes' Caffeine.
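As a sketch, a Caffeine-based variant of the procedure above might look like this; the maximum size and expiration values are placeholders, not recommendations:
public class MyProcedure {

    // Bounded cache with expiration, instead of the eternal ConcurrentHashMap.
    private static final Cache<String, Collection<MyBean>> CACHE = Caffeine.newBuilder()
            .maximumSize(10_000)                       // placeholder size
            .expireAfterWrite(Duration.ofMinutes(10))  // placeholder expiration
            .build();

    @Procedure
    public Stream<MyBean> doSomething(@Name("uuid") String uuid) {
        Collection<MyBean> result = CACHE.get(uuid,
                k -> doSomethingCacheable(k).collect(Collectors.toList()));
        return result.stream();
    }

    // doSomethingCacheable and MyBean are unchanged from the version above.
}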
I am new to Hadoop and MapReduce programming and don't know what I should do. I want to define an array of int in a Hadoop partitioner: I want to fill this array in the main function and use its contents in the partitioner. I have tried IntWritable and an array of it, but neither worked. I also tried IntArrayWritable, but again it didn't work. I would be glad if someone could help me. Thank you so much.
public static IntWritable[] h = new IntWritable[1];

public static void main(String[] args) throws Exception {
    h[0] = new IntWritable(1);
}

public static class CaderPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return h[0].get();
    }
}
If you have a limited number of values, you can do it the following way.
Set the values on the Configuration object in your main method, like this:
Configuration conf = new Configuration();
conf.setInt("key1", value1);
conf.setInt("key2", value2);
Then implement the Configurable interface on your Partitioner class, get the Configuration object from it, and read the keys/values inside your Partitioner:
public class TestPartitioner extends Partitioner<Text, IntWritable> implements Configurable {

    private Configuration config = null;

    @Override
    public int getPartition(Text arg0, IntWritable arg1, int arg2) {
        // get your values based on the keys set in the driver
        int value = getConf().getInt("key1", 0);
        // do stuff with value
        return 0;
    }

    @Override
    public Configuration getConf() {
        return this.config;
    }

    @Override
    public void setConf(Configuration configuration) {
        this.config = configuration;
    }
}
Supporting link:
https://cornercases.wordpress.com/2011/05/06/an-example-configurable-partitioner/
Note: if you have a huge number of values in a file, it is better to find a way to get the cache files from the Job object in your Partitioner.
Here's a refactored version of the partitioner. The main changes are:
Removed the main(), which isn't needed; initialization should be done in the constructor
Removed static from the class and member variables
public class CaderPartitioner extends Partitioner<Text, IntWritable> {

    private IntWritable[] h;

    public CaderPartitioner() {
        h = new IntWritable[1];
        h[0] = new IntWritable(1);
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return h[0].get();
    }
}
Notes:
h doesn't need to be a Writable, unless you have additional logic not included in the question.
It isn't clear what h[] is for. Are you going to configure it? In that case, the partitioner will probably need to implement Configurable so you can use a Configuration object to set the array up in some way.
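For example, here is a sketch combining that suggestion with the other answer: the values are set on the Configuration in the driver, e.g. conf.set("partition.values", "1,3,5"), and read back through setConf. The property name and the partition logic are just illustrations:
public class CaderPartitioner extends Partitioner<Text, IntWritable> implements Configurable {

    private Configuration conf;
    private int[] h;

    @Override
    public void setConf(Configuration configuration) {
        this.conf = configuration;
        // Parses the comma-separated list set in the driver.
        this.h = conf.getInts("partition.values");
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Use the configured values however your partitioning logic needs them.
        return (h != null && h.length > 0) ? h[0] % numReduceTasks : 0;
    }
}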
I want to share an array which all classes can "get" and "change" data inside. Something like a global array or multi-access array. How is this possible in ActionScript 3.0?
There are a couple of ways to solve this. One is to use a global variable (as suggested in unkiwii's answer) but that's not a very common approach in ActionScript. More common approaches are:
Class variable (static variable)
Create a class called DataModel or similar, and define an array variable on that class as static:
public class DataModel {
    public static var myArray : Array = [];
}
You can then access this from any part in your application using DataModel.myArray. This is rarely a great solution because (like global variables) there is no way for one part of your application to know when the content of the array is modified by another part of the application. This means that even if your data entry GUI adds an object to the array, your data list GUI will not know to show the new data, unless you implement some other way of telling it to redraw.
Singleton wrapping array
Another way is to create a class called ArraySingleton, which wraps the actual array and provides access methods to it, and an instance of which can be accessed using the very common singleton pattern of keeping the single instance in a static variable.
public class ArraySingleton {

    private var _array : Array;
    private static var _instance : ArraySingleton;

    public static function get INSTANCE() : ArraySingleton {
        if (!_instance)
            _instance = new ArraySingleton();
        return _instance;
    }

    public function ArraySingleton() {
        _array = [];
    }

    public function get length() : uint {
        return _array.length;
    }

    public function push(object : *) : void {
        _array.push(object);
    }

    public function itemAt(idx : uint) : * {
        return _array[idx];
    }
}
This class wraps an array, and a single instance can be accessed through ArraySingleton.INSTANCE. This means that you can do:
var arr : ArraySingleton = ArraySingleton.INSTANCE;
arr.push('a');
arr.push('b');
trace(arr.length); // traces '2'
trace(arr.itemAt(0)); // traces 'a'
The great benefit of this is that you can dispatch events when items are added or when the array is modified in any other way, so that all parts of your application can be notified of such changes. You will likely want to expand on the example above by implementing more array-like interfaces, like pop(), shift(), unshift(), etc.
Dependency injection
A common pattern in large-scale application development is called dependency injection, and it basically means that by marking your class in some way (AS3 metadata is often used) you can signal that the framework should "inject" a reference into that class. That way, the class doesn't need to care about where the reference is coming from; the framework will make sure that it's there.
A very popular DI framework for AS3 is Robotlegs.
NOTE: I discourage the use of global variables!
But here is your answer.
Go to your default package and create a file with the same name as your global variable, and make the variable public:
//File: GlobalArray.as
package {
public var GlobalArray:Array = [];
}
And that's it! You have a global variable. You can access it from your code (from anywhere) like this:
function DoSomething() {
    GlobalArray.push(new Object());
    GlobalArray.pop();
    for each (var object:* in GlobalArray) {
        //...
    }
}
As this question was linked recently, I will add something as well. I was advised to use a singleton ages ago, and I gave up on it as soon as I realized how namespaces and references work and that basing everything on global variables is a bad idea.
Alternative
Note this is just a showcase, and I do not advise you to use this approach all over the place.
As an alternative to a singleton you could have:
public class Global {
    public static const myArray:Alternative = new Alternative();
}
and use it almost like singleton:
var ga:Alternative = Global.myArray;
ga.e.addEventListener(GDataEvent.NEW_DATA, onNewData);
ga.e.addEventListener(GDataEvent.DATA_CHANGE, onDataChange);
ga.push(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "ten");
trace(ga[5]); // 5
And your Alternative.as would look similar to a singleton:
package adnss.projects.tchqs
{
    import flash.utils.Proxy;
    import flash.utils.flash_proxy;

    public class Alternative extends Proxy
    {
        private var _data:Array = [];
        private var _events:AltEventDisp = new AltEventDisp();
        private var _dispatching:Boolean = false;
        public var blockCircularChange:Boolean = true;

        public function Alternative() {}

        override flash_proxy function getProperty(id:*):* {
            var i:int = id;
            return _data[i += (i < 0) ? _data.length : 0];
            //return _data[id]; // version without negative-index access - var i:int could be removed.
        }

        override flash_proxy function setProperty(id:*, value:*):void {
            var i:int = id;
            if (_dispatching) { throw new Error("You cannot set data while the DATA_CHANGE event is dispatching"); }
            i += (i < 0) ? _data.length : 0;
            if (i > 9) { throw new Error("You can override only the first 10 items without using push."); }
            _data[i] = value;
            if (blockCircularChange) _dispatching = true;
            _events.dispatchEvent(new GDataEvent(GDataEvent.DATA_CHANGE, i));
            _dispatching = false;
        }

        public function push(...rest):void {
            var c:uint = -_data.length + _data.push.apply(null, rest);
            _events.dispatchEvent(new GDataEvent(GDataEvent.NEW_DATA, _data.length - c, c));
        }

        public function get length():uint { return _data.length; }

        public function get e():AltEventDisp { return _events; }

        public function toString():String { return String(_data); }
    }
}
import flash.events.EventDispatcher;

/**
 * Dispatched after data at an existing index is replaced.
 * @eventType adnss.projects.tchqs.GDataEvent
 */
[Event(name = "dataChange", type = "adnss.projects.tchqs.GDataEvent")]

/**
 * Dispatched after new data is pushed into the array.
 * @eventType adnss.projects.tchqs.GDataEvent
 */
[Event(name = "newData", type = "adnss.projects.tchqs.GDataEvent")]

class AltEventDisp extends EventDispatcher { }
The only difference from a singleton is that you can actually have multiple instances of this class, so you can reuse it like this:
public class Global {
    public static const myArray:Alternative = new Alternative();
    public static const myArray2:Alternative = new Alternative();
}
to have two separate global arrays, or even use it as an instance variable at the same time.
Note
Wrapping an array like this and using methods like myArray.get(x) or myArray[x] is obviously slower than accessing a raw array (see all the additional steps we take in setProperty):
public static const staticArray:Array = [1,2,3];
On the other hand, you have no control over a raw array like that, and its contents can be changed from anywhere.
Caution about events
I should add that if you want to involve events in accessing data this way, you should be careful. As with every sharp blade, it's easy to get cut.
For example, consider what happens when you do this:
private function onDataChange(e:GDataEvent):void {
    trace("dataChanged at:", e.id, "to", Global.myArray[e.id]);
    Global.myArray[e.id]++;
    trace("new onDataChange is called before function exits");
}
The function is called after the data in the array has changed, and inside that function you change the data again. Basically it's similar to doing something like this:
function f(x:Number) {
    f(++x);
}
You can see what happens in such a case if you toggle myArray.blockCircularChange. Sometimes you will intentionally want such recursion, but it is likely that you will do it by accident. Unfortunately, Flash will suddenly stop dispatching such events without even telling you why, and this can be confusing.
Why is using global variables bad in most scenarios?
I guess there is plenty of information about that all over the internet, but to be complete I will add a simple example.
Consider that you have some view in your app where you display text, graphics, or most likely game content. Say you have a chess game. Maybe you have separated logic and graphics into two classes, but you want both to operate on the same pawns. So you create your Global.pawns variable and use it in both the Graphics and Logic classes.
Everything is fine and dandy and works flawlessly. Now you come up with a great idea: add an option for the user to play two matches at once, or even more. All you have to do is create another instance of your match... right?
Well, you are doomed at this point, because every single instance of your classes will use the same Global.pawns array. Not only is this variable global, but you have also limited yourself to a single instance of each class that uses it :/
So before you use any global variable, just think twice about whether the thing you want to store in it is really global and universal across your entire app.