I want to use a FIFO queue of size 2 to store elements of a data stream. At any instant, I need the previous element that came in the stream, not the current element. To do this, I have created a queue outside the stream code, and I enqueue the current element; when my queue has two elements, I dequeue it and use the first element.
The problem I am facing is that I am not able to enqueue anything, as the queue is declared outside my stream code. I guess this is because streaming jobs run on multiple JVMs, and my queue would be declared in only one JVM.
Below is a sample code:
val queue = Queue[Array[Double]]() // global queue
val ws = dataStream.map(row => {
  queue.enqueue(row)
  println(queue.size) // prints 0 always
  if (queue.size == 2) {
    result = operate(queue(0))
    queue.dequeue
  }
  result
})
Here, nothing is getting enqueued and the size of the queue is always 0.
Is there a way we can create global variables in Flink which are distributed across all the JVMs? If not, is there any other way to implement this logic?
Surprisingly enough, it worked when I replaced Queue with Scala List.
Hey all, I've been working on a program using two queues: a stream of numbers enters one queue, is then dequeued from that first queue, and is enqueued onto a second queue.
I've tried everything I can think of, but nothing seems to work, and I can't find anything online either.
Assuming you are queueing and dequeueing NODE structs containing the actual data, consider the following:
The dequeue function of your queue data structure should return the dequeued struct (or a pointer to it). By doing this you can simply write
NODE *node_dequeued_from_queue1 = Dequeue(&queue1);
Enqueue(&queue2, node_dequeued_from_queue1->data);
where node_dequeued_from_queue1 points to a NODE struct containing a member data of type int.
In my project, I have a data provider which delivers data every 2 milliseconds. The following is the delegate method in which the data arrives.
func measurementUpdated(_ measurement: Double) {
    measurements.append(measurement)
    guard measurements.count >= 300 else { return }
    ecgView.measurements = Array(measurements.suffix(300))
    DispatchQueue.main.async {
        self.ecgView.setNeedsDisplay()
    }
    guard measurements.count >= 50000 else { return }
    let olderMeasurementsPrefix = measurements.count - 50000
    measurements = Array(measurements.dropFirst(olderMeasurementsPrefix))
    print("Measurement Count : \(measurements.count)")
}
What I am trying to do is delete the older measurements in the first n indices of the array once it has more than 50000 elements, for which I am using Array's dropFirst method.
But, I am getting a crash with the following message:
Fatal error: Can't form Range with upperBound < lowerBound
I think the issue is due to threading: both appending and deletion might happen at the same time, since the delegate fires every 2 milliseconds. Can you suggest an optimized way to resolve this issue?
So to really fix this, we need to first address two of your claims:
You said, in effect, that measurementUpdated() is called on the main thread (you said both append and dropFirst are called on the main thread), and you said several times that measurementUpdated() is called every 2ms. You do not want to be calling a method every 2ms on the main thread. You'll pile up quite a lot of calls very quickly and get many delays in their updating, as the main thread has UI work to do, and that always eats up time.
So first rule: measurementUpdated() should always be called on another thread. Keep it the same thread, though.
Second rule: the entire code path, from whatever collects the data to the call to measurementUpdated(), must also be on a non-main thread. It can be the same thread that calls measurementUpdated(), but doesn't have to be.
Third rule: You do not need your UI graph to update every 2ms. The human eye cannot perceive UI change that's faster than about 150ms. Also, the device's main thread will get totally bogged down trying to re-render as frequently as every 2ms. I bet your graph UI can't even render a single pass at 2ms! So let's give your main thread a break, by only updating the graph every, say, 150ms. Measure the current time in MS and compare against the last time you updated the graph from this routine.
Fourth rule: don't change any array (or any object) in two different threads without doing a mutex lock, as they'll sometimes collide (one thread will be trying to do an operation on it while another is too). An excellent article that covers all the current swift ways of doing mutex locks is Matt Gallagher's Mutexes and closure capture in Swift. It's a great read, and has both simple and advanced solutions and their tradeoffs.
One other suggestion: you're allocating or reallocating a few arrays every 2ms. That's unnecessary and, I'd think, puts undue stress on the memory pools under the hood. I suggest not doing the append and dropFirst calls. Try rewriting such that you have a single array that holds 50,000 doubles and never changes size. Simply change values in the array, and keep two indexes so that you always know where the "start" and the "end" of the data set are within the array; i.e. pretend the next array element after the last is the first array element (the array loops around to the front). Then you're not churning memory at all, and it'll operate much quicker too. You can surely find Array extensions people have written to make this trivial to use. Every 150ms you can copy the data into a second pre-allocated array in the correct order for your graph UI to consume, or just pass the two indexes to your graph UI if you own the graph UI and can adjust it to accommodate them.
I don't have time right now to write a code example that covers all of this (maybe someone else does), but I'll try to revisit this tomorrow. It'd actually be a lot better for you if you made a renewed stab at it yourself, and then ask us a new question (on a new StackOverflow) if you get stuck.
Update: as @Smartcat correctly pointed out, this solution has the potential of causing memory issues if the main thread is not fast enough to consume the arrays at the same pace the worker thread produces them.
The problem seems to be caused by ecgView's measurements property: you are writing to it on the thread receiving the data, while the view tries to read from it on the main thread, and simultaneous accesses to the same data from multiple threads are (unfortunately) likely to generate race conditions.
In conclusion, you need to make sure that both reads and writes happen on the same thread; this can easily be achieved by moving the setter call within the async dispatch:
let ecgViewMeasurements = Array(measurements.suffix(300))
DispatchQueue.main.async {
    self.ecgView.measurements = ecgViewMeasurements
    self.ecgView.setNeedsDisplay()
}
According to what you say, I will assume the delegate is calling the measurementUpdated method from a concurrent thread.
If that's the case, and the problem really is related to threading, this should fix it:
// Store the serial queue in a property; creating a new
// DispatchQueue(label:) on every call would serialize nothing.
let serialQueue = DispatchQueue(label: "MySerialQueue")

func measurementUpdated(_ measurement: Double) {
    serialQueue.async {
        self.measurements.append(measurement)
        guard self.measurements.count >= 300 else { return }
        let latest = Array(self.measurements.suffix(300))
        DispatchQueue.main.async {
            self.ecgView.measurements = latest
            self.ecgView.setNeedsDisplay()
        }
        guard self.measurements.count >= 50000 else { return }
        let olderMeasurementsPrefix = self.measurements.count - 50000
        self.measurements = Array(self.measurements.dropFirst(olderMeasurementsPrefix))
        print("Measurement Count : \(self.measurements.count)")
    }
}
This puts the code on a serial queue, ensuring that only one block runs at a time.
I have a fixed-size FIFO-type array to store newly added data. In my main function this array keeps updating itself continuously, and one thread works on this data. I want my thread to work on the latest data passed to it while the main function keeps updating the array. In the code below, I try to demonstrate what I mean. Thread 1 contains a while(1) loop itself as well. The reason I am updating the queue on the main thread is that Thread 1 has a sleep duration. It may have a simple answer, but my brain has currently stopped working.
int main() {
    pthread_create(Thread1);
    while (1) {
        QueuePut(Some_Value);
        arguments_of_Thread1.input = Queue;
        ...
    }
    return 0;
}
I have a problem that I can't solve.
I have to make a data structure shared by some threads. The problems are:
The threads execute simultaneously and should insert data into a particular structure, but every object must be inserted in mutual exclusion, because if an object is already present it must not be re-inserted.
I thought of making an array where threads put the key of the object they are working on; if another thread wants to put the same key, it should wait for the current thread to finish.
So, in other words, every thread executes this function to lock an element:
void lock_element(key_t key) {
    pthread_mutex_lock(&mtx_array);
    while (array_busy == 1) {
        pthread_cond_wait(&var_array, &mtx_array);
    }
    array_busy = 1;
    if ((search_insert((int)key)) == -1) {
        // The element is present in the array and I can't insert it,
        // so I must wait for the array entry to be freed.
        // (I think the problem is here.)
    }
    array_busy = 0;
    pthread_cond_signal(&var_array);
    pthread_mutex_unlock(&mtx_array);
}
After I finish with the object, I free the key in the array with the following function:
void unlock_element(key_t key) {
    pthread_mutex_lock(&mtx_array);
    while (array_busy == 1) {
        pthread_cond_wait(&var_array, &mtx_array);
    }
    array_busy = 1;
    zeroed((int)key);
    array_busy = 0;
    pthread_cond_signal(&var_array);
    pthread_mutex_unlock(&mtx_array);
}
This way, the result changes on every execution (for example: one run inserts 300 objects, another inserts only 100).
Thanks for the help!
UPDATE:
@DavidSchwartz @Asthor I modified the code as follows:
void lock_element(key_t key) {
    pthread_mutex_lock(&mtx_array);
    while ((search_insert((int)key)) == -1) {
        // wait
        pthread_cond_wait(&var_array, &mtx_array);
    }
    pthread_mutex_unlock(&mtx_array);
}
and...
void unlock_element(key_t key) {
    pthread_mutex_lock(&mtx_array);
    zeroed((int)key);
    pthread_cond_signal(&var_array);
    pthread_mutex_unlock(&mtx_array);
}
But it does not work; it behaves the same way as before.
I also noticed a strange behavior in the function search_insert(key):
int search_insert(int key) {
    int k = 0;
    int found = 0;
    int fre = -1;
    while (k < 7 && found == 0) {
        if (array[k] == key) {
            found = 1;
        } else if (array[k] == -1) fre = k;
        k++;
    }
    if (found == 1) {
        return -1; // we can't put the key in the array
    } else {
        if (fre == -1) exit(EXIT_FAILURE);
        array[fre] = key;
        return 0;
    }
}
It never enters the if (found == 1) branch.
You have a couple of options.
The simplest option is just to hold the mutex during the entire operation. You should definitely choose this option unless you have strong evidence that you need greater concurrency.
Often, it's possible to just allow more than one thread to do the work. This pattern works like this:
1. Acquire the mutex.
2. Check if the object is in the collection. If so, use the object from the collection.
3. Otherwise, release the mutex.
4. Generate the object.
5. Acquire the mutex again.
6. Check if the object is in the collection. If not, add it and use the object you generated.
7. Otherwise, throw away the object you generated and use the one from the collection.
This may result in two threads doing the same work. That may be unacceptable in your use case either because it's impossible (some work can only be done once) or because the gain in concurrency isn't worth the cost of the duplicated work.
If nothing else works, you can go with the more complex solution:
1. Acquire the mutex.
2. Check if the object is in the collection. If so, use the object in the collection.
3. Check if any other thread is working on the object. If so, block on the condition variable and go to step 2.
4. Indicate that we are working on the object.
5. Release the mutex.
6. Generate the object.
7. Acquire the mutex.
8. Remove the indication that we are working on the object.
9. Add the object to the collection.
10. Broadcast the condition variable.
11. Release the mutex.
This can be implemented with a separate collection just to track which objects are in progress or you can add a special version of the object to the collection that contains a value that indicates that it's in progress.
This answer is, as it stands, based on assumptions.
Consider this scenario. You have 2 threads trying to insert their objects. Thread 1 and thread 2 both get objects with index 0. We then present 2 possible scenarios.
A:
Thread 1 starts, grabs the mutex, and proceeds to insert its object. It finishes, letting the next thread through the mutex, which is thread 2. Thread 1 tries to get the mutex again to release the index but is blocked, as thread 2 has it. Thread 2 tries to insert its object but fails, because the index is taken, so the insert never happens. It releases the mutex and thread 1 can grab it, releasing the index. However, thread 2 has already attempted and failed to insert the object it had, meaning we only get 1 insertion in total.
B:
Second scenario. Thread 1 starts, grabs the mutex, inserts the object, releases the mutex. Before thread 2 grabs it, thread 1 again grabs it, clearing the index and releases the mutex again. Thread 2 then successfully grabs the mutex, inserts the object it had before releasing the mutex. In this scenario we get 2 inserts.
In the end, the issue is that there is no reaction inside the if statement when a thread fails to insert an object, so the thread does not do what it is meant to. That way you get fewer insertions than expected.
How would one go about creating a queue that can hold an array, more specifically an array with a variable number of rows?
char data[n][2][50];
// could be any non-zero n, e.g.:
n = 1; data = {{"status", "ok"}};
// or
n = 3; data = {{"lat", "180.00"}, {"long", "90.123"}, {"status", "ok"}};
// and so on
together with n, to be added to the queue. Or is there an even better solution than what I'm asking for? A queue is easy enough to write (or find reusable examples of) for single data items, but I'm not sure what method I would use for the above. Maybe a struct? That would solve for the array and n... but would it solve for a variable-size array?
More broadly, the problem I'm trying to solve is this.
I need to communicate with a web server using POST. I have the code for this already written; however, I don't want to keep the main thread busy every time this task needs doing, especially since I need to make other checks, such as whether the connection is up; if it isn't, I need to back off and wait, or try to bring it back online.
My idea was to have a single separate thread dedicated to this task. I figured a queue would be the best way for the main thread to let the child thread know what to do.
The data will be a variable number of string pairs, like:
Main
// currently does:
char data[MAX_MESSAGES_PER_POST][2][50];
...
assemble array
sendFunction(ptrToArray, n);
resume execution with a large and unpredictable delay
// hopefully will do:
...
queue(what needs doing)
carry on executing with (almost) no delay
Child
while (1)
{
    if (allOtherConditionsMet()) // device online and so forth
    {
        if (!empty(myQueue))
        {
            // do work and dequeue
        }
    }
    else
    {
        // try to make conditions OK, e.g. reconnect dongle
    }
    // sleep/back off for a while
}
You could use an existing library, like GLib, which is cross-platform. If you used GLib's asynchronous queues, you'd do something like the following.
The first thread to create the queue executes:
GAsyncQueue *q = g_async_queue_new ();
Other threads can take a reference (indicating intent to use the queue) with:
g_async_queue_ref (q);
After this, any thread can 'push' items to the queue with:
struct queue_item i;
g_async_queue_push (q, (gpointer) &i); /* the item must stay valid until it is popped */
And any thread can 'pop' items from the queue with:
struct queue_item *d = g_async_queue_pop (q);
/* Blocks until item is available. */
Once a thread finishes using the queue and doesn't care any more about it, it calls:
g_async_queue_unref (q);
Even the thread which created the queue needs to do this.
There are a bunch of other useful functions, which you can all read about on the page documenting them. Synchronization (locking/consistency/atomicity of operations) is taken care of by the library itself.