"Connection closed" occurs when executing a agent - volttron

"Connection closed" occurs when executing a function for data pre-processing.
The data pre-processing is as follows.
Import data points of about 30 topics from the database.( Data for 9 days every 1 minute,
60 * 24 * 9 * 30 = 388,800 values)
Convert data to a pandas dataframe for pre-processing such as missing value or resampling (this process takes the longest time)
Data processing
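For context, step 2 typically looks something like the following minimal pandas sketch. The topic names, sample data, column layout, and 15-minute resampling interval are illustrative assumptions, not the actual agent code:
import pandas as pd

# Raw query results as (timestamp, topic, value) rows -- illustrative data only
rows = [
    ("2021-01-01 00:00", "building/temperature", 21.0),
    ("2021-01-01 00:01", "building/temperature", None),   # missing value
    ("2021-01-01 00:00", "building/power", 3.2),
]
df = pd.DataFrame(rows, columns=["ts", "topic", "value"])
df["ts"] = pd.to_datetime(df["ts"])

# One column per topic, then resample and fill gaps
wide = df.pivot_table(index="ts", columns="topic", values="value")
wide = wide.resample("15min").mean().interpolate()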
During the above data pre-processing, the following error occurs:
volttron.platform.vip.rmq_connection ERROR: Connection closed unexpectedly, reopening in 30 seconds.
This error is probably something the VOLTTRON platform does to manage the agent.
Since step 2 takes more than 30 seconds, the error occurs and the VOLTTRON platform automatically restarts the agent.
Because of this, the agent cannot complete the data processing normally.
Does anyone know how to avoid this?

If this is happening during agent instantiation, I would suggest moving the pre-processing out of the init or configuration steps and into a function with the @Core.receiver("onstart") decorator. This will stop the agent instantiation and configuration steps from timing out. The listener agent's onstart method can be used as an example.
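A minimal sketch of that pattern, assuming a hypothetical PreprocessingAgent whose preprocess_data method wraps the query/DataFrame work described in the question:
from volttron.platform.vip.agent import Agent, Core

class PreprocessingAgent(Agent):
    def __init__(self, config_path, **kwargs):
        super(PreprocessingAgent, self).__init__(**kwargs)
        # Keep __init__ (and the configuration callbacks) light:
        # only read settings here, no long-running work.

    @Core.receiver("onstart")
    def onstart(self, sender, **kwargs):
        # Runs after the agent is instantiated and connected, so the
        # long pre-processing no longer blocks instantiation/configuration.
        self.preprocess_data()

    def preprocess_data(self):
        # Hypothetical placeholder for the database query, DataFrame
        # conversion, resampling, etc. from the question.
        pass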

Related

Flink job is interrupted after 10 minutes

I have a Flink job with a global window and a custom process function.
The process fails after ~10 minutes with the following error:
java.io.InterruptedIOException
This is my job:
SingleOutputStreamOperator<CustomEntry> result = stream
    .keyBy(r -> r.getId())
    .window(GlobalWindows.create())
    .trigger(new CustomTriggeringFunction())
    .process(new CustomProcessingFunction());
The CustomProcessingFunction runs for a long time (more than 10 minutes), but after 10 minutes the process is stopped and fails with an InterruptedIOException.
Is it possible to increase the timeout of a Flink job?
From Flink's point of view, that's an unreasonably long period of time for a user function to run. What is this window process function doing that takes more than 10 minutes? Perhaps you can restructure this to use the async i/o operator instead, so you aren't completely blocking the pipeline.
That said, 10 minutes is the default checkpoint timeout interval, and you're preventing checkpoints from being able to complete while this function is running. So you could experiment with increasing execution.checkpointing.timeout.
If the job is failing because checkpoints are timing out, that will help. Or you could increase execution.checkpointing.tolerable-failed-checkpoints from its default (0).
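For example, both options can be raised in flink-conf.yaml; the values below are only illustrative starting points to experiment with, not recommendations:
# flink-conf.yaml (illustrative values)
execution.checkpointing.timeout: 30 min
execution.checkpointing.tolerable-failed-checkpoints: 3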

Is a Power Apps Dataflow from Azure SQL to Dataverse really this slow, and are the error messages really this terrible?

I have a table in an Azure SQL Database which contains approximately 10 columns and 1.7 million rows. The data in each cell is mostly null/varchar(30).
When running a dataflow to a new table in Dataverse, I have two issues:
1. It takes around 14 hours (roughly 100k rows per hour).
2. It fails after 14 hours with this great error message (**** is just some entity names I have removed):
Dataflow name,Entity name,Start time,End time,Status,Upsert count,Error count,Status details
****** - *******,,1.5.2021 9:47:20 p.m.,2.5.2021 9:51:27 a.m.,Failed,,,There was a problem refreshing the dataflow. please try again later. (request id: 5edec6c7-3d3c-49df-b6de-e8032999d049).
****** - ,,1.5.2021 9:47:43 p.m.,2.5.2021 9:51:26 a.m.,Aborted,0,0,
Table name,Row id,Request url,Error details
*******,,,Job failed due to timeout : A task was canceled.
Is it really expected that this should take 14 hours? :O
Is there any verbose logging I can enable to get a friendlier error message?

How to run a cron command every hour taking script execution time into account?

I have a bunch of data to monitor. My data are statistics that can only be retrieved every hour but can change every second, and I want to store as many values as I can for each data set in a database.
I've thought about several approaches to this problem and finally chose to refresh and read all statistics at once instead of reading them independently.
So I came up with a command, mycommand, which reads all my statistics at the cost of several minutes (let's say 30) of execution. Now I would like to run this script every hour, but taking the script execution time into account.
I actually run
* */1 * * * mycommand.sh
and receive many annoying error emails (actually one every hour), and I effectively retrieve my statistics only every 2 hours.
One hour of waiting plus 30 minutes of execution is 1.5 hours, which is half of 3 hours. So you could have two entries in crontab(5) running the same /home/gogaz/mycommand.sh script: one to run it at 1:00, 4:00, 7:00, ... (every 3 hours starting at 1 am) and another to run it at 2:30, 5:30, 8:30, ... (every 3 hours starting at 2:30 am).
Writing these entries is left as an exercise for the reader.
See also anacrontab(5) and at(1). For example, you might run your script once using batch, but end your script with an at command that reschedules the same script (the drawback is the handling of unexpected errors).
If you redirect your stdout and stderr in your crontab entry, you won't get any emails.
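For illustration only, the two staggered entries described above might look roughly like this; the script path is taken from the answer, the exact minute fields are an assumption, and the redirection implements the no-email suggestion:
# every 3 hours starting at 01:00 (01:00, 04:00, 07:00, ...)
0 1-23/3 * * * /home/gogaz/mycommand.sh > /dev/null 2>&1
# every 3 hours starting at 02:30 (02:30, 05:30, 08:30, ...)
30 2-23/3 * * * /home/gogaz/mycommand.sh > /dev/null 2>&1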

GAE Task Queues: how to add a delay?

In Task Queues, code is executed to connect to the server side through URL Fetch.
This is my queue.yaml file:
queue:
- name: default
  rate: 10/m
  bucket_size: 1
With these settings, the tasks are all performed at once, simultaneously.
The specific requirement is that there should be a delay of at least 5 seconds between requests. Tasks must run in stages with a gap of more than 5 seconds between them (not in parallel).
What values should be set in queue.yaml?
You can't currently specify minimum delays between tasks in queue.yaml; you should do it (partly) in your own code. For example, if you specify a bucket size of 1 (so that more than one task should never be executing at once) and make sure each task runs for at least 5 seconds (record start = time.time() at the start and call time.sleep(max(0, start + 5 - time.time())) at the end), this should work. If it doesn't, have each task record in the datastore the timestamp at which it finished, and when a task starts, check whether the last task ended less than 5 seconds ago; in that case, terminate immediately.
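A minimal sketch of that padding idea (do_work is a hypothetical placeholder for whatever the task handler actually does):
import time

MIN_TASK_SECONDS = 5  # desired minimum spacing between task starts

def task_handler():
    start = time.time()

    do_work()  # hypothetical placeholder for the real task body

    # Pad the task so it never finishes in less than 5 seconds; combined
    # with bucket_size: 1, task starts stay at least ~5 seconds apart.
    remaining = MIN_TASK_SECONDS - (time.time() - start)
    if remaining > 0:
        time.sleep(remaining)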
Another way would be to store the task data in a table. In your task queue, add an id parameter. Fetch the first task from the table and pass its id to the task-queue processing servlet. In the servlet, delay for 5 seconds at the end, fetch the next task, pass its id, and so on.

How do I measure response time in seconds given the following benchmarking data?

We recently got some data back on a benchmarking test from a software vendor, and I think I'm missing something obvious.
If there were 17 transactions (I assume they mean successfully completed requests) per second, and 1,500 of these requests could be served in 5 minutes, then how do I get the response time for a single user? Is this sort of thing even possible with benchmarking? I have a lot of other data from them, including Apache config settings, but I'm not sure how to do all the math.
Given the server setup they sent, I want to know how I can deduce the user response time. I have looked at other similar benchmarking tests, but I'm having trouble relating requests to response time. What other data do I need to provide here to get that?
If only 1,500 of these can be served every 5 minutes, then:
1500 / 5 = 300 transactions per min can be served
300 / 60 = 5 transactions per second can be served
so how are they getting 17 completed transactions per second? Last time I checked, 5 < 17!
This doesn't seem to fit. Or am I looking at it wrongly?
I presume that by user response time, you mean the time it takes to serve a single transaction:
If they can serve 5 per second, then it takes 200 ms (1/5 s) per transaction.
If they can serve 17 per second, then it takes about 59 ms (1/17 s) per transaction.
That is all we can tell from the given data. Perhaps clarify how many transactions are actually being done per second.
