schedule actuator agent MALFORMED_REQUEST - volttron

Can someone give me a tip on what I am doing wrong when scheduling the actuator agent in my agent code to scrape some BACnet data?
_now = get_aware_utc_now()
str_start = format_timestamp(_now)
_end = _now + td(seconds=10)
str_end = format_timestamp(_end)
peroidic_schedule_request = ["slipstream_internal/slipstream_hq/1100", # AHU
"slipstream_internal/slipstream_hq/5231", # eGuage Main
"slipstream_internal/slipstream_hq/5232", # eGuage RTU
"slipstream_internal/slipstream_hq/5240"] # eGuage PV
# send the request to the actuator
result = self.vip.rpc.call('platform.actuator', 'request_new_schedule', self.core.identity, 'my_schedule', 'HIGH', peroidic_schedule_request).get(timeout=90)
_log.debug(f'[Conninuous Roller Agent INFO] - ACTUATOR SCHEDULE EVENT SUCESS {result}')
meter_data_to_get = [self.ahu_clg_pid_topic,self.kw_pv_topic,self.kw_rtu_topic,self.kw_main_topic]
rpc_result = self.vip.rpc.call('platform.actuator', 'get_multiple_points', meter_data_to_get).get(timeout=90)
_log.debug(f'[Conninuous Roller Agent INFO] - kW data is {rpc_result}!')
The RPC request works and I can get the data, but there is an actuator agent error:
{'result': 'FAILURE', 'data': {}, 'info': 'MALFORMED_REQUEST: ValueError: too many values to unpack (expected 3)'}
Full Traceback:
2021-09-13 20:47:23,334 (actuatoragent-1.0 1477601) __main__ ERROR: bad request: {'time': '2021-09-13T20:47:23.334762+00:00', 'requesterID': 'platform.continuousroller', 'taskID': 'my_schedule', 'type': 'NEW_SCHEDULE'}, [], too many values to unpack (expected 3)
2021-09-13 20:47:23,338 (continuousrolleragent-0.1 1559787) __main__ DEBUG: [Conninuous Roller Agent INFO] - ACTUATOR SCHEDULE EVENT SUCESS {'result': 'FAILURE', 'data': {}, 'info': 'MALFORMED_REQUEST: ValueError: too many values to unpack (expected 3)'}
2021-09-13 20:47:23,346 (forwarderagent-5.1 1548751) __main__ DEBUG: publish_to_historian number of items: 1
2021-09-13 20:47:23,346 (forwarderagent-5.1 1548751) __main__ DEBUG: Lasttime: 0 currenttime: 1631566043.0
2021-09-13 20:47:23,350 (forwarderagent-5.1 1548751) __main__ DEBUG: handled: 1 number of items
2021-09-13 20:47:23,710 (continuousrolleragent-0.1 1559787) __main__ DEBUG: [Conninuous Roller Agent INFO] - kW data is [{'slipstream_internal/slipstream_hq/1100/Cooling Capacity Status': 76.46278381347656, 'slipstream_internal/slipstream_hq/5231/REGCHG total_power': 59960.0, 'slipstream_internal/slipstream_hq/5232/REGCHG total_rtu_power': 44507.0, 'slipstream_internal/slipstream_hq/5240/REGCHG Generation': 1477.0}, {}]!

I suspect that the Scheduler, which is an attribute of the Actuator, encounters a problem when creating a Task from the 'requests' object, which originates from the 'requests' parameter of 'request_new_schedule'. In your case, 'requests' is the list of bare topic strings in the object named 'peroidic_schedule_request'. Creating a Task is part of the Actuator's workflow, by way of the Actuator's Scheduler calling 'request_slots'; see
https://github.com/VOLTTRON/volttron/blob/develop/services/core/ActuatorAgent/actuator/agent.py#L1378 and
https://github.com/VOLTTRON/volttron/blob/a7bbdfcd4c82544bd743532891389f49b771b196/services/core/ActuatorAgent/actuator/scheduler.py#L392
When a Task is created, the 'requests' object is processed in the Task constructor method; see https://github.com/VOLTTRON/volttron/blob/a7bbdfcd4c82544bd743532891389f49b771b196/services/core/ActuatorAgent/actuator/scheduler.py#L148
I suspect that each 'request' in 'requests' is not a sequence of exactly three items, which would explain the Actuator error. But I could be wrong. To know exactly what 'requests' looks like, go back through your logs and look for a debug statement from the Actuator that begins with "Got new schedule request: ..." (see https://github.com/VOLTTRON/volttron/blob/develop/services/core/ActuatorAgent/actuator/agent.py#L1375).
Without seeing more of the logs, as a first step I'd recommend verifying that the 'requests' object is properly formed; I'd also recommend adding debug statements, especially around https://github.com/VOLTTRON/volttron/blob/a7bbdfcd4c82544bd743532891389f49b771b196/services/core/ActuatorAgent/actuator/scheduler.py#L148, to catch where this ValueError is coming from.
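As a minimal sketch, assuming the same topics and 10-second window as the question, a well-formed 'requests' argument pairs every device topic with a start and end timestamp, i.e. one three-item [device, start, end] list per entry rather than a bare topic string:

_now = get_aware_utc_now()
str_start = format_timestamp(_now)
str_end = format_timestamp(_now + td(seconds=10))

# one [device, start, end] triple per device
schedule_request = [
    ["slipstream_internal/slipstream_hq/1100", str_start, str_end],  # AHU
    ["slipstream_internal/slipstream_hq/5231", str_start, str_end],  # eGuage Main
    ["slipstream_internal/slipstream_hq/5232", str_start, str_end],  # eGuage RTU
    ["slipstream_internal/slipstream_hq/5240", str_start, str_end],  # eGuage PV
]

result = self.vip.rpc.call('platform.actuator', 'request_new_schedule',
                           self.core.identity, 'my_schedule', 'HIGH',
                           schedule_request).get(timeout=90)

With that shape, each request unpacks cleanly into device, start and end inside the Task constructor.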

Related

Observed non-terminating stream error 503 DNS resolution failed for pubsub.googleapis.com:443: UNAVAILABLE: OS Error

I have a feed-completion listener program for GCS that uses a Pub/Sub subscription in tandem with a post-processing job.
from concurrent.futures import TimeoutError
import logging
import sys

from google.cloud import pubsub_v1

# snippet from a larger program: cred, PROJECT, SUBSCRIPTION_NAME, TIMEOUT,
# callback, created_local_done_file and warn_of_dupes are defined elsewhere
subscriber = pubsub_v1.SubscriberClient(credentials=cred)
subscription_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION_NAME)
flow_control = pubsub_v1.types.FlowControl(max_messages=500)
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)
with subscriber:
    try:
        streaming_pull_future.result(timeout=TIMEOUT)
    except TimeoutError:
        streaming_pull_future.cancel()
        logging.info("The filewatcher process has completed.")
        successes, failures = [], []
        for feed in created_local_done_file:
            if created_local_done_file[feed]:
                successes.append(feed)
            else:
                failures.append(feed)
        logging.info("These Juniper feeds completed and proceeded to the next step:")
        counter = 1
        for feed in successes:
            logging.info("\t%s: %s", counter, feed)
            counter += 1
        logging.warning("These Juniper feeds did not publish a message to pub/sub (possibly due to failure):")
        counter = 1
        for feed in failures:
            logging.warning("\t%s: %s", counter, feed)
            counter += 1
        if warn_of_dupes:
            logging.warning("Reminder: There are overlapping feed names in the config files. This could cause errors/unexpected behaviour in the future!")
        logging.info("Exiting...")
        if len(failures) > 0:
            sys.exit(1)
        else:
            sys.exit(0)
    except Exception as error:
        streaming_pull_future.cancel()
        logging.critical("Listening for messages on %s threw an exception: %s.", SUBSCRIPTION_NAME, error)
        sys.exit(1)
2022-06-17 09:12:25,932 PUB/SUB INFO -- List of feeds detected by the reader: {'XXX', 'YYY'}
2022-06-17 09:12:26,057 PUB/SUB INFO -- There are 2 feeds detected in total.
2022-06-17 09:12:26,057 PUB/SUB INFO -- Observed non-terminating stream error 503 DNS resolution failed for pubsub.googleapis.com:443: UNAVAILABLE: OS Error
2022-06-17 09:12:26,057 PUB/SUB INFO -- Listening for messages on projects/hsbc-11545401-datamesh-dev/subscriptions/datamesh-dev-control-api-service...
2022-06-17 09:12:26,057 PUB/SUB INFO -- Observed recoverable stream error 503 DNS resolution failed for pubsub.googleapis.com:443: UNAVAILABLE: OS Error

VOLTTRON Actuator agent failure message but it appears to be working just fine

I'm using the actuator agent get_multiple_points with VOLTTRON 8.1.3 to make about 30 BACnet read requests of sensors with:
zone_setpoints_data = self.vip.rpc.call('platform.actuator', 'get_multiple_points', actuator_get_this_data).get(timeout=300)
And I notice this debug message:
2022-06-09 19:55:21,927 (loadshedagent-0.1 2930461) __main__ DEBUG: [Simple DR Agent INFO] - ACTUATOR SCHEDULE EVENT SUCESS {'result': 'FAILURE', 'data': {}, 'info': 'REQUEST_CONFLICTS_WITH_SELF'}
But I have the data, so it appears to be working just fine, in addition to the one-minute-interval scrape of all BACnet devices inside the building. Is this anything to worry about, or should I make some sort of adjustment?
EDIT
Code snippet for scheduling the actuator is below. Am I scheduling the actuator agent wrong with the _now, str_start, _end, str_end on 30 devices for get_multiple_points? Should I be adjusting this td(seconds=10) uniquely to space out the call for each device?
# create start and end timestamps for actuator agent scheduling
_now = get_aware_utc_now()
str_start = format_timestamp(_now)
_end = _now + td(seconds=10)
str_end = format_timestamp(_end)

actuator_schedule_request = []
for group in self.nested_group_map.values():
    for device_address in group.values():
        device = '/'.join([self.building_topic, str(device_address)])
        actuator_schedule_request.append([device, str_start, str_end])

# use actuator agent to get all zone temperature setpoint data
result = self.vip.rpc.call('platform.actuator', 'request_new_schedule',
                           self.core.identity, 'my_schedule', 'HIGH',
                           actuator_schedule_request).get(timeout=90)
It seems to me that getting points won't cause that debug message. The message you are seeing comes from https://volttron.readthedocs.io/en/releases-8.x/driver-framework/actuator/actuator-agent.html?highlight=REQUEST_CONFLICTS_WITH_SELF#task-schedule-failures
So it appears that somewhere your schedules are overlapping, but the call to get_multiple_points should not have anything to do with this error.
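If you want to chase the overlap down, here is an untested sketch (reusing the names from the question's agent code) that deduplicates the device topics, so a single request never asks for overlapping slots on the same device, and drops any previous task with the same name via the ActuatorAgent's 'request_cancel_schedule' RPC before re-scheduling:

# untested sketch, assumes the same agent/config as the question
devices = sorted({'/'.join([self.building_topic, str(addr)])
                  for group in self.nested_group_map.values()
                  for addr in group.values()})  # the set removes duplicate device topics

actuator_schedule_request = [[device, str_start, str_end] for device in devices]

# cancel any earlier task with this name so repeated runs cannot conflict with themselves
self.vip.rpc.call('platform.actuator', 'request_cancel_schedule',
                  self.core.identity, 'my_schedule').get(timeout=90)

result = self.vip.rpc.call('platform.actuator', 'request_new_schedule',
                           self.core.identity, 'my_schedule', 'HIGH',
                           actuator_schedule_request).get(timeout=90)
if result['result'] != 'SUCCESS':
    _log.debug(f'Schedule request failed: {result["info"]}')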

Gatling active users drops to negative for a longer run after timeout exception

I'm running a simulation using Gatling: 5 users per second for 150 minutes.
After a certain exception:
15:58:19.643 [WARN ] i.g.h.e.GatlingHttpListener - Request 'facebook_outbound_msg' failed for user 29989
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source)
15:58:19.643 [ERROR] i.g.h.e.r.DefaultStatsProcessor - Request 'facebook_outbound_msg' failed for user 29989: j.n.s.SSLException: handshake timed out
15:58:19.643 [ERROR] i.g.h.c.i.DefaultHttpClient - Failed to install SslHandler
15:58:19.678 [WARN ] i.g.h.e.GatlingHttpListener - Request 'facebook_inbound_msg' failed for user 29984
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source)
15:58:19.678 [ERROR] i.g.h.e.r.DefaultStatsProcessor - Request 'facebook_inbound_msg' failed for user 29984: j.n.s.SSLException: handshake timed out
the number of active users dropped to -1, and every time this exception happens, the number of active users keeps dropping.
Example:
---- FacebookOutboundSimulation ------------------------------------------------
[############################################ ] 59%
waiting: 18061 / active: -3 / done: 26942
---- FacebookInboundMessageSimulation ------------------------------------------
[############################################ ] 59%
waiting: 18061 / active: -3 / done: 26942
================================================================================
Why does this happen, and how do I fix it?
The issue was with gatling-charts-highcharts-bundle-3.0.0-RC1; when I switched to gatling-charts-highcharts-bundle-3.0.0-RC3, released on 2 October 2018, it was resolved.
Sidenote: Use AsJson instead of AsJSON.

AWS: Why does my RDS instance keep starting after I turned it off?

I have an RDS database instance on AWS and have turned it off for now. However, every few days it starts up on its own. I don't have any other services running right now.
There is this event in my RDS log:
"DB instance is being started due to it exceeding the maximum allowed time being stopped."
Why is there a limit to how long my RDS instance can be stopped? I just want to put my project on hold for a few weeks, but AWS won't let me turn off my DB? It costs $12.50/mo to have it sit idle, so I don't want to pay for this, and I certainly don't want AWS starting an instance for me that does not get used.
Please help!
That's a limitation of this new feature.
You can stop an instance for up to 7 days at a time. After 7 days, it will be automatically started. For more details on stopping and starting a database instance, please refer to Stopping and Starting a DB Instance in the Amazon RDS User Guide.
You can set up a cron job to stop the instance again after 7 days (see the sketch below). You can also change to a smaller instance size to save money.
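A minimal sketch of the cron-job idea, using boto3 (the same library the template below uses); the instance identifier and region are placeholders, and an already-stopped instance is simply ignored:

import boto3

rds = boto3.client("rds", region_name="us-east-1")
try:
    # replace with your own DB instance identifier
    rds.stop_db_instance(DBInstanceIdentifier="my-db-instance")
except rds.exceptions.InvalidDBInstanceStateFault:
    pass  # instance is already stopped, nothing to do

Run it from cron every day or so and the instance gets stopped again shortly after AWS auto-starts it.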
Another option is the upcoming Aurora Serverless, which stops and starts for you automatically. It might be more expensive than a dedicated instance when running 24/7.
Finally, there is always Heroku, which gives you a free database instance that starts and stops itself, with some limitations.
You can also try saving the following CloudFormation template as KeepDbStopped.yml and then deploying it with this command:
aws cloudformation deploy --template-file KeepDbStopped.yml --stack-name stop-db --capabilities CAPABILITY_IAM --parameter-overrides DB=arn:aws:rds:us-east-1:XXX:db:XXX
Make sure to change arn:aws:rds:us-east-1:XXX:db:XXX to your RDS ARN.
Description: Automatically stop RDS instance every time it turns on due to exceeding the maximum allowed time being stopped
Parameters:
  DB:
    Description: ARN of database that needs to be stopped
    Type: String
    AllowedPattern: arn:aws:rds:[a-z0-9\-]+:[0-9]+:db:[^:]*
Resources:
  DatabaseStopperFunction:
    Type: AWS::Lambda::Function
    Properties:
      Role: !GetAtt DatabaseStopperRole.Arn
      Runtime: python3.6
      Handler: index.handler
      Timeout: 20
      Code:
        ZipFile:
          Fn::Sub: |
            import boto3
            import time

            def handler(event, context):
                print("got", event)
                db = event["detail"]["SourceArn"]
                id = event["detail"]["SourceIdentifier"]
                message = event["detail"]["Message"]
                region = event["region"]
                rds = boto3.client("rds", region_name=region)
                if message == "DB instance is being started due to it exceeding the maximum allowed time being stopped.":
                    print("database turned on automatically, setting last seen tag...")
                    last_seen = int(time.time())
                    rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": str(last_seen)}])
                elif message == "DB instance started":
                    print("database started (and sort of available?)")
                    last_seen = 0
                    for t in rds.list_tags_for_resource(ResourceName=db)["TagList"]:
                        if t["Key"] == "DbStopperLastSeen":
                            last_seen = int(t["Value"])
                    if time.time() < last_seen + (60 * 20):
                        print("database was automatically started in the last 20 minutes, turning off...")
                        time.sleep(10)  # even waiting for the "started" event is not enough, so add some wait
                        rds.stop_db_instance(DBInstanceIdentifier=id)
                        print("success! removing auto-start tag...")
                        rds.add_tags_to_resource(ResourceName=db, Tags=[{"Key": "DbStopperLastSeen", "Value": "0"}])
                    else:
                        print("ignoring manual database start")
                else:
                    print("error: unknown database event!")
  DatabaseStopperRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Action:
              - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: Notify
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Action:
                  - rds:StopDBInstance
                Effect: Allow
                Resource: !Ref DB
              - Action:
                  - rds:AddTagsToResource
                  - rds:ListTagsForResource
                  - rds:RemoveTagsFromResource
                Effect: Allow
                Resource: !Ref DB
                Condition:
                  ForAllValues:StringEquals:
                    aws:TagKeys:
                      - DbStopperLastSeen
  DatabaseStopperPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !GetAtt DatabaseStopperFunction.Arn
      Principal: events.amazonaws.com
      SourceArn: !GetAtt DatabaseStopperRule.Arn
  DatabaseStopperRule:
    Type: AWS::Events::Rule
    Properties:
      EventPattern:
        source:
          - aws.rds
        detail-type:
          - "RDS DB Instance Event"
        resources:
          - !Ref DB
        detail:
          Message:
            - "DB instance is being started due to it exceeding the maximum allowed time being stopped."
            - "DB instance started"
      Targets:
        - Arn: !GetAtt DatabaseStopperFunction.Arn
          Id: DatabaseStopperLambda
It has worked for at least one person. If you have issues please report here.

Camel with RabbitMQ exception only occurs on second message - mis-spelt exchange name

I'm using Camel within a Spring Boot application, integrating with RabbitMQ, but am encountering strange behaviour.
My app has RESTful endpoints which convert the HTTP request to a RabbitMQ message and publish this to a predefined exchange. There is a separate consumer app which listens to a queue and processes the messages.
I have deliberately entered an incorrect RabbitMQ exchange name (invalidxchangename) to check that the application will fail if the exchange does not exist. However, the Camel context starts without error, and when I send in a first request it does not report any error. This message gets lost as there is no matching RabbitMQ exchange. When I submit a second request I receive the following exception, which I would have expected on route startup:
com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'invalidxchangename' in vhost
EDIT:
I've tried a more simple example to show the issue in Camel.
I've created a simple route as follows:
from("file:in?fileName=in.txt").log(LoggingLevel.DEBUG, "in here!").to("rabbitmq://localhost:5762/invalidexchange?declare=false");
where there is an existing RabbitMQ exchange called validexchange (so I have deliberately made a typo in the RabbitMQ uri). I would expect the camel route to fail at startup since the exchange doesn't exist, or even the first time it tries to process a new in.txt file.
What I am actually seeing in the logs is that it reports no error on startup, and only on the 2nd invocation of the route does it report an error.
2015-03-11 16:17:04.356 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-2 >>> (route2) from(file://in?fileName=in.txt) --> log[in here!] <<< Pattern:InOnly, Headers:...
2015-03-11 16:17:04.360 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-2 >>> (route2) log[in here!] --> rabbitmq://localhost:5762/customerchannel.exchang?declare=false <<< Pattern:InOnly, Headers:...
2015-03-11 16:17:45.073 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-4 >>> (route2) from(file://in?fileName=in.txt) --> log[in here!] <<< Pattern:InOnly, Headers: ...
2015-03-11 16:17:45.079 INFO 9756 : ID-SBMELW7W-06220-59960-1426051020468-0-4 >>> (route2) log[in here!] --> rabbitmq://localhost:5762/customerchannel.exchang?declare=false <<< Pattern:InOnly, Headers:...
2015-03-11 16:17:45.092 ERROR 9756 : Failed delivery for (MessageId: ID-SBMELW7W-06220-59960-1426051020468-0-3 on ExchangeId: ID-SBMELW7W-06220-59960-1426051020468-0-4). Exhausted after delivery attempt: 1 caught: com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=404, reply-text=NOT_FOUND - no exchange 'customerchannel.exchang' in vhost '/', class-id=60, method-id=40)
It looks like the first request causes an error that closes the channel and logs the reason, and when you try to use the channel the second time it returns an AlreadyClosedException carrying the message that caused the channel to close in the first call.
You can test this by trying to publish the second message to a different exchange name in the same channel and checking which exchange is in the error. E.g. publish the second message to invalidxchangename2 and you should still see invalidxchangename as the exchange in the error.
To fix, you should handle the publish result when you publish and re-establish the connection if there's an error.
If you want to be sure that a message got delivered to a RabbitMQ queue, then you have to use publisher confirms: https://www.rabbitmq.com/confirms.html
Being able to publish a message doesn't mean that the message will reach a queue. You could go to a mailbox and leave a letter inside, but between the time you left the letter there and a postman picked it up, many things could have happened; for example, the mailbox catching fire, and so on.
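The question uses Camel's RabbitMQ endpoint, but as a rough illustration of the publisher-confirm idea, here is a Python/pika sketch (assuming a local broker and a queue bound to 'validexchange'); it is not the Camel API, just the underlying mechanism:

import pika
from pika.exceptions import UnroutableError, ChannelClosedByBroker

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.confirm_delivery()  # enable publisher confirms on this channel

try:
    # mandatory=True asks the broker to report messages it cannot route to any queue
    channel.basic_publish(exchange="validexchange", routing_key="my.queue",
                          body=b"hello", mandatory=True)
    print("message confirmed by the broker")
except UnroutableError:
    print("broker accepted the message but could not route it to a queue")
except ChannelClosedByBroker as err:
    # e.g. 404 NOT_FOUND when the exchange itself does not exist, as in the question
    print(f"channel closed by broker: {err}")
finally:
    connection.close()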

Resources