VOLTTRON actuator agent RPC revert not working - volttron

I have a BACnet system for HVAC controls where I am using the VOLTTRON actuator agent to write a value of 2 at BACnet priority 10, which works well.
result = self.vip.rpc.call('platform.actuator', 'set_multiple_points', self.core.identity, set_multi_topic_values_master).get(timeout=20)
_log.debug(f'*** [Setter Agent INFO] *** - set_multiple_points ON ALL VAVs WRITE SUCCESS!')
Then the system sleeps for some time period for testing purposes:
_log.debug(f'*** [Setter Agent INFO] *** - SETTING UP GEVENT SLEEP!')
gevent.sleep(120)
_log.debug(f'*** [Setter Agent INFO] *** - GEVENT SLEEP DONE!')
After the gevent sleep, I run into an issue: the revert does not appear to work. The code below executes without errors, but a BACnet scanning tool shows that the priority 10 values of 2 are still present on the HVAC controls, as if the revert point call isn't doing anything.
for device in revert_topic_devices_jci:
    response = self.vip.rpc.call('platform.actuator', 'revert_point', self.core.identity, topic_jci, self.jci_setpoint_topic).get(timeout=20)
    _log.debug(f'*** [Setter Agent INFO] *** - REVERT POINTS ON {device} SUCCESS!')
_log.debug(f'*** [Setter Agent INFO] *** - REVERT POINTS JCI DONE DEAL SUCCESS!')
One thing I notice is that the building automation system writes occupancy/unoccupancy to the HVAC controls at BACnet priority 12. It is always either a 1 for occupied or a 2 for unoccupied.
What I am trying to do with VOLTTRON is write a value of 2 at BACnet priority 10, and then release to nothing on the revert. Could it be that the revert isn't doing anything because there was nothing to revert to? I was hoping that VOLTTRON could write at BACnet priority 10 and then just release. With the BACnet scan tool I can do the same thing: write at priority 10, then release priority 10 by writing null at priority 10.
Should I just be writing at priority 12, the same as the building automation system, so VOLTTRON can revert back to whatever the building automation system was doing?

I have a few observations:
In your revert loop, the third code block above, you're not actually
changing the topic being passed to the RPC call on each iteration. Every
call uses the same device topic (topic_jci), which is not defined in
that code block and is not changed inside the loop, and the same point
topic (self.jci_setpoint_topic), which similarly is not defined in the
block and does not appear to change. It is likely worth setting some
breakpoints and/or debug statements here to be sure that you're passing
the correct topics to revert on.
Your use of priority appears to be consistent
with BACnet protocol specification, and with the VOLTTRON BACnet
driver implementation. We would not recommend that you attempt to
write at the same priority as an existing building automation
system.
The BACnet driver code will send a NULL (None) value in a "writeProperty"
service request when the "revert_point" function is called by the
Platform Driver. This functionality I am frankly not terribly
familiar with, but given that your scan tool performs the expected
revert functionality when passed a NULL value, I suspect this is the expected
way of performing a "revert to previous value" type function in BACnet protocol.
I do not have reason to believe that the behavior you're experiencing is the
result of a bug in the driver code base.
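To make the priority discussion concrete, here is a minimal toy model of a BACnet commandable point's priority array. It is only a sketch: the class and method names are made up for illustration and this is not the VOLTTRON driver code. It shows why writing NULL at priority 10 should let the priority 12 value from the building automation system take effect again.

```python
from typing import List, Optional


class PriorityArray:
    """Toy model (hypothetical) of a commandable point's 16-slot priority array."""

    def __init__(self, relinquish_default: int) -> None:
        # Index 0 holds priority 1 (highest), index 15 holds priority 16 (lowest).
        self.slots: List[Optional[int]] = [None] * 16
        self.relinquish_default = relinquish_default

    def write(self, priority: int, value: Optional[int]) -> None:
        """Write a value at a priority; writing None releases that slot."""
        if not 1 <= priority <= 16:
            raise ValueError("BACnet priorities run from 1 to 16")
        self.slots[priority - 1] = value

    def present_value(self) -> int:
        """Return the value in the highest-priority non-empty slot."""
        for value in self.slots:
            if value is not None:
                return value
        return self.relinquish_default


point = PriorityArray(relinquish_default=1)
point.write(12, 1)                      # building automation: occupied
point.write(10, 2)                      # override at priority 10: unoccupied
after_override = point.present_value()  # priority 10 wins
point.write(10, None)                   # release priority 10 (the "revert")
after_release = point.present_value()   # falls back to priority 12
```

If the priority 10 slot still shows a 2 after the revert, the NULL write most likely never reached that point, which again points at the topics being passed.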
Overall, I suggest debugging the topics being passed in the "revert_point" RPC call.
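As a rough sketch of what that debugging might lead to, the loop below rebuilds the device topic on every iteration before calling revert_point. The helper name, the rpc_call parameter, and the example topics are assumptions made for illustration; the RPC arguments follow the order used in the question (requester id, device topic, point name).

```python
from typing import Callable, Iterable, List, Tuple


def revert_points(
    rpc_call: Callable[..., object],
    requester_id: str,
    building_topic: str,
    device_ids: Iterable[str],
    point_name: str,
) -> List[Tuple[str, str]]:
    """Call revert_point once per device, with a topic built for that device."""
    reverted = []
    for device in device_ids:
        # Build the topic inside the loop so each call targets a different device.
        device_topic = "/".join([building_topic, device])
        rpc_call("platform.actuator", "revert_point",
                 requester_id, device_topic, point_name)
        reverted.append((device_topic, point_name))
    return reverted


# Stand-in for the real RPC so the sketch runs without VOLTTRON installed.
calls = []
done = revert_points(
    lambda *args: calls.append(args),
    "setter.agent",          # hypothetical agent identity
    "campus/building",       # hypothetical building topic
    ["27", "29"],
    "occupancy-command",     # hypothetical point name
)
```

Inside an agent, rpc_call could be something like `lambda *args: self.vip.rpc.call(*args).get(timeout=20)`. Logging device_topic on each pass shows quickly whether the same topic is being reverted repeatedly.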

I am having good luck reverting points by using set_multiple_points to write None.
Something like this:
self.jci_device_map = {
    'VMA-2-6': '27',
    'VMA-2-4': '29',
    'VMA-2-7': '30',
    'VMA-1-8': '6',
}
revert_multi_topic_values_master = []
set_multi_topic_values_master = []
for device in self.jci_device_map.values():
    topic_jci = '/'.join([self.building_topic, device])
    final_topic_jci = '/'.join([topic_jci, self.jci_setpoint_topic])
    # BACnet enum point for VAV occupancy
    # 1 == occupied, 2 == unoccupied
    # create a (topic, value) tuple and add it to our topic values
    set_multi_topic_values_master.append((final_topic_jci, self.unnoccupied_value))  # to set unoccupied
    revert_multi_topic_values_master.append((final_topic_jci, None))  # to revert
result = self.vip.rpc.call('platform.actuator', 'set_multiple_points', self.core.identity, revert_multi_topic_values_master).get(timeout=20)


batch query is not allowed to request data from "".""

I'm getting started with Kapacitor and have been trying to run the first guide in the Kapacitor documentation, but with data I already have. I managed to define a task, but I can neither enable it nor can I run a backfill. I came across this question, which is similar to my problem, but the answer there didn't help. In contrast to the error message there I get empty strings for database, retention policy, and/or measurement.
In Kapacitor config I set an InfluxDB connection to the local host instance with the name localhost (which has a database mydb and the measurements weather.current.clouds and weather.current.visibility with default retention policy autogen) and created the following weathertest.tick script:
dbrp "mydb"."autogen"
var clouds = batch
    |query('select mean(value) / 100.0 as val from "mydb"."autogen"."weather.current.clouds"')
        .period(1h)
        .every(1h)
        .groupBy(time(1m), *)
        .fill(0)
var vis = batch
    |query('select mean(value) / 10000.0 as val from "mydb"."autogen"."weather.current.visibility"')
        .period(1h)
        .every(1h)
        .groupBy(time(1m), *)
        .fill(0)
clouds
    |join(vis)
        .as('c', 'v')
    |eval(lambda: 100 * (1 - "c.val") * "v.val")
        .as('pcent')
    |influxDBOut()
        .cluster('localhost')
        .database('mydb')
        .retentionPolicy('autogen')
        .measurement('testmetric')
        .tag('host', 'myhost.local')
        .tag('key', 'weather.current.lightidx')
This is what I came up with after hours of trial and (especially) error. As given in the title, when I try to enable my task with kapacitor enable weathertest, I get the error message enabling task weathertest: batch query is not allowed to request data from ""."". The same thing happens when I try to record as in the "Backfill" example. Also, in that example there is a start and a stop date for limiting the time frame. The time format given there is wrong and is not understood by Kapacitor. Instead of e.g. 2015-10-01, I have to put in 2015-10-01T00:00Z to at least get past the time format error.
In the Kapacitor logs there is not a single line regarding these errors, only when I try to remove a record, I get something like remove /var/lib/kapacitor/replay/1f5...750.brpl: no such file or directory and this can be found in the logs. There are lots of info lines in the logs showing successful POSTs to/from InfluxDB for the _internal database with HTTP response result 204.
Does anyone have an idea what I may be doing wrong?
OK, after the weekend I tried again. Without any change, Kapacitor now accepted my script in the steps that had failed before, and this time I was able to find error messages in the log. The node mentioned there was the eval node, and the message pointed towards a type mismatch. When I changed the line
|eval(lambda: 100 * (1 - "c.val") * "v.val")
to
|eval(lambda: 100.0 * (1.0 - "c.val") * "v.val")
the error messages were gone and the command kapacitor show weathertest showed a rather sane content now.
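To sanity-check the values that the corrected eval node writes, the same arithmetic can be reproduced outside Kapacitor. The sketch below is only a cross-check; the function name is made up, and the inputs are assumed to be the already scaled c.val and v.val fractions. Note that Python silently mixes integers and floats, whereas the lambda expression in the script did not, which is why the literals had to be written as 100.0 and 1.0.

```python
def light_index(cloud_fraction: float, visibility_fraction: float) -> float:
    """Mirror of the eval expression: 100.0 * (1.0 - "c.val") * "v.val"."""
    return 100.0 * (1.0 - cloud_fraction) * visibility_fraction


# 25% cloud cover and 50% of maximum visibility.
example = light_index(0.25, 0.5)
print(example)
```

Comparing a few hand-computed values like this against the testmetric measurement is a quick way to confirm that the task was redefined with the latest script.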
Furthermore, I redefined, recorded, replayed and deleted the tasks and recordings during my tests over and over again and I may have forgotten to redefine tasks after making changes to the tick script (not really sure). After changing the above, redefining the task and replaying it I finally found the expected data in the InfluxDB instance.

Role of maxPropagationDelay in link agent of UnetStack

In the link agent, I came across attributes like maxPropagationDelay and reservationGuardTime. What is the role of these attributes? Where can I find more information about them?
These are parameters of specific LINK protocols.
maxPropagationDelay is used to determine timeouts based on expected round-trip-times in the network. It should be set to a value that depends on the geographical size of your network if the network is small enough for a single hop connection between any pair of nodes. Otherwise it should be set to a value based on the maximum communication range of your modem.
reservationGuardTime is a small extra time that a channel is reserved for, to allow for practical timing jitter of modems. Usually the default value for this provided by the agent will be good enough for most purposes.
The Underwater networks handbook, to be released with the upcoming version of UnetStack3, will provide a lot more guidance on many of these parameters, and on how to set up various types of networks using UnetStack.
You can access more information about any parameters of any of the Agents in UnetStack using the help command. For the Link Agent, you'll see this in UnetStack 1.4.
> help link
link - access to link agent
Examples:
  link // access parameters
  link.maxRetries = 5 // set maximum retries for reliable delivery
  link << new DatagramReq(to: 2, data: [1,2,3], reliability: true)
  // send reliable datagram
Parameters:
  MTU - maximum data transfer size
  maxRetries - maximum retries for reliable delivery
  reservationGuardTime - guard period (s)
  maxPropagationDelay - maximum propagation delay (s)
  dataChannel - channel to use for data frames (0 = control, 1 = data)
reservationGuardTime is the additional guard time that can be added to the frame duration when reserving a channel (using MAC), so that consecutive channel reservations have some delay in between for the nodes to be able to react.
maxPropagationDelay is used to estimate the maximum time that an acknowledgement to a request (or a series of requests if fragmentation is needed) might take. It is used to set timeouts for transmissions, or to make channel reservations (if using a MAC). Depending on your simulation/setup, you can change this number to the longest one-way time between two nodes that can communicate.
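As a rough illustration of how such a value might be chosen, the sketch below converts a maximum communication range into a one-way propagation delay. The nominal sound speed of 1500 m/s and the 3 km range are assumptions made for the example, and the function name is hypothetical.

```python
SOUND_SPEED_M_PER_S = 1500.0  # assumed nominal speed of sound in seawater


def max_propagation_delay(max_range_m: float,
                          sound_speed: float = SOUND_SPEED_M_PER_S) -> float:
    """One-way delay in seconds over the longest link in the network."""
    if max_range_m < 0 or sound_speed <= 0:
        raise ValueError("range must be non-negative and sound speed positive")
    return max_range_m / sound_speed


# Two nodes that can communicate over at most 3 km.
delay_s = max_propagation_delay(3000.0)
print(delay_s)
```

The result could then be assigned in the shell in the same way as the maxRetries example in the help text, for instance `link.maxPropagationDelay = 2`.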

Cassandra failed to connect

I'm a newbie with Apache Cassandra. The tutorial video says to type bin/nodetool status to check the status of the node, but when I tried it, the terminal returned:
Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection
refused (Connection refused)'.
I tried changing JVM_OPTS to "$JVM_OPTS -Djava.rmi.server.hostname=localhost" in cassandra-env.sh,
but I still can't connect.
What should I do to fix this error?
Debug.logs
DEBUG [main] 2017-01-21 13:57:48,095 ColumnFamilyStore.java:881 - Enqueuing flush of local: 38.338KiB (0%) on-heap, 0.000KiB (0%) off-heap
DEBUG [PerDiskMemtableFlushWriter_0:1] 2017-01-21 13:57:48,167 Memtable.java:435 - Writing Memtable-local#858986260(8.879KiB serialized bytes, 1 ops, 0%/0% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(9223372036854775807)]
DEBUG [PerDiskMemtableFlushWriter_0:1] 2017-01-21 13:57:48,168 Memtable.java:464 - Completed flushing /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-56-big-Data.db (5.367KiB) for commitlog position CommitLogPosition(segmentId=1484978256521, position=32861)
DEBUG [MemtableFlushWriter:1] 2017-01-21 13:57:48,471 ColumnFamilyStore.java:1184 - Flushed to [BigTableReader(path='/usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-56-big-Data.db')] (1 sstables, 9.527KiB), biggest 9.527KiB, smallest 9.527KiB
DEBUG [CompactionExecutor:1] 2017-01-21 13:57:48,472 CompactionTask.java:150 - Compacting (896b3470-df9e-11e6-9508-7dc463a45cc9) [/usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-53-big-Data.db:level=0, /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-54-big-Data.db:level=0, /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-55-big-Data.db:level=0, /usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-56-big-Data.db:level=0, ]
DEBUG [main] 2017-01-21 13:57:48,539 StorageService.java:2084 - Node localhost/127.0.0.1 state NORMAL, token [-1035692197905104867, -1103547951527719073, -1136980347732340590, -1150272208899529050, -1184340318934652250, -1251847845785777189, -1355083122390358187,
INFO [main] 2017-01-21 13:57:48,539 StorageService.java:2087 - Node localhost/127.0.0.1 state jump to NORMAL
DEBUG [main] 2017-01-21 13:57:48,545 StorageService.java:1336 - NORMAL
DEBUG [PendingRangeCalculator:1] 2017-01-21 13:57:48,575 PendingRangeCalculatorService.java:66 - finished calculation for 3 keyspaces in 19ms
INFO [main] 2017-01-21 13:57:49,125 NativeTransportService.java:70 - Netty using native Epoll event loop
DEBUG [CompactionExecutor:1] 2017-01-21 13:57:49,286 CompactionTask.java:230 - Compacted (896b3470-df9e-11e6-9508-7dc463a45cc9) 4 sstables to [/usr/lib/cassandra/apache-cassandra-3.9/data/data/system/local-7ad54392bcdd35a684174e047860b377/mc-57-big,] to level=0. 9.869KiB to 4.938KiB (~50% of original) in 812ms. Read Throughput = 12.145KiB/s, Write Throughput = 6.077KiB/s, Row Throughput = ~2/s. 4 total partitions merged to 1. Partition merge counts were {4:1, }
INFO [main] 2017-01-21 13:57:49,368 Server.java:159 - Using Netty Version: [netty-buffer=netty-buffer-4.0.39.Final.38bdf86, netty-codec=netty-codec-4.0.39.Final.38bdf86, netty-codec-haproxy=netty-codec-haproxy-4.0.39.Final.38bdf86, netty-codec-http=netty-codec-http-4.0.39.Final.38bdf86, netty-codec-socks=netty-codec-socks-4.0.39.Final.38bdf86, netty-common=netty-common-4.0.39.Final.38bdf86, netty-handler=netty-handler-4.0.39.Final.38bdf86, netty-tcnative=netty-tcnative-1.1.33.Fork19.fe4816e, netty-transport=netty-transport-4.0.39.Final.38bdf86, netty-transport-native-epoll=netty-transport-native-epoll-4.0.39.Final.38bdf86, netty-transport-rxtx=netty-transport-rxtx-4.0.39.Final.38bdf86, netty-transport-sctp=netty-transport-sctp-4.0.39.Final.38bdf86, netty-transport-udt=netty-transport-udt-4.0.39.Final.38bdf86]
INFO [main] 2017-01-21 13:57:49,369 Server.java:160 - Starting listening for CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
INFO [main] 2017-01-21 13:57:49,429 CassandraDaemon.java:521 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
Remove the JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=localhost" setting.
Set listen_address and broadcast_rpc_address in cassandra.yaml to the local IP address (use ifconfig to find the IP address of the system).
Restart Cassandra.
Find the following line, uncomment it, and change it from
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname="
to
JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=127.0.0.1"
If you have any difficulty starting Cassandra, delete the DataStax commit logs in
C:\Program Files\DataStax-DDC\data\commitlog
Check your system memory. I had the same issue, but after increasing the RAM (4GB) it works properly.
Connection refused can have multiple causes, the most common one being that the application to connect to is not there. Check this using
sudo service cassandra status # exit by pressing 'q'
If it says active (exited) in bold then Cassandra is not even running!
Check Cassandra's log for error messages:
grep -A2 ERROR /var/log/cassandra/system.log
Watch htop after you sudo service cassandra restart -- if it fills up all of your available memory, Cassandra will die without an error message. On my EC2 instance an empty Cassandra takes up about 1.3 GB of RAM, which would be too little for a t2.nano or t2.micro instance.
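A quick way to confirm whether anything is listening at all is to probe the ports directly. The sketch below is a generic TCP check, not a Cassandra tool: the function name is made up, 7199 is the JMX port from the error message, and 9042 is the CQL port that appears in the debug log above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in (("JMX (nodetool)", 7199), ("CQL", 9042)):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"{name} port {port}: {state}")
```

If both ports are closed, Cassandra is probably not running. If only 7199 is closed, the JMX settings in cassandra-env.sh are the more likely culprit.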

How to debug long-running mapreduce jobs with millions of writes?

I am using the simple control.start_map() function of the appengine-mapreduce library to start a mapreduce job. This job successfully completes and shows ~43M mapper-calls on the resulting /mapreduce/detail?mapreduce_id=<my_id> page. However, this page makes no mention of the reduce step or any of the underlying appengine-pipeline processes that I believe are still running. Is there some way to return the pipeline ID that this call makes so I can look at the underlying pipelines to help debug this long-running job? I would like to retrieve enough information to pull up this page: /mapreduce/pipeline/status?root=<guid>
Here is an example of the code I am using to start up my mapreduce job originally:
from third_party.mapreduce import control
mapreduce_id = control.start_map(
    name="Backfill",
    handler_spec="mark_tos_accepted",
    reader_spec=(
        "third_party.mapreduce.input_readers.DatastoreInputReader"),
    mapper_parameters={
        "input_reader": {
            "entity_kind": "ModelX"
        },
    },
    shard_count=64,
    queue_name="backfill-mapreduce-queue",
)
Here is the mapping function:
# This is where we keep our copy of appengine-mapreduce
from third_party.mapreduce import operation as op
def mark_tos_accepted(modelx):
    # Skip users who have already been marked
    if (not modelx
            or modelx.tos_accepted == myglobals.LAST_MATERIAL_CHANGE_TO_TOS):
        return
    modelx.tos_accepted = user_models.LAST_MATERIAL_CHANGE_TO_TOS
    yield op.db.Put(modelx)
Here are the relevant portions of the ModelX:
class BackupModel(db.Model):
    backup_timestamp = db.DateTimeProperty(indexed=True, auto_now=True)
class ModelX(BackupModel):
    tos_accepted = db.IntegerProperty(indexed=False, default=0)
For more context, I am trying to debug a problem I am seeing with writes showing up in our data warehouse.
On 3/23/2013, we launched a MapReduce job (let's call it A) over a db.Model (let's call it ModelX) with ~43M entities. 7 hours later, the job "finished" and the /mapreduce/detail page showed that we had successfully mapped over all of the entities, as shown below.
mapper-calls: 43613334 (1747.47/sec avg.)
On 3/31/2013, we launched another MapReduce job (let's call it B) over ModelX. 12 hours later, the job finished with status Success and the /mapreduce/detail page showed that we had successfully mapped over all of the entities, as shown below.
mapper-calls: 43803632 (964.24/sec avg.)
I know that MR job A wrote to all ModelX entities, since we introduced a new property that none of the entities contained before. The ModelX contains an auto_now property like so.
backup_timestamp = ndb.DateTimeProperty(indexed=True, auto_now=True)
Our data warehousing process runs a query over ModelX to find those entities that changed on a certain day and then downloads those entities and stores them in a separate (AWS) database so that we can run analysis over them. An example of this query is:
db.GqlQuery('select * from ModelX where backup_timestamp >= DATETIME(2013, 4, 10, 0, 0, 0) and backup_timestamp < DATETIME(2013, 4, 11, 0, 0, 0) order by backup_timestamp')
I would expect that our data warehouse would have ~43M entities on each of the days that the MR jobs completed, but it is actually more like ~3M, with each subsequent day showing an increase, as shown in this progression:
3/16/13 230751
3/17/13 193316
3/18/13 344114
3/19/13 437790
3/20/13 443850
3/21/13 640560
3/22/13 612143
3/23/13 547817
3/24/13 2317784 // Why isn't this ~43M ?
3/25/13 3701792 // Why didn't this go down to ~500K again?
3/26/13 4166678
3/27/13 3513732
3/28/13 3652571
This makes me think that the op.db.Put() calls issued by the mapreduce job are still running in some pipeline or queue and causing this trickle effect.
Furthermore, if I query for entities with an old backup_timestamp, I can go back pretty far and still get plenty of entities, but I would expect all of these queries to return 0:
In [4]: ModelX.all().filter('backup_timestamp <', 'DATETIME(2013,2,23,1,1,1)').count()
Out[4]: 1000L
In [5]: ModelX.all().filter('backup_timestamp <', 'DATETIME(2013,1,23,1,1,1)').count()
Out[5]: 1000L
In [6]: ModelX.all().filter('backup_timestamp <', 'DATETIME(2012,1,23,1,1,1)').count()
Out[6]: 1000L
However, there is this strange behavior where the query returns entities that it should not:
In [8]: old = ModelX.all().filter('backup_timestamp <', 'DATETIME(2012,1,1,1,1,1)')
In [9]: paste
for o in old[1:100]:
    print o.backup_timestamp
## -- End pasted text --
2013-03-22 22:56:03.877840
2013-03-22 22:56:18.149020
2013-03-22 22:56:19.288400
2013-03-22 22:56:31.412290
2013-03-22 22:58:37.710790
2013-03-22 22:59:14.144200
2013-03-22 22:59:41.396550
2013-03-22 22:59:46.482890
2013-03-22 22:59:46.703210
2013-03-22 22:59:57.525220
2013-03-22 23:00:03.864200
2013-03-22 23:00:18.040840
2013-03-22 23:00:39.636020
Which makes me think that the index is just taking a long time to be updated.
I have also graphed the number of entities that our data warehousing process downloads and am noticing some cliff-like drops that make me think there is some behind-the-scenes throttling going on somewhere that I cannot see with any of the diagnostic tools exposed on the appengine dashboard. For example, this graph shows a fairly large spike on 3/23, when we started the mapreduce job, but then a dramatic fall shortly thereafter.
This graph shows the count of entities returned by the BackupTimestamp GqlQuery for each 10-minute interval for each day. Note that the purple line shows a huge spike as the MapReduce job spins up, and then a dramatic fall ~1hr later as the throttling kicks in. This graph also shows that there seems to be some time-based throttling going on.
I don't think you'll have any reducer functions there, because all you've done is start a mapper. To do a complete mapreduce, you have to explicitly instantiate a MapReducePipeline and call start on it. As a bonus, that answers your question, as it returns the pipeline ID which you can then use in the status URL.
Just trying to understand the specific problem. Is it that you are expecting a bigger number of entities in your AWS database? I would suspect that the problem lies with the process that downloads your old ModelX entities into an AWS database, that it's somehow not catching all the updated entities.
Is the AWS-downloading process modifying ModelX in any way? If not, then why would you be surprised at finding entities with an old modified time stamp? modified would only be updated on writes, not on read operations.
Kind of unrelated - with respect to throttling I've usually found a throttled task queue to be the problem, so maybe check how old your tasks in there are or if your app is being throttled due to a large amount of errors incurred somewhere else.
control.start_map doesn't use pipeline and has no shuffle/reduce step. When the mapreduce status page shows it's finished, all mapreduce-related taskqueue tasks should have finished. You can examine your queue or even pause it.
I suspect there are problems related to old indexes for the old Model or to eventual consistency. To debug MR, it is useful to filter your warnings/errors log and search by the mr id. To help with your particular case, it might be useful to see your Map handler.

Outcome field in LoadTest2010's LoadTestTestResults table

I'm running load tests in Visual Studio 2010 Ultimate, and I'm trying to build some custom reporting tools. In the LoadTestTestResults table, there's a column labeled Outcome. I've seen it have the values 0, 1, 3, and (mostly) 10. But I can't find anything that explains what the different values mean.
I think that 10 is a success outcome, according to a comment in Prc_GetUserTestDetail. No clue on the others -- they don't seem to match up with any numbers in the VS summary.
What do these outcome codes mean?
I contacted a Microsoft developer from the MSDN blog on VS load testing and asked about this. Here's the information I got back, in case anybody else needs it:
The Outcome field is an enum that stores the status of an individual test case within a load test run. It can have values from 0 - 13.
0 - Error: There was a system error while we were trying to execute a test.
1 - Failed: Test was executed, but there were issues. Issues may involve exceptions or failed assertions.
2 - Timeout: The test timed out.
3 - Aborted: Test was aborted. This was not caused by a user gesture, but rather by a framework decision.
4 - Inconclusive: Test has completed, but we can't say if it passed or failed. May be used for aborted tests...
5 - PassedButRunAborted: Test was executed w/o any issues, but run was aborted.
6 - NotRunnable: Test had its chance to be executed but was not, as ITestElement.IsRunnable == false.
7 - NotExecuted: Test was not executed. This was caused by a user gesture - e.g. user hit stop button.
8 - Disconnected: Test run was disconnected before it finished running.
9 - Warning: To be used by Run level results. This is not a failure.
10 - Passed: Test was executed w/o any issues.
11 - Completed: Test has completed, but there is no qualitative measure of completeness.
12 - InProgress: Test is currently executing.
13 - Pending: Test is in the execution queue, was not started yet.
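For a custom reporting tool, the list above can be captured as an enum so that raw Outcome values from the LoadTestTestResults table are translated into names. This is a minimal sketch: the class and helper names are made up, and the members simply mirror the list above.

```python
from enum import IntEnum


class Outcome(IntEnum):
    """Outcome codes as listed above for the LoadTestTestResults table."""
    Error = 0
    Failed = 1
    Timeout = 2
    Aborted = 3
    Inconclusive = 4
    PassedButRunAborted = 5
    NotRunnable = 6
    NotExecuted = 7
    Disconnected = 8
    Warning = 9
    Passed = 10
    Completed = 11
    InProgress = 12
    Pending = 13


def describe(code: int) -> str:
    """Translate a raw Outcome column value into a readable name."""
    try:
        return Outcome(code).name
    except ValueError:
        return "Unknown(%d)" % code


# The values observed in the question: 0, 1, 3 and (mostly) 10.
observed = [describe(code) for code in (0, 1, 3, 10)]
print(observed)
```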
