ESP32 AWS IoT transportStatus=-1

I'm following the Espressif docs for connecting an ESP32 to an AWS IoT shadow. I'm using the example github.com/espressif/esp-aws-iot for shadow MQTT synchronisation. I set everything in the config, but when I run it on the ESP32 I get the following error:
--- up to here everything runs fine ---
--- MQTT connects to AWS successfully ---
--- and then ---
I coreMQTT: SUBSCRIBE topic $aws/things/MY_DEVICE_NAME/shadow/name/MY_SHADOW_NAME/delete/accepted to broker.
E coreMQTT: A single byte was not read from the transport: transportStatus=-1.
E coreMQTT: Receiving incoming packet length failed. Status=MQTTRecvFailed
E coreMQTT: Exiting process loop due to failure: ErrorStatus=MQTTRecvFailed
E coreMQTT: MQTT_ProcessLoop returned with status = 4.
I tried increasing the network buffer size for MQTT packets to 4096 via the config, but that didn't help. Does anyone know what the problem might be?

I found the error. Something was messed up with the policy I created for that device. I'm still not 100% sure what exactly the problem was, but when I removed the code that tries to delete the shadow before the actual shadow operations begin, the example worked fine. I can now pub/sub to my shadow.
For anyone having the same problem, comment out the code from lines 691-746 (the shadow-deletion part).
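I'm not certain this was the exact root cause, but if the delete topics were missing from the IoT policy, a policy along these lines would cover all the named-shadow topics including delete. This is only a sketch; REGION, ACCOUNT_ID and the thing/shadow names are placeholders to adjust to your setup:
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "iot:Connect",
      "Resource": "arn:aws:iot:REGION:ACCOUNT_ID:client/MY_DEVICE_NAME" },
    { "Effect": "Allow", "Action": ["iot:Publish", "iot:Receive"],
      "Resource": "arn:aws:iot:REGION:ACCOUNT_ID:topic/$aws/things/MY_DEVICE_NAME/shadow/name/MY_SHADOW_NAME/*" },
    { "Effect": "Allow", "Action": "iot:Subscribe",
      "Resource": "arn:aws:iot:REGION:ACCOUNT_ID:topicfilter/$aws/things/MY_DEVICE_NAME/shadow/name/MY_SHADOW_NAME/*" }
  ]
}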

Related

XBee3 Coordinator cannot find End_Device during Network Discovery

Currently I am running the XBee3 International Mesh Kit and tried to follow the example given in the documentation.
https://www.digi.com/resources/documentation/Digidocs/90001942-13/?utm_source=packaging&utm_medium=insert&utm_campaign=xbee3&utm_content=XBeeZigbeeMeshKit#tasks/t_configure_zigbee_modules.htm%3FTocPath%3DGetting%2520started%2520with%2520XBee%2520Zigbee%7CExample%253A%2520basic%2520communication%7C_____4
I tried implementing network discovery for the three devices using the MicroPython REPL.
The following is the configuration I have for my 3 XBee devices.
Parameters for the 3 XBee Devices
I am running the following code on the Coordinator and End-Device for network discovery
https://www.digi.com/resources/documentation/digidocs/90001539/#reference/r_node_discovery_micropython.htm%3FTocPath%3DGet%2520started%2520with%2520MicroPython%7CMicroPython%
Now when I try to run the Python code for network discovery on the Coordinator (XBee_A), it does not find any End-Device in the node list. Only the Router comes up in the search.
But when I run it on the End-Device, it does find the Coordinator and Router.
My idea was to get the details of the End-Device using network discovery, get its node identifier (NI), and then send a command to the device.
Do you know what I am doing wrong?
Have you tried going into command mode and executing an ATND? What are your sleep parameters? If the end device is sleeping longer than the node discovery time limit, it won't be awake to respond to the discovery broadcast.
Try reducing the sleep duration on the end device, or increasing the node discovery timeout (ATNT) on the Coordinator.
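For example, from command mode (the values are only illustrative; on XBee3 Zigbee, NT is in units of 100 ms and SP in units of 10 ms, so check the units for your firmware):
+++              (wait for OK)
ATNT 96          (node discovery timeout on the Coordinator: 0x96 x 100 ms = 15 s)
ATWR             (save)
ATCN             (exit command mode)
On the End-Device, try a shorter cyclic sleep period, e.g. ATSP 20 (0x20 x 10 ms = 320 ms), again followed by ATWR.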
I ran into the same issue where the coordinator just couldn't discover any node in the same network. After hours of digging through other documentation I sadly realised that the updated XCTU defaults both DH and DL to 0, so the DL value shown as a dash in Digi's Mesh Kit User Guide has to be set to FFFF manually for the coordinator to work. The documentation is not the best; there are quite a few errors and omissions that leave you guessing, or paying them for support.
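For reference, DH=0 / DL=FFFF is the 64-bit broadcast address; in command mode on the coordinator that would be (or set the same values in XCTU):
+++
ATDH 0
ATDL FFFF
ATWR
ATCN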

Setting Cinterion BGS2-W modem certificates in code

I'm trying to use AT commands to set up a BGS2-W modem on a custom board to connect to a site over TLS, but the modem is not reacting to my commands and no certificates are being set.
I'm using the command AT^SBNW to load the certificates, as documented in the Transport Layer Security for Client TCP/IP Services doc (https://ptelectronics.ru/wp-content/uploads/organizatsiya_bezopasnogo_ssl-soedineniya.pdf#page=8).
Unfortunately, the document provides no examples, and I haven't been able to find any samples showing the usage of this command online.
The linked document has a Java command-line tool attached that will send a certificate from a PC; however, I am unable to use this tool (I don't have a PC connection to the modem).
If anyone has any ideas on how to use this command, I could very much use the help.
Note: I'm trying to set the certificates from within code running on a PIC18. This isn't the final incarnation; I just need the certificates loaded so I can connect to our secure server.
Well, guess this one isn't going to find an answer anytime soon :)
So, it's fairly easy to capture the output from the Java tool: I'm using com0com to emulate two connected ports, then using Termite to type manually on one port while telling the Java app to connect to the other.
The first query from the Java app expects an "OK" response; I find it easiest to send the response before starting the Java app (I guess it gets cached in the receive buffer of the emulated port).
The Java tool then sends "AT^SBNW=is_cert,1\r", and you can type the reply "SECURE CMD READY: SEND COMMAND ..." into Termite.
After this, a large binary dump comes through. You can decode the dump using the structure described in Application Note 62 (https://ptelectronics.ru/wp-content/uploads/organizatsiya_bezopasnogo_ssl-soedineniya.pdf). That should give you all the data required to generate the same binary within code.
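As a rough sketch of how the PIC18 side might replay that exchange (uart_write_bytes, uart_read_line and cert_blob are hypothetical placeholders: the blob is the binary dump captured from the Java tool, and the UART helpers are whatever your project provides):

#include <stdint.h>
#include <string.h>

/* Hypothetical UART helpers provided elsewhere in the project. */
extern void uart_write_bytes(const uint8_t *data, uint16_t len);
extern int  uart_read_line(char *buf, uint16_t maxlen, uint16_t timeout_ms);

/* Binary structure captured from the Java tool via com0com (hypothetical). */
extern const uint8_t  cert_blob[];
extern const uint16_t cert_blob_len;

int load_certificate(void)
{
    char line[64];
    const char *cmd = "AT^SBNW=is_cert,1\r";

    /* Ask the modem to accept a certificate write. */
    uart_write_bytes((const uint8_t *)cmd, (uint16_t)strlen(cmd));

    /* Wait for the modem's prompt before streaming binary data. */
    do {
        if (uart_read_line(line, sizeof line, 5000) <= 0)
            return -1;                       /* no prompt: give up */
    } while (strstr(line, "SECURE CMD READY") == NULL);

    /* Stream the pre-built binary blob captured from the Java tool. */
    uart_write_bytes(cert_blob, cert_blob_len);

    /* Expect a final OK (or an error) from the modem. */
    do {
        if (uart_read_line(line, sizeof line, 10000) <= 0)
            return -1;
    } while (strstr(line, "OK") == NULL && strstr(line, "ERROR") == NULL);

    return (strstr(line, "OK") != NULL) ? 0 : -1;
}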

Failed Publish when subscribed to same topic as publisher?

I am currently working on an embedded C project using MQTT 3.1.1 and Mosquitto broker 1.4.3. The issue I have is that when the client board is publishing to and subscribed to the same topic, the client blocks after a random number of messages and the connection gets timed out.
I am trying to send a 25-byte string message over a 3G network, using QoS 2 on both publish and subscribe. I have tried different keepalive settings on the client (15 s <-> 120 s) and a delay between each message (2000 ms <-> 300000 ms); on the broker I have also tried different settings, but nothing seems to work. Is it possible to send messages using QoS 2 over a 3G network, or am I expecting too much?
We want to guarantee the transfer of some critical data, so if this is not possible with MQTT, is there a better alternative?
A keepalive of 120ms sounds bogus.
Keepalive is there for the broker to detect that a client may have gone missing, without having to wait for the TCP connection to time out. You would typically use a keepalive in the range of seconds, if not minutes.
With a keepalive of 120ms, you have to send a PING packet at least every 100ms or so (or do any other MQTT exchange in that time frame), so it might explain why you are introducing so much latency in your scenario – and probably killing your 3G data plan too ;-)
I suggest you start using a keep-alive of 30s to see if that improves things.
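For what it's worth, if the client is built on libmosquitto (an assumption, since the question doesn't say which client library the board uses), a 30 s keepalive with QoS 2 on both sides would look roughly like this; the broker address, client id and topic are placeholders:

#include <mosquitto.h>
#include <string.h>

int main(void)
{
    mosquitto_lib_init();

    struct mosquitto *mosq = mosquitto_new("board-client", true, NULL);

    /* keepalive is the last argument, in seconds */
    mosquitto_connect(mosq, "broker.example.com", 1883, 30);

    /* run the network loop in a background thread so PINGREQs go out on time */
    mosquitto_loop_start(mosq);

    /* subscribe and publish on the same topic at QoS 2 */
    mosquitto_subscribe(mosq, NULL, "sensors/data", 2);

    const char *payload = "25-byte example payload..";
    mosquitto_publish(mosq, NULL, "sensors/data", (int)strlen(payload), payload, 2, false);

    /* in real code, wait for the publish callback so the QoS 2 handshake
       completes before disconnecting */
    mosquitto_disconnect(mosq);
    mosquitto_loop_stop(mosq, false);
    mosquitto_destroy(mosq);
    mosquitto_lib_cleanup();
    return 0;
}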

MQSUB ended with reason code 2429 in pub sub

I am using IBM WebSphere MQ to set up a durable subscription for Pub/Sub. I am using their C APIs. I have set up a subscription name and have MQSO_RESUME in my options.
When I set a wait interval for my subscriber and I properly close my subscriber, it works fine and restarts fine.
But if I force crash my subscriber (Ctrl-C) and I try to re open it, I get a MQSUB ended with reason code 2429 which is MQRC_SUBSCRIPTION_IN_USE.
I use MQWI_UNLIMITED as my WaitInterval in my MQGET and use MQGMO_WAIT | MQGMO_NO_SYNCPOINT | MQGMO_CONVERT as my MQGET options
This error pops up only when the topic has no pending messages for that subscription. If it has pending messages that the subscription can resume, then it resumes, but it ignores the first published message on that topic.
I tried changing the heartbeat interval to 2 seconds and that didn't fix it.
How do I prevent this?
This happens because the queue manager has not yet detected that your application has lost its connection to the queue manager. You can see this by issuing the following MQSC command:
DISPLAY CONN(*) TYPE(ALL) ALL WHERE(APPLTYPE EQ USER)
and you will see your application still listed as connected. As soon as the queue manager notices that your process has gone you will be able to resume the subscription again. You don't say whether your connection is a locally bound connection or a client connection, but there are some tricks to help speed up the detection of connections depending on the type of connection.
You say that in the times when you are able to resume, you don't get the first message. This is because you are retrieving the messages with MQGMO_NO_SYNCPOINT, so the message you are not getting had already been removed from the queue and was on its way down the socket to the client application at the time you forcibly crashed it, and so that message is gone. If you use MQGMO_SYNCPOINT (and MQCMIT) you will not have that issue.
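A minimal sketch of that get loop under syncpoint (assuming Hconn and Hobj come from your existing MQCONN and MQSUB calls; the buffer size is arbitrary here):

#include <cmqc.h>   /* IBM MQ MQI definitions */

void consume_loop(MQHCONN Hconn, MQHOBJ Hobj)
{
    char buffer[4096];          /* fixed-size buffer for illustration */

    for (;;)
    {
        MQMD   md  = {MQMD_DEFAULT};   /* reset the descriptor for every get */
        MQGMO  gmo = {MQGMO_DEFAULT};
        MQLONG datalen, compCode, reason;

        gmo.Options      = MQGMO_WAIT | MQGMO_SYNCPOINT | MQGMO_CONVERT;
        gmo.WaitInterval = MQWI_UNLIMITED;

        MQGET(Hconn, Hobj, &md, &gmo, (MQLONG)sizeof(buffer), buffer,
              &datalen, &compCode, &reason);
        if (compCode == MQCC_FAILED)
            break;                     /* e.g. connection broken */

        /* ... process the message ... */

        /* Commit only after processing; if the application is killed before
           this point, the message stays on the queue and is redelivered. */
        MQCMIT(Hconn, &compCode, &reason);
    }
}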
You say that you don't see the problem when there are still messages on the queue to be processed, only when the queue is empty. I suspect the difference here is whether your application is in an MQGET wait or processing a message when you forcibly crash it. Clearly, when there are no messages left on the queue, you are guaranteed, with the use of MQWI_UNLIMITED, to be in the MQGET wait, but when processing messages you probably spend more time out of the MQGET than in it.
You mention tuning down the heartbeat interval to try to reduce the time frame; this was a good idea, but you said it didn't work. Please remember that you have to change it at both ends of the channel, or you will still be using the default of 5 minutes.
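For example, if this is a client connection over a SVRCONN channel (the channel name MY.SVRCONN is a placeholder; the negotiated heartbeat is the larger of the values at the two ends), in runmqsc:
* server end of the channel
ALTER CHANNEL(MY.SVRCONN) CHLTYPE(SVRCONN) HBINT(2)
* client end: the CLNTCONN definition, or HeartbeatInterval in the MQCD if you build the connection in code
ALTER CHANNEL(MY.SVRCONN) CHLTYPE(CLNTCONN) HBINT(2)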

Azure StorageClient Transient Connection testing - hanging

I am testing my WPF application connecting to Azure Blob Storage to download a bunch of images using TPL (tasks).
It is expected that in the live environment there will be a highly transient internet connection at the deployed locations.
I have set Retry Policy and time-out in BlobRequestOptions as below:
//Note the values here are for test purposes only
//CloudRetryPolicy is a custom method returning adequate Retry Policy
// i.e. retry 3 times, wait 2 seconds between retries
blobClient.RetryPolicy = CloudRetryPolicy(3, new TimeSpan(0, 0, 2));
BlobRequestOptions bro = new BlobRequestOptions() { Timeout = TimeSpan.FromSeconds(20) };
blob.DownloadToFile(LocalPath, bro);
The above statements are in a background task that works as expected, and I have appropriate exception handling in the background task and the continuation task.
In order to test exception handling and my recovery code, I am simulating internet disconnection by pulling out the network cable. I have hooked up a method to System.Net.NetworkChange.NetworkAvailabilityChanged event on UI thread and I can detect connection/disconnection as expected and update UI accordingly.
My problem is: if I pull the network cable while a file is being downloaded (via blob.DownloadToFile), the background thread just hangs. It does not time out, does not crash, does not throw an exception, nothing! As I write, I have been waiting ~30 minutes and no response/processing has happened in relation to the background task.
If I pull the network cable before the download starts, execution is as expected, i.e. I can see retries happening, exceptions being raised and passed along, and so on.
Has anyone experienced similar behaviour? Any tips/suggestions to overcome this behaviour/problem?
By the way, I am aware that I can cancel the download task on detection of network connectivity loss, but I do not want to do this, as network connectivity can get restored within the time-out duration and the download process can continue from where it was interrupted. I have tested this auto-resumption and it works nicely.
Below is a rough indication of my code structure (not syntactically correct, just a flow indication)
btnClick()
{
    declare background_task
    attach continuewith_task to background_task
    start background_task
}

background_task()
{
    try
    {
        ... connection setup ...
        blob.DownloadToFile(LocalPath, bro);
    }
    catch (exception ex)
    {
        ... exception handling ...
        // in case of connectivity loss while download is in progress
        // this block is not getting executed
        // debugger just sits idle without a current statement
    }
}

continuewith_task()
{
    check if antecedent task is faulted
    {
        ... do recovery work ...
        // this is working as expected if connectivity is lost
        // before download starts
        // this task does not get called if connectivity is lost
        // while file transfer is taking place
    }
    else
    {
        ... further processing ...
    }
}
Avkash is correct, I believe. Also, to be clear, you will basically never see that network-removed error, so there is not a lot of point in testing for it. You will see plenty of connection rejections, conflicts, missing resources, read-only accounts, throttling, access denied, and even DNS resolution failures, depending on how you are handling storage accounts. You should test for those.
That being said, I would suggest you do not use the RetryPolicy at all with blob or table storage. Most of the errors you will actually encounter are not retryable to begin with (e.g. 404, 409, 403). When you have a retry policy in place, it will by default retry 4 more times over the next 2 minutes. There is no point in retrying bad credentials, for instance.
You are far better off simply handling the error and retrying selectively yourself (timeouts and throttling are about the only cases where a retry makes sense here).
Your problem is mainly caused by the fact that the Azure storage client library uses file-streaming classes underneath, which is why the API hang is not directly related to the Windows Azure Blob client library itself. Calling the file-streaming API directly over the network, you can see the exact same behaviour when the network cable is suddenly removed; removing the network gracefully results in different behaviour.
If you search on the internet you will find that the streaming classes do not detect the network loss, which is why, in your code, you can check for the network-disconnect event and then stop the background streaming thread.
