How to make an AWS IoT Rule split aggregated data before sending it to Timestream - aws-iot

While integrating an existing IoT Rule-based solution with the new AWS Timestream feature, I came across an issue: the IoT topic messages being sent are aggregated. This does not integrate well with Timestream, as it accepts data sent as single measurements.
It is possible to put some SQL code into an IoT Rule Action, but I think it is limited to handling only single-measurement messages. Is my assumption true?
I don't think changing the code on the edge devices to go back to sending single measurements only is an option. What would be the recommended way to tackle the issue on the AWS side?

If your dimensions stay constant, then a single rule is all you need. Every attribute you SELECT in the rule will be a measurement, so multi-measurement writes are supported. The documentation also has exactly this example.
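As a rough illustration of the single-rule approach, here is a minimal sketch that only builds the rule payload as a Python dict. The helper name, the topic filter, the measure names, and the `device_id` dimension are assumptions made up for the example; the field names follow the Timestream rule action as I understand it, so check them against the current AWS documentation before relying on them.

```python
def build_timestream_rule_payload(topic_filter, measures, role_arn, database, table):
    """Build a topic rule payload with one Timestream action (hypothetical helper)."""
    return {
        # Each attribute named in the SELECT clause becomes one measurement.
        "sql": f"SELECT {', '.join(measures)} FROM '{topic_filter}'",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [
            {
                "timestream": {
                    "roleArn": role_arn,
                    "databaseName": database,
                    "tableName": table,
                    # Dimensions stay constant per device; "device_id" is an
                    # assumed name, filled in by a substitution template.
                    "dimensions": [
                        {"name": "device_id", "value": "${clientId()}"},
                    ],
                }
            }
        ],
    }


payload = build_timestream_rule_payload(
    "sensors/+/data",                                 # assumed topic filter
    ["temperature", "humidity"],                      # assumed message attributes
    "arn:aws:iam::123456789012:role/example-role",    # placeholder role ARN
    "example_db",
    "example_table",
)
print(payload["sql"])
```

A payload like this could then be passed as `topicRulePayload` to boto3's `create_topic_rule`; that call is left out here so the sketch runs without AWS credentials.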

Related

Need advice on migrating from Flink DataStream Job to Flink Stateful Functions 3.1

I have a working Flink job built on Flink Data Stream. I want to REWRITE the entire job based on the Flink stateful functions 3.1.
The functions of my current Flink Job are:
Read message from Kafka
Each message is a slice of a data packet, e.g. (s for slice):
s-0, s-1 are for packet 0
s-4, s-5, s-6 are for packet 1
The job merges the slices into data packets and then sinks the packets to HBase
Window functions are applied to deal with out-of-order slice arrival
My Objectives
I already have a Flink Stateful Functions demo running on my k8s cluster. I want to rewrite my entire job on top of Stateful Functions.
Save data into MinIO instead of HBase
My current plan
I have read the doc and got some ideas. My plans are:
There's no need to deal with Kafka directly anymore; the Kafka Ingress (https://nightlies.apache.org/flink/flink-statefun-docs-release-3.0/docs/io-module/apache-kafka/) handles it
Rewrite my job based on the Java SDK. Merging is straightforward, but how about window functions?
Maybe I should use persistent state with TTL to mimic window function behaviors
An egress for MinIO is not in the list of default Flink I/O connectors, so I need to write a custom Flink I/O connector for MinIO myself, according to https://nightlies.apache.org/flink/flink-statefun-docs-release-3.0/docs/io-module/flink-connectors/
I want to avoid the Embedded module because it prevents scaling. Auto scaling is the key reason why I want to migrate to Flink Stateful Functions
My Questions
I don't feel confident about my plan. Is there anything wrong with my understanding or plan?
Are there any best practices I should refer to?
Update:
windows were used to assemble results
get a slice, inspect its metadata and know it is the last one of the packet
also know that the packet should contain 10 slices
if there are already 10 slices, merge them
if there are not enough slices yet, wait for some time (e.g. 10 minutes) and then either merge or record packet errors.
I want to get rid of windows during the rewrite, but I don't know how
Background: Use KeyedProcessFunctions Rather than Windows to Assemble Related Events
With the DataStream API, windows are not a good building block for assembling together related events. The problem is that windows begin and end at times that are aligned to the clock, rather than being aligned to the events. So even if two related events are only a few milliseconds apart they might be assigned to different windows.
In general, it's more straightforward to implement this sort of use case with keyed process functions, and use timers as needed to deal with missing or late events.
Doing this with the Statefun API
You can use the same pattern mentioned above. The function id will play the same role as the key, and you can use a delayed message instead of a timer:
as each slice arrives, add it to the packet that's being assembled
if it is the first slice, send a delayed message that will act as a timeout
when all the slices have arrived, merge them and send the packet
if the delayed message arrives before the packet is complete, do whatever is appropriate (e.g., go ahead and send the partial packet)
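The steps above can be sketched in plain Python, independent of the Statefun SDK. This is only an illustration of the pattern: `PacketAssembler`, `schedule_timeout`, and `emit` are hypothetical names, and the sketch assumes the expected slice count is known when the assembler is created. In a real Statefun function the state would live in persisted values and `schedule_timeout` would send a delayed message to the function itself.

```python
from dataclasses import dataclass, field


@dataclass
class PacketAssembler:
    """State for one packet id (the role played by the function id / key)."""
    expected_slices: int
    slices: dict = field(default_factory=dict)
    timeout_scheduled: bool = False
    done: bool = False

    def on_slice(self, index, payload, schedule_timeout, emit):
        if self.done:
            return  # slice for a packet that was already emitted; ignore it
        if not self.timeout_scheduled:
            # First slice seen: schedule the delayed "timeout" message once.
            schedule_timeout()
            self.timeout_scheduled = True
        self.slices[index] = payload
        if len(self.slices) == self.expected_slices:
            self._finish(emit, complete=True)

    def on_timeout(self, emit):
        # The delayed message arrived; emit a partial packet if still waiting.
        if not self.done:
            self._finish(emit, complete=False)

    def _finish(self, emit, complete):
        # Merge slices in index order, regardless of arrival order.
        merged = b"".join(self.slices[i] for i in sorted(self.slices))
        emit(merged, complete)
        self.done = True
        self.slices.clear()


packets = []
assembler = PacketAssembler(expected_slices=2)
assembler.on_slice(1, b"world", lambda: None, lambda d, c: packets.append((d, c)))
assembler.on_slice(0, b"hello ", lambda: None, lambda d, c: packets.append((d, c)))
print(packets)
```

Whether a timed-out packet is emitted partially, as here, or recorded as an error is a policy choice; the `complete` flag leaves that decision to the caller.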

Logger/ data store recommendation

I am looking for a recommendation for the following scenario: we have a service that consists of, on a high level, a front-end web app serving API and web UI requests (the latter are less important), decomposing them and putting them as tasks in a queue for processing, and a number of worker services consuming the tasks from the queue and processing them. The API clients would poll for results asynchronously.
We need to be able to log pieces of information along the way (starting from the originating request, through intermediate outputs, to final results) so that they can be accessed later if needed (mostly to troubleshoot what went wrong for a given request).
Ultimately, what we need is:
To be used as a secure storage for information related to logging and short term auditing,
Low overhead insertion:
(Low) constant time insertion, either truly non-blocking or effectively non-blocking (guaranteed quick),
Very frequent insertion – think multiple inserts per one CF API call,
Retrieval used significantly less frequently, can be slow-ish,
Items need to be retrievable at least by ID, but...
Payloads are effectively text or binary
Full text search capability would be a plus,
Understanding the structure of the text, e.g. being able to query JSON elements is a mild nice-to-have,
Data retention policies either built in or easy to implement.
"Secure" means we're processing personal information in several countries, usual regulations/ standards apply.
This can be software (open source, usable in commercial environment) that we'd host ourselves or an Amazon AWS service.
Check out Sherlock on SourceForge as a base for your app. It's an open-source Log4J implementation that you could modify as you like, e.g. containerize the headless Tomcat server. It's a "Chain of Custody" "C2"-compliant Rsyslog replacement: a server collector of syslog and syslog-relay data, which first stores the logs as flat files per source, then post-processes them and dumps the log data into a MySQL DB. There is also an older web client with some regex support to search/filter data, so you can get at the log data for forensics.
The guys who put this together with me came from Platespin (later sold to Novell). The team that built this code successfully sold a derivative work for decent cash right at the time they built it, and then went on to work for Tibco (later Mulesoft) and RIM (Blackberry, and now BMO), so it's solid code.
here is the link...
https://sourceforge.net/projects/sherlock/
r2

Fast save/search update which database to use redis vs solr

Hello, I have a project to implement an SMPP client which would connect to various telcos, like a client aggregator, handling around 16 binds for each of the 5 SMSCs we currently have. This will be based on either Java or Python (I have not decided yet).
I am contemplating using a save and forward mode where messages sent by applications behind the SMPP client would be stored (for billing and auditing purposes) and then looked up for update when the SMSC returns updates.
I read about Redis a while ago and it seems adequate for my scenario in terms of fast writes. If my message IDs are my keys, would search and update of message status work well in Redis too?
I have also read about the search capability of Solr, which would be nice for auditing/operations and some basic CSV reporting.
It would be nice if anybody who has worked with both could suggest which fits my intended usage best. I am also open to any other option aside from MongoDB (the sensible default).
Thanks

Scaling WebSockets on Google Compute Engine

I would like to implement a chat system as part of a game I am developing on App Engine. To implement this, I would like to use WebSockets, and have clients connect to each other through a hub, in this case an instance of GCE. Assuming this game needed to scale to multiple instances on GCE, how would this work? If I had a client 1, and the load balancer directed that request of client 1 to instance A, and another client (2) came in and was directed to instance B, but those clients wanted to chat with each other, they would each be connected to different hubs, and would be unable to reach each other. How would this be set up to work with scale? Would I implement it using queues, where each instance listens on that queue, and if so, how would I do that?
Google Play Game Services offers exactly the functionality that you want, but only for Android and iOS clients. So this option may not be compatible with your game tech design.
In general you're reasoning correctly. Messages from clients who want to talk to each other will most of the time hit different server instances. What you want to do is to make the instances handle the communication between users. Pub/sub (publish-subscribe pattern) is a very suitable pattern in this scenario. Roughly:
whenever there's a message directed to client X a message is published on the channel X,
whenever client X creates a session, instance handling it subscribes to channel X.
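The two rules above can be sketched with an in-memory stand-in for the pub/sub service. `Broker` and `Instance` are hypothetical names for illustration only; in a real deployment the broker would be an external service shared by all instances, and `delivered` would be a write to the client's WebSocket.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for a pub/sub service shared by all instances."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in list(self._subscribers[channel]):
            callback(message)


class Instance:
    """One server instance holding the sessions of its connected clients."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.delivered = []  # (client_id, message) pairs pushed to local sockets

    def open_session(self, client_id):
        # When client X creates a session, subscribe to channel X.
        self.broker.subscribe(
            client_id, lambda msg: self.delivered.append((client_id, msg))
        )

    def send(self, to_client, message):
        # A message directed to client X is published on channel X.
        self.broker.publish(to_client, message)


broker = Broker()
instance_a, instance_b = Instance("A", broker), Instance("B", broker)
instance_a.open_session("client1")
instance_b.open_session("client2")
instance_a.send("client2", "hello from client1")
print(instance_b.delivered)
```

The sending instance never needs to know which instance holds the recipient; the channel name is the only routing information.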
You can use one of many existing solutions for starters. It's very easy to set this up using Redis. If you need something more low-level and more flexible, check out ZeroMQ.
You can expect a single instance of either solution to be able to handle thousands of QPS.
Unfortunately I don't have any experience with scaling either of these solutions, so I can't offer you any practical advice as to the limits of their scalability.
PS. There are also other topics you may want to explore, such as message persistence and failure recovery, which I didn't address here at all.
I haven't tried to implement this yet, but I'll probably have to soon. I think it should be fairly simple to handle it yourself.
You have: server 1 with list of clients and you have server 2 with another list of clients,
so if client wants to send data to another client which might be on server 2, you have to:
Look up whether the receiver is on the current server - if it is, you just send it (standard)
Otherwise you send the same data to all other servers you have, so they would check their lists for particular client (or clients) and send data to them.
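A minimal sketch of this lookup-then-broadcast approach follows. `ChatServer` is a hypothetical class, and each client's "connection" is modeled as a plain list of received messages; real servers would forward to peers over the network rather than by direct method call.

```python
class ChatServer:
    """One server with its own list of clients and a list of peer servers."""

    def __init__(self, name):
        self.name = name
        self.peers = []    # other ChatServer objects
        self.clients = {}  # client id -> inbox list (stand-in for a socket)

    def connect(self, client_id):
        self.clients[client_id] = []

    def send(self, to_client, message):
        # 1. If the receiver is on this server, deliver directly.
        if self._deliver_local(to_client, message):
            return
        # 2. Otherwise forward to every other server; each checks its own list.
        for peer in self.peers:
            peer._deliver_local(to_client, message)

    def _deliver_local(self, to_client, message):
        inbox = self.clients.get(to_client)
        if inbox is None:
            return False
        inbox.append(message)
        return True


server1, server2 = ChatServer("server1"), ChatServer("server2")
server1.peers.append(server2)
server2.peers.append(server1)
server1.connect("alice")
server2.connect("bob")
server1.send("bob", "hi bob")
print(server2.clients["bob"])
```

Note that every cross-server message goes to all peers, so traffic between servers grows with the number of servers.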

.NET CF mobile device application - best methodology to handle potential offline-ness?

I'm building a mobile application in VB.NET (compact framework), and I'm wondering what the best way is to approach the potential offline interactions on the device. Basically, the devices have cellular and 802.11, but may still be offline (where there's poor reception, etc). A driver will scan boxes as they leave his truck, and I want to update the new location - immediately if there's network signal, or queued if it's offline and handled later. It made me think, though, about how to handle offline-ness in general.
Do I cache as much data to the device as I can so that I use it if it's offline - Essentially, each device would have a copy of the (relevant) production data on it? Or is it better to disable certain functionality when it's offline, so as to avoid the headache of synchronization later? I know this is a pretty specific question that depends on my app, but I'm curious to see if others have taken this route.
Do I build the application itself to act as though it's always offline, submitting everything to a local queue of sorts that's owned by a local class (essentially abstracting away the online/offline thing), and then have the class submit things to the server as it can? What about data lookups - how can those be handled in a "Semi-live" fashion?
Or should I have the application attempt to submit requests to the server directly, in real-time, and handle it if the request itself fails? I can see a potential problem of making the user wait for the timeout, but is this the most reliable way to do it?
I'm not looking for a specific solution, but really just stories of how developers accomplish this with the smoothest user experience possible, with a link to a how-to or heres-what-to-consider or something like that. Thanks for your pointers on this!
We can't give you a definitive answer because there is no "right" answer that fits all usage scenarios. For example if you're using SQL Server on the back end and SQL CE locally, you could always set up merge replication and have the data engine handle all of this for you. That's pretty clean. Using the offline application block might solve it. Using store and forward might be an option.
You could store locally and then roll your own synchronization with a direct connection, web service or WCF service used when a network is detected. You could use MSMQ for delivery.
What you have to think about is not what the "right" way is, but how your implementation will affect application usability. If you disable features due to lack of connectivity, is the app still usable? If you have stale data, is that a problem? Maybe some critical data needs to be transferred when you have GSM/GPRS (which typically isn't free) and more would be done when you have 802.11. Maybe you can run all day with lookup tables pulled down in the morning and upload only transactions, with the device tracking what changes it's made.
Basically it really depends on how it's used, the nature of the data, the importance of data transactions between fielded devices, the effect of data latency, and probably other factors I can't think of offhand.
So the first step is to determine how the app needs to be used, then determine the infrastructure and architecture to provide the connectivity and data access required.
I haven't used it myself, but have you looked into the "store and forward" capabilities of the CF? It may suit your needs. I believe it uses an Exchange mailbox as a message queue to send SOAP packets to and from the device.
The best way to approach this is to always work offline, then use message queues to handle sending changes to and from the device. When the driver marks something as delivered, for example, update the item as delivered in your local store and also place a message in an outgoing queue to tell the server it's been delivered. When the connection is up, send any queued items back to the server and get any messages that have been queued up from the server.
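The always-offline approach can be sketched as a local store plus an outgoing queue. The sketch is in Python rather than VB.NET purely for illustration; `OfflineFirstStore` and the `send_to_server` callable are hypothetical, the transport is assumed to return `True` on success, and only the outgoing direction is shown.

```python
from collections import deque


class OfflineFirstStore:
    """Apply every change locally first, then queue it for the server."""

    def __init__(self, send_to_server):
        self.items = {}              # local store: box id -> status
        self.outbox = deque()        # changes not yet accepted by the server
        self._send = send_to_server  # assumed to return True on success

    def mark_delivered(self, box_id):
        # The local update always succeeds, whether or not there is a signal.
        self.items[box_id] = "delivered"
        self.outbox.append({"box_id": box_id, "status": "delivered"})

    def flush(self):
        """Send queued changes in order; stop at the first failure."""
        sent = 0
        while self.outbox:
            if not self._send(self.outbox[0]):
                break  # still offline; keep the message for the next attempt
            self.outbox.popleft()
            sent += 1
        return sent


received = []
store = OfflineFirstStore(lambda msg: received.append(msg) is None)
store.mark_delivered("box-1")
print(store.flush(), received)
```

A message is removed from the queue only after the transport reports success, so a dropped connection leaves it queued for the next `flush()`. This means the server may occasionally see a message twice and should treat updates as idempotent.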

Resources