Can I create a rule that triggers on a query with a time delta? - aws-iot

For example, my device reports two on/off states: a and b. The device reports its state 3 times per second.
Is it possible to create a rule in IoT Core that will trigger if a and b are switched on within 1 second of each other?
Or is the proper approach to save the state history and trigger a Lambda that queries recent state from, for example, DynamoDB?

AWS IoT rules are stateless, so either the payload needs to contain the necessary information or (as you suggest) the rule action (e.g. a Lambda) needs some persistent storage to determine this.
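A minimal sketch of that Lambda-plus-persistent-storage approach, assuming the rule action passes the payload fields deviceId, a, and b, and that a DynamoDB table (called DeviceStateTimes here, purely an assumption) keeps the last off-to-on timestamps per device. Deduplication of repeated notifications is left out for brevity.

```java
// Hedged sketch: the IoT rule action invokes this Lambda for every report. The table name
// "DeviceStateTimes" and the payload fields deviceId/a/b are assumptions, not from the question.
// The Lambda records the timestamp of each off -> on transition and reacts when the two
// transitions fall within one second of each other.
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

import java.util.HashMap;
import java.util.Map;

public class StateDeltaHandler implements RequestHandler<Map<String, Object>, Void> {

    private static final String TABLE = "DeviceStateTimes"; // assumed table, keyed by deviceId
    private final DynamoDbClient dynamo = DynamoDbClient.create();

    @Override
    public Void handleRequest(Map<String, Object> event, Context context) {
        String deviceId = (String) event.get("deviceId");
        boolean a = Boolean.TRUE.equals(event.get("a"));
        boolean b = Boolean.TRUE.equals(event.get("b"));
        long now = System.currentTimeMillis();

        // Load the previously stored state for this device (empty map if none yet).
        Map<String, AttributeValue> key =
                Map.of("deviceId", AttributeValue.builder().s(deviceId).build());
        Map<String, AttributeValue> item = new HashMap<>(
                dynamo.getItem(GetItemRequest.builder().tableName(TABLE).key(key).build()).item());

        boolean aWasOn = item.containsKey("aOn") && Boolean.TRUE.equals(item.get("aOn").bool());
        boolean bWasOn = item.containsKey("bOn") && Boolean.TRUE.equals(item.get("bOn").bool());

        // Record only the off -> on transitions, not every report while a state stays on.
        if (a && !aWasOn) item.put("aOnAt", AttributeValue.builder().n(Long.toString(now)).build());
        if (b && !bWasOn) item.put("bOnAt", AttributeValue.builder().n(Long.toString(now)).build());
        item.put("aOn", AttributeValue.builder().bool(a).build());
        item.put("bOn", AttributeValue.builder().bool(b).build());
        item.put("deviceId", AttributeValue.builder().s(deviceId).build());

        if (item.containsKey("aOnAt") && item.containsKey("bOnAt")) {
            long aOnAt = Long.parseLong(item.get("aOnAt").n());
            long bOnAt = Long.parseLong(item.get("bOnAt").n());
            if (Math.abs(aOnAt - bOnAt) <= 1000) {
                // Both states were switched on within one second of each other.
                context.getLogger().log("a and b switched on within 1s for " + deviceId);
                // ... trigger whatever downstream action is needed here ...
            }
        }

        dynamo.putItem(PutItemRequest.builder().tableName(TABLE).item(item).build());
        return null;
    }
}
```

Note that at 3 reports per second per device this means a DynamoDB read and write per report, so the cost and latency of that round trip is worth checking for your fleet size.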

Related

How to make an AWS IoT Rule split aggregated data before sending it to Timestream

While integrating an existing IoT Rule based solution with the new AWS Timestream feature, I ran into the issue that the messages sent to the IoT topic are aggregated. This does not integrate well with Timestream, as it accepts data sent as single measurements.
It is possible to put some SQL into the IoT Rule action, but I think it can only handle single-measurement messages. Is my assumption true?
I don't think changing the code on the edge devices to go back to sending only single measurements is an option. What would be the recommended way to tackle the issue on the AWS side?
If your Dimensions stay constant, then a single rule is all you need. EVERY attribute you SELECT in the rule will be a measurement, so multi-measurement writes ARE supported. The documentation also has exactly this example.

What is the best way to have a cache of an external database in Flink?

The external database contains a set of rules for each key; these rules should be applied to each stream element in the Flink job. Because it is very expensive to make a DB call for each element to retrieve the rules, I want to fetch the rules from the database at initialization and store them in a local cache.
When rules are updated in the external database, a status change event is published to the Flink job which should be used to fetch the rules and refresh this cache.
What is the best way to achieve what I've described? I looked into keyed state, but initializing all keys and refreshing the keys on update doesn't seem possible.
I think you can make use of BroadcastProcessFunction or KeyedBroadcastProcessFunction to achieve your use case. A detailed blog is available here.
In short: you can define a source such as Kafka (or any other) and publish to it the rules that you want the actual stream to consume. Connect the actual data stream and the rules stream. Then processBroadcastElement() will receive the streamed rules, where you can update the state. Finally, the updated state (rules) can be retrieved in the actual event-processing method processElement().
Points to consider: broadcast state is always kept on the heap, not in the state backend (RocksDB), so it has to be small enough to fit in memory. Each slot copies all of the broadcast state into its checkpoints, so all checkpoints and savepoints will have n (parallelism) copies of the broadcast state.
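A minimal sketch of that pattern, with assumed placeholder types and names (Event, Rule, the "rules" state descriptor) that are not from the original answer:

```java
// Sketch of the broadcast-state pattern: a rules stream is broadcast and kept in broadcast
// state; the keyed event stream reads the current rule for its key in processElement().
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class RuleEnrichmentJob {

    // Descriptor for the broadcast state holding ruleKey -> Rule.
    static final MapStateDescriptor<String, Rule> RULES_DESCRIPTOR =
            new MapStateDescriptor<>("rules", Types.STRING, Types.POJO(Rule.class));

    public static DataStream<String> apply(DataStream<Event> events, DataStream<Rule> ruleUpdates) {
        BroadcastStream<Rule> rulesBroadcast = ruleUpdates.broadcast(RULES_DESCRIPTOR);

        return events
                .keyBy(e -> e.key)
                .connect(rulesBroadcast)
                .process(new KeyedBroadcastProcessFunction<String, Event, Rule, String>() {

                    @Override
                    public void processElement(Event event, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                        // Look up the current rule for this event's key from broadcast state.
                        Rule rule = ctx.getBroadcastState(RULES_DESCRIPTOR).get(event.key);
                        if (rule != null) {
                            out.collect(rule.apply(event));
                        }
                    }

                    @Override
                    public void processBroadcastElement(Rule rule, Context ctx, Collector<String> out) throws Exception {
                        // Rule updates arrive here and refresh the state on every parallel instance.
                        ctx.getBroadcastState(RULES_DESCRIPTOR).put(rule.key, rule);
                    }
                });
    }

    // Assumed placeholder types.
    public static class Event { public String key; }
    public static class Rule  { public String key; public String apply(Event e) { return e.key; } }
}
```

The keyed variant is only needed if you also want keyed state or timers in the same function; otherwise a plain BroadcastProcessFunction over a connected (non-keyed) stream is enough.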
A few different mechanisms in Flink may be relevant to this use case, depending on your detailed requirements.
Broadcast State
Jaya Ananthram has already covered the idea of using broadcast state in his answer. This makes sense if the rules should be applied globally, for every key, and if you can find a way to collect and broadcast the updates.
Note that the Context of the processBroadcastElement() method of a KeyedBroadcastProcessFunction contains the method applyToKeyedState(StateDescriptor<S, VS> stateDescriptor, KeyedStateFunction<KS, S> function). This means you can register a KeyedStateFunction that will be applied to the state of every key associated with the provided stateDescriptor.
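A hedged illustration of that mechanism; the "lastRule" state name and the plain String types are assumptions made for the sketch:

```java
// When a new rule version is broadcast, applyToKeyedState() visits the keyed state of every
// key this instance has seen, so previously stored per-key values can be rewritten in place.
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class RuleInvalidatingFunction
        extends KeyedBroadcastProcessFunction<String, String, String, String> {

    private static final ValueStateDescriptor<String> LAST_RULE =
            new ValueStateDescriptor<>("lastRule", String.class);

    @Override
    public void processElement(String value, ReadOnlyContext ctx, Collector<String> out) throws Exception {
        // Emit the element together with the rule currently stored for its key.
        out.collect(value + "/" + getRuntimeContext().getState(LAST_RULE).value());
    }

    @Override
    public void processBroadcastElement(String newRule, Context ctx, Collector<String> out) throws Exception {
        // Apply the update to the keyed state of all keys associated with LAST_RULE.
        ctx.applyToKeyedState(LAST_RULE, (key, state) -> state.update(newRule));
    }
}
```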
State Processor API
If you want to bootstrap state in a Flink savepoint from a database dump, you can do that with this library. You'll find a simple example of using the State Processor API to bootstrap state in this gist.
Change Data Capture
The Table/SQL API supports Debezium, Canal, and Maxwell CDC streams, and Kafka upsert streams. This may be a solution. There's also flink-cdc-connectors.
Lookup Joins
Flink SQL can do temporal lookup joins against a JDBC database, with a configurable cache. Not sure this is relevant.
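For reference, a hedged sketch of such a lookup join over the JDBC connector with a lookup cache; the table definitions, columns, connector options, and the datagen source are illustrative assumptions, and running it needs flink-connector-jdbc plus a JDBC driver on the classpath:

```java
// Sketch of a temporal lookup join against a JDBC dimension table with a configurable cache.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RuleLookupJoin {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Dimension table backed by the external database, with a lookup cache.
        tEnv.executeSql(
            "CREATE TABLE rules (" +
            "  rule_key STRING," +
            "  threshold DOUBLE," +
            "  PRIMARY KEY (rule_key) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'jdbc'," +
            "  'url' = 'jdbc:mysql://db-host:3306/rules_db'," +
            "  'table-name' = 'rules'," +
            "  'lookup.cache.max-rows' = '10000'," +
            "  'lookup.cache.ttl' = '10min'" +
            ")");

        // Event stream; a processing-time attribute is required for the lookup join.
        tEnv.executeSql(
            "CREATE TABLE events (" +
            "  rule_key STRING," +
            "  reading DOUBLE," +
            "  proc_time AS PROCTIME()" +
            ") WITH ('connector' = 'datagen')");

        // Each event is enriched with the rule row valid at its processing time; repeated
        // lookups for the same key are served from the cache until the TTL expires.
        tEnv.executeSql(
            "SELECT e.rule_key, e.reading, r.threshold " +
            "FROM events AS e " +
            "JOIN rules FOR SYSTEM_TIME AS OF e.proc_time AS r " +
            "ON e.rule_key = r.rule_key").print();
    }
}
```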
In essence, David's answer summarizes it well. If you are looking for more detail: not long ago I gave a webinar [1] on this topic, including running code examples [2].
[1] https://www.youtube.com/watch?v=cJS18iKLUIY
[2] https://github.com/knaufk/enrichments-with-flink

How do I get notified of platform events in SFCC?

What is the typical way of handling events such as new customer registered, cart updated, order posted from a cartridge in SFCC B2C?
Here I can see that some resources 'support server-side customization' by providing hooks, but the number of resources that support customization is small, and there are no hooks for 'customer registered' or 'cart updated' events there.
Vitaly, I assume you are speaking specifically about the OCAPI interface and not the traditional Demandware Script storefront interface. However, I will answer for both contexts. There is not a single interface that would grant you the ability to know when an event occurs. Furthermore, there are multiple interfaces that can trigger such events:
Open Commerce API (OCAPI)
If you wish to listen to and/or notify an external service of an event that is triggered using this interface, you must use the appropriate hook for the resource whose creation or modification you want to track. This hook is written in Demandware Script (ECMAScript 5 + custom extensions).
Storefront Interface
Within the storefront interface lies an MVC architecture, which is the most prevalent use case for Commerce Cloud B2C. There are a few versions of this MVC architecture, but all of them sport several Controllers that handle various user interactions on the server side. To track all the various mutations and creations of data objects, you would need to add code to each of those Controllers, or perhaps more appropriately to the Models that the Controllers use to create and mutate those data objects.
Imports
There are two ways to import data into the platform:
XML File Import
OCAPI Data API
Both of these import data with no way to trigger custom behavior based on the result of their actions. In many cases you will be effectively blind to when the data was created or modified.
An approach to remediating this could be a job that looks for objects missing a custom attribute--one that this job, or customizations to both of the other interfaces, would set--and adds the custom attribute and/or updates another attribute with a timestamp. In addition to that activity, this job may need to loop over all objects to determine if an import activity changed anything since it last set the aforementioned custom attributes. This could be achieved with yet another custom attribute containing some sort of hash or checksum. This job would need to run constantly and would probably be split into two parts that run at different intervals. It is neither a performant nor a scalable solution.
Instead, and ideally, all systems sending data through these import mechanisms would pre-set the custom attributes so that those fields are updated upon import.
Getting Data Out
In Salesforce Commerce Cloud you can export data either via synchronous external API calls within the storefront request & response context or via asynchronous batch jobs that run in the background. These jobs can write files, transfer them via SFTP or HTTPS, or make external API calls. There is also the OCAPI Data API, which could allow you to know when something is added or modified by polling the API for new data.
In many cases, you are limited by quotas that are in place to help maintain the overall performance of the system.
Approaches
There are a couple of different approaches that you can use to capture and transmit the data necessary to represent these sorts of events. They are summarized below.
An Export Queue
Probably the most performant option is an export queue. Rather than immediately notifying an external system when an event occurs, you queue up a list of events that have happened and then transmit them to the third-party system in a job that runs in the background. The queue is typically constructed using the system's Custom Object concept. As an event occurs, you create a new Custom Object containing all the necessary information about the event and how to handle it in the queue. You craft a job component that is added to a job flow that runs periodically, every 15 minutes for example. This job component iterates over the queue and performs whatever actions are necessary to transmit each event to the third-party system. Once transmitted, the item is removed from the queue.
Just in Time Transmission
You must be careful with this approach as it has the greatest potential to degrade the performance of a merchant's storefront and/or OCAPI interface. As the event occurs, you perform a web service call to the third-party system that collects the event notifications. You must set a pretty aggressive timeout on this request to avoid impacting storefront or API performance too much if the third-party system should become unavailable. I would even recommend combining this approach with the Queue approach described above so that you can add failed API calls to the queue for resending later.
OCAPI Polling
In order to know when something is actually modified or created, you need to implement a custom attribute to track such timestamps. Unfortunately, while there are creationDate and lastModified DateTime stamps on almost every object, they are often not accessible from either OCAPI or the DW Script APIs. Your custom attributes would require modifications to both the OCAPI hooks and the storefront Controllers/Models to set those attributes appropriately. Once set, you can query for objects based on those custom attributes using the OCAPI Data API. A third-party system would connect periodically to query for data objects new since it last checked. Note that not all data objects are accessible via the OCAPI Data API, and you may be limited in how you can query certain objects, so this is by no means a silver-bullet approach.
I wish you the best of luck, and should you need any support in making an appropriate solution, there are a number of System Integrator Partners available in the market. You can find them listed on AppExchange. Filter the Consultants by Salesforce B2C Commerce for a tiered list of partners.
Full disclosure: I work for one such partner: Astound Commerce

Querying Data from Apache Flink

I am looking to migrate from a homegrown streaming server to Apache Flink. One thing that we have is an Apache Storm-like DRPC interface to run queries against the state held in the processing topology.
So, for example: I have a bunch of sensors that I am running a moving average on. I want to run a query against the topology and return all the sensors where that average is above a fixed value.
Is there an equivalent in Flink, or if not, what is the best way to achieve equivalent functionality?
Out of the box, Flink does not currently come with a solution for querying the internal state of operators. You're in luck, however, because there are two options. We built an example of a stateful word count that allows querying its state; it is available here: https://github.com/dataArtisans/query-window-example
For one of the upcoming versions of Flink we are also working on a generic solution for the queryable state use case. This will allow querying the state of any internal operator.
Also, could it suffice in your case to periodically output the values to something like Elasticsearch using a window operation? The results could then simply be queried from Elasticsearch.
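A rough sketch of that windowed approach, assuming a SensorReading POJO, a fixed threshold of 100, and print() standing in for an Elasticsearch (or similar) sink:

```java
// Compute a sliding moving average per sensor and emit only the sensors whose average
// exceeds a fixed value; a real job would replace print() with an external sink.
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class HighAverageSensors {

    public static class SensorReading {
        public String sensorId;
        public double value;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source; replace with your real sensor stream.
        DataStream<SensorReading> readings =
                env.fromElements(reading("s1", 120.0), reading("s2", 40.0));

        readings
            .keyBy(r -> r.sensorId)
            .window(SlidingProcessingTimeWindows.of(Time.minutes(5), Time.minutes(1)))
            .aggregate(new AverageAggregate())     // -> (sensorId, movingAverage)
            .filter(avg -> avg.f1 > 100.0)         // keep only sensors above the fixed value
            .print();                              // stand-in for an Elasticsearch sink

        env.execute("moving-average-export");
    }

    // Running (sensorId, sum, count) accumulator producing (sensorId, average).
    public static class AverageAggregate
            implements AggregateFunction<SensorReading, Tuple3<String, Double, Long>, Tuple2<String, Double>> {

        @Override
        public Tuple3<String, Double, Long> createAccumulator() {
            return Tuple3.of("", 0.0, 0L);
        }

        @Override
        public Tuple3<String, Double, Long> add(SensorReading r, Tuple3<String, Double, Long> acc) {
            return Tuple3.of(r.sensorId, acc.f1 + r.value, acc.f2 + 1);
        }

        @Override
        public Tuple2<String, Double> getResult(Tuple3<String, Double, Long> acc) {
            return Tuple2.of(acc.f0, acc.f1 / acc.f2);
        }

        @Override
        public Tuple3<String, Double, Long> merge(Tuple3<String, Double, Long> a, Tuple3<String, Double, Long> b) {
            return Tuple3.of(a.f0.isEmpty() ? b.f0 : a.f0, a.f1 + b.f1, a.f2 + b.f2);
        }
    }

    private static SensorReading reading(String id, double value) {
        SensorReading r = new SensorReading();
        r.sensorId = id;
        r.value = value;
        return r;
    }
}
```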
An out-of-the-box solution called Queryable State is coming in the next release.
Here is an example
https://github.com/apache/flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/query/QueryableStateITCase.java
But I suggest you read up on it first and then look at the example.

Translating change log into actual state in Apache Camel

I have two systems, call them A and B.
When some significant object changes in A, A sends it to B through Apache Camel.
However, I encountered one case where A actually has a change log of an object, while B must reflect only the actual state of the object.
Moreover, the change log in A can contain "future" records.
This means that an object's state change is scheduled for some moment in the future.
A user of system A can edit this change log: remove change records, add new change records with any timestamp (in the past or in the future), and even update existing changes.
Of course, A sends these change records to B, but B needs only the actual state of the object.
Note that I can query objects from A, but A is a performance-critical system, and therefore I am not going to query it, as that would cause additional load.
Also, the API for querying data from A is overcomplicated, and I would like to avoid it whenever possible.
I can see two problems here.
First is recognizing whether a particular change log record may change the actual state.
I am going to store the change log in an intermediate database.
As a change log record comes in, I am going to add/remove/update it in the intermediate database, then calculate the actual state of the object and send this state to B.
Second is tracking the change schedule.
I couldn't come up with anything except running a periodic job at a constant interval (say, 15 minutes).
This job would scan all records that fall in the time interval from the last invocation to the current invocation.
What I like Apache Camel for is its component-based approach, where you only need to connect endpoints to get everything working, with only a small amount of coding.
Are there any pre-existing primitives for this problem, either in Apache Camel or in EIP?
I am actually working on a very similar use case, where system A sends a snapshot and updates that require translation before being sent to system B.
First, you need to trigger the mechanism that gives you the initial state (the "snapshot") from system A; the timer: component can kick off this one-time startup logic.
Now you will receive the snapshot data (you didn't specify how; perhaps it is an FTP file or a JMS endpoint). Validate the data, split it into items, and store each item of data in a local in-memory cache:, keyed uniquely, as Sergey suggests in his comment. Use an expiry policy that is logical (e.g. 48 hours).
From there, continually process the "update" data from the ftp: endpoint. For each update, you will need to match it with the data in the cache: and determine what (and when) needs to be sent to system B.
Data that needs to be sent to system B later will need to be persisted in memory or in a database.
Finally, you need a scheduling mechanism to determine every 15 minutes whether new data should be sent; you can easily use timer: or quartz: for this.
In summary, you can build this integration from the following components: timer, cache, ftp, and quartz, plus some custom beans/processors to perform the custom logic.
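A rough sketch of how those pieces might be wired together in the Java DSL. Every endpoint URI and the stateCache bean are assumptions about how A and B are connected (here system A delivers XML over FTP and system B is reached via JMS); the cache bean would hold the change log and compute the actual state.

```java
// Sketch only: endpoint URIs, XPath expressions, and the "stateCache" registry bean are
// placeholders, not a confirmed design.
import org.apache.camel.builder.RouteBuilder;

public class ChangeLogRoutes extends RouteBuilder {

    @Override
    public void configure() {
        // 1. One-time startup: pull the initial snapshot from system A and seed the cache.
        from("timer:initialSnapshot?repeatCount=1")
            .pollEnrich("ftp://user@system-a/export?fileName=snapshot.xml", 30000)
            .split(xpath("/objects/object"))
            .to("bean:stateCache?method=put");

        // 2. Continuous updates: merge each change-log record (past or future) into the cache.
        from("ftp://user@system-a/changes?delete=true")
            .split(xpath("/changes/change"))
            .to("bean:stateCache?method=merge");

        // 3. Every 15 minutes, collect the objects whose scheduled changes have become due,
        //    recalculate their actual state, and push it to system B.
        from("quartz://changeLog/dueChanges?cron=0+0/15+*+*+*+?")
            .to("bean:stateCache?method=collectDueStates")
            .split(body())
            .to("jms:queue:systemB");
    }
}
```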
The main challenges are handling data that is cached and then updated, and working out the control mechanisms for what should happen on the initial connect, on a disconnect, or if your Camel application is restarted.
Good luck ;)
