Best practice for Apache Flink for user-defined alerts

Let's say my Flink job receives a stream of stock prices (as an example) and issues an alert if, say, a stock drops below a certain price. Users can add or remove these alert criteria. For example, user abc@somemail.com creates a rule to be alerted if the price of GME drops below $100. How can my Flink job dynamically keep track of all these alert criteria in a scalable manner?
I could create an API which my Flink job could call to get all of the updated alert criteria, but that would mean calling the API numerous times to keep everything up to date.
Or I could create a permanent table with the Flink Table API, which another Flink job updates as soon as a user creates a new alert criterion.
What would be the best practice for this use case?
Notes:
Alerts should be issued with minimal latency.
Alert criteria should take effect as soon as the user creates them.

Here's a design sketch for a purely streaming approach:
alertUpdates = alerts
    .keyBy(user)
    .process(managePreviousAlerts) // uses MapState<Stock, Price>
    .keyBy(stock, price)

priceUpdates = prices
    .keyBy(stock)
    .process(managePriceHistory)
    .keyBy(stock, price)

alertUpdates
    .connect(priceUpdates)
    .process(manageAlertsAndPrices) // uses MapState<User, Boolean>
managePreviousAlerts maintains a per-user MapState from stocks to alert prices. When a new alert arrives, find the existing alert for this stock (for this user), if any. Then emit up to two AlertUpdates: a RemoveAlert event for (user, stock, oldAlertPrice), if an old alert existed, and an AddAlert event for (user, stock, newAlertPrice).
managePriceHistory keeps some per-stock pricing history in state, and uses some business logic to decide if the incoming price is a change that merits triggering alerts. (E.g., maybe you only alert if the price went down.)
manageAlertsAndPrices maintains a per-stock, per-price MapState, keyed by user.
The keys of this MapState are all of the users w/ alerts for this stock at this price. Upon receiving a PriceUpdate, alert all of these users by iterating over the keys of the MapState.
Upon receiving a RemoveAlert, remove the user from the MapState.
Upon receiving an AddAlert, add the user to the MapState.
This should scale well. The latency will be governed by the two network shuffles caused by the keyBys.
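As a concrete illustration, here is a minimal sketch of what manageAlertsAndPrices could look like as a KeyedCoProcessFunction. The AlertUpdate, PriceUpdate, and Alert types and their getters are assumed placeholders, not part of the original design:

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

// Runs keyed by (stock, price); the MapState holds the users subscribed to this pair.
public class ManageAlertsAndPrices
        extends KeyedCoProcessFunction<Tuple2<String, Double>, AlertUpdate, PriceUpdate, Alert> {

    private transient MapState<String, Boolean> subscribers;

    @Override
    public void open(Configuration parameters) {
        subscribers = getRuntimeContext().getMapState(
                new MapStateDescriptor<>("subscribers", String.class, Boolean.class));
    }

    // AlertUpdate stream: maintain the set of subscribed users for this key.
    @Override
    public void processElement1(AlertUpdate update, Context ctx, Collector<Alert> out)
            throws Exception {
        if (update.isRemove()) {
            subscribers.remove(update.getUser()); // RemoveAlert
        } else {
            subscribers.put(update.getUser(), true); // AddAlert
        }
    }

    // PriceUpdate stream: alert every user subscribed to this (stock, price) key.
    @Override
    public void processElement2(PriceUpdate price, Context ctx, Collector<Alert> out)
            throws Exception {
        for (String user : subscribers.keys()) {
            out.collect(new Alert(user, price.getStock(), price.getPrice()));
        }
    }
}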

I think this depends on how you approach generating alerts in general. My first idea would be to use Kafka to store the new alerts, so that Flink can receive them as a stream. Then, depending on the requirements, you could simply broadcast the stream of alerts and connect it with the stream of stock prices. This should allow you to scale pretty well.
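As a rough sketch of that broadcast variant (the AlertRule, StockPrice, and Alert POJOs, their getters, and the alertRules/prices streams are all assumptions):

import java.util.Map;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

MapStateDescriptor<String, AlertRule> rulesDescriptor =
        new MapStateDescriptor<>("rules", String.class, AlertRule.class);

// Every parallel instance receives every rule update.
BroadcastStream<AlertRule> ruleBroadcast = alertRules.broadcast(rulesDescriptor);

DataStream<Alert> alertStream = prices
        .keyBy(StockPrice::getSymbol)
        .connect(ruleBroadcast)
        .process(new KeyedBroadcastProcessFunction<String, StockPrice, AlertRule, Alert>() {
            @Override
            public void processElement(StockPrice price, ReadOnlyContext ctx,
                                       Collector<Alert> out) throws Exception {
                // Check the incoming price against each broadcast rule for this stock.
                for (Map.Entry<String, AlertRule> e :
                        ctx.getBroadcastState(rulesDescriptor).immutableEntries()) {
                    AlertRule rule = e.getValue();
                    if (rule.getStock().equals(price.getSymbol())
                            && price.getPrice() < rule.getThreshold()) {
                        out.collect(new Alert(rule.getUser(), price.getSymbol(), price.getPrice()));
                    }
                }
            }

            @Override
            public void processBroadcastElement(AlertRule rule, Context ctx,
                                                Collector<Alert> out) throws Exception {
                // Add or replace the rule; a remove flag could delete it here instead.
                ctx.getBroadcastState(rulesDescriptor).put(rule.getId(), rule);
            }
        });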
But if you are using the Table API, then using an external table to store the alert data may also be a good idea.

Related

How to show last message when querying chat list in chat app using DynamoDB

I wanted to build a chat app using DynamoDB, but I'm having a hard time designing the architecture.
I do not need a complicated chat app like Telegram; it's rather simple. These are the queries that I need:
List chats for user (each chat also has lastMessageTimestamp, unreadCount and lastMessage)
List chat messages for chat
List users of chat (this is optional)
So far, I have come up with a design and queries to get the data (screenshots omitted).
The problem is that, to have data about lastMessage and unreadCount, I need to update two rows when creating a message. A transaction should be used for that, but I do not think DynamoDB is good for high-transaction apps. Is there a better way to do this (maybe using a different technology)?
P.S. I know I should use an RDB until I hit the bottleneck, but I have done this using an RDB and now want to try it using NoSQL. I also had a look at MongoDB, but it does not support transactions if I have different schemas for chats and messages and want to update them in sync. I could also use DynamoDB Streams to update the values, but that's not going to be real time.
Update
I could also embed messages in the chat document in MongoDB, but is that scalable? I can push messages like a stack so it would be easy to query the latest messages, but what about pagination or infinite scroll: is there a way to make those queries fast? Also, what if the embedded messages exceed the document size limit; how do I scale then?
DynamoDB is definitely a suitable database for your needs here, but I think the design you've proposed isn't the right approach.
Your requirements are (I've split some up from your original post):
List chats for user
List chat messages for chat
List users of chat (this is optional)
Get last message for a chat
Get unread count for a chat+user
If you have a DDB table with:
PK: chat_id
SK: timestamp:message_id
And a GSI with:
PK: user_id
SK: timestamp:message_id
You can do a query for chat_id to complete requirement #2, getting all messages in a chat, sorted by time posted.
You can have a second table that handles permissions like:
PK: user_id
SK: chat_id
With a GSI:
PK: chat_id
SK: user_id
You can do a query on the permissions table to get all chat_ids for a user_id, and a query on the permissions GSI to get all user_ids in a chat, satisfying requirements 1 & 3.
For requirement 4, this is pretty easy, as you can just do a query on the messages table (the first one above) for the chat_id, sorted descending with a max count of 1, which will get you the last message and the time it was posted.
Requirement 5 is a little more tricky, but if you keep track of the last time a specific user viewed a chat, you can do a query with a range expression on the sort key, such as timePosted >= timeLastSeen, and the number of messages returned is the unread count. It makes sense to me to store the time last viewed on the client side, but if you want it stored server side you could add a third table.
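As a sketch with the AWS SDK for Java v2, requirements 4 and 5 could look roughly like this; the table name "messages" and the sort-key attribute name "sk" are assumptions:

import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.Select;

DynamoDbClient ddb = DynamoDbClient.create();

// Requirement 4: the last message is the newest item for the chat.
QueryRequest lastMessage = QueryRequest.builder()
        .tableName("messages")
        .keyConditionExpression("chat_id = :c")
        .expressionAttributeValues(Map.of(
                ":c", AttributeValue.builder().s("chat-123").build()))
        .scanIndexForward(false) // descending by sort key (timestamp:message_id)
        .limit(1)
        .build();

// Requirement 5: unread count = messages posted since the user last viewed the chat.
// sk is the timestamp:message_id composite; ISO timestamps compare lexicographically.
QueryRequest unread = QueryRequest.builder()
        .tableName("messages")
        .keyConditionExpression("chat_id = :c AND sk >= :lastSeen")
        .expressionAttributeValues(Map.of(
                ":c", AttributeValue.builder().s("chat-123").build(),
                ":lastSeen", AttributeValue.builder().s("2021-01-01T00:00:00Z").build()))
        .select(Select.COUNT) // return only the count, not the items
        .build();

int unreadCount = ddb.query(unread).count();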
All the operations above are highly scalable and you won't run into any concurrency issues, even with 100 users in the same chat.

Best practice for handling user names in notifications in DB

I would like to create a notification document on Firebase (Cloud Firestore) which includes a "sender" display name (e.g., Anonymous128 sent you a message). This name is prone to change.
What is the best practice to dynamically update the name if it does change? Should I just store userId, and pull the name up every time I'm querying notifications from the database? Or would it be better to update all notifications belonging to a user if they change their display name?
Thanks!
If reading notifications is much more frequent than users updating their names, then I'd recommend storing the sender's name in the notification documents, as that will save you plenty of read operations that you would otherwise spend fetching the user's name every time.
This does mean that you'll have to update plenty of documents when a user updates their name. Usually there's some rate limit on changing a user name, so this operation should not be very frequent. Also, the term "notification" suggests you'll be deleting the document after the receiver has read the message; if so, the update costs should be reduced too.
Alternatively, store only the userId in notification documents. When you fetch all the notifications of the current user, parse an array of unique userIds from them and then query the senders' documents. This ensures you fetch each user's document only once, not once for every notification they have sent. Additionally, you can cache these usernames locally, like { uid: "name" }, and clear the cache periodically.
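A minimal sketch of that batching idea with the Firebase Admin SDK for Java, where the Notification POJO, the "users" collection, and the "displayName" field are all assumptions:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import com.google.cloud.firestore.DocumentSnapshot;
import com.google.cloud.firestore.Firestore;

// db is an initialized Firestore instance; notifications were already fetched.
Set<String> senderIds = new HashSet<>();
for (Notification n : notifications) {
    senderIds.add(n.getSenderId()); // collect unique senders only
}

Map<String, String> nameCache = new HashMap<>();
for (String uid : senderIds) {
    // One read per unique sender, not per notification; the outer get() blocks on the ApiFuture.
    DocumentSnapshot user = db.collection("users").document(uid).get().get();
    nameCache.put(uid, user.getString("displayName"));
}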

Using Hash+Range key for storing CloudWatch alarm in DynamoDB

I configured a CloudWatch alarm. I need this event to be captured and logged in a database for reporting purposes. DynamoDB was chosen because of the low volume of incoming reads/writes.
What I need to capture:
AWS account ID (12345678912345)
Datetime of the event
Event ID
I have multiple AWS accounts, and I want to store them all in a single table (I could store them in separate tables as well, but given the low volume I'm not sure that's really useful).
So should I use Hash+Range?
Hash: <account_id>
Range: <datetime>
This way, my understanding is that DynamoDB will group and order the items based on the range key.
My queries would only be:
get all of the events for account_id / for all accounts
get all of the events since x time for account_id / for all accounts
Is this a good design? Do I need a separate index?
As per your query patterns, your approach looks correct.
If you want data for only one account_id, do a Query. You can also supply a KeyConditionExpression on the range key to get only events that happened after a given timestamp.
If you want data for a list of account_ids, run a Scan. (You can't do a BatchGetItem, because it needs both the hash key and the range key.)
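For example, the single-account query could look like this with the AWS SDK for Java v2 (the table name "events" is an assumption):

import java.util.List;
import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;

DynamoDbClient ddb = DynamoDbClient.create();

// All events for one account since a given time, via a key condition on the range key.
QueryRequest req = QueryRequest.builder()
        .tableName("events")
        .keyConditionExpression("account_id = :a AND #dt >= :since")
        .expressionAttributeNames(Map.of("#dt", "datetime")) // alias in case it's a reserved word
        .expressionAttributeValues(Map.of(
                ":a", AttributeValue.builder().s("123456789012").build(),
                ":since", AttributeValue.builder().s("2021-01-01T00:00:00Z").build()))
        .build();

List<Map<String, AttributeValue>> events = ddb.query(req).items();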

How facebook notify wall updates to fans?

I'm working on a social media app and I want users to have the option to be notified when a page creates a new post (as on Facebook). First, I created a Notification table that contains:
Id(PK)
UserId
PageId
PostId
ReadBit
Date
But this doesn't make sense. If a page has 1000 or even 500 interested fans, it isn't logical to create 1000 or 500 records, one for every interested fan. Is there another method to do this?
There are two types of information you are talking about: Pages and posts.
When a user opts in for a page, this should be persistent, so you'll need a database entry representing it. The standard way to go is to have one record per user and per page she subscribed to:
Subscription(id, user_id, page_id)
Depending on the exact requirements, there might be simpler solutions. E.g., if the pages are about topics, and the user doesn't subscribe to a page but to a topic (say, 50 pages about cars and 70 about computers), it would be sufficient to store a subscription per user and topic. But your text doesn't indicate this.
The second question is how to handle the notification process when a post has been made to a page. Strictly speaking, you don't need a database record for each such notification. When a page has changed, look up all subscribers using the Subscription table and generate the notifications in a loop over them.
Only if you need some additional persistent information about each such notification will you need a record per notification, i.e.
Notification(id, subscription_id, ...)
This might be the case if you need to store the timestamp when the notification was sent, or if you have some status information for the notification process, e.g. whether the user has reacted to the notification.
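A rough JDBC sketch of that loop, assuming an open java.sql.Connection and a hypothetical pushService standing in for your delivery mechanism:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Look up subscribers of the changed page and notify them in a loop,
// without persisting one Notification row per fan.
try (PreparedStatement ps = conn.prepareStatement(
        "SELECT user_id FROM subscription WHERE page_id = ?")) {
    ps.setLong(1, pageId);
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            pushService.send(rs.getLong("user_id"), postId); // hypothetical delivery call
        }
    }
}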

Creating a Notifications type feed in GAE Objectify

I'm working on a notification feed for my mobile app and am looking for some help on an issue.
The app is a Twitter/Facebook like app where users can post statuses and other users can like, comment, or subscribe to them.
One thing I want to have in my app is a notifications feed where users can see who liked/commented on their post or subscribed to them.
The first part of this system I have figured out: when a user likes/comments/subscribes, a Notification entity is written to the datastore with details about the event. To show a user's notifications, all I have to do is query for all Notifications for that user, sorted by date created descending, and we have a nice little feed of actions other users took on a specific user's account.
The issue I have is what to do when someone unlikes a post, unsubscribes, or deletes a comment. Currently, if I were to query for that specific notification, it is possible that nothing would return from the datastore because of eventual consistency. We could imagine someone liking, then immediately unliking a post (b/c who hasn't done that? =P). The query to find that Notification might return null, and nothing would get deleted when calling ofy().delete().entity(notification).now(); and now the user has a notification in their feed saying Sally liked his post when in reality she liked then quickly unliked it!
A wrench in this whole system is that I cannot delete by Key<Notification>, because I don't really have a way to know the ID of the Notification when trying to delete it.
A potential solution I am experimenting with is to not delete any Notifications. Instead, I would always write Notifications and simply indicate whether each one was positive or negative. Then, in my query to display notifications to a specific user, I could somehow display only the net-positive Notifications. This would save some money on the datastore too, because deleting entities is expensive.
There are three main ways I've solved this problem before:
deterministic key
for example
{user-Id}-{post-id}-{liked-by} for likes
{user-id}-{post-id}-{comment-by}-{comment-index} for comments
This will work for most basic use cases of the problem you defined, but you'll have some hairy edge cases to figure out (like managing the indexes of comments as they get edited and deleted). This approach allows get and delete by key.
parallel data structures
The idea here is to create more than one entity at a time in a transaction, but to make sure they have related keys. For example, when someone comments on a feed item, create a Comment entity, then create a CommentedOn entity which has the same ID, but make it have a parent key of the commenter user.
Then, you can make a strongly consistent query for the CommentedOn, and use the same ID to do a get-by-key on the Comment. You can also just store a key, rather than having matching IDs, if that's too hard. In practice, having matching IDs was easier each time I did this.
The main limitation of this approach is that you're effectively creating an index yourself out of entities, and while this can give you strongly consistent queries where you need them, the throughput limitations of transactional writes can become harder to understand. You also need to manage state changes (like deletes) carefully.
State flags on entities
Assuming the Notification object just shows the user that something happened but links to another entity for the actual data, you could store a state flag (deleted, hidden, private etc) on that entity. Then listing your notifications would be a matter of loading the entities server side and filtering in code (or possibly subsequent filtered queries).
At the end of the day, the complexity of the solution should mirror the complexity of the problem. I would start with approach 3, then migrate to approach 2 once the fuller set of requirements is understood. It is a more robust and flexible approach, but the complexity of XG transaction limitations will rear its head; ultimately, a distributed feed like this is a hard problem.
What I ended up doing, and what worked for my specific model, was that before creating a Notification entity I would first allocate an ID for it:
// Allocate an ID for a Notification
final Key<Notification> notificationKey = factory().allocateId(Notification.class);
final Long notificationId = notificationKey.getId();
Then, when creating my Like or Follow entity, I would set the property Like.notificationId = notificationId; or Follow.notificationId = notificationId;
Then I would save both Entities.
Later, when I want to delete the Like or Follow, I can do so and at the same time get the ID of the Notification, load the Notification by key (which is strongly consistent), and delete it too.
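The delete path then looks something like this, assuming Like stores the allocated ID in a Long notificationId field:

// Load the Like by ID (a get by key, which is strongly consistent),
// then delete both it and its Notification by key; no query needed.
Like like = ofy().load().type(Like.class).id(likeId).now();
if (like != null) {
    ofy().delete().type(Notification.class).id(like.notificationId).now();
    ofy().delete().entity(like).now();
}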
Just another approach that may help someone =D
