I am working on an application that requires checking the user's inbox for new messages every 5 minutes.
Our current approach is to use the list.histories endpoint, triggered by push notifications. We're running into an edge case where we get stale history IDs for accounts that go cold for a few weeks. I understand the remedy in this situation is to do a full sync using list.messages.
I was wondering if it is possible to drop history entirely and just poll the list.messages endpoint every 5-10 minutes, with q filters to restrict the timeframe. The implementation would query overlapping timeframes, with an overlap of 1 minute. The idea is that the overlap would allow us to figure out where we left off and then stitch the sequence together correctly. We would no longer use pub/sub or list.histories.
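A minimal sketch of the overlapping-window idea, in plain Python. The `fetch` callable, the `id` and `internal_date` keys, and the in-memory mailbox are hypothetical stand-ins for a list.messages call with a time-restricted q filter; nothing here calls the real API.

```python
from datetime import datetime, timedelta

OVERLAP = timedelta(minutes=1)  # how far each window reaches back into the previous one


def poll_once(fetch, window_start, now, seen_ids):
    """Fetch one overlapping window and return only messages not seen before.

    fetch(after, before) is a hypothetical callable returning dicts with an
    "id" key and an "internal_date" datetime.
    """
    batch = fetch(window_start - OVERLAP, now)
    fresh = [m for m in batch if m["id"] not in seen_ids]
    seen_ids.update(m["id"] for m in fresh)
    return fresh, now  # 'now' becomes the next window_start


# In-memory stand-in for the mailbox (illustrative data only)
t0 = datetime(2024, 1, 1, 12, 0)
mailbox = [
    {"id": "a", "internal_date": t0 + timedelta(minutes=1)},
    {"id": "b", "internal_date": t0 + timedelta(minutes=4, seconds=30)},
    {"id": "c", "internal_date": t0 + timedelta(minutes=7)},
]


def fake_fetch(after, before):
    return [m for m in mailbox if after <= m["internal_date"] < before]


seen = set()
first, cursor = poll_once(fake_fetch, t0, t0 + timedelta(minutes=5), seen)
second, cursor = poll_once(fake_fetch, cursor, t0 + timedelta(minutes=10), seen)
```

Here message "b" falls inside the overlap and is returned by both fetches, but only reported once because it is deduplicated by ID. Deduplicating by ID means the stitching does not depend on ordering inside the overlap. It does not help with a message whose internal date is older than the overlap window, and in a real system `seen_ids` would need to be persisted and pruned.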
The concerns I have are:
This approach isn't listed in the guide.
Is it possible for a message with a history_id that is greater than the message that precedes it to have an internal date that is older?
Does anyone else have experience with this?
Please help me understand how the Google Cloud Pub/Sub subscription/num_undelivered_messages metric works with a pull subscription.
From docs: subscription/num_undelivered_messages is
Number of unacknowledged messages (a.k.a. backlog messages) in a
subscription. Sampled every 60 seconds. After sampling, data is not
visible for up to 120 seconds.
And for Pull delivery from docs
In pull delivery, your subscriber application initiates requests to
the Cloud Pub/Sub server to retrieve messages. The subscribing
application explicitly calls the pull method, which requests messages
for delivery.
I set up a pull subscription against a Google public topic named projects/pubsub-public-data/topics/taxirides-realtime, which is supposed to provide a continuous stream of taxi ride data.
My requirement is to calculate the number of taxi rides in the past hour. The usual approach that came to mind is to pull all messages from the topic and aggregate over them.
However, while searching I found these two links, link1 and link2, which I feel could solve the problem, but question 1 below leaves me in doubt about this solution.
So overall my questions are:
1. How does a subscription determine the value of num_undelivered_messages for a topic, even when the subscription hasn't made any pull call? I can actually see this metric in Stackdriver Monitoring by filtering on the subscription ID.
2. What is the right way to calculate the number of messages present in a topic over a certain duration?
The number of undelivered messages is established based on when the subscription is created. Any messages published after that are messages that should be delivered to the subscription. Therefore, any of these messages not pulled and acked by the subscription will count toward num_undelivered_messages.
For solving your particular problem, it would be better to read the feed and aggregate the data. The stats like num_undelivered_messages are useful for examining the health of subscribers, e.g., if the count is building up, it could indicate that something is wrong with the subscribers or that the data published has changed in some way. You could look at the difference in the number between the end of your desired time interval and the beginning to get an estimate of the number of messages published in that time frame, assuming you aren't also consuming and acking any messages.
However, it is important to keep in mind that the time at which messages are published in this feed may not exactly correspond to the time at which a taxi ride occurred. Imagine there was an issue with the publisher and it was unable to publish the messages for a period of time and then once fixed, published all of the messages that had built up during that time. In this scenario, the timestamp in the messages themselves indicating when the taxi ride occurred would not match the time at which the message was received by Cloud Pub/Sub.
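As a rough illustration of aggregating on the event time carried in each message rather than on the publish time, here is a sketch in plain Python. It assumes each payload is JSON with an ISO 8601 field named `timestamp`; that field name and format are assumptions for the example, and the pull and ack calls are deliberately left out.

```python
import json
from datetime import datetime, timedelta, timezone


def count_events_in_window(payloads, window_end, window=timedelta(hours=1),
                           ts_field="timestamp"):
    """Count messages whose embedded event time lies in (window_end - window, window_end]."""
    window_start = window_end - window
    count = 0
    for raw in payloads:
        event = json.loads(raw)
        # ts_field is an assumed field name holding an ISO 8601 string with a UTC offset
        event_time = datetime.fromisoformat(event[ts_field])
        if window_start < event_time <= window_end:
            count += 1
    return count


# Illustrative payloads only; real messages would come from the subscription
payloads = [
    json.dumps({"timestamp": "2024-01-01T11:30:00+00:00"}),  # inside the window
    json.dumps({"timestamp": "2024-01-01T10:59:00+00:00"}),  # before the window
    json.dumps({"timestamp": "2024-01-01T12:00:00+00:00"}),  # on the inclusive end
]
end = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rides_last_hour = count_events_in_window(payloads, end)
```

Filtering on the embedded timestamp keeps the count meaningful even if the publisher delivers a backlog late, which is the scenario described above.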
I am using Camel and I have a business problem. We consume order messages from an ActiveMQ queue. The first thing we do is check in our DB to see if the customer exists. If the customer doesn't exist, a support team needs to populate the customer in a different system. Sometimes this can take 10 hours, or even until the following day.
My question is how to handle this. It seems to me at a high level I can dequeue these messages, store them in our DB and re-run them at intervals (a custom coded solution) or I could note the error in our DB and then return them back to the activemq queue with a long redelivery policy and expiration, say redeliver every 2 hours for 48 hours.
This would save a lot of code but my question is if approach 2 is a sound approach or could lead to resource issues or problems with not knowing where messages are?
This is a pretty common scenario. If you want insight into how the jobs are progressing, then it's best to use a database for this.
Your queue consumption should be really simple: consume the message, check if the customer exists; if so process, otherwise write a record in a TODO table.
Set up a separate route to run on a timer - every X minutes. It should pull out the TODO records, and for each record check if the customer exists; if so process, otherwise update the record with the current timestamp (the last time the record was retried).
This allows you to have a clear view of the state of the system, that you can then integrate into a console to see what the state of the outstanding jobs is.
There are a couple of downsides with your Option 2:
you're relying on the ActiveMQ scheduler, which uses a KahaDB variant sitting alongside your regular store, and may not be compatible with your H/A setup (you need a shared file system)
you can't see the messages themselves without scanning through the queue, which is an antipattern - using a queue as a database - you may as well use a database, especially if you can anticipate needing to ever selectively remove a particular message.
I've developed a python app that registers information from incoming emails and saves this information to the GAE Datastore. Registering the emails works just fine. As part of the registration, emails with the same subject and recipients get a conversation ID. However, sometimes emails enter the system so fast after each other, that emails from the same conversation don't get the same ID. This happens because two emails from the same conversation are being processed at the same time and GAE doesn't see the other entry yet when running a query for this conversation.
I've been thinking of a way to prevent this, and think it would be best if the system processes only one email per user at a time (each sender has his own account). This could be done by having a push task queue that first checks if there is currently an email being processed for this user, and if so, put the new task in a pull queue from which it can be retrieved as soon as the previous task has been finished.
The big disadvantage of this is that (I think) I can't run the push queue asynchronously, which is obviously a big performance disadvantage. Any ideas on what would be a better way to set up such a process?
Apparently this was a typical race-condition. I've made use of the Transactions functionality to prevent multiple processes writing at the same time. Documentation can be found here: https://cloud.google.com/appengine/docs/python/datastore/transactions
I am trying to implement a 2-player turn-based game with a GAE backend. The first thing this game requires is a very simple match making system that operates like this:
User A asks the backend for a match. The backend tells him to come back later.
User B asks the backend for a match. He will be matched with A.
User C asks the backend for a match. The backend tells him to come back later.
User D asks the backend for a match. He will be matched with C.
and so on...
(edit: my assumption is that if I can figure this one out, most other operations in a turn-based game can use the same implementation)
This can be done quite easily in Apple Gamecenter and Xbox Live, however I would rather implement this on an open and platform independent backend like GAE. After some research, I have found the following options for a GAE implementation:
use memcache. However, there is no guarantee that the memcache is synchronized across different instances. I did some tests and could actually see match request disappearing due to memcache mis-synchronization.
Harden memcache with Sharding Counters. This does not always solve the multiple instance problem and may result in high memcache quota usage.
Use memcache with Compare and Set. Does not solve the multiple instance problem when used as a mutex.
task queues. I have no idea how to use these, but someone mentioned them as a possible solution. However, I am afraid that queues will eat my GAE quota very quickly.
push queues. Same as above.
transaction. Same as above. Also probably very expensive.
channels. Same as above. Also probably very expensive.
Given that the match making is a very basic operation in online games, I cannot be the first one encountering this. Hence my questions:
Do you know of any safe mechanism for match making?
If multiple solutions exist, which is the cheapest (in terms of GAE quota usage) solution?
You could accomplish this using a cron task in a scheme like this:
from google.appengine.ext import db

class MatchRequest(db.Model):
    requestor = db.StringProperty()
    opponent = db.StringProperty(default='')
User A asks for a match, a MatchRequest entity is created with A as the requestor and the opponent blank.
User A polls to see when the opponent field has been filled.
User B asks for a match, a MatchRequest entity is created with B as the requestor.
User B polls to see when the opponent field has been filled.
A cron job that runs every 20 seconds or so does the following:
Grab all MatchRequest where opponent == ''
Make all appropriate matches
Put all the MatchRequests as a transaction
Now when A and B poll next, they will see that they have an opponent.
According to the GAE docs on crons, free apps can have up to 20 free cron tasks. The computation required for these crons should be small for a small number of users.
This would be a safe way but I'm not sure if it is the cheapest way. It's also pretty easy to implement.
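The "make all appropriate matches" step of the cron job could look something like the sketch below. It works on plain dicts with `requestor` and `opponent` keys that mirror the MatchRequest fields above, so it runs without GAE; pairing in arrival order and the `make_matches` name are assumptions for the example, and the datastore query and transactional put are left out.

```python
def make_matches(open_requests):
    """Pair unmatched requests in arrival order; return the ones that changed.

    Each request is a dict with 'requestor' and 'opponent' keys. With an odd
    number of waiting requests, the last one stays unmatched until the next run.
    """
    waiting = [r for r in open_requests if r["opponent"] == ""]
    updated = []
    for first, second in zip(waiting[0::2], waiting[1::2]):
        first["opponent"] = second["requestor"]
        second["opponent"] = first["requestor"]
        updated.extend([first, second])
    return updated


# Three users are waiting when the cron job runs
requests = [
    {"requestor": "A", "opponent": ""},
    {"requestor": "B", "opponent": ""},
    {"requestor": "C", "opponent": ""},
]
changed = make_matches(requests)
```

After this run A and B are matched with each other and C is still waiting. Only the entities in `changed` would need to be written back.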
I have a strange problem on a PeopleSoft application. It appears that integration broker messages are being processed out of order. There is another possibility, and that is that the commit is being fired asynchronously, allowing the transactions to complete out of order.
There are many inserts of detail records, followed by a trailer record which performs an update on the rows just inserted. Some of the rows are not receiving the update. This problem is sporadic, about once every 6 months, but it causes statistically significant financial reporting errors.
I am hoping that someone has had enough dealings with the internals of PeopleTools to know what it is up to, so that perhaps I can find a work around to the problem.
You don't mention whether you've set this or not, but you have a choice with Integration Broker. All messages flow through message channels, and a channel can either be ordered or unordered. If a channel is ordered and a message errors, all subsequent messages queue up behind it and will not be processed until it succeeds.
Whether a channel is ordered or not depends on the checkbox in the message channel properties in Application Designer. From memory, channels are ordered by default, but you can uncheck the box to increase throughput.
Hope this helps.
PS. As of Tools 8.49 the setup changed slightly: Channels became Queues, Messages became Service Operations, etc.
I heard from GSC. We had two domains on the sending end as well as two domains on the receiving end. All were active. According to them, it is possible when you have multiple domains for each of the servers to pick up some of the messages in the group, and therefore, process them asynchronously, rather than truly serially.
We are going to reduce the active servers to one and see if it happens again, but it is so sporadic that we may never know for sure.
A few things changed in Integration Broker in PSFT 9, so please let me know the version of your apps. Async services can now work with sync. Message channel properties need to be set properly. I found a similar kind of problem on the www.itwisesolutions.com/PsftTraining.html website, but that was more related to the implementation itself.
thanks