JavaMail: how to fetch messages modified since a specific date

I'm writing an application that synchronizes emails (inbox only) from an IMAP server. I'm using JavaMail for this and I have a performance issue: when I want to refresh my emails, I fetch all the messages in the inbox, and that takes several minutes. :(
So I would like to fetch only the messages that were modified since the last time I refreshed. I found out how to fetch messages received or sent since a given date, but what I want is slightly different. The state of a very old message can change (unread to read), in which case the modification date is recent but the received or sent date is old.
Any ideas?
Regards,
Quentin

You can't change the content of an old message, but you can change the flags. You can fetch all the flags for all messages and compare them with your cached copy of the flags. (There are IMAP extensions that help with this, but many IMAP servers don't support them and JavaMail doesn't support them.)
Use the Folder.fetch method to fetch all the flags in one operation, then iterate over the Message objects and compare the flags.
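The comparison step described above can be sketched as follows. This is a minimal illustration, not JavaMail code: it assumes the flags have already been fetched in bulk (for instance with Folder.fetch and a FetchProfile containing FetchProfile.Item.FLAGS) and copied into plain maps keyed by message UID. The FlagDiff class, the map layout, and the use of strings for flag names are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: compares a cached flag snapshot with a fresh one.
class FlagDiff {

    /**
     * Returns, in ascending UID order, the UIDs that exist in both snapshots
     * but whose flag sets differ. UIDs present in only one snapshot (new or
     * expunged messages) are deliberately ignored here.
     */
    static List<Long> changedUids(Map<Long, Set<String>> cached,
                                  Map<Long, Set<String>> current) {
        List<Long> changed = new ArrayList<>();
        // TreeMap gives a deterministic, sorted iteration order.
        for (Map.Entry<Long, Set<String>> entry : new TreeMap<>(current).entrySet()) {
            Set<String> oldFlags = cached.get(entry.getKey());
            if (oldFlags != null && !oldFlags.equals(entry.getValue())) {
                changed.add(entry.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<Long, Set<String>> cached = new HashMap<>();
        cached.put(1L, Set.of());            // was unread
        cached.put(2L, Set.of("\\Seen"));

        Map<Long, Set<String>> current = new HashMap<>();
        current.put(1L, Set.of("\\Seen"));   // has been read since the last sync
        current.put(2L, Set.of("\\Seen"));
        current.put(3L, Set.of());           // new message, not a flag change

        System.out.println(changedUids(cached, current)); // prints [1]
    }
}
```

Only the messages reported here need their cached state updated; everything else can be left alone.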

Related

How to transfer messages, each at a specific time?

I have about 10K messages in a CSV file. Each message has an associated timestamp for it. When the time is reached, I want the message delivered to an MQ. The timestamps are not uniformly spread. Is this possible with Apache Camel?
As far as I know, Apache Camel has no built-in consumer endpoint component that you can configure to fire specific messages at specific times.
There is, however, a timer component that you can set up to trigger, for example, once a second. In the route you could then use a processor to check whether a list contains any messages that are due at the current time and send them to the MQ.
You could also trigger a route from Java code using a ProducerTemplate, which you can create from the CamelContext.
The list could be populated from your CSV file and ordered by timestamp, so that you can treat it like a queue and only check the first few entries instead of going through all 10K every second.
The real problem here would be persistence, i.e. figuring out which of the messages listed in the CSV have already been sent if the application closes before all 10K messages have gone out.
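The "ordered list checked on every timer tick" idea can be sketched in plain Java as below. The Camel wiring is left out on purpose: the assumption is that a timer-driven route would call pollDue once per tick from a processor and forward whatever it returns to the MQ. The DueMessageQueue class and its method names are hypothetical, and persistence is not addressed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical holder for CSV rows, ordered by their due time.
class DueMessageQueue {

    /** One CSV row: when to send it and what to send. */
    static final class Scheduled {
        final long dueAtMillis;
        final String body;

        Scheduled(long dueAtMillis, String body) {
            this.dueAtMillis = dueAtMillis;
            this.body = body;
        }
    }

    // Earliest due time first, so only the head needs to be inspected.
    private final PriorityQueue<Scheduled> queue =
            new PriorityQueue<>(Comparator.comparingLong((Scheduled s) -> s.dueAtMillis));

    void add(long dueAtMillis, String body) {
        queue.add(new Scheduled(dueAtMillis, body));
    }

    /** Removes and returns the bodies of all messages due at or before nowMillis. */
    List<String> pollDue(long nowMillis) {
        List<String> due = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().dueAtMillis <= nowMillis) {
            due.add(queue.poll().body);
        }
        return due;
    }

    int pending() {
        return queue.size();
    }

    public static void main(String[] args) {
        DueMessageQueue q = new DueMessageQueue();
        q.add(3000L, "c");
        q.add(1000L, "a");
        q.add(2000L, "b");

        System.out.println(q.pollDue(500L));   // prints []
        System.out.println(q.pollDue(2000L));  // prints [a, b]
        System.out.println(q.pending());       // prints 1
    }
}
```

Each tick costs time proportional to the number of messages that are actually due, not to the size of the CSV file.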

Gmail API Returning Emails With InternalDates In the Future

I am attempting to use the Gmail API to synchronize all the emails from a user's Gmail inbox. I am using the Partial Synchronization technique described in Gmail's "Synchronizing Clients" [1] documentation. One of the listed limitations is that, in rare cases, the historyId of certain emails is unavailable. Under these circumstances, it is advised that the client fall back on "Full Synchronization", which states that the client should "retrieve and store as many of the most recent messages or threads as are necessary for your purpose".
This all makes sense. When I have issues with Partial Synchronization, I attempt to look through an inbox's messages by time range. To do this, I effectively store a record of the ( emailAddress, historyId, internalDate ) of each email I sync, and then, when falling back on Full Synchronization, I attempt to sync all email since the most recent internalDate that I have already synced.
My issue is that the cases that seem to cause Partial Synchronization to fail also seem to cause Full Synchronization to fail, and many of these cases are caused by emails with internalDates in the future (I can't share these examples for privacy reasons). The failure case seems to be something like the following:
1. I sync email E with historyId H and an internalDate I some time in the future
2. Some time passes
3. I receive a push notification from Google indicating that there are new emails to sync
4. I look up the most recent message that I have synced for this inboxId, finding email E
5. I attempt a partial sync using the listHistory [2] endpoint with historyId H
6. The listHistory request fails with a 404
7. I attempt a full sync using the listMessages [3] endpoint with the query newer_than:{hours_since-internalDate-I}, but this request doesn't make any sense since the internalDate of this message is in the future.
I can imagine a few different solutions to this problem. Perhaps I should simply ignore these emails as spam, or perhaps I should store a timestamp of when I synced each email and then perform a Full Synchronization on the timestamp I have stored.
Either way, this seems like a bug in the Gmail API, as the internalDate should really be when Gmail received the email. I initially suspected that this might be caused by Gmail's new schedule feature and that the internalDate might be when the email was scheduled in the future, but I confirmed that some of the examples I have are definitely for emails that the user's inbox received, not sent. Really not sure what to make of this edge case within the internalDate api.
So my question is, what is the advised way to handle bogus future internalDates? And is it a bug?
[1] https://developers.google.com/gmail/api/guides/sync
[2] https://developers.google.com/gmail/api/v1/reference/users/history/list
[3] https://developers.google.com/gmail/api/v1/reference/users/messages/list
If you're sure this is a bug, you can head to Google's Issue Tracker (template here) and report it so their engineering team can take a look and see what is causing this error. Alternatively if this persists with other mails or users, you can open a support ticket directly with them by going to your admin dashboard and selecting 'Contact Support' in the ? menu in the top right. This way Google can take a look into the erroneous internalDates without the need for you to post any potentially sensitive data in a public forum.
In the meantime, you can work around this dynamically by making sure that you don't fetch mails with a time in the future (pseudo-code):
var now = Math.floor(new Date().getTime() / 1000) // seconds since the epoch
var q = "newer_than:1h before:" + now
GmailServiceConnect.Users.messages.list(userId = "user#domain.ext", q = q).execute()
Mind the units: internalDate is reported in milliseconds since the epoch, whereas the before: and after: search operators take seconds, so convert before building the query.
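One way to combine this workaround with the question's idea of storing a sync timestamp is sketched below. It is only an illustration under stated assumptions: the GmailSyncQuery class and both method names are hypothetical, no Gmail client library is involved, and the sketch relies on the after: search operator accepting a Unix timestamp in seconds. The cursor is the earlier of the message's internalDate and the time it was synced, so a bogus future internalDate can never push the cursor past the present.

```java
// Hypothetical helper for building a full-sync search query.
class GmailSyncQuery {

    /**
     * Picks a safe sync cursor in epoch milliseconds: the reported
     * internalDate, unless it lies after the moment the message was synced.
     */
    static long cursorMillis(long internalDateMillis, long syncedAtMillis) {
        return Math.min(internalDateMillis, syncedAtMillis);
    }

    /** Builds a search query from a cursor, converting milliseconds to seconds. */
    static String afterQuery(long cursorMillis) {
        return "after:" + (cursorMillis / 1000L);
    }

    public static void main(String[] args) {
        long futureInternalDate = 2_000_000_000_000L; // bogus, far in the future
        long syncedAt = 1_600_000_000_500L;           // when the message was actually synced

        long cursor = cursorMillis(futureInternalDate, syncedAt);
        System.out.println(afterQuery(cursor)); // prints after:1600000000
    }
}
```

The resulting string would be passed as the q parameter of the messages list call; messages already stored would still need to be de-duplicated by id.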

Does PubSub auto-generate timestamps?

According to the PubsubMessage reference (https://cloud.google.com/pubsub/docs/reference/rest/v1/PubsubMessage), the timestamp field should not be populated when a publisher sends a message to the queue. This leads me to think that PubSub attaches a timestamp automatically whenever it receives a message from a publisher.
Is this correct?
Yes, you got that right. This is what the following sentence implies (from the docs you linked):
publishTime: The time at which the message was published, populated by the server when it receives the topics.publish call.
You can test that yourself: if you go to the APIs Explorer and publish to a topic using the pubsub.projects.topics.publish method without giving a publishTime, and then pull from that same topic through a subscription (pubsub.projects.subscriptions.pull), the pulled message will have a publishTime.
Now, there is also a sentence in the docs regarding publishTime which seems a bit unclear to me:
It must not be populated by the publisher in a topics.publish call.
If you actually try to add a (correctly formatted) publishTime in your publish call, you will not get an error. Still, the actual publishTime attached to the message that you later pull is the one provided by the pub/sub service (i.e. the publish time you gave will simply be ignored).
Pub/Sub does generate a timestamp, exposed as publishTime. However, in cases where multiple publishers publish to Pub/Sub, you could also attach your own timestamp to every event in the publisher.
https://cloud.google.com/pubsub/docs/ordering

How to request read receipt with Google App Engine

I have a GAE application which sends out email to my domain users in a Google Apps for Business environment. I am using JavaMail as described in this article. Unfortunately I can't seem to find a way to ask for a read receipt. I looked at Message methods but nothing seems to suggest that it is possible. Thanks a lot.
If you're interested in knowing if a mail bounced, then use bounce notification https://developers.google.com/appengine/docs/java/mail/bounce
For read receipts:
As far as I'm aware, you need to roll your own read receipt functionality. For example, include an image (with a unique URL) in the mail you send out. When the recipient opens the mail, the image is retrieved and you can determine that the mail has been read. This has its downsides: if the recipient doesn't have images enabled, you won't receive the notification.
You need to set the appropriate header on your message (Disposition-Notification-To), as described in Message Disposition Notification - RFC 3798. Not all mailers will honor MDNs, so you might find the tracking pixel useful as well. But then some mailers won't display remote images, so in the end there's no guaranteed way of getting notified when a message is read.

How to get unseen/unread emails in the mailbox other than mail search technique

I am working on a mail client application which syncs emails for a GMail account using the IMAP c-client library.
How can I get the most recent unseen/unread e-mails in the mailbox without blindly searching for all unread e-mails?
A mail search needs to pull all unread e-mails on every sync to the client but it is quite an expensive operation to perform on every sync.
Is there a better approach to communicate to the client any unread e-mails which were not sync'd on the previous interaction with the server?
Thunderbird, for example, is able to sync unseen e-mails with some mechanism (possibly by doing a blind search for all unseen e-mails), as the IDLE command won't notify the client about them.
Is there some mechanism that can tell the client about unread e-mails that have appeared since the last sync?
There is an IMAP extension for Quick Flag Changes Resynchronization (CONDSTORE, RFC 4551). With this extension it is possible to search for all messages that have changed since the last synchronization (based on a per-message modification sequence number rather than a wall-clock timestamp). However, this extension is not widely supported, and in particular not by Gmail's IMAP server:
* CAPABILITY IMAP4rev1 UNSELECT IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE
There is an informational RFC that describes how IMAP clients should do synchronization (RFC 4549, section 4.3). The text recommends issuing the following two commands:
tag1 UID FETCH <lastseenuid+1>:* <descriptors>
tag2 UID FETCH 1:<lastseenuid> FLAGS
The first command is used to fetch the required information for all unknown mails (without knowing how many mails there are). The second command is used to synchronize the flags for the already seen mails.
This method is widely used. Therefore, many IMAP servers contain an optimization in order to provide this information quickly. Typically, the network bandwidth is the limiting factor.
If you are only interested in the UNSEEN flag, a UID SEARCH is probably the best you can do.
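The two-command resynchronization from RFC 4549 quoted above can be sketched as a small command builder. This only formats the command strings; sending them and parsing the responses is left to whatever IMAP library is in use. The ImapResync class, the commands method, and the example descriptor list are hypothetical, and the sketch assumes that a lastSeenUid of 0 means nothing has been synchronized yet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder for the two resynchronization commands.
class ImapResync {

    /**
     * Returns the commands to issue for a resync: one to fetch messages with
     * UIDs above lastSeenUid, and, if anything was synced before, one to
     * refresh the flags of the already known messages.
     */
    static List<String> commands(long lastSeenUid, String descriptors) {
        List<String> cmds = new ArrayList<>();
        cmds.add("tag1 UID FETCH " + (lastSeenUid + 1) + ":* " + descriptors);
        if (lastSeenUid >= 1) {
            // No known messages means there are no flags to refresh.
            cmds.add("tag2 UID FETCH 1:" + lastSeenUid + " FLAGS");
        }
        return cmds;
    }

    public static void main(String[] args) {
        for (String cmd : commands(42L, "(FLAGS INTERNALDATE)")) {
            System.out.println(cmd);
        }
        // prints:
        // tag1 UID FETCH 43:* (FLAGS INTERNALDATE)
        // tag2 UID FETCH 1:42 FLAGS
    }
}
```

One detail to keep in mind when handling the first response: a UID range ending in * always includes the last message in the mailbox, so the server may return one already known message even when nothing new has arrived.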
