Gmail API all messages - request

I need to get all messages in the Inbox with the Gmail API, but I only see one way to do it.
Get the list of messages (id, threadId):
GET https://www.googleapis.com/gmail/v1/users/somebody%40gmail.com/messages?labelIds=INBOX&key={YOUR_API_KEY}
With those ids, get each message in a loop:
While
GET https://www.googleapis.com/gmail/v1/users/somebody%40gmail.com/messages/147199d21bbaf5a5?key={YOUR_API_KEY}
End of While
But this way requires an enormous number of requests.
Does anybody have an idea how to get all messages (or just the payload field) with a single request?
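For concreteness, here is roughly what this two-step approach looks like with the google-api-python-client library; service is assumed to be an authorized Gmail API client, and the helper names are just illustrative.

    def list_inbox_ids(service):
        """Page through users.messages.list to collect every INBOX message id."""
        ids = []
        request = service.users().messages().list(userId='me', labelIds=['INBOX'])
        while request is not None:
            resp = request.execute()
            ids.extend(m['id'] for m in resp.get('messages', []))
            request = service.users().messages().list_next(request, resp)
        return ids

    def fetch_all(service):
        # One users.messages.get call per id - this is the expensive part.
        return [service.users().messages().get(userId='me', id=i).execute()
                for i in list_inbox_ids(service)]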

Use batching and request 100 messages at a time. You will still need to make about 1000 requests, but the good news is that's quite fine and it'll be easier for everyone (no downloading a 1 GB response in a single request!).
Documented at:
https://developers.google.com/gmail/api/guides/batch
There are a few other people who have asked about batching the Gmail API here on Stack Overflow, so just do a quick search to find answers and examples.
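Here is a minimal sketch of that approach with the google-api-python-client library, assuming service is an authorized Gmail API client and message_ids holds the ids returned by users.messages.list; the helper name is illustrative and the 100-per-batch size follows the suggestion above.

    def fetch_messages_in_batches(service, message_ids, batch_size=100):
        """Download full messages, at most 100 sub-requests per batch call."""
        messages = []

        def collect(request_id, response, exception):
            # Called once per sub-request; skip any that failed.
            if exception is None:
                messages.append(response)

        for start in range(0, len(message_ids), batch_size):
            batch = service.new_batch_http_request(callback=collect)
            for msg_id in message_ids[start:start + batch_size]:
                batch.add(service.users().messages().get(userId='me', id=msg_id))
            batch.execute()

        return messages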

The approach you are taking is correct, as there is no 'GetAll' API to download them all.
Reasons include:
Unbounded Result Sets
Pulling an unlimited number of emails (a.k.a. an unbounded result set) is a resource hog on Google's servers. Did you want the attachments AND images too? That could be gigabytes of data.
Network Problems
Google would have to read gigabytes from disk, store them in memory, and send them over the internet. Google's servers could handle it, but a typical internet connection's bandwidth could not. Worst of all, if you issued this request again and again, you could effectively mount a denial-of-service attack on Google.
Security Risk
If someone obtained another user's API key, they could download that user's entire mailbox. Hence Google provides paging to deliver a more secure service and reduce resource contention.
Therefore, paging is there to protect you, other users, and Google itself.

Related

Is it safe to cache Gmail API message results?

I am iterating over a Gmail account with many thousands of messages. To save resources I am caching the results of users.messages.get. Is it safe to cache this data indefinitely? Will the data it returns ever change? I assume that it will not, but so far I have been unable to find anything definitive in the docs or elsewhere to confirm this.
It can change, though not necessarily in a way that you care about.
The API itself lets you change the labels on a Message and delete messages.
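In other words, a message's payload and body are stable, but its labels (and whether it still exists) are not. A rough caching sketch along those lines, assuming service is an authorized Gmail API client from google-api-python-client and cache is any dict-like store:

    def get_message(service, cache, msg_id):
        """Cache the immutable parts of a message; re-read the mutable labels."""
        if msg_id not in cache:
            cache[msg_id] = service.users().messages().get(
                userId='me', id=msg_id, format='full').execute()
            return cache[msg_id]

        # Labels can change after caching, so refresh just that metadata.
        meta = service.users().messages().get(
            userId='me', id=msg_id, format='minimal').execute()
        cache[msg_id]['labelIds'] = meta.get('labelIds', [])
        return cache[msg_id]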

Method to find API call usage

I am operating in a production environment with a number of different applications using the Amazon MWS API. Some of these are our own home-grown apps, and others are third-party shipping applications.
I have a situation where I am hitting an hourly throttle for the Reports API 'GetReport' request, and I am trying to determine what is causing us to be throttled. By my count, we shouldn't be exceeding ~60 calls per hour at the absolute maximum. (Just a note: while the API documentation says this call throttles at 60 requests per hour, the exception I received indicated a cap of 120 requests per hour. Maybe the exception is wrong and I'm actually hitting a 60-request cap?)
Is there either an API call to determine current call usage, or a method of accessing this information via Amazon Seller Central / the Developers Program? I've done some searching around, but everything I can find describes how the throttling works, which isn't my problem.
I am currently using the C# Amazon MWS libraries for all calls, although that information is a bit superfluous. Any insight into the proper API call to use, or how to gain access to this information, would be greatly appreciated.
Most calls return something like the following in the response headers.
x-mws-quota-max: 60.0
x-mws-quota-remaining: 51.0
x-mws-quota-resetsOn: 2016-03-25T16:00:00.000Z
You should be able to use these to figure out what is causing you to hit the limit quicker than expected. Perhaps log each call along with the quota data above?
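A minimal sketch of that logging idea in Python, assuming resp is an HTTP response object with a dict-like headers attribute (e.g. from the requests library); the C# MWS client should expose the same headers through its response metadata, so the equivalent there is to log those values after each call.

    import logging

    def log_mws_quota(operation, resp):
        # Record the MWS quota headers so throttled operations can be traced.
        logging.info(
            "%s quota-max=%s quota-remaining=%s resets-on=%s",
            operation,
            resp.headers.get("x-mws-quota-max"),
            resp.headers.get("x-mws-quota-remaining"),
            resp.headers.get("x-mws-quota-resetsOn"),
        )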
Contact MWS Support here and ask for clarification on your issue. They certainly track your usage in order to be able to cap it. I met with the MWS team a few months ago in Detroit, and they said to ask them any time you have a technical question. They've been really helpful to me.

Best practices to limit the number of calls to Mirror API

I, like everyone else I imagine, have a courtesy limit of 1000 Mirror API calls per day.
I see there's a batching facility that looks promising, but it appears to batch only requests for a single credential. So even one customer pushing to the API every 60 seconds would use 1,440 requests/day. Ideally, 30 seconds is where I'd like to be, and 2,880 requests/day multiplied by the number of customers gets really big really fast.
I might be missing something, but I don't see a way around that.
If it were available I could glom all updates across all clients in the 30 second period into one giant message...
Is there a better design pattern to keep cards up-to-date with telemetry that's changing in real-time?
You can send requests to multiple users with a single batch request: instead of setting the Authorization header in the batch request, simply set the Authorization header in each sub-request.
Our Python and Java Quick Start projects have an example of using a batch request to send an update to up to 10 users. This is also mentioned in the "Building Glass Services with the Google Mirror API" I/O session.
Otherwise, you can check the protocol documentation in our reference guide.
As Scarygami mentioned, each sub-request still consumes quota, so the only savings are on bandwidth and HTTP round trips, especially if you use gzip encoding.
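A rough sketch of what such a multi-user batch looks like on the wire, built by hand in Python; the batch endpoint URL, the card body, and the function names here are assumptions, so check the protocol documentation above for the exact format.

    import json
    import requests  # third-party HTTP client

    BATCH_URL = "https://www.googleapis.com/batch"  # assumed batch endpoint
    BOUNDARY = "batch_example_boundary"

    def build_part(access_token, card):
        # One sub-request: insert a timeline card, authorized as a single user.
        payload = json.dumps(card)
        return ("--%s\r\n"
                "Content-Type: application/http\r\n\r\n"
                "POST /mirror/v1/timeline HTTP/1.1\r\n"
                "Authorization: Bearer %s\r\n"   # per-user token, not per-batch
                "Content-Type: application/json\r\n"
                "Content-Length: %d\r\n\r\n"
                "%s\r\n" % (BOUNDARY, access_token, len(payload), payload))

    def send_batch(access_tokens, card):
        # Send the same card to every user in a single HTTP request.
        body = "".join(build_part(t, card) for t in access_tokens)
        body += "--%s--" % BOUNDARY
        return requests.post(
            BATCH_URL,
            data=body,
            headers={"Content-Type": "multipart/mixed; boundary=%s" % BOUNDARY})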

Is there a way to make more than 10K requests on Google search from the same IP?

I am currently working on an app that needs to scrape data from Google's search results, for example google.com/search?q=domain.com and so on. But Google blocks my IP address after a number of requests. I know there are Google APIs, but there are many sites around that just scrape the data directly.
Scraping Google search results is a breach of the terms of service. Google actively discourages it and blocks those who do it. They share their information with you free of charge, but they don't appreciate you trying to get a copy of all of it.
Better to do your own crawling of the domain.
Too bad I did not see your question earlier, if it's not too late:
Scraping Google does indeed violate their terms of service; on the other hand, you may choose not to accept them. You would accept their TOS when you create a Google account, for example, but as far as I know you can also withdraw that acceptance (at least when they change the terms).
For a smaller amount of data you can use their API or their commercial API, but if you need the results and rankings exactly as a user would see them (for SEO purposes), I know of no official way to get their permission.
I am not a lawyer, so you might want to consult one if you want to be sure about the legal consequences.
However, scraping Google usually does not lead to any legal problems. I remember that even Bing (Microsoft's engine) got caught scraping Google for unknown keywords a few years ago. My personal guess is that the majority of their original results were secretly copied from Google.
There is an open-source project, http://google-rank-checker.squabbel.com, which can scrape large amounts of Google results. As far as I remember, without modification it is limited to about 50-70k result pages per day.
I suggest taking a look at the code; it's PHP with libcURL.
You will need proper IP addresses (not shared, not previously abused) as well. Scraping with a single IP will result in getting blocked by Google within an hour.
Usually the first thing that happens is a captcha; solving the captcha generates a cookie which allows you to keep making requests.
If you continue anyway, you will get a complete ban.
And if you "hammer" Google with a huge number of requests, you will alert their staff, and they can put a manual ban on the whole ISP or network block.
A reasonable rate is around 10 requests per hour per IP; that's what I have been sticking to in my related projects.
So if you do scrape Google, make sure you have functions that validate the results and watch for unexpected returns, as in the sketch below. In such a case your code should immediately stop accessing Google, so that it does not keep hitting a page that is only showing a captcha.
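A minimal sketch of that safeguard in Python with the requests library; the marker strings and the function name are illustrative assumptions, so tune them to whatever your own validation of normal result pages shows.

    import sys
    import requests

    # Assumed indicators that Google is serving a block or captcha page.
    BLOCK_MARKERS = ("unusual traffic", "/sorry/", "captcha")

    def fetch_page(url):
        # Fetch one result page and stop everything if a block is detected.
        resp = requests.get(url, timeout=30)
        page = resp.text.lower()
        if resp.status_code != 200 or any(m in page for m in BLOCK_MARKERS):
            # Unexpected return: stop immediately instead of hammering the captcha page.
            sys.exit("Blocked or captcha detected - stopping all Google requests.")
        return resp.text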

working with new channel creation limits

Google App Engine seems to have recently made a huge cut to the free quota for channel creation, from 8,640 to 100 per day. I would appreciate some suggestions for optimizing channel creation for a hobby project where I am unwilling to use the paid plans.
It is specifically mentioned in the docs that there can be only one client per channel ID. It would help if there were a way around this, even if it were only for multiple clients on one computer (such as multiple tabs).
It occurred to me that I might be able to simulate channel functionality by repeatedly sending XHR requests to the server to check for new messages, thereby bypassing the limits. However, I fear this method might be too slow. Are there any existing libraries that work on this principle?
One Client per Channel
There's not an easy way around the one client per channel ID limitation, unfortunately. We actually allow two, but this is to handle the case where a user refreshes his page, not for actual fan-out.
That said, you could certainly implement your own workaround for this. One trick I've seen is to use cookies to communicate between browser tabs. Then you can elect one tab the "owner" of the channel and fan out data via cookies. See this question for info on how to implement the inter-tab communication: Javascript communication between browser tabs/windows
Polling vs. Channel
You could poll instead of using the Channel API if you're willing to accept some performance trade-offs. Channel API delivery speed is on the order of 100-200 ms; if you can accept a 500 ms average, then you could poll every second. Depending on the type of data you're sending and how much you can fit in memcache, this might be a workable solution. My guess is that your biggest problem is going to be instance hours.
For example, if you have, say, 100 clients, you'll be looking at 100 qps. You should experiment and see whether you can serve 100 requests per second for the data you need without spinning up a second instance. If not, keep increasing your latency (i.e., decreasing your polling frequency) until one instance is able to serve your requests.
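A rough sketch of that polling fallback on the classic App Engine Python runtime (webapp2 plus memcache); the URL, the memcache key scheme, and the handler name are assumptions, and whatever code currently calls channel.send_message() would instead append to the per-client message list.

    import json
    import webapp2
    from google.appengine.api import memcache

    class PollHandler(webapp2.RequestHandler):
        def get(self):
            client_id = self.request.get('client_id')
            key = 'messages:%s' % client_id
            # Hand back whatever has been queued for this client, then clear it.
            messages = memcache.get(key) or []
            memcache.set(key, [])
            self.response.headers['Content-Type'] = 'application/json'
            self.response.write(json.dumps(messages))

    app = webapp2.WSGIApplication([('/poll', PollHandler)])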
Hope that helps.

Resources