I have around 500k e-mails stored in an Azure Blob Storage account (one e-mail = one document in the blob storage). Now I would like to analyze the content of each of these e-mails with the Azure Cognitive Services Text Analytics API (https://www.microsoft.com/cognitive-services/en-us/text-analytics-api). That works pretty well, but since I need to bulk-process thousands of e-mails, I am wondering what the best way to do so would be. Is there another Azure analytics product that could help me with this? Or do I just create an Azure Function that takes a document and does the processing?
The Text Analytics API allows you to send up to 1000 records at a time. You can submit up to 100 requests per minute.
You can see an example of how to send a small batch here. It would not be hard to modify it to send 1000 records instead of 3.
I would suggest adding some kind of delay between batch calls so that you effectively send fewer than 100 requests per minute.
Also, note that the payload of a batch request cannot exceed 1 MB, so depending on the size of the e-mails you want to analyze, you may need to make each batch smaller than 1000 documents.
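A minimal sketch of the batching and pacing described above, assuming each e-mail is already loaded as an `(id, text)` pair and that the request body has the shape `{"documents": [{"id", "language", "text"}]}`. The `send` callable is hypothetical and stands in for whatever HTTP client posts one batch to your Text Analytics endpoint; the `"en"` language code and the 90 requests/minute target are also assumptions.

```python
import json
import time
from typing import Callable, Iterable, Iterator

MAX_DOCS_PER_BATCH = 1000        # records per request, per the answer above
MAX_PAYLOAD_BYTES = 1_000_000    # 1 MB payload cap, per the answer above
TARGET_REQUESTS_PER_MINUTE = 90  # assumed: deliberately below the 100/min cap

# Size of the empty envelope '{"documents":[]}' when serialized compactly.
ENVELOPE_BYTES = len(json.dumps({"documents": []}, separators=(",", ":")))


def to_payload(batch: list) -> bytes:
    """Serialize one batch into the (assumed) request body."""
    return json.dumps({"documents": batch}, separators=(",", ":")).encode("utf-8")


def make_batches(emails: Iterable[tuple],
                 max_docs: int = MAX_DOCS_PER_BATCH,
                 max_bytes: int = MAX_PAYLOAD_BYTES) -> Iterator[list]:
    """Group (id, text) pairs into batches under both the count and size caps."""
    batch: list = []
    size = ENVELOPE_BYTES
    for doc_id, text in emails:
        doc = {"id": doc_id, "language": "en", "text": text}  # "en" is assumed
        doc_bytes = len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
        if ENVELOPE_BYTES + doc_bytes > max_bytes:
            raise ValueError(f"document {doc_id!r} alone exceeds {max_bytes} bytes")
        # One extra byte for the comma separating this doc from the previous one.
        extra = doc_bytes + (1 if batch else 0)
        if batch and (len(batch) >= max_docs or size + extra > max_bytes):
            yield batch
            batch, size = [], ENVELOPE_BYTES
            extra = doc_bytes
        batch.append(doc)
        size += extra
    if batch:
        yield batch


def send_all(batches: Iterable[list],
             send: Callable[[bytes], object],
             requests_per_minute: int = TARGET_REQUESTS_PER_MINUTE) -> None:
    """Call `send` once per batch, spacing the calls to respect the rate target."""
    min_interval = 60.0 / requests_per_minute
    last_call = None
    for batch in batches:
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        send(to_payload(batch))
```

Checking both caps in one loop closes a batch as soon as either the 1000-document or the 1 MB limit would be crossed, so batches of long e-mails simply come out smaller.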
Luis Cabrera | Text Analytics PM | Microsoft Corporation
Related
I have an Express API application running on GAE. My understanding is that every time someone makes a GET request to it, a log entry is created. In the Logs Explorer (Operations -> Logging -> Logs Explorer), I can filter to view only GET requests from a certain source with this query:
protoPayload.method="GET"
protoPayload.referrer="https://fakewebsite.com/"
In the top-right of the Logs Explorer, I can also select a time range of 1 day to view logs from the past 24 hours.
I want to be able to see how many GET requests the app receives from a given referrer every day. Is this possible? And is there functionality to display the daily logs in a bar chart (say, to easily visualize how many logs I get every day over the period of a week)?
You can achieve this, but not directly in the Cloud Logging service. You have made a good start by creating a custom filter.
Then, create a sink and save the logs to BigQuery.
You can then run a query to count the GET requests per day and build a Data Studio dashboard to visualise them.
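A minimal sketch of the daily count. The SQL string assumes a hypothetical table `my_project.my_dataset.request_log` whose nested columns mirror the filter fields (`protoPayload.method`, `protoPayload.referrer`), so check the schema your sink actually produces before running it. The Python function applies the same logic to log entries already loaded as dicts (for example, parsed from a JSON export), and the plain-text bar chart is only a stand-in for a Data Studio chart.

```python
from collections import Counter

# Hypothetical query: table name and column paths are assumptions.
DAILY_GET_COUNT_SQL = """
SELECT DATE(timestamp) AS day, COUNT(*) AS get_requests
FROM `my_project.my_dataset.request_log`
WHERE protoPayload.method = 'GET'
  AND protoPayload.referrer = @referrer
GROUP BY day
ORDER BY day
"""


def count_daily(entries, referrer):
    """Count GET requests per UTC day for one referrer (same logic as the SQL)."""
    counts = Counter()
    for entry in entries:
        payload = entry.get("protoPayload", {})
        if payload.get("method") == "GET" and payload.get("referrer") == referrer:
            # Timestamps are assumed to be RFC 3339 strings in UTC, e.g.
            # "2021-03-01T10:00:00Z", so the first 10 characters are the day.
            counts[entry["timestamp"][:10]] += 1
    return dict(sorted(counts.items()))


def bar_chart(daily):
    """Render {day: count} as a plain-text bar chart, one row per day."""
    return "\n".join(f"{day} {'#' * n} {n}" for day, n in daily.items())
```

`DATE(timestamp)` in BigQuery uses UTC unless you pass a time zone, which matches slicing the UTC timestamp string in the Python version.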
If you only need the daily count, you can create a sink that streams the data directly into BigQuery. Because the data needs to be segregated by day, a better option when creating the sink is to use a partitioned table, which helps in two ways:
You get a new partition every day, which works much like having a new table every day.
Although BigQuery provides a free tier, if this data is not needed in the near future, storing it this way reduces both storage cost (a partition that is not modified for 90 consecutive days is billed at the cheaper long-term storage rate) and query cost (a query that filters on the date scans only the matching partitions).
BigQuery integrates with Data Studio: as soon as you run a query on the table, you have the option to explore the results in Data Studio and generate reports as needed.
I am running a people search using the Microsoft Graph endpoint https://graph.microsoft.com/V1.0/users.
I am able to get all the textual data I need, but is there a way to get the photo for each returned user in a single call? If the search returns 10 users, executing 10 separate requests to get each photo by user ID would be a challenge.
It isn't possible to fetch both a user's data and photo in a single call, since they are different content types (application/json vs image/jpeg).
Marc is spot on here. However, you should also check out the new batching feature (note that this is still in /beta), which would allow you to get up to 5 photos in one request round trip. See https://developer.microsoft.com/en-us/graph/docs/concepts/json_batching. We'd love to get your feedback on this.
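A minimal sketch of building JSON batch bodies for the photos and unpacking the reply. It assumes the `/users/{id}/photo/$value` path, the `requests`/`responses` body shape from the batching docs linked above, and that binary photo bodies come back base64-encoded inside the JSON response; verify all three against the current documentation. Each body would be POSTed to the `$batch` endpoint (`https://graph.microsoft.com/beta/$batch` at the time of the answer) with your usual bearer token, which the sketch leaves out.

```python
import base64
from typing import Dict, Iterator, List

PHOTOS_PER_BATCH = 5  # limit mentioned in the answer for the beta; check current docs


def photo_batches(user_ids: List[str], limit: int = PHOTOS_PER_BATCH) -> Iterator[dict]:
    """Yield JSON batch bodies, each asking for up to `limit` user photos."""
    for start in range(0, len(user_ids), limit):
        chunk = user_ids[start:start + limit]
        yield {
            "requests": [
                # Request ids only need to be unique within one batch.
                {"id": str(i), "method": "GET", "url": f"/users/{uid}/photo/$value"}
                for i, uid in enumerate(chunk, start=1)
            ]
        }


def photos_from_response(batch: dict, response: dict) -> Dict[str, bytes]:
    """Map user id -> photo bytes for every sub-request that returned 200."""
    # "/users/<id>/photo/$value".split("/") -> ["", "users", "<id>", "photo", "$value"]
    user_by_request = {r["id"]: r["url"].split("/")[2] for r in batch["requests"]}
    photos = {}
    for item in response["responses"]:
        if item.get("status") == 200:
            # Assumed: binary bodies are base64-encoded inside the JSON reply.
            photos[user_by_request[item["id"]]] = base64.b64decode(item["body"])
    return photos
```

Users without a photo come back with a non-200 status and are simply skipped, so 10 search results turn into two batch round trips instead of 10 separate photo requests.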
I am planning to create a question-and-answer website. It will gain more functionality over time, so the number of requests to the website is expected to keep growing.
For example, there may be 2000 or more requests per second. Processing each request requires some validation, such as a username/password check. After the request is validated, the data must be saved to the database.
I also want to know how to measure the database write speed when many users send requests to the server at the same time.
Please let me know. TIA...
We would like to keep Salesforce synced with data from our organization's back-end. The organizational data gets updated by nightly batch processes, so "real-time" syncing to Salesforce isn't in view. We intend to refresh Salesforce nightly, after our batch processes complete.
We will have somewhere around 1 million records in Salesforce (some are Accounts, some are Contacts, and some belong to custom objects).
We want the refresh to be efficient, so it would be nice to send only updated records to Salesforce. One thought is to use Salesforce's Bulk API to first get all records, then compare to our data, and only send updated records to Salesforce. But this might be an expensive GET.
Another thought is to just send all 1 million records through the Bulk API as upserts to Salesforce - as a "full refresh".
What we'd like to avoid is the burden/complexity of keeping track of what's in Salesforce ourselves (i.e. tables that attempt to reflect what's in Salesforce, so that we can determine the changes to send to Salesforce).
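The compare step in the first option could be sketched as below. It assumes both sides are keyed by a shared external ID and that the Salesforce export has already been parsed into dicts; `External_Id__c`, `Name` and `Phone` are hypothetical field names. Values are compared as text here on the assumption that the export returns strings, and the changed records are written as CSV for an upsert job.

```python
import csv
import io


def _norm(value):
    """Treat missing/None as empty text so both sides compare consistently."""
    return "" if value is None else str(value)


def changed_records(local, remote, fields):
    """Return local records that are missing from Salesforce or differ on `fields`.

    `local` and `remote` map an external ID to a dict of field values.
    """
    changed = []
    for key, record in local.items():
        current = remote.get(key)
        if current is None or any(
            _norm(record.get(f)) != _norm(current.get(f)) for f in fields
        ):
            changed.append(record)
    return changed


def to_upsert_csv(records, fields):
    """Serialize records to CSV text, one column per field (None -> empty)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

This keeps no long-lived copy of Salesforce on your side: the export is pulled, compared and discarded each night, at the cost of the expensive full read mentioned above.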
I read somewhere that the Salesforce API has a 10 request limit. If we write code to integrate with Salesforce:
1. What is the risk posed by this limit?
2. How can we write code to negate this risk?
My real concern is that I don't want to build our customer this great standalone website that integrates with Salesforce only to have users 11 and 12 kicked out to wait until requests 1-10 are complete.
Edit:
More details on the specifics of this limitation can be found at http://www.salesforce.com/us/developer/docs/api/Content/implementation_considerations.htm; see the section titled "Limits".
"Limits
There is a limit on the number of queries that a user can execute concurrently. A user can have up to 10 query cursors open at a time. If 10 QueryLocator cursors are open when a client application, logged in as the same user, attempts to open a new one, then the oldest of the 10 cursors is released. This results in an error in the client application.
Multiple client applications can log in using the same username argument. However, this increases your risk of getting errors due to query limits.
If multiple client applications are logged in using the same user, they all share the same session. If one of the client applications calls logout(), it invalidates the session for all the client applications. Using a different user for each client application makes it easier to avoid these limits."
Not sure which limit you're referring to, but the governor limits are all listed in the Apex documentation. These limits apply to code running in a given Apex transaction (e.g. in response to a trigger or web service call), so adding more users won't hurt you: each transaction gets its own allocation of resources.
There are also limits on the number of long-running concurrent API requests and total API calls in a day. Most of these are per-license, so, again, as the number of users rises, so do the limits.
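On the 10-cursor limit quoted in the question: one way to keep a single integration user under it is to cap how many queries your own code runs at once as that user. A minimal sketch using a semaphore; `run_query` is a hypothetical callable that performs one query and returns only after it has finished reading its results. Since the limit is per user, you would create one gate per integration user.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_OPEN_CURSORS_PER_USER = 10  # limit quoted from the Salesforce docs above


class CursorGate:
    """Caps concurrent queries for one integration user (hypothetical helper)."""

    def __init__(self, limit: int = MAX_OPEN_CURSORS_PER_USER):
        self._slots = threading.BoundedSemaphore(limit)

    def run(self, run_query: Callable[[], T]) -> T:
        # Blocks while `limit` queries are already in flight for this user,
        # instead of opening an 11th cursor and invalidating the oldest one.
        with self._slots:
            return run_query()
```

Extra website users then wait briefly for a free slot rather than causing an error in someone else's request.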
A few comments on:
I don't want to build our customer this great standalone website that integrates with Salesforce only to have users 11 and 12 kicked out to wait until requests 1-10 are complete.
Besides the API call limits mentioned in metadaddy's answer (which are easy to hit if you make a lot of queries), there are two major things to consider when planning a real-time Sfdc integration:
Sfdc has routine maintenance outage periods.
Querying Sfdc will always be significantly slower than querying a local data source.
You may want to consider keeping a local mirror to which you replicate your Sfdc data.
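A minimal sketch of the local-mirror idea. `fetch_all` is a hypothetical callable that pulls the records you replicate (for example, from a scheduled export) and returns `{record_id: record}`; the point is that website reads hit the local copy, and a failed refresh during a maintenance window leaves the previous copy in place.

```python
import time
from typing import Callable, Dict, Optional


class LocalMirror:
    """Serves reads from a local copy of Salesforce data (hypothetical design)."""

    def __init__(self, fetch_all: Callable[[], Dict[str, dict]]):
        self._fetch_all = fetch_all
        self._records: Dict[str, dict] = {}
        self.last_refreshed: Optional[float] = None

    def refresh(self) -> bool:
        """Replace the local copy; on failure keep serving the previous copy."""
        try:
            snapshot = self._fetch_all()
        except Exception:
            # e.g. Salesforce is in a maintenance window: keep the stale copy.
            return False
        self._records = dict(snapshot)
        self.last_refreshed = time.time()
        return True

    def get(self, record_id: str) -> Optional[dict]:
        # Reads never call Salesforce, so they are fast and survive outages.
        return self._records.get(record_id)
```

The trade-off is staleness: data is only as fresh as the last successful `refresh()`, which `last_refreshed` lets you surface or monitor.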
Cheers,
Tymek
All API usage limits are calculated over a 24-hour period.
Limits apply to the whole organization, so if several users connect through the API, all of their calls count against the same limit.
You get 1,000 API requests per Salesforce user. Even Unlimited Edition is actually limited to 5,000.
To check your current API usage status, go to Your Name | Setup | Company Profile | Company Information.
You can purchase additional API calls.
You can read more in the Salesforce API Limits documentation.
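As an alternative to the Setup page, the Salesforce REST API has a Limits resource (`/services/data/vXX.X/limits/`) whose JSON includes a `DailyApiRequests` entry with `Max` and `Remaining` counts. A small sketch of turning that into a usage figure, assuming the response has already been fetched and parsed into a dict; the 90% back-off threshold is a hypothetical policy, not a Salesforce rule.

```python
from typing import Tuple


def daily_api_usage(limits: dict) -> Tuple[int, int, float]:
    """Return (used, maximum, percent_used) from a parsed Limits response."""
    daily = limits["DailyApiRequests"]
    maximum = daily["Max"]
    used = maximum - daily["Remaining"]
    percent = 100.0 * used / maximum if maximum else 0.0
    return used, maximum, percent


def should_throttle(limits: dict, threshold_percent: float = 90.0) -> bool:
    """Hypothetical policy: back off once the org-wide usage crosses a threshold."""
    return daily_api_usage(limits)[2] >= threshold_percent
```

Because the limit is shared across the whole organization, checking it from the integration before large jobs helps avoid starving other API users.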