Fetch more than 2000 entities from the datastore - google-app-engine

I tried to fetch data from the datastore, but I can only fetch a limited amount. What is the maximum number of results that can be fetched from the datastore with a Query? Also, can I add more than 2000 entities to the datastore using a loop or some other way? And how do I write code that fetches 2000+ entities from the datastore?
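As far as I know, the old 1,000-result limit on datastore queries was removed long ago (around SDK 1.3.6); in practice you are limited by request deadlines, so fetch large result sets in pages using query cursors, and batch your writes. A minimal sketch with ndb, where Item is a hypothetical kind:

from google.appengine.ext import ndb

class Item(ndb.Model):
    pass  # hypothetical kind; substitute your own model

def fetch_everything(page_size=1000):
    # Page through the full result set with query cursors.
    results, cursor, more = [], None, True
    while more:
        page, cursor, more = Item.query().fetch_page(
            page_size, start_cursor=cursor)
        results.extend(page)
    return results

def put_many(entities, batch_size=500):
    # Writing 2000+ entities: batch the puts instead of one put() per loop pass.
    for i in range(0, len(entities), batch_size):
        ndb.put_multi(entities[i:i + batch_size])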

Related

Get Firestore documents from cache by default and only from Firestore if they differ from cache [duplicate]

In my app I need to get a lot of documents every time a user starts the app.
These documents can only be edited by the user himself, not by other users. So the only way the documents on Firestore can differ from the cache is if the user logged in on another device with his account, where the newest data is not cached yet.
I am now looking for a way to get these documents from the cache by default, to save reads, and only fetch them from Firestore again if they differ from the cache.
Is this possible with Firestore?
Note: I am using Flutter for app development
This is nothing Firestore supports out of the box. You could achieve it by defining one extra "status" document per user, which logs the latest user activity across all devices. You can then compare the dataset on Firestore with your local cache (maybe using hashing) to determine whether you need to fetch new data; a sketch follows.
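A minimal sketch of that status-document idea, written in Python with the google-cloud-firestore client for brevity (the Flutter calls are analogous); the collection and field names are hypothetical:

from google.cloud import firestore

db = firestore.Client()

def touch_status(uid):
    # Every device updates the user's status doc whenever it writes data.
    db.collection("status").document(uid).set(
        {"last_write": firestore.SERVER_TIMESTAMP}, merge=True)

def needs_server_fetch(uid, cached_last_write):
    # One small read decides whether the big fetch can come from cache.
    snap = db.collection("status").document(uid).get()
    return snap.get("last_write") != cached_last_write

On the Flutter side you would read the big collection from cache when needs_server_fetch returns false, and from the server otherwise.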

Azure Logic Apps Get Request cache result

I created a Logic App with an HTTP GET request that retrieves data from a weather API.
What I would like to achieve is to reduce the calls to the weather API by using a cached result for identical requests.
Example: in my company there are 300 devices calling the Logic App endpoint with the same latitude and longitude in the query. As I understand it, the Logic App currently makes a call to the weather API for every device call. Instead, I'd like it to call the weather API just the first time and then, for all the identical calls, return the cached result.
I'm afraid that, if I use cache-control settings in the request header, the Logic App would return the same cached result even when the query is different (for example, a different location).
Thanks.
As @Thomas said in the comments above, API Management is more expensive than other App Service offerings, like Logic Apps.
However, in my experience, you can implement the caching logic cheaply with a little code. For example, Azure Table Storage is a cheap and readily available place to store the cached weather data, and you can fetch entries via the table partition key and row key built from the datetime, latitude, and longitude query parameters.
Here is a simple sketch of the cache logic (runnable Python using the azure-data-tables package; the remote API call stays a placeholder):

from azure.core.exceptions import ResourceNotFoundError
from azure.data.tables import TableClient

# conn_str is your storage account connection string.
table = TableClient.from_connection_string(conn_str, table_name="weathercache")

def get_weather(date, latitude, longitude):
    partition_key = date                      # e.g. one partition per datetime
    row_key = f"{latitude}-{longitude}"
    try:
        return table.get_entity(partition_key, row_key)   # cache hit
    except ResourceNotFoundError:
        data = get_weather_data_from_remote_api(latitude, longitude)
        table.create_entity({"PartitionKey": partition_key,
                             "RowKey": row_key, **data})  # fill the cache
        return data
Also, you can use other storage services like Azure SQL Database or Redis instead of Azure Table Storage. It's up to you.
Hope it helps.

Cloudant CDTDatastore to pull only part of the database

We're using Cloudant as the remote database for our app. The database contains documents for each user of the app. When the app launches, we need to query the database for all the documents belonging to a user. What we found is that the CDTDatastore API only allows pulling the entire database, storing it inside the app, and then performing the query on the local copy. The initial pull to the local datastore takes about 10 seconds, and I imagine it will take longer as more users are added.
Is there a way I can save only part of the remote database to the local datastore? Or, are we using the wrong service for our app?
You can use a server-side replication filter function; you'll need to add information about your filter to the pull replicator. However, replication takes a performance hit when using a filter function; see the sketch after this answer.
That being said, a common pattern is to use one database per user. However, this has other trade-offs, and it is something you should read up on. There is some information on the one-database-per-user pattern here.
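For reference, a replication filter is just an entry in a design document on the Cloudant side. A minimal sketch of creating one over plain HTTP (the account URL, credentials, database name, and owner field are all hypothetical); the pull replication would then be configured with the filter name "user_filter/by_owner" and {"owner": <user id>} as its parameters:

import requests

ACCOUNT = "https://ACCOUNT.cloudant.com"   # hypothetical account URL
AUTH = ("user", "password")                # hypothetical credentials

# The filter passes only documents owned by the requesting user.
design_doc = {
    "filters": {
        "by_owner":
            "function(doc, req) { return doc.owner === req.query.owner; }"
    }
}
requests.put(ACCOUNT + "/mydb/_design/user_filter",
             json=design_doc, auth=AUTH)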

Is there a way to be notified when an entity in memcache is about to be dropped?

Is there a way to know when an entity stored in memcache is about to be dropped, so that it can be persisted to a more permanent datastore?
I'd be very surprised if there is; memcache is designed to be a temporary store that caches data from another source. The normal pattern is: when you need the data, check memcache first, and only go to the persistent store on a miss. On an update, either write the new value to memcache in addition to the persistent store, or delete the memcache value and write only to the permanent store, so the value gets pulled from the permanent store on the next read. With this paradigm, you should always expect that memcache may or may not contain your data.
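A minimal sketch of that read-through/write-through pattern on App Engine, where Record is a hypothetical ndb kind:

from google.appengine.api import memcache
from google.appengine.ext import ndb

class Record(ndb.Model):              # hypothetical entity kind
    value = ndb.StringProperty()

def get_record(key_id):
    record = memcache.get(key_id)
    if record is not None:
        return record                         # cache hit
    record = Record.get_by_id(key_id)         # miss: go to the datastore
    if record is not None:
        memcache.add(key_id, record)          # repopulate the cache
    return record

def update_record(key_id, value):
    record = Record(id=key_id, value=value)
    record.put()                              # permanent store first
    memcache.set(key_id, record)              # then refresh the cache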
As jdavidbakr pointed out, there is no way to be notified.
The best practice for memcache is to use ndb if you're in Python: ndb already checks memcache before going to the datastore, so it will hit memcache whenever memcache has the requested data.
If you're in a language that doesn't have ndb, I would suggest using task queues to insert data into the datastore (i.e. "push the entity to memcache and create a task to push it to the datastore"). For retrieval, look in memcache first and fall back to the datastore if the entity isn't there. Whenever you had to visit the datastore to get the data, push it back into memcache. A sketch of the write path follows.
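A sketch of that task-queue write path, reusing the hypothetical Record kind from the earlier sketch (deferred is App Engine's convenience wrapper around the task queue API):

from google.appengine.api import memcache
from google.appengine.ext import deferred

def save_record(key_id, value):
    memcache.set(key_id, value)              # readers see the new value now
    deferred.defer(_persist, key_id, value)  # datastore write happens on a task queue

def _persist(key_id, value):
    Record(id=key_id, value=value).put()     # executed later by the queue worker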

Recommended architecture for bulk-refreshing Salesforce?

We would like to keep Salesforce synced with data from our organization's back-end. The organizational data gets updated by nightly batch processes, so "real-time" syncing to Salesforce isn't in view. We intend to refresh Salesforce nightly, after our batch processes complete.
We will have somewhere around 1 million records in Salesforce (some are Accounts, some are Contacts, and some belong to custom objects).
We want the refresh to be efficient, so it would be nice to send only updated records to Salesforce. One thought is to use Salesforce's Bulk API to first get all records, compare them to our data, and send only the updated records to Salesforce. But that might be an expensive GET.
Another thought is to just send all 1 million records through the Bulk API as upserts to Salesforce - as a "full refresh".
What we'd like to avoid is the burden/complexity of keeping track of what's in Salesforce ourselves (i.e. tables that attempt to reflect what's in Salesforce, so that we can determine the changes to send to Salesforce).
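For what it's worth, the "full refresh" upsert option is straightforward to sketch with the simple-salesforce Python package. The external-ID field Backend_Id__c below is hypothetical, and Bulk API batches top out at 10,000 records:

from simple_salesforce import Salesforce

sf = Salesforce(username=USER, password=PASSWORD, security_token=TOKEN)

# Upserting on an external-ID field lets Salesforce decide insert vs. update,
# so no local bookkeeping of what already exists is needed.
records = [
    {"Backend_Id__c": "A-1001", "LastName": "Smith", "Email": "smith@example.com"},
    # ... the rest of the nightly extract
]
results = sf.bulk.Contact.upsert(records, "Backend_Id__c", batch_size=10000)
failures = [r for r in results if not r["success"]]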
