Cloudant CDTDatastore to pull only part of the database

We're using Cloudant as the remote database for our app. The database contains documents for each user of the app. When the app launches, we need to query the database for all the documents belonging to a user. What we found is that the CDTDatastore API only allows pulling the entire database, storing it inside the app, and then running the query on the local copy. The initial pull to the local datastore takes about 10 seconds, and I imagine it will take longer as more users are added.
Is there a way I can save only part of the remote database to the local datastore? Or are we using the wrong service for our app?

You can use a server-side replication filter function; you'll need to add information about your filter to the pull replicator. Be aware, however, that replication takes a performance hit when a filter function is used.
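As a rough sketch, the server side looks like this: the filter is a JavaScript function stored in a design document. Here it's uploaded with the python-cloudant client; the owner field, database name, and credentials are all assumptions about your setup.

    # Hedged sketch: install a replication filter on the Cloudant database
    # using the python-cloudant client. Filter functions themselves are
    # JavaScript; "owner" and every other name below are placeholders.
    from cloudant.client import Cloudant

    client = Cloudant("USERNAME", "PASSWORD",
                      url="https://ACCOUNT.cloudant.com", connect=True)
    db = client["mydb"]
    db.create_document({
        "_id": "_design/app",
        "filters": {
            # Pass only documents owned by the user named in the
            # replication request's query string.
            "by_user": "function(doc, req) {"
                       "  return doc.owner === req.query.owner; }"
        }
    })
    client.disconnect()

On the device you then point the pull replication at the filter (here the name would be "app/by_user") and supply the user as a filter parameter; CDTPullReplication takes a filter name and a dictionary of filter parameters for this.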
That being said, a common pattern is to use one database per user; this has other trade-offs, though, and is something you should read up on. There is some information on the one-database-per-user pattern here.

Related

The best GCP architecture for exporting BigQuery data to an external application with an API

I use the following GCP products together for a CRM system:
Cloud SQL
App Engine
BigQuery
Once a week an external application exports data from BigQuery in this way:
The external application makes a request to App Engine with a token.
App Engine retrieves the permissions for this token from Cloud SQL and does some additional computation to obtain a list of allowed IDs.
App Engine runs a BigQuery query filtered by these IDs, something like: SELECT * FROM table WHERE id IN (ids)
App Engine responds to the external application with the unmodified query result as JSON.
The problem is that while the export doesn't happen very often, the amount of data can be large, and I don't want to load App Engine down with this data. What other GCP products would be useful in this case? Remember that I need to retrieve permissions via App Engine and Cloud SQL.
It's unclear whether the JSON comes directly from the BigQuery query results or whether you do additional processing in the application to render/format it. I'm assuming direct results.
An option that comes to mind is to leverage Cloud Storage. You can use the signed URL feature to provide a time-limited link to your (potentially large) results without making them publicly accessible.
This, coupled with BigQuery's ability to export results to GCS (either via an export job or via the newer EXPORT DATA SQL statement), allows you to run a query and deliver the results directly to GCS.
With this, you could simply redirect the user to the signed URL at the end of your current flow. There are complementary features here as well, such as using GCS lifecycle rules to age out and delete files automatically, so you don't need to worry about a slow accumulation of old results.
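As a minimal sketch of that flow in Python (bucket, table, and ID values are placeholders; assumes the google-cloud-bigquery and google-cloud-storage libraries):

    # Hedged sketch: EXPORT DATA writes the filtered results straight to GCS,
    # then a v4 signed URL gives time-limited access to the file.
    import datetime
    from google.cloud import bigquery, storage

    bq = bigquery.Client()
    export_sql = """
    EXPORT DATA OPTIONS(
      uri = 'gs://MY_BUCKET/exports/result-*.json',
      format = 'JSON',
      overwrite = true
    ) AS
    SELECT * FROM `my_project.my_dataset.my_table`
    WHERE id IN UNNEST(@ids)
    """
    job = bq.query(
        export_sql,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ArrayQueryParameter("ids", "INT64", [1, 2, 3])
            ]
        ),
    )
    job.result()  # wait for the export to complete

    # Sign a URL for the exported shard (a single shard is assumed here).
    blob = storage.Client().bucket("MY_BUCKET").blob(
        "exports/result-000000000000.json")
    url = blob.generate_signed_url(
        expiration=datetime.timedelta(hours=1), version="v4")

App Engine then only ever returns the short signed URL, never the data itself.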

What are the best practices for building a REST API with different subscribers (companies)?

What is the best design approach, in terms of security, performance, and maintenance, for a REST API that has many subscribers (companies)?
Which approach is best?
Build a general API plus a sub-API for each subscriber (company)? When a request comes in, we check it and forward it to the sub-API using an API key, then return the data through the general API to the client.
Should we build a single API with many databases, one per subscriber's (company's) data? (Each company has a huge number of records, which is why we suggest separate databases to improve performance.) When a request comes in, we verify it and switch the database connection string based on the client.
Should we build one API and one big database that handles all subscribers' data?
Do you suggest another approach to this problem? We use Web API, MS SQL Server, and Azure Cloud.
In the past I've had one API, secured using OAuth/JWT, with a company ID carried in the token. When a request comes in, we read the company ID from the JWT and perform a lookup in a master database; this database holds global information, such as the connection string for each company. We then create a unit of work that has the company's connection string associated with it, and any database lookups use that.
This means you can start with one master database and one node database; when the node database starts getting overloaded, you can bring up another one and either add new companies to it or move existing companies over to take the pressure off. Essentially, you're just scaling out as the need arises.
We had no performance issues with this setup.
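In outline, that lookup can be as simple as the sketch below (SQLAlchemy, SQLite, and all table/claim names are illustrative stand-ins, not the stack the setup above actually used):

    # Hedged sketch: map the company id carried in the JWT to that tenant's
    # connection string via a master database, caching one engine per tenant.
    import sqlalchemy

    # The master DB was SQL Server in the setup described above; SQLite keeps
    # this sketch self-contained.
    master = sqlalchemy.create_engine("sqlite:///master.db")

    _engines = {}  # one engine per tenant, created on first use

    def engine_for(jwt_claims):
        company_id = jwt_claims["company_id"]  # custom claim, assumed here
        if company_id not in _engines:
            with master.connect() as conn:
                conn_str = conn.execute(
                    sqlalchemy.text("SELECT connection_string FROM tenants "
                                    "WHERE company_id = :cid"),
                    {"cid": company_id},
                ).scalar_one()
            _engines[company_id] = sqlalchemy.create_engine(conn_str)
        return _engines[company_id]

Every unit of work for the request is then opened against the engine this returns.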
Depending on the transaction volume and the nature of the data, you can go for a single database or a separate database for each company.
Option 2 would be best if you have a complex data model.
I don't see any advantage in going for option 1, because the general API will be called for each request anyway.
You can use client ID verification while issuing access tokens.
What I understood from your question is that you want a REST API for multiple consumers (companies). Logically, the employees of each company will consume your API; those employees may be admins, HR, etc. For such a scenario I suggest you go with a single REST API to provide services to your consumers and, for security, use OpenID Connect on top of OAuth 2. This takes care of authentication and authorization for you.

How to know when a put on Cloud Datastore in App Engine reaches Milestone B?

I have an application that uses Cloud Datastore via App Engine to save data.
I need to refresh the clients when an object is put in the database. To do this, after the object is put in the database, the server sends a sync message to the clients. The clients read the sync message and make a query to the server, and the server runs a query to return the new results.
The problem is that when the query runs, the object that was just put doesn't appear in the query results; on a later call, the object does appear. Reading the documentation, I suppose the reason is that the put hasn't yet reached Milestone B; see https://cloud.google.com/appengine/articles/transaction_isolation.
How can I know when a put reaches "Milestone B"? If it isn't possible to know, how can I implement this logic (refresh the clients after a put)?
You can ensure up-to-date query results by using an ancestor query, or, if you know the key of the specific entity you need to retrieve, you can fetch it by key rather than using a query.
This page discusses the trade-offs of using ancestor queries.
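For example, with the Python ndb client (the model and key names here are made up):

    # Hedged sketch: two read patterns that are strongly consistent, unlike
    # a plain global query.
    from google.appengine.ext import ndb

    class Message(ndb.Model):
        text = ndb.StringProperty()

    board = ndb.Key("Board", "general")   # shared ancestor = one entity group

    key = Message(parent=board, text="hello").put()

    # 1) Ancestor queries see writes to their entity group immediately.
    fresh = Message.query(ancestor=board).fetch()

    # 2) Fetching by key is always strongly consistent.
    same = key.get()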
The data does not appear in the results of your query because the indexes have not been updated yet.
There is some latency before the indexes are updated, and unfortunately there is no way to know when this will happen.
The only way to handle this case is to use the entity's key, which is the only index guaranteed to be up to date as soon as the entity is stored.
https://cloud.google.com/appengine/docs/java/datastore/entities

What is the difference between a session store and a database?

I've been trying to implement authentication and session management in a Node.js application using Socket.IO.
In almost all the resources I found, I came across the term "session store".
There are open-source tools that handle sessions for us, but we have to provide them with a session store.
Some tools have built-in in-memory session storage; for example, the express-session module comes with a default in-memory session store, but also this warning:
Warning The default server-side session storage, MemoryStore, is purposely not designed for a production environment. It will leak memory under most conditions, does not scale past a single process, and is meant for debugging and developing.
So I searched for the available stable session stores, and it turns out that most of the names are databases I've heard of.
For example, here's a list of session stores, and another one on GitHub, that I came across.
The names include MongoDB, MySQL, SQLite, Cassandra, Firebase, etc., hence the confusion.
So the question is: are session stores and databases the same? (I can think of it like this: when we use a database for storing session details, we call it a session store, but it is in fact a database.)
If not, how do they differ?
A session store is a place where session data is stored on the server.
On the web, a session is usually identified by a cookie stored in the client's browser.
This allows your app to identify a user and, for example, keep them logged in.
A session can be stored in memory, in a database, in plain files, or in any other place you can come up with for storing session data.
If your project already uses a database, you can configure your session store to use that same database, to avoid running another database on the server just for the session store.
Differences between session stores:
A memory session store is reset every time the app relaunches. It is also the fastest.
A database session store survives app relaunches. At some point you will have a lot of session objects, which you may want to clean up, and a session stored in a database can even be accessed from different apps.
A session store is a way of storing information about a user as a session with a unique identifier. It can live in memory or in a database. Socket.IO can reuse the same session (ID) used in an Express app via the socket-express-session package, if I am not mistaken.
You can then use session information to grant/restrict access, for example.
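To make the distinction concrete, here is a minimal Python sketch of what any session store boils down to; it is purely illustrative, not how express-session is actually implemented:

    # Hedged sketch: a session store is keyed server-side storage. Only the
    # id travels to the client (as a cookie); the data stays on the server.
    import uuid

    class MemorySessionStore:
        """Fast, but wiped on restart and limited to a single process."""

        def __init__(self):
            self._sessions = {}

        def create(self, data):
            sid = uuid.uuid4().hex      # sent to the client in a cookie
            self._sessions[sid] = data
            return sid

        def get(self, sid):
            return self._sessions.get(sid)

        def destroy(self, sid):
            self._sessions.pop(sid, None)

Swap the dict for a MongoDB collection, a MySQL table, or a Redis hash and you have exactly the database-backed stores those lists enumerate; the interface stays the same, only the persistence changes.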

GAE datastore -- proper ways to implement search/data retrieval in response to a user request?

I am writing a web app and I am trying to improve the performance of search/displaying results. I am relatively new to programming this sort of thing, so I apologize in advance if these are simple questions/concepts.
Right now I have a database of ~20,000 sites, each with properties, and I have a search form that (for now) just asks the database to pull all sites within a set distance (for this example, say 50km). I have put the data into an index and use the Search API to find sites.
I am noticing that the database search takes ~2-3 seconds to:
1) Search the index
2) Get a list of key names (this is stored in the search index)
3) Using key names, pull from datastore (in a loop) and extract data properties to be displayed to the user
4) Transmit data to the user via jinja template variables
This also returns only 20 results (the default maximum for a Search API query); I haven't implemented cursors here yet, although I will have to.
For whatever reason, it feels quite slow. I am wondering what websites do to make the process seem faster. Do they implement some kind of "asynchronous" search, where the page loads while the search/data pulls are processed in the background and the results are then shown to the user?
Are there "standard" ways of performing searches here where the processing/loading feels seamless to the user?
Thanks.
edit
Would something like passing a "query ID" via the page work, then using AJAX to get the data from the datastore as JSON? That is, can App Engine redirect the user to the final page with only a "query ID", run the search in the meantime, and then, once the data is ready, send the information to the user as JSON?
Make sure you are getting entities from the datastore in parallel. Since you already have the key names, you just have to pass your list of keys to the appropriate method.
For db:
MyModel.get_by_key_name(key_names)
For ndb:
ndb.get_multi([ndb.Key('MyModel', key_name) for key_name in key_names])
If you needed to do datastore queries, you could enable parallel fetches with the query.run (db) and query.fetch_async (ndb) methods.
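For instance, with ndb you can start the fetch and overlap it with other work (the model here is illustrative):

    # Hedged sketch: overlap a datastore fetch with other processing using
    # ndb's async API.
    from google.appengine.ext import ndb

    class Site(ndb.Model):
        name = ndb.StringProperty()

    future = Site.query().fetch_async(20)  # starts the RPC, returns at once
    # ... run the Search API call or build other parts of the page here ...
    sites = future.get_result()            # blocks only if still in flight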
