Are there limits to the `keys` parameter when querying a Cloudant view?

Are there any size limits to the keys parameter that can be passed when querying a Cloudant view? Or perhaps this is limited by request size?
Also, in cases where a view query may return a large set of data, are there limits to how much can be returned in a single response?

I don't believe there is a hard Cloudant limit on how many keys you can pass. There is, however, a limit on how much data you can supply as a GET parameter to an HTTP endpoint, e.g. ?keys=["a","b","c"]. A commonly cited safe limit is about 2 KB for the request URL. There is more detail on that in this Stack Overflow question.
You can make the equivalent POST request instead and pass the keys in the JSON body of the request ({"keys": [...]}). In that case you are limited by the maximum request body size of 10 MB. However, I would try to keep such requests below 2,000 keys.
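Below is a minimal sketch of the POST approach, assuming a standard Cloudant view URL. It only builds the requests and does not send them, and it splits a long key list into batches of at most 2,000 keys. The account URL, database, design document and view names are hypothetical, authentication is omitted, and the 2,000 figure is the rule of thumb from the answer above, not a documented Cloudant limit.

```python
import json
import urllib.request
from typing import Iterator, List

# Rule of thumb from the answer above; not a documented Cloudant hard limit.
MAX_KEYS_PER_REQUEST = 2000


def chunk_keys(keys: List[str], size: int = MAX_KEYS_PER_REQUEST) -> Iterator[List[str]]:
    """Split a long key list into batches of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def build_view_request(base_url: str, db: str, ddoc: str, view: str,
                       keys: List[str]) -> urllib.request.Request:
    """Build (but do not send) a POST to a view with the keys in the JSON body."""
    url = f"{base_url}/{db}/_design/{ddoc}/_view/{view}"
    body = json.dumps({"keys": keys}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each batch would then be sent with `urllib.request.urlopen(req)` once credentials have been added, and the rows from the responses combined on the client.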

Related

Get the "Place in line" of records in the realtime db?

Basically, I'm creating a system to manage requests from users in a Firebase Realtime Database, via an Express app deployed to Firebase Functions. The request table will work as a queue, so FIFO. Records would be ordered chronologically by their keys, à la the timecodes that are created from POST requests. I'd like to be able to tell a user their place in line and to show that place in line in my app. I expect this queue to hold requests numbering in the thousands, so iterating up to the entire length of the queue every time a client asks for its place in line seems unattractive.
I've thought of doing a query for the user's UID (which would be saved in each request, naturally), but I can't figure out how to structure that query while maintaining the chronological order of the requests. Something like requestsReference.orderByKey().endAt({UID}, key: "requestorUid") doesn't seem like it'd work from what I'm seeing in the docs; but if I pulled off a query like that then I'd be able to get the place in line just from the length of the returned object. It's worth saying now that I have no idea how efficient this would be compared to just iterating the entire queue in its original chronological order.
I've also thought of taking an arithmetic approach, basically adding the "place in line when requested" and "total fulfillments when requested" as data in the request records. So then I'd be able to retrieve a record by its UID, and determine the place in line via placeInLineAtRequestTime - (totalCurrentFulfillments - totalFulfillmentsAtRequestTime). It'd be a rough approach, and I'd need to fetch the entire fulfillments table in order to get the current count. So again, I'm not sure how this compares.
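The arithmetic approach can be sketched like this. It assumes each request record stores the two snapshot values the question describes; the class and field names are hypothetical. For example, a request that was 10th in line when 5 requests had been fulfilled is 7th once the fulfillment count reaches 8 (10 - (8 - 5)).

```python
from dataclasses import dataclass


@dataclass
class QueueRequest:
    # Hypothetical fields mirroring the values the question proposes to store
    requestor_uid: str
    place_in_line_at_request_time: int       # 1-based position when created
    total_fulfillments_at_request_time: int  # fulfillment count when created


def current_place_in_line(req: QueueRequest, total_current_fulfillments: int) -> int:
    """placeInLineAtRequestTime - (totalCurrentFulfillments - totalFulfillmentsAtRequestTime)."""
    fulfilled_since = total_current_fulfillments - req.total_fulfillments_at_request_time
    if fulfilled_since < 0:
        raise ValueError("current fulfillment count is lower than the stored snapshot")
    return req.place_in_line_at_request_time - fulfilled_since
```

This only holds if requests are fulfilled strictly in FIFO order and no request ahead of this one is cancelled or skipped; either of those would make the computed position drift from the real one.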
Anyway, any thoughts? Am I missing some real easy way I could do this? Or would iterating it be cheaper than I think it'd be?

Fetch large JSON from Datastore

I've created an API on Google Cloud Endpoints that gets all the data from a single entity kind in Datastore. The query (a really simple one: SELECT * FROM Entity) is performed with Objectify.
This Datastore kind contains 200 rows (entities), and each row (entity) has a list of child entities, structured like this:
MEAL:
    String title
    int preparationTime
    List<Ingredient> listOfIngredients (child entities...)
    ...
So when I call the API, JSON is returned. Its size is about 641 KB and it has 17K lines.
The API Explorer tells me that the request takes 4 seconds to execute.
I would like to reduce that time, because it's really high. I've already:
Increased the GAE instance class to F2
Enabled Memcache
It helps a little, but I don't think this is the most efficient approach.
Should I use BigQuery to generate the JSON faster? Or is there another solution?
Do you need all the entities in a single request?
If not, you can fetch entities in batches using query cursors and display them as needed, e.g. 20 or 30 entities at a time.
If yes, does your Meal entity change often?
If not, you can generate a JSON file, store it in Google Cloud Storage (GCS) and update it whenever your entities change. Fetching on the client end will then be a lot faster, and with the ETag header the client can pull new content only when it has changed.
If it does change often, then I think batch fetching is the only effective way to pull that many entities.
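Objects stored in GCS already carry an ETag. If you instead serve the pre-generated JSON from your own endpoint, the conditional-GET mechanism the answer refers to looks roughly like the sketch below. It is written in Python rather than the question's Java, the function names are hypothetical, and deriving the ETag from a SHA-256 hash of the payload is one reasonable choice, not a requirement.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(payload: bytes) -> str:
    """Strong ETag derived from a hash of the JSON bytes (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(payload).hexdigest() + '"'


def etag_matches(etag: str, if_none_match: str) -> bool:
    """If-None-Match uses weak comparison, so a W/ prefix is ignored."""
    if if_none_match.strip() == "*":
        return True
    for candidate in if_none_match.split(","):
        candidate = candidate.strip()
        if candidate.startswith("W/"):
            candidate = candidate[2:]
        if candidate == etag:
            return True
    return False


def serve_json(payload: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body): 304 with an empty body if the client copy is current."""
    etag = make_etag(payload)
    headers = {"ETag": etag, "Content-Type": "application/json"}
    if if_none_match is not None and etag_matches(etag, if_none_match):
        return 304, headers, b""
    return 200, headers, payload
```

The client stores the ETag from the first 200 response and sends it back in If-None-Match. As long as the meals have not changed, it receives an empty 304 instead of the full 641 KB body.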

What Cloudant requests are considered single API calls?

What is considered to be a single API call to Cloudant?
As I understand it, each of these is a single API call:
Get a single document
Insert/update a document
Use the getAllDocuments function to retrieve all documents
Get all documents by using a view.
Insert documents by sending all at the same time (Bulk update)
Perform a search query with a search index
Download an attachment from a cloudant document.
Is it fair to say that whichever function or REST request you make to Cloudant counts as a single API call, no matter how much data or how many documents are transferred in the response?
You are correct. Each of the above actions can be performed with single API calls. Let's deal with each in turn:
Get a single document - GET /db/:id
Insert/update a document - PUT /db/:id
Retrieve all documents - GET /db/_all_docs
Using a view - GET /db/_design/mydesigndoc/_view/myview. Views can also be used to return a selection of documents (with startkey/endkey parameters) or to aggregate the data (by using a 'reduce' operation and optionally grouping by keys).
Bulk insert/update/delete - POST /db/_bulk_docs
Cloudant Query - POST /db/_find
Get attachment - GET /db/:id/:attachmentname
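As a rough illustration of the view call above, here is a sketch that builds a view query URL. The one detail that often trips people up is that startkey and endkey are JSON values, so a string key must be sent with its quotes ("a", not a). The account URL and names are hypothetical, and only the parameters mentioned in this thread are covered.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def view_url(base_url: str, db: str, ddoc: str, view: str, *,
             startkey: Any = None, endkey: Any = None,
             reduce: Optional[bool] = None, group: Optional[bool] = None,
             limit: Optional[int] = None, skip: Optional[int] = None) -> str:
    """Build a GET URL for a view query; unset parameters are left out."""
    params: Dict[str, str] = {}
    # Keys and booleans are JSON-encoded: "a" -> '"a"', False -> 'false'
    if startkey is not None:
        params["startkey"] = json.dumps(startkey)
    if endkey is not None:
        params["endkey"] = json.dumps(endkey)
    if reduce is not None:
        params["reduce"] = json.dumps(reduce)
    if group is not None:
        params["group"] = json.dumps(group)
    if limit is not None:
        params["limit"] = str(limit)
    if skip is not None:
        params["skip"] = str(skip)
    url = f"{base_url}/{db}/_design/{ddoc}/_view/{view}"
    return url + ("?" + urlencode(params) if params else "")
```

Each URL built this way is still one GET request, and therefore one API call, however many rows it returns.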
As a rule of thumb, limit your calls to _bulk_docs to batches of around 500 documents. You can retrieve lots of data from views or _all_docs: Cloudant will happily stream you all the data it has. More commonly, views (or the primary index that powers _all_docs) are used to retrieve subsets of the data by passing startkey/endkey parameters or by supplying skip/limit parameters.
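The 500-document rule of thumb can be applied with a small batching helper like the one below. It is a sketch that only produces the JSON bodies. Each body would be sent as its own POST to /db/_bulk_docs, so each counts as one API call. The batch size comes from the answer, and the helper name is hypothetical.

```python
import json
from typing import Any, Dict, Iterator, List

BATCH_SIZE = 500  # rule of thumb from the answer above


def bulk_docs_bodies(docs: List[Dict[str, Any]],
                     batch_size: int = BATCH_SIZE) -> Iterator[bytes]:
    """Yield one {"docs": [...]} request body per batch of at most batch_size docs."""
    for start in range(0, len(docs), batch_size):
        yield json.dumps({"docs": docs[start:start + batch_size]}).encode("utf-8")
```

For example, 1,234 documents become three POST requests of 500, 500 and 234 documents.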

How to fetch thousands of data from database without getting slow down?

I want an autocomplete option in a text box, with the data fetched from the database. My table has thousands of rows (roughly 8,000 to 10,000). I know how to implement this, but because I'm fetching thousands of rows, it will take a long time. How can I do this without it slowing down? Should I use a different approach than simply fetching everything? I am using Oracle SQL Developer for the database.
Besides the obvious solutions involving indexes and caching, if this is a web application, then depending on your tool you can often set a minimum input length before the server call is made. Here is a jQuery UI example: https://api.jqueryui.com/autocomplete/#option-minLength
"The minimum number of characters a user must type before a search is performed. Zero is useful for local data with just a few items, but a higher value should be used when a single character search could match a few thousand items."
It depends on your web interface, but you can use two techniques:
1. Paginate your data: if your requirements are to accept empty input and show all the results, load them in blocks of a predefined size (Google, for example, paginates search results). In Oracle, pagination is done with the ROWNUM pseudocolumn (see this response). Beware: you must first write a query with an ORDER BY and then wrap it in an outer query that uses ROWNUM. Other databases use the LIMIT keyword, which behaves differently. If you apply pagination to a drop-down, you end up with an infinite scroll (see this response for example).
2. Limit your data by imposing a filter that restricts the number of rows returned; for example, display results only after the user has typed at least n characters in the field.
You can combine 1 and 2, but unless you find an existing web component (a jQuery one, for example), it may be a difficult task if you don't know JavaScript.
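Combining both techniques on the server side might look like the sketch below. It returns nothing until the user has typed a minimum number of characters, and otherwise builds the nested ROWNUM query described above for one page of prefix matches. The table `items`, the column `name`, the 3-character minimum and the bind names are all hypothetical. The LIKE escaping keeps a typed % or _ from acting as a wildcard.

```python
from typing import Dict, Optional, Tuple

MIN_CHARS = 3  # hypothetical; same idea as jQuery UI's minLength option

# Classic ROWNUM pagination: ORDER BY in the innermost query, number the rows
# in the middle query, cut the window in the outer query.
PAGE_SQL = """
SELECT name FROM (
    SELECT a.*, ROWNUM rnum FROM (
        SELECT name FROM items
        WHERE name LIKE :prefix ESCAPE '\\'
        ORDER BY name
    ) a
    WHERE ROWNUM <= :max_row
)
WHERE rnum > :min_row
"""


def _escape_like(text: str) -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def autocomplete_query(term: str, page: int = 0,
                       page_size: int = 20) -> Optional[Tuple[str, Dict[str, object]]]:
    """Return (sql, binds) for one page (0-based) of matches, or None if the term is too short."""
    term = term.strip()
    if len(term) < MIN_CHARS:
        return None  # don't hit the database for one or two characters
    binds: Dict[str, object] = {
        "prefix": _escape_like(term) + "%",
        "min_row": page * page_size,
        "max_row": (page + 1) * page_size,
    }
    return PAGE_SQL, binds
```

With a driver such as python-oracledb, the pair would be passed as `cursor.execute(sql, binds)`. On Oracle 12c and later, the `OFFSET ... ROWS FETCH NEXT ... ROWS ONLY` clause is a simpler alternative to the nested ROWNUM form.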

mapping encoded keys to shorter identifiers in appengine

I want to send unique references to the client so that the client can refer back to specific objects. The encoded keys App Engine provides are sometimes 50 bytes long, and I probably only need two or three bytes (I could hope to need four or five, but that won't be for a while!).
Sending the larger keys is actually prohibitively expensive, since I might be sending 400 references at a time.
So, I want to map these long keys to much shorter keys. An obvious solution is to store a mapping in the datastore, but then when I'm sending 400 objects I'm doing 400 additional queries, right? Maybe I mitigate the expense by keeping copies of the mappings in memcache as well. Is there a better way?
Can I just yank the number out of the unencoded keys that appengine creates and use that? I only need whatever id I use to be unique per entity kind, not across the whole app.
Thanks,
Riley
Datastore keys include extra information you don't need - like the app ID. So you definitely do not need to send the entire keys.
If these references are to a particular Kind in your datastore, then you can do even better and just send the key_name or numeric ID (whichever your keys use). If the latter is the case, then you could transmit each key with just a few bytes (you could opt for either a variable-length or fixed-length integer encoding depending on which would be more compact for your specific case [probably the former until most of the IDs you're sending get quite large]).
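The variable-length option could be an unsigned LEB128-style varint: 7 bits per byte, with the high bit meaning "more bytes follow". IDs up to 127 then take one byte and IDs up to 16,383 take two. This is a sketch of one such encoding, not an App Engine API. Because the output is raw bytes, sending it in a text or JSON response would need an extra step such as base64, which is not shown here.

```python
from typing import Iterable, List


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, high bit = 'more follows'."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_ids(ids: Iterable[int]) -> bytes:
    """Concatenate the varints for a batch of numeric IDs (e.g. 400 references)."""
    return b"".join(encode_varint(i) for i in ids)


def decode_ids(data: bytes) -> List[int]:
    """Inverse of encode_ids."""
    ids: List[int] = []
    value, shift = 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            ids.append(value)
            value, shift = 0, 0
    if shift:
        raise ValueError("truncated varint")
    return ids
```

Each decoded numeric ID can then be turned back into a full key with db.Key.from_path, as described in the next paragraph.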
When you receive these partial keys back from the user, it should be easy to reconstruct the full key which you need to retrieve the entities from the datastore. If you are using the Python runtime, you could use db.Key.from_path(kind_name, numeric_id_or_key_name).
A scheme like this should be both simpler and (a lot) faster than trying to use the datastore/memcache to store a custom mapping.
You don't need a custom mapping mechanism. Just use entity key names to store your short identifier:
entity = MyKind(key_name=your_short_id)
entity.put()
Then you can fetch these short identifiers in one query:
keys = MyKind.all(keys_only=True).filter(...).fetch(400)
short_ids = [key.name() for key in keys]
Finally, use MyKind.get_by_key_name(short_id) in order to retrieve entities from identifiers sent back by your users.
