Couchbase view get for multiple ranges

I'm evaluating Couchbase for an application, and trying to figure out something about range queries on views. I know I can do a view get for a single key, multiple keys, or a range. Can I do a get for multiple ranges? i.e. I want to retrieve items with view keys 0-10, 50-100, 5238-81902. I might need 100 different ranges at the same time, so making 100 separate requests to the database seems like a lot of overhead.

As far as I know, there is no way in Couchbase to fetch values from multiple ranges with one view query. There may be (or may eventually be) features in Couchbase N1QL that cover this, but I haven't worked with it.
To answer your question: 100 requests will not be a big overhead. Couchbase is quite fast and is designed to handle a lot of operations per second. Also, if your view is correctly designed, it will not be "recalculated" on each query.
There is also another way:
1. Determine the minimum and maximum values across all your ranges (0..81902 in your example).
2. Query a view that returns only the document ids and the value the range is based on, without including the full docs in the result.
3. On the client side, filter the result array from the previous step according to your ranges (0-10, 50-100, 5238-81902), then use getMulti with the document ids that remain.
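A rough sketch of that second approach with the Python SDK (a 2.x-era API; the connection string, design document and view names are assumptions, and your SDK version may differ):

    from couchbase.bucket import Bucket

    # Assumed connection string and bucket name -- adjust for your cluster.
    bucket = Bucket('couchbase://localhost/default')

    wanted_ranges = [(0, 10), (50, 100), (5238, 81902)]
    lo = min(start for start, _ in wanted_ranges)   # 0
    hi = max(end for _, end in wanted_ranges)       # 81902

    # Step 2: one view query over the whole span, keys and ids only (no docs).
    rows = bucket.query('my_design_doc', 'by_range_value',
                        mapkey_range=[lo, hi],
                        include_docs=False)

    # Step 3: filter down to the actual ranges on the client side...
    ids = [row.docid for row in rows
           if any(start <= row.key <= end for start, end in wanted_ranges)]

    # ...then fetch the surviving documents in a single bulk operation.
    docs = bucket.get_multi(ids)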
I don't know your data structure, so try both ways, test them, and choose the one that best fits your needs.

Related

Perform operations directly on database (esp. Firestore)

Just a question regarding NoSQL DBs. As far as I know, operations are done by the app/website outside the DB. For instance, if I need to add a value to a list, I need to
download the initial list
add the new value to the list on my device
upload the whole updated list.
In the end, a lot of data travels (twice the initial list) with no added value.
Is there any way to ask the DB directly to perform simple operations like this?
db.collection("collection_key").document("document_key").add("mylist", value)
Or simply increment a field?
Same for knowing the number of documents in a collection: do I need to download the whole set of documents just to count them?
A couple of different answers:
In Firestore, many intrinsic operations can be done with FieldValues, such as increment/decrement (by a supplied value, so really add/subtract), plus array unions, field deletes, etc. Just search the documentation for FieldValue. Whether this is true for NoSQL in general, I can't say.
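For example, in the Python client these operations show up as transform objects (the collection and field names below are just the ones from the question):

    from google.cloud import firestore

    db = firestore.Client()
    doc = db.collection('collection_key').document('document_key')

    # Append to an array and bump a counter server-side -- no read, and no
    # round-tripping of the whole list through the client:
    doc.update({
        'mylist': firestore.ArrayUnion(['new value']),
        'counter': firestore.Increment(1),
    })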
Knowing the number of documents, on the other hand, is not trivially done in Firestore - but frankly, I can't think of many situations other than artificially contrived examples where you would need to know. It's easy enough to set up ways to "count" documents as you create/delete them and keep that count separately, if for some reason you find yourself needing it.
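A minimal sketch of that counting idea (the stats document and field names are made up): keep a separate counter document and update it alongside creates/deletes.

    from google.cloud import firestore

    db = firestore.Client()
    stats = db.collection('meta').document('collection_key_stats')

    def create_item(data):
        db.collection('collection_key').add(data)
        stats.update({'doc_count': firestore.Increment(1)})

    def delete_item(doc_id):
        db.collection('collection_key').document(doc_id).delete()
        stats.update({'doc_count': firestore.Increment(-1)})

In practice you would put the write and the counter update in a batch or transaction so the two can't drift apart.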
Or were you just trying to generically put down NoSQL as a concept?

GAE NDB Sorting a multiquery with cursors

In my GAE app I'm doing a query which has to be ordered by date. The query also has to contain an IN filter, but this results in the following error:
BadArgumentError: _MultiQuery with cursors requires __key__ order
Now I've read through other SO questions (like this one), which suggest changing the sort to the key (as the error also points out). The problem, however, is that the query then becomes useless for its purpose: it needs to be sorted by date. What are suggested ways to achieve this?
The Cloud Datastore server doesn't support IN. The NDB client library effectively fakes this functionality by splitting a query with IN into multiple single queries with equality operators. It then merges the results on the client side.
Since the same entity could be returned by one or more of these single queries, merging the results becomes computationally silly*, unless you are ordering by the Key**.
Relatedly, you should read up on the underlying caveats/limitations of cursors to get a better understanding:
Because the NOT_EQUAL and IN operators are implemented with multiple queries, queries that use them do not support cursors, nor do composite queries constructed with the CompositeFilterOperator.or method.
Cursors don't always work as expected with a query that uses an inequality filter or a sort order on a property with multiple values. The de-duplication logic for such multiple-valued properties does not persist between retrievals, possibly causing the same result to be returned more than once.
If the list of values used in IN is a static list rather than one determined at runtime, a workaround is to compute this as an indexed Boolean field when you write the Entity. This allows you to use a single equality filter. For example, if you have a bug tracker and you want to see a list of open issues, you might use an IN('new', 'open', 'assigned') restriction on your query. Alternatively, you could set a property called is_open to True instead, so you no longer need the IN condition.
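A minimal sketch of that workaround in NDB (the model and property names are invented for illustration):

    from google.appengine.ext import ndb

    class Issue(ndb.Model):
        status = ndb.StringProperty(choices=('new', 'open', 'assigned', 'closed'))
        date = ndb.DateTimeProperty(auto_now_add=True)
        # Indexed boolean derived at write time, replacing the IN filter.
        is_open = ndb.ComputedProperty(
            lambda self: self.status in ('new', 'open', 'assigned'))

    # A single equality filter: no _MultiQuery, so ordering by date and
    # cursors both work again.
    query = Issue.query(Issue.is_open == True).order(-Issue.date)
    issues, cursor, more = query.fetch_page(20)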
* Computationally silly: it requires doing a linear scan over an unbounded number of preceding values to determine whether the currently retrieved Entity is a duplicate or not. Also known as: conceptually not compatible with cursors.
** Key ordering works because we can alternate between the different single queries, retrieving the next set of values, without having to worry about doing a linear scan over the entire preceding result set. This gives us a bounded data set to work with.

How to fetch thousands of rows from a database without slowing down?

I want an auto-search option in a textbox, with the data fetched from the database. I have thousands of rows in my database table (almost 8-10,000). I know how to achieve this, but since I am fetching thousands of rows, it will take a lot of time. How do I achieve this without it slowing down? Should I follow some other methodology apart from simple fetching methods? I am using Oracle SQL Developer for the database.
Besides the obvious solutions involving indexes and caching, if this is web technology then, depending on your tool, you can sometimes set a minimum input length before the server call is made. Here is a jQuery UI example: https://api.jqueryui.com/autocomplete/#option-minLength
"The minimum number of characters a user must type before a search is performed. Zero is useful for local data with just a few items, but a higher value should be used when a single character search could match a few thousand items."
It depends on your web interface, but you can use two techniques:
Paginate your data: if your requirements are to accept empty values and to show all the results, load them in blocks of a predefined size. Google, for example, paginates search results. On Oracle, pagination is done using the ROWNUM pseudo-column (see this response); a sketch of it follows below. Beware: you must first issue a query with an ORDER BY and then enclose it in a new one that uses ROWNUM. Other databases that use the LIMIT keyword behave differently. If you apply the pagination technique to a drop-down, you end up with an infinite scroll (see this response for example).
Limit your data by imposing a filter that restricts the number of rows returned; your search displays results only after the user has typed at least n characters in the field.
You can combine 1 & 2, but unless you find an existing web component (a jquery one for example) it may be a difficult task if you don't have a Javascript knowledge.

Too many columns in a single preference db table?

I have an application that is essentially built out of many smaller applications. Each application has its own individual preferences, but all of them share the same 5 preferences: for example, whether the application is displayed in the nav, whether it is public, whether reports should be generated, etc.
All of these common preferences need to be known by any page in the web app because the navigation is constructed from them. So originally I put all these preferences in a single table. However, as the number of applications grows (10 now, eventually around 30), the number of columns will end up being around 150-200 total. Most of these columns are just booleans, but it still worries me having that many columns in one table. On the other hand, if I were to split them apart into separate tables (preferences per app), I'd have to join them all together anyway every time I need to see the preferences, so why not just leave them all together?
In the application I can break the preferences into smaller objects so they are easier to work with, but from a db perspective they are a single entity. Is it better to leave them in one giant table, or break them apart into smaller ones but force many joins every time they are requested?
Which database engine are you using? Normally you will find recommendations about the number of columns per table in your DB engine's documentation, mostly row-size limitations, which should keep you safe.
Other options and suggestions include:
Assign a bit per config key in an integer, and use a bitwise AND to extract just the key you are interested in at a given point in time: a single value read from the DB, then one quick bitwise operation for each read of a config key (see the sketch after this list).
Cache the preferences in memory for fewer round trips to the DB servers. Depending on the frequency of changes, you may also have to invalidate the cached preference when it is updated.
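A quick sketch of the bit-per-key idea (the flag names are invented):

    # One bit per common preference, packed into a single integer column.
    SHOW_IN_NAV = 1 << 0
    IS_PUBLIC   = 1 << 1
    GEN_REPORTS = 1 << 2

    # The value stored in (and read back from) the DB in one go:
    flags = SHOW_IN_NAV | GEN_REPORTS   # 0b101 == 5

    # One quick bitwise AND per config-key read:
    if flags & IS_PUBLIC:
        print('application is public')
    if flags & GEN_REPORTS:
        print('generate reports')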
Why not turn the columns into rows and use something like this:
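A minimal illustration of the shape (table and column names are assumptions; SQLite is used here only so the sketch is runnable anywhere):

    import sqlite3

    conn = sqlite3.connect(':memory:')
    conn.executescript('''
        -- SETTING gives each setting its context: name, description, etc.
        CREATE TABLE setting (
            setting_id  INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            description TEXT
        );
        -- APP_SETTING holds the value of a setting for a given application.
        CREATE TABLE app_setting (
            app_id     INTEGER NOT NULL,
            setting_id INTEGER NOT NULL REFERENCES setting(setting_id),
            value      TEXT,
            PRIMARY KEY (app_id, setting_id)
        );
    ''')
    conn.execute("INSERT INTO setting VALUES (1, 'show_in_nav', 'Display app in navigation')")
    conn.execute("INSERT INTO app_setting VALUES (42, 1, 'true')")

    # One row per (application, setting) instead of one column per setting:
    row = conn.execute(
        "SELECT s.name, a.value FROM app_setting a "
        "JOIN setting s USING (setting_id) WHERE a.app_id = ?",
        (42,)).fetchone()
    print(row)  # ('show_in_nav', 'true')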
This is a typical approach for maintaining lists of settings values.
The APP_SETTING table contains the value of the setting. The SETTING table gives you the context of what the setting is.
There are ways of extending this to add information such as which settings apply to which applications and whether or not the possible values for a particular setting are constrained to a specific list.
Well, CommonPreferences and ApplicationPreferences would certainly make sense, and perhaps even segregating them in code (two queries instead of a join).
After that, a table per application makes more sense.
Another way is going down the route suggested by Joel Brown.
A third would be, instead of having individual columns or a row per setting, to stuff all the non-common ones into an XML snippet, or to serialise them from a preferences class.
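For instance, the serialisation flavour of that third option might look like this (JSON rather than XML here, and the class shape is invented):

    import json

    class AppPreferences(object):
        def __init__(self, theme='default', page_size=25, beta_features=False):
            self.theme = theme
            self.page_size = page_size
            self.beta_features = beta_features

    prefs = AppPreferences(theme='dark', page_size=50)

    # Serialise the non-common settings into one text column...
    blob = json.dumps(vars(prefs))

    # ...and rehydrate them when the application loads:
    restored = AppPreferences(**json.loads(blob))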
Which decision you make revolves around how your application does (or could) use the data.
If you go down the settings-table approach, getting an application's settings as rows will be, erm, painful. Go down the XML-snippet route and querying for a setting across applications will be even more painful than several joins.
No way to say what you should compromise on from here. I think I'd go for CommonPreferences first and see where I was at after that.

Indexes for google app engine data models

We have many years of weather data that we need to build a reporting app on. Weather data has many fields of different types, e.g. city, state, country, zipcode, latitude, longitude, temperature (hi/lo), temperature (avg), precipitation, wind speed, date, etc.
Our reports require that we choose combinations of these fields, then sort, search and filter on them, e.g.:
WeatherData.all().filter('avg_temp =', 20).filter('city =', 'palo alto').filter('hi_temp =', 30).order('date').fetch(100)
or
WeatherData.all().filter('lo_temp =', 20).filter('city =', 'palo alto').filter('hi_temp =', 30).order('date').fetch(100)
It may be easy to see that these queries require different indexes. It may also be obvious that the 200-index limit can be crossed very easily with any such data model where a combination of fields is used to filter, sort and search entities. Finally, the number of entities in such a data model can obviously run into millions, considering that there are many cities and we could store hourly data instead of daily.
Can anyone recommend a way to model this data that allows all of these queries to still be run while staying well under the 200-index limit? The write cost in this model is not as big a deal, but we need super fast reads.
Your best option is to rely on the built-in support for merge join queries, which can satisfy these queries without an index per combination. All you need to do is define one index per field you want to filter on, paired with the sort order (if that's always date, then you're down to one index per field). See this part of the docs for details.
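In query form (matching the style used in the question), that means the equality filters plus date sort from above would lean only on the per-field (field, date) indexes rather than one composite index per field combination:

    # Needs indexes on (avg_temp, date), (city, date), (hi_temp, date) --
    # one per filtered field -- instead of an index for every combination.
    q = (WeatherData.all()
         .filter('avg_temp =', 20)
         .filter('city =', 'palo alto')
         .filter('hi_temp =', 30)
         .order('date'))
    results = q.fetch(100)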
I know it seems counter-intuitive, but you can use a full-text search system that supports categories (properties/whatever) to do something like this, as long as you are primarily using equality filters. There are ways to get inequality filters to work, but they are often limited. The faceting features can be useful too.
The upcoming Google Search API
IndexTank is the service I currently use
EDIT:
Yup, this is totally a hackish solution. The documents I am using it for are already in my search index and I am almost always also filtering on search terms.
