How to get distinct results from a Mongoid criteria? - mongoid

I am very frustrated trying to query the results of a Mongoid criteria and keep only the documents where a field is distinct. Doing this:
Books.all.distinct(:name)
...only returns the distinct name values, not the documents.
Using the uniq block suggested in another question here does not work for me either:
Books.all.uniq{|x| x.name} # Returns non-unique results
What am I missing here?

OP, your problem is that you want every book with a unique name.
The issue is this: say you have 98 uniquely named books and 2 books with the same name.
If you ask your database "Give me every uniquely named book", it will find the first 98 books and then run into the last two.
Which of the two books with the same name should it return? Since there is no right answer at this level of detail, something like a hypothetical .uniq does not make sense.
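To make the point concrete, here is a language-neutral sketch in Python (not Mongoid code). The `unique_by` helper and the sample books are hypothetical; the key idea is that "one document per distinct name" only becomes well defined once you supply a tie-breaking rule.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def unique_by(docs: Iterable[dict], field: str,
              prefer: Callable[[dict, dict], bool] = lambda new, old: False) -> List[dict]:
    """Return one document per distinct value of `field`.

    `prefer(new, old)` is the tie-breaking rule: it returns True when a later
    document should replace the one already kept. The default keeps the first
    document seen, which is an arbitrary choice.
    """
    kept: Dict[Hashable, dict] = {}
    for doc in docs:
        key = doc[field]
        if key not in kept or prefer(doc, kept[key]):
            kept[key] = doc
    return list(kept.values())


books = [
    {"id": 1, "name": "Dune", "year": 1965},
    {"id": 2, "name": "Emma", "year": 1815},
    {"id": 3, "name": "Dune", "year": 2021},
]

# Rule 1: keep whichever "Dune" happens to come first.
print([b["id"] for b in unique_by(books, "name")])  # [1, 2]
# Rule 2: keep the most recent edition of each title.
newest = unique_by(books, "name", prefer=lambda new, old: new["year"] > old["year"])
print([b["id"] for b in newest])  # [3, 2]
```

Both answers are "distinct by name", yet they return different documents, which is why the database cannot pick one for you.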

I'm not sure I fully understand what you are trying to achieve. Does the 'name' field in your database have a unique constraint on it?
If so, distinct(:name) simply retrieves all of the book names; to retrieve the books themselves, query the base object (Books.all) directly.
If not, there will be multiple books for each name, which doesn't make sense to grab with distinct. Perhaps what you're looking for is a group-by function? To group all of the books with the same name you can call Books.all.group_by{|book| book.name}, but because this runs in Ruby on the web server rather than at the database level, it will be very slow for any sizeable number of records.
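A rough Python equivalent of that in-memory group_by, for illustration only (the helper name and data are hypothetical). It shows why the approach gets slow: every document has to be loaded into the application and visited once before any grouping happens.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_field(docs: Iterable[dict], field: str) -> Dict[str, List[dict]]:
    """Group documents by the value of `field` in application memory.

    Like Ruby's Enumerable#group_by, this visits every document, so all of
    them must first be fetched from the database.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for doc in docs:
        groups[doc[field]].append(doc)
    return dict(groups)


books = [
    {"id": 1, "name": "Dune"},
    {"id": 2, "name": "Emma"},
    {"id": 3, "name": "Dune"},
]
groups = group_by_field(books, "name")
print({name: [b["id"] for b in docs] for name, docs in groups.items()})
# {'Dune': [1, 3], 'Emma': [2]}
```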
Your best bet is probably to do one of the following:
Use map/reduce as mentioned in the answer to this question: Mongoid Group By or MongoDb group by in rails
Look into the new aggregation framework released in MongoDB 2.2; see http://docs.mongodb.org/manual/applications/aggregation/ NOTE: At the time of writing, this is not supported by Mongoid and will take some work to get going.
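As a sketch of the aggregation route, the pipeline below sorts first and then lets `$group` keep the first document for each name, so the sort order is the tie-breaking rule. It is built as plain Python data; the `books` collection name and the helper are hypothetical, and `"$$ROOT"` assumes MongoDB 2.6 or later (on 2.2 you would list one `$first` per field you need instead).

```python
from typing import Any, Dict, List


def first_doc_per_value(field: str, sort_field: str = "_id") -> List[Dict[str, Any]]:
    """Aggregation pipeline returning one document per distinct `field` value.

    $sort decides which document counts as "first"; $group then keeps it.
    Assumption: "$$ROOT" (the whole input document) needs MongoDB 2.6+.
    """
    return [
        {"$sort": {sort_field: 1}},
        {"$group": {"_id": "$" + field, "doc": {"$first": "$$ROOT"}}},
    ]


pipeline = first_doc_per_value("name")
# With PyMongo this could be passed to Collection.aggregate, e.g.
# db.books.aggregate(pipeline)  (collection name is hypothetical)
print(pipeline)
```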

Related

How to use CREATE STATISTIC object in PostgreSQL?

I am trying to create different STATISTICS objects using different attributes in a database.
Problem - 1
My aim is to find the error in selectivity estimates for different attribute combinations. I want to compare these results with some other experiments. Here is what I have done:
I generated every attribute combination (nC1, nC2, ..., nCn, where n is the number of attributes): one-attribute combinations, two-attribute combinations, etc. For example (name), (name, age), (name, age, zip), (age, zip), etc.
I created a STATISTICS object for each combination using the command CREATE STATISTICS <name> ON <one_attrib_combination> FROM <table_name>.
I ran ANALYSE on the table <table_name>.
Now I want to run a set of queries against each of these STATISTICS objects and get the selectivity for each one.
How can I go about this problem? I am using PostgreSQL 10. Any ideas?
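The setup steps above can be scripted. A minimal sketch, assuming a hypothetical table `people` with simple unquoted column names and made-up statistic names: it emits one CREATE STATISTICS statement per combination, followed by ANALYZE. Note that PostgreSQL 10 rejects extended statistics on a single column, so combinations start at size 2.

```python
from itertools import combinations
from typing import List, Sequence


def create_statistics_statements(table: str, columns: Sequence[str]) -> List[str]:
    """One CREATE STATISTICS statement per column combination, then ANALYZE.

    PostgreSQL 10 requires at least 2 columns per statistics object, so
    single-column combinations are skipped. Object names are hypothetical.
    """
    statements = []
    for size in range(2, len(columns) + 1):
        for combo in combinations(columns, size):
            name = f"stat_{table}_{'_'.join(combo)}"
            statements.append(
                f"CREATE STATISTICS {name} ON {', '.join(combo)} FROM {table};"
            )
    # The statistics are only populated once the table is analyzed.
    statements.append(f"ANALYZE {table};")
    return statements


for sql in create_statistics_statements("people", ["name", "age", "zip"]):
    print(sql)
```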
Problem - 2
The second problem: I want to know the size of each of these STATISTICS objects. How can I find the size of each of the STATISTICS objects I have created?
Thanks in advance for answering my queries.
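The purpose of STATISTICS objects is different. They let you create extended statistics so that the planner is aware of relationships between columns (for example, functional dependencies and n-distinct counts). That way a DBA can give the planner a better, dynamic hint. The documentation for CREATE STATISTICS has a nice explanation of this.
To see information about these objects there is a dedicated catalog, pg_statistic_ext. In PostgreSQL 10 the built statistics are stored in its stxndistinct and stxdependencies columns, so pg_column_size() on those columns gives a rough size for each object.
To measure the estimates you can use EXPLAIN ANALYZE, but I would say this is a dead end, and you should choose another path... Sorry for the bad news.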
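If you do try the EXPLAIN ANALYZE route anyway, one common way to quantify estimation error is the "q-error": the larger of estimate/actual and actual/estimate. A sketch under stated assumptions: it parses default text-format EXPLAIN ANALYZE output (with timing on), and the sample plan line, table and filter are made up for illustration.

```python
import re
from typing import List, Tuple

# On each plan node line: the planner's "rows=" estimate, then the
# "actual time=... rows=" count. Assumes text format with TIMING ON.
_NODE = re.compile(r"rows=(\d+).*?actual time=[\d.]+\.\.[\d.]+ rows=(\d+)")


def row_estimates(plan_text: str) -> List[Tuple[int, int]]:
    """Return (estimated_rows, actual_rows) for every plan node line."""
    return [(int(est), int(act)) for est, act in _NODE.findall(plan_text)]


def q_error(estimated: int, actual: int) -> float:
    """Symmetric estimation error; 1.0 means a perfect estimate.

    Zero counts are clamped to 1 to avoid dividing by zero.
    """
    est, act = max(estimated, 1), max(actual, 1)
    return max(est / act, act / est)


plan = """Seq Scan on people  (cost=0.00..35.50 rows=10 width=4) (actual time=0.010..0.011 rows=3 loops=1)
  Filter: ((age = 30) AND (zip = '12345'::text))
  Rows Removed by Filter: 97"""

for est, act in row_estimates(plan):
    print(est, act, f"{q_error(est, act):.2f}")  # 10 3 3.33
```

Running the same queries before and after creating each statistics object would give one error figure per object, though as noted the per-object attribution is indirect.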

Creating Index in Cloudant

Scenario.
I have a document in the database that has thousands of items in a 'productList' array.
All the objects in the 'productList' array have the same shape and the same fields, with different values.
Now I want to search in the following way.
When a user types 'c' into the 'Ingrediants' field, the list should show all 'Ingrediants' values that start with the letter 'c'.
When a user types 'A' into the 'brandName' field, the list should show all 'brandName' values that start with the letter 'A'.
Please give an example of how to search for this, whether by:
creating an index (json or text),
creating a search index (design document), or
using views, etc.
Note: I don't want to create an index at run time (the index can be defined in the Cloudant dashboard); I just want to query it from the application using this library.
I have read the documentation and understand the concepts.
Now, I want to implement it with the best approach.
I will use this approach to handle all such scenarios in future.
Sorry if the question is stupid :)
Thanks.
CouchDB isn't designed to do exactly what you're asking. You'd need one index for Ingredient and another for Brand Name, and it isn't particularly performant to query both at once. I think the best approach would be to check out the Mango query feature (http://docs.couchdb.org/en/2.0.0/api/database/find.html), try the queries you're interested in, and then add indexes as required (it has an explain plan to help make this more efficient).
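A starting point for such a Mango query, sketched as plain Python data: the request body for `POST /{db}/_find` uses `$elemMatch` to look inside the `productList` array and `$regex` for the prefix. The helper names and sample document are hypothetical. Two caveats: `$regex` conditions cannot be served from an index, so this scans (check with `_explain`), and `_find` returns whole documents, so with thousands of products in one document the matching items still have to be picked out on the client.

```python
import json
import re
from typing import Any, Dict, List


def prefix_find_body(field: str, prefix: str) -> Dict[str, Any]:
    """Body for POST /{db}/_find: documents whose productList contains an
    item whose `field` starts with `prefix` (simple prefixes assumed)."""
    return {
        "selector": {
            "productList": {
                "$elemMatch": {field: {"$regex": "^" + re.escape(prefix)}}
            }
        },
        "fields": ["_id", "productList"],
    }


def matching_items(doc: Dict[str, Any], field: str, prefix: str) -> List[Dict[str, Any]]:
    """_find returns the whole document, so filter the array client-side."""
    return [item for item in doc.get("productList", [])
            if isinstance(item.get(field), str) and item[field].startswith(prefix)]


print(json.dumps(prefix_find_body("brandName", "A")))
doc = {"_id": "catalog", "productList": [
    {"brandName": "Acme", "Ingrediants": "cocoa"},
    {"brandName": "Zeta", "Ingrediants": "sugar"},
]}
print(matching_items(doc, "brandName", "A"))
```

Storing each product as its own document would let an index or view do this work instead, which is one reason the single large document is awkward here.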

How does the SETCURRENTKEY() C/AL function in Navision work?

I have the following questions:
What does SETCURRENTKEY actually do?
What is the benefit of SETCURRENTKEY?
Why would I use SETCURRENTKEY?
When would I use SETCURRENTKEY?
What is the advantage of using an index, and how does this relate to the analogy of an old library card catalog?
What type of database querying efficiency problems does this function solve?
I have searched all over the internet and the 'IT Pro Developer Help' internal Navision documentation for this poorly documented function, and I cannot find a good answer to my questions.
The only thing I know is that SETCURRENTKEY sets the current key for a record variable and sorts the recordset based on it. When SETCURRENTKEY is used with only a few keys, it can improve query performance. I have no idea what actually happens when a database uses an index versus when it does not.
Someone told me this is how SETCURRENTKEY works:
It is like the old card catalog in a library: without SETCURRENTKEY you would have to go through each shelf and manually search for the book you want. You would find a mix of random books and you would have to say: "No, not this one. Yes, this one". With SETCURRENTKEY you have an index, analogous to the old system where you would go straight to a book or music CD by its 'Author' or 'Artist', etc.
That's all fine, but I still can't properly answer my questions.
What it does: with SETCURRENTKEY you declare the key (table index, which can consist of many fields) to be used when querying the database with FINDSET/FINDFIRST/FINDLAST statements, and therefore the order in which you will receive records while iterating the recordset with the NEXT statement.
Benefit: performance. The database server uses the selected key (table index) to retrieve the record set. You are always better off stating SETCURRENTKEY explicitly in your code, as it makes you think about your database structure and the indexes required.
Why use it: performance, and so that you know in advance the order of the records you will receive when iterating through a recordset.
When to use:
The typical use is this:
RecordVar.SETCURRENTKEY(...);
RecordVar.SETRANGE(Field, ...);
RecordVar.SETFILTER(Field, ...);
RecordVar.SETRANGE(Field, ...);
...
IF RecordVar.FINDSET THEN
  REPEAT
    // do something with records
  UNTIL RecordVar.NEXT = 0;
SETCURRENTKEY is declarative, and comes into effect only when FINDSET is executed. At the moment FINDSET is executed, the database will be queried on the table represented by RecordVar, using the filters declared by SETRANGE/SETFILTER, and the key/index declared by SETCURRENTKEY.
For the last two questions (the advantage of an index and the efficiency problems it solves), and in general, I would truly recommend that you familiarize yourself with basic database index theory. It is essentially what you described yourself with the library/book analogy.
If you modify key fields (or filtered fields, even if they are not in the key) in a loop, the standard way to do this in NAV is to declare a second record variable, GET it using the primary key fields of the record variable you are looping through, then change and MODIFY the second record variable.
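The library analogy can be shown in a few lines. This is a conceptual Python sketch, not how NAV or SQL Server store keys (real databases use B-tree indexes, not sorted lists); the Customer rows are hypothetical. Without a usable key, every row is inspected; with a key on (city, customer_no), a binary search jumps to the first match and reads consecutive entries, and the results come back in key order.

```python
import bisect
from typing import List, Tuple

# Hypothetical "Customer" rows (customer_no, city) in physical/insertion order.
rows: List[Tuple[str, str]] = [
    ("C004", "Oslo"), ("C001", "Berlin"), ("C003", "Oslo"), ("C002", "Madrid"),
]


def full_scan(city: str) -> List[str]:
    """No usable key: inspect every row ("no, not this one; yes, this one")."""
    return [no for no, c in rows if c == city]


# A secondary key on (city, customer_no), kept sorted like an index.
city_key = sorted((c, no) for no, c in rows)


def key_lookup(city: str) -> List[str]:
    """With the key: binary-search to the first entry for `city`, then read
    consecutive entries until the city changes. Output is in key order."""
    i = bisect.bisect_left(city_key, (city,))
    result = []
    while i < len(city_key) and city_key[i][0] == city:
        result.append(city_key[i][1])
        i += 1
    return result


print(full_scan("Oslo"))   # ['C004', 'C003']
print(key_lookup("Oslo"))  # ['C003', 'C004']
```

The scan's cost grows with the whole table, while the keyed lookup costs roughly a logarithmic search plus the matching entries, which is the efficiency problem an index solves.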

How about using a field containing all the information for search?

On the Employees table, I'm using a field called SearchTags. In that field I'm going to put the employee's information, such as FullName + PassportNo + Nationality + JobTitle, etc.
To search for a particular employee, I'll search within that field (SearchTags).
What do you think about this method?
Isn't that considered duplicate information?
In my opinion this method is very easy to code and straightforward.
So, I'd like to know your opinion before I start using this method :)
I am assuming that you are using SQL to perform the search.
What do you think about this method?
I don't mean to sound harsh, but I completely disagree with your approach.
Isn't that considered duplicate information?
Of course, and that is not at all recommended by database design fundamentals.
Problems you will have to face:
What if you want to update one of those individual fields? For example, when the job title changes, how will you handle it? You will have to update it in two places.
A new requirement down the road may require you to search only 3 of those fields, not 4. What would you do? Create another field duplicating those 3 fields?
SQL makes it simple enough to write a query that searches multiple fields.
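A minimal sketch of that alternative using Python's built-in sqlite3 module; the table layout and column names are hypothetical. The search covers several columns with one parameterized query, and a job title change is a single UPDATE with nothing to keep in sync. (A leading-wildcard LIKE cannot use an ordinary index in either design.)

```python
import sqlite3


def search_employees(conn: sqlite3.Connection, term: str) -> list:
    """Search several columns at once instead of a duplicated SearchTags column."""
    pattern = f"%{term}%"
    return conn.execute(
        """
        SELECT FullName FROM Employees
        WHERE FullName LIKE ? OR PassportNo LIKE ?
           OR Nationality LIKE ? OR JobTitle LIKE ?
        ORDER BY FullName
        """,
        (pattern,) * 4,
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (FullName TEXT, PassportNo TEXT, Nationality TEXT, JobTitle TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [
        ("Sara Ali", "P123", "Jordanian", "Engineer"),
        ("John Smith", "P456", "British", "Accountant"),
    ],
)
# A job title change touches exactly one column in one place.
conn.execute("UPDATE Employees SET JobTitle = 'Manager' WHERE FullName = 'John Smith'")
print(search_employees(conn, "Manager"))   # [('John Smith',)]
print(search_employees(conn, "Engineer"))  # [('Sara Ali',)]
```

Searching only 3 of the 4 fields later is just a matter of dropping one condition from the WHERE clause.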

Implementing a database -- How to get started

I've been trying to learn programming for a while. I've studied Java and Python, and I'm comfortable with their syntax. Recently, I wanted to apply what I've learned by coding a tangible piece of software from the ground up.
I want to implement a database engine, sort of a NoSQL database. I've put together a small document, a kind of specification to follow throughout my adventure of coding it. But all I know is a bunch of keywords, and I don't know where to start.
Can someone help me figure out how to gather the knowledge I need for this kind of work, and in what order to learn things? I have searched for documents, but I feel I'll end up finding unrelated or erroneous content, or start from the wrong point, because implementing a complete database engine seems to be a truly complicated task.
I want to stress that I'd prefer theses, whitepapers and (e)books to other projects' code, because questions of this kind usually get answered with "read project X's source code". I'm not yet at the level of comfortably reading and understanding source code.
First, you may want to look at the answers to How to write a simple database engine. While that question focuses on a SQL engine, there is still a lot of good material in the answers.
Otherwise, a good project tutorial is Implementation of a B-Tree Database Class. The example code is in C++, but the description of what is done and why is probably what you'll want to look at anyway.
Also, there is Designing and Implementing Structured Storage (Database Engine) over at MSDN. Plenty of information there to help you in your learning project.
Because the accepted answer only offers (good) links to other resources, I thought I'd share my experience writing webdb, a small experimental database for browsers. I also invite you to read the source code. It's pretty small; you should be able to read through it and get a basic understanding of what it's doing in a couple of hours. Warning: I am a n00b at this, and since writing it I have learned a lot more and can see I did some things wrong. It can help you get started, though.
The basics: BTree
I started out by adapting an AVL tree to suit my needs. An AVL tree is a kind of self-balancing binary search tree. You store the key K and related data (if any) in a node, then all items with key < K in the node's left subtree and all items with key > K in its right subtree. You can use an array to store the data items if you want to support non-unique keys.
This tree will give you the basics: Create, Update, Delete and a way to quickly get an item by key, or all items with key < x, or with key between x and y etc. It can serve as the index for our table.
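Here is a small Python sketch of such a tree index (not webdb's code). To keep it short it is a plain, unbalanced binary search tree: an AVL tree would additionally rotate nodes on insert to keep the height logarithmic, and deletion is left out. Each node holds a list of values so duplicate keys are allowed, and `range` prunes subtrees that cannot contain matches.

```python
from typing import Any, Iterator, List, Optional, Tuple


class Node:
    def __init__(self, key: Any, value: Any) -> None:
        self.key = key
        self.values: List[Any] = [value]  # a list so non-unique keys work
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


class TreeIndex:
    """Unbalanced BST; AVL rebalancing and delete omitted for brevity."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, key: Any, value: Any) -> None:
        if self.root is None:
            self.root = Node(key, value)
            return
        node = self.root
        while True:
            if key == node.key:
                node.values.append(value)
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key, value))
                return
            node = child

    def get(self, key: Any) -> List[Any]:
        node = self.root
        while node is not None:
            if key == node.key:
                return list(node.values)
            node = node.left if key < node.key else node.right
        return []

    def range(self, low: Any, high: Any) -> Iterator[Tuple[Any, Any]]:
        """Yield (key, value) for low <= key <= high, in key order."""
        def walk(node: Optional[Node]) -> Iterator[Tuple[Any, Any]]:
            if node is None:
                return
            if low < node.key:          # smaller keys may still be in range
                yield from walk(node.left)
            if low <= node.key <= high:
                for v in node.values:
                    yield node.key, v
            if node.key < high:         # larger keys may still be in range
                yield from walk(node.right)
        return walk(self.root)


idx = TreeIndex()
for k, v in [(5, "e"), (2, "b"), (8, "h"), (2, "b2"), (6, "f")]:
    idx.insert(k, v)
print(idx.get(2))             # ['b', 'b2']
print(list(idx.range(2, 6)))  # [(2, 'b'), (2, 'b2'), (5, 'e'), (6, 'f')]
```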
A schema
As a next step I wrote code that lets the client code define a schema, with methods like createTable() etc. Schemas are typically associated with SQL, but even NoSQL databases sort of have a schema: they usually require you to mark the ID field and any other fields you want to search on. You can make your schema as fancy as you want, but you typically want to model at least which column(s) serve as the primary key and which fields will be searched on frequently and need an index.
Creating a data structure to store a table
I decided to use the tree I created in the first step to store my items. These were simple JS objects. Having defined which field contains the PK, I could simply insert each item into the tree using that field's value as the key. This gives me quick lookup by ID (or ID range).
Next I added another tree for every column that needs an index. In these trees I did not store the full record, but only the key. So to fetch a customer by last name, I would first use the index on last name to get the ID, then the primary key index to get the actual record. The reason I did not just store (a reference to) the actual object is that it makes set operations a little simpler (see the next step).
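The storage layout above can be sketched in Python like this (not webdb's code; names are hypothetical). The constructor plays the role of a createTable() schema: it names the primary key and the indexed fields. Plain dicts stand in for the trees, so range lookups are not supported here; the point is that secondary indexes hold only primary keys, and a lookup goes index first, then primary store.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set


class Table:
    """Primary store keyed by PK plus one secondary index per indexed field.
    Secondary indexes map a field value to a set of primary keys."""

    def __init__(self, primary_key: str, indexed_fields: Iterable[str]) -> None:
        self.primary_key = primary_key
        self.rows: Dict[Any, dict] = {}
        self.indexes: Dict[str, Dict[Any, Set[Any]]] = {
            field: defaultdict(set) for field in indexed_fields
        }

    def upsert(self, record: dict) -> None:
        pk = record[self.primary_key]
        old = self.rows.get(pk)
        if old is not None:
            # Drop stale index entries before re-indexing an updated record.
            for field, index in self.indexes.items():
                index[old[field]].discard(pk)
        self.rows[pk] = record
        for field, index in self.indexes.items():
            index[record[field]].add(pk)

    def find_by(self, field: str, value: Any) -> List[dict]:
        # Step 1: the secondary index maps the value to primary keys.
        pks = self.indexes[field].get(value, set())
        # Step 2: the primary store maps each key to its record.
        return [self.rows[pk] for pk in sorted(pks)]


customers = Table(primary_key="id", indexed_fields=["lastName", "city"])
customers.upsert({"id": 1, "firstName": "John", "lastName": "Doe", "city": "New York"})
customers.upsert({"id": 2, "firstName": "Jane", "lastName": "Doe", "city": "Boston"})
customers.upsert({"id": 2, "firstName": "Jane", "lastName": "Doe", "city": "New York"})
print([c["id"] for c in customers.find_by("lastName", "Doe")])  # [1, 2]
print([c["id"] for c in customers.find_by("city", "Boston")])  # []
```

Because the indexes return sets of IDs rather than records, combining them with set operations is straightforward.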
Querying
Now that we have a table with indexes for PK and search fields, we can implement querying. I did not take this very far as it becomes complicated quickly, but you can get some nice functionality with just some basics. WebDB does not implement joins; all queries operate only on a single table. But once you understand this you see a pretty clear (though long and winding) path to doing joins and other complicated stuff as well.
In WebDB, to get all customers with firstName = 'John' and city = 'New York' (assuming those are two search fields), you would write something like:
var webDb = ...
var johnsFromNY = webDb.customers.get({
  firstName: 'John',
  city: 'New York'
});
To solve it, we first do two lookups: we get the set X of all IDs of customers named 'John' and we get the set Y of all IDs of customers from New York. We then perform an intersection on these two sets to get all IDs of customers that are both named 'John' AND from New York. We then run through our set of resulting IDs, getting the actual record for each one and adding it to the result array.
Using the set operators like union and intersection we can perform AND and OR searches. I only implemented AND.
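The lookup-then-intersect strategy can be sketched in Python as follows (not webdb's API; the data and function names are hypothetical). Indexes map a field value to a set of IDs, as in the secondary-index layout described earlier. `query_and` mirrors the AND search described above; `query_or` is the union variant the author did not implement, included only to show the symmetry.

```python
from functools import reduce
from typing import Any, Dict, List, Set

Rows = Dict[Any, dict]
Indexes = Dict[str, Dict[Any, Set[Any]]]


def _id_sets(indexes: Indexes, criteria: Dict[str, Any]) -> List[Set[Any]]:
    # One index lookup per criterion; an unknown value yields an empty set.
    return [indexes[field].get(value, set()) for field, value in criteria.items()]


def query_and(rows: Rows, indexes: Indexes, criteria: Dict[str, Any]) -> List[dict]:
    """AND: intersect the ID sets, then fetch each record by primary key."""
    sets = _id_sets(indexes, criteria)
    ids = reduce(set.intersection, sets) if sets else set(rows)
    return [rows[i] for i in sorted(ids)]


def query_or(rows: Rows, indexes: Indexes, criteria: Dict[str, Any]) -> List[dict]:
    """OR: same lookups, but take the union of the ID sets."""
    ids = set().union(*_id_sets(indexes, criteria))
    return [rows[i] for i in sorted(ids)]


rows = {
    1: {"id": 1, "firstName": "John", "city": "New York"},
    2: {"id": 2, "firstName": "John", "city": "Boston"},
    3: {"id": 3, "firstName": "Mary", "city": "New York"},
}
indexes = {
    "firstName": {"John": {1, 2}, "Mary": {3}},
    "city": {"New York": {1, 3}, "Boston": {2}},
}
criteria = {"firstName": "John", "city": "New York"}
print([r["id"] for r in query_and(rows, indexes, criteria)])  # [1]
print([r["id"] for r in query_or(rows, indexes, criteria)])   # [1, 2, 3]
```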
Doing joins would (I think) involve creating temporary tables in memory, populating them with the joined results as the query runs, and then applying the query criteria to the temp table. I never got there. I attempted some syncing logic next, but that was too ambitious and it went downhill from there :)

Resources