Creating an Index in Cloudant

Scenario:
I have a document in the database which has thousands of items in its 'productList' array.
All the objects in the 'productList' array have the same shape and the same fields, with different values.
Now I want to search in the following way:
When a user types 'c' against the 'Ingrediants' field, the list will show all 'Ingrediants' values that start with the letter 'c'.
When a user types 'A' against the 'brandName' field, the list will show all 'brandName' values that start with the letter 'A'.
Please give an example of how to search like this using this library, whether by:
creating an index (json, text),
creating a Search index (design document), or
using views, etc.
Note: I don't want to create an index at run-time (I mean the index could be defined in the Cloudant dashboard). I just want to query it with this library from the application.
I have read the documentation and I understand the concepts.
Now, I want to implement it with the best approach.
I will use this approach to handle all such scenarios in future.
Sorry if the question is stupid :)
thanks.

CouchDB isn't designed to do exactly what you're asking. You'd need one index for Ingredient, and another for Brand Name - and it isn't particularly performant to do both at once. The best approach I think would be to check out the Mango query feature http://docs.couchdb.org/en/2.0.0/api/database/find.html, try the queries you're interested in and then add indexes as required (it has the explain plan to help make this more efficient).
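As a rough illustration of what such a Mango query could look like, here is a minimal sketch that builds the JSON body for a POST to the database's _find endpoint. It assumes each product is stored as its own document with a top-level brandName field, which is not the layout described in the question, and the helper name prefixQuery is made up for this example. The "\ufff0" upper bound is the usual CouchDB idiom for a prefix range; whether an index is actually used should be checked with the explain plan.

```javascript
// Hypothetical helper: builds the request body for POST /{db}/_find.
// Assumes one product per document (NOT the single-document layout in the question).
function prefixQuery(field, prefix, limit = 25) {
  return {
    selector: {
      // Range condition: values from `prefix` up to `prefix` followed by a
      // very high code point, i.e. values that start with `prefix`.
      [field]: { $gte: prefix, $lt: prefix + "\ufff0" }
    },
    fields: ["_id", field],
    limit: limit
  };
}

const body = prefixQuery("brandName", "A");
console.log(JSON.stringify(body, null, 2));
```

The same helper would be called with "Ingrediants" and "c" for the other search box. Each field would need its own index, defined ahead of time, for the range to be served efficiently.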

Related

Azure Search - Hierarchical facets guidance

I'm developing a project where I want to have hierarchical facets.
I have an index with a complex structure, like:
Index
-field1
-List
And othercomplexfield contains another list with anothercomplexfield inside.
I'd like to be able to give to users the possibility to:
Have the facets of field1.
When one is selected, I'd like to give the user the possibility to select one of the values of a certain field of "othercomplexfield" while filtering by the selected field1.
I can do that.
I'd then like to give the user the possibility to select one of the possible values of "anothercomplexfield" while filtering by field1 AND by the selected othercomplexfield.
The difficulty here is that I don't want every possible facet value, but only the ones CONTAINED by the othercomplexfield that I'm filtering for.
So far I have had to do this inside of C#, and I did not find a way to write a query that gives me back the distinct values I want from Azure Search.
Has anyone had a similar problem?
Did I explain the problem well enough?
I saw no clear guidance online, everything is easy if you only have level 1 facets but when you get into nested objects it's not that clear anymore.
I'm not sure I fully understand the context of your question. What I can tell you is that filters only apply at the document level and not at the complex collection level. What I mean by that is that if a filter matches an item in a complex collection, the entire document will be returned, not just the item in the complex collection that matched. The same is true for facets--facets will count all documents in the result set that match the filter and can't be scoped down just to parts of documents. With that, it seems like having this logic in your application like you mentioned might be the best approach for your current index schema.
We do have this old blog post that talks about one way to implement hierarchical facets with Azure Cognitive Search which may give you some other ideas on how you could implement the functionality you're looking for: https://learn.microsoft.com/en-us/archive/blogs/onsearch/multi-level-taxonomy-facets-in-azure-search
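For the application-side approach, a minimal sketch of the logic might look like the following. It works on documents already returned by the search and collects only the nested values that sit under the selected parent item. The document shape and the sub-field names name and value are assumptions made for this example, since the real schema is not shown in the question.

```javascript
// Hypothetical result documents; `name` and `value` are assumed sub-field names.
const docs = [
  { field1: "A", othercomplexfield: [
    { name: "x", anothercomplexfield: [{ value: "red" }, { value: "blue" }] },
    { name: "y", anothercomplexfield: [{ value: "green" }] }
  ] },
  { field1: "A", othercomplexfield: [
    { name: "x", anothercomplexfield: [{ value: "red" }] }
  ] },
  { field1: "B", othercomplexfield: [
    { name: "x", anothercomplexfield: [{ value: "black" }] }
  ] }
];

// Counts nested values, but only inside othercomplexfield items that match
// the selected name, and only in documents that match the selected field1.
function nestedFacet(results, field1Value, otherName) {
  const counts = new Map();
  for (const doc of results) {
    if (doc.field1 !== field1Value) continue;
    for (const other of doc.othercomplexfield) {
      if (other.name !== otherName) continue;
      for (const inner of other.anothercomplexfield) {
        counts.set(inner.value, (counts.get(inner.value) || 0) + 1);
      }
    }
  }
  return counts;
}

const facet = nestedFacet(docs, "A", "x");
console.log([...facet.entries()]);
```

Note that this counts occurrences of each nested value, not documents, so the numbers will not necessarily line up with the facet counts the service returns.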

Database Table structure for search engine for my website

I am trying to make a search engine for my website. How should I design the table which keeps the list of indexed words?
Earlier I thought something like this:
Table: tbl_indexedwords has 2 columns iw_wordid and iw_word.
Table: tbl_wordoccurrence has 4 columns wo_occurrenceid, wo_wordid, wo_pageid, wo_numberofoccurrences.
Now, this design will not work well if the user enters more than one word in the search box, say foo bar. Even if foo and bar are both present in the table tbl_indexedwords and the corresponding details are in tbl_wordoccurrence, my search engine script would rank the results by the maximum wo_numberofoccurrences for either foo or bar. It will not see whether foo and bar are present next to each other, as there is no column for the order of occurrence of the words. I hope I am clear with what I am saying here.
Another idea could be to give the table tbl_wordoccurrence 3 columns: forget about wo_numberofoccurrences and store each word in the page with a unique wo_occurrenceid. This would solve my problem, as I would know the order of occurrence of the words. If the wo_occurrenceid of some word is wo_occurrenceid+1 or wo_occurrenceid-1 of some other word, then these two occur side by side.
The problem with this design is that it would take up lots of space. I have lots of content for my website. I think this approach would make it slow (not sure, though). Is there any other design that would help me? Or will I have to go with the second one? I am sure the first one is not gonna work, so discarding it.
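To make the second design concrete, here is a small in-memory sketch of a positional index with an adjacency check. A Map stands in for tbl_wordoccurrence, and a per-page word position is used in place of the global wo_occurrenceid; the function names are invented for this example.

```javascript
// In-memory stand-in for the occurrence table: word -> [{ pageId, position }].
function buildPositionalIndex(pages) {
  const index = new Map();
  for (const [pageId, text] of Object.entries(pages)) {
    const words = text.toLowerCase().split(/\W+/).filter(Boolean);
    words.forEach((word, position) => {
      if (!index.has(word)) index.set(word, []);
      index.get(word).push({ pageId, position });
    });
  }
  return index;
}

// Returns the pages where `second` occurs immediately after `first`.
function pagesWithAdjacent(index, first, second) {
  const secondAt = new Set(
    (index.get(second) || []).map(o => `${o.pageId}:${o.position}`)
  );
  const hits = new Set();
  for (const o of index.get(first) || []) {
    if (secondAt.has(`${o.pageId}:${o.position + 1}`)) hits.add(o.pageId);
  }
  return [...hits];
}

const pages = { p1: "foo bar baz", p2: "bar then foo", p3: "a foo bar" };
const index = buildPositionalIndex(pages);
const adjacent = pagesWithAdjacent(index, "foo", "bar");
console.log(adjacent);
```

The storage concern is real: this keeps one entry per word occurrence rather than one per distinct word per page.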
If the content of your website is in the database (I assume it is), creating a separate table is not even necessary if you use a FULLTEXT index. If you are using MySQL, it has this capability; see the examples here and here. And if you are using MSSQL, it also has its own FULLTEXT indexing capability, like the examples here and here.
And if you insist on having a separate table for searching, then you would most likely need only one table, like:
Table : tbl_wordsoccurrence
Fields : words_id, words
(and if you like you can include also number_of_occurences and page_id fields)
In the table above you could store either single words like programming or phrases like php programming.
On the other hand, if your website is static, meaning the content is not saved in a database and changes have to be made manually rather than by regular user input, then that's another story.

How about: Using a field containing all information to use it for search

In the Employees database table, I'm using a field called SearchTags. In that field I'm going to add the employee's information, like FullName + PassportNo + Nationality + JobTitel etc.
To search for a particular employee, I'll search within that field (SearchTags).
What do you think about this method?
Isn't that considered duplicate information?
In my opinion this method is very easy to code and straightforward.
So, I'd like to know your opinion before I start using this method :)
I am assuming that you are using SQL to perform the search.
"What do you think about this method?"
I don't mean to sound harsh, but I completely disagree with your approach.
"Isn't that considered duplicate information?"
Of course it is, and that is not at all recommended by database design fundamentals.
Problems you will have to face:
What if you want to update one of those individual fields? For example, when the job title changes, how will you handle it? You will have to update it in two places.
A new requirement down the road may require you to search only three of those fields, not four. What would you do? Create another field with duplicates of the latest three target fields?
SQL is simple enough that you can formulate a query that targets multiple fields in one search.
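As a sketch of what searching several fields directly can look like, the helper below builds a parameterized query over whichever columns are requested. The table and column names come from the question; the function name buildEmployeeSearch and the use of ? placeholders (which depends on the database driver) are assumptions for this example.

```javascript
// Builds a parameterized query that searches several columns at once.
// Assumes a driver that accepts positional `?` placeholders.
function buildEmployeeSearch(columns, term) {
  // Column names cannot be passed as parameters, so validate them strictly.
  const allowed = /^[A-Za-z_][A-Za-z0-9_]*$/;
  for (const column of columns) {
    if (!allowed.test(column)) throw new Error(`Invalid column name: ${column}`);
  }
  const where = columns.map(column => `${column} LIKE ?`).join(" OR ");
  return {
    sql: `SELECT * FROM Employees WHERE ${where}`,
    params: columns.map(() => `%${term}%`)
  };
}

const query = buildEmployeeSearch(["FullName", "PassportNo", "Nationality"], "smith");
console.log(query.sql);
console.log(query.params);
```

Searching only three fields instead of four is then just a shorter column list, with no duplicated data to keep in sync.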

Implementing a database -- How to get started

I've been trying to learn programming for a while. I've studied Java and Python, and I'm comfortable with their syntax. Recently, I wanted to use what I've learnt by coding a tangible piece of software from the ground up.
I want to implement a database engine, sort of a NoSQL database. I've put together a small document, sort of a specification to follow throughout my adventure of coding it. But all I know is a bunch of keywords. I don't know where to start.
Can someone help me find out how to gather the knowledge I need for this kind of work, and in what order to learn things? I have searched for documents, but I feel like I'll end up finding unrelated or erroneous content or starting from the wrong point, because implementing a complete database engine is (or seems to be) a truly complicated task.
I want to stress that I'd prefer theses, whitepapers and (e)books to the code of other projects, because questions of this kind usually get answered in the form of "read project X's source code". I'm not at the level of comfortably reading and understanding source code.
First, you may want to have a look at the answers to How to write a simple database engine. While it focuses on a SQL engine, there is still a lot of good material in the answers.
Otherwise, a good project tutorial is Implementation of a B-Tree Database Class. The example code is in C++, but the description of what is done and why is probably what you'll want to look at anyway.
Also, there is Designing and Implementing Structured Storage (Database Engine) over at MSDN. Plenty of information there to help you in your learning project.
Because the accepted answer only offers (good) links to other resources, I thought I'd share my experience writing webdb, a small experimental database for browsers. I also invite you to read the source code. It's pretty small. You should be able to read through it and get a basic understanding of what it's doing in a couple of hours. Warning: I am a n00b at this, and since writing it I have learned a lot more and can see that I did some things wrong. It can help you get started though.
The basics: a binary search tree
I started out by adapting an AVL tree to suit my needs. An AVL tree is a kind of self-balancing binary search tree. You store the key K and related data (if any) in a node, then all items with key < K in the left subtree and all items with key > K in the right subtree. You can use an array to store the data items if you want to support non-unique keys.
This tree will give you the basics: Create, Update, Delete and a way to quickly get an item by key, or all items with key < x, or with key between x and y etc. It can serve as the index for our table.
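To illustrate the idea, here is a minimal sketch of such a tree with insert, lookup by key, and a range query. It is a plain binary search tree: the AVL rebalancing is deliberately left out to keep it short, so it is not the WebDB implementation, and the class name TreeIndex is made up for this example.

```javascript
// Plain (unbalanced) binary search tree; AVL rotations are omitted.
class TreeIndex {
  constructor() {
    this.root = null;
  }

  insert(key, value) {
    const node = { key, values: [value], left: null, right: null };
    if (!this.root) { this.root = node; return; }
    let cur = this.root;
    while (true) {
      // Non-unique keys: keep all values for the same key in one array.
      if (key === cur.key) { cur.values.push(value); return; }
      const side = key < cur.key ? "left" : "right";
      if (!cur[side]) { cur[side] = node; return; }
      cur = cur[side];
    }
  }

  get(key) {
    let cur = this.root;
    while (cur && cur.key !== key) cur = key < cur.key ? cur.left : cur.right;
    return cur ? cur.values : [];
  }

  // In-order walk that skips subtrees lying entirely outside [low, high].
  range(low, high, node = this.root, out = []) {
    if (!node) return out;
    if (low < node.key) this.range(low, high, node.left, out);
    if (low <= node.key && node.key <= high) out.push(...node.values);
    if (node.key < high) this.range(low, high, node.right, out);
    return out;
  }
}

const idx = new TreeIndex();
[[5, "e"], [2, "b"], [8, "h"], [2, "b2"], [7, "g"]].forEach(([k, v]) => idx.insert(k, v));
console.log(idx.get(2));
console.log(idx.range(3, 8));
```

Without rebalancing, inserting keys in sorted order degrades this tree into a linked list, which is exactly the problem an AVL tree avoids.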
A schema
As a next step I wrote code that lets the client code define a schema: methods like createTable() etc. Schemas are typically associated with SQL, but even NoSQL databases sort of have a schema; they usually require you to mark the ID field and any other fields you want to search on. You can make your schema as fancy as you want, but you typically want to model at least which column(s) serve as the primary key and which fields will be searched on frequently and need an index.
Creating a data structure to store a table
I decided to use the tree I created in the first step to store my items. These were simple JS objects. Having defined which field contains the PK, I could simply insert the item into the tree using that field's value as the key. This gives me quick lookup by ID (range).
Next I added another tree for every column that needs an index. In these trees I did not store the full record, but only the key. So to fetch a customer by last name, I would first use the index on last name to get the ID, then the primary key index to get the actual record. The reason I did not just store the (reference to the) actual object is that it makes set operations a little bit simpler (see the next step).
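The two-step lookup described here can be sketched as follows. To keep the example short, Map and Set objects stand in for the trees, and the class and method names (Table, idsBy, findBy) are invented for illustration rather than taken from WebDB.

```javascript
// Maps stand in for the trees: one primary-key index plus one index per field.
class Table {
  constructor(primaryKey, indexedFields) {
    this.primaryKey = primaryKey;
    this.records = new Map();   // primary key -> full record
    this.indexes = new Map();   // field -> (value -> Set of primary keys)
    for (const field of indexedFields) this.indexes.set(field, new Map());
  }

  insert(record) {
    const id = record[this.primaryKey];
    this.records.set(id, record);
    // Secondary indexes store only the primary key, never the record itself.
    for (const [field, index] of this.indexes) {
      const value = record[field];
      if (!index.has(value)) index.set(value, new Set());
      index.get(value).add(id);
    }
  }

  idsBy(field, value) {
    return this.indexes.get(field).get(value) || new Set();
  }

  // Step 1: secondary index -> IDs. Step 2: primary index -> records.
  findBy(field, value) {
    return [...this.idsBy(field, value)].map(id => this.records.get(id));
  }
}

const customers = new Table("id", ["lastName"]);
customers.insert({ id: 1, firstName: "John", lastName: "Smith" });
customers.insert({ id: 2, firstName: "Mary", lastName: "Jones" });
customers.insert({ id: 3, firstName: "Anna", lastName: "Smith" });
const smiths = customers.findBy("lastName", "Smith");
console.log(smiths);
```

This sketch does not handle updates or deletes, which would also need to remove stale IDs from the secondary indexes.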
Querying
Now that we have a table with indexes for PK and search fields, we can implement querying. I did not take this very far as it becomes complicated quickly, but you can get some nice functionality with just some basics. WebDB does not implement joins; all queries operate only on a single table. But once you understand this you see a pretty clear (though long and winding) path to doing joins and other complicated stuff as well.
In WebDB, to get all customers with firstName = 'John' and city = 'New York' (assuming those are two search fields), you would write something like:
var webDb = ...
var johnsFromNY = webDb.customers.get({
    firstName: 'John',
    city: 'New York'
});
To solve it, we first do two lookups: we get the set X of all IDs of customers named 'John' and we get the set Y of all IDs of customers from New York. We then perform an intersection on these two sets to get all IDs of customers that are both named 'John' AND from New York. We then run through our set of resulting IDs, getting the actual record for each one and adding it to the result array.
Using the set operators like union and intersection we can perform AND and OR searches. I only implemented AND.
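A minimal sketch of those set operations, using hand-built indexes so it stands on its own. The data and the variable names byFirstName and byCity are made up for the example, and union is shown only to illustrate the OR case that was not implemented in WebDB.

```javascript
// AND: keep only the IDs present in both sets.
function intersect(a, b) {
  const [small, large] = a.size <= b.size ? [a, b] : [b, a];
  return new Set([...small].filter(id => large.has(id)));
}

// OR: every ID present in either set.
function union(a, b) {
  return new Set([...a, ...b]);
}

// Hypothetical table contents and two secondary indexes (value -> Set of IDs).
const records = new Map([
  [1, { id: 1, firstName: "John", city: "New York" }],
  [2, { id: 2, firstName: "John", city: "Boston" }],
  [3, { id: 3, firstName: "Mary", city: "New York" }]
]);
const byFirstName = new Map([["John", new Set([1, 2])], ["Mary", new Set([3])]]);
const byCity = new Map([["New York", new Set([1, 3])], ["Boston", new Set([2])]]);

const x = byFirstName.get("John");    // set X: customers named John
const y = byCity.get("New York");     // set Y: customers from New York

// Resolve the resulting IDs to full records through the primary index.
const johnsFromNewYork = [...intersect(x, y)].map(id => records.get(id));
const johnsOrNewYorkers = [...union(x, y)].map(id => records.get(id));
console.log(johnsFromNewYork);
console.log(johnsOrNewYorkers.length);
```

Iterating over the smaller set in intersect keeps the work proportional to the more selective condition.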
Doing joins would (I think) involve creating temporary tables in memory, then populating them as the query runs with the joined results, then applying the query criteria to the temp table. I never got there. I attempted some syncing logic next but that was too ambitious and it went downhill from there :)

How to get a list of objects in which each object contains a field with substring "ho"

I have an object Person with fields firstName,lastName etc.
After retrieving the list of available persons, I need to find the persons whose firstName contains the substring "ho". How can I do this?
I would have used LIKE with wildcards, but my application is hosted on Google App Engine, so I can't use LIKE in the SQL query. I tried it before and it did not work. Any suggestions on how I can do this without traversing each object in the list?
You really need to think of the datastore in a different manner than a relational database. What that essentially means is that you have to be smart about how you store your data to get at it. Without having full text search, you can use a strategy to mimic full text search by creating a key list of searchable words and storing them in a child entity in an entity group. Then you can construct your query to return the keys of the parent object that match your "query string". This allows you to have indexing without the overhead of full text search.
Here's a great example of it using Objectify but you can use anything to accomplish the same thing (JPA, JDO, low level API).
http://novyden.blogspot.com/2011/02/efficient-keyword-search-with-relation.html
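As a rough sketch of the keyword-list idea adapted to substring matching: every substring of firstName is precomputed as a token when the person is stored, so that the query becomes an exact match on a token. An in-memory Map stands in for the child entities here, and the function names are invented; this is not the Objectify code from the linked post.

```javascript
// All substrings of `text` with at least `minLength` characters, lower-cased.
function substringTokens(text, minLength = 2) {
  const s = text.toLowerCase();
  const tokens = new Set();
  for (let start = 0; start < s.length; start++) {
    for (let end = start + minLength; end <= s.length; end++) {
      tokens.add(s.slice(start, end));
    }
  }
  return tokens;
}

// token -> Set of person keys; stands in for the stored keyword entities.
function buildTokenIndex(persons) {
  const index = new Map();
  for (const person of persons) {
    for (const token of substringTokens(person.firstName)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(person.key);
    }
  }
  return index;
}

const persons = [
  { key: "p1", firstName: "Thomas" },
  { key: "p2", firstName: "John" },
  { key: "p3", firstName: "Rhonda" }
];
const tokenIndex = buildTokenIndex(persons);

// The substring search is now an equality lookup on the token.
const matches = [...(tokenIndex.get("ho") || [])];
console.log(matches);
```

The cost is storage: the number of tokens grows roughly with the square of the name length, so this only makes sense for short fields.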
You can't, at least not if you're using the BigTable-based datastore.
