How to develop search box auto completion from database? - database

I have seen many e-commerce websites that provide a search box for finding products. In most of them the search field auto-completes: when we type a letter into the field, it shows entries from the database that include that letter as suggestions. I know the basics of developing that functionality.
But what if the database contains a huge amount of data?
For example, e-commerce websites like Flipkart and Amazon have a lot of products in their databases. So if a user enters a letter in the search field, the site has to search all the data in the database, match the entries that include that letter, and display them as suggestions. These websites seem to do this almost instantly. I wonder how they achieve that. I can't work out which technologies they are using.
As a learner, I want to know the functional flow and, if possible, see a demo of that feature.

I think your question can be divided into two parts: 1) how to design the database for the search technology, and 2) how to implement an effective search. Both belong to the field of search engine technology.
For Q1, you can create a table that stores the search keywords, and in that table you should add a column (or something similar) that describes the "search weight". A view (a materialized one, where the database supports it) is a practical way to speed up access to the data.
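The keyword table with a "search weight" column can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the table name `search_keyword`, the column names and the integer weights are assumptions made for the example, not something prescribed above.

```python
import sqlite3


def build_keyword_index(rows):
    """Create an in-memory keyword table; rows is a list of (keyword, weight)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per suggestion, plus its search weight.
    conn.execute(
        "CREATE TABLE search_keyword ("
        " keyword TEXT PRIMARY KEY,"
        " weight INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO search_keyword (keyword, weight) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


def suggest(conn, prefix, limit=5):
    """Return up to `limit` keywords starting with `prefix`, heaviest first."""
    if not prefix:
        return []
    # Escape LIKE wildcards so user input is always treated literally.
    escaped = (
        prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    cursor = conn.execute(
        "SELECT keyword FROM search_keyword "
        "WHERE keyword LIKE ? ESCAPE '\\' "
        "ORDER BY weight DESC, keyword ASC LIMIT ?",
        (escaped + "%", limit),
    )
    return [row[0] for row in cursor]
```

Calling `suggest(conn, "ip")` returns the matching keywords ordered by weight, so the most popular suggestions come first. Matching on a prefix rather than on "contains" keeps the query cheap; how well the database can use an index for it depends on the engine and its collation settings.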
For Q2, search engine techniques are no longer mysterious. Some open source projects, such as Apache Lucene, provide search engine features; please have a look at Apache Lucene.
More discussion:
In particular, in your front end (for example an ASP/JSP page, or even a simple HTML page) you should use some scripting, e.g. Ajax, to show the suggestions as a popup or dropdown. Of course, plain DOM JavaScript plus a DIV can do it too, but jQuery or another library makes it easier. Here is an example.
Here is the backend system example.
To reduce the load on the host and the network bandwidth required, the front-end JavaScript should activate the autocomplete feature only once more than three characters have been typed.
Please keep in mind that in your actual application the server has limited computing capacity and the client page usually has many elements; both can reduce user friendliness. Do not make the request and response too complex.
An alternative can be a FIFO scheme: keep some common search keywords in a "cache" or a temp table/view, so that the amount of data to search is reduced.
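The two tricks above (a minimum input length and a FIFO cache of common searches) could look roughly like this. It is a sketch only: the class name `FifoSuggestionCache`, the `fetch` callback standing in for the real database query, and the default sizes are all assumptions for illustration.

```python
from collections import OrderedDict


class FifoSuggestionCache:
    """Cache suggestion lists per prefix and evict the oldest entry first."""

    def __init__(self, fetch, capacity=1000, min_chars=4):
        self._fetch = fetch          # hypothetical callable: prefix -> list
        self._capacity = capacity    # maximum number of cached prefixes
        self._min_chars = min_chars  # "more than three characters"
        self._entries = OrderedDict()

    def lookup(self, prefix):
        key = prefix.strip().lower()
        # Too short: do not query the backend at all.
        if len(key) < self._min_chars:
            return []
        if key in self._entries:
            return self._entries[key]
        result = self._fetch(key)
        self._entries[key] = result
        # FIFO eviction: drop the entry that was inserted first.
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)
        return result
```

A cache hit does not move the entry to the back of the queue, which is what makes this FIFO rather than LRU. Whether FIFO or LRU fits better depends on how stable the popular searches are.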
There are many possible solutions; these are the tricks I can think of at the moment.
regards

Related

Creating an autocomplete search box with large data set using Ruby on Rails and React

I am building a web application using React on the front end and Ruby on Rails on the backend. I am a newbie developer but am working on a real world project to help local pharmacies better respond to patient medication requests. I was hoping to get some insight on a search box feature I am working to implement.
I want to create this feature so that a patient can search for a medication name as they type into a search box that generates autocomplete suggestions. We are using a dataset of 10,000 medications, so the challenge here is how to deal with this large dataset while making the user experience pleasant. Last thing I want to do is send a request every time a user types in a word to fetch relevant medication names and have it be laggy due to round trip response.
One approach I have thought of is to seed the medications into a PostgreSQL database, then, when the patient loads the page with the search box, fetch all the records and perhaps use some sort of cache to store them in the browser. The Rails documentation seems to have some good information on this.
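To make the "fetch everything once, then filter locally" idea concrete, here is a small sketch of prefix lookup over a sorted in-memory list using binary search. It is written in Python purely for illustration (the same idea carries over to JavaScript in the browser), and the function names are made up for this example.

```python
from bisect import bisect_left


def build_name_index(names):
    """Normalize, de-duplicate and sort the names once, after loading them."""
    return sorted({name.strip().lower() for name in names if name.strip()})


def prefix_matches(index, prefix, limit=10):
    """Return up to `limit` names that start with `prefix`."""
    key = prefix.strip().lower()
    if not key:
        return []
    # Binary search finds the first name that is >= the prefix.
    start = bisect_left(index, key)
    matches = []
    for name in index[start:start + limit]:
        if not name.startswith(key):
            break  # sorted order: no later name can match either
        matches.append(name)
    return matches
```

With 10,000 names, each lookup touches only a handful of entries, so no network round trip is needed per keystroke once the list has been loaded.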
I was also looking into Elasticsearch, since I hear it's good for retrieval from large datasets.
Does anyone have any suggestions on any other concepts or technologies I could explore for the particular feature I want to build without sacrificing user experience?

how to prepare data for domain specific chat-bot

I am trying to make a chatbot. All the chatbots I have seen are built from structured data. I looked at Rasa, IBM Watson and other well-known bots. Is there any way to convert unstructured data into some sort of structure that can be used for bot training? Consider the paragraph below:
Packaging unit
A packaging unit is used to combine a certain quantity of identical items to form a group. The quantity specified here is then used when printing the item labels so that you do not have to label items individually when the items are not managed by serial number or by batch. You can also specify the dimensions of the packaging unit here and enable and disable them separately for each item.
It is possible to store several EAN numbers per packaging unit since these numbers may differ for each packaging unit even when the packaging units are identical. These settings can be found on the Miscellaneous tab:
There are also two more settings in the system settings that are relevant to mobile data entry:
When creating a new item, the item label should be printed automatically. For this reason, we have added the option ‘Print item label when creating new storage locations’ to the settings. When using mobile data entry devices, every item should be assigned to a storage location, where an item label is subsequently printed that should be applied to the shelf in the warehouse to help identify the item faster.
How can I build the bot from data like this? Any lead would be highly appreciated. Thanks!
Would the idea in this picture work? [image: just_a_thought]
The data you are showing seems to be a good candidate for passage search. Basically, you would answer a user's question with the most relevant paragraph found in your training data. This use case is handled by the Watson Discovery service, which can analyze unstructured data like yours; you then query the service with the input text and it answers with the closest passage found in the data.
From my experience, you can also get good results by implementing your own custom TF/IDF algorithm tailored to your use case (TF/IDF is a nice similarity measure that, for example, down-weights stopwords for you).
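A custom TF/IDF passage search of the kind mentioned above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not a tuned implementation: the tokenizer, the smoothed IDF formula and the function names are choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tfidf(passages):
    """Return (idf, vectors): term weights and one sparse vector per passage."""
    docs = [Counter(tokenize(p)) for p in passages]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    n = len(docs)
    # Smoothed IDF: terms found in many passages get a low weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_passage(query, passages, idf, vectors):
    """Return (passage, score) for the closest passage, or (None, 0.0)."""
    counts = Counter(tokenize(query))
    q = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(q, v) for v in vectors]
    best = max(range(len(passages)), key=scores.__getitem__)
    if scores[best] == 0.0:
        return None, 0.0
    return passages[best], scores[best]
```

Each paragraph of the documentation would be one entry in `passages`; the bot answers with the paragraph returned by `best_passage`, or falls back to a default reply when nothing matches. Stemming (so that "store" matches "stored") is deliberately left out to keep the sketch short.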
Now, if your goal is to bootstrap a rule-based chatbot from this kind of data, the data is not ideal. For a rule-based chatbot, the best data would be actual conversations in which users ask questions about the target domain and a subject matter expert answers them. With the data you have, you might at least be able to do some analysis that helps you pinpoint the relevant topics and domains the chatbot should handle; however, I think you will have a hard time using it to bootstrap a set of intents (the questions users will ask) for a rule-based chatbot.
TLDR
If I wanted to use a Watson service, I would start with Watson Discovery. Alternatively, I would implement my own search algorithm, starting with TF/IDF (which maps rather nicely to your proposed solution).

Spatial Search Objectify, appengine

I want to use Objectify for spatial search. I have entities that have a longitude and latitude associated with them. The latitude and longitude information is dynamic, e.g. service providers (electricians, carpenters) in a city. I want to implement a query that gives me the service providers offering a specific service within a 1 km radius. Searching on Google reveals the following options:
Use Objectify with geohashes - not sure how accurate and scalable this solution is.
Use Google Search - it would need the entities (or parts of them) duplicated as documents, and I am not sure whether it can support dynamically updated locations.
Use another database such as MongoDB.
Assuming a few million entities with dynamically updated latitude/longitude, please suggest an appropriate option.
thanks
Ittium
I've used geohashes. It works, although you end up selecting more data than the exact bounds you are looking for and then filtering out the extra. This might or might not be a good solution depending on your specific application. It requires writing more code but has fewer moving parts (all in the datastore).
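The "select a bit too much, then filter" pattern can be sketched like this. The geohash encoding follows the standard algorithm (interleaved longitude/latitude bits, base-32 alphabet); everything else (the in-memory dict standing in for the datastore, the function names, the precision of 5) is an assumption for illustration. The cell lookup relies on precision-5 cells being larger than the search box, which holds for a 1 km radius away from the poles.

```python
import math
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=5):
    """Standard geohash: alternate longitude/latitude bits, 5 bits per char."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def build_geo_index(providers, precision=5):
    """Group (name, lat, lon) tuples by geohash cell; stands in for the datastore."""
    index = defaultdict(list)
    for name, lat, lon in providers:
        index[geohash_encode(lat, lon, precision)].append((name, lat, lon))
    return index


def nearby(index, lat, lon, radius_km, precision=5):
    """Over-select by cell, then filter by exact distance."""
    dlat = radius_km / 111.32  # roughly km per degree of latitude
    dlon = dlat / max(math.cos(math.radians(lat)), 1e-6)
    # Cells touched by the bounding box around the search point.
    cells = {
        geohash_encode(lat + a, lon + b, precision)
        for a in (-dlat, 0.0, dlat)
        for b in (-dlon, 0.0, dlon)
    }
    hits = []
    for cell in cells:
        for name, plat, plon in index.get(cell, []):
            if haversine_km(lat, lon, plat, plon) <= radius_km:
                hits.append(name)
    return sorted(hits)
```

In the datastore, the geohash string would be an indexed property and each cell lookup an equality or prefix query; when a provider moves, only its geohash property needs to be rewritten.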
Google Search and "other database" are basically the same architectural pattern: use the task queue to replicate updates to an external index. If you want a quick solution, the search service is probably the easiest to wrap your head around.
Just pick one solution and run with it for a while. You can always reindex the data into a different solution.
It really depends on your query rate, but I usually prefer to use Google Search. Building and maintaining the documents is pretty simple, and you get a separate quota for handling these queries.

Storing 100k map markers in App Engine

I'm designing yet another "Find Objects near my location" web site and mobile app.
My requirements are:
Store up to 100k objects;
Query for objects that are close to the point (my location, city, etc). And other search criteria (like object type);
Display results on the Google Maps with smooth performance.
Let user filter objects by object time.
I'm thinking about using Google App Engine for this project.
Could you recommend the best data storage option for this?
A couple of words about a dynamic data loading strategy would also help.
I feel somewhat overwhelmed by the options at the moment and am looking for hints on where I should continue my research.
Thanks a lot!
I'm going to assume that you are using the datastore. I'm not familiar with Google Cloud SQL (which I believe aims to offer MySQL-like features in the cloud), so I can't say whether it can do geospatial queries.
I've been looking into the whole "get locations in proximity of a location" problem for a while now. I have some good and bad news for you, unfortunately.
The best way to do the proximity search in the Google environment is via the Search Service (https://developers.google.com/appengine/docs/python/search/ or find the Java link). The reason is that it supports a "Geopoint Field" and allows you to query on it.
OK, cool, so there is support, right? However, "A query is complex if its query string includes the name of a geopoint field or at least one OR or NOT boolean operator". The free quota for complex search queries is 100/day. Beyond that, it costs 60 cents per 10,000 queries. Depending on your application, this may be an issue.
I'm not too familiar with the Google Maps API, but you might be able to pull off something like this: https://developers.google.com/maps/articles/phpsqlsearch_v3
My current project/problem involves moving locations rather than "static" ones (stores, landmarks, etc.). I've decided to go with Amazon's DynamoDB, which has a library that supports geospatial indexing: http://aws.amazon.com/about-aws/whats-new/2013/09/05/announcing-amazon-dynamodb-geospatial-indexing/

Web data extraction and data mining; Scraping vs Injection and how to get data.. like yesterday

I feel like I should almost give a friggin synopsis of this/these lengthy question(s)..
I apologize if all of these questions have been answered specifically in a previous question/answer post, but I have been unable to locate any that specifically addresses all of the following queries.
This question involves data extraction from the web (i.e. web scraping, data mining, etc.). I have spent almost a year researching these fields and how they can be applied to a certain industry. I have also familiarized myself with PHP and MySQL/phpMyAdmin.
In a nutshell, I am looking for a way to extract information from a site (probably several gigs' worth) as fast and efficiently as possible. I have tried web scraping programs like Scrapy and WebHarvy. I have also experimented with programs like HTTrack. All have their strengths and weaknesses. I have found that WebHarvy works pretty well, yet it has its limitations when scraping images that are stored in gallery widgets. I also find that many of the sites I am extracting from use other methods to make mining data a pain. It would take months to extract the data using WebHarvy, which I can't complain about given that I'd be extracting millions of rows of data exported in CSV format into Excel. But again, images and certain Ajax widgets throw the program off when trying to extract image files.
So my questions are as follows:
Are there any quicker ways to extract said data?
Is there any way to get around the webharvey image limitations (ie only being able to extract one image within a gallery widget / not being able to follow sub-page links on sites that embed their crap funny and try to get cute with coding)?
Are there any ways to bypass site search form parameters that limit the number of search results (i.e. obtaining all business listings within an entire state instead of being limited to a county per search form restrictions)?
Also, this is public information, so it cannot be copyrighted; anybody can take it :) (case in point: Feist Publications v. Rural Telephone Service). Extracting information is extracting information. It's legal to extract as long as we are talking about facts/public information.
So with that said, wouldn't the most efficient method (grey area here) of extracting this "public" information (assuming vulnerabilities existed), be through the use of sql injection?... If one was so inclined? :)
As a side question, just how effective is Tor at obscuring one's IP address? Lol
Any help, feedback, suggestions or criticism would be greatly appreciated. I am by no means an expert in any of the above mentioned fields. I am just a motivated individual with a growing interest in programming and automation who has a lot of crazy ideas. Thank you.
You may be better off writing your own Linux command-line scraping program using either a headless browser library like PhantomJS (JavaScript), or a test framework like Selenium WebDriver (Java).
Once you have completed your scraping program, you can scale it up by installing it on a cloud server (e.g. Amazon EC2, Linode, Google Compute Engine or Microsoft Azure) and duplicating the server image to as many instances as required.
