Solr autosuggest tutorial For Edge Ngram - solr

Can someone provide a link/blog/anything where I can find a step by step tutorial of Solr and autosuggest.
I want to understand how to complete configuring schema , configurations including field types analyzers and tokens

There is an article specifically on autosuggesters. And another one on doing multi-field autosuggester.
There are also several ways to implement autocomplete. You can use ngrams approach, which is what I use for Solr/Lucene search-based documentation. You can find the source code for that on Solr-Javadoc repository.
There is another one doing ngrams and edge-ngrams from a couple of years ago.
You could also use facets for some scenarios.

Related

Any good guides for writing custom Riak SOLR search analyzers?

In short, I need to search against my Riak buckets via SOLR. The only problem is, is that by default SOLR searches are case-sensitive. After some digging, I see that I need to write a custom SOLR text analyzer schema. Anyone have any good references for writing search analyzer schemas?
And finally, when installing a new schema for an index, is re-indexing all objects in a bucket necessary to show prior results in a search (using new schema)?
RTFM fail.... I swear though, getting to this page was not easy
http://docs.basho.com/riak/latest/dev/advanced/search-schema/#Defining-a-Schema

Part-of-speech tagger mapping in Solr

I am currently using Apache Solr to build a search engine. The queries in Solr are of the field:value format. Now I want to use a part-of-speech tagger to separate the subject, verb and predicate and search the values in each fields. For example, if I input "Who likes Starbucks" then I need some code to give me "q=subject:*&verb=likes&object=starbucks". Is there any library that can handle this job? Thank you!
I think several people have used UIMA for this, see solr wiki
There are a number of POS taggers. Here is another StackOverflow posting about this: What is a good Java library for Parts-Of-Speech tagging?

Index my own data in Solr

I am new to Solr and have a couple of questions to ask help from more experienced people:
I am able to get example running, however what is exactly the start.jar?
I know by running "java -jar start.jar", i can start solr. But do i run this command after i index my own data, not the given sample data? if not, what should i do to run my own solr instance with my own indexed data?
I do need to index my own sample data, not related to the given example solr thing at all. How exactly should i do it? Should i copy the example directory then modify the fields in sechema.xml? should i then run the post.sh accordingly to index the data like what i did to set up the example solr?
Thanks a lot for your help!
Steps:
Decide what will be the document structure u store in SOLR. (Somewhat like creating the schema of a relational DB for one table).
remove the example core and create your own core with that schema
once the schema works with no errors (you check the server logs that hosts the SOLR app) You can start feed the data you have into SOLR. You POST it via HTTP in a specific structure which is documented in the SOLR Wiki. Various frameworks have some classes to handle that.
Marked as Wiki as this is too broad an answer for someone who did not bother to RTFM...
Dear custom indexing is not a difficult task as I have worked on it just a few days ago. First you need to write your documnet is xml,csv or json( format supported in solr) containing fields according to your schema.xml, then run following command in example/exampledocs
For a document mydoc.xml
./post.sh mydoc.xml
if in output, status value is 0 then indexing is successful and you can search your document in solr
Reference:http://www.solrtutorial.com/solr-in-5-minutes.html
Though the question is old, but I am writing for new visitors with same issue. The question can't be answered in few words. You must understand what Solr is, whats Solr Admin UI, why we need Solr instead a relational database. Then you can understand how to import sample data. I have recently published two articles i.e. Solr Introduction and Importing Sample Data, these might be helpful for you.
http://www.devtrainings.com/2017/03/apache-solr-introduction-and-server.html
http://www.devtrainings.com/2017/03/apache-solr-index-data-and-run-search.html

Does Solr have an equivalent to CompassQueryBuilder?

I am rewriting our company's search functionality to use Solr instead of Compass. Our old code is using CompassQueryBuilder.CompassQueryStringBuilder to build a query out of a list of keywords. The keywords may have spaces in them: for example: "john smith", "tom jones".
Is there an existing facility I can use in Solr to replicate this functionality?
The closest thing I know for SolrJ is the solrj-criteria project. It seems to be currently unmaintained though.
Solr offers a wide variety of querying and indexing options. So fields that contain keywords with spaces in it, can be made possible by defining a custom type in the configuration file (see here). Queries with spaced keywords in it can be made possible by specifying a custom QueryParser. (see here)
Solr itself doesn't offer a QueryStringBuilder in an API. Actually, Solr itself doesn't offer any API classes at all, since all interaction is done by posting messages over Http. There are client libraries for Java, .NET and PHP etc. In the SolrNet api there exists a SolrMultipleCriteriaQuery, which is quite similar to the CompassQueryStringBuilder.

Sunspot / Solr / Lucene : Find similar article

Let's say we have a list of articles that are indexed by sunspot/solr/lucene (or any other search engine).
How can be used to find similar articles with a given article?
Should this be done with a resuming tool, like:
http://www.wordsfinder.com/api_Keyword_Extractor.php, or termextract from http://developer.yahoo.com/yql/console, or http://www.alchemyapi.com/api/demo.html ?
It seems you're looking for the MoreLikeThis feature.
What you are trying to do is very similar to the task I outlined in this answer.
In brief, you need to generate a summary for each document that you can use as the query to compare it with every other. A document summary could be as simple as the top N terms in that document (excluding stop words). You can generate top N terms from a Lucene document pretty easily without using any 3rd party tools, there are plenty examples on SO and the web to do this.

Resources