Index my own data in Solr

I am new to Solr and have a couple of questions for more experienced people:
I can get the example running, but what exactly is start.jar?
I know that running "java -jar start.jar" starts Solr. But do I run this command after I have indexed my own data instead of the given sample data? If not, what should I do to run my own Solr instance with my own indexed data?
I need to index my own sample data, which is not related to the bundled Solr example at all. How exactly should I do it? Should I copy the example directory and then modify the fields in schema.xml? Should I then run post.sh to index the data, the same way I did when setting up the example?
Thanks a lot for your help!

Steps:
Decide what document structure you will store in Solr (somewhat like creating the schema of a single table in a relational DB).
Remove the example core and create your own core with that schema.
Once the schema loads with no errors (check the logs of the server hosting the Solr app), you can start feeding your data into Solr. You POST it over HTTP in a specific structure that is documented in the Solr Wiki. Various frameworks have classes to handle that.
Marked as wiki, as this is too broad an answer for someone who did not bother to RTFM...
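As a rough illustration of that last step, the Python sketch below serializes plain dictionaries into Solr's XML update message (an `<add>` element containing `<doc>` elements with `<field name="...">` children) and POSTs it using only the standard library. The core name `mycore`, the default port 8983, and the field names are assumptions; every field you send must exist in your own schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed URL: adjust host, port, and core name ("mycore" is hypothetical).
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"

def build_add_xml(docs):
    """Serialize a list of dicts into a Solr XML <add> message.

    Each key must be a field declared in your schema.xml.
    A list value produces one <field> element per item (multi-valued field).
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, <, > for us
    return ET.tostring(add, encoding="unicode")

def post_to_solr(xml_body, url=SOLR_UPDATE_URL):
    """POST the message over HTTP; this needs a running Solr instance."""
    req = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    body = build_add_xml([{"id": "book-1", "title": "Solr basics",
                           "tags": ["search", "lucene"]}])
    print(body)  # only prints the message; call post_to_solr(body) to send it
```

Keeping serialization separate from the HTTP call makes it easy to inspect the message before any network traffic happens.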

Custom indexing is not a difficult task; I worked on it just a few days ago. First, write your document in XML, CSV, or JSON (formats supported by Solr), with fields that match your schema.xml, then run the following command in example/exampledocs.
For a document mydoc.xml:
./post.sh mydoc.xml
If the status value in the output is 0, indexing succeeded and you can search for your document in Solr.
Reference: http://www.solrtutorial.com/solr-in-5-minutes.html
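Checking that status value programmatically can be sketched as follows. It assumes the XML response format that post.sh prints (a `responseHeader` list containing an `int` named `status`); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def solr_status(response_xml):
    """Return the responseHeader status from a Solr XML response (0 means success)."""
    root = ET.fromstring(response_xml)
    node = root.find("./lst[@name='responseHeader']/int[@name='status']")
    if node is None:
        raise ValueError("response has no responseHeader/status element")
    return int(node.text)

def indexing_succeeded(response_xml):
    """True when Solr reported status 0 for the update request."""
    return solr_status(response_xml) == 0

if __name__ == "__main__":
    sample = ('<response><lst name="responseHeader">'
              '<int name="status">0</int><int name="QTime">5</int>'
              '</lst></response>')
    print(indexing_succeeded(sample))
```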

Though the question is old, I am writing for new visitors with the same issue. The question can't be answered in a few words. You need to understand what Solr is, what the Solr Admin UI is, and why you would use Solr instead of a relational database. Then you can understand how to import sample data. I recently published two articles, "Solr Introduction" and "Importing Sample Data", which might be helpful.
http://www.devtrainings.com/2017/03/apache-solr-introduction-and-server.html
http://www.devtrainings.com/2017/03/apache-solr-index-data-and-run-search.html

Related

Does SOLR support percolation

ElasticSearch has a percolator for prospective search. Does Solr have a similar feature, where you define your queries upfront? If not, is there an effective way to implement this myself on top of existing Solr features?
Besides what BunkerMentality said, it is not hard to build your own percolator. What you need:
Can the queries you want to run be easily expressed in pure Lucene syntax? If so, you are good; if not, you need to convert them to pure Lucene queries. Build them once and keep them in memory as Lucene queries.
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions of docs a day, and it worked fine.
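The doc-at-a-time idea can be illustrated with a toy, pure-Python stand-in. This is not Lucene's MemoryIndex; it only mimics the shape of the approach: queries are analyzed once and kept in memory, each incoming document gets its own throwaway single-document index, and every stored query runs against it. The AND-of-terms query model and all names here are simplifying assumptions.

```python
import re
from dataclasses import dataclass

def analyze(text):
    """Toy analyzer: lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

@dataclass(frozen=True)
class StoredQuery:
    query_id: str
    field: str
    required_terms: frozenset  # AND semantics: every term must be present

def make_query(query_id, field, text):
    """Analyze the query text once, up front, and keep the result in memory."""
    return StoredQuery(query_id, field, frozenset(analyze(text)))

def single_doc_index(doc):
    """Throwaway index for one document: field name -> set of terms."""
    return {field: set(analyze(value)) for field, value in doc.items()}

def percolate(doc, queries):
    """Return the ids of all stored queries that match this one document."""
    index = single_doc_index(doc)
    return [q.query_id for q in queries
            if q.required_terms <= index.get(q.field, set())]

if __name__ == "__main__":
    queries = [
        make_query("q1", "title", "Solr percolator"),
        make_query("q2", "body", "cassandra"),
    ]
    doc = {"title": "Building a Solr percolator", "body": "Nothing about databases"}
    print(percolate(doc, queries))
```

Because each index holds a single document, its cost is small and it can be discarded as soon as all queries have run, which is what makes the approach scale with document throughput.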
It's listed as an open feature request, SOLR-4587, in the Solr JIRA, but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this:
It's a Solr update processor based on Luwak.

Any good guides for writing custom Riak SOLR search analyzers?

In short, I need to search my Riak buckets via Solr. The only problem is that, by default, Solr searches are case-sensitive. After some digging, I see that I need to write a custom Solr schema that defines a text analyzer. Does anyone have good references for writing search analyzer schemas?
And finally, when installing a new schema for an index, is it necessary to re-index all objects in a bucket so that previously stored objects show up in search results under the new schema?
RTFM fail... I swear, though, getting to this page was not easy:
http://docs.basho.com/riak/latest/dev/advanced/search-schema/#Defining-a-Schema
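Why lowercasing fixes case-sensitivity, and why a schema change generally means re-indexing, can be shown with a toy model in Python. In a real Solr/Riak schema the equivalent is an `<analyzer>` with a tokenizer plus a filter such as `solr.LowerCaseFilterFactory`; the functions below are hypothetical stand-ins, not Riak or Solr APIs. The key point is that terms are stored as they were analyzed at index time, so objects indexed under the old schema keep their old terms until they are re-indexed.

```python
def whitespace_tokenizer(text):
    return text.split()

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def analyze(text, filters):
    """Run the tokenizer, then each token filter in order."""
    tokens = whitespace_tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens

def build_index(docs, filters):
    """Inverted index: term -> set of doc ids, built with the index-time analyzer."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, filters):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query, filters):
    """AND query; the query text goes through the query-time analyzer."""
    terms = analyze(query, filters)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

if __name__ == "__main__":
    docs = {"d1": "Riak Search Basics"}
    old_index = build_index(docs, [])                      # case-sensitive terms
    print(search(old_index, "riak", []))                   # no match: "Riak" != "riak"
    print(search(old_index, "Riak", [lowercase_filter]))   # new query analyzer, old index: still no match
    new_index = build_index(docs, [lowercase_filter])      # after re-indexing
    print(search(new_index, "RIAK", [lowercase_filter]))
```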

How to make a search engine with Nutch and Cassandra?

I am trying to implement a website search engine in Java as an applet. I have used Nutch as the web crawler and Cassandra as my database. I have to use a NoSQL database (because my teacher wants me to). Now my question is: what should I do next to complete my search engine?
I have googled a lot, but most sites are about Nutch and Solr, and they build search engines by integrating those two. Since Solr is itself somewhat of a database, I don't know what to do. Do I have to use Solr too to complete my search engine? Is it wise to use two databases (Solr and Cassandra), or should I do something else?
Please remember I have to use Cassandra.
And if I have misunderstood something, please explain it first before giving me a downvote :D
I will be really thankful for your help; I have gotten somewhat confused.
By the way, does Solr count as a NoSQL database? Excuse me, I am new to all of this.
Check out Solr's Data Import Handler and see whether you think it would work. It lets you query your database and store the results in Solr, where Solr can then work with them. Nutch also has very good integration with Solr, should you choose to use it.
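Data Import Handler itself is configured inside Solr rather than written as code, but the idea behind it (run a query against your database, then map each row to a Solr document) can be sketched by hand. The sketch uses an in-memory SQLite table purely as a stand-in for whatever store you actually query; the table, columns, and Solr field names are assumptions, and the resulting documents would still have to be sent to Solr's update handler.

```python
import sqlite3

def rows_to_solr_docs(conn, sql, field_map):
    """Run `sql` and map each row to a Solr document (a dict).

    field_map maps a column name to the Solr field it should populate.
    NULL columns are skipped rather than sent as empty fields.
    """
    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]
    docs = []
    for row in cursor:
        record = dict(zip(columns, row))
        doc = {field: record[col] for col, field in field_map.items()
               if record.get(col) is not None}
        docs.append(doc)
    return docs

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT, body TEXT, title TEXT)")
    conn.execute("INSERT INTO pages VALUES (?, ?, ?)",
                 ("http://example.com/", "Hello Solr", None))
    docs = rows_to_solr_docs(conn, "SELECT url, body, title FROM pages",
                             {"url": "id", "body": "content", "title": "title"})
    print(docs)
```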

Solr's schema and how it works

Hey, so I started researching Solr and have a couple of questions about how it works. I know the schema defines what is stored and indexed in the Solr application. But I'm confused about how Solr knows that "content" is the content of the site, or that "url" is the URL.
My main goal is to extract phone numbers from websites, and I want Solr to nicely spit out 1234567890.
You need to define them in Solr's schema.xml by declaring all the fields and their field types. You can then query Solr on any of those fields.
Refer to: http://wiki.apache.org/solr/SchemaXml
Solr will not automatically index content from a website. You need to tell it how to index your content; Solr only knows the content you tell it about. Extracting phone numbers sounds pretty simple, so writing an update script or finding one online should not be an issue. Good luck!
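An update script along those lines might look like this Python sketch. It assumes North American 10-digit numbers (optionally prefixed with +1 or 1) written with spaces, dots, dashes, or parentheses, normalizes each one to bare digits such as 1234567890, and puts them in a hypothetical multi-valued `phone` field that you would have to declare in schema.xml yourself. Real-world formats vary far more than this regular expression covers.

```python
import re

# Assumed format: optional +1/1 prefix, then 3-3-4 digits separated by
# optional spaces, dots, or dashes; the area code may be in parentheses.
# The lookarounds keep the pattern from matching inside longer digit runs.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)

def extract_phone_numbers(text):
    """Return unique phone numbers as bare 10-digit strings, in order of appearance."""
    numbers = []
    for match in PHONE_RE.finditer(text):
        digits = "".join(match.groups())
        if digits not in numbers:
            numbers.append(digits)
    return numbers

def to_solr_doc(url, page_text):
    """Build a document for Solr; 'phone' is a hypothetical multi-valued field."""
    return {"id": url, "content": page_text, "phone": extract_phone_numbers(page_text)}

if __name__ == "__main__":
    text = "Call (123) 456-7890 or 123.456.7890, fax +1 555 010 9999."
    print(to_solr_doc("http://example.com/contact", text)["phone"])
```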

Key Points/Challenges while working with Apache Tika and Solr

Recently I got involved in a task that requires using Apache Solr (for document search) and Apache Tika (to extract metadata or plain text from documents).
I haven't integrated Solr and Tika yet, but I have worked with each of them individually. I have a set of questions about Apache Solr and Apache Tika, possibly at a beginner or intermediate level.
Practical work I have done with Solr: created a dummy database; configured schema.xml; ran the Solr server along with a program that fetches documents from the database and stores them in the Solr document index; made a simple client that fetches data from Solr via its JSON interface; and made a program that keeps a MySQL database in sync with Apache Solr's document index.
Practical work I have done with Tika: compiled and installed Tika, and learned its document-parsing capabilities.
My sample task statement:
Part of my project requires storing around 100,000 documents (DOC, PDF, TXT) for full-text search and searching them through a client interface (browser). The text of these documents is extracted by Apache Tika, pushed to a MySQL database, and later pushed from there into Apache Solr's document index.
At a simple programming level this task can be done, but
I would like to understand the challenges of managing the index, or anything else, in Solr, e.g.:
** At an advanced level, does it require optimizing Solr's open-source code?
** Even when Solr works properly, does it present any specific challenges?
** What key things need to be considered from the start so that Solr works properly?
** Do you think any extra tool needs to be developed to monitor Solr?
I hope my questions are clear.
** I would also like to know whether you have any experience using Apache Tika with Apache Solr, and any challenges or key things to consider.
Can you recommend any specific sources, documents, or anything else you feel would be helpful?
