Solr no content field = no highlighting


I want to add highlighting to my search results from Solr. My problem is that the query response doesn't contain any content field.
Search seems to work, but I guess that when I create the index I need to tell Solr to store the text or something.
I am running Solr on Windows.
java -Dc=aceapps -Dauto=yes -Ddata=files -Drecursive=yes -Dfiletypes=pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html -jar example/exampledocs/post.jar "\user\PowerBI"

It's hard to answer without seeing your schema.xml or solrconfig.xml.
Have a look at the Wiki for the standard highlighter:
https://cwiki.apache.org/confluence/display/solr/Standard+Highlighter
Solr also comes with an example core called techproducts and in there is a good example of highlighting. Look at the request handler and highlighter in the solrconfig.xml and the fields it's being applied to in that example.
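For reference, a highlighting request is mostly a matter of query parameters. Here is a minimal Python sketch that only builds the URL. It assumes the text was indexed into a field called content (a guess, check your schema), that this field is stored, since the standard highlighter builds snippets from stored values, and that Solr listens on localhost:8983. The helper name build_highlight_url is made up for illustration.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, core, query, field):
    """Return a /select URL that asks Solr to highlight matches in `field`."""
    params = {
        "q": query,
        "hl": "true",      # turn highlighting on
        "hl.fl": field,    # field to highlight (assumed to be stored)
        "hl.snippets": 3,  # up to three snippets per document
        "wt": "json",
    }
    return f"{base_url}/{core}/select?{urlencode(params)}"


# "aceapps" is the core from the question; "content" is an assumed field name.
url = build_highlight_url(
    "http://localhost:8983/solr", "aceapps", "content:report", "content"
)
print(url)
```

If highlighting works, the JSON response carries a separate "highlighting" section keyed by document id, next to the normal result list.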

Related

Solr 5 custom field and filter

I'm new to Solr. After trying the Solr 5 client, I want to try the Solr 5 source code.
So my questions are:
Can I create a custom field for my own core in Solr 5 by editing schema.xml? If that's possible, please tell me where the file is (it wasn't in my conf folder; should I create a new one?).
Is there any method for adding a custom field other than the Schema API?
Every time I create a new core and then index the files, my conf folder contains only currency.xml, elevate.xml, managed-schema (the generated schema), params.json, protwords.txt, solrconfig.xml, stopwords.txt and synonyms.txt, and there is no schema.xml. Did I miss something?
Is there a simple tutorial that explains custom filters in Solr 5?
I really appreciate your answers. Thanks.

When you create a core in Solr 5, schemaless mode is active by default. In this mode there is no hand-edited schema.xml: the schema lives in the generated managed-schema file, and changes are meant to go through the Schema API. If you want to manage the schema yourself, you can rename managed-schema to schema.xml and modify solrconfig.xml so that it no longer uses schemaless mode. In solrconfig.xml replace
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
with
<schemaFactory class="ClassicIndexSchemaFactory"/>
Solr will now use the schema.xml that you manage yourself.
The only mandatory configuration files are solrconfig.xml and, in your case, schema.xml. The other files are needed only if a filter you configure refers to them. If you keep the example schema.xml, you will probably need all of these files. It is still worth cleaning up the configuration so that it contains only the fields and field types you really expect to use.
To learn more about filters, tokenizers and analyzers, take a look at https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.

Yes.
Manually editing the schema and using the API are the only two ways, as far as I know.
How exactly are you creating this core? Are you using install_solr_service.sh? Assuming it's a Linux system, check the /var/solr/configs folder. That's where the config files are if you ran that script.
Yes, of course :). https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide There is a "Getting Started" section, which should answer all your questions, including where configs are stored and how to use them.
Happy Searching!
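For the Schema API route, adding a field comes down to a small JSON POST to the core's /schema endpoint using the add-field command. A minimal Python sketch follows. It assumes a core named mycore on localhost:8983 that uses the managed schema, and the field name phone and type string are placeholders. The helper add_field_request is made up for illustration, and the request is only built here, not sent.

```python
import json
from urllib.request import Request


def add_field_request(base_url, core, name, field_type, stored=True, indexed=True):
    """Build (but do not send) a Schema API request that adds one field."""
    payload = {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }
    return Request(
        f"{base_url}/{core}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "mycore" and "phone" are placeholder names, not taken from a real setup.
req = add_field_request("http://localhost:8983/solr", "mycore", "phone", "string")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would actually send it to a running Solr instance.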

How does Solr's schema-less feature work? How to revert it to classic schema?

I just found out that Solr 5 doesn't require a predefined schema file and generates the schema based on the documents being indexed. How does this work in the background?
Is it good practice? Is there any way to disable it?

The schemaless feature has been in Solr since version 4.3, but it may only have become stable more recently, as a concurrency issue with it was fixed in 4.10.
It is also called managed schema. When you configure Solr to use managed schema, Solr uses a special UpdateRequestProcessor to intercept document indexing requests and it guesses field types.
Solr starts with your schema.xml file and creates a new file called, by default, managed-schema to store all the inferred schema information. This file is automatically overwritten by Solr as it detects changes to the schema.
You should then use the Schema API if you want to make changes to the Schema. See also the Schemaless Mode documentation.
How to change Solr managed schema to classic schema
Stop Solr: $ bin/solr stop
Go to server/solr/mycore/conf, where "mycore" is the name of your core/collection.
Edit solrconfig.xml:
search for <schemaFactory class="ManagedIndexSchemaFactory"> and comment out the whole element
search for <schemaFactory class="ClassicIndexSchemaFactory"/> and uncomment it
search for the <initParams> element that refers to add-unknown-fields-to-the-schema and comment out the whole <initParams>...</initParams>
Rename managed-schema to schema.xml and you are done.
You can now start Solr again: $ bin/solr start, go to http://localhost:8983/solr/#/mycore/documents and check that Solr now refuses to index a document with a new field not yet specified in schema.xml.
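Before restarting, it can help to double-check which schemaFactory is still active in the edited solrconfig.xml. A minimal Python sketch using only the standard library is below. The function name schema_factory_class and the sample XML are made up for illustration, and a real solrconfig.xml contains far more than this. It relies on ElementTree skipping XML comments by default, so a commented-out element is not reported.

```python
import xml.etree.ElementTree as ET


def schema_factory_class(solrconfig_xml):
    """Return the class of the active <schemaFactory>, or None if there is none."""
    root = ET.fromstring(solrconfig_xml)
    factory = root.find("schemaFactory")  # direct child of <config>
    if factory is None:
        return None
    return factory.get("class")


# Hypothetical, heavily shortened solrconfig.xml after the edit described above.
sample = """<config>
  <!-- <schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
  </schemaFactory> -->
  <schemaFactory class="ClassicIndexSchemaFactory"/>
</config>"""

print(schema_factory_class(sample))
```

To check a real file, read server/solr/mycore/conf/solrconfig.xml into a string and pass it to the function.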
Is it a good practice? When to use it?
It depends on what you want. If you want to enforce a specific document structure (e.g. to make sure that all docs are "well-formed" according to your definition), then you want to use classic schema management.
If, on the other hand, you don't know the document structure upfront, then you might want to use the schema-less feature.
Limits
While it is called schema-less, there are limits to the kinds of structures that you can index. This is true both for Solr and Elasticsearch, by the way. For example, if you first index this doc:
{"name":"John Doe"}
then you will get an error if you try to index a doc like this next:
{"name": {
    "first": "Daniel",
    "second": "Dennett"
  }
}
That is because in the first case the field "name" was of type string, while in the second case it is an object.
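The conflict can be pictured with a toy model in Python. This is not how Solr implements type guessing. It is only a simplified sketch of the idea that the first value seen fixes a field's type and later documents must agree with it. The names guess_type and ToySchema are invented for this example.

```python
def guess_type(value):
    """Map a JSON value to a coarse type name (toy rules, not Solr's)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    return "string"


class ToySchema:
    """Remembers the first type seen for each field and rejects conflicts."""

    def __init__(self):
        self.fields = {}

    def index(self, doc):
        guessed = {name: guess_type(value) for name, value in doc.items()}
        # Check every field before changing anything, so a rejected
        # document leaves the schema untouched.
        for name, field_type in guessed.items():
            known = self.fields.get(name, field_type)
            if known != field_type:
                raise ValueError(
                    f"field {name!r} is already {known}, got {field_type}"
                )
        self.fields.update(guessed)


schema = ToySchema()
schema.index({"name": "John Doe"})  # "name" is now fixed as a string
try:
    schema.index({"name": {"first": "Daniel", "second": "Dennett"}})
except ValueError as err:
    print("rejected:", err)
```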
If you would like to use indexing which goes beyond these limitations then you could use SIREn - it is an open source semi-structured information retrieval engine which is implemented as a plugin for both Solr and Elasticsearch. (Disclaimer: I worked for the company that develops SIREn)

This is the so-called schemaless mode in Solr. I don't know the internal details of how it's implemented.
bin/solr start -e schemaless
The snippet above starts Solr in schemaless mode; if you don't do that, it will work as usual.
For more information on schemaless mode, take a look here: https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode

How to show the contents of files when searching in alfresco

When I search for particular content, the results show the file that contains it. How can I show the line in which that content appears?
I know Alfresco uses Lucene. Can I use the Lucene highlighter? If yes, how do I use the Lucene highlighter in Alfresco?
What about Solr, can I use that?

4.2.e without modifications means that you're using Solr.
AFAIK there is no add-on that adds hit highlighting to Alfresco's Solr search subsystem.
It's on the roadmap.
There are quite a few posts about hit highlighting in Alfresco based on Lucene.

Alfresco 5.2 seems to have this feature. The searched-for string is highlighted with context in the search results.

Solr's schema and how it works

Hey, so I started researching Solr and have a couple of questions about how it works. I know the schema defines what is stored and indexed in the Solr application. But I'm confused as to how Solr knows that the "content" is the content of the site or that the url is the url.
My main goal is to extract phone numbers from websites, and I want Solr to nicely spit out 1234567890.

You need to define it in Solr's schema.xml by declaring all the fields and their field types. You can then query Solr on any of those fields.
Refer to this: http://wiki.apache.org/solr/SchemaXml

Solr will not automatically index content from a website. You need to tell it how to index your content. Solr only knows the content you tell it to know. Extracting phone numbers sounds pretty simple so writing an update script or finding one online should not be an issue. Good luck!
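As a rough idea of what such a script could do before sending documents to Solr, here is a minimal Python sketch that pulls phone numbers out of page text and normalizes them to bare digits. It assumes US-style 10-digit numbers with optional parentheses, spaces, dots or dashes; other formats would need a different pattern. The names PHONE_PATTERN and extract_phone_numbers are made up for illustration.

```python
import re

# Three digits (optionally in parentheses), three digits, four digits,
# with an optional space, dot or dash between the groups.
PHONE_PATTERN = re.compile(r"\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")


def extract_phone_numbers(text):
    """Return every matched phone number reduced to its digits."""
    return [re.sub(r"\D", "", match.group()) for match in PHONE_PATTERN.finditer(text)]


page_text = "Call (123) 456-7890 today, or fax 123.456.7890."
print(extract_phone_numbers(page_text))
```

The extracted values could then be indexed into a dedicated field of their own (the field name is up to your schema), so a query returns the number directly.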

Solr field collapsing

I read
http://wiki.apache.org/solr/FieldCollapsing
and I tried the query
http://192.168.0.1:8080/solr/append/select?q=mobile&group=true&group.field=brand
and I don't see the field collapsing. I mean, I see the results, but not the grouping. My understanding is that it should work without changing anything in solrconfig.xml. In my schema, all my fields are stored and indexed. My index is Lucene 2.9 and my Solr is 1.4.1. I don't see what I'm doing wrong...

Field collapsing is not available in Solr 1.4.1. You need Solr 3.3 or 4.0 (currently unreleased).
The wiki page about field collapsing also explains "If you haven't already, get a recent nightly build of Solr4.0 or Solr3.3..."
Look for "warning tags" in the Solr wiki that show when a particular feature is available only since a particular version of Solr.
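On a version that does support grouping, the response contains a "grouped" section instead of the usual flat result list, which makes it easy to tell whether collapsing is actually happening. A minimal Python sketch follows. It reuses the select URL and the brand field from the question. The helper names and the sample response data are made up for illustration, and the response dictionary only mimics the general shape of a grouped JSON result.

```python
from urllib.parse import urlencode


def build_group_url(select_url, query, group_field):
    """Return a select URL that asks Solr to group results by `group_field`."""
    params = {"q": query, "group": "true", "group.field": group_field, "wt": "json"}
    return f"{select_url}?{urlencode(params)}"


def summarize_groups(response, group_field):
    """Map each group value to the number of documents found in that group."""
    groups = response["grouped"][group_field]["groups"]
    return {group["groupValue"]: group["doclist"]["numFound"] for group in groups}


group_url = build_group_url("http://192.168.0.1:8080/solr/append/select", "mobile", "brand")
print(group_url)

# Hypothetical response data, for illustration only.
sample_response = {
    "grouped": {
        "brand": {
            "matches": 3,
            "groups": [
                {"groupValue": "acme", "doclist": {"numFound": 2, "start": 0, "docs": [{"id": "1"}]}},
                {"groupValue": "globex", "doclist": {"numFound": 1, "start": 0, "docs": [{"id": "3"}]}},
            ],
        }
    }
}
print(summarize_groups(sample_response, "brand"))
```

If the response to such a request has no "grouped" key at all, the grouping parameters were most likely ignored, which matches what happens on Solr 1.4.1.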
