ParseException Unknown function termfreq in FunctionQuery - solr

What is the right syntax if my query below is formulated incorrectly for Apache Solr 3.5, and do I have to enable anything specific in solrconfig.xml or schema.xml?
I am using Apache Solr 3.5 and receiving: ParseException Unknown function termfreq in FunctionQuery('tf(text,amplifiers)')
http://localhost:8983/solr/select/?fl=score,documentPageId&defType=func&q=tf%28text,amplifiers%29
I am following the syntax from other websites because I can't work it out from the documentation on the wiki: http://wiki.apache.org/solr/FunctionQuery

It won't work: the function query tf(field, term) that you are attempting to use is not available in 3.5; browse through ValueSourceParser if you want to double-check. You need to get a Solr 4.x nightly build from trunk and use that, but beware: Solr 4.x is not yet stable or released, and there will be a significant level of API changes compared to 3.5.
If you are interested in poking into the code, you could, for instance, if you are using Maven, modify pom.xml to fetch the artifacts from trunk and browse the source code starting from ValueSourceParser; that should let you know whether those relevance functions exist and how they are implemented.
For example, you will see parsers for the term vector function queries:
// From the Solr 4 `ValueSourceParser` trunk source code
addParser("tf", new ValueSourceParser() {
  @Override
  public ValueSource parse(FunctionQParser fp) throws ParseException {
    TInfo tinfo = parseTerm(fp);
    return new TFValueSource(tinfo.field, tinfo.val, tinfo.indexedField, tinfo.indexedBytes);
  }
});
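Once you are on a 4.x build, the request from the question should parse as-is. As a minimal sketch (not tested against a nightly; the host, field, and term are taken from the question's URL), the same function query through SolrJ 4.x would look something like:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Equivalent of ?defType=func&q=tf(text,amplifiers)&fl=score,documentPageId
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
SolrQuery query = new SolrQuery("tf(text,amplifiers)");
query.set("defType", "func");
query.setFields("score", "documentPageId");
QueryResponse response = server.query(query);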

Related

Solr throws error on partial update after upgrade to 8.8

I'm doing a simple partial update scenario which worked with versions 6.x and 7.x of Solr. After upgrading both Solr and SolrJ to 8.8, I'm getting the following exception:
2021-02-23 14:57:58.201 ERROR (qtp-459670553-28) [ x:core1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: TransactionLog doesn't know how to serialize class org.apache.lucene.document.LazyDocument$LazyField; try implementing ObjectResolver?
at org.apache.solr.update.TransactionLog$1.resolve(TransactionLog.java:100)
at org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:266)
at org.apache.solr.common.util.JavaBinCodec$BinEntryWriter.put(JavaBinCodec.java:441)
at org.apache.solr.common.ConditionalKeyMapWriter$EntryWriterWrapper.put(ConditionalKeyMapWriter.java:44)
at org.apache.solr.common.MapWriter$EntryWriter.putNoEx(MapWriter.java:101)
at org.apache.solr.common.MapWriter$EntryWriter.lambda$getBiConsumer$0(MapWriter.java:161)
at org.apache.solr.common.MapWriter$EntryWriter$$Lambda$548/0000000000000000.accept(Unknown Source)
at org.apache.solr.common.SolrInputDocument.lambda$writeMap$0(SolrInputDocument.java:59)
at org.apache.solr.common.SolrInputDocument$$Lambda$549/0000000000000000.accept(Unknown Source)
.....
The SolrJ code is similar to the sample provided here and was working before the upgrade. The operation is 'add' with a simple integer field for a document whose id is provided.
Note that this is different from a previous question on Stack Overflow, since I'm passing a simple integer field, and on the Solr/Lucene side it's replaced with org.apache.lucene.document.LazyDocument$LazyField.
This seems to be a bug in Solr (https://issues.apache.org/jira/browse/SOLR-13034), to be fixed in the next version of Solr 8 (8.9).
Until it's released, the workaround is to set <enableLazyFieldLoading>false</enableLazyFieldLoading> in solrconfig.xml.
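For reference, a minimal sketch of the kind of partial update that triggers this (the core name, document id, and field name are placeholders of mine, not from the question; assumes SolrJ 8.x):

import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class PartialUpdateDemo {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/core1").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");                                   // id of the existing document
            doc.addField("count_i", Collections.singletonMap("set", 42));  // atomic "set" of an integer field
            client.add(doc);
            client.commit();
        }
    }
}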

Sitecore 8.2 update 5 Ninject Solr support package issue

I am facing an issue with the Ninject IoC container.
I am using Sitecore 8.2 update 5 and switching from Lucene to Solr search engine using the steps mentioned in https://sitecorerockz.wordpress.com/2018/08/01/lucene-to-solr/
I am using Solr 6.6.3. This project started on Sitecore 6.x and has been upgraded from time to time; it is now on Sitecore 8.2 update 5.
The same Solr setup is working fine for the fresh Sitecore 8.2 update 5 setup.
I created a Solr diagnostic page in the /Sitecore/admin folder to check the error details, and I am getting the error below for all the indexes:
Solr Indexes Microsoft.Practices.ServiceLocation.ActivationException: Activation error occured while trying to get instance of type ISolrOperations`1, key "sitecore_analytics_index" ---> Ninject.ActivationException: Error activating ISolrOperations{Dictionary{string, Object}}
No matching bindings are available, and the type is not self-bindable.
Activation path:
1) Request for ISolrOperations{Dictionary{string, Object}}
Suggestions:
1) Ensure that you have defined a binding for ISolrOperations{Dictionary{string, Object}}.
2) If the binding was defined in a module, ensure that the module has been loaded into the kernel.
3) Ensure you have not accidentally created more than one kernel.
4) If you are using constructor arguments, ensure that the parameter name matches the constructors parameter name.
5) If you are using automatic module loading, ensure the search path and filters are correct.
at Ninject.KernelBase.Resolve(IRequest request) in c:\Projects\Ninject\ninject\src\Ninject\KernelBase.cs:line 376
at Ninject.ResolutionExtensions.Get(IResolutionRoot root, Type service, String name, IParameter[] parameters) in c:\Projects\Ninject\ninject\src\Ninject\Syntax\ResolutionExtensions.cs:line 164
at MyLibrary.test.Infrastructure.NinjectServiceLocator.DoGetInstance(Type serviceType, String key) in C:\test_Git\Sitecore\src\test\Infrastructure\NinjectServiceLocator.cs:line 15
at Microsoft.Practices.ServiceLocation.ServiceLocatorImplBase.GetInstance(Type serviceType, String key) in c:\Home\Chris\Projects\CommonServiceLocator\main\Microsoft.Practices.ServiceLocation\ServiceLocatorImplBase.cs:line 49
--- End of inner exception stack trace ---
at Microsoft.Practices.ServiceLocation.ServiceLocatorImplBase.GetInstance(Type serviceType, String key) in c:\Home\Chris\Projects\CommonServiceLocator\main\Microsoft.Practices.ServiceLocation\ServiceLocatorImplBase.cs:line 53
at Microsoft.Practices.ServiceLocation.ServiceLocatorImplBase.GetInstance[TService](String key) in c:\Home\Chris\Projects\CommonServiceLocator\main\Microsoft.Practices.ServiceLocation\ServiceLocatorImplBase.cs:line 103
at Sitecore.ContentSearch.SolrProvider.SolrSearchIndex.Initialize()
at ASP._Page_sitecore_admin_solr_diagnostic_cshtml.Execute() in c:\test_Git\Sitecore\build\25Sep2019\Website\sitecore\admin\solr-diagnostic.cshtml:line 29
What am I missing? Could you please advise?
The SetLocatorProvider was getting initialized twice. The fix involved:
1. Modifying the custom code related to the Ninject IoC container.
2. Upgrading Ninject.dll from version 3.0.0.0 to 3.2.2.0 under the bin\Social folder.
3. Replacing all the Solr DLL files with the ones from a fresh Sitecore 8.2 update 5 installation.

Solr query exception: why does this happen?

I use Solr 4.3 for my website. While querying data with the MoreLikeThis function, this exception sometimes occurs:
org.apache.solr.common.SolrException: parsing error
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:43)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
A parsing error? What does that mean, and why does it occur? Thanks for your reply.
The code:
public MoreLikeThisQueryResponse(QueryResponse queryResponse) {
    this.queryResponse = queryResponse;
    NamedList<Object> res = this.queryResponse.getResponse();
    for (int i = 0; i < res.size(); i++) {
        String name = res.getName(i);
        if ("match".equals(name)) {
            this.matchResult = (SolrDocumentList) res.getVal(i);
        }
    }
}
This type of "parsing error" in the BinaryResponseParser.processResponse method typically indicates that the server is returning a raw HTML response rather than a Solr javabin response. That usually means the servlet container (Tomcat or Jetty) is detecting and reporting an error before Solr gets a chance to handle it and obey the wt parameter, which sets the response format to javabin.
Check the server log for the actual error.
I had a similar problem. It seems there may be a compatibility error if you are not using EXACT compatible versions of Solr and SolrJ.
To resolve this, you have to specify:
server.setParser(new XMLResponseParser());
Quoting from the SolrJ documentation:
SolrJ generally maintains backwards compatibility, so you can use a newer SolrJ with an older Solr, or an older SolrJ with a newer Solr. There are some minor exceptions to this general rule:
If you're mixing 1.x and a later major version, you must set the response parser to XML, because the two versions use incompatible versions of javabin.
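As a minimal sketch of what that looks like in context (assuming the SolrJ 4.x HttpSolrServer API; the URL is a placeholder):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

// Force XML instead of javabin so mismatched Solr/SolrJ versions can still talk.
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
server.setParser(new XMLResponseParser());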

Solr 4: disable compression on stored fields: how to actually configure custom codec?

The short question is:
I want to disable stored-field compression on a Solr 4.3.0 index. After reading:
http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1
http://wiki.apache.org/solr/SimpleTextCodecExample
http://www.opensourceconnections.com/2013/06/05/build-your-own-lucene-codec/
I've decided to follow the path described there and make my own codec. I'm pretty sure I've followed all the steps; however, when I actually try to use my codec (affectionately named "UncompressedStorageCodec"), I get the following error in the Solr log:
java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'UncompressedStorageCodec' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.
The current classpath supports the following names: [Pulsing41, SimpleText, Memory, BloomFilter, Direct, Lucene40, Lucene41]
at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:109)
From the output I gather that Solr is not picking up the jar with my custom codec, and I don't understand why.
Here are all the horrific details:
I've created a class like this:
public class UncompressedStorageCodec extends FilterCodec {
    private final StoredFieldsFormat fieldsFormat = new Lucene40StoredFieldsFormat();

    protected UncompressedStorageCodec() {
        super("UncompressedStorageCodec", new Lucene42Codec());
    }

    @Override
    public StoredFieldsFormat storedFieldsFormat() {
        return fieldsFormat;
    }
}
in package: "fr.company.project.solr.transformers.utils"
The FQDN of "FilterCodec" is: "org.apache.lucene.codecs.FilterCodec"
I've created a basic jar file out of this (exported it as jar from Eclipse).
The Solr installation I'm using to test this is the basic Solr 4.3.0 unzipped, started via its embedded Jetty server and using the example core.
I've placed my jar with the codec in [solrDir]\dist
In:
[solrDir]\example\solr\myCore\conf\solrconfig.xml
I've added the line:
<lib dir="../../../dist/" regex="myJarWithCodec-1.10.1.jar" />
Then in the schema.xml file, I've declared some fieldTypes that should use this codec like so:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" postingsFormat="UncompressedStorageCodec"/>
<fieldType name="string_lowercase" class="solr.TextField" positionIncrementGap="100" omitNorms="true" postingsFormat="UncompressedStorageCodec">
<!--...-->
</fieldType>
Now, if I use the DataImportHandler component to import some data into Solr, at commit time it tells me:
java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'UncompressedStorageCodec' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.
The current classpath supports the following names: [Pulsing41, SimpleText, Memory, BloomFilter, Direct, Lucene40, Lucene41]
at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:109)
What I find strange is that the above-mentioned codec jar also contains some Transformers for the DataImportHandler component, and those are picked up fine. Also, other jars placed in the dist folder (and declared in the same way in solrconfig.xml), like the JDBC driver, are picked up fine. I'm guessing that the codec is loaded differently through this SPI mechanism, and something is missing...
I've also tried placing the codec jar in:
[solrDir]\example\solr-webapp\webapp\WEB-INF\lib\
as well as inside the WEB-INF\lib folder of the solr.war file, which is found in:
[solrDir]\example\webapps\
but I'm still getting the same error.
So basically, my question is, what's missing so that my codec jar is picked up by Solr?
Thanks
I'm going to answer this question myself, since it has become somewhat moot due to some benchmarks I've made. Long story short: I had arrived at the (wrong) conclusion that for really large stored fields, Solr 3.x and 4.0 (without field compression) are faster than Solr 4.1 and above (with field compression). That was mostly due to errors in my benchmarks. After repeating them, I found that going from non-compressed to compressed fields, even for very large stored fields, makes indexing between 0% and 15% slower, which is really not bad at all, considering that queries on the compressed-field indexes are afterwards 10-20% faster (in the document-fetching part).
Also, here are some remarks on how to speed up indexing:
Use the DataImportHandler plugin. It bypasses the Solr REST (HTTP-based) API and writes directly to the Lucene index.
Check out said plugin's sources to see how it accomplishes this, and write your own plugin if the DataImportHandler doesn't meet your needs.
If for whatever reason you want to stick to the Solr REST API, use ConcurrentUpdateSolrServer and play around with the queue size and number of threads parameters. It will normally be a lot faster (up to 200% in my case) than the basic HttpSolrServer.
Don't forget to enable javabin data serialization, like this:
ConcurrentUpdateSolrServer solrServer = new ConcurrentUpdateSolrServer("http://some.solr.host:8983/solr", 100, 4);
solrServer.setRequestWriter(new BinaryRequestWriter());
I'm explicitly showing the code because I believe there might be a small bug here:
If you look at the ConcurrentUpdateSolrServer constructor, you'll see that by default it already uses the binary response parser:
// The ConcurrentUpdateSolrServer initializes HttpSolrServer objects using this constructor:
public HttpSolrServer(String baseURL, HttpClient client) {
    this(baseURL, client, new BinaryResponseParser());
}
However, after debugging I noticed that if you don't explicitly call setRequestWriter with the binary writer, requests are still serialized as XML.
Going from XML to binary serialization reduces the size of my documents by about a factor of three as they are sent to the server. This makes my indexing times for this case about 150-200% faster.
I recently tried, and succeeded in getting, something very similar to work. The only difference is that I want to enable the best compression instead of no compression, and Solr defaults to the fastest compression. I also got the "SPI class [...] does not exist" error at some point, and here is what I found out from various articles, including the ones you have linked to.
Lucene uses SPI to find the codec classes to load. Lucene requires that the list of codec classes be declared in a file named "org.apache.lucene.codecs.Codec", and that file must be on the classpath. To get Solr to load the file: when you create your jar file "myJarWithCodec-1.10.1.jar", make sure that it contains a file at "META-INF/services/org.apache.lucene.codecs.Codec". The file should have one full class name per line, like this:
org.apache.lucene.codecs.lucene3x.Lucene3xCodec
org.apache.lucene.codecs.lucene40.Lucene40Codec
org.apache.lucene.codecs.lucene41.Lucene41Codec
org.apache.lucene.codecs.lucene42.Lucene42Codec
fr.company.project.solr.transformers.utils.UncompressedStorageCodec
And in solrconfig.xml, replace:
<codecFactory class="solr.SchemaCodecFactory" />
with:
<codecFactory class="fr.company.project.solr.transformers.utils.UncompressedStorageCodec" />
You might also need to remove postingsFormat="UncompressedStorageCodec" from schema.xml if Solr complains. I think this particular parameter is for specifying the postings format, not the codec. Hope it helps.
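One more detail worth checking, based on how Java's SPI instantiates classes reflectively (my own observation, not from the question): the codec needs a public no-arg constructor, while the one in the question is protected. A hypothetical fix:

// SPI creates the codec reflectively, so the no-arg constructor must be public.
public UncompressedStorageCodec() {
    super("UncompressedStorageCodec", new Lucene42Codec());
}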

Indexing PDF with Solr

Can anyone point me to a tutorial?
My main experience with Solr is indexing CSV files, but I cannot find any simple instructions or tutorial telling me what I need to do to index PDFs.
I have seen this: http://wiki.apache.org/solr/ExtractingRequestHandler
But it makes very little sense to me. Do I need to install Tika?
I'm lost; please help.
With Solr 4.9 (the latest version as of now), extracting data from rich documents like PDFs, spreadsheets (the xls, xlsx family), presentations (ppt, pptx), and documents (doc, txt, etc.) has become fairly simple.
The sample code examples provided in the archive downloaded from here contain a basic Solr template project to get you started quickly.
The necessary configuration changes are as follows:
1. Change solrconfig.xml to include the following lines:
<lib dir="<path_to_extraction_libs>" regex=".*\.jar" />
<lib dir="<path_to_solr_cell_jar>" regex="solr-cell-\d.*\.jar" />
and create a request handler as follows:
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults" />
</requestHandler>
2. Add the necessary jars from the solrExample to your project.
3. Define the schema as per your needs and fire a query like:
curl "http://localhost:8983/solr/collection1/update/extract?literal.id=1&literal.filename=testDocToExtractFrom.txt&literal.created_at=2014-07-22+09:50:12.234&commit=true" -F "myfile=#testDocToExtractFrom.txt"
Go to the GUI portal and query to see the indexed contents.
Let me know if you face any problems.
You could use the DataImportHandler. The DataImportHandler is defined in solrconfig.xml; its own configuration lives in a separate XML config file (data-config.xml).
For indexing PDFs you could:
1.) crawl the directory to find all the PDFs using the FileListEntityProcessor, or
2.) read the PDFs from a "content/index" XML file using the XPathEntityProcessor.
Once you have the list of relevant PDFs, use the TikaEntityProcessor to extract their content.
Look at this http://solr.pl/en/2011/04/04/indexing-files-like-doc-pdf-solr-and-tika-integration/ (an example with ppt) and this: Solr : data import handler and solr cell
The hardest part of this is getting the metadata from the PDFs; using a tool like Aperture simplifies this. There must be tonnes of these tools.
Aperture is a Java framework for extracting and querying full-text content and metadata from PDF files.
Aperture grabbed the metadata from the PDFs and stored it in XML files.
I parsed the XML files using lxml and posted them to Solr.
Use the Solr ExtractingRequestHandler. This uses Apache Tika to parse the PDF file, and I believe it can pull out the metadata etc. You can also pass through your own metadata.
Extracting Request Handler
import java.io.File;
import java.io.IOException;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.extraction.ExtractingParams;

public class SolrCellRequestDemo {
    public static void main(String[] args) throws IOException, SolrServerException {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/my_collection").build();
        // Send the file to the /update/extract handler (Solr Cell) for Tika extraction.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("my-file.pdf"), "application/pdf");
        req.setParam(ExtractingParams.EXTRACT_ONLY, "true");
        NamedList<Object> result = client.request(req);
        System.out.println("Result: " + result);
    }
}
This may help.
Apache Solr can index all sorts of binary files like PDF, Word, etc. Check out this doc:
https://lucene.apache.org/solr/guide/8_5/uploading-data-with-solr-cell-using-apache-tika.html
