Where are the avgRequestsPerSecond and avgTimePerRequest metrics in Solr 7/8?

I am writing a Golang Solr exporter that produces the same output format as the Java solr-exporter shipped with Apache Solr (which ate a lot of RAM). I want to add more metrics such as "avgTimePerRequest" and "avgRequestsPerSecond".
According to the Solr documentation, "avgTimePerRequest" and "avgRequestsPerSecond" can be queried via:
"http://localhost:8983/solr/admin/metrics?group=core&prefix=UPDATE./update.requestTimes"
"http://localhost:8983/solr/admin/metrics?group=core&prefix=QUERY./select.requestTimes"
But I couldn't see avgTimePerRequest or avgRequestsPerSecond; the response only includes these:
"count":0,
"meanRate":0.0,
"1minRate":0.0,
"5minRate":0.0,
"15minRate":0.0,
"min_ms":0.0,
"max_ms":0.0,
"mean_ms":0.0,
"median_ms":0.0,
"stddev_ms":0.0,
"p75_ms":0.0,
"p95_ms":0.0,
"p99_ms":0.0,
"p999_ms":0.0
With Solr 6, I could find "avgTimePerRequest" and "avgRequestsPerSecond" in the MBeans output, but in Solr 7/8 I cannot find them. Do they need to be enabled?

From the Solr 7.3 CHANGES.txt:
SOLR-8785: Metrics related classes in org.apache.solr.util.stats have been removed in favor of
the dropwizard metrics library. Any custom plugins using these classes should be changed to use
the equivalent classes from the metrics library.
As part of this, the following changes were made to the output of Overseer Status API:
* The "totalTime" metric has been removed because it is no longer supported
* The metrics "75thPctlRequestTime", "95thPctlRequestTime", "99thPctlRequestTime"and "999thPctlRequestTime" in Overseer Status API have been renamed to "75thPcRequestTime", "95thPcRequestTime"
and so on for consistency with stats output in other parts of Solr.
The metrics "avgRequestsPerMinute", "5minRateRequestsPerMinute" and "15minRateRequestsPerMinute" have been replaced by corresponding per-second rates viz. "avgRequestsPerSecond", "5minRateRequestsPerSecond" and "15minRateRequestsPerSecond" for consistency with stats output in other parts of Solr.

Related

Solr 8.11 Field Types docs contradiction. Any guidance?

I'm setting up my first Solr server via Docker, using solr:8.11.1-slim. I am going to use the Schema API to set up the schema for my core, whose name is 'products'.
While reading the docs, there seems to be contradictory information between the field types page:
https://solr.apache.org/guide/8_11/field-types-included-with-solr.html
vs.
https://solr.apache.org/guide/8_11/schema-api.html
I followed the first guide to see which field types I can specify, and I am sending requests based on the second doc, such as this one:
{ 'add-field': { "name":"latlong", "type":"LatLongPointSpatialField", "multiValued":False, "stored":True, 'indexed': True } },
but Solr gives me back errors such as:
org.apache.solr.api.ApiBag$ExceptionWithErrObject: error processing commands, errors: [{add-field={name=latlong, type=LatLongPointSpatialField, multiValued=false, stored=true, indexed=true}, errorMessages=[Field 'latlong': Field type 'LatLongPointSpatialField' not found
So what gives? Am I misreading the docs, are they wrong, or is something wrong with the solr:8.11.1 image on Docker? Why does it not accept the field types I'm providing?
Thanks for your help ahead of time.
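A hedged guess at what is going wrong, since the two pages describe different things: the field-types page lists Java implementation classes, while the "type" value in a Schema API add-field command must name a field type that already exists in the schema (defined in the managed schema or added via add-field-type), not a class. Note also that the spatial class in 8.11 is solr.LatLonPointSpatialField (LatLon, not LatLong). A minimal sketch of the two requests, with the type name "location" chosen here purely as an example:

{ "add-field-type": { "name": "location", "class": "solr.LatLonPointSpatialField" } }
{ "add-field": { "name": "latlong", "type": "location", "multiValued": false, "stored": true, "indexed": true } }

Both commands can also be combined into a single POST body to /solr/products/schema.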

Flink, where can I find the ExecutionEnvironment#readSequenceFile method?

I have HDFS data files that were originally created by a MapReduce job with the output settings below:
job.setOutputKeyClass(BytesWritable.class);
job.setOutputValueClass(BytesWritable.class);
job.setOutputFormatClass(SequenceFileAsBinaryOutputFormat.class);
SequenceFileAsBinaryOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
Now I'm trying to read these files with the Flink DataSet API (version 1.5.6). I looked into the Flink docs but couldn't figure out how to do it.
The docs mention a 'readSequenceFile' API, but I cannot find it in the ExecutionEnvironment class; I can find 'readCsvFile' and 'readTextFile', but not this one.
There's a general 'readFile(inputFormat, path)', but I have no clue what the inputFormat should be, and it seems this API doesn't accept Hadoop input formats such as 'SequenceFileAsBinaryInputFormat'.
Could anyone please shed some light here? Many thanks.
I guess what you're missing is an additional dependency: "org.apache.flink" %% "flink-hadoop-compatibility" % "1.7.2" (use the version matching your Flink release).
Once you've added this, you can run:
val env = ExecutionEnvironment.getExecutionEnvironment
env.createInput(HadoopInputs.readSequenceFile[Long, String](classOf[Long], classOf[String], "/data/wherever"))
You can find more detailed documentation about the what and how here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/hadoop_compatibility.html
Hope that helps.
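For completeness, and since the files in the question were written with BytesWritable keys and values, here is a minimal sketch of the same thing with the Java DataSet API (assuming flink-hadoop-compatibility, in a version matching your Flink release, is on the classpath; the path "/data/wherever" is a placeholder):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.HadoopInputs;
import org.apache.hadoop.io.BytesWritable;

public class ReadSequenceFileJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // HadoopInputs.readSequenceFile wraps Hadoop's SequenceFileInputFormat and
        // yields one Tuple2 per key/value pair in the sequence files.
        DataSet<Tuple2<BytesWritable, BytesWritable>> input = env.createInput(
                HadoopInputs.readSequenceFile(
                        BytesWritable.class, BytesWritable.class, "/data/wherever"));

        System.out.println(input.count());
    }
}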

How to prevent crawling external links with Apache Nutch?

I want to crawl only specific domains with Nutch. To do this, I set db.ignore.external.links to true, as suggested in this FAQ link.
The problem is that Nutch only crawls the links in the seed list. For example, if I put "nutch.apache.org" into seed.txt, it only finds that same URL (nutch.apache.org).
I get this result by running the crawl script with a depth of 200. It finishes after one cycle and generates the output below.
How can I solve this problem?
I'm using Apache Nutch 1.11.
Generator: starting at 2016-04-05 22:36:16
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: false
Generator: normalizing: true
Generator: topN: 50000
Generator: 0 records selected for fetching, exiting ...
Generate returned 1 (no new segments created)
Escaping loop: no more URLs to fetch now
Best Regards
You want to fetch only pages from a specific domain.
You already tried db.ignore.external.links, but this restricts the crawl to nothing but the seed.txt URLs.
You should try conf/regex-urlfilter.txt, like in the example from the Nutch 1.x tutorial:
+^http://([a-z0-9]*\.)*your.specific.domain.org/
Are you using "Crawl" script? If yes make sure you giving level which is greater than 1. If you run something like this "bin/crawl seedfoldername crawlDb http://solrIP:solrPort/solr 1". It will crawl only urls which are listed in the seed.txt
And to crawl specific domain you can use regex-urlfiltee.txt file.
Add following property in nutch-site.xml
<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
  <description>If true, outlinks leading from a page to external hosts will be ignored. This is an effective way to limit the crawl to include only initially injected hosts, without creating complex URLFilters.</description>
</property>
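For reference, a sketch of what conf/regex-urlfilter.txt could look like when restricting the crawl to the domain from the question. The key point is that the catch-all "+." rule at the end of the stock file is replaced by the domain rule, since rules are applied top-down and a URL that matches no "+" rule is dropped:

# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# accept anything on the target host, drop everything else
+^https?://([a-z0-9-]*\.)*nutch\.apache\.org/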

Solr 4: disable compression on stored fields: how to actually configure custom codec?

The short question is:
I want to disable stored field compression on a Solr 4.3.0 index. After reading:
http://blog.jpountz.net/post/35667727458/stored-fields-compression-in-lucene-4-1
http://wiki.apache.org/solr/SimpleTextCodecExample
http://www.opensourceconnections.com/2013/06/05/build-your-own-lucene-codec/
I've decided to follow the path described there and make my own codec. I'm pretty sure I've followed all the steps; however, when I actually try to use my codec (affectionately named "UncompressedStorageCodec"), I get the following error in the Solr log:
java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'UncompressedStorageCodec' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.
The current classpath supports the following names: [Pulsing41, SimpleText, Memory, BloomFilter, Direct, Lucene40, Lucene41]
at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:109)
From the output, I gather that Solr is not picking up the jar with my custom codec, and I don't understand why.
Here are all the horrific details:
I've created a class like this:
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.StoredFieldsFormat;
import org.apache.lucene.codecs.lucene40.Lucene40StoredFieldsFormat;
import org.apache.lucene.codecs.lucene42.Lucene42Codec;

public class UncompressedStorageCodec extends FilterCodec {
    // Uncompressed stored fields format from Lucene 4.0, wrapped around the 4.2 codec
    private final StoredFieldsFormat fieldsFormat = new Lucene40StoredFieldsFormat();

    protected UncompressedStorageCodec() {
        super("UncompressedStorageCodec", new Lucene42Codec());
    }

    @Override
    public StoredFieldsFormat storedFieldsFormat() {
        return fieldsFormat;
    }
}
in package: "fr.company.project.solr.transformers.utils"
The fully qualified name of "FilterCodec" is "org.apache.lucene.codecs.FilterCodec".
I've created a basic jar file out of this (exported it as a jar from Eclipse).
The Solr installation I'm using to test this is the basic Solr 4.3.0 unzipped, started via its embedded Jetty server and using the example core.
I've placed my jar with the codec in [solrDir]\dist
In:
[solrDir]\example\solr\myCore\conf\solrconfig.xml
I've added the line:
<lib dir="../../../dist/" regex="myJarWithCodec-1.10.1.jar" />
Then in the schema.xml file, I've declared some fieldTypes that should use this codec like so:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" postingsFormat="UncompressedStorageCodec"/>
<fieldType name="string_lowercase" class="solr.TextField" positionIncrementGap="100" omitNorms="true" postingsFormat="UncompressedStorageCodec">
<!--...-->
</fieldType>
Now, if I use the DataImportHandler component to import some data into Solr, at commit time it tells me:
java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'UncompressedStorageCodec' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.
The current classpath supports the following names: [Pulsing41, SimpleText, Memory, BloomFilter, Direct, Lucene40, Lucene41]
at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:109)
What I find strange is that the above-mentioned codec jar also contains some Transformers for the DataImportHandler component, and those are picked up fine. Also, other jars placed in the dist folder (and declared in the same way in solrconfig.xml), like the JDBC driver, are picked up fine. I'm guessing that for the codec there's this SPI mechanism which loads things differently, and there's something it's missing...
I've also tried placing the codec jar in:
[solrDir]\example\solr-webapp\webapp\WEB-INF\lib\
as well as inside the WEB-INF\lib folder of the solr.war file, which is found in:
[solrDir]\example\webapps\
but I'm still getting the same error.
So basically, my question is, what's missing so that my codec jar is picked up by Solr?
Thanks
I'm going to answer this question myself since it has sort of become moot due to some benchmarks I've made. Long story short, I had arrived at the (wrong) conclusion that for really large stored fields, Solr 3.x and 4.0 (without field compression) is faster than Solr 4.1 and above (with field compression). However, that was mostly due to some errors in my benchmarks. After repeating them, I've found that going from non-compressed to compressed fields, even for very large stored fields, makes indexing between 0% and 15% slower, which is really not bad at all, considering that afterwards queries on the compressed-field indexes are 10-20% faster (the document fetching part).
Also, here are some remarks on how to speed up indexing:
Use the DataImportHandler plugin. It bypasses the Solr REST (HTTP-based) API and writes directly to the Lucene index.
Check out said plugin's sources to see how it accomplishes this, and write your own plugin if the DataImportHandler doesn't meet your needs.
If for whatever reason you want to stick to the Solr REST API, use ConcurrentUpdateSolrServer and play around with the queue size and number of threads parameters. It will normally be a lot faster (up to 200% in my case) than the basic HttpSolrServer.
Don't forget to enable the javabin data serialization like this:
ConcurrentUpdateSolrServer solrServer = new ConcurrentUpdateSolrServer("http://some.solr.host:8983/solr", 100, 4);
solrServer.setRequestWriter(new BinaryRequestWriter());
I'm explicitly showing the code because I believe there might be a small bug here:
If you look at the ConcurrentUpdateSolrServer constructor, you'll see that by default it already sets the request writer to binary:
// The ConcurrentUpdateSolrServer initializes HttpSolrServer objects using this constructor:
public HttpSolrServer(String baseURL, HttpClient client) {
    this(baseURL, client, new BinaryResponseParser());
}
However, after debugging I've noticed that if you don't explicitly call the setRequestWriter method with the binary writer argument, it will still use the XML serializer.
Going from XML to binary serialization reduces the size of my documents by about a factor of 3 as they are being sent to the server. This makes my index times for this case about 150-200% faster.
I have recently tried, and succeeded, in getting something very similar to work. The only difference is that I want to enable the best compression instead of no compression, and Solr defaults to the fastest compression. I also got the "SPI class [...] does not exist" error at some point, and here is what I have found out from various articles, including the ones you have linked to.
Lucene uses SPI to find the codec classes to load. Lucene requires the list of codec classes to be declared in the file "org.apache.lucene.codecs.Codec", and the file must be on the classpath. To get Solr to load the file: when you create your jar file "myJarWithCodec-1.10.1.jar", make sure that it contains a file at "META-INF/services/org.apache.lucene.codecs.Codec". The file should have one full class name per line, like this:
org.apache.lucene.codecs.lucene3x.Lucene3xCodec
org.apache.lucene.codecs.lucene40.Lucene40Codec
org.apache.lucene.codecs.lucene41.Lucene41Codec
org.apache.lucene.codecs.lucene42.Lucene42Codec
fr.company.project.solr.transformers.utils.UncompressedStorageCodec
And in solrconfig.xml, replace:
<codecFactory class="solr.SchemaCodecFactory" />
with:
<codecFactory class="fr.company.project.solr.transformers.utils.UncompressedStorageCodec" />
You might also need to remove postingsFormat="UncompressedStorageCodec" from schema.xml if Solr complains. I think this particular parameter is for specifying the postings format, not the codec. Hope it helps.
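A quick way to check the SPI registration before involving Solr at all is to put the codec jar (plus lucene-core) on a plain Java classpath and ask Lucene to resolve the codec by name. A small sketch; if the META-INF/services entry is missing from the jar, this fails with the same "does not exist" message:

import org.apache.lucene.codecs.Codec;

public class CodecSpiCheck {
    public static void main(String[] args) {
        // Throws IllegalArgumentException listing the available codec names
        // if the SPI file does not point at UncompressedStorageCodec.
        Codec codec = Codec.forName("UncompressedStorageCodec");
        System.out.println("Loaded codec: " + codec.getName());
    }
}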

ParseException Unknown function termfreq in FunctionQuery

What is the right syntax, if my query below is formulated incorrectly for Apache Solr 3.5, and do I have to enable anything specific in solrconfig.xml and schema.xml?
I am using Apache Solr 3.5 and receiving "ParseException: Unknown function termfreq in FunctionQuery('tf(text,amplifiers)')" for:
http://localhost:8983/solr/select/?fl=score,documentPageId&defType=func&q=tf%28text,amplifiers%29
I am following the syntax from other websites because I couldn't figure it out from the documentation on the wiki: http://wiki.apache.org/solr/FunctionQuery
It won't work: the function query tf(field, term) that you are attempting to use is not available in 3.5; browse through ValueSourceParser if you want to double-check. You need to get a Solr 4.x nightly build from trunk and use it, but beware: Solr 4.x is not stable or released yet, and there will be a significant level of API changes compared to 3.5.
If you are interested in poking into the code, you could, for instance (if you are using Maven), modify pom.xml to get the artifacts from trunk and browse the source code starting from ValueSourceParser; that should let you know whether those relevance functions exist and how they are implemented.
For example, you will see parsers related to the term vector function queries:
// From the Solr 4 ValueSourceParser trunk source code
addParser("tf", new ValueSourceParser() {
    @Override
    public ValueSource parse(FunctionQParser fp) throws ParseException {
        TInfo tinfo = parseTerm(fp);
        return new TFValueSource(tinfo.field, tinfo.val, tinfo.indexedField, tinfo.indexedBytes);
    }
});
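Once you are on a 4.x build, requests in the style of the question should be accepted; a couple of hedged examples using the field and term from the question:

http://localhost:8983/solr/select?defType=func&q=tf(text,amplifiers)&fl=score,documentPageId
http://localhost:8983/solr/select?q=*:*&fl=documentPageId,termfreq(text,'amplifiers')

The second form returns the raw term frequency as a pseudo-field without changing the default query parser.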
