Escape Special Character In Solr - solr

I'm getting the following error when trying to escape the special character '&'. It appears after running this query in Solr:
http://localhost:8983/solr/amazon_products/select?q=*:*&fq=Category:"Toys \& Games "
{
"responseHeader": {
"zkConnected": true,
"status": 400,
"QTime": 0,
"params": {
"q": "*:*",
"Games \"": "",
"fq": "Category:\"Toys \\",
"rows": "70"
}
},
"error": {
"metadata": [
"error-class", "org.apache.solr.common.SolrException",
"root-error-class", "org.apache.solr.parser.TokenMgrError"
],
"msg": "org.apache.solr.search.SyntaxError: Cannot parse 'Category:\"Toys \\': Lexical error at line 1, column 17. Encountered: <EOF> after : \"\\\"Toys \\\\\"",
"code": 400
}}
The Category field contains values like the following:
"Category":["Toys & Games "," Learning & Education "," Science Kits & Toys"]
"Category":["Home & Kitchen "," Home Décor "," Window Treatments "," Window Stickers & Films ", " Window Films"],
The Category field is of type string with multiValued=true:
{
"name":"Category",
"type":"string",
"multiValued":true,
"stored":true},
How do I search properly for Category:"Toys & Games "?
NOTE: I tried the query http://localhost:8983/solr/amazon_products/select?q=*:*&fq=Category:Toys* AND *"Games "&rows=70 and it worked fine, but if I want to search for exactly the string 'Toys & Games ', how do I do that by properly escaping the special character '&'?

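You'll need to URL-encode some of the characters. An unencoded & is treated as a separator between URL parameters, which is why the params in your response show a truncated fq and a stray "Games \"" parameter. For example, the following command:
$ curl 'http://localhost:8983/solr/puldata/select?fq=title_t%3A%22Woody%20Herman%20%26%20His%20Orchestra%22&q=*&start=0'
will query fq=title_t:"Woody Herman & His Orchestra". Notice how the :, ", space, and & characters are encoded.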
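The encoding step can also be left to a library. The following is a minimal Python sketch, assuming the host, collection, and field names from the question; build_select_url is a hypothetical helper name, not part of any Solr client.

```python
from urllib.parse import urlencode

# Base URL taken from the question; adjust host and collection as needed.
BASE_URL = "http://localhost:8983/solr/amazon_products/select"


def build_select_url(base_url, params):
    """Return base_url followed by a percent-encoded query string."""
    # urlencode percent-encodes each value, so '&' inside a value becomes %26
    # instead of being read as a parameter separator.
    return base_url + "?" + urlencode(params)


url = build_select_url(BASE_URL, {
    "q": "*:*",
    # Raw filter value, including the trailing space stored in the index.
    "fq": 'Category:"Toys & Games "',
    "rows": 70,
})
print(url)
```

Because the & is now encoded at the HTTP level, the backslash from the original attempt is probably not needed inside the quoted phrase.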

Related

Attribute Syntax for JSON query in check_json.pl

So, I'm trying to set up check_json.pl in NagiosXI to monitor some statistics. https://github.com/c-kr/check_json
I'm using the code with the modification I submitted in pull request #32, so line numbers reflect that code.
The json query returns something like this:
[
{
"total_bytes": 123456,
"customer_name": "customer1",
"customer_id": "1",
"indices": [
{
"total_bytes": 12345,
"index": "filename1"
},
{
"total_bytes": 45678,
"index": "filename2"
}
],
"total": "765.43gb"
},
{
"total_bytes": 123456,
"customer_name": "customer2",
"customer_id": "2",
"indices": [
{
"total_bytes": 12345,
"index": "filename1"
},
{
"total_bytes": 45678,
"index": "filename2"
}
],
"total": "765.43gb"
}
]
I'm trying to monitor the size of specific files, so a check should look something like:
/path/to/check_json.pl -u https://path/to/my/json -a "SOMETHING" -p "SOMETHING"
...where I'm trying to figure out the SOMETHINGs so that I can monitor the total_bytes of filename1 in customer2 where I know the customer_id and index but not their position in the respective arrays.
I can monitor customer1's total bytes by using the string "[0]->{'total_bytes'}", but I need to be able to specify which customer and dig deeper into the file name (known) and file size (the stat to monitor). Also, the working query only gives me the status (OK, WARNING, or CRITICAL). When I add -p, all I get are errors.
The error with -p, no matter how I phrase it, is always:
Not a HASH reference at ./check_json.pl line 235.
Even when I can get a valid OK from the example "[0]->{'total_bytes'}", using that in -p still gives the same error.
Links pointing to documentation on the format to use would be very helpful. Examples in the README for the script or in the -h output are failing me here. Any ideas?
I really have no idea what your question is. I'm sure I'm not alone, hence the downvotes.
Once you have the decoded json, if you have a customer_id to search for, you can do:
my ($customer_info) = grep {$_->{customer_id} eq $customer_id} @$json_response;
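The same lookup, extended one level down into the indices array, might be sketched as follows. This is written in Python purely to illustrate the logic; it does not show check_json.pl's -a or -p syntax, and find_index_bytes is a hypothetical name. The sample data mirrors the question's JSON with the trailing commas removed so that it parses.

```python
import json

# Sample shaped like the response in the question.
RAW = """
[
  {"total_bytes": 123456, "customer_name": "customer1", "customer_id": "1",
   "indices": [{"total_bytes": 12345, "index": "filename1"},
               {"total_bytes": 45678, "index": "filename2"}],
   "total": "765.43gb"},
  {"total_bytes": 123456, "customer_name": "customer2", "customer_id": "2",
   "indices": [{"total_bytes": 12345, "index": "filename1"},
               {"total_bytes": 45678, "index": "filename2"}],
   "total": "765.43gb"}
]
"""


def find_index_bytes(customers, customer_id, index_name):
    """Return total_bytes for index_name under customer_id, or None if absent."""
    for customer in customers:
        if customer.get("customer_id") != customer_id:
            continue
        for entry in customer.get("indices", []):
            if entry.get("index") == index_name:
                return entry.get("total_bytes")
    return None


customers = json.loads(RAW)
print(find_index_bytes(customers, "2", "filename1"))  # 12345
```

Matching on customer_id and index rather than on array positions keeps the check stable if the order of the arrays changes.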
Regarding the error on line 235, this looks odd:
foreach my $key ($np->opts->perfvars eq '*' ? map { "{$_}"} sort keys %$json_response : split(',', $np->opts->perfvars)) {
# ....................................... ^^^^^^^^^^^^^
$perf_value = $json_response->{$key};
If perfvars eq "*", you appear to be looking for $json_response->{"{total}"}, for example. You might want to validate the user's input:
die "no such key in json data: '$key'\n" unless exists $json_response->{$key};
This entire business of stringifying the hash ref lookups just smells bad.
A better question would look like:
I have this JSON data. How do I get the sum of total_bytes for the customer with id 1?
See https://stackoverflow.com/help/mcve

lucene solr - how to know numCount of each word in query

I have a query string with 5 words, for example "cat dog fish bird animals".
I need to know how many matches each word has.
At this point I create 5 queries:
/q=name:cat&rows=0&facet=true
/q=name:dog&rows=0&facet=true
/q=name:fish&rows=0&facet=true
/q=name:bird&rows=0&facet=true
/q=name:animals&rows=0&facet=true
and get the match count of each word from each query.
But this method takes too much time.
So is there a way to get the numCount of each word with one query?
Any help appreciated!
In this case, function queries are your friends. In particular:
termfreq(field,term) returns the number of times the term appears in the field for that document. Example syntax: termfreq(text,'memory')
totaltermfreq(field,term) returns the number of times the term appears in the field in the entire index. ttf is an alias of totaltermfreq. Example syntax: ttf(text,'memory')
The following query for instance:
q=*%3A*&fl=cntOnSummary%3Atermfreq(summary%2C%27hello%27)+cntOnTitle%3Atermfreq(title%2C%27entry%27)+cntOnSource%3Atermfreq(source%2C%27activities%27)&wt=json&indent=true
returns the following results:
"docs": [
{
"id": [
"id-1"
],
"source": [
"activities",
"activities"
],
"title": "Ajones3 Activity Entry 1",
"summary": "hello hello",
"cntOnSummary": 2,
"cntOnTitle": 1,
"cntOnSource": 1,
"score": 1
},
{
"id": [
"id-2"
],
"source": [
"activities",
"activities"
],
"title": "Common activity",
"cntOnSummary": 0,
"cntOnTitle": 0,
"cntOnSource": 1,
"score": 1
}
]
Note that while this works well on single-valued fields, for multivalued fields the functions seem to consider only the first entry. In the example above, termfreq(source%2C%27activities%27) returns 1 instead of 2.
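Applied to the five words from the question, the request parameters could be assembled along these lines. This is a sketch only: build_term_count_params and the cnt_ alias prefix are hypothetical names, and the field name is taken from the question.

```python
from urllib.parse import urlencode


def build_term_count_params(field, words):
    """Build select parameters that expose ttf() for each word as an fl alias."""
    # Each pseudo-field looks like  cnt_cat:ttf(name,'cat')
    pseudo_fields = [f"cnt_{word}:ttf({field},'{word}')" for word in words]
    return {
        "q": "*:*",
        "rows": 1,  # ttf is index-wide, so a single document carries all values
        "fl": ",".join(pseudo_fields),
        "wt": "json",
    }


params = build_term_count_params("name", ["cat", "dog", "fish", "bird", "animals"])
print("/select?" + urlencode(params))
```

ttf counts term occurrences across the index. If the goal is the number of matching documents, as numFound reports, Solr's docfreq(field,term) function may be the closer fit and can be swapped into the same pattern.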

solr query for not equal to text value and number greater than 0

I have Solr documents with two fields: one is a string and one is an integer. Both fields are allowed to be null. I am attempting to write a query that will eliminate documents with the following properties:
textField = "badValue" AND (numberField is null OR numberField = 0)
I added the following fq:
((NOT textField=badValue) OR numberField=[1 TO *])
This does not seem to have worked properly, because I am getting a document with textField = badValue and numberField = 0. What did I do wrong with my fq?
The full query response header, containing the parsed query is:
"responseHeader": {
"status": 0,
"QTime": 245,
"params": {
"q": "(numi) AND (solr_specs:[* TO *] OR full_description:[* TO *])",
"defType": "edismax",
"bf": "log(sum(popularity,1))",
"indent": "true",
"qf": "categories^3.0 manufacturer^1.0 sku^0.2 split_sku^0.2 upc^1.0 invoice_description^2.6 full_description solr_specs^0.8 solr_spec_values^1.7 legacyid legacy_altcode id",
"fl": "distributor_status,QOH_estimate,id,score",
"start": "0",
"fq": "((*:* NOT distributor_status=VENDORDISC) OR QOH_estimate=[1 TO *])",
"sort": "score desc,id desc",
"rows": "20",
"wt": "json",
"_": "1441220051438"
}
}
QOH_estimate is numberField and distributor_status is textField.
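Please try the following in your fq parameter: ((*:* NOT textField:badValue) OR numberField:[1 TO *]). With your field names:
((*:* NOT distributor_status:VENDORDISC) OR QOH_estimate:[1 TO *])
Note that Solr's query syntax separates field and value with a colon (field:value), not an equals sign. Here you first select all documents that do not contain textField:badValue (the *:* gives the NOT a set of documents to subtract from), then OR them with the documents matching numberField:[1 TO *].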
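To check the boolean logic independently of Solr, the filter can be modelled on plain dictionaries. This is an illustrative Python sketch, not Solr code; passes_filter is a hypothetical name, and a missing key stands in for a null field.

```python
def passes_filter(doc):
    """Mirror ((*:* NOT textField:badValue) OR numberField:[1 TO *]) on a dict."""
    # Left clause: every document except those whose textField is badValue.
    not_bad = doc.get("textField") != "badValue"
    # Right clause: numberField present and at least 1.
    number = doc.get("numberField")
    positive = number is not None and number >= 1
    return not_bad or positive


docs = [
    {"textField": "badValue", "numberField": 0},   # should be eliminated
    {"textField": "badValue"},                     # should be eliminated (null number)
    {"textField": "badValue", "numberField": 5},   # kept
    {"textField": "other", "numberField": 0},      # kept
    {},                                            # kept (both fields null)
]
for doc in docs:
    print(doc, passes_filter(doc))
```

Only the documents with textField = badValue and a null or zero numberField fail the filter, which matches the set the question wants to eliminate.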

org.apache.solr.search.SyntaxError: Cannot parse sku_str

See the error below:
"error": {
"msg": "org.apache.solr.search.SyntaxError: Cannot parse 'sku_str:VFY:A5440M35A5ME': Encountered \" \":\" \": \"\" at line 1, column 11.",
"code": 400 }
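Escape the Solr query string. The second colon (inside the value VFY:A5440M35A5ME) is parsed as another field separator unless it is escaped. See the function below:
SolrUtils::escapeQueryChars — Escapes a lucene query string
http://php.net/manual/en/solrutils.escapequerychars.php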
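For readers not using the PHP extension, the idea can be sketched by hand. The Python function below is an assumption-laden illustration, not the PHP implementation: escape_query_chars is a hypothetical name, and the character set is based on the special characters of the Lucene query syntax plus whitespace.

```python
# Characters given special meaning by the Lucene query parser (assumed list).
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(value):
    """Prefix every special character or whitespace in value with a backslash."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARS or ch.isspace() else ch
        for ch in value
    )


sku = "VFY:A5440M35A5ME"
# Escape only the value; the colon after the field name must stay unescaped.
query = "sku_str:" + escape_query_chars(sku)
print(query)  # sku_str:VFY\:A5440M35A5ME
```

The escaped query still has to be URL-encoded when it is sent over HTTP.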

SOLR/Tika - I need to concat values from 2 columns from 2 entities

Please look at my schema: http://pastebin.com/uPxwq8Zs and my data config: http://pastebin.com/ebeDfPM9
So, I need to index page content and some other fields, and also the text of all file attachments linked to each page. Because there can be multiple attachments, "text" and the rest of the file-related fields need to be declared as multivalued. So, example output could be:
{
"header": " ",
"page_id": 25352,
"title": "Informacje, których nie ma w BIP",
"content": [
"<p> TEST test test"
],
"file_name": [
"Wniosek",
"Wniosek"
],
"file_desc": [
"Wniosek o udostępnienie informacji publicznej - format PDF",
"Wniosek o udostępnienie informacji publicznej - format RTF"
],
"file_path": [
"/mnt/storage/content/www/html/smartsite/src/data/resource/1/2/422/zalacznik_nr_1_wniosek.pdf",
"/mnt/storage/content/www/html/smartsite/src/data/resource/1/2/422/zalacznik_nr_1_wniosek.rtf"
],
"text": [
"Dzień dobry - dokument testowy.\nTEST.\n"
],
"_version_": 1479310282003054600
},
As you can see, it generally works, but it's useless for me. In my example, there are 2 file attachments declared in the database for page_id 25352. The first attachment doesn't exist on disk on the server, so Tika was unable to index it; the second attachment was indexed successfully, and this is the extracted text:
"text": [
"Dzień dobry - dokument testowy.\nTEST.\n"
],
But I need to know which attachment it was extracted from. So my idea is to concatenate my "text" field value with the "file_path" value and some separator, so I will get a result like this:
"text": [
/mnt/storage/content/www/html/smartsite/src/data/resource/1/2/422/zalacznik_nr_1_wniosek.rtf*"Dzień dobry - dokument testowy.\nTEST.\n"
],
How can I get a result like this using Solr/Tika in my case?
