Solr fq not eliminating results - solr

I am issuing the following query:
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"q": "(test)",
"defType": "edismax",
"indent": "true",
"fl": "distributor_status,QOH_estimate,id,score",
"start": "0",
"sort": "score desc,id desc",
"fq": "(QOH_estimate:[1 TO *])+OR+(distributor_status:stock)+OR+(*:* -distributor_status:VENDORDISC)",
"rows": "10",
"wt": "json",
"_": "1446833368873"
}
}
I am getting back documents like the following:
{ "id": "5445a000e4b0fb20ffca4aba",
"QOH_estimate": 0,
"distributor_status": "VENDORDISC",
"score": 4.48295
}
How does this document get past the fq?
Its QOH_estimate is 0, so it fails QOH_estimate:[1 TO *]. Its distributor_status is VENDORDISC, so it fails distributor_status:stock, and I would expect it to fail (*:* -distributor_status:VENDORDISC) as well. Since it fails all three parts of the disjunctive query, I would expect it to be eliminated, yet it is not. Why?

I think the spaces between your clauses are being double-escaped. Otherwise, why would +OR+ appear in that output when the other spaces are fine?
If that does not help, try adding the debug flag to see how everything gets parsed at the Lucene level. That should give a hint about the final expansion.
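One way to rule out double-escaping is to build the query string from unencoded values and let a library encode them exactly once. The following is a minimal Python sketch, not the asker's actual client code: the helper name build_solr_query is hypothetical, and debugQuery=true is the standard Solr debug parameter.

```python
from urllib.parse import urlencode, parse_qs


def build_solr_query(q, fq, debug=True):
    """Encode Solr parameters exactly once (hypothetical helper)."""
    params = {"q": q, "defType": "edismax", "fq": fq, "wt": "json"}
    if debug:
        # Ask Solr to report how q and fq were parsed
        params["debugQuery"] = "true"
    return urlencode(params)


# Write the filter with plain spaces; do not pre-encode them as "+"
fq = ("(QOH_estimate:[1 TO *]) OR (distributor_status:stock) "
      "OR (*:* -distributor_status:VENDORDISC)")
query_string = build_solr_query("(test)", fq)

# Decoding once should give back the original filter text
decoded_fq = parse_qs(query_string)["fq"][0]
print(decoded_fq)
```

If the decoded value still shows a literal +OR+, the string was encoded twice somewhere between the client and Solr.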

Neo4j: Do queries on specific database?

I'm new to Neo4j and want to implement a service that makes use of it.
I've read the docs and searched for it, but I still didn't get an answer to this simple question:
How do I specify which database to query in a Neo4j query?
E.g. I connected to bolt://localhost:7687, and have three databases in there: system, neo4j, and mydb. The neo4j database is the standard.
When I open the Neo4j browser and do a query such as MATCH (n) RETURN n, it automatically assumes that I want to query the standard DB which is called neo4j. However, I want to query another one, mydb.
The output for the aforementioned query is:
{
"query": {
"text": "match (n) return n",
"parameters": {}
},
"queryType": "r",
"counters": {
"_stats": {
"nodesCreated": 0,
"nodesDeleted": 0,
"relationshipsCreated": 0,
"relationshipsDeleted": 0,
"propertiesSet": 0,
"labelsAdded": 0,
"labelsRemoved": 0,
"indexesAdded": 0,
"indexesRemoved": 0,
"constraintsAdded": 0,
"constraintsRemoved": 0
},
"_systemUpdates": 0
},
"updateStatistics": {
"_stats": {
"nodesCreated": 0,
"nodesDeleted": 0,
"relationshipsCreated": 0,
"relationshipsDeleted": 0,
"propertiesSet": 0,
"labelsAdded": 0,
"labelsRemoved": 0,
"indexesAdded": 0,
"indexesRemoved": 0,
"constraintsAdded": 0,
"constraintsRemoved": 0
},
"_systemUpdates": 0
},
"plan": false,
"profile": false,
"notifications": [],
"server": {
"address": "localhost:7687",
"version": "Neo4j/4.4.5",
"agent": "Neo4j/4.4.5",
"protocolVersion": 4.4
},
"resultConsumedAfter": {
"low": 2,
"high": 0
},
"resultAvailableAfter": {
"low": 8,
"high": 0
},
"database": {
"name": "neo4j"
}
}
The last JSON value shows that the query was executed on the neo4j database.
What do I have to add to my queries to instead query another database in the same DBMS?
You can change/specify the database using the following options.
From the Neo4j Browser, you can select the database in the sidebar.
At the Browser or cypher-shell prompt, the :use command switches databases (in Cypher itself, the equivalent is a USE mydb clause at the start of the query):
:use mydb
If you connect to Neo4j through an Application driver, you can specify the database while creating the session object.
For example, if you are using the Python driver:
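from neo4j import GraphDatabase

# uri, user, and password are placeholders for your own connection details
driver = GraphDatabase.driver(uri, auth=(user, password))
# The database argument selects which database the session queries
session = driver.session(database="mydb")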
Specify the default database system-wide by setting dbms.default_database in the neo4j.conf file.

Formatting into CSV JSON file using jq

I have some data in a file called myfile.json that I need to format using jq. The JSON looks like this:
{
"result": [
{
"service": "ebsvolume",
"name": "gtest",
"resourceIdentifier": "vol-999999999999",
"accountName": "g-test-acct",
"vendorAccountId": "12345678912",
"availabilityZone": "ap-southeast-2c",
"region": "ap-southeast-2",
"effectiveHourly": 998.56,
"totalSpend": 167.7,
"idle": 0,
"lastSeen": "2018-08-16T22:00:00Z",
"volumeType": "io1",
"state": "in-use",
"volumeSize": 180,
"iops": 2000,
"throughput": 500,
"lastAttachedTime": "2018-08-08T22:00:00Z",
"lastAttachedId": "i-086f957ee",
"recommendations": [
{
"action": "Rightsize",
"preferenceOrder": 2,
"risk": 0,
"savingsPct": 91,
"savings": 189.05,
"volumeType": "gp2",
"volumeSize": 120
},
{
"action": "Rightsize",
"preferenceOrder": 4,
"risk": 0,
"savingsPct": 97,
"savings": 166.23,
"volumeType": "gp2",
"volumeSize": 167
},
{
"action": "Rightsize",
"preferenceOrder": 6,
"risk": 0,
"savingsPct": 91,
"savings": 111.77,
"volumeType": "gp2",
"volumeSize": 169,
}
]
}
]
}
I have it formatted better with the following:
jq '.result[] | [.service,.name,.resourceIdentifier,.accountName,.vendorAccountId,.availabilityZone,.region,.effectiveHourly,.totalSpend,.idle,.lastSeen,.volumeType,.state,.volumeSize,.iops,.throughput,.lastAttachedTime,.lastAttachedId] | @csv' ./myfile.json
This produces the following output:
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\""
I figured this out, but it's not exactly what I am trying to achieve. I want each recommendation listed underneath on a separate line, not at the end of the same line.
jq '.result[] | [.service,.name,.resourceIdentifier,.accountName,.vendorAccountId,.availabilityZone,.region,.effectiveHourly,.totalSpend,.idle,.lastSeen,.volumeType,.state,.volumeSize,.iops,.throughput,.lastAttachedTime,.lastAttachedId,.recommendations[].action] | @csv' ./myfile.json
This produces:
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\",\"Rightsize\",\"Rightsize\",\"Rightsize\""
What I want is
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\",
\"Rightsize\",
\"Rightsize\",
\"Rightsize\""
So I'm not entirely sure how to deal with the array inside the "recommendations" section in jq. I think it might be called unflattening?
You can try this:
jq '.result[] | [ flatten[] | try(.action) // . ] | @csv' file
"\"ebsvolume\",\"gtest\",\"vol-999999999999\",\"g-test-acct\",\"12345678912\",\"ap-southeast-2c\",\"ap-southeast-2\",998.56,167.7,0,\"2018-08-16T22:00:00Z\",\"io1\",\"in-use\",180,2000,500,\"2018-08-08T22:00:00Z\",\"i-086f957ee\",\"Rightsize\",\"Rightsize\",\"Rightsize\""
flatten does what it says.
try suppresses the error that .action raises on values that are not objects. When .action yields no usable value (an error, null, or false), the // operator emits the original value instead.
The filtered values are put into an array so that they can be converted with the @csv operator.
That didn't really work for me; it actually omitted all the data in the previous array. But thanks!
I ended up with the following. Granted, it doesn't put the Rightsize details on a separate line, but it will have to do:
jq -r '.result[] | [.service,.name,.resourceIdentifier,.accountName,.vendorAccountId,.availabilityZone,.region,.effectiveHourly,.totalSpend,.idle,.lastSeen,.volumeType,.state,.volumeSize,.iops,.throughput,.lastAttachedTime,.lastAttachedId,.recommendations[][]] | @csv' ./myfile.json
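For comparison, the layout originally asked for (one row for the volume, then one row per recommendation) can be sketched outside jq. This is a minimal Python illustration using the standard json-style dict and csv module, not a jq solution. The function name result_to_csv and the shortened sample data are assumptions made for the example.

```python
import csv
import io


def result_to_csv(data):
    """Write one CSV row per result, then one row per recommendation."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for item in data["result"]:
        # Top-level scalar values only; nested lists/dicts are handled below
        writer.writerow(
            [v for v in item.values() if not isinstance(v, (list, dict))]
        )
        for rec in item.get("recommendations", []):
            # Each recommendation gets its own line under the parent row
            writer.writerow(list(rec.values()))
    return out.getvalue()


# Shortened, hypothetical sample in the same shape as myfile.json
sample = {
    "result": [
        {
            "service": "ebsvolume",
            "name": "gtest",
            "recommendations": [
                {"action": "Rightsize", "savings": 189.05},
                {"action": "Rightsize", "savings": 166.23},
            ],
        }
    ]
}

csv_text = result_to_csv(sample)
print(csv_text)
```

The real file would be loaded with json.load before being passed to the function.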

lucene solr - how to know numCount of each word in query

I have a query string with 5 words, for example "cat dog fish bird animals".
I need to know how many matches each word has.
At this point I create 5 queries:
/q=name:cat&rows=0&facet=true
/q=name:dog&rows=0&facet=true
/q=name:fish&rows=0&facet=true
/q=name:bird&rows=0&facet=true
/q=name:animals&rows=0&facet=true
and get the match count of each word from each query.
But this method takes too much time.
So is there a way to get the numCount of each word with one query?
Any help appreciated!
In this case, function queries are your friends. In particular:
termfreq(field,term) returns the number of times the term appears in the field for that document. Example syntax: termfreq(text,'memory')
totaltermfreq(field,term) returns the number of times the term appears in the field in the entire index. ttf is an alias of totaltermfreq. Example syntax: ttf(text,'memory')
The following query for instance:
q=*%3A*&fl=cntOnSummary%3Atermfreq(summary%2C%27hello%27)+cntOnTitle%3Atermfreq(title%2C%27entry%27)+cntOnSource%3Atermfreq(source%2C%27activities%27)&wt=json&indent=true
returns the following results:
"docs": [
{
"id": [
"id-1"
],
"source": [
"activities",
"activities"
],
"title": "Ajones3 Activity Entry 1",
"summary": "hello hello",
"cntOnSummary": 2,
"cntOnTitle": 1,
"cntOnSource": 1,
"score": 1
},
{
"id": [
"id-2"
],
"source": [
"activities",
"activities"
],
"title": "Common activity",
"cntOnSummary": 0,
"cntOnTitle": 0,
"cntOnSource": 1,
"score": 1
}
]
Note that while this works well on single-valued fields, for multivalued fields the functions seem to consider only the first entry. In the example above, termfreq(source%2C%27activities%27) returns 1 instead of 2.
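Applied to the original five-word question, the same aliasing pattern can be generated for every word in a single request. The sketch below is only an illustration in Python: the helper name term_count_params and the cnt_ alias prefix are made up, and it uses ttf, which counts term occurrences across the index rather than matching documents.

```python
from urllib.parse import urlencode, parse_qs


def term_count_params(field, words):
    """Build one query string with an aliased ttf() per word (hypothetical helper)."""
    # Same alias:function(...) pattern as the fl parameter in the query above
    fl = " ".join(f"cnt_{w}:ttf({field},'{w}')" for w in words)
    # ttf is index-wide, so a single returned document carries all the counts
    return urlencode({"q": "*:*", "rows": 1, "fl": fl, "wt": "json"})


words = ["cat", "dog", "fish", "bird", "animals"]
query_string = term_count_params("name", words)
print(query_string)
```

This assumes the index contains at least one document, since the function values are returned as fields of a document.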

count based on a specific field in SOLR

How can I return the count of a field with each object in Solr?
When I do fq=verify_ix:1 I get the response below. I also want to get the count of documents where verify_ix = 1 in the response. How can I do that?
"response": {
"numFound": 9484,
"start": 0,
"maxScore": 1,
"docs": [
{
"id": "10000000000965509",
"description_s": "No Description",
"recommendation_ix": 0,
"sId_lx": 30005938,
"sType_sx": "P",
"condition_ix": 1000,
"verify_ix": 1
},
.
.
.
{
"id": "10000000000965734",
"description_s": "No Description",
"recommendation_ix": 1,
"sId_lx": 30005947,
"sType_sx": "P",
"condition_ix": 2000,
"verify_ix": 1
}
]}
If you want counts of the different values for a given field, you can send a request to Solr with facet=true and facet.field=verify_ix. For counts over all records, set q=*:*. If you don't want to see any rows returned, you can set rows=0.
See here for more details on faceting:
https://cwiki.apache.org/confluence/display/solr/Faceting
(I tested this with Solr 5, but faceting should work with Solr 4 as well.)
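As a rough sketch of the request described above, the parameters can be assembled in Python, and the facet counts read back from the response. The helper names and the sample response fragment are hypothetical. The flat value/count list is the shape Solr's JSON writer uses for facet_fields by default.

```python
from urllib.parse import urlencode


def facet_params(field):
    """Query string asking only for facet counts on one field."""
    return urlencode({
        "q": "*:*",          # count over all records
        "rows": 0,           # no documents, only the counts
        "facet": "true",
        "facet.field": field,
        "wt": "json",
    })


def facet_counts_to_dict(flat):
    """Turn [value, count, value, count, ...] into {value: count}."""
    return dict(zip(flat[0::2], flat[1::2]))


params = facet_params("verify_ix")

# Hypothetical response fragment; the counts are made up for illustration
sample = {"facet_counts": {"facet_fields": {"verify_ix": ["1", 9484, "0", 120]}}}
counts = facet_counts_to_dict(sample["facet_counts"]["facet_fields"]["verify_ix"])
print(params)
print(counts)
```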

solr query for not equal to text value and number greater than 0

I have solr documents with two fields, one is a string and one is an integer. Both fields are allowed to be null. I am attempting to write a query that will eliminate documents with the following properties:
textField = "badValue" AND (numberField is null OR numberField = 0)
I added the following fq:
((NOT textField=badValue) OR numberField=[1 TO *])
This does not seem to have worked properly, because I am getting a document with textField = badValue and numberField = 0. What did I do wrong with my fq?
The full query response header, containing the parsed query is:
"responseHeader": {
"status": 0,
"QTime": 245,
"params": {
"q": "(numi) AND (solr_specs:[* TO *] OR full_description:[* TO *])",
"defType": "edismax",
"bf": "log(sum(popularity,1))",
"indent": "true",
"qf": "categories^3.0 manufacturer^1.0 sku^0.2 split_sku^0.2 upc^1.0 invoice_description^2.6 full_description solr_specs^0.8 solr_spec_values^1.7 legacyid legacy_altcode id",
"fl": "distributor_status,QOH_estimate,id,score",
"start": "0",
"fq": "((*:* NOT distributor_status=VENDORDISC) OR QOH_estimate=[1 TO *])",
"sort": "score desc,id desc",
"rows": "20",
"wt": "json",
"_": "1441220051438"
}
}
QOH_estimate is numberField and distributor_status is textField.
Please try the following in your fq parameter: ((*:* NOT textField:badValue) OR numberField:[1 TO *]).
((*:* NOT distributor_status:VENDORDISC) OR QOH_estimate:[1 TO *])
Here you first select the documents that do not contain textField:badValue and then OR them with the documents matching the numberField:[1 TO *] condition. Note also that Solr field queries use a colon (field:value), not an equals sign.
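The intended logic of that filter can be checked against a few toy documents. The following Python sketch only models the boolean semantics of the two clauses. It does not call Solr, and the documents and the function name passes_filter are invented for the example.

```python
# Toy documents covering the combinations described in the question
docs = [
    {"id": 1, "textField": "badValue", "numberField": 0},
    {"id": 2, "textField": "badValue", "numberField": 5},
    {"id": 3, "textField": "good", "numberField": None},
    {"id": 4, "textField": None, "numberField": 0},
]


def passes_filter(doc):
    """Model of ((*:* NOT textField:badValue) OR numberField:[1 TO *])."""
    # (*:* NOT textField:badValue): every document except those with badValue
    not_bad = doc["textField"] != "badValue"
    # numberField:[1 TO *]: the number exists and is at least 1
    number = doc["numberField"]
    positive = number is not None and number >= 1
    return not_bad or positive


kept = [d["id"] for d in docs if passes_filter(d)]
print(kept)
```

Only the document with textField = badValue and a numberField of 0 (or null) is dropped, which matches the elimination rule stated in the question.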
