Group by and Count(*) in Datastax Search/Solr - solr

Hi, we have a Solr index with several fields, such as business, businessType, regionName, StateName, ...
Now I need a Solr query that returns the number of businesses with businessType = 'event', grouped by regionName.
In SQL this would be: select region_name, count(business) from solr where businessType='event' group by region_name
Any pointers would be helpful.

I finally figured out how to do this. Note: if the value you are searching for contains a space or a special character, put the search term in quotes, e.g. businessType:"(fun) event".
curl http://localhost:8983/solr/yourCollection/query -H 'Content-Type: application/json' -d '
{ "query": "*:*",
"filter": "businessType:event",
"limit": 0,
"facet": { "category" : {
"type": "terms",
"field" : "region_name",
"limit" : -1 }}
}'
One more note: if you want to count over two fields, you have to use a nested facet.
curl http://localhost:8983/solr/yourCollection/query -H 'Content-Type: application/json' -d '
{ "query": "*:*",
"filter": "businessType:event",
"limit": 0,
"facet": { "category1" : {
"type": "terms",
"field" : "regionName",
"limit" : -1,
"facet" : { "category2" : {
"type": "terms",
"field" : "stateName",
"limit" : -1
}}}}
}'
Add another "facet" block after the "limit" : -1 item if you need to group by a third dimension. I tried this on my company's Solr and it hung, returning nothing but a timeout error. In general, working with Solr isn't very easy, and the documentation, IMO, is pretty terrible. Nothing about the syntax or the names of the commands seems intuitive at all.
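Once a nested terms facet returns, its buckets can be flattened into GROUP BY style rows. The sketch below assumes the usual JSON facet response shape ("buckets" lists with "val" and "count" keys); the sample data and the flatten_buckets helper are made up for illustration and are not part of Solr.

```python
def flatten_buckets(facets, names):
    """Turn nested terms-facet buckets into (val1, val2, ..., count) rows.

    `names` lists the facet names from outermost to innermost,
    e.g. ["category1", "category2"] for the nested query above.
    """
    rows = []

    def walk(node, depth, prefix):
        for bucket in node.get(names[depth], {}).get("buckets", []):
            key = prefix + (bucket["val"],)
            if depth + 1 < len(names):
                walk(bucket, depth + 1, key)  # descend into the sub-facet
            else:
                rows.append(key + (bucket["count"],))

    walk(facets, 0, ())
    return rows


# Hypothetical response body, shaped like the "facets" section Solr returns.
sample = {
    "facets": {
        "count": 4,
        "category1": {"buckets": [
            {"val": "North", "count": 3, "category2": {"buckets": [
                {"val": "Ohio", "count": 2},
                {"val": "Iowa", "count": 1},
            ]}},
            {"val": "South", "count": 1, "category2": {"buckets": [
                {"val": "Texas", "count": 1},
            ]}},
        ]},
    }
}

rows = flatten_buckets(sample["facets"], ["category1", "category2"])
for row in rows:
    print(row)
```

Each row corresponds to one (regionName, stateName, count) line of the SQL result.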

Use facets. Your Solr query will look like: q=*:*&fq=businessType:event&facet=true&facet.field=region_name&rows=0
To group by multiple fields, use pivot faceting: facet.pivot=state,region_name
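As a rough sketch, the same parameters can be assembled and the result read back in Python. It assumes Solr's default JSON layout for facet_fields, a flat list that alternates value and count; the sample list is invented and no request is actually sent.

```python
from urllib.parse import urlencode, parse_qs

# Parameters from the answer above; q=*:* matches every document.
params = [
    ("q", "*:*"),
    ("fq", "businessType:event"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.field", "region_name"),
]
query_string = urlencode(params)  # percent-encodes * and : for use in a URL


def pair_up(flat):
    """Convert [value, count, value, count, ...] into {value: count}."""
    return dict(zip(flat[0::2], flat[1::2]))


# Made-up example of response["facet_counts"]["facet_fields"]["region_name"].
sample_facet_field = ["East", 12, "West", 7]
counts = pair_up(sample_facet_field)
print(query_string)
print(counts)
```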

Related

Solr facet column with a condition

I have the following facet query, which works fine. However, as you can see, I need from_usa as one more parameter. I have a country column, and the value needs to be conditional on it. Is that possible? I cannot use fq filters in this case.
{
"facet":{
"total_game_plays" : "count",
"unique_game_players" : "unique(uuid)",
"total_play_time":"sum(play_time)",
//"from_usa": "unique(where country=US)"
}
}
I assume you're using the JSON facet API, judging from the syntax you've provided, so a JSON query facet should do what you want:
"from_usa": {
"type": "query",
"q": "country:US"
}
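Putting the pieces together, a minimal sketch of the full request body might look like the following. The first three entries are copied from the question; the nested "unique_players" sub-facet is an assumption about what the asker wanted (unique players restricted to country:US), and its name is hypothetical.

```python
import json

facet = {
    # Entries copied from the question.
    "total_game_plays": "count",
    "unique_game_players": "unique(uuid)",
    "total_play_time": "sum(play_time)",
    # Query facet: restricts the domain to country:US without a global fq.
    "from_usa": {
        "type": "query",
        "q": "country:US",
        # Assumed extra: a stat computed only over the US documents.
        "facet": {"unique_players": "unique(uuid)"},
    },
}

body = json.dumps({"query": "*:*", "limit": 0, "facet": facet}, indent=2)
print(body)
```

The query facet returns its own count, and any sub-facets listed under it are computed only over documents matching its q.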

How to filter an array in Azure Search

I have the following data in my index:
{
"name" : "The 100",
"lists" : [
"2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
"2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
"2c8540ee-85df-4f1a-b35f-00155c02e581;Clark"
]
}
I need to get all documents where lists contains Pike.
Matching the full string with any works, but I couldn't get a contains-style match to work.
$filter=lists/any(t: t eq '2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike')
However, I am not sure how to search with only Pike.
$filter=lists/any(t: t eq 'Pike')
I guess eq compares against the full string. Is there any way to make this query work with the given data structure?
Currently the lists field is only filterable, not searchable.
The eq operator looks for exact, case-sensitive matches. That's why it doesn't match 'Pike'. You need to structure your index such that terms like 'Pike' can be easily found. You can accomplish this in one of two ways:
Separate the GUIDs from the names when you index documents. So instead of indexing "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike" as a single string, you could index them as separate strings in the same array, or perhaps in two different collection fields (one for GUIDs and one for names) if you need to correlate them by position.
If the field is searchable, you can use the new search.ismatch function in your filter. Assuming the field is using the standard analyzer, full-text search will word-break on the semicolons, so you should be able to search just for "Pike" and get a match. The syntax would look like this: $filter=search.ismatch('Pike', 'lists') (If looking for "Pike" is all your filter does, you can just use the search and searchFields parameters to the Search API instead of $filter.) If the "lists" field is not already searchable, you will need to either add a new field and re-index the "lists" values, or re-create your index from scratch with the new field definition.
Update
There is a new approach to solve this type of problem that's available in API versions 2019-05-06 and above. You can now use complex types to represent structured data, including in collections. For the original example, you could structure the data like this:
{
"name" : "The 100",
"lists" : [
{ "id": "2c8540ee-85df-4f1a-b35f-00124e1d3c4a", "name": "Bellamy" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c40f11c", "name": "Pike" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c02e581", "name": "Clark" }
]
}
And then directly query for the name sub-field like this:
$filter=lists/any(l: l/name eq 'Pike')
The documentation for complex types is here.
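Converting existing documents to that shape is mostly a matter of splitting each string on the semicolon before indexing. A minimal sketch, assuming every entry has the form "guid;name" (the to_complex helper is hypothetical, not part of any Azure SDK):

```python
def to_complex(entries):
    """Split "guid;name" strings into {"id": ..., "name": ...} objects."""
    converted = []
    for entry in entries:
        guid, _, name = entry.partition(";")  # split on the first semicolon
        converted.append({"id": guid, "name": name})
    return converted


doc = {
    "name": "The 100",
    "lists": [
        "2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
        "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
        "2c8540ee-85df-4f1a-b35f-00155c02e581;Clark",
    ],
}

new_doc = {**doc, "lists": to_complex(doc["lists"])}
print(new_doc)
```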

Cloudant using $nin: There is no index available for this selector

I created a JSON index in Cloudant on _id like so:
{
"index": {
"fields": [ "_id"]
},
"ddoc": "mydesigndoc",
"type": "json",
"name": "myindex"
}
First off, unless I specified the index name, Cloudant somehow could not differentiate between the index I created and the default text-based index for _id (if that is truly the case, I believe this is a bug).
I ran the following query against the _find endpoint of my db:
{
"selector": {
"_id": {
"$nin":["v1","v2"]
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
The result was this error:
{"error":"no_usable_index","reason":"There is no index available for this selector."}
If I change "$nin":["v1","v2"] to "$eq":"v1" then it works fine, but that is not the query I am after.
So, to get what I want, I had to add "_id": {"$gt":null} to my selector, which now looks like:
{
"selector": {
"_id": {
"$nin":["v1","v2"],
"$gt":null
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
Why does it behave this way? It only seems to happen when I use the _id field in the selector.
What are the ramifications of adding "_id": {"$gt":null} to my selector? Is this going to scan the entire table rather than use the index?
I would appreciate any help, thank you.
Cloudant Query can use Cloudant's pre-existing primary index for selection and range querying without you having to create your own index in the _id field.
Unfortunately, the index doesn't really help when using the $nin operator - Cloudant would have to scan the entire database to check for documents which are not in your list - the index doesn't really get it any further forward.
By changing the operator to $eq you are playing to the strengths of the index which can be used to locate the record you need quickly and efficiently.
In short, the query you are attempting is inefficient. If your query was more complex e.g. the equivalent of WHERE colour='red' AND _id NOT IN ['a','b'] then a Cloudant index on colour could be used to reduce the data set to a reasonable level before doing the $nin operation on the remaining data.
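The combined query described above could be sketched as a _find request body like the one below. The colour field comes from the answer's example, and an index on it is assumed to exist; the build_query helper is hypothetical.

```python
import json


def build_query(colour, excluded_ids):
    """Selector: an indexable equality test plus $nin on the reduced set."""
    return {
        "selector": {
            "colour": {"$eq": colour},            # can be served by an index on colour
            "_id": {"$nin": list(excluded_ids)},  # applied to the remaining documents
        },
        "fields": ["_id", "field1", "field2"],
    }


query = build_query("red", ["a", "b"])
print(json.dumps(query, indent=2))
```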

Retrieving distinct documents from Solr

I've had a hard time explaining and finding what I need, so please put yourself in my shoes for a moment.
My requirement comes from a relational database background. I may be using Solr to do something it wasn't designed to do, or maybe it can do what I need; I still need to confirm that. Hopefully you can assist me.
After indexing numerous documents into Solr, I need to retrieve distinct documents based on a filter. Think of it as retrieving distinct rows while also applying a WHERE condition.
For example, in a relational database, I may have the following columns
(Country) (City) (Whatever)
Egypt Cairo Hospitals
Egypt Alex Schools
Egypt Mansoura Hospitals
Egypt Cairo Schools
If I perform this query: SELECT DISTINCT Country, City FROM mytable
I should get the following rows
(Country) (City)
Egypt Alex
Egypt Mansoura
Egypt Cairo
Now, after indexing the original table (SELECT * FROM mytable), how can I achieve the SAME output from Solr? How can I retrieve documents that are distinct based on some fields? I will also need to apply a not-null filter on a specific field.
I don't need statistics of any kind, I only need to get the documents.
I hope I was clear enough. Thank you for your time.
This would be achievable with field collapsing by grouping on multiple fields, but unfortunately only one field is supported right now. There is an open issue; check it out.
Did you try faceting?
You should do something like this:
http://localhost:8983/solr/select/?q=*:*&facet=on&facet.field=city&facet.field=country
This returns every distinct city and every distinct country, each with its count.
The wiki has more details if you want to learn more about it.
I hope this helps.
Another good solution available from Solr 4 is based on Pivot (Decision Tree) Faceting.
Try with:
/solr/collection1/select?q=*:*&facet=true&facet.pivot=Country,City
This should return:
"facet_counts" : {
"facet_queries" : {},
"facet_fields" : {},
"facet_dates" : {},
"facet_ranges" : {},
"facet_pivot" : {
"Country,City" : [ {
"field" : "Country",
"value" : "Egypt",
"count" : 4,
"pivot" : [ {
"field" : "City",
"value" : "Cairo",
"count" : 2
}, {
"field" : "City",
"value" : "Alex",
"count" : 1
}, {
"field" : "City",
"value" : "Mansoura",
"count" : 1
} ]
} ]
}
}
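To get from that response to the equivalent of SELECT DISTINCT Country, City, the pivot tree only needs to be walked once. A minimal sketch using the response shown above (the distinct_pairs helper is illustrative, not a Solr API):

```python
def distinct_pairs(pivot_entries):
    """Flatten a two-level facet.pivot result into (outer, inner) tuples."""
    pairs = []
    for outer in pivot_entries:
        for inner in outer.get("pivot", []):
            pairs.append((outer["value"], inner["value"]))
    return pairs


# The "Country,City" entry from the facet_pivot section above.
pivot = [{
    "field": "Country", "value": "Egypt", "count": 4,
    "pivot": [
        {"field": "City", "value": "Cairo", "count": 2},
        {"field": "City", "value": "Alex", "count": 1},
        {"field": "City", "value": "Mansoura", "count": 1},
    ],
}]

pairs = distinct_pairs(pivot)
for country, city in pairs:
    print(country, city)
```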

Update a new field to existing document

Is it possible to add a new field to an existing document?
For example:
There is a document with several fields, e.g.
ID=99999
Field1:text
Field2:text
This document is already in the index. Now I want to add a new field to it WITHOUT resending the old data:
ID=99999
Field3:text
Currently, the old document is deleted and a new document with the same ID is created. So if I now search for ID 99999 the result is:
ID=99999
Field3:text
I read this on the Solr Wiki:
How can I update a specific field of an existing document?
I want to update a specific field in a document, is that possible? I only need to index one field for a specific document. Do I have to index all the documents for this?
No, just the one document. Let's say you have a CMS and you edit one document. You will need to re-index only this document, using the add Solr statement for the whole document (not one field only).
In Lucene, updating a document is really a delete followed by an add. You will need to add the complete document, as there are no "update only a field" semantics in Lucene.
So is there any solution for this? Will this feature be implemented in a future version (I currently use 3.6.0)? As a workaround, I thought about writing a script or an application that collects the existing fields, adds the new field and updates the whole document. But I think performance will suffer. Do you have any other ideas?
Best regards
I have two answers for you (both more or less bad):
To update a field within a document in Solr, you have to reindex the whole document (to update Field3 in document ID:99999, you have to reindex that document with values for all fields).
Solr 4 implemented a feature like that, but with a condition: all fields have to be stored, not just indexed. What happens is that Solr uses the stored values and reindexes the document in the background. If you are interested, there is a nice article about it: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ This solution has an obvious flaw, namely the size of the index when you store all fields.
I hope this helps with your problem. If you have more questions, please ask.
It is possible to do this in Solr 4. For example, consider the following document:
{
"id": "book123",
"name" : "Solr Rocks"
}
To add an author field to the document, the field value must be a JSON object with a "set" attribute holding the new value:
$ curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
[
{"id" : "book123",
"author" : {"set":"The Community"}
}
]'
Your new document
$ curl http://localhost:8983/solr/get?id=book123
will be
{
"doc" : {
"id" : "book123",
"name" : "Solr Rocks",
"author": "The Community"
}
}
set will add or replace the author field. Along with set, you also have the options to increment (inc) and to add a value (add).
From Solr 4 onwards you can update a single field of a document without resending the whole document. Several modifiers are supported:
set – set or replace a particular value, or remove the value if null is specified as the new value
add – adds an additional value to a list
remove – removes a value (or a list of values) from a list
removeregex – removes from a list that match the given Java regular expression
inc – increments a numeric value by a specific amount (use a negative value to decrement)
Example document:
{
"id": "1",
"name" : "Solr",
"views" : 2
}
Now update it with:
$ curl http://localhost:8983/solr/demo/update -H 'Content-type:application/json' -d '
[
{"id" : "1",
"author" : {"set":"Neal Stephenson"},
"views" : {"inc":3}
}
]'
This will result in:
{
"id": "1",
"name" : "Solr",
"views" : 5,
"author" : "Neal Stephenson"
}
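To make the modifier semantics concrete, here is a small local simulation in Python. It only mimics the behaviour described in the list above for set, add, remove and inc on a plain dict with a numeric views field; it is not how Solr implements atomic updates, and apply_atomic_update is a hypothetical helper.

```python
import copy


def apply_atomic_update(doc, update):
    """Apply Solr-style atomic update modifiers to a copy of `doc`."""
    result = copy.deepcopy(doc)
    for field, value in update.items():
        if not isinstance(value, dict):
            result[field] = value  # plain value, e.g. the id
            continue
        for op, arg in value.items():
            current = result.get(field, [])
            as_list = current if isinstance(current, list) else [current]
            args = arg if isinstance(arg, list) else [arg]
            if op == "set":
                if arg is None:
                    result.pop(field, None)  # set to null removes the field
                else:
                    result[field] = arg
            elif op == "add":
                result[field] = as_list + args
            elif op == "remove":
                result[field] = [v for v in as_list if v not in args]
            elif op == "inc":
                result[field] = result.get(field, 0) + arg
            else:
                raise ValueError(f"unsupported modifier: {op}")
    return result


doc = {"id": "1", "name": "Solr", "views": 2}
update = {"id": "1", "author": {"set": "Neal Stephenson"}, "views": {"inc": 3}}
updated = apply_atomic_update(doc, update)
print(updated)
```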
