Ordering Solr search results by indexed fields

I have to order the search results from Solr by some fields that are already indexed.
My current API request, without sorting, looks like this:
http://127.0.0.1:8000/api/v1/search/facets/?page=1&gender=Male&age__gte=19
It returns the search results in index order, but I need to reorder them by the field 'last_login', which is an indexed DateTimeField.
Here is my viewset:
class ProfileSearchView(FacetMixin, HaystackViewSet):
    index_models = [Profile]
    serializer_class = ProfileSearchSerializer
    pagination_class = PageNumberPagination
    facet_serializer_class = ProfileFacetSerializer
    filter_backends = [HaystackFilter]
    facet_filter_backends = [HaystackFilter, HaystackFacetFilter]
    def get_queryset(self, index_models=None):
        if not index_models:
            index_models = []
        queryset = super(ProfileSearchView, self).get_queryset(index_models)
        queryset = queryset.order_by('-created_at')
        return queryset
Here I have changed the default search order to 'created_at' (newest first). But for the next request I need to order by the 'last_login' value instead, so I added a new parameter to my request like this:
http://127.0.0.1:8000/api/v1/search/facets/?page=1&gender=Male&age__gte=19&sort='last_login'
but it gives me this error:
SolrError: Solr responded with an error (HTTP 400): [Reason: undefined field sort]
How can I achieve this ordering? Please help me with a solution.

The URL you provided, http://127.0.0.1:8000/api/v1/search/facets/, is not a direct Solr URL; it goes through your middleware. Since you have tried the query directly against Solr and it works, the problem must be somewhere in the middleware layer.
Print, monitor, or check the logs to see what URL the middleware actually generates, and compare it with the valid URL you know works.
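A minimal sketch of the logging step, assuming the Haystack Solr backend talks to Solr through pysolr and that pysolr logs each outgoing request at DEBUG level under the logger name `pysolr` (worth confirming against your installed version). Put into a Django `LOGGING` setting, this should print the Solr request the middleware actually builds, including what it did with the `sort` parameter:

```python
import logging.config

# Django-style LOGGING setting. Assumption: pysolr emits its request
# details on the "pysolr" logger at DEBUG level.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
        },
    },
    "loggers": {
        "pysolr": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}


def apply_logging_config() -> None:
    """Apply LOGGING outside Django (inside Django, settings.LOGGING is applied for you)."""
    logging.config.dictConfig(LOGGING)
```

Compare the logged URL with the query you ran directly against Solr; a stray `sort` field filter in the `fq` or `q` part would match the "undefined field sort" error.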

Related

Lucene facet request on list - missing elements

Let's say we have an entity that holds a list of sub-entities.
If the list has more than one element, the facet request does not count all of them for each group, but only one seemingly random property (there is probably some mechanism behind it).
FacetingRequest categoryFacetingRequest = qBuilder.facet()
        .name("districtFaceting").onField("address.districtId")
        .discrete().orderedBy(FacetSortOrder.COUNT_DESC)
        .includeZeroCounts(true).createFacetingRequest();
class Base {
    List<Address> adresses = ...
}
class Address {
    @Field(analyze = Analyze.NO, store = Store.YES, index = Index.YES)
    public String getDistrictId() {
        return this.districtId;
    }
}
When a Base has more than one address, the facet request counts only one, seemingly random, district id; the others are not incremented.
Is there any way to get correct results?
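For reference, the counts the question expects can be written down in a few lines of Python. This only illustrates the usual semantics of faceting on a multi-valued field (each distinct value counted once per matching entity); it is not how Hibernate Search computes facets, and the nested-list input standing in for `Base` and its addresses is hypothetical:

```python
from collections import Counter
from typing import Iterable, List


def count_facets(entities: Iterable[List[str]]) -> Counter:
    """Count each distinct district id once per entity.

    Each inner list is a hypothetical stand-in for one Base instance:
    the districtId of every Address it holds.
    """
    counts: Counter = Counter()
    for district_ids in entities:
        # set(): two addresses in the same district still count once
        counts.update(set(district_ids))
    return counts
```

With `[["d1", "d2"], ["d1"], ["d2", "d2", "d3"]]` this gives d1: 2, d2: 2, d3: 1, whereas the behaviour described in the question credits each `Base` to only one district.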
To use faceting requests, the fields you want to facet on must be annotated with @Facet.
Your code snippet is missing this annotation, which could explain the issue. Could you try adding a @Facet annotation on getDistrictId?
If you already have one, could you expand your code sample to include all the relevant annotations present in your code? (@Facet, @Indexed, @IndexedEmbedded, ...)
See https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#example-faceting-entity

How to fetch data from CouchDB using the CouchDB API?

Instead of only keys and IDs, I want to get all the documents via the CouchDB API. I have tried GET "http://localhost:5984/db-name/_all_docs" but it returned
{
"total_rows":4,
"offset":0,
"rows":[
{"id":"11","key":"11","value":{"rev":"1-a0206631250822b37640085c490a1b9f"}},
{"id":"18","key":"18","value":{"rev":"30-f0798ed72ceb3db86501c69ed4efa39b"}},
{"id":"3","key":"3","value":{"rev":"15-0dcb22bab2b640b4dc0b19e07c945f39"}},
{"id":"6","key":"6","value":{"rev":"4-d76008cc44109bd31dd32d26ba03125d"}}
]
}
From the documentation, the request below returns the data as expected, but it requires a set of keys in the request:
POST /db/_all_docs HTTP/1.1
{
"keys" : [
"11",
"18"
]
}
Thanks in advance.
The _all_docs endpoint is actually just a system-level view that uses the _id field as its index, so any parameter you can use with views also applies here.
If you read the documentation further, you'll find that adding the parameter include_docs=true to the request includes the original documents in the results. Each document is added as a doc field alongside id, key and value (the rev stays inside value).
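A small standard-library Python sketch of that request; the helper names are hypothetical and authentication is left out. Since rows can come back without a usable document (for example, keys that were not found when using the POST form with "keys"), the extractor skips rows whose doc is missing or null:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def all_docs_url(base_url: str, db: str) -> str:
    # include_docs takes a JSON boolean, so the value is the string "true".
    query = urlencode({"include_docs": "true"})
    return f"{base_url.rstrip('/')}/{db}/_all_docs?{query}"


def extract_docs(response: dict) -> list:
    # With include_docs=true each row carries the full document under "doc".
    return [row["doc"] for row in response.get("rows", []) if row.get("doc") is not None]


def fetch_all_docs(base_url: str, db: str) -> list:
    # Performs the GET and returns the list of full documents.
    with urlopen(all_docs_url(base_url, db)) as resp:
        return extract_docs(json.load(resp))
```

`fetch_all_docs("http://localhost:5984", "db-name")` would then return the four documents themselves rather than just their ids and revisions.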

How to get users from Active Directory using the UnboundID LDAP SDK?

I need to get users from Active Directory.
According to many sources, including MSDN,
https://msdn.microsoft.com/en-us/library/ms677643%28v=vs.85%29.aspx
the correct query is (&(objectClass=user)(objectCategory=person)).
Unfortunately, I was not able to build this query using UnboundID filters.
I have created the following filters:
Filter categoryFilter = Filter.createEqualityFilter("objectCategory","Person");
Filter objectFilter = Filter.createEqualityFilter("objectClass","user");
Filter searchFilter = Filter.createANDFilter(objectFilter, categoryFilter);
It returns no results.
When I inspected the objectCategory attribute of an LDAP object, I found that it looks like this:
CN=Person,CN=Schema,CN=Configuration,DC=…,DC=com
So I changed categoryFilter to the following:
Filter categoryFilter = Filter.createSubstringFilter("objectCategory", null, new String[]{"Person"}, null);
Unfortunately, I still get no results.
Then I used categoryFilter with the full objectCategory name:
Filter categoryFilter = Filter.createEqualityFilter("objectCategory","CN=Person,CN=Schema,CN=Configuration,DC=…,DC=com");
Only in this last case do I get results.
How can I make the filter more generic?
How can I obtain the full objectCategory name from Active Directory?
I need to obtain CN=Person,CN=Schema,CN=Configuration,DC=…,DC=com for any Active Directory, given that I know the objectCategory is Person.
Is there another way to create filters for the query (&(objectClass=user)(objectCategory=person))?
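One way to avoid hard-coding the domain part is to read the rootDSE attribute `schemaNamingContext`, which Active Directory sets to the `CN=Schema,CN=Configuration,DC=…` container, and prefix it with `CN=Person,`. The Python sketch below assumes you have already read that value with your LDAP client; the helper names are hypothetical, and the escaping follows RFC 4515 in case a DN component contains filter metacharacters:

```python
# RFC 4515: characters that must be escaped inside a filter value.
_FILTER_ESCAPES = str.maketrans({
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
})


def escape_filter_value(value: str) -> str:
    return value.translate(_FILTER_ESCAPES)


def person_category_dn(schema_naming_context: str) -> str:
    # The Person class definition lives directly under the schema container.
    return f"CN=Person,{schema_naming_context}"


def users_filter(schema_naming_context: str) -> str:
    # Hypothetical helper: builds the generic "all users" filter for any domain.
    category = escape_filter_value(person_category_dn(schema_naming_context))
    return f"(&(objectClass=user)(objectCategory={category}))"
```

The resulting string can be handed to UnboundID's `Filter.create(...)`, the same call used in the workaround below.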
Solution
(not mine, so I did not want to post it as an answer)
I created the filter from the string (sAMAccountType=805306368) and it works perfectly:
Filter searchFilter = Filter.create("(sAMAccountType=805306368)");
Source: http://ldapwiki.com/wiki/Active%20Directory%20User%20Related%20Searches#section-Active+Directory+User+Related+Searches-AllUsers
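The number in that filter is easier to recognize in hex: 805306368 is 0x30000000, the sAMAccountType value Microsoft documents for normal user accounts (SAM_NORMAL_USER_ACCOUNT). Unlike `objectClass=user`, which computer objects also match, it selects only user accounts. A tiny Python check, with the constant name borrowed from Microsoft's documentation:

```python
# sAMAccountType for normal user accounts; 0x30000000 == 805306368.
SAM_NORMAL_USER_ACCOUNT = 0x30000000


def user_account_filter() -> str:
    # f-string formatting of an int yields the decimal form LDAP expects.
    return f"(sAMAccountType={SAM_NORMAL_USER_ACCOUNT})"
```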

How to upload a PDF and set a field in one request in Solr

All:
I am new to Solr and SolrJ. What I want to do right now is upload a PDF file to Solr and set a custom field, such as last_modified, at the same time.
But I keep encountering errors such as "multiple values encountered for non multiValued field last_modified". I use SolrJ to upload the PDF and set the last_modified field like this:
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.setParam("literal.last_modified", "2011-05-19T09:00:00Z");
I guess the error happens because, when Solr extracts the PDF, it also uses some metadata as the last_modified value, so my custom last_modified value leads to a multi-value error. How can I replace the metadata value with my own?
Thanks
/update/extract is defined in solrconfig.xml for your core. You can see its configuration there and modify it to match your particular scenario. The Reference Guide lists the options.
In your particular scenario, something looks strange. The relevant parameter seems to be literalsOverride, but it is true by default. Perhaps you are setting it to false somewhere.
You can also try explicitly mapping Tika's last-modified field to a different name.
I would enable a catch-all dynamic field (dynamicField *) with stored="true" and see what is being captured. Then you can play with the parameters until you are happy. You don't have to restart Solr; just reload the core from the Admin UI.
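A sketch of the extract request parameters with literalsOverride set explicitly, written in Python for brevity. The core URL and the `literal.id` value are hypothetical (it assumes `id` is the uniqueKey); in SolrJ the same pairs would go through `up.setParam(...)` as in the question:

```python
from urllib.parse import urlencode


def extract_params(doc_id: str, last_modified: str) -> dict:
    return {
        "literal.id": doc_id,  # hypothetical: assumes "id" is the uniqueKey field
        "literal.last_modified": last_modified,
        # Ask for literal values to replace extracted values of the same field.
        # true is the documented default, so this only matters if
        # solrconfig.xml sets it to false somewhere.
        "literalsOverride": "true",
        "commit": "true",
    }


def extract_url(solr_core_url: str, doc_id: str, last_modified: str) -> str:
    # Parameters are percent-encoded (":" becomes %3A) by urlencode.
    query = urlencode(extract_params(doc_id, last_modified))
    return f"{solr_core_url.rstrip('/')}/update/extract?{query}"
```

If the error persists with literalsOverride explicitly true, the stored catch-all field suggested above is the quickest way to see which extracted metadata name is colliding with last_modified.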
I faced a similar issue, where I needed to fetch a dynamic field's value, do some operation, and then update it. I used the code below to achieve this.
First check whether the field already exists. The code below may help you.
Map<String, String> partialUpdate = new HashMap<String, String>();
if (alreadyPresent) {
    partialUpdate.put("set", value);
} else {
    partialUpdate.put("add", value);
}
doc.addField("projectId", projectId); // unique id for solrdoc
doc.addField(keys[0], partialUpdate);
docs.add(doc);
solrServer.add(docs);
solrServer.commit();
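The same set-or-add choice can be expressed as a JSON atomic update and POSTed to a core's `/update` handler with `Content-Type: application/json`. This Python sketch only builds the payload; the field names are hypothetical, and atomic updates assume the document's other fields are stored (or have docValues) so Solr can rebuild the document:

```python
import json


def atomic_update(unique_key: str, doc_id: str, field: str, value, already_present: bool) -> str:
    # "set" replaces the current value; "add" appends a value
    # (meant for multiValued fields). Mirrors the SolrJ Map above.
    modifier = "set" if already_present else "add"
    return json.dumps([{unique_key: doc_id, field: {modifier: value}}])
```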

How can I mimic 'select_related' using google-appengine and django-nonrel?

django nonrel's documentation states: "you have to manually write code for merging the results of multiple queries (JOINs, select_related(), etc.)".
Can someone point me to any snippets that manually add the related data? Nick Johnson has an excellent post showing how to do this with plain App Engine models, but I'm using django-nonrel.
For my particular use I'm trying to get the UserProfiles with their related User models. This should be just two simple queries, then match the data.
However, using django-nonrel, a new query gets fired off for each result in the queryset. How can I get access to the related items in a 'select_related' sort of way?
I've tried this, but it doesn't seem to work as I'd expect. Looking at the RPC stats, it still seems to fire a query for each item displayed.
from django.contrib.auth.models import User
# UserProfile is assumed to be imported from your own app's models.
def get_matching_model(key, queryset):
    """Return the first model in queryset whose pk equals key, or None."""
    return next((model for model in queryset if model.pk == key), None)
all_profiles = UserProfile.objects.all()
user_pks = set()
for profile in all_profiles:
    user_pks.add(profile.user_id)  # a way to access the pk without triggering the query
users = User.objects.filter(pk__in=user_pks)
for profile in all_profiles:
    profile.user = get_matching_model(profile.user_id, users)
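The matching step above scans `users` once per profile. A framework-free Python sketch of the same batching idea uses a dict keyed by primary key, so each match is a constant-time lookup. `fetch_many` is a hypothetical callback standing in for `User.objects.filter(pk__in=...)`, and `items` should be a concrete list (for example `list(UserProfile.objects.all())`) so the attributes land on the instances that are later rendered:

```python
from typing import Callable, Dict, Iterable, List


def attach_related(items: List, fk_attr: str, target_attr: str,
                   fetch_many: Callable[[set], Iterable]) -> List:
    """Attach related objects to items using one batched lookup.

    fetch_many(pks) is a hypothetical callback: it should run a single
    query and return objects that expose a `pk` attribute.
    """
    pks = {getattr(item, fk_attr) for item in items}
    by_pk: Dict = {obj.pk: obj for obj in fetch_many(pks)}
    for item in items:
        # Dict lookup instead of scanning the related objects per item;
        # None when the related row does not exist.
        setattr(item, target_attr, by_pk.get(getattr(item, fk_attr)))
    return items
```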
UPDATE:
Ick... I figured out what my issue was.
I was trying to improve the efficiency of the changelist_view in the Django admin. It seemed that the select_related logic above was still producing additional queries for each row in the result set when a foreign key was in my list_display. However, I traced it to something different. The above logic does not produce multiple queries (though it looks a lot prettier if you mimic Nick Johnson's approach more closely).
The issue is that in django.contrib.admin.views.main, on line 117 inside the ChangeList class, there is the following code: result_list = self.query_set._clone(). So even though I was properly overriding the queryset in the admin and selecting the related objects, this line cloned the queryset, and the clone does NOT keep the attributes I had added to the models for my 'select related'. The result was an even less efficient page load than when I started.
Not sure what to do about it yet, but the code that selects related stuff is just fine.
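Why the clone loses the attributes can be shown with a toy model. This is a simplified imitation, not Django's code: a queryset caches the instances it builds, and `_clone()` returns a copy without that cache, so evaluating the clone builds fresh instances that never received the extra attributes:

```python
class Row:
    """Toy model instance built from a raw row."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


class ToyQuerySet:
    """Toy stand-in for a lazy queryset (not Django's implementation)."""

    def __init__(self, rows):
        self._rows = rows
        self._result_cache = None

    def __iter__(self):
        if self._result_cache is None:
            # Evaluating builds new instances and caches them.
            self._result_cache = [Row(**r) for r in self._rows]
        return iter(self._result_cache)

    def _clone(self):
        # The copy shares the query, not the cached instances.
        return ToyQuerySet(self._rows)
```

Attributes set while iterating the original are visible on a second pass over the same object, but not on its clone, which is why dropping `_clone()` in the admin's `get_results` keeps the prefetched objects.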
I don't like answering my own question, but the answer might help others.
Here is my solution, which gets related items for a queryset and is based entirely on Nick Johnson's solution linked above.
from collections import defaultdict
def get_with_related(queryset, *attrs):
    """
    Adds related attributes to a queryset in a more efficient way
    than simply triggering the new query on access at runtime.
    attrs must be valid either foreign keys or one to one fields on the queryset model
    """
    # Makes a list of the entity and related attribute to grab for all possibilities
    fields = [(model, attr) for model in queryset for attr in attrs]
    # we'll need to make one query for each related attribute because
    # I don't know how to get everything at once. So, we make a list
    # of the attribute to fetch and pks to fetch.
    ref_keys = defaultdict(list)
    for model, attr in fields:
        ref_keys[attr].append(get_value_for_datastore(model, attr))
    # now make the actual queries for each attribute and store the results
    # in a dict of {pk: model} for easy matching later
    ref_models = {}
    for attr, pk_vals in ref_keys.items():
        related_queryset = queryset.model._meta.get_field(attr).rel.to.objects.filter(pk__in=set(pk_vals))
        ref_models[attr] = dict((x.pk, x) for x in related_queryset)
    # Finally put related items on their models
    for model, attr in fields:
        setattr(model, attr, ref_models[attr].get(get_value_for_datastore(model, attr)))
    return queryset
def get_value_for_datastore(model, attr):
    """
    Django's foreign key fields all have attributes 'field_id' where
    you can access the pk of the related field without grabbing the
    actual value.
    """
    return getattr(model, attr + '_id')
To modify the queryset in the admin so that it uses the select-related logic, we have to jump through a couple of hoops. Here is what I've done. The only change in the get_results method of AppEngineRelatedChangeList is that I removed self.query_set._clone() and used self.query_set instead.
from django.contrib import admin
from django.contrib.admin.options import IncorrectLookupParameters
from django.contrib.admin.views.main import ChangeList, MAX_SHOW_ALL_ALLOWED
from django.core.paginator import InvalidPage
class UserProfileAdmin(admin.ModelAdmin):
    list_display = ('username', 'user', 'paid')
    select_related_fields = ['user']
    def get_changelist(self, request, **kwargs):
        return AppEngineRelatedChangeList
class AppEngineRelatedChangeList(ChangeList):
    def get_query_set(self):
        qs = super(AppEngineRelatedChangeList, self).get_query_set()
        related_fields = getattr(self.model_admin, 'select_related_fields', [])
        return get_with_related(qs, *related_fields)
    def get_results(self, request):
        paginator = self.model_admin.get_paginator(request, self.query_set, self.list_per_page)
        # Get the number of objects, with admin filters applied.
        result_count = paginator.count
        # Get the total number of objects, with no admin filters applied.
        # Perform a slight optimization: Check to see whether any filters were
        # given. If not, use paginator.hits to calculate the number of objects,
        # because we've already done paginator.hits and the value is cached.
        if not self.query_set.query.where:
            full_result_count = result_count
        else:
            full_result_count = self.root_query_set.count()
        can_show_all = result_count <= MAX_SHOW_ALL_ALLOWED
        multi_page = result_count > self.list_per_page
        # Get the list of objects to display on this page.
        if (self.show_all and can_show_all) or not multi_page:
            result_list = self.query_set  # no _clone(), so the attached related objects survive
        else:
            try:
                result_list = paginator.page(self.page_num + 1).object_list
            except InvalidPage:
                raise IncorrectLookupParameters
        self.result_count = result_count
        self.full_result_count = full_result_count
        self.result_list = result_list
        self.can_show_all = can_show_all
        self.multi_page = multi_page
        self.paginator = paginator
