How to upload a PDF and update a field in one request in Solr

All:
I am new to Solr and SolrJ. I want to upload a PDF file to Solr and set a custom field, such as last_modified, at the same time.
But I keep getting the error "multiple values encountered for non multiValued field last_modified". I use SolrJ to upload the PDF and set the last_modified field like this:
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.setParam("literal.last_modified", "2011-05-19T09:00:00Z");
My guess is that when Solr extracts the PDF, it also fills last_modified from the file's metadata, so my custom value becomes a second value and triggers the multi-value error. How can I replace the metadata value with my custom one?
Thanks

/update/extract is defined in solrconfig.xml for your core. You can see its configuration there and adjust it to your scenario. The Reference Guide lists the options.
In your case, something looks strange. The relevant parameter seems to be literalsOverride, but it is true by default. Perhaps you are setting it to false somewhere.
You can also try explicitly mapping Tika's last-modified field to a different name.
I would enable a catch-all (dynamicField *) with stored=true and see what is being captured. Then you can play with the parameters until you are happy. You don't have to restart Solr; just reload the core from the Admin UI.
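As a rough illustration of the parameters discussed above, here is a minimal Java sketch that only assembles the request parameters for /update/extract and prints them as a query string; it does not talk to Solr. The class name ExtractParams is made up for this example, and uprefix=attr_ assumes the schema has a matching attr_* dynamic field, which may not be true for your core. With SolrJ, each entry would presumably be passed through up.setParam(name, value) as in the question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: collects Solr Cell request parameters, nothing more.
class ExtractParams {

    // Builds the parameter set for one extract request.
    static Map<String, String> build(String lastModified) {
        Map<String, String> params = new LinkedHashMap<>();
        // Custom value for the single-valued field.
        params.put("literal.last_modified", lastModified);
        // Ask Solr to prefer literal values over extracted metadata
        // (this is already the documented default; set explicitly here).
        params.put("literalsOverride", "true");
        // Assumption: an attr_* dynamic field exists to catch unknown fields.
        params.put("uprefix", "attr_");
        return params;
    }

    // Renders the parameters as a URL-encoded query string.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println("/update/extract?"
                + toQueryString(build("2011-05-19T09:00:00Z")));
    }
}
```

If the error persists with these parameters, the catch-all field suggested above is probably the quickest way to see which extracted metadata field collides with the literal.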

I faced a similar issue, where I needed to fetch a dynamic field's value, do some operation on it, and then update it. I used the code below.
First check whether the field already exists. The code below may help you.
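// alreadyPresent, value, projectId, keys and solrServer come from the surrounding code.
Map<String, String> partialUpdate = new HashMap<String, String>();
if (alreadyPresent) {
    partialUpdate.put("set", value); // replace the existing value
} else {
    partialUpdate.put("add", value); // add a value to the field
}
SolrInputDocument doc = new SolrInputDocument();
doc.addField("projectId", projectId); // unique id for the Solr document
doc.addField(keys[0], partialUpdate);
List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc);
solrServer.add(docs);
solrServer.commit();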

Related

How to show facet values on the front end in Hybris?

I have added a brand attribute of String type in the ProductModel through item.xml.
I need to create a facet for brand. I have two brands - Sony and Canon.
After creating the facet, I'm able to see "Shop by Brand" on the UI, but I'm not able to find Sony or Canon under it.
The impexes I used are:
INSERT_UPDATE SolrIndexedProperty;solrIndexedType(identifier) [unique=true];name[unique=true];type(code);sortableType(code);currency[default=false];localized[default=false];multiValue[default=false];facet[default=true];facetType(code);facetSort(code);priority;visible;useForSpellchecking[default=false];useForAutocomplete[default=false];fieldValueProvider;valueProviderParameter;facetDisplayNameProvider;customFacetSortProvider;topValuesProvider;rangeSets(name)
;$solrIndexedType;brand;string;;;;;;Refine;Alpha;;true;true;true;springELValueProvider;
INSERT_UPDATE SolrSearchQueryProperty; indexedProperty(name, solrIndexedType(identifier))[unique = true]; searchQueryTemplate(name, indexedType(identifier))[unique = true][default = DEFAULT:$solrIndexedType]; facet[default = true]; facetType(code); includeInResponse[default = true]; facetDisplayNameProvider;facetSortProvider;facetTopValuesProvider
; brand:$solrIndexedType ; ; ; ; ;
Can someone please point out what I am missing?
To make the brand attribute available as a facet, follow these steps:
Index the brand attribute in Solr through a value provider by creating a SolrIndexedProperty and associating it with the value provider.
After indexing, verify that the brand data is actually being sent to Solr for the corresponding products.
Next, configure the facet settings in Backoffice (this can also be done in solr.impex when creating the SolrIndexedProperty entry for the custom brand attribute).
Edit the attribute in the Backoffice facet configuration and go to the "Facet Settings" tab, which lets you define a facet for the custom (brand) attribute so that results can be filtered by facet value on the PLP or SLP. Set "Facet" to TRUE.
Alternatively, handle this in impex by setting facet[default=true].
Select the facet type, 'Single Select' or 'Multi Select', via facetType(code).
Re-index Solr; the changes should then be reflected in search results and on the PLP.

Ordering the solr search results based on the indexed fields

I have to order the search results from Solr by some fields that are already indexed.
My current API request, without sorting, looks like this:
http://127.0.0.1:8000/api/v1/search/facets/?page=1&gender=Male&age__gte=19
It returns the search results in indexed order. But I have to reorder these results by the field 'last_login', which is an already indexed DateTimeField.
Here is my viewset:
class ProfileSearchView(FacetMixin, HaystackViewSet):
    index_models = [Profile]
    serializer_class = ProfileSearchSerializer
    pagination_class = PageNumberPagination
    facet_serializer_class = ProfileFacetSerializer
    filter_backends = [HaystackFilter]
    facet_filter_backends = [HaystackFilter, HaystackFacetFilter]

    def get_queryset(self, index_models=None):
        if not index_models:
            index_models = []
        queryset = super(ProfileSearchView, self).get_queryset(index_models)
        queryset = queryset.order_by('-created_at')
        return queryset
Here I have changed the default search order to the 'created_at' value. But for the next request I have to order by the 'last_login' value. I added a new parameter to my request like this:
http://127.0.0.1:8000/api/v1/search/facets/?page=1&gender=Male&age__gte=19&sort='last_login'
but it gives me an error
SolrError: Solr responded with an error (HTTP 400): [Reason: undefined field sort]
How can I achieve this ordering? Please help me with a solution.
The URL you provided, http://127.0.0.1:8000/api/v1/search/facets/, is not a direct Solr URL. It must be your middleware. Since you have tried the query directly against Solr and it works, the problem must be somewhere in the middleware layer.
Try printing, monitoring, or checking the logs to see what URL the middleware actually generates, and compare it to the valid URL you know works.

Solr - multiple facet.field in query

I would like to add multiple facet.field values (and also facet.count and other properties) to the Map I pass to SolrParams in a query, but a Java Map only lets you add one value per key.
Can this be done?
Something like:
Map params = new Map();
params.put("facet.field", "title");
params.put("facet.field", "tags");
...
query(new SolrParams(params));
Try it this way:
solrQuery.addFacetField("subcat");
solrQuery.addFacetField("tags");
solrQuery.addFacetField("languages");
See Solr-using-Solr4J-in-Java for reference.
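The underlying problem in the question is that a plain Map<String, String> overwrites the previous value on every put. A minimal sketch of the alternative, using only the JDK, keeps a list of values per key so that facet.field can repeat. MultiValueParams is a made-up class for illustration and is not part of SolrJ; SolrJ's own ModifiableSolrParams offers, as far as I know, a comparable add(name, values...) method that appends rather than replaces.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in for a multi-valued parameter container.
class MultiValueParams {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Appends a value instead of overwriting, so a key may repeat.
    MultiValueParams add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }

    // Returns all values stored for a parameter (empty if none).
    List<String> get(String name) {
        return values.getOrDefault(name, List.of());
    }

    // Emits one name=value pair per stored value.
    String toQueryString() {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, List<String>> e : values.entrySet()) {
            for (String v : e.getValue()) {
                joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8));
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        MultiValueParams params = new MultiValueParams()
                .add("facet", "true")
                .add("facet.field", "title")
                .add("facet.field", "tags");
        System.out.println(params.toQueryString());
    }
}
```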

solrj api for partial document update

Solr 4 beta is out, and the GA version will follow soon. Partial document updates have been around for a while, as explained here: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
However, I haven't figured out how to do it with the SolrJ API.
Does anyone know if it is possible with SolrJ? Or is SolrJ just not up to speed with this feature?
Update: as I described on the mailing list (see reply here), I found that in the SolrJ API the value of a SolrInputField can be a map; it doesn't have to be a simple scalar value.
If it is a map, SolrJ adds an additional update attribute to the field's XML element.
For example, this code:
SolrInputDocument doc = new SolrInputDocument();
Map<String, String> partialUpdate = new HashMap<String, String>();
partialUpdate.put("set", "foo");
doc.addField("id", "test_123");
doc.addField("description", partialUpdate);
yields this document:
<doc boost="1.0">
<field name="id">test_123</field>
<field name="description" update="set">foo</field>
</doc>
In this example I used the word "set" for this additional attribute, but it doesn't work.
Solr doesn't update the field as I expected.
According to this link:
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
valid values are "set" and "add".
Any idea?
As it turns out, the code snippet shown above in the question actually works. I don't know what was wrong the first time I tried it; perhaps I simply forgot to commit, or my schema was misconfigured.
In any case, this question is very localized. However, since the API with the hash map is so poorly documented, I thought it might be worth keeping this question and answer.
The key of the hash map can be one of three values:
set - to set a field.
add - to add to a multi-valued field.
inc - to increment a field.
There is an example of this code in the solrj unit tests, in a method called testUpdateField.
You can also update parts of documents using the Solr API's update endpoint:
curl 'https://solr-url/update?commitWithin=1000&overwrite=true&wt=json' \
-X POST \
-H 'accept: application/json, text/plain, */*' \
--data-raw '[{ "the-unique-field": "value", "field-to-change":{"set": "new-value"} }]' \
--compressed
Or from UI
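For readers who want to produce the request body of the curl call above from Java, here is a minimal sketch that builds the same JSON shape with the JDK only. AtomicUpdateJson and its method names are invented for this example; the escaping covers only backslashes and double quotes, so a real application would more likely use a JSON library. The resulting string is what would be POSTed to the update endpoint.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that renders one atomic-update document as JSON.
class AtomicUpdateJson {

    // Minimal escaping: handles backslash and double quote only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // idField/id identify the document; every entry in changes becomes
    // {"<op>": "<value>"}, where op is "set", "add" or "inc".
    static String build(String idField, String id, String op,
                        Map<String, String> changes) {
        StringBuilder sb = new StringBuilder("[{");
        sb.append(quote(idField)).append(":").append(quote(id));
        for (Map.Entry<String, String> e : changes.entrySet()) {
            sb.append(",").append(quote(e.getKey())).append(":{")
              .append(quote(op)).append(":").append(quote(e.getValue()))
              .append("}");
        }
        return sb.append("}]").toString();
    }

    public static void main(String[] args) {
        Map<String, String> changes = new LinkedHashMap<>();
        changes.put("description", "foo");
        System.out.println(build("id", "test_123", "set", changes));
    }
}
```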

SalesForce Bulk API: Relationship between custom object and Account

I have a custom object in Salesforce called Deal, which is a child of the built-in Account object. I am trying to use the Bulk XML API to upload a batch of records, but I can't figure out how to specify this relationship correctly. The documentation says that you should reference a custom object's relationships like so:
<Relationship__r>
<sObject>
<some_indexed_field>#####</some_indexed_field>
</sObject>
</Relationship__r>
If you have any idea how to specify a relationship to the Account object from a custom object I'd really appreciate it.
Added
The Deal object has the following 2 fields:
DealID
    API Name - DealID__c
    Data Type - Text(255)(External ID)(Unique Case Sensitive)
Account
    API Name - Account__c
    Data Type - Master-Detail(Account)
Request XML:
<Account__r>
<sObject>
<ID>0013000000kcWpfAAE</ID>
</sObject>
</Account__r>
Result XML:
<result>
<errors>
<message>Field name provided, Id is not an External ID or indexed field for Account</message>
<statusCode>INVALID_FIELD</statusCode>
</errors>
<success>false</success>
<created>false</created>
</result>
There appears to be a bug: you have to strip out all whitespace and newlines when dealing with reference objects.
Check out:
http://success.salesforce.com/ideaview?id=08730000000ITQ7AAO
From the docs:
<RelationshipName>
<sObject>
<IndexedFieldName>rwilliams@salesforcesample.com</IndexedFieldName>
</sObject>
</RelationshipName>
Everything looks good, but instead of using "ID" as the indexed field name, you need to use "Account__c". That should take care of your issue.

Resources