How to remove escape characters from a Solr indexed field?

I am indexing JSON data into a Solr field, for example:
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
But the JSON is being indexed with escape characters, so what I get back is:
"{\"employees\":[\n {\"firstName\":\"John\", \"lastName\":\"Doe\"},\n {\"firstName\":\"Anna\", \"lastName\":\"Smith\"},\n {\"firstName\":\"Peter\", \"lastName\":\"Jones\"}\n]}"
Is there any way to index the JSON without escaping it, or to unescape the result at display time, purely on the Solr side?

This is a perfectly fine way to store JSON data in a Solr text field.
If you view it through the admin UI, the JSON is shown in escaped form, but if you query the field and then decode the JSON, you get the correct object in whatever language you are using.
Python example:
import json
# json_string: the field value read from Solr using API calls or a module like pysolr
my_json_field = json_string
my_obj = json.loads(my_json_field)
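To make the two decoding steps explicit, here is a minimal sketch. The response body is a hand-written stand-in for a Solr JSON response (not real server output), and the field name my_json_field is an assumption for illustration.

```python
import json

# Hand-written stand-in for the body of a Solr JSON response (wt=json).
# The field name my_json_field is hypothetical.
response_body = r'''
{"response": {"numFound": 1, "start": 0, "docs": [
  {"id": "1",
   "my_json_field": "{\"employees\":[\n {\"firstName\":\"John\", \"lastName\":\"Doe\"}\n]}"}
]}}
'''

# First decode: the response envelope. The escapes disappear here, and the
# field value comes back as an ordinary Python string holding JSON text.
doc = json.loads(response_body)["response"]["docs"][0]
field_text = doc["my_json_field"]

# Second decode: turn the stored JSON text into Python objects.
employees = json.loads(field_text)["employees"]
first_name = employees[0]["firstName"]
print(first_name)
```

The backslashes only exist in the serialized response; once the envelope is parsed, the field value is plain JSON text that can be decoded again.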

In the end the solution was very simple: use Solr's result document transformers (see "Transforming Result Documents" in the Solr guide),
e.g.
fl=my_field_with_escaped_json:[json]
Thanks everyone

Related

How to search Alfresco for empty property?

I have Alfresco 5.2 and my task is to get all documents in which one particular property is empty. I am building the query like this:
searchParameters.setQuery("search +TYPE:\"ecmcndintregst:nd_int_reg_standards\" +@ecmcnddoc\\:doc_name_ru:\"\" -ASPECT:\"ecmcdict:inactive\" AND ( @ecmcnddoc\\:doc_kind_cp_ecmcdict_value:\"mek\")");
But this returns all the documents, both those with an empty ecmcnddoc:doc_name_ru and those with a non-empty one.
How can I get ONLY the documents with an empty ecmcnddoc:doc_name_ru?
What am I doing wrong? How do I search Solr for empty properties? When I submit +@ecmcnddoc:doc_name_ru:"" (without the backslash) I get all documents with ANY ecmcnddoc:doc_name_ru value.
Thank you

local param not working in solr 8 but working in solr 5

I am migrating from Solr 5.5 to Solr 8.
The query for Solr 5.5 looks like this:
qt=/dismax
product_fields_Ref1=product_concept^279841
sku_and_product_fields_Ref1=silhouette_concept^234256 $product_fields_Ref1
product_phrase_Ref2=pant
concept_with_synonyms_ref1=({!edismax2 qf=$sku_and_product_fields_Ref1 v=$product_phrase_Ref2})
top_concept_query_ref= (+({!maxscore v=$concept_with_synonyms_ref1}) )
productQueryRef3=+(+({!query v=$cq})) +( ({!maxscore v=$top_concept_query_ref}) )
sq=+{!lucene v=$productQueryRef3}
q={!parent tag=top which=$pq score=max v=$sq}
But on Solr 8.0 it fails with this error:
Error from server at http://localhost:8080/products: org.apache.solr.search.SyntaxError: Query Field '$product_fields_Ref1' is not a valid field name
If I modify the query like this (removing the variable product_fields_Ref1 and putting its value directly into sku_and_product_fields_Ref1):
qt=/dismax
sku_and_product_fields_Ref1=silhouette_concept^234256 product_concept^279841
product_phrase_Ref2=pant
concept_with_synonyms_ref1=({!edismax2 qf=$sku_and_product_fields_Ref1 v=$product_phrase_Ref2})
top_concept_query_ref= (+({!maxscore v=$concept_with_synonyms_ref1}) )
productQueryRef3=+(+({!query v=$cq})) +( ({!maxscore v=$top_concept_query_ref}) )
sq=+{!lucene v=$productQueryRef3}
q={!parent tag=top which=$pq score=max v=$sq}
The problem is that I cannot modify the query this way, since the value of the param "product_fields_Ref1" is assembled from a large number of places.
I am using defType=dismax only.
Can anyone advise what needs to be fixed?
I went through the source code of org.apache.solr.search.ExtendedDismaxQParser and found that a new validation check has been added which does NOT allow a local parameter reference in the qf field of the edismax parser (the check was introduced in Solr 8.0.0).
The check works like this: every entry in qf MUST match a field in the schema of the core (I am not using schema-less mode). The method is
validateQueryFields(up);
and it is called from
public Query parse() throws SyntaxError { ... }
in org.apache.solr.search.ExtendedDismaxQParser.
I got this working by creating my own custom parser that overrides the parse() method and leaves out this validation.
Support for local parameters has changed significantly in more recent versions of Solr (see https://lucene.apache.org/solr/guide/7_5/solr-upgrade-notes.html#solr-7-2).
The only way I have been able to get some of the old behavior back is by setting lucene as the default parser in solrconfig.xml and then passing the local parameters in the query, for example: q={!dismax qf=$param1}coffee
I understand that you can get the old behavior back by setting luceneMatchVersion to 7.1.0, but that change did not work for me.
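As a rough sketch, a request of that shape could be assembled from Python like this. The parameter name param1 comes from the example above; its value (field names and boost) is a made-up placeholder, and the sketch assumes the handler's default parser is lucene as described. Whether Solr 8 accepts it still depends on your configuration.

```python
from urllib.parse import urlencode, parse_qs

params = {
    # Local params pick the dismax parser and dereference $param1 for qf.
    "q": "{!dismax qf=$param1}coffee",
    # Hypothetical qf value; replace with fields that exist in your schema.
    "param1": "name^2 description",
}

# URL-encode the parameters so the braces, "$" and "^" survive transport.
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to the select handler URL of your core.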

Object id is missing in Django framework when posted from AngularJS MongoDB

I am posting the following object
{
skillName : "Professional Skills",
_id : {$oid: "5adf23946ab671bf6cb36aff"}
}
to the DjangoService given below:
@csrf_exempt
@api_view(['GET','POST'])
def saveSubjectView(request):  # this service will add & update Subject
    if request.method == 'POST':
        try:
            stream = StringIO(request.body)
            subject = JSONParser().parse(stream)
            print("The subejct is ")
            pp.pprint(subject)
            serializedsubject = json.loads(json_util.dumps(subject))
            print("serializedsubject")
            pp.pprint(serializedsubject)
The output that I am getting is
'skillType': { u'_id': { }, u'skillName': u'Professional Skills'}
The ObjectId posted from the front end (AngularJS) is not printed in the service. I know that I can fix it by removing the $oid while posting from the AngularJS application, but I would like to know why the value does not come through. I have searched the documentation and could not find a proper answer. Maybe the keywords I used were wrong. The keywords were: "JSON serialisation of ObjectId", "$oid json serialization using Django".
The complete object I am posting to the Django service is given below:
Exactly. $oid, or anything else prefixed with $, is an internal format and reserved, so you cannot post it as a field name. The convention comes from MongoDB Extended JSON, where such prefixes identify the BSON type for proper conversion and serve as a serializable transport, since these "types" are not supported in basic JSON.
So the solution is to actually use the bson.json_util to "deserialize" the JSON string right from the start:
from bson import json_util
# serializedsubject = json.loads(json_util.dumps(subject))
serializedsubject = json_util.loads(request.body) # correct usage
Or, as a more succinct, self-contained example:
input = '{ "skillName" : "Professional Skills" ,"_id" : { "$oid": "5adf23946ab671bf6cb36aff"} }'
json_util.loads(input)
Returns
{u'skillName': u'Professional Skills', u'_id': ObjectId('5adf23946ab671bf6cb36aff')}
This correctly casts objects from any keys notated with the Extended JSON Syntax to their correct BSON Type, as also supported in the driver functions. And naturally the driver will then convert back to BSON when sending to MongoDB.
If for some reason your request.body contains anything other than a "string" which is valid for input to the function, then it is up to your code to convert it to that point. But there should be no need to "parse to JSON" and then "stringify" again just to input to the function.
NOTE: If you have not already done so on the JavaScript client side of the application, there is also the bson package available. Where such Extended JSON is "received" from the server, it allows translation into the BSON types as JavaScript objects, and of course serialization of such objects back into the Extended JSON format.
This would in fact be recommended where "type" information needs to be maintained with the data transmitted and kept between client and server.
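To make the convention concrete, here is a rough standard-library-only sketch of the kind of conversion described above. It is not how bson.json_util is implemented; ObjectIdStandIn and decode_extended are hypothetical names used only for illustration, and in real code you would call json_util.loads instead.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectIdStandIn:
    """Hypothetical stand-in for bson.ObjectId, used only in this sketch."""
    hex_value: str


def decode_extended(obj):
    # json.loads calls this hook for every decoded JSON object (dict).
    # A dict whose only key is "$oid" is a type wrapper, not a sub-document.
    if set(obj) == {"$oid"}:
        return ObjectIdStandIn(obj["$oid"])
    return obj


body = '{"skillName": "Professional Skills", "_id": {"$oid": "5adf23946ab671bf6cb36aff"}}'
subject = json.loads(body, object_hook=decode_extended)
print(subject["_id"])
```

A parser that does not know the convention has no type to map the {"$oid": ...} wrapper to, which is why the decoding has to be done by something that understands Extended JSON.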

Sumologic - split JSON array into multiple records

I am passing a JSON array object in the HTTP POST as
[{"level":"INFO","data": "Test 1"},{"level":"INFO","data": "Test 2"}]
This message is seen as 1 object/log message in SumoLogic. How can I tell SumoLogic to consider each JSON object as an independent object and show 2 log messages instead of one?
I believe this can't be done with the json operator. But, have a look at the docs for the "parse regex" operator. There's an option called "multi" which creates a new message for each match of the regex. In your case, something like this might do the trick:
parse regex "(?<fieldname>\{.*?\})" multi
I didn't try this in the product itself, but here is Regex101 link to play with the regex.
I believe the actual answer to this is not to send your logs as an array. Instead, put each JSON object in your request body followed by a '\n', so that Sumo treats them as individual log messages.
{"level":"INFO","data": "Test 1"}\n
{"level":"INFO","data": "Test 2"}\n
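If the sender currently builds a JSON array, the conversion to one object per line can be sketched as follows. The function name to_newline_delimited is an assumption for illustration; how the body is then posted to your collector is left out.

```python
import json


def to_newline_delimited(json_array_text):
    """Re-encode a JSON array as one compact JSON object per line."""
    records = json.loads(json_array_text)
    # json.dumps never emits raw newlines, so each record stays on one line.
    return "".join(
        json.dumps(record, separators=(",", ":")) + "\n" for record in records
    )


payload = '[{"level":"INFO","data": "Test 1"},{"level":"INFO","data": "Test 2"}]'
body = to_newline_delimited(payload)
print(body, end="")
```

Newlines inside string values are escaped by the encoder, so a record cannot be split across two log messages by accident.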

Parsing Solr Results - javabin format

I am trying to integrate Solr with Java using SolrJ. The results retrieved have the following format:
{
  numFound=3,
  start=0,
  docs=[
    SolrDocument{
      id=IW-02,
      name=iPod & iPod Mini USB 2.0 Cable,
      manu=Belkin,
      manu_id_s=belkin,
      cat=[electronics, connector],
      features=[car power adapter for iPod, white],
      weight=2.0,
      price=11.5,
      price_c=11.50,USD,
      popularity=1,
      inStock=false,
      store=37.7752,-122.4232,
      manufacturedate_dt=Tue Feb 14 18:55:59 EST 2006,
      _version_=1452625905160552448
    }
Now this is the javabin format. How do I extract results from it? I have heard that SolrJ converts the results to objects by itself, but I can't figure out how.
Thanks for the help in advance.
Let solrReply be the response object (assuming it is a SolrJ QueryResponse). Then you can access the different parts of the result through the appropriate accessors. Say you want the docs; you can do:
SolrDocumentList docs = solrReply.getResults();
If you want the first result, you could do:
SolrDocument first = solrReply.getResults().get(0);
Within a result you can access each field by name, for example first.getFieldValue("name").
