Can't query automatically indexed Documents - solr

I have documents in the form of JSON that I want to search. When I upload them to Solr, I don't get any results unless q is *:*
I uploaded my documents via the SimplePostTool. I type the following in cmd:
java -jar -Dc=testdata -Dauto example/exampledocs/post.jar testdata.json
After the upload succeeds I try to query my data. If I just use *:* it shows me the data exactly how I want it, like this:
{
  "number":[202012314],
  "category":["Lorem ipsum"]
},{
Now, if I try to search for anything that is definitely in one of the documents it doesn't find anything.
What do I need to do to make my Collection searchable?
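For example, queries like the following (hypothetical examples; the collection name comes from the post command and the field from the sample above) both come back with numFound=0:
http://localhost:8983/solr/testdata/select?q=Lorem
http://localhost:8983/solr/testdata/select?q=category:Lorem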

Related

How to remove escape character from solr indexed field?

I am indexing JSON data into a Solr field, e.g.
{"employees":[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]}
But the JSON is getting indexed with escaped characters, so now I get the JSON back as
"{\"employees\":[\n {\"firstName\":\"John\", \"lastName\":\"Doe\"},\n {\"firstName\":\"Anna\", \"lastName\":\"Smith\"},\n {\"firstName\":\"Peter\", \"lastName\":\"Jones\"}\n]}"
Is there any way to index the JSON without escaping it, or to unescape the result when displaying it, handled solely on the Solr side?
This is perfectly fine storage of JSON data in a Solr text field.
If you look at it through the Admin UI, you will see the JSON in escaped form, but if you query this field and then decode the JSON, it will return the correct object in the language you are using.
Python example:
import json

# my_json_field is the stored field value, read from Solr via API calls or a module like pysolr
my_json_field = json_string
my_obj = json.loads(my_json_field)
In the end the solution was very simple, using Transforming Result Documents, e.g.
fl=my_field_with_escaped_json:[json]
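With that transformer in the field list, the full request looks something like this (collection name is hypothetical):
http://localhost:8983/solr/mycollection/select?q=*:*&wt=json&fl=id,my_field_with_escaped_json:[json]
The stored string is then written into the response as raw JSON instead of an escaped string.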
Thanks everyone

Merge results of ExecuteSQL processor with JSON content in NiFi 0.6.0

I am dealing with JSON objects containing geo coordinate points. I would like to run these points against a PostGIS server I have locally to do point-in-polygon matching.
I'm hoping to do this with preexisting processors. I am successfully extracting the lat/lon coordinates into attributes with an "EvaluateJsonPath" processor, and successfully issuing queries to my local PostGIS datastore with "ExecuteSQL". This leaves me with Avro responses, which I can then convert to JSON with the "ConvertAvroToJSON" processor.
I'm having conceptual trouble with how to merge the results of the query back together with the original JSON object. As it is, I've got two flow files with the same fragment ID, which I could theoretically merge with "MergeContent", but that gets me:
{"my":"original json", "coordinates":[47.38, 179.22]}{"polygon_match":"a123"}
Are there any suggested strategies for merging the results of the SQL query into the original json structure, so my result would be something like this instead:
{"my":"original json", "coordinates":[47.38, 179.22], "polygon_match":"a123"}
I am running NiFi 0.6.0, Postgres 9.5.2, and PostGIS 2.2.1.
I saw a reference to using the ReplaceText processor in https://community.hortonworks.com/questions/22090/issue-merging-content-in-nifi.html, but that seems to merge content from an attribute into the body of the content. What I'm missing is how to merge the content of the original JSON with either the content of the SQL response, or with attributes extracted from the SQL response.
Edit:
The following Groovy script appears to do what is needed. I am not a Groovy coder, so any improvements are welcome.
import org.apache.commons.io.IOUtils
import java.nio.charset.*
import groovy.json.JsonSlurper
import org.apache.nifi.processor.io.StreamCallback

def flowFile = session.get()
if (flowFile == null) {
    return
}

def slurper = new JsonSlurper()
flowFile = session.write(flowFile, { inputStream, outputStream ->
    // The flow file content at this point is the SQL result converted to JSON
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    def obj = slurper.parseText(text)

    // The original JSON was copied into the 'original.json' attribute earlier in the flow
    def originaljsontext = flowFile.getAttribute('original.json')
    def originaljson = slurper.parseText(originaljsontext)

    // Attach the SQL result to the original JSON and write it back out as the new content
    originaljson.put("point_polygon_info", obj)
    outputStream.write(groovy.json.JsonOutput.toJson(originaljson).getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

// REL_SUCCESS is a variable bound by the ExecuteScript processor
session.transfer(flowFile, REL_SUCCESS)
If your original JSON is relatively small, a possible approach might be the following...
Use ExtractText before getting to ExecuteSQL to copy the original JSON into an attribute.
After ExecuteSQL, and after ConvertAvroToJSON, use an ExecuteScript processor to create a new JSON document that combines the original from the attribute with the results in the content.
I'm not exactly sure what needs to be done in the script, but I know others have had success using Groovy and JsonSlurper through the ExecuteScript processor.
http://groovy-lang.org/json.html
http://docs.groovy-lang.org/latest/html/gapi/groovy/json/JsonSlurper.html
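As a concrete note on the first step (a sketch, not from the original thread): ExtractText accepts dynamic properties, so adding a property named original.json with the value (?s)(^.*$) copies the whole flow file content into that attribute, which the script above then reads back with flowFile.getAttribute('original.json'). The processor's Maximum Capture Group Length defaults to 1024 characters, so it will likely need to be raised for larger JSON documents.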

java code for solr geolocation indexing

I am using Solr for my indexing and searching feature, and I am a beginner with Solr.
I want to index geolocation data into the Solr index and also run queries on it, so I went through some articles:
http://wiki.apache.org/solr/SpatialSearch
The field types described there are present in my schema.xml.
Now my question: I want to write Java code that indexes the geolocation into dynamic location fields. How should I write it, and is there any sample Java code for this? I looked but didn't find any, so any help would be appreciated.
I also understand that when indexing we would need to write something like:
document.addField(myDynLocFld + "_p", val);
If I use this approach, what should val be? An instance of a location object with both lat and lng values embedded in it? How do I handle this, or is there a different approach to this in Solr's Java API?
Thanks in advance.
Check this code sample:
// Store the index in memory:
// Directory directory = new RAMDirectory();
// To store an index on disk:
// (the analyzer was not declared in the original snippet; StandardAnalyzer is a reasonable default)
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
Directory directory = FSDirectory.open(new File("/tmp/testindex"));
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
iwriter.addDocument(doc);
iwriter.close();
For more details check Lucene APIs.
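To come back to the original SolrJ question: for a location-typed dynamic field (*_p), val can simply be a "lat,lon" string. Below is a minimal SolrJ sketch under that assumption; the base URL, collection name, and field name are hypothetical.
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class GeoIndexExample {
    public static void main(String[] args) throws Exception {
        // Base URL and collection name are assumptions for illustration
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();

        SolrInputDocument document = new SolrInputDocument();
        document.addField("id", "store-1");

        // For a location-typed dynamic field (*_p), val is a "lat,lon" string
        String myDynLocFld = "storelocation";
        String val = "45.17614,-93.87341";
        document.addField(myDynLocFld + "_p", val);

        client.add(document);
        client.commit();
        client.close();
    }
}
Such a field can then be queried with the spatial syntax from the wiki page above, for example fq={!geofilt sfield=storelocation_p pt=45.15,-93.85 d=5}.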

Parsing Solr Results - javabin format

I am trying to integrate Solr with Java using SolrJ. The results retrieved are in the following format:
{
  numFound=3,
  start=0,
  docs=[
    SolrDocument{
      id=IW-02,
      name=iPod & iPod Mini USB 2.0 Cable,
      manu=Belkin,
      manu_id_s=belkin,
      cat=[electronics, connector],
      features=[car power adapter for iPod, white],
      weight=2.0,
      price=11.5,
      price_c=11.50,USD,
      popularity=1,
      inStock=false,
      store=37.7752,-122.4232,
      manufacturedate_dt=Tue Feb 14 18:55:59 EST 2006,
      _version_=1452625905160552448
    }
  ]
}
Now, this is the javabin format. How do I extract the results from this? I have heard that SolrJ converts the results to objects by itself, but I can't figure out how.
Thanks for the help in advance.
Let solrReply be the response object. Then you can access the different parts of the result using the appropriate keys. Say you want the docs; you can do:
docs = solrReply['docs']
if you want the first result you could do:
first = solrReply['docs'][0]
Within a result you can access each field in the same way.
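On the SolrJ side specifically, the javabin response is already decoded into objects for you; QueryResponse, SolrDocumentList and SolrDocument give typed access to the parts shown above. A minimal sketch (the base URL, collection, and query string are assumptions, and the newer HttpSolrClient builder API is used):
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class QueryExample {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build();

        QueryResponse response = client.query(new SolrQuery("id:IW-02"));
        SolrDocumentList docs = response.getResults();   // the "docs" part of the reply
        System.out.println("numFound=" + docs.getNumFound());

        for (SolrDocument doc : docs) {
            // Each SolrDocument behaves like a map of field name to value
            System.out.println(doc.getFieldValue("id") + " / " + doc.getFieldValue("name"));
        }
        client.close();
    }
}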

Indexing PDF documents with additional search fields using SolrNet?

I found this article useful when indexing documents; however, how can I attach additional fields so I can pass in, say, the ID of the document in our database for use in displaying the search results? I thought that by using the Fields property (of the ExtractParameters class) I could index additional data with the document, but that doesn't seem to work, or that is not its function.
Example code:
var solr = ObjectLocator.Instance.Resolve<ISolrOperations<IndexDocument>>();
var guid = Guid.NewGuid().ToString();

using (var fileStream = System.IO.File.OpenRead(Server.MapPath("~/files/") + "greenroof.pdf"))
{
    var response =
        solr.Extract(
            new ExtractParameters(fileStream, "greenRoof1234")
            {
                ExtractFormat = ExtractFormat.Text,
                ExtractOnly = false,
                Fields = new[] { new ExtractField("field1", "value1"), new ExtractField("field2", "value2") }
            });
}
#aitchnyu is correct, passing the values via the literal.field=value method is the correct way to do this.
However, according to this post on ExtractingRequestHandler support in the SolrNet Google Group, there was a bug with ExtractParameters.Fields not working properly. This was fixed in the 0.4.0.x versions of SolrNet. Please make sure you are using one of the latest versions of SolrNet. You can obtain it by one of the following means:
Project Site Downloads
NuGet PreRelease Package
Also that discussion has some good examples of using the ExtractingRequestHandler in SolrNet as well as a workaround for adding the additional field values if you cannot upgrade to a newer version of SolrNet.
This is sufficient: http://wiki.apache.org/solr/ExtractingRequestHandler#Literals
In general, use literal.field=value while uploading.
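For example, a raw extract request with literal fields might look like this (the collection name and the dbid field are hypothetical, used only for illustration):
curl "http://localhost:8983/solr/mycollection/update/extract?literal.id=greenRoof1234&literal.dbid=42&commit=true" -F "file=@greenroof.pdf"
Each literal.<field>=value parameter is stored on the document alongside the text Tika extracts, as long as the field exists in the schema.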
It turned out not to be an issue with SolrNet, but with my knowledge of Solr in general. I needed to specify the fields in my schema. After I added the fields to my schema they were visible in my Solr query.
