Create SOLR index by parsing "content" via GET (URL) - solr

is it possible to create a SOLR index by parsing the "content" to index via GET (URL)?
The examples I found allowed this only by reading the content directly from a file.
Thanks in advance!

Have a look at the Solr Data Import Handler - focusing on the XML/HTTP Datasource. If I understand correctly what you are asking this should be able to meets your needs.

you might be able to use curl to do that. The request should most probably be something like (untested):-
curl 'http://localhost:8983/solr/update?key1=val1&key2=val2'
and so on where key1 and key2 are the fields and val1 and val2 are their corresponding values. Do note though that you will need to send a commit command also to the above url after sending the above request.
You can also look at the following url :-
http://wiki.apache.org/solr/UpdateXmlMessages
Search for the phrases 'Updating via GET' and 'Updating a Data Record via curl'

Related

Apache Camel - Mybatis select with parameters and useIterator

I'm trying to use Apache Camel (version 2.20.0) with mybatis component.
More specifically, I have to export a large set or record from database to file.
I'd like to prevent memory issues so I want to use the option consumer.useIterator. My route is:
from("mybatis:selectItemsByDate?statementType=SelectList&consumer.useIterator=true")
.split()
.body()
.process(doSomething)
to(file:my-path-file);
but my query has in input a parameter (the starting date to get data). How should I set this parameter?
In many example on internet I saw the parameter in the body or in the header of the Exchange message but I think is possibile only if the mybatis endpoint is in a "to" method. But the option "consumer.useIterator" is working only when the enpdoint is in a "from" method.
Please help me to understand how I can set the input for my query or if this is not supported yet (in this case if you can give some hint how to implement would be great)
thank you.
Then you need to start your route from something else, like a timer or direct endpoint, and then call the mybatis endpoint in a to, where you have set that information in the message body/header you use in the mybatis query so its dynamic.
Also you should set the splitter to be in streaming mode so it walks the iterator it gets from mybatis on-demand.

solr highlighting in old and new versions

I am migrating a web site from an old version of solr (1.4.1) to the current release version (5.2.1) on a different machine and noticing some differences.
In the old version, I could get highlighting with a url like this:
http://localhost:8983/solr/select?indent=on&q=text:software/&start=0&rows=10&fl=id,score,title&wt=json&hl=on&hl.fragsize=200
In the new version, one thing that's different is I need to specify a collection. Another difference is that the new version gives an error if I put text: in front of the value of q.
So, taking into account those differences, I end up with a URL like this:
http://localhost:8983/solr/default/select?indent=on&q=software/&start=0&rows=10&fl=id,score,title&wt=json&hl=on&hl.snippets=1&hl.fl=%2a&hl.fragsize=200
That second URL does not give me highlighting fragments/snippets. That is to say, where the old URL would give something like this:
"highlighting":{
"document0_id":{"text":["The <em>software</em> is awesome"]}}
The new URL gives something like this:
"highlighting":{
"document0_id":{}}
What do I need to do to get highlighting fragments returned in solr 5.2.1?
[edited]
In addition, I tried selecting a single document by its id on both machines. On the old machine, a url like
http://localhost:8983/solr/select?wt=json&indent=true&q=id:thedocumentid
returns some JSON that includes a text field containing the full searchable text of the original HTML document. On the new machine a similar url (but one that includes the collection):
http://localhost:8983/solr/default/select?wt=json&indent=true&q=id:thedocumentid
...returns similar JSON that does not include the text field.
I note that searching returns the correct results; the problem is that on the new machine, the results do not include the highlighting fragments.
So it seems like maybe the issue is that I need to specify that these documents have a text field when I index them; how do I do that?
A colleague (not tempted by the bounty) noticed that my text field had stored="false" in my schema.xml and suggsted changing it to true. That did the trick.
In the first query you are specifically searching in the text field and in the second its not.
And in the second you have mentioned hl.fl which means "Specifies a list of fields to highlight. Accepts a comma- or space-delimited list of fields for which Solr should generate highlighted snippets. If left blank, highlights the defaultSearchField"
Try again by making the changes...
http://localhost:8983/solr/default/select?q=text:software&start=0&rows=10&fl=id,score,title&wt=json&hl=on&hl.fragsize=200

CakeDC Search Plugin generated duplicate IDs in URL

I am using the DC search Plugin in several Cake projects and generally it works very well. But for one of my pages I have the problem that the searches blows up the URL.
The starting URL is somethin like:
/lessons/abrechnung/10
When the search is used the resulting URL is something like:
/lessons/abrechnung/10/10/10/datumab:01.02.2014/datumbis:25.02.2014
The search itselfs works well - I get the results filtered by the search criteria.
But: As you can see the ID value is duplicated each time you search. This continues and after 3 or 4 searches the URL contains 50 or 100 times the ID.
How can I avoid this?
I guess this would happen on all actions where I have unnamed params in the URL - but I am not sure about this. BTW: The search params are NOT getting duplicated.
EDIT:
I use cakePHP 2.4.0 and Version 2.3 of teh Search Plugin.
Using 'paramType' => 'querystring' didn't help. But I see now that there is something wrong with my Session-Handling. I will check that and give further feedback.
My guess: Your form setup is incorrect.
Do not interfere with the URL of the posted form.
So use
echo $this->Form->create();
without modifying action/url keys.
This way the form will automatically post to itself, and the Search plugin auto-adds the search params in the PRG redirect.
Then there will be no duplication of passed params or alike.
Independent from this it is still better to use the query strings here (and for pagination then as well, of course).

Generate highlighted snippets in Solr for PDFs

I'm new to solr. I've set up a solr server and have indexed a few thousand PDFs. I am trying to query solr via the rest API in a PHP page. I am trying to build something similar to the solritas interface included in the tutorial (solrserver/browse), but I don't know how to generate highlighted snippets. I found in the documentation "hl" is a query parameter and is by default set to false.
When I get http://solrserver/?q=search+term&hl=true I get back a response with a hightlighting section, but it only contains the document IDs, no generated snippets.
I am using the tutorial provided schema and config for solr 4.2.1. I believe that the configuration is fine because solritas is able to display highlighted snippets using the same indexed data. I've tried seeing how solritas is built but it's separated out in .vm template files and I haven't been able to find what I'm looking for yet.
I can see the full text of the PDF in the doc->content area, so it is stored. I think I just don't understand the proper way to generate snippets! Can someone please help!
Thanks :)
I would suggest, you should try using hl.fl parameter. So your query should be something like this:
?q=search+term&hl=true&hl.fl=field1,field2,field3
Where field1, field2 and field3 are three source fields you would like to generate highlights.
In your case, if the field name you want to use for highlighting is content, your query can be:
?q=search+term&hl=true&hl.fl=content
More details: http://docs.lucidworks.com/display/solr/Highlighting
With highlighting, you can even specify fragment size, HTML tags around highlighted text etc...

Solr configuration

I'm very new with Solr,
And I really want a step by step to have my Solr search result like the google one.
To give you an idea, when you search 'PHP' in http://wiki.apache.org/solr/FindPage , the word 'php' shows up in bold .. This is the same result I want to have.
Showing only a parser even if the pdf is a very huge one.
You can use highlighting to show a matching snippet in the results.
http://wiki.apache.org/solr/HighlightingParameters
By default, it will wrap matching words with <em> tags, but you can change this by setting the hl.simple.pre/hl.simple.post parameters.
You may be looking at the wrong part of the returned data. Try looking at the 'highlighting' component of the returned data structure (i.e. don't look at the response docs). This should give you the snippets you want.

Resources