Carrot2 dcs php example class modification - solr

I currently have Solr and Carrot2 configured and working on my server. I am using the DCS example class for PHP provided in the DCS download from project.carrot2.org. For reference, the class can be found here: https://github.com/amoghtolay/clustering/blob/master/carrot2-dcs-3.6.2/examples/php5/
I have tried a few things to change the query to descending order and to alter the number of records returned. The query that gives the results I need when used from a browser is q=*:*&sort=_docid_%20desc&rows=20. However, when I change the equivalent of line 35 in example.php (found at the link above) to match that query, I get the following error message: "An error occurred during processing: HTTP error occurred, error code: 500". Setting the query to just *:* works fine, but it does not return the information I need. Also, source is set to solr.
Could anyone provide some assistance in getting this working? Thanks.

If you'd like to pass some extra parameters to the Solr request URL, you'll need to append them to the SolrDocumentSource.serviceUrlBase attribute. You can specify the number of results using the results attribute.
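As a rough illustration only, the Python sketch below assembles the form fields such a DCS request might carry. The Solr base URL is an assumption, and so is the exact way the extra parameter is appended to the service URL, so check both against your own installation before relying on it.

```python
from urllib.parse import urlencode

# Assumed location of the Solr select handler; adjust to your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def dcs_form_fields(query, rows, extra_solr_params):
    """Build the form fields for a Carrot2 DCS request against Solr."""
    # Extra Solr parameters (e.g. sort) ride along on the service URL
    # instead of being squeezed into the query itself.
    base = SOLR_SELECT + "?" + urlencode(extra_solr_params)
    return {
        "dcs.source": "solr",
        "query": query,        # plain Solr query only, e.g. *:*
        "results": str(rows),  # number of documents to fetch
        "SolrDocumentSource.serviceUrlBase": base,
    }


fields = dcs_form_fields("*:*", 20, {"sort": "_docid_ desc"})
print(fields["SolrDocumentSource.serviceUrlBase"])
```

The point of the split is that the query field stays a plain *:*, while sorting and the result count travel separately.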

Related

Why dismax q.alt doesn't return any result

I'm new to Solr.
After following exercise 1 of the tutorial (https://solr.apache.org/guide/8_9/solr-tutorial.html), I'm able to run some Solr queries on my local machine.
If I want to get results without any condition, I run a query like
http://127.0.0.1:8983/solr/#/techproducts/query?q=*:*&q.op=OR
This works fine.
But when I switch to "dismax" and try to get a similar result, I need to use "q.alt".
The query looks like
http://127.0.0.1:8983/solr/#/techproducts/query?q.op=OR&defType=dismax&q.alt=*:*
However, this query returns no results, which is pretty weird.
Even though I specified the row parameter, it still won't work.
http://127.0.0.1:8983/solr/#/techproducts/query?q.op=OR&defType=dismax&q.alt=*:*&row=0
Has anyone faced the same problem before?
These parameters are not meant to be used with the user interface URLs; they're for sending directly to Solr. The user interface is a JavaScript interface that talks to the Solr API behind the scenes. You can see that your URLs have a local anchor in them (#); these are just references that the JavaScript-based user interface uses to load the correct page.
The parameter is also named rows, not row, and when it is set to 0, no documents will be returned (in the example it's given as an example of using facets with complete counts; you have to ask for facets for that to make sense).
The actual URL to query Solr for matching documents would be:
http://127.0.0.1:8983/solr/techproducts/select?defType=edismax&q.alt=*:*
This URL is shown in the user interface above the query results when using the query page.
There is also usually no reason to use dismax instead of edismax these days, as edismax does everything the old dismax handler did, with more functionality.
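If you are building such URLs from a script, a minimal sketch in Python could look like the following. The helper name solr_select_url is made up for illustration; the host and core are the ones from the question.

```python
from urllib.parse import urlencode


def solr_select_url(host, core, params):
    """Return the URL of the /select handler for a core, with parameters."""
    # safe="*:" keeps the match-all query readable instead of percent-encoding it.
    return f"{host}/solr/{core}/select?" + urlencode(params, safe="*:")


url = solr_select_url(
    "http://127.0.0.1:8983",
    "techproducts",
    {"defType": "edismax", "q.alt": "*:*", "rows": 10},
)
print(url)
```

Note that the path goes straight to /solr/techproducts/select, with no # anchor, and that the parameter is spelled rows.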

Why does passing a letter to an 'int' field result in an exception in Solr 1.4?

I'm running Lucene Solr 1.4 on top of Tomcat.
I have a field id defined with type int, which is mapped to solr.TrieIntField.
When I run a Solr query like ?q=id:a I get a NumberFormatException.
Is it possible to configure Solr so that it returns an empty result set for the above scenario instead of throwing the exception?
Why do you have to have this as TrieIntField? Can you not use string? They are all subclasses of a non-tokenized field (org.apache.solr.schema.FieldType).
Update: Based on your original question, since it is about id, I suggested using string, as it makes no difference in that case. But if other fields use the TrieIntField type, the downside of using string for those fields is that your sorts and range queries become string based, which may not be desirable. In that case you need to prevent your original problem in the client API, or handle it better by writing your own handler. Solr is doing the correct thing by giving an error, as most applications would capture this error and respond to users with a better error message. If Solr returned no results instead of an error, it would be misleading.
Solr is written in Java, so this is expected. You have to either filter out non-integer values on your client side (or in the API layer) or use the String type as Arun suggested.
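A client-side filter of that kind can be very small. The sketch below is in Python purely for illustration; the function name is hypothetical, and it assumes ids are non-negative values that fit a 32-bit Java int.

```python
def int_field_query(field, raw_value):
    """Return 'field:N' for a usable integer, or None if raw_value is not one."""
    try:
        number = int(raw_value.strip())
    except ValueError:
        return None  # caller can show an empty result or a friendly message
    # Assumption: ids are non-negative and must fit a 32-bit Java int.
    if not 0 <= number <= 2**31 - 1:
        return None
    return f"{field}:{number}"


print(int_field_query("id", "42"))
print(int_field_query("id", "a"))
```

When the function returns None, the application can skip the Solr request entirely and present an empty result set to the user.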

/select with 'q' parameter does not work

Whenever I query with q=*:* it shows all the documents, but when I query with q=programmer, 0 docs are found (contents is the default search field).
My schema has these fields: id (unique), author, title, contents.
The query also works fine for:
q=author:"Value" or q=title:"my book" etc.; only the contents field gives no results.
Also, when I query using the spell checker (/spell?q=programmer), the output shows spelling suggestions for this word, even though 'programmer' is the right word and is present in many documents.
I referred to the example docs for the configuration.
This started happening all of a sudden; initially it worked fine.
I guess there is a problem only with the contents field, but I cannot figure it out.
Is it because the index was not created properly for the contents field?
(I am using Solr 4.2 on Windows 7 with Tomcat as the web server.)
Please help. Thanks a lot in advance.
Are you sure you set the default search field? You might have this problem because you didn't set the <defaultSearchField> element in your schema.xml file. That would explain why q=author:value works while q=WHATEVER doesn't.
The <defaultSearchField> is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used.
But also consider this:
The <defaultSearchField> is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used. It is preferable to not use or
rely on this setting; instead the request handler or query LocalParams
for a search should specify the default field(s) to search on. This
setting here can be omitted and it is being considered for
deprecation.
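To make the two alternatives from that quote concrete, here is a small Python sketch that builds the request both ways: once with the df parameter and once with LocalParams inside the query. The helper names are invented for illustration; contents is the field from the question.

```python
from urllib.parse import urlencode


def params_with_df(user_query, default_field):
    """Request parameters that name the default field explicitly via df."""
    return {"q": user_query, "df": default_field}


def query_with_localparams(user_query, default_field):
    """Same idea, carried inside the query itself using LocalParams."""
    return "{!df=" + default_field + "}" + user_query


print(urlencode(params_with_df("programmer", "contents")))
print(query_with_localparams("programmer", "contents"))
```

Either form avoids depending on whatever default is or is not set in schema.xml.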
Do you have any data in your instance? Try q=*:* and see what it returns. "for" is a stop word, so maybe it was filtered out. Try a different value to test.

Why does Entity sometimes need a "url" parameter and sometimes not?

I am trying to setup a DataImportHandler and upon trying to do a full import I get this error:
SEVERE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: SolrEntityProcessor: parameter 'url' is required Processing Document # 1
I see in the example data-config.xml files that come with Solr that sometimes the entity has the url parameter and sometimes it doesn't. If it is required, why do some of the examples not have it?
What URL is it looking for?
The documentation actually doesn't show "url" as a required parameter for SqlEntityProcessor.
For SqlEntityProcessor the entity attributes are:
query (required) : The SQL string used to query the db
deltaQuery : Only used in delta-import
parentDeltaQuery : Only used in delta-import
deletedPkQuery : Only used in delta-import
deltaImportQuery : (Only used in delta-import). If this is not present, DIH tries to construct the import query by modifying the 'query' after identifying the delta (this is error prone). There is a namespace ${dataimporter.delta.<column-name>} which can be used in this query, e.g. select * from tbl where id=${dataimporter.delta.id} (Solr 1.4).
It depends on the specific EntityProcessor implementation you use. Every EntityProcessor has its own entity attributes. SqlEntityProcessor doesn't need a url parameter because it relies on the dataSource element to get the information needed to connect to the database, while SolrEntityProcessor, for example, doesn't need the dataSource element but relies on the url attribute to get the URL of the Solr instance from which to import data.
There are different DataSource implementations as well; if you look at JdbcDataSource you'll see that it requires the url parameter itself.
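To show where url ends up in each case, the Python sketch below generates two minimal data-config documents side by side. The driver, JDBC URL, table name and source core are all hypothetical placeholders, not values from the question.

```python
import xml.etree.ElementTree as ET

# JDBC-backed import: the connection details live on <dataSource>,
# so the entity itself only needs a query.
sql_config = ET.Element("dataConfig")
ET.SubElement(sql_config, "dataSource", {
    "type": "JdbcDataSource",
    "driver": "org.hsqldb.jdbcDriver",     # hypothetical driver
    "url": "jdbc:hsqldb:/tmp/example/ex",  # hypothetical JDBC URL
})
sql_doc = ET.SubElement(sql_config, "document")
ET.SubElement(sql_doc, "entity", {"name": "item", "query": "select * from item"})

# Solr-to-Solr import: no <dataSource>; the entity carries the url itself.
solr_config = ET.Element("dataConfig")
solr_doc = ET.SubElement(solr_config, "document")
ET.SubElement(solr_doc, "entity", {
    "name": "source_core",
    "processor": "SolrEntityProcessor",
    "url": "http://localhost:8983/solr/source_core",  # hypothetical source
    "query": "*:*",
})

print(ET.tostring(sql_config, encoding="unicode"))
print(ET.tostring(solr_config, encoding="unicode"))
```

In the first document the url sits on the dataSource element; in the second it sits on the entity, which is the attribute the error message in the question is asking for.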

Boolean NOT in solr query

How do I select or delete all the documents from Solr using the boolean NOT notion?
i.e. how do I delete all the documents from Solr whose id does NOT start with A59?
Use - to indicate NOT.
For example, to query documents with id not starting with A59, the query would be: -id:A59*, that is: /solr/select/?q=-id:A59*
To delete by query, post the query in a delete message to the update handler, as specified here.
EDIT: NOT (all uppercase) can also be used as a boolean operator.
Exclamation works for NOT as well, so:
/solr/select/?q=!id:A59*
should work in the case above.
I don't believe that a negative delete by query works. See this Jira ticket: https://issues.apache.org/jira/browse/SOLR-381
They do say that there is a workaround of prefixing the query with *:*, but I did not have any luck with that.
This does not work (same with using NOT):
java -Ddata=args -jar /opt/solr/example/exampledocs/post.jar "<delete><query>-userid:*</query></delete>"
java -jar /opt/solr/example/exampledocs/post.jar *.xml
Adding the *:* prefix gives a syntax error (same with using NOT):
java -Ddata=args -jar /opt/solr/example/exampledocs/post.jar "<delete><query>*:* -userid:*</query></delete>"
java -jar /opt/solr/example/exampledocs/post.jar *.xml
SimplePostTool: version 1.4
SimplePostTool: POSTing args to http://localhost:8983/solr/update..
SimplePostTool: FATAL: Solr returned an error #400 Error parsing Lucene query
SimplePostTool: version 1.4
Putting the - symbol in front of a field excludes that particular value, giving a result like NOT EQUAL.
The following is a sample URL query string in which I have included "&fq=-HQ_City_Code:MEL".
It will skip all results whose HQ_City_Code value is MEL.
http://localhost:8983/solr/HQ_SOLR_Hotels/select?q=*:*&fq=HQ_National_Code:TH&fq=HQ_TYPE:hotel_EN&fq=HQ_Country_Code:AU&fq=-HQ_City_Code:MEL&wt=json&indent=true
Before deleting, please make sure that the ids you are referring to are strings and could in no way be formed by combining two terms.
The way I would do it is to read the data from Solr in a script and delete the documents one at a time or in batches, because that gives better control and validation over each id, which reduces the risk of a wrong deletion.
Hence:
1 Read the data from Solr in a script using
/solr/select/?q=id:A59*
2 Verify and validate the ids
3 Delete them one by one or in groups of 10 ids at a time
Regards
Rajat
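The batching step from the list above could be sketched in Python roughly as follows. The ids are made up, and reading them from Solr and validating them is left to the caller; only the grouping into delete messages is shown.

```python
from xml.sax.saxutils import escape


def delete_batches(ids, batch_size=10):
    """Yield one <delete> message per batch of ids."""
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        body = "".join(f"<id>{escape(doc_id)}</id>" for doc_id in batch)
        yield f"<delete>{body}</delete>"


# Hypothetical ids, as they might come back from the select query.
ids = [f"A59{n}" for n in range(12)]
messages = list(delete_batches(ids))
print(len(messages))
print(messages[1])
```

Each message would then be posted to the update handler, followed by a commit once all batches have gone through.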
As Mauricio stated:
Use the - symbol to indicate what you want to exclude in your query.
The following two queries will delete all documents except the ones that begin with A59.
GET http://<url>/solr/<core>/update?stream.body=<delete><query>-id:A59*</query></delete>
GET http://<url>/solr/<core>/update?stream.body=<commit/>
The first line does the delete operation.
The second line does the commit.
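The same delete can also be sent as a POST body instead of through stream.body. The Python sketch below only builds the request; the core URL is a hypothetical placeholder, and the *:* prefix is included as the commonly cited workaround for purely negative queries, which may or may not be needed on your Solr version.

```python
from urllib.request import Request
from xml.sax.saxutils import escape


def delete_by_query_request(update_url, query):
    """Build (but do not send) a POST request deleting documents matching query."""
    body = f"<delete><query>{escape(query)}</query></delete>"
    return Request(
        update_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical core URL; commit=true asks Solr to commit with the same request.
req = delete_by_query_request(
    "http://localhost:8983/solr/mycore/update?commit=true",
    "*:* -id:A59*",
)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would actually send it; it is worth running the same query against /select first to see what would be removed.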
