Why does Entity sometimes need a "url" parameter and sometimes not? - solr

I am trying to set up a DataImportHandler, and when I try a full import I get this error:
SEVERE: Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: SolrEntityProcessor: parameter 'url' is required Processing Document # 1
I see in the example data-config.xml files that come with Solr that an entity sometimes has the url parameter and sometimes doesn't. If it is required, why do some of the examples not have it?
What URL is it looking for?
The documentation actually doesn't show "url" as a required parameter for SqlEntityProcessor
For SqlEntityProcessor the entity attributes are :
query (required) : The sql string using which to query the db
deltaQuery : Only used in delta-import
parentDeltaQuery : Only used in delta-import
deletedPkQuery : Only used in delta-import
deltaImportQuery : (Only used in delta-import). If this is not present, DIH tries to construct the import query by (after identifying the delta) modifying the 'query' (this is error prone). There is a namespace ${dataimporter.delta.} which can be used in this query, e.g.: select * from tbl where id=${dataimporter.delta.id} (Solr 1.4).

It depends on the specific EntityProcessor implementation you use. Every EntityProcessor has its own entity attributes. SqlEntityProcessor doesn't need a url parameter because it relies on the dataSource element for the information needed to connect to the database. SolrEntityProcessor, by contrast, doesn't need the dataSource element; it relies on the url attribute to locate the Solr instance from which to import data. That is why your error names SolrEntityProcessor: the entity is configured with that processor but has no url attribute.
There are different DataSource implementations as well. If you look at JdbcDataSource, you'll see that it requires a url parameter of its own, set on the dataSource element rather than on the entity.
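To make the difference concrete, here is a minimal sketch in Python that embeds a hypothetical data-config.xml and checks each entity against the attributes its processor needs. The driver, JDBC URL, Solr URL, entity names and queries are placeholders I made up for illustration, and the REQUIRED table only encodes what the error message and the documentation excerpt above state; it is not a full list of DIH attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-config.xml: all names, URLs and queries are placeholders.
DATA_CONFIG = """
<dataConfig>
  <dataSource name="db" type="JdbcDataSource"
              driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:./example-db"/>
  <document>
    <!-- SQL entity: connection details come from the dataSource element -->
    <entity name="item" processor="SqlEntityProcessor" dataSource="db"
            query="select * from item"/>
    <!-- Solr entity: no dataSource, the url attribute points at another Solr -->
    <entity name="remote" processor="SolrEntityProcessor"
            url="http://localhost:8983/solr/collection1"
            query="*:*"/>
  </document>
</dataConfig>
"""

# Required entity attributes, limited to what this thread states.
REQUIRED = {
    "SqlEntityProcessor": ["query"],
    "SolrEntityProcessor": ["url"],
}


def missing_attributes(config_xml):
    """Return {entity name: [missing required attributes]} for a config string."""
    root = ET.fromstring(config_xml)
    problems = {}
    for entity in root.iter("entity"):
        # DIH falls back to SqlEntityProcessor when no processor is given.
        processor = entity.get("processor", "SqlEntityProcessor")
        missing = [a for a in REQUIRED.get(processor, []) if entity.get(a) is None]
        if missing:
            problems[entity.get("name")] = missing
    return problems


print(missing_attributes(DATA_CONFIG))
```

Removing the url attribute from the "remote" entity makes the checker report it, which mirrors the "parameter 'url' is required" failure in the question.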

Related

Carrot2 dcs php example class modification

I currently have Solr and Carrot2 configured and working on my server. I am using the DCS example class for PHP provided in the DCS download from project.carrot2.org. For reference, the class can be found here: https://github.com/amoghtolay/clustering/blob/master/carrot2-dcs-3.6.2/examples/php5/
I have tried a few things to sort the results in descending order and to change the number of records returned. The query that gives the results I need from a browser is q=*:*&sort=_docid_%20desc&rows=20. However, when I change the equivalent of line 35 in example.php (in the link above) to match that query, I get the error message "An error occurred during processing: HTTP error occurred, error code: 500". Setting the query to just *:* works fine, but it does not return the information I need. Also, source is set to solr.
Could anyone provide some assistance in getting this working? Thanks.
If you'd like to pass some extra parameters to the Solr request URL, you'll need to append them to the SolrDocumentSource.serviceUrlBase attribute rather than to the query itself. You can specify the number of results using the results attribute.
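As a rough illustration of that split, the sketch below (in Python rather than the PHP of the example class) assembles the attribute values: the sort parameter goes into the service URL, the row count goes into results, and the query stays *:*. The Solr URL is a hypothetical placeholder, and exactly how Carrot2 combines the base URL with its own parameters is an assumption you should verify against your version.

```python
from urllib.parse import urlencode, quote


def build_service_url_base(select_url, extra_params):
    """Append extra Solr parameters to the select URL.

    quote_via=quote encodes spaces as %20, matching the browser query
    in the question (sort=_docid_%20desc).
    """
    return select_url + "?" + urlencode(extra_params, quote_via=quote)


# Hypothetical Solr select handler URL; replace with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Attribute names follow the answer above; the values are illustrative.
attributes = {
    "SolrDocumentSource.serviceUrlBase": build_service_url_base(
        SOLR_SELECT_URL, {"sort": "_docid_ desc"}
    ),
    "query": "*:*",   # keep the query itself plain
    "results": 20,    # takes the place of rows=20
}

for name, value in attributes.items():
    print(name, "=", value)
```

The same values can then be passed through whatever mechanism the PHP example class uses for setting attributes.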

Solr fields mapping?

I am indexing documents into Solr from a source. At the source, each document has some associated properties, which I am indexing into Solr.
I am mapping some of the source properties to Solr schema fields. However, I can see a couple of extra fields in the Solr logs that I am not mapping. When querying in the Solr admin UI, I can see only the mapped fields.
E.g. in the log below, I am mapping only content_name and content_modifier, but I can see the Template field as well.
INFO - 2014-09-18 12:07:47.185; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update/extract params={literal.content_name=1_.000&literal.content_modifier=System&literal.Template={8ad4d8f0-93a7-4941-9657-cf3706f00409} {add=[1_.000 (1479581071766978560)]} 0 0
So what's happening here? Will Solr index only the mapped fields and skip the unmapped ones? Or will Solr index all fields, mapped and unmapped, but show only the mapped ones in the admin UI?
Please suggest.
The answer depends on what your solrconfig and schema say, because you can configure it any way you want. Here is how it works for the example schema in Solr 4.10:
1) In solrconfig.xml, the handler uses the "uprefix" parameter to map all fields NOT in the schema to the dynamic field ignored_*
2) In schema.xml, that dynamic field has type ignored
3) Type ignored (in the same file) is defined with stored=false and indexed=false. That means: do not complain if you get a field matching the pattern, but do nothing with it; literally ignore it.
So, if you don't like that, you can modify any part of that pipeline. The easiest test would be to change the dynamic field to use type string and reindex. Then, you should see the rest of the fields.
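The three steps can be mimicked with a toy model in plain Python. This is not Solr code, just a sketch of the behaviour described above: the function names are made up, and the field values are taken from the log line in the question.

```python
def apply_uprefix(incoming, schema_fields, uprefix="ignored_"):
    """Step 1: prefix every field that is not declared in the schema."""
    return {
        (name if name in schema_fields else uprefix + name): value
        for name, value in incoming.items()
    }


def drop_ignored(doc, ignored_prefix="ignored_"):
    """Steps 2 and 3: fields matching ignored_* are neither stored nor indexed."""
    return {k: v for k, v in doc.items() if not k.startswith(ignored_prefix)}


# Fields declared in the (assumed) schema.
schema_fields = {"content_name", "content_modifier"}

# Literal fields from the log line in the question.
incoming = {
    "content_name": "1_.000",
    "content_modifier": "System",
    "Template": "{8ad4d8f0-93a7-4941-9657-cf3706f00409}",
}

renamed = apply_uprefix(incoming, schema_fields)
print(sorted(renamed))                 # Template is now ignored_Template
print(sorted(drop_ignored(renamed)))   # only the mapped fields remain
```

In this model, switching the dynamic field to a string type corresponds to skipping drop_ignored, so ignored_Template would be kept and become visible in query results.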

DataStax Enterprise: No search results from Solr

I’m using DataStax 3.2.7 and have 2 rows in Cassandra that show up in cqlsh.
I cannot find these records in Solr, though, even after reloading the core and fully reindexing.
Any suggestions?
I also see this in the log: Type mapping versions older than 2 are unsupported for CQL3 table linkcurrent_search.content_items, forcing version 2.
When you use dynamic fields to query maps in Cassandra, every key in the map must begin with the prefix defined for that dynamic field. In your case the prefixes are:
score_calculated_
score_value_
score_velocity_
shared_on_
The error 'undefined field realtime' occurs because the key realtime does not start with the prefix specified for that field in schema.xml.
An example of what one of your records would look like would be:
{'score_value_realtime': 18.432}
Do the same for all the map values.
For more details see this url:
http://www.datastax.com/documentation/datastax_enterprise/3.2/datastax_enterprise/srch/srchDynFlds.html
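If the map values are built in application code before being written to Cassandra, the prefixing can be done in one place. A minimal sketch in Python, assuming the prefixes listed above; prefix_map_keys is a hypothetical helper, not part of any DataStax API.

```python
def prefix_map_keys(values, prefix):
    """Return a copy of the map whose keys all start with the given prefix.

    Keys that already carry the prefix are left unchanged.
    """
    return {
        (key if key.startswith(prefix) else prefix + key): value
        for key, value in values.items()
    }


# The record from the answer above, before and after prefixing.
score_value = prefix_map_keys({"realtime": 18.432}, "score_value_")
print(score_value)
```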

why in solr 1.4 passing a letter to a 'int' field results in exception?

I'm running lucene solr 1.4 on top of tomcat.
I have a field id defined with type int which is mapped to solr.TrieIntField.
When I do a solr query like ?q=id:a I get a NumberFormatException.
Is it possible to configure Solr in such a way that it returns an empty result set for the above scenario instead of throwing the exception?
Why does this have to be a TrieIntField? Can you not use string? They are all subclasses of a non-tokenized field (org.apache.solr.schema.FieldType).
Update: Since your original question is about id, I suggested using string, as it makes no difference in that case. But if other fields use the TrieIntField type, the downside of using string for those fields is that your sorts and range queries become string based, which may not be desirable. In that case you need to prevent your original problem in the client API, or handle it better by writing your own handler. Solr is doing the correct thing by raising an error, as most applications would capture this error and respond to users with a better error message. If Solr returned no results instead of an error, it would be misleading.
Solr is written in Java and parses the value as an integer, so a NumberFormatException is expected here. You have to either filter out non-integer values on the client side (or in an API layer) or use the string type as Arun suggested.
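A client-side filter of that kind could look like the sketch below. It is written in Python for brevity and assumes non-negative ids that fit a 32-bit signed integer; id_query_params is a hypothetical helper name, and the caller decides what an "empty result" looks like when it returns None.

```python
import re
from urllib.parse import urlencode

# ASCII digits only; assumes ids are non-negative.
_DIGITS = re.compile(r"[0-9]+")
_INT_MAX = 2**31 - 1  # upper bound of a 32-bit signed int


def id_query_params(raw_value):
    """Return the encoded query string for an id lookup, or None.

    None means the value cannot be an int id, so the caller can return
    an empty result set without sending the request to Solr at all.
    """
    text = raw_value.strip()
    if not _DIGITS.fullmatch(text):
        return None
    number = int(text)
    if number > _INT_MAX:
        return None
    return urlencode({"q": "id:%d" % number})


print(id_query_params("42"))  # q=id%3A42
print(id_query_params("a"))   # None
```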

Solr not searching (Very basic example)

Solr version :
4.2.1
Objective:
I am trying to get a very simplistic Solr example off the ground
So far:
Installed solr
Was able to run the example\tutorial successfully http://lucene.apache.org/solr/4_2_1/tutorial.html
Next:
Now I am trying to create my own schema
I have created a schema : http://pastebin.com/vj4ATa8d
And a test doc: http://pastebin.com/7fvZ5GTQ
I have added the doc to Solr using the command
java -jar post.jar testdoc.xml
What’s working:
In Solr Admin- I can see the schema
I can see one document uploaded
I can go to Admin console and query as follows:
Specify q as "*:*". This works and shows the document
http://localhost:8983/solr/collection2/select?q=*%3A*&wt=xml&indent=true
What does not work:
If I give q as Nashua, I see no results
This is searched against the default search field
Searching for other attribute values didn't work either
http://localhost:8983/solr/collection2/select?q=Nashua&wt=xml&indent=true
The debug response http://pastebin.com/fTneyEba
You need to either copy your fields into the default search field (in this case text) or qualify your query with the field you want to search against:
.../select?q=city:Nashua&wt=xml&indent=true
Things to read up on:
Default Search Field
Copy Fields
Both are documented here:
https://wiki.apache.org/solr/SchemaXml
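For the second option, a small sketch of building the field-qualified URL from a client, written in Python. The base URL and the city field come from the question; select_url is a hypothetical helper, and it does not escape Lucene special characters in the term.

```python
from urllib.parse import urlencode


def select_url(core_url, field, term):
    """Build a select URL that searches one named field.

    The term is used as-is; special query characters are not escaped.
    """
    params = {"q": "%s:%s" % (field, term), "wt": "xml", "indent": "true"}
    return core_url + "/select?" + urlencode(params)


print(select_url("http://localhost:8983/solr/collection2", "city", "Nashua"))
```

The colon is percent-encoded as %3A, just as in the q=*%3A* URL that already works in the question.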
