I’m using DataStax 3.2.7 and have 2 rows in Cassandra that show up in cqlsh.
I cannot find these records in Solr, though, even after reloading the core and fully reindexing.
Any suggestions?
I also see this in the log: Type mapping versions older than 2 are unsupported for CQL3 table current_search.content_items, forcing version 2.
When you use dynamic fields to query map columns in Cassandra, every key in the map must begin with that map's literal prefix. In your case the prefixed map literals are:
score_calculated_
score_value_
score_velocity_
shared_on_
You are getting the error 'undefined field realtime' because realtime is not prefixed with the prefix specified for that field in schema.xml.
For example, one of your records would look like:
{'score_value_realtime': 18.432}
Do the same for all the map values.
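For the prefixes to resolve, schema.xml also needs a dynamic field whose name pattern matches each prefix. A minimal sketch, assuming the map values are doubles (the "tdouble" type here is illustrative; match it to your actual data):

<!-- one dynamic field per map prefix; adjust the type per column -->
<dynamicField name="score_value_*" type="tdouble" indexed="true" stored="true"/>
<dynamicField name="shared_on_*" type="tdouble" indexed="true" stored="true"/>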
For more details see this url:
http://www.datastax.com/documentation/datastax_enterprise/3.2/datastax_enterprise/srch/srchDynFlds.html
Related
I'm moving a system from Solr 1.4 to Solr 6.x (or possibly 5.x), and the field names all contain colons (e.g. "rdf:type"). I've converted all the configuration files to the Solr 6.x format using a schema.xml file, and I can see "rdf:type" in Solr's schema view.
These field names worked fine in 1.4, but now colons are automatically converted to underscores when indexing is attempted.
For instance using Solr's built in interface, if I try to submit a simple document like:
{'rdf:type': 'http://purl.org/ontology/bibo/Note'}
I get an error message saying:
ERROR: [doc=682e3f70-a4bc-4336-9f69-e7d620fe5fff] unknown field 'rdf_type'
Is it possible to "turn off" this feature? Will using colons cause problems with the newest versions of Solr?
(On a side note, making "rdf:type" a compulsory field and then not including it causes an error which reads: "missing required field: rdf:type", i.e. it displays the correct name)
This behaviour is not "native" to Solr itself, but is part of the default update processor chain that is added to the configuration for the Schemaless mode in the bundled examples (which is the default).
The reason is that Lucene's query syntax uses : to separate field names from the values to be queried in those fields, so it's usually easier to keep : out of field names.
You can change this by removing the FieldNameMutatingUpdateProcessorFactory from the update chain, or use your own schema (without the update processor chain).
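In the bundled schemaless configset the culprit looks roughly like the snippet below (check your own solrconfig.xml for the exact wording). The pattern replaces every character that is not a word character, hyphen, or dot with an underscore, which is what turns rdf:type into rdf_type:

<processor class="solr.FieldNameMutatingUpdateProcessorFactory">
  <!-- any character outside [A-Za-z0-9_\-.] is rewritten to "_" -->
  <str name="pattern">[^\w-\.]</str>
  <str name="replacement">_</str>
</processor>

Narrowing the pattern so it no longer matches : is an alternative to removing the processor entirely.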
My query has a field CONTENTDISPLAYDATE and cfdump displays it as "2014-10-16 00:00:00.0". I add it to the SOLR collection using contentdisplaydate_dt="ContentDisplayDate" in my cfindex statement.
When I cfdump the resulting data from the cfsearch result, the field appears as "Thu Oct 16 00:00:00 EDT 2014" and sorting on it doesn't work. Using query of queries on the result set and ordering by it doesn't work either. So it looks like assigning it to a Solr date field isn't working. Can anyone shed light on what I'm doing wrong? We're using the default version of Solr that ships with CF 10.
The first thing to do would be to check to make sure that the field contentdisplaydate_dt is defined as a date field in Solr. You can do this by looking at the file schema.xml under this particular collection (often C:\ColdFusion9\collections\mycollection\conf\).
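In a typical Solr example schema, the _dt suffix is wired to a date type by a dynamic field along these lines (the exact field class varies by Solr version, so treat this as a sketch):

<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
<dynamicField name="*_dt" type="date" indexed="true" stored="true"/>

If contentdisplaydate_dt does not resolve to a date type like this, sorting on it will not behave as expected.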
You can also confirm the content of contentdisplaydate_dt by querying the Solr index directly in your browser (using the Solr web service):
http://mysolrserver:8983/solr/myindex/select?q=searchterm&fl=contentdisplaydate_dt
(The above URL will return XML data by default; if you prefer JSON then add &wt=json to the URL.)
My guess is that what is happening is that ColdFusion is trying to convert the Solr dates (which are always of the format yyyy-mm-ddThh:mm:ssZ) and the results aren't pretty. You have to do some manipulation in order to convert Solr dates to a date format recognized by CF.
Lastly, I would encourage you to use Solr's web service both to index and to search your data rather than using <cfindex> and <cfsearch>. Searching is especially easy with the Solr web service: just use <cfhttp> to call the web service and deserializeJSON() to process the data returned (assuming you're returning JSON instead of XML).
I am indexing documents into Solr from a source. At the source, each document has some associated properties which I am indexing and fetching into Solr.
I am mapping some fields from the source properties to Solr schema fields, but I can see a couple of extra fields in the Solr logs which I am not mapping. When querying in the Solr admin UI, I can see only the mapped fields.
E.g. in the logs below, I am using only content_name and content_modifier, but I can see the Template field as well:
INFO - 2014-09-18 12:07:47.185; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update/extract params={literal.content_name=1_.000&literal.content_modifier=System&literal.Template={8ad4d8f0-93a7-4941-9657-cf3706f00409} {add=[1_.000 (1479581071766978560)]} 0 0
So what's happening here? Will Solr index only the mapped fields and skip the unmapped ones? Or will Solr index all fields, mapped and unmapped, but show only the mapped ones in the admin UI?
Please suggest.
The behaviour is defined by what your solrconfig and schema say, because you can configure it any way you want. Here is how it works for the example schema in Solr 4.10:
1) In solrconfig.xml, the handler uses the "uprefix" parameter to map all fields NOT in the schema to a dynamic field ignored_*
2) In schema.xml, that dynamic field has the type ignored
3) The type ignored (in the same file) is defined with stored="false" and indexed="false", which means: do not complain if a field matches the pattern, but do nothing with it either; literally ignore it.
So, if you don't like that, you can modify any part of that pipeline. The easiest test would be to change the dynamic field to use type string and reindex. Then, you should see the rest of the fields.
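For reference, the relevant pieces of that pipeline in the stock Solr 4.10 example configuration look roughly like this (paraphrased from the example files; verify against your own solrconfig.xml and schema.xml):

<!-- solrconfig.xml: fields not in the schema get the ignored_ prefix -->
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler" startup="lazy">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<!-- schema.xml: ignored_* fields are neither indexed nor stored -->
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField"/>

To surface the unmapped fields, a change along these lines (plus a reindex) would do it:

<dynamicField name="ignored_*" type="string" indexed="true" stored="true" multiValued="true"/>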
I'm trying to quickly index a large collection of html files for a once off information retrieval experiment with Apache Lucene Solr. I'm using the example Solr instance distributed with the latest release (solr-4.9.0/example/solr) and in the spirit of a quick and dirty solution I'm just submitting the documents with curl:
curl "http://localhost:8983/solr/update/extract?literal.id=001" -F "myfile=@blah.html"
When I look at the logs in the Solr panel during indexing I see a lot of errors of the form:
org.apache.solr.common.SolrException: ERROR: [doc=BLOG06-20060103-014-0011844415] multiple values encountered for non multiValued field keywords: [hair care, shampoo, hair styles, hair styles, ...]
It looks like the component doing the keyword extraction is pulling out multiple values when perhaps it should only be a list of words separated by whitespace. Do I need to do anything to force this, or does this look like some kind of bug?
Turns out the solution was as simple as ensuring that the keywords field in schema.xml has multiValued="true" specified. I then had to do this for a couple of other fields. I had foolishly assumed that the schema would be set up to match the default document parser in the demo instance.
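In schema.xml terms, the fix is a one-attribute change along these lines (the field type shown is a guess; keep whatever type the field already has):

<field name="keywords" type="text_general" indexed="true" stored="true" multiValued="true"/>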
I am struggling with a little problem: I have to display summary information about the result set returned from Solr, but I can't figure out how to calculate it without iterating over the results (bad).
Basically I am storing my documents with a state field and while the search is supposed to return all documents, the UI has to show "Found 15 entities, 5 are in state A, 3 in state B and 8 in C".
At the moment I am using the rather brittle approach of running the query 3 times with additional scoping by type, but I'd rather get that information from the one query I am displaying. (There have been some edge cases where the numbers don't add up, and since Solr can return facets I guess there has to be a way to use that functionality here.)
I am using SolR 3.5 from Rails with the sunspot gem
As you mention yourself, you can use facets for this by setting
facet=true&facet.field=state
I'm not familiar with the sunspot gem, but looking at the documentation you can use facets like this (assuming Entity is your searchable):
Entity.search do
  facet :state
end
This should return the states of all entities matched by your query, along with the number of entities in each state. The Sunspot documentation tells me you can read these facets in the following way:
search.facet(:state).rows.each do |facet|
  puts "State #{facet.value} has #{facet.count} entities"
end
Essentially there are three main sets of functions you can use to gather stats from Solr.
The first is faceting:
http://wiki.apache.org/solr/SimpleFacetParameters
There is also grouping (field collapsing):
https://wiki.apache.org/solr/FieldCollapsing
And the stats package:
https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
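As a rough illustration (the core name and field names below are placeholders), the three approaches map onto query parameters like these:

Facet counts per state: http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=state
Grouping by state: http://localhost:8983/solr/collection1/select?q=*:*&group=true&group.field=state&group.ngroups=true
Numeric stats over a field: http://localhost:8983/solr/collection1/select?q=*:*&stats=true&stats.field=price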
Note that the stats, facet, and group functionality may eventually be superseded by the analytics (OLAP-style) component, which is aimed at Solr 5.0.0:
https://issues.apache.org/jira/browse/SOLR-5302
Good luck.