My document structure is like this:
<document>
  <entity name="entity1" query="query1">
    <field column="column1" name="column1" />
    <!-- more columns specific to this entity -->
  </entity>
  <entity name="entity2" query="query2">
    <field column="column2" name="column2" />
    <!-- more columns specific to this entity -->
  </entity>
</document>
In a query involving entity1 columns only, if I add entity2 columns to the sort clause, why should the result be affected at all? My query is only on entity1 columns, which are unrelated to entity2. Is it the case that Solr applies the sort clause to the entire set of "documents" first and then applies the query condition(s)?
The documentation reads:
If sortMissingLast="false" and sortMissingFirst="false" (the default),
then default lucene sorting will be used which places docs without the
field first in an ascending sort and last in a descending sort.
Can someone please elaborate on the quoted text?
I think the last paragraph of my question had the answer in it:
if the sort field is missing from a document, default Lucene sorting is used for it, which is why my results look "affected".
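For reference, this behavior can be overridden per field type in schema.xml. A minimal sketch, with illustrative type and field names, that forces documents missing the sort field to always sort last:
<!-- sketch only: type and field names are illustrative -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<!-- docs that lack this field now sort last in both ascending
     and descending order, instead of the default Lucene behavior -->
<field name="entity2_sortfield" type="string" indexed="true" stored="false"/>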
I'm looking to configure Solr to query a table based on certain data.
I unfortunately have to work with how the database is set up, but here's what I'm after.
I have a table named Company that contains a certain "prefix" value.
I want to use that prefix value to determine which tables the DIH should query.
As a quick sample:
<entity name="company" query="Select top 1 prefix from Company">
<field name="prefix" column="prefix"/>
<entity name="item" query="select * from ${company.prefix}item">
<field column="ItemID" name="id"/>
<field column="Description" name="description/>
</entity>
</entity>
However, I only ever seem to get one document processed, despite the item table containing over 200,000 rows.
What am I doing wrong?
The root entity produces one Solr document per row it returns, and your "Select top 1" root query returns a single row, which is why you only ever see one document. I think you could achieve what you want by:
using a stored procedure. You can call a stored procedure from DIH as seen here.
Inside the stored procedure, you can do the table lookup as needed, and then return the results from the real query.
Depending on how good you are with MSSQL's SQL, you might be able to just put everything into a single SQL query and use that directly in DIH, but I'm not sure about that.
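For what it's worth, a minimal sketch of the stored-procedure route in data-config.xml; dbo.GetPrefixedItems is a hypothetical procedure that looks up the prefix from Company itself and returns the item rows:
<!-- sketch only: dbo.GetPrefixedItems is a hypothetical stored procedure
     that resolves the prefix internally and returns ItemID/Description rows -->
<entity name="item" query="EXEC dbo.GetPrefixedItems">
  <field column="ItemID" name="id"/>
  <field column="Description" name="description"/>
</entity>
With item as the root entity, each returned row becomes its own Solr document, which is also what fixes the one-document symptom.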
I have a webshop with multiple different product categories.
For each category I have a description, metadata, an image, and some more category-specific data.
Right now, my data-config.xml looks as below.
However, I think that this way I'm indexing all category-specific data for each product individually, so it takes up a lot more space than needed.
I'm now considering moving the indexing and storing of category-specific data to a separate Solr core/instance; this way I have basically separated the product-specific data from the category data.
Is this reasoning correct? Is it better to move the category-specific data outside this core/instance?
<document name="shopitems">
  <entity name="shopitem" pk="id" query="select * from products">
    <field name="id" column="ID" />
    <field name="articlenr" column="articlenr" />
    <field name="title" column="title" />
    <entity name="catdescription" query="select
        pagetitle_de as cat_pagetitle_de, pagetitle_en as cat_pagetitle_en,
        description as cat_description,
        metadescription as cat_metadescription
        FROM products_custom_cat_descriptions where articlegroup = '${shopitem.articlegroup}'">
    </entity>
  </entity>
</document>
Generally speaking, your implementation will be easier if you flatten (de-normalize) everything, as you did. If you spin the categories off into a different core, Solr becomes harder to use: you will need extra queries and extra client code, faceting won't work so easily, etc., all of which results in a performance hit on top of the extra implementation difficulties. For example, with the flattened schema a single request can filter and facet on a field like cat_pagetitle_en directly, while a split setup would need a second query against the category core plus client-side merging.
From the numbers you give (an index staying under 1GB is not that big), I would definitely not go the way of splitting out the category data; it will make your life harder for not much practical gain.
I need to have fields nested inside of fields. Does Solr provide that ability?
For example: I need a multivalued field called Products, and each Product needs to in turn have a multivalued field Properties. I need there to be nesting so that, when I search for a property, it only returns the corresponding product info and not all products.
Currently, I find that if I have 10 products with 10 properties each in a doc, then upon searching for a property, all the products in the doc that holds that property are returned. I then have to manually work out which product had that property by comparing array indices: if property 53 is returned, it belongs to the 6th product. This gets worse when not all products have an equal number of properties.
Is there no easier way ?
Thanks in advance for your replies.
Yes, recent Solr supports nested documents, though there are some tradeoffs. Mostly, you have to index and delete the whole parent+children block together, but that should not be a problem in your case.
After that, you can search them in a couple of different ways using BlockJoins.
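As a rough sketch (the field names here are illustrative, not from any real schema), a parent product with child property documents can be indexed with Solr's XML update format and then matched with the block join parent query parser:
<!-- sketch: nested documents in Solr's XML update format;
     "doc_type" and "property" are made-up field names -->
<add>
  <doc>
    <field name="id">product-1</field>
    <field name="doc_type">product</field>
    <doc>
      <field name="id">product-1-prop-1</field>
      <field name="property">waterproof</field>
    </doc>
    <doc>
      <field name="id">product-1-prop-2</field>
      <field name="property">blue</field>
    </doc>
  </doc>
</add>
<!-- query side: q={!parent which="doc_type:product"}property:waterproof
     returns only the parent product whose child property matches -->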
Not sure if it is useful in your situation, but this is what I am doing in my data-config.xml:
<document>
  <entity name="paper" query="SELECT * FROM papers">
    <field column="title" name="title"/>
    <field column="title" name="title_unstem"/>
    <field column="year" name="publish_date"/>
    <entity name="person" query="SELECT * FROM papers_people PA, people A WHERE PA.person_id = A.id AND PA.paper_id='${paper.id}'">
      <field column="id" name="author_id"/>
      <field column="first_name" name="first_name"/>
      <field column="last_name" name="last_name"/>
      <field column="full_name" name="author"/>
    </entity>
    <entity name="volume" query="SELECT * FROM volumes WHERE id='${paper.volume_id}'">
      <field column="id" name="volume_id"/>
      <field column="title" name="volume_title"/>
      <field column="anthology_id" name="volume_anthology"/>
    </entity>
  </entity>
</document>
Basically, as you can see, my Paper has many Authors and belongs to a Volume. I am doing this in Ruby on Rails with the Blacklight gem, so if you have any questions just ask me.
If this is your key requirement and you haven't invested much in Solr, then I suggest you look at Elasticsearch: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html
Otherwise, BlockJoin is the only out-of-the-box way to do it in Solr, and it looks more like a hack.
I have a master/detail table pair that I would like to import into Solr so I can query it.
Right now it appears that only the first row of the detail table is imported.
How do I import all rows from the detail table?
I currently have something like this in my data import handler query:
<entity name="master" query="SELECT id, name, description,
FROM master WHERE isapproved = 1">
<!-- snip -->
<entity name="details" query="SELECT sku,description,price
FROM details WHERE masterid='${master.id}'">
<field column="sku name="sku" />
</entity>
To make it a bit more difficult, sometimes there are only master rows without corresponding detail rows, so I cannot reverse the query (select the details first and then the master) because that would leave me without the master data.
What is a good solution?
Unfortunately I cannot see your schema.xml, but it is likely that you forgot to mark the corresponding field as multiValued="true" there. In that case Solr would only fetch the first value and skip the rest.
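A minimal schema.xml sketch of what that could look like; the field types are assumptions, not taken from your actual schema:
<!-- multiValued lets one master document hold the values from all its detail rows -->
<field name="sku" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="description" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="price" type="float" indexed="true" stored="true" multiValued="true"/>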
When using DataImportHandler with SqlEntityProcessor, I want to have several entity definitions with different queries going into the same schema.
How can I search both types of entities but also distinguish their source at the same time? Example:
<document>
  <entity name="entity1" query="query1">
    <field column="column1" name="column1" />
    <field column="column2" name="column2" />
  </entity>
  <entity name="entity2" query="query2">
    <field column="column1" name="column1" />
    <field column="column2" name="column2" />
  </entity>
</document>
How do I get data from both entity1 and entity2?
As long as your schema fields (e.g. column1, column2) are compatible between the different entities, you can just run DataImportHandler and it will populate the Solr collection from both queries.
Then, when you query, you will see all entities combined.
If you want to mark which document came from which source, I would recommend adding another field (e.g. type) and assigning it a different static value in each entity definition using TemplateTransformer (see the sketch below).
Also beware of the clean command: by default it deletes everything from the index. As you are populating the index from several sources, you need to make sure it does not delete too much. Use preImportDeleteQuery to delete only the entries whose type field carries the value you set for that entity.
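A minimal sketch combining both suggestions; the type values are placeholders and query1/query2 stand for your real queries:
<document>
  <!-- TemplateTransformer stamps a static "type" value on every document;
       preImportDeleteQuery limits clean-up to this entity's own documents -->
  <entity name="entity1" query="query1"
          transformer="TemplateTransformer"
          preImportDeleteQuery="type:entity1">
    <field column="type" template="entity1"/>
    <field column="column1" name="column1" />
    <field column="column2" name="column2" />
  </entity>
  <entity name="entity2" query="query2"
          transformer="TemplateTransformer"
          preImportDeleteQuery="type:entity2">
    <field column="type" template="entity2"/>
    <field column="column1" name="column1" />
    <field column="column2" name="column2" />
  </entity>
</document>
At query time you can then filter per source with, for example, fq=type:entity1.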