I have some questions regarding Solr schema design. I'm setting up a search engine for a product catalogue website, and my table relationships are as follows.
Product Belongs to Merchant
Product Belongs to Brand
Product has and belongs to many Categories
Category has many Sub Categories
Sub Category has many Types
Type has many Sub Types
So far my schema.xml looks like this.
<field name="product_id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="string" indexed="true" stored="true"/>
<field name="merchant" type="string" indexed="true" stored="true"/>
<field name="merchant_id" type="string" indexed="true" stored="true"/>
<field name="brand" type="string" indexed="true" stored="true"/>
<field name="brand_id" type="string" indexed="true" stored="true"/>
<field name="categories" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="sub_categories" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="types" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="sub_types" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<field name="description" type="text" indexed="true" stored="true"/>
<field name="image" type="text" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<uniqueKey>product_id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="name" dest="text"/>
<copyField source="merchant" dest="text"/>
<copyField source="brand" dest="text"/>
<copyField source="categories" dest="text"/>
<copyField source="sub_categories" dest="text"/>
<copyField source="types" dest="text"/>
<copyField source="sub_types" dest="text"/>
So my Questions now:
1) Is the Schema correct?
2) Let's assume I need to find products for Category XYZ. My senior programmer doesn't like querying Solr by category name; instead he wants to use the category ID.
He suggests storing CategoryID_CategoryName (e.g. 1001_Category XYZ) and sending only the ID from the web front end (on the assumption that names containing white space don't work properly).
So to find the products I would have to do a partial match on categories and extract the category ID from the string, i.e. fetch 1001 from 1001_Category XYZ,
or
What if I keep the names in the categories field and set up another field for category_ids? That seems like the better option to me.
or
Is there any Solr multi-valued field type that stores CategoryID and CategoryName together?
Let me know your thoughts, thanks.
Answers to your questions.
1) Maybe - it depends on how you plan to structure your queries, what you intend to search and what you intend to retrieve in search results. In your schema you're storing and indexing everything, which can be quite inefficient. Index what you intend to query; store what you intend to retrieve/display. If you're looking for optimizations, I would review the data types used in the schema - try to stay as close to the source type as you can.
2) Querying by CategoryId - your programmer is correct, you want to query by category Id. Your approach of storing Ids and Names in separate fields is sound as well. Presuming your Id-based fields are integers/longs, define them with a numeric field type rather than as strings.
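A minimal sketch of the separate-field approach described above. It assumes your schema defines a numeric field type named int (the type name varies between Solr versions); category_ids is the field name proposed in the question.

```xml
<!-- Assumption: a numeric field type named "int" is defined in this schema. -->
<!-- Names stay in "categories" for display and for the copyField into "text". -->
<field name="categories" type="string" multiValued="true" indexed="true" stored="true"/>
<!-- IDs go in a parallel multiValued field that is used for filtering. -->
<field name="category_ids" type="int" multiValued="true" indexed="true" stored="true"/>
```

A filter such as fq=category_ids:1001 would then select the category without any string parsing on the client side.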
Related
I am having trouble with Solr 8.5.2 when providing a word in a query. It's fine when the query is *:*. But when I put in a word, it does not hit any document.
Here is my schema.xml config.
<field name="quoteid" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="quotenumber" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="formdata" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="creationtimeintickssinceepoch" type="plong" indexed="true" stored="true"/>
<field name="_version_" type="plong" indexed="false" stored="false"/>
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
Here is a sample document (the formdata field is actually a JSON string, as you can see).
{
"quoteid":"466f4dea-XXXX-443c-b1e4-XXXXXXX",
"quotenumber":"NAAAAA",
"creationtimeintickssinceepoch":15927195449809739,
"formdata":"{\"formModel\": {\"SomeProperty0\":\"somevalue\",\"SomeProperty1\":\"somevalue\",\"SomeProperty2\":\"somevalue\"}"...blahblahblah here,
"_version_":1670089165635584000}
I tried entering NAAAAA, no results. I tried 'SomeProperty1', no results too.
If you don't give a field name in your query, and you aren't using dismax or edismax with the qf argument, the default search field is used. It is usually named _text_; older Solr versions let you configure it in the schema, but it is normally given as the df parameter of the default query handler. Nothing in the schema you've shown copies your fields into _text_, so a bare word has nothing to match there.
You'll need to include the field name when you're querying other fields: quotenumber:NAAAAA gets hits in the quotenumber field.
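If you'd rather have queries without a field name match these fields, one option is to copy them into the default field. This is only a sketch: it assumes _text_ is the default search field (df) of your request handler, and you would need to reindex after changing the schema.

```xml
<!-- Assumption: _text_ is the default search field (df) of the request handler. -->
<!-- _text_ is multiValued in the schema above, so it can take several sources. -->
<copyField source="quotenumber" dest="_text_"/>
<copyField source="formdata" dest="_text_"/>
```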
I have a set of documents in a Solr index that have the fields, exact_title and alternative_title. I want to be able to search them by using the field title.
So in other words the query title:Hello World should return documents that have an exact_title or an alternative_title "Hello World"
Is it possible to define an alias for a field at indexing time?
I solved it by defining copy fields in the schema.xml file.
Example:
<field name="title_txt" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="exact_title_txt" type="text_general" indexed="true" stored="true" multiValued="false"/>
<field name="alternative_title_txt" type="text_general" indexed="true" stored="true" multiValued="false"/>
<copyField source="exact_title_txt" dest="title_txt"/>
<copyField source="alternative_title_txt" dest="title_txt"/>
In our scenario we receive an unknown postal address in a string format with an unknown address format. Our need is to run the search with the given postal address over all the fields and find the best match for the query.
However, if we don't have an exact match for the 4 mandatory fields - meaning SOLR returns similar results (for at least 1 mandatory field), then NO results should be displayed.
The 4 mandatory fields are BuildingNumber, LocPressName, County and PostalDistrict defined with the other search fields in the schema.xml file as follows -
<field name="uid" stored="true" indexed="true" type="uuid" default="NEW"/>
<field name="UnitNumber" stored="true" indexed="true" type="text_general"/>
<field name="UnitName" stored="true" indexed="true" type="text_general"/>
<field name="BuildingNumber" stored="true" indexed="true" type="exactish"/>
<field name="BuildingName" stored="true" indexed="true" type="text_general"/>
<field name="LocPressName" stored="true" indexed="true" type="exactish"/>
<field name="PostalDistrict" stored="true" indexed="true" type="exactish"/>
<field name="County" stored="true" indexed="true" type="exactish"/>
<field name="AddressId" stored="true" indexed="true" type="text_general"/>
<field name="ExchangeCode" stored="true" indexed="true" type="text_general"/>
<field name="PreviousCustomerName" stored="true" indexed="true" type="text_general"/>
<field name="Eircode" stored="true" indexed="true" type="text_general"/>
I am fairly new to Solr and I am not sure how to generate this query that produces the best results only if it finds a match for ALL FOUR mandatory fields.
Without the exact definition of your exactish field type it's hard to say, but assuming that it's a StrField, the basic, explicit version is:
q=(BuildingNumber:18 AND LocPressName:Foo AND
County:Forthershire AND PostalDistrict:Bar) AND searchField:Query
.. where searchField is a field where everything you want to search as a text_general field has been copied. You can replace this with all the other fields if needed.
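A sketch of how such a catch-all field could be defined, using the text_general type and field names from your schema. The name searchField is only a placeholder, and which source fields you copy is up to you.

```xml
<!-- Hypothetical catch-all field; indexed for searching, not stored. -->
<field name="searchField" type="text_general" indexed="true" stored="false" multiValued="true"/>
<!-- Copy whichever free-text fields should be searchable (a partial list). -->
<copyField source="UnitNumber" dest="searchField"/>
<copyField source="UnitName" dest="searchField"/>
<copyField source="BuildingName" dest="searchField"/>
<copyField source="Eircode" dest="searchField"/>
```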
Another option:
q=Query&defType=edismax&qf=UnitNumber UnitName .. etc&fq=BuildingNumber:18 AND
LocPressName:Foo AND County:Forthershire AND PostalDistrict:Bar
This works the same way, but allows free-form querying through the edismax query parser. The fq applies a filter to your result set: a document has to match the filter to be considered for the result set at all. It does not, however, affect how a document is scored.
I am using Solr and I have a schema something like this:
<fields>
<field name="Id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="Username" type="text_general" indexed="true" stored="true" omitNorms="true" multiValued="false"/>
<field name="ServerName" type="text_general" indexed="true" stored="true" multiValued="false" />
</fields>
I want to use faceting to get the number of users for each server.
How can I do that?
Desired result:
server 1 : 200 (userNumber)
server 2: 300
and so on...
Thank you
This is not a complete solution, as I do not have your data and schema. But I think what you need is pivot faceting: http://wiki.apache.org/solr/SimpleFacetParameters#Pivot_.28ie_Decision_Tree.29_Faceting
So you need to do something like this (again, you will need to adjust it to make it work for you):
http://ip:port/solr/collection1/select?q=*:*&rows=0&facet=true&facet.pivot=Username,ServerName
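One thing to watch, as an assumption about your setup: ServerName is a text_general field, so facet counts would be computed per token rather than per whole server name. A sketch of an untokenized copy to facet on (ServerName_s is a hypothetical field name, and it assumes the string type is defined in your schema, as it is for Id):

```xml
<!-- Hypothetical untokenized copy of ServerName, used only for faceting. -->
<field name="ServerName_s" type="string" indexed="true" stored="false" multiValued="false"/>
<copyField source="ServerName" dest="ServerName_s"/>
```

You would then reference ServerName_s instead of ServerName in the facet parameters, after reindexing.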
I indexed a collection of archived websites for querying with Solr. As the unique key I use the URLs of the sites. What I would like to do is use the url field in filter queries to limit the search to a certain domain when needed. For example, I want to query for "Barack Obama" but limit the results to the "whitehouse.gov" domain. It sounds like a pretty basic use case to me, but searches on the url field do not return any results at all. Here is my config (schema.xml):
.
.
.
<field name="collection" type="string" indexed="true" stored="true"/>
<field name="content" type="text_de" indexed="true" stored="true" multiValued="true"/>
<field name="date" type="string" indexed="true" stored="true"/>
<field name="digest" type="string" indexed="true" stored="true"/>
<field name="length" type="string" indexed="true" stored="true"/>
<field name="segment" type="string" indexed="true" stored="true"/>
<field name="site" type="string" indexed="true" stored="true"/>
<field name="title" type="text_de" indexed="true" stored="true" multiValued="true"/>
<field name="type" type="string" indexed="true" stored="true"/>
<field name="url" type="text_en_splitting" indexed="true" stored="true"/>
.
.
.
<!-- Field to use to determine and enforce document uniqueness.
Unless this field is marked with required="false", it will be a required field
-->
<uniqueKey>url</uniqueKey>
And here is my query (simplified):
http://mysolrserver.com:8983/solr/select/?q=content:Barack+Obama&fq=url:whitehouse.gov
The query analyzer tells me that my query should match.
Does anyone have an idea why this is not working? I'd highly appreciate any hints I can get. Thanks a lot!
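The fq=url:whitehouse.gov filter should work.
However, I see a problem with the query q=content:Barack+Obama.
What's your default search field?
Does removing the query component and using q=*:* return results for you?
The query q=content:Barack+Obama is actually parsed as something like content:barack defaultsearchfield:obama, because only the first term is bound to the content field.
If the default search field does not contain obama, that clause matches nothing, and with AND as the default operator the whole query returns no results. Grouping the terms, as in q=content:(Barack+Obama), keeps both of them on the content field.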