How to make a range query properly in Solr

Let's say there is a car dealer website where a dealer lists his car inventory for sale. Each car has a different price range based on its trim, such as LE, SE, XLE or XSE. For example:
             | LE            | SE            | XLE           | XSE
Toyota Camry | 15000 – 20000 | 25000 – 30000 | 35000 – 40000 | 41000 – 50000
Toyota REV4  | 18000 – 21000 | 24500 – 27000 | 28000 – 33000 | 34000 – 36000
Each row here is one document in Solr. I store the price ranges in the Solr document like this:
For the Toyota Camry document:
prizeMin:[15000, 25000, 35000, 41000], prizeMax:[20000, 30000, 40000, 50000]
For the Toyota REV4 document:
prizeMin:[18000, 24500, 28000, 34000], prizeMax:[21000, 27000, 33000, 36000]
I have a price facet with Min and Max text boxes where the user enters a price range. If the user enters 15000 to 17000, I want to show only the Toyota Camry (its LE trim falls within this range) and not the Toyota REV4. If the user enters 26000 to 40000, both cars should be displayed (the Camry matches on SE and XLE, and the REV4 matches on SE, XLE and XSE).
I can do this with a range query to Solr such as prize:[UserEnterMin TO *] AND prize:[* TO UserEnterMax].
However, if the user enters 22000 to 23000, I do not want to display anything, because this range does not fall within any price range in the table. With my solution, prize:[UserEnterMin TO *] AND prize:[* TO UserEnterMax], I cannot prevent these documents from being displayed.
So my question is: how can I detect that the user has entered a price range that falls in a gap, and how can I exclude those documents using a Solr range query?

This can be done with the well-known technique of using geo functionality to model other things: see here
That link is pretty old. Since then geo stuff has been augmented and refactored, and new field types are available. You can probably model this in several different ways with those types.
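To make the failure mode in the question concrete, here is a small Python sketch (not Solr code) that contrasts the two matching rules. The function names are made up for illustration. As far as I understand it, the geo-style modelling helps because each (min, max) pair is indexed as one unit, which is what `per_trim_match` imitates; `flattened_match` imitates range clauses evaluated against two independent multi-valued fields.

```python
def flattened_match(price_mins, price_maxs, user_min, user_max):
    """Each clause may be satisfied by a different trim (the problem case)."""
    return (any(lo <= user_max for lo in price_mins)
            and any(hi >= user_min for hi in price_maxs))


def per_trim_match(price_mins, price_maxs, user_min, user_max):
    """Both bounds must be satisfied by the same (min, max) pair."""
    return any(lo <= user_max and hi >= user_min
               for lo, hi in zip(price_mins, price_maxs))


# Values copied from the two documents in the question.
camry = ([15000, 25000, 35000, 41000], [20000, 30000, 40000, 50000])
rev4 = ([18000, 24500, 28000, 34000], [21000, 27000, 33000, 36000])

# 22000 to 23000 falls in a gap for both cars.
print(flattened_match(*camry, 22000, 23000))  # True: false positive
print(per_trim_match(*camry, 22000, 23000))   # False: correctly excluded
```

Whatever field type you end up using, the goal is the second behaviour: the lower and upper bound have to be tested against the same trim.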


BI - fact table design with incompatible grains

I'm quite new to BI and database design, and there are some points I do not understand well.
I'm trying to import French census data, where I have the population of each city. For each city, the population is given under different age classifications that can't really be related to each other.
For instance, let's say that one classification is 00 to 20 years old, 21 to 59, and 60+.
The other is much more precise: 00 to 02, 03 to 05, etc., but its bounds never line up with those of the first classification: I don't have 15 to 20, but 18 to 22, for example.
So those 2 classifications are incompatible. How can I use them in my fact table? Should I use 2 fact tables and 2 cubes? Should I use one fact table and 2 dimensions for 1 cube? But in that case, I will double count facts when I sum to get the total population of a city, won't I?
This is national census data with national classifications, so changing them or estimating populations to merge the classifications is not an option. And to be clear, one row doesn't relate to one person but to one city. My facts are not individuals but cities' populations.
So this table looks like:
Line 1: One city - one population count - one code for the age dimension (e.g. 00 to 19 years old) of this population - code (m/f) for the gender dimension of that population - date of the census
Line 2: Same city - one population count - one code for the age dimension (e.g. 20 to 34) of this population - code (m/f) for the gender dimension - date of the census
And so it goes for a lot of cities, both genders, and multiple years.
I hope this question is clear enough, as English is not my native language and I'm quite new to DB and BI!
Thanks for helping me with that.
One possible solution using a single fact table and two dimensions for the age ranges:
1 - Categorical range based on the broadest census, for example:
Young 0-20
Adult 21-59
Senior 60+
You could then link the other census to this dimension with approximate values, for example 18-22 could be Young.
2 - Original age range. This dimension could be used for precise age ranges when you report on a single city. It can also help you evaluate the impact of the overlapping bounds (e.g. how many rows are in the Young / 18-22 range?).
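A rough Python sketch of the single-fact-table idea, to show where double counting comes from and how it can be avoided. Everything here is an assumption for illustration: the `source` marker that records which census classification a row came from, the city name, and the population figures are all invented.

```python
# (city, source, original_age_range, broad_category, population)
# "source" is a hypothetical marker for the classification a row belongs to.
facts = [
    ("City A", "broad", "00-20", "Young", 300),
    ("City A", "broad", "21-59", "Adult", 500),
    ("City A", "broad", "60+", "Senior", 200),
    ("City A", "detailed", "00-17", "Young", 250),
    ("City A", "detailed", "18-22", "Young", 80),   # approximate mapping
    ("City A", "detailed", "23-59", "Adult", 470),
    ("City A", "detailed", "60+", "Senior", 200),
]


def total_population(rows, city, source=None):
    """Sum population for a city, optionally restricted to one classification."""
    return sum(pop for c, s, _, _, pop in rows
               if c == city and (source is None or s == source))


def population_by_category(rows, city, source):
    """Roll one classification up to the broad categorical dimension."""
    totals = {}
    for c, s, _, category, pop in rows:
        if c == city and s == source:
            totals[category] = totals.get(category, 0) + pop
    return totals


print(total_population(facts, "City A"))            # 2000: double counted
print(total_population(facts, "City A", "broad"))   # 1000
print(population_by_category(facts, "City A", "detailed"))
```

In this made-up data the detailed classification rolls up to 330 for Young against 300 in the broad one, which is the kind of overlap effect the second dimension lets you measure.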
You can create one dimension as below:
young 1-20
adult 21-59
senior 60+
The classification is:
young city 1 : 1-20
young city 2 : 4-23
id | field1 | field2 | field3       | field4       | .......
1  | 1      | year   | young_city_1 | other        | .......
2  | 2      | year   | young_city_1 | other        | .......
3  | 3      | year   | young_city_1 | other        | .......
4  | 4      | year   | young_city_1 | young_city_2 | .......
Now you can report on any item and with any division.
I hope this helps you.

Build complex queries with multiple fields in cloudant

Date       | State      | City     | Zip  | Water  | Weight
-------------------------------------------------------------------
01/01/2016 | Arizona    | Chandler | 1011 | 10 ltr | 40 kg
01/04/2016 | Arizona    | Mesa     | 1012 | 20 ltr | 50 kg
06/05/2015 | Washington | Spokane  | 1013 | 30 ltr | 44 kg
06/08/2015 | Washington | Spokane  | 1013 | 30 ltr | 44 kg
What I want are complex queries. For example, I want to know the average water and weight by passing a city, state or zip for a date range or month, or by any field or all fields.
I am not sure how to go about this. I have read about MapReduce, but I can't work out how I would get the output above.
If you have links to examples that cover these scenarios, that would also help.
Thanks in advance
So first we need to model your structured data in JSON. Something like this would work:
{
  "date": "2016-01-01",
  "location": "Arizona Chandler",
  "pressure": 1101,
  "water": 10,
  "weight": 40
}
Here's your data in a Cloudant database:
https://reader.cloudant.com/so37613808/_all_docs?include_docs=true
Next we'd need to create a MapReduce view to aggregate a specific field by date. A map function to create an index whose key is the date and whose value is the water would look like this:
function(doc) {
  // Index each document under its date, with the water amount as the value.
  emit(doc.date, doc.water);
}
Every key/value pair emitted from the map function is added to an index which can be queried later in its entirety or by a range of keys (keys which in this case represent a date).
And if an average is required we would use the built-in _stats reducer. The Map and Reduce portions are expressed in a Design Document like this one: https://reader.cloudant.com/so37613808/_design/aggregate
The resulting index allows us to get an aggregate across the whole data set with:
https://reader.cloudant.com/so37613808/_design/aggregate/_view/waterbydate
Dividing the sum by the count gives us an average.
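As a small sketch of that division in Python: the `_stats` reducer returns an object with `sum`, `count`, `min`, `max` and `sumsqr`. The response below is not fetched from the live database; its values are worked out by hand from the four water readings in the question (10, 20, 30, 30), and `average_from_stats` is a name chosen for this example.

```python
def average_from_stats(view_response):
    """Compute the mean from the single reduced row of a _stats view."""
    stats = view_response["rows"][0]["value"]
    return stats["sum"] / stats["count"]


# Hand-built stand-in for the JSON a reduced view query would return.
sample_response = {
    "rows": [
        {"key": None,
         "value": {"sum": 90, "count": 4, "min": 10, "max": 30, "sumsqr": 2300}}
    ]
}

print(average_from_stats(sample_response))  # 22.5
```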
We can use the same index to provide data grouped by keys too:
https://reader.cloudant.com/so37613808/_design/aggregate/_view/waterbydate?group=true
Or we can select a portion of the data by supplying startkey and endkey parameters:
https://reader.cloudant.com/so37613808/_design/aggregate/_view/waterbydate?startkey=%222016-01-01%22&endkey=%222016-06-03%22
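The `%22` sequences in that URL are URL-encoded double quotes: view keys are passed as JSON values, so a string key keeps its quotes. A minimal Python sketch for building such URLs follows; `view_url` is a helper invented for this example, and it only constructs the URL, it does not send a request.

```python
import json
from urllib.parse import urlencode

VIEW_URL = ("https://reader.cloudant.com/so37613808"
            "/_design/aggregate/_view/waterbydate")


def view_url(startkey=None, endkey=None, group=False):
    """Build a view query URL; keys are JSON-encoded before URL-encoding."""
    params = {}
    if startkey is not None:
        params["startkey"] = json.dumps(startkey)
    if endkey is not None:
        params["endkey"] = json.dumps(endkey)
    if group:
        params["group"] = "true"
    return VIEW_URL + ("?" + urlencode(params) if params else "")


print(view_url("2016-01-01", "2016-06-03"))
```

Because the dates are stored as ISO-style strings (year first), a key range over them is also a correct date range.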
See https://docs.cloudant.com/creating_views.html for more details.

Correct Database Architecture

I don't know how to design my MySQL web database for a shop.
The scenario is a site selling guided tours.
Each tour can be either a Private, a Semi-Private or a Group tour. The price per person changes per tour type. But for the Private tours, the price also varies depending on the number of persons, and it varies by different amounts depending on the tour. How would I create a 'Tour/Product' record?
e.g. Let's say:
Tour of Vatican (the tour has various bits of data: name, description, meeting point, duration, etc.). The Semi-Private tour costs 50 euro per person. The Group tour costs 45 euro per person. The Private tour costs 140 euro for 1-2 people, 180 euro for 3 people, 200 euro for 4 people, 225 euro for 5 people, 240 euro for 6 people, and for 7 people or more it costs 43 euro per person.
However, for the Tour of Coliseum (the tour has the same bits of data: name, description, meeting point, duration, etc.), the Semi-Private tour costs 40 euro per person. The Group tour costs 25 euro per person. The Private tour costs 100 euro for 1-2 people, 135 euro for 3 people, 160 euro for 4 people, 175 euro for 5 people, 180 euro for 6 people, and for 7 people or more it costs 25 euro per person.
How would I structure the data in the database: 2 tables? 3 tables?
Totally confused....
Thanks
Tom
From what I understand from your post, the price depends on three things:
The tour: the price for the Tour of Vatican is not the same as the price for the Tour of Coliseum.
The type of the tour: Private, Semi-Private or Group.
The number of persons on the tour.
Since there is no constant price per person for any of the given options, I would go for a three-table approach.
The diagram is detailed in the picture below and works under the following assumptions:
There are three tables: Tour (containing the description of each individual tour), TourPriceOptions (containing the individual price option records) and TourType (which will always contain just three records: Private, Semi-Private and Group Tour).
There are just two assumptions that you have to make:
A tour can have multiple price options (one-to-many relationship).
A price option has exactly one tour type (many-to-one relationship: each price option references a single TourType record).
How to code this up:
Whenever the administrator of the store creates a new tour in the backend, he should be able to add multiple price options. To do this you will need to:
Create a new tour: a function which inserts a new entry into the Tour table.
Get the id of the newly created tour: if only one person adds information at any given time, it is a safe bet to write a function that returns the id of the most recently added tour.
Add pricing options based on the id_tour: insert a new price option using the id_tour value. Remember to assign a tour_type from one of the predefined categories.
Whenever you want to return these values, just write a query that retrieves the information for the tour the user is currently browsing.
Additional things to research: Dynamic Forms. They will help when you don't know how many price options an admin might want to add for a specific tour.
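The three-table approach and the steps above can be sketched as follows. This is only an illustration under several assumptions: it uses Python's built-in sqlite3 instead of MySQL so that it runs anywhere, and the columns min_persons, max_persons and per_person are my own guess at how to encode the person-count tiers. It also assumes the Private prices for 1 to 6 people are totals for the party, which the question does not state explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TourType (id_tour_type INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE Tour (id_tour INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE TourPriceOptions (
    id_price_option INTEGER PRIMARY KEY,
    id_tour INTEGER NOT NULL REFERENCES Tour(id_tour),
    id_tour_type INTEGER NOT NULL REFERENCES TourType(id_tour_type),
    min_persons INTEGER NOT NULL,
    max_persons INTEGER,         -- NULL means no upper limit
    price REAL NOT NULL,
    per_person INTEGER NOT NULL  -- 1: price per person, 0: price for the party
);
INSERT INTO TourType (name) VALUES ('Private'), ('Semi-Private'), ('Group');
""")


def add_tour(name, description):
    """Insert a tour and return its id (steps 1 and 2 above)."""
    cur = conn.execute("INSERT INTO Tour (name, description) VALUES (?, ?)",
                       (name, description))
    return cur.lastrowid


def add_price_option(id_tour, tour_type, min_p, max_p, price, per_person):
    """Insert one price option for a tour (step 3 above)."""
    conn.execute(
        "INSERT INTO TourPriceOptions "
        "(id_tour, id_tour_type, min_persons, max_persons, price, per_person) "
        "SELECT ?, id_tour_type, ?, ?, ?, ? FROM TourType WHERE name = ?",
        (id_tour, min_p, max_p, price, int(per_person), tour_type))


def total_price(id_tour, tour_type, persons):
    """Look up the matching option and return the total for the party."""
    row = conn.execute(
        "SELECT p.price, p.per_person FROM TourPriceOptions p "
        "JOIN TourType t ON t.id_tour_type = p.id_tour_type "
        "WHERE p.id_tour = ? AND t.name = ? AND p.min_persons <= ? "
        "AND (p.max_persons IS NULL OR p.max_persons >= ?)",
        (id_tour, tour_type, persons, persons)).fetchone()
    if row is None:
        return None
    price, per_person = row
    return price * persons if per_person else price


vatican = add_tour("Tour of Vatican", "Guided tour")
add_price_option(vatican, "Semi-Private", 1, None, 50, True)
add_price_option(vatican, "Group", 1, None, 45, True)
for low, high, price in [(1, 2, 140), (3, 3, 180), (4, 4, 200),
                         (5, 5, 225), (6, 6, 240)]:
    add_price_option(vatican, "Private", low, high, price, False)
add_price_option(vatican, "Private", 7, None, 43, True)

print(total_price(vatican, "Private", 3))  # 180.0
print(total_price(vatican, "Private", 8))  # 344.0
```

The Tour of Coliseum would be a second row in Tour with its own set of price options; no schema change is needed when the amounts differ per tour.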

Having trouble sorting without grouping/field collapsing in Solr

Is it possible to do a compound sort in solr without Field Collapsing?
If I have two car models, Ford and Chevy, can I sort first on Ford where price is less than 2,000, then Ford > 2,000, then the Chevy models? I would like to do this without grouping, and without applying a price sort to the Chevy models.
For example, something like &sort=Model:"Ford" AND price:[0 TO 2000]
so that I get:
Ford 1, $1000
Ford 2, $500
Ford 2, $1500
_________
Ford 3, $3000
Ford 3, $5000
_______
Chevy 1
Chevy 2
Chevy 3
I've tinkered a bit with this, and I've come up with a solution based on the query() function, since you can use that together with sorting. I'm not sure about the performance, and depending on the number of documents in your index, that might not be important, so the only way is to try it and see if it performs. I've used name and price as my two fields in the schema, which I think would map to your Model and price fields.
The way sort works is that each clause is evaluated in order, so that the first sort description is performed first, then the next one if there's a draw, and so on.
I've removed url escaping and formatted everything a bit:
sort=query($sq1,0) desc,query($sq2,0) desc
  &sq1=name:Ford* AND price:[0 TO 1500]
  &sq2=name:Ford*
This implies that the first sort is performed on the query named in the sq1= URL parameter, but if there's a draw (which there will be if there isn't a match), the query named under sq2= will be used ($sq1 and $sq2 refer to these two queries, and Solr makes a simple substitution before evaluating the query() function).
I haven't provided a default sort order, but you could add name asc as a default sort. The 0 as the second argument to query() is a value that the sort will use if there isn't a match from the query (otherwise it'll use the score from the query). You could feed this value into product() and multiply with the price, to sort each of the "buckets" by price as well if needed.
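The clause-by-clause evaluation can be pictured as sorting on a tuple of keys. The Python sketch below is only an analogy, not Solr code: it uses plain booleans where Solr's query() would return a score (so documents inside one bucket tie here, while in Solr they may be ordered by score). It uses the 2,000 limit from the question, and the Chevy prices are invented because the question does not give any.

```python
cars = [
    ("Chevy 1", 900), ("Ford 3", 3000), ("Ford 1", 1000), ("Chevy 2", 2500),
    ("Ford 2", 500), ("Ford 3", 5000), ("Chevy 3", 1200), ("Ford 2", 1500),
]


def bucket_key(car):
    """First key: cheap Ford or not. Second key: any Ford or not."""
    name, price = car
    is_ford = name.startswith("Ford")
    cheap_ford = is_ford and price <= 2000
    # False sorts before True, so negate to put matching cars first.
    return (not cheap_ford, not is_ford)


# sorted() is stable, so cars within a bucket keep their input order,
# which means the Chevy models are not sorted by price.
ordered = sorted(cars, key=bucket_key)
for name, price in ordered:
    print(name, price)
```

The second key is only consulted when the first one ties, which is the behaviour described for the sq1 and sq2 clauses.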

Grouping results and keeping facet counts consistent

Using Solr 3.3
Key | Store      | Item Name | Description       | Category         | Price
=========================================================================
1   | Store Name | Xbox 360  | Nice game machine | Electronic Games | 199.99
2   | Store Name | Xbox 360  | Nice game machine | Electronic Games | 199.99
3   | Store Name | Xbox 360  | Nice game machine | Electronic Games | 249.99
I have data similar to the table above loaded into Solr. Item Name, Description, Category and Price are searchable.
Expected result
Facet Field
Category
Electronic(1)
Games(1)
**Store Name**
XBox 360 Nice game machine priced from 199.99 - 249.99
What query parameters can I send to Solr to receive the results above? Basically I want to group by Store, Item Name and Description, with the min and max price.
I also want to keep paging consistent with the main group (Store Name). The paging should be based on the Store Name group, so if 20 stores were found, I should be able to page through them correctly.
Please suggest.
If using Solr 4.0, the new "Grouping" (which replaces FieldCollapsing) fixes this issue when you add the parameter "group.facet=true".
So to group your fields you would add the following parameters to your search request:
group=true // Enables grouping
group.facet=true // Facet counts to be number of groups instead of documents
group.field=Store // Groups results by the field "Store"
group.ngroups=true // Tells Solr to return the number of groups found
The number of groups found is what you would show to the user and use for paging, instead of the normal total count, which would be the total number of matching documents.
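A short Python sketch of how those parameters and the group count might be used for paging. It only builds the query string and reads a hand-written response dictionary; it does not talk to a Solr server. The helper names are invented, and the facet.field value Category is taken from the table in the question. With grouping enabled, start and rows count groups rather than documents.

```python
import math
from urllib.parse import urlencode


def grouped_query(q, page, stores_per_page):
    """Build the query string for one page of Store groups (pages start at 1)."""
    return urlencode([
        ("q", q),
        ("group", "true"),
        ("group.field", "Store"),
        ("group.ngroups", "true"),
        ("group.facet", "true"),
        ("facet", "true"),
        ("facet.field", "Category"),
        ("start", (page - 1) * stores_per_page),
        ("rows", stores_per_page),
    ])


def page_count(response, stores_per_page):
    """Number of pages, based on the number of groups instead of documents."""
    ngroups = response["grouped"]["Store"]["ngroups"]
    return math.ceil(ngroups / stores_per_page)


# Hand-written stand-in for a grouped response: 57 documents in 20 stores.
sample = {"grouped": {"Store": {"matches": 57, "ngroups": 20, "groups": []}}}

print(grouped_query("xbox", 2, 10))
print(page_count(sample, 8))  # 3
```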
Have you looked into field collapsing? It is new in Solr 3.3.
http://wiki.apache.org/solr/FieldCollapsing
What I did was create another stored field that combines the required fields into a single field. Problem solved: now I group only on that field and I get the correct count.
