User search pricing calculation - solr

I'm building a search engine that returns a list of cab drivers. We have some requirements:
The user is searching for the cheapest cab driver to bring him from place A to place B. He can go from any place to any place.
The default formula would be distance * price per mile.
But there are also special prices, e.g. AMSTERDAM to THE HAGUE would always be 100 EUR.
The price per mile is season-based: winter and summer have different prices.
Faceted search based on attributes, such as Champagne/Luxury/Male/Female driver, etc.
The user wants to sort by cheapest ride, but also by distance.
What would be the best approach to fit all these requirements? I've tried Solr but have not found a good solution for putting the price model in there. Any ideas?
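To make the pricing rules concrete, here is a minimal Python sketch of the model as described, independent of Solr. The names (`Driver`, `SPECIAL_ROUTES`, `ride_price`), the per-driver seasonal rates, the global special-route table, and the October to March definition of winter are all assumptions for illustration, not part of the requirements.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical fixed-price routes: (origin, destination) -> price in EUR.
SPECIAL_ROUTES = {("AMSTERDAM", "THE HAGUE"): 100.0}


@dataclass
class Driver:
    name: str
    summer_rate: float  # EUR per mile (assumed to be set per driver)
    winter_rate: float  # EUR per mile (assumed to be set per driver)


def is_winter(day: date) -> bool:
    # Assumption: "winter" means October through March.
    return day.month >= 10 or day.month <= 3


def ride_price(driver: Driver, origin: str, destination: str,
               miles: float, day: date) -> float:
    # A special route overrides the default distance-based formula.
    fixed = SPECIAL_ROUTES.get((origin.upper(), destination.upper()))
    if fixed is not None:
        return fixed
    rate = driver.winter_rate if is_winter(day) else driver.summer_rate
    return miles * rate


def cheapest_first(drivers, origin, destination, miles, day):
    # Sort drivers by the computed price for this specific ride.
    return sorted(
        drivers,
        key=lambda d: ride_price(d, origin, destination, miles, day),
    )
```

Because the price depends on the query (route, distance, date), it cannot simply be stored as a static field per driver, which is what makes sorting on it inside the search engine awkward.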

Related

how can I filter capital letters in a data set

I have a column with more than 150k rows, and each cell contains text. Some cells have a problem: they contain sentences written in capital letters, and I want to fix this. How can I filter them to find out how many I have? For example, I have cells like the one below, and I want to detect the cells that contain capitalized sentences so that I can fix them:
Each year, They carefully curate the finest gifts to fill our Baskits
from new and wellloved brands, to unique products made exclusively for
us. They specialize in helping busy professionals give thoughtful,
impactful gifts for business development, colleague/employee
recognition, holiday gifts and more. Here are the top three things to
know about us: 1) They ARE CANADA'S LEADING GIFT DELIVERY SERVICE 30+
years of experience and 20, 000 customers 2) They MAKE THOUGHTFUL
GIFTING QUICK AND EASY Online and Mobile Webstore (Open 24/7) Call
Centre Gift Specialists Two Retail Stores (Downtown + North Toronto)
3) They HAVE DELIVERY OPTIONS TO SUIT YOUR NEEDS Delivery Across North
America sameday (and Saturday) Delivery in the GTA
try in A2:
=INDEX(REGEXMATCH(A2:A; "(.*)[A-Z]{2}(.*)"))
or if you want a count:
=SUMPRODUCT(1*REGEXMATCH(A2:A; "(.*)[A-Z]{2}(.*)"))
Since it looks like you're in Google Sheets, just do a REGEXMATCH() for two capital letters in a row (as a review flag):
=BYROW(A2:A, LAMBDA(x, REGEXMATCH(x, "(.*)[A-Z]{2}(.*)")))
The BYROW() makes it a one-liner for the entire column. Ditch that if needed.
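If the column is ever exported out of Sheets, the same "two capital letters in a row" check can be sketched in Python. This is only an illustration of the same regex idea; the function names are made up here, and the input is assumed to be a plain list of cell strings.

```python
import re

# Same idea as the Sheets formulas: two consecutive ASCII capital letters.
TWO_CAPS = re.compile(r"[A-Z]{2}")


def flag_cells(cells):
    # One True/False per cell, like the array formula output.
    return [bool(TWO_CAPS.search(cell)) for cell in cells]


def count_flagged(cells):
    # Equivalent of the SUMPRODUCT count.
    return sum(flag_cells(cells))
```

Note that this is only a review flag: acronyms such as GTA also match, so flagged cells still need a manual look.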

Is there a dataset for products (UPC/EAN level) and their recycling information?

I am looking to do some analysis around plastic recycling and would like to know if there is any dataset that gives recycling information for products sold in the US. For example: a product with a given UPC/EAN number has a resin code of 1 (the number written at the bottom of a plastic container). If you have any ideas on how to start creating such a dataset, that would be helpful as well. I understand there is something out there that gives information for a generic 1 gallon milk container, but I am looking for information at a brand/manufacturer level.
Thanks

Calculating dynamic pricing on Google Sheets

I have imported data from a trading exchange listing sellers of a particular cryptocurrency.
From this data, I want to create dynamic pricing that displays the average cost of an order based on a given order size.
I will give an example of what I am looking for:
Example dataset
Within this example, we would be purchasing the cryptocurrency 'SINS'. As per the data shown in this table, if 29.06 SINS was purchased, that would fill the first order, and the total BTC paid would be 0.00459 BTC.
If an order was placed for 145 SINS, it would fill the orders up to row 12 and partially fill the order in row 13. By calculating that manually, I know that would cost 0.02293365 BTC (calculated using col D) at an average price of 0.00015816 per SIN.
What I would like to achieve: when a number is entered in a cell, the sheet returns the average price of an order of that size, based on the orders imported from the trading exchange.
=INDIRECT(ADDRESS(MATCH(VLOOKUP(O2,F2:F,1),F:F,0),7,4))+(
INDIRECT(ADDRESS(MATCH(VLOOKUP(O2,F2:F,1),F:F,0)+1,4,4))*(O2-
INDIRECT(ADDRESS(MATCH(VLOOKUP(O2,F2:F,1),F:F,0),6,4)))/
INDIRECT(ADDRESS(MATCH(VLOOKUP(O2,F2:F,1),F:F,0)+1,3,4)))
spreadsheet demo
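For anyone who wants to check the spreadsheet result outside Sheets, the underlying calculation (fill the cheapest sell orders first, partially fill the last one) can be sketched in Python. The function name and the `(price, amount)` input shape are assumptions; the original dataset is not reproduced here.

```python
def average_fill_price(orders, quantity):
    """Walk the sell orders cheapest-first and fill `quantity` units.

    `orders` is a list of (price_per_unit, amount_available) tuples.
    Returns (total_cost, average_price_per_unit).
    """
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    remaining = quantity
    total_cost = 0.0
    for price, amount in sorted(orders):
        take = min(amount, remaining)  # partial fill on the last order
        total_cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("order book too small for this quantity")
    return total_cost, total_cost / quantity
```

With a made-up book of `[(1.0, 10), (2.0, 10), (3.0, 10)]` and an order of 15 units, the first order is filled completely and the second half-filled, for a total cost of 20.0.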

Grouping results and keeping facet counts consistent

Using Solr 3.3
Key  Store       Item Name  Description        Category          Price
=======================================================================
1    Store Name  Xbox 360   Nice game machine  Electronic Games  199.99
2    Store Name  Xbox 360   Nice game machine  Electronic Games  199.99
3    Store Name  Xbox 360   Nice game machine  Electronic Games  249.99
I have data similar to the table above loaded into Solr. Item Name,
Description, Category, and Price are searchable.
Expected result
Facet Field
Category
Electronic(1)
Games(1)
**Store Name**
XBox 360 Nice game machine priced from 199.99 - 249.99
What query parameters can I send to Solr to receive the results above? Basically, I want to group by Store, ItemName, and Description, and get the min and max price.
I also want to keep paging consistent with the main group (StoreName). The paging should be based on the Store Name group, so if 20 stores were found, I should be able to page correctly.
Please suggest
If using Solr 4.0, the new "Grouping" (which replaces FieldCollapsing) fixes this issue when you add the parameter "group.facet=true".
So to group your fields you would add the following parameters to your search request:
group=true // Enables grouping
group.facet=true // Facet counts to be number of groups instead of documents
group.field=Store // Groups results by the field "Store"
group.ngroups=true // Tells Solr to return the number of groups found
The number of groups found is what you would show to the user and use for paging, instead of the normal total count, which would be the total number of matching documents.
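As a rough sketch, the grouping parameters listed above could be assembled into a request URL as follows. The base URL, the `Category` facet field, the 20-groups-per-page default, and the helper names are assumptions for illustration. With grouping enabled, `start` and `rows` page over groups rather than documents.

```python
import math
from urllib.parse import urlencode


def grouped_query_url(base_url, query, group_field="Store",
                      facet_field="Category", page=0, groups_per_page=20):
    # base_url is hypothetical, e.g. "http://localhost:8983/solr".
    params = [
        ("q", query),
        ("group", "true"),            # enable grouping
        ("group.field", group_field),
        ("group.ngroups", "true"),    # return the number of groups
        ("group.facet", "true"),      # facet counts per group (Solr 4.0+)
        ("facet", "true"),
        ("facet.field", facet_field),
        ("start", page * groups_per_page),
        ("rows", groups_per_page),
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


def page_count(ngroups, groups_per_page=20):
    # Page on the number of groups, not the number of documents.
    return math.ceil(ngroups / groups_per_page)
```

So for the 20 stores in the question, `page_count(20)` gives a single page of groups, regardless of how many documents each store contains.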
Have you looked into field collapsing? It is new in Solr 3.3.
http://wiki.apache.org/solr/FieldCollapsing
What I did was create another field that combines the required fields into a single stored field. Problem solved: now I group only on that field and I get the correct count.
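A minimal sketch of that combined-field idea, done in Python before indexing. The field names come from the question; the `|` separator and the function names are assumptions. The price range helper is only there to show the "priced from min - max" output per group and runs client-side, not inside Solr.

```python
GROUP_FIELDS = ("Store", "ItemName", "Description")


def group_key(doc, fields=GROUP_FIELDS, sep="|"):
    # Combine the grouping fields into one value to index as a single field.
    return sep.join(str(doc[field]) for field in fields)


def price_ranges(docs):
    # Map each combined key to its (min_price, max_price).
    ranges = {}
    for doc in docs:
        key = group_key(doc)
        price = doc["Price"]
        low, high = ranges.get(key, (price, price))
        ranges[key] = (min(low, price), max(high, price))
    return ranges
```

For the three Xbox 360 rows in the question this yields one key with the range 199.99 to 249.99.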

reducing similar top results in solr result output

I have a search in Solr that returns about 1500 documents. These documents are basically products. For example, I have a bunch of women's shoes in my dataset. My dataset has a wide variety of shoes for women, but it also has some very similar results, for instance size 11 women's Nike trainers, size 10 women's Nike trainers, etc. Now, when I search for women's shoes, Solr scoring causes a set of very similar results to bubble to the top. For instance, all the colors of one particular shoe model might come to the top. They are definitely different products, but I would prefer to get a wider variety of results than just every color of Nike trainer shoes.
Does anyone have any suggestions? Note, I don't want to eliminate all the individually colored products. When someone searches for blue women's Nike trainers, I want them to get the blue model as the top result. I'm using the dismax query as my main query. What I would like to do is basically boost on some kind of "uniqueness of name compared to other results" factor.
You could either collapse on fields like color or so:
http://wiki.apache.org/solr/FieldCollapsing
or you can use near duplicate detection when indexing:
http://wiki.apache.org/solr/Deduplication
http://karussell.wordpress.com/2010/12/23/detect-stolen-and-duplicate-tweets-with-solr/
The latter algorithm is implemented in jetwick for tweets, so it should work for titles, but it is not performant enough for big documents (so it only does plagiarism detection for 'short' strings). For long text you'll need locality-sensitive hashing:
http://en.wikipedia.org/wiki/Locality_sensitive_hashing
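To give a feel for what locality-sensitive hashing does, here is a small MinHash sketch in Python. MinHash is one common LSH scheme for set similarity; this is not the jetwick algorithm or anything built into Solr, and the shingle size, number of hashes, and band count are arbitrary choices for illustration.

```python
import hashlib


def shingles(text, k=2):
    # Set of k-word shingles of a lowercased title.
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    # Seeded 64-bit hash; each seed acts as a different hash function.
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=64):
    # For each hash function, keep the minimum hash over the set.
    return [min(_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    # Fraction of matching positions estimates the Jaccard similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(signature, bands=16):
    # Split the signature into bands; documents sharing any band key
    # become candidate near-duplicates.
    rows = len(signature) // bands
    return {(i, tuple(signature[i * rows:(i + 1) * rows]))
            for i in range(bands)}
```

Products whose titles share at least one band key could then be treated as candidates for collapsing, while clearly different titles rarely collide.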
