Solr query to find one letter without other letters around it

I have documents already indexed in my Solr. I want to find the producer and model of a tire.
I have a file with producer and model like this:
Nokian;WR G2 SUV
Nokian;WR SUV
Nokian;V
Query:
((productname:"NOKIAN" OR producer:"NOKIAN") AND (productname:"V" OR description:"V" OR referencenumber:"V"))
But it also finds, for example, this:
"2X NOKIAN 215/55 R17 94V LINE (3)"
This matches because in this product the speed index is V (94V), while the model is Line. My algorithm takes this product as Nokian;V instead of Nokian;Line.
How can I ask Solr to give me only the products where this V doesn't have any other letters around it?
For example, this product:
LETNIE 225/45/17 94V NOKIAN V FINLAND - PŁOTY
is found correctly; it is Nokian;V.

As far as I understand your question, you need to put the MUST operator (+) before each boolean clause. So the query will look like:
(
+(productname:"NOKIAN" OR producer:"NOKIAN") AND
+(productname:"V" OR description:"V" OR referencenumber:"V")
)

If your productname field is of type text, it has the WordDelimiterFilter in the analysis chain. One of the default behaviors of this filter is to split terms on letter-number boundaries, causing:
2X NOKIAN 215/55 R17 94V LINE (3)
to generate the following tokens:
2 X NOKIAN 215 55 R 17 94 V LINE 3
(which matches the "V" in your query).
You can always run the query with debug=results to get an explanation of why something matches. In this particular case, I think you might construct another field type for your productname field that analyzes your model string less aggressively.
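The splitting described above can be approximated in plain Java to see why "V" matches the first product. This is a rough sketch, not Solr's WordDelimiterFilter: it only splits on non-alphanumeric characters and on letter/digit boundaries, and the class and method names (WordDelimiterSketch, delimiterTokens, whitespaceTokens) are hypothetical. The whitespace-only tokenizer stands in for the less aggressive field type suggested above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordDelimiterSketch {
    // Rough approximation of WordDelimiterFilter's default splitting: break on
    // non-alphanumeric characters and on letter/digit boundaries. Real Solr also
    // handles case changes, possessives and catenation options, ignored here.
    static List<String> delimiterTokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int prevType = 0; // 0 = delimiter, 1 = letter, 2 = digit
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int type = Character.isLetter(c) ? 1 : Character.isDigit(c) ? 2 : 0;
            // Flush the current token at a delimiter or when switching letter <-> digit.
            if (type == 0 || (prevType != 0 && type != prevType)) {
                if (current.length() > 0) {
                    out.add(current.toString());
                    current.setLength(0);
                }
            }
            if (type != 0) {
                current.append(c);
            }
            prevType = type;
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    // A less aggressive alternative: split on whitespace only, so "94V" stays one token.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }
}
```

With the delimiter-style split, "94V" yields a separate V token in both products; with the whitespace-only split, V appears only in the second product, where it stands alone.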

I solved the problem by sorting the brand,model Dictionary with my own comparer.
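using System;
using System.Collections.Generic;

// Orders model names so that a name containing another one (e.g. "WR G2 SUV"
// vs "SUV") comes first. Such a name is always longer, so sorting longest
// first gives that order while keeping the comparer consistent and transitive.
public class MyComparer : IComparer<string>
{
    int IComparer<string>.Compare(string x, string y)
    {
        if (x == y)
        {
            return 0;
        }
        // Longer (more specific) model names first.
        int byLength = y.Length.CompareTo(x.Length);
        if (byLength != 0)
        {
            return byLength;
        }
        // Same length: fall back to ordinal order so the result is deterministic.
        return string.CompareOrdinal(x, y);
    }
}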
All models named V or H are now at the end of the Dictionary. It works very well, because Solr first searches for Nokian;Line, the products it finds are added to another list, alreadyFound, and those products are skipped when searching for the later models. Thanks all for your replies.
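The ordering and the alreadyFound step can be sketched in Java as below. This is an illustration under assumptions, not the poster's code: the names are hypothetical, ordering by length (longest model first) is used because a model name that contains another one is always longer, and the matches predicate stands in for the actual Solr query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class ModelOrderSketch {
    // Longer (more specific) model names first; ties broken alphabetically so the
    // ordering is total and consistent, which a sort requires.
    static int compareSpecificity(String a, String b) {
        int byLength = Integer.compare(b.length(), a.length());
        return byLength != 0 ? byLength : a.compareTo(b);
    }

    // Tries models from most to least specific and records each product under the
    // first model that matches it, skipping products already found.
    static Map<String, String> assign(List<String> products, List<String> models,
                                      BiPredicate<String, String> matches) {
        List<String> ordered = new ArrayList<>(models);
        ordered.sort(ModelOrderSketch::compareSpecificity);
        Map<String, String> found = new LinkedHashMap<>(); // product -> model
        for (String model : ordered) {
            for (String product : products) {
                if (!found.containsKey(product) && matches.test(product, model)) {
                    found.put(product, model);
                }
            }
        }
        return found;
    }
}
```

With models V and LINE and a naive substring match (String::contains), LINE is tried first and claims the "94V LINE" product, so V is left with the other product.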

Related

EOF Error Parsing Manchester Syntax in OWL-API

I have an API that receives a JSON document with the classes, properties and axioms of an ontology. The file looks like this:
{
"id": "myontologyid",
"outformat": "OWL",
"ontoclass": ["Person", "Man", "Woman", "Animal", "Rational", "Arm"],
"ontoaxioms": ["Man subClassOf (Person)", "Person EquivalentTo: (Man OR Woman)", "hasBrother max 2 xsd:integer"],
"ontoproperties": ["hasPart", "isBrotherOf", "hasBrother"]
}
The ontoaxioms key is an array with all the axioms of the ontology. The values of this array MUST be in Manchester syntax, as I will use ManchesterOWLSyntaxParser to parse them.
When I try to parse them, I get the following error on the hasBrother max 2 xsd:integer axiom:
[apache-tomcat-8.5.69-2]: org.semanticweb.owlapi.manchestersyntax.renderer.ParserException: Encountered |EOF| at line 1 column 29. Expected one of:
SubClassOf:
or
and
DisjointWith:
EquivalentTo:
I believe the Manchester syntax is incorrect, but I couldn't find any reference or documentation for OWL-API that shows how to use it. Is there any?
Below is part of my code which tries to parse the axioms:
ManchesterOWLSyntaxParserImpl parser = (ManchesterOWLSyntaxParserImpl) OWLManager.createManchesterParser();
parser.setOWLEntityChecker(entityChecker);
try {
for (int i = 0; i < this.axiomas.length(); i++) {
parser.setStringToParse(this.axiomas.getString(i));
owlOntology.addAxiom(parser.parseAxiom());
}
} catch (Exception e) {
System.out.print(e.toString());
return null;
}
The questions are:
How do I solve this EOF error?
How do I correctly insert Manchester syntax into OWL-API?
Where can I find documentation on how to use Manchester syntax to parse ontologies?
Many thanks in advance.
Your use of the OWLAPI classes appears correct. The problem with the input is that something else is expected to follow, i.e., it's not a full axiom.
Is the intent to say that an individual can have at most two hasBrother values, with an integer range?
As it happens, there's a unit test in the OWLAPI contract module that uses this string as input for parsing:
String in = "p max 1 owl:real";
ManchesterOWLSyntaxParser parser = OWLManager.createManchesterParser();
parser.setStringToParse(in);
OWLClassExpression cl = parser.parseClassExpression();
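The string has the same format as what you're trying to parse, and it gives a class expression, not an axiom: specifically, a qualified max cardinality restriction for a data property. This can be the superclass or the subclass in a subclass axiom, for example, but the rest of the axiom is not present. A complete axiom would look like Person SubClassOf: hasBrother max 2 xsd:integer (assuming Person is the intended subject and hasBrother is known to the entity checker as a data property).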

What's the difference between rangeSets and rangeSet in SolrIndexedProperty type?

While creating a new search facet in Hybris 5.7 I've found that in SolrIndexedProperty type there is an attribute called rangeSet and there is also a many-to-many relation called SolrIndexedProperty2SolrValueRangeSetRelation between SolrIndexedProperty and SolrValueRangeSet.
What's the difference between these fields? None of them is deprecated or something. Which one should I use in order to create my own facet with particular value ranges?
I hope you have already found the answer to your question. I'm still adding my understanding, just in case.
A SolrValueRangeSet is a collection of related SolrValueRange.
There are two different fields in hybris: rangeSet and rangeSets.
One can attach either a single SolrValueRangeSet (rangeSet) or, through the many-to-many relation, a collection of SolrValueRangeSet (rangeSets) to a SolrIndexedProperty. You can consider the latter as an enhancement over the former.
If you want to allow multiple facet range sets for different values, for example one price range set per currency, you can use rangeSets as shown in the example below:
INSERT_UPDATE SolrValueRangeSet;name[unique=true]; qualifier; type; solrValueRanges(&rangeValueRefID)
;priceRange-USD ; PriceRangeUSD; double; usd-range1, usd-range2
;priceRange-EUR ; PriceRangeEUR; double; eur-range1, eur-range2
SolrValueRange: define the related price range values like below:
INSERT_UPDATE SolrValueRange; &rangeValueRefID; solrValueRangeSet(name)[unique=true]; name[unique=true]; from; to
;usd-range1;priceRange-USD; Rating 1; 0; 50
;usd-range2;priceRange-USD; Rating 2; 50; 100
;eur-range1;priceRange-EUR; Rating 1; 0; 120
;eur-range2;priceRange-EUR; Rating 2; 120; 300
INSERT_UPDATE SolrIndexedProperty; name[unique = true];rangeSets(name)
; price range; priceRange-USD , priceRange-EUR

Flink - behaviour of timesOrMore

I want to find a pattern of events that follows these rules.
The inner pattern is events that:
Have the same value for key "sensorArea".
Have different values for key "customerId".
Are within 5 seconds of each other.
And this pattern needs to:
Emit an "alert" only if the above happens 3 or more times.
I wrote something, but I know for sure it is not complete.
Two questions:
I need to access the previous event's fields when I'm in the "next" pattern. How can I do that without using ctx, because it is heavy?
My code gives a weird result. This is my input:
and this is my output:
3> {first=[Customer[timestamp=50,customerId=111,toAdd=2,sensorData=33]], second=[Customer[timestamp=100,customerId=222,toAdd=2,sensorData=33], Customer[timestamp=600,customerId=333,toAdd=2,sensorData=33]]}
even though my desired output should be all of the first six events (users 111/222, with sensor area 33, then 44, then 55).
Pattern<Customer, ?> sameUserDifferentSensor = Pattern.<Customer>begin("first", skipStrategy)
.followedBy("second").where(new IterativeCondition<Customer>() {
@Override
public boolean filter(Customer currCustomerEvent, Context<Customer> ctx) throws Exception {
List<Customer> firstPatternEvents = Lists.newArrayList(ctx.getEventsForPattern("first"));
int i = firstPatternEvents.size();
int currSensorData = currCustomerEvent.getSensorData();
int prevSensorData = firstPatternEvents.get(i-1).getSensorData();
int currCustomerId = currCustomerEvent.getCustomerId();
int prevCustomerId = firstPatternEvents.get(i-1).getCustomerId();
return currSensorData==prevSensorData && currCustomerId!=prevCustomerId;
}
})
.within(Time.seconds(5))
.timesOrMore(3);
PatternStream<Customer> sameUserDifferentSensorPatternStream = CEP.pattern(customerStream, sameUserDifferentSensor);
DataStream<String> alerts1 = sameUserDifferentSensorPatternStream.select((PatternSelectFunction<Customer, String>) Object::toString);
You will have an easier time if you first key the stream by sensorArea. Then you will be pattern matching on streams where all of the events are for a single sensorArea, which will make the pattern easier to express and the matching more efficient.
You can't avoid using an iterative condition and the ctx, but it should be less expensive after keying the stream.
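To check what the pattern should emit, the matching rule from the question can be prototyped outside Flink. The sketch below is plain Java, not Flink CEP, and does not model CEP's contiguity or skip strategies; it assumes events are ordered by timestamp in milliseconds and reads "3 or more times" as three qualifying pairs within one sensor area. The class names are hypothetical. Keeping only the last event per area mirrors what keying by sensorArea buys you: each event is compared only with the previous one for the same key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SensorAlertSketch {
    // Field names follow the Customer class in the question; timestamps assumed to be ms.
    static final class Event {
        final long timestamp;
        final int customerId;
        final int sensorData; // the "sensorArea" key from the question's description

        Event(long timestamp, int customerId, int sensorData) {
            this.timestamp = timestamp;
            this.customerId = customerId;
            this.sensorData = sensorData;
        }
    }

    // Counts, per sensor area, events that come from a different customer than the
    // previous event in that area and arrive within windowMs of it; returns the areas
    // whose count reaches minTimes. Events must be ordered by timestamp.
    static Set<Integer> alertingAreas(List<Event> events, long windowMs, int minTimes) {
        Map<Integer, Event> lastByArea = new HashMap<>();
        Map<Integer, Integer> hits = new HashMap<>();
        Set<Integer> alerts = new TreeSet<>();
        for (Event e : events) {
            Event prev = lastByArea.put(e.sensorData, e);
            boolean match = prev != null
                    && prev.customerId != e.customerId
                    && e.timestamp - prev.timestamp <= windowMs;
            if (match && hits.merge(e.sensorData, 1, Integer::sum) >= minTimes) {
                alerts.add(e.sensorData);
            }
        }
        return alerts;
    }
}
```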
Also, your code example doesn't match the text description. The text says "within 5 seconds" and "3 or more times", while the code has within(Time.seconds(2)) and timesOrMore(2).

How to order groups by count in Solr

I'm wondering how to order groups in a Solr result. I want to order the groups by numFound. I saw how to order the groups by score here, but that didn't seem to actually make a difference in the examples I looked at, and isn't exactly what I wanted.
In the xml you can see the number per group as numFound and that is what I want to sort the groups by, so for example I could see the largest group at the top.
<arr name="groups">
<lst>
<str name="groupValue">top secret</str>
<result name="doclist" numFound="12" start="0">
...
Any tips appreciated! Thanks!
This is an old question, but it is possible with two queries.
First query: bring back the field you're grouping by as a set of facets for your navigation state. You can limit the number of records returned to 0 here (rows=0): you just need the facets. The number of facet values you return should be the size of your page.
group_id:
23 (6)
143 (3)
5 (2)
Second query: this one is for the records, so no facets are required. The query should be an OR query over the facet field values returned by the first query (group_id:23 OR group_id:143 OR group_id:5 and so on), grouped by the same field you are grouping on.
Sorting: reorder the records from query 2 to match the order from query 1.
That'll do it, with the proviso that I'm not sure how scalable that OR query will be. If you're looking to paginate, remember that you can offset facets (facet.offset): use that as the mechanism instead of offsetting the records.
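The two-query approach can be sketched in Java without SolrJ, to keep it self-contained. The class and method names are hypothetical; in a real client the facet values would come from the first response (with facet.sort=count, which lists the highest counts first) and the groups from the second.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class TwoQueryGroupOrder {
    // Query 2: OR together the facet values returned by query 1, e.g.
    // group_id:23 OR group_id:143 OR group_id:5. Values containing spaces or
    // special characters would need quoting or escaping, which is omitted here.
    static String orQuery(String field, List<String> facetValues) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (String value : facetValues) {
            joiner.add(field + ":" + value);
        }
        return joiner.toString();
    }

    // Reorders the groups from query 2 to follow the facet order from query 1.
    // Facet values with no group in query 2 are skipped.
    static <D> List<Map.Entry<String, List<D>>> reorder(List<String> facetOrder,
                                                        Map<String, List<D>> groupsByValue) {
        List<Map.Entry<String, List<D>>> ordered = new ArrayList<>();
        for (String value : facetOrder) {
            List<D> docs = groupsByValue.get(value);
            if (docs != null) {
                ordered.add(new AbstractMap.SimpleEntry<>(value, docs));
            }
        }
        return ordered;
    }
}
```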
Sorting on numFound is not possible, as numFound is not a field in Solr.
There is a discussion mentioning that it is not supported, and I did not find an open JIRA issue for it either.
It was still not possible the last time I looked into this.
You can sort by using fields.
Consider an example:
If you have 5 facets, each with a count associated with it,
then you can sort them by those counts.
This can also be applied to normal (non-facet) fields.
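import java.io.Serializable;

public class FacetBean implements Serializable {
    private final String facetName;  // facet value, e.g. "23"
    private final long facetCount;   // number of documents with that value
    public FacetBean(String facetName, long facetCount) {
        this.facetName = facetName;
        this.facetCount = facetCount;
    }
    public String getFacetName() { return facetName; }
    public long getFacetCount() { return facetCount; }
}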
Your calling method should look like this:
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.FacetField.Count;
import org.apache.solr.client.solrj.response.QueryResponse;

private List<FacetBean> getFacetFieldsbyCount(QueryResponse queryResponse)
{
    List<FacetBean> facetList = new ArrayList<FacetBean>();
    List<FacetField> flds = queryResponse.getFacetFields();
    if (flds != null) {
        for (FacetField fld : flds) {
            List<Count> counts = fld.getValues();
            if (counts == null) {
                continue;
            }
            // One bean per facet value, holding that value's document count.
            for (Count count : counts) {
                facetList.add(new FacetBean(count.getName(), count.getCount()));
            }
        }
    }
    // Sort by count, largest first. A Comparator must return a negative,
    // zero or positive value, not one of the counts themselves.
    facetList.sort(Comparator.comparingLong(FacetBean::getFacetCount).reversed());
    return facetList;
}
The same documentation page also mentions:
sort: for example, sort=popularity desc will cause the groups to be sorted according to the highest-popularity document in each group.
group.sort: controls how the documents within each group are sorted; you can apply your field here.
Hope it helps.

Querying a timestamp column from LINQ to SQL

My table has a timestamp column named "RowVer" which LINQ maps to type System.Data.Linq.Binary. This data type seems useless to me because (unless I'm missing something) I can't do things like this:
// Select all records that changed since the last time we inserted/updated.
IEnumerable<UserSession> rows = db.UserSessions.Where
( usr => usr.RowVer > ???? );
So, one of the solutions I'm looking at is to add a new "calculated column" called RowTrack which is defined in SQL like this:
CREATE TABLE UserSession
(
RowVer timestamp NOT NULL,
RowTrack AS (convert(bigint,[RowVer])),
-- ... other columns ...
)
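What the computed RowTrack column stores can be reproduced on the client side. A minimal Java sketch, assuming the 8-byte rowversion value has already been read into a byte array (the class and method names are hypothetical):

```java
import java.nio.ByteBuffer;

final class RowVersionSketch {
    // A SQL Server timestamp/rowversion value is 8 bytes. convert(bigint, RowVer)
    // reads those bytes as a big-endian integer; ByteBuffer is big-endian by
    // default, so getLong() produces the same number on the client side.
    static long toBigint(byte[] rowVer) {
        if (rowVer == null || rowVer.length != 8) {
            throw new IllegalArgumentException("expected an 8-byte rowversion");
        }
        return ByteBuffer.wrap(rowVer).getLong();
    }
}
```

Because the conversion is big-endian, ordering RowTrack values numerically matches ordering the raw bytes, as long as the most significant bit is not set (bigint is signed).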
This allows me to query the database like I want to:
// Select all records that changed since the last time we inserted/updated.
IEnumerable<UserSession> rows = db.UserSessions.Where
( usr => usr.RowTrack > 123456 );
Is this a bad way to do things? How performant is querying on a calculated column? Is there a better work-around?
Also, I'm developing against SQL Server 2000 for maximum backwards compatibility, but I can talk the boss into making 2005 the lowest common denominator.
As Diego Frata outlines in this post, there is a hack that makes timestamps queryable from LINQ.
The trick is to define a Compare method that takes two System.Data.Linq.Binary parameters:
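using System;
using System.Data.Linq;

public static class BinaryComparer
{
    // Meant only for use inside LINQ to SQL queries, where the provider
    // recognizes the method by its name (Compare); it is not run in memory.
    public static int Compare(this Binary b1, Binary b2)
    {
        throw new NotImplementedException();
    }
}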
Notice that the method doesn't need to be implemented; only its name (Compare) is important.
And the query will look something like:
Binary lastTimestamp = GetTimeStamp();
var result = from job in c.GetTable<tblJobs>()
where BinaryComparer.Compare(job.TimeStamp, lastTimestamp)>0
select job;
(This is the equivalent of job.TimeStamp > lastTimestamp.)
EDIT:
See Rory MacLeod's answer for an implementation of the method, if you need it to work outside of SQL.
SQL Server "timestamp" is only an indicator that the record has changed; it's not actually a representation of date/time (although it is supposed to increment each time a record in the DB is modified).
Beware that it will wrap back to zero (not very often, admittedly), so the only safe test is whether the value has changed, not whether it is greater than some arbitrary previous value.
You could pass the TimeStamp column value to a web form, and then, when it is submitted, check whether the TimeStamp from the form differs from the value in the current record. If it is different, someone else has changed and saved the record in the interim.
// Select all records that changed since the last time we inserted/updated.
Is there a better work-around?
Why not have two columns, one for createddate and another for lastmodifieddate? I would say that is the more traditional way to handle this scenario.
Following on from jaraics' answer, you could also provide an implementation for the Compare method that would allow it to work outside of a query:
using System;
using System.Data.Linq;
using System.Runtime.InteropServices;

public static class BinaryExtensions
{
public static int Compare(this Binary b1, Binary b2)
{
if (b1 == null)
return b2 == null ? 0 : -1;
if (b2 == null)
return 1;
byte[] bytes1 = b1.ToArray();
byte[] bytes2 = b2.ToArray();
int len = Math.Min(bytes1.Length, bytes2.Length);
int result = memcmp(bytes1, bytes2, len);
if (result == 0 && bytes1.Length != bytes2.Length)
{
return bytes1.Length > bytes2.Length ? 1 : -1;
}
return result;
}
[DllImport("msvcrt.dll")]
private static extern int memcmp(byte[] arr1, byte[] arr2, int cnt);
}
The use of memcmp was taken from this answer to a question on comparing byte arrays. If the arrays aren't the same length, but the longer array starts with the same bytes as the shorter array, the longer array is considered to be greater than the shorter one, even if the extra bytes are all zeroes.
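The same comparison rule is available in Java 9+ as Arrays.compareUnsigned, which compares bytes as unsigned values (as memcmp does) and treats a proper prefix as the smaller array. A minimal sketch with hypothetical names, for comparing rowversion bytes outside a query:

```java
import java.util.Arrays;

final class BinaryCompareSketch {
    // Mirrors the C# Compare above: nulls sort first, bytes compare as unsigned
    // values, and when one array is a prefix of the other the longer one is greater.
    static int compare(byte[] a, byte[] b) {
        if (a == null) {
            return b == null ? 0 : -1;
        }
        if (b == null) {
            return 1;
        }
        return Integer.signum(Arrays.compareUnsigned(a, b));
    }
}
```

Treating the bytes as unsigned matters: 0xFF must compare greater than 0x01, which a signed byte comparison would get wrong.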
