I have a multivalued field of names and I have to find the index of the matching value in the list.
DOC example:
profile_id: 1
names: [ "My name", "something", "My second name", "My nickname"]
query:
profile_id:1 AND names:"My secon name"~
Expected result:
my doc, and the index of the matched value: 2
Is it possible?
SpanTermQuery matches documents just like TermQuery, but it also keeps track of the positions at which the matching terms appear within each document.
Spans positionInfoForAllMatchingDocs = spanQuery.getSpans(..arguments..);
int position = -1;
while (positionInfoForAllMatchingDocs.next()) {
    position = positionInfoForAllMatchingDocs.start(); // returns the start position of the current match
    System.out.println("Found match in the document with id: " + positionInfoForAllMatchingDocs.doc() + " at position: " + position); // you obviously want to replace this sysout with something more elegant
}
Make sure that the field for which you plan to retrieve the positional information was indexed with Field.TermVector.WITH_POSITIONS or Field.TermVector.WITH_POSITIONS_AND_OFFSETS.
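Once you have a start position from Spans, you still need to map it back to a slot in the multivalued field. Lucene concatenates the values of a multivalued field into one token stream, inserting the analyzer's positionIncrementGap between values, so the mapping is plain arithmetic. A minimal stdlib sketch, assuming you know each value's token count and your analyzer's gap (the helper name, token counts, and gap value here are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class SlotFromPosition {
    // With a multivalued field, Lucene concatenates the values into one token
    // stream, adding `gap` extra position increments between values
    // (Analyzer#getPositionIncrementGap). Given the token count of each value,
    // we can map a match position (e.g. from Spans.start()) back to the slot.
    static int slotForPosition(List<Integer> tokensPerValue, int gap, int position) {
        int start = 0;
        for (int slot = 0; slot < tokensPerValue.size(); slot++) {
            int end = start + tokensPerValue.get(slot); // value occupies positions [start, end)
            if (position < end) {
                return slot;
            }
            start = end + gap; // the next value starts after the gap
        }
        return -1; // position lies beyond the last value
    }

    public static void main(String[] args) {
        // names: ["My name", "something", "My second name", "My nickname"]
        List<Integer> tokens = Arrays.asList(2, 1, 3, 2);
        int gap = 100; // a common positionIncrementGap choice; adjust to your analyzer
        // with gap 100, "My second name" occupies positions 203..205,
        // so a match starting at position 204 falls in slot 2
        System.out.println(slotForPosition(tokens, gap, 204)); // prints 2
    }
}
```

The token counts per value are known at index time, or can be recovered from the stored term vectors mentioned above.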
Related
I'm discovering vis.js, especially the network module.
I need to get the max value of the "id" field of my nodes dataset:
var nodes = new vis.DataSet([ {id: 1, label: "Node 1"}, {id: 2, label: "Node 2"}, {id: 3, label: "Node 3"}]);
The best I've been able to do so far is using a forEach loop:
var max = 0;
nodes.forEach(function (el) {
    var j = parseInt(el.id, 10);
    if (!isNaN(j) && j > max) { max = j; } // note: j != NaN would always be true, NaN never compares equal
});
console.log("max: ", max);
It seems to me it can't be THE way to do this.
I saw a max(field) method documented in the doc for vis' DataSet (https://visjs.github.io/vis-data/data/dataset.html):
max(field) [Object|null] Find the item with maximum value of specified field. Returns null if no item is found.
But as stupid as I may sound, I just can't get it to work.
I tried:
console.log("max: ", nodes.max('id'));
console.log("max: ", nodes.max(node => node.id));
console.log("max: ", nodes.max(node => node['id']));
How can I simply get the max value of the field 'id' of all entries of a DataSet?
[Edit] The ID's in the example here above are numeric ({id: 1, ...}).
In my case, they were strings ({id: '1', ...}), and exactly that seemed to be the problem.
Try this line:
nodes.max('id').id
nodes.max('id') returns the node with the maximum id value, so .id gives you the value itself.
After loads of lost time, I finally figured out that max() works perfectly with numeric IDs.
My first thought was then: Reading the doc could have saved me hours...
...But checking https://visjs.github.io/vis-network/docs/network/nodes.html , it explicitly defines id as a string:
id (String, default: undefined) — The id of the node. The id is mandatory for nodes and has to be unique. This should obviously be set per node, not globally.
So beware: it's supposed to be a string, but if it is a string, some features don't work.
If I'm still missing something, I'd be happy to read your comments.
I am trying to use a document collection for fast lookups. Sample document:
document Person {
...
groups: ["admin", "user", "godmode"],
contacts: [
{
label: "main office",
items: [
{ type: "phone", value: '333444222' },
{ type: "phone", value: '555222555' },
{ type: "email", value: 'bob@gmail.com' }
]
}
]
...
}
Create a hash index for the "groups" field.
Query: FOR P IN Person FILTER "admin" IN P.groups RETURN P
Result: works, BUT no index is used according to explain.
Question: How do I use indexes with queries that filter on arrays? Performance is the main factor.
Create a hash index for "contacts[].items[].value".
Query: FOR P IN Person FILTER "333444222" == P.contacts[*].items[*].value RETURN P
Result: double usage of the wildcard not supported? The index is not used and the query returns nothing.
Question: How can I organize fast lookups for this structure with indexes?
P.S. I also tried the MATCHES function and multi-level FOR-IN; the hash indexes on the arrays are never used.
ArangoDB version 2.6.8
Indexes can be used from ArangoDB version 2.8 on.
For the first query (FILTER "admin" IN p.groups), an array hash index on field groups[*] will work:
db._create("persons");
db.persons.insert(personDataFromOriginalExample);
db.persons.ensureIndex({ type: "hash", fields: [ "groups[*]" ] });
This type of index does not exist in versions prior to 2.8.
With an array index in place, the query will produce the following execution plan (showing that the index is actually used):
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 1 - FOR p IN persons /* hash index scan */
3 CalculationNode 1 - LET #1 = "admin" in p.`groups` /* simple expression */ /* collections used: p : persons */
4 FilterNode 1 - FILTER #1
5 ReturnNode 1 - RETURN p
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash persons false false 100.00 % [ `groups[*]` ] "admin" in p.`groups`
The second query will not be supported by array indexes, as it contains multiple levels of nesting. The array indexes in 2.8 are restricted to one level, e.g. groups[*] or contacts[*].label will work, but not groups[*].items[*].value.
About 1): this is work in progress and will be included in one of the next releases (most likely 2.8).
We have not yet decided about the AQL syntax to retrieve the array, but FILTER "admin" IN P.groups is among the most likely ones.
About 2): having implemented 1), this will work out of the box as well; the index will be able to cover several depths of nesting.
Neither of the above can be properly indexed in the current release (2.6).
The only alternative i can offer is to externalize the values and use edges instead of arrays.
With your data, it would look as follows (in arangosh).
I used fixed _key values for simplicity; it works without them as well:
db._create("groups"); // saves the group elements
db._create("contacts"); // saves the contact elements
db.contacts.ensureHashIndex("value"); // Index on contacts.value
db._create("Person"); // You already have this
db._createEdgeCollection("isInGroup"); // Save relation group -> person
db._createEdgeCollection("hasContact"); // Save relation item -> person
db.Person.save({_key: "user"}); // Plus the remainder of the object you posted
// Now the items
db.contacts.save({_key:"phone1", type: "phone", value: '333444222' });
db.contacts.save({_key:"phone2", type: "phone", value: '555222555' });
db.contacts.save({_key:"mail1", type: "email", value: 'bob@gmail.com'});
// And the groups
db.groups.save({_key:"admin"});
db.groups.save({_key:"user"});
db.groups.save({_key:"godmode"});
// Finally the relations
db.hasContact.save("contacts/phone1", "Person/user", {label: "main office"});
db.hasContact.save("contacts/phone2", "Person/user", {label: "main office"});
db.hasContact.save("contacts/mail1", "Person/user", {label: "main office"});
db.isInGroup.save("groups/admin", "Person/user", {});
db.isInGroup.save("groups/godmode", "Person/user", {});
db.isInGroup.save("groups/user", "Person/user", {});
Now you can execute the following queries:
Fetch all admins:
RETURN NEIGHBORS(groups, isInGroup, "admin")
Get all users having a contact with value 333444222:
FOR x IN contacts FILTER x.value == "333444222" RETURN NEIGHBORS(contacts, hasContact, x)
pageSearch holds the value I have to search for, country is my column name, and FilterOperator is my filter. If I type "A" (uppercase or lowercase), it should return the values starting with "A"; that's what I need.
Query query = new Query("customerRolodex").addFilter("country", FilterOperator.EQUAL, pageSearch); //.setFilter(c_r);
PreparedQuery pq = ds.prepare(query);
for (Entity result : pq.asIterable()) {
    // here I am using JSON to send and print the data
    p = new cust_rolo();
    p.setCountry(result.getProperty("country").toString());
    p.setRegion(result.getProperty("region").toString());
    list.add(p);
}
json.put("rows", list);
out.print(json.toString());
Any help would be appreciated. I also tried the GREATER_THAN_OR_EQUAL operator for this.
You should query between a range of values:
.addFilter("country", FilterOperator.GREATER_THAN_OR_EQUAL, pageSearch)
.addFilter("country", FilterOperator.LESS_THAN, pageSearch + "\uffff")
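The idea behind this range is that every string starting with the prefix sorts between the prefix itself and prefix + "\uffff". It can be illustrated outside the Datastore with plain string comparison (a sketch; the class, method, and sample data here are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PrefixRange {
    // Emulates GREATER_THAN_OR_EQUAL + LESS_THAN on a string property:
    // everything starting with `prefix` sorts in [prefix, prefix + "\uffff").
    static List<String> prefixFilter(List<String> values, String prefix) {
        String upper = prefix + "\uffff";
        return values.stream()
                .filter(v -> v.compareTo(prefix) >= 0 && v.compareTo(upper) < 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> countries = Arrays.asList("Albania", "Austria", "Belgium", "Argentina");
        System.out.println(prefixFilter(countries, "A"));
        // prints [Albania, Austria, Argentina]
    }
}
```

Note that the comparison is case-sensitive, so if "a" and "A" should both match, you would need to normalize the stored value (e.g. keep a lowercased copy of the property) or run two range filters.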
I'm writing several Solr custom query components.
Each component runs a different kind of query:
Component A: does a group by on field A
Component B: does a group by on a different field B
Each component sends the documents from its result to the next component.
In my "process" function, I'm doing the following after the result is set by grouping:
IndexSchema schema = searcher.getSchema();
DocListAndSet s = result.getDocListAndSet();
DocSet s3 = s.docSet;
DocIterator dit = s3.iterator();
while (dit.hasNext())
{
    SolrDocument doc = new SolrDocument();
    int docid = dit.nextDoc();
    //float score = dit.score(); // <-- This does not get the score
    Document luceneDoc = searcher.doc(docid); // get the document using the doc id
    for (Fieldable field : luceneDoc.getFields())
    {
        SchemaField sf = schema.getField(field.name());
        doc.addField(field.name(), sf.getType().toObject(field));
        ......
    }
}
Then I iterate through the set and create a SolrDocument for each hit.
The SolrDocuments are put into a SolrDocumentList, and at the end I send it off to the next component:
rb.req.getContext().put("TAG", list);
I also want to add a field called "score" to each SolrDocument; this field will contain the actual score. I've tried getting the score using:
float score = dit.score()
But the above does not get the score of the document. How do I get the "score" of the document using the document id?
Is there a particular reason you are getting the docSet instead of the docList?
I would try (condensing a bit) getting s.docList.iterator() instead of s.docSet.iterator(). The latter states specifically in the documentation here that you can't get meaningful scores from it, whereas the docList documentation states it may contain valid scores.
Well, you have to set GET_SCORES in getDocList(query, List, lsort, offset, maxnoofdocs, 1).
Here:
query is your query object
List<Query> is your list of filters; it can be null
lsort can be null
offset
maxnoofdocs is an integer
1 means get the scores with the documents (the value of the GET_SCORES flag)
I am trying to achieve fuzzy phrase search (to match misspelled words) using Lucene; after reading various blogs I thought I would try ngram indexes for fuzzy phrase search.
But I couldn't find an ngram tokenizer in my Lucene 3.4 JAR library. Is it deprecated and replaced with something else? Currently I am using StandardAnalyzer, where I get decent results for exact matches of terms.
I have the below two requirements to handle.
1. My index has a document with the phrase "xyz abc pqr". When I provide the query "abc xyz"~5, I get results. But my requirement is to get the same document even when my query has one extra word, like "abc xyz pqr tst" (I understand the match score will be a little lower). With proximity, an extra word in the phrase does not work; if I remove the proximity and double quotes from my query, I get the expected results, but also many false positives (documents containing only xyz, only abc, etc.).
2. In the same example, if somebody misspells the query as "abc xxz", I still want to get the same document.
I want to give ngrams a try, but I'm not sure they will work as expected.
Any thoughts?
Try to use BooleanQuery and FuzzyQuery like:
public void fuzzysearch(String querystr) throws Exception {
    querystr = querystr.toLowerCase();
    System.out.println("\n\n-------- Start fuzzysearch -------- ");

    // 3. search
    int hitsPerPage = 10;
    TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
    IndexReader reader = IndexReader.open(index);
    IndexSearcher searcher = new IndexSearcher(reader);

    BooleanQuery bq = new BooleanQuery();
    String[] searchWords = querystr.split(" ");
    int id = 0;
    for (String word : searchWords) {
        Query query = new FuzzyQuery(new Term(NAME, word));
        if (id == 0) {
            bq.add(query, BooleanClause.Occur.MUST);
        } else {
            bq.add(query, BooleanClause.Occur.SHOULD);
        }
        id++;
    }

    System.out.println("query ==> " + bq.toString());
    searcher.search(bq, collector);
    parseResults(searcher, collector);
    searcher.close();
}
public void parseResults(IndexSearcher searcher, TopScoreDocCollector collector) throws Exception {
    ScoreDoc[] hits = collector.topDocs().scoreDocs;

    // 4. display results
    System.out.println("Found " + hits.length + " hits.");
    for (int i = 0; i < hits.length; ++i) {
        int docId = hits[i].doc;
        Document d = searcher.doc(docId);
        System.out.println((i + 1) + ". " + d.get(NAME));
    }
}
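Since the question also asks about ngrams: the core idea is that a misspelled word still shares most of its character ngrams with the correctly spelled term, so overlap in ngram sets gives misspelling tolerance. A stdlib-only sketch of that idea, independent of any Lucene tokenizer (class and method names here are made up):

```java
import java.util.HashSet;
import java.util.Set;

public class NgramSketch {
    // Split a word into its character ngrams of the given size.
    static Set<String> ngrams(String word, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Jaccard overlap of the two ngram sets:
    // a one-character misspelling still shares most grams.
    static double similarity(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        Set<String> inter = new HashSet<>(ga);
        inter.retainAll(gb);
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(similarity("abcdef", "abcdef", 2)); // identical words -> 1.0
        // a misspelling scores well above an unrelated word
        System.out.println(similarity("abcdef", "abcdxf", 2) > similarity("abcdef", "qrstuv", 2));
    }
}
```

An ngram analyzer applies the same principle at index time: each term is indexed by its grams, so a query term with a typo still hits most of the same postings; FuzzyQuery, as in the answer above, achieves a similar effect via edit distance instead.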