Can Solr DIH do atomic updates?

With Solr 4 came the ability to do atomic (partial) updates on existing documents within the index; that is, one can match on the document ID and replace the contents of just one field, or add further entries to multivalued fields: http://wiki.apache.org/solr/Atomic_Updates
Can atomic updates be done from DataImportHandler (DIH)?

The answer is "yes", as I discovered through trial and error, using the ScriptTransformer.
The Solr documentation shows how to add an update attribute to a field node with "set", "add" or "inc". If I create a test XML file with the requisite update attribute, it works fine when passed to the regular update handler. But when passed to DIH, even without any transformation, the update attributes are ignored completely.
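For reference, a minimal update document of the kind that works against the regular update handler looks like this (the id and author values are taken from the debug output below; everything else is illustrative):

<add>
  <doc>
    <field name="id">123</field>
    <field name="author" update="add">Smith, J</field>
  </doc>
</add>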
Here's a simplified version of the script transformer I used to reintroduce the update attribute and get atomic updates working. Note the use of the Java HashMap.
var atomicTransformer = function (row) {
    var authorMap = new java.util.HashMap();
    var author = String(row.get('author'));
    authorMap.put('add', author);
    row.put('author', authorMap);
    return row; // DIH script transformers must return the row
};
This produces the following JSON in DIH debug mode:
{
  "id": [
    123
  ],
  "author": [
    {
      "add": "Smith, J"
    }
  ]
}
Multivalued fields are also no problem: pass in an ArrayList to the HashMap instead of a string.
var atomicTransformer = function (row) {
    var fruits = new java.util.ArrayList();
    fruits.add("banana");
    fruits.add("apple");
    fruits.add("pear");
    var fruitMap = new java.util.HashMap();
    fruitMap.put('add', fruits);
    row.put('fruit', fruitMap);
    return row;
};
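For completeness, the transformer also has to be wired into the DIH configuration. A sketch of the relevant parts of data-config.xml; the entity name and its attributes are placeholders for your own setup:

<dataConfig>
  <script><![CDATA[
    var atomicTransformer = function (row) {
      // build the HashMap as shown above
      return row;
    };
  ]]></script>
  <document>
    <entity name="books" transformer="script:atomicTransformer" query="...">
      <!-- field mappings -->
    </entity>
  </document>
</dataConfig>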

Related

Return first n elements from array in Elasticsearch query

I have an array field in my documents named IP, which contains more than 10,000 IPs as elements, e.g.
IP:["192.168.a:A","192.168.a:B","192.168.a:C","192.168.A:b"...........]
Now I made a search query with some filters and got results, but the size of each result is huge because of the above field. I want to fetch only N IPs from the array, say 10; order doesn't matter.
How do I do that?
Update:
Apart from the IP field there are other fields too, and I applied the filters on those fields, not on IP. I want the whole document that satisfies the filters; I just want to limit the number of elements in the single IP field. (Let me know if there is another way apart from using a script, too.)
A request of this kind could solve your problem:
GET ips/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "truncate_ip": {
      "script": {
        "source": """
          // keep at most the first 10 entries of the IP array
          def ips = params['_source']['IP'];
          int n = Math.min(10, ips.size());
          String[] trunc_ip = new String[n];
          for (int i = 0; i < n; ++i) {
            trunc_ip[i] = ips[i];
          }
          return trunc_ip;
        """
      }
    }
  }
}
You can use script fields (script_fields) to generate a new field from existing fields in Elasticsearch. Details are added as comments.
GET indexName/_search
{
  "_source": {
    "excludes": "ips" // <======= Exclude the IP field from _source (change the name based on your document)
  },
  "query": {
    "match_all": {} // <========== Define relevant filters
  },
  "script_fields": {
    "limited_ips": { // <========= Add a new scripted field
      "script": {
        "source": "params['_source'].ips.stream().limit(2).collect(Collectors.toList())" // <==== Replace 2 with the number of IPs you want in the result.
      }
    }
  }
}
Note:
If you remove _source then only the scripted field will be part of the result.
Apart from accessing the value of the field, the rest of the syntax is Java. Change it as it suits you.
For anything other than analyzed text fields, use doc['fieldName'] to access the field within the script; it is faster. See the excerpt below from the ES docs:

By far the fastest most efficient way to access a field value from a script is to use the doc['field_name'] syntax, which retrieves the field value from doc values. Doc values are a columnar field value store, enabled by default on all fields except for analyzed text fields.
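A sketch of the same truncation using doc values instead of _source, assuming ips is indexed as a keyword field (note that doc values for a multi-valued field come back sorted and deduplicated, unlike the original _source order):

"script_fields": {
  "limited_ips": {
    "script": {
      "source": "doc['ips'].stream().limit(2).collect(Collectors.toList())"
    }
  }
}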
By default ES returns only 10 matching documents, so I am not sure what your search query is and what exactly you want to restrict:
the number of elements in a single IP field, or
the number of IP fields matching your search results.
Please clarify the above and provide your search query to help further.
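If it is the number of matching documents you want to cap, the standard size parameter already does that, no script needed:

GET ips/_search
{
  "size": 10,
  "query": { "match_all": {} }
}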

MongoDb replaceOne() but wanted few field

replaceOne() replaces the whole object with a new object. What if I want to keep a few fields of the old object as they are, and replace the rest with the new object?
replaceOne will replace the whole document with the one you pass in. You need to use updateOne instead; it will only update the specific fields you want, and the rest of the fields will stay as they are.
For example
db.restaurant.updateOne(
  { "name" : "Central Perk Cafe" },
  { $set: { "violations" : 3 } },
  { upsert: true }
);
Here the upsert: true option will insert a new document if one doesn't exist. You can read more about it here: https://docs.mongodb.com/manual/reference/method/db.collection.updateOne/
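A sketch of the difference, assuming an existing document with an extra borough field (the field name is just for illustration):

// before:                                    { name: "Central Perk Cafe", borough: "Manhattan" }
// after updateOne with $set:                 { name: "Central Perk Cafe", borough: "Manhattan", violations: 3 }
// after replaceOne with { "violations": 3 }: { violations: 3 }  (name and borough are gone)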
You can also use update. If your document is {name:"company",country:"India",owner:"me"}:
db.company.update(
  { _id: *id* },
  { $set: {
      name: "mycompany",
      country: "United Kingdom"
  }}
)

How can I filter results by custom field in Solr query?

I need a custom field filter for my Solr data, like:
{
  "id": "1",
  "name": "Test title",
  "language": "en"
},
{
  "id": "2",
  "name": "Test title",
  "language": "fr",
  "parent": "1"
}
I need to get just the first item for the query
/select?q=name:test
So I need to filter results by the parent field in such a way that only one of the items will be present in the result.
Thanks for any ideas.
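One idea, assuming the item you want to keep is the one without a parent (as in the sample data above): exclude every document that has a parent set, using a filter query:

/select?q=name:test&fq=-parent:[* TO *]

The -parent:[* TO *] clause matches only documents where the parent field does not exist.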
When I needed to do queries in Solr I used SearchQuery(), and inside it I set filterQueries. There it was possible to set filters for my search.
final String FIELD_NAME = "name_text_mv"; // name of my field in Solr
SearchQuery searchQuery = init(facetSearchConfig); // init configs
searchQuery.setFreeTextQueryBuilder(text); // set the text of my search
setFiltersFreeTextSearch(searchQuery.getFilterQueries(), text, FIELD_NAME);
The function to make the magic (add in my search my filters):
private void setFiltersFreeTextSearch(List<QueryField> filters, String text, String... fields) {
    text = StringUtils.stripAccents(text).toLowerCase();
    String textCapitalized = capitalizeEachWolrd(text.toLowerCase());
    for (String field : fields) {
        QueryField queryField = new QueryField(field, SearchQuery.Operator.OR, SearchQuery.QueryOperator.CONTAINS,
                text, text.toUpperCase(), textCapitalized);
        filters.add(queryField);
    }
}
As you can see, in this QueryField you can add the 'wheres' of your search in Solr. I was using CONTAINS, which is my 'LIKE', and 'OR' to find any item.
So basically you can use QueryField() to add filters for your specific field.
Well, this was the solution for my case; anyway, it's just an idea. :)
(The project uses Java.)

Lunr - gatsby-plugin-lunr - Can I alter data / index on build?

I've got a Gatsby-Sanity project that needs a search component. For this I thought of using gatsby-plugin-lunr. I ran into a problem: my nodes are multilingual. For example, one of my fields is constructed like:
"title": {
"_type": "localeString",
"nl": "Begin ",
"en": "Home "
},
(The parser is, in short, like the following: if an object has a key _type that starts with 'locale', then return only the value of the key en or nl, which is passed in via a variable.)
I could make a parser that splits/strips the data. I've got this sort of working (not yet successfully) inside the component that runs the search query against the search index. But that would mean it parses on every search. Is there a way to do this at build time in gatsby-node.js with a lunr plugin? I also need this because I need to add a language prefix to the slug/path of each result.
const SearchProcess = lunr => builder => {
  // how to pre-process data
}
I'm going with a different Gatsby plugin: gatsby-plugin-local-search.
This plugin is able to alter the data before saving it, with a normalizer. Now I can call a method to conditionally alter the data per language.
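A minimal sketch of what that looks like in gatsby-config.js; the GraphQL query, node type (allSanityPage) and field names are assumptions based on the localeString structure above:

{
  resolve: 'gatsby-plugin-local-search',
  options: {
    name: 'pages',
    engine: 'lunr',
    query: `{ allSanityPage { nodes { id title { nl en } slug { current } } } }`,
    ref: 'id',
    index: ['title'],
    store: ['id', 'title', 'path'],
    normalizer: ({ data }) =>
      data.allSanityPage.nodes.map((node) => ({
        id: node.id,
        title: node.title.en, // pick the language per build/locale here
        path: '/en/' + node.slug.current, // language prefix on the slug
      })),
  },
}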

mongodb - retrieve array subset

What seemed a simple task turned out to be a challenge for me.
I have the following mongodb structure:
{
  (...)
  "services": {
    "TCP80": {
      "data": [{
        "status": 1,
        "delay": 3.87,
        "ts": 1308056460
      }, {
        "status": 1,
        "delay": 2.83,
        "ts": 1308058080
      }, {
        "status": 1,
        "delay": 5.77,
        "ts": 1308060720
      }]
    }
  }
}
Now, the following query returns the whole document:
{ 'services.TCP80.data.ts': {$gt: 1308067020} }
I wonder: is it possible to receive only those "data" array entries matching the $gt criteria (a kind of shrunken doc)?
I was considering MapReduce, but could not locate even a single example of how to pass external arguments (the timestamp) to the Map() function. (This feature was added in 1.1.4: https://jira.mongodb.org/browse/SERVER-401)
Also, there's always the alternative of writing a stored JS function, but since we are speaking of large quantities of data, db-locks can't be tolerated here.
Most likely I'll have to redesign the structure to something 1-level deep, like:
{
  status: 1, delay: 3.87, ts: 1308056460, service: "TCP80"
}, {
  status: 1, delay: 2.83, ts: 1308058080, service: "TCP80"
}, {
  status: 1, delay: 5.77, ts: 1308060720, service: "TCP80"
}
but the DB will grow dramatically, since "service" is only one of many options that will be appended to each document.
Please advise!
Thanks in advance.
In version 2.1 with the aggregation framework you are now able to do this:
1: db.test.aggregate(
2:   {$match : {}},
3:   {$unwind: "$services.TCP80.data"},
4:   {$match: {"services.TCP80.data.ts": {$gte: 1308060720}}}
5: );
You can use custom criteria in line 2 to filter the parent documents. If you don't want to filter them, just leave line 2 out.
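If you want the matching entries collected back into a single document per parent, rather than one result per array element, a $group stage can re-assemble them; a sketch:

db.test.aggregate(
  {$unwind: "$services.TCP80.data"},
  {$match: {"services.TCP80.data.ts": {$gte: 1308060720}}},
  {$group: {_id: "$_id", data: {$push: "$services.TCP80.data"}}}
);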
This is not currently supported. By default you will always receive the whole document/array unless you use field restrictions or the $slice operator. Currently these tools do not allow filtering the array elements based on the search criteria.
You should watch this request for a way to do this: https://jira.mongodb.org/browse/SERVER-828
I'm attempting to do something similar. I tried your suggestion of using the GROUP function, but I couldn't keep the embedded documents separate or was doing something incorrectly.
I needed to pull/get a subset of embedded documents by ID. Here's how I did it using Map/Reduce:
db.parent.mapReduce(
  function(parent_id, child_ids){
    if (this._id == parent_id)
      emit(this._id, {children: this.children, ids: child_ids});
  },
  function(key, values){
    var toReturn = [];
    values[0].children.forEach(function(child){
      if (values[0].ids.indexOf(child._id.toString()) != -1)
        toReturn.push(child);
    });
    return {children: toReturn};
  },
  {
    mapparams: [
      "4d93b112c68c993eae000001", //example parent id
      ["4d97963ec68c99528d000007", "4debbfd5c68c991bba000014"] //example embedded children ids
    ]
  }
).find()
I've abstracted my collection name to 'parent' and it's embedded documents to 'children'. I pass in two parameters: The parent document ID and an array of the embedded document IDs that I want to retrieve from the parent. Those parameters are passed in as the third parameter to the mapReduce function.
In the map function I find the parent document in the collection (which I'm pretty sure uses the _id index) and emit its id and children to the reduce function.
In the reduce function, I take the passed in document and loop through each of the children, collecting the ones with the desired ID. Looping through all the children is not ideal, but I don't know of another way to find by ID on an embedded document.
I also assume in the reduce function that there is only one document emitted, since I'm searching by ID. If you expect more than one parent_id to match, then you will have to loop through the values array in the reduce function.
I hope this helps someone out there, as I googled everywhere with no results. Hopefully we'll see a built in feature soon from MongoDB, but until then I have to use this.
Fadi, as for "keeping embedded documents separate" - group should handle this with no issues:
function getServiceData(collection, criteria) {
  var res = db[collection].group({
    cond: criteria,
    initial: {vals: [], globalVar: 0},
    reduce: function(doc, out) {
      if (out.globalVar % 2 == 0)
        out.vals.push(doc.whatever.kind.and.depth);
      out.globalVar++;
    },
    finalize: function(out) {
      if (out.vals.length == 0)
        out.vals = 'sorry, no data';
      return out.vals;
    }
  });
  return res[0];
};
