Copy fields on nested child documents? Ngram search on multiple child docs to find a specific parent - solr

I am transforming JSON objects and indexing them in Solr 9. When resolving lists/arrays of objects, I use nested child documents, so array elements are stored as documents of their own.
Now I have run into an issue: I would like to use copy fields on nested child documents and store the copied value in the parent.
JSON:
{
  "legalName": "Some Name",
  "address": {
    "street": "Bala Street",
    "houseNr": 13,
    "city": "Random City",
    "postalCode": 1234,
    "country": "NL"
  },
  "otherLegalNames": [
    {
      "text": "TEXT IN EN",
      "lang": "EN"
    },
    {
      "text": "TEXT IN DE",
      "lang": "DE"
    },
    {
      "text": "TEXT IN NL",
      "lang": "NL"
    }
  ]
}
When indexing this object, I flatten simple structs like address but keep arrays such as otherLegalNames, storing their elements as child documents.
Basically, the documents look like this (q=*:*&fl=*,[child]):
{
  "id": "5493006O42CR4SHELP26",
  "legalName": "Some Name",
  "address.street": "Bala Street",
  "address.houseNr": 13,
  "address.city": "Random City",
  "address.postalCode": 1234,
  "address.country": "NL",
  "otherLegalNames": [
    {
      "id": "5493006O42CR4SHELP26/otherLegalNames#0",
      "text": "TEXT IN EN",
      "lang": "EN"
    },
    {
      "id": "5493006O42CR4SHELP26/otherLegalNames#1",
      "text": "TEXT IN DE",
      "lang": "DE"
    },
    {
      "id": "5493006O42CR4SHELP26/otherLegalNames#2",
      "text": "TEXT IN NL",
      "lang": "NL"
    }
  ]
}
Now I would like to search for these docs by legal name, which means searching not only the parent's legalName field but also all the text fields stored under otherLegalNames. During my research I found that copy fields are the way to go, but I am not sure how copy fields interact with child documents.
My goal would be to get a searchableLegalNames field with the value ["Some Name", "TEXT IN EN", "TEXT IN DE", "TEXT IN NL"] or similar, so I can perform an ngram-based search on legal names across every language.
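Something like the following Schema API call is what I have in mind (text_ngram stands in for my ngram field type; the second copy rule uses the text field of the child documents). As far as I understand, though, a copy field operates within a single document, so that rule would copy into each child document rather than into the parent:
{
  "add-field": {
    "name": "searchableLegalNames",
    "type": "text_ngram",
    "multiValued": true,
    "indexed": true,
    "stored": false
  },
  "add-copy-field": { "source": "legalName", "dest": "searchableLegalNames" },
  "add-copy-field": { "source": "text", "dest": "searchableLegalNames" }
}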
Is this possible to achieve with copy fields, or are child documents not supported for this purpose? If not, how should I restructure my schema? Flattening every legal name into the parent is hard, because the array might be empty or contain an arbitrary number of otherLegalNames entries.
Thanks.
Regards, Artur

Related

Manipulate field value of copy-field in Apache Solr

I have a simple string field, PART_NUMBER, in Solr. I would like to add an additional field that wraps that value in a URL. To do this, I created a new field type, a field, and a copy field:
"add-field-type": {
"name": "endpoint_url",
"class": "solr.TextField",
"positionIncrementGap": "100",
"analyzer": {
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.PatternReplaceFilterFactory",
"pattern": "([\\s\\S]*)",
"replacement": "http://myurl/$1.jpg"
}
]
}
},
"add-field": {
"name": "URL",
"type": "endpoint_url",
"stored": true,
"indexed": true
},
"add-copy-field":{ "source":"PART_NUMBER", "dest":"URL" }
As some of you probably guessed, my query output looks like
{
  "id": "1",
  "PART_NUMBER": "ABCD1234",
  "URL": "ABCD1234",
  "_version_": 1645658574812086272
}
This is because the endpoint_url field type only modifies the indexed terms, not the stored value. Indeed, when I run the value through analysis, I get
http://myurl/ABCD1234.jpg
My question: is there any way to apply a tokenizer or filter and feed the result back into the stored field value? I would prefer this output when returning the result:
{
  "id": "1",
  "PART_NUMBER": "ABCD1234",
  "URL": "http://myurl/ABCD1234.jpg",
  "_version_": 1645658574812086272
}
Is this possible to do in Solr?
The solution was posted here:
Custom Solr analyzers not being used during indexing
I need to use an update request processor in order to change the field value before analysis. The process is documented here:
https://lucene.apache.org/solr/guide/8_1/update-request-processors.html
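For reference, a rough sketch of such a chain via the Config API (untested; the core name and processor names are placeholders of mine, and I use ^(.+)$ instead of ([\s\S]*) so the trailing empty match cannot trigger a second replacement):
curl http://localhost:8983/solr/mycore/config -H 'Content-type:application/json' -d '{
  "add-updateprocessor": {
    "name": "cloneUrl",
    "class": "solr.CloneFieldUpdateProcessorFactory",
    "source": "PART_NUMBER",
    "dest": "URL"
  },
  "add-updateprocessor": {
    "name": "buildUrl",
    "class": "solr.RegexReplaceProcessorFactory",
    "fieldName": "URL",
    "pattern": "^(.+)$",
    "replacement": "http://myurl/$1.jpg"
  }
}'
Indexing with the request parameter processor=cloneUrl,buildUrl should then store the full URL in URL before analysis runs.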

How to do a NoSQL linked query

I have a NoSQL (Cloudant) database.
- Within the database we have documents where one of the fields represents the "table" (type of document).
- Within the documents we have fields that represent links to other documents within the database.
For example:
{_id: 111, table:main, user_id:222, field1:value1, other1_id: 333}
{_id: 222, table:user, first:john, other2_id: 444}
{_id: 333, table:other1, field2:value2}
{_id: 444, table:other2, field3:value3}
We want a way of searching for _id:111, with the result being one document containing data from the linked tables:
{_id:111, user_id:222, field1:value1, other1_id: 333, first:john, other2_id: 444, field2:value2, field3:value3}
Is there a way to do this?
There is flexibility in how we store or get the data back; any suggestions on how to better structure the data to make this possible?
The first thing to say is that there are no joins in Cloudant. If your schema relies on lots of joining, then you're working against the grain of Cloudant, which may mean extra complication for you or performance hits.
There is a way to de-reference other documents' ids in a MapReduce view. Here's how it works:
- create a MapReduce view that emits the main document's body and its linked documents' ids in the form { _id: 'linkedid' }
- query the view with include_docs=true to pull back the document AND the de-referenced ids in one go
In your case, a map function like this:
function(doc) {
  if (doc.table === 'main') {
    emit(doc._id, doc);
    if (doc.user_id) {
      emit(doc._id + ':user', { _id: doc.user_id });
    }
  }
}
would allow you to pull back the main document and its linked user document in one API call by hitting the GET /mydatabase/_design/mydesigndoc/_view/myview?startkey="111"&endkey="111z"&include_docs=true endpoint:
{
  "total_rows": 2,
  "offset": 0,
  "rows": [
    {
      "id": "111",
      "key": "111",
      "value": {
        "_id": "111",
        "_rev": "1-5791203eaa68b4bd1ce930565c7b008e",
        "table": "main",
        "user_id": "222",
        "field1": "value1",
        "other1_id": "333"
      },
      "doc": {
        "_id": "111",
        "_rev": "1-5791203eaa68b4bd1ce930565c7b008e",
        "table": "main",
        "user_id": "222",
        "field1": "value1",
        "other1_id": "333"
      }
    },
    {
      "id": "111",
      "key": "111:user",
      "value": {
        "_id": "222"
      },
      "doc": {
        "_id": "222",
        "_rev": "1-6a277581235ca01b11dfc0367e1fc8ca",
        "table": "user",
        "first": "john",
        "other2_id": "444"
      }
    }
  ]
}
Notice how we get two rows back: the first is the main document body, the second the linked user.
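The same pattern extends to the other linked id on the main document (other1_id); a sketch of the extended map function:
function(doc) {
  if (doc.table === 'main') {
    // emit the main document itself
    emit(doc._id, doc);
    // emit stubs whose _id points at the linked documents
    if (doc.user_id) {
      emit(doc._id + ':user', { _id: doc.user_id });
    }
    if (doc.other1_id) {
      emit(doc._id + ':other1', { _id: doc.other1_id });
    }
  }
}
Note that other2_id lives on the user document rather than the main one, and a map function only ever sees one document at a time, so following that link would need a second request (or a separate view over user documents).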

How to project an element of an array field in a MongoDB collection?

MongoDb Collection Example (Person):
{
  "id": "12345",
  "schools": [
    {
      "name": "A",
      "zipcode": "12345"
    },
    {
      "name": "B",
      "zipcode": "67890"
    }
  ]
}
Desired output:
{
  "id": "12345",
  "schools": [
    {
      "zipcode": "12345"
    },
    {
      "zipcode": "67890"
    }
  ]
}
My current partial code for retrieving all:
collection.find({}, {id: true, schools: true})
I am querying the entire collection, but I only want to return the zipcode part of each school element, not the other fields (the actual school object might contain much more data that I do not need). I could retrieve everything and strip the unneeded fields (like the school's name) in code, but that's not what I am looking for; I want to do it in the MongoDB query.
You can use the dot notation to project specific fields inside documents embedded in an array.
db.collection.find({},{id:true, "schools.zipcode":1}).pretty()
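Note that _id is still returned by default. If you want to suppress it and keep only the custom id field and the zip codes, a sketch against the same collection:
db.collection.find({}, { _id: 0, id: true, "schools.zipcode": true }).pretty()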

Issue when deleting items from an array in a MongoDB document

I am inserting log items into a document in the form of an array. I have restricted the document size to 5 MB to make sure it does not keep growing.
Each document contains one array, and all the log items are stored in that array. Let's say I have 500 log items stored in one 5 MB document as an array.
When I deleted 497 log items, the remaining 3 log items were still shown in the document, but when I tried to delete one of those 3, the entire document was deleted. I don't know what is happening.
Does the array in a document need some minimum amount of data?
Note: I am restricting the document size at my application level.
Here is the sample data:
activityLogDetails:
[
  {
    "activityLog": {
      "acctId": 1,
      "info1": { "itemName": "-", "value": "-" },
      "info2": { "itemName": "-", "value": "-" },
      "errorCode": "",
      "internalInformation": "",
      "kind": "Infomation",
      "loginId": "0",
      "opeLogId": "G1_1",
      "operation": "startDiscovery",
      "result": "normal",
      "targetId": "1",
      "timestamp": "1470980265729",
      "undoFlag": "false"
    }
  },
  {
    "activityLog": {
      "acctId": 2,
      "info1": { "itemName": "-", "value": "-" },
      "info2": { "itemName": "-", "value": "-" },
      "errorCode": "",
      "internalInformation": "",
      "kind": "Infomation",
      "loginId": "0",
      "opeLogId": "G1_1",
      "operation": "startDiscovery",
      "result": "normal",
      "targetId": "1",
      "timestamp": "1470980265729",
      "undoFlag": "false"
    }
  },
  etc....
]
Delete Query:
db.test.remove({'activityLogDetails.activityLog.acctId': {$gt: 2}})
Could anybody tell me what the issue could be?
What you are doing in your query will remove whole documents, not just the matching array items.
Try the following query using $pull:
db.test.updateMany(
  // match documents that contain at least one offending log item
  { 'activityLogDetails.activityLog.acctId': { $gt: 2 } },
  // remove only the matching items from the array
  { $pull: { activityLogDetails: { 'activityLog.acctId': { $gt: 2 } } } }
)
Refer to $pull for more info on how to use it.
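For example, to remove one specific log item (say the one with acctId 2) without deleting the document, a sketch along the same lines:
db.test.updateMany(
  { 'activityLogDetails.activityLog.acctId': 2 },
  { $pull: { activityLogDetails: { 'activityLog.acctId': 2 } } }
)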

Identify documents in MongoDB when matching two key:value pairs within a single array

I am trying to identify documents where two key-value pairs within a single array element both match, using the aggregation pipeline. Specifically, I want to find documents where one array element contains user_attribute.Name = "Quests_In_Progress" and user_attribute.Value = "3". Below is an example of such a document that I'm trying to match.
If I use
db.myCollection.aggregate({
  $match: {
    "user_attribute.Name": "Quests_In_Progress",
    "user_attribute.Value": "3"
  }
})
It will match every document that contains Quests_In_Progress for user_attribute.Name in one element of the array and contains "3" for user_attribute.Value, regardless of whether they exist in the same element of the array or not.
i.e.
db.myCollection.aggregate({
  $match: {
    "user_attribute.Name": "Quests_In_Progress",
    "user_attribute.Value": "0"
  }
})
will match the same document, simply because one element of the array has Value: "0" and another element has Name: "Quests_In_Progress".
What I want to do is identify documents where both of those conditions are met within one element of the array.
I tried to do this with $elemMatch, but I couldn't get it to work. Plus the aggregate documentation doesn't indicate that $elemMatch works, so maybe that's why I couldn't get it to work.
Lastly, I need to use the aggregate pipeline, because there are a bunch of other things I have to do after finding these documents- specifically unwinding them.
{
  "_id": ObjectId("5555bb32de938ce667f78ce00"),
  "user_attribute": [
    { "Value": "Facebook", "Name": "Social_Connection" },
    { "Name": "Total_Fireteam_Missions_Initiated", "Value": "0" },
    { "Name": "Quests_Completed", "Value": "3" },
    { "Name": "Item_Slots_Owned", "Value": "36" },
    { "Name": "Quests_In_Progress", "Value": "3" },
    { "Name": "Player_Progression", "Value": "0" },
    { "Value": "1", "Name": "Characters_Owned" },
    { "Name": "Quests_Started", "Value": "6" },
    { "Name": "Total_Friends", "Value": "0" },
    { "Name": "Device_Type", "Value": "Phone" }
  ]
}
Try using $elemMatch:
db.myCollection.aggregate([
  { $match: { user_attribute: { $elemMatch: { Name: "Quests_In_Progress", Value: "0" } } } },
  { $out: "temp" }
])
That query will find everyone whose array contains an element with Name "Quests_In_Progress" and Value "0" in the same element, and write the results into the collection temp.
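Since the question mentions unwinding afterwards, a sketch of a fuller pipeline (using the Value "3" from the question's example); $elemMatch is ordinary query syntax, so it is valid inside $match:
db.myCollection.aggregate([
  // keep only documents where a single array element satisfies both conditions
  { $match: { user_attribute: { $elemMatch: { Name: "Quests_In_Progress", Value: "3" } } } },
  // split the array into one document per element
  { $unwind: "$user_attribute" },
  // keep only the matching elements for further processing
  { $match: { "user_attribute.Name": "Quests_In_Progress", "user_attribute.Value": "3" } }
])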