I have a normalized SQLite database that I want to convert into a single-collection MongoDB database.
Let's take an example. Suppose my SQLite database looks like this:
[image: SQLite database tables]
I want my MongoDB database to look like this:
{
"_id" : ObjectId("5efdf2c2b268674c2bf74e85"),
"firstName": "robert",
"lastName": "kas",
"addresses":[
{
"address": "This is address 1",
"type": "home",
},
{
"address": "This is address 2",
"type": "work",
}
]
}
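For reference, here is a minimal migration sketch in Node.js. Since the table image above is not available, the persons and addresses table and column names below are assumptions, as are the database and collection names; it uses the better-sqlite3 and mongodb packages.

const Database = require('better-sqlite3');
const { MongoClient } = require('mongodb');

async function migrate() {
  const sqlite = new Database('people.db'); // assumed file name
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const people = client.db('mydb').collection('people');

  // One MongoDB document per person, with the person's address rows
  // embedded as an array of subdocuments.
  for (const p of sqlite.prepare('SELECT id, firstName, lastName FROM persons').all()) {
    const addresses = sqlite
      .prepare('SELECT address, type FROM addresses WHERE person_id = ?')
      .all(p.id); // rows already have the { address, type } shape
    await people.insertOne({
      firstName: p.firstName,
      lastName: p.lastName,
      addresses
    });
  }
  await client.close();
}

migrate().catch(console.error);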
I am transforming JSON objects and indexing them in Solr 9. When resolving lists/arrays of objects, I am using nested child documents, so array elements are stored as their own documents.
Now I have run into an issue: I would like to use copy fields on nested child documents and store the value in the parent.
JSON:
{
"legalName": "Some Name",
"addresss": {
"street": "Bala Street",
"houseNr": 13,
"city": "Random City",
"postalCode": 1234,
"country": "NL"
},
"otherLegalNames": [
{
"text": "TEXT IN EN",
"lang": "EN"
},
{
"text": "TEXT IN DE",
"lang": "DE"
},
{
"text": "TEXT IN NL",
"lang": "NL"
}
]
}
When indexing this object, I flatten basic structs like address but keep arrays, e.g., otherLegalNames, and store them as child docs.
Basically, the documents look like this (q=*:*&fl=*,[child]):
{
"id": "5493006O42CR4SHELP26",
"legalName": "Some Name",
"addresss.street": "Bala Street",
"addresss.houseNr": 13,
"addresss.city": "Random City",
"addresss.postalCode": 1234,
"addresss.country": "NL",
"otherLegalNames": [
{
"id": "5493006O42CR4SHELP26/otherLegalNames#0",
"text": "TEXT IN EN",
"lang": "EN"
},
{
"id": "5493006O42CR4SHELP26/otherLegalNames#1",
"text": "TEXT IN DE",
"lang": "DE"
},
{
"id": "5493006O42CR4SHELP26/otherLegalNames#2",
"text": "TEXT IN NL",
"lang": "NL"
}
]
}
Now I would like to search for these docs by their legalName and must therefore search in the parent legalName field but also include all text fields stored under otherLegalNames. During research, I found that copy fields are the way to go, but I am not sure how I would handle child documents with such copy fields.
My goal would be to get a searchableLegalNames field with the value ["Some Name", "TEXT IN EN", "TEXT IN DE", "TEXT IN NL"] or similar, to perform an ngram-based search on legalName covering every language.
Is this possible to achieve with copy fields, or are child documents not supported for this purpose? If not, how should I restructure my schema? It's really hard to flatten every legal name, as the array might be empty or contain an arbitrary number of otherLegalNames.
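One workaround, in case copy fields turn out not to span child documents: build the combined field on the client before indexing. A minimal sketch, assuming the searchableLegalNames field name from above and that you control the indexing pipeline:

// Combine the parent name with all child texts before sending the
// document to Solr; handles an empty or missing otherLegalNames array.
function withSearchableLegalNames(doc) {
  const names = [doc.legalName, ...(doc.otherLegalNames || []).map((n) => n.text)];
  return { ...doc, searchableLegalNames: names };
}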
I'm trying to parse some data in NiFi (1.7.1) using the UpdateRecord processor.
The original data are JSON files that I would like to convert to Avro, based on a schema.
The Avro conversion is OK, but during that conversion I also need to parse one array element from the JSON data into a different structure in Avro.
This is a sample of the input JSON:
{ "geometry" : {
"coordinates" : [ [ 4.963087975800593, 45.76365595859971 ], [ 4.962874487781098, 45.76320922779652 ], [ 4.962815443439148, 45.763116079159374 ], [ 4.962744732112515, 45.763010484202866 ], [ 4.962096825239138, 45.762112721939246 ] ]} ...}
Being its schema (specified in RecordReader):
{ "type": "record",
"name": "features",
"fields": [
{
"name": "geometry",
"type": {
"type": "record",
"name": "geometry",
"fields": [
{
"name": "coordinatesJson",
"type": {
"type": "array",
"items": {
"type": "array",
"items": "double"
}
}
}
]
}
},
....
]
}
As you can see, coordinates is an array of arrays.
And I need to parse those data to Avro, based on this schema (specified in RecordWriter):
{
"name": "outputdata",
"type": "record",
"fields": [
{"name": "coordinatesAvro",
"type": {
"type": "array",
"items" : {
"type" : "record",
"name" : "coordinatesAvro",
"fields" : [ {
"name" : "X",
"type" : "double"
}, {
"name" : "Y",
"type" : "double"
} ]
}
}
},
.....
]
}
The problem here is that I'm not able to parse from coordinatesJson to coordinatesAvro using RecordPath functions.
I tried several mappings, like:
Property                       Value
/coordinatesJson[0..-1]/X      /geometry/coordinatesAvro[*][0]
/coordinatesJson[0..-1]/Y      /geometry/coordinatesAvro[*][1]
It should be a pretty straightforward parsing step, but as I said, I've been going in circles trying to achieve this for a while.
Any help would be really appreciated.
When I run into something like that, I do the following:
1) Transform the JSON into JSON with the structure that I need (in your case: coordinatesAvro) using an ExecuteScript processor. I have used ECMAScript because you can simply parse the JSON and work with the objects (transform them).
2) ConvertJSONToAvro with one common schema (coordinatesAvro in your case) for both Reader and Writer.
This works very well and I have used it on big-data cases. It is one possible resolution for your problem.
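To illustrate step 1, the core reshaping could look like this in ECMAScript (a sketch of the transform only; the ExecuteScript flowfile and session plumbing is omitted):

// Turn the array-of-arrays into an array of {X, Y} records,
// matching the writer schema above.
function reshape(input) {
  return {
    coordinatesAvro: input.geometry.coordinates.map(function (pair) {
      return { X: pair[0], Y: pair[1] };
    })
    // ...other fields elided, as in the schemas above
  };
}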
I have the following users collection:
{
"type": "provider",
"name": "user name",
"username": "username",
"password": "$2y$10$D3z0tLwOwB0tqPEnl63VuexOwqcR75QkVILemB1.TEsAJlk6Ixwim",
"specialties": [
"specialty 1",
"specialty 2"
]
}
and a specialties collection :
{
"_id": "5b26103b2df243228c0003ea",
"title": "specialty 1",
"description": "specialti 1 desc"
},
{
"_id": "5b26103b2df243228c0003ea",
"title": "specialty 2",
"description": "specialti 2 desc"
}
The relation between them is embeds-many, and here is the relationship in my User model:
public function specialties()
{
return $this->embedsMany(Specialty::class, 'specialties', 'title');
}
I want to filter the users by specialty. For example, the above JSON user object should be returned if the filtered specialty is "specialty 1".
I know about non-embedded collections, but my data is already saved in my database and I cannot change the schema.
Are there any alternative solutions?
The solution was in the documentation, under MongoDB-specific operators.
In my case, the answer is:
$providers = User::where('specialties', 'all', ['specialty 1'])->with('s_specialties')->get();
This code translates to the $all operator in MongoDB (which, with a single value, behaves like $in). More about operators.
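For reference, the raw query this builds is roughly the following in the mongo shell (assuming the collection is named users):

db.users.find({ specialties: { $all: ["specialty 1"] } })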
I have a NoSQL (Cloudant) database.
- Within the database we have documents where one of the document fields represents the "table" (type of document).
- Within the documents we have fields that represent links to other documents within the database.
For example:
{_id: 111, table:main, user_id:222, field1:value1, other1_id: 333}
{_id: 222, table:user, first:john, other2_id: 444}
{_id: 333, table:other1, field2:value2}
{_id: 444, table:other2, field3:value3}
We want a way of searching for _id:111 and getting back one document with data from the linked tables:
{_id:111, user_id:222, field1:value1, other1_id: 333, first:john, other2_id: 444, field2:value2, field3:value3}
Is there a way to do this?
There is flexibility in how we store or get the data back, so any suggestions on how to better structure the data to make this possible are welcome.
The first thing to say is that there are no joins in Cloudant. If your schema relies on lots of joining, then you're working against the grain of Cloudant, which may mean extra complication for you or performance hits.
There is a way to de-reference other documents' ids in a MapReduce view. Here's how it works:
create a MapReduce view to emit the main document's body and its linked document's ids in the form { _id: 'linkedid'}
query the view with include_docs=true to pull back the document AND the de-referenced ids in one go
In your case, a map function like this:
function(doc) {
  if (doc.table === 'main') {
    // emit the main document itself under its own id
    emit(doc._id, doc);
    if (doc.user_id) {
      // emit a link to the user document; include_docs=true
      // de-references { _id: ... } values into the doc field
      emit(doc._id + ':user', { _id: doc.user_id });
    }
  }
}
would allow you to pull back the main document and its linked user document in one API call by hitting the GET /mydatabase/_design/mydesigndoc/_view/myview?startkey="111"&endkey="111z"&include_docs=true endpoint:
{
"total_rows": 2,
"offset": 0,
"rows": [
{
"id": "111",
"key": "111",
"value": {
"_id": "111",
"_rev": "1-5791203eaa68b4bd1ce930565c7b008e",
"table": "main",
"user_id": "222",
"field1": "value1",
"other1_id": "333"
},
"doc": {
"_id": "111",
"_rev": "1-5791203eaa68b4bd1ce930565c7b008e",
"table": "main",
"user_id": "222",
"field1": "value1",
"other1_id": "333"
}
},
{
"id": "111",
"key": "111:user",
"value": {
"_id": "222"
},
"doc": {
"_id": "222",
"_rev": "1-6a277581235ca01b11dfc0367e1fc8ca",
"table": "user",
"first": "john",
"other2_id": "444"
}
}
]
}
Notice how we get two rows back: the first is the main document body, the second is the linked user.
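The same pattern extends to the other first-level links on the main document, e.g. an extra emit next to the user one (note that a map function only sees one document at a time, so a second-level link like other2_id, which lives on the user doc, cannot be followed from here):

if (doc.other1_id) {
  // de-referenced the same way when include_docs=true is set
  emit(doc._id + ':other1', { _id: doc.other1_id });
}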
MongoDB collection example (Person):
{
"id": "12345",
"schools": [
{
"name": "A",
"zipcode": "12345"
},
{
"name": "B",
"zipcode": "67890"
}
]
}
Desired output:
{
"id": "12345",
"schools": [
{
"zipcode": "12345"
},
{
"zipcode": "67890"
}
]
}
My current partial code for retrieving all:
collection.find({}, {id: true, schools: true})
I am querying the entire collection, but I only want to return the zipcode part of each school element, not the other fields (because the actual school object might contain much more data that I do not need). I could retrieve everything and strip the unneeded fields (like the school "name") in code, but that's not what I am looking for. I want to do it in the MongoDB query.
You can use the dot notation to project specific fields inside documents embedded in an array.
db.collection.find({},{id:true, "schools.zipcode":1}).pretty()
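Note that the top-level _id field is returned by default; if it is unwanted as well, it has to be excluded explicitly (a small addition to the answer above):

// Suppress the automatic _id, keep only id and the schools' zipcodes.
db.collection.find({}, { _id: 0, id: 1, "schools.zipcode": 1 }).pretty()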