Fix data post API call to fix inconsistencies, like typos? - arrays

Ok, so this is going to be a complicated question, I hope I'm clear. Full admission, I just finished a Bootcamp yesterday so I'm not aware of a lot of technologies out there, and I think I may need additional technologies to accomplish what I'm looking for...
Right now, I have an application that uses bandsintown API call to populate a database. What I've noticed is that bandsintown isn't consistent with their data returns in each object, which makes operations after retrieving the objects difficult/seemingly impossible. An example would be that different artists performing at the same venue returns different latitude, longitude, venue name, etc. Examples:
Here is Primus playing at Bonnaroo:
{
"offers": [],
"venue": {
"country": "United States",
"city": "Manchester",
"latitude": "35.4839582",
"name": "Bonnaroo Music and Arts Festival 2020",
"location": "",
"region": "TN",
"longitude": "-86.08963169999998"
},
"datetime": "2020-09-25T12:00:00",
"on_sale_datetime": "",
"description": "",
"lineup": [
"Primus"
],
"bandsintown_plus": false,
"id": "1020701795",
"title": "",
"artist_id": "1263",
"url": "https://www.bandsintown.com/e/1020701795?app_id=451f31b2808001d069daed45c32a9dac&came_from=267&utm_medium=api&utm_source=public_api&utm_campaign=event"
}
compared to The Weeknd playing at Bonnaroo:
{
"id": "18604416",
"url": "https://www.bandsintown.com/e/18604416?app_id=451f31b2808001d069daed45c32a9dac&came_from=267&utm_medium=api&utm_source=public_api&utm_campaign=event",
"datetime": "2017-05-17T19:00:00",
"title": "",
"description": "",
"venue": {
"location": "",
"name": "Bonnaroo",
"latitude": "35.476247",
"longitude": "-86.081026",
"city": "Manchester",
"country": "United States",
"region": "TN"
},
"lineup": [
"The Weeknd"
],
"offers": [],
"artist_id": "1371750",
"on_sale_datetime": "",
"bandsintown_plus": false
}
My issue is now I wish to aggregate and $group in MongoDB because both events were at Bonnaroo, but the Object{venue.name} is not the same... Even the latitude & longitude is different so I can't use those either. I'm wondering if there is a way to alter the data of the objects automatically without having to go into the DB and edit individual objects. Both these events include the word Bonnaroo, so could I have something find and match text and then slice out the text that isn't similar? If so, can I then use the matched venue name field as a reference to change the latitude & longitude values too?
I hope I was clear, feel free to ask any clarifying questions if I wasn't. This site has helped me so many times and I appreciate all the hard work the community puts in to help each other! Thanks ahead of time!
~~~EDIT~~~
Thanks for the first reply #morad takhtameshloo.
So I was able to build something before I saw your reply that splits the data into an array, which is along the same lines as what you offered. The only thing that won't work is the $arrayToElem with the index cause there are some venues that:
Have multiple-word names (e.g. The Stone Pony)
Have words before the actual venue name (saw it in one result that was like
"Verizon Live Presents at The Stony Pony")
Using this Bonnaroo example, I have the new field returning every word as a value in the array:
"venueName": ["Bonnaroo", "Music", "and","Arts","Festival","2020"]
My next step is going to be to compare the [venueName] of the 'Primus' object and the 'The Weeknd' object, find what values in the array are the same, and return them back to the value of "venueName".
Hope this makes more sense, I appreciate your input!

actual the trick depends to your data, you should provide more data if the ones you've provided does not depict the whole problem
in other words how deep you want to dive in.
for the dumbest answer, at least for the data you've provided
db.prod4.aggregate([
{
$addFields: {
venueName: {
$arrayElemAt: [{ $split: ['$venue.name', ' '] }, 0],
},
},
},
])
but that not the case of course, something that comes to mind is that venue's geolocations for the same venue should not be far away from each other, for instance, the data you've provided two locations are in 1.16 KM of each other.
so another dummy solution that works would be writing a simple script that selects a random element from the array of all data, and finds data that their lat/lng is for example in 2km of that point, and removes those elements from array and selects another random element from the array and do the same
if you provide more data it would be much more easier, because the easiest solution is to find many patterns and plan only for them

Related

Solr query for child documents and return parents and filtered children

I'm having trouble creating a Solr query to be able to pull out the right documents, and am starting to wonder if what I am trying to do is even possible.
Currently on Solr 8.9 using a managed schema and every field is using a wildcard field.
Firstly what the document looks like
(changed names due to redacting internal business language):
{
"id": "COUNTY:1",
"county_name_s": "Hertfordshire",
"coordinates_s": {
"id": "COUNTY:1COORDINATES:!",
"lat_s": "54.238948",
"long_s": "54.238948"
},
"cities": [
{
"id": "COUNTY:1CITY:1",
"city_name_s": "St Albans",
"size": {
"id": "COUNTY:1CITY:1SIZE:1",
"sq_ft_s": "100",
"sq_meters_s": "5879"
}
},
{
"id": "COUNTY:1CITY:2",
"city_name_s": "Watford",
"size": {
"id": "COUNTY:1CITY:2SIZE:2",
"sq_ft_s": "150",
"sq_meters_s": "10000"
}
}
],
"mayor": {
"title_s": "Mrs.",
"first_name_s": "Sheila",
"last_name_s": "Smith"
}
}
And what I want to return:
{
"id": "COUNTY:1",
"county_name_s": "Hertfordshire",
"coordinates": {
"id": "COUNTY:1COORDINATES:!",
"lat_s": "54.238948",
"long_s": "54.238948"
},
"cities": [
{
"id": "COUNTY:1CITY:1",
"city_name_s": "St Albans",
"size": {
"id": "COUNTY:1CITY:1SIZE:1",
"sq_ft_s": "100",
"sq_meters_s": "5879"
}
}
],
"mayor": {
"title_s": "Mrs.",
"first_name_s": "Sheila",
"last_name_s": "Smith"
}
}
Basically my goal is to return more or less the entire thing, however with filtering out one of the cities. For example, the condition for the city would be like city_name_s:"St Albans". So it's to say that I want the parent and all children, however if the child is in that array (ie cities array), then the given field (city_name_s) must equal my defined value, or we don't want that child.
Things I've tried:
I've basically tried two approaches here:
I've tried to play around with {!child} and {!parent} to get a result that I want. Currently I can only get something from City level or the entire thing as if the filter was not there at county level.
I've tried to change values for the childFilter option, with things like:
city_name_s:"St Albans" OR (*:* NOT city_name_s:[* TO *]) to try and say 'if field exists it should be this'.
Anyhow I'm starting to run out of ideas with this; been hacking away at it for the past couple of days and not really got any closer.
Thanks in advance for any help; bashing my head against the wall currently so any suggestions are more than welcome :)
I had a similar issue in solr 9.0.0 and this solved it for me: Apache Solr Filter on Child Documents
In your case, just add fl=*,[child childFilter=city_name_s:"St Albans"]

Parse JSON with wrong designed structure in Swift

I have to parse some really terrible designed JSON, and to be honest I have never faced with such one. The following is a simplified cut from the entire JSON file:
{
"5ee70183-87fe-4799-802e-ef7f5e7323db":
{
"title": "Bank 1",
"logo": "655ee02d87cf4cdf912c3507233b0520.gif"
},
"332c7078-97ad-4bf7-b8ee-44d85a9c88d1":
{
"title": "Bank 2",
"logo": "655ee02d87cf4cdf912c3507233b0520.gif"
},
"8e9bd4c8-6f4a-4663-ae86-b8fbaf295030":
{
"title": "Bank 3",
"logo": "655ee02d87cf4cdf912c3507233b0520.gif"
}
}
As you can see the "root" keys are some UUIDs. Those keys with values are supposed to be a list, but instead of using correct [] brackets for a list it's used {} wrong one. If I parse this using codables I have to create structs with UUID names, but what is worst this "list" is not fixed but go unlimited in theory. So my job is to parse this JSON and get an array of bank entities. As I'm shocked and confused at the moment I just think that I'm not able to use codables and need to parse this manually to a dictionary and get properties from there by assigning to the correct list item. If you ever faced with such an issue or know better parsing option, it will greatly help me to handle this.
You need
let res = try! JSONDecoder().decode([String:Root].self,from:data)
print(Array(res.values))
struct Root: Codable {
let title, logo: String
}

Query nested elements from unknown number of arrays

Querying to get an element from an array that is nested in an object is not a big deal. But, when you have a self-repeating structure that is nested within itself, it seems to become much more complicated. Here is an example document:
{
"_id": "1",
"name": "Alpha",
"options": [
{
"_id": "2",
"name": "Bravo",
"options": [
]
},
{
"_id": "3",
"name": "Charlie",
"options": [
{
"_id": "4",
"name": "Delta",
"options": [
...etc
]
}
]
}
]
}
As you can see, here we have a number of arrays holding objects that are alike. This is useful for, say, a tabbed interface being configured out of a DB.
What I'm looking to do is actually be able to modify the documents that are further down in the structure without knowing their positions in the ancestor arrays. So far I've only found the "$" update operator helpful when you don't have to deal with more than one level of arrays. But as you can see, there are arrays within objects that are inside of arrays here, so it becomes complicated.
Pushing to an array in the first level is not difficult:
db.myCollection.update({_id: "1", "options._id": "2"}, {$push: {"options.$.options": {...object definition here...}}});
This works. However, when you try to go deeper into the structure, I don't understand how to target the deeper portions of the document.
The following query is not working for me, and I'm hoping somebody could help me out:
db.myCollection.update({_id: "1", "options.options._id": "4"}, {"options.options.$.options": {...object definition here...}});
It whines about how I can't use dot syntax. Which makes sense, because in this case "options.options" is ambiguous. I was hoping that the query selection would have pointed to the right place.
tl;dr: How do you update document portions that are multiple levels deep nested within more than one array where you do not know the index of the target element? You can select it, so you should be able to modify it.

Text inside entities in Draft.js

I've been playing with the Entity system in Draft.js. One limitation I see is that entities have to correspond with a range of text in the content they are inserted into. I was hoping I could make a zero-length entity which would have a display based on the data in the entity rather than the text-content in the block. Is this possible?
This is possible when you have a whole block. As you can see in the code example this serialised blockMap contains a block containing no text, but the character list has one entry with an entity attached to it. There is also some discussion going on regarding adding meta-data to a block. see https://github.com/facebook/draft-js/issues/129
"blockMap": {
"80sam": {
"key": "80sam",
"type": "sticker",
"text": "",
"characterList": [
{
"style": [],
"entity": "1"
}
],
"depth": 0
},
},

searching an array deep inside a mongo document

in my mongo collection called pixels, I have documents like the sample
I'm looking for a way to search in the actions.tags part of the documents?
db.pixelsactifs.actions.find({tags:{$in : ["Environnement"]}})
db.pixelsactifs.find({actions.tags:{$in : {Environnement}})
doesn't work. I'm also looking for the PHP equivalent ?
I'm also asking myself should I make an "actions" collection instead of putting everything inside one document
I'm new to mongo so any good tutorial on structuring the db would be great
Thanks for the insight
{
"_id": { $oid": "51b98009e4b075a9690bbc71" },
"name": "open Atlas",
"manager": "Tib Kat",
"type": "Association",
"logo": "",
"description": "OPEN ATLAS",
"actions": [
{
"name": "Pixel Humain",
"tags": [ "Toutes thémathiques" ],
"description": "le PH agit localement",
"images": [],
"origine": "oui",
"website": "www.echolocal.org"
}
],
"email": "my#gmail.com",
"adress": "102 rue",
"cp": "97421",
"city": "Saint louis",
"country": "Réunion",
"phone": "06932"
}
you can try like this
collectionName->find(array("actions.tags" => array('$in' => "Environnement")));
I do not think you need to maintain the actions in separate collection. NoSQL gives you more flexibility to do embed th document . Event it allows sub document also be indexed . True power of NoSQL comes with merging the document into each other to get the faster retrieval. The only short coming I can see here , you can not get the part of sub document . find will always return the complete Parent document. In case you want to show one entry of subdocument array , it is not possible . It will return the whole subdocument and you have to filter in on the client side. So if you are planning to show action as individual to end user , it is better to have in separate collection
Read here : http://docs.mongodb.org/manual/use-cases/

Resources