Parse JSON with a badly designed structure in Swift - arrays

I have to parse some really terribly designed JSON, and to be honest I have never faced anything like it before. The following is a simplified excerpt from the entire JSON file:
{
"5ee70183-87fe-4799-802e-ef7f5e7323db":
{
"title": "Bank 1",
"logo": "655ee02d87cf4cdf912c3507233b0520.gif"
},
"332c7078-97ad-4bf7-b8ee-44d85a9c88d1":
{
"title": "Bank 2",
"logo": "655ee02d87cf4cdf912c3507233b0520.gif"
},
"8e9bd4c8-6f4a-4663-ae86-b8fbaf295030":
{
"title": "Bank 3",
"logo": "655ee02d87cf4cdf912c3507233b0520.gif"
}
}
As you can see, the "root" keys are UUIDs. Those key/value pairs are supposed to be a list, but instead of the correct [] brackets for a list, the wrong {} braces are used. If I parse this with Codable I would have to create structs named after the UUIDs, and what is worse, this "list" is not fixed and can grow without limit in theory. My job is to parse this JSON and get an array of bank entities. As I'm shocked and confused at the moment, I can only think that I can't use Codable and need to parse this manually into a dictionary and pull the properties out from there, assigning them to the correct list item. If you have ever faced such an issue or know a better parsing option, it would greatly help me handle this.

You need to decode the top level as a dictionary keyed by the UUID strings and then take its values:

struct Root: Codable {
    let title, logo: String
}

// The UUID keys become dictionary keys, so no struct has to be named after them.
let res = try! JSONDecoder().decode([String: Root].self, from: data)
print(Array(res.values))

Related

Is there a way to auto generate ObjectIds inside arrays of MongoDB?

Below is my MongoDB collection structure:
{
"_id": {
"$oid": "61efa44933eabb748152a250"
},
"title": "my first blog",
"body": "Hello everyone,wazuzzzzzzzzzzzzzzzzzup",
"comments": [{
"comment": "fabulous work bruhv",
}]
}
}
Is there a way to auto generate ids for comments without using something like this:
db.messages.insert({messages:[{_id:ObjectId(), message:"Message 1."}]});
I found the above method from the SO question:
mongoDB : Creating An ObjectId For Each New Child Added To The Array Field
But someone in the comments pointed out that:
"I have been looking at how to generate a JSON insert using ObjectID() and in my travels have found that this solution is not very good for indexing. The proposed solution has _id values that are strings rather than real object IDs - i.e. "56a970405ba22d16a8d9c30e" is different from ObjectId("56a970405ba22d16a8d9c30e") and indexing will be slower using the string. The ObjectId is actually represented internally as 16 bytes."
So is there a better way to do this?
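For what it's worth, here is the same idea with PyMongo against the collection above; this is only a minimal sketch (the database name, collection name, and filter are assumptions), the point being that bson's ObjectId() produces a real ObjectId rather than a string, so the indexing concern quoted above doesn't apply:

from bson import ObjectId
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
posts = client["blog"]["posts"]  # hypothetical database/collection names

# Push a new comment carrying its own ObjectId (generated client-side by the driver),
# so it is stored and indexed as a real ObjectId rather than a plain string.
posts.update_one(
    {"title": "my first blog"},
    {"$push": {"comments": {"_id": ObjectId(), "comment": "fabulous work bruhv"}}},
)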

Logic Apps - looping through a nested array in JSON

I need to loop through this optional array (it's the only section of the JSON I have trouble with).
As you can see from the code:
The optional bullseye object has an array rings. Each ring has an array expansionCriteria, and a ring may or may not have actions.
How do I iterate and get every type and threshold in expansionCriteria? I also need to access all skillsToRemove under actions, if available.
I am rather new to Logic Apps, so any help is appreciated.
"bullseye": {
"rings": [
{
"expansionCriteria": [
{
"type": "TIMEOUT_SECONDS",
"threshold": 180
}
],
"actions": {
"skillsToRemove": [
{
"name": "Claims Foundation",
"id": "60bd469a-ebab-4958-9ca9-3559636dd67d",
"selfUri": "/api/v2/routing/skills/60bd469a-ebab-4958-9ca9-3559636dd67d"
},
{
"name": "Claims Advanced",
"id": "bdc0d667-8389-4d1d-96e2-341e383476fc",
"selfUri": "/api/v2/routing/skills/bdc0d667-8389-4d1d-96e2-341e383476fc"
},
{
"name": "Claims Intermediate",
"id": "c790eac3-d894-4c00-b2d5-90cd8a69436c",
"selfUri": "/api/v2/routing/skills/c790eac3-d894-4c00-b2d5-90cd8a69436c"
}
]
}
},
{
"expansionCriteria": [
{
"type": "TIMEOUT_SECONDS",
"threshold": 5
}
]
}
]
}
Please let me know if you need more info.
To generate the schema, you can remove the object name at the top of the code ("bullseye":).
Thank you pramodvalavala-msft for posting your answer in MS Q&A for the similar thread.
"As you are working with a JSON Object instead of an Array, unfortunately there is no built-in function to loop over the keys. There is a feature request to add a method to extract keys from an object for scenarios like this, which you could upvote so it gains more traction.
You can use the inline code action to extract the keys from your object as an array (using Object.keys()). And then you can loop over this array using the foreach loop to extract the object that you need from the main object, which you could then use to create records in Dynamics."
For more information, you can refer to the links below:
How to loop and extract items from Nested Json Array in Logic Apps
Nested ForEach Loop in Workflow
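For clarity, this is the flattening those loops need to express, sketched in plain Python rather than Logic Apps syntax (the function and variable names are just placeholders):

def extract(payload: dict):
    """Collect every (type, threshold) pair and every skillsToRemove name under bullseye.rings."""
    criteria, skills = [], []
    for ring in payload.get("bullseye", {}).get("rings", []):
        for crit in ring.get("expansionCriteria", []):
            criteria.append((crit["type"], crit["threshold"]))
        # "actions" may be missing on a ring, so guard for it
        for skill in ring.get("actions", {}).get("skillsToRemove", []):
            skills.append(skill["name"])
    return criteria, skills

With the sample above this returns [("TIMEOUT_SECONDS", 180), ("TIMEOUT_SECONDS", 5)] for the criteria plus the three Claims skill names; in a Logic App the same shape roughly maps to a For each over rings, a nested For each over expansionCriteria, and a Condition guarding actions.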

Fix data after an API call to resolve inconsistencies, like typos?

Ok, so this is going to be a complicated question, I hope I'm clear. Full admission: I just finished a bootcamp yesterday, so I'm not aware of a lot of the technologies out there, and I think I may need additional technologies to accomplish what I'm looking for...
Right now, I have an application that uses the Bandsintown API to populate a database. What I've noticed is that Bandsintown isn't consistent with the data it returns for each object, which makes operations after retrieving the objects difficult/seemingly impossible. An example would be that different artists performing at the same venue return different latitude, longitude, venue name, etc. Examples:
Here is Primus playing at Bonnaroo:
{
"offers": [],
"venue": {
"country": "United States",
"city": "Manchester",
"latitude": "35.4839582",
"name": "Bonnaroo Music and Arts Festival 2020",
"location": "",
"region": "TN",
"longitude": "-86.08963169999998"
},
"datetime": "2020-09-25T12:00:00",
"on_sale_datetime": "",
"description": "",
"lineup": [
"Primus"
],
"bandsintown_plus": false,
"id": "1020701795",
"title": "",
"artist_id": "1263",
"url": "https://www.bandsintown.com/e/1020701795?app_id=451f31b2808001d069daed45c32a9dac&came_from=267&utm_medium=api&utm_source=public_api&utm_campaign=event"
}
compared to The Weeknd playing at Bonnaroo:
{
"id": "18604416",
"url": "https://www.bandsintown.com/e/18604416?app_id=451f31b2808001d069daed45c32a9dac&came_from=267&utm_medium=api&utm_source=public_api&utm_campaign=event",
"datetime": "2017-05-17T19:00:00",
"title": "",
"description": "",
"venue": {
"location": "",
"name": "Bonnaroo",
"latitude": "35.476247",
"longitude": "-86.081026",
"city": "Manchester",
"country": "United States",
"region": "TN"
},
"lineup": [
"The Weeknd"
],
"offers": [],
"artist_id": "1371750",
"on_sale_datetime": "",
"bandsintown_plus": false
}
My issue is that I now wish to aggregate and $group in MongoDB because both events were at Bonnaroo, but venue.name is not the same... Even the latitude & longitude are different, so I can't use those either. I'm wondering if there is a way to alter the data of the objects automatically without having to go into the DB and edit individual objects. Both these events include the word Bonnaroo, so could I have something find and match text and then slice out the text that isn't similar? If so, can I then use the matched venue name field as a reference to change the latitude & longitude values too?
I hope I was clear, feel free to ask any clarifying questions if I wasn't. This site has helped me so many times and I appreciate all the hard work the community puts in to help each other! Thanks ahead of time!
~~~EDIT~~~
Thanks for the first reply, @morad takhtameshloo.
So I was able to build something before I saw your reply that splits the data into an array, which is along the same lines as what you offered. The only thing that won't work is the $arrayElemAt with the index, because there are some venues that:
Have multiple-word names (e.g. The Stone Pony)
Have words before the actual venue name (saw it in one result that was something like "Verizon Live Presents at The Stony Pony")
Using this Bonnaroo example, I have the new field returning every word as a value in the array:
"venueName": ["Bonnaroo", "Music", "and","Arts","Festival","2020"]
My next step is going to be to compare the [venueName] of the 'Primus' object and the 'The Weeknd' object, find what values in the array are the same, and return them back to the value of "venueName".
Hope this makes more sense, I appreciate your input!
Actually, the trick depends on your data; you should provide more data if what you've provided doesn't depict the whole problem, in other words, how deep you want to dive in.
For the simplest answer, at least for the data you've provided:
db.prod4.aggregate([
{
$addFields: {
venueName: {
$arrayElemAt: [{ $split: ['$venue.name', ' '] }, 0],
},
},
},
])
But that's not the whole answer, of course. Something that comes to mind is that geolocations for the same venue should not be far from each other; for instance, in the data you've provided the two locations are within 1.16 km of each other.
So another simple solution that works would be to write a script that selects a random element from the array of all data, finds the records whose lat/lng is within, for example, 2 km of that point, removes those elements from the array, then selects another random element and does the same.
If you provide more data it would be much easier, because the easiest solution is to find the common patterns and plan only for them.
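To make both ideas concrete, the word overlap from the edit and the distance check from this answer could be sketched like this in plain Python, outside the aggregation pipeline (the sample names and coordinates come from the two events above):

from math import asin, cos, radians, sin, sqrt

def common_tokens(a: str, b: str) -> str:
    """Keep only the words two venue names share, preserving the order of the first name."""
    shared = set(a.split()) & set(b.split())
    return " ".join(word for word in a.split() if word in shared)

def km_between(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

print(common_tokens("Bonnaroo Music and Arts Festival 2020", "Bonnaroo"))  # Bonnaroo
print(km_between(35.4839582, -86.08963169999998, 35.476247, -86.081026))   # about 1.16

Records whose names share a token and whose coordinates land within a couple of kilometres of each other are good candidates for being the same venue.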

BQ load JSON File with Array of Array

I'm trying to load a JSON file where some of the arrays are empty.
{"house_account_payable":"0.00","house_account_receivable":"0.00","gift_sales_payable":"0.00","gift_sales_receivable":"0.00","store_credit_sales_payable":"0.00","percentage_row":null,"sales_per_period":[["02:00AM - 02:59AM",{"amount":0,"qty":0}],["03:00AM - 03:59AM",{"amount":0,"qty":0}]],"revenue_centers":[],"tax_breakdowns":[]}
This is giving the error:
Error while reading table: test2, error message: Failed to parse JSON: No object found when new array is started.; BeginArray returned false; Parser terminated before end of string
Could somebody help me on this?
Are you trying to load data from your local machine or GCS? Please remember to export as JSONL (newline-delimited JSON):
{"open_orders_ids": []}
{"unpaid_orders_ids": []}
Take a look at the documentation about nested and repeated columns.
EDIT:
Your JSON schema should look like this:
{
"items": [
{
"house_account_payable": "0.00",
"house_account_receivable": "0.00",
"gift_sales_payable": "0.00",
"gift_sales_receivable": "0.00",
"store_credit_sales_payable": "0.00",
"percentage_row": "",
"sales_per_period": [
{
"AM02_00_AM02_59": {
"amount": "0",
"qty": "0"
}
},
{
"AM03_00_AM03_59": {
"amount": "0",
"qty": "0"
}
}
]
}
]
}
Regarding Felipe Hoffa's post, run the following commands:
jq -c .items[] <FILENAME>.json > <FILENAME>.jq.json
bq load --source_format NEWLINE_DELIMITED_JSON --autodetect <DATASET_ID>.<TABLENAME> <FILENAME>.jq.json
Let me know if this is what you are looking for.
There's no problem with the null arrays.
The problem lies in this shorter json:
{"sales_per_period":[["02:00AM - 02:59AM",{"amount":0,"qty":0}],["03:00AM - 03:59AM",{"amount":0,"qty":0}]]}
The arrays there hold elements of different types, and to bring it into a structured table, a different schema is needed.
For example:
{"sales_per_period":[{"a":"02:00AM - 02:59AM","b":{"amount":0,"qty":0}},{"a":"03:00AM - 03:59AM","b":{"amount":0,"qty":0}}]}
Now this loads easily into BigQuery:
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect temp.short delete.short.json
Can you change this source JSON easily outside BigQuery? Otherwise load it raw into BigQuery, and parse it with a JS UDF inside BigQuery.
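If changing the source JSON outside BigQuery is an option, a small script along these lines would do that reshaping; this is only a sketch (the file names are placeholders, the field names come from the sample row):

import json

with open("report.json") as src, open("report.ndjson", "w") as dst:
    row = json.load(src)
    # Turn each ["02:00AM - 02:59AM", {"amount": 0, "qty": 0}] pair into a single object
    row["sales_per_period"] = [
        {"period": period, "amount": stats["amount"], "qty": stats["qty"]}
        for period, stats in row["sales_per_period"]
    ]
    dst.write(json.dumps(row) + "\n")  # one record per line for bq load

The resulting newline-delimited file then loads with the bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect command shown above.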

jsonschema: Verifying that an array contains an element, without erroring on other elements

I recently found jsonschema and I've been loving using it; however, I've come across something that I want to do that I just haven't been able to figure out.
What I want to do is to validate that an array must contain an element that matches a schema, but I don't want to have validation fail on other elements that would be in the list.
Say that I have an array like the following:
arr = [
{"some object": True},
False,
{"AnotherObj": "a string this time"},
"test"
]
I want to be able to do something like "validate that arr contains an object that has a property 'some object' that is a boolean, and error if it doesn't, but don't care about other elements."
I don't want it to validate the other items in the list. I just want to make sure that the list contains an element that matches the schema at least once. I also do not know the order which the elements will arrive in the array.
I've tried this already with a schema like:
{"type": "array",
"items": {
"type": "object",
"properties": {
"tool": {
# A schema here to validate tool
},
"required": ["tool"]
}
}
The problem is that it requires every item in the array to have the property "tool", which is not what I actually want.
Any help anyone can give me with this would be much appreciated! I've been stumped on this for a really long time with no forward progress.
Thanks!
I've gotten an answer to this question:
The schema used is (where ... B ... is the schema to require):
{
"type": "array",
"not": {
"items": {
"not": {... B ...}
}
}
}
It basically works out to be something like "Ensure that not (items don't match B)". I'm not 100% clear on why this works the way it does, but it does, so I figured I'd share it for posterity.
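The double negation reads as: it is not the case that every item fails to match B, i.e. at least one item matches B. A quick check with the jsonschema library (the schema under "some object" is only an assumption standing in for ... B ...):

from jsonschema import ValidationError, validate

# B: the element we require to appear somewhere in the array
B = {
    "type": "object",
    "properties": {"some object": {"type": "boolean"}},
    "required": ["some object"],
}

schema = {"type": "array", "not": {"items": {"not": B}}}

arr = [{"some object": True}, False, {"AnotherObj": "a string this time"}, "test"]
validate(arr, schema)  # passes: at least one element matches B

try:
    validate([False, "test"], schema)
except ValidationError:
    print("fails when no element matches B")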
