Best Firestore data model for file tree data

I'm currently building an app that needs to store data with a structure similar to a file tree. It looks something like this:
{
"type": "folder",
"name": "folder A",
"private": false,
"updatedAt": 1231243,
"items": [
{
"type": "folder",
"name": "subfolder A",
"private": false,
"updatedAt": 1231243,
"items": [
{
"type": "file",
"name": "file1"
}
]
},
{
"type": "file",
"name": "file2"
}
]
}
As far as I'm aware, there are 3 ways of implementing this:
1. Dump all the data in one document
2. Create a subcollection for each items array
3. Create a flat folder structure and save the parent folder ID for lookup
I'm looking for a way to get all of this data with as few queries as possible. Ideally, I would only need the root folder ID to get all the subfolders and items, but I'm not sure if that's possible.
Also, I'm planning to subscribe to the data in the future, so the file tree will be updated in real-time.
Please suggest what the data model should look like.
Update 1
To clarify the queries I will need:
I will only need to get the topmost folder (folder A in this example) and the items under it.
I won't need to get nested items directly,
e.g. getting Subfolder A / File 2 without accessing Folder A.

If all you need is to get the entire structure by some unique ID, just put it all in one document and get() it by that known ID. Keep in mind the 1 MiB limit per document.
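To check whether a tree will stay under that limit, you can estimate its size before writing it. The sketch below is a rough approximation that follows the storage-size rules published in the Firestore documentation (strings count as their UTF-8 bytes plus 1, numbers as 8 bytes, booleans and null as 1, plus fixed overheads for the document name and the document). The helper names and the trees/folderA path are hypothetical, and only the value types that appear in this tree are handled.

```python
MAX_DOC_BYTES = 1_048_576  # Firestore's 1 MiB maximum document size


def value_size(value) -> int:
    """Approximate storage size of one Firestore value, in bytes."""
    if value is None or isinstance(value, bool):  # test bool before int
        return 1
    if isinstance(value, (int, float)):
        return 8
    if isinstance(value, str):
        return len(value.encode("utf-8")) + 1
    if isinstance(value, list):  # array: sum of its elements
        return sum(value_size(v) for v in value)
    if isinstance(value, dict):  # map: key names plus values
        return sum(value_size(k) + value_size(v) for k, v in value.items())
    raise TypeError(f"unsupported type: {type(value).__name__}")


def document_size(path: list[str], fields: dict) -> int:
    """Document name (path segments + 16) + fields + 32 bytes of overhead."""
    name_size = sum(value_size(segment) for segment in path) + 16
    return name_size + value_size(fields) + 32


if __name__ == "__main__":
    tree = {
        "type": "folder", "name": "folder A", "private": False, "updatedAt": 1231243,
        "items": [{"type": "file", "name": "file2"}],
    }
    size = document_size(["trees", "folderA"], tree)  # hypothetical path
    print(size, "bytes; fits:", size <= MAX_DOC_BYTES)
```

Running it over your largest realistic tree gives a feel for how much headroom a single document leaves.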
This gets more complicated if you need to access individual items within the lists (which is not really possible without reading the entire document), so a single document would be a bad idea in that case.
Also reconsider this if you intend to update elements of this document frequently: a single document supports a sustained rate of about 1 write per second, so you would want to split the data up instead.
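If you do split it up, option 3 from the question (a flat structure with a parent ID) could look like the sketch below. It assumes one record per folder or file with hypothetical id, parentId and rootId fields. Storing rootId on every record is an assumption of this sketch, meant to let a single equality query on that field fetch a whole tree, which you then reassemble on the client. The node1, node2, ... IDs are illustrative only; in Firestore you would normally let it generate document IDs.

```python
from itertools import count


def flatten_tree(root: dict) -> list[dict]:
    """One flat record per folder/file, in pre-order.

    Adds hypothetical 'id', 'parentId' (None for the root) and 'rootId'
    fields; each record would be stored as its own Firestore document.
    """
    ids = count(1)
    records: list[dict] = []

    def visit(node: dict, parent_id, root_id) -> None:
        node_id = f"node{next(ids)}"  # illustrative IDs only
        root_id = root_id or node_id  # the first node visited is the root
        record = {k: v for k, v in node.items() if k != "items"}
        record.update(id=node_id, parentId=parent_id, rootId=root_id)
        records.append(record)
        for child in node.get("items", []):
            visit(child, node_id, root_id)

    visit(root, None, None)
    return records


def build_tree(records: list[dict]) -> dict:
    """Reassemble the nested tree from flat records (inverse of flatten_tree).

    Sibling order follows record order; folders with an empty 'items' list
    come back without that key.
    """
    meta = ("id", "parentId", "rootId")
    nodes = {r["id"]: {k: v for k, v in r.items() if k not in meta} for r in records}
    root = None
    for r in records:
        node = nodes[r["id"]]
        if r["parentId"] is None:
            root = node
        else:
            nodes[r["parentId"]].setdefault("items", []).append(node)
    return root
```

Query results are not guaranteed to come back in the order you wrote them, so a real implementation would probably need an explicit sort field to keep sibling order stable.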

Related

Azure Cognitive Search query specific data source

I have an Azure Cognitive Search service set up with two DataSources, two Indexers indexing those DataSources, and one Index.
I'd like to be able to query/filter by DataSource. Is that possible?
This is certainly possible. You'd just need a way to get a field into the index that contains the name of the data source. The best way to do this depends on the data source you're using. For example, with a SQL data source you might be able to edit the query to return the value, whereas that wouldn't work for blob storage.
Another option that would work for all data sources would be to include a skillset with a conditional skill which you can use to set a default value for a field.
Building on @Derek Legenzoff's answer, this is how I did it:
1. Add all your data sources.
2. Create a skillset for each data source with the following skill, replacing 'Production' with the name of that data source's environment:
{
"@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
"name": "#1",
"description": "",
"context": "/document",
"inputs": [
{
"name": "condition",
"source": "= true"
},
{
"name": "whenTrue",
"source": "= 'Production'"
},
{
"name": "whenFalse",
"source": "= ''"
}
],
"outputs": [
{
"name": "output",
"targetName": "Env"
}
]
}
3. Create a single index for your data model and add an Env field that is filterable and searchable.
4. Create an indexer for each of your data sources that points to the index from step 3, the data source from step 1, and the matching skillset from step 2.
IMPORTANT: make sure your indexers have the following in their definition:
"outputFieldMappings": [
{
"sourceFieldName": "/document/Env",
"targetFieldName": "Env"
}
],
This maps the output of the skill to the Env field in the index.
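Because you need one skillset per data source, it can help to generate the definitions rather than copy them by hand. The sketch below builds the conditional skill and output field mapping from the steps above as Python dicts, plus an OData filter expression for the Env field (OData string literals escape a single quote by doubling it). The skillset name pattern and the helper names are hypothetical, and sending the JSON to the service (portal, REST API or SDK) is left out.

```python
import json

# Field mapping every indexer needs so the skill output lands in the Env field.
ENV_OUTPUT_MAPPING = {"sourceFieldName": "/document/Env", "targetFieldName": "Env"}


def env_skill(env_name: str) -> dict:
    """Conditional skill that always writes env_name to /document/Env."""
    if "'" in env_name:
        # Keep the sketch simple: no quoting inside the skill expression.
        raise ValueError("environment name must not contain single quotes")
    return {
        "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
        "name": "#1",
        "description": "",
        "context": "/document",
        "inputs": [
            {"name": "condition", "source": "= true"},
            {"name": "whenTrue", "source": f"= '{env_name}'"},
            {"name": "whenFalse", "source": "= ''"},
        ],
        "outputs": [{"name": "output", "targetName": "Env"}],
    }


def skillset_for(env_name: str) -> dict:
    """Skillset body for one data source (name pattern is hypothetical)."""
    return {
        "name": f"{env_name.lower()}-env-skillset",
        "description": f"Tags documents with Env = {env_name}",
        "skills": [env_skill(env_name)],
    }


def env_filter(env_name: str) -> str:
    """OData filter for one data source; single quotes are doubled."""
    escaped = env_name.replace("'", "''")
    return f"Env eq '{escaped}'"


if __name__ == "__main__":
    for env in ("Production", "Staging"):
        print(json.dumps(skillset_for(env), indent=2))
        print(json.dumps({"outputFieldMappings": [ENV_OUTPUT_MAPPING]}))
        print(env_filter(env))
```

Passing a string such as Env eq 'Production' as the filter of a search request then restricts results to documents from that data source.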

What syntax should be used for reading the final row in an Array on the Mapping tab on the Copy Data activity in Azure Data Factory / Synapse?

I'm using the copy data activity in Azure Data Factory to copy data from an API to our data lake for alerting & reporting purposes. The API response consists of multiple complex nested JSON arrays with key-value pairs. The API is updated every quarter-hour and data is only held for 2 days before falling off the stack. The API uses an oldest-to-newest record order, so the newest addition to the array is the final item in the array rather than the first.
My requirement is to copy only the most recent record from the API rather than the whole collection, i.e. the 192nd reading, or item 191 of the array (with the array starting at 0).
Due to the nature of the solution, there are times when the API isn't being updated as the sensors that collect and send over the data to the server may not be reachable.
The current solution is triggered every 15 minutes and tries a copy data activity of item 191, then 190, then 189 and so on. After 6 attempts it fails and so the record is missed.
[Screenshot: current pipeline structure]
I have used the mapping tab to specify the items in the array as follows (copy attempt 1 example):
$['meta']['params']['sensors'][*]['name']
$['meta']['sensorReadings'][*]['readings'][191]['dateTime']
$['meta']['sensorReadings'][*]['readings'][191]['value']
Instead of explicitly referencing the array number, I was wondering if it is possible to reference the last item of the array in the above code?
I understand we can use 0 for the first record, but I don't understand how to reference the final item. I've tried the following using the 'last' function but am unsure where to place it:
$['meta']['sensorReadings'][*]['readings'][last]['dateTime']
$['meta']['sensorReadings'][*]['readings']['last']['dateTime']
last['meta']['sensorReadings'][*]['readings']['dateTime']
$['meta']['sensorReadings'][*]['readings']last['dateTime']
Any help or advice on a better way to proceed would be greatly appreciated.
Can you call your API with a Web activity? If so, the Web activity pulls the API result into the pipeline, and you can then apply ADF functions such as last to it.
A simple example calling the UK Gov Bank Holidays API:
This returns a resultset that looks like this:
{
"england-and-wales": {
"division": "england-and-wales",
"events": [
{
"title": "New Year’s Day",
"date": "2017-01-02",
"notes": "Substitute day",
"bunting": true
},
{
"title": "Good Friday",
"date": "2017-04-14",
"notes": "",
"bunting": false
},
{
"title": "Easter Monday",
"date": "2017-04-17",
"notes": "",
"bunting": true
},
... etc
You can now apply the last function to it, e.g. using a Set Variable activity:
@string(last(activity('Web1').output['england-and-wales'].events))
Which yields the last bank holiday of 2023:
{
"name": "varWorking",
"value": "{\"title\":\"Boxing Day\",\"date\":\"2023-12-26\",\"notes\":\"\",\"bunting\":true}"
}
Or
@string(last(activity('Web1').output['england-and-wales'].events).date)
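The point of last() is that it picks the final element whatever the array length, so you no longer need to probe items 191, 190, 189 and so on. The same selection logic, written in Python against the response shape from the question, might look like the sketch below. It assumes that sensors[i] and sensorReadings[i] refer to the same sensor, and the function name is hypothetical. It returns None values for a sensor with no readings, a case you would want to guard in the pipeline as well (for example with an If Condition activity using empty()).

```python
def latest_readings(response: dict) -> list[dict]:
    """Newest reading per sensor, i.e. readings[-1], or None values if empty."""
    meta = response["meta"]
    names = [sensor["name"] for sensor in meta["params"]["sensors"]]
    latest = []
    # Assumption: sensors[i] and sensorReadings[i] describe the same sensor.
    for name, entry in zip(names, meta["sensorReadings"]):
        readings = entry.get("readings") or []
        last = readings[-1] if readings else None  # like ADF's last()
        latest.append({
            "name": name,
            "dateTime": last["dateTime"] if last else None,
            "value": last["value"] if last else None,
        })
    return latest
```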

How to get Resource Group name from within Logic App

In an Azure Logic App, how can I get the name of the Resource Group containing the current logic app?
I want to include some tracking details in the JSON output that I am sending to another system.
I can get the run identifier (using @{workflow()['run']['name']})
and the current logic app name (using @{workflow()['name']}).
However, I can't work out how to get the name of the resource group to which the logic app is deployed.
As a last resort, I will use the resource group name used by the deployment template, but that will be wrong if the logic app is moved later.
I could also use tags, but again that could get out of step if the logic app is moved.
Thanks
A simple formula may be:
split(workflow().id, '/')[4]
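This works because a resource ID has a fixed layout, /subscriptions/{subscription}/resourceGroups/{resourceGroup}/providers/..., so splitting on "/" puts the resource group at index 4 (index 0 is the empty string before the leading slash). The Python sketch below mirrors the expression with a sanity check added; the function name and the sample ID are illustrative.

```python
def resource_group_from_id(resource_id: str) -> str:
    """Python equivalent of split(workflow().id, '/')[4] in a Logic App."""
    parts = resource_id.split("/")
    # ['', 'subscriptions', '<sub>', 'resourceGroups', '<rg>', 'providers', ...]
    if len(parts) < 5 or parts[3].lower() != "resourcegroups":
        raise ValueError(f"not a resource-group-scoped ID: {resource_id!r}")
    return parts[4]


if __name__ == "__main__":
    workflow_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/huryTest/providers/Microsoft.Logic/workflows/hurylogicblob"
    )
    print(resource_group_from_id(workflow_id))  # huryTest
```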
If you're deploying the Logic Apps using ARM templates (e.g. edit in Visual Studio, check into Azure DevOps git repo and deploy using release pipeline), you can create an ARM parameter:
"resGroup_ARM": {
"type": "string",
"defaultValue": "[resourceGroup().name]",
"metadata": {
"description": "Resource group name"
}
}
Then, you can create a workflow parameter:
"resGroup_LA": {
"type": "string",
"defaultValue": "ResGroup LA default"
}
... and give it a value in the parameters initialisation section:
"resGroup_LA": {
"value": "[parameters('resGroup_ARM')]"
}
You can get all the other properties of resourceGroup() in a similar manner, see: https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/template-functions-resource?tabs=json#resourcegroup
First, create an "Initialize variable" action to capture all of the data returned by workflow().
The data returned by workflow() looks like this:
{
"id": "/subscriptions/*****/resourceGroups/huryTest/providers/Microsoft.Logic/workflows/hurylogicblob",
"name": "hurylogicblob",
"type": "Microsoft.Logic/workflows",
"location": "eastus",
"tags": {},
"run": {
"id": "/subscriptions/*****/resourceGroups/huryTest/providers/Microsoft.Logic/workflows/hurylogicblob/runs/*****",
"name": "*****",
"type": "Microsoft.Logic/workflows/runs"
}
}
It contains the resource group name, so we just need to take the "id" property and extract the substring between "resourceGroups/" and "/providers". The length of "resourceGroups/" is 15, so the expression below uses add(x, 15) and sub(x, 15).
You can use the expression as below:
substring(workflow()['id'],add(indexOf(workflow()['id'],'resourceGroups/'),15),sub(sub(indexOf(workflow()['id'],'/providers'),indexOf(workflow()['id'],'resourceGroups/')),15))
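The arithmetic is easier to check outside the designer. The Logic Apps substring() function takes a start index and a length (not an end index), which is why the expression subtracts both the marker position and the 15 characters of "resourceGroups/" from the position of "/providers". The Python sketch below reproduces the same steps; the function name is hypothetical, and, like the expression, it assumes both markers occur in the ID (Python's str.index raises ValueError where indexOf would return -1).

```python
MARKER = "resourceGroups/"  # len(MARKER) == 15, the constant in the expression


def resource_group_by_substring(resource_id: str) -> str:
    """Mirror of the substring/indexOf expression from the Logic App."""
    marker_pos = resource_id.index(MARKER)               # indexOf(id, 'resourceGroups/')
    providers_pos = resource_id.index("/providers")      # indexOf(id, '/providers')
    start = marker_pos + len(MARKER)                     # add(marker_pos, 15)
    length = (providers_pos - marker_pos) - len(MARKER)  # sub(sub(...), 15)
    # substring(text, startIndex, length) -> Python slice [start:start + length]
    return resource_id[start:start + length]
```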
Finally, this returns the resource group name of the logic app.

How can you retrieve a full nested document in Solr?

In my instance of Solr 4.10.3 I would like to index JSONs with a nested structure.
Example:
{
"id": "myDoc",
"title": "myTitle",
"nestedDoc": {
"name": "test name",
"nestedAttribute": {
"attr1": "attr1Val"
}
}
}
I am able to store it correctly through the admin interface:
/solr/#/mySchema/documents
and I'm also able to search and retrieve the document.
The problem I'm facing is that when I get the response document from my Solr search, I cannot see the nested attributes. I only see:
{
"id": "myDoc",
"title": "myTitle"
}
Is there a way to include ALL the nested fields in the returned documents?
I tried "fl=[child parentFilter=title:myTitle]" but it doesn't work (ChildDocTransformerFactory, from https://cwiki.apache.org/confluence/display/solr/Transforming+Result+Documents). Is that the right way to do it, or is there another way?
I'm using Solr 4.10.3.
To get the whole nested structure returned, you do need to use ChildDocTransformerFactory. However, you first need to index your documents properly.
If you just pass your structure as it is, Solr will index the parts as separate documents and won't know that they're actually connected. To query nested documents correctly, you'll have to pre-process your data structure as described in this post, or try using (and modifying as needed) a pre-processing script. Unfortunately, up to and including the latest Solr 6.0, there is no clean, built-in way to index and return nested document structures, so everything is done through workarounds.
Particularly in your case, you'll need to transform your document structure into this:
{
"type": "parentDoc",
"id": "myDoc",
"title": "myTitle",
"_childDocuments_": [
{
"type": "nestedDoc",
"name": "test name",
"_childDocuments_" :[
{
"type": "nestedAttribute",
"attr1": "attr1Val"
}]
}]
}
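If your documents are nested more deeply or differently, a small pre-processing step can produce this shape automatically. The Python sketch below is one such script, not an official Solr tool: every nested object becomes a child document whose type is the key it was stored under, and other values stay on the current document. The type field is just the convention this answer uses for parentFilter (an existing field named type would overwrite the tag), and depending on your schema each child may also need its own unique id, which the sketch does not add.

```python
import json


def to_solr_nested(doc: dict, doc_type: str = "parentDoc") -> dict:
    """Move nested objects into Solr _childDocuments_ lists.

    Each nested dict becomes a child document tagged with a 'type' equal to
    the key it was stored under; other values (including lists) are kept as
    ordinary fields on the current document.
    """
    out = {"type": doc_type}
    children = []
    for key, value in doc.items():
        if isinstance(value, dict):
            children.append(to_solr_nested(value, key))
        else:
            out[key] = value
    if children:
        out["_childDocuments_"] = children
    return out


if __name__ == "__main__":
    original = {
        "id": "myDoc",
        "title": "myTitle",
        "nestedDoc": {"name": "test name", "nestedAttribute": {"attr1": "attr1Val"}},
    }
    print(json.dumps(to_solr_nested(original), indent=2))
```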
Then the following ChildDocTransformerFactory query will return all subdocuments (note that although the documentation says it's available since Solr 4.9, I've only actually seen it in Solr 5.3, so you need to test it):
q=title:myTitle&fl=*,[child parentFilter=type:parentDoc limit=50]
Note that although it returns all nested documents, the returned structure will be flattened (alas!), i.e. you'll get:
{
"type": "parentDoc",
"id": "myDoc",
"title": "myTitle",
"_childDocuments_": [
{
"type": "nestedDoc",
"name": "test name"
},
{
"type": "nestedAttribute",
"attr1": "attr1Val"
}]
}
Probably not what you expected, but this is Solr's unfortunate behavior, which should be fixed in an upcoming release.
You can set
q={!parent which=}
and set the fl parameter to fl=*,[child parentFilter=title:myTitle].
This returns all parent fields and child fields for title:myTitle.

Text inside entities in Draft.js

I've been playing with the Entity system in Draft.js. One limitation I see is that entities have to correspond with a range of text in the content they are inserted into. I was hoping I could make a zero-length entity which would have a display based on the data in the entity rather than the text-content in the block. Is this possible?
This is possible if the entity occupies a whole block. In the serialised blockMap below, the block contains no text, but its characterList has one entry with an entity attached to it. There is also an ongoing discussion about adding metadata to blocks; see https://github.com/facebook/draft-js/issues/129
"blockMap": {
"80sam": {
"key": "80sam",
"type": "sticker",
"text": "",
"characterList": [
{
"style": [],
"entity": "1"
}
],
"depth": 0
},
},
