Do Azure Logic Apps have a built-in way to chunk up arrays of data for batch processing?

I have a logic app that (among other, not relevant things) calls API A to fetch data (as a JSON array), and sends that data to API B.
B handles data uploads in batches that are smaller than the size of the data set that A returns, so in order to submit them, I must chunk up the data from A into smaller arrays and submit multiple batches to B.
I have not found any Logic App actions that look like they would do the job, but it is quite possible that I missed something. There is the foreach action, which is probably what I want to use after the data is chunked up, but I definitely do not want to be submitting these one-by-one.
Preliminarily, I think I can get the job done with a custom JS action: grab the results from the call to A, chunk up the array, and return an array of arrays, like so:
const data = workflowContext.actions.API_A_ACTION.outputs.body;
const dataLength = data.length;
const batchedData = [];
const batchSize = 1000;
for (let i = 0; i < dataLength; i += batchSize) {
  const batch = data.slice(i, i + batchSize);
  batchedData.push(batch);
}
return batchedData;
Then the foreach action could grab the results from the JavaScript action and submit the data in batches.
If there is a better way to do this, though, I'd like to know.

Doing it with inline code would result in the smallest number of actions, and therefore the fastest/cheapest run. An alternative would be to configure workflow B with a batch trigger whose release criteria match the batch size you need; then, in workflow A, send the data to B with the Send to batch action.
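If you take the inline-code route, the chunking logic from the question boils down to a small pure helper (a sketch; the function name is illustrative), whose output the foreach action can then iterate over, submitting one batch per iteration:

```javascript
// Pure chunking helper: splits an array into sub-arrays of at most
// batchSize elements (the same logic as the question's inline-code action).
function chunkArray(data, batchSize) {
  const batches = [];
  for (let i = 0; i < data.length; i += batchSize) {
    batches.push(data.slice(i, i + batchSize));
  }
  return batches;
}
```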

Best/Quickest way to execute Promises in-parallel? (React)

Suppose I need to fetch data to create a card. What is the quickest way to get this data using promises? This is the current way I'm doing it:
async function getCards() {
  const promises = []
  for (let i = 0; i < 10; i++) {
    promises.push(getCard(i))
  }
  const cards = await Promise.allSettled(promises)
  setCards(cards)
}

async function getCard(i) {
  const property1 = await getProperty1(i)
  const property2 = await getProperty2(i)
  const property3 = await getProperty3(i)
  const card = <div>
    <div>Property 1: {property1}</div>
    <div>Property 2: {property2}</div>
    <div>Property 3: {property3}</div>
  </div>
  return card
}
For my purposes, I don't need Promise.allSettled, since I don't need to wait for all 10 cards to finish before rendering (I may just create a component); I can render each one as it completes. But I'd still like it to be parallel and execute as fast as possible. What other options do I have there? And is there a better way to handle what I'm doing in getCard?
If the getPropertyN() calls are indeed asynchronous operations (such as network requests), then getCards() will run all the calls in your for loop in parallel, such that they are all in flight at the same time, and it will generally reduce the end-to-end time versus running them serially.
There are some other factors in play, such as what the receiving host does when it receives a bunch of requests at once. If it only handles them one at a time, then you may not gain a whole lot. But, if the host has any parallelism, then you will definitely see a speedup by putting multiple requests in flight at the same time.
Note that your getCard(i) implementation is serializing the three calls to getProperty1(), getProperty2() and getProperty3() which perhaps could also be done in parallel with something like:
const [property1, property2, property3] = await Promise.all([
  getProperty1(i),
  getProperty2(i),
  getProperty3(i)
]);
Instead of this:
const property1 = await getProperty1(i)
const property2 = await getProperty2(i)
const property3 = await getProperty3(i)
Another thing to keep in mind is that a browser will only make N simultaneous requests (e.g. fetch() calls) to the same host, where N is around 6. Once you exceed that number of in-flight requests to the same host, the browser will queue the rest until one of the previous ones finishes. The way it's implemented, issuing more than the maximum doesn't slow things down, but you gain no additional parallelism beyond the browser's limit. If you were running this code in a different JavaScript environment such as Node.js, that limit would not apply, as it is browser-specific.
Note, the key thing to achieving the parallelism is launching multiple requests to be in-flight at the same time. There is no requirement that you use Promise.allSettled() before acting on any results unless you need to get all the results in order before you can process the results.
If the results can be processed individually as they finish and can be processed in any order, you can also write the code that way without using Promise.allSettled() such as:
getProperty(1).then(processResult).catch(processErr);
getProperty(2).then(processResult).catch(processErr);
getProperty(3).then(processResult).catch(processErr);
Note: I also don't see any error handling in your code. Any outside network request can fail and you must have some handler for rejected promises.
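Putting those pieces together, here is a minimal sketch of the fire-everything-then-handle-each pattern with per-request error handling (fetchAllInParallel, processResult, and processErr are hypothetical names, not part of the question's code):

```javascript
// Launch every request before awaiting any of them; each result is
// handled (e.g. rendered) as soon as it settles, in whatever order it
// finishes, with an error handler attached to each promise up front.
function fetchAllInParallel(ids, getProperty, processResult, processErr) {
  const promises = ids.map(id =>
    getProperty(id).then(processResult).catch(processErr)
  );
  // Resolves once every request has settled (all handlers already ran).
  return Promise.all(promises);
}
```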

Save Google App Script state while parsing an object array and continue where left off later on

I am using this simple Google Apps Script to parse through all available Google Sites and dump the HTML content of the individual pages. There are quite a lot of pages, so the script eventually runs into the 6-minute time limit.
Is it possible to somehow use the PropertiesService to save the current progress (especially in the array loops) and continue where left off later on?
var sites = SitesApp.getAllSites("somedomain.com");
var exportFolder = DriveApp.getFolderById("a4342asd1242424folderid-");
// Cycle through all sites
for (var i in sites) {
  var SiteName = sites[i].getName();
  var pages = sites[i].getAllDescendants();
  // Create a folder in Drive for each site name
  var siteFolder = exportFolder.createFolder(SiteName);
  for (var p in pages) {
    // Get the page URL
    var PageUrl = pages[p].getUrl();
    // Dump the raw HTML content into a text file
    var htmlDump = pages[p].getHtmlContent();
    siteFolder.createFile(PageUrl + ".html", htmlDump);
  }
}
I can imagine how one could use the Properties Service to store the current line number in a spreadsheet and continue where it left off. But how can this be done with arrays containing objects like Sites or Pages?
Using Objects with Properties Service
According to the quotas, the maximum size of a value you can store in the Properties Service is 9 KB, with a total of 500 KB per property store. So if your object is smaller than that, it should be no problem. That said, you will need to convert the object to a string with JSON.stringify(), and when you retrieve it, use JSON.parse().
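For example, rather than storing Site or Page objects themselves, you can store plain progress indices and round-trip them through JSON. A sketch (saveProgress/loadProgress are hypothetical wrappers; in Apps Script the store would be PropertiesService.getScriptProperties(), here a plain object stands in for illustration):

```javascript
// Persist loop progress as a JSON string in a key-value store.
// Property values must be strings, hence JSON.stringify/JSON.parse.
function saveProgress(store, progress) {
  store.PROGRESS = JSON.stringify(progress);
}
function loadProgress(store) {
  // Default to the start when nothing has been saved yet.
  return JSON.parse(store.PROGRESS || '{"siteIndex":0,"pageIndex":0}');
}
```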
Working around the run time limit
What is commonly done to work around the limit is to structure a process around the properties service and triggers. Essentially you make the script keep track of time, and if it starts to take a long time, you get it to save its position and then create a trigger so that the script runs again in 10 seconds (or however long you want), for example:
function mainJob(x) {
  let timeStart = new Date()
  console.log("Starting at ", timeStart)
  for (let i = x; i < 500000000; i++) { // NOTE THE i = x
    // MAIN JOB INSTRUCTIONS
    let j = i
    // ...
    // Check time
    let timeCheck = new Date()
    if (timeCheck.getTime() - timeStart.getTime() > 30000) {
      console.log("Time limit reached, i = ", i)
      // Store the iteration number (property values are strings)
      PropertiesService
        .getScriptProperties()
        .setProperty('PROGRESS', String(i))
      console.log("stored value of i")
      // Create a trigger to run again in 10 seconds.
      ScriptApp.newTrigger("jobContinue")
        .timeBased()
        .after(10000)
        .create()
      console.log("Trigger created for 10 seconds from now")
      return 0
    }
  }
  // Reset the progress counter
  PropertiesService
    .getScriptProperties()
    .setProperty('PROGRESS', '0')
  console.log("job complete")
}

function jobContinue() {
  console.log("Restarting job")
  const previousTrigger = ScriptApp.getProjectTriggers()[0]
  ScriptApp.deleteTrigger(previousTrigger)
  console.log("Previous trigger deleted")
  const triggersRemain = ScriptApp.getProjectTriggers()
  console.log("project triggers", triggersRemain)
  const progress = PropertiesService
    .getScriptProperties()
    .getProperty('PROGRESS')
  console.log("about to start main job again at i = ", progress)
  // getProperty returns a string, so convert it back to a number
  mainJob(Number(progress))
}

function startJob() {
  mainJob(0)
}
Explanation
This script only has a for loop with 500 million iterations in which it assigns i to j; it is just an example of a long job that potentially goes over the run-time limit.
The script is started by calling function startJob which calls mainJob(0).
Within mainJob
It starts by creating a Date object to get the start time of the mainJob.
It takes the argument x and uses it to initialize the for loop, so the first run starts from 0 just as you would normally initialize a for loop.
At the end of every iteration, it creates a new Date object to compare with the one created at the beginning of mainJob. In the example, it is set to see if the script has been running for 30 seconds, this can obviously be extended but keep it well below the limit.
If it has taken more than 30 seconds, it stores the value of i in the properties service and then creates a trigger to run jobContinue in 10 seconds.
After 10 seconds, the function jobContinue calls the properties service for the value for i, and calls mainJob with the value returned from the properties service.
jobContinue also deletes the trigger it just created to keep things clean.
This script should run as-is in a new project, try it out! When I run it, it takes around 80 seconds, so it runs the first time, creates a trigger, runs again, creates a trigger, runs again and then finally finishes the for loop.
References
quotas
JSON.stringify()
JSON.parse()
ScriptApp
Triggers
If you are able to process all pages of one site in under 6 minutes, then you could save the site names first (in a sheet, or in properties, depending again on the number) and process n sites per run. You can also try SitesApp.getAllSites(domain, start, max) and save the start value in properties after incrementing it.
You can do something similar for pages if you cannot process them in under 6 minutes.
SitesApp.getAllDescendants(options)

Is there a way to batch read firebase documents

I am making a mobile app using flutter with firebase as my backend.
I have a collection of user documents that stores user information. One of the fields is an array of references (to documents in another collection) which I would like to use in a batch-style operation that reads all of the referenced documents at once.
I know batch only allows writes to the database. My second option would be a transaction, which requires writes after reads, which I am trying to avoid.
Is there a way to read multiple documents in one operation without having to use Transaction?
Firestore doesn't offer a formal batch read API. As Frank mentions in his comment, there is a way to use IN to fetch multiple documents from a single collection using their IDs. However, all of the documents must be in the same collection, and you can't exceed 10 documents per query. You might as well just call get() for each document individually, as the IN query has limitations and isn't guaranteed to execute any faster than the individual gets. Neither solution is guaranteed to be "consistent", so any one of the documents fetched could be "more fresh" than the others at any given moment in time.
If you know the document IDs and the collection paths of the documents needed to be fetched, you could always use the getAll() method which is exposed in the firebase Admin SDK (at least for Node.js environments).
Then, for example, you could write an HTTPS Callable Function that would accept a list of absolute document paths and perform a "batch get" operation on them using the getAll() method.
e.g.
// Import firebase functionality
const functions = require('firebase-functions');
const admin = require('firebase-admin');

// Configure the firebase app
admin.initializeApp(functions.config().firebase);

// HTTPS callable function
exports.getDocs = functions.https.onCall((data, context) => {
  const docPathList = data.list; // e.g. ["users/Jkd94kdmdks", "users/8nkdjsld", etc...]
  const firestore = admin.firestore();
  // Build a DocumentReference for each path
  const docList = docPathList.map(docPath => firestore.doc(docPath));
  // Get all
  return firestore.getAll(...docList)
    .then(results => {
      return { data: results.map(doc => doc.data()) };
    })
    .catch(err => {
      return { error: err };
    });
});
Not sure what the limit (if any) is for the number of documents you can fetch using getAll(), but I do know my application is able to fetch at least 50 documents per call successfully using this method.
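For completeness, a sketch of how a client might invoke that callable (the function name getDocs and payload key list match the answer's code; the functions handle comes from the Firebase client SDK and is passed in here so the helper stays SDK-agnostic and testable):

```javascript
// Call the "getDocs" callable with a list of absolute document paths
// and unwrap the { data: [...] } payload the function returns.
function fetchDocs(functions, paths) {
  const getDocs = functions.httpsCallable('getDocs');
  return getDocs({ list: paths }).then(result => result.data.data);
}
```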
Firestore has a REST API that allows you to do batch GETs with document paths that may be what you need.
See https://firebase.google.com/docs/firestore/reference/rest/v1beta1/projects.databases.documents/batchGet

How to Update an array without downloading all data from firebase ReactJs

I'm new to React and am trying to create an event app in which a user can join an event.
Here is the code for joining an event:
export const JoinEvent = (id) => {
  return async dispatch => {
    let data = await firebase.firestore().collection('Events').doc(id).get()
    let tmpArray = data.data()
    let currentUser = firebase.auth().currentUser
    let newArray = tmpArray.PeopleAttending
    await firebase.firestore().collection('Events').doc(id).update({
      PeopleAttending: { ...newArray, [currentUser.uid]: { displayName: currentUser.displayName } }
    })
  }
}
I have created an action; in JoinEvent, the id of the particular event that was clicked is passed in.
Here is what my Firestore structure looks like (screenshot omitted).
So basically I have to download the whole data set, store it in a local array, add the new user, and then finally update the document.
Is there any way to simply add a new object without downloading the whole data set?
Thank you.
You are doing it wrong: the Firestore document size limit is 1 MiB (1,048,576 bytes) per document, so sooner or later you're going to hit that limit if you keep adding data like this. It may seem that you'll never reach it, but storing data that way is very unsafe. You can check Firestore query using an object element as parameter for how to query objects in Firestore documents, but I suggest you don't do it that way.
The proper way to do it, is to create a subcollection PeopleAttending on each document inside the Events collection and then use that collection to store the data.
You can also try a document set() with the merge or mergeFields options, as documented here https://googleapis.dev/nodejs/firestore/latest/DocumentReference.html#set and here https://stackoverflow.com/a/46600599/1889685.
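A sketch of the question's JoinEvent rewritten with set() plus merge, so only the new attendee is written and no read is required (field names follow the question's code; db and user are passed in so the helper stays testable, which is an assumption of this sketch rather than the question's exact shape):

```javascript
// Write just this user's entry into the PeopleAttending map.
// { merge: true } merges the given fields into the existing document
// instead of overwriting it, so other attendees are preserved.
function joinEvent(db, eventId, user) {
  return db.collection('Events').doc(eventId).set({
    PeopleAttending: { [user.uid]: { displayName: user.displayName } }
  }, { merge: true });
}
```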

rx.js catchup subscription from two sources

I need to combine a catch up and a subscribe to new feed. So first I query the database for all new records I've missed, then switch to a pub sub for all new records that are coming in.
The first part is easy: do your query, perhaps in batches of 500; that gives you an array, and you can rx.observeFrom that.
The second part is easy: you just put an rx.observe on the pub/sub.
But I need to do this sequentially: I must play all the old records before I start playing the new ones coming in.
I figure I can start the subscription to the pub/sub and accumulate those events in an array, then start processing the old ones; when I'm done, either remove the dups (or, since I do a dup check, allow the few dups), play the accumulated records until they are gone, and from then on it's one in, one out.
My question is: what is the best way to do this? Should I create a subscription that starts building up new records in an array, then start processing the old ones, and in the "then" of the old-record processing subscribe to that array?
OK, this is what I have so far. I still need to build up the tests and finish some pseudo-code to find out whether it even works, much less whether it is a good implementation. Feel free to stop me in my tracks before I bury myself.
var catchUpSubscription = function catchUpSubscription(startFrom) {
  EventEmitter.call(this);
  var subscription = this.getCurrentEventsSubscription();
  // calling map to start the subscription and catch events in an array.
  // not sure if this is right
  var events = rx.Observable.fromEvent(subscription, 'event').map(x => x);
  // getPastEvents gets batches of 500, iterates over them and emits each
  // till no more are returned, then resolves a promise
  this.getPastEvents({ count: 500, start: startFrom })
    .then(function () {
      rx.Observable.fromArray(events).forEach(x => emit('event', x));
    });
};
I don't know whether this is the best way. Any thoughts?
Thanks.
I would avoid mixing your different async strategies unnecessarily. You can use concat to join together the two sequences:
var catchUpSubscription = function catchUpSubscription(startFrom) {
  var subscription = this.getCurrentEventsSubscription();
  return Rx.Observable.fromPromise(this.getPastEvents({ count: 500, start: startFrom }))
    .flatMap(x => x)
    .concat(Rx.Observable.fromEvent(subscription, 'event'));
};

// Sometime later
catchUpSubscription(startTime).subscribe(x => /* handle event */);
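The ordering guarantee concat provides can also be illustrated without Rx, as a plain buffered hand-off (a sketch; emitter is any EventEmitter-like source, and loadPastEvents is a hypothetical stand-in for getPastEvents):

```javascript
// Buffer live events while the historical replay runs, then drain the
// buffer and switch to direct delivery, so ordering is preserved:
// all old records play before any new ones.
function catchUpThenLive(emitter, loadPastEvents, handle) {
  const buffer = [];
  let caughtUp = false;
  emitter.on('event', e => (caughtUp ? handle(e) : buffer.push(e)));
  return loadPastEvents().then(past => {
    past.forEach(handle);   // replay history first, in order
    buffer.forEach(handle); // then events that arrived meanwhile
    buffer.length = 0;
    caughtUp = true;        // future events flow straight through
  });
}
```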
