Non-blocking array reduce in NodeJS? - arrays

I have a function that takes in two very large arrays. Essentially, I am matching up orders with items that are in a warehouse available to fulfill that order. The order is an object that contains a sub array of objects of order items.
Currently I am using a reduce function to loop through the orders, then another reduce function to loop through the items in each order. Inside this nested reduce, I am doing a filter on items a customer returned so as not to give the customer a replacement with the item they just send back. I am then filtering the large array of available items to match them to the order. The large array of items is mutable since I need to mark an item used and not assign it to another item.
Here's some psudocode of what I am doing.
orders.reduce(accum, currentOrder)
{
currentOrder.items.reduce(internalAccum, currentItem)
{
const prevItems = prevOrders.filter(po => po.customerId === currentOrder.customerId;
const availItems = staticItems.filter(si => si.itemId === currentItem.itemId && !prevItems.includes(currentItem.labelId)
// Logic to assign the item to the order
}
}
All of this is running in a MESOS cluster on my server. The issue I am having is that my MESOS system is doing a health check every 10 seconds. During this working of the code, the server will stop responding for a short period of time (up to 45 seconds or so). The health check will kill the container after 3 failed attempts.
I am needing to find some way to do this complex looping without blocking the response of the health check. I have tried moving everything to a eachSerial using the async library but it still locks up. I have to do the work in order or I would have done something like async.each or async.eachLimit, but if not processed in order, then items might be assigned the same thing simultaneously.

You can do batch processing here with a promisified setImmediate so that incoming events can have a chance to execute between batches. This solution requires async/await support.
async function batchReduce(list, limit, reduceFn, initial) {
let result = initial;
let offset = 0;
while (offset < list.length) {
const batchSize = Math.min(limit, list.length - offset);
for (let i = 0; i < batchSize; i++) {
result = reduceFn(result, list[offset + i]);
}
offset += batchSize;
await new Promise(setImmediate);
}
return result;
}

Related

AppScript: 'number of columns in the data does not match the number of columns in the range.' setValues method not reading array correctly?

I'm trying to automate the collection of phone numbers from an API into a Google Sheet with app script. I can get the data and place it in an array with the following code:
const options = {
method: 'GET',
headers: {
Authorization: 'Bearer XXXXXXXXXXXXXXX',
Accept: 'Application/JSON',
}
};
var serviceUrl = "dummyurl.com/?params";
var data=UrlFetchApp.fetch(serviceUrl, options);
if(data.getResponseCode() == 200) {
var response = JSON.parse(data.getContentText());
if (response !== null){
var keys = Object.keys(response.call).length;
var phoneArray = [];
for(i = 0; i < keys; i++) {
phoneArray.push(response.call[i].caller.caller_id);
}
This works as expected - it grabs yesterday's caller ID values from a particular marketing campaign from my API. Next, I want to import this data into a column in my spreadsheet. To do this, I use the setValues method like so:
Logger.log(phoneArray);
var arrayWrapper = [];
arrayWrapper.push(phoneArray);
Logger.log(arrayWrapper);
for(i = 0; i < keys; i++) {
var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
var cell = sheet.getRange("A8");
cell.setValues(arrayWrapper);
}
}
}
}
I am aware that I need my array length to equal the length of the selected range of cells in my sheet. However, I get conflicting errors depending on the length I set for my getRange method. If I set it to a single cell, as you see above, the error I get is:
The number of columns in the data does not match the number of columns in the range. The data has 8 but the range has 1.
However, if I set the length of my range to 8 (or any value except 1), I get the error:
The number of columns in the data does not match the number of columns in the range. The data has 1 but the range has 8.
As you see, the error swaps values. Now I have the appropriate number of columns in the range, but my script only finds 1 cell of data. When I check the log, I see that my 2D array looks normal in both cases - 8 phone numbers in an array wrapped in another array.
What is causing this error? I cannot find reference to similar errors on SO or elsewhere.
Also, please note that I'm aware this code is a little wonky (weird variables and two for loops where one would do). I've been troubleshooting this for a couple hours and was originally using setValue instead of setValues. While trying to debug it, things got split up and moved around a lot.
The dimension of your range is one row and several columns
If you push an array into another array, the dimension will be [[...],[...],[...]] - i.e. you have one column and multiple rows
What you want instead is one row and multiple columns: [[...,...,...]]
To achieve this you need to create a two-dimensional array and push all entries into the first row of your array: phoneArray[0]=[]; phoneArray[0].push(...);
Sample:
var phoneArray = [];
phoneArray[0]=[];
for(i = 0; i < keys; i++) {
var phoneNumber = response.call[i].caller.caller_id;
phoneNumber = phoneNumber.replace(/-/g,'');
phoneArray[0].push(phoneNumber);
}
var range = sheet.getRange(1,8,1, keys);
range.setValues(phoneArray);
So I figured out how to make this work, though I can't speak to why the error is occurring, or rather why one receives reversed error messages depending on the setRange value.
Rather than pushing the whole list of values from the API to phoneArray, I structured my first for loop to reset the value of phoneArray each loop and push a single value array to my arrayWrapper, like so:
for(i = 0; i < keys; i++) {
var phoneArray = [];
var phoneNumber = response.call[i].caller.caller_id;
phoneNumber = phoneNumber.replace(/-/g,'');
phoneArray.push(phoneNumber);
arrayWrapper.push(phoneArray);
}
Note that I also edited the formatting of the phone numbers to suit my needs, so I pulled each value into a variable to make replacing a character simple. What this new for loop results in is a 2D array like so:
[[1235556789],[0987776543],[0009872345]]
Rather than what I had before, which was like this:
[[1235556789,0987776543,0009872345]]
It would appear that this is how the setValues method wants its data structured, although the documentation suggests otherwise.
Regardless, if anyone were to run into similar issues, this is the gist of what must be done to fix it, or at least the method I found worked. I'm sure there are far more performant and elegant solutions than mine, but I will be dealing with dozens of rows of data, not thousands or millions. Performance isn't a big concern for me.
var correct = [[data],[data]] -
is the data structure that is required for setValues()
therefore
?.setValues(correct)

Node fast way to find in array

I have a Problem.
My script was working fine and fast, when there was only like up to 5000 Objects in my Array.
Now there over 20.000 Objects and it runs slower and slower...
This is how i called it
for(var h in ItemsCases) {
if(itmID == ItemsCases[h].sku) {
With "for" for every object and check where the sku is my itmID, cause i dont want every ItemsCases. Only few of it each time.
But what is the fastest and best way to get the items with the sku i need out of it?
I think mine, is not the fastest...
I get multiple items now with that code
var skus = res.response.cases[x].skus;
for(var j in skus) {
var itmID = skus[j];
for(var h in ItemsCases) {
if(itmID == ItemsCases[h].sku) {
the skus is also an array
ItemsCases.find(item => item.sku === itmID) (or a for loop like yours, depending on the implementation) is the fastest you can do with an array (if you can have multiple items returned, use filter instead of find).
Use a Map or an object lookup if you need to be faster than that. It does need preparation and memory, but if you are searching a lot it may well be worth it. For example, using a Map:
// preparation of the lookup
const ItemsCasesLookup = new Map();
ItemsCases.forEach(item => {
const list = ItemsCasesLookup.get(item.sku);
if (list) {
list.push(item)
} else {
ItemsCasesLookup.set(item.sku, [item]);
}
});
then later you can get all items for the same sku like this:
ItemsCasesLookup.get(itmID);
A compromise (not more memory, but some speedup) can be achieved by pre-sorting your array, then using a binary search on it, which is much faster than linear search you have to do on an unprepared array.

How to buffer and drop a chunked bytestring with a delimiter?

Lets say you have a publisher using broadcast with some fast and some slow subscribers and would like to be able to drop sets of messages for the slow subscriber without having to keep them in memory. The data consists of chunked ByteStrings, so dropping a single ByteString is not an option.
Each set of ByteStrings is followed by a terminator ByteString("\n"), so I would need to drop a set of ByteStrings ending with that.
Is that something you can do with a custom graph stage? Can it be done without aggregating and keeping the whole set in memory?
Avoid Custom Stages
Whenever possible try to avoid custom stages, they are very tricky to get correct as well as being pretty verbose. Usually some combination of the standard akka-stream stages and plain-old-functions will do the trick.
Group Dropping
Presumably you have some criteria that you will use to decide which group of messages will be dropped:
type ShouldDropTester : () => Boolean
For demonstration purposes I will use a simple switch that drops every other group:
val dropEveryOther : ShouldDropTester =
Iterator.from(1)
.map(_ % 2 == 0)
.next
We will also need a function that will take in a ShouldDropTester and use it to determine whether an individual ByteString should be dropped:
val endOfFile = ByteString("\n")
val dropGroupPredicate : ShouldDropTester => ByteString => Boolean =
(shouldDropTester) => {
var dropGroup = shouldDropTester()
(byteString) =>
if(byteString equals endOfFile) {
val returnValue = dropGroup
dropGroup = shouldDropTester()
returnValue
}
else {
dropGroup
}
}
Combining the above two functions will drop every other group of ByteStrings. This functionality can then be converted into a Flow:
val filterPredicateFunction : ByteString => Boolean =
dropGroupPredicate(dropEveryOther)
val dropGroups : Flow[ByteString, ByteString, _] =
Flow[ByteString] filter filterPredicateFunction
As required: the group of messages do not need to be buffered, the predicate will work on individual ByteStrings and therefore consumes a constant amount of memory regardless of file size.

In Firebase, is there a way to get the number of children of a node without loading all the node data?

You can get the child count via
firebase_node.once('value', function(snapshot) { alert('Count: ' + snapshot.numChildren()); });
But I believe this fetches the entire sub-tree of that node from the server. For huge lists, that seems RAM and latency intensive. Is there a way of getting the count (and/or a list of child names) without fetching the whole thing?
The code snippet you gave does indeed load the entire set of data and then counts it client-side, which can be very slow for large amounts of data.
Firebase doesn't currently have a way to count children without loading data, but we do plan to add it.
For now, one solution would be to maintain a counter of the number of children and update it every time you add a new child. You could use a transaction to count items, like in this code tracking upvodes:
var upvotesRef = new Firebase('https://docs-examples.firebaseio.com/android/saving-data/fireblog/posts/-JRHTHaIs-jNPLXOQivY/upvotes');
upvotesRef.transaction(function (current_value) {
return (current_value || 0) + 1;
});
For more info, see https://www.firebase.com/docs/transactions.html
UPDATE:
Firebase recently released Cloud Functions. With Cloud Functions, you don't need to create your own Server. You can simply write JavaScript functions and upload it to Firebase. Firebase will be responsible for triggering functions whenever an event occurs.
If you want to count upvotes for example, you should create a structure similar to this one:
{
"posts" : {
"-JRHTHaIs-jNPLXOQivY" : {
"upvotes_count":5,
"upvotes" : {
"userX" : true,
"userY" : true,
"userZ" : true,
...
}
}
}
}
And then write a javascript function to increase the upvotes_count when there is a new write to the upvotes node.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp(functions.config().firebase);
exports.countlikes = functions.database.ref('/posts/$postid/upvotes').onWrite(event => {
return event.data.ref.parent.child('upvotes_count').set(event.data.numChildren());
});
You can read the Documentation to know how to Get Started with Cloud Functions.
Also, another example of counting posts is here:
https://github.com/firebase/functions-samples/blob/master/child-count/functions/index.js
Update January 2018
The firebase docs have changed so instead of event we now have change and context.
The given example throws an error complaining that event.data is undefined. This pattern seems to work better:
exports.countPrescriptions = functions.database.ref(`/prescriptions`).onWrite((change, context) => {
const data = change.after.val();
const count = Object.keys(data).length;
return change.after.ref.child('_count').set(count);
});
```
This is a little late in the game as several others have already answered nicely, but I'll share how I might implement it.
This hinges on the fact that the Firebase REST API offers a shallow=true parameter.
Assume you have a post object and each one can have a number of comments:
{
"posts": {
"$postKey": {
"comments": {
...
}
}
}
}
You obviously don't want to fetch all of the comments, just the number of comments.
Assuming you have the key for a post, you can send a GET request to
https://yourapp.firebaseio.com/posts/[the post key]/comments?shallow=true.
This will return an object of key-value pairs, where each key is the key of a comment and its value is true:
{
"comment1key": true,
"comment2key": true,
...,
"comment9999key": true
}
The size of this response is much smaller than requesting the equivalent data, and now you can calculate the number of keys in the response to find your value (e.g. commentCount = Object.keys(result).length).
This may not completely solve your problem, as you are still calculating the number of keys returned, and you can't necessarily subscribe to the value as it changes, but it does greatly reduce the size of the returned data without requiring any changes to your schema.
Save the count as you go - and use validation to enforce it. I hacked this together - for keeping a count of unique votes and counts which keeps coming up!. But this time I have tested my suggestion! (notwithstanding cut/paste errors!).
The 'trick' here is to use the node priority to as the vote count...
The data is:
vote/$issueBeingVotedOn/user/$uniqueIdOfVoter = thisVotesCount, priority=thisVotesCount
vote/$issueBeingVotedOn/count = 'user/'+$idOfLastVoter, priority=CountofLastVote
,"vote": {
".read" : true
,".write" : true
,"$issue" : {
"user" : {
"$user" : {
".validate" : "!data.exists() &&
newData.val()==data.parent().parent().child('count').getPriority()+1 &&
newData.val()==newData.GetPriority()"
user can only vote once && count must be one higher than current count && data value must be same as priority.
}
}
,"count" : {
".validate" : "data.parent().child(newData.val()).val()==newData.getPriority() &&
newData.getPriority()==data.getPriority()+1 "
}
count (last voter really) - vote must exist and its count equal newcount, && newcount (priority) can only go up by one.
}
}
Test script to add 10 votes by different users (for this example, id's faked, should user auth.uid in production). Count down by (i--) 10 to see validation fail.
<script src='https://cdn.firebase.com/v0/firebase.js'></script>
<script>
window.fb = new Firebase('https:...vote/iss1/');
window.fb.child('count').once('value', function (dss) {
votes = dss.getPriority();
for (var i=1;i<10;i++) vote(dss,i+votes);
} );
function vote(dss,count)
{
var user='user/zz' + count; // replace with auth.id or whatever
window.fb.child(user).setWithPriority(count,count);
window.fb.child('count').setWithPriority(user,count);
}
</script>
The 'risk' here is that a vote is cast, but the count not updated (haking or script failure). This is why the votes have a unique 'priority' - the script should really start by ensuring that there is no vote with priority higher than the current count, if there is it should complete that transaction before doing its own - get your clients to clean up for you :)
The count needs to be initialised with a priority before you start - forge doesn't let you do this, so a stub script is needed (before the validation is active!).
write a cloud function to and update the node count.
// below function to get the given node count.
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp(functions.config().firebase);
exports.userscount = functions.database.ref('/users/')
.onWrite(event => {
console.log('users number : ', event.data.numChildren());
return event.data.ref.parent.child('count/users').set(event.data.numChildren());
});
Refer :https://firebase.google.com/docs/functions/database-events
root--|
|-users ( this node contains all users list)
|
|-count
|-userscount :
(this node added dynamically by cloud function with the user count)

Script runtime execution time limit

My Google Apps Script is iterating through the user's Google Drive files and copying and sometimes moving files to other folders. The script is always stopped after certain minutes with no error message in the log.
EDITOR's NOTE: The time limit have varied over the time and might vary between "consumer" (free) and "Workspace" (paid) accounts but as of December 2022 most of the answers are still valid.
I am sorting tens or sometimes thousands files in one run.
Are there any settings or workarounds?
One thing you could do (this of course depends on what you are trying to accomplish) is:
Store the necessary information (i.e. like a loop counter) in a spreadsheet or another permanent store(i.e. ScriptProperties).
Have your script terminate every five minutes or so.
Set up a time driven trigger to run the script every five minutes(or create a trigger programmatically using the Script service).
On each run read the saved data from the permanent store you've used and continue to run the script from where it left off.
This is not a one-size-fit-all solution, if you post your code people would be able to better assist you.
Here is a simplified code excerpt from a script that I use every day:
function runMe() {
var startTime= (new Date()).getTime();
//do some work here
var scriptProperties = PropertiesService.getScriptProperties();
var startRow= scriptProperties.getProperty('start_row');
for(var ii = startRow; ii <= size; ii++) {
var currTime = (new Date()).getTime();
if(currTime - startTime >= MAX_RUNNING_TIME) {
scriptProperties.setProperty("start_row", ii);
ScriptApp.newTrigger("runMe")
.timeBased()
.at(new Date(currTime+REASONABLE_TIME_TO_WAIT))
.create();
break;
} else {
doSomeWork();
}
}
//do some more work here
}
NOTE#1: The variable REASONABLE_TIME_TO_WAIT should be large enough for the new trigger to fire. (I set it to 5 minutes but I think it could be less than that).
NOTE#2: doSomeWork() must be a function that executes relatively quick( I would say less than 1 minute ).
NOTE#3 : Google has deprecated Script Properties, and introduced Properties Service in its stead. The function has been modified accordingly.
NOTE#4: 2nd time when the function is called, it takes the ith value of for loop as a string. so you have to convert it into an integer
Quotas
The maximum execution time for a single script is 6 mins / execution
- https://developers.google.com/apps-script/guides/services/quotas
But there are other limitations to familiarize yourself with. For example, you're only allowed a total trigger runtime of 1 hour / day, so you can't just break up a long function into 12 different 5 minute blocks.
Optimization
That said, there are very few reasons why you'd really need to take six minutes to execute. JavaScript should have no problem sorting thousands of rows of data in a couple seconds. What's likely hurting your performance are service calls to Google Apps itself.
You can write scripts to take maximum advantage of the built-in caching, by minimizing the number of reads and writes. Alternating read and write commands is slow. To speed up a script, read all data into an array with one command, perform any operations on the data in the array, and write the data out with one command.
- https://developers.google.com/apps-script/best_practices
Batching
The best thing you can possibly do is reduce the number of service calls. Google enables this by allowing batch versions of most of their API calls.
As a trivial example, Instead of this:
for (var i = 1; i <= 100; i++) {
SpreadsheetApp.getActiveSheet().deleteRow(i);
}
Do this:
SpreadsheetApp.getActiveSheet().deleteRows(i, 100);
In the first loop, not only did you need 100 calls to deleteRow on the sheet, but you also needed to get the active sheet 100 times as well. The second variation should perform several orders of magnitude better than the first.
Interweaving Reads and Writes
Additionally, you should also be very careful to not go back and forth frequently between reading and writing. Not only will you lose potential gains in batch operations, but Google won't be able to use its built-in caching.
Every time you do a read, we must first empty (commit) the write cache to ensure that you're reading the latest data (you can force a write of the cache by calling SpreadsheetApp.flush()). Likewise, every time you do a write, we have to throw away the read cache because it's no longer valid. Therefore if you can avoid interleaving reads and writes, you'll get full benefit of the cache.
- http://googleappsscript.blogspot.com/2010/06/optimizing-spreadsheet-operations.html
For example, instead of this:
sheet.getRange("A1").setValue(1);
sheet.getRange("B1").setValue(2);
sheet.getRange("C1").setValue(3);
sheet.getRange("D1").setValue(4);
Do this:
sheet.getRange("A1:D1").setValues([[1,2,3,4]]);
Chaining Function Calls
As a last resort, if your function really can't finish in under six minutes, you can chain together calls or break up your function to work on a smaller segment of data.
You can store data in the Cache Service (temporary) or Properties Service (permanent) buckets for retrieval across executions (since Google Apps Scripts has a stateless execution).
If you want to kick off another event, you can create your own trigger with the Trigger Builder Class or setup a recurring trigger on a tight time table.
Also, try to minimize the amount of calls to google services. For example, if you want to change a range of cells in the spreadsheets, don't read each one, mutate it and store it back.
Instead read the whole range (using Range.getValues()) into memory, mutate it and store all of it at once (using Range.setValues()).
This should save you a lot of execution time.
Anton Soradoi's answer seems OK but consider using Cache Service instead of storing data into a temporary sheet.
function getRssFeed() {
var cache = CacheService.getPublicCache();
var cached = cache.get("rss-feed-contents");
if (cached != null) {
return cached;
}
var result = UrlFetchApp.fetch("http://example.com/my-slow-rss-feed.xml"); // takes 20 seconds
var contents = result.getContentText();
cache.put("rss-feed-contents", contents, 1500); // cache for 25 minutes
return contents;
}
Also note that as of April 2014 the limitation of script runtime is 6 minutes.
G Suite Business / Enterprise / Education and Early Access users:
As of August 2018, max script runtime is now set to 30 minutes for these users.
Figure out a way to split up your work so it takes less than 6 minutes, as that's the limit for any script. On the first pass, you can iterate and store the list of files and folders in a spreadsheet and add a time-driven trigger for part 2.
In part 2, delete each entry in the list as you process it. When there are no items in the list, delete the trigger.
This is how I'm processing a sheet of about 1500 rows that gets spread to about a dozen different spreadsheets. Because of the number of calls to spreadsheets, it times out, but continues when the trigger runs again.
I have used the ScriptDB to save my place while processing a large amount of information in a loop. The script can/does exceed the 5 minute limit. By updating the ScriptDb during each run, the script can read the state from the db and pick up where it left off until all processing is complete. Give this strategy a try and I think you'll be pleased with the results.
If you are using G Suite Business or Enterprise edition.
You can register early access for App Maker after App maker enabled your script run runtime will increase run time from 6 minutes to 30 minutes :)
More details about app maker Click here
Here's an approach based very heavily on Dmitry Kostyuk's absolutely excellent article on the subject.
It differs in that it doesn't attempt to time execution and exit gracefully. Rather, it deliberately spawns a new thread every minute, and lets them run until they are timed out by Google. This gets round the maximum execution time limit, and speeds things up by running processing in several threads in parallel. (This speeds things up even if you are not hitting execution time limits.)
It tracks the task status in script properties, plus a semaphore to ensure no two threads are editing the task status at any one time. (It uses several properties as they are limited to 9k each.)
I have tried to mimick the Google Apps Script iterator.next() API, but cannot use iterator.hasNext() as that would not be thread-safe (see TOCTOU). It uses a couple of facade classes at the bottom.
I would be immensely grateful for any suggestions. This is working well for me, halving the processing time by spawning three parallel threads to run through a directory of documents. You could spawn 20 within quota, but this was ample for my use case.
The class is designed to be drop-in, usable for any purpose without modification. The only thing the user must do is when processing a file, delete any outputs from prior, timed out attempts. The iterator will return a given fileId more than once if a processing task is timed out by Google before it completes.
To silence the logging, it all goes through the log() function at the bottom.
This is how you use it:
const main = () => {
const srcFolder = DriveApp.getFoldersByName('source folder',).next()
const processingMessage = processDocuments(srcFolder, 'spawnConverter')
log('main() finished with message', processingMessage)
}
const spawnConverter = e => {
const processingMessage = processDocuments()
log('spawnConverter() finished with message', processingMessage)
}
const processDocuments = (folder = null, spawnFunction = null) => {
// folder and spawnFunction are only passed the first time we trigger this function,
// threads spawned by triggers pass nothing.
// 10,000 is the maximum number of milliseconds a file can take to process.
const pfi = new ParallelFileIterator(10000, MimeType.GOOGLE_DOCS, folder, spawnFunction)
let fileId = pfi.nextId()
const doneDocs = []
while (fileId) {
const fileRelativePath = pfi.getFileRelativePath(fileId)
const doc = DocumentApp.openById(fileId)
const mc = MarkupConverter(doc)
// This is my time-consuming task:
const mdContent = mc.asMarkdown(doc)
pfi.completed(fileId)
doneDocs.push([...fileRelativePath, doc.getName() + '.md'].join('/'))
fileId = pfi.nextId()
}
return ('This thread did:\r' + doneDocs.join('\r'))
}
Here's the code:
const ParallelFileIterator = (function() {
/**
* Scans a folder, depth first, and returns a file at a time of the given mimeType.
* Uses ScriptProperties so that this class can be used to process files by many threads in parallel.
* It is the responsibility of the caller to tidy up artifacts left behind by processing threads that were timed out before completion.
* This class will repeatedly dispatch a file until .completed(fileId) is called.
* It will wait maxDurationOneFileMs before re-dispatching a file.
* Note that Google Apps kills scripts after 6 mins, or 30 mins if you're using a Workspace account, or 45 seconds for a simple trigger, and permits max 30
* scripts in parallel, 20 triggers per script, and 90 mins or 6hrs of total trigger runtime depending if you're using a Workspace account.
* Ref: https://developers.google.com/apps-script/guides/services/quotas
maxDurationOneFileMs, mimeType, parentFolder=null, spawnFunction=null
* #param {Number} maxDurationOneFileMs A generous estimate of the longest a file can take to process.
* #param {string} mimeType The mimeType of the files required.
* #param {Folder} parentFolder The top folder containing all the files to process. Only passed in by the first thread. Later spawned threads pass null (the files have already been listed and stored in properties).
* #param {string} spawnFunction The name of the function that will spawn new processing threads. Only passed in by the first thread. Later spawned threads pass null (a trigger can't create a trigger).
*/
class ParallelFileIterator {
constructor(
maxDurationOneFileMs,
mimeType,
parentFolder = null,
spawnFunction = null,
) {
log(
'Enter ParallelFileIterator constructor',
maxDurationOneFileMs,
mimeType,
spawnFunction,
parentFolder ? parentFolder.getName() : null,
)
// singleton
if (ParallelFileIterator.instance) return ParallelFileIterator.instance
if (parentFolder) {
_cleanUp()
const t0 = Now.asTimestamp()
_getPropsLock(maxDurationOneFileMs)
const t1 = Now.asTimestamp()
const { fileIds, fileRelativePaths } = _catalogFiles(
parentFolder,
mimeType,
)
const t2 = Now.asTimestamp()
_setQueues(fileIds, [])
const t3 = Now.asTimestamp()
this.fileRelativePaths = fileRelativePaths
ScriptProps.setAsJson(_propsKeyFileRelativePaths, fileRelativePaths)
const t4 = Now.asTimestamp()
_releasePropsLock()
const t5 = Now.asTimestamp()
if (spawnFunction) {
// only triggered on the first thread
const trigger = Trigger.create(spawnFunction, 1)
log(
`Trigger once per minute: UniqueId: ${trigger.getUniqueId()}, EventType: ${trigger.getEventType()}, HandlerFunction: ${trigger.getHandlerFunction()}, TriggerSource: ${trigger.getTriggerSource()}, TriggerSourceId: ${trigger.getTriggerSourceId()}.`,
)
}
log(
`PFI instantiated for the first time, has found ${
fileIds.length
} documents to process. getPropsLock took ${t1 -
t0}ms, _catalogFiles took ${t2 - t1}ms, setQueues took ${t3 -
t2}ms, setAsJson took ${t4 - t3}ms, releasePropsLock took ${t5 -
t4}ms, trigger creation took ${Now.asTimestamp() - t5}ms.`,
)
} else {
const t0 = Now.asTimestamp()
// wait for first thread to set up Properties
while (!ScriptProps.getJson(_propsKeyFileRelativePaths)) {
Utilities.sleep(250)
}
this.fileRelativePaths = ScriptProps.getJson(_propsKeyFileRelativePaths)
const t1 = Now.asTimestamp()
log(
`PFI instantiated again to run in parallel. getJson(paths) took ${t1 -
t0}ms`,
)
spawnFunction
}
_internals.set(this, { maxDurationOneFileMs: maxDurationOneFileMs })
// to get: _internal(this, 'maxDurationOneFileMs')
ParallelFileIterator.instance = this
return ParallelFileIterator.instance
}
nextId() {
// returns false if there are no more documents
const maxDurationOneFileMs = _internals.get(this).maxDurationOneFileMs
_getPropsLock(maxDurationOneFileMs)
let { pending, dispatched } = _getQueues()
log(
`PFI.nextId: ${pending.length} files pending, ${
dispatched.length
} dispatched, ${Object.keys(this.fileRelativePaths).length -
pending.length -
dispatched.length} completed.`,
)
if (pending.length) {
// get first pending Id, (ie, deepest first)
const nextId = pending.shift()
dispatched.push([nextId, Now.asTimestamp()])
_setQueues(pending, dispatched)
_releasePropsLock()
return nextId
} else if (dispatched.length) {
log(`PFI.nextId: Get first dispatched Id, (ie, oldest first)`)
let startTime = dispatched[0][1]
let timeToTimeout = startTime + maxDurationOneFileMs - Now.asTimestamp()
while (dispatched.length && timeToTimeout > 0) {
log(
`PFI.nextId: None are pending, and the oldest dispatched one hasn't yet timed out, so wait ${timeToTimeout}ms to see if it will`,
)
_releasePropsLock()
Utilities.sleep(timeToTimeout + 500)
_getPropsLock(maxDurationOneFileMs)
;({ pending, dispatched } = _getQueues())
if (pending && dispatched) {
if (dispatched.length) {
startTime = dispatched[0][1]
timeToTimeout =
startTime + maxDurationOneFileMs - Now.asTimestamp()
}
}
}
// We currently still have the PropsLock
if (dispatched.length) {
const nextId = dispatched.shift()[0]
log(
`PFI.nextId: Document id ${nextId} has timed out; reset start time, move to back of queue, and re-dispatch`,
)
dispatched.push([nextId, Now.asTimestamp()])
_setQueues(pending, dispatched)
_releasePropsLock()
return nextId
}
}
log(`PFI.nextId: Both queues empty, all done!`)
;({ pending, dispatched } = _getQueues())
if (pending.length || dispatched.length) {
log(
"ERROR: All documents should be completed, but they're not. Giving up.",
pending,
dispatched,
)
}
_cleanUp()
return false
}
completed(fileId) {
_getPropsLock(_internals.get(this).maxDurationOneFileMs)
const { pending, dispatched } = _getQueues()
const newDispatched = dispatched.filter(el => el[0] !== fileId)
if (dispatched.length !== newDispatched.length + 1) {
log(
'ERROR: A document was completed, but not found in the dispatched list.',
fileId,
pending,
dispatched,
)
}
if (pending.length || newDispatched.length) {
_setQueues(pending, newDispatched)
_releasePropsLock()
} else {
log(`PFI.completed: Both queues empty, all done!`)
_cleanUp()
}
}
getFileRelativePath(fileId) {
return this.fileRelativePaths[fileId]
}
}
// ============= PRIVATE MEMBERS ============= //
const _propsKeyLock = 'PropertiesLock'
const _propsKeyDispatched = 'Dispatched'
const _propsKeyPending = 'Pending'
const _propsKeyFileRelativePaths = 'FileRelativePaths'
// Not really necessary for a singleton, but in case code is changed later
var _internals = new WeakMap()
const _cleanUp = (exceptProp = null) => {
log('Enter _cleanUp', exceptProp)
Trigger.deleteAll()
if (exceptProp) {
ScriptProps.deleteAllExcept(exceptProp)
} else {
ScriptProps.deleteAll()
}
}
const _catalogFiles = (folder, mimeType, relativePath = []) => {
// returns IDs of all matching files in folder, depth first
log(
'Enter _catalogFiles',
folder.getName(),
mimeType,
relativePath.join('/'),
)
let fileIds = []
let fileRelativePaths = {}
const folders = folder.getFolders()
let subFolder
while (folders.hasNext()) {
subFolder = folders.next()
const results = _catalogFiles(subFolder, mimeType, [
...relativePath,
subFolder.getName(),
])
fileIds = fileIds.concat(results.fileIds)
fileRelativePaths = { ...fileRelativePaths, ...results.fileRelativePaths }
}
const files = folder.getFilesByType(mimeType)
while (files.hasNext()) {
const fileId = files.next().getId()
fileIds.push(fileId)
fileRelativePaths[fileId] = relativePath
}
return { fileIds: fileIds, fileRelativePaths: fileRelativePaths }
}
const _getQueues = () => {
const pending = ScriptProps.getJson(_propsKeyPending)
const dispatched = ScriptProps.getJson(_propsKeyDispatched)
log('Exit _getQueues', pending, dispatched)
// Note: Empty lists in Javascript are truthy, but if Properties have been deleted by another thread they'll be null here, which are falsey
return { pending: pending || [], dispatched: dispatched || [] }
}
const _setQueues = (pending, dispatched) => {
log('Enter _setQueues', pending, dispatched)
ScriptProps.setAsJson(_propsKeyPending, pending)
ScriptProps.setAsJson(_propsKeyDispatched, dispatched)
}
const _getPropsLock = maxDurationOneFileMs => {
// will block until lock available or lock times out (because a script may be killed while holding a lock)
const t0 = Now.asTimestamp()
while (
ScriptProps.getNum(_propsKeyLock) + maxDurationOneFileMs >
Now.asTimestamp()
) {
Utilities.sleep(2000)
}
ScriptProps.set(_propsKeyLock, Now.asTimestamp())
log(`Exit _getPropsLock: took ${Now.asTimestamp() - t0}ms`)
}
const _releasePropsLock = () => {
ScriptProps.delete(_propsKeyLock)
log('Exit _releasePropsLock')
}
return ParallelFileIterator
})()
const log = (...args) => {
// easier to turn off, json harder to read but easier to hack with
console.log(args.map(arg => JSON.stringify(arg)).join(';'))
}
class Trigger {
// Script triggering facade
static create(functionName, everyMinutes) {
return ScriptApp.newTrigger(functionName)
.timeBased()
.everyMinutes(everyMinutes)
.create()
}
static delete(e) {
if (typeof e !== 'object') return log(`${e} is not an event object`)
if (!e.triggerUid)
return log(`${JSON.stringify(e)} doesn't have a triggerUid`)
ScriptApp.getProjectTriggers().forEach(trigger => {
if (trigger.getUniqueId() === e.triggerUid) {
log('deleting trigger', e.triggerUid)
return ScriptApp.delete(trigger)
}
})
}
static deleteAll() {
// Deletes all triggers in the current project.
var triggers = ScriptApp.getProjectTriggers()
for (var i = 0; i < triggers.length; i++) {
ScriptApp.deleteTrigger(triggers[i])
}
}
}
class ScriptProps {
// properties facade
static set(key, value) {
if (value === null || value === undefined) {
ScriptProps.delete(key)
} else {
PropertiesService.getScriptProperties().setProperty(key, value)
}
}
static getStr(key) {
return PropertiesService.getScriptProperties().getProperty(key)
}
static getNum(key) {
// missing key returns Number(null), ie, 0
return Number(ScriptProps.getStr(key))
}
static setAsJson(key, value) {
return ScriptProps.set(key, JSON.stringify(value))
}
static getJson(key) {
return JSON.parse(ScriptProps.getStr(key))
}
static delete(key) {
PropertiesService.getScriptProperties().deleteProperty(key)
}
static deleteAll() {
PropertiesService.getScriptProperties().deleteAllProperties()
}
static deleteAllExcept(key) {
PropertiesService.getScriptProperties()
.getKeys()
.forEach(curKey => {
if (curKey !== key) ScriptProps.delete(key)
})
}
}
If you're a business customer, you can now sign up for Early Access to App Maker, which includes Flexible Quotas.
Under the flexible quota system, such hard quota limits are removed. Scripts do not stop when they reach a quota limit. Rather, they are delayed until quota becomes available, at which point the script execution resumes. Once quotas begin being used, they are refilled at a regular rate. For reasonable usage, script delays are rare.
If you are using G Suite as a Business, Enterprise or EDU customer the execution time for running scripts is set to:
30 min / execution
See: https://developers.google.com/apps-script/guides/services/quotas
The idea would be to exit gracefully from the script, save your progress, create a trigger to start again from where you left off, repeat as many times as necessary and then once finished clean up the trigger and any temporary files.
Here is a detailed article on this very topic.
As many people mentioned, the generic solution to this problem is to execute your method across multiple sessions. I found it to be a common problem that I have a bunch of iterations I need to loop over, and I don't want the hassle of writing/maintaining the boilerplate of creating new sessions.
Therefore I created a general solution:
/**
* Executes the given function across multiple sessions to ensure there are no timeouts.
*
* See https://stackoverflow.com/a/71089403.
*
* #param {Int} items - The items to iterate over.
* #param {function(Int)} fn - The function to execute each time. Takes in an item from `items`.
* #param {String} resumeFunctionName - The name of the function (without arguments) to run between sessions. Typically this is the same name of the function that called this method.
* #param {Int} maxRunningTimeInSecs - The maximum number of seconds a script should be able to run. After this amount, it will start a new session. Note: This must be set to less than the actual timeout as defined in https://developers.google.com/apps-script/guides/services/quotas (e.g. 6 minutes), otherwise it can't set up the next call.
* #param {Int} timeBetweenIterationsInSeconds - The amount of time between iterations of sessions. Note that Google Apps Script won't honor this 100%, as if you choose a 1 second delay, it may actually take a minute or two before it actually executes.
*/
function iterateAcrossSessions(items, fn, resumeFunctionName, maxRunningTimeInSeconds = 5 * 60, timeBetweenIterationsInSeconds = 1) {
const PROPERTY_NAME = 'iterateAcrossSessions_index';
let scriptProperties = PropertiesService.getScriptProperties();
let startTime = (new Date()).getTime();
let startIndex = parseInt(scriptProperties.getProperty(PROPERTY_NAME));
if (Number.isNaN(startIndex)) {
startIndex = 0;
}
for (let i = startIndex; i < items.length; i++) {
console.info(`[iterateAcrossSessions] Executing for i = ${i}.`)
fn(items[i]);
let currentTime = (new Date()).getTime();
let elapsedTime = currentTime - startTime;
let maxRunningTimeInMilliseconds = maxRunningTimeInSeconds * 1000;
if (maxRunningTimeInMilliseconds <= elapsedTime) {
let newTime = new Date(currentTime + timeBetweenIterationsInSeconds * 1000);
console.info(`[iterateAcrossSessions] Creating new session for i = ${i+1} at ${newTime}, since elapsed time was ${elapsedTime}.`);
scriptProperties.setProperty(PROPERTY_NAME, i+1);
ScriptApp.newTrigger(resumeFunctionName).timeBased().at(newTime).create();
return;
}
}
console.log(`[iterateAcrossSessions] Done iterating over items.`);
// Reset the property here to ensure that the execution loop could be restarted.
scriptProperties.deleteProperty(PROPERTY_NAME);
}
You can now use this pretty easily like so:
let ITEMS = ['A', 'B', 'C'];
function execute() {
iterateAcrossSessions(
ITEMS,
(item) => {
console.log(`Hello world ${item}`);
},
"execute");
}
It'll automatically execute the internal lambda for each value in ITEMS, seamlessly spreading across sessions as needed.
For example, if you use a 0-second maxRunningTime it would run across 4 sessions with the following outputs:
[iterateAcrossSessions] Executing for i = 0.
Hello world A
[iterateAcrossSessions] Creating new session for i = 1.
[iterateAcrossSessions] Executing for i = 1.
Hello world B
[iterateAcrossSessions] Creating new session for i = 2.
[iterateAcrossSessions] Executing for i = 2.
Hello world C
[iterateAcrossSessions] Creating new session for i = 3.
[iterateAcrossSessions] Done iterating over items.

Resources