Save Google Apps Script state while parsing an object array and continue where left off later on

I am using this simple Google Apps Script to go through all available Google Sites and dump the HTML content of individual pages. There are quite a lot of pages, so the script will eventually run into the 6-minute execution time limit.
Is it possible to somehow use the PropertiesService to save the current progress (especially in the array loops) and continue where left off later on?
var sites = SitesApp.getAllSites("somedomain.com");
var exportFolder = DriveApp.getFolderById("a4342asd1242424folderid-");
// Cycle through all sites
for (var i in sites){
  var SiteName = sites[i].getName();
  var pages = sites[i].getAllDescendants();
  // Create folder in Drive for each site name
  var siteFolder = exportFolder.createFolder(SiteName);
  for (var p in pages){
    // Get page name and url
    var PageUrl = pages[p].getUrl();
    // Dump the raw html content in the text file
    var htmlDump = pages[p].getHtmlContent();
    siteFolder.createFile(PageUrl+".html", htmlDump);
  }
}
I can imagine how one can use the Properties Service to store the current line number in a Spreadsheet and continue where left off. But how can this be done with an array containing objects like Sites or Pages?

Using Objects with Properties Service
According to the quotas, the maximum size of a single value you can store in the properties service is 9 KB, with a total of 500 KB per property store. So if your object, once converted to a string, is smaller than this, it should be no problem. That said, you will need to convert the object to a string with JSON.stringify() and, when you retrieve it, convert it back with JSON.parse().
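As a minimal sketch of that round trip, the helpers below save and load a small checkpoint object. The key name EXPORT_CHECKPOINT and the checkpoint shape (siteIndex, pageIndex) are assumptions for illustration, not part of the original post. The sketch stores positions rather than the Site or Page objects themselves, on the assumption that those service objects do not survive JSON.stringify() in a useful form. The store argument is anything with getProperty and setProperty, such as PropertiesService.getScriptProperties().

```javascript
// Hypothetical key name and checkpoint shape; neither comes from the original post.
const CHECKPOINT_KEY = 'EXPORT_CHECKPOINT';
const MAX_VALUE_SIZE = 9 * 1024; // per-value limit quoted above

// `store` is anything with getProperty/setProperty, for example
// PropertiesService.getScriptProperties() inside Apps Script.
function saveCheckpoint(store, checkpoint) {
  const json = JSON.stringify(checkpoint);
  // Rough guard: counts string length, which only approximates the byte size.
  if (json.length > MAX_VALUE_SIZE) {
    throw new Error('Checkpoint too large for a single property');
  }
  store.setProperty(CHECKPOINT_KEY, json);
}

function loadCheckpoint(store) {
  const json = store.getProperty(CHECKPOINT_KEY); // null when nothing was saved yet
  return json ? JSON.parse(json) : { siteIndex: 0, pageIndex: 0 };
}
```

Passing the store in as a parameter keeps the helpers easy to try outside Apps Script with a plain object that implements the same two methods.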
Working around the run time limit
What is commonly done to work around the limit is to structure a process around the properties service and triggers. Essentially you make the script keep track of time, and if it starts to take a long time, you get it to save its position and then create a trigger so that the script runs again in 10 seconds (or however long you want), for example:
function mainJob(x) {
  let timeStart = new Date()
  console.log("Starting at ", timeStart)
  for (let i = x; i < 500000000; i++){ // NOTE THE i = x
    // MAIN JOB INSTRUCTIONS
    let j = i
    // ...
    // Check Time
    let timeCheck = new Date()
    if (timeCheck.getTime() - timeStart.getTime() > 30000) {
      console.log("Time limit reached, i = ", i)
      // Store iteration number as a string (iteration i is re-run on resume)
      PropertiesService
        .getScriptProperties()
        .setProperty('PROGRESS', String(i))
      console.log("stored value of i")
      // Create trigger to run in 10 seconds.
      ScriptApp.newTrigger("jobContinue")
        .timeBased()
        .after(10000)
        .create()
      console.log("Trigger created for 10 seconds from now")
      return 0
    }
  }
  // Reset progress counter
  PropertiesService
    .getScriptProperties()
    .setProperty('PROGRESS', '0')
  console.log("job complete")
}
function jobContinue() {
  console.log("Restarting job")
  // Assumes the continuation trigger is the only trigger in the project
  const previousTrigger = ScriptApp.getProjectTriggers()[0]
  ScriptApp.deleteTrigger(previousTrigger)
  console.log("Previous trigger deleted")
  const triggersRemain = ScriptApp.getProjectTriggers()
  console.log("project triggers", triggersRemain)
  // Properties come back as strings, so convert before using as a counter
  const progress = Number(PropertiesService
    .getScriptProperties()
    .getProperty('PROGRESS'))
  console.log("about to start main job again at i = ", progress)
  mainJob(progress)
}
function startJob() {
  mainJob(0)
}
Explanation
This script only has a for loop with 500 million iterations in which it assigns i to j. It is just an example of a long job that could go over the run time limit.
The script is started by calling function startJob which calls mainJob(0).
Within mainJob
It starts by creating a Date object to get the start time of the mainJob.
It takes the argument 0 and uses it as the starting value of the for loop, just as you would normally initialise a for loop at 0.
At the end of every iteration, it creates a new Date object to compare with the one created at the beginning of mainJob. In the example, it checks whether the script has been running for 30 seconds. This can be extended, but keep it well below the limit.
If it has taken more than 30 seconds, it stores the value of i in the properties service and then creates a trigger to run jobContinue in 10 seconds.
After 10 seconds, the function jobContinue reads the value of i from the properties service and calls mainJob with that value.
jobContinue also deletes the trigger that started it, to keep things clean.
This script should run as-is in a new project, try it out! When I run it, it takes around 80 seconds, so it runs the first time, creates a trigger, runs again, creates a trigger, runs again and then finally finishes the for loop.
References
quotas
JSON.stringify()
JSON.parse.
ScriptApp
Triggers

If you are able to process all pages of one site in under 6 minutes, you could first save the site names in a sheet or in properties (depending on how many there are) and then process n sites per run. You can also try SitesApp.getAllSites(domain, start, max) and save the start value in properties after incrementing it.
You can do something similar for pages if you cannot process them in under 6 minutes.
SitesApp.getAllDescendants(options)
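The idea of resuming both the site loop and the page loop can be sketched as a plain function that walks a two-level structure from a saved position and stops when a time budget runs out. Everything here is an assumption for illustration: the function name processWithBudget, the position shape, and modelling each site as an array of pages (in Apps Script the inner array would come from getAllDescendants()). It also assumes that the sites and pages come back in the same order on every run.

```javascript
// Walks sites -> pages starting from a saved position.
// Returns null when everything was processed, or the position to resume from
// when the time budget ran out. All names are illustrative.
function processWithBudget(sites, start, budgetMs, processPage, now = Date.now) {
  const startedAt = now();
  for (let s = start.siteIndex; s < sites.length; s++) {
    // Only the first site visited resumes mid-way; later sites start at page 0.
    const firstPage = (s === start.siteIndex) ? start.pageIndex : 0;
    const pages = sites[s];
    for (let p = firstPage; p < pages.length; p++) {
      if (now() - startedAt > budgetMs) {
        return { siteIndex: s, pageIndex: p }; // page p has NOT been processed yet
      }
      processPage(pages[p], s, p);
    }
  }
  return null;
}
```

The returned position can be stored as a JSON string in the properties service, and a time-based trigger can call the function again with the stored position, in the same way as the PROGRESS counter in the first answer. Because the check happens before a page is processed, no page is handled twice on resume.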

Related

JMeter looping with indexes and property update

I'm attempting to create a test plan in which some functionality is triggered when a certain value is reached. The test plan consists of multiple threads running in a loop, and when some condition is met I'd like to fire an HTTP request.
I'll drill down to the guts of it:
In my test I have logic that runs in a loop with multiple threads, and when a condition is met (it is met every 10 seconds) I need to iterate over a value that must be preserved from the previous iteration. I defined that value as a property (inside user.properties): startIndex = 0 (initialized to 0).
So I've made a While Controller whose condition is:
${__javaScript(${__P(startIndex,)}<=${currBulk},)}
I expect the HTTP request inside the While Controller, which depends on the startIndex value, to be executed while startIndex <= currBulk.
Inside the While Controller the HTTP request should be fired until all indexes are covered, and I've written it like this inside a BeanShell PostProcessor:
int startIndexIncInt = Integer.parseInt(props.getProperty("startIndex")); //get the initiated index of the loop
startIndexIncInt = startIndexIncInt + 1; //increment it and see if needed to fire the request again, by the original While condition
vars.put("startIndexIncIntVar", String.valueOf(startIndexIncInt));
props.put("startIndex",vars.get("startIndexIncIntVar")); //the property incremental and update
I designed it this way so that the next time (after 10 more seconds) I'll have an updated startIndex that will be compared to the new currBulk (which is always updated by my test plan).
But I just can't get it to work. I keep receiving errors like:
startIndexIncInt = Integer.parseInt(props.ge . . . '' : Typed variable declaration : Method Invocation Integer.parseInt
Needless to say, the variable startIndexIncIntVar I defined isn't set either (I checked via a Debug Sampler).
Also, my problem isn't with the timing of entering the while loop; my problems are with the variable that I need to increment and use inside my HTTP request (the while condition and the BeanShell PostProcessor script).
For more context, written as pseudocode it would look like this:
startInc = 0
----Test plan loop----
------ test logic, currBulk incremented through the test-----
if(time condition to enter while){
  while (startIndex <= currBulk){
    Send HTTP request (the request depends on startIndex value)
    startIndex++
  }
}
Please assist
It appears to be a problem with your startIndex property, as I fail to see any Beanshell script error. The code is good, so my expectation is that the startIndex property is unset or cannot be parsed as an integer. You can get much more information about the problem in your Beanshell script in 2 ways:
Add debug() command to the beginning of your script - you will see a lot of debugging output in the console window.
Put your code inside a try block like:
try {
    int startIndexIncInt = Integer.parseInt(props.getProperty("startIndex")); //get the initiated index of the loop
    startIndexIncInt = startIndexIncInt + 1; //increment it and see if needed to fire the request again, by the original While condition
    vars.put("startIndexIncIntVar", String.valueOf(startIndexIncInt));
    props.put("startIndex", vars.get("startIndexIncIntVar")); //the property incremental and update
} catch (Throwable ex) {
    log.error("Beanshell script failure", ex);
    throw ex;
}
This way you will be able to see the cause of the problem in the jmeter.log file.
Actually, it appears that you are overscripting, as incrementing a variable can be done using built-in components like the Counter test element or the __counter() function. See the How to Use a Counter in a JMeter Test article for more information.

rx.js catchup subscription from two sources

I need to combine a catch-up query with a subscription to a live feed. So first I query the database for all the records I've missed, then switch to a pub/sub for all new records that are coming in.
The first part is easy: do your query, perhaps in batches of 500; that will give you an array and you can rx.observeFrom that.
The second part is easy too: you just put an rx.observe on the pubsub.
But I need to do this sequentially, so I need to play all the old records before I start playing the new ones coming in.
I figure I can start the subscription to the pubsub and put those records in an array, then start processing the old ones. When I'm done, I either remove the dups or (since I do a dup check anyway) allow the few dups, play the accumulated records until they are gone, and then go one in, one out.
My question is: what is the best way to do this? Should I create a subscription to start building up new records in an array, then start processing the old ones, and then in the "then" of the old-record processing subscribe to that array?
OK, this is what I have so far. I need to build up the tests and finish up some pseudocode to find out if it even works, much less whether it is a good implementation. Feel free to stop me in my tracks before I bury myself.
var catchUpSubscription = function catchUpSubscription(startFrom) {
  EventEmitter.call(this);
  var subscription = this.getCurrentEventsSubscription();
  // calling map to start subscription and catch in an array.
  // not sure if this is right
  var events = rx.Observable.fromEvent(subscription, 'event').map(x=> x);
  // getPastEvents gets batches of 500 iterates over and emits each
  // till no more are returned, then resolves a promise
  this.getPastEvents({count:500, start:startFrom})
    .then(function(){
      rx.Observable.fromArray(events).forEach(x=> emit('event', x));
    });
};
I don't know that this is the best way. Any thoughts?
thx
I would avoid mixing your different async strategies unnecessarily. You can use concat to join together the two sequences:
var catchUpSubscription = function catchUpSubscription(startFrom) {
  var subscription = this.getCurrentEventsSubscription();
  return Rx.Observable.fromPromise(this.getPastEvents({count:500, start:startFrom}))
    .flatMap(x => x)
    .concat(Rx.Observable.fromEvent(subscription, 'event'));
};
// Sometime later
catchUpSubscription(startTime).subscribe(x => /*handle event*/)
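One thing that may be worth checking with the concat approach: as far as I can tell, fromEvent only attaches its listener once it is subscribed, which with concat happens after the first sequence completes, so records published during the catch-up could be missed. The buffering plan described in the question avoids that. Below is a minimal, library-free sketch of that plan. The names createCatchUp, onLive, finishCatchUp and the getId helper are assumptions for illustration and are not part of any Rx API.

```javascript
// Buffers live events while the backlog is replayed, then switches to pass-through.
// `getId` extracts a unique key used to drop duplicates; it is an assumed helper.
function createCatchUp(handler, getId = (e) => e.id) {
  const seen = new Set(); // ids already delivered
  let buffered = [];      // live events that arrived during catch-up
  let live = false;       // true once the backlog has been replayed

  function deliver(event) {
    const id = getId(event);
    if (seen.has(id)) return; // skip duplicates between backlog and live feed
    seen.add(id);
    handler(event);
  }

  return {
    // Call for every event coming from the pub/sub feed.
    onLive(event) {
      if (live) deliver(event); else buffered.push(event);
    },
    // Call once with the historical records, oldest first.
    finishCatchUp(pastEvents) {
      pastEvents.forEach(deliver);
      buffered.forEach(deliver);
      buffered = [];
      live = true;
    },
  };
}
```

Usage would be to wire onLive to the pub/sub listener first, then call finishCatchUp with the result of the database query. Note that the set of seen ids grows without bound in this sketch; a real implementation would probably want to cap it or clear it once the overlap window has passed.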

Implement "load more" using threads.list() combined with 'q' = older/newer

On first sign-up I am doing a full sync for the last 50 threads in label with id INBOX.
How should I go about implementing a "load more" feature, where the user can ask to fetch the next 50 threads? As far as I can see, there are 2 possible ways to go about it:
Cache nextPageToken from initial full sync and use that to load next 50 (maxResults = 50)
Use the q parameter with older and newer - this works well for dates however I could not find if this works for absolute time.
Neither of them works for my use case in which I specifically would like to get the next 50 threads older or all threads newer than this point of time.
I would like to do this because if I fetch threads per label, and in my data model labels and threads have a many-to-many relationship, I will have date gaps in the different labels.
Here is an example: I go into a label that has messages from 2009, I fetch them. They are also in Inbox so if I go there I will see emails from October 2014 and then suddenly September 2009. My solution would be to fetch threads from All Mail newer than the oldest thread whenever I do load more or initial full sync to make sure there are no date gaps.
Also to save bandwidth, is it possible to include in the request the thread ids I already have, to not be returned in the response?
I don't think you need to overcomplicate things. If you don't apply any newer, older or specific sorting, the messages are ordered by date descending. For the pages I created a simple array to hold all the page tokens. Quite easy and works well (AngularJS example):
/*
 * Get next page
 */
$scope.fetchNextPage = function() {
  if ($scope.nextPageToken) {
    $scope.page++;
    $scope.pageTokenArray[$scope.page] = $scope.nextPageToken;
    $scope.targetPage = $scope.pageTokenArray[$scope.page];
    $scope.fetch(false);
    // As we have a next page, always allow
    // to go back
    $scope.lastBtnDisabled = false;
  }
};
/*
 * Get previous page
 */
$scope.fetchLastPage = function() {
  if ($scope.page > 0) {
    $scope.page--;
    $scope.targetPage = $scope.pageTokenArray[$scope.page];
    $scope.fetch(false);
    // When page is 0 now, disable last page
    // button
    if ($scope.page == 0) {
      $scope.lastBtnDisabled = true;
    } else {
      $scope.lastBtnDisabled = false;
    }
  }
};
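The same token bookkeeping can be sketched without AngularJS. The class name PageTokenHistory and its methods are assumptions for illustration. Index 0 holds undefined, meaning that the first page is requested without a token.

```javascript
// Keeps the page tokens seen so far so the user can move back and forth.
// Index 0 holds `undefined`, meaning "first page, no token needed".
class PageTokenHistory {
  constructor() {
    this.tokens = [undefined];
    this.page = 0;
  }
  // Token to send with the request for the current page.
  current() {
    return this.tokens[this.page];
  }
  // Record the nextPageToken from the latest response and move forward.
  // Returns false when there is no further page.
  next(nextPageToken) {
    if (!nextPageToken) return false;
    this.page++;
    this.tokens[this.page] = nextPageToken;
    return true;
  }
  // Move back one page; returns false when already on the first page.
  previous() {
    if (this.page === 0) return false;
    this.page--;
    return true;
  }
}
```

After each call to next() or previous(), the value of current() is what would be passed as pageToken in the following list request.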

continuationToken is the same as the previous one

I use the Google Apps Script DriveApp service to iterate through 425 folders.
The function handles one folder per run and saves the continuationToken as a UserProperty. It runs every minute.
My problem is that after many runs (~300) the continuationToken is the same as the previous one, creating a loop that I can't get past.
I have written this script in many different ways, but the token issue always occurs, and for different folders. Even when I handled multiple folders per run.
This is basically what I do:
function name(){
  var proper = PropertiesService.getUserProperties();
  var alla = proper.getProperty('addLoop');
  if(alla == "")
    alla = DriveApp.getFolderById("ID").getFolders();
  else
    alla = DriveApp.continueFolderIterator(alla);
  if(alla.hasNext()){
    var next = alla.next();
    proper.setProperty('addLoop', alla.getContinuationToken());
    functionName(next);
  }
  else
    proper.setProperty('addLoop', "");
}
I can't believe this is a scripting error, but I haven't read of any limitation for this usage of Apps Script either... I'm clueless and happy for any input.

Delete avis and jpgs from a specified folder and older than x days

I want to use a script that deletes AVI and JPG files from a specific folder. I want to filter them by date and extension. I have this script, which I think is really close to what I want, but it doesn't delete anything and just sends me an empty email. (I know the trash calls are commented out; that is for safety reasons, and I will enable them once my reports look good.)
function DeleteMyAVIs() {
  var pageSize = 5000;
  var files = null;
  var token = null;
  var i = null;
  var SevenDaysBeforeNow = new Date().getTime()-3600*1000*24*7 ;
  Logger.clear()
  do {
    var result = DocsList.getAllFilesForPaging(pageSize, token);
    var files = DocsList.getFolder("motion").getFiles();
    var token = result.getToken();
    for(n=0;n<files.length;++n){
      if(files[n].getName().toLowerCase().match('.avi')=='.avi' && files[n].getDateCreated().getTime() < SevenDaysBeforeNow){
        //files[n].setTrashed(true)
        Logger.log(files[n].getName()+' created on '+Utilities.formatDate(files[n].getDateCreated(), 'GMT','MMM-dd-yyyy'))
      }
      if(files[n].getName().toLowerCase().match('.mpg')=='.mpg' && files[n].getDateCreated().getTime() < SevenDaysBeforeNow){
        //files[n].setTrashed(true)
        Logger.log(files[n].getName()+' created on '+Utilities.formatDate(files[n].getDateCreated(), 'GMT','MMM-dd-yyyy'))
      }
    }
  } while (files.length == pageSize);
  MailApp.sendEmail('xy#gmail.com', 'Script AUTODELETE report', Logger.getLog());
}
You're not getting the files from the folder (it's fixed in the code below). Also, I recommend you get the folder by its id, which is far more robust, because it allows you to rename the folder or move it inside others and the code will still work.
Your match is also wrong (although it's not the reason it's not working), because the string is converted into a regexp. The dot matches any character, so ".avi" would match any file name with "avi" anywhere in it (except as the very first 3 letters).
Also, the DocsList token is not useful, because you cannot save it for a later execution, and we page not because of a Google Drive limitation but because of the Apps Script 6-minute maximum execution time. In your case of deleting files, continuing the search is not really required, since the files will not be in the next search results anyway.
Lastly, when you call formatDate and pass GMT, you're most likely going to shift the days by one, either one day before or after depending on where you are, unless you're really on GMT+0 and have no daylight saving shift (which I doubt). You should use your own real timezone or grab your script's default (as shown below).
function deleteMyAVIs() {
  var pageSize = 500; // be careful with how many files you process at once, or you're going to time out
  var sevenDaysAgo = Date.now()-1000*60*60*24*7;
  var TZ = Session.getScriptTimeZone();
  Logger.clear();
  var result = DocsList.getFolderById('folder-id').getFilesForPaging(pageSize);
  var files = result.getFiles();
  //token = result.getToken(); //not useful and fortunately not important for your case
  for (var n = 0; n < files.length; ++n) {
    // Anchored regexp: only names that END in .avi or .mpg
    if (files[n].getName().toLowerCase().match('\\.(avi|mpg)$') && files[n].getDateCreated().getTime() < sevenDaysAgo) {
      //files[n].setTrashed(true)
      Logger.log(files[n].getName()+' created on '+Utilities.formatDate(files[n].getDateCreated(), TZ,'MMM-dd-yyyy'))
    }
  }
  MailApp.sendEmail('xy#gmail.com', 'Script AUTODELETE report', Logger.getLog());
}
By the way, if you ever need to continue a Drive search later on, you should use DriveApp instead of DocsList. Please see this other post where I show how to do it.
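The regexp pitfall described in this answer can be checked in isolation with plain JavaScript. The helper name isOldMedia and the sample file names are assumptions for illustration.

```javascript
// String patterns passed to match() are compiled as regular expressions,
// so the dot in '.avi' matches any single character.
const looseMatch = 'xavier-notes.txt'.match('.avi'); // matches the substring "xavi"
const noMatch = 'avi.txt'.match('.avi');             // null: nothing precedes "avi"

// Stricter check: the extension must be a literal dot plus avi or mpg at the end
// of the name, and the file must be older than the cutoff (both in milliseconds).
function isOldMedia(name, createdMs, cutoffMs) {
  const hasMediaExtension = /\.(avi|mpg)$/.test(name.toLowerCase());
  return hasMediaExtension && createdMs < cutoffMs;
}
```

With the anchored pattern, a name such as xavier-notes.txt is no longer treated as a video file, while clip.AVI still is after lowercasing.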

Resources