Google Apps Script: Use an array from a spreadsheet

I am trying to use Google Apps Script (instead of VBA, which I am more used to) and have now managed to create a loop over different spreadsheets (not just different sheets within one document) using the forEach function.
(I tried with a for (r=1; r=lastRow; r++) loop but did not manage.)
It works now when I define the array of spreadsheet IDs manually:
var SheetList = ["17DCu1nyyX4a6zCkkT3RfBSfo-ghoc2fXEX8chlVMv5k", "1rRGQHs_JShPSBIGFCdG6AqXM967JFhdlfQ92cf5ISL8", "1pFDyXgYmvC5gnN5AU5xJ8vGiihwtubcbG2n4LPhPACQ", "1mK_X4Q7ysJQTt8NZoZASBE5zuUllPmmWSJsxu5Dnu9Y", "1FpjIGWTG5_6MMYJF72wvoiBRp_Xlt5BDpzvSZKcsU"]
And then for information the loop:
SheetList.forEach(function(r) {
  var thisSpreadsheet = SpreadsheetApp.openById(r);
  var thisData = thisSpreadsheet.getSheetByName('Actions').getDataRange();
  var values = thisData.getValues();
  var toWorksheet = targetSpreadsheetID.getSheetByName(targetWorksheetName);
  var last = toWorksheet.getLastRow() + 1;
  var toRange = toWorksheet.getRange(last, 1, thisData.getNumRows(), thisData.getNumColumns());
  toRange.setValues(values);
})
Now I want to build the array "automatically" from the sheet 'List', where all the spreadsheets I want to loop over are listed in column C.
I tried several ideas, but always failed.
The most promising ones were:
var SheetList = targetSpreadsheetID.getSheetByName('List').getRange(2,3,lastRow-2,3).getValues()
And I also tried with the array-function:
var sheetList=Array.apply(targetSpreadsheetID.getSheetByName('List').getRange(2,3,lastRow-2,3))
but all without success.
Should it not normally be possible, in more or less a single line, to import the array from the spreadsheet into Google Apps Script?
I would very much appreciate it if someone could give me a hint as to where my mistake is.
Thank you very much.
Maria

I still did not manage to build the array the way I initially wanted, but I found a workable solution with a for loop, which I share here in case someone is looking for something similar (and then at least finds my workaround ;) )
for (var i = 2; i < lastRow; i++) {
  // getValue() returns the cell's content directly; getValues() would return
  // a 2D array ([[value]]) even for a single cell
  var SheetList = targetSpreadsheetID.getSheetByName('List').getRange(i, 3).getValue();
  Logger.log(SheetList);
  var thisSpreadsheet = SpreadsheetApp.openById(SheetList);
  ... // the rest identical to the loop above
}
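For reference, the one-line version from the original question should also be within reach: getValues() always returns a two-dimensional array (an array of rows, each row an array of cell values), and the attempted range was three columns wide instead of one, so the result has to be flattened. A minimal sketch, assuming lastRow holds the last filled row of 'List':
// Read rows 2..lastRow of column C (one column wide) and flatten
// the 2D result into a plain array of spreadsheet IDs
var SheetList = targetSpreadsheetID.getSheetByName('List')
  .getRange(2, 3, lastRow - 1, 1)
  .getValues()
  .map(function(row) { return row[0]; });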
Don't hesitate to add comments or advice anyway, but I will mark the question as closed.
Thanks a lot.
Maria

Related

Want to pass in a string array into a WS.sendRequest using groovy

I am new to API testing and am using Katalon to develop the tests. I've Googled every question I could think of and couldn't find anything to answer mine.
We have an API with the following body:
[
"${idValue1}",
"${idValue2}",
"${idValue3}",
"${idValue4}",
"${idValue5}",
]
I believe the purpose of this one is to delete multiple records at once by id. The script we have in the step definition is:
response = WS.sendRequest(findTestObject('EquipmentAPI/data-objects/DELETE Equipment By Ids', [('host') : GlobalVariable.host, ('idValues') : GlobalVariable.equipId]))
GlobalVariable.equipId = WS.getElementPropertyValue(response, 'data[0].id')
There are other step definitions that run before this one to set the Global Variables for use. I was able to generate the string array without issue.
Is this something that's possible? Please help!
Please let me know if further information is needed. Thanks.
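For what it's worth, here is a hedged Groovy sketch of one way this could work (the equipIds name and the id values are hypothetical, and it assumes the request body can be reduced to a single ${idValues} placeholder standing for the whole array): keep the ids in a list and serialize it to a JSON array string with groovy.json.JsonOutput before passing it to the test object.
import groovy.json.JsonOutput
import internal.GlobalVariable

// Hypothetical: a list of ids collected by the earlier step definitions
def ids = ['101', '102', '103']

// Serialize the list to a JSON array string: ["101","102","103"]
GlobalVariable.equipIds = JsonOutput.toJson(ids)
The test object's body would then contain just ${idValues}, mapped as [('idValues') : GlobalVariable.equipIds] in the sendRequest call, rather than the five fixed ${idValue1}..${idValue5} slots.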

The simplest way to find the differences between two arrays

I have two arrays, deviceStartList and deviceEndList, which I dump for inspection:
print(">>>>>>>>>>>>>>>>>>>>>>deviceStartList")
dump(deviceStartList)
print(">>>>>>>>>>>>>>>>>>>>>>")
and
print(">>>>>>>>>>>>>>>>>>>>>>deviceEndList")
dump(deviceEndList)
print(">>>>>>>>>>>>>>>>>>>>>>")
I come from a PHP background, where I could just call array_diff() and be done.
How would one go about doing this in Swift?
If you are looking to do something like set subtraction, Swift has that. See Set operations (union, intersection) on Swift array?.
The title of the SO post above doesn't per se mention the methods subtract(_:) or subtracting(_:), but the content of the accepted answer does cite subtract(_:).
In Swift 4.1, you could do something like:
var deviceStartList = ["12345", "67890", "55555", "44444"]
var deviceEndList = ["12345", "55555"]
var deviceStartSet = Set<String>(deviceStartList)
var deviceEndSet = Set<String>(deviceEndList)
let devicesDiff = deviceStartSet.subtracting(deviceEndSet)
And if you need an array as the final output, you can get that by doing this:
var devicesDiffArray = [String](devicesDiff)
This is the Set way; I hope it works for you. Note that symmetricDifference(_:) returns the elements that appear in either set but not in both, which is not quite the same as subtracting(_:).
var a=["B8D7","38C9","484B", "F4B7"]
var b=["484B","F4B7"]
Set(a).symmetricDifference(Set(b))
You can also try:
let result = deviceStartList.filter { deviceEndList.contains($0) == false }
However, I strongly recommend the Set approach, since set lookups are internally optimized; the filter/contains version above rescans deviceEndList once for every element of deviceStartList.

Best practices to speed up a CasperJS script that scrapes thousands of pages

I've written a CasperJS script that works very well except that it takes a (very very) long time to scrape pages.
In a nutshell, here's the pseudo code:
my functions to scrape the elements
my casper.start() to start the navigation and log in
casper.then() where I loop through an array and store my links
casper.thenOpen() to open each link and call my functions to scrape it.
It works perfectly (and fast enough) for scraping a bunch of links. But when it comes to thousands (right now I'm running the script with an array of 100K links), the execution time is endless: the first 10K links were scraped in 3h54m10s and the following 10K in 2h18m27s.
I can partly explain the difference between the two 10K batches: the first includes looping over and storing the array with the 100K links; from that point on, the script only opens pages to scrape them. However, I noticed the array was ready to go after roughly 30 minutes, so that doesn't fully explain the time gap.
I've placed my casper.thenOpen() inside the for loop, hoping that the scraping would happen right after each new link is built and stored in the array. I'm sure I've got this wrong, but will it change anything in terms of performance?
That's the only lead I have in mind right now, and I'd be very thankful if anyone is willing to share best practices for significantly reducing the script's running time (it shouldn't be hard!).
EDIT #1
Here's my code below:
var casper = require('casper').create();
var fs = require('fs');

// This array maintains a list of links to each HOL profile
// Example of a valid URL: https://myurl.com/list/74832
var root = 'https://myurl.com/list/';
var end = 0;
var limit = 100000;
var scrapedRows = [];

// Returns the selector element property if the selector exists, otherwise returns defaultValue
function querySelectorGet(selector, property, defaultValue) {
    var item = document.querySelector(selector);
    item = item ? item[property] : defaultValue;
    return item;
}

// Scraping function
function scrapDetails(querySelectorGet) {
    var info1 = querySelectorGet("div.classA h1", 'innerHTML', 'N/A').trim();
    var info2 = querySelectorGet("a.classB span", 'innerHTML', 'N/A').trim();
    var info3 = querySelectorGet("a.classC span", 'innerHTML', 'N/A').trim();

    // For scraping different texts of the same kind (i.e. comments from users)
    var commentsTags = document.querySelectorAll('div.classComments');
    var comments = Array.prototype.map.call(commentsTags, function(e) {
        return e.innerText;
    });

    // Return all the rest of the information as a JSON string
    return {
        info1: info1,
        info2: info2,
        info3: info3,
        // There is no fixed number of comments & answers, so we join them with a semicolon
        comments: comments.join(' ; ')
    };
}

casper.start('http://myurl.com/login', function() {
    this.sendKeys('#username', 'username', {keepFocus: true});
    this.sendKeys('#password', 'password', {keepFocus: true});
    this.sendKeys('#password', casper.page.event.key.Enter, {keepFocus: true});
    // Logged in
    this.wait(3000, function() {
        // Verify the connection by printing the welcome page's title
        this.echo('Opened main site titled: ' + this.getTitle());
    });
});

casper.then(function() {
    // Quick summary
    this.echo('# of links : ' + limit);
    this.echo('scraping links ...');
    for (var i = 0; i < limit; i++) {
        // Building the urls to visit
        var link = root + end;
        // Visiting pages...
        casper.thenOpen(link).then(function() {
            // We pass the querySelectorGet method to use it within the webpage context
            var row = this.evaluate(scrapDetails, querySelectorGet);
            scrapedRows.push(row);
            // Stats display
            this.echo('Scraped row ' + scrapedRows.length + ' of ' + limit);
        });
        end++;
    }
});

casper.then(function() {
    fs.write('infos.json', JSON.stringify(scrapedRows), 'w');
});

casper.run(function() {
    casper.exit();
});
At this point I probably have more questions than answers but let's try.
Is there a particular reason why you're using CasperJS and not, for example, curl? I can understand the need for CasperJS if you are going to scrape a site that uses JavaScript, or if you want to take screenshots. Otherwise I would probably use curl along with a scripting language like PHP or Python and take advantage of their built-in DOM parsing functions.
And you can of course use dedicated scraping tools like Scrapy. There are quite a few tools available.
Then the 'obvious' question: do you really need arrays that large? What you are trying to achieve is not clear; I am assuming you will want to store the extracted data in a database or something. Isn't it possible to split the process into small batches?
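To make the batching idea concrete, here is a rough, untested sketch against the code the question shows in EDIT #1 (BATCH_SIZE and the per-batch file names are made up):
// Build the full link list up front, then schedule the scraping in
// fixed-size batches, flushing each batch's rows to its own file so
// scrapedRows (and memory use) stays small.
var BATCH_SIZE = 1000; // hypothetical; tune to your memory budget
var links = [];
for (var i = 0; i < limit; i++) {
    links.push(root + i);
}

for (var b = 0; b < links.length; b += BATCH_SIZE) {
    links.slice(b, b + BATCH_SIZE).forEach(function(link) {
        casper.thenOpen(link, function() {
            scrapedRows.push(this.evaluate(scrapDetails, querySelectorGet));
        });
    });
    // Capture the batch number in a closure so each write gets its own file
    (function(batchIndex) {
        casper.then(function() {
            fs.write('infos-' + batchIndex + '.json', JSON.stringify(scrapedRows), 'w');
            scrapedRows = []; // release rows that are already on disk
        });
    })(b / BATCH_SIZE);
}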
One thing that should help is to allocate sufficient memory up front by declaring a fixed-size array, i.e.:
var theArray = new Array(1000);
Resizing the array constantly is bound to cause performance issues. Every time new items are added to the array, expensive memory allocation operations must take place in the background, and are repeated as the loop is being run.
Since you are not showing any code, we cannot suggest meaningful improvements, just generalities.

Video.js - How to reference multiple videos in one page?

My goodness, I cannot find an answer for this, and I have spent several hours on it already.
How can you reference multiple videos at the same time in video.js?
The API documentation says:
Referencing the Player: You just need to make sure your video tag has an ID. The example embed code has an ID of "example_video_1". If you have multiple videos on one page, make sure every video tag has a unique ID.
var myPlayer = _V_("example_video_1");
This example shows a single ID, but it doesn't show how I can reference multiple IDs at the same time.
If I have 3 different tags: "video_1", "video_2", "video_3", how do I reference them all?
I tried an array and it didn't work. I also tried listing the videos like this:
var myPlayer = _V_("video_1", "video_2");
but that didn't work either.
Can somebody help me here?
Thank you.
You can't pass multiple ids to _V_(). Either do them one at a time:
var myPlayer1 = _V_("video_1");
var myPlayer2 = _V_("video_2");
var myPlayer3 = _V_("video_3");
Or if you want them as an array:
var myPlayers = [_V_("video_1"), _V_("video_2"), _V_("video_3")];
myPlayers[1].play(); // arrays are zero-indexed, so this plays "video_2"
Note: this was written for an older version of video.js. _V_() still works but is deprecated: use videojs() instead.
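For newer versions, a minimal sketch of the same pattern using videojs() (the ids are the ones from the question):
// Initialize each player by its <video> tag id and keep them in an array
var players = ["video_1", "video_2", "video_3"].map(function(id) {
    return videojs(id);
});

// Then drive them individually or all together
players[0].play();
players.forEach(function(player) { player.pause(); });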
This would also work:
var video = [];
for (var i = 1; i <= 10; i++) {
    video[i] = _V_("Video" + i);
}

Plotting a word-cloud by date for a twitter search result? (using R)

I wish to search Twitter for a word (let's say #google) and then generate a tag cloud of the words used in the tweets, but grouped by date (for example, a moving one-hour window that advances by 10 minutes each time and shows me how different words become more frequently used throughout the day).
I would appreciate any help on how to go about doing this regarding: resources for the information, code for the programming (R is the only language I am comfortable using), and ideas on visualization. Questions:
How do I get the information?
In R, I found that the twitteR package has the searchTwitter command, but I don't know how big an "n" I can get from it. Also, it doesn't return the dates the tweets originated from.
I see here that I could get up to 1500 tweets, but this requires me to do the parsing manually (which leads me to step 2). Also, for my purposes I would need tens of thousands of tweets. Is it even possible to get them retrospectively (for example, requesting older posts each time through the API URL)? If not, there is the more general question of how to build a personal store of tweets on your home computer (a question which might be better left to another SO thread, although any insights from people here would be very interesting to read).
How do I parse the information (in R)? I know that R has functions in the RCurl and twitteR packages that could help, but I don't know which, or how to use them. Any suggestions would help.
How to analyse? how to remove all the "not interesting" words? I found that the "tm" package in R has this example:
reuters <- tm_map(reuters, removeWords, stopwords("english"))
Would this do the trick? Should I do something else/more?
Also, I imagine I would want to do that after cutting my dataset according to time, which will require some POSIX-style date functions (I am not exactly sure which ones would be needed here, or how to use them).
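For the time-window step, a hedged sketch, assuming the tweets end up in a data frame like the one twListToDF() from the twitteR package produces (with a POSIXct created column); the 10-minute window size is just an illustration:
# Bucket tweets into 10-minute windows by their creation time
tw.df$window <- cut(as.POSIXct(tw.df$created), breaks = "10 mins")

# One character vector of tweet texts per window, ready to feed a per-window word cloud
texts.by.window <- split(tw.df$text, tw.df$window)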
And lastly, there is the question of visualization. How do I create a tag cloud of the words? I found a solution for this here; any other suggestions/recommendations?
I believe I am asking a huge question here, but I tried to break it into as many straightforward questions as possible. Any help will be welcomed!
Best,
Tal
Word/Tag cloud in R using "snippets" package
www.wordle.net
Using the openNLP package you could POS-tag the tweets (POS = part of speech) and then extract just the nouns, verbs, or adjectives for visualization in a word cloud.
Maybe you can query Twitter using the current system time as a timestamp, write the results to a local database, and query again in increments of x seconds/minutes, etc.
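A hedged sketch of that polling idea in R, writing timestamped CSV files rather than a real database (the search term and interval are placeholders):
# Poll Twitter every 10 minutes and stamp each batch with the system time
require(twitteR)
repeat {
    batch <- searchTwitter('#google', n = 100)
    df <- twListToDF(batch)
    df$grabbed_at <- Sys.time()
    write.csv(df, sprintf("tweets-%s.csv", format(Sys.time(), "%Y%m%d-%H%M%S")),
              row.names = FALSE)
    Sys.sleep(600)  # wait 10 minutes; loop runs until interrupted
}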
There is historical data available at http://www.readwriteweb.com/archives/twitter_data_dump_infochimp_puts_1b_connections_up.php and http://www.wired.com/epicenter/2010/04/loc-google-twitter/
As for the plotting piece: I did a word cloud here: http://trends.techcrunch.com/2009/09/25/describe-yourself-in-3-or-4-words/ using the snippets package; my code is in there. I manually pulled out certain words. Check it out and let me know if you have more specific questions.
I note that this is an old question, and there are several solutions available via web search, but here's one answer (via http://blog.ouseful.info/2012/02/15/generating-twitter-wordclouds-in-r-prompted-by-an-open-learning-blogpost/):
require(twitteR)
searchTerm = '#dev8d'

# Grab the tweets
rdmTweets <- searchTwitter(searchTerm, n=500)

# Use a handy helper function to put the tweets into a dataframe
tw.df = twListToDF(rdmTweets)

## Note: there are some handy, basic Twitter related functions here:
## https://github.com/matteoredaelli/twitter-r-utils
# For example:
RemoveAtPeople <- function(tweet) {
    gsub("@\\w+", "", tweet)
}

# Then, for example, remove @'d names
tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))

## Wordcloud - scripts available from various sources; I used:
# http://rdatamining.wordpress.com/2011/11/09/using-text-mining-to-find-out-what-rdatamining-tweets-are-about/
# Call with e.g.: tw.c = generateCorpus(tw.df$text)
generateCorpus = function(df, my.stopwords=c()) {
    # Load the text mining library
    require(tm)
    # The following is cribbed and seems to do what it says on the can
    tw.corpus = Corpus(VectorSource(df))
    # remove punctuation
    tw.corpus = tm_map(tw.corpus, removePunctuation)
    # normalise case
    tw.corpus = tm_map(tw.corpus, tolower)
    # remove stopwords
    tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
    tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
    tw.corpus
}

wordcloud.generate = function(corpus, min.freq=3) {
    require(wordcloud)
    doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
    dm = as.matrix(doc.m)
    # calculate the frequency of words
    v = sort(rowSums(dm), decreasing=TRUE)
    d = data.frame(word=names(v), freq=v)
    # Generate the wordcloud
    wc = wordcloud(d$word, d$freq, min.freq=min.freq)
    wc
}

print(wordcloud.generate(generateCorpus(tweets, 'dev8d'), 7))

## Generate an image file of the wordcloud
png('test.png', width=600, height=600)
wordcloud.generate(generateCorpus(tweets, 'dev8d'), 7)
dev.off()

# We could make it even easier if we hide away the tweet grabbing code, e.g.:
tweets.grabber = function(searchTerm, num=500) {
    require(twitteR)
    rdmTweets = searchTwitter(searchTerm, n=num)
    tw.df = twListToDF(rdmTweets)
    as.vector(sapply(tw.df$text, RemoveAtPeople))
}

# Then we could do something like:
tweets = tweets.grabber('ukgc12')
wordcloud.generate(generateCorpus(tweets), 3)
I would like to answer your question about making a big word cloud.
What I did was:
Use s0.tweet <- searchTwitter(KEYWORD,n=1500) for 7 days or more, such as THIS.
Combine them with this command:
rdmTweets = c(s0.tweet,s1.tweet,s2.tweet,s3.tweet,s4.tweet,s5.tweet,s6.tweet,s7.tweet)
The resulting square cloud consists of about 9000 tweets.
Source: People voice about Lynas Malaysia through Twitter Analysis with R CloudStat
Hope it helps!
