What's an efficient way to find the last web element on a web page using RSelenium? - rselenium

I'm using RSelenium to automatically scroll down a social media website and save posts. I want to find the last instance of a web element (specifically, the post dates), but it's taking unacceptably long using my current method (using findElements() to return all post dates then extracting the last one - see code below) if I've scrolled a long way down the page.
Can anyone recommend a fast way to find the last web element (specifically, the post date) on a web page? For example, is there a way you can use findElement() (which searches for the first match) such that it starts from the bottom of the page rather than the top? Any suggestions welcome.
Here's a trivial example of my code, but it takes an unacceptably long time if I've scrolled a long way down the page.
# Load webpage of interest
library(RSelenium)
library(rvest)
rD = rsDriver(browser = "firefox")
remDr = rD[["client"]]
url = paste0("https://stocktwits.com/symbol/NZDCHF")
remDr$navigate(url)
# Scroll down page three times, loading new content each time.
for (i in 1:3) { # Only scrolling 3 times for illustration
  remDr$executeScript("window.scrollTo(0,document.body.scrollHeight);")
  Sys.sleep(2) # wait 2 sec to give new content a chance to load
}
# Get date of last post. WORKS BUT TAKES FOREVER IF I'VE SCROLLED MANY TIMES
e = remDr$findElements("css", ".message-date")
last_date = e[[length(e)]]$getElementText()

I found a messy workaround to solve the above. Specifically, I used the below to get the last message (which included the date), then wrote a regular expression to extract the last date.
last_child = remDr$findElement(using = "css selector",
                               value = ".messageli:last-child")
last_child = unlist(last_child$getElementText())
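One thing that may help is to fetch the page source once and search it locally, rather than asking the browser (through WebDriver) to collect every matching element. With rvest already loaded, the R version would presumably be read_html(remDr$getPageSource()[[1]]) followed by a CSS query. The sketch below shows the same idea in Python using only the standard library's html.parser. The class name message-date comes from the question; LastMatchParser and last_text_by_class are hypothetical helpers, and the parser assumes reasonably well-formed HTML in which every non-void tag is closed.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LastMatchParser(HTMLParser):
    """Remembers the text of the last element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0          # > 0 while inside a matching element
        self.current = []       # text pieces of the element being read
        self.last_text = None   # text of the most recently completed match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:                      # nested tag inside a match
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:       # a new match starts here
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:             # the match just closed
                self.last_text = "".join(self.current).strip()

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def last_text_by_class(html, css_class):
    """Return the text of the last element with css_class, or None if absent."""
    parser = LastMatchParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.last_text
```

A single pass over the source is linear in the page size, and later matches simply overwrite earlier ones, so no list of all matches is kept.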


Inner loops for column 2 in imacros

I have a problem with form filling. I have two columns, both containing data, and I made a loop for that, but I want an inner loop for column 2: first it should select row 2 from column 1, then select data from column 2 until column 2 runs out.
For example, I have 10 entries in column one and 20 entries in column 2
outer loop column 1
inner loop column 2
inner loop ends if data in column 2 not found
again repeat
Here is the code:
VERSION BUILD=10022823
TAB T=1
SET !DATASOURCE F:\tgif.csv
SET !LOOP 2
SET !DATASOURCE_LINE {{!LOOP}}
URL GOTO={{!COL1}}
SIZE X=1392 Y=863
WAIT SECONDS=2.797
DS CMD=CLICK X=589 Y=396 CONTENT=
WAIT SECONDS=2
DS CMD=KEY CONTENT={{!COL2}}
WAIT SECONDS=5
TAG POS=1 TYPE=BUTTON:SUBMIT FORM=ACTION:/ajax/updatestatus.php?av=100009092062416 ATTR=TXT:Post
WAIT SECONDS=5
For more complicated iMacros scripts (e.g. with looping and so forth), you can use the iMacros Firefox add-on to run JavaScript scripts. Also, in general, for better control it's almost always better to use a TAG that targets more specific element attributes (e.g. class, id, etc.) instead of position.
What you want to do sounds like it can be done through a nested loop. If firstField and secondField are arrays, then you could do something like:
for (var i=0, alength = firstField.length; i < alength; i++){
//select first column
for (var j=0, blength = secondField.length; j < blength; j++){
//select cell in second column, with the same first column #
}
}
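The nested loop above can be sketched in Python for a two-column CSV such as tgif.csv, where column 2 may have more entries than column 1. This is only an illustration of the loop structure, assuming each resulting pair would then drive one run of the macro; column_pairs is a hypothetical helper, and empty cells are treated as "no more data" in that column.

```python
import csv
import io


def column_pairs(csv_text):
    """Yield (col1, col2) pairs: outer loop over column 1, inner loop over column 2."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Collect the non-empty values of each column; the columns may differ in length
    first_field = [r[0] for r in rows if len(r) > 0 and r[0].strip()]
    second_field = [r[1] for r in rows if len(r) > 1 and r[1].strip()]
    for a in first_field:          # outer loop: column 1
        for b in second_field:     # inner loop: column 2, ends when column 2 runs out
            yield a, b


if __name__ == "__main__":
    data = "url1,hello\nurl2,world\n,again\n"
    for url, text in column_pairs(data):
        print(url, text)
```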
To play iMacros code through JavaScript, you can use iimPlay, where m holds the macro code as a string:
iimPlay("CODE: " + m); // for Mac; on Windows you can leave out the "CODE: " part, if I remember correctly. Or it might be the other way around!
I'm working on a JavaScript library for iMacros. I'm revamping it, so the inclusion part isn't working yet (and some features haven't been completely added). You can see it here:
https://github.com/anonmily/iMacroLibrary
If you want, you can use segments of the code, or just use it as a guide to using JavaScript with iMacros. To use it right now, you can copy what's inside the iMacroLibrary.js document, then add your own code afterwards. You can minify everything in the library after the "Define unsafewindow" section so that it's a single line, saving space. Then you can use CSS selectors to select elements on the page and interact with them. You can also import and export CSV data.
var Year = $M('.yearinput');
var Make = $M('.makeinput');
//or
var Year = $M('select',1); //the first dropdown (select) element
var Make = $M('select',2); //the second dropdown (select) element
Year.click();
Year.extract('TXT');
Make.extract('TXTALL'); //the same as Make.extractAll()
To export or import CSV data:
exportcsv = function(csvarray, filename);
importcsv = function(file_name, line_num, fields_num);
The older version of the library (before I started tweaking it again!) is commit 8b6045ecf9559fa7c9e13492d69af067c86a61b5.
It's a bit messy towards the end since that's where I put some code for testing, but you can see how it can be implemented that way. Just for reference!
P.S. For automation, though, I've been playing around with the Python library Splinter, and in some ways it's actually easier to use than iMacros. The code is easier to write too! I've started converting some of my old iMacros to Python because of that; there's definitely more power and flexibility. In my opinion, iMacros is great for simple tasks that can be recorded easily and quickly. For web scraping, Beautiful Soup is quite useful too. I've also heard about Selenium for web automation, though I haven't tried it yet. Just throwing out some other options, just in case. I sure wish I'd known about them earlier!

Printing on Silverlight

I am trying to print a report where we have several different components within the xaml.
From what I've found, when printing you have to treat every UIElement as a single one, so if the DesiredSize is bigger than the available size you have to set the HasMorePages flag.
But here comes the problem.
My user can write as much text as he/she wants in the grid, so, depending on the amount, the row expands and goes off the printable area.
I thought about giving the grid a whole page, but it was still too big, which got me into a loop where the DesiredSize was always bigger than the PrintableArea.
My code is not very different from any source you find on internet when searching for Multiple Page printing.
It is based on this http://eswarbandaru.blogspot.com.au/2011/02/print-mulitple-pages-using-silverlight.html, but uses StackPanels instead of TextBoxes.
Any idea?
Thank you in advance.
First you need to work out how many pages are needed:
Dim pagesNeeded As Integer = CInt(Math.Ceiling(gridHeight / pageHeight)) 'gets number of pages needed
Then, once the first page has been sent to the printer, you need to move that data out of view and bring the new data into view, ready to print. I do this by converting the whole dataset into an image/UI element; I can then adjust the Y value accordingly to bring the next set of required data on screen.
transformGroup.Children.Add(New TranslateTransform() With {.Y = -(pageIndex * pageHeight)})
Then, once the number of needed pages is reached, tell the printer to stop:
If pagesLeft <= 0 Then
e.HasMorePages = False
Exit Sub
Else
e.HasMorePages = True
End If
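To check the arithmetic, here is the same pagination logic modelled in Python rather than VB. It is a sketch only: page_layout is a hypothetical helper, and grid_height and page_height are assumed to be in the same units (e.g. pixels). For each page it returns the Y offset that the TranslateTransform would get and the value HasMorePages should be set to.

```python
import math


def page_layout(grid_height, page_height):
    """Return (y_offset, has_more_pages) for each page the grid needs."""
    pages_needed = math.ceil(grid_height / page_height)
    layout = []
    for page_index in range(pages_needed):
        # Shift the content up so that this page's slice is in the printable area
        y_offset = -(page_index * page_height)
        # Pages still to print after this one; the last page must report False
        pages_left = pages_needed - (page_index + 1)
        layout.append((y_offset, pages_left > 0))
    return layout


if __name__ == "__main__":
    for offset, more in page_layout(2500, 1000):
        print(offset, more)
```

Only the final page reports False, which is what lets the print loop end.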
Or, if this is too much work, you can simply scale all the notes to fit onto the screen, again probably by converting them to a UI element.
Check out this link for converting to a UI element.
http://www.codeproject.com/Tips/248553/Silverlight-converting-to-image-and-printing-an-UI
Hope this helps

Paging with reverse cursors in appengine

I am trying to get forward and backwards pagination working for a query I have on my app.
I have started with the example at: https://developers.google.com/appengine/docs/python/ndb/queries#cursors
I would expect that example to do a typical forward/back pagination to create cursors that you can pass to your template in order to be used in a subsequent request for the page after/before the current one. But what it is doing is getting cursors for the same page, one from the beginning and the other from the end (if I have understood correctly).
What I want is a cursor to the beginning of the following page, and a cursor to the beginning of the previous page, to use in my UI.
I have managed to almost get that with the following code, based on the mentioned example:
curs = Cursor(urlsafe=self.request.get('cur'))
q = MyModel.query(MyModel.usett == usett_key)
q_forward = q.order(-MyModel.sugerida)
q_reverse = q.order(MyModel.sugerida)
ofus, next_curs, more = q_forward.fetch_page(num_items_page,
start_cursor=curs)
rev_cursor = curs.reversed()
ofus1, prev_curs, more1 = q_reverse.fetch_page(num_items_page,
start_cursor=rev_cursor)
context = {}
if more and next_curs:
context['next_curs'] = next_curs.urlsafe()
if more1 and prev_curs:
context['prev_curs'] = prev_curs.reversed().urlsafe()
The problem, and the point of this question, is that I use more and more1 to see if there is a next page. And that is not working in the backwards sense. For the first page, more1 is True, in the second page more1 is False, and subsequent pages give True.
I would need something that gives False for the first page and True for every other page. It seems like this more return value is the thing to use, but maybe I have a bad Query setup, or any other thing wrong.
Thanks everyone!
Edit: Since I didn't find a simple solution for this, I switched to using ndbpager.
There's no such thing.
You know that there's (at least) one page before the current page if you started the query with a cursor (the first page usually doesn't have a cursor).
A common trick to access the previous page is inverting the sort-order.
If you have a list sorted by creationdate descending, you could take the creationdate of the first element of your current page and query for elements with creationdate > this creationdate, using the inverted (ascending) sort order. This returns the oldest elements that are newer than the given creationdate. Reverse the retrieved list (to bring it back into the correct order) and you have the elements of the previous page, without using a cursor.
Note: this requires the values of the property you sort on to be distinct.
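The inverted-sort trick can be modelled on an in-memory list in Python. This sketch deliberately avoids the datastore API: in ndb the filter and the ascending order would become part of the query on the sort property. previous_page and the created field are hypothetical names, and created values are assumed to be distinct. An empty result also tells you there is no previous page, which is the signal the question was looking for.

```python
def previous_page(items, current_page, page_size):
    """Items of the page before current_page, where pages are sorted newest first."""
    if not current_page:
        return []
    first_created = current_page[0]["created"]
    # Inverted (ascending) order, restricted to items newer than the page's first item
    newer = sorted((i for i in items if i["created"] > first_created),
                   key=lambda i: i["created"])
    # The oldest page_size of those form the previous page; flip back to newest first
    return list(reversed(newer[:page_size]))


if __name__ == "__main__":
    items = [{"id": n, "created": n} for n in range(1, 11)]
    newest_first = sorted(items, key=lambda i: -i["created"])
    third_page = newest_first[6:9]
    print([i["id"] for i in previous_page(items, third_page, 3)])
```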
In some cases it's also possible to use a prebuilt index that allows random access to different pages; see https://bitbucket.org/viur/server/src/98de79b91778bb9b16e520acb28e257b21091790/indexes.py for more.
I have a workaround, not the best solution: it basically redirects back to the previous page.
Previous
I think PagedQuery has this capability, but I'm still waiting for someone to post a more comprehensive tutorial about it.

Programmatically determining max fit in textbox (WP7)

I'm currently writing an eBook reader for Windows Phone Seven, and I'm trying to style it like the Kindle reader. In order to do so, I need to split my books up into pages, and this is going to get a lot more complex when variable font sizes are added.
To do this at the moment, I just add a word at a time into the textblock until it becomes higher than its container. As you can imagine though, with a document of over 120,000 words, this takes an unacceptable period of time.
Is there a way I can find out when the text would exceed the bounds (logically dividing it into pages), without having to actually render it? That way I'd be able to run it in a background thread so the user can keep reading in the meantime.
So far, the only idea that has occurred to me is to find out how the textblock decides its bounds (in the measure call?), but I have no idea how to find that code, because reflector didn't show anything.
Thanks in advance!
From what I can see the Kindle app appears to use a similar algorithm to the one you suggest. Note that:
it generally shows the % position through the book - it doesn't show total number of pages.
if you change the font size, the first word on the page remains the same (so that's where the % comes from), so the Kindle app just does one page's worth of repagination, assuming the first word of the page stays the same.
if you change the font size and then scroll back to the first page, then actually there is a discontinuity - they pull content forwards again in order to fill the first page.
Based on this, I would suggest you do not index the whole book. Instead just concentrate on the current page based on a "position" of some kind (e.g. character count - displayed as a percentage). If you have to do something on a background thread, then just look at the next page (and maybe the prev page) in order that scrolling can be more responsive.
To optimise things further, there are a couple of changes to your current algorithm you could try:
try a different starting point and search increment for your algorithm - no need to start at one word and to then only add one word at a time.
assuming most of your books are ASCII, try caching the width of the common characters, and then work out the width of textblocks yourself.
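Both suggestions can be combined: measure text arithmetically from a cache of character widths, and binary-search for the number of words that fit instead of adding one word at a time. A Python sketch under those assumptions follows; the widths would have to be measured once per font and size on the device, text_height and words_that_fit are hypothetical helpers, and the simple greedy word wrap ignores kerning and hyphenation, so it only approximates what TextBlock does.

```python
def text_height(words, char_widths, default_width, line_width, line_height):
    """Estimate the wrapped height of words using cached per-character widths."""
    space = char_widths.get(" ", default_width)
    lines, current = 0, 0.0
    for word in words:
        w = sum(char_widths.get(c, default_width) for c in word)
        if lines and current + space + w <= line_width:
            current += space + w        # word fits on the current line
        else:
            lines += 1                  # start a new line with this word
            current = w
    return lines * line_height


def words_that_fit(words, page_height, **metrics):
    """Largest n such that words[:n] fits on one page, found by binary search."""
    lo, hi = 0, len(words)
    while lo < hi:
        mid = (lo + hi + 1) // 2        # round up so the loop always makes progress
        if text_height(words[:mid], **metrics) <= page_height:
            lo = mid                    # mid words fit: search the upper half
        else:
            hi = mid - 1                # too tall: search the lower half
    return lo


if __name__ == "__main__":
    words = "aaaa bbbb cccc dddd eeee".split()
    print(words_that_fit(words, 2, char_widths={}, default_width=1,
                         line_width=10, line_height=1))
```

Because adding words never makes the wrapped text shorter, the height is monotonic in the word count, which is what makes the binary search valid; it needs about log2(n) measurements per page instead of n.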
Beyond that, I'd also quite like to try using <Run> blocks within your TextBlock - it may be possible to get the relative position of each Run within the TextBlock - although I've not managed to do this yet.
I do something similar to adjust font size for individual textboxes (to ensure they all fit). Basically, I create a TextBlock in code, set all my properties and check the ActualWidth and ActualHeight properties. Here is some pseudo code to help with your problem:
public static String PageText(TextBlock txtPage, String BookText)
{
    // Off-screen TextBlock with the same font settings and width as the page
    TextBlock t = new TextBlock();
    t.FontFamily = txtPage.FontFamily;
    t.FontStyle = txtPage.FontStyle;
    t.FontWeight = txtPage.FontWeight;
    t.FontSize = txtPage.FontSize;
    t.Width = txtPage.ActualWidth;          // wrap at the page width
    t.TextWrapping = TextWrapping.Wrap;
    t.Text = BookText;

    // Everything fits: return the whole text
    if (t.ActualHeight <= txtPage.ActualHeight)
        return BookText;

    // Otherwise keep roughly the fraction of the text that fits vertically
    double hRatio = txtPage.ActualHeight / t.ActualHeight;
    return BookText.Substring(0, (int)(BookText.Length * hRatio));
}
The above is untested code, but hopefully can get you started. Basically it sees if the text can fit in the box, if so you're good to go. If not, it finds out what percentage of the text can fit and returns it. This does not take word breaks into account, and may not be a perfect match, but should get you close.
You could alter this code to return the length rather than the actual substring and use that as your page size. Creating the textblock in code (with no display) actually performs pretty well (I do it in some table views with no noticeable lag). I wouldn't send all 120,000 words to this function, but a reasonable subset of some sort.
Once you have the ideal length you can use a RegEx to split the book into pages. There are examples on this site of RegEx that break on word boundaries after a specific length.
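The splitting step might look like this in Python, with the page length measured in characters as in the ratio approach. split_pages is a hypothetical helper and this is one possible pattern, not necessarily the one from the answers mentioned; it assumes no single word is longer than the page length.

```python
import re


def split_pages(text, max_chars):
    """Split text into chunks of at most max_chars, breaking only at whitespace.

    Assumes no single word is longer than max_chars: such a word would be
    silently truncated by this pattern.
    """
    if max_chars < 2:
        raise ValueError("max_chars must be at least 2")
    # A chunk starts and ends on a non-space character and must be followed
    # by whitespace or the end of the text, so it never cuts a word in half
    pattern = re.compile(r"\S(?:.{0,%d}\S)?(?=\s|$)" % (max_chars - 2), re.S)
    return pattern.findall(text)


if __name__ == "__main__":
    print(split_pages("the quick brown fox jumps", 10))
```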
Another option, is to calculate page size ahead of time for each potential fontsize (and hardcode it with a switch statement). This could easily get crazy if you are allowing any font and any size combinations, and would be awful if you allowed mixed fonts/sizes, but would perform very well. Most likely you have a particular range of readable sizes, and just a few fonts. Creating a test app to calculate the text length of a page for each of these combinations wouldn't be that hard and would probably make your life easier - even if it doesn't "feel" right as a programmer :)
Nobody has mentioned this example from Microsoft yet: "Principles of Pagination".
It has some interesting sample code running in Windows Phone.
http://msdn.microsoft.com/en-us/magazine/hh205757.aspx
You can also look at this article about Page Transitions in Windows Phone and this other one about the final touches in the E-Book project.
The code is downloadable: http://archive.msdn.microsoft.com/mag201111UIFrontiers/Release/ProjectReleases.aspx?ReleaseId=5776
You can query the FormattedText class, which AFAIK is used inside TextBlock. Since this is the class used to format text in preparation for rendering, it is the lowest-level class available and should be fast.

Why does silverlight run into an endless loop when printing document longer than 1 page? .HasMorePages = true

My 1st question here on stackoverflow.
I am trying to print a long grid, which was dynamically generated.
pdoc.PrintPage += (p, args) =>
{
args.PageVisual = myGrid;
args.HasMorePages = false;
};
When I use args.HasMorePages = false;, it prints the first page of the grid as it should (although it takes some time, since it sends a 123 MB bitmap to the poor printer; thanks for Silverlight 4's print feature implementation).
However, when I enable printing more pages with args.HasMorePages = true;, the print job runs amok on memory and sends endless copies of the first page of the document, effectively disabling my developer machine. This happens even if the grid is only 2 pages long.
Why does this happen?
What is a possible workaround here? All I found on the net is that SL handles printing badly, but not a real solution.
The HasMorePages property tells Silverlight printing that you have at least one more page to print. The PrintPage event fires for each page to be printed.
Hence, when you set HasMorePages to true you will get another PrintPage event; if you always set it to true (as your code appears to do), you create an infinite loop.
At some point the code has to leave HasMorePages set to false.
Ultimately it's up to you, the developer, to perform all the pagination logic and decide what appears on each page; Silverlight does not automagically do that for you.
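A small Python model of that contract may make the loop easier to see. The hypothetical run_print_job plays the role of the print system and keeps raising the event for as long as the handler reports more pages, so the handler has to track which page it is on and hand back a different visual each time. This only simulates the behaviour described above; the max_pages guard exists purely so the simulation cannot hang, whereas the code in the question has nothing to stop it.

```python
def run_print_job(handler, max_pages=1000):
    """Mimic the print loop: raise 'PrintPage' again while the handler asks for more."""
    printed = []
    while True:
        page_visual, has_more_pages = handler(len(printed))
        printed.append(page_visual)
        if not has_more_pages:
            return printed
        if len(printed) >= max_pages:   # safety net for this simulation only
            raise RuntimeError("handler never set HasMorePages to False")


def make_handler(page_visuals):
    """Hypothetical handler: one visual per page, more pages only while some remain."""
    def on_print_page(page_index):
        is_last = page_index == len(page_visuals) - 1
        return page_visuals[page_index], not is_last
    return on_print_page


if __name__ == "__main__":
    print(run_print_job(make_handler(["page 1", "page 2"])))
```

A handler that always returns the same grid with HasMorePages set to True, like the one in the question, never reaches the return statement.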
