How do I get an element's text without the contents of its child elements in HtmlUnit?

I have the following:
<th>
Q4/10
<br>
<span> Nov 30, 2010 </span>
</th>
and I'd like to get Q4/10 but not the date that follows. I'm not sure how to do that in HtmlUnit. I know I could split the text on spaces and take everything before the first space, but I'm looking for something based on the tags themselves.

If you know that the text you want comes before any sub elements, you can just grab its first child, which will contain your text and some whitespace:
HtmlTableHeaderCell th = ...
System.err.println(th.getFirstChild().getTextContent().trim());
The more general solution would be to loop through the children of th looking for text nodes, and ignoring sub elements.
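That general approach can be sketched with the JDK's own DOM API (org.w3c.dom), so it runs without HtmlUnit on the classpath. The snippet is assumed to be well-formed XML, hence <br/> instead of <br>, and the names DirectText, directText and parse are only illustrative. HtmlUnit's DomNode implements org.w3c.dom.Node, so the same helper should also accept an HtmlTableHeaderCell directly, but treat that as an assumption to check against your HtmlUnit version.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DirectText {
    // Concatenates only the text nodes that are direct children of `parent`,
    // skipping child elements such as <br> and <span> (and everything inside them).
    static String directText(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Parses a well-formed XML/XHTML snippet and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        Element th = parse("<th>\nQ4/10\n<br/>\n<span> Nov 30, 2010 </span>\n</th>");
        System.out.println(directText(th)); // prints Q4/10
    }
}
```

If text appears both before and after a child element, this sketch simply joins those pieces; stop at the first non-blank text node instead if you only ever want the leading text.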

Related

Angular filter is not working on table if I only input one letter

This is how I filter a table:
<tr ng-repeat="row in sptable.data.data | filter: searchTable">
where searchTable is bound to an input with ng-model:
<input type="text" class="form-control" placeholder="SEARCH INVOICES" ng-model="searchTable">
sptable.data.data is an array of objects. One of the objects is:
{"customerWatched":false,"invoiceID":"00c9511b-24b9-4190-a90a-8abf2fe9f4a0","amountDue":"0.00","referenceNumber":"O721-001","amount":"35.75","contact":{"watched":false,"contactName":"Net Connect","contactId":"bym568b799d81934d3","errorMessage":"","redirectURL":null}}
The array has more than 100 objects, so I can't list them all.
Unfortunately, if I input one letter, nothing in the table changes. But if I input two or more letters, it works.
What could be the reason?
Update: the whole object is so large that I only display part of it in the table, and when I input some letters, the filter also searches the hidden part of the object. How can I filter on the displayed fields of the object only?
Angular's filter filter looks in the whole object for your searchTable value.
All strings or objects with string properties in array that match this string will be returned. This also applies to nested object properties.
When you type only one letter, the table probably doesn't change because that letter is present in at least one property of every row of sptable.data.data.
If you prefer, you can filter on a specific property of your objects (for example invoiceID):
<tr ng-repeat="row in sptable.data.data | filter: {invoiceID: searchTable}">
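To see why a single letter seems to do nothing, here is a simplified Java model of the two matching modes. It is not Angular's implementation: deepMatch searches every string value, including nested objects such as contact, case-insensitively, while propertyMatch checks one key only, as {invoiceID: searchTable} does. The class and method names are made up, and numbers and booleans are ignored for brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FilterSketch {
    // Search every property: a value matches if it is a string containing the
    // query (case-insensitive), or a map with such a value at any depth.
    static boolean deepMatch(Object value, String query) {
        if (value instanceof String) {
            return ((String) value).toLowerCase().contains(query.toLowerCase());
        }
        if (value instanceof Map) {
            for (Object nested : ((Map<?, ?>) value).values()) {
                if (deepMatch(nested, query)) {
                    return true;
                }
            }
        }
        return false; // numbers, booleans and nulls are ignored in this sketch
    }

    // Search one property only, like filter: {invoiceID: searchTable}.
    static boolean propertyMatch(Map<String, Object> row, String key, String query) {
        return deepMatch(row.get(key), query);
    }

    static List<Map<String, Object>> filterAll(List<Map<String, Object>> rows, String query) {
        return rows.stream().filter(r -> deepMatch(r, query)).collect(Collectors.toList());
    }

    static List<Map<String, Object>> filterBy(List<Map<String, Object>> rows, String key, String query) {
        return rows.stream().filter(r -> propertyMatch(r, key, query)).collect(Collectors.toList());
    }
}
```

With two sample rows whose invoiceIDs are "00c9511b" and "7f3a" and whose referenceNumbers are "O721-001" and "O722-002", the query "0" keeps both rows under filterAll but only the first under filterBy on invoiceID, which mirrors the one-letter symptom.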

ColdFusion loop over file not working quite right

In ColdFusion I am creating and saving a file, then later looping over the characters in the file to display part of it. This almost works, but the loop sometimes outputs formatting characters instead of just the rendered output, and sometimes the formatting is lost. Here are the original and the version as read:
The code:
<cfset colvalue = getPageContext().getRequest().getParameterValues('#col#')>
<cfset repa = colvalue[1]>
<cfloop file="#reppath#moxrep/#repa#.cfm" index="chunk" characters="500">
<cfoutput>#chunk#</cfoutput><br>
</cfloop>
Am I doing something wrong in the code? Is there a bug in the ColdFusion loop over file? And if so, is there a workaround?
<cfloop .. characters="500">
It is because your loop uses the "characters" attribute, which limits the number of characters "..read during each iteration of the loop..". That would be fine for a plain text file. However, since the file content is HTML, it breaks when you insert the <br> at an arbitrary position: part of the HTML code is then displayed instead of rendered. For example:
<div <br> style="text-align: left; ">This will not render correctly</div>
That said, it raises the question: why read the content in chunks instead of just displaying the whole file?
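A small Java model of what characters="500" followed by <br> does to HTML, using a hypothetical chunk size of 4 so the effect is easy to see. It is not ColdFusion code, only the same cutting rule, and ChunkSketch/chunkWithBreaks are illustrative names.

```java
class ChunkSketch {
    // Mimics <cfloop ... characters="N"> with a <br> after each chunk: the text is
    // cut every `size` characters wherever that lands, including inside a tag.
    static String chunkWithBreaks(String html, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be positive");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < html.length(); i += size) {
            out.append(html, i, Math.min(i + size, html.length()));
            out.append("<br>");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints <div<br> sty<br>le="<br>x">h<br>i</d<br>iv><br>
        System.out.println(chunkWithBreaks("<div style=\"x\">hi</div>", 4));
    }
}
```

The first two chunks already produce <div<br> sty..., the same kind of broken tag as in the example above.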
Update:
You really cannot parse HTML with basic string functions or regular expressions, not with any reliability. Encountering a newline character does not necessarily mean you have reached the end of a particular block of HTML code: it is perfectly valid for an HTML element to span multiple lines. Plus, HTML elements are frequently nested. So it is nearly impossible to identify the "logical" endpoints using string functions alone (which is basically what the cfloop is doing).
Instead, I would recommend a tool like jsoup, which is specifically designed for parsing HTML. Once you have parsed the document, it is very easy to access specific elements or sections of the HTML.

Protractor: Find element by ID with spaces

I've got a button with the following ID:
<button id="Emp Btn"....
I'm unable to access it because of the space.
I've tried the following, and none of them work:
element(by.id("Emp Btn"));
element(by.id("Emp%20Btn"));
element(by.id("Emp%Btn"));
element(by.id('Emp Btn'));
It's a bad idea to use spaces in an ID. HTML5 says that an id must contain at least one character and must not contain any space characters.
But you can still find such an element using XPath.
Try something like this:
.//button[contains(@id,'firstPart') and contains(@id,'secondPart')]
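In XPath the id is compared as an ordinary attribute string, so the space is not a problem. The sketch below checks both the exact-match form and the contains() form with the JDK's javax.xml.xpath against a made-up two-button snippet; in Protractor the same expression would be passed to element(by.xpath(...)). XPathIdSketch, parse and count are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XPathIdSketch {
    // Parses a well-formed snippet into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Evaluates an XPath expression against the document and counts the matches.
    static int count(Document doc, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException(expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<form><button id=\"Emp Btn\">A</button><button id=\"Emp\">B</button></form>");
        // Exact match: the space inside the attribute value is fine.
        System.out.println(count(doc, "//button[@id='Emp Btn']")); // 1
        // The contains() form: @ selects attributes, // selects descendants.
        System.out.println(count(doc, ".//button[contains(@id,'Emp') and contains(@id,'Btn')]")); // 1
    }
}
```

Note that a single contains(@id,'Emp') would match both buttons here, so the exact @id comparison is the safer choice when you know the full ID.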

Traversing tables with Selenium/WebDriverJS

I want to traverse a table with Selenium using Node and WebDriverJS:
<table>
<tr>
<td class="name">Peter</td>
<td class="count">1</td>
</tr>
<tr>
<td class="name">John</td>
<td class="count">3</td>
</tr>
</table>
For every row, I want to look at the name cell and the count cell.
What I have:
driver.findElements(By.tagName('tr')).then(function(rows){
// for every row
for (var i = 0; i< rows.length; i++){
// check the name cell
rows[i].findElement(By.class('name')).getInnerHtml().then(function(name){
// do some stuff
});
// check the count cell
rows[i].findElement(By.class('count')).getInnerHtml().then(function(count){
// do some stuff
});
}
});
This works for the first few rows, but with many rows it fails at some point.
My theory: the findElement calls in the for loop are passed to the manager, and then the for loop finishes. Then the garbage collector removes the rows array. When the manager finally executes the findElement calls, the array and its elements no longer exist, so the calls fail. The error I get is:
StaleElementReferenceException : The Element is not Attached to the DOM
It does work for the first row, as the array still exists early on in the execution.
My questions:
What am I doing wrong?
Is my theory correct?
How can I bind the rows[i] references to the findElement calls so that they persist longer than the original array?
---- Edit ----
When I remove one of the inner findElement calls and only look for one cell per row, I am able to cover more rows. This made me think that, with this implementation, timing plays a role. That should not be the case, so I am probably doing something wrong.
Is there anything like a forEach function in Selenium?
I found the problem:
I am testing a website implemented with Sencha Ext JS.
The table is built on top of a data store, and apparently the store is called twice and the whole table is recreated between the calls.
So I somehow have to wait until the table has loaded for the second time...
This will be the next challenge.
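One common alternative to waiting for the second load (a different technique from the one the asker was heading towards) is to redo the whole read, re-finding the rows as well as reading their cells, whenever the DOM is rebuilt underneath it. A generic Java sketch under that assumption: StaleRetry and retryOnStale are hypothetical names, and the staleType parameter stands in for Selenium's StaleElementReferenceException so that the sketch runs without Selenium.

```java
import java.util.function.Supplier;

class StaleRetry {
    // Runs readTable; if it throws an exception of staleType (standing in for
    // Selenium's StaleElementReferenceException), it starts over from scratch.
    // readTable must re-find the rows itself, so every attempt sees the current DOM.
    static <T> T retryOnStale(Supplier<T> readTable,
                              Class<? extends RuntimeException> staleType,
                              int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastStale = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return readTable.get();
            } catch (RuntimeException e) {
                if (!staleType.isInstance(e)) {
                    throw e; // unrelated failure: do not hide it behind a retry
                }
                lastStale = e;
            }
        }
        throw lastStale; // still stale after maxAttempts tries
    }
}
```

The important design point is that the element lookups live inside the supplier: retrying with row references captured before the table was recreated would just hit the same stale elements again.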

AngularJS ng-repeat consumes a lot of browser memory

I have the following code:
<table>
<thead><tr><td>Id</td><td>Name</td><td>Ratings</td></tr></thead>
<tbody>
<tr ng-repeat="user in users">
<td>{{user.id}}</td>
<td>{{user.name}}</td>
<td><div ng-repeat="item in items">{{item.rating}}</div></td>
</tr>
</tbody>
</table>
users is an array of 150 user objects, each with only id and name.
items is an array of 150 item objects, each with only id and rating.
When I render this in the browser, it takes about 250MB of heap memory (profiled in Chrome v23.0.1271.95).
I am using AngularJS v1.0.3.
Is there an issue with Angular, or am I doing something wrong here?
Here is the JSFiddle: http://jsfiddle.net/JSWorld/WqSGR/5/
Well, it's not the ng-repeat per se. I think it's the fact that you are adding bindings with {{item.rating}}.
All those bindings register watches on the scope, so:
150 * 2 = 300 (for the two user fields)
150 * 150 = 22500 (for the ratings)
Total: 22800 watch functions + 22800 DOM elements.
That could plausibly push memory usage up to 250MB.
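The same arithmetic written out, to make the scaling explicit: the rating term grows with users × items, so it dominates as either list grows. WatchCount and estimatedWatches are illustrative names, and like the answer, the estimate ignores the watches that the ng-repeat directives themselves register.

```java
class WatchCount {
    // The answer's estimate: each user row binds {{user.id}} and {{user.name}},
    // and every row also repeats every item's {{item.rating}}.
    static long estimatedWatches(long users, long items, long fieldsPerUser) {
        long userFieldWatches = users * fieldsPerUser; // 150 * 2 = 300
        long ratingWatches = users * items;            // 150 * 150 = 22500
        return userFieldWatches + ratingWatches;
    }

    public static void main(String[] args) {
        System.out.println(estimatedWatches(150, 150, 2)); // 22800
    }
}
```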
From "Databinding in AngularJS":
You can't really show more than about 2000 pieces of information to a human on a single page. Anything more than that is really bad UI, and humans can't process this anyway.
I want to say the leak is in the second array: you are looping through the same array and displaying every item for every user row, so depending on how large your test data is, that view could get rather large. I could investigate a little more. By the way, your fiddle shows something entirely different.
Right now you are looping through 150 x 150 = 22500 items and registering a watch on each one (or, through a directive, just adding the item rating).
Instead, consider adding the user's rating to the user object itself. It will increase the size of each user object, but you will only loop through 150 items and register watches only on them.
Also, consider using indexes. There may well be repeated users or item ratings; index them so that, instead of looping through heavy objects, you work with a reduced set.
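One reading of the "index it and attach the rating to the user" advice, as a Java sketch. The question does not say how items relate to users, so the userId field on Item is a hypothetical link introduced only for illustration, as are the RatingIndex, User, Item and attachRatings names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RatingIndex {
    static final class User {
        final int id;
        final String name;
        Integer rating; // filled in by attachRatings; null if no item matches

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static final class Item {
        final int userId; // HYPOTHETICAL: assumes each item points at one user
        final int rating;

        Item(int userId, int rating) {
            this.userId = userId;
            this.rating = rating;
        }
    }

    // One pass builds a userId -> rating index, one pass copies it onto each user,
    // so the view renders one rating per user row instead of users x items cells.
    static List<User> attachRatings(List<User> users, List<Item> items) {
        Map<Integer, Integer> ratingByUser = new HashMap<>();
        for (Item item : items) {
            ratingByUser.put(item.userId, item.rating);
        }
        for (User user : users) {
            user.rating = ratingByUser.get(user.id);
        }
        return users;
    }
}
```

The template would then bind {{user.rating}} once per row rather than running a nested ng-repeat over all items.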
One more thing: if you are going to keep running the directive for each instance, at least change this code:
var text = myTemplate.replace("{{rating}}",myItem.rating);
to a concat style string calculation:
var text = '<div>' + myItem.rating + '</div>';
This will save you a huge amount of computation. I made a jsPerf for this case; notice the difference, it's about 99% faster ;-)
