How to split an Elements in Scala into an Array of Strings - arrays

I'm having trouble splitting an Elements object (scraped from the web) into an Array of Strings.
Here is my code:
val link = "http://www.myurl.com"
val doc: Document = Jsoup.connect(link).get()
val title2 = doc.select("li > h3 > a").toString
That gives me:
1ERE COMPAGNIE D'ARC DU DAUPHINÉ
38SMS
40 BATTEURS
4L FOUR LIBERTY
A BORD PERDU
What I want is only the href values, in an Array of Strings: just the strings between the quotes (" ").
I've tried to use JavaConverters (asScala), but I can't get it to work :/
Thanks

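Simply extract the href attribute from each a element you get. Elements is a Java collection, so convert it with asScala first:
import scala.collection.JavaConverters._ // scala.jdk.CollectionConverters._ on Scala 2.13+
val hrefs: Array[String] = doc.select("li > h3 > a").asScala.map(link => link.attr("href")).toArray
Take a look at the other attribute-extraction features in Jsoup.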

Related

Selenium Java code: How to read text from a hidden element that contains many tags

1.<svg blablabla style:overflow: hidden > <aaa id=aaa>
2.+
3.<g>
4.<g>
5.<g>
6.<g>
The tags are inside the svg.
Line 1 is the hidden tag. I want to get the text() of line 5; this is what I do:
WebElement hiddenDiv = driver.findElement(By.id("aaa"));
String n = hiddenDiv.getText();
String script = "return arguments[0].innerHTML";
n = (String) ((JavascriptExecutor) driver).executeScript(script, hiddenDiv);
System.out.println(n);
How can I get g[2]? I tried a direct XPath and got an error because the svg is hidden.
You just need to grab a list of the g tags and pick the one you want. List indexes are zero-based, so if you want the 5th, you would use the code below.
List<WebElement> gs = driver.findElements(By.cssSelector("#aaa g"));
System.out.println(gs.get(4).getAttribute("innerHTML"));

Merging two arrays of Capybara elements

This is similar to something I posted yesterday, but I got mixed up with what I actually had in front of me. I have two arrays that need merging at matching index values, but they hold Capybara elements as opposed to strings and integers.
Example
@returned_names = page.all('#results > table.result > tbody > tr.data:first-of-type > td')
@returned_yobs = page.all('#results > table.result > tbody > tr.data:nth-child(2) > td')
# Returns
@returned_names = [#<Capybara::Element tag="td">, #<Capybara::Element tag="td">, #<Capybara::Element tag="td">]
@returned_yobs = [#<Capybara::Element tag="td">, #<Capybara::Element tag="td">, #<Capybara::Element tag="td">]
So, based on yesterday's answer, to merge these together at matching index values I should do
@collection = @returned_names.zip(@returned_yobs).map { |r| r.join(' ') }
# Returns
["#<Capybara::Node::Element:0x000000038c50e8> #<Capybara::Node::Element:0x000000036fadf8>",
"#<Capybara::Node::Element:0x000000038c50c0> #<Capybara::Node::Element:0x000000036fadd0>",
"#<Capybara::Node::Element:0x000000038c5020> #<Capybara::Node::Element:0x000000036fada8>"]
Which so far looks like it's doing the right thing. I need to then convert this to an array of its text values, but when I do
@collection.map { |t| t.text }
I get an error
undefined method `text' for #<String:0x00000001938310>
I'm guessing I can't map from here as I don't have an enumerable object at this stage?
Is there a way to get @collection back to an enumerable object so that I can then map the text values?
Array#join converts each object (i.e. node) to a string. This should work:
@returned_names.zip(@returned_yobs).map { |name, yob| "#{name.text} #{yob.text}" }
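How about this, which maps each zipped pair to its text before joining (the joined @collection only holds strings, so the elements have to be read before the join):
@returned_names.zip(@returned_yobs).map { |pair| pair.map(&:text).join(' ') }
or, more verbose, but it does the same thing:
@text_collection = []
@returned_names.zip(@returned_yobs).each do |name, yob|
  @text_collection.push("#{name.text} #{yob.text}")
end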

Scala: from Array[String] to Array[Array[String]]

I have an Array[String] in scala like this
my_array: Array[String] = Array(RED;BLUE, RED;PINK, RED;ORANGE, RED;WHITE, RED;YELLOW,
RED;GREY,GREEN;BLUE, GREEN;PINK, GREEN;BROWN, GREEN;ORANGE, GREEN;WHITE, GREEN;YELLOW, GREEN;GREY)
and I need to get this result
my_new_array: Array[Array[String]] = Array(Array(RED;BLUE, RED;PINK, RED;ORANGE, RED;WHITE,RED;YELLOW, RED;GREY),
Array(GREEN;BLUE, GREEN;PINK, GREEN;BROWN, GREEN;ORANGE, GREEN;WHITE, GREEN;YELLOW, GREEN;GREY),
Array(RED;BLUE, GREEN;BLUE), Array(RED;PINK, GREEN;PINK),
Array(RED;ORANGE, GREEN;ORANGE), Array(RED;WHITE, GREEN;WHITE),
Array(RED;YELLOW, GREEN;YELLOW), Array(RED;GREY, GREEN;GREY))
These should be the steps:
Get a list of unique colors. This means I have to split each string by ";".
Once I have this list, I have to create a new Array containing the original strings grouped by each single color.
Does anyone have a hint?
Provided I've understood your question correctly, this should work (probably not the most efficient solution ever):
myArray
  .flatMap(_.split(';')) // get all the colors
  .distinct // get the unique set of colors
  .map(color => myArray.filter(_.contains(color))) // map each color to the group of strings containing it
I'm using contains assuming that for "YELLOW" you want to match both "YELLOW;RED" and "RED;YELLOW".
In case you want to match only the former, you can use startsWith instead.
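One caveat with contains: it is a substring test on the whole string, so a color whose name is embedded in another one would be matched too. Below is a minimal sketch that compares whole ';'-separated tokens instead, assuming exact-color matching is what is wanted. The names ColorGroups and groupByColor, and the DARKRED value, are made up for illustration and are not from the question.

```scala
object ColorGroups {
  // Groups the original "A;B" strings by every distinct color found in them.
  // A string belongs to a color's group only if one of its ';'-separated
  // tokens equals that color exactly (no substring matching).
  def groupByColor(pairs: Array[String]): Array[Array[String]] = {
    val colors = pairs.flatMap(_.split(';')).distinct
    colors.map(color => pairs.filter(_.split(';').contains(color)))
  }

  def main(args: Array[String]): Unit = {
    // DARKRED is a hypothetical value, added only to show the difference
    // from String.contains, which would put "DARKRED;BLUE" in the RED group.
    val pairs = Array("RED;BLUE", "DARKRED;BLUE", "GREEN;BLUE")
    groupByColor(pairs).foreach(group => println(group.mkString("Array(", ", ", ")")))
  }
}
```

With this input the RED group holds only RED;BLUE, whereas the contains version would also pick up DARKRED;BLUE.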

Scala Converting Each Array Element to String and Splitting

I have an array loaded in, and have been playing around in the REPL, but can't seem to get this to work.
My array looks like this:
record_id|string|FALSE|1|
offer_id|decimal|FALSE|1|1,1
decision_id|decimal|FALSE|1|1,1
offer_type_cd|integer|FALSE|1|1,1
promo_id|decimal|FALSE|1|1,1
pymt_method_type_cd|decimal|FALSE|1|1,1
cs_result_id|decimal|FALSE|1|1,1
cs_result_usage_type_cd|decimal|FALSE|1|1,1
rate_index_type_cd|decimal|FALSE|1|1,1
sub_product_id|decimal|FALSE|1|1,1
campaign_id|decimal|FALSE|1|1,1
When I run my command:
for (i <- 0 until schema.length) {
  val convert = schema(i).toString;
  convert.split('|').drop(2);
  println(convert);
}
It won't drop anything. It also is not splitting it on the |
Strings are immutable, so split and drop don't modify the string; split returns a new Array[String] and drop returns a new array without the first elements.
You need to capture the result in a new val:
val split = convert.split('|').drop(2)
println(split.mkString(" "))
You can also pass a function to map, which transforms each item in the array and returns the results as a new array:
val res = schema.map(s => s.toString.split('|').drop(2))
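Put together as something that can be pasted and run, a sketch might look like the following. It assumes schema is an Array[String] (the question does not show the element type, which is presumably why it calls toString). The names SchemaFields and dropLeadingFields are hypothetical.

```scala
object SchemaFields {
  // Splits each '|'-delimited line and drops its first n fields.
  // split and drop both return new arrays; the input lines are left untouched.
  def dropLeadingFields(schema: Array[String], n: Int): Array[Array[String]] =
    schema.map(_.split('|').drop(n))

  def main(args: Array[String]): Unit = {
    // Two sample lines copied from the question's data.
    val schema = Array(
      "offer_id|decimal|FALSE|1|1,1",
      "decision_id|decimal|FALSE|1|1,1"
    )
    // Prints "FALSE 1 1,1" once per input line.
    dropLeadingFields(schema, 2).foreach(fields => println(fields.mkString(" ")))
  }
}
```

Passing a Char to split, as here, splits on the literal character; the String overload treats its argument as a regular expression, where | would need escaping.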

google visualization data.addRow - get data out of html data attribute

I have data inside a data attribute, like so:
<div class="dashboard-module" data-rows="new Date(2013,10,04),12,"OR"##new Date(2013,10,17),2,"OR"##new Date(2013,10,09),2,"CA""></div>
I'm trying to split this string up and use it in the data.addRow function:
rows = el.data('rows');
rowsarray = rows.split('##');
// Error: Row given with size different than 3 (the number of columns in the table).
$.each(rowsarray, function(index, value) {
  data.addRow( [value] );
});
// the following works
data.addRow([new Date(2013,10,04),12,"OR"]);
data.addRow([new Date(2013,10,09),2,"CA"]);
data.addRow([new Date(2013,12,12),14,"AL"]);
I guess the commas inside the new date are being counted as different parts of the array?
I'm assuming that the double-quotes inside your data-rows attribute are escaped (otherwise the HTML is malformed).
When you call rowsarray = rows.split('##');, you are getting an array of strings, like this:
[
'new Date(2013,10,04),12,"OR"',
'new Date(2013,10,17),2,"OR"',
'new Date(2013,10,09),2,"CA"'
]
not an array of arrays. If you want to store your data in an HTML attribute, your best bet is to use a JSON-compatible format. The problem then becomes storing dates, since Date objects are not JSON-compatible, but that is easy to work around. Store your data like this instead:
[["Date(2013,10,04)",12,"OR"],["Date(2013,10,17)",2,"OR"],["Date(2013,10,09)",2,"CA"]]
I did two things with the data-rows attribute: first, I changed the dates from a format like new Date(2013,10,17) to a string like "Date(2013,10,17)". Second, I converted the string to a JSON string representation of an array of arrays (which uses the standard JavaScript array brackets [ and ]). Note that JSON requires double-quotes around all internal strings, so you must either escape those double-quotes inside the data-rows attribute, or use single-quotes around the data-rows attribute value (eg: data-rows='<string>').
You can then parse that string for entry into your DataTable:
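// jQuery's .data() already converts JSON-looking attribute values, so read the raw string with .attr()
rows = JSON.parse(el.attr('data-rows'));
// convert date strings to Date objects
for (var i = 0; i < rows.length; i++) {
  var dateStr = rows[i][0];
  // strip the leading "Date(" and the trailing ")" and split the remaining "y,m,d"
  var dateArr = dateStr.substring(5, dateStr.length - 1).split(',');
  rows[i][0] = new Date(dateArr[0], dateArr[1], dateArr[2]);
}
data.addRows(rows);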
