Join importxml & importhtml by braces - arrays

This formula is in A1:
=
{
{"LINK DA FOTO","LINK DO PERFIL"}
;
{"LINK DA FOTO","LINK DO PERFIL"}
;
ArrayFormula(SEERRO(PROCH(1,{1;IMPORTXML('Time Casa'!B12,"//table[@class='table squad sortable']//td[@class='photo']/a/img/@src | //table[@class='table squad sortable']//td[@class='name large-link']/a/@href")},(LIN($A$1:$A$52)+1)*2-TRANSPOR(sort(LIN($A$1:$A$2)+0,1,0)))))
}
This formula is in C1:
=
{TRANSPOR(IMPORTXML(
'Time Casa'!B12,"
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th/img/@title |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[6]/span |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[5] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[3] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[4] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[1] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[2]
"))
;
IMPORTHTML('Time Casa'!B12,"table","1")
}
Result:
Joining the two ranges like this in Sheet2 works perfectly and gives exactly the result shown in the image above:
=
{
Sheet3!A:B
,
Sheet3!C:S
}
But when I join the two formulas directly, as below, I get this error:
Function ARRAY_ROW parameter 2 has mismatched row size. Expected: 54. Actual: 39.
=
{
{
{"LINK DA FOTO","LINK DO PERFIL"}
;
{"LINK DA FOTO","LINK DO PERFIL"}
;
ArrayFormula(SEERRO(PROCH(1,{1;IMPORTXML('Time Casa'!B12,"//table[@class='table squad sortable']//td[@class='photo']/a/img/@src | //table[@class='table squad sortable']//td[@class='name large-link']/a/@href")},(LIN($A$1:$A$52)+1)*2-TRANSPOR(sort(LIN($A$1:$A$2)+0,1,0)))))
}
,
{TRANSPOR(IMPORTXML(
'Time Casa'!B12,"
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th/img/@title |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[6]/span |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[5] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[3] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[4] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[1] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[2]
"))
;
IMPORTHTML('Time Casa'!B12,"table","1")
}
}
I would like to know what I need to adjust to make this work. I tried =FILTER(X,X<>"") but got the same error.
Link to Spreadsheet:
https://docs.google.com/spreadsheets/d/1DNhl5hf5ofST84nawfBF6kuzhn83UMldh1lS20VTJpA/edit?usp=sharing

Try:
={{{"LINK DA FOTO", "LINK DO PERFIL"};
{"LINK DA FOTO", "LINK DO PERFIL"};
ARRAYFORMULA(QUERY(IFERROR(HLOOKUP(1, {1; IMPORTXML('Time Casa'!B12,
"//table[@class='table squad sortable']//td[@class='photo']/a/img/@src |
//table[@class='table squad sortable']//td[@class='name large-link']/a/@href")},
(ROW($A$1:$A$52)+1)*2-TRANSPOSE(SORT(ROW($A$1:$A$2)+0, 1, 0)))),
"where Col1 is not null"))},
{TRANSPOSE(IMPORTXML('Time Casa'!B12,
"//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th/img/@title |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[6]/span |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[5] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[3] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[4] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[1] |
//*[@id='page_team_1_block_team_squad_3-table']/thead/tr/th[2]"));
IMPORTHTML('Time Casa'!B12, "table", "1")}}
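
The error is about row counts: the left block has 2 header rows plus the 52 rows generated by ROW($A$1:$A$52), 54 in total, while the right block came back with 39 rows, and a comma inside braces can only join blocks of equal height. The QUERY with "where Col1 is not null" drops the empty rows from the left block, presumably so that both sides end up the same height. A minimal sketch of that rule in Python, for illustration only (this is not Sheets code; `hjoin`, `drop_empty_rows` and the sample data are hypothetical):

```python
def hjoin(left, right):
    """Join two lists of rows side by side, like {left, right} in a sheet."""
    if len(left) != len(right):
        raise ValueError(
            f"mismatched row size. Expected: {len(left)}. Actual: {len(right)}."
        )
    return [l + r for l, r in zip(left, right)]


def drop_empty_rows(rows):
    """Keep rows whose first cell is non-empty, like 'where Col1 is not null'."""
    return [row for row in rows if row[0] not in (None, "")]


# Hypothetical data: 3 real rows padded with 2 blank rows on the left,
# and 3 rows on the right.
left = [
    ["photo1", "profile1"],
    ["photo2", "profile2"],
    ["photo3", "profile3"],
    ["", ""],
    ["", ""],
]
right = [["Ann", 10], ["Bob", 7], ["Cid", 9]]

try:
    hjoin(left, right)  # 5 rows against 3 rows: rejected
except ValueError as err:
    print(err)

# After trimming the blank rows both blocks have 3 rows and the join succeeds.
joined = hjoin(drop_empty_rows(left), right)
print(joined[0])
```

This also suggests why the Sheet2 version worked: Sheet3!A:B and Sheet3!C:S are full columns of the same sheet, so they always have the same number of rows.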

Related

get index of all True elements in array column in pyspark

What I have:
country | sources | infer_from_source
---------------------------------------------------------------------------
null | ["LUX", "CZE","CHN", "FRA"] | ["FALSE", "TRUE", "FALSE", "TRUE"]
"DEU" | ["DEU"] | ["FALSE"]
What I want after a function:
country | sources | infer_from_source | inferred_country
------------------------------------------------------------------------------------------------
null | ["LUX", "CZE", "CHN", "FRA"] | ["FALSE", "TRUE", "FALSE", "TRUE"] | ["CZE", "FRA"]
"DEU" | ["DEU"] | ["FALSE"] | "DEU"
I need a function that, if the country column is null, extracts the countries from the sources array based on the boolean values in the infer_from_source array, and otherwise returns the country value.
I created this function:
from pyspark.sql.types import BooleanType, IntegerType, StringType, FloatType, ArrayType
import pyspark.sql.functions as F

@F.udf
def determine_entity_country(country: StringType, sources: ArrayType,
                             infer_from_source: ArrayType) -> ArrayType:
    if country:
        return country
    else:
        if "TRUE" in infer_from_source:
            # .index() only finds the first "TRUE"
            idx = infer_from_source.index("TRUE")
            return sources[idx]
    return None
But this yields only the first match, because .index("TRUE") returns the index of the first element equal to its argument:
country | sources | infer_from_source | inferred_country
--------------------------------------------------------------------
null | ["LUX", "CZE", | ["FALSE", "TRUE", |
| "CHN", "FRA"] | "FALSE", "TRUE"] | "CZE"
"DEU" | ["DEU"] | ["FALSE"] | "DEU"

You should avoid UDFs whenever you can achieve the same result with Spark built-in functions alone, especially when it comes to PySpark UDFs.
Here's another way, using the higher-order functions transform and filter on arrays:
import pyspark.sql.functions as F

df1 = df.withColumn(
    "inferred_country",
    F.when(
        F.col("country").isNotNull(),
        F.array(F.col("country"))
    ).otherwise(
        F.expr("""filter(
            transform(sources, (x, i) -> IF(boolean(infer_from_source[i]), x, null)),
            x -> x is not null
        )""")
    )
)
df1.show()
#+-------+--------------------+--------------------+----------------+
#|country| sources| infer_from_source|inferred_country|
#+-------+--------------------+--------------------+----------------+
#| null|[LUX, CZE, CHN, FRA]|[FALSE, TRUE, FAL...| [CZE, FRA]|
#| DEU| [DEU]| [FALSE]| [DEU]|
#+-------+--------------------+--------------------+----------------+
Starting from Spark 3.0, the filter lambda function can also take the index, so the transform step is not needed:
df1 = df.withColumn(
    "inferred_country",
    F.when(
        F.col("country").isNotNull(),
        F.array(F.col("country"))
    ).otherwise(
        F.expr("filter(sources, (x, i) -> boolean(infer_from_source[i]))")
    )
)

Fixed it! It was simply a matter of using a list comprehension:
@F.udf(returnType=ArrayType(StringType()))
def determine_entity_country(country: StringType, sources: ArrayType,
                             infer_from_source: ArrayType) -> ArrayType:
    if country:
        # Wrapped in a list so the column has a single (array) type
        return [country]
    else:
        if "TRUE" in infer_from_source:
            max_ix = len(infer_from_source)
            # Collect every index holding "TRUE", not just the first one
            true_index_array = [x for x in range(0, max_ix) if infer_from_source[x] == "TRUE"]
            return [sources[ix] for ix in true_index_array]
    return None
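
The difference between the two versions of the function can be seen outside Spark as well. A minimal sketch in plain Python, using the sample row from the question (the variable names `first_only`, `true_indices`, `all_matches` and `all_matches_zip` are made up for this illustration):

```python
sources = ["LUX", "CZE", "CHN", "FRA"]
infer_from_source = ["FALSE", "TRUE", "FALSE", "TRUE"]

# list.index stops at the first match, so only one country is found.
first_only = sources[infer_from_source.index("TRUE")]

# enumerate visits every position, so all matching indices are collected.
true_indices = [i for i, flag in enumerate(infer_from_source) if flag == "TRUE"]
all_matches = [sources[i] for i in true_indices]

# zip pairs the two lists directly and avoids the index bookkeeping.
all_matches_zip = [s for s, flag in zip(sources, infer_from_source) if flag == "TRUE"]

print(first_only)    # CZE
print(true_indices)  # [1, 3]
print(all_matches)   # ['CZE', 'FRA']
```

The zip form assumes both lists have the same length, as they do in the sample data.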

Calculate total number of orders in different statuses

I want to create a simple dashboard that shows the number of orders in different statuses. The statuses can be New, Cancelled, Finished, etc.
Where should I implement these criteria? If I add a filter in the Cube Browser, it applies to the whole dashboard. Should I do that in a KPI, or should I add a calculated column with 1/0 values?
My expected output is something like:
--------------------------------------
| Total | New | Finished | Cancelled |
--------------------------------------
| 1000 | 100 | 800 | 100 |
--------------------------------------

I'd use measures for that, something like:
CountTotal = COUNT('Orders'[OrderID])
CountNew = CALCULATE(COUNT('Orders'[OrderID]), 'Orders'[Status] = "New")
CountFinished = CALCULATE(COUNT('Orders'[OrderID]), 'Orders'[Status] = "Finished")
CountCancelled = CALCULATE(COUNT('Orders'[OrderID]), 'Orders'[Status] = "Cancelled")

Count unique rows of column in Array of object

Using
Ruby 1.9.3-p194
Rails 3.2.8
Here's what I need.
Count the distinct human resources (human_resource_id) and divide the total number of assignments (assignment_id) by that count.
So, the answer for the dummy-data as given below should be:
1.5 assignments per human resource
But I just don't know where to go anymore.
Here's what I tried:
Table name: Assignments
id | human_resource_id | assignment_id | assignment_start_date | assignment_expected_end_date
80101780 | 20200132 | 80101780 | 2012-10-25 | 2012-10-31
80101300 | 20200132 | 80101300 | 2012-07-07 | 2012-07-31
80101308 | 21100066 | 80101308 | 2012-07-09 | 2012-07-17
First I need to select the period I want to look at, which always goes back at most one year.
a = Assignment.find(:all, :conditions => { :assignment_expected_end_date => (DateTime.now - 1.year)..DateTime.now })
=> [
#<Assignment id: 80101780, human_resource_id: "20200132", assignment_id: "80101780", assignment_start_date: "2012-10-25", assignment_expected_end_date: "2012-10-31">,
#<Assignment id: 80101300, human_resource_id: "20200132", assignment_id: "80101300", assignment_start_date: "2012-07-07", assignment_expected_end_date: "2012-07-31">,
#<Assignment id: 80101308, human_resource_id: "21100066", assignment_id: "80101308", assignment_start_date: "2012-07-09", assignment_expected_end_date: "2012-07-17">
]
foo = a.group_by(&:human_resource_id)
Now I have a nice hash of arrays of objects and I just don't know what to do next.
Can someone help me?

You can try running the query in SQL (assignments divided by human resources, multiplied by 1.0 to avoid integer division):
ActiveRecord::Base.connection.select_value('SELECT count(distinct assignment_id) * 1.0 / count(distinct human_resource_id) AS ratio FROM assignments')

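You could do something like this, where assignments is the array returned by the query (a in the question):
human_resource_count = assignments.collect { |a| a.human_resource_id }.uniq.count
assignment_count = assignments.collect { |a| a.assignment_id }.uniq.count
# to_f avoids integer division; the result is assignments per human resource
result = assignment_count.to_f / human_resource_count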
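
As a check on the arithmetic, here is the same calculation on the three dummy rows from the question, sketched in Python rather than Ruby (the list of dicts is a stand-in for the query result, not the ActiveRecord API):

```python
# The three dummy rows from the question, reduced to the two relevant columns.
assignments = [
    {"human_resource_id": "20200132", "assignment_id": "80101780"},
    {"human_resource_id": "20200132", "assignment_id": "80101300"},
    {"human_resource_id": "21100066", "assignment_id": "80101308"},
]

# A set keeps each value once, so its length is the distinct count.
human_resource_count = len({a["human_resource_id"] for a in assignments})
assignment_count = len({a["assignment_id"] for a in assignments})

# 3 assignments over 2 human resources; "/" is float division in Python 3.
ratio = assignment_count / human_resource_count
print(ratio)  # 1.5
```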

Subset with loop over an array of strings

My code looks like this:
for (i in 1:b) {
  carteraR[[i]]=subset(carteraR[[i]],RUN.FONDO=="8026" | RUN.FONDO=="8036" | RUN.FONDO=="8048" | RUN.FONDO=="8057" | RUN.FONDO=="8059" | RUN.FONDO=="8072" | RUN.FONDO=="8094" |
    RUN.FONDO=="8107" | RUN.FONDO=="8110" | RUN.FONDO=="8115" | RUN.FONDO=="8130" | RUN.FONDO=="8230" | RUN.FONDO=="8248" | RUN.FONDO=="8257" | RUN.FONDO=="8319")
}
Here b=length(carteraR) and class(carteraR[[i]]) is data.frame. RUN.FONDO is one of the columns of these data frames. This code works fine, but I want to shorten it.
What I want is something like:
for (i in 1:b) {
  for (j in 1:length(A)) {
    carteraR[[i]]=subset(carteraR[[i]],RUN.FONDO==A[j])
  }
}
where A is the vector "8026" "8036" "8048" "8057" ... "8319".
What should the code look like?
Thanks

Like this:
carteraR <- lapply(carteraR, subset, RUN.FONDO %in% A)
Just be aware that using subset programmatically carries some risks (see "Why is `[` better than `subset`?"). This usage is fine, though.

retrieving multiple rows from database and display using struts2

I am developing a photo album web application that uses two tables, USER_INFORMATION and USER_PHOTO. USER_INFORMATION contains one record per user, while USER_PHOTO can contain several photos from a single user. I want to fetch each user's information along with their image paths and store it in a Java POJO variable such as a list, so that I can display it using the display tag in Struts 2.
The tables look like this:
USER_PHOTO USER_INFORMATION
======================= =================================
| IMAGEPATH | USER_ID | | USER_NAME | AGE | ADDRESS |ID |
======================= =================================
|xyz | 1 | | abs | 34 | sdas | 1 |
|sdas | 1 | | asddd | 22 | asda | 2 |
|qwewq | 2 | | sadl | 121 | asd | 3 |
| asaa | 1 | ==================================
| 121 | 3 |
=======================
I have more or less managed to get the user information correctly by using a HashSet and the display tag. The code looks like this:
public ArrayList loadData() throws ClassNotFoundException, SQLException {
    ArrayList<UserData> userList = new ArrayList<UserData>();
    try {
        String name;
        String fatherName;
        int Id;
        int age;
        String address;
        String query = "SELECT NAME,FATHERNAME,AGE,ADDRESS,ID FROM USER_INFORMATION ,USER_PHOTO WHERE ID=USER_ID";
        ps = con.prepareStatement(query);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            name = rs.getString(1);
            fatherName = rs.getString(2);
            age = rs.getInt(3);
            address = rs.getString(4);
            //Id = rs.getInt(5);
            UserData list = new UserData();
            list.setName(name);
            list.setFatherName(fatherName);
            list.setAge(age);
            list.setAddress(address);
            //list.setPhot(Id);
            userList.add(list);
        }
        ps.close();
        con.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    // Pass the list through a HashSet to drop duplicate users
    ArrayList al = new ArrayList();
    HashSet hs = new HashSet();
    hs.addAll(userList);
    al.clear();
    al.addAll(hs);
    //return (userList);
    return al;
}
And my display tag in the JSP looks like this:
<display:table id="data" name="sessionScope.UserForm.userList" requestURI="/userAction.do" pagesize="15" >
    <display:column property="name" title="NAME" sortable="true" />
    <display:column property="fatherName" title="User Name" sortable="true" />
    <display:column property="age" title="AGE" sortable="true" />
    <display:column property="address" title="ADDRESS" sortable="true" />
</display:table>
As you can see, I am displaying only the user information, not the photos the user uploaded, and I want to display the uploaded file names as well. I created a second table to store the photo information. If I include the IMAGEPATH column in my SQL query, I get duplicate rows: if a user has uploaded 5 photos, the display tag shows five records with the same user information and different image paths.
Can anyone tell me how to display only one record per user together with all of that user's image paths, so that the user information is not repeated?
Thanks in advance
