D3.js unknown number of columns and rows - arrays

I'm currently creating a chart (data from an external CSV file) but I don't know the number of columns and rows beforehand. Could you maybe point me in the right direction as to where I could find some help (or some examples) with this issue?
Thank you

d3.csv can help you here:
d3.csv('myCSVFile.csv', function(data) {
  // the 'data' argument is an array of objects, one object per row, so...
  var numberOfRows = data.length,        // the number of rows (excluding the title row)
      columns = Object.keys(data[0]),    // the keys of the first row object, as an array
      numberOfColumns = columns.length;  // the number of columns
});
Note that this method assumes that the first row (and only the first row) of your CSV file contains the column titles.
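For instance, with a hypothetical myCSVFile.csv such as:
name,value
a,1
b,2
data.length would be 2 and Object.keys(data[0]) would be ["name", "value"], i.e. 2 rows and 2 columns.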

In addition to Tom P's advice, it's worth noting that version 4 of D3 introduced a columns property, which you can use to create an array of column headers (i.e. the dataset's 'keys').
This is useful because (a) it's simpler code and (b) the headers in the array are in the same order that they appear in the dataset.
So, for the above dataset:
headers = data.columns
... creates the same array as:
headers = Object.keys(data[0])
... but the array of column names is in a predictable order.
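Putting it together, a minimal sketch (assuming D3 v4's (error, data) callback style; in v5+ d3.csv returns a Promise instead):
d3.csv('myCSVFile.csv', function(error, data) {
  var headers = data.columns,          // column names, in file order (v4+)
      numberOfColumns = headers.length,
      numberOfRows = data.length;      // excludes the header row
});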

Related

Apps Script - compare two arrays, find differences (value and index of cells)

I have two vertical 1D arrays, one with existing data, and another that could (potentially) have differences.
I want to be able to compare the two arrays, find the differences AND the index of each cell that is different.
Then, I would use that information to find the appropriate cells in the existing data that need to be changed, and set the new values in those specific cells, without having to copy and paste the entire array.
For example:
Existing Data    New Data
John Smith       John Smith
012345           012345
6th grade        7th grade
555-1234         555-1357
Trumpet          Trumpet
5th period       2nd period
Jane Smith       Jane Smith
js#email.com     js#email.com
In this case, the code would see that rows 3, 4, and 6 have differences, save those new values and their positions in the array, then update the appropriate values in the main data list without changing anything else.
I've tried multiple ways to compare the two arrays AND get the index of the rows that have differences and this is as far as I've gotten without an error or a 'null' result:
function updateInfo() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var currentSheet = ss.getSheetByName("CURRENT");
  var infoSheet = ss.getSheetByName("INFO Search");
  var origVal = infoSheet.getRange(5,2,52,1).getValues();
  var newVal = infoSheet.getRange(5,4,52,1).getValues();
  var list = [];
  var origData = origVal.map(function(row, index) {
    return [row[0], index];
  });
  Logger.log(origData);
  var newData = newVal.map(function(row, index) {
    return [row[0], index];
  });
  Logger.log(newData);
}
This just gives me the value and index of each cell.
Is there a fast and efficient way to compare the two arrays, get the data I need, and change the values in just certain cells of the original column?
I can't just copy and paste the whole column over because there are formulas embedded in various rows that need to remain intact.
Thanks for any help you can provide!
Solution:
Iterate through the new data array. For each value, check whether the new value differs from the old one. If it does, add the corresponding value and index to your list.
Then iterate through your list, and for each pair of value and index, write the value to the corresponding cell.
Code snippet:
newVal.forEach(function(row, index) {
  if (row[0] !== origVal[index][0]) { // keep only the values that changed
    list.push([row[0], index]);
  }
});
var firstRow = 2;     // Change according to your preferences
var columnIndex = 1;  // Change according to your preferences
list.forEach(data => {
  sheet.getRange(data[1] + firstRow, columnIndex).setValue(data[0]);
});
Notes:
I don't know where the data should be written, so I'm using an undefined sheet, and also placeholder values for firstRow and columnIndex. Please change these according to your preferences.
It would be much more efficient, from a script perspective, to write the whole column to your destination range at once, since that would minimize the number of interactions between the script and the spreadsheet (see Use batch operations). Since you want to write only the updated data, though, the script above writes cell by cell.
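For reference, a minimal sketch of that batch variant, reusing the ranges from the question (note it writes plain values over the whole column, so it would replace any embedded formulas with static values, which is exactly why the per-cell version is used instead):
var merged = origVal.map(function(row, index) {
  // keep the old value where nothing changed, take the new one otherwise
  return row[0] === newVal[index][0] ? row : [newVal[index][0]];
});
infoSheet.getRange(5, 2, merged.length, 1).setValues(merged);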

How to concatenate 2 columns that contain multiple rows of data

I work with Talend 6.3. I want to concatenate 2 columns in tMap, but in specific rows there are multiple values, and I want to match them up: the first with the first of the row in the other column.
Example :
2 columns : Name and Surname
In Name, I have : Kevin,Zoe,Alan
In Surname, I have : Monta,Rey,Zom
I want another column that concatenates Kevin with Monta, Zoe with Rey, and Alan with Zom.
How do I do that with Talend? If I do a plain concatenation in tMap, I only get a single successful concatenation.
I don't know if I explained that correctly, but tell me if you need more information.
Thanks in advance
The job: [screenshot]
Added other data (Login is the ID): [screenshot]
So we have a flow with an ID and 2 columns (Name and Surname), each containing n elements. n can vary from row to row. The goal is to have a final flow with the concatenation of each Name + Surname, with the ID intact.
It's not a super Talend-y way, but you can use tFlowToIterate to access each row individually and do the pairing of Name + Surname.
Afterwards, we access the resulting list and use tNormalize to split it: [screenshot]
Code for the tJava component:
// Split the comma-separated Name and Surname fields into lists.
List<String> nameList = Arrays.asList(((String) globalMap.get("row5.Name")).split("\\s*,\\s*"));
List<String> surnameList = Arrays.asList(((String) globalMap.get("row5.Surname")).split("\\s*,\\s*"));
// Pair each name with the surname at the same index, keeping the ID.
for (int index = 0; index < nameList.size(); index++) {
    ((ArrayList<String>) context.concat).add(((Integer) globalMap.get("row5.id")) + ";" + nameList.get(index) + ";" + surnameList.get(index));
}
Code for the tFixedFlowInput "Use context (list)":
StringHandling.EREPLACE(StringHandling.EREPLACE(context.concat.toString(),"\\[",""),"\\]","")
Result: [screenshot]
Sorry, like this: [screenshot]
The concatenation works when I have only one value in the column, i.e. only Kevin and only Monta in the cells. Maybe it's because of the tDenormalize in my schema.
This is the code that sets the global variable with the list: [screenshot]
This is the code that gets my list: [screenshot]
And I have this result (the first row has organisation empty; the first and the second should both have santeffi; don't look at the first column, the second one is the right one): [screenshot]
This is the data before the tJavaRow (code and libelle are in different columns): [screenshot]

Add a column with a value contained in a certain column's value

Really hope to get some help, as I've already fried my brain trying to achieve this.
I have a DataFrame:
                   PagePath   Source
0     /product/123/sometext  (Other)
1  /product/234?someutminfo  (Other)
2     /product/112?whatever  (Other)
I also have another dataframe with short product paths:
           Path Other stuff
0  /product/123         Foo
1  /product/234         Bar
2  /product/345        Buzz
3  /product/456         Lol
What I need is to create a new column in the first df, matched against the second df, so that it contains the short Path when there is one.
So far I managed to do the following:
1) Created a series from the second df by subsetting it
2) Sort of iterated through the first df with the list from the second df:
df1['newcol'] = df1['PagePath'].str.contains('|'.join(list_from_df2))
Which gave me a column of True/False values based on whether a match was found.
I understand that what I need to do is iterate through each row of the first df, iterate through each value of the list, and return the value when a match is found.
If only I could write the appropriate code for it. I really hope for your help.
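For what it's worth, a minimal sketch of exactly that loop, reusing df1 and list_from_df2 from the code above ('Not a product' is just a placeholder for the no-match case):
def match_short_path(page_path, short_paths):
    # Return the first short path found inside the page path, if any.
    for short in short_paths:
        if short in page_path:
            return short
    return 'Not a product'

df1['newcol'] = df1['PagePath'].apply(lambda p: match_short_path(p, list_from_df2))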
Solved the problem myself:
First we define a function:
import re

def return_match(row):
    try:
        # group(0) returns the whole matched substring
        return re.search(r'/product/.+-\d+/', row).group(0)
    except AttributeError:  # re.search() returned None, i.e. no match
        return 'Not a product'
Then we apply the function over the necessary column:
df['newcol'] = df['PagePath'].apply(return_match)

SparkR - extracting dataframe's array<int> for an R function

I have 1000s of sensors. I need to partition the data (i.e. per sensor, per day), then submit each list of data points to an R algorithm. Using Spark, a simplified sample looks like:
//Spark
val rddData = List(
  ("1:3", List(1,1,456,1,1,2,480,0,1,3,425,0)),
  ("1:4", List(1,4,437,1,1,5,490,0)),
  ("1:6", List(1,6,500,0,1,7,515,1,1,8,517,0,1,9,522,0,1,10,525,0)),
  ("1:11", List(1,11,610,1))
)

case class DataPoint(
  key: String,
  value: List[Int]) // 4-value pattern: sensorID:seq#, seq#, value, state
I convert this to a Parquet file and save it.
Loading the Parquet file in SparkR is no problem; the schema says:
#SparkR
df <- read.df(sqlContext, filespec, "parquet")
schema(df)
StructType
|-name = "key", type = "StringType", nullable = TRUE
|-name = "value", type = "ArrayType(IntegerType,true)", nullable = TRUE
So in SparkR, I have a dataframe where each record has all of the data I want (df$value). I want to extract that array into something R can consume, then mutate my original dataframe (df) with a new column holding the resultant array. Logically, something like results = function(df$value). Then I need to get results (for all rows) back into a SparkR dataframe for output.
How do I extract an array from the SparkR dataframe and then mutate the dataframe with the results?
Let the Spark data frame be df and the R data frame be df_r.
To convert the SparkR df to an R df, use:
df_r <- collect(df)
With the R data frame df_r, you can do all the computations you want in R. Let's say you have the result in the column df_r$result.
Then, to convert back to a SparkR data frame, use:
#this is a new SparkR data frame, df_1
df_1 <- createDataFrame(sqlContext, df_r)
To add the result back to the SparkR data frame df, use:
#this adds df_1$result to a new column df$result
#note that the number of rows should be the same in df and df_1; if not, use a join operation
df$result <- df_1$result
Hope this solves your problem.
I had this problem too. The way I got around it was by adding a row index to the Spark DataFrame and then using explode inside a select statement. Make sure to select the index and then the row you want in your select statement. That will get you a "long" dataframe. If each of the nested lists in the DataFrame column has the same amount of information in it (for example, if you are exploding a list-column of x,y coordinates), you would expect each row index in the long DataFrame to occur twice.
After doing the above, I typically do a groupBy(index) on the exploded DataFrame, filter where the n() of each index is not equal to the expected number of items in the list, and proceed with additional groupBy, merge, join, filter, etc. operations on the Spark DataFrame.
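A minimal sketch of that approach in SparkR 2.x, reusing the question's schema (here the key column is unique per row, so it can serve as the index; the expected count is a placeholder):
long <- select(df, df$key, alias(explode(df$value), "v"))  # one output row per array element
counts <- agg(groupBy(long, "key"), cnt = n(long$v))       # elements per original row
expected <- 12                                             # placeholder for the expected list length
bad <- filter(counts, counts$cnt != expected)              # rows with a surprising count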
There are some excellent guides on the Urban Institute's GitHub page. Good luck. -nate

LINQ: How to exclude data columns from DataTable.ItemArray query?

Given a collection of data rows (an IEnumerable of type DataRow), how can I query the collection so as to compute the sum of all the values in the collection (i.e. all the values of all the columns in all the rows), with the exception of two specific columns? I only know two columns prior to run time, and those are the two I want to exclude. All the others I want included in the sum.
There is an extension method which will give me a specific field, e.g. Field<T>("Foo"), but what I really need is the ability to sum all the fields except fields X, Y and Z.
You can use more LINQ:
DataColumn[] unwantedColumns = { table.Column1, ... };
var sum = table.Columns.Cast<DataColumn>()
               .Except(unwantedColumns)
               .Select(c => row.Field<int>(c))
               .Sum();
I don't think you can modify the columns a DataRow has. You can modify the columns of a DataTable though.
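If you need the total across every row in the collection rather than a single row, the same idea extends naturally; a sketch, assuming rows is the IEnumerable<DataRow> from the question ("ColumnX"/"ColumnY" are placeholder names for the two excluded columns):
var excluded = new HashSet<string> { "ColumnX", "ColumnY" }; // placeholder names
int total = rows.Sum(row =>
    row.Table.Columns.Cast<DataColumn>()
       .Where(c => !excluded.Contains(c.ColumnName))
       .Sum(c => row.Field<int>(c)));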
