In the code below, cohort_counts_4 is a DataFrame with three columns: g, samplingRate and samplingRate1. In the rowDF variable I collect the columns samplingRate and samplingRate1 (which are percentages), and in the percentages variable I try to convert them to Array[Double].
When I run this, I get the error below at runtime on the percentages line. I need an Array[Double] because I have to call randomSplit in the next step.
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to java.lang.Double.
Please let me know your thoughts.
sample data of percentages -
percentages: Array[Seq[Double]] =
Array(WrappedArray(0.06449504858964898, 0.9355049514103511)
, WrappedArray(0.015861918725594032, 0.9841380812744059)
, WrappedArray(0.22082241578907924, 0.7791775842109208)
, WrappedArray(0.14416119376185044, 0.8558388062381496)
, WrappedArray(0.10958692395592619, 0.8904130760440738)
, WrappedArray(1.0, 0.0)
, WrappedArray(0.6531128743810083, 0.3468871256189917)
, WrappedArray(0.04880653326943304, 0.9511934667305669))
val cohortList = cohort_counts_4.select("g").collect().map(_(0)).toList
var cohort_list = new ListBuffer[org.apache.spark.sql.DataFrame]()
var total_rows: Int = 0
for (igroupid <- cohortList) {
  val sample_rate = cohort_counts_4.filter(col("g") === igroupid).select("samplingRate", "samplingRate1")
  cohort_list += sample_rate
  val curr_rows = sample_rate.count().toInt
  total_rows += curr_rows
}
val customers_new = cohort_list.reduce(_ union _)
val rowDF = customers_new.select(array(customers_new.columns.map(col):_*) as "row")
var percentages =Array(rowDF.collect.map(_(0)).asInstanceOf[Double])
// var percentages = rowDF.collect.map(_.getSeq[Double](0))
val uni = customers_2.select("x","g").distinct
.randomSplit(percentages)
I changed the code from
var percentages =Array(rowDF.collect.map(_(0)).asInstanceOf[Double])
to below
var percentages =rowDF.collect.map(_(0).asInstanceOf[Seq[Double]]).flatten
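and it worked. Each collected row holds a WrappedArray (a Seq[Double]), so the cast has to be applied per row to Seq[Double] and the result flattened to obtain an Array[Double]. Casting the whole collected array to Double is what raised the ClassCastException.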
When creating an ALS model, we can extract a userFactors DataFrame and an itemFactors DataFrame. These DataFrames contain a column with an Array.
I would like to generate some random data and union it to the userFactors DataFrame.
Here is my code:
val df1: DataFrame = Seq((123, 456, 4.0), (123, 789, 5.0), (234, 456, 4.5), (234, 789, 1.0)).toDF("user", "item", "rating")
val model1 = (new ALS()
.setImplicitPrefs(true)
.fit(df1))
val iF = model1.itemFactors
val uF = model1.userFactors
I then create a random DataFrame using a VectorAssembler with this function:
def makeNew(df: DataFrame, rank: Int): DataFrame = {
  var df_dummy = df
  var i: Int = 0
  var inputCols: Array[String] = Array()
  for (i <- 0 to rank) {
    df_dummy = df_dummy.withColumn("feature".concat(i.toString), rand())
    inputCols = inputCols :+ "feature".concat(i.toString)
  }
  val assembler = new VectorAssembler()
    .setInputCols(inputCols)
    .setOutputCol("userFeatures")
  val output = assembler.transform(df_dummy)
  output.select("user", "userFeatures")
}
I then create the DataFrame with new user IDs and add the random vectors and bias:
val usersDf: DataFrame = Seq((567), (678)).toDF("user")
var usersFactorsNew: DataFrame = makeNew(usersDf, 20)
The problem arises when I union the two DataFrames.
usersFactorsNew.union(uF) produces the error:
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. struct<type:tinyint,size:int,indices:array<int>,values:array<double>> <> array<float> at the second column of the second table;;
If I print the schema, the uF DataFrame has a feature vector of type Array[Float] and the usersFactorsNew DataFrame has a feature vector of type Vector.
My question is how to change the type of the Vector to an Array in order to perform the union.
I tried writing this udf with little success:
val toArr: org.apache.spark.ml.linalg.Vector => Array[Double] = _.toArray
val toArrUdf = udf(toArr)
Perhaps the VectorAssembler is not the best option for this task. However, at the moment, it is the only option I have found. I would love to get some recommendations for something better.
Instead of creating a dummy dataframe and using VectorAssembler to generate a random feature vector, you can simply use a UDF directly. The userFactors from the ALS model hold an Array[Float], so the output of the UDF should match that.
import org.apache.spark.sql.functions.udf
import scala.util.Random
val createRandomArray = udf((rank: Int) => {
  Array.fill(rank)(Random.nextFloat)
})
Note that this gives numbers in the interval [0.0, 1.0), the same range as the rand() used in the question; if other numbers are required, modify the function as needed.
Using a rank of 3 and the usersDf from the question:
val usersFactorsNew = usersDf.withColumn("userFeatures", createRandomArray(lit(3)))
will give a dataframe as follows (of course with random feature values)
+----+----------------------------------------------------------+
|user|userFeatures |
+----+----------------------------------------------------------+
|567 |[0.6866711267486822,0.7257031656127676,0.983562255688249] |
|678 |[0.7013908820314967,0.41029552817665327,0.554591149586789]|
+----+----------------------------------------------------------+
A union of this dataframe with the uF dataframe should now be possible.
The UDF in the question most likely did not work because it returns an Array[Double], while the union needs an Array[Float]. Adding a map(_.toFloat) should fix it:
val toArr: org.apache.spark.ml.linalg.Vector => Array[Float] = _.toArray.map(_.toFloat)
val toArrUdf = udf(toArr)
Your overall process is correct, and even the udf function works. All you need to do is change the last line of the makeNew function as follows:
def makeNew(df: DataFrame, rank: Int): DataFrame = {
  var df_dummy = df
  var i: Int = 0
  var inputCols: Array[String] = Array()
  for (i <- 0 to rank) {
    df_dummy = df_dummy.withColumn("feature".concat(i.toString), rand())
    inputCols = inputCols :+ "feature".concat(i.toString)
  }
  val assembler = new VectorAssembler()
    .setInputCols(inputCols)
    .setOutputCol("userFeatures")
  val output = assembler.transform(df_dummy)
  output.select(col("id"), toArrUdf(col("userFeatures")).as("features"))
}
With that change you should be good to go. When you run the following (I created usersDf with an id column instead of a user column)
val usersDf: DataFrame = Seq((567), (678)).toDF("id")
var usersFactorsNew: DataFrame = makeNew(usersDf, 20)
usersFactorsNew.union(uF).show(false)
you should get
+---+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|id |features |
+---+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|567|[0.8259185719733708, 0.327713892339658, 0.049547223031371046, 0.056661808506210054, 0.5846626163454274, 0.038497936270104005, 0.8970865088803417, 0.8840660648882804, 0.837866669938156, 0.9395263094918058, 0.09179528484355126, 0.4915430644129799, 0.11083447052043116, 0.5122858182953718, 0.4302683812966408, 0.3862741815833828, 0.6189322403095068, 0.3000371006293433, 0.09331299668168902, 0.7421838728601371, 0.855867963988993]|
|678|[0.7686514248005568, 0.5473580740023187, 0.072945344124282, 0.36648594574355287, 0.9780202082328863, 0.5289221651923784, 0.3719451099963028, 0.2824660794505932, 0.4873197501260199, 0.9364676464120849, 0.011539929543513794, 0.5240615794930654, 0.6282546154521298, 0.995256022569878, 0.6659179561266975, 0.8990775317754092, 0.08650071017556926, 0.5190186149992805, 0.056345335742325475, 0.6465357505620791, 0.17913532817943245] |
|123|[0.04177388548851013, 0.26762014627456665, -0.19617630541324615, 0.34298020601272583, 0.19632814824581146, -0.2748605012893677, 0.07724890112876892, 0.4277132749557495, 0.1927199512720108, -0.40271613001823425] |
|234|[0.04139673709869385, 0.26520395278930664, -0.19440513849258423, 0.3398836553096771, 0.1945556253194809, -0.27237895131111145, 0.07655145972967148, 0.42385169863700867, 0.19098000228405, -0.39908021688461304] |
+---+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
In Julia 0.5 you can write:
using DataFrames, Plots, StatPlots
df = DataFrame(
fruit = ["orange","orange","orange","orange","apple","apple","apple","apple"],
year = [2010,2011,2012,2013,2010,2011,2012,2013],
production = [120,150,170,160,100,130,165,158],
consumption = [70,90,100,95, 80,95,110,120]
)
plotlyjs()
mycolours = [:green :orange]
legend1 = sort(unique(df[:fruit]))
legend2 = legend1'
fruits_plot = plot(df, :year, :production, group=:fruit, linestyle = :solid, linewidth=3, label = ("Production of " * legend2), color=mycolours)
where legend1 is a 2-element DataArrays.DataArray{String,1} and legend2 is a 1×2 DataArrays.DataArray{String,2}.
Now, in Julia 0.6, legend2 = legend1' does not work any more. You can instead write legend2 = reshape(legend1, (1, :)), but that produces a rather obscure 1×2 Base.ReshapedArray{String,2,DataArrays.DataArray{String,1},Tuple{}}, which is then not accepted in the plot() call.
So, is there any way in Julia 0.6 to produce a 1×2 DataArrays.DataArray{String,2} from a 2-element DataArrays.DataArray{String,1}?
Again, posting on SO helps ;-)
I finally found that the plot works if the string concatenation is done first and the result is reshaped afterwards:
fruits_plot = plot(df, :year, :production, group=:fruit, linestyle = :solid, linewidth=3, label= reshape("Production of " * sort(unique(df[:fruit])),(1,:)), color=mycolours)
I have an array “a” like the following:
let a = [1.0, 2.0, 10.0, 0.0, 5.0] // original array
I am looking to find the number 10.0 in “a” using binary search.
For that I sort the array to get array “asr”:
let asr = a.sorted()
print(asr)
// [0.0, 1.0, 2.0, 5.0, 10.0]
A binary search for 10.0 in “asr” returns index = 4, whereas I am looking for index = 2, the position in the original array “a”. I am also looking for speed, since my arrays are long.
Any suggestions?
I paste below the binary search algorithm I am using:
func binarySearch<T: Comparable>(inputArr: Array<T>, searchItem: T) -> Int {
    var lowerIndex = 0
    var upperIndex = inputArr.count - 1
    while (true) {
        let currentIndex = (lowerIndex + upperIndex) / 2
        if (inputArr[currentIndex] == searchItem) {
            return currentIndex
        } else if (lowerIndex > upperIndex) {
            return -1
        } else {
            if (inputArr[currentIndex] > searchItem) {
                upperIndex = currentIndex - 1
            } else {
                lowerIndex = currentIndex + 1
            }
        }
    }
}
Here is an example of my x (time) and y (value) arrays. For multiple such arrays, I need to find the maximum of the values in y and store the related unique x value.
let x = [230.0, 231.0, 232.0, 233.0, 234.0, 235.0, 236.0, 237.0, 238.0, 239.0, 240.0, 241.0, 242.0, 243.0, 244.0, 245.0, 246.0, 247.0, 248.0, 249.0, 250.0, 251.0, 252.0, 253.0, 254.0, 255.0, 256.0, 257.0, 258.0, 259.0, 260.0, 261.0, 262.0, 263.0, 264.0, 265.0, 266.0, 267.0, 268.0, 269.0, 270.0, 271.0, 272.0, 273.0, 274.0, 275.0, 276.0, 277.0, 278.0, 279.0, 280.0, 281.0, 282.0, 283.0, 284.0, 285.0, 286.0, 287.0, 288.0, 289.0, 290.0, 291.0, 292.0, 293.0, 294.0, 295.0, 296.0, 297.0, 298.0, 299.0, 300.0, 301.0, 302.0, 303.0, 304.0, 305.0, 306.0, 307.0, 308.0, 309.0, 310.0, 311.0, 312.0, 313.0, 314.0, 315.0, 316.0, 317.0, 318.0, 319.0, 320.0, 321.0, 322.0, 323.0, 324.0, 325.0, 326.0, 327.0, 328.0, 329.0, 330.0, 331.0, 332.0, 333.0, 334.0, 335.0, 336.0, 337.0, 338.0, 339.0, 340.0, 341.0, 342.0, 343.0, 344.0] // unique ascending time stamps
let y = [-0.0050202642876176198, 0.022393410398194001, 0.049790254951834603, 0.077149678828730195, 0.104451119608423, 0.131674058448602, 0.15879803550636501, 0.185802665315146, 0.21266765210574901, 0.239372805059962, 0.26589805348529699, 0.29222346189943499, 0.31832924501308402, 0.34419578259991401, 0.36980363424246498, 0.39513355394291599, 0.42016650458771598, 0.444883672255248, 0.46926648035572899, 0.49329660359275201, 0.51695598173596602, 0.54022683319452802, 0.56309166838114799, 0.58553330285668204, 0.60753487024536401, 0.62907983491101405, 0.65015200438466503, 0.67073554153427395, 0.690814976467372, 0.71037521815772098, 0.72940156578721904, 0.74787971979453605, 0.765795792622184, 0.783136319153938, 0.79988826683476, 0.816039045465632, 0.83157651666592303, 0.84648900299618601, 0.86076529673452795, 0.87439466829995904, 0.88736687431637595, 0.89967216531114802, 0.91130129304248497, 0.92224551745010597, 0.93249661322397404, 0.94204687598616199, 0.95088912808120296, 0.95901672397057403, 0.96642355522725798, 0.97310405512663301, 0.979053202830236, 0.98426652715924901, 0.98874010995489703, 0.99247058902319396, 0.99545516066186102, 0.99769158176748995, 0.99917817152138799, 0.99991381265282298, 0.99989795227872502, 0.99913060231921702, 0.99761233948865402, 0.99534430486218395, 0.99232820301815805, 0.98856630075702201, 0.98406142539766805, 0.97881696265251605, 0.97283685408292797, 0.96612559413686105, 0.95868822677099297, 0.95053034165985795, 0.94165806999482904, 0.93207807987612601, 0.92179757130129403, 0.91082427075393002, 0.89916642539671898, 0.88683279687313799, 0.87383265472251104, 0.86017576941332496, 0.84587240500008198, 0.83093331140918203, 0.81536971635963695, 0.79919331692469797, 0.78241627074072795, 0.76505118686993401, 0.74711111632382299, 0.72860954225449404, 0.70956036982116599, 0.68997791573951905, 0.66987689752174295, 0.649272422415344, 0.62817997604904996, 0.60661541079433601, 0.584594933851312, 0.56213509506794002, 0.53925277450172504, 
0.51596516973324402, 0.49228978294099801, 0.46824440774739501, 0.44384711584564701, 0.41911624341769299, 0.39407037735334999, 0.36872834128103499, 0.34310918142055002, 0.31723215226859403, 0.291116702127733, 0.26478245848970899, 0.23824921328407, 0.21153690800324301, 0.18466561871516801, 0.157655540974798, 0.130526974645817, 0.10330030864392201, 0.07599600561323, 0.048634586547227202, 0.0212366153658834] // y values (could be repeating)
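It is natural to express a binary search as a recursive algorithm, and it is usually clearer, at least once you are comfortable with recursion. How about the following? Note that the helper never compares the item against the center element to pick a side: it tries the left half and then falls back to the right half, so it also works on the unsorted array, but in linear rather than logarithmic time.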
func binarySearchHelper <T:Comparable> (array: Array<T>, item:T, lower:Int, upper:Int) -> Int? {
guard lower <= upper else { return nil }
let center = (lower + upper) / 2
return array[center] == item
? center
: ((lower == center ? nil : binarySearchHelper (array: array, item: item, lower: lower, upper: center)) ??
(upper == center + 1 ? nil : binarySearchHelper (array: array, item: item, lower: center + 1, upper: upper)))
}
func binarySearch <T:Comparable> (array: Array<T>, item:T) -> Int? {
return binarySearchHelper (array: array, item: item, lower: array.startIndex, upper: array.endIndex)
}
Use another array that keeps track of the indexes, for example:
var indexArray = [0, 1, 2, 3, 4]
Then, whenever you swap two numbers in your original array while sorting, swap the corresponding values in indexArray as well.
At the end, the index array would look like this:
[3, 0, 1, 4, 2]
Using this, you can easily get the original index: the value found at position 4 of the sorted array came from index indexArray[4] = 2 of the original.
If you post the code you are using to sort, I can change it and show you how to do it.
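The index-array idea above can be sketched as follows. This is only a minimal illustration, written in JavaScript rather than Swift, and the function names sortedIndices and findOriginalIndex are made up for the example. Instead of swapping values by hand, it sorts the index positions by the values they point at, then binary-searches through that permutation and returns the position in the original array.

```javascript
// Hypothetical helper: returns the indices of `values` ordered by ascending value,
// leaving `values` itself untouched.
function sortedIndices(values) {
  return values.map((_, i) => i).sort((i, j) => values[i] - values[j]);
}

// Hypothetical helper: binary search over the index permutation `order`.
// Returns the index of `item` in the ORIGINAL array, or -1 if it is absent.
function findOriginalIndex(values, order, item) {
  let lower = 0;
  let upper = order.length - 1;
  while (lower <= upper) {
    const mid = Math.floor((lower + upper) / 2);
    const current = values[order[mid]];
    if (current === item) return order[mid];
    if (current > item) upper = mid - 1;
    else lower = mid + 1;
  }
  return -1;
}

const a = [1.0, 2.0, 10.0, 0.0, 5.0];
const order = sortedIndices(a);
console.log(order);                             // [3, 0, 1, 4, 2]
console.log(findOriginalIndex(a, order, 10.0)); // 2
```

The sort is paid once per array; each lookup afterwards is logarithmic. If a value occurs more than once, this sketch returns the original index of whichever occurrence the search lands on.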
Another method:
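You can keep a copy of your original array:
var copyArray: [Double?] = a // arrays are value types, so this assignment makes a copy
then use this to find the index of each value:
if let indexOfA = copyArray.index(where: { $0 == 10.0 }) {
    // mark the slot as used, so that a repeated value yields its next occurrence
    copyArray[indexOfA] = nil
    // OR, if the values are all positive and copyArray is a plain [Double]:
    // copyArray[indexOfA] = -1
}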
How can I split an array into chunks with a special algorithm? For example, I need to shorten an array to 10 elements. If the array has 11 elements, I want two adjacent elements to be merged. If it has 13 elements, I want three pairs of adjacent elements to be merged, and so on. Is there a solution for this?
Sample #1
var test = ['1','2','3','4','5','6','7','8','9','10','11'];
Need result = [['1'],['2'],['3'],['4'],['5|6'],['7'],['8'],['9'],['10'],['11']]
Sample #2
var test = ['1','2','3','4','5','6','7','8','9','10','11','12','13'];
Need result = [['1|2'],['3'],['4'],['5'],['6'],['7|8'],['9'],['10'],['11'],['12|13']]
Thank you in advance.
The following code most probably does what you want.
function condense(a){
var source = a.slice(),
len = a.length,
excessCount = (len - 10) % 10,
step = excessCount - 1 ? Math.floor(10/(excessCount-1)) : 0,
groupSize = Math.floor(len / 10),
template = Array(10).fill()
.map((_,i) => step ? i%step === 0 ? groupSize + 1
: i === 9 ? groupSize + 1
: groupSize
: i === 4 ? groupSize + 1
: groupSize);
return template.map(e => source.splice(0,e)
.reduce((p,c) => p + "|" + c));
}
var test1 = ['1','2','3','4','5','6','7','8','9','10','11'],
test2 = ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21'];
console.log(condense(test1));
console.log(condense(test2));
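For comparison, here is a minimal sketch of a different technique, not the approach used in condense above: it computes each chunk boundary by rounding i * length / target down, so the extra elements are spread roughly evenly and inputs with 10 or fewer elements are returned unmerged. The name condenseEvenly and the target parameter are assumptions made for this illustration, and the merged pairs do not land at the same positions as in the samples from the question.

```javascript
// Hypothetical alternative: split `items` into at most `target` chunks whose
// boundaries are floor(i * length / count), then join each chunk with "|".
function condenseEvenly(items, target = 10) {
  const count = Math.min(target, items.length);
  const result = [];
  for (let i = 0; i < count; i++) {
    const start = Math.floor((i * items.length) / count);
    const end = Math.floor(((i + 1) * items.length) / count);
    result.push(items.slice(start, end).join("|"));
  }
  return result;
}

const eleven = ['1','2','3','4','5','6','7','8','9','10','11'];
// With 11 elements the single extra element ends up in the last chunk: '10|11'.
console.log(condenseEvenly(eleven));
```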
A - Find the difference between the current length and the target length, create that many random merge positions, and put them in an array.
B - Loop through the initial array.
B1 - If the loop index is in the merge-position array (check with indexOf), merge that element with the next one and increase the index by one, so that the next element is skipped because it is already part of the merged result.
B1 example:
var mergers = [2, 7, 10];
// in the loop, when i = 2
if (mergers.indexOf(i) > -1) { // true
  var newVal = array[i] + "|" + array[i + 1]; // merges the elements at indices 2 and 3
  i++; // adds 1, so i = 3; the next iteration runs with i = 4
}
C - put new value in results array
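The steps above can be put together as a small function. This is a sketch under assumptions: the name mergeAt is made up, and the merge positions are passed in instead of being generated randomly so that the result is reproducible. The positions must not be adjacent to each other, because the element following a merge position is consumed by that merge.

```javascript
// Hypothetical helper: merges array[i] with array[i + 1] for every index i in `mergers`.
function mergeAt(array, mergers) {
  const positions = new Set(mergers);
  const result = [];
  for (let i = 0; i < array.length; i++) {
    if (positions.has(i) && i + 1 < array.length) {
      result.push(array[i] + "|" + array[i + 1]);
      i++; // skip the next element, it is already part of the merged value
    } else {
      result.push(array[i]);
    }
  }
  return result;
}

const sample = ['1','2','3','4','5','6','7','8','9','10','11'];
console.log(mergeAt(sample, [4]));
// ['1', '2', '3', '4', '5|6', '7', '8', '9', '10', '11']
```

Each merge position shortens the result by one element, so reducing a 13-element array to 10 needs three non-adjacent positions, for example [0, 6, 11].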
You can try this code
jQuery(document).ready(function(){
  var test = ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16'];
  var arrays = [];
  var checkLength = test.length;
  var getFirstSet = test.slice(0,10);        // the first ten elements
  var getOthers = test.slice(10,checkLength); // everything beyond the tenth element
  $.each( getFirstSet, function( key,value ) {
    if(key in getOthers){
      // pair the element with the overflow element at the same offset, e.g. '1|11'
      var values = value +'|'+ getOthers[key];
      arrays.push(values);
    }else{
      arrays.push(value);
    }
  });
  console.log(arrays);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>