Good afternoon
I am trying to pull some data from Pushshift, but each request maxes out at 100 posts. The code below pulls one day and works well.
testdata1<-getPushshiftData(postType = "submission", size = 1000, before = "1546300800", after= "1546200800", subreddit = "mysubreddit", nest_level = 1)
I have a list of UTC epoch timestamps for the beginning and end of each day for a month. What I would like is syntax that substitutes the "after" and "before" values for each day and appends each day's results to the end of the pulled data. Even if it put the data into a bunch of separate smaller datasets, I could work with it.
Here is my (feeble) attempt. "links" is the data frame with the UTCs
mydata<- lapply(1:30, function(x) getPushshiftData(postType = "submission", size = 1000, after= links$utcstart[,x],before = links$utcendstart[,x], subreddit = "mysubreddit", nest_level = 1))
Here is the error message I get: Error in links$utcstart[, x] : incorrect number of dimensions
I've also tried without the "function (x)" argument and get the following message:
Error in ifelse(is.null(after), "", sprintf("&after=%s", after)) :
object 'x' not found
Can anyone help with this?
Looking for some kind of solution to this issue:
trying to create a tensor from an array of timestamps
[
1612892067115,
],
but here is what happens
tf.tensor([1612892067115]).arraySync()
> [ 1612892078080 ]
as you can see, the result is incorrect.
Somebody pointed out that I may need to use the int64 datatype, but this doesn't seem to exist in tfjs.
I have also tried dividing my timestamp down to a small float, but I get a similar result
tf.tensor([1.612892067115, 1.612892068341]).arraySync()
[ 1.6128920316696167, 1.6128920316696167 ]
If you know a way to work around using timestamps in a tensor, please help :)
:edit:
As an attempted workaround, I tried to remove my year, month, and date from my timestamp
Here are my subsequent input values:
[
56969701,
56969685,
56969669,
56969646,
56969607,
56969602
]
and their outputs:
[
56969700,
56969684,
56969668,
56969648,
56969608,
56969600
]
as you can see, they are still incorrect, and should be well within the acceptable range
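A likely explanation for these outputs, offered as an assumption rather than something confirmed in this thread: `tf.tensor` uses the `float32` dtype by default for plain numbers, and float32 represents every integer exactly only up to 16,777,216. The sketch below uses the standard `Math.fround`, which rounds a number to the nearest float32 value, to reproduce the rounding without TensorFlow.js.

```javascript
// Math.fround rounds a JS number (float64) to the nearest float32 value.
// Assumption: the tensors above were created with the default float32 dtype.
const inputs = [56969701, 56969685, 56969669, 56969646, 56969607, 56969602];
const asFloat32 = inputs.map((v) => Math.fround(v));

// Above this limit, float32 can no longer represent every integer exactly.
const FLOAT32_EXACT_INT_LIMIT = 2 ** 24; // 16777216

console.log(asFloat32); // same values as the outputs listed in the post
console.log(Math.fround(1612892067115)); // 1612892078080, as in the first example
console.log(inputs.every((v) => v > FLOAT32_EXACT_INT_LIMIT)); // true
```

If this is the cause, passing `'int32'` as the dtype keeps these values exact, because they are well below 2^31 - 1.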
I found a solution that worked for me:
Since I only need part of the timestamp for my purposes (just the date / hour / minute / second / ms), I simply subtract out the year and month:
export const subts = (ts: number) => {
// a sub timestamp which can be used over the period of a month
const yearMonth = +new Date(new Date().getFullYear(), new Date().getMonth())
return ts - yearMonth
}
then I can use this with:
const subTimestamps = timestamps.map(ts => subts(ts))
const x_vals = tf.tensor(subTimestamps, [subTimestamps.length], 'int32')
now all my results work as expected.
Currently tensorflow.js supports only int32 for integers, and your data has gone out of the range supported by int32.
Until int64 is supported, this can be solved by using a relative timestamp. A timestamp in js is the number of ms elapsed since 1 January 1970. A relative timestamp picks a different origin and stores the number of ms elapsed since that date instead. That gives a smaller number that can be represented as int32. The best origin to pick is the starting date of the records.
const a = Date.now() // building a tensor directly from this gives an inaccurate result, since the number is out of range
const origin = new Date("02/01/2021").getTime()
const relative = a - origin // must stay below 2^31 - 1 ms (about 24.8 days) to fit in int32
const tensor = tf.tensor(relative, undefined, 'int32')
// get back the data
const data = tensor.dataSync()[0]
// get the initial date
const initialDate = new Date(data + origin)
In other scenarios where millisecond resolution is not needed, it is better to store the number of seconds elapsed since the origin, which covers a much longer span. Counting seconds from 1 January 1970 is known as Unix time.
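The relative-timestamp idea can be sketched without TensorFlow.js by using a plain `Int32Array` as a stand-in for an `int32` tensor. The helper names `toRelative` and `fromRelative` are hypothetical, and the origin is taken as the earliest record, as suggested above.

```javascript
const INT32_MAX = 2 ** 31 - 1;

// Convert absolute ms timestamps to offsets from an origin, in `unitMs` steps.
// unitMs = 1 keeps milliseconds; unitMs = 1000 stores whole seconds.
function toRelative(timestamps, origin, unitMs = 1) {
  const out = new Int32Array(timestamps.length);
  timestamps.forEach((ts, i) => {
    const offset = Math.floor((ts - origin) / unitMs);
    if (offset < 0 || offset > INT32_MAX) {
      throw new RangeError(`offset ${offset} does not fit in int32`);
    }
    out[i] = offset;
  });
  return out;
}

// Recover absolute ms timestamps from the stored offsets.
function fromRelative(offsets, origin, unitMs = 1) {
  return Array.from(offsets, (o) => o * unitMs + origin);
}

const timestamps = [1612892067115, 1612892068341];
const origin = Math.min(...timestamps);
const relative = toRelative(timestamps, origin);
console.log(Array.from(relative)); // [ 0, 1226 ]
console.log(fromRelative(relative, origin)); // the original timestamps
```

With millisecond offsets, int32 covers about 24.8 days from the origin; with one-second offsets it covers roughly 68 years, at the cost of sub-second precision.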
I have a task in which I have to measure the time taken by each step. For example, when clicking a link, I need to know how long the page takes to load before the next step can run.
I want to fail the test case if the time taken is more than, say, 2 seconds.
I have tried using protractor-perf, which gives me the readings below, but they don't help, or I am not able to interpret them correctly.
{ Styles: 0,
Javascript: 0,
numAnimationFrames: 4625,
numFramesSentToScreen: 4625,
droppedFrameCount: 417,
meanFrameTime_raf: 18.242160830084256,
framesPerSec_raf: 54.81806729556069,
connectEnd: 1524251882749,
connectStart: 1524251882459,
domComplete: 1524251916054,
domContentLoadedEventEnd: 1524251916053,
domContentLoadedEventStart: 1524251916050,
domInteractive: 1524251916050,
domLoading: 1524251883038,
domainLookupEnd: 1524251882459,
domainLookupStart: 1524251882459,
fetchStart: 1524251882458,
firstPaint: 33600.99983215332,
loadEventEnd: 1524251916055,
loadEventStart: 1524251916054,
navigationStart: 1524251882456,
redirectEnd: 0,
redirectStart: 0,
requestStart: 1524251882749,
responseEnd: 1524251883592,
responseStart: 1524251883032,
secureConnectionStart: 0,
unloadEventEnd: 0,
unloadEventStart: 0,
loadTime: 33597,
domReadyTime: 4,
readyStart: 2,
redirectTime: 0,
appcacheTime: 1,
unloadEventTime: 0,
domainLookupTime: 0,
connectTime: 290,
requestTime: 843,
initDomTreeTime: 32458,
loadEventTime: 1 }
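The large fields in this output are epoch timestamps in milliseconds, and the small fields at the bottom appear to be simple differences between them. The sketch below recomputes a few of them from the numbers above and compares each with a 2-second budget. The pairing of fields is inferred from the arithmetic matching this output, not taken from the protractor-perf documentation, and `BUDGET_MS` is a hypothetical name.

```javascript
// Raw timing values copied from the output above (ms since epoch).
const t = {
  fetchStart: 1524251882458,
  connectStart: 1524251882459,
  connectEnd: 1524251882749,
  requestStart: 1524251882749,
  responseEnd: 1524251883592,
  domInteractive: 1524251916050,
  loadEventEnd: 1524251916055,
};

// Differences that reproduce the derived fields reported above.
const derived = {
  loadTime: t.loadEventEnd - t.fetchStart,           // 33597
  connectTime: t.connectEnd - t.connectStart,        // 290
  requestTime: t.responseEnd - t.requestStart,       // 843
  initDomTreeTime: t.domInteractive - t.responseEnd, // 32458
};

const BUDGET_MS = 2000; // hypothetical limit, taken from the 2 seconds in the question
const overBudget = Object.entries(derived)
  .filter(([, ms]) => ms > BUDGET_MS)
  .map(([name]) => name);

console.log(derived);
console.log(overBudget); // [ 'loadTime', 'initDomTreeTime' ]
```

Assuming a Jasmine-style runner, an expectation such as `expect(derived.loadTime).toBeLessThan(BUDGET_MS)` would then fail the spec when the page load exceeds the budget.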
I have also tried using log-timestamp and I can print the timestamp to log but cannot get the difference to use it in a variable and fail the test case.
I get the output like this in the log:
[2018-04-20T19:19:13.325Z] Start
[2018-04-20T19:19:14.046Z] Step1
[2018-04-20T19:19:47.667Z] Step2
[2018-04-20T19:19:50.304Z] Step3
[2018-04-20T19:19:52.111Z] Step4
[2018-04-20T19:19:57.344Z] Step5
[2018-04-20T19:19:59.029Z] Step6
I would really appreciate your help, guys; this has wasted a lot of my time.
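One way to turn log lines like these into per-step durations is sketched below: parse the ISO timestamp at the start of each line and subtract consecutive values. The function names and `LIMIT_MS` are hypothetical, and the line format is assumed to be exactly `[timestamp] label` as in the output above.

```javascript
// Log lines copied from the output above.
const logLines = [
  "[2018-04-20T19:19:13.325Z] Start",
  "[2018-04-20T19:19:14.046Z] Step1",
  "[2018-04-20T19:19:47.667Z] Step2",
  "[2018-04-20T19:19:50.304Z] Step3",
  "[2018-04-20T19:19:52.111Z] Step4",
  "[2018-04-20T19:19:57.344Z] Step5",
  "[2018-04-20T19:19:59.029Z] Step6",
];

// Parse "[ISO timestamp] label" into { label, time }, with time in ms.
function parseLine(line) {
  const match = /^\[(.+?)\]\s+(.*)$/.exec(line);
  if (!match) throw new Error(`unrecognised log line: ${line}`);
  return { label: match[2], time: Date.parse(match[1]) };
}

// Elapsed ms between each entry and the one before it.
function stepDurations(lines) {
  const entries = lines.map(parseLine);
  return entries.slice(1).map((entry, i) => ({
    step: entry.label,
    elapsedMs: entry.time - entries[i].time,
  }));
}

const LIMIT_MS = 2000; // hypothetical limit, taken from the 2 seconds in the question
const durations = stepDurations(logLines);
const tooSlow = durations.filter((d) => d.elapsedMs > LIMIT_MS);

console.log(durations);
console.log(tooSlow.map((d) => d.step)); // [ 'Step2', 'Step3', 'Step5' ]
```

Inside the test itself it is probably simpler to capture `Date.now()` before and after each step and assert on the difference, which avoids parsing the log at all.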
I am trying to use Zeppelin to plot a realtime graph. I am doing sentiment analysis on tweets on a per-minute basis. I am able to query statically and plot a graph, but I would like this to be done dynamically. I am new to Zeppelin and do not have much knowledge of AngularJS. What would be the correct approach to this problem?
val final_score=uni_join.map{case((year,month,day,hour,minutes),(tweet_count,sentiment))=>(year, month, day, hour, minutes, (sentiment/tweet_count).ceil)}
final_score.saveToCassandra("twitter", "score",writeConf = WriteConf(ttl = TTLOption.constant(1000)))
final_score.foreachRDD(score => {
val rowRDD =score.map{case(year,month,day,hour,minutes,sentiment) =>(year,month,day,hour,minutes,sentiment) }
val tempDF = sqlContext.createDataFrame(rowRDD)
z.angularBindGlobal("stream", parsed) //to bind parsed to stream.
tempDF.registerTempTable("realTimeTable")
})
Querying the above table gives me the graph, but I would like the graph to update dynamically every minute so that it stays in sync with the sentiment score.
Thanks in advance.
[update] The angular part of the Zeppelin notebook is as follows:
%angular
<div id="graph" style="height: 100%; width: 100%">
<canvas id="myChart" width="400" height="400"></canvas>
<div id="legendDiv"></div>
</div>
<script>
function initMap() {
var colorList = ["#fde577", "#ff6c40", "#c72a40", "#520833", "#a88399"]
var el = angular.element($('#stream'));
console.log("El is "+el) //returns el as object
angular.element(el).ready(function() {
console.log('Hello')
// "new" is a reserved word in JavaScript, so the callback parameters are renamed
window.locationWatcher = el.$scope.$watch('stream', function(newValue, oldValue){
console.log('changed');}, true)})
}
</script>
But running this code keeps returning the following error.
vendor.js:29 jQuery.Deferred exception: Cannot read property '$watch' of undefined TypeError: Cannot read property '$watch' of undefined
The spark version that i am using is 1.6
And Zeppelin is 0.6
spark-highcharts has supported Spark Structured Streaming since version 0.6.3.
For a structuredDataFrame after aggregation, put the following code in one Zeppelin paragraph. The OutputMode can be either append or complete, depending on how the structuredDataFrame is aggregated.
import com.knockdata.spark.highcharts._
import com.knockdata.spark.highcharts.model._
val query = highcharts(
structuredDataFrame.seriesCol("country")
.series("x" -> "year", "y" -> "stockpile")
.orderBy(col("year")), z, "append")
Then put the following code in the next paragraph. The chart in this paragraph is updated whenever new data arrives in the structuredDataFrame.
StreamingChart(z)
Run the following code to stop updating the chart.
query.stop()
Here is an example that generates the structuredDataFrame.
spark.conf.set("spark.sql.streaming.checkpointLocation","/usr/zeppelin/checkpoint")
case class NuclearStockpile(country: String, stockpile: Int, year: Int)
val USA = Seq(0, 0, 0, 0, 0, 6, 11, 32, 110, 235, 369, 640,
1005, 1436, 2063, 3057, 4618, 6444, 9822, 15468, 20434, 24126,
27387, 29459, 31056, 31982, 32040, 31233, 29224, 27342, 26662,
26956, 27912, 28999, 28965, 27826, 25579, 25722, 24826, 24605,
24304, 23464, 23708, 24099, 24357, 24237, 24401, 24344, 23586,
22380, 21004, 17287, 14747, 13076, 12555, 12144, 11009, 10950,
10871, 10824, 10577, 10527, 10475, 10421, 10358, 10295, 10104).
zip(1940 to 2006).map(p => NuclearStockpile("USA", p._1, p._2))
val USSR = Seq(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
5, 25, 50, 120, 150, 200, 426, 660, 869, 1060, 1605, 2471, 3322,
4238, 5221, 6129, 7089, 8339, 9399, 10538, 11643, 13092, 14478,
15915, 17385, 19055, 21205, 23044, 25393, 27935, 30062, 32049,
33952, 35804, 37431, 39197, 45000, 43000, 41000, 39000, 37000,
35000, 33000, 31000, 29000, 27000, 25000, 24000, 23000, 22000,
21000, 20000, 19000, 18000, 18000, 17000, 16000).
zip(1940 to 2006).map(p => NuclearStockpile("USSR/Russia", p._1, p._2))
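// Assumption: `input` is a MemoryStream; its definition is not shown in the original.
import org.apache.spark.sql.execution.streaming.MemoryStream
implicit val sqlContext = spark.sqlContext
val input = MemoryStream[NuclearStockpile]
input.addData(USA.take(30) ++ USSR.take(30))
val structuredDataFrame = input.toDF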
The following code simulates new data arriving. The chart is updated when it runs.
input.addData(USA.drop(30) ++ USSR.drop(30))
NOTE: The example uses Zeppelin 0.6.2 and Spark 2.0
NOTE: Please check the highcharts license for commercial usage
I'm using cal-heatmap to display user activity for a month. My issue is that the colour change is not showing properly. My "init" function is given below.
When I provide integer values that differ by only 2 or 3 (e.g. 8, 12, 3, 7; note that I'm supplying the data as JSON), I can't see any significant difference in colour between the blocks (a screenshot is at http://i.stack.imgur.com/xspWR.jpg; the numbers at the top of each cell indicate the data for that cell).
init({
start: new Date(newDate.getFullYear(), month, 1),
cellRadius: 35,
cellSize: 58,
itemSelector: "#heatmap_busiestDays",
domain: "month", //hour|day|week|month|year
subDomain: "x_day",
subDomainTextFormat: "%b %d",
range: 1,
domainGutter: 10,
previousSelector: "#cal-heatmap-previous",
nextSelector: "#cal-heatmap-next",
displayLegend: false,
data: busiestDayHeatMap,
legendColors: {
min: "#A2F37B",
max: "#26911F",
empty: "white"
}
});
Am I missing anything in settings? Any help will be greatly appreciated. Thanks in advance.
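One thing that may be worth checking, stated here as an assumption to verify against the cal-heatmap documentation rather than as a confirmed answer: cal-heatmap 3.x has a `legend` option holding the value thresholds that separate the colour steps, and if its default thresholds are much larger than the data (values of 3 to 12 here), most cells would land in the same step. The sketch below derives evenly spaced thresholds from the data itself; `legendThresholds` is a hypothetical helper.

```javascript
// Build `steps` evenly spaced thresholds strictly between the min and max of the data.
function legendThresholds(values, steps = 4) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const width = (max - min) / (steps + 1);
  return Array.from({ length: steps }, (_, i) =>
    // Round to two decimals to hide floating-point noise.
    Math.round((min + width * (i + 1)) * 100) / 100
  );
}

const sample = [8, 12, 3, 7]; // example values from the question
console.log(legendThresholds(sample)); // [ 4.8, 6.6, 8.4, 10.2 ]
```

The result could then be passed as `legend: legendThresholds(Object.values(busiestDayHeatMap))` in the init options, assuming `busiestDayHeatMap` is an object that maps timestamps to values.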