How to plot a Real time graph in Zeppelin? - angularjs

I am trying to use zeppelin to plot a realtime graph. I have doing a sentiment analysis on tweets on per minute . I am able to query statically and plot a graph. But i would like this to be done dynamically. I am new to zeppelin and do not have much knowledge about angularJS. What should be the correct approach to this problem?
val final_score=uni_join.map{case((year,month,day,hour,minutes),(tweet_count,sentiment))=>(year, month, day, hour, minutes(sentiment/tweet_count).ceil)}
final_score.saveToCassandra("twitter", "score",writeConf = WriteConf(ttl = TTLOption.constant(1000)))
final_score.foreachRDD(score => {
val rowRDD =score.map{case(year,month,day,hour,minutes,sentiment) =>(year,month,day,hour,minutes,sentiment) }
val tempDF = sqlContext.createDataFrame(rowRDD)
z.angularBindGlobal("stream", parsed) //to bind parsed to stream.
tempDF.registerTempTable("realTimeTable")
})
Doing a query on the above table , i am able to get the graph. But i would like to dynamically update the graph every minute in order to keep in sync with the sentiment score .
Thanks prior.
[update] the angular part for the zeppelin notebook are as follows:
%angular
<div id="graph" style="height: 100%; width: 100%">
<canvas id="myChart" width="400" height="400"></canvas>
<div id="legendDiv"></div>
</div>
<script>
function initMap() {
var colorList = ["#fde577", "#ff6c40", "#c72a40", "#520833", "#a88399"]
var el = angular.element($('#stream'));
console.log("El is "+el) //returns el as object
angular.element(el).ready(function() {
console.log('Hello')
window.locationWatcher =el.$scope.$watch('stream', function(new, old){
console.log('changed');}, true)})
</script>
But running this code keeps returning the following error.
vendor.js:29 jQuery.Deferred exception: Cannot read property '$watch' of undefined TypeError: Cannot read property '$watch' of undefined
The spark version that i am using is 1.6
And Zeppelin is 0.6

spark-highcharts since version 0.6.3 support Spark Structured Streaming.
For a structuredDataFrame after aggregation, with the following code in one Zeppelin paragraph. The OutputMode can be either append or complete depends how the structureDataFrame is aggregated.
import com.knockdata.spark.highcharts._
import com.knockdata.spark.highcharts.model._
val query = highcharts(
structuredDataFrame.seriesCol("country")
.series("x" -> "year", "y" -> "stockpile")
.orderBy(col("year")), z, "append")
And the following code in the next paragraph. The chart in this paragraph will be updated when there are new data coming to the structureDataFrame.
StreamingChart(z)
Run following code to stop update the chart.
query.stop()
Here is the example generate structureDataFrame.
spark.conf.set("spark.sql.streaming.checkpointLocation","/usr/zeppelin/checkpoint")
case class NuclearStockpile(country: String, stockpile: Int, year: Int)
val USA = Seq(0, 0, 0, 0, 0, 6, 11, 32, 110, 235, 369, 640,
1005, 1436, 2063, 3057, 4618, 6444, 9822, 15468, 20434, 24126,
27387, 29459, 31056, 31982, 32040, 31233, 29224, 27342, 26662,
26956, 27912, 28999, 28965, 27826, 25579, 25722, 24826, 24605,
24304, 23464, 23708, 24099, 24357, 24237, 24401, 24344, 23586,
22380, 21004, 17287, 14747, 13076, 12555, 12144, 11009, 10950,
10871, 10824, 10577, 10527, 10475, 10421, 10358, 10295, 10104).
zip(1940 to 2006).map(p => NuclearStockpile("USA", p._1, p._2))
val USSR = Seq(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
5, 25, 50, 120, 150, 200, 426, 660, 869, 1060, 1605, 2471, 3322,
4238, 5221, 6129, 7089, 8339, 9399, 10538, 11643, 13092, 14478,
15915, 17385, 19055, 21205, 23044, 25393, 27935, 30062, 32049,
33952, 35804, 37431, 39197, 45000, 43000, 41000, 39000, 37000,
35000, 33000, 31000, 29000, 27000, 25000, 24000, 23000, 22000,
21000, 20000, 19000, 18000, 18000, 17000, 16000).
zip(1940 to 2006).map(p => NuclearStockpile("USSR/Russia", p._1, p._2))
input.addData(USA.take(30) ++ USSR.take(30))
val structureDataFrame = input.toDF
And the following code can be simulate to update the chart. The chart will be updated when the following code run.
input.addData(USA.drop(30) ++ USSR.drop(30))
NOTE: The example using Zeppelin 0.6.2 and Spark 2.0
NOTE: Please check the highcharts license for commercial usage

Related

How to have the xlim with seaborn automatically adjust based on dataframe date range

I am trying to loop through plots. Each "station" is a pandas dataframe has a single water year of data (oct 1 to Spet 29). The data is being read in with this code:
sh_784_2020 = pd.read_csv("sh_784_WY2020.csv", parse_dates=['Date'])
sh_784_2020.columns = ["Index", "Date", "Temp_C","Precip_mm","SnowDepth_cm","SWE_mm","SM2","SM8","SM20"]
My plots loop through but the x-axis always starts at the year 2000 through the current date displayed but my data is from 2006-2020. Is there a way to have the xlim adjust automatically for the date range in the data frame? Or is there a way to create this plot in matyplotlib and not seaborn?
for station in stations:
station['Density'] = station['SWE_mm']/(station['SnowDepth_cm']*10)*100
station['Density range'] = pd.cut( station['Density'], [-np.inf, 25, 30, 35, 40, np.inf])
Date = station.loc[:, 'Date'].values
SWE_mm = station.loc[:, 'SWE_mm'].values
Density = station.loc[:, 'Density'].values
sns.scatterplot(station['Date'], station['SWE_mm'], hue='Density range', data= station, edgecolor = 'none', palette=['grey', 'green', 'gold', 'orange', 'crimson'], alpha= 1)
plt.xlim ()
plt.show()
Plot example 1
Plot example 2
If you upgrade to seaborn 0.11 you should find that the default autoscaling works better, but you can get a good result without upgrading by creating the Axes object before plotting and setting the units, e.g. something like
ax = plt.figure().subplots()
ax.xaxis.update_units(station["Date"])

Turn nested lists into a table Python

sorry for my annoying questions again.
Having a bit of trouble with some code. The task is to write a function that takes a nested list of currency conversions and turn it into a table (I've attached some pictures for clarification)
(could only attach one image, this is the nested list it converts)
[[10, 9.6, 7.5, 6.7, 4.96], [20, 19.2, 15.0, 13.4, 9.92], [30, 28.799999999999997, 22.5, 20.1, 14.879999999999999], [40, 38.4, 30.0, 26.8, 19.84], [50, 48.0, 37.5, 33.5, 24.8], [60, 57.599999999999994, 45.0, 40.2, 29.759999999999998], [70, 67.2, 52.5, 46.900000000000006, 34.72], [80, 76.8, 60.0, 53.6, 39.68], [90, 86.39999999999999, 67.5, 60.300000000000004, 44.64], [100, 96.0, 75.0, 67.0, 49.6]]
I've got the Header column for the table to work fine.
I'm having issues when I'm trying to iterate over each sublist in the nested list, convert it to a string (and two decimal places) and with a tab between each entry.
the code I've got so far is:
def printTable(cur):
list2 = makeTable(cur)
lst1 = Extract(cur)
lst1.insert(0, "$NZD")
lne1 = "\t".join(lst1)
print(lne1)
list=map(str,list2)
print(list2)
for list in list2:
for elem in list:
linelem = "\t".join(elem)
print(linelem)
printTable(cur)
(Note: The first function I call and assign to list2 is what generates the data/nested list)
I've tried playing around a bit but I keep coming up with different error messages trying to convert each sublist to a string.
Thank you all for your help!![enter image description here][1]
Try using this method;
import pandas as pd
import csv
with open("out.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerows(a)
pd.read_csv("out.csv", header = [col1, col2, col3, col4, col5])

Can we calculate time taken to execute each step in protractor?

I have a task in which I have to calculate the time taken by each step, for example if we are clicking on a link, how much time is taken to load the page and next step to be executed.
I want fail the test case if time taken is more then say 2 seconds.
I have tried using protractor-perf and it gives me the readings below and these don't help, or I am not able to read anything correctly.
{ Styles: 0,
Javascript: 0,
numAnimationFrames: 4625,
numFramesSentToScreen: 4625,
droppedFrameCount: 417,
meanFrameTime_raf: 18.242160830084256,
framesPerSec_raf: 54.81806729556069,
connectEnd: 1524251882749,
connectStart: 1524251882459,
domComplete: 1524251916054,
domContentLoadedEventEnd: 1524251916053,
domContentLoadedEventStart: 1524251916050,
domInteractive: 1524251916050,
domLoading: 1524251883038,
domainLookupEnd: 1524251882459,
domainLookupStart: 1524251882459,
fetchStart: 1524251882458,
firstPaint: 33600.99983215332,
loadEventEnd: 1524251916055,
loadEventStart: 1524251916054,
navigationStart: 1524251882456,
redirectEnd: 0,
redirectStart: 0,
requestStart: 1524251882749,
responseEnd: 1524251883592,
responseStart: 1524251883032,
secureConnectionStart: 0,
unloadEventEnd: 0,
unloadEventStart: 0,
loadTime: 33597,
domReadyTime: 4,
readyStart: 2,
redirectTime: 0,
appcacheTime: 1,
unloadEventTime: 0,
domainLookupTime: 0,
connectTime: 290,
requestTime: 843,
initDomTreeTime: 32458,
loadEventTime: 1 }
I have also tried using log-timestamp and I can print the timestamp to log but cannot get the difference to use it in a variable and fail the test case.
I get the output like this in the log:
[2018-04-20T19:19:13.325Z] Start
[2018-04-20T19:19:14.046Z] Step1
[2018-04-20T19:19:47.667Z] Step2
[2018-04-20T19:19:50.304Z] Step3
[2018-04-20T19:19:52.111Z] Step4
[2018-04-20T19:19:57.344Z] Step5
[2018-04-20T19:19:59.029Z] Step6
I would really appreciate your help guys, this has wasted a lot my time.

Why can't I retrieve the master appointment of a series via `AppointmentCalendar.FindAppointmentsAsync`?

I'm retrieving multiple appointments via AppointmentCalendar.FindAppointmentsAsync. I'm evaluating the Recurrence.RecurrenceType and noticed an unexpected value of 1 for master appointments of a series. I expect the Recurrence.RecurrenceType to be 0 (Master) but instead it is 1 (Instance).
(Note: I added AppointmentProperties.Recurrence to FindAppointmentsOptions.FetchProperties that is passed to GetAppointmentsAsync, so the Recurrence data should be fetched propertly.)
To double check I retrieved the respective master appointment via GetAppointmentAsync (instead of FindAppointmentsAsync) using its LocalId - and here the RecurrenceType is correctly set to 0.
Here is demo output for a test appointment series:
Data gotten by FindAppointmentsAsync (Instance??):
"Recurrence": {
"Unit": 0,
"Occurrences": 16,
"Month": 1,
"Interval": 1,
"DaysOfWeek": 0,
"Day": 1,
"WeekOfMonth": 0,
"Until": "2016-09-29T02:00:00+02:00",
"TimeZone": "Europe/Budapest",
"RecurrenceType": 1,
"CalendarIdentifier": "GregorianCalendar"
},
"StartTime": "2016-09-14T19:00:00+02:00",
"OriginalStartTime": "2016-09-14T19:00:00+02:00",
Data gotten by GetAppointmentAsync for the same appointment (Master):
"Recurrence": {
"Unit": 0,
"Occurrences": 16,
"Month": 1,
"Interval": 1,
"DaysOfWeek": 0,
"Day": 1,
"WeekOfMonth": 0,
"Until": "2016-09-29T02:00:00+02:00",
"TimeZone": "Europe/Budapest",
"RecurrenceType": 0,
"CalendarIdentifier": "GregorianCalendar"
},
"StartTime": "2016-09-14T19:00:00+02:00",
"OriginalStartTime": null,
Notice the difference in RecurrenceType. Also note that OriginalStartTime is set to null for the master gotten by GetAppointmentAsync but has a value for the appointment gotten by FindAppointmentsAsync.
You can also see that the StartTime for the master appointment is the start time set for the alleged Instance (which in reality is the master).
Shouldn't FindAppointmentsAsync return a master as the first element of a series, instead of an instance?
(SDK: 10.0.14393.0, Anniversary)
Code to explicitly find such a master/instance situation for a given calendar:
var appointmentsCurrent = await calendar.FindAppointmentsAsync(DateTimeOffset.Now, TimeSpan.FromDays(365), findAppointmentOptions);
foreach(var a in appointmentsCurrent)
{
var a2 = await calendar.GetAppointmentAsync(a.LocalId);
if (a2.Recurrence?.RecurrenceType == RecurrenceType.Master &&
a2.StartTime == a.StartTime &&
a.Recurrence?.RecurrenceType == RecurrenceType.Instance &&
a.OriginalStartTime == a2.StartTime)
{
Debug.WriteLine("Gotcha!");
}
}
I tested above code on my side. If you get the count of the appointments which are got from FindAppontmentsAsync by the following code:var count=appointmentsCurrent.Count;, you will find it does return the count of the appointment instances, not the count of master appointments. So the FindAppontmentsAsync method got all instances of the appointments not master appointments. This is the reason why the RecurrenceType is instance.
It seems like we can get one master appointment by method GetAppointmentAsync as you mentioned above, so I suppose this may not block you.
If you think this is not a good design for this API or you require a API for finding all the master appointments in one calendar, you can submit your ideas to the windows 10 feedback tool or the user voice site.

Showing colour in cal-heatmap tiles

I'm using cal-heatmap for displaying a user activity for a month. My issue is that the colour change is not showing properly. My "init" function is given below.
When I provide data with integer values which have difference of 2 or 3 (eg: 8, 12, 3, 7 etc [pls note that I'm giving data as JSON]), I can't see any significant difference in colour for the blocks (screenshot is added in http://i.stack.imgur.com/xspWR.jpg - numbers given in top indicates the data corresponding to that cell).
init({
start: new Date(newDate.getFullYear(), month, 1),
cellRadius: 35,
cellSize: 58,
itemSelector: "#heatmap_busiestDays",
domain: "month", //hour|day|week|month|year
subDomain: "x_day",
subDomainTextFormat: "%b %d",
range: 1,
domainGutter: 10,
previousSelector: "#cal-heatmap-previous",
nextSelector: "#cal-heatmap-next",
displayLegend: false,
data: busiestDayHeatMap,
legendColors: {
min: "#A2F37B",
max: "#26911F",
empty: "white"
}
});
Am I missing anything in settings? Any help will be greatly appreciated. Thanks in advance.

Resources