I am using this query for creating the index:
db.CollectionName.createIndex({result: {$exists:true}, timestamp : {$gte: 1573890921898000}})
What am I trying to do here is, creating indexing on timestamp > last month and for only those data where result exist, but I am getting error
{
"ok" : 0,
"errmsg" : "Values in v:2 index key pattern cannot be of type object. Only numbers > 0, numbers < 0, and strings are allowed.",
"code" : 67,
"codeName" : "CannotCreateIndex"
}
What am I doing wrong here?
thanks to #prasad_ for the suggestion, i needed to use partial_Indexes so the query becomes :
db.CollectionName.createIndex({result: 1, timestamp: 1}, {partialFilterExpression :{result: {$exists:true}, timestamp : {$gte: 1573890921898000}}})
Find total number of employees group by each state using aggregate
I tried the following in the screenshot link below. But the result is 0.
db.research.aggregate({$unwind:'$offices'},{"$match":
{'offices.country_code':"USA"}},{"$project": {'offices.state_code' : 1}},
{"$group" : {"_id":'$of
fices.state_code',"count" : {"$sum":'$number_of_employees'}}})
you need to $project number_of_employees to count in next stage, or you can remove the $project stage
db.research.aggregate([
{$unwind:'$offices'},
{"$match": {'offices.country_code':"USA"}},
{"$project": {'offices.state_code' : 1, 'number_of_employees' : 1}}, //project number_of_employees
{"$group" : {"_id":'$offices.state_code',"count" : {"$sum":'$number_of_employees'}}}
])
or
db.research.aggregate([
{$unwind:'$offices'},
{"$match": {'offices.country_code':"USA"}},
{"$group" : {"_id":'$offices.state_code',"count" : {"$sum":'$number_of_employees'}}}
])
Below is the firebase Database of a child node of a particular user under the "users" node:
"L1Bczun2d5UTZC8g2LXchLJVXsh1" : {
"email" : "orabbz#yahoo.com",
"fullname" : "orabueze yea",
"teamname" : "orabbz team",
"total" : 0,
"userName" : "orabbz#yahoo.com",
"week1" : 0,
"week10" : 0,
"week11" : 0,
"week12" : 0,
"week2" : 0,
"week3" : 17,
"week4" : 0,
"week5" : 20,
"week6" : 0,
"week7" : 0,
"week8" : 0,
"week9" : 10
},
IS there a way to add up all the values of Weeks 1 down to week 12 and have the total sum in the total key?
I am curently thinking of bringing all the values of week 1 - week 12 brought into the angular js scope then adding up the values and then posting the total back in the firebase databse key total. But this sounds too long winded. is there a shorter solution?
As far as I know, Firebase DB doesn't have any DB functions as you'd have in SQL. So the options you have is to get the data and calculate it in Angular as you say. Or update a counter when the weeks are added to the user (when writing to the DB). Then just read the counter later
I am trying to use zeppelin to plot a realtime graph. I have doing a sentiment analysis on tweets on per minute . I am able to query statically and plot a graph. But i would like this to be done dynamically. I am new to zeppelin and do not have much knowledge about angularJS. What should be the correct approach to this problem?
val final_score=uni_join.map{case((year,month,day,hour,minutes),(tweet_count,sentiment))=>(year, month, day, hour, minutes(sentiment/tweet_count).ceil)}
final_score.saveToCassandra("twitter", "score",writeConf = WriteConf(ttl = TTLOption.constant(1000)))
final_score.foreachRDD(score => {
val rowRDD =score.map{case(year,month,day,hour,minutes,sentiment) =>(year,month,day,hour,minutes,sentiment) }
val tempDF = sqlContext.createDataFrame(rowRDD)
z.angularBindGlobal("stream", parsed) //to bind parsed to stream.
tempDF.registerTempTable("realTimeTable")
})
Doing a query on the above table , i am able to get the graph. But i would like to dynamically update the graph every minute in order to keep in sync with the sentiment score .
Thanks prior.
[update] the angular part for the zeppelin notebook are as follows:
%angular
<div id="graph" style="height: 100%; width: 100%">
<canvas id="myChart" width="400" height="400"></canvas>
<div id="legendDiv"></div>
</div>
<script>
function initMap() {
var colorList = ["#fde577", "#ff6c40", "#c72a40", "#520833", "#a88399"]
var el = angular.element($('#stream'));
console.log("El is "+el) //returns el as object
angular.element(el).ready(function() {
console.log('Hello')
window.locationWatcher =el.$scope.$watch('stream', function(new, old){
console.log('changed');}, true)})
</script>
But running this code keeps returning the following error.
vendor.js:29 jQuery.Deferred exception: Cannot read property '$watch' of undefined TypeError: Cannot read property '$watch' of undefined
The spark version that i am using is 1.6
And Zeppelin is 0.6
spark-highcharts since version 0.6.3 support Spark Structured Streaming.
For a structuredDataFrame after aggregation, with the following code in one Zeppelin paragraph. The OutputMode can be either append or complete depends how the structureDataFrame is aggregated.
import com.knockdata.spark.highcharts._
import com.knockdata.spark.highcharts.model._
val query = highcharts(
structuredDataFrame.seriesCol("country")
.series("x" -> "year", "y" -> "stockpile")
.orderBy(col("year")), z, "append")
And the following code in the next paragraph. The chart in this paragraph will be updated when there are new data coming to the structureDataFrame.
StreamingChart(z)
Run following code to stop update the chart.
query.stop()
Here is the example generate structureDataFrame.
spark.conf.set("spark.sql.streaming.checkpointLocation","/usr/zeppelin/checkpoint")
case class NuclearStockpile(country: String, stockpile: Int, year: Int)
val USA = Seq(0, 0, 0, 0, 0, 6, 11, 32, 110, 235, 369, 640,
1005, 1436, 2063, 3057, 4618, 6444, 9822, 15468, 20434, 24126,
27387, 29459, 31056, 31982, 32040, 31233, 29224, 27342, 26662,
26956, 27912, 28999, 28965, 27826, 25579, 25722, 24826, 24605,
24304, 23464, 23708, 24099, 24357, 24237, 24401, 24344, 23586,
22380, 21004, 17287, 14747, 13076, 12555, 12144, 11009, 10950,
10871, 10824, 10577, 10527, 10475, 10421, 10358, 10295, 10104).
zip(1940 to 2006).map(p => NuclearStockpile("USA", p._1, p._2))
val USSR = Seq(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
5, 25, 50, 120, 150, 200, 426, 660, 869, 1060, 1605, 2471, 3322,
4238, 5221, 6129, 7089, 8339, 9399, 10538, 11643, 13092, 14478,
15915, 17385, 19055, 21205, 23044, 25393, 27935, 30062, 32049,
33952, 35804, 37431, 39197, 45000, 43000, 41000, 39000, 37000,
35000, 33000, 31000, 29000, 27000, 25000, 24000, 23000, 22000,
21000, 20000, 19000, 18000, 18000, 17000, 16000).
zip(1940 to 2006).map(p => NuclearStockpile("USSR/Russia", p._1, p._2))
input.addData(USA.take(30) ++ USSR.take(30))
val structureDataFrame = input.toDF
And the following code can be simulate to update the chart. The chart will be updated when the following code run.
input.addData(USA.drop(30) ++ USSR.drop(30))
NOTE: The example using Zeppelin 0.6.2 and Spark 2.0
NOTE: Please check the highcharts license for commercial usage
I'm querying a database containing entries as displayed in the example. All entries contain the following values:
_id: unique id of overallitem and placed_items
name: the name of te overallitem
loc: location of the overallitem and placed_items
time_id: time the overallitem was stored
placed_items: array containing placed_items (can range from zero: placed_items : [], to unlimited amount.
category_id: the category of the placed_items
full_id: the full id of the placed_items
I want to extract the name, full_id and category_id on a per placed_items level given a time_id and loc constraint
Example data:
{
"_id" : "5040",
"name" : "entry1",
"loc" : 1,
"time_id" : 20121001,
"placed_items" : [],
}
{
"_id" : "5041",
"name" : "entry2",
"loc" : 1,
"time_id" : 20121001,
"placed_items" : [
{
"_id" : "5043",
"category_id" : 101,
"full_id" : 901,
},
{
"_id" : "5044",
"category_id" : 102,
"full_id" : 902,
}
],
}
{
"_id" : "5042",
"name" : "entry3",
"loc" : 1,
"time_id" : 20121001,
"placed_items" : [
{
"_id" : "5045",
"category_id" : 101,
"full_id" : 903,
},
],
}
The expected outcome for this example would be:
"name" "full_id" "category_id"
"entry2" 901 101
"entry2" 902 102
"entry3" 903 101
So if placed_items is empty, do put the entry in the dataframe and if placed_items containts n entries, put n entries in dataframe
I tried to work out an RBlogger example to create the desired dataframe.
#Set up database
mongo <- mongo.create()
#Set up condition
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "loc", 1)
mongo.bson.buffer.start.object(buf, "time_id")
mongo.bson.buffer.append(buf, "$gte", 20120930)
mongo.bson.buffer.append(buf, "$lte", 20121002)
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
#Count
count <- mongo.count(mongo, "items_test.overallitem", query)
#Note that these counts don't work, since the count should be based on
#the number of placed_items in the array, and not the number of entries.
#Setup Cursor
cursor <- mongo.find(mongo, "items_test.overallitem", query)
#Create vectors, which will be filled by the while loop
name <- vector("character", count)
full_id<- vector("character", count)
category_id<- vector("character", count)
i <- 1
#Fill vectors
while (mongo.cursor.next(cursor)) {
b <- mongo.cursor.value(cursor)
order_id[i] <- mongo.bson.value(b, "name")
product_id[i] <- mongo.bson.value(b, "placed_items.full_id")
category_id[i] <- mongo.bson.value(b, "placed_items.category_id")
i <- i + 1
}
#Convert to dataframe
results <- as.data.frame(list(name=name, full_id=full_uid, category_id=category_id))
The conditions work and the code works if I would want to extract values on an overallitem level (i.e. _id or name) but fails to gather the information on a placed_items level. Furthermore, the dotted call for extracting full_id and category_id does not seem to work. Can anyone help?