I'm building a system that filters items according to the user's request, and for one part I must filter by all of the selected attributes. Here is my query builder:
return EncyclopedieModel::with('stats')
->join('equipement_stats', 'equipement_stats.id_equipement', '=', 'equipement.id_equipement')
->join('stats', 'stats.id_stats','=','equipement_stats.id_stats')
->whereIn('stats.id_stats', $filterStats)
->whereIn('id_typeequipement', $filterEquipement)
->whereIn('id_rarete', $filterRarity)
->skip(0 + $toskip)
->take(10)
->get()
->toJson();
I must filter on several requested stats.
I first fetch all the equipment that matches the user's request, then I load the statistics of each item with eager loading.
My problem is that I must only get items that have all of the selected stats at the same time. For example, if the user selects statistics "3" and "7", I must get only the items that have both of these statistics.
Right now, I'm getting all equipment with statistic "3" plus all equipment with statistic "7"...
I don't know how I should implement this.
EDIT: I tried to simplify the problem using cars and characteristics:
[
{
"id_car":1,
"nom_car":"Car A",
"rarity": 1,
"caracteristic":[
{
"id_caracterstic":3,
"nom_stats":"Caracteristic A"
},
{
"id_caracterstic":8,
"nom_stats":"Caracteristic W"
},
{
"id_caracterstic":4,
"nom_stats":"Caracteristic Z"
}
]
},
{
"id_car":2,
"nom_car":"Car B",
"rarity": 2,
"caracteristic":[
{
"id_caracterstic":5,
"nom_stats":"Caracteristic P"
},
{
"id_caracterstic":8,
"nom_stats":"Caracteristic W"
},
{
"id_caracterstic":12,
"nom_stats":"Caracteristic ZA"
}
]
},
{
"id_car":3,
"nom_car":"Car C",
"rarity": 2,
"caracteristic":[
{
"id_caracterstic":12,
"nom_stats":"Caracteristic P"
},
{
"id_caracterstic":8,
"nom_stats":"Caracteristic W"
},
{
"id_caracterstic":14,
"nom_stats":"Caracteristic ZDD"
}
]
}
]
Say I must find the cars in my database whose rarity is "2" and which have characteristics 8 and 12.
The way I'm doing it now, I get Car A, Car B and Car C, because my query returns every car with characteristic 8 and every car with characteristic 12.
What I want is to get only Car B and Car C when I'm looking for cars with both characteristic "8" and characteristic "12".
From your question, my understanding is that you want to filter your models to include only the equipment that has all of the selected stats.
With that in mind, you simply need to modify your existing query to use GROUP BY and HAVING.
return EncyclopedieModel::with('stats')
// select only the equipment columns so the grouped rows stay well-defined
->select('equipement.*')
->join('equipement_stats', 'equipement_stats.id_equipement', '=', 'equipement.id_equipement')
->join('stats', 'stats.id_stats','=','equipement_stats.id_stats')
->whereIn('stats.id_stats', $filterStats)
->whereIn('id_typeequipement', $filterEquipement)
->whereIn('id_rarete', $filterRarity)
// one group per piece of equipment (table and key names taken from the joins above)
->groupBy('equipement.id_equipement')
// keep only the groups that matched every selected stat ($filterStats must not contain duplicates)
->havingRaw('COUNT(DISTINCT stats.id_stats) = ?', [count($filterStats)])
->skip(0 + $toskip)
->take(10)
->get()
->toJson();
The GROUP BY is required to collect the matching stats per distinct piece of equipment so that they can be counted. The HAVING is essentially a WHERE clause applied to an aggregate function, in this case COUNT.
So we keep only the equipment whose number of matching stats equals the number of stats selected.
Edit - HAVING definition
A HAVING clause in SQL specifies that an SQL SELECT statement should only return rows where aggregate values meet the specified conditions. It was added to the SQL language because the WHERE keyword could not be used with aggregate functions.
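To make the GROUP BY / HAVING idea concrete, here is a minimal sketch in plain JavaScript run against the simplified car data from the question. It is not Eloquent code: the function name carsWithAllCharacteristics and the trimmed-down data shape are assumptions made only for illustration, and each step is annotated with the SQL clause it mimics.

```javascript
// Simplified copy of the cars from the question (ids of characteristics only).
const cars = [
  { name: "Car A", rarity: 1, characteristics: [3, 8, 4] },
  { name: "Car B", rarity: 2, characteristics: [5, 8, 12] },
  { name: "Car C", rarity: 2, characteristics: [12, 8, 14] },
];

// Hypothetical helper: returns the names of the cars that have ALL wanted characteristics.
function carsWithAllCharacteristics(cars, wanted, rarities) {
  // JOIN: one row per (car, characteristic) pair.
  const rows = cars.flatMap((car) =>
    car.characteristics.map((id) => ({ car, id }))
  );

  // WHERE id IN (...) AND rarity IN (...)
  const kept = rows.filter(
    (row) => wanted.includes(row.id) && rarities.includes(row.car.rarity)
  );

  // GROUP BY car, collecting the distinct matching characteristic ids.
  const groups = new Map();
  for (const row of kept) {
    if (!groups.has(row.car.name)) groups.set(row.car.name, new Set());
    groups.get(row.car.name).add(row.id);
  }

  // HAVING COUNT(DISTINCT id) = number of wanted characteristics.
  const wantedCount = new Set(wanted).size;
  return [...groups]
    .filter(([, ids]) => ids.size === wantedCount)
    .map(([name]) => name);
}

console.log(carsWithAllCharacteristics(cars, [8, 12], [2]));
```

Without the final HAVING step, a car that has only one of the two characteristics would still be returned, which is exactly the behaviour described in the question.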
I have documents that contain an object array. Within that array are pulses in a dataset. For example:
samples: [{"time":1224960,"flow":0,"temp":null},{"time":1224970,"flow":0,"temp":null},
{"time":1224980,"flow":23,"temp":null},{"time":1224990,"flow":44,"temp":null},
{"time":1225000,"flow":66,"temp":null},{"time":1225010,"flow":0,"temp":null},
{"time":1225020,"flow":650,"temp":null},{"time":1225030,"flow":40,"temp":null},
{"time":1225040,"flow":60,"temp":null},{"time":1225050,"flow":0,"temp":null},
{"time":1225060,"flow":0,"temp":null},{"time":1225070,"flow":0,"temp":null},
{"time":1225080,"flow":0,"temp":null},{"time":1225090,"flow":0,"temp":null},
{"time":1225100,"flow":0,"temp":null},{"time":1225110,"flow":67,"temp":null},
{"time":1225120,"flow":23,"temp":null},{"time":1225130,"flow":0,"temp":null},
{"time":1225140,"flow":0,"temp":null},{"time":1225150,"flow":0,"temp":null}]
I would like to construct an aggregate pipeline to act on each collection of consecutive 'samples.flow' values above zero. As in, the sample pulses are delimited by one or more zero flow values. I can use an $unwind stage to flatten the data but I'm at a loss as to how to subsequently group each pulse. I have no objections to this being a multistep process. But I'd rather not have to loop through it in code on the client side. The data will comprise fields from a number of documents and could total in the hundreds of thousands of entries.
From the example above I'd like to be able to extract:
[{"time":1224980,"total_flow":133,"temp":null},
{"time":1225020,"total_flow":750,"temp":null},
{"time":1225110,"total_flow":90,"temp":null}]
or variations thereof.
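If you do not need specific values in the time field, you can use the following pipeline with $bucketAuto. It assumes that each sample is already its own document with top-level time, flow and temp fields (for example after $unwind and $replaceRoot). Note that $bucketAuto picks the bucket boundaries itself, trying to spread the documents evenly over the requested number of buckets, so the buckets are not guaranteed to line up with the pulses.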
[
{
"$bucketAuto": {
"groupBy": "$time",
"buckets": 3,
"output": {
total_flow: {
$sum: "$flow"
},
temp: {
$first: "$temp"
},
time: {
"$min": "$time"
}
}
}
},
{
"$project": {
"_id": 0
}
}
]
If you need specific values for time, you will have to use $bucket instead and pass it a boundaries array of precalculated lower bounds. I think this should do the job.
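As a point of comparison, the grouping rule from the question (runs of consecutive samples with flow above zero, separated by one or more zero readings) is easy to state in plain JavaScript. This is only a client-side sketch of the intended semantics, not an aggregation pipeline; the function name sumPulses is made up here, and it could serve as a reference when checking whatever boundaries you end up passing to $bucket.

```javascript
// Hypothetical helper: collapses each run of consecutive non-zero flow samples
// into one entry carrying the start time, the summed flow and the first temp.
function sumPulses(samples) {
  const pulses = [];
  let current = null; // the pulse currently being accumulated, if any

  for (const sample of samples) {
    if (sample.flow > 0) {
      if (current === null) {
        // A new pulse starts at the first non-zero sample after a zero.
        current = { time: sample.time, total_flow: 0, temp: sample.temp };
        pulses.push(current);
      }
      current.total_flow += sample.flow;
    } else {
      // A zero reading closes the current pulse.
      current = null;
    }
  }
  return pulses;
}

// A shortened excerpt of the samples from the question.
const excerpt = [
  { time: 1225010, flow: 0, temp: null },
  { time: 1225020, flow: 650, temp: null },
  { time: 1225030, flow: 40, temp: null },
  { time: 1225040, flow: 60, temp: null },
  { time: 1225050, flow: 0, temp: null },
  { time: 1225110, flow: 67, temp: null },
  { time: 1225120, flow: 23, temp: null },
  { time: 1225130, flow: 0, temp: null },
];
console.log(sumPulses(excerpt));
```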
The document structure has a round collection, which has an array of holes Objects embedded within it, with each hole played/scored entered.
The structure looks like this (there are more fields, but this summarises):
{
"_id": {
"$oid": "60701a691c071256e4f0d0d6"
},
"schema": {
"$numberDecimal": "1.0"
},
"playerName": "T Woods",
"comp": {
"id": {
"$oid": "607019361c071256e4f0d0d5"
},
"name": "US Open",
"tees": "Pro Tees",
"roundNo": {
"$numberInt": "1"
},
"scoringMethod": "Stableford"
},
"holes": [
{
"holeNo": {
"$numberInt": "1"
},
"holePar": {
"$numberInt": "4"
},
"holeSI": {
"$numberInt": "3"
},
"holeGross": {
"$numberInt": "4"
},
"holeStrokes": {
"$numberInt": "1"
},
"holeNett": {
"$numberInt": "3"
},
"holeGrossPoints": {
"$numberInt": "2"
},
"holeNettPoints": {
"$numberInt": "3"
}
}
]
}
In the Atlas web UI, it shows as (note there are 9 holes in this particular round of golf - limited to 3 for brevity):
I would like to find the players who have a holeGross of 2, or less, somewhere in their round of golf (i.e. a birdie on par 3 or better).
Being new to MongoDB, and NoSQL constructs, I am stuck with this. Reading around the aggregation pipeline framework, I have tried to break down the stages I will need as:
Filter by the comp.id and comp.roundNo
Filter this result with any hole within the holes array of Objects
Maybe I have approached this wrong, and should filter or structure this pipeline differently?
So far, using the Atlas web UI, I can apply these filters individually as:
{
"comp.id": ObjectId("607019361c071256e4f0d0d5"),
"comp.roundNo": 2
}
And:
{ "holes.0.holeGross": 2 }
But I have 2 problems:
In the second filter query, I have hard-coded the array index to get this value. I would need to search across all the sub-elements of every document that matches this comp.id && comp.roundNo.
How do I combine these? I presume this is where the aggregation comes in, as well as enumerating across the whole array (as above).
I note in particular it is the extra ".0." part of the second query that I am not seeing from various other online postings trying to do the same thing. Is my data structure incorrect? Do I need the [0]...[17] Objects for an 18-hole round of golf?
I would like to find the players who have a holeGross of 2, or less, somewhere in their round of golf
If that is the goal, a simple $lte search inside the holes array like the following will do:
db.collection.find({ "holes.holeGross": { $lte: 2 } })
To search every element of the array, simply leave the array index (such as 0) out of the property path.
https://mongoplayground.net/p/KhZLnj9mJe5
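As for combining the filters: conditions placed in the same filter document are ANDed together, so the competition and round conditions can sit next to the array condition in a single find. Below is a minimal sketch that only builds the filter object; the function name buildLowGrossFilter is an assumption for illustration, and in the shell the competition id would need to be wrapped in ObjectId(...) as in the question.

```javascript
// Hypothetical helper: builds one filter that ANDs the competition, the round
// and "any hole with holeGross <= maxGross".
function buildLowGrossFilter(compId, roundNo, maxGross) {
  return {
    "comp.id": compId,
    "comp.roundNo": roundNo,
    // No array index in the path, so every element of holes is checked.
    "holes.holeGross": { $lte: maxGross },
  };
}

const lowGrossFilter = buildLowGrossFilter("607019361c071256e4f0d0d5", 2, 2);
console.log(JSON.stringify(lowGrossFilter, null, 2));

// In the mongo shell this would be used roughly as follows
// (with the id passed as ObjectId("607019361c071256e4f0d0d5")):
//   db.collection.find(lowGrossFilter)
```

A plain find is enough here; an aggregation pipeline would only be needed if you also wanted to reshape the result, for example to return just the matching holes.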
I stumbled upon a funny behavior in MongoDB:
When I run:
db.getCollection("words").update({ word: { $in: ["nico11"] } }, { $inc: { nbHits: 1 } }, { multi: 1, upsert: 1 })
it will create "nico11" if it doesn't exist, and increase nbHits by 1 (as expected).
However, when I run:
db.getCollection("words").update({ word: { $in: ["nico10", "nico11", "nico12"] } }, { $inc: { nbHits: 1 } }, { multi: 1, upsert: 1 })
it will correctly update the keys that are already in the DB, but not insert the missing ones.
Is that the expected behavior, and is there any way I can provide an array to mongoDB, for it to update the existing elements, and create the ones that need to be created?
That is expected behaviour according to the documentation:
The update creates a base document from the equality clauses in the
<query> parameter, and then applies the update expressions from the
<update> parameter. Comparison operations from the <query> will not be
included in the new document.
And, no, there is no way to achieve what you are attempting to do here using a simple upsert. The reason for that is probably that the expected outcome would be impossible to define. In your specific case it might be possible to argue along the lines of: "oh well, it is kind of obvious what we should be doing here". But imagine a more complex query like this:
db.getCollection("words").update({
a: { $in: ["b", "c" ] },
x: { $in: [ "y", "z" ]}
},
{ $inc: { nbHits: 1 } },
{ multi: 1, upsert: 1 })
What should MongoDB do in this case?
There is, however, the concept of bulk write operations in MongoDB where you would need to define three separate updateOne operations and package them up in a single request to the server.
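The bulk approach could look roughly like the sketch below. It only builds the list of operations in plain JavaScript; the helper name buildUpsertOperations is an assumption for illustration, and the commented-out call at the end shows where the list would be handed to bulkWrite in the shell.

```javascript
// Hypothetical helper: one updateOne upsert per word, so that each missing word
// gets its own base document built from an equality clause on "word".
function buildUpsertOperations(words) {
  return words.map((word) => ({
    updateOne: {
      filter: { word: word },
      update: { $inc: { nbHits: 1 } },
      upsert: true,
    },
  }));
}

const operations = buildUpsertOperations(["nico10", "nico11", "nico12"]);
console.log(JSON.stringify(operations, null, 2));

// Sent to the server as a single request, for example:
//   db.getCollection("words").bulkWrite(operations)
```

Because every filter is a plain equality on word, the upsert has an unambiguous document to create, which is precisely what the $in version lacks.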
I want to query an array field in Elasticsearch. The field contains one or several GPU node numbers that were allocated to a job. Several people may be using the same node at the same time, since a GPU node can be shared. I want to get the total number of distinct nodes that were in use at a specific time.
Say I have three rows of data which fall in the same time interval. I want to plot a histogram showing that there are three nodes occupied in that period. Can I achieve this in Kibana?
Example :
[3]
[3,4,5]
[4,5]
I am expecting an output of 3 since there were only 3 distinct nodes used.
Thanks in advance
You can accomplish this using a combination of a date histogram aggregation along with either a terms aggregation (if the exact number of nodes is important) or a cardinality aggregation (if you can accept some inaccuracy at higher cardinalities).
Full example:
# Start with a clean slate
DELETE test-index
# Create the index
PUT test-index
{
"mappings": {
"event": {
"properties": {
"nodes": {
"type": "integer"
},
"timestamp": {
"type": "date"
}
}
}
}
}
# Index a few events (using the rows from your question)
POST test-index/event/_bulk
{"index":{}}
{"timestamp": "2018-06-10T00:00:00Z", "nodes":[3]}
{"index":{}}
{"timestamp": "2018-06-10T00:01:00Z", "nodes":[3,4,5]}
{"index":{}}
{"timestamp": "2018-06-10T00:02:00Z", "nodes":[4,5]}
# STRATEGY 1: Cardinality aggregation (scalable, but potentially inaccurate)
POST test-index/event/_search
{
"size": 0,
"aggs": {
"active_nodes_histo": {
"date_histogram": {
"field": "timestamp",
"interval": "hour"
},
"aggs": {
"active_nodes": {
"cardinality": {
"field": "nodes"
}
}
}
}
}
}
# STRATEGY 2: Terms aggregation (exact, but potentially much more expensive)
POST test-index/event/_search
{
"size": 0,
"aggs": {
"active_nodes_histo": {
"date_histogram": {
"field": "timestamp",
"interval": "hour"
},
"aggs": {
"active_nodes": {
"terms": {
"field": "nodes",
"size": 10
}
}
}
}
}
}
Notes:
Terms vs. cardinality aggregation: Use the cardinality agg unless you need to know WHICH nodes are in use. It is significantly more scalable, and until you get into cardinality of 1000s, you likely won't see any inaccuracy.
Date histogram interval: You can play with the interval such that it's something that makes sense for you. If you run through the example above, you'll only see one histogram bucket, however if you change hour to minute, you'll see the histogram build itself out with more data points.
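Reading the per-bucket node count back out of either response could be done along these lines. This is a sketch under assumptions: the helper name activeNodeCounts is made up, and the two responses below are hand-written mock-ups trimmed to the fields the helper reads, not output captured from a cluster. A cardinality aggregation reports its count in value, whereas for a terms aggregation the count is the number of term buckets returned (capped by the size setting, 10 in the example above).

```javascript
// Hypothetical helper: returns one { time, activeNodes } entry per histogram bucket.
function activeNodeCounts(response) {
  return response.aggregations.active_nodes_histo.buckets.map((bucket) => {
    const agg = bucket.active_nodes;
    // terms aggregation -> { buckets: [...] }, cardinality aggregation -> { value: n }
    const count = Array.isArray(agg.buckets) ? agg.buckets.length : agg.value;
    return { time: bucket.key_as_string, activeNodes: count };
  });
}

// Mock-up of a cardinality response (strategy 1).
const cardinalityResponse = {
  aggregations: {
    active_nodes_histo: {
      buckets: [
        { key_as_string: "2018-06-10T00:00:00.000Z", doc_count: 3, active_nodes: { value: 3 } },
      ],
    },
  },
};

// Mock-up of a terms response (strategy 2).
const termsResponse = {
  aggregations: {
    active_nodes_histo: {
      buckets: [
        {
          key_as_string: "2018-06-10T00:00:00.000Z",
          doc_count: 3,
          active_nodes: {
            buckets: [
              { key: 3, doc_count: 2 },
              { key: 4, doc_count: 2 },
              { key: 5, doc_count: 2 },
            ],
          },
        },
      ],
    },
  },
};

console.log(activeNodeCounts(cardinalityResponse));
console.log(activeNodeCounts(termsResponse));
```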
I am trying to use a document collection for fast lookups. Sample document:
document Person {
...
groups: ["admin", "user", "godmode"],
contacts: [
{
label: "main office",
items: [
{ type: "phone", value: '333444222' },
{ type: "phone", value: '555222555' },
{ type: "email", value: 'bob#gmail.com' }
]
}
]
...
}
1. Create a hash index on the "groups" field
Query: For P in Person FILTER "admin" IN P.groups RETURN P
Result: works, BUT according to the explain output no index is used
Question: How can I filter on arrays in a query and still use an index? Performance is the main factor.
2. Create a hash index on "contacts[].items[].value"
Query: For P in Person FILTER "333444222" == P.contacts[*].items[*].value RETURN P
Result: Is double usage of the wildcard not supported? The index is not used and the query result is empty.
Question: How can I organize fast lookups with indexes for this structure?
P.S. I also tried the MATCHES function and multi-level for-in loops; the hash indexes on arrays were never used.
ArangoDB version 2.6.8
Array indexes can be used from ArangoDB version 2.8 onwards.
For the first query (FILTER "admin" IN p.groups), an array hash index on field groups[*] will work:
db._create("persons");
db.persons.insert(personDateFromOriginalExample);
db.persons.ensureIndex({ type: "hash", fields: [ "groups[*]" ] });
This type of index does not exist in versions prior to 2.8.
With an array index in place, the query will produce the following execution plan (showing that the index is actually used):
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 1 - FOR p IN persons /* hash index scan */
3 CalculationNode 1 - LET #1 = "admin" in p.`groups` /* simple expression */ /* collections used: p : persons */
4 FilterNode 1 - FILTER #1
5 ReturnNode 1 - RETURN p
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash persons false false 100.00 % [ `groups[*]` ] "admin" in p.`groups`
The second query will not be supported by array indexes, as it contains multiple levels of nesting. The array indexes in 2.8 are restricted to one level, e.g. groups[*] or contacts[*].label will work, but not contacts[*].items[*].value.
About 1.): this is work in progress and will be included in one of the next releases (most likely 2.8).
We have not yet decided on the AQL syntax to retrieve the array, but FILTER "admin" IN P.groups is among the most likely ones.
About 2.): once 1. is implemented, this will work out of the box as well; the index will be able to cover several depths of nesting.
Neither of the above can be properly indexed in the current release (2.6).
The only alternative I can offer is to externalize the values and use edges instead of arrays.
In your code the data would be the following (in arangosh).
I used fixed _key values for simplicity, works without them as well:
db._create("groups"); // saves the group elements
db._create("contacts"); // saves the contact elements
db.contacts.ensureHashIndex("value"); // Index on contacts.value
db._create("Person"); // You already have this
db._createEdgeCollection("isInGroup"); // Save relation group -> person
db._createEdgeCollection("hasContact"); // Save relation item -> person
db.Person.save({_key: "user"}) // The remainder of the object you posted
// Now the items
db.contacts.save({_key:"phone1", type: "phone", value: '333444222' });
db.contacts.save({_key:"phone2", type: "phone", value: '555222555' });
db.contacts.save({_key:"mail1", type: "email", value: 'bob#gmail.com'});
// And the groups
db.groups.save({_key:"admin"});
db.groups.save({_key:"user"});
db.groups.save({_key:"godmode"});
// Finally the relations
db.hasContact.save("contacts/phone1", "Person/user", {label: "main office"});
db.hasContact.save("contacts/phone2", "Person/user", {label: "main office"});
db.hasContact.save("contacts/mail1", "Person/user", {label: "main office"});
db.isInGroup.save("groups/admin", "Person/user", {});
db.isInGroup.save("groups/godmode", "Person/user", {});
db.isInGroup.save("groups/user", "Person/user", {});
Now you can execute the following queries:
Fetch all admins:
RETURN NEIGHBORS(groups, isInGroup, "admin")
Get all users having a contact with value 333444222:
FOR x IN contacts FILTER x.value == "333444222" RETURN NEIGHBORS(contacts, hasContact, x)