I have a collection for software whose documents contain these fields: _id, category, brand, etc. There is a price field which is of type string, and some documents have invalid or null prices. I want to use an aggregation pipeline so that the price string length is >= 4 and <= 8, and convert the price to a double. There is also a date field that I want to be >= 10. I also want to use $out to create a new collection from these documents. I have done this so far; I was wondering if someone could let me know how I can retrieve the documents without losing or changing the other fields, only price and date.
db.sw.aggregate([
  { $match: {} },
  { $project: { priceLen: { $strLenCP: "$price" } } },
  { $match: { priceLen: { $gte: 4, $lte: 8 } } },
  { $project: { price: { $trim: { input: "$price", chars: "$" } } } },
  { $project: { price: { $toDouble: "$price" } } }
])
My thought process for the first $match was to retrieve all the fields. Any help would be really appreciated.
No idea what your requirements are in terms of being "correct".
$project removes all fields (apart from _id) and populates only the fields you specify. If you want to keep the existing fields, use $set or its alias $addFields, which names the actual operation.
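For example, a minimal sketch of the whole pipeline using $set (the output collection name sw_clean is an assumption, as is the exact meaning of the date filter; $set/$unset need MongoDB 4.2+, on older versions use $addFields and an exclusion $project instead):

```javascript
db.sw.aggregate([
  // keep only documents whose price is actually a string (skips null/invalid types)
  { $match: { price: { $type: "string" } } },
  // $set keeps all existing fields and only adds the one listed
  { $set: { priceLen: { $strLenCP: "$price" } } },
  { $match: { priceLen: { $gte: 4, $lte: 8 }, date: { $gte: 10 } } },
  // strip a leading "$" and convert to double, overwriting price in place
  { $set: { price: { $toDouble: { $trim: { input: "$price", chars: "$" } } } } },
  { $unset: "priceLen" },   // drop the helper field again
  { $out: "sw_clean" }      // assumed name for the new collection
])
```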
Please help me understand the difference between $project and $addFields. In my case, whether I used $addFields or $project, both gave the same output. What is the difference between $project and $addFields?
$addFields adds new fields to documents. $addFields outputs documents that contain all existing fields from the input documents and newly added fields.
The $addFields stage is equivalent to a $project stage that explicitly specifies all existing fields in the input documents and adds the new fields.
$addFields has the following form:
{ $addFields: { <newField>: <expression>, ... } }
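To make the difference concrete, here is a plain-JS sketch (not a real MongoDB call, just the shape of each stage's output) assuming a hypothetical input document:

```javascript
// assumed example document
const doc = { _id: 1, brand: "Acme", price: "4.99" };

// $project: { priceLen: ... } keeps only _id plus the listed fields
const projected = { _id: doc._id, priceLen: doc.price.length };
// → { _id: 1, priceLen: 4 }

// $addFields: { priceLen: ... } keeps every existing field and adds the new one
const added = { ...doc, priceLen: doc.price.length };
// → { _id: 1, brand: "Acme", price: "4.99", priceLen: 4 }
```

Both give the same output only when the fields you list happen to be the only fields you care about; as soon as other fields matter, $project drops them and $addFields keeps them.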
I have been searching through the MongoDB query syntax with various combinations of terms to see if I can find the right syntax for the type of query I want to create.
We have a collection containing documents with an array field. This array field contains ids of items associated with the document.
I want to be able to check if an item has been associated more than once. If it has then more than one document will have the id element present in its array field.
I don't know in advance the id(s) to check for as I don't know which items are associated more than once. I am trying to detect this. It would be comparatively straightforward to query for all documents with a specific value in their array field.
What I need is some query that can return all the documents where one of the elements of its array field is also present in the array field of a different document.
I don't know how to do this. In SQL it might have been possible with subqueries. In Mongo Query Language I don't know how to do this or even if it can be done.
You can use $lookup (available for self-joins since MongoDB 3.6) to join the collection against itself and output the documents where there is a match, then use $project with exclusion to drop the joined field.
Use $push together with a $ne: [] match to output only the documents where a matching document exists.
db.col.aggregate([
  { "$unwind": "$array" },
  { "$lookup": {
      "from": "col",
      "localField": "array",
      "foreignField": "array",
      "as": "jarray"
  } },
  // drop the self-match: in a self-join every document's array matches itself
  { "$addFields": { "jarray": { "$filter": {
      "input": "$jarray",
      "cond": { "$ne": ["$$this._id", "$_id"] }
  } } } },
  // keep only the unwound rows that matched some other document
  { "$match": { "jarray": { "$ne": [] } } },
  { "$group": {
      "_id": "$_id",
      "fieldOne": { "$first": "$fieldOne" },
      // ... other fields
      "jarray": { "$push": "$jarray" }
  } },
  { "$project": { "jarray": 0 } }
])
I have a records collection with the following indexes:
{"_id":1}
{"car.make":1,"city":1,"car.mileage":1}
And performing the following query:
db.records.aggregate([
  {
    "$match": {
      "car.mileage": { "$in": [1000, 2000, 3000, 4000] },
      "car.make": "Honda",
      "city": { "$in": ["Miami", "San Francisco", "New York", "Chicago", "Seattle", "Boston"] }
    }
  },
  {
    "$sort": { "_id": -1 }
  }
])
The query without the $sort clause finishes in a few milliseconds, but adding the $sort clause makes it take around 2 minutes. This query should return around 40 documents from a collection of 6m documents. Any clues about what could cause this huge difference in query time?
After additional testing, this problem goes away by sorting on a different field like creation_date even if creation_date is not indexed. Any ideas why the _id field would perform so much worse than the unindexed creation_date field in this aggregation?
I ran into the same problem today. I'm speculating here, but I believe in this case sorting by _id sorts all the entries in the collection before the other operations (I say speculating because if you omit the $match clause and keep only the $sort clause, even then you get your data in milliseconds).
The workaround that helped me was projection.
If you use a $project clause between $match and $sort then you will get your data in milliseconds again. So you can either use fields like creation_date or if you must use _id then use $project before sorting it.
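For instance, a sketch of that workaround applied to the query above (the projected fields city and car are just placeholders for whichever fields you actually need):

```javascript
db.records.aggregate([
  { $match: {
      "car.mileage": { $in: [1000, 2000, 3000, 4000] },
      "car.make": "Honda",
      "city": { $in: ["Miami", "San Francisco", "New York", "Chicago", "Seattle", "Boston"] }
  } },
  // projecting between $match and $sort avoided the slow _id sort
  { $project: { city: 1, car: 1 } },
  { $sort: { _id: -1 } }
])
```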
I have Solr documents with a multi-valued field and need the distinct values from it. I have to filter by a different field, but my result doesn't have to include anything other than the distinct categories.
Documents:
{ CountryCode: 'US', Product: 'A', Categories: [1,2,3] },
{ CountryCode: 'US', Product: 'B', Categories: [1,3,77,88] },
{ CountryCode: 'JP', Product: 'B', Categories: [1,2] },
{ CountryCode: 'JP', Product: 'B', Categories: [444,555] }
Filter for only CountryCode = 'US'
Result:
{[1,2,3,77,88]}
I tried field collapsing/grouping, but it doesn't work on multi-valued fields.
I tried terms (thanks to the suggestion by Persimmonium), but it doesn't filter only the 'US' categories. The fact that terms gives how many times a category occurs is a bonus, but not required in this case.
Any suggestions?
Edited after your comment.
One way to achieve this is with:
an fq to get the set of docs you are interested in,
then facet on Categories, setting facet.limit high enough to get all values.
A fancier way might be using Streaming Expressions, but faceting is just simpler.
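For example, the request parameters could look like this (facet.limit=-1 means no cap on the number of returned facet values, and facet.mincount=1 hides categories with zero hits in the filtered set):

```
q=*:*&rows=0&fq=CountryCode:US&facet=true&facet.field=Categories&facet.limit=-1&facet.mincount=1
```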
I was wondering if it is possible to sort by the order that you request documents from SOLR. I am running a In based query and would just like SOLR to return them based on the order that I ask.
In (4,2,3,1) should return me documents ordered 4,2,3,1.
Thanks.
You need sorting in Solr to order them by field.
I assume that "In based query" means something like: fetch docs whose fieldx has values in (val1, val2). You can define a field as a multi-valued field and facet on that field. A facet query is an 'is in' search out of the box (so to say), and it can do more sophisticated searches too.
Edited on OP's query:
Updating a document with a multi-valued field in JSON here. See the line
"my_multivalued_field": [ "aaa", "bbb" ] /* use an array for a multi-valued field */
As for doing a facet query, check this.
You need to do one or more fq statements:
&fq=field1:[400 TO 500]
&fq=field2:(johnson OR thompson)
Also do read up (in the link above) on which fields you can facet on: faceting runs against indexed (or docValues) fields, not merely stored ones.
You can easily apply sorting with QueryOptions and field sort (ExtraParams property - I am sorting by savedate field, descending):
var results = _solr.Query(textQuery,
    new QueryOptions
    {
        Highlight = new HighlightingParameters
        {
            Fields = new[] { "*" },
        },
        ExtraParams = new Dictionary<string, string>
        {
            { "fq", dateQuery },
            { "sort", "savedate desc" }
        }
    });