Is it possible to get all documents using criteria on two fields with a max date - database

I have documents in a 'merge' collection with a flat structure and a huge number of fields (more than 100).
Among those fields are 'partNumber' and 'date', neither of which is unique.
I am a newbie in Mongo. I need to retrieve all documents (with all their fields, without listing them explicitly in a $project stage), but only the records that have the latest date for a given partNumber (and this for all partNumbers).
Is that possible in MongoDB 3.2? What would the query be?
Many thanks in advance.

Yes, it is possible. Do you want the plain MongoDB query for this, or are you using a backend programming language?

After struggling with complex aggregation queries, I found a KISS solution with some code and only 2 queries:
- 1 aggregation query to retrieve the (partNumber, most recent date) pairs:
db.getCollection('merge').aggregate(
[
{ $group : { _id : "$partNumber", maxdate: { $max: "$date" } } }
]
)
- 1 find query, built in Python from all the pairs returned by the previous step: (partNumber = partNumber1 and date = date1) or (partNumber = partNumber2 and date = date2) or ...
It performs very fast.
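For illustration, here is a minimal sketch of how the second query's filter could be built from the aggregation output. The helper name build_latest_filter is hypothetical, and the pymongo calls in the trailing comments are only an assumption about how it would be wired up; the field names come from the question.

```python
from datetime import datetime


def build_latest_filter(groups):
    """Turn the $group output ({"_id": partNumber, "maxdate": date})
    into a single $or filter with one clause per partNumber."""
    clauses = [{"partNumber": g["_id"], "date": g["maxdate"]} for g in groups]
    # MongoDB rejects an empty $or array, so signal "nothing to fetch" with None.
    return {"$or": clauses} if clauses else None


# Example with hand-written rows shaped like the aggregation result.
sample_groups = [
    {"_id": "PN-1", "maxdate": datetime(2017, 5, 1)},
    {"_id": "PN-2", "maxdate": datetime(2017, 4, 12)},
]
query = build_latest_filter(sample_groups)

# Assumed wiring with pymongo (not run here):
#   groups = db["merge"].aggregate(
#       [{"$group": {"_id": "$partNumber", "maxdate": {"$max": "$date"}}}])
#   query = build_latest_filter(groups)
#   docs = list(db["merge"].find(query)) if query is not None else []
```

If several documents share the same partNumber and the same latest date, all of them match, which is probably what you want here since neither field is unique.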

Related

How to use mongodb aggregate functions in Criteria?

I am working on a project where I use spring data mongodb. I am making a mongo query like this:
Query query = new Query();
Criteria one = Criteria.where(DB_FIELD_1).gt(1);
Criteria two = Criteria.where(DB_FIELD_2).lt(10);
// "final" is a reserved word in Java, so the combined criteria is named "combined"
Criteria combined = new Criteria().andOperator(one, two);
query.addCriteria(combined);
// after building the query, I execute it using mongoTemplate
Now, I have a Date field with format YYYYMMDD. I would like to check whether its month equals the current month. For example, if the field is 20170501, then the date's month (05) is the current month (5, May). How do I extract the month value from the date and check it along with the other criteria above (one and two)? I know there is $month to extract the month value from a date, but I do not know how to use an aggregation function within the "Criteria" class. My final result should be something like this:
Criteria combined = new Criteria().andOperator(one, two, <criteria to check month = current month>);
query.addCriteria(combined);
You can use the aggregation pipeline below with the current Spring Data MongoDB version 1.10.2 / Spring Boot 1.5.2. Include in the $project any other fields that you wish to output.
On a 3.4.x MongoDB server, use $addFields instead of $project.
AggregationOperation project = Aggregation.project()
    .and(DateOperators.dateOf(DB_FIELD_1).month()).as("field1")
    .and(DateOperators.dateOf(DB_FIELD_2).month()).as("field2");
AggregationOperation match = Aggregation.match(Criteria.where("field1").gt(1)
    .and("field2").lt(10));
Aggregation aggregation = Aggregation.newAggregation(project, match);
List<BasicDBObject> results = mongoTemplate.aggregate(aggregation, collectionname, BasicDBObject.class).getMappedResults();
For version 1.4.1, use:
AggregationOperation project = Aggregation.project().and(DB_FIELD_1).extractMonth().as("field1").and(DB_FIELD_2).extractMonth().as("field2");
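For readers who want to see the raw pipeline shape, here is a rough sketch in Python that builds an $addFields + $match pipeline for the "month equals current month" check from the question. It assumes the field is stored as a BSON date (not a string or number) and a MongoDB 3.4+ server for $addFields; the function name, the "_month" helper field and "createdOn" are hypothetical.

```python
from datetime import date


def build_current_month_pipeline(date_field, month, extra_match=None):
    """Compute the month of `date_field` into a helper field, then match on it
    together with any other criteria."""
    match = {"_month": month}
    if extra_match:
        match.update(extra_match)
    return [
        # $addFields keeps every existing field, so nothing has to be listed.
        {"$addFields": {"_month": {"$month": "$" + date_field}}},
        {"$match": match},
    ]


pipeline = build_current_month_pipeline(
    "createdOn",                      # hypothetical date field name
    date.today().month,
    {"field1": {"$gt": 1}, "field2": {"$lt": 10}},
)
```

On a server older than 3.4 the first stage would have to be a $project that lists the fields to keep, as in the Java answer above.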

SOLR - Result grouping with Math on two fields

I have a requirement in which I have a json structure like this:
{
"name":"xyz",
"parent_id":123,
"event_date":"1972-05-20T17:33:18.772Z"
}
{
"name":"abc",
"parent_id":123,
"event_date":"1973-05-20T17:33:18.772Z"
}
I want the count of unique parent IDs for which the difference between the event dates is within X years (or months, or days). In this example, given a gap of 1 year, the count will be 1.
Look into Streaming Aggregations if you're running Solr 6.x. You'd probably:
- Create two separate query streams ("bornEvents" and "marriedEvents").
- Merge them on "name".
- Reduce them using (married-born: "ageAtMarriage").
- Filter out records where "ageAtMarriage" > ...
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
http://joelsolr.blogspot.com/2015/04/solrjio-computing-complement-of-two.html?m=1
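To pin down what the question is asking for, here is the counting rule written out in plain Python (this is not Solr code, just a reference for checking results). It assumes a parent counts when any two of its consecutive events are no further apart than the gap, and it expresses the gap as a timedelta; the function name is made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_parents_within_gap(docs, max_gap):
    """Count distinct parent_ids having two consecutive events within max_gap."""
    by_parent = defaultdict(list)
    for doc in docs:
        ts = datetime.strptime(doc["event_date"], "%Y-%m-%dT%H:%M:%S.%fZ")
        by_parent[doc["parent_id"]].append(ts)

    count = 0
    for dates in by_parent.values():
        dates.sort()
        # Compare each event with the next one in time order.
        if any(later - earlier <= max_gap
               for earlier, later in zip(dates, dates[1:])):
            count += 1
    return count


docs = [
    {"name": "xyz", "parent_id": 123, "event_date": "1972-05-20T17:33:18.772Z"},
    {"name": "abc", "parent_id": 123, "event_date": "1973-05-20T17:33:18.772Z"},
]
one_year = timedelta(days=365)
result = count_parents_within_gap(docs, one_year)  # 1 for the two sample documents
```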

Getting the latest (timestamp wise) value from cloudant query

I have a cloudant DB where each document looks like:
{
"_id": "2015-11-20_attr_00",
"key": "attr",
"value": "00",
"employeeCount": 12,
"timestamp": "2015-11-20T18:16:05.366Z",
"epocTimestampMillis": 1448043365366,
"docType": "attrCounts"
}
For a given attribute there is an employee count. As you can see I have a record for the same attribute every day. I am trying to create a view or index that will give me the latest record for this attribute. Meaning if I inserted a record on 2015-10-30 and another on 2015-11-10, then the one that is returned to me is just employee count for the record with timestamp 2015-11-10.
I have tried a view, but I get all the entries for each attribute, not just the latest. I did not look at indexes because I thought they are not precalculated. I will be querying this from the client side, so having it precalculated (like views are) is important.
Any guidance would be most appreciated. thank you
I created a test database that you can see here. Just make sure that when you insert your JSON document into Cloudant (or CouchDB), your timestamps are not strings but JavaScript data objects:
https://examples.cloudant.com/latestdocs/_all_docs?include_docs=true
I built a search index like this (name the design doc "summary" and the search index "latest"):
function (doc) {
if ( doc.docType == "totalEmployeeCounts" && doc.key == "div") {
index("division", doc.value, {"store": true});
index("timestamp", doc.timestamp, {"store": true});
}
}
Then here's a query that will return only the latest record for each division. Note that the limit value will apply to each group, so with limit=1, if there are 4 groups you will get 4 documents not 1.
https://examples.cloudant.com/latestdocs/_design/summary/_search/latest?q=*:*&limit=1&group_field=division&include_docs=true&sort_field=-timestamp
Indexing TimeStamp as a string is not recommended.
Reference:
https://cloudant.com/blog/defensive-coding-in-mapindex-functions/#.VvRVxtIrJaT
I have the same problem. I converted the timestamp value to milliseconds (number) and then indexed that value.
var millis= Date.parse(timestamp);
index("millis",millis,{"store": false});
You can use the same query as Raj suggested, but with the 'millis' field instead of the timestamp.
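If you ever need the same conversion outside the index function, for example to check results on the client, here is a small Python sketch. It assumes timestamps in the exact format shown in the question; the helper names and the choice of "key" as the grouping field are only for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(iso_timestamp):
    """Convert e.g. '2015-11-20T18:16:05.366Z' to epoch milliseconds (int)."""
    dt = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def latest_per_group(docs, group_field="key"):
    """Keep only the newest document for each value of group_field."""
    latest = {}
    for doc in docs:
        group = doc[group_field]
        if group not in latest or to_millis(doc["timestamp"]) > to_millis(latest[group]["timestamp"]):
            latest[group] = doc
    return latest


millis = to_millis("2015-11-20T18:16:05.366Z")  # matches epocTimestampMillis in the sample document
```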

Django Query: Annotate with `count` of a *window*

I am looking for a query that is pretty similar to this one. As an extension, though, I do not want to count all objects, only the ones that are fairly recent.
In my case, there are two models: Source and Data. As a result, I'd like to get a list of all Sources ordered by the number of Data records that have been collected during the last week.
I am not interested in how many data records have been collected in total, only in whether the source has been active recently.
Using the following code snippet from the above link, I cannot work out how to subquery the Data table first.
from django.db.models import Count
activity_per_source = Source.objects.annotate(count_data_records=Count('Data')) \
.order_by('-count_data_records')
The only ways I came up with would be to write native SQL or to process this in a loop with individual queries. Is there a Django query version?
(I use a MySQL database and Django 1.5.4)
Check out the docs on the order of annotate and filter: https://docs.djangoproject.com/en/1.5/topics/db/aggregation/#order-of-annotate-and-filter-clauses
Try something along the lines of:
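import datetime
# assumes Data has a `date` field and the default reverse name `data`
one_week_ago = datetime.date.today() - datetime.timedelta(days=7)
activity_per_source = Source.objects.\
    filter(data__date__gte=one_week_ago).\
    annotate(count_data_records=Count('data')).\
    order_by('-count_data_records').distinct()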
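There is a way of doing that by mixing Django queries with SQL via extra:
import datetime
from django.db.models import Count
# subtracting a plain int from a date raises TypeError, so use a timedelta
start_date = datetime.date.today() - datetime.timedelta(days=7)
activity_per_source = (
    Source.objects
    # pass the date through params instead of formatting it into the SQL string
    .extra(where=["(select max(date) from app_data where source_id=app_source.id) >= %s"],
           params=[start_date])
    .annotate(count_data_records=Count('data'))
    .order_by('-count_data_records'))
The where clause filters the Sources by the date of their most recent Data record.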
Note: replace table and field names with actual ones.

Solrnet grouping - can I get n results per group

I am using solrnet 0.40 grouping functionality.
And I am grouping on a single field (say filename).
But in the results I would like to display multiple hits for the group (filename).
FileName-1
hit-1, hit-2....hit-n
FileName-2
hit-1, hit-2....
and so on....
Is there any way grouping gives me the functionality to get results clustered other than the obvious way of running a secondary query for each group?
TIA
I just needed to set the Limit grouping parameter to the desired value. For example, setting it to 10 returns 10 results per group.
Grouping = new GroupingParameters()
{
Fields = new [] { "manu_exact" },
Format = GroupingFormat.Grouped,
Limit = 10,
}
