How to use mongodb aggregate functions in Criteria? - spring-data-mongodb

I am working on a project that uses Spring Data MongoDB. I build a Mongo query like this:
Query query = new Query();
Criteria one = Criteria.where(DB_FIELD_1).gt(1);
Criteria two = Criteria.where(DB_FIELD_2).lt(10);
Criteria combined = new Criteria().andOperator(one, two);
query.addCriteria(combined);
// after building the query, I execute it using mongoTemplate
Now, I have a date field in the format YYYYMMDD, and I would like to check whether its month equals the current month. For example, if the field is 20170501 and the current month is May (5), then the field's month (05) matches. How do I extract the month from the date and combine this check with the other criteria above (one and two)? I know that $month extracts the month from a date, but I do not know how to use an aggregation operator within the Criteria class. My final result should be something like this:
Criteria combined = new Criteria().andOperator(one, two, <criteria to check month = current month>);
query.addCriteria(combined);

You can use the aggregation pipeline below with spring-data-mongodb 1.10.2 / Spring Boot 1.5.2. Include in the $project stage any other fields that you wish to output.
On a MongoDB 3.4.x server, you can use $addFields instead of $project.
AggregationOperation project = Aggregation.project()
        .and(DateOperators.dateOf(DB_FIELD_1).month()).as("field1")
        .and(DateOperators.dateOf(DB_FIELD_2).month()).as("field2");
AggregationOperation match = Aggregation.match(
        Criteria.where("field1").gt(1).and("field2").lt(10));
Aggregation aggregation = Aggregation.newAggregation(project, match);
List<BasicDBObject> results = mongoTemplate
        .aggregate(aggregation, collectionname, BasicDBObject.class)
        .getMappedResults();
For version 1.4.1, use:
AggregationOperation project = Aggregation.project()
        .and(DB_FIELD_1).extractMonth().as("field1")
        .and(DB_FIELD_2).extractMonth().as("field2");
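For reference, the project and match stages built above correspond roughly to the raw pipeline below, sketched as plain Python dictionaries. The helper name `build_month_pipeline` and the field names passed to it are placeholders for illustration only; they are not part of Spring Data or of the original question.

```python
def build_month_pipeline(field_1, field_2):
    """Return the $project and $match stages as plain dictionaries."""
    project = {
        "$project": {
            # $month extracts the month (1-12) from a date field
            "field1": {"$month": "$" + field_1},
            "field2": {"$month": "$" + field_2},
        }
    }
    # Same comparisons as in the answer: field1 > 1 and field2 < 10
    match = {"$match": {"field1": {"$gt": 1}, "field2": {"$lt": 10}}}
    return [project, match]


# Hypothetical field names, used only to show the resulting structure
pipeline = build_month_pipeline("dbField1", "dbField2")
```

Note that `$month` operates on date values, so a field stored as a `YYYYMMDD` string or number would probably need converting before this stage can be applied.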

Related

Django query based on greater date

I want to know how this filter can be done efficiently with Django queries. Essentially I have the following two classes:
class Act(models.Model):
    Date = models.DateTimeField()
    Doc = models.ForeignKey(Doc)
    ...
class Doc(models.Model):
    ...
So one Doc can have several Acts, and for each Doc I want to get the Act with the latest Date. I'm only interested in Act objects.
For example, if I have
act1 = (Date=2021-01-01, Doc=doc1)
act2 = (Date=2021-01-02, Doc=doc1)
act3 = (Date=2021-01-03, Doc=doc2)
act4 = (Date=2021-01-04, Doc=doc2)
act5 = (Date=2021-01-05, Doc=doc2)
I want to get [act2, act5] (the Act with Doc=doc1 that has the latest Date and the Act with Doc=doc2 that has the latest Date).
My only solution so far is to loop over the Docs.
Thank you so much
You can do this with one or two queries. The first query retrieves the primary key of the latest Act for each Doc, and the second one then retrieves those Acts:
from django.db.models import OuterRef, Subquery
last_acts = Doc.objects.annotate(
    latest_act=Subquery(
        Act.objects.filter(
            Doc_id=OuterRef('pk')
        ).values('pk').order_by('-Date')[:1]
    )
).values('latest_act')
Then we can retrieve the corresponding Acts:
Act.objects.filter(pk__in=last_acts)
Depending on the database, it might be more efficient to first retrieve the primary keys and then make a second query:
Act.objects.filter(pk__in=list(last_acts))
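To make the expected result concrete, here is a minimal plain-Python sketch of the same "latest Act per Doc" selection, run on the example data from the question. It does not use Django; the tuples and the helper `latest_act_per_doc` are stand-ins invented for illustration.

```python
from datetime import date

# (name, Date, Doc) tuples mirroring the example in the question
acts = [
    ("act1", date(2021, 1, 1), "doc1"),
    ("act2", date(2021, 1, 2), "doc1"),
    ("act3", date(2021, 1, 3), "doc2"),
    ("act4", date(2021, 1, 4), "doc2"),
    ("act5", date(2021, 1, 5), "doc2"),
]


def latest_act_per_doc(acts):
    """Keep, for each Doc, only the act with the greatest Date."""
    latest = {}
    for act in acts:
        _, act_date, doc = act
        if doc not in latest or act_date > latest[doc][1]:
            latest[doc] = act
    return list(latest.values())


result = [name for name, _, _ in latest_act_per_doc(acts)]
```

The subquery approach above pushes this same per-Doc selection into the database, so no loop over Docs is needed on the Python side.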

Is it possible to get all documents using criteria on two fields with a max date

I have documents in a 'merge' collection with a flat structure and a huge number of fields (more than 100).
Among those fields are 'partNumber' and 'date', which are not unique.
I am a newbie in Mongo. I need to retrieve all documents (with all their fields, without having to list them explicitly in a project stage), but select only the records that have the latest date for a given partNumber (and this for all partNumbers).
Is that possible in MongoDB 3.2? What would the query be?
Many thanks in advance.
Yes, it is possible. Do you want the MongoDB query for this, or are you using a backend programming language?
After struggling with complex aggregation queries, I found a KISS solution with some code and only 2 queries:
- 1 aggregation query to retrieve the (partNumber, most recent date) pairs:
db.getCollection('merge').aggregate(
    [
        { $group : { _id : "$partNumber", maxdate: { $max: "$date" } } }
    ]
)
Then, in Python, a single find query built from all the tuples returned by the previous step: (partNumber = partNumber1 and date = date1) or (partNumber = partNumber2 and date = date2) or ...
This runs very fast.
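The second step described above could be sketched roughly as follows. The helper `build_latest_filter` and the sample values are hypothetical; the only things taken from the answer are the `_id` / `maxdate` shape of the `$group` output and the idea of OR-ing one (partNumber, date) pair per group.

```python
def build_latest_filter(groups):
    """Turn the $group output into a single $or filter for find()."""
    return {
        "$or": [
            {"partNumber": group["_id"], "date": group["maxdate"]}
            for group in groups
        ]
    }


# Made-up aggregation output with the same shape as the $group stage above
groups = [
    {"_id": "PN-1", "maxdate": "2016-03-01"},
    {"_id": "PN-2", "maxdate": "2016-04-15"},
]
query = build_latest_filter(groups)
# With a driver such as pymongo, this dictionary would be passed to find()
```

Two caveats: `$or` needs a non-empty list, so an empty aggregation result should be handled before querying, and documents sharing both the partNumber and the latest date will all be returned.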

Using LIKE clause with peewee.Model.get method

class SomeModel(peewee.Model):
    date_time_added = peewee.DateTimeField()
The date_time_added column contains datetimes in the format "12-02-1982 18:12:22".
I can get the rows by building a select query:
SomeModel.select(SomeModel.date_time_added).where(SomeModel.date_time_added.startswith("12-02-1982"))
How do I get the time for a specific date using the get method of peewee.Model?
SomeModel.get
First off I'd like to strongly advise against using that format. Why? It doesn't sort properly. If you use YYYY-mm-dd HH:MM:SS then you can do sorts and range scans and the results will actually be sensible.
With .get() you would write:
obj = SomeModel.get(SomeModel.date_time_added.startswith('12-02-1982'))
Or alternatively:
query = SomeModel.select(SomeModel.date_time_added).where(SomeModel.date_time_added.startswith('12-02-1982'))
obj = query.get()
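The sorting problem mentioned at the start of this answer can be seen with a small standalone sketch. It assumes the stored strings are day-month-year (the question does not say whether "12-02-1982" is dd-mm or mm-dd), and the sample values are made up.

```python
from datetime import datetime

STORED_FORMAT = "%d-%m-%Y %H:%M:%S"   # assumed layout of the stored strings
SORTABLE_FORMAT = "%Y-%m-%d %H:%M:%S"  # layout recommended in the answer

stamps = [
    "12-02-1982 18:12:22",
    "01-03-1982 09:00:00",
    "25-01-1983 07:30:00",
]

# Plain string sort, which is what the database does on a text column
as_stored = sorted(stamps)

# True chronological order, obtained by parsing each string first
chronological = sorted(stamps, key=lambda s: datetime.strptime(s, STORED_FORMAT))

# The same values rewritten year-first sort correctly as plain strings
iso = [datetime.strptime(s, STORED_FORMAT).strftime(SORTABLE_FORMAT) for s in stamps]
iso_sorted = sorted(iso)
```

Here `as_stored` puts 1 March 1982 before 12 February 1982, while `iso_sorted` matches the chronological order without any parsing.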

MongoDB numeric index

I was wondering if it's possible to create a numeric count index where the first document would be 1 and the count would increase as new documents are inserted. If so, can it also be applied to documents imported via mongoimport? I have created an index via db.collection.createIndex( {index : 1} ), but it doesn't seem to be applied.
I would strongly recommend using ObjectId as your _id field. It is a good value for distributed systems, and it also encodes the time at which it was created. MongoDB also indexes _id automatically.
Example using Morphia:
Date d = ...;
QueryImpl<MyClass> query = datastore.createQuery(MyClass.class);
query.field("_id").greaterThanOrEq(new ObjectId(d));
query.sort("_id");
query.limit(100);
List<MyClass> myDocs = query.asList();
This would fetch all documents created since date d in order of creation.
To load the next batch, change to:
query.field("_id").greaterThan(lastDoc.getId());
This will very efficiently load the next batch based on the ID of the last document from the previous batch.
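The reason a date can be turned into an _id boundary is that the first 4 bytes of an ObjectId hold the creation time in seconds since the Unix epoch. The sketch below builds such a boundary value by hand using only the standard library; `object_id_floor` is a hypothetical helper, and zero-filling the remaining 8 bytes is a choice made here for illustration, not necessarily what the Java constructor above does.

```python
import struct
from datetime import datetime, timezone


def object_id_floor(moment):
    """Return a 24-character hex string usable as a lower _id bound.

    The first 4 bytes are the big-endian timestamp in seconds; the
    remaining 8 bytes are zero-filled so the value sorts before any
    ObjectId generated in that same second.
    """
    seconds = int(moment.timestamp())
    return struct.pack(">I", seconds).hex() + "0" * 16


start = datetime(2017, 5, 1, tzinfo=timezone.utc)
boundary = object_id_floor(start)
```

A driver's ObjectId type can usually be constructed from such a hex string and then used in a "greater than or equal" comparison on _id.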

Django Query: Annotate with `count` of a *window*

I am looking for a query that is pretty similar to this one. As an extension, however, I do not want to count all objects, only the fairly recent ones.
In my case there are two models, Source and Data. As a result I'd like to get a list of all Sources ordered by the number of data records that were collected during the last week.
I am not interested in how many data records have been collected in total, only in whether the source has been active recently.
Using the following code snippet from the above link, I cannot figure out how to filter the Data table first.
from django.db.models import Count
activity_per_source = Source.objects.annotate(count_data_records=Count('Data')) \
    .order_by('-count_data_records')
The only ways I came up with would be to write native SQL or to process this in a loop with individual queries. Is there a Django query version?
(I use a MySQL database and Django 1.5.4)
Check out the docs on the order of annotate and filter clauses: https://docs.djangoproject.com/en/1.5/topics/db/aggregation/#order-of-annotate-and-filter-clauses
Try something along the lines of:
activity_per_source = Source.objects.\
    filter(data__date__gte=one_week_ago).\
    annotate(count_data_records=Count('data')).\
    order_by('-count_data_records').distinct()
There is a way of doing that by mixing Django queries with SQL via extra:
import datetime
start_date = datetime.date.today() - datetime.timedelta(days=7)
activity_per_source = (
    Source.objects
    .extra(where=["(select max(date) from app_data where source_id=app_source.id) >= %s"],
           params=[start_date.strftime('%Y-%m-%d')])
    .annotate(count_data_records=Count('data'))
    .order_by('-count_data_records'))
The where part filters the Sources by the date of their most recent Data record.
Note: replace table and field names with actual ones.
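As a plain-Python illustration of what the filter-then-annotate query is meant to compute, the sketch below counts records per source within a 7-day window and orders the sources by that count. It uses in-memory tuples instead of Django models; `recent_activity`, the source names and the dates are all invented for the example.

```python
from collections import Counter
from datetime import date, timedelta


def recent_activity(records, today, days=7):
    """Count records per source within the window, most active first."""
    start = today - timedelta(days=days)
    counts = Counter(source for source, day in records if day >= start)
    return counts.most_common()


# (source, date) pairs standing in for Data rows
records = [
    ("sensor_a", date(2013, 10, 1)),
    ("sensor_a", date(2013, 10, 20)),
    ("sensor_b", date(2013, 10, 19)),
    ("sensor_b", date(2013, 10, 21)),
    ("sensor_c", date(2013, 9, 1)),
]
ranking = recent_activity(records, today=date(2013, 10, 22))
```

This mirrors the filter-based version, which counts only recent records, so a source with no data in the window does not appear at all. The extra-based version instead keeps sources with recent activity but counts all of their records.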
