TypeError: can't compare datetime.date to DateProperty - google-app-engine

I am trying to query if a certain date belongs to a specific range of dates. Source code example:
billing_period_found = BillingPeriod.query(
    ndb.AND(
        transaction.date > BillingPeriod.start_date,
        transaction.date < BillingPeriod.end_date)
).get()
Data definition:
class Transaction(ndb.Model):
    date = ndb.DateProperty(required=False)

class BillingPeriod(ndb.Model):
    start_date = ndb.DateProperty(required=False)
    end_date = ndb.DateProperty(required=False)
Getting the following error:
TypeError: can't compare datetime.date to DateProperty
The error message makes sense, because datetime.date is different from DateProperty. However, as you can see, transaction.date is declared as a DateProperty, not a datetime, so I don't understand where this attempt to compare a datetime.date with a DateProperty is coming from. Anyway, if I figure out how to convert a datetime.date to a DateProperty, I guess that would fix the problem.
Any ideas on how to solve this?
Thanks!

The App Engine datastore does not allow queries with inequalities on multiple properties (this is not a limitation of ndb, but of the underlying datastore). Selecting the date-range entities that contain a certain date is a typical example of a task that this restriction makes impossible to do in a single query.
Check out Optimizing a inequality query in ndb over two properties for an example of this question and, in its answer, one suggestion that might work: query for (in your case) all BillingPeriod entities with end_date greater than the desired date, perhaps with a projection to fetch only their key and start_date; then, in your own application code, keep only those with start_date less than the desired date. If you only want one of them, calling next() on the iterator will stop as soon as it finds one.
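Edit: the issue above is problem #1 with this code. Once it is solved, problem #2 arises: as documented at https://cloud.google.com/appengine/docs/python/ndb/queries, the property in an ndb query filter is always on the left of the comparison operator. So you can't write date < BillingPeriod.end_date, which puts the property on the right; write BillingPeriod.end_date > date instead. This is also where the TypeError comes from: on an entity instance, transaction.date is a plain datetime.date value, and in Python 2.7 (the runtime ndb targets) datetime.date raises exactly this TypeError when compared with a non-date object such as a DateProperty.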
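A minimal sketch of the two-step approach, with the property kept on the left of the comparison. To keep it runnable without App Engine, a namedtuple (Period, a hypothetical stand-in) replaces the BillingPeriod entities and a generator expression plays the role of the datastore query; with ndb, candidates would come from BillingPeriod.query(BillingPeriod.end_date > day).

```python
import datetime
from collections import namedtuple

# Hypothetical stand-in for BillingPeriod entities so the sketch runs without
# App Engine; real code would iterate over an ndb query instead.
Period = namedtuple("Period", ["start_date", "end_date"])


def find_billing_period(candidates, day):
    """Return the first candidate with start_date < day, or None.

    `candidates` should already satisfy end_date > day, e.g.
        BillingPeriod.query(BillingPeriod.end_date > day)
    (property on the left of the operator). The start_date inequality is
    applied here, in application code, because the datastore rejects
    inequality filters on more than one property in a single query.
    """
    # next() stops at the first match instead of scanning every candidate.
    return next((p for p in candidates if p.start_date < day), None)


periods = [
    Period(datetime.date(2015, 1, 1), datetime.date(2015, 1, 31)),
    Period(datetime.date(2015, 2, 1), datetime.date(2015, 2, 28)),
]
day = datetime.date(2015, 2, 10)  # e.g. transaction.date
# Simulate the single-property inequality filter the datastore can run.
candidates = (p for p in periods if p.end_date > day)
found = find_billing_period(candidates, day)
```

Here found is the February period. The strict inequalities mirror the original query, so a transaction dated exactly on a start_date or end_date matches nothing.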

Related

MongoDB has a bug in "$lte" (query or aggregation) when searching date ranges

Scenario:
I have a database hosted on MongoDB Atlas.
This db has a collection which, among other data, has a created field of type Date.
Pseudocode schema:
... {
created: { type: Date }
}...
I want to perform a query that finds all the documents in the collection whose created value falls between specific days, including the boundary dates.
Let's assume the date range is 2020-08-01 to 2020-08-31; the query would then be
{created: {'$gte': new Date('2020-08-01'), '$lte': new Date('2020-08-31')}}
right?
Wrong.
By doing the query this way, I only get results that are greater than or equal to "2020-08-01" and less than "2020-08-31". In other words, even though I'm performing an $lte query, I always get the $lt results.
Tests I did
So far I've tested this only on Date fields, in different collections, and I consistently get the same issue. I haven't had time to investigate other data types yet.
I've tested it with aggregation $match pipelines and with find queries in:
my codebase
a clean script that does just these operations
MongoDB Compass directly
In all 3 cases, the results are consistent and confirm the problem.
Quick fix
Simply use $lt instead of $lte and always set the upper bound one day later than you intended.
Using the previous example, the query will become
{created: {'$gte': new Date('2020-08-01'), '$lt': new Date('2020-09-01')}}
and in this case, I'm getting the expected date range "2020-08-01" - "2020-08-31" results.
Note that I could also have used $lte and would get exactly the same results; however, the $lt form is logically more correct for whoever reads the code.
Why I'm posting this
I found that a few people have posted about this issue over the years; the most relevant links are this GitHub issue (initially the developer believed the problem was with Mongoose, then a solution was proposed to check the schema, but that's not the issue, since in my case the schema is properly defined and I've tested it directly in Compass) and this Google Groups discussion (the problem is poorly stated and received no answer).
But I did not find a solution.
Even though I've worked around the issue, I wanted to describe it better and understand whether:
this is the expected behavior and I'm misunderstanding something
there is something wrong in my query
there is a problem with $lte that needs to be addressed properly
Who has ideas?
When you run new Date('2020-08-01'), the result is actually ISODate("2020-08-01T00:00:00Z").
So
{created: {'$gte': new Date('2020-08-01'), '$lte': new Date('2020-08-31')}}
becomes
{created: {'$gte': ISODate("2020-08-01T00:00:00Z"), '$lte': ISODate("2020-08-31T00:00:00Z")}}
i.e. anything later than 00:00:00Z on 2020-08-31 is excluded, so that day is effectively missing. You may also need to consider time zones: if the data was inserted as local time, it may be stored not as 2020-08-02T00:00:00Z but as, for example, 2020-08-02T02:00:00Z.
One solution is to add one day and use $lt:
{created: {'$gte': new Date('2020-08-01'), '$lt': new Date('2020-09-01')}}
or you can use Moment.js like this:
{created: {'$gte': new Date('2020-08-01'), '$lte': moment.utc('2020-08-31').endOf('day').toDate()}}
or perhaps moment.utc('2020-08-01').endOf('month').toDate() as the upper bound.
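The add-one-day approach can also be expressed in Python, for example when querying through PyMongo, which accepts datetime objects in filters. This is a sketch assuming the boundaries are calendar days in UTC; created_between is a hypothetical helper name.

```python
from datetime import date, datetime, time, timedelta, timezone


def created_between(first_day, last_day):
    """Build a filter matching every instant from the start of `first_day`
    through the end of `last_day` (both inclusive), in UTC.

    It uses a half-open interval: $gte midnight of first_day and $lt midnight
    of the day after last_day, so no time of day on last_day is lost.
    """
    start = datetime.combine(first_day, time.min, tzinfo=timezone.utc)
    end = datetime.combine(last_day + timedelta(days=1), time.min, tzinfo=timezone.utc)
    return {"created": {"$gte": start, "$lt": end}}


flt = created_between(date(2020, 8, 1), date(2020, 8, 31))
# e.g. collection.find(flt) with PyMongo (`collection` is hypothetical here)
```

The half-open form avoids guessing the "last" instant of a day (23:59:59.999 versus a finer precision), which is what the endOf('day') variant has to do.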

How to detect concurrent dates

I need an algorithm that looks simple, but I still can't think of a well-optimised way to do this.
I have the following JSON array:
[
{
"start": "2000-01-01T04:00:00.000Z",
"end": "2020-01-01T08:00:00.000Z"
}, {
"start": "2000-01-01T05:00:00.000Z",
"end": "2020-01-01T07:00:00.000Z"
}
]
As you can see, the second object is inside the range of the first. I need to iterate over this array and return which dates are conflicting.
My project is in Ruby on Rails right now, but I just need an idea of how to implement the algorithm, so any high-level programming language would be fine.
Any ideas?
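First, we can transform the list of hashes to parse the date strings into DateTime objects (Date.parse would drop the time of day):
require 'date'
# input is the parsed array with symbol keys, e.g. JSON.parse(json, symbolize_names: true)
dates = input.map do |hsh|
  hsh.transform_values { |str| DateTime.parse(str) }
end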
Now we can use a nested loop with Range#cover? to find the entries that have an endpoint inside another entry's range:
conflicting = dates.select.with_index do |date, idx|
  # does either endpoint of this entry fall inside some other entry's range?
  [date[:start], date[:end]].any? do |date_to_compare|
    dates.each_with_index.any? do |date2, idx2|
      next false if idx == idx2 # so we don't compare to self
      (date2[:start]..date2[:end]).cover?(date_to_compare)
    end
  end
end
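The nested loop above compares every entry with every other one. If the list gets long, a common alternative (a standard sort-and-sweep technique, not something taken from the answers here) is to sort by start time and keep only the intervals that haven't ended yet. The sketch below is in Python, since the question accepts any language; parse_iso and overlapping_pairs are hypothetical names, and intervals are treated as closed, so ranges that merely touch count as overlapping.

```python
from datetime import datetime


def parse_iso(s):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(s.replace("Z", "+00:00"))


def overlapping_pairs(intervals):
    """Return sorted (i, j) index pairs, i < j, whose closed intervals overlap."""
    order = sorted(range(len(intervals)), key=lambda k: intervals[k][0])
    active = []  # indices of earlier-starting intervals that may still overlap
    pairs = []
    for i in order:
        start = intervals[i][0]
        # Anything that ended before this start also ends before every later start.
        active = [j for j in active if intervals[j][1] >= start]
        # Each remaining active interval started no later than `start` and
        # ends at or after it, so it overlaps interval i.
        pairs.extend((min(i, j), max(i, j)) for j in active)
        active.append(i)
    return sorted(pairs)


data = [
    {"start": "2000-01-01T04:00:00.000Z", "end": "2020-01-01T08:00:00.000Z"},
    {"start": "2000-01-01T05:00:00.000Z", "end": "2020-01-01T07:00:00.000Z"},
]
intervals = [(parse_iso(d["start"]), parse_iso(d["end"])) for d in data]
conflicts = overlapping_pairs(intervals)
```

When few intervals overlap, the active list stays short and the sort dominates the cost; if everything overlaps, the output itself is quadratic, so no method can do much better in that case.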
Detect a DateTime Object Covered By a Range
There may be a more elegant way to do this, but this seems relatively straightforward to me. The trick is to convert your Hash values into DateTime ranges that can take advantage of the built-in Range#cover? method.
Consider the following:
require 'date'
dates = [
{:start=>"2000-01-01T04:00:00.000Z", :end=>"2020-01-01T08:00:00.000Z"},
{:start=>"2000-01-01T05:00:00.000Z", :end=>"2020-01-01T07:00:00.000Z"},
]
# convert your date hashes into an array of date ranges
date_ranges = dates.map { |hash| hash.values }.map do |array|
  (DateTime.parse(array.first)..DateTime.parse(array.last))
end
# compare consecutive pairs (0 with 1, 2 with 3, ...); report when the first covers the second
# (Range#cover? accepts a Range argument from Ruby 2.6 on)
date_ranges.each_slice(2) do |range1, range2|
  puts "#{range1} covers #{range2}" if range1.cover?(range2)
end
Because Range#cover? returns a Boolean, you might prefer to simply collect the covered ranges and do something with them later, rather than taking immediate action on each one. In that case, just use Enumerable#select. For example:
date_ranges.each_slice(2).select { |r1, r2| r1.cover? r2 }
Put the data into a database with a BTREE index on the date fields, and let the DB do the work for you.
Let's say we have the following table:
CREATE TABLE myDate (
  id BIGINT UNSIGNED, date_start DATETIME, date_end DATETIME
);
Then you want a BTREE (or B+tree) index on date_start and date_end, and a HASH index on id.
Once these are in place, load your data into the table and run the following SELECT statements to find times that overlap:
-- Query to select dates that are fully contained such as in the example (l contains r):
SELECT l.id, l.date_start, l.date_end, r.id, r.date_start, r.date_end
FROM myDate l JOIN myDate r ON (l.date_start < r.date_start) AND (l.date_end > r.date_end);
-- Query to select dates that overlap on one side (l starts first; r starts inside l and ends after it):
SELECT l.id, l.date_start, l.date_end, r.id, r.date_start, r.date_end
FROM myDate l JOIN myDate r ON (l.date_start < r.date_start) AND (r.date_start < l.date_end) AND (l.date_end < r.date_end);
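The self-join idea can be tried without a MySQL server. This Python sketch uses the standard-library sqlite3 module as a stand-in (an assumption: SQLite has no DATETIME type, but ISO 8601 strings in one fixed format compare correctly as text) and swaps in the general interval-overlap condition, start1 < end2 AND start2 < end1, which catches both containment and one-sided overlap and reports each pair once via l.id < r.id.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL table (an assumption of this sketch).
rows = [
    (1, "2000-01-01T04:00:00.000Z", "2020-01-01T08:00:00.000Z"),
    (2, "2000-01-01T05:00:00.000Z", "2020-01-01T07:00:00.000Z"),
    (3, "2021-01-01T00:00:00.000Z", "2021-01-02T00:00:00.000Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myDate (id INTEGER, date_start TEXT, date_end TEXT)")
conn.execute("CREATE INDEX idx_start ON myDate (date_start)")
conn.execute("CREATE INDEX idx_end ON myDate (date_end)")
conn.executemany("INSERT INTO myDate VALUES (?, ?, ?)", rows)

# Any overlap at all (containment or partial); l.id < r.id avoids
# self-matches and reports each pair only once.
OVERLAP_SQL = """
SELECT l.id, r.id
FROM myDate l JOIN myDate r
  ON l.id < r.id
 AND l.date_start < r.date_end
 AND r.date_start < l.date_end
ORDER BY l.id, r.id
"""
pairs = conn.execute(OVERLAP_SQL).fetchall()
```

With the question's two ranges plus a non-overlapping third row, only the pair (1, 2) comes back.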
Those strings look like ISO 8601. You should be able to parse them easily into a Date/DateTime/or similar object; the docs for those classes show how. After parsing, you can compare the objects directly with the <, <=, >= and > operators. That lets you compare starts and ends, and determine whether a range X:
(a) is fully before the other one
(b) starts before and ends within the other one
(c) is fully within the other one
(d) starts within and ends after the other one
(e) is fully after the other one
(f) is longer and fully contains the other one
I think those are all the possibilities, but you'd better double-check. Draw them all on a time axis if needed and see whether there are any others.
Once you have code that can do this classification, you're ready to implement the rest of the logic on top of it.
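The classification above can be sketched as a single function. This is a Python illustration; ranges are (start, end) tuples of anything comparable, and the handling of boundary ties (touching ranges, shared endpoints, identical ranges) is an assumption of this sketch, since the list doesn't specify it.

```python
from datetime import datetime


def classify(x, other):
    """Return the case letter "a".."f" describing range x relative to `other`.

    a: fully before    b: starts before, ends within   c: fully within
    d: starts within, ends after    e: fully after    f: fully contains
    Assumed tie rules: touching ranges count as before/after, and a range
    sharing an endpoint with `other` while lying inside it counts as "c".
    """
    x_start, x_end = x
    o_start, o_end = other
    if x_end <= o_start:
        return "a"
    if x_start >= o_end:
        return "e"
    if x_start >= o_start and x_end <= o_end:
        return "c"
    if x_start < o_start and x_end > o_end:
        return "f"
    # Remaining cases overlap partially on exactly one side.
    return "b" if x_start < o_start else "d"


outer = (datetime(2000, 1, 1, 4), datetime(2020, 1, 1, 8))
inner = (datetime(2000, 1, 1, 5), datetime(2020, 1, 1, 7))
```

With the question's data, classify(inner, outer) gives "c" and classify(outer, inner) gives "f"; conflicts are every case except "a" and "e".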
but I still can't think about a well optimised way
Don't. Write it first in any way, just to get it working and reliable. Understand the problem thoroughly, from beginning to end. Then measure its speed and quality. If it's not good enough, write a v2 based on your first guesses from those observations. Measure and compare. If it's still not good, collect the code, data sets and measurements, make sure the test cases and measurements are repeatable by readers who don't have your computer, network, passwords, etc., and then explain the problem and ask how to fix or optimize it. Without all of this, asking about "optimization"*) mostly leads to pure guessing.
*) Of course, assuming that "well optimized way" wasn't an empty buzzword but a real question about performance.

Solr dateRangeField fetch exact data considering only dates and without considering time

1) In Solr 6, I have a DateRange fieldType:
<fieldType name="daterange" class="solr.DateRangeField"/>
2) It holds the availableDateRange of hotel lodge rooms.
"availableDateRange":["[2017-01-01T00:00:00Z TO 2017-12-31T00:00:00Z]"],
3) I query for all hotel rooms available for the whole of 2017 with the following query:
fq={!field f=dateRange op=Contains}[2017 TO 2017]
4) I only get rooms that are also available on 2018-01-01, i.e. only records whose dateRange end date is one day later:
[2017-01-01T00:00:00Z TO 2018-01-01T00:00:00Z]
I know this problem can be solved if I save the end date as
"availableDateRange":["[2017-01-01T00:00:00Z TO 2017-12-31T23:59:59Z]"],
or by changing the end date in the query itself, i.e. EndDate-1.
For various reasons I can't use either of those two approaches.
Does anyone know of a setting or a change to the query (#3 in my question) that would return the following record?
"availableDateRange":["[2017-01-01T00:00:00Z TO 2017-12-31T00:00:00Z]"],
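Instead of the Contains operation, use Within, which will do the trick for you. A truncated range such as [2017 TO 2017] covers the whole year, up to the last millisecond of 2017-12-31, so a stored range ending at 2017-12-31T00:00:00Z does not contain it, but does lie within it. Note that Within also matches stored ranges that cover only part of 2017.
I ran the query fq={!field f=date op=Within}[2017 TO 2017] and it returned both the document with range [2017-01-01T00:00:00Z TO 2017-12-31T00:00:00Z] and the one with range [2017-01-01T00:00:00Z TO 2017-12-31T23:59:59Z].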
Full source of the example is here.
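To make the difference between the two operators concrete, here is a plain-Python model of the Contains and Within predicates (a simplification for illustration, not Solr's implementation). It assumes [2017 TO 2017] expands to the whole year, ending at the last millisecond of 2017-12-31, and uses naive datetimes to stand in for UTC.

```python
from datetime import datetime, timedelta

# Model of the query [2017 TO 2017]: a truncated year covers the whole year.
QUERY = (datetime(2017, 1, 1), datetime(2018, 1, 1) - timedelta(milliseconds=1))


def contains(indexed, query):
    """op=Contains: the stored range must cover the whole query range."""
    return indexed[0] <= query[0] and indexed[1] >= query[1]


def within(indexed, query):
    """op=Within: the stored range must lie entirely inside the query range."""
    return indexed[0] >= query[0] and indexed[1] <= query[1]


stored = (datetime(2017, 1, 1), datetime(2017, 12, 31))  # ends at midnight
next_year_end = (datetime(2017, 1, 1), datetime(2018, 1, 1))
june_only = (datetime(2017, 6, 1), datetime(2017, 6, 30))  # also matches Within
```

Under this model, stored fails Contains but passes Within, next_year_end passes Contains, and june_only passes Within even though that room is not available all year.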

Django Query Optimisation

I am currently working on a telecom analytics project and am a newbie at query optimisation. Showing the result in the browser takes a full minute, even though only 45,000 records are accessed. Could you please suggest ways to reduce the time it takes to show the results?
I wrote the following query to find the average call duration per person in an age group:
from django.db.models import Sum

sigma = 0
popn = len(Demo.objects.filter(age_group=age))
card_list = [Demo.objects.filter(age_group=age)[i].card_no
             for i in range(popn)]
for card in card_list:
    dic = Fact_table.objects.filter(card_no=card).aggregate(Sum('duration'))
    sigma += dic['duration__sum']
avgDur = sigma / popn
The code above is inside a for loop that iterates over the age groups.
The models are as follows:
class Demo(models.Model):
    card_no = models.CharField(max_length=20, primary_key=True)
    gender = models.IntegerField()
    age = models.IntegerField()
    age_group = models.IntegerField()

class Fact_table(models.Model):
    pri_key = models.BigIntegerField(primary_key=True)
    card_no = models.CharField(max_length=20)
    duration = models.IntegerField()
    time_8bit = models.CharField(max_length=8)
    time_of_day = models.IntegerField()
    isBusinessHr = models.IntegerField()
    Day_of_week = models.IntegerField()
    Day = models.IntegerField()
Thanks
Try this:
from django.db.models import Sum

demo_by_age = Demo.objects.filter(age_group=age)
popn = demo_by_age.count()  # one: a single COUNT query
card_list = demo_by_age.values_list('card_no', flat=True)  # two: lazy, used as a subquery below
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration'))  # three
sigma = dic['duration__sum']
avgDur = sigma / popn
A statement like card_list=[Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn separate queries and database hits. The query in the for loop will also hit the database popn times. As a general rule, you should minimize the number of queries you run, and select only the records you need.
With a few adjustments to your code this can be done in just one query.
There's generally no need to specify a primary_key manually, and in all but some very specific cases it's better not to define one. Django automatically adds an indexed, auto-incrementing primary key field. If you need card_no to be unique and you need to look up rows by it, use this:
class Demo(models.Model):
    card_no = models.SlugField(max_length=20, unique=True)
    ...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as when it is a primary key. This still allows other ways to access the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field specified by Django, and will ease the use of the model in Django.
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes the models easier to use, but also makes many queries faster by using JOIN clauses in the SQL query. So for your example:
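class Fact_table(models.Model):
    # on_delete is required from Django 2.0 on; CASCADE is one possible choice
    card = models.ForeignKey(Demo, on_delete=models.CASCADE, related_name='facts')
    ...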
The related_name argument lets you access all Fact_table objects related to a Demo instance via instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
from django.db.models import Avg

age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
    print("Age group: %s - Average duration: %s" % (group['age_group'], group['duration_avg']))
.values('age_group') selects just the age_group field from the Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), and for each unique result will fetch all Fact_table objects related to any Demo object within that age_group, and calculate the average of all the duration fields - all in a single query.
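One thing worth checking before switching: Avg('facts__duration') averages over individual calls, while the question's avgDur divides the total duration by the number of people in the group, so the two figures generally differ. The plain-Python sketch below (no Django needed; group_averages and its tuple layout are hypothetical) computes both from the same data. If the per-person figure is the one you want, annotating with Sum('facts__duration') and Count('id', distinct=True) and dividing should reproduce it, though that has not been verified against these exact models.

```python
from collections import Counter, defaultdict


def group_averages(people, calls):
    """Return {age_group: (per_person, per_call)}.

    people: iterable of (card_no, age_group) pairs, one per person.
    calls:  iterable of (card_no, duration) pairs, one per call.
    per_person = total call duration / number of people in the group,
                 which is what the question's avgDur computes.
    per_call   = mean duration over the group's calls, which is what an
                 Avg('facts__duration') annotation computes (None if no calls).
    """
    group_of = dict(people)
    population = Counter(group_of.values())
    total = defaultdict(int)
    n_calls = defaultdict(int)
    for card_no, duration in calls:
        group = group_of[card_no]
        total[group] += duration
        n_calls[group] += 1
    return {
        g: (total[g] / population[g], total[g] / n_calls[g] if n_calls[g] else None)
        for g in population
    }


people = [("a", 1), ("b", 1)]
calls = [("a", 10), ("a", 20), ("b", 30)]
result = group_averages(people, calls)
```

Here group 1 has 60 seconds of calls spread over 2 people and 3 calls, so the per-person figure is 30.0 while the per-call figure is 20.0.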

SQL Query Notifications and GetDate()

I am currently working on a query that is registered for Query Notifications. In accordance with the rules of Notification Services, I can only use deterministic functions in queries set up for subscription. However, GetDate() (and almost any other means I can think of) is non-deterministic. Whenever I pull my data, I would like to limit the result set to only the relevant records, which are determined by the current day.
Does anyone know of a work around that I could use that would allow me to use the current date to filter my results but not invalidate the query for query notifications?
Example Code:
SELECT fcDate as RecordDate, fcYear as FiscalYear, fcPeriod as FiscalPeriod, fcFiscalWeek as FiscalWeek, fcIsPeriodEndDate as IsPeriodEnd, fcPeriodWeek as WeekOfPeriod
FROM dbo.bFiscalCalendar
WHERE fcDate >= GetDate() -- This line invalidates the query for notification...
Other thoughts:
We have an application controls table in our database that we use to store application-level settings. I thought of writing a small script that keeps a record up to date with the current smalldatetime. However, my join to this table also fails for notification, and I am not sure why. I surmise that it has something to do with my specifying a text type (the column name), which is frustrating.
Example Code 2:
SELECT fcDate as RecordDate, fcYear as FiscalYear, fcPeriod as FiscalPeriod, fcFiscalWeek as FiscalWeek, fcIsPeriodEndDate as IsPeriodEnd, fcPeriodWeek as WeekOfPeriod
FROM dbo.bFiscalCalendar
INNER JOIN dbo.xApplicationControls ON fcDate >= acValue AND acName = N'Cache_CurrentDate'
Does anyone have any suggestions?
EDIT: Here is a link on MSDN that gives the rules for Notification Services
As it turns out, I found the solution. I was invalidating my query attempts because I was casting a value to DateTime, which marks the expression as non-deterministic. Even if you don't write an explicit cast but do something like:
RecordDate = 'date_string_value'
you still end up with an implicit date cast. Hopefully this will help someone else who hits this issue.
This link helped me quite a bit.
http://msdn.microsoft.com/en-us/library/ms178091.aspx
A good way to bypass this is simply to create a view that just says "SELECT GetDate() AS Now", then use the view in your query.
EDIT: I see nothing in the rules about not using user-defined functions (which is where I've used the 'view today' trick). So can you use a UDF in the query that points at the view?
