Microsoft Recognizer Issue - text-parsing

A couple of years ago I wrote a proof-of-concept function that parses a date string and calculates the end date it describes. It never made it to production for a number of reasons, but all the functions worked. For example, the string
999 years less 1 day from 16 August 2000
should evaluate to 15th August 2999. I wrote this function using the Microsoft Recognizers library (V1.3.2) and it all worked.
var culture = Culture.MapToNearestLanguage("en-gb");
DateTimeOptions dateTimeOptions;
dateTimeOptions = DateTimeOptions.ExtendedTypes;
var results = DateTimeRecognizer.RecognizeDateTime(leaseTermPhrase, culture, dateTimeOptions);
results would return 3 elements derived from the string, a duration of 999 years, a duration of 1 day and a date of 16 August 2000, and using that data I was able to calculate the end date (I made an assumption that any second duration should be taken away from the end date).
I am now revisiting this code as the need for it has come back into scope. I updated the Recognizer packages to the latest version (1.8.2) and now the unit test that tests this particular format of string fails (other string formats still pass).
Upon investigation I find that the results variable now contains only two parts: a duration of 999 years and a date of 15 August 2000. So the new library is parsing '1 day from 16 August 2000' as a single date entity instead of two separate ones (a duration and a date).
Does anyone know if it is possible to get the Recognizer to produce the same results as previously?
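The end-date arithmetic described in the question can be sketched independently of the Recognizer library. This is a minimal Python illustration, not the asker's C# code: the function name `lease_end_date` and the 29 February fallback are assumptions, and it presumes the two durations and the start date have already been extracted from the phrase.

```python
from datetime import date, timedelta

def lease_end_date(start: date, years: int, less_days: int = 0) -> date:
    """Add whole years to start, then subtract less_days (hypothetical helper)."""
    try:
        shifted = start.replace(year=start.year + years)
    except ValueError:
        # start is 29 February and the target year is not a leap year;
        # falling back to 28 February is an assumption of this sketch.
        shifted = start.replace(year=start.year + years, day=28)
    return shifted - timedelta(days=less_days)

# "999 years less 1 day from 16 August 2000"
print(lease_end_date(date(2000, 8, 16), 999, 1))  # 2999-08-15
```

Treating the second duration as a subtraction mirrors the assumption stated in the question.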

Related

Can anyone help me identify this timestamp format?

I am importing information from an Oracle database on an AIX machine into SQL Server 2008r2. I inherited this process from the previous DBA. The timestamp comes in the following format: 4170180534, which, based on the conversion function in the executable, converts to the following:
417 = year (2017)
018 = days since beginning of year (018 converts to Jan 18)
0534 = time HH:mm
I need to provide maintenance on the conversion function (the previous DBA retired in 2016, so the date conversion function only works through the end of 2016).
Can anyone tell me exactly what this timestamp format is? I assume the '4' stands for the century, but it would be nice to know for sure what the first digit of the value actually is.
The 4 should stand for weeks since the start of the year.
The format for that would be
(weeks since 1st Jan, last 2 digits of year, days since 1st Jan, hours, minutes)
WW IY DDD HH MI
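For maintaining the conversion function, the field layout described in the question can be sketched as below. This is only an illustration in Python under stated assumptions: the function name `decode_stamp` is hypothetical, the two-digit year is assumed to mean 20xx, and the leading digit is returned untouched because its meaning is exactly what is in doubt.

```python
from datetime import datetime, timedelta

def decode_stamp(stamp: str) -> tuple[str, datetime]:
    """Split a 10-digit stamp into (leading digit, decoded datetime)."""
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError(f"expected 10 digits, got {stamp!r}")
    prefix = stamp[0]                 # meaning unknown; kept as-is
    year = 2000 + int(stamp[1:3])     # assumption: two-digit year is 20xx
    day_of_year = int(stamp[3:6])     # 001 = 1 January
    hour = int(stamp[6:8])
    minute = int(stamp[8:10])
    moment = datetime(year, 1, 1, hour, minute) + timedelta(days=day_of_year - 1)
    return prefix, moment

prefix, moment = decode_stamp("4170180534")
print(prefix, moment)  # 4 2017-01-18 05:34:00
```

Comparing the leading digit across rows with known dates would show whether it tracks the week, the century, or something else.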

Stata: Convert String to Date

I am relatively new to Stata. I have a string variable time that records year and month in the following format:
2000m1
2000m2
2000m3
...
2015m12
I would first like to create a date variable that looks identical (but it has to be in the date format) to the above. Second, I would like to separate year and month components into two different variables, and third, I would like to rename the month component to January, February, etc.
For the first task, the command date = date(time, "YM") returns an empty variable and I can't figure out what I am doing wrong.
The function date() yields daily dates, not monthly dates or any other kind of date that isn't a daily date. See its help (help date()) which begins
date(s1,s2[,Y])
Description: the e_d date (days since 01jan1960) corresponding to s1
based on s2 and Y
s1 contains the date, recorded as a string, in virtually
any format. Months can be spelled out, abbreviated (to
three characters), or indicated as numbers; years can
include or exclude the century; blanks and punctuation are
allowed.
s2 is any permutation of M, D, and [##]Y, with their order
defining the order that month, day, and year occur in s1.
##, if specified, indicates the default century for
two-digit years in s1. For instance, s2="MD19Y" would
translate s1="11/15/91" as 15nov1991.
In essence, it needs to be told a day, month and year. You supplied a month and year, and date() won't (can't) play.
As documented at the same place, daily() is a synonym for the same function and it's good practice to use it to remind yourself (and readers of your code) of what it does.
Correspondingly, monthly() provides an easier solution to create a monthly date from string input than in your own answer. Try out solutions using display (di is allowed) on simple cases where you know the right answer.
. di monthly("2000m1", "YM")
480
. di %tm monthly("2000m1", "YM")
2000m1
Reading the documentation is crucial here. See help datetime for a start. There is a lot to explain as dates come in many different forms, but it's all documented.
See also help datetime_display_formats for how to display dates differently. (No "renaming" is involved here.) For example,
. di %tmMonth_CCYY monthly("2000m1", "YM")
January 2000
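The value 480 shown above is a count of months since January 1960, which is how Stata stores monthly dates. The arithmetic can be checked outside Stata with a short Python sketch; the helper names `parse_ym` and `stata_monthly` are made up for illustration and are not Stata functions.

```python
import calendar

def parse_ym(text: str) -> tuple[int, int]:
    """Split a string such as '2000m1' into (year, month)."""
    year_text, month_text = text.lower().split("m")
    year, month = int(year_text), int(month_text)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {text!r}")
    return year, month

def stata_monthly(year: int, month: int) -> int:
    """Months elapsed since 1960m1, the numeric value behind a %tm date."""
    return (year - 1960) * 12 + (month - 1)

year, month = parse_ym("2000m1")
print(stata_monthly(year, month))                # 480
print(calendar.month_name[month], year)          # January 2000
```

This also covers the second and third tasks in the question: the year and month come out as separate values, and the month name is a display matter rather than a change to the data.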
I figured out the first part. I'm posting the answer here for anybody who needs a reference:
gen date = ym(real(substr(time, 1,4)),real(substr(time,6,2)))
format date %tm
Try this code; maybe it works for you:
string Date1 = String.Format("{0}", Request.Form["date"]);
string Date2 = String.Format("{0}", Request.Form["date1"]);
Date1 = DateTime.Parse(Date1).ToString("yyyy-MM-dd");
Date2 = DateTime.Parse(Date2).ToString("yyyy-MM-dd");

Why does the WinForms DateTimePicker support a maximum of 12/31/9998 23:59:59 instead of 12/31/9999 23:59:59?

Does anyone know why the control does not support higher values like 12/31/9999? I am looking for the particular reason for this.
From the DateTimePicker.cs source code file, as visible at the ReferenceSource site:
[Browsable(false), EditorBrowsable(EditorBrowsableState.Never)]
public static readonly DateTime MaxDateTime = new DateTime(9998, 12, 31);
[Browsable(false), EditorBrowsable(EditorBrowsableState.Never)]
public static readonly DateTime MinDateTime = new DateTime(1753, 1, 1);
These limits are checked in the Value property setter before it pinvokes the native control to set the date.
Not 100% sure where these limits came from. They do follow the common pattern, a programmer taking a shortcut to avoid dealing with an awkward problem. Some common examples:
COM dates can't go lower than 1900. That was a shortcut taken by a Lotus programmer, working on the once dominant spreadsheet program called "123". He didn't deal with the year 1900 not being a leap year. Microsoft had to copy the bug in Excel to keep it compatible with Lotus spreadsheets.
The year 1753, as used in DateTimePicker.MinDateTime, was a shortcut taken by a Sybase programmer, the company that started SQL Server. It is the first full year after England switched from the Julian to the Gregorian calendar in September 1752. The switch caused 11 days to get lost, the amount by which Julian dates had drifted by not properly handling leap years. Not having to deal with invalid dates was obviously desirable. Putting that limit into DTP avoids data-binding problems.
DateTime.MinValue being the year 1 is a shortcut for not having to deal with negative DateTime.Ticks values.
DateTime.MaxValue stopping just short of the year 10,000 was a shortcut around a problem with TimeSpan.TotalMilliseconds, which returns a double, a value type that has up to 15 significant digits. Going beyond 10,000 requires more digits.
Which inspires an explanation for the year 9998: there are plenty of icky problems getting close to DateTime.MaxValue. For example, SQL Server conks out at 3 milliseconds before midnight, .NET at 100 nanoseconds before midnight. DateTimePicker uses local time, which can cause that maximum to be exceeded in various time zones throughout the day of December 31st. So the Microsoft programmer did what most any other programmer did before him and took a shortcut:
public static DateTime MaximumDateTime {
    get {
        DateTime maxSupportedDateTime = CultureInfo.CurrentCulture.Calendar.MaxSupportedDateTime;
        if (maxSupportedDateTime.Year > MaxDateTime.Year)
        {
            return MaxDateTime;
        }
        return maxSupportedDateTime;
    }
}
This is of course never a real problem, it is not meaningful to handle dates that far into the future. Use the MaximumDateTime property if you need some kind of validity check in your own code.
This is another non-answer, but maybe it'll be useful to someone. Following the WINAPI route in my comment, SYSTEMTIME is limited to dates between 1601 and 30827 because it is based on a FILETIME structure, which stores time as a 64-bit count of 100ns ticks since #1/1/1601#. It further only allows values less than 0x8000000000000000, which results in the year 30827 upper limit.
The .NET DateTimePicker control is based on the WINAPI Date and Time Picker control, so it makes sense that it would have at least these limits. The documentation mentions the switch from the Julian to the Gregorian calendar in 1753, which may explain the #1/1/1753# limit coded into the .NET control.
That may help to explain the lower limit, but still doesn't explain the upper limit. Unless someone from the development team chimes in, the only answer to "why?" may be "because it's hardcoded that way".
{Edit: the justification for the 1601 date for SYSTEMTIME appears to be that it was the previous start of a 400-year cycle in the proleptic Gregorian calendar. Still doesn't help explain #12/31/9998#.}
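The year 30827 figure can be checked with a little arithmetic. The sketch below is Python rather than WINAPI code, and the function name `year_of_tick` is hypothetical; it only uses the facts stated above (100 ns ticks, a 1 January 1601 epoch, and values below 0x8000000000000000) plus the 146,097-day length of a Gregorian 400-year cycle.

```python
import calendar

TICKS_PER_DAY = 10_000_000 * 86_400   # 100 ns ticks in one day
DAYS_PER_400_YEARS = 146_097          # one full Gregorian leap-year cycle

def year_of_tick(ticks: int) -> int:
    """Calendar year containing the instant `ticks` after 1601-01-01 00:00."""
    days = ticks // TICKS_PER_DAY
    cycles, days = divmod(days, DAYS_PER_400_YEARS)
    year = 1601 + 400 * cycles
    # Walk forward one year at a time through the remaining partial cycle.
    while True:
        length = 366 if calendar.isleap(year) else 365
        if days < length:
            return year
        days -= length
        year += 1

first_rejected_tick = 0x8000000000000000
print(year_of_tick(first_rejected_tick))  # 30828
```

The first rejected tick falls partway through the year 30828, so 30827 is the last year that fits entirely below the bound, which is consistent with the limit quoted above.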
This isn't a proper answer (though I'm still trying to dig). Looking at the source there is a field defined in the DateTimePicker:
[Browsable(false)]
[EditorBrowsable(EditorBrowsableState.Never)]
public static readonly DateTime MaxDateTime = new DateTime(9998, 12, 31);
In your question you ask why it is "12/31/9998 23:59:59" when in fact it is only "12/31/9998 00:00:00", which is almost more peculiar.
I then searched for usages of this field. It seems to be used only as an absolute bound for the date time picker and nothing else. In fact, the MaximumDateTime property looks like this:
public static DateTime MaximumDateTime
{
    get
    {
        DateTime supportedDateTime = CultureInfo.CurrentCulture.Calendar.MaxSupportedDateTime;
        if (supportedDateTime.Year > DateTimePicker.MaxDateTime.Year)
            return DateTimePicker.MaxDateTime;
        else
            return supportedDateTime;
    }
}
So the maximum date is actually defined by either the current culture's maximum supported date or the absolute maximum supported by the DateTimePicker, whichever is less in terms of the year part. For the "en-US" culture the maximum supported date is equal to DateTime.MaxValue, so the MaxDateTime field is used instead.
Again, this isn't meant to answer why the particular value is used, but to give more insight on how it is being used within the DateTimePicker itself.

SQL date values converted to integers

Ok, I can't understand this thing.
A customer of mine has a legacy Windows application (to produce invoices) which stores date values as integers.
The problem is that what is represented like '01.01.2002' (value type: date) is indeed stored in SQL Server 2000 as 731217 (column type: integer).
Is it an already known methodology to convert date values into integers (for example - I don't know - in order to make date difference calculations easier?)
By the way, I have to migrate those data into a new application, but for as much I googled about it I can't figure out the algorithm used to apply such conversion.
Can anybody bring some light?
It looks like the number of days since Jan 1st 0000 (although that year doesn't really exist).
Anyway, take a date as a reference, like Jan 1st 2000, and look at what integer you have for that date (something like 730121).
You then take the difference between the integer you have for a particular date and the one for your reference date, and add that number of days to your reference date with the DATEADD function.
DATEADD(day, *difference (eg 731217 - 730121)*, *reference date in proper SQLServer format*)
You can adjust if you're off by a day or two.
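The single data point in the question (731217 for 01.01.2002) is consistent with counting days so that day 1 is 1 January of a proleptic Gregorian year 0. That is an inference from one value, not a documented property of the legacy application, so it should be verified against several known rows before migrating. A minimal Python sketch under that assumption, with hypothetical helper names:

```python
from datetime import date

# Python's ordinal counts 0001-01-01 as day 1. The assumed legacy scheme
# counts 0000-01-01 as day 1, and year 0 would be a leap year (366 days).
YEAR_ZERO_OFFSET = 366

def day_number_to_date(day_number: int) -> date:
    """Convert the stored integer to a date (assumed scheme, see above)."""
    return date.fromordinal(day_number - YEAR_ZERO_OFFSET)

def date_to_day_number(value: date) -> int:
    """Inverse conversion, useful for spot-checking known rows."""
    return value.toordinal() + YEAR_ZERO_OFFSET

print(day_number_to_date(731217))            # 2002-01-01
print(date_to_day_number(date(2000, 1, 1)))  # 730486
```

If the spot checks hold, the same offset approach carries over to DATEADD with any reference date whose integer is known.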

Flip month and day in SQL Server DateTime field

I have a portion of my data that has been parsed incorrectly (due to earlier mistakes in handling Culture) so that the month and the day should be flipped. Is there an easy way to do this in SQL Server? Fortunately, it is still early enough in the dataset that it is easy to locate the bad data.
Update I think this will work:
SELECT seen,
       DATEADD(DAY,
               DATEDIFF(DAY, seen,
                        CONVERT(DATETIME,
                                CAST(YEAR(seen) AS VARCHAR(4))
                                + RIGHT('0' + CAST(DAY(seen) AS VARCHAR(2)), 2)
                                + RIGHT('0' + CAST(MONTH(seen) AS VARCHAR(2)), 2),
                                112)),
               seen)
FROM TermStats
WHERE seen < '2011-09-01' AND DAY(seen) <= 12
But I think I can do better. All good dates are after 9/1. (You can tell I really lucked out here... lol)
SELECT DISTINCT YEAR(seen), MONTH(seen), DAY(seen)
FROM TermStats
ORDER BY YEAR(seen), MONTH(seen), DAY(seen)
2006 2 13
2011 3 9
2011 4 9
2011 5 9
2011 6 9
2011 9 3
2011 9 4
2011 9 5
2011 9 6
Personally I wouldn't do it through string manipulation. It's entirely possible that you can do it nicely in a stored procedure, but I'd personally (as someone with little SQL experience) write a client side tool to fix it:
Fetch the ID and date of every row you need to fix (fetching the date/time value as a date/time value, not as a string)
Create the correct date, e.g. in C# using new DateTime(dt.Year, dt.Day, dt.Month); to flip the fields of the wrong date
Update the database with a parameterized query - again, not converting the dates into strings
Basically, wherever you can, avoid conversion between text and other formats. It only leads to the kind of pain you've already discovered.
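The flip step from the list above can be sketched without any string conversion. This is a Python illustration rather than the C# mentioned in the answer; the function name `swap_month_and_day` is hypothetical, and fetching and updating rows through parameterized queries is left out.

```python
from datetime import datetime

def swap_month_and_day(value: datetime) -> datetime:
    """Return value with month and day exchanged, keeping the time of day."""
    if value.day > 12:
        # A day above 12 cannot have come from a swapped month.
        raise ValueError(f"{value} cannot be a month/day mix-up")
    return value.replace(month=value.day, day=value.month)

wrong = datetime(2011, 3, 9, 10, 30)   # stored as 9 March, meant 3 September
print(swap_month_and_day(wrong))       # 2011-09-03 10:30:00
```

Unlike building a new value from year, month and day alone, this keeps the time component of the original value.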
