Linq query where First() appears to return null when results are not found - sql-server

I was presented with the following (this has been simplified for the question):
int programId = 3;
int refugeeId = 5;
var q = (from st in Students
         join cr in Class_Rosters on st.StudentId equals cr.StudentId
         join pp in Student_Program_Part on cr.StudentId equals pp.StudentId
         from refg in (Student_Program_Participation_Values
             .Where(rudf => rudf.ProgramParticipationId == pp.ProgramParticipationId
                         && rudf.UDFId == refugeeId)).DefaultIfEmpty()
         where cr.ClassId == 22898
            && pp.ProgramId == programId
         select new
         {
             StudentId = st.StudentId,
             Refugees = refg.Value ?? "IT WAS NULL",
             Refugee = Student_Program_Participation_Values
                 .Where(rudf => rudf.ProgramParticipationId == pp.ProgramParticipationId
                             && rudf.UDFId == refugeeId)
                 .Select(rudf => (rudf.Value == null ? "IT WAS NULL" : "NOT NULL!"))
                 .First() ?? "First Returned NULL!",
         });
q.Dump();
In the above query the Student_Program_Participation_Values table does not have records for all students. The Refugees value properly returns the Value or "IT WAS NULL" when there is a missing record in Student_Program_Participation_Values. However, the Refugee column returns either "NOT NULL!" or "First Returned NULL!".
My question is: why does "First Returned NULL!" appear at all? In my experience with LINQ, calling First() on an empty sequence throws an exception, but in this query it appears to do something completely different. Note that refg.Value is never null in the database (there is either a valid value or no record at all).
Note also that this is LINQ to SQL, and we are running this query in LINQPad.
To clarify, here is some sample output:
StudentId  Refugees     Refugee
22122      True         NOT NULL!
2332       IT WAS NULL  First Returned NULL!
In the output above, when Refugees shows "IT WAS NULL" there was no record in the Student_Program_Participation_Values table, so I expected First() to throw an exception. Instead it returned null, so Refugee shows "First Returned NULL!".
Any ideas?
Update: Enigmativity pushed me in the right direction. He pointed out that I was fixated on the First() call, but because this is an IQueryable, First() wasn't really a method call at all. It was simply translated into "TOP (1)" in the query. This was obvious once I looked at the generated SQL in LINQPad. Below is the important part of the generated SQL, which makes it clear what is happening and why. I won't paste the entire thing, since it's enormous and not germane to the discussion.
...
COALESCE((
    SELECT TOP (1) [t12].[value]
    FROM (
        SELECT
            (CASE
                WHEN 0 = 1 THEN 'IT WAS NULL'
                ELSE CONVERT(NVarChar(11), 'NOT NULL!')
            END) AS [value], [t11].[ProgramParticipationId], [t11].[UDFId]
        FROM [p_Student_Program_Participation_UDF_Values] AS [t11]
    ) AS [t12]
    WHERE ([t12].[ProgramParticipationId] = [t3].[ProgramParticipationId]) AND ([t12].[UDFId] = @p8)
), 'First Returned NULL!') AS [value3]
...
So here you can clearly see that LINQ converted First() into TOP (1) and also determined that "IT WAS NULL" could never happen (hence the 0 = 1). When there is no matching row, the TOP (1) subquery simply returns NULL instead of throwing, and the surrounding COALESCE turns that NULL into 'First Returned NULL!'.
So it was all a perception mistake on my part. I wasn't separating in my mind that LINQ to SQL (and LINQ to Entities, for that matter) behaves very differently from calling the same-named methods on Lists and other in-memory collections.
I hope my mistake is useful to someone else.
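The same contrast can be reproduced outside LINQPad. Below is a minimal sketch in Python using the standard sqlite3 module. It assumes SQLite as a stand-in for SQL Server, with LIMIT 1 playing the role of TOP (1), and the tables `students` and `udf_values` are hypothetical simplifications of the ones in the question. An empty scalar subquery comes back as NULL, which COALESCE replaces. A "first" over an empty in-memory sequence raises instead.

```python
import sqlite3


def first_in_memory(values):
    """Mimic LINQ to Objects First(): return the first item, or raise if empty."""
    for value in values:
        return value
    raise ValueError("Sequence contains no elements")


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY);
    CREATE TABLE udf_values (student_id INTEGER, value TEXT NOT NULL);
    INSERT INTO students VALUES (22122), (2332);
    INSERT INTO udf_values VALUES (22122, 'True');
""")

# Roughly the shape LINQ to SQL generated: First() became a TOP (1) / LIMIT 1
# scalar subquery and "?? ..." became COALESCE. An empty subquery yields NULL.
rows = conn.execute("""
    SELECT s.student_id,
           COALESCE((SELECT 'NOT NULL!'
                     FROM udf_values AS u
                     WHERE u.student_id = s.student_id
                     LIMIT 1),
                    'First Returned NULL!') AS refugee
    FROM students AS s
    ORDER BY s.student_id
""").fetchall()
print(rows)  # [(2332, 'First Returned NULL!'), (22122, 'NOT NULL!')]

# The same "first" over an empty in-memory sequence raises instead.
try:
    first_in_memory([])
    empty_first_raised = False
except ValueError:
    empty_first_raised = True
print(empty_first_raised)  # True
```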

Without having your database I couldn't test this code, but try it anyway and see if it works.
var q =
    (
        from st in Students
        join cr in Class_Rosters on st.StudentId equals cr.StudentId
        where cr.ClassId == 22898
        join pp in Student_Program_Part on cr.StudentId equals pp.StudentId
        where pp.ProgramId == programId
        select new
        {
            StudentId = st.StudentId,
            refg =
                Student_Program_Participation_Values
                    .Where(rudf =>
                        rudf.ProgramParticipationId == pp.ProgramParticipationId
                        && rudf.UDFId == refugeeId)
                    .ToArray()
        }
    ).ToArray(); // materialize here: everything below runs in memory
var q2 =
    from x in q
    from refg in x.refg.DefaultIfEmpty()
    select new
    {
        StudentId = x.StudentId,
        // In memory, DefaultIfEmpty() yields null for a missing record, so guard before .Value
        Refugees = (refg == null ? null : refg.Value) ?? "IT WAS NULL",
        // x.refg is the sequence (refg is a single element); in memory, First() on an
        // empty array throws InvalidOperationException instead of returning null
        Refugee = x.refg
            .Select(rudf => (rudf.Value == null ? "IT WAS NULL" : "NOT NULL!"))
            .First() ?? "First Returned NULL!",
    };
q2.Dump();
Basically, the idea is to pull the records cleanly from the database into memory, and then do all the null handling there. If this works, it is because the LINQ was not translated into SQL with the same semantics as the in-memory code. The translated SQL can sometimes be a little off, so you don't get the results you expect. It's like translating English into French: you might not get the correct translation.
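The "bring it into memory, then handle the empty case yourself" idea can be sketched in Python with sqlite3. This is only an illustration of the approach, not the answer's C# code. The tables and data are hypothetical, and a dict of lists stands in for the per-student arrays produced by ToArray().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY);
    CREATE TABLE udf_values (student_id INTEGER, value TEXT NOT NULL);
    INSERT INTO students VALUES (22122), (2332);
    INSERT INTO udf_values VALUES (22122, 'True');
""")

# Step 1: pull the raw rows into memory (the equivalent of the ToArray() calls).
student_ids = [sid for (sid,) in conn.execute(
    "SELECT student_id FROM students ORDER BY student_id")]
matches_by_student = {}
for sid, value in conn.execute("SELECT student_id, value FROM udf_values"):
    matches_by_student.setdefault(sid, []).append(value)

# Step 2: decide explicitly, in ordinary code, what "no matching record" means.
result = []
for sid in student_ids:
    matches = matches_by_student.get(sid, [])
    refugees = matches[0] if matches else "IT WAS NULL"
    result.append((sid, refugees))

print(result)  # [(2332, 'IT WAS NULL'), (22122, 'True')]
```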

Related

Weird SQL Error (Bug)

So this is really weird.
I run a SQL command from .NET against SQL Server with a 'SELECT COUNT(*)' and get back a response like "Needs attention CA" (a varchar value from one field of one record in the inner-joined tables).
Huh? How can COUNT(*) return a string? 999 times out of 1000 this code executes correctly. But sometimes, on some clients' servers, it throws a string of errors for an hour or so and then miraculously stops again.
This is my sqlcommand:
SELECT Count(*)
FROM patientsappointments
INNER JOIN appointmenttypes
ON patientsappointments.appointmenttypeid =
appointmenttypes.appointmenttypeid
WHERE ( ( patientsappointments.date > @WeekStartDate
AND patientsappointments.date < @WeekFinishDate )
AND ( patientsappointments.status = 'Pending' )
AND ( patientsappointments.doctorid = @DoctorID )
AND ( appointmenttypes.appointmentname <> 'Note' ) )
And these are the parameters:
@WeekStartDate = 24/06/2013 12:00:00 AM (DateTime)
@WeekFinishDate = 1/07/2013 12:00:00 AM (DateTime)
@DoctorID = 53630c67-3a5a-406f-901c-dbf6b6d1b20f (UniqueIdentifier)
I call sqlcmd.ExecuteScalar to get the result. Any ideas?
The actual executed code is:
SyncLock lockRefresh
    Dim WeekFulfilled, WeekPending As Integer
    Using conSLDB As New SqlConnection(modLocalSettings.conSLDBConnectionString)
        Dim mySQL As SqlCommand
        mySQL = New SqlCommand("SELECT COUNT(*) FROM PatientsAppointments INNER JOIN AppointmentTypes ON PatientsAppointments.AppointmentTypeID = AppointmentTypes.AppointmentTypeID " & _
            "WHERE ((PatientsAppointments.Date > @WeekStartDate AND PatientsAppointments.Date < @WeekFinishDate) AND (PatientsAppointments.Status = 'Pending') " & _
            "AND (PatientsAppointments.DoctorID = @DoctorID) AND (AppointmentTypes.AppointmentName <> 'Note'))", conSLDB)
        Try
            mySQL.Parameters.Add("@WeekStartDate", SqlDbType.DateTime).Value = MonthCalendar1.SelectionStart.Date.AddDays(-MonthCalendar1.SelectionStart.Date.DayOfWeek).AddDays(1)
            mySQL.Parameters.Add("@WeekFinishDate", SqlDbType.DateTime).Value = MonthCalendar1.SelectionStart.Date.AddDays(-MonthCalendar1.SelectionStart.Date.DayOfWeek).AddDays(8)
            mySQL.Parameters.Add("@DoctorID", SqlDbType.UniqueIdentifier).Value = cboDoctors.SelectedValue
            conSLDB.Open()
            'got errors here like "Conversion from string "R2/3" to type 'Integer' is not valid." Weird.
            'failing on deadlock - maybe due to simultaneous updating from udp event. Try adding random delay to refresh
            WeekPending = mySQL.ExecuteScalar
        Catch ex As Exception
            ErrorSender.SendError("frmAppointmentBook - RefreshHeader 1", ex, New String() {String.Format("mySQL.commandtext: {0}", mySQL.CommandText), _
                String.Format("mySQL.Parameters: {0}", clsErrorSender.ParamsListToString(mySQL.Parameters))})
        End Try
        Me.lblPendingWeek.Text = WeekPending
        Try
            mySQL.CommandText = "SELECT COUNT(*) FROM PatientsAppointments INNER JOIN AppointmentTypes ON PatientsAppointments.AppointmentTypeID = AppointmentTypes.AppointmentTypeID WHERE " & _
                "(PatientsAppointments.Date > @WeekStartDate AND PatientsAppointments.Date < @WeekFinishDate) AND (PatientsAppointments.Status = 'Fulfilled') AND " & _
                "(PatientsAppointments.DoctorID = @DoctorID) AND (AppointmentTypes.AppointmentName <> 'Note')"
            'didn't get the error here... but just in case...
            WeekFulfilled = mySQL.ExecuteScalar
        Catch ex As Exception
            ErrorSender.SendError("frmAppointmentBook - RefreshHeader 2", ex, New String() {String.Format("mySQL.commandtext: {0}", mySQL.CommandText)})
        End Try
        conSLDB.Close()
    End Using
End SyncLock
The exact error message is:
System.InvalidCastException
Conversion from string "Needs Attention DC" to type 'Integer' is not valid.
Your problem has nothing to do with the COUNT(*) portion of your code. The problem is somewhere else in your query. What that particular error is telling you is that at some point you are comparing a character field (one that probably usually contains numbers) to an integer field. One of the values of the character field happens to be "Needs Attention DC". If I had to guess, it is probably either patientsappointments.appointmenttypeid or appointmenttypes.appointmenttypeid. Double-check the data type of each of those columns to make sure they are in fact INT. If they are both INT, then start checking the other explicitly named columns in your query to see if there are any surprises.
You must have an error somewhere in your implementation...
Per the documentation, COUNT always returns a value of type int.
Since this doesn't always happen, it must be a result of one of the parameter values that is sent in. This is one of the biggest problems with using dynamic SQL. What I would do is create the dynamic SQL and then store it in a database logging table, along with the date and time and the user who executed it. Then, when you get the exception, you can find the exact SQL code that was sent. Most likely you need more controls on the input variables to ensure the data placed in them is of the correct data type.
I am going to make another guess: I think this is a multithreading issue. You are probably sharing the connection between multiple threads, and once in a while a thread picks up the result of a command that another thread executed. Make sure that the connection variable is local, and that only one thread can access it at a time.
As Martin points out, the following answer is wrong. I'm keeping it here to show why it is wrong.
From what everyone has already said, there is a type mismatch on your columns. Since your WHERE clause appears to be fine, and your join is fine, it must be elsewhere. I would check whether patientsappointments or appointmenttypes are views. Maybe a view has a join that's throwing the exception. Check the schema definition of all your joins and WHERE clauses. Somewhere in there you're storing integers in a character field. That works for most rows, but one of them contains your string.
If it's not in your views, it may be a trigger somewhere. The point is that somewhere there is a schema mismatch. Once you find your schema mismatch, you can find the row by querying for that string.

Query with integers not working

I've searched Stack Overflow and other sources but haven't found a solution to this.
The query below works as expected, except when either custinfo.cust_cntct_id or custinfo.cust_corrcntct_id = '' (blank, not NULL). Then I get no results. Both are integer fields, and if both have an integer value I get results. I still want a value returned for either cntct_email or corrcntct_email even if custinfo.cust_cntct_id or custinfo.cust_corrcntct_id is blank.
Can someone help me out in making this work? The database is PostgreSQL.
SELECT
cntct.cntct_email AS cntct_email,
corrcntct.cntct_email AS corrcntct_email
FROM
public.custinfo,
public.invchead,
public.cntct,
public.cntct corrcntct
WHERE
invchead.invchead_cust_id = custinfo.cust_id AND
cntct.cntct_id = custinfo.cust_cntct_id AND
corrcntct.cntct_id = custinfo.cust_corrcntct_id;
PostgreSQL won't actually let you test an integer field for a blank value (unless you're using a truly ancient version - 8.2 or older), so you must be using a query generator that's "helpfully" transforming '' to NULL or a tool that's ignoring errors.
Observe this, on Pg 9.2:
regress=> CREATE TABLE test ( a integer );
CREATE TABLE
regress=> insert into test (a) values (1),(2),(3);
INSERT 0 3
regress=> SELECT a FROM test WHERE a = '';
ERROR: invalid input syntax for integer: ""
LINE 1: SELECT a FROM test WHERE a = '';
If you are attempting to test for = NULL, this is not correct. You must use IS NULL (or IS NOT DISTINCT FROM NULL) instead, and IS NOT NULL (or IS DISTINCT FROM NULL) for the opposite test. Testing with = NULL always results in NULL, which is treated as false in a WHERE clause.
Example:
regress=> insert into test (a) values (null);
INSERT 0 1
regress=> SELECT a FROM test WHERE a = NULL;
a
---
(0 rows)
regress=> SELECT a FROM test WHERE a IS NULL;
a
---
(1 row)
regress=> SELECT NULL = NULL as wrong, NULL IS NULL AS right;
wrong | right
-------+-------
| t
(1 row)
By the way, you should really be using ANSI JOIN syntax. It's more readable. With a comma-separated FROM list it's also much easier to forget a join condition and get a Cartesian product by accident. I'd rewrite your query as follows, with identical functionality and performance but better readability:
SELECT
cntct.cntct_email AS cntct_email,
corrcntct.cntct_email AS corrcntct_email
FROM
public.custinfo ci
INNER JOIN public.invchead
ON (invchead.invchead_cust_id = ci.cust_id)
INNER JOIN public.cntct
ON (cntct.cntct_id = ci.cust_cntct_id)
INNER JOIN public.cntct corrcntct
ON (corrcntct.cntct_id = ci.cust_corrcntct_id);
Use of table aliases usually keeps it cleaner; here I've aliased the longer name custinfo to ci for brevity.
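The rewrite above still uses INNER JOINs. A customer whose cust_cntct_id or cust_corrcntct_id has no matching contact therefore still disappears. A common way to meet the question's actual requirement is a LEFT JOIN for the two contact tables. That is a different technique from the answer's rewrite, not part of it. The sketch below checks the difference with Python's sqlite3 on a tiny, made-up dataset, with SQLite standing in for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE custinfo (cust_id INTEGER, cust_cntct_id INTEGER, cust_corrcntct_id INTEGER);
    CREATE TABLE invchead (invchead_cust_id INTEGER);
    CREATE TABLE cntct (cntct_id INTEGER, cntct_email TEXT);
    INSERT INTO custinfo VALUES (1, 10, NULL);  -- no correspondence contact
    INSERT INTO invchead VALUES (1);
    INSERT INTO cntct VALUES (10, 'main@example.com');
""")

# INNER JOIN: the NULL cust_corrcntct_id matches nothing, so the whole row is dropped.
inner = conn.execute("""
    SELECT c.cntct_email, cc.cntct_email
    FROM custinfo ci
    INNER JOIN invchead i ON i.invchead_cust_id = ci.cust_id
    INNER JOIN cntct c ON c.cntct_id = ci.cust_cntct_id
    INNER JOIN cntct cc ON cc.cntct_id = ci.cust_corrcntct_id
""").fetchall()

# LEFT JOIN: the row is kept, and the missing contact's email comes back as NULL.
left = conn.execute("""
    SELECT c.cntct_email, cc.cntct_email
    FROM custinfo ci
    INNER JOIN invchead i ON i.invchead_cust_id = ci.cust_id
    LEFT JOIN cntct c ON c.cntct_id = ci.cust_cntct_id
    LEFT JOIN cntct cc ON cc.cntct_id = ci.cust_corrcntct_id
""").fetchall()

print(inner)  # []
print(left)   # [('main@example.com', None)]
```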

Using a bit input in stored procedure to determine how to filter results in the where clause

I'm beating my head against the wall here... can't figure out a way to pull this off.
Here's my setup:
My table has a column for the date something was completed. If it was never completed, the field is null. Simple enough.
On the front end, I have a checkbox that defaults to "Only show incomplete entries". When only pulling incomplete entries, it's easy.
SELECT
*
FROM Sometable
WHERE Completed_Date IS NULL
But offering the checkbox option complicates things a great deal. My checkbox inputs a bit value: 1=only show incomplete, 0=show all.
The problem is, I can't use a CASE statement within the where clause, because an actual value uses "=" to compare, and checking null uses "IS". For example:
SELECT
*
FROM Sometable
WHERE Completed_Date IS <---- invalid syntax
CASE WHEN
...
END
SELECT
*
FROM Sometable
WHERE Completed_Date =
CASE WHEN @OnlyIncomplete = 1 THEN
NULL <----- this translates to "WHERE Completed_Date = NULL", which won't work.. I have to use "IS NULL"
...
END
Any idea how to accomplish this seemingly easy task? I'm stumped... thanks.
...
WHERE @OnlyIncomplete = 0
OR (@OnlyIncomplete = 1 AND Completed_Date IS NULL)
Hmmm... I think what you want is this:
SELECT
*
FROM Sometable
WHERE Completed_Date IS NULL OR (@OnlyIncomplete = 0)
So that'll show the rows where Completed_Date is NULL, plus, if @OnlyIncomplete = 0, the rows where it is not NULL. Yeah, I think that's it.
If you still want to use a CASE function (although it may be overkill in this case) :
SELECT
*
FROM Sometable
WHERE 1 =
(CASE WHEN @OnlyIncomplete = 0 THEN 1
WHEN @OnlyIncomplete = 1 AND Completed_Date IS NULL THEN 1
END)
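Both the OR form and the CASE form of the flag filter can be checked side by side. The sketch below uses Python's sqlite3 as a stand-in for SQL Server. The named parameter `:only_incomplete` plays the role of the stored procedure's bit parameter, and the three rows in the table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sometable (id INTEGER PRIMARY KEY, completed_date TEXT);
    INSERT INTO sometable VALUES (1, NULL), (2, '2013-06-24'), (3, NULL);
""")

# Flag = 0 makes the first disjunct true for every row; flag = 1 leaves only IS NULL.
OR_FILTER = """
    SELECT id FROM sometable
    WHERE :only_incomplete = 0 OR completed_date IS NULL
    ORDER BY id
"""
# CASE form: rows that match no WHEN branch get NULL, and 1 = NULL is never true.
CASE_FILTER = """
    SELECT id FROM sometable
    WHERE 1 = CASE WHEN :only_incomplete = 0 THEN 1
                   WHEN :only_incomplete = 1 AND completed_date IS NULL THEN 1
              END
    ORDER BY id
"""


def ids(sql, only_incomplete):
    return [row[0] for row in conn.execute(sql, {"only_incomplete": only_incomplete})]


print(ids(OR_FILTER, 1))    # [1, 3]
print(ids(OR_FILTER, 0))    # [1, 2, 3]
print(ids(CASE_FILTER, 1))  # [1, 3]
```

The OR form is usually the easier one to read. The CASE form returns the same rows.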

SQL to LINQ Expression

I have a specific SQL expression:
{
select * from courceMCPD.dbo.Contact c
where c.cID in ( select cId from courceMCPD.dbo.Friend f where f.cId=5)
}
I would like to get a LINQ expression that produces the same result.
Thank you in advance.
That sounds like it's equivalent to something like:
var friendIds = from friend in db.Friends
where friend.ContactId == 5
select friend.ContactId;
var query = from contact in db.Contacts
where friendIds.Contains(contact.Id)
select contact;
(There are lots of different ways of representing the query, but that's the simplest one I could think of.)
It's pretty odd to perform a join on a particular field and also mandate that that field has to have a particular value though... there's not very much difference between that and:
var query = db.Contacts.Where(c => c.Id == 5);
... the only difference is whether there are any friend entries for that particular contact.
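The equivalence described above can be checked directly in SQL. Below is a small sketch using Python's sqlite3 with hypothetical `contact` and `friend` tables. It also shows that IN behaves as a semi-join: two friend rows for contact 5 still produce a single contact row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (cID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friend (fID INTEGER PRIMARY KEY, cId INTEGER);
    INSERT INTO contact VALUES (5, 'has friends'), (6, 'other'), (7, 'no friends');
    INSERT INTO friend VALUES (1, 5), (2, 5), (3, 6);
""")

# The original query: contacts whose cID appears in the filtered friend list.
in_rows = conn.execute("""
    SELECT c.cID FROM contact c
    WHERE c.cID IN (SELECT f.cId FROM friend f WHERE f.cId = 5)
""").fetchall()

# Equivalent form: contact 5, kept only if at least one friend row exists for it.
exists_rows = conn.execute("""
    SELECT c.cID FROM contact c
    WHERE c.cID = 5 AND EXISTS (SELECT 1 FROM friend f WHERE f.cId = c.cID)
""").fetchall()

print(in_rows)      # [(5,)]
print(exists_rows)  # [(5,)]
```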
EDIT: Smudge gave another option for the query in a comment, so I'm promoting it into this answer...
var query = db.Contacts.Where(c => c.Friends.Any(f => f.cId == 5));
This assumes you've got an appropriate Friends relationship defined in the Contacts entity.
Using lambda expressions:
var query = dc.Contact
    .Where(c => dc.Friend.Select(f => f.cId).Contains(c.cID))
    .Where(c => c.cID == 5);
Using query syntax:
var query = from c in dc.Contact
            where (from f in dc.Friend select f.cId).Contains(c.cID)
            where c.cID == 5
            select c;
You haven't specified VB/C# so I'm going for VB =P
Dim results As IEnumerable(Of yourEntities.Contact) =
(From c In yourContextInstance.Contacts Where (From f In yourContextInstance.Friends
Where f.cId = 5 Select f.cId).Contains(c.cID))
Jon's answer clearly works; this query (I believe) just resembles your T-SQL more closely.

SQL Server: Why does comparison null=value return true for NOT IN?

Why does the comparison of value to null return false, except when using a NOT IN, where it returns true?
Given a query to find all stackoverflow users who have a post:
SELECT * FROM Users
WHERE UserID IN (SELECT UserID FROM Posts)
This works as expected; i get a list of all users who have a post.
Now query for the inverse; find all stackoverflow users who don't have a post:
SELECT * FROM Users
WHERE UserID NOT IN (SELECT UserID FROM Posts)
This returns no records, which is incorrect.
Given hypothetical data¹

Users
UserID  Username
------  --------
1       atkins
2       joels
...     ...
399573  gt6989b
...     ...

Posts
PostID  UserID  Subject
------  ------  ----------------------
1       1       Welcome to stack ov...
2       2       Welcome all!
...     ...     ...
10592   null    (deleted by nsl&fbi...
...     ...     ...
And assume the rules of NULLs:
NULL = NULL evaluates to unknown
NULL <> NULL evaluates to unknown
value = NULL evaluates to unknown
If we look at the 2nd query, we're interested in finding all rows where the Users.UserID is not found in the Posts.UserID column. i would proceed logically as follows:
Check UserID 1
1 = 1 returns true. So we conclude that this user has some posts, and do not include them in the output list
Now check UserID 2:
2 = 1 returns false, so we keep looking
2 = 2 returns true, so we conclude that this user has some posts, and do not include them in the output list
Now check UserID 399573
399573 = 1 returns false, so we keep looking
399573 = 2 returns false, so we keep looking
...
399573 = null returns unknown, so we keep looking
...
We found no posts by UserID 399573, so we would include him in the output list.
Except SQL Server doesn't do this. If you have a NULL in your IN list, it suddenly finds a match: suddenly 399573 = null evaluates to true.
Why does the comparison of value to null return unknown, except when it returns true?
Edit: i know that i can work around this nonsensical behavior by specifically excluding the nulls:
SELECT * FROM Users
WHERE UserID NOT IN (
SELECT UserID FROM Posts
WHERE UserID IS NOT NULL)
But i shouldn't have to, as far as i can tell the boolean logic should be fine without it - hence my question.
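The behavior can be reproduced on a small scale with Python's sqlite3. SQLite applies the same NULL rules to NOT IN as SQL Server does here, and the rows are a cut-down version of the hypothetical data above. The sketch runs the original query, the workaround from the edit, and NOT EXISTS. NOT EXISTS is a different formulation, not something used in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (UserID INTEGER PRIMARY KEY, Username TEXT);
    CREATE TABLE posts (PostID INTEGER PRIMARY KEY, UserID INTEGER, Subject TEXT);
    INSERT INTO users VALUES (1, 'atkins'), (2, 'joels'), (399573, 'gt6989b');
    INSERT INTO posts VALUES (1, 1, 'Welcome'), (2, 2, 'Welcome all!'), (10592, NULL, 'deleted');
""")


def user_ids(sql):
    return [row[0] for row in conn.execute(sql)]


# One NULL in the subquery makes every NOT IN test either false or unknown.
naive = user_ids("SELECT UserID FROM users WHERE UserID NOT IN (SELECT UserID FROM posts)")

# The workaround from the edit: remove NULLs from the list first.
filtered = user_ids("""
    SELECT UserID FROM users
    WHERE UserID NOT IN (SELECT UserID FROM posts WHERE UserID IS NOT NULL)
""")

# NOT EXISTS: a post with a NULL UserID never satisfies p.UserID = u.UserID, so it is ignored.
not_exists = user_ids("""
    SELECT u.UserID FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM posts p WHERE p.UserID = u.UserID)
""")

print(naive, filtered, not_exists)  # [] [399573] [399573]
```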
Footnotes
¹ Hypothetical data; if you don't like it, make up your own.
Common problem, canned answer:
The behavior of NOT IN clause may be confusing and as such it needs some explanations. Consider the following query:
SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)
Although there are more than a thousand distinct last names in AdventureWorks.Person.Contact, the query returns nothing. This may look counterintuitive to a beginner database programmer, but it actually makes perfect sense. The explanation consists of several simple steps. First of all, consider the following two queries, which are clearly equivalent:
SELECT LastName, FirstName FROM Person.Contact
WHERE LastName IN('Hedlund', 'Holloway', NULL)
SELECT LastName, FirstName FROM Person.Contact
WHERE LastName='Hedlund' OR LastName='Holloway' OR LastName=NULL
Note that both queries return expected results. Now, let us recall DeMorgan's theorem, which states that:
not (P and Q) = (not P) or (not Q)
not (P or Q) = (not P) and (not Q)
I am cutting and pasting from Wikipedia (http://en.wikipedia.org/wiki/De_Morgan_duality). Applying DeMorgan's theorem to these queries, it follows that these two queries are also equivalent:
SELECT LastName, FirstName FROM Person.Contact WHERE LastName NOT IN('Hedlund', 'Holloway', NULL)
SELECT LastName, FirstName FROM Person.Contact
WHERE LastName<>'Hedlund' AND LastName<>'Holloway' AND LastName<>NULL
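This last condition, LastName<>NULL, can never be true (it evaluates to unknown), so the whole WHERE clause is never true and no rows are returned.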
The assumption in your first sentence isn't right:
Why does the comparison of value to null return false, except when using a NOT IN, where it returns true?
But comparison of a value to null does not return false; it returns unknown. And unknown has its own logic:
unknown AND true = unknown
unknown OR true = true
unknown OR false = unknown
One example of how this works out:
where 1 not in (2, null)
--> where 1 <> 2 and 1 <> null
--> where true and unknown
--> where unknown
The where clause only matches on true, so this filters out any row.
You can find the full glory of three-valued logic on Wikipedia.
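The truth-table entries and the NOT IN expansion above can be evaluated directly. In the sketch below, SQLite is accessed through Python's sqlite3 and stands in for SQL Server. It reports unknown as NULL, which arrives in Python as None. The `ev` helper is hypothetical and only builds SELECTs from fixed demo expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def ev(expr):
    """Evaluate a fixed SQL boolean expression; unknown comes back as None."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


print(ev("NULL AND 1"))                # None  (unknown AND true = unknown)
print(ev("NULL OR 1"))                 # 1     (unknown OR true = true)
print(ev("NULL OR 0"))                 # None  (unknown OR false = unknown)
print(ev("1 NOT IN (2, NULL)"))        # None
print(ev("(1 <> 2) AND (1 <> NULL)"))  # None, the same as the NOT IN above

# WHERE keeps a row only when its condition is true, so unknown filters it out.
no_rows = conn.execute("SELECT 'row' WHERE 1 NOT IN (2, NULL)").fetchall()
print(no_rows)  # []
```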
