LINQ to SQL anomaly - sql-server

I'm trying to learn LINQ to SQL. I've run into something I just don't get. Here's the LINQ program (vb.net):
Imports System.IOModule Module1
Sub Main()
Dim crs = New DataClasses1DataContext()
Dim sw As New StringWriter()
crs.Log = sw
Dim reports = From report In crs.CRS_Report_Masters
Group report By report_id = report.Report_ID Into grouped = Group
Select New With {
.reportId = report_id,
.two = grouped.Sum(
Function(row) row.active_report * row.Report_ID)
}
For Each report In reports
Console.WriteLine("{0} {1}", report.reportId, report.two)
Next
MsgBox(sw.GetStringBuilder().ToString())
End Sub
End Module
Here's the SQL it produces:
SELECT SUM([t1].[value]) AS [two], [t1].[Report_ID] AS [reportId] FROM (
SELECT (-(CONVERT(Float,[t0].[active_report]))) *
(CONVERT(Float,CONVERT(Float,[t0].[Report_ID]))) AS [value], [t0].[Report_ID]
FROM [dbo].[CRS_Report_Master] AS [t0]
) AS [t1] GROUP BY [t1].[Report_ID]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.30319.1
What I don't get is why there is a minus sign in the SQL before the math in the parentheses. I didn't specify that in the LINQ query.

If I recall correctly, VB.NET treats -1 as Boolean True, while SQL treats +1 as Boolean True. Therefore the minus sign is necessary for VB.NET to properly interpret the active_report field.

So, now I see what's going on. The LINQ to SQL definition of the column corresponding to a SQL bit is a .net Boolean. Fair enough. When doing math using it, however, the compiler negates the value as described by ekolis. I did not expect that -- in fact I think it's an error, since I do not get the results I expect. e.g. if active_report=1 and report_id=123, I expect to get "123" but I'm getting "-123". I modified the query like this:
Dim reports = (From report In crs.CRS_Report_Masters
Group report By report_id = report.Report_ID Into grouped = Group
Select New With {
.reportId = report_id,
.two = grouped.Sum(
Function(row) If(row.active_report, 1, 0) * CInt(row.Report_ID))
})
which changed the generated SQL to:
SELECT SUM([t1].[value]) AS [two], [t1].[Report_ID] AS [reportId] FROM (
SELECT (
(CASE
WHEN (COALESCE([t0].[active_report],#p0)) = 1 THEN #p1
ELSE #p2
END)) * (CONVERT(Int,[t0].[Report_ID])) AS [value], [t0].[Report_ID]
FROM [dbo].[CRS_Report_Master] AS [t0]
) AS [t1] GROUP BY [t1].[Report_ID]
-- #p0: Input Int (Size = -1; Prec = 0; Scale = 0) [0]
-- #p1: Input Int (Size = -1; Prec = 0; Scale = 0) [1]
-- #p2: Input Int (Size = -1; Prec = 0; Scale = 0) [0]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.30319.1
Makes sense I suppose, though it's a roundabout way get the desired result. In T-SQL, a simple SELECT [BOOLEAN]*[INTEGER] will achieve the same result, I believe. Lesson learned: Don't trust LINQ to always do the right thing. Check its work!

Related

Stored procedure - get anticipated columns before fully executing statement?

I'm working through a stored procedure and wondering if there's a way to retrieve the anticipated result column list from a sql statement before fully executing.
Scenarios:
dynamic SQL
a UDF that might vary the columns outside of our control
EX:
//inbound parameter
SET QUERY_DEFINITION_ID = 12345;
//Initial statement pulls query text from bank of queries
var sqlText = getQueryFromQueryBank(QUERY_DEFINITION_ID);
//now we run our query
var cmd = {sqlText: sqlText };
stmt = snowflake.createStatement(cmd);
What I'd like to be able to do is say "right - before you run this, give me the anticipated column list" so I can compare it to what's expected.
EX:
Expected: [col1, col2, col3, col4]
Got: [col1]
Result: Oops. Don't run.
Rationale here is that I want to short-circuit the execution if something is missing - before it potentially runs for a while. I can validate all of this after the fact, but it would be really helpful to stop early.
Any ideas very much appreciated!
This sample SP code shows how to get a list of columns that a query will project into the result before you run the query. It should only be used for large, long running queries because it will take a few seconds to get the column list.
There are a couple of caveats. 1) It will only return the names of the columns. It won't tell you how they were built, that is, whether they're aliased, direct from a table, calculated, etc. 2) The example query I used is straight from the Snowflake documentation here https://docs.snowflake.com/en/user-guide/sample-data-tpcds.html#functional-query-definition. For convenience, I minimized the query to a single line. The output of the columns includes object qualifiers in addition to the column names, so V1.I_CATEGORY, V1.D_YEAR, V1.D_MOY, etc. If you don't want them to make it easier to compare names, you can strip off the qualifiers using the JavaScript split function on the dot and take index 1 of the resulting array.
create or replace procedure EXPLAIN_BEFORE_RUNNING()
returns string
language javascript
execute as caller
as
$$
// Set the context for the session to the TPC-H sample data:
executeNonQuery("use schema snowflake_sample_data.tpcds_sf10tcl;");
// Here's a complex query from the Snowflake docs (minimized to one line for convienience):
var sql = `with v1 as( select i_category, i_brand, cc_name, d_year, d_moy, sum(cs_sales_price) sum_sales, avg(sum(cs_sales_price)) over(partition by i_category, i_brand, cc_name, d_year) avg_monthly_sales, rank() over (partition by i_category, i_brand, cc_name order by d_year, d_moy) rn from item, catalog_sales, date_dim, call_center where cs_item_sk = i_item_sk and cs_sold_date_sk = d_date_sk and cc_call_center_sk= cs_call_center_sk and ( d_year = 1999 or ( d_year = 1999-1 and d_moy =12) or ( d_year = 1999+1 and d_moy =1)) group by i_category, i_brand, cc_name , d_year, d_moy), v2 as( select v1.i_category ,v1.d_year, v1.d_moy ,v1.avg_monthly_sales ,v1.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum from v1, v1 v1_lag, v1 v1_lead where v1.i_category = v1_lag.i_category and v1.i_category = v1_lead.i_category and v1.i_brand = v1_lag.i_brand and v1.i_brand = v1_lead.i_brand and v1.cc_name = v1_lag.cc_name and v1.cc_name = v1_lead.cc_name and v1.rn = v1_lag.rn + 1 and v1.rn = v1_lead.rn - 1) select * from v2 where d_year = 1999 and avg_monthly_sales > 0 and case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 order by sum_sales - avg_monthly_sales, 3 limit 100;`;
// Before actually running the query, generate an explain plan.
executeNonQuery("explain " + sql);
// Now read the column list from the explain plan from the result set.
var columnList = executeSingleValueQuery("COLUMN_LIST", `select "expressions" as COLUMN_LIST from table(result_scan(last_query_id())) where "operation" = 'Result';`);
// For now, just exit with the column list as the output...
return columnList;
// Your code here...
// Helper functions:
function executeNonQuery(queryString) {
var out = '';
cmd = {sqlText: queryString};
stmt = snowflake.createStatement(cmd);
var rs;
rs = stmt.execute();
}
function executeSingleValueQuery(columnName, queryString) {
var out;
cmd1 = {sqlText: queryString};
stmt = snowflake.createStatement(cmd1);
var rs;
try{
rs = stmt.execute();
rs.next();
return rs.getColumnValue(columnName);
}
catch(err) {
if (err.message.substring(0, 18) == "ResultSet is empty"){
throw "ERROR: No rows returned in query.";
} else {
throw "ERROR: " + err.message.replace(/\n/g, " ");
}
}
return out;
}
$$;
call Explain_Before_Running();

how to for each loop on data table within another datatable

I have a web poll application that reads and processes orders from sql server back end and inserts into windss(jda).
the flow
1) I read all the orders from the sql server tables that is not processed
2) then I need to divide the orders into two sets
- set 1 orders with bundle kit( 1 order with several suborders)
- set 2 is just orders with no bundle kit
3) then after each order is processed I need to update the isprocessed sql field to 1
1) code for reading all orders
Frmmain.vb
dt = WebDatabase.GetAllOrdersFromDatabase()**'this method is below**
For Each drow In dt.Rows
If CDbl(WinDSSStoreNumber) = 123 Or CDbl(WinDSSStoreNumber) = 124 Then
' Store 123 and 124 = customer service - only orders with qualifying source codes
my question is:
the above for each checks if the store is 123 or 124 but now I want to implement another for each loop that reads all the InvoiceHeader_Id that the second field in the datatable and then check if it has a bundle kit or not if it does then process it and update the isprocessed field second last in the datatable else process the orders without any bundle kit and update isprocessed for those as well. please any help is generously appreciated, please ask me any questions in the below comment before marking my question down.
WebData.vb
Public Function GetAllOrdersFromDatabase() As DataTable
DrivePath = "C:\Users\somjething\Documents\files\somefiles\"
Dim ds As DataTable = Nothing
WinDSS.connectionString = ConfigurationManager.AppSettings("WinDSS_Connection").Replace("%DrivePath%", DrivePath)
ds = WinDSS.GetSysMst()
If ds.Rows.Count > 0 Then
WinDSSStoreNumber = ds.Rows(0)("store_no")
End If
Dim dt As New DataTable()
Dim conn As New SqlConnection(ConfigurationManager.AppSettings("WebData_Connection") & ConfigurationManager.AppSettings("WebDataSource"))
Dim cmd As SqlCommand
Dim da As New SqlDataAdapter
conn.Open()
cmd = conn.CreateCommand()
cmd.CommandType = CommandType.Text
cmd.CommandText= SELECT InvoiceDetail_Id, InvoiceHeader_Id, ActualFreightCharge, AmountCollected,
ChargedActualFreight, CollectedExternally, CollectedThroughAR, DateInvoiced, InvoiceNo,
InvoiceType, LineSubTotal, MasterInvoiceNo, OrderInvoiceKey, Reference1, TotalAmount,
TotalTax, isprocessed, storetoprocess
FROM dbo.InvoiceHeader
WHERE (isprocessed = 0) AND (storetoprocess = N'123')
da.SelectCommand = cmd
da.Fill(dt)
conn.Close()
Return dt
End Function
Additoinal info
This is the information given to me by my manager:
loop throught each below
SELECT InvoiceDetail_Id, InvoiceHeader_Id, ActualFreightCharge, AmountCollected,
ChargedActualFreight, CollectedExternally, CollectedThroughAR, DateInvoiced, InvoiceNo,
InvoiceType, LineSubTotal, MasterInvoiceNo, OrderInvoiceKey, Reference1, TotalAmount,
TotalTax, isprocessed, storetoprocess
FROM dbo.InvoiceHeader WHERE (isprocessed = 0) AND (storetoprocess = N'195')
Use the above InvoiceHeader_Id to loop throught all of the order information. Process each Bundle (Kit/)
SELECT InvoiceHeader_Id, LineDetails_Id, LineDetail_Id, OrderLine_Id, GiftFlag, [References],
GiftWrap, IsBundleParent, KitCode, KitQty, LevelOfService, LineSeqNo, LineType,
MaxLineStatus, MaxLineStatusDesc, MinLineStatus, MinLineStatusDesc, MinShipByDate,
OpenQty, OrderHeaderKey, OrderLineKey, OrderedQty, OriginalOrderedQty, OtherCharges,
OverallStatus, PipelineKey, PrimeLineNo, ReceivingNode, RemainingQty, ReqCancelDate,
ReqDeliveryDate,
ReqShipDate, SCAC, ScacAndService, ScacAndServiceKey, ShipNode, ShipToID, ShipToKey,
StatusQuantity, SubLineNo, SubstituteItemID, isprocessed
FROM dbo.OrderLine
WHERE (isprocessed = 0) AND (InvoiceHeader_Id = 13) AND (IsBundleParent = 'Y')
ORDER BY PrimeLineNo, SubLineNo
For each record in the above list query below and add as Zero (0) dollar amount. The cost is associated with the information above record (you may have to query tables to ge the values)
Get the OrderLineKey from the above record and query this below to get the associated sub items.
SELECT TOP (100) PERCENT dbo.BundleParentLine.InvoiceHeader_Id AS BPL_InvoiceHeader_Id,
dbo.BundleParentLine.LineDetails_Id AS BPL_LineDetails_Id, dbo.BundleParentLine.LineDetail_Id AS BPL_LineDetail_Id, dbo.BundleParentLine.OrderLine_Id AS BPL_OrderLine_Id, dbo.BundleParentLine.BundleParentLine_id, dbo.BundleParentLine.SubLineNo AS BPL_SubLineNo, dbo.BundleParentLine.PrimeLineNo AS BPL_PrimeLineNo, dbo.BundleParentLine.OrderLineKey AS BPL_OrderLineKey, dbo.OrderLine.*
FROM dbo.BundleParentLine INNER JOIN dbo.OrderLine ON dbo.BundleParentLine.OrderLine_Id = dbo.OrderLine.OrderLine_Id AND dbo.BundleParentLine.LineDetail_Id = dbo.OrderLine.LineDetail_Id AND dbo.BundleParentLine.LineDetails_Id = dbo.OrderLine.LineDetails_Id AND dbo.BundleParentLine.InvoiceHeader_Id = dbo.OrderLine.InvoiceHeader_Id
WHERE (dbo.BundleParentLine.OrderLineKey = N'76873264832') AND (dbo.BundleParentLine.InvoiceHeader_Id = 13) AND (dbo.BundleParentLine.LineDetails_Id = 6) AND (dbo.OrderLine.isprocessed = 0)
ORDER BY BPL_PrimeLineNo, BPL_SubLineNo
After Processed, set isprocessed = 1
Use the above InvoiceHeader_Id to loop throught all of the order information
SELECT InvoiceHeader_Id, LineDetails_Id, LineDetail_Id, OrderLine_Id, GiftFlag, [References],
GiftWrap, IsBundleParent, KitCode, KitQty, LevelOfService, LineSeqNo,
LineType,
MaxLineStatus, MaxLineStatusDesc, MinLineStatus, MinLineStatusDesc, MinShipByDate,
OpenQty, OrderHeaderKey, OrderLineKey, OrderedQty,
OriginalOrderedQty,
OtherCharges, OverallStatus, PipelineKey, PrimeLineNo, ReceivingNode, RemainingQty,
ReqCancelDate, ReqDeliveryDate, ReqShipDate, SCAC,
ScacAndService,
ScacAndServiceKey, ShipNode, ShipToID, ShipToKey, StatusQuantity,
SubLineNo, SubstituteItemID, isprocessed
FROM dbo.OrderLine
WHERE (isprocessed = 0) AND (InvoiceHeader_Id = 13) AND (IsBundleParent <> 'Y')
ORDER BY PrimeLineNo, SubLineNo
After Processed, set isprocessed = 1
This is not an answer but I want to pose this to prompt you for some clarification and it's not practical in the comments.
First, close visual studio and open SQL Server Management Studio and connect to the database
We'll build some select statements to get the data back and then we can think about an UPDATE statement
This first query you posted gives you the unprocessed headers for store 195 (straight from your post). I don't know if you want to do this for all stores or not. Paste it into SSMS and run it
SELECT H.*
FROM dbo.InvoiceHeader H
WHERE H.isprocessed = 0 AND H.storetoprocess = N'195'
Now, this second query gives you the unprocessed lines against that header that have IsBundleParent='Y'.
Look! No loops!
SELECT H.*, L.*
FROM dbo.InvoiceHeader H
INNER JOIN
dbo.OrderLine L
ON L.InvoiceHeader_Id = H.InvoiceHeader_Id
WHERE H.isprocessed = 0 AND H.storetoprocess = N'195'
AND L.isprocessed = 0 AND L.IsBundleParent = 'Y'
Then in your explanation you have For each record in the above list query below and add as Zero (0) dollar amount which makes no sense to me so perhaps you could clarify
Now we'll add the third query in which adds BundleParentLine (which does have inner joins in it already, but it's a pretty strange list of join fields)
SELECT H.*, L.*, BPL.*
FROM dbo.InvoiceHeader H
INNER JOIN
dbo.OrderLine L
ON L.InvoiceHeader_Id = H.InvoiceHeader_Id
INNER JOIN
dbo.BundleParentLine BPL
ON BPL.OrderLine_Id = L.OrderLine_Id
AND BPL.LineDetail_Id = L.LineDetail_Id
AND BPL.LineDetails_Id = L.LineDetails_Id
AND BPL.InvoiceHeader_Id = L.InvoiceHeader_Id
WHERE H.isprocessed = 0 AND H.storetoprocess = N'195'
AND L.isprocessed = 0 AND L.IsBundleParent = 'Y'
Then in your explanation you have After Processed, set isprocessed = 1, but what is 'processed' here? Is it just setting it to processed in the table?
To do that you just write an UPDATE statement. Here's one way to do that to update just the header (don't run it though until we understand what you're trying to do). Is the problem here that you want to set header and line to processed at the same time?
UPDATE InvoiceHeader
SET Processed = 1
WHERE isprocessed = 0 AND storetoprocess = N'195'
The fourth query you have seems to account for the IsBundleParent <> 'Y' case
Again, we need to know what add as Zero (0) dollar amount and processed means to go any further.

Converting complex sql stored proc into linq

I'm using Linq to Sql and have a stored proc that won't generate a class. The stored proc draws data from multiple tables into a flat file resultset.
The amount of data returned must be as small as possible, the number of round trips to the Sql Server need to be limited, and the amount of server-side processing must be limited as this is for an ASP.NET MVC project.
So, I'm trying to write a Linq to Sql Query however am struggling to both replicate and limit the data returned.
Here's the stored proc that I'm trying to convert:
SELECT AdShops.shop_id as ID, Users.image_url_75x75, AdShops.Advertised,
Shops.shop_name, Shops.title, Shops.num_favorers as hearts, Users.transaction_sold_count as sold,
(select sum(L4.num_favorers) from Listings as L4 where L4.shop_id = L.shop_id) as listings_hearts,
(select sum(L4.views) from Listings as L4 where L4.shop_id = L.shop_id) as listings_views,
L.title AS listing_title, L.price as price, L.listing_id AS listing_id, L.tags, L.materials, L.currency_code,
L.url_170x135 as listing_image_url_170x135, L.url AS listing_url, l.views as listing_views, l.num_favorers as listing_hearts
FROM AdShops INNER JOIN
Shops ON AdShops.shop_id = Shops.shop_id INNER JOIN
Users ON Shops.user_id = Users.user_id INNER JOIN
Listings AS L ON Shops.shop_id = L.shop_id
WHERE (Shops.is_vacation = 0 AND
L.listing_id IN
(
SELECT listing_id
FROM (SELECT l2.user_id , l2.listing_id, RowNumber = ROW_NUMBER() OVER (PARTITION BY l2.user_id ORDER BY NEWID())
FROM Listings l2
INNER JOIN (
SELECT user_id
FROM Listings
GROUP BY
user_id
HAVING COUNT(*) >= 3
) cnt ON cnt.user_id = l2.user_id
) l2
WHERE l2.RowNumber <= 3 and L2.user_id = L.user_id
)
)
ORDER BY Shops.shop_name
Now, so far I can return a flat file but am not able to limit the number of listings. Here's where I'm stuck:
Dim query As IEnumerable = From x In db.AdShops
Join y In (From y1 In db.Shops
Where y1.Shop_name Like _Search + "*" AndAlso y1.Is_vacation = False
Order By y1.Shop_name
Select y1) On y.Shop_id Equals x.shop_id
Join z In db.Users On x.user_id Equals z.User_id
Join l In db.Listings On l.Shop_id Equals y.Shop_id
Select New With {
.shop_id = y.Shop_id,
.user_id = z.user_id,
.listing_id = l.Listing_id
} Take 24 ' Fields ommitted for briefity...
I assume to select a random set of 3 listings per shop, I'd need to use a lambda expression however am not sure how to do this. Also, need to add in somewhere consolidated totals for listing fieelds against individual shops...
Anyone have any thoughts?
UPDATE:
Here's the current solution that I'm looking at:
Result class wrapper:
Public Class NewShops
Public Property Shop_id As Integer
Public Property listing_id As Integer
Public Property tl_listing_hearts As Integer?
Public Property tl_listing_views As Integer?
Public Property listing_creation As Date
End Class
Linq + code:
Using db As New Ads.DB(Ads.DB.Conn)
Dim query As IEnumerable(Of IGrouping(Of Integer, NewShops)) =
(From x In db.AdShops
Join y In (From y1 In db.Shops
Where (y1.Shop_name Like _Search + "*" AndAlso y1.Is_vacation = False)
Select y1
Skip ((_Paging.CurrentPage - 1) * _Paging.ItemsPerPage)
Take (_Paging.ItemsPerPage))
On y.Shop_id Equals x.shop_id
Join z In db.Users On x.user_id Equals z.User_id
Join l In db.Listings On l.Shop_id Equals y.Shop_id
Join lt In (From l2 In db.Listings _
Group By id = l2.Shop_id Into Hearts = Sum(l2.Num_favorers), Views = Sum(l2.Views), Count() _
Select New NewShops With {.tl_listing_views = Views,
.tl_listing_hearts = Hearts,
.Shop_id = id})
On lt.Shop_id Equals y.Shop_id
Select New NewShops With {.Shop_id = y.Shop_id,
.tl_listing_views = lt.tl_listing_views,
.tl_listing_hearts = lt.tl_listing_hearts,
.listing_creation = l.Creation,
.listing_id = l.Listing_id
}).GroupBy(Function(s) s.Shop_id).OrderByDescending(Function(s) s(0).tl_listing_views)
Dim Shops as New Dictionary(Of String, List(Of NewShops))
For Each item As IEnumerable(Of NewShops) In query
Shops.Add(item(0).shop_name, (From i As NewShops In item
Order By i.listing_creation Descending
Select i Take 3).ToList)
Next
End Using
Anyone have any other suggestions?
From the looks of that SQL and code, I'd not be turning it into LINQ queries. It'll just obfuscate the logic and probably take you days to get it correct.
If SQLMetal doesn't generate it properly, have you considered using the ExecuteQuery method of the DataContext to return a list of the items you're after?
Assuming that your sproc you're trying to convert is called sp_complicated, and takes in one parameter, something like the following should do the trick
Protected Class TheResults
Public Property ID as Integer
Public Property image_url_75x75 as String
'... and so on and so forth for all the returned columns. Be careful with nulls
End Class
'then, when you want to use it
Using db As New Ads.DB(Ads.DB.Conn)
dim results = db.ExecuteQuery(Of TheResults)("exec sp_complicated {0}", _Search)
End Using
Before you freak out, that's not susceptible to SQL Injection. L2SQL uses proper SQLParameters, as long as you use the squigglies and don't just concatenate the strings yourself.

LINQ to SQL Take w/o Skip Causes Multiple SQL Statements

I have a LINQ to SQL query:
from at in Context.Transaction
select new {
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select new {
Amount = tb.Amount,
Description = tb.Desc
}
}
This results in one SQL statement being executed. All is good.
However, if I attempt to return known types from this query, even if they have the same structure as the anonymous types, I get one SQL statement executed for the top level and then an additional SQL statement for each "child" set.
Is there any way to get LINQ to SQL to issue one SQL statement and use known types?
EDIT: I must have another issue. When I plugged a very simplistic (but still hieararchical) version of my query into LINQPad and used freshly created known types with just 2 or 3 members, I did get one SQL statement. I will post and update when I know more.
EDIT 2: This appears to be due to a bug in Take. See my answer below for details.
First - some reasoning for the Take bug.
If you just Take, the query translator just uses top. Top10 will not give the right answer if cardinality is broken by joining in a child collection. So the query translator doesn't join in the child collection (instead it requeries for the children).
If you Skip and Take, then the query translator kicks in with some RowNumber logic over the parent rows... these rownumbers let it take 10 parents, even if that's really 50 records due to each parent having 5 children.
If you Skip(0) and Take, Skip is removed as a non-operation by the translator - it's just like you never said Skip.
This is going to be a hard conceptual leap to from where you are (calling Skip and Take) to a "simple workaround". What we need to do - is force the translation to occur at a point where the translator can't remove Skip(0) as a non-operation. We need to call Skip, and supply the skipped number at a later point.
DataClasses1DataContext myDC = new DataClasses1DataContext();
//setting up log so we can see what's going on
myDC.Log = Console.Out;
//hierarchical query - not important
var query = myDC.Options.Select(option => new{
ID = option.ParentID,
Others = myDC.Options.Select(option2 => new{
ID = option2.ParentID
})
});
//request translation of the query! Important!
var compQuery = System.Data.Linq.CompiledQuery
.Compile<DataClasses1DataContext, int, int, System.Collections.IEnumerable>
( (dc, skip, take) => query.Skip(skip).Take(take) );
//now run the query and specify that 0 rows are to be skipped.
compQuery.Invoke(myDC, 0, 10);
This produces the following query:
SELECT [t1].[ParentID], [t2].[ParentID] AS [ParentID2], (
SELECT COUNT(*)
FROM [dbo].[Option] AS [t3]
) AS [value]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ID]) AS [ROW_NUMBER], [t0].[ParentID]
FROM [dbo].[Option] AS [t0]
) AS [t1]
LEFT OUTER JOIN [dbo].[Option] AS [t2] ON 1=1
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p1 + #p2
ORDER BY [t1].[ROW_NUMBER], [t2].[ID]
-- #p0: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- #p1: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- #p2: Input Int (Size = 0; Prec = 0; Scale = 0) [10]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
And here's where we win!
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p1 + #p2
I've now determined this is the result of a horrible bug. The anonymous versus known type turned out not to be the cause. The real cause is Take.
The following result in 1 SQL statement:
query.Skip(1).Take(10).ToList();
query.ToList();
However, the following exhibit the one sql statement per parent row problem.
query.Skip(0).Take(10).ToList();
query.Take(10).ToList();
Can anyone think of any simple workarounds for this?
EDIT: The only workaround I've come up with is to check to see if I'm on the first page (IE Skip(0)) and then make two calls, one with Take(1) and the other with Skip(1).Take(pageSize - 1) and addRange the lists together.
I've not had a chance to try this but given that the anonymous type isn't part of LINQ rather a C# construct I wonder if you could use:
from at in Context.Transaction
select new KnownType(
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select KnownSubType(
Amount = tb.Amount,
Description = tb.Desc
)
}
Obviously Details would need to be an IEnumerable collection.
I could be miles wide on this but it might at least give you a new line of thought to pursue which can't hurt so please excuse my rambling.

Natural (human alpha-numeric) sort in Microsoft SQL 2005

We have a large database on which we have DB side pagination. This is quick, returning a page of 50 rows from millions of records in a small fraction of a second.
Users can define their own sort, basically choosing what column to sort by. Columns are dynamic - some have numeric values, some dates and some text.
While most sort as expected text sorts in a dumb way. Well, I say dumb, it makes sense to computers, but frustrates users.
For instance, sorting by a string record id gives something like:
rec1
rec10
rec14
rec2
rec20
rec3
rec4
...and so on.
I want this to take account of the number, so:
rec1
rec2
rec3
rec4
rec10
rec14
rec20
I can't control the input (otherwise I'd just format in leading 000s) and I can't rely on a single format - some are things like "{alpha code}-{dept code}-{rec id}".
I know a few ways to do this in C#, but can't pull down all the records to sort them, as that would be to slow.
Does anyone know a way to quickly apply a natural sort in Sql server?
We're using:
ROW_NUMBER() over (order by {field name} asc)
And then we're paging by that.
We can add triggers, although we wouldn't. All their input is parametrised and the like, but I can't change the format - if they put in "rec2" and "rec10" they expect them to be returned just like that, and in natural order.
We have valid user input that follows different formats for different clients.
One might go rec1, rec2, rec3, ... rec100, rec101
While another might go: grp1rec1, grp1rec2, ... grp20rec300, grp20rec301
When I say we can't control the input I mean that we can't force users to change these standards - they have a value like grp1rec1 and I can't reformat it as grp01rec001, as that would be changing something used for lookups and linking to external systems.
These formats vary a lot, but are often mixtures of letters and numbers.
Sorting these in C# is easy - just break it up into { "grp", 20, "rec", 301 } and then compare sequence values in turn.
However there may be millions of records and the data is paged, I need the sort to be done on the SQL server.
SQL server sorts by value, not comparison - in C# I can split the values out to compare, but in SQL I need some logic that (very quickly) gets a single value that consistently sorts.
#moebius - your answer might work, but it does feel like an ugly compromise to add a sort-key for all these text values.
order by LEN(value), value
Not perfect, but works well in a lot of cases.
Most of the SQL-based solutions I have seen break when the data gets complex enough (e.g. more than one or two numbers in it). Initially I tried implementing a NaturalSort function in T-SQL that met my requirements (among other things, handles an arbitrary number of numbers within the string), but the performance was way too slow.
Ultimately, I wrote a scalar CLR function in C# to allow for a natural sort, and even with unoptimized code the performance calling it from SQL Server is blindingly fast. It has the following characteristics:
will sort the first 1,000 characters or so correctly (easily modified in code or made into a parameter)
properly sorts decimals, so 123.333 comes before 123.45
because of above, will likely NOT sort things like IP addresses correctly; if you wish different behaviour, modify the code
supports sorting a string with an arbitrary number of numbers within it
will correctly sort numbers up to 25 digits long (easily modified in code or made into a parameter)
The code is here:
using System;
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;
public class UDF
{
[SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic=true)]
public static SqlString Naturalize(string val)
{
if (String.IsNullOrEmpty(val))
return val;
while(val.Contains(" "))
val = val.Replace(" ", " ");
const int maxLength = 1000;
const int padLength = 25;
bool inNumber = false;
bool isDecimal = false;
int numStart = 0;
int numLength = 0;
int length = val.Length < maxLength ? val.Length : maxLength;
//TODO: optimize this so that we exit for loop once sb.ToString() >= maxLength
var sb = new StringBuilder();
for (var i = 0; i < length; i++)
{
int charCode = (int)val[i];
if (charCode >= 48 && charCode <= 57)
{
if (!inNumber)
{
numStart = i;
numLength = 1;
inNumber = true;
continue;
}
numLength++;
continue;
}
if (inNumber)
{
sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
inNumber = false;
}
isDecimal = (charCode == 46);
sb.Append(val[i]);
}
if (inNumber)
sb.Append(PadNumber(val.Substring(numStart, numLength), isDecimal, padLength));
var ret = sb.ToString();
if (ret.Length > maxLength)
return ret.Substring(0, maxLength);
return ret;
}
static string PadNumber(string num, bool isDecimal, int padLength)
{
return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
}
}
To register this so that you can call it from SQL Server, run the following commands in Query Analyzer:
CREATE ASSEMBLY SqlServerClr FROM 'SqlServerClr.dll' --put the full path to DLL here
go
CREATE FUNCTION Naturalize(#val as nvarchar(max)) RETURNS nvarchar(1000)
EXTERNAL NAME SqlServerClr.UDF.Naturalize
go
Then, you can use it like so:
select *
from MyTable
order by dbo.Naturalize(MyTextField)
Note: If you get an error in SQL Server along the lines of Execution of user code in the .NET Framework is disabled. Enable "clr enabled" configuration option., follow the instructions here to enable it. Make sure you consider the security implications before doing so. If you are not the db admin, make sure you discuss this with your admin before making any changes to the server configuration.
Note2: This code does not properly support internationalization (e.g., assumes the decimal marker is ".", is not optimized for speed, etc. Suggestions on improving it are welcome!
Edit: Renamed the function to Naturalize instead of NaturalSort, since it does not do any actual sorting.
I know this is an old question but I just came across it and since it's not got an accepted answer.
I have always used ways similar to this:
SELECT [Column] FROM [Table]
ORDER BY RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))), 1000)
The only common times that this has issues is if your column won't cast to a VARCHAR(MAX), or if LEN([Column]) > 1000 (but you can change that 1000 to something else if you want), but you can use this rough idea for what you need.
Also this is much worse performance than normal ORDER BY [Column], but it does give you the result asked for in the OP.
Edit: Just to further clarify, this the above will not work if you have decimal values such as having 1, 1.15 and 1.5, (they will sort as {1, 1.5, 1.15}) as that is not what is asked for in the OP, but that can easily be done by:
SELECT [Column] FROM [Table]
ORDER BY REPLACE(RIGHT(REPLICATE('0', 1000) + LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX)))) + REPLICATE('0', 100 - CHARINDEX('.', REVERSE(LTRIM(RTRIM(CAST([Column] AS VARCHAR(MAX))))), 1)), 1000), '.', '0')
Result: {1, 1.15, 1.5}
And still all entirely within SQL. This will not sort IP addresses because you're now getting into very specific number combinations as opposed to simple text + number.
RedFilter's answer is great for reasonably sized datasets where indexing is not critical, however if you want an index, several tweaks are required.
First, mark the function as not doing any data access and being deterministic and precise:
[SqlFunction(DataAccess = DataAccessKind.None,
SystemDataAccess = SystemDataAccessKind.None,
IsDeterministic = true, IsPrecise = true)]
Next, MSSQL has a 900 byte limit on the index key size, so if the naturalized value is the only value in the index, it must be at most 450 characters long. If the index includes multiple columns, the return value must be even smaller. Two changes:
CREATE FUNCTION Naturalize(#str AS nvarchar(max)) RETURNS nvarchar(450)
EXTERNAL NAME ClrExtensions.Util.Naturalize
and in the C# code:
const int maxLength = 450;
Finally, you will need to add a computed column to your table, and it must be persisted (because MSSQL cannot prove that Naturalize is deterministic and precise), which means the naturalized value is actually stored in the table but is still maintained automatically:
ALTER TABLE YourTable ADD nameNaturalized AS dbo.Naturalize(name) PERSISTED
You can now create the index!
CREATE INDEX idx_YourTable_n ON YourTable (nameNaturalized)
I've also made a couple of changes to RedFilter's code: using chars for clarity, incorporating duplicate space removal into the main loop, exiting once the result is longer than the limit, setting maximum length without substring etc. Here's the result:
using System.Data.SqlTypes;
using System.Text;
using Microsoft.SqlServer.Server;
public static class Util
{
[SqlFunction(DataAccess = DataAccessKind.None, SystemDataAccess = SystemDataAccessKind.None, IsDeterministic = true, IsPrecise = true)]
public static SqlString Naturalize(string str)
{
if (string.IsNullOrEmpty(str))
return str;
const int maxLength = 450;
const int padLength = 15;
bool isDecimal = false;
bool wasSpace = false;
int numStart = 0;
int numLength = 0;
var sb = new StringBuilder();
for (var i = 0; i < str.Length; i++)
{
char c = str[i];
if (c >= '0' && c <= '9')
{
if (numLength == 0)
numStart = i;
numLength++;
}
else
{
if (numLength > 0)
{
sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));
numLength = 0;
}
if (c != ' ' || !wasSpace)
sb.Append(c);
isDecimal = c == '.';
if (sb.Length > maxLength)
break;
}
wasSpace = c == ' ';
}
if (numLength > 0)
sb.Append(pad(str.Substring(numStart, numLength), isDecimal, padLength));
if (sb.Length > maxLength)
sb.Length = maxLength;
return sb.ToString();
}
private static string pad(string num, bool isDecimal, int padLength)
{
return isDecimal ? num.PadRight(padLength, '0') : num.PadLeft(padLength, '0');
}
}
Here's a solution written for SQL 2000. It can probably be improved for newer SQL versions.
/**
* Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
*
* #author Alexandre Potvin Latreille (plalx)
* #param {nvarchar(4000)} string The formatted string.
* #param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
* #param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
*
* #return {nvarchar(4000)} A string for natural sorting.
* Example of use:
*
* SELECT Name FROM TableA ORDER BY Name
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1-1.
* 2. A1-1. 2. A1.
* 3. R1 --> 3. R1
* 4. R11 4. R11
* 5. R2 5. R2
*
*
* As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
* We can use this function to fix this.
*
* SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1.
* 2. A1-1. 2. A1-1.
* 3. R1 --> 3. R1
* 4. R11 4. R2
* 5. R2 5. R11
*/
ALTER FUNCTION [dbo].[udf_NaturalSortFormat](
#string nvarchar(4000),
#numberLength int = 10,
#sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
DECLARE #sortString varchar(4000),
#numStartIndex int,
#numEndIndex int,
#padLength int,
#totalPadLength int,
#i int,
#sameOrderCharsLen int;
SELECT
#totalPadLength = 0,
#string = RTRIM(LTRIM(#string)),
#sortString = #string,
#numStartIndex = PATINDEX('%[0-9]%', #string),
#numEndIndex = 0,
#i = 1,
#sameOrderCharsLen = LEN(#sameOrderChars);
-- Replace all char that have the same order by a space.
WHILE (#i <= #sameOrderCharsLen)
BEGIN
SET #sortString = REPLACE(#sortString, SUBSTRING(#sameOrderChars, #i, 1), ' ');
SET #i = #i + 1;
END
-- Pad numbers with zeros.
WHILE (#numStartIndex <> 0)
BEGIN
SET #numStartIndex = #numStartIndex + #numEndIndex;
SET #numEndIndex = #numStartIndex;
WHILE(PATINDEX('[0-9]', SUBSTRING(#string, #numEndIndex, 1)) = 1)
BEGIN
SET #numEndIndex = #numEndIndex + 1;
END
SET #numEndIndex = #numEndIndex - 1;
SET #padLength = #numberLength - (#numEndIndex + 1 - #numStartIndex);
IF #padLength < 0
BEGIN
SET #padLength = 0;
END
SET #sortString = STUFF(
#sortString,
#numStartIndex + #totalPadLength,
0,
REPLICATE('0', #padLength)
);
SET #totalPadLength = #totalPadLength + #padLength;
SET #numStartIndex = PATINDEX('%[0-9]%', RIGHT(#string, LEN(#string) - #numEndIndex));
END
RETURN #sortString;
END
I know this is a bit old at this point, but in my search for a better solution, I came across this question. I'm currently using a function to order by. It works fine for my purpose of sorting records which are named with mixed alpha numeric ('item 1', 'item 10', 'item 2', etc)
CREATE FUNCTION [dbo].[fnMixSort]
(
#ColValue NVARCHAR(255)
)
RETURNS NVARCHAR(1000)
AS
BEGIN
DECLARE #p1 NVARCHAR(255),
#p2 NVARCHAR(255),
#p3 NVARCHAR(255),
#p4 NVARCHAR(255),
#Index TINYINT
IF #ColValue LIKE '[a-z]%'
SELECT #Index = PATINDEX('%[0-9]%', #ColValue),
#p1 = LEFT(CASE WHEN #Index = 0 THEN #ColValue ELSE LEFT(#ColValue, #Index - 1) END + REPLICATE(' ', 255), 255),
#ColValue = CASE WHEN #Index = 0 THEN '' ELSE SUBSTRING(#ColValue, #Index, 255) END
ELSE
SELECT #p1 = REPLICATE(' ', 255)
SELECT #Index = PATINDEX('%[^0-9]%', #ColValue)
IF #Index = 0
SELECT #p2 = RIGHT(REPLICATE(' ', 255) + #ColValue, 255),
#ColValue = ''
ELSE
SELECT #p2 = RIGHT(REPLICATE(' ', 255) + LEFT(#ColValue, #Index - 1), 255),
#ColValue = SUBSTRING(#ColValue, #Index, 255)
SELECT #Index = PATINDEX('%[0-9,a-z]%', #ColValue)
IF #Index = 0
SELECT #p3 = REPLICATE(' ', 255)
ELSE
SELECT #p3 = LEFT(REPLICATE(' ', 255) + LEFT(#ColValue, #Index - 1), 255),
#ColValue = SUBSTRING(#ColValue, #Index, 255)
IF PATINDEX('%[^0-9]%', #ColValue) = 0
SELECT #p4 = RIGHT(REPLICATE(' ', 255) + #ColValue, 255)
ELSE
SELECT #p4 = LEFT(#ColValue + REPLICATE(' ', 255), 255)
RETURN #p1 + #p2 + #p3 + #p4
END
Then call
select item_name from my_table order by fnMixSort(item_name)
It easily triples the processing time for a simple data read, so it may not be the perfect solution.
Here is an other solution that I like:
http://www.dreamchain.com/sql-and-alpha-numeric-sort-order/
It's not Microsoft SQL, but since I ended up here when I was searching for a solution for Postgres, I thought adding this here would help others.
EDIT: Here is the code, in case the link goes away.
CREATE or REPLACE FUNCTION pad_numbers(text) RETURNS text AS $$
SELECT regexp_replace(regexp_replace(regexp_replace(regexp_replace(($1 collate "C"),
E'(^|\\D)(\\d{1,3}($|\\D))', E'\\1000\\2', 'g'),
E'(^|\\D)(\\d{4,6}($|\\D))', E'\\1000\\2', 'g'),
E'(^|\\D)(\\d{7}($|\\D))', E'\\100\\2', 'g'),
E'(^|\\D)(\\d{8}($|\\D))', E'\\10\\2', 'g');
$$ LANGUAGE SQL;
"C" is the default collation in postgresql; you may specify any collation you desire, or remove the collation statement if you can be certain your table columns will never have a nondeterministic collation assigned.
usage:
SELECT * FROM wtf w
WHERE TRUE
ORDER BY pad_numbers(w.my_alphanumeric_field)
For the following varchar data:
BR1
BR2
External Location
IR1
IR2
IR3
IR4
IR5
IR6
IR7
IR8
IR9
IR10
IR11
IR12
IR13
IR14
IR16
IR17
IR15
VCR
This worked best for me:
ORDER BY substring(fieldName, 1, 1), LEN(fieldName)
If you're having trouble loading the data from the DB to sort in C#, then I'm sure you'll be disappointed with any approach at doing it programmatically in the DB. When the server is going to sort, it's got to calculate the "perceived" order just as you would have -- every time.
I'd suggest that you add an additional column to store the preprocessed sortable string, using some C# method, when the data is first inserted. You might try to convert the numerics into fixed-width ranges, for example, so "xyz1" would turn into "xyz00000001". Then you could use normal SQL Server sorting.
At the risk of tooting my own horn, I wrote a CodeProject article implementing the problem as posed in the CodingHorror article. Feel free to steal from my code.
Simply you sort by
ORDER BY
cast (substring(name,(PATINDEX('%[0-9]%',name)),len(name))as int)
##
I've just read a article somewhere about such a topic. The key point is: you only need the integer value to sort data, while the 'rec' string belongs to the UI. You could split the information in two fields, say alpha and num, sort by alpha and num (separately) and then showing a string composed by alpha + num. You could use a computed column to compose the string, or a view.
Hope it helps
You can use the following code to resolve the problem:
Select *,
substring(Cote,1,len(Cote) - Len(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1)))alpha,
CAST(RIGHT(Cote, LEN(Cote) - PATINDEX('%[0-9]%', Cote)+1) AS INT)intv
FROM Documents
left outer join Sites ON Sites.IDSite = Documents.IDSite
Order BY alpha, intv
regards,
rabihkahaleh#hotmail.com
I'm fashionably late to the party as usual. Nevertheless, here is my attempt at an answer that seems to work well (I would say that). It assumes text with digits at the end, like in the original example data.
First a function that won't end up winning a "pretty SQL" competition anytime soon.
CREATE FUNCTION udfAlphaNumericSortHelper (
#string varchar(max)
)
RETURNS #results TABLE (
txt varchar(max),
num float
)
AS
BEGIN
DECLARE #txt varchar(max) = #string
DECLARE #numStr varchar(max) = ''
DECLARE #num float = 0
DECLARE #lastChar varchar(1) = ''
set #lastChar = RIGHT(#txt, 1)
WHILE #lastChar <> '' and #lastChar is not null
BEGIN
IF ISNUMERIC(#lastChar) = 1
BEGIN
set #numStr = #lastChar + #numStr
set #txt = Substring(#txt, 0, len(#txt))
set #lastChar = RIGHT(#txt, 1)
END
ELSE
BEGIN
set #lastChar = null
END
END
SET #num = CAST(#numStr as float)
INSERT INTO #results select #txt, #num
RETURN;
END
Then call it like below:
declare #str nvarchar(250) = 'sox,fox,jen1,Jen0,jen15,jen02,jen0004,fox00,rec1,rec10,jen3,rec14,rec2,rec20,rec3,rec4,zip1,zip1.32,zip1.33,zip1.3,TT0001,TT01,TT002'
SELECT tbl.value --, sorter.txt, sorter.num
FROM STRING_SPLIT(#str, ',') as tbl
CROSS APPLY dbo.udfAlphaNumericSortHelper(value) as sorter
ORDER BY sorter.txt, sorter.num, len(tbl.value)
With results:
fox
fox00
Jen0
jen1
jen02
jen3
jen0004
jen15
rec1
rec2
rec3
rec4
rec10
rec14
rec20
sox
TT01
TT0001
TT002
zip1
zip1.3
zip1.32
zip1.33
I still don't understand (probably because of my poor English).
You could try:
ROW_NUMBER() OVER (ORDER BY dbo.human_sort(field_name) ASC)
But it won't work for millions of records.
That why I suggested to use trigger which fills separate column with human value.
Moreover:
built-in T-SQL functions are really
slow and Microsoft suggest to use
.NET functions instead.
human value is constant so there is no point calculating it each time
when query runs.

Resources