Get "surrounding" rows in NHibernate query - sql-server

I am looking for a way to retrieve the "surrounding" rows in an NHibernate query, given a primary key and a sort order.
E.g. I have a table with log entries and I want to display the entry with primary key 4242 and the previous 5 entries as well as the following 5 entries ordered by date (there is no direct relation between date and primary key). Such a query should return 11 rows in total (as long as we are not close to either end).
The log entry table can be huge, so retrieving everything to figure it out is not feasible.
Is there such a concept as a row number that can be used from within NHibernate? The underlying database is going to be either SQLite or Microsoft SQL Server.
Edit: added a sample.
Imagine data such as the following:
Id Time
4237 10:00
4238 10:00
1236 10:01
1237 10:01
1238 10:02
4239 10:03
4240 10:04
4241 10:04
4242 10:04 <-- requested "center" row
4243 10:04
4244 10:05
4245 10:06
4246 10:07
4247 10:08
When requesting the entry with primary key 4242 we should get the rows 1237, 1238 and 4239 to 4247. The order is by Time, Id.
Is it possible to retrieve the entries in a single query (which obviously can include subqueries)? Time is a non-unique column, so several entries can have the same value, and in this example it is not possible to change the resolution in a way that makes it unique!
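For reference, SQL Server 2005+ can express this directly with ROW_NUMBER(), which NHibernate can run as a native SQL query; a minimal sketch, assuming a LogEntry table with the columns from the sample (note that older SQLite builds lack window functions):
WITH Ordered AS (
    SELECT Id, [Time],
           ROW_NUMBER() OVER (ORDER BY [Time], Id) AS RowNo
    FROM LogEntry
)
SELECT o.Id, o.[Time]
FROM Ordered o
JOIN Ordered c ON c.Id = 4242              -- the requested "center" row
WHERE o.RowNo BETWEEN c.RowNo - 5 AND c.RowNo + 5
ORDER BY o.RowNo;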

"there is no direct relation between date and primary key" means, that the primary keys are not in a sequential order?
Then I would do it like this:
Item middleItem = Session.Get<Item>(id);
IList<Item> previousFiveItems = Session.CreateCriteria(typeof(Item))
    .Add(Expression.Le("Time", middleItem.Time))
    .AddOrder(Order.Desc("Time"))
    .SetMaxResults(5)
    .List<Item>();
IList<Item> nextFiveItems = Session.CreateCriteria(typeof(Item))
    .Add(Expression.Gt("Time", middleItem.Time))
    .AddOrder(Order.Asc("Time"))
    .SetMaxResults(5)
    .List<Item>();
There is the risk of having several items with the same time.
Edit
This should work now.
Item middleItem = Session.Get<Item>(id);
IList<Item> previousFiveItems = Session.CreateCriteria(typeof(Item))
    .Add(Expression.Le("Time", middleItem.Time))          // less than or equal
    .Add(Expression.Not(Expression.IdEq(middleItem.Id)))  // but not the middle item itself
    .AddOrder(Order.Desc("Time"))
    .SetMaxResults(5)
    .List<Item>();
IList<Item> nextFiveItems = Session.CreateCriteria(typeof(Item))
    .Add(Expression.Gt("Time", middleItem.Time))          // strictly greater
    .AddOrder(Order.Asc("Time"))
    .SetMaxResults(5)
    .List<Item>();

This should be relatively easy with NHibernate's Criteria API:
IList<LogEntry> logEntries = session.CreateCriteria(typeof(LogEntry))
    .Add(Expression.InG<int>("Id", listOfIds))
    .AddOrder(Order.Desc("EntryDate"))
    .List<LogEntry>();
Here listOfIds is just a strongly typed list of integers representing the ids of the entries you want to retrieve (integers 4242-5 through 4242+5). Of course, you could instead add Expressions that retrieve Ids greater than 4242-5 and smaller than 4242+5.

Stefan's solution definitely works, but a better way exists using a single select and nested subqueries:
ICriteria crit = NHibernateSession.CreateCriteria(typeof(Item));

// the Time of the middle item
DetachedCriteria dcMiddleTime = DetachedCriteria.For(typeof(Item))
    .SetProjection(Property.ForName("Time"))
    .Add(Restrictions.Eq("Id", id));

// the five items immediately after (ordered so that the TOP 5 are the nearest ones)
DetachedCriteria dcAfterTime = DetachedCriteria.For(typeof(Item))
    .SetMaxResults(5)
    .SetProjection(Property.ForName("Id"))
    .Add(Subqueries.PropertyGt("Time", dcMiddleTime))
    .AddOrder(Order.Asc("Time"));

// the five items immediately before
DetachedCriteria dcBeforeTime = DetachedCriteria.For(typeof(Item))
    .SetMaxResults(5)
    .SetProjection(Property.ForName("Id"))
    .Add(Subqueries.PropertyLt("Time", dcMiddleTime))
    .AddOrder(Order.Desc("Time"));

crit.AddOrder(Order.Asc("Time"));
crit.Add(Restrictions.Eq("Id", id)
    || Subqueries.PropertyIn("Id", dcAfterTime)
    || Subqueries.PropertyIn("Id", dcBeforeTime));
return crit.List<Item>();
This is NHibernate 2.0 syntax, but the same holds true for earlier versions, where you use Expression instead of Restrictions.
I have tested this in a sample application and it works as advertised.
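For illustration, the SQL this criteria roughly produces on SQL Server looks like the following (aliases simplified and the id parameter inlined as 4242):
SELECT *
FROM Item this_
WHERE this_.Id = 4242
   OR this_.Id IN (SELECT TOP 5 Id FROM Item
                   WHERE [Time] > (SELECT [Time] FROM Item WHERE Id = 4242)
                   ORDER BY [Time] ASC)
   OR this_.Id IN (SELECT TOP 5 Id FROM Item
                   WHERE [Time] < (SELECT [Time] FROM Item WHERE Id = 4242)
                   ORDER BY [Time] DESC)
ORDER BY this_.[Time] ASC;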

Related

Power BI combine results from two SQL-Server tables

After using Power BI for a few months now, we (the user group) have encountered an issue that is not really clear to us.
We use Power BI with a remote SQL Server data source, which we access through DirectQuery.
Let's say we have two tables as below:
Table name: Issue
Column:
ResolutionTime(Date/Time)
IssueID(Unique Numbers)
Table Name: WorkItem
Column:
start (Date/Time)
end (Date/Time)
IssueID (Unique Numbers, Foreign Key to "Issue" table)
Table WorkItem also contains a calculated column "WorkTime", defined with the following DAX expression:
WorkTime = WorkItem[end] - WorkItem[start]
The two tables are configured in Power BI with a two-way 1:n relationship, so that all "WorkItem"(s) assigned to an "Issue" entry can be collected, using "IssueID" as the correlation column.
To compute the aggregated work time per issue, we use a new calculated table with the following DAX expression, which sums up the total amount of time invested in a single "Issue":
SumWork =
SUMMARIZE(
    WorkItem,
    WorkItem[IssueID],
    "All work per item", SUM(WorkItem[WorkTime])
)
The above table computes the total invested work time for a particular issue, grouping/summarizing results on the "IssueID" foreign key. This new calculated table is also configured to have a relationship with the "Issue" table, this time a 1:1 relationship, using IssueID as the correlation column.
Now the time the issue was worked on plus the resolution time should be combined in a calculated column inside "Issue", but this does not work:
ResolutionAndWorkTime = Issue[ResolutionTime] + SumWork["All work per item"]
The above DAX expression fails to compile: it always reports that "more than one result" is returned where a single value is expected. That is surprising, as the two tables ("Issue" and "SumWork") are related to each other with a 1:1 relationship.
Tables:
Issues
IssueID ResolutionTime ResolutionAndWorkTime
1 03:20:20 ???
2 01:20:20 ???
3 00:20:20 ???
WorkItem
IssueID start end WorkTime
1 1-2-2020 3:20:20 1-2-2020 3:25:20 00:05:00
1 2-2-2020 6:20:20 2-2-2020 7:20:20 01:00:00
3 1-3-2020 3:20:20 1-3-2020 3:29:20 00:09:00
Any ideas what to look for? Data types? Table definitions? Table relationships? We checked other Stack Overflow questions and answers, but found no good ideas so far.
NOTE that a lot of the join/merge features of Power BI are not available when DirectQuery is used, so joining the tables is not really an option (we think).
You need the following code for your new calculated column:
ResolutionAndWorkTime = Issues[ResolutionTime] + RELATED(SumWork[All work per item])
See the DAX documentation on RELATED to learn more.
Based on input provided by "mkRabbani" (see the other answer), we investigated why RELATED does not function as expected. The problem originates in the access to the database. As suspected earlier, the function delivers the expected results once the database access is switched to "import" instead of "DirectQuery".
As a workaround, we now join the data inside the SQL server by using traditional database views. Of course, this only works for scenarios where the database is under the control of the data analytics team.
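A minimal sketch of such a view, assuming the table and column names from the question and second-level granularity for the durations:
CREATE VIEW dbo.IssueWorkTime AS
SELECT i.IssueID,
       i.ResolutionTime,
       -- add the summed work time (in seconds) to the resolution time
       DATEADD(SECOND, COALESCE(w.WorkSeconds, 0), i.ResolutionTime) AS ResolutionAndWorkTime
FROM dbo.Issue AS i
LEFT JOIN (
    SELECT IssueID,
           SUM(DATEDIFF(SECOND, [start], [end])) AS WorkSeconds
    FROM dbo.WorkItem
    GROUP BY IssueID
) AS w ON w.IssueID = i.IssueID;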

Redis storing historical data

I have data generated on a daily basis.
Let me explain through an example:
On the world market, the price of gold changes every few seconds, and I want to store those prices in Redis.
Gold 22 JAN 11.02PM X-price
22 JAN 11.03PM Y-Price
...
24 DEC 11.04PM X1-Price
Silver 22 JAN 11.02PM M-Price
22 JAN 11.03PM N-Price
I want to store this data on a daily basis and then apply ML (machine learning) to the last 52 weeks of data. Is this possible?
As far as my knowledge goes, Redis works on key-value pairs.
If this is possible, can I get the data for a specific date (04 July) and for a date range (01 Feb to 31 Mar)?
In Redis, a Sorted Set is appropriate for time-series data. If you score each entry with the timestamp of the price quote, you can quickly access a whole day or group of days using the ZRANGEBYSCORE command (or ZSCAN if you need paging).
The quote data can be stored right in the sorted set. If you do this, make sure each entry is unique. Adding a record to a sorted set that is identical to an existing one just updates the existing record's score (timestamp). This moves the old record to the present and erases it from the past, which is not what you want.
I would recommend only storing a unique key/ID for each quote in the sorted set, and storing the data in its own key or hash field. This will allow you to create additional indexes for the data as needed and access specific records more easily if necessary.
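A minimal redis-cli sketch of that pattern (key names and timestamps are illustrative): each quote lives in its own key, and one sorted set per metal indexes the quote keys by Unix timestamp:
SET quote:1001 "gold 1845.20"
ZADD quotes:gold 1642892520 quote:1001
SET quote:1002 "gold 1845.35"
ZADD quotes:gold 1642892580 quote:1002
ZRANGEBYSCORE quotes:gold 1642809600 1642895999
The last command returns the IDs of all gold quotes for one day; a date range works the same way, just with wider score bounds.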

Advice on sql server index performance

I have "UserLog" table with 15 millions rows.
On this table I have a cluster index on User_ID field which is of type bigint identity, and a non clustered index on User_uid field which is of type varchar(35) (a fake uniqueidentifier).
On my application we can have 2 categories of users connection. 1 of them concerns only 0.007% of rows (about 1150 raws over 15 millions) and the 2nd concerns the remaining rows (99%).
The objective is to improve performance of the 0.007% users connection.
That's why I create a 'split' field "Userlog_ID" with type bit with default value of 0. So for each user connection we insert a new row in Userlog (with 0 as a value for User_log).
This field (User_Log) will then be update and it will take either 0 (for more then 99% of rows) or 1 (for 0.007% of rows) depending on the user category.
I create then a non clustered index on this field (User_log).
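For concreteness, the two variants of that index might look like this (index names assumed; the filtered form requires SQL Server 2008 or later):
CREATE NONCLUSTERED INDEX IX_UserLog_UserLog
    ON dbo.UserLog (User_Log);

-- or, filtered so the index only contains the 0.007% of rows:
CREATE NONCLUSTERED INDEX IX_UserLog_UserUID_Log1
    ON dbo.UserLog (User_UID)
    WHERE User_Log = 1;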
The SELECT statement I want to optimize is:
SELECT User_UID, User_LastAuthentificationDate,
       Language_ID, User_SecurityString
FROM dbo.UserLog
WHERE User_Active = 1
  AND User_UID = '00F5AA38-C288-48C6-8ED1922601819486'
So the idea is now to add a filter on the User_Log field to improve performance (specifically the index seek operator) when the user belongs to category 1 (the 0.007%):
SELECT User_UID, User_LastAuthentificationDate,
       Language_ID, User_SecurityString
FROM dbo.UserLog
WHERE User_Active = 1
  AND User_UID = '00F5AA38-C288-48C6-8ED1922601819486'
  AND User_Log = 1
In my mind, the index seek should perform better with this filter because we now have a smaller result set.
Unfortunately, when I compare the two queries with the estimated execution plan, I get 50% for each query. For both queries the optimizer uses an index seek on the User_UID non-clustered index and then a key lookup on the clustered index (User_ID).
So, in conclusion, adding the split field and a non-clustered index on it (either normal or filtered) does not improve performance.
Can anyone explain why? Maybe my reasoning and my interpretation are totally wrong.
Thank you

DB Design: Sort Order for Lookup Tables

I have an application where the database back-end has around 15 lookup tables. For instance there is a table for Counties like this:
CountyID(PK) County
49001 Beaver
49005 Cache
49007 Carbon
49009 Daggett
49011 Davis
49015 Emery
49029 Morgan
49031 Piute
49033 Rich
49035 Salt Lake
49037 San Juan
49041 Sevier
49043 Summit
49045 Tooele
49049 Utah
49051 Wasatch
49057 Weber
The UI for this app has a number of combo boxes in various places for these lookup tables, and my client has asked that the boxes list the entries, in this case, as:
CountyID(PK) County
49035 Salt Lake
49049 Utah
49011 Davis
49057 Weber
49045 Tooele
(the rest alphabetically)
The best plan I have for accomplishing this is to add a SortOrder (numeric) column to each lookup table. A colleague told me he thought that would cause the tables to violate third normal form, but I think the sort order still depends on the key and only the key (even though the rest of the list is alphabetical).
Is adding the SortOrder column the best way to do this, or is there a better way I am just not seeing?
I agree with @cletus that a sort order column is a good way to go, and it does not violate 3NF (because, as you said, the sort order column entries are functionally dependent on the candidate keys of the table).
I'm not sure I agree that alphanumeric is better than numeric. In the specific case of counties, new ones are seldom created. And there is no requirement that the numbers assigned be sequential; you can allocate them in multiples of a hundred, for example, leaving ample room for insertions.
Yes, I agree a sort order column is the best solution when the requirements call for a custom sort order like the one you cite. I wouldn't go with a numeric column, however. If the data is alphanumeric, the sort order should be alphanumeric. That way you can seed the value with whatever is in the County field.
If you use a numeric field, you'll (potentially) have to resequence the entire table whenever you add a new entry. So:
Columns: ID, County, SortOrder
Seed:
UPDATE County SET SortOrder = CONCAT('M-', County)
and for the special cases:
UPDATE County
SET SortOrder = CONCAT('E-', County)
WHERE County IN ('Salt Lake', 'Utah', 'Davis', 'Weber', 'Tooele')
Arguably you may want to put another marker column in to indicate those entries are special.
I went with numeric and large multiples.
Even with the CONCAT('E-', ...) example, I don't get the required sort order. That would give me Davis, Salt Lake, Tooele... and Salt Lake needs to be first.
I ended up using multiples of 10 for the special entries and assigned the non-special-sort entries a value like 10000. That way the view for each lookup can have
ORDER BY SortOrder ASC, OtherField ASC
Another programmer suggested using DECODE in Oracle or CASE statements in SQL Server, but this is a more general solution. YMMV.
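A minimal sketch of that numeric seeding, using the County table from the question (the specific values are illustrative):
UPDATE County SET SortOrder = 10000;   -- default bucket, sorted alphabetically within it
UPDATE County SET SortOrder = 10 WHERE County = 'Salt Lake';
UPDATE County SET SortOrder = 20 WHERE County = 'Utah';
UPDATE County SET SortOrder = 30 WHERE County = 'Davis';
UPDATE County SET SortOrder = 40 WHERE County = 'Weber';
UPDATE County SET SortOrder = 50 WHERE County = 'Tooele';
-- the lookup view then sorts with: ORDER BY SortOrder ASC, County ASC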

Is there a difference between id,date or date,id index in sql server

I'm creating an index in SQL Server 2005, and the discussion with a coworker is whether it makes a difference if the index key columns are id then date versus date then id.
Is there a fundamental difference in the way the index would be created in either scenario?
Would it make a difference in other versions of SQL Server?
Thanks
Yes, definitely. Does anyone ever query the table for JUST date or JUST id? An index on date,id can be used to look up just date, but not just id, and vice versa.
Using date,id:
Jan 1 4
Jan 1 7
Jan 2 6
Jan 2 9
Jan 2 33
Jan 3 23
Jan 4 1
Using id,date:
1 Jan 4
4 Jan 1
6 Jan 2
7 Jan 1
9 Jan 2
23 Jan 3
33 Jan 2
If your WHERE clause or a JOIN in your query uses both date and id, then either index is fine. But you can see that if you're doing a lookup just by date, the first index is useful for that, while the second one is in what amounts to random order.
In a more general sense, an index on A, B, C, D is going to be useful for queries on A,B,C,D, or A,B,C, or A,B, or just A.
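To make this concrete, a small T-SQL sketch (table and index names assumed):
CREATE INDEX IX_Log_Date_Id ON dbo.LogTable ([date], [id]);
CREATE INDEX IX_Log_Id_Date ON dbo.LogTable ([id], [date]);

-- can seek on IX_Log_Date_Id, because date is its leading column:
SELECT * FROM dbo.LogTable WHERE [date] = '20200102';

-- can seek on IX_Log_Id_Date, but only scan IX_Log_Date_Id:
SELECT * FROM dbo.LogTable WHERE [id] = 7;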
The order of columns does matter when it comes to indexes. Whether or not it'll matter in your case depends.
Let me explain.
Let's say you have a person table, with first, last, and middle name.
So you create this index, with the columns in the following order:
FirstName, MiddleName, LastName
Now, let's say you now do a query using a WHERE on all of those columns. It'll use the entire index.
But, let's say you only query on first and last name. What happens then is that while it will still use the index, it will grab the range of the index that has the same first name as your WHERE clause, then scan those rows, retrieving the ones that have a matching last name. Note that it will scan all the rows with the same first name.
However, if you had rearranged the index, like this:
FirstName, LastName, MiddleName
Then the above query would grab the range of the index that has the same first and last name, and retrieve those.
It's easier to grasp if you look at it in another way.
The phone book is sorted by last name, then firstname and middle name. If you had put middle name between first and last name, and sorted, then people with the same first and last name would seemingly be all over the place, simply because you sorted on middle name before first name.
Hence, if you're looking for my name, "Lasse Vågsæther Karlsen", you'll find all the Karlsen people grouped sequentially in the phone book, but my name would be seemingly randomly placed among them, simply because the list would then be sorted by Vågsæther.
So an index can be used even if the query doesn't use all the columns in the index, but the quick lookup only works as long as the columns are listed at the front of the index. Once you skip a column, some kind of scan takes place.
Now, if all your queries use both id and date, it won't matter much, but if all the queries include date and only some of them contain an id, then I'd put date first and id second; this way the index would be used in more cases.
Yes, it does matter. Suppose you create an index on columns (A, B). You can do a SELECT with a WHERE clause including both columns and the index can be used. The index will also be used if you do a SELECT with a WHERE that only includes column A. But if you do a SELECT with a WHERE that only includes column B, the index can't be used.
