Nhibernate Paging performance on large table (10,000,000 rows)

Nhibernate Paging performance on large table (10,000,000 rows) - sql-server

I have a table that is rather large at about 10,000,000 rows. I need to page through this table from my C# application. I'm using NHibernate. I have tried to use this code example:
return session.CreateCriteria(typeof(T))
.SetFirstResult(startId)
.SetMaxResults(pageSize)
.List<T>();
When I execute it the operation eventually times out if my startId is greater than 7,000,000. The pageSize I'm using is 200. I have used this method on much smaller tables, of less than 1000 rows, and it works and performs quickly.
Question is, on such a large table is there a better way to accomplish this using NHibernate?

You're trying to page through 10 million rows 200 at a time? Why? No human being is going to page through that much data.
You need to filter the dataset first and then apply TSQL style paging to the smaller data set. Here are some methods that will work. Just modify them so that you're getting to less than 10million rows through some kind of filtering (a WHERE clause, CTE, or derived table).

Funny you should bring this up, as I am having the same issue. My issue isn't related to paging using NHibernate, but more with just using straight T-SQL.
It seems as though there are a few options. The one I found quite useful in my instance was this answer to a question regarding paging. It discusses using a "..keyset driven solution" rather than return ranked results through the use of ROW_NUMBER(). I'm not sure what NHibernate would use in this instance or if it's possible to see the SQL it generates based on the query you issue (I know you could in Hibernate, but I've not used NHibernate).
If you aren't aware of the using SQL SERVER to returned ranked results based on ROW_NUMBER, then it's well worth looking into. A lot of people seem to refer to this article as to how to go about paging. I've seen some subsequent posts discourage the use of SET ROWCOUNT though in favour of using TOP with a dynamic parameter - SELECT TOP(#NumOfResults).
There are lots of posts here on SO regarding this, but no definitive answer on the best way to go about it as far as I can see. I'll be keeping an eye on this post to see what others suggest also.

It could by Isolation Layer problem.
I had a similar issues.
If the table your reading from is constantly updated, the updater locks parts of the table, causing timeout then reading from the table.
Add SetIsolationLayer(ReadUncommitted) you must note that the data might be a little dirty.

Related

sql | slow queries | avoid many joins

I am currently working with java spring and postgres.
I have a query on a table, many filters can be applied to the query and each filter needs many joins.
This query is very slow, due to the number of joins that must be performed, also because there are many elements in the table.
Foreign keys and indexes are correctly created.
I know one approach could be to keep duplicate information to avoid doing the joins. By this I mean creating a new table called infoSearch and keeping it updated via triggers. At the time of the query, perform search operations on said table. This way I would do just one join.
But I have some doubts:
What is the best approach in postgres to save item list flat?
I know there is a json datatype, could I use this to hold the information needed for the search and use jsonPath? is this performant with lists?
I also greatly appreciate any advice on another approach that can be used to fix this.
Is there any software that can be used to make this more efficient?
I'm wondering if it wouldn't be more performant to move to another style of database, like graph based. At this point the only problem I have is with this specific table, the rest of the problem is simple queries that adapt very well to relational bases.
Is there any scaling stat based on ratios and number of items which base to choose from?

Denormalization is a tried and true way to speed up queries/reports/searching processes for relational databases. It uses a standard time vs space tradeoff to reduce the time of query, at the cost of duplicating the data and increasing write/insert time.
There are third party tools that are specifically designed for this use-case, including search tools (like ElasticSearch, Solr, etc) and other document-centric databases. Graph databases are probably not useful in this context. They are focused on traversing relationships, not broad searches.

To use or not to use computed columns for performance and maintainability

I have a table where am storing a startingDate in a DateTime column.
Once i have the startingDate value, am supposed to calculate the
number_of_days,
number_of_weeks
number_of_months and
number_of_years
all from the startingDate to the current date.
If you are going to use these values in two or more places in the application and you do care much about the applications response time, would you rather make the calculations in a view or create computed columns for each so you can query the table directly?

Computed columns are easy to maintain and provide an ideal solution to your problem – I have used such a solution recently. However, be aware the values are calculated when requested (when they are SELECTed), not when the row is INSERTed into the table – so performance might still be an issue. This might be acceptable if you can off-load work from the application server to the database server. Views also don’t exist until they are requested (unless they are materialised) so, again, there will be an overhead at runtime, but, again it’s on the database server, not the application server.

Like nearly everything: It depends.
As #RedX suggest it probably not much of a performance difference either way, so it becomes a question of how will use them. To me this is more of a feel thing.
Using them more than once doesn't wouldn't necessary drive me immediately to either a view or computed columns. If I only use them in a few places or low volume code paths I might calc them in-line in those places or use a CTE. But if the are in wide spread or heavy use I would agree with a view or computed column.
You would also want them in a view or cc if you want them available via ORM tools.
Am I using those "computed columns" individual in places or am I using them in sets? If using them in sets I probably want a view of the table that shows included them all.
When i need them do I usually want them associated with data from a particular other table? If so that would suggest a view.
Am I basing updates on the original table of those computed values? If so then I want computed columns to avoid joining the view in these case.

Calculated columns may seem an easy solution at first, but I have seen companies have trouble with them because when they try to do ETL with CDC for real-time Change Data Capture with tools like Attunity it will not recognize the calculated columns since the values are not there permanently. So there are some issues. Also if the columns will be retrieve many, many times by users, you will save time in the long run by putting that logic in the ETL tool or procedure and write it once to the database instead of calculating it many times for each request.

Searching a nvarchar(max) field

Our application connects to a SQL Server database. There is a column that is nvarchar(max) that has been added an must be included in the search. The number of records in the this DB is only in the 10s of thousands and there are only a few hundred people using the application. I'm told to explore Full Text Search, is this necessary?

This is like asking, I work 5 miles away, and I was told to consider buying a car. Is this necessary? Too many variables to give you a simple and correct answer to your question. For example, is it a nice walk? Is there public transit available? Is your schedule flexible? Do you have to go far for lunch or run errands after work?
Full-Text Search can help if your typical searches are going to be WHERE col LIKE '%foo%' - but whether it is necessary depends on how large this column will get, whether your searches are true wildcard searches, your tolerance for millisecond vs. nanosecond queries, the amount of concurrency, even seemingly extraneous stuff like whether the data is always in memory and can be searched more efficiently.
The better answer is that you should try it. Populate a table with a copy of your data, add a full-text index, and see if your typical workload improves by using full-text queries instead of LIKE. It probably will, but there's no way for us to know for sure even if you add more specifics than ballpark row counts.

In a similar situation I ended up making a table structure that was more search friendly and indexable, then setting up a batch job to copy records from the live database to the reporting one.
In my case the original data didn't come close to needing an nvarchar(max) column so I could get away with that. Your mileage may vary. In any case, the answer is "try a few things and see what works for you".

Can Joins between views and table hurt performance?

I am new in sql server,
my manager has given me job where i have to find out performance of query in sql server 2008.
That query is very complex having joins between views and table. I read in internet that joins between views and table cause performance hurt?
If any expert can help me on this? Any good link where i found knowledge of this? How to calculate query performance in sql server?

Look at the query plan - it could be anything. Missing indexes on one of the underlying view tables, missing indexes on the join table or something else.
Take a look at this article by Gail Shaw about finding performance issues in SQL Server - part 1, part 2.

A view (that isn't indexed/materialised) is just a macro: no more, no less
That is, it expands into the outer query. So a join with 3 views (with 4, 5, and 6 joins respectively) becomes a single query with 15 JOINs.
This is a common "I can reuse it" mistake: DRY doesn't usually apply to SQL code.
Otherwise, see Oded's answer

Oded is correct, you should definitely start with the query plan. That said, there are a couple of things that you can already see at a high level, for example:
CONVERT(VARCHAR(8000), CN.Note) LIKE '%T B9997%'
LIKE searches with a wildcard at the front are bad news in terms of performance, because of the way indexes work. If you think about it, it's easy to find all people in the phone book whose name starts with "Smi". If you try to find all people who have "mit" anywhere in their name, you will find that you need to read the entire phone book. SQL Server does the same thing - this is called a full table scan, and is typically quite slow. The other issue is that the left side of the condition uses a function to modify the column (specifically, converting it to a varchar). Essentially, this again means that SQL Server cannot use an index, even if there was one for the column CN.Note.
My guess is that the column is a text column, and that you will not be allowed to change the filter logic to remove the wildcard at the beginning of the search. In this case, I would recommend looking into Full-Text Search / Indexing functionality. By enabling full text indexing, and using specific keywords such as CONTAINS, you should get better performance.
Again (as with all performance optimisation scenarios), you should still start with the query plan to see if this really is the biggest problem with the query.

Multi Criteria Search Algorithm

Here's the problem : I've got a huuge (well at my level) mysql database with technical products in it. I ve got something like 150k rows of products in my database plus 10 to 20 others tables with the same amount of rows. Each tables contains a lot of criteria. Some of the criteria are text values, some are decimal, some are just boolean. I would like to provide a web access (php) to this database with filters on each criteria but I dont know how to do that really fast. I started to create a big table with all colums merged to avoid multiple join, it's cool, faster than the big join but still very very slow. Putting an index on all criteria, doesnt improve things (and i heard it was a bad idea). I was wondering if there were some cool algorithms that could help me preprocess the multi criteria search. Any idea ?
Thanks ahead.

If you're frustrated trying to do this in SQL, you could take a look at Lucene. It lets you do range searches, full text, etc.

Try Full Text Search
You might want to try globbing your text fields together and doing full text search.
Optimize Your Queries
For the other columns, rank them in order of how frequently you expect them to be used.
Write a test suite of queries, and run them all to get a sense of the performance. Then start adding indexes, and see how it affects performance. Keep adding indexes while the performance gets better. Stop when it gets worse.
Use Explain Plan
Since you didn't provide your SQL or table layout, I can't be more specific. But use the Explain Plan command to make sure your queries are hitting indexes, rather than doing table scans. This can be tricky since subtle stuff like the order of the columns in the query can affect whether or not an index is operative.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Nhibernate Paging performance on large table (10,000,000 rows) - sql-server

It could by Isolation Layer problem. I had a similar issues. If the table your reading from is constantly updated, the updater locks parts of the table, causing timeout then reading from the table. Add SetIsolationLayer(ReadUncommitted) you must note that the data might be a little dirty.

Related

sql | slow queries | avoid many joins

To use or not to use computed columns for performance and maintainability

Searching a nvarchar(max) field

Can Joins between views and table hurt performance?

Multi Criteria Search Algorithm

Categories

Resources