Fetching a large block of data using NHibernate - database

I have about 3 million rows of data to show in a data grid using C#.
I'm currently using NHibernate to fetch the data from a SQL Server 2005 database.
NHibernate takes a long time to get the data. Is there any way to retrieve data from the database more efficiently using NHibernate?
--- Edit ---
Since the application has a huge amount of data to operate on, loading all rows is just the worst-case scenario. In normal use a user will load around 10k rows. The number of displayed rows can be minimised with paging, but as some rows depend on others I need to load all the data while initialising the app.
NHibernate gets slow even with 1,000 rows. Any suggestions to improve the performance?
Thanks.

You could use Stateless Session to get the data.
http://ayende.com/blog/4137/nhibernate-perf-tricks
http://darioquintana.com.ar/blogging/2007/10/08/statelesssession-nhibernate-without-first-level-cache/
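A minimal sketch of the stateless-session approach; the entity name Item and the BindToGrid helper are hypothetical stand-ins for your mapped row type and UI code:

```csharp
// Requires: using NHibernate; an ISessionFactory named "sessionFactory".
// A stateless session skips the first-level cache, change tracking and
// lazy-loading bookkeeping, which keeps memory flat for bulk reads.
using (IStatelessSession session = sessionFactory.OpenStatelessSession())
{
    IList<Item> rows = session.CreateQuery("from Item")
                              .SetMaxResults(10000)   // still cap what you pull back
                              .List<Item>();
    BindToGrid(rows);                                  // hypothetical UI helper
}
```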
But first, ask yourself whether you really need to display millions of rows. What value does that give to your user? Can they easily locate the data they want?
Also, a DataGrid itself will take a large amount of memory (regardless of whether you are using Windows Forms, WPF, ASP.NET...): memory to store the data itself, plus memory for additional DataGrid column / cell metadata.
Consider working with only a subset of the data instead. You could allow the user to filter through the data and/or add paging. Filtering and paging can be translated into HQL / Criteria / LINQ / QueryOver queries and eventually into SQL queries.
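For example, a filter plus paging expressed with QueryOver, so that only one page of rows ever leaves the database; Item and its Name property are placeholders, not from the question:

```csharp
// Requires: using NHibernate; an open ISession named "session" and a user-supplied filterText.
int pageIndex = 0, pageSize = 100;
IList<Item> page = session.QueryOver<Item>()
    .WhereRestrictionOn(x => x.Name).IsLike(filterText + "%")  // user filter pushed to SQL
    .Skip(pageIndex * pageSize)                                 // paging
    .Take(pageSize)
    .List();
```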

My recommendation here is not to use an ORM to fetch data of this size, but my second point is: why would you want to fetch 3 million rows of data and show them in a grid?
No user can possibly want to scroll through 3 million lines of a table.
You could use a paged data system to request only the page you are viewing at any one time, or you could filter the data down to a smaller subset that the user is interested in.
If you have 3 million records, maybe what's really needed is an analysis of those records.
I would take a look at some of these resources:
http://msdn.microsoft.com/en-us/magazine/cc164022.aspx
http://weblogs.asp.net/rajbk/archive/2010/05/08/asp-net-mvc-paging-sorting-filtering-using-the-mvccontrib-grid-and-pager.aspx

As an application grows in complexity and the data it operates on becomes more complex, the generality of an OR-mapping engine can lead to various performance and scalability bottlenecks; the application cannot scale linearly with increasing transactional requirements. Caching is one technique that can enhance the performance of an NHibernate app. In fact, NHibernate provides a basic, not-so-sophisticated in-process L1 cache out of the box. However, it doesn't provide the features a caching solution must have to make a notable impact on application performance. So it's better to use a second-level cache; it may help you increase the performance of your NHibernate app. There are many third-party NHibernate second-level cache providers available, but I'd recommend NCache. Here is a good read about it:
http://www.alachisoft.com/ncache/nhibernate-l2cache-index.html
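For reference, a minimal sketch of switching on the second-level cache programmatically; the provider type name below is a placeholder, so substitute whatever class your chosen provider (NCache, SysCache, etc.) documents:

```csharp
using NHibernate.Cfg;

var cfg = new Configuration().Configure();   // reads hibernate.cfg.xml as usual
cfg.SetProperty(NHibernate.Cfg.Environment.UseSecondLevelCache, "true");
cfg.SetProperty(NHibernate.Cfg.Environment.UseQueryCache, "true");
// Placeholder type name: use the provider class your cache vendor documents.
cfg.SetProperty(NHibernate.Cfg.Environment.CacheProvider,
                "Your.CacheProvider.ClassName, Your.CacheProvider.Assembly");
var sessionFactory = cfg.BuildSessionFactory();
// Mapped entities/collections also need a cache usage setting
// (e.g. <cache usage="read-write"/>) before NHibernate will cache them.
```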

Related

Performance AngularJS

I'm creating a Logs table. Which of these options is the better way to improve performance?
I extract all the rows from the database and filter them in AngularJS.
When a user types a new filter, I send an HTTP request and select from the database.
I want to go with the first option, but I think it will lag because I may have around 50,000 rows.
Selecting from a database is much faster.
Databases are designed for super-fast querying of data. Angular filters are somewhat fast, but nowhere near as good as a database (which is built to run queries super-quickly).
Oh! And if you have 50,000 rows, each with around 200 bytes of data, you'll be using 10 megabytes of data per load - that's huge, and browsers (especially mobile) don't handle large in-memory operations well.
So use the database whenever you can, and only use Angular filters when you already have the data client-side.
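The question doesn't say what the backend stack is, but as an illustration of pushing the filter into the database, here is a sketch assuming a C# backend over SQL Server; the Logs table, LogEntry type and column names are invented for the example:

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

public class LogEntry                     // hypothetical row type for the example
{
    public int Id { get; set; }
    public string Message { get; set; }
}

public static class LogQueries
{
    // Let SQL Server do the filtering and cap the rows, so only a small,
    // relevant slice of the log table is sent down to the Angular client.
    public static List<LogEntry> Search(string connectionString, string filter)
    {
        var result = new List<LogEntry>();
        using (var con = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT TOP (500) Id, Message FROM Logs " +
            "WHERE Message LIKE @f + '%' ORDER BY Id DESC", con))
        {
            cmd.Parameters.AddWithValue("@f", filter ?? "");
            con.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    result.Add(new LogEntry { Id = reader.GetInt32(0), Message = reader.GetString(1) });
            }
        }
        return result;
    }
}
```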

How to load large table into tableau for data visualization?

I am able to connect Tableau to my database, but the table size is really large. Every time I try to load the table into Tableau it crashes, and I am not able to find any workaround. The table size varies from 10 million to 400 million rows. How should I approach this issue? Any suggestions?
You don't "load" data into Tableau, you point Tableau at an external data source. Then Tableau sends a query to the external data source requesting only the summary info (aka query results) needed to create the visualization you designed.
So, for an extreme example, if you place CNT(Number of Records) on the Columns shelf, Tableau will send a simple short query to the external database asking it to report the number of records. Something along the lines of "select count(*) from xxx".
So even if there are billions of rows in the external database, Tableau will send a small amount of information to the database (a query) and receive back a small amount of information (the query results) to display. This allows Tableau to be very fast on its end, and performance depends on how fast the external database can respond to the query. Tuning your database depends on all kinds of factors: type and amount of memory and disk, how indices are set up, etc.
So the first step is to make sure that the database can perform as needed, regardless of Tableau.
That's the purist response. Now for a few messy details. It is possible to design a very complex visualization in Tableau that will send a complex query asking for a very large result set. For instance, you can design a dashboard that draws a dot on the map for every row in the database, and then refreshes a large volume of data every time you wave the mouse over the marks on the map.
If you have millions or billions of data rows and you want high performance, don't do that. No user can read 60 million dots anyway, and they certainly don't want to wait for them to be sent over the wire. Instead, first plot aggregate values (min, max, sum, avg, etc.) and then drill down into more detail on demand.
As others suggest, you can use a Tableau extract to offload workload and cache data in a form for fast use by Tableau. An extract is similar to an optimized materialized view stored in Tableau. Extracts are very helpful with speeding up Tableau, but if you want high performance, filter and aggregate your extracts to contain only the data and level of detail needed to support your views. If you blindly make an extract of your entire database, you are simply copying all your data from one form of database to another.
I found a simple solution for optimising Tableau to work with very large datasets (1 billion+ rows): Google BigQuery, which is essentially a managed data warehouse.
Upload data to BigQuery (you can append multiple files into a single table).
Link that table to Tableau as an external data source.
Tableau then sends SQL-like commands to BigQuery whenever a new 'view' is requested. The queries are processed quickly on Google's computing hardware, which then sends a small amount of information back to Tableau.
This method allowed me to visualise a 100 GB mobile call record dataset with ~1 billion rows on a MacBook.
There are two ways to interpret this question:
The data source (which might be a single table, a view, etc.) has 10M to 400M rows and Tableau is crashing at some point in the load process. In that case, I suggest you contact Tableau tech support. They really like to hear about situations like that and to help people through them.
You are trying to create a visualization (such as a text table or crosstab) that has N records resulting in 10M to 400M displayed rows. In that case, you're in territory that Tableau isn't designed for. A text table with 10M rows is not going to be useful for much of anything other than exporting to something else, and for that there are better tools than Tableau (such as the export/import tools built into most databases).
Not really sure what your use case is, but I find it unlikely that you need all that data for one Tableau view.
You can pare down / aggregate the data using a view in the database or custom SQL from your Tableau connection. Also, try to use extracts rather than live database connections, as they will perform faster.
I like to use views in the database and then use those views to refresh my Tableau extracts on Tableau Server.

Using application's internal cache while working with Cassandra

As I've been working with traditional relational databases for a long time, moving to NoSQL, especially Cassandra, is a big change. I usually design my applications so that everything in the database is loaded into the application's internal caches on startup, and if there is any update to a database table, its corresponding cache is updated as well. For example, if I have a table Student, then on startup all data in that table is loaded into a StudentCache, and when I want to insert/update/delete I call a service that updates both of them at the same time. The aim of my design is to avoid selecting directly from the database.
In Cassandra, as the idea is to build tables containing all the needed data so that joins are unnecessary, I wonder if my favorite design is still useful, or whether it is more effective to query data directly from the database (i.e. from one table) when required.
Based on the use case you describe, I'd say that querying data as you need it avoids storing data you don't need. Plus, what if your dataset is 5 GB? Are you still going to load the entire dataset?
Maybe consider a design where you don't load all the data on startup, but load it as needed and then store it, checking this store before querying again, like what a cache does!
Cassandra is built to scale; your design can't handle scaling, and you'll reach a point where your dataset is too large. Based on that, you should think about the trade-off: lots of on-the-fly querying vs. storing everything in the client. I would advise direct queries, but keep the data when you do carry out a query; don't discard it and then carry out the same query again!
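As a sketch of that load-on-demand idea, here is a minimal read-through cache; the Student type is taken from the question, while the loader delegate is a stand-in for whatever Cassandra driver call fetches a single row (not shown here):

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical entity from the question.
public class Student
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

// Read-through cache: the first request for a key goes to Cassandra (via the
// supplied loader); later requests for the same key are served from memory.
public class StudentCache
{
    private readonly ConcurrentDictionary<Guid, Student> _cache =
        new ConcurrentDictionary<Guid, Student>();
    private readonly Func<Guid, Student> _loadFromCassandra;   // driver call not shown

    public StudentCache(Func<Guid, Student> loadFromCassandra)
    {
        _loadFromCassandra = loadFromCassandra;
    }

    public Student Get(Guid id)
    {
        return _cache.GetOrAdd(id, _loadFromCassandra);
    }

    // Keep the cache honest when the application writes through to Cassandra.
    public void Put(Student student)
    {
        _cache[student.Id] = student;
    }

    public void Evict(Guid id)
    {
        Student removed;
        _cache.TryRemove(id, out removed);
    }
}
```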
I would suggest querying the data directly, as loading all the data into the application makes the application's performance dependent on the size of the input. Now, this might be a good thing if you know that the amount of data will never exceed your target machine's memory.
Should you, however, decide that this limit needs to change (upwards!), you will be faced with a problem. This approach will be fast when it comes to searching (assuming you sort the result at startup) but will pretty much kill maintainability.
Your former favorite approach is, however, still useful should you choose to accept that limit.

What is the best alternative to storing large amounts of data in the server session in an ASP MVC Application?

OK, so I'm working on an ASP MVC web application that queries a fairly large amount of data from SQL Server 2008. When the application starts, the user is presented with a search mask that includes several fields. Using the search mask, the user can search for data in the database and also filter the search by specifying parameters in the mask. In order to speed up searching, I'm storing the result set returned by the database query in the server session. During subsequent searches I can then search the data I have in the session, thus avoiding unnecessary trips to the DB.
Since the amount of data that can be returned by a database query can be quite large, the scalability of the web application is severely limited. If there are, let's say, 100 users using the application at the same time, the server will keep search results in its session for each separate user. This will eventually eat up quite a bit of memory. My question now is: what's the best alternative to storing the data in session? The query in the DB can take quite a while at times, so, at the moment, I would like to avoid having to run the query on subsequent searches if the data I already retrieved earlier contains the data that is now being searched for. One option I've considered is creating a temp table in the DB in my search query that stores the retrieved data and which can be used for subsequent searches. The problem with that is, I don't have all too much experience with SQL Server, so I don't know whether SQL Server would create temp tables for each user if there are multiple users performing the search. Are there any other possibilities? Could the idea with the temp table in SQL Server work, or would it only lead to memory issues on the SQL Server? Thanks for the help! :)
Edit: Thanks a lot for the helpful and insightful answers, guys! However, I failed to mention a detail that's kind of important. When I query the database, the format of the result set can vary from user to user. This is because the user can decide which columns the result table should have by selecting columns from a predefined multi-select box in the search mask. If user A wants ColA, ColB and ColC to be displayed in his result table, he selects those values from the multi-select box in the search mask. User B, however, might select ColA and ColC only. Therefore, caching the results in a single table for all users might be a bit tricky, since the table columns are not necessarily going to be the same for all users. So I'm thinking I'll almost have to use an alternative that saves each user's cached table separately. The HTML5 Local Storage option mentioned below sounds interesting. Since this is an intranet application, it might be fair to assume (or require) that users have an up-to-date browser that supports HTML5. What do you guys think? Again, thanks for the help :)
If you want to cache query results, they'll have to be either on the web server or client in some form or another. All options will require memory, and since search results are user-specific, that memory usage will increase as a linear function of the number of current users.
My suggestions are to limit the number of rows returned from SQL (with TOP) and/or to look into optimizing your query on the SQL end. If your DB query takes a noticeable amount of time there's a good chance it can be optimized in SQL.
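As a sketch of that suggestion, here is a TOP-limited query that also allows for the per-user column selection mentioned in the edit; the SearchData table, the column names (taken from the question's ColA/ColB/ColC example) and the 1,000-row cap are placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;

public static class SearchQueries
{
    // Whitelist drawn from the predefined multi-select box, so the per-user
    // column choice can never inject arbitrary SQL.
    private static readonly HashSet<string> AllowedColumns =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "ColA", "ColB", "ColC" };

    public static DataTable Search(string connectionString,
                                   IEnumerable<string> requestedColumns,
                                   string nameFilter,
                                   int maxRows = 1000)   // cap instead of caching everything
    {
        var columns = requestedColumns.Where(c => AllowedColumns.Contains(c)).ToList();
        if (columns.Count == 0)
            throw new ArgumentException("No valid columns selected.");

        string sql = "SELECT TOP (@max) " + string.Join(", ", columns.ToArray()) +
                     " FROM SearchData WHERE Name LIKE @name + '%'";   // placeholder table/filter

        var table = new DataTable();
        using (var con = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, con))
        {
            cmd.Parameters.AddWithValue("@max", maxRows);
            cmd.Parameters.AddWithValue("@name", nameFilter ?? "");
            con.Open();
            using (var adapter = new SqlDataAdapter(cmd))
            {
                adapter.Fill(table);
            }
        }
        return table;
    }
}
```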
Have you already thought about NoSQL databases?
The idea of a NoSQL database is to store information that is optimized for reading or writing and is accessed with 'easy queries' (for example a look-up on search terms). They scale horizontally with ease and would allow you to search through a whole lot of data very fast (think of how Google handles big data, for example!).
If HTML5 is a possibility, you could use Local Storage.
You could try turning on SQL session state.
http://support.microsoft.com/kb/317604
Plus: it's effortless, and you will find out whether this fixes the memory pressure and whether the performance is acceptable (i.e. reading and writing the cache to the DB). If the perf is okay, then you may want to implement the sort of thing that SQL session state does yourself, because there is a ...
Downside: if you aren't aggressive about removing it from session, it will be serialized and deserialized on every request, even on unrelated pages.
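For reference, a minimal web.config sketch of what switching session state to SQL Server looks like once the session-state database has been provisioned (the linked KB article walks through that part); the connection string is a placeholder:

```xml
<system.web>
  <!-- Placeholder connection string; point it at the server holding the ASPState database. -->
  <sessionState mode="SQLServer"
                sqlConnectionString="Data Source=YOUR_SQL_SERVER;Integrated Security=SSPI;"
                cookieless="false"
                timeout="20" />
</system.web>
```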

Dataset retrieving data from another dataset

I work on an application that is switching from file-based data storage to database-based storage. It has a very large amount of code that is written specifically around the file-based system. To make the switch, I am implementing functionality that behaves like the old system; the plan is then to make more optimal use of the database in new code.
One problem is that the file-based system often read single records, and read them repeatedly for reports. This has turned into a lot of queries to the database, which is slow.
The idea I have been trying to flesh out is using two datasets: one dataset to retrieve an entire table, and another dataset to query against the first, thereby decreasing communication overhead with the database server.
I've tried looking at the DataSource property of TADODataSet, but the dataset still seems to require a connection, and it queries the database directly if Connection is assigned.
The reason I would prefer to get the result in another dataset, rather than navigating the first one, is that a good amount of logic has already been implemented to emulate the old system. This logic is based on having a dataset containing only the results as queried with the old interface.
The functionality only has to support reading data, not writing it back.
How can I use one dataset to supply values for another dataset to select from?
I am using Delphi 2007 and MSSQL.
You can use a ClientDataSet/DataSetProvider pair to fetch data from an existing DataSet. You can use filters on the source dataset, filters on the ClientDataSet, and provider events to trim the dataset down to only the interesting records.
I've used this technique with success in a couple of migration projects, and to mitigate a similar situation where an old SQL Server 7 database was queried thousands of times to retrieve individual records, with painful performance costs. Querying it only once and then fetching individual records from the client dataset was, at the time, not only an elegant solution but a great performance boost for that particular application: the best example was an 8-hour process reduced to 15 minutes... the poor users loved me that time.
A ClientDataSet is just a TDataSet, so you can seamlessly integrate it into existing code and UI.
