Querying NHibernate - sql-server

We are using NHibernate as the ORM for our project, and our database access is read-only. The application will not be updating, deleting, or inserting any records; it will only be querying the database.
My question is: which is the best method to query the database with NHibernate in the scenario explained above?

Are you sure you really need an ORM?
Anyway, there are three common options for querying a database with NHibernate:
HQL.
Criteria API.
Linq.
The easiest is Linq; the most powerful is HQL.
But I don't really understand the nature of your question, as the query APIs in NHibernate are not mutually exclusive; rather, they complement each other.
So you can use any of them depending on the situation:
For dynamic queries, the Criteria API is best.
For complex queries that never change - HQL.
For quick and easy queries - Linq.
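For illustration, here is a minimal sketch of the same lookup written with each API (not from the original answer; the Product entity and its Name property are hypothetical, and the Linq provider assumes NHibernate 3+):

using NHibernate;
using NHibernate.Criterion; // Restrictions
using NHibernate.Linq;      // session.Query<T>()
using System.Linq;

// HQL - string-based, the most powerful
var viaHql = session
    .CreateQuery("from Product p where p.Name = :name")
    .SetParameter("name", "widget")
    .List<Product>();

// Criteria API - composable, good for building queries dynamically
var viaCriteria = session
    .CreateCriteria(typeof(Product))
    .Add(Restrictions.Eq("Name", "widget"))
    .List<Product>();

// Linq - quick and easy, checked at compile time
var viaLinq = session.Query<Product>()
    .Where(p => p.Name == "widget")
    .ToList();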

Since the application is read-only, you probably won't have much use for retrieving the query results as mapped objects; a plain result-set style return value may be more useful. For that, use session.CreateQuery and then query.List.
Each element of the list will be an object array, and each array element corresponds to one select column.
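A minimal sketch of that, assuming a mapped Order entity with Id and Total properties (hypothetical names):

// Projection query: each list element is an object[], one entry per select column
var rows = session
    .CreateQuery("select o.Id, o.Total from Order o where o.Total > :min")
    .SetParameter("min", 100m)
    .List<object[]>();

foreach (var row in rows)
{
    var id = (int)row[0];        // first select column
    var total = (decimal)row[1]; // second select column
}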

Related

sql | slow queries | avoid many joins

I am currently working with Java Spring and Postgres.
I have a query on a table; many filters can be applied to the query, and each filter needs many joins.
This query is very slow due to the number of joins that must be performed, and also because there are many rows in the table.
Foreign keys and indexes are correctly created.
I know one approach could be to keep duplicate information to avoid doing the joins. By this I mean creating a new table called infoSearch and keeping it updated via triggers. At query time, I would perform search operations on that table, so I would do just one join.
But I have some doubts:
What is the best approach in Postgres to store an item list in flattened form?
I know there is a JSON datatype; could I use it to hold the information needed for the search and query it with JSONPath? Is that performant with lists?
I would also greatly appreciate any advice on another approach that could be used to fix this.
Is there any software that can be used to make this more efficient?
I'm wondering if it wouldn't be more performant to move to another style of database, like a graph-based one. At this point the only problem I have is with this specific table; the rest of the workload is simple queries that adapt very well to relational databases.
Is there any guideline, based on data ratios and item counts, for choosing which kind of database to use?
Denormalization is a tried and true way to speed up queries/reports/search processes in relational databases. It uses a standard time-vs-space tradeoff: it reduces query time at the cost of duplicating data and increasing write/insert time.
There are third-party tools specifically designed for this use case, including search tools (like Elasticsearch, Solr, etc.) and other document-centric databases. Graph databases are probably not useful in this context; they are focused on traversing relationships, not broad searches.

NoSQL write-once, automatic timestamp indexed

I'm looking for the least-effort solution to store data in a database. Here are the requirements:
this will be the storage backend for a test automation tool
data will be messages captured from queues: can be JSON, XML, binary... but could be converted to a uniform representation
data will be written once, whatever is written will not change
there will be multiple indexes necessary; however, the base index should be the timestamp at which messages are inserted into the database - it would be nice if the database of choice could be configured to provide this automatically (e.g. querying messages inserted between two timestamps should work out of the box)
ease of query is important (SQL would be best, however the structure of the messages is not always known in advance)
performance is not important
fault tolerance, partition tolerance, reliability etc are not important
ease of access (eg. REST API, API from multiple platforms - JVM, JS, etc) is important.
I was looking at MongoDB, CouchDB, maybe Riak... All of these could work; I just don't know which offers the path of least resistance for the requirements above. I am familiar with Riak, but its strengths are not really what I'm after...
@geraldss has addressed the INSERT question. Let me add the example.
Indexing: you can create indices on one or more fields and the query will use them automatically.
create index idx_ins_time on my_bucket(insert_time);
select my_message from my_bucket
where insert_time
between "2016-04-03T10:46:33.857-07:00" and "2016-04-05T10:46:33.857-07:00";
Use EXPLAIN to see the plan, just like SQL.
You can create multiple indices with one or more keys each.
Couchbase N1QL supports REST API, JDBC/ODBC and SDKs for most popular languages.
It seems that Couchbase is the best alternative, simply because of N1QL:
http://developer.couchbase.com/documentation/server/current/n1ql/n1ql-intro/data-access-using-n1ql.html
It ticks all the other boxes (except for the automatic timestamp indexes, but then adding that and doing range queries is straightforward thanks to the query language).
If you use Couchbase, you can use N1QL's INSERT statement to automatically add the timestamp:
INSERT INTO my_bucket (KEY, VALUE)
VALUES ($my_key, {
    "insert_time": NOW_STR(),
    __my other data fields__
});

Is this a "correct" database design?

I'm working with the new version of a third-party application. In this version, the database structure has been changed, they say "to improve performance".
The old version of the DB had a general structure like this:
TABLE ENTITY
(
ENTITY_ID,
STANDARD_PROPERTY_1,
STANDARD_PROPERTY_2,
STANDARD_PROPERTY_3,
...
)
TABLE ENTITY_PROPERTIES
(
ENTITY_ID,
PROPERTY_KEY,
PROPERTY_VALUE
)
so we had a main table with fields for the basic properties, and a separate table to manage custom properties added by users.
The new version of the DB instead has a structure like this:
TABLE ENTITY
(
ENTITY_ID,
STANDARD_PROPERTY_1,
STANDARD_PROPERTY_2,
STANDARD_PROPERTY_3,
...
)
TABLE ENTITY_PROPERTIES_n
(
ENTITY_ID_n,
CUSTOM_PROPERTY_1,
CUSTOM_PROPERTY_2,
CUSTOM_PROPERTY_3,
...
)
So now, when the user adds a custom property, a new column is added to the current ENTITY_PROPERTIES_n table until the maximum number of columns (managed by the application) is reached, and then a new table is created.
So, my question is: is this a correct way to design a DB structure? Is this the only way to "increase performance"? The old structure required many joins or sub-selects, but this structure doesn't seem very smart (or even correct) to me...
I have seen this done before on the assumed (often unproven) "expense" of joining - it is basically turning a row-heavy data table into a column-heavy table. They ran into their own limitation, as you imply, by creating new tables when they run out of columns.
I completely disagree with it.
Personally, I would stick with the old structure and re-evaluate the performance issues. That isn't to say the old way is the correct way, it is just marginally better than the "improvement" in my opinion, and removes the need to do large scale re-engineering of database tables and DAL code.
These tables strike me as largely static... caching would be an even better performance improvement without mutilating the database and one I would look at doing first. Do the "expensive" fetch once and stick it in memory somewhere, then forget about your troubles (note, I am making light of the need to manage the Cache, but static data is one of the easiest to manage).
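A rough sketch of that caching idea in .NET (the class name and expiry policy here are hypothetical; cache invalidation is the part being glossed over, though static data makes it easy):

using System;
using System.Runtime.Caching;

// Do the "expensive" fetch once and keep the result in memory.
public static class StaticDataCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static T GetOrFetch<T>(string key, Func<T> fetch) where T : class
    {
        var cached = Cache.Get(key) as T;
        if (cached != null)
            return cached;

        var value = fetch(); // the expensive database query
        Cache.Set(key, value, DateTimeOffset.Now.AddHours(1));
        return value;
    }
}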
Or, wait for the day you run into the maximum number of tables per database :-)
Others have suggested completely different stores. This is a perfectly viable possibility and if I didn't have an existing database structure I would be considering it too. That said, I see no reason why this structure can't fit into an RDBMS. I have seen it done on almost all large scale apps I have worked on. Interestingly enough, they all went down a similar route and all were mostly "successful" implementations.
No, it's not. It's terrible.
until the max number of column (handled by application) is reached,
then a new table is created.
This sentence says it all. Under no circumstance should an application dynamically create tables. The "old" approach isn't ideal either, but since you have the requirement to let users add custom properties, it has to be like this.
Consider this:
You lose all type-safety, as you have to store all values in the column "PROPERTY_VALUE".
Depending on your users, you could have them change the schema beforehand and then let them run some kind of database update batch job, so at least all the properties would be declared in the right datatype. Also, you could lose the entity_id/key thing.
Check out the inner-platform effect (http://en.wikipedia.org/wiki/Inner-platform_effect); this design certainly reeks of it.
Maybe an RDBMS isn't the right thing for your app. Consider using a key/value based store like MongoDB or another NoSQL database (http://nosql-database.org/).
From what I know of databases (but I'm certainly not the most experienced), it seems quite a bad idea to do that in your database. If you already know the maximum number of custom properties a user might have, I'd say you'd better fix the table's number of columns at that value.
Then again, I'm not an expert, but adding new columns on the fly isn't the kind of operation databases like. It's going to bring you more trouble than anything.
If I were you, I'd either fix the number of custom properties, or stick with the old system.
I believe creating a new table for each entity to store properties is a bad design, as you could end up bloating the database with tables. The only pro of the second method is that you are not traversing all of the redundant rows that do not apply to the selected entity. However, using indexes on the original ENTITY_PROPERTIES table could help greatly with performance.
I would personally stick with your initial design, apply indexes and let the database engine determine the best methods for selecting the data rather than separating each entity property into a new table.
There is no "correct" way to design a database - I'm not aware of a universally recognized set of standards other than the famous "normal form" theory; many database designs ignore this standard for performance reasons.
There are ways of evaluating database designs though - performance, maintainability, intelligibility, etc. Quite often, you have to trade these against each other; that's what your change seems to be doing - trading maintainability and intelligibility against performance.
So, the best way to find out if that was a good trade off is to see if the performance gains have materialized. The best way to find that out is to create the proposed schema, load it with a representative dataset, and write queries you will need to run in production.
I'm guessing that the new design will not be perceivably faster for queries like "find entities where STANDARD_PROPERTY_1 = 'banana'".
I'm guessing it will not be perceivably faster when retrieving all properties for a given entity; in fact it might be slightly slower, because instead of a single join to ENTITY_PROPERTIES, the new design requires joins to several tables. You will be returning "sparse" results - presumably, not all entities will have values in the property_n columns in all ENTITY_PROPERTIES_n tables.
Where the new design may be significantly faster is when you need a compound where clause on custom properties. For instance, finding an entity where custom property 1 is true, custom property 2 is banana, and custom property 3 is not in ('kylie', 'pussycat dolls', 'giraffe') is (probably) faster when you can specify columns in the ENTITY_PROPERTIES_n tables instead of rows in the ENTITY_PROPERTIES table. Probably.
As for maintainability - yuck. Your database access code now needs to be far smarter, knowing which table holds which property, and how many columns are too many. The likelihood of introducing bugs is high - there are more moving parts, and I can't think of any obvious unit tests to make sure that the database access logic is working.
Intelligibility is another concern - this solution is not in most developers' toolbox, it's not an industry-standard pattern. The old solution is pretty widely known - commonly referred to as "entity-attribute-value". This becomes a major issue on long-lived projects where you can't guarantee that the original development team will hang around.

NHibernate paging performance on large table (10,000,000 rows)

I have a table that is rather large at about 10,000,000 rows. I need to page through this table from my C# application. I'm using NHibernate. I have tried to use this code example:
return session.CreateCriteria(typeof(T))
.SetFirstResult(startId)
.SetMaxResults(pageSize)
.List<T>();
When I execute it the operation eventually times out if my startId is greater than 7,000,000. The pageSize I'm using is 200. I have used this method on much smaller tables, of less than 1000 rows, and it works and performs quickly.
Question is, on such a large table is there a better way to accomplish this using NHibernate?
You're trying to page through 10 million rows 200 at a time? Why? No human being is going to page through that much data.
You need to filter the dataset first and then apply TSQL-style paging to the smaller data set. Here are some methods that will work; just modify them so that you're getting to far fewer than 10 million rows through some kind of filtering (a WHERE clause, CTE, or derived table).
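A sketch of what that looks like with NHibernate's Criteria API (the Message entity and the CreatedOn filter are hypothetical; any restriction that shrinks the set will do):

// Restrictions/Order come from NHibernate.Criterion.
// Filter first (becomes a WHERE clause), then page the much smaller result set.
var page = session.CreateCriteria(typeof(Message))
    .Add(Restrictions.Ge("CreatedOn", DateTime.Today.AddDays(-7)))
    .AddOrder(Order.Asc("Id"))
    .SetFirstResult(pageIndex * pageSize)
    .SetMaxResults(pageSize)
    .List<Message>();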
Funny you should bring this up, as I am having the same issue. My issue isn't related to paging using NHibernate, but more with just using straight T-SQL.
It seems as though there are a few options. The one I found quite useful in my instance was this answer to a question regarding paging. It discusses using a "keyset driven solution" rather than returning ranked results through the use of ROW_NUMBER(). I'm not sure what NHibernate would use in this instance, or whether it's possible to see the SQL it generates from the query you issue (I know you can in Hibernate, but I've not used NHibernate).
If you aren't aware of using SQL Server to return ranked results based on ROW_NUMBER(), then it's well worth looking into. A lot of people seem to refer to this article on how to go about paging. I've seen some subsequent posts discourage the use of SET ROWCOUNT, though, in favour of using TOP with a dynamic parameter - SELECT TOP(@NumOfResults).
There are lots of posts here on SO regarding this, but no definitive answer on the best way to go about it as far as I can see. I'll be keeping an eye on this post to see what others suggest also.
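For what it's worth, a keyset-driven page in NHibernate could look roughly like this (a sketch, not from the linked answer; it assumes an indexed Id key and that the caller remembers the last Id it saw):

// Keyset ("seek") paging: seek past the last-seen key on an index
// instead of making the engine count past millions of rows, which is
// what a large SetFirstResult offset effectively asks it to do.
var page = session.CreateCriteria(typeof(Message))
    .Add(Restrictions.Gt("Id", lastSeenId)) // NHibernate.Criterion
    .AddOrder(Order.Asc("Id"))
    .SetMaxResults(pageSize)
    .List<Message>();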
It could be an isolation level problem.
I had a similar issue.
If the table you're reading from is constantly updated, the updater locks parts of the table, causing timeouts when reading from it.
Try reading at the ReadUncommitted isolation level, but note that the data might be a little dirty.
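In NHibernate, the usual way to do that is to pass the isolation level when opening the transaction (a sketch; the entity and paging variables are placeholders):

using System.Data;

// ReadUncommitted avoids being blocked by writers, at the cost of dirty reads
using (var tx = session.BeginTransaction(IsolationLevel.ReadUncommitted))
{
    var page = session.CreateCriteria(typeof(Message))
        .SetFirstResult(startRow)
        .SetMaxResults(pageSize)
        .List<Message>();
    tx.Commit();
}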

How to implement database engine independent paging?

Task: implement paging of database records suitable for different RDBMSs. The method should work for mainstream engines - MSSQL2000+, Oracle, MySql, etc.
Please don't post RDBMS specific solutions, I know how to implement this for most of the modern database engines. I'm looking for the universal solution. Only temporary tables based solutions come to my mind at the moment.
EDIT:
I'm looking for SQL solution, not 3rd party library.
There would have been a universal solution if the SQL specification had included paging as a standard feature; the requirements for a language to qualify as an RDBMS language do not include paging support.
Many database products support SQL with proprietary extensions to the standard language. Some of them support paging, like MySQL with the LIMIT clause and Oracle with ROWNUM, each handled differently. Other DBMSs would need to add a field called rowid or something like that.
I don't think you can have a universal solution (anyone is free to prove me wrong here; open to debate) unless it is built into the database system itself, or unless a company - say one using Oracle, MySQL, and SQL Server - has its database developers provide their own implementation of paging for each system behind a universal interface for the code that uses it.
The most natural and efficient way to do paging is using the LIMIT/OFFSET (TOP in the Sybase world) construct. A DB-independent way would have to know which engine it's running on and apply the proper SQL construct.
At least, that's the way I've seen it done in DB-independent libraries' code; you can abstract away the paging logic once each engine-specific query has done its work. A sketch of that dispatch follows.
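Something like this (a sketch, not from any particular library; the dialect names and the KeyColumn ordering are assumptions):

// Wrap a base query with engine-specific paging SQL
public static string Paginate(string baseSql, string dialect, int skip, int take)
{
    switch (dialect)
    {
        case "mysql":
        case "postgresql":
            return baseSql + " limit " + take + " offset " + skip;
        case "mssql": // SQL Server 2005+
            return "select * from (select q.*, row_number() over (order by KeyColumn) as rn" +
                   " from (" + baseSql + ") q) ranked" +
                   " where rn between " + (skip + 1) + " and " + (skip + take);
        case "oracle":
            return "select * from (select q.*, rownum as rn from (" + baseSql + ") q" +
                   " where rownum <= " + (skip + take) + ") where rn > " + skip;
        default:
            throw new NotSupportedException("No paging strategy for " + dialect);
    }
}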
If you really are looking for a single, one SQL sentence solution, could you show what you have in mind? Like the SQL for the temp table solution. That would probably get you more relevant suggestions.
EDIT:
I wanted to see what you were thinking, because I couldn't see a way to do it with temp tables without using an engine-specific construct - and indeed you used specific constructs in your example. I still don't see a way to implement paging in the database with only (implemented) standard SQL. You could bring the whole table over in standard SQL and page in the application, but that is obviously stupid.
So the question would now be more like "Is there a way to implement paging without using LIMIT/OFFSET or equivalent?", and I guess the answer is "Sanely, no." You could try using cursors, but you'll fall prey to database-specific statements/behavior there as well.
A wacko (read: stupid) idea that just occurred to me would be to add a page column to the table, say create table test (id int, name varchar, phone varchar, page int), and then you can get page 1 with select * from test where page = 1. But that means having to add code to maintain that column, which, again, could only be done by either pulling the whole table or using database-specific constructs. That is besides having to add a different column for each possible ordering, and many other flaws.
I can't provide proof, but I really think you just can't do it sanely.
Proceed as usual:
Start by implementing it according to the standard. And then handle the corner cases, i.e. the DBMSes which don't implement the standard. How to handle the corner cases depends on your development environment.
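For reference, the standard in question is SQL:2008's OFFSET ... FETCH clause. A sketch of leaning on it through NHibernate's raw-SQL facility (engines that predate the standard, like the MSSQL 2000 from the question, are exactly the corner cases you would handle separately):

// SQL:2008 standard paging; only works on engines that implement it
var rows = session.CreateSQLQuery(
        "select * from MyTable order by KeyColumn" +
        " offset :skip rows fetch next :take rows only")
    .SetParameter("skip", 20)
    .SetParameter("take", 10)
    .List();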
You are looking for a "universal" approach. The most universal way to paginate is through the use of cursors, but cursor-based pagination doesn't fit very well with a non-stateful environment like a web application.
I've written about the standard and the implementations (including cursors) here:
http://troels.arvin.dk/db/rdbms/#select-limit-offset
SubSonic can do this for you if you can tolerate open source...
http://subsonicproject.com/querying/webcast-using-paging/
Other than that, I know NHibernate does as well.
JPA lets you do it with the Query class:
Query q = ...;
q.setFirstResult (0);
q.setMaxResults (10);
gives you the first 10 results in the result set.
If you want a DBMS independent raw SQL solution, I'm afraid you're out of luck. All the vendors do it differently.
@Vinko Vrsalovic,
as I wrote in the question, I know how to do it in most DBs. I want to find a universal solution or get proof that one doesn't exist.
Here is one stupid solution based on a temporary table. It's obviously bad, so no need to comment on it.
N - upper bound
M - lower bound
create table #temp (Id int identity, originalId int)
insert into #temp(originalId)
select top N KeyColumn from MyTable
where ...
select MyTable.* from MyTable
join #temp t on t.originalId = MyTable.KeyColumn
where t.Id between M and N
order by t.Id asc
drop table #temp
