When and how do I have to create an Index in Grails?

In Grails you can add custom indexes to your domain classes.
Does Grails generate indexes by default for my tables?
Is there a rule for which columns I should index?
Do my queries have to change once an index is in place?

This isn't really a Grails question, except for the part about when and whether Grails creates indexes. You need them just as you would in any application that uses a database - create them to improve lookup performance.
Grails doesn't actually create any, Hibernate does that when it generates the DDL that creates your tables. You can see this DDL at any time by running grails schema-export - the generated file will be target/ddl.sql.
In general you'll see unique constraints which will typically create a unique index, and in MySQL and some other databases you'll see indexes created on foreign keys (but this isn't done for Oracle for some reason).
There is some mapping support for getting Hibernate to create indexes as you noted in your question, but in general you'll need to create them yourself since they are often database-specific. Use the http://grails.org/plugin/database-migration plugin for this.

In general you will put indexes on columns that appear in frequent queries or in queries with a high execution cost. This holds for any relational database and any development framework.
Regarding Grails specifically, I found this post very useful on how indexes are defined in domain classes: http://grails.asia/grails-how-to-create-custom-table-index-or-composite-index

Related

Change ID generator in nhibernate and migrate existing database

I have an existing product using the increment ID generator for most db entities. A new version should allow clustering of multiple server instances working on the same database. The product supports use of MSSQL and Oracle databases.
So I am considering changing the ID generator to native, but there are some issues with that.
Two different algorithms will be used for Oracle and MSSQL - will that be transparent when creating objects in the code?
How can I migrate existing databases and how will I get the generator to not use the IDs already in use?
Thanks in advance for any insights on this.
I would suggest looking at a hilo generator strategy. The benefit is that it can be used for multiple processes, and you still retain the performance benefit of using a generated id in NHibernate (specifically allowing batching of inserts).
MSSQL does not allow you to change an existing column into an identity column - you would need to add a new column and then update all the foreign keys. If you have a lot of tables and relationships, this can get very messy.
With the hilo generator strategy you avoid that issue altogether: it's just a configuration change plus adding a table to your database to store the high values, and populating that table with the correct starting values.
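For illustration, here is a minimal sketch of a hilo mapping using Fluent NHibernate (the entity, table, and column names are hypothetical; classic hbm.xml mappings expose the same generator):

```csharp
// A minimal sketch using Fluent NHibernate. HiLo keeps id generation in the
// application, so it behaves the same against MSSQL and Oracle and still
// lets NHibernate batch inserts.
using FluentNHibernate.Mapping;

public class Order
{
    public virtual long Id { get; protected set; }
    public virtual string Number { get; set; }
}

public class OrderMap : ClassMap<Order>
{
    public OrderMap()
    {
        Table("Orders");

        // "hibernate_unique_key"/"next_hi" is the conventional hi-value table.
        // To migrate an existing database, seed next_hi high enough that
        // (next_hi * maxLo) is above every id already in use.
        Id(x => x.Id).GeneratedBy.HiLo("hibernate_unique_key", "next_hi", "100");

        Map(x => x.Number);
    }
}
```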

DB technology for efficient search in tabular data?

We have a repository of tables: around 200 of them, each of which can have thousands of rows. All tables originally come from Excel sheets.
Each table has a different schema. All data is text or numbers.
We would like to create an application that allows free text search on all tables (we define which columns will be searched in each table) efficiently - speed is important.
The main dilemma is which DB technology we should choose.
We created a mock up by importing all tables to MS SQL Server, and creating a full text index over them. The search is done using the CONTAINS keyword. This solution works well for a small number of tables, but it doesn't scale.
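(For reference, the mock-up queries look roughly like this - a minimal sketch, with a hypothetical connection string, table, and column names.)

```csharp
// A minimal sketch of the CONTAINS-based lookup described above.
using System;
using System.Data.SqlClient;

class FullTextDemo
{
    static void Main()
    {
        using (var connection = new SqlConnection(
            "Server=.;Database=Repository;Integrated Security=true"))
        using (var command = new SqlCommand(
            "SELECT Id, Title FROM ImportedTable WHERE CONTAINS(SearchableText, @term)",
            connection))
        {
            // CONTAINS takes a full-text predicate, e.g. a quoted prefix term.
            command.Parameters.AddWithValue("@term", "\"widget*\"");
            connection.Open();
            using (var reader = command.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine("{0}: {1}", reader[0], reader[1]);
        }
    }
}
```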
We thought about a NoSQL solution, but we don't yet have any experience in it.
Our limitations (which unfortunately I cannot affect): Windows servers only. But we can install whatever we want on them.
Thank you.
Check out ElasticSearch! It's a search server based on Apache Lucene with a clean JSON-over-REST API. Although it's usually used as a search index alongside a primary database, it can also be used stand-alone. So you may want to export a few of your tables into it and try it out.
http://www.elasticsearch.org/
http://en.wikipedia.org/wiki/ElasticSearch
Comparison of ElasticSearch and Apache Solr (another Lucene-based search server):
https://docs.google.com/present/view?id=dc6zhtt5_1frfxwfff&pli=1
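As a starting point, here is a minimal sketch of talking to ElasticSearch from .NET using nothing but HttpClient and the JSON REST API (the index name, document shape, and localhost URL are assumptions):

```csharp
// A minimal sketch: index one row from an Excel-derived table as a JSON
// document, then run a free-text search across the indexed fields.
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class EsDemo
{
    static async Task Main()
    {
        var http = new HttpClient { BaseAddress = new Uri("http://localhost:9200/") };

        // PUT /index/type/id stores a document.
        var doc = "{\"table\":\"prices\",\"product\":\"widget\",\"amount\":42}";
        await http.PutAsync("tables/row/1",
            new StringContent(doc, Encoding.UTF8, "application/json"));

        // POST /index/_search with a query_string query does free-text search.
        var query = "{\"query\":{\"query_string\":{\"query\":\"widget\"}}}";
        var response = await http.PostAsync("tables/_search",
            new StringContent(query, Encoding.UTF8, "application/json"));

        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```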

SQL Server: How to get metadata about tables and their relationships?

I was wondering if there was a (relatively simple, I hope) way to get information about the tables and their attributes and relationships?
Clarification: I want to grab all tables in the database and get the meta-model for the whole database - tables, column data, indexes, unique constraints, relationships between tables, etc.
The system has a data dictionary in sys.tables, sys.columns, sys.indexes and various other tables. You can query these tables to get metadata about the database structure. This posting has a script I wrote a few years ago to reverse engineer a database schema. If you take a look at it you can see some examples of how to use the system data dictionary tables.
There are a whole bunch of system views in the INFORMATION_SCHEMA schema in SQL Server 2005+. Is there anything in particular you're wanting?
Some of those views include:
CHECK_CONSTRAINTS,
COLUMNS,
TABLES,
VIEWS
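For example, a minimal sketch that walks tables and columns through INFORMATION_SCHEMA from .NET (the connection string is hypothetical):

```csharp
// Lists every base table with its columns and data types.
using System;
using System.Data.SqlClient;

class SchemaDump
{
    static void Main()
    {
        using (var connection = new SqlConnection(
            "Server=.;Database=MyDb;Integrated Security=true"))
        using (var command = new SqlCommand(
            @"SELECT c.TABLE_NAME, c.COLUMN_NAME, c.DATA_TYPE
              FROM INFORMATION_SCHEMA.COLUMNS c
              JOIN INFORMATION_SCHEMA.TABLES t
                ON t.TABLE_NAME = c.TABLE_NAME AND t.TABLE_SCHEMA = c.TABLE_SCHEMA
              WHERE t.TABLE_TYPE = 'BASE TABLE'
              ORDER BY c.TABLE_NAME, c.ORDINAL_POSITION", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine("{0}.{1} ({2})", reader[0], reader[1], reader[2]);
        }
    }
}
```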
Try sp_help <tablename>. This will show you foreign key references and data about the columns, etc. - that is, if you are interested in a specific table, as your question seemed to indicate.
If using .NET code is an option, SMO is the best way to do it.
It abstracts away all these system views and tables hiding them behind nice and easy to use classes and collections.
http://msdn.microsoft.com/en-us/library/ms162169.aspx
This is the same infrastructure SQL Server Management Studio uses itself. It even supports scripting.
Abstraction comes at a cost though, so if you need maximum performance you'd still have to use the system views and custom SQL.
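For a flavor of what that looks like, here is a minimal sketch that walks tables, columns, indexes, and foreign keys with SMO (server and database names are hypothetical):

```csharp
// A minimal SMO sketch: the Server/Database/Table object model exposes the
// whole meta-model as collections, no system views required.
using Microsoft.SqlServer.Management.Smo;

class SmoDemo
{
    static void Main()
    {
        var server = new Server("localhost");
        var database = server.Databases["MyDb"];

        foreach (Table table in database.Tables)
        {
            System.Console.WriteLine(table.Name);
            foreach (Column column in table.Columns)
                System.Console.WriteLine("  column: {0} {1}", column.Name, column.DataType);
            foreach (Index index in table.Indexes)
                System.Console.WriteLine("  index: {0} (unique: {1})", index.Name, index.IsUnique);
            foreach (ForeignKey fk in table.ForeignKeys)
                System.Console.WriteLine("  fk: {0} -> {1}", fk.Name, fk.ReferencedTable);
        }
    }
}
```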
You can try to use this library db-meta

SQL Server - full-text search

So let's say I have two databases, one for production purposes and another one for development purposes.
When we copied the development database, the full-text catalog did not get copied properly, so we decided to create the catalog ourselves. We matched all the tables and indexes and rebuilt the catalog, and the search feature seems to be working okay too (though it hasn't been fully tested yet).
However, the former catalog had a lot more files in its folder than the one we created manually. Is that fine? I thought they would have the exact same number of files (even if the sizes vary).
First...when using full text search I would suggest that you don't manually try to create what the wizard does for you. I have to wonder about missing more than just some data. Why not just recreate the indexes?
Second...I suggest that you don't use the freetext feature of SQL Server unless you have no other choice. I used to be a big believer in freetext, but was shown an example of creating a Lucene(.net) index and searching it in comparison to creating an index in SQL Server and searching it. Creating a SQL Server full-text index is considerably slower and harder to maintain than creating a Lucene index, and searching a SQL Server index gives considerably less accurate (poorer) results in comparison to Lucene. Lucene is like having your own personal Google for searching your data.
How? Index your data (only the data you need to search) in Lucene and include the Primary Key of the data that you are indexing for use later. Then search the index using your language and the Lucene(.net) API (many articles written on this topic). In your search results make sure you return the PK. Once you have identified the records you are interested in you can then go get the rest of the data and/or any related data based on the PK that was returned.
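To make that concrete, here is a minimal sketch of the index-then-search flow using Lucene.Net 3.x (the field names, index path, and sample data are hypothetical):

```csharp
// Index only the searchable text plus the primary key, then search and
// read the PKs off the top hits.
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

class LuceneDemo
{
    static void Main()
    {
        var directory = FSDirectory.Open(new DirectoryInfo(@"C:\search-index"));
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // Index: store the PK untokenized so it comes back with each hit.
        using (var writer = new IndexWriter(directory, analyzer, true,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("pk", "42", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("body", "full text worth searching",
                              Field.Store.NO, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }

        // Search: parse the user's query, then fetch the rest of the data
        // from SQL Server by PK.
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "body", analyzer);
        using (var searcher = new IndexSearcher(directory, true))
        {
            foreach (var hit in searcher.Search(parser.Parse("searching"), 10).ScoreDocs)
                Console.WriteLine(searcher.Doc(hit.Doc).Get("pk"));
        }
    }
}
```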
Gotchas? Updating the index is also much quicker and easier. However, you have to roll your own for creating the index, updating the index, and searching the index. SUPER EASY to do...but still...there are no wizards or one-handed coding here! Also, the index lives on the file system. If the file is open and being searched and you try to open it again for another search, you will obviously have some issues...so some form of infrastructure around opening and reading these indexes needs to be built.
How does this help in SQL Server? You can easily wrap your Lucene search in a CLR function or proc which can be installed in the database, and then use it as though it were native in your T-SQL queries.

Advice Please: SQL Server Identity vs Unique Identifier keys when using Entity Framework

I'm in the process of designing a fairly complex system. One of our primary concerns is supporting SQL Server peer-to-peer replication. The idea is to support several geographically separated nodes.
A secondary concern has been using a modern ORM in the middle tier. Our first choice has always been Entity Framework, mainly because the developers like working with it. (They love the LINQ support.)
So here's the problem:
With peer-to-peer replication in mind, I settled on using uniqueidentifier with a default value of newsequentialid() for the primary key of every table. This seemed to provide a good balance between avoiding key collisions and reducing index fragmentation.
However, it turns out that the current version of Entity Framework has a very strange limitation: if an entity's key column is a uniqueidentifier (GUID) then it cannot be configured to use the default value (newsequentialid()) provided by the database. The application layer must generate the GUID and populate the key value.
So here's the debate:
1) abandon Entity Framework and use another ORM:
use NHibernate and give up LINQ support
use LINQ to SQL and give up future support (not to mention being tied to SQL Server as the database)
2) abandon GUIDs and go with another PK strategy
3) devise a method to generate sequential GUIDs (COMBs?) at the application layer
I'm leaning towards option 1 with LINQ to SQL (my developers really like linq2[stuff]) and option 3. That's mainly because I'm somewhat ignorant of alternate key strategies that would support the replication scheme we're aiming for while also keeping things sane from a developer's perspective.
Any insight or opinion would be greatly appreciated.
I second Craig's suggestion - option 4.
You can always use the GUID column, populated by the middle-tier, as your PRIMARY KEY (that's a LOGICAL construct).
To avoid massive index (thus: table) fragmentation, use some other key (ideally an INT IDENTITY column) as the CLUSTERING KEY - that's a physical database construct, which CAN be separated from the primary key.
By default the primary key is the clustering key - but it doesn't have to be that way. In fact, I improved performance and drastically lowered fragmentation by doing just that on a database I "inherited": add an INT IDENTITY column and put the clustering key on that small, ever-increasing, never-changing INT - works like a charm!
Marc
Huh? I think your three options are a false choice. Consider option 4:
4) Use the Entity Framework with non-sequential, client-generated GUIDs.
The EF can't see DB-server-generated GUIDs for new rows inserted by the framework itself, sure, but you don't need to generate the GUIDs on the DB server. You can generate them on the client when you create your entity instances. The whole point of a GUID is it doesn't matter where you generate it. As for GUIDs generated by a replicated DB, the EF will see them just fine.
Your client-side GUIDs won't be sequential (use Guid.NewGuid()), but they will be guaranteed globally unique.
We do this in shipping, production software with replication. It does work.
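To illustrate, a minimal sketch of a client-generated key (the entity is hypothetical): assign the GUID in the constructor, so EF never needs a database-generated value and replicated rows keep their own keys:

```csharp
using System;

public class Customer
{
    public Customer()
    {
        // Guid.NewGuid() is not sequential, but it is unique everywhere,
        // which is all peer-to-peer replication actually requires.
        Id = Guid.NewGuid();
    }

    public Guid Id { get; set; }
    public string Name { get; set; }
}
```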
Another option (not available when this was posted) is to upgrade to EF 4, which supports server-generated GUIDs.
Why not use an identity column? If you are doing merge replication you can have each system start at a separate seed and work in one direction (e.g. node A starts at 1 and adds one, node B starts at 0 and subtracts one)...
You can use stored procedures if you are really stuck on using NewSequentialID(). You can bind the result columns from the procedure to the appropriate property and once inserted the SQL-generated GUID will be fed back into the object.
Unfortunately you have to define SPs for all three operations (insert, update, delete) even though the other operations would complete properly using the defaults. You also need to maintain the SP code and ensure it is synchronized with your EF model as you make changes, which may make this option unattractive on account of the additional overhead.
There is a step-by-step example at http://blogs.msdn.com/bags/archive/2009/03/12/entity-framework-modeling-action-stored-procedures.aspx which is pretty straightforward.
Use newsequentialid() with your own ORM (it's not that hard) together with LINQ.
