Full-Text Index Design Considerations (SQL Server 2008) - sql-server

My website has a requirement that the user can search a number of different tables and columns. So I'm working to implement this using full-text search.
I'd like to get some input from someone with more FTS experience on the following issues.
While FTS allows you to search multiple columns from the same table in a single search, I'm not seeing an option to search multiple columns from multiple tables in a single search. Is this in fact the case?
If I need multiple searches to search across multiple tables, does it make sense to put the index for each table in a different full-text catalog? The wizards seem to recommend a new catalog for larger tables, but I have no idea what "large" means in this case.
Finally, is there any way to order the results such that matches in one column of a table come before matches in another column?

1. While FTS allows you to search multiple columns from the same table in a single search, I'm not seeing an option to search multiple columns from multiple tables in a single search. Is this in fact the case?
An FTIndex on a single table cannot include columns from another table. So typically, you'd just have to write your query so that it makes multiple searches (you alluded to this in #2).
Another option would be to create an indexed view (see requirements) that spans multiple tables and then build an FTIndex on top of the view. I believe this is possible, but you should test to be certain.
2. If I need multiple searches to search across multiple tables, does it make sense to put the index for each table in a different full-text catalog? The wizards seem to recommend a new catalog for larger tables, but I have no idea what "large" means in this case.
It shouldn't make a difference in SQL Server 2008, since the catalog is just a logical grouping. You might, however, consider putting the FTIndexes on different filegroups if you have a disk subsystem where that makes sense (similar considerations to partitioning tables across filegroups on different disks, to spread the IO).
3. Finally, is there any way to order the results such that matches in one column of a table come before matches in another column?
I don't believe this is possible...
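Even if a single full-text predicate cannot do this, one workaround that may be worth testing is to search each column separately and merge the results with an explicit priority value. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with LIKE as a stand-in for a full-text predicate, and the articles table, its columns, and the function names are hypothetical. In SQL Server, each branch would presumably use CONTAINS or CONTAINSTABLE restricted to one column instead.

```python
import sqlite3

def make_demo_db():
    # Hypothetical table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [
            (1, "Backup strategies", "How to index large tables"),
            (2, "Index design", "Choosing keys"),
            (3, "Index tuning", "Rebuilding an index"),
        ],
    )
    return conn

def search_with_column_priority(conn, term):
    """Return (id, title, matched_column) rows, title matches first."""
    pattern = f"%{term}%"
    sql = """
        SELECT id, title, 'title' AS matched_column, 1 AS priority
        FROM articles
        WHERE title LIKE ?
        UNION ALL
        -- Body-only matches; rows already matched on title are excluded.
        SELECT id, title, 'body', 2
        FROM articles
        WHERE body LIKE ? AND title NOT LIKE ?
        ORDER BY priority, id
    """
    rows = conn.execute(sql, (pattern, pattern, pattern))
    return [row[:3] for row in rows]
```

Calling search_with_column_priority(make_demo_db(), "index") lists the two title matches before the row that matches only in its body.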

Related

SQL Server: Filtered Indexes versus Indexed Views

What are the relative merits of each? Both seem to limit the number of rows and columns through which your query needs to trawl, so what determines the basis for choosing one over the other?
An indexed view
can include columns based on an expression.
can include joins of multiple tables.
can be referenced directly in user SQL statements.
allows all deterministic expressions.
has complicated prerequisites, but is simple and consistent to use (select * from [indexedview]).
A filtered index
is limited to the columns contained within the table.
only allows simple expressions for the filter.
is simple to implement, but the optimizer will determine if usage is appropriate when the base table is queried.
Neither of them can use non-deterministic expressions.
Indexed view:
i) I need to get results from more than one table.
ii) I create an index on this view to boost performance.
Filtered index:
i) There are a lot of records in a single table.
ii) A particular WHERE condition with a specific value matches a lot of records, and this condition is used very frequently, or it is used in a very important query where performance is of utmost importance.
In this case we can create a filtered index on the table.
For an example, see my answer to "MS SQL Server 2008 - UPDATE a large database".
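As a rough illustration of the filtered-index case, here is a minimal sketch using Python's built-in sqlite3 module, whose partial indexes (CREATE INDEX ... WHERE) are a close analogue of SQL Server filtered indexes. SQLite has no indexed views, so only the filtered side is shown. The orders table, its columns, and the function names are assumptions made up for this example.

```python
import sqlite3

def build_orders_db():
    # Hypothetical table: 1000 orders, of which every 50th is still open.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
    )
    rows = [
        (i, i % 10, "open" if i % 50 == 0 else "closed")
        for i in range(1, 1001)
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # Partial index: only rows with status = 'open' are indexed.
    conn.execute(
        "CREATE INDEX idx_open_orders ON orders (customer_id) "
        "WHERE status = 'open'"
    )
    return conn

def open_orders_for(conn, customer_id):
    # The literal status = 'open' matches the index filter, so the
    # optimizer is free to use idx_open_orders if it judges it useful.
    sql = (
        "SELECT id FROM orders "
        "WHERE status = 'open' AND customer_id = ? ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (customer_id,))]

def query_plan(conn):
    # Returns the optimizer's plan text; wording varies by SQLite version.
    sql = (
        "EXPLAIN QUERY PLAN SELECT id FROM orders "
        "WHERE status = 'open' AND customer_id = 0"
    )
    return [row[-1] for row in conn.execute(sql)]
```

As with a filtered index in SQL Server, the query is written against the base table and it is up to the optimizer to decide whether the index applies; query_plan can be used to check what it chose.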

Oracle - Make one table with many columns or split in many tables

What is the best way to model a database? I have many known channels with values. Is it better to create one table with many columns, one for each channel, or to create two tables, one for values and one for channels? Like this:
Table RAW_VALUES: SERIE_ID, CHANNEL_1, ..., CHANNEL_1000
or
Table RAW_VALUES: SERIE_ID, CHANNEL_ID, VALUE
Table CHANNELS: CHANNEL_ID, NAME, UNIT, ....
My question is about which design gives better performance when searching the data and which saves more database space.
Thanks.
Usually, one would want to know what type of queries you will run against the tables as well as the data distribution etc to choose between two designs. However, I think that there are more fundamental issues here to guide you.
The second alternative is certainly more flexible. Adding one more channel ("Channel_1001") can be done simply by inserting rows in the two tables (a simple DML operation), whereas if you use the first option, you need to add a column to the table (a DDL operation), and that will not be usable by any programs using this table unless you modify them.
That type of flexibility alone is probably a good reason to go with the second option.
Searching will also be better served with the second option. You may create one index on the raw_values table and support indexed searches on the Channel/Value columns. (I would avoid the name "value" for a column by the way.)
Now if you consider what column(s) to index under the first option, you will probably be stumped: you have 1001 columns there. If you want to support indexed searches on the values, would you index them all? Even if you were dealing with just 10 channels, you would still need to index those 10 columns under your first option; not a good idea in general to load a table with more than a few indexes.
As an aside, if I am not mistaken, the limit is 1000 columns per table these days, but a table with more than 255 columns will store a row in multiple row pieces, each storing 255 columns and that would create a lot of avoidable I/O for each select you issue against this table.
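To make the second option concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle, so the DDL details will differ on a real Oracle system. The table and column names follow the question, except that the value column is called reading (an assumed name, chosen to avoid "value"); the index name and helper functions are also hypothetical.

```python
import sqlite3

def build_channel_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE channels (
            channel_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            unit       TEXT
        );
        CREATE TABLE raw_values (
            serie_id   INTEGER NOT NULL,
            channel_id INTEGER NOT NULL REFERENCES channels(channel_id),
            reading    REAL,
            PRIMARY KEY (serie_id, channel_id)
        );
        -- One index supports searches by channel and value.
        CREATE INDEX idx_raw_values_channel_reading
            ON raw_values (channel_id, reading);
    """)
    return conn

def add_channel(conn, channel_id, name, unit):
    # Adding a channel is plain DML: no ALTER TABLE is needed.
    conn.execute(
        "INSERT INTO channels VALUES (?, ?, ?)", (channel_id, name, unit)
    )

def record(conn, serie_id, channel_id, reading):
    conn.execute(
        "INSERT INTO raw_values VALUES (?, ?, ?)",
        (serie_id, channel_id, reading),
    )

def series_above(conn, channel_name, threshold):
    """Return the serie_ids whose reading on a channel exceeds threshold."""
    sql = """
        SELECT rv.serie_id
        FROM raw_values rv
        JOIN channels c ON c.channel_id = rv.channel_id
        WHERE c.name = ? AND rv.reading > ?
        ORDER BY rv.serie_id
    """
    return [row[0] for row in conn.execute(sql, (channel_name, threshold))]
```

With this layout, introducing a channel numbered 1001 is just another call to add_channel followed by record calls, and the same series_above query works for it without any schema change.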

Is a full text search on one table faster than two tables?

In the full text search page http://msdn.microsoft.com/en-us/library/ms189760.aspx on MSDN it says that if you want to do a full text search on multiple tables just "use a joined table in your FROM clause to search on a result set that is the product of two or more tables."
My question is, isn't this going to be really slow if you have to merge two very large tables?
If I'm merging a product table with a category table and there are millions of records, won't the join take a long time and then have to search after the join?
Joins on millions of records can still be fast if the join is optimized for performance, for example, a single int column that is indexed in both tables. But there can be other factors at play so the best approach is to try it and gauge the performance yourself.
If the join doesn't perform well, you have a couple of options:
Create a view of the tables joined together (in SQL Server it has to be an indexed view), create a full text index on that view, and run your full text queries against that view.
Create a 3rd table which is a combination of the 2 tables you are joining, create a full text index on it, and run your full text queries against that table. You'll need something like an ETL process to keep it updated.
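The combined-table option can be sketched roughly as follows. This is an illustration only, using Python's built-in sqlite3 module with LIKE standing in for a full-text query; the products, categories, and product_search tables and all function names are assumptions, and the refresh step is a naive full rebuild rather than a real ETL process.

```python
import sqlite3

def build_catalog_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE categories (
            category_id INTEGER PRIMARY KEY,
            name        TEXT
        );
        CREATE TABLE products (
            product_id  INTEGER PRIMARY KEY,
            category_id INTEGER REFERENCES categories(category_id),
            name        TEXT
        );
        -- Indexed integer join column, as suggested above.
        CREATE INDEX idx_products_category ON products (category_id);
        -- The combined "3rd table" that searches run against.
        CREATE TABLE product_search (
            product_id  INTEGER PRIMARY KEY,
            search_text TEXT
        );
    """)
    conn.executemany(
        "INSERT INTO categories VALUES (?, ?)",
        [(1, "Garden"), (2, "Kitchen")],
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [(10, 1, "Hose"), (11, 2, "Garden-style apron"), (12, 2, "Kettle")],
    )
    return conn

def refresh_product_search(conn):
    # Naive refresh: rebuild the combined table from the join.
    conn.execute("DELETE FROM product_search")
    conn.execute("""
        INSERT INTO product_search (product_id, search_text)
        SELECT p.product_id, p.name || ' ' || c.name
        FROM products p
        JOIN categories c ON c.category_id = p.category_id
    """)

def search_products(conn, term):
    # Searches hit one table only; no join is needed at query time.
    sql = (
        "SELECT product_id FROM product_search "
        "WHERE search_text LIKE ? ORDER BY product_id"
    )
    return [row[0] for row in conn.execute(sql, (f"%{term}%",))]
```

The trade-off is the one noted above: the join cost moves from query time to refresh time, so the combined table is only as fresh as the last refresh_product_search run.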

Sorting rows in DB (by id or another field?)

What is the best way and why?
sorting rows by id
sorting rows by another field (create time for example)
Upd 1. Of course, I can add an index for another field
Upd 2. Best for speed & usability
Do you mean sorting or indexing? Regardless of which database technology you are using, you can typically apply indexes on any column or on different combinations of columns. This allows the database engine to optimize query execution and make better execution plans. In a way, indexing is "sorting", for your database engine.
Actual sorting (as in ORDER BY MyColumn ASC|DESC) is really only relevant in the context of querying the database. How you decide to sort your query results would typically depend on how you intend to use your data.
I assume that you are using a relational database, such as MySQL or PostgreSQL, and not one of the non-SQL databases.
This means that you are interacting with the database using the SQL language, and the way you get data from the database is to use the SQL SELECT statement. In general, your tables will have a "key" attribute, which means that each row has a unique value for that attribute, and usually the database will store the data pre-sorted by that key (often in a B-tree data structure).
So, when you do a query, e.g. "SELECT firstname,lastname FROM employees;", where "employees" is a table with a few attributes, such as "employee_id", "firstname", "lastname", "home_address", and so forth, the data will often be delivered in order of employee_id value, although SQL itself guarantees no particular order unless the query has an ORDER BY clause.
To get the data sorted in a different order, you would use the SQL "ORDER BY" clause in the SELECT statement. For example, if you wanted the data to be sorted by "lastname", then you might use something like "SELECT firstname,lastname FROM employees ORDER BY lastname;".
One way that the database can implement this query is to retrieve all the data and then sort it before passing it on to the user.
In addition, it is possible to create indexes for the table which allows the database to find rows with particular values, or value ranges, for attributes or sets of attributes. If you have added a "WHERE" clause to the SELECT query which (dramatically) reduces the number of matching rows, then the database may use the index to speed up the query processing by first filtering the rows and then (if necessary) sorting them. Note that the whole topic of query optimization for databases is complex and takes into account a wide range of factors to try and estimate which of the possible query implementation alternatives will result in the fastest implementation.
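The queries described above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a convenient stand-in for MySQL or PostgreSQL; the sample rows, the index name, and the function names are made up for illustration.

```python
import sqlite3

def build_employees_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "employee_id INTEGER PRIMARY KEY, firstname TEXT, "
        "lastname TEXT, home_address TEXT)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?)",
        [
            (1, "Ada", "Young", "1 Elm St"),
            (2, "Bob", "Archer", "2 Oak St"),
            (3, "Cy", "Miller", "3 Ash St"),
        ],
    )
    # Secondary index the engine may use for filtering or sorting.
    conn.execute(
        "CREATE INDEX idx_employees_lastname ON employees (lastname)"
    )
    return conn

def names_sorted_by_lastname(conn):
    # ORDER BY is the only way to guarantee the order of the result.
    sql = "SELECT firstname, lastname FROM employees ORDER BY lastname"
    return conn.execute(sql).fetchall()

def names_in_lastname_range(conn, low, high):
    # The WHERE clause narrows the rows first; the result is then
    # returned in lastname order.
    sql = (
        "SELECT firstname, lastname FROM employees "
        "WHERE lastname >= ? AND lastname < ? ORDER BY lastname"
    )
    return conn.execute(sql, (low, high)).fetchall()
```

Whether the engine actually uses the index for either query is its own decision; the code only guarantees the order of the results, not the execution strategy.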

Table clusters in SQLServer

In Oracle, a table cluster is a group of tables that share common columns and store related data in the same blocks. When tables are clustered, a single data block can contain rows from multiple tables. For example, a block can store rows from both the employees and departments tables rather than from only a single table:
http://download.oracle.com/docs/cd/E11882_01/server.112/e10713/tablecls.htm#i25478
Can this be done in SQLServer?
On the one hand, this sounds very much like views. Data is stored in the table, and the views provide access to only those columns within the table specified by the view's definition. (Thus, your "common columns".)
On the other hand, this sounds like how the database engine stores data on the hard drive. In SQL Server, this is done via 8 KB pages. Assuming two completely separate table definitions, there is no way to store data from two such distinct tables in the same page. (If an Oracle block is more along the lines of OS files, then that turns into SQL Server files and filegroups, at which point the answer is "yes"... but I suspect this is not what blocks are about.)
Not based on what I am reading here. In SQL Server, each table's pages are independent of other tables' pages.
On the other hand, each table can have a clustered index of your choice, which can influence performance greatly. In addition, I believe partitions will influence the execution plan, and if both tables have similar partition functions, this might boost performance, but the normal objective of partitioning is not performance.
Typically, optimization of JOINs involves index strategies (in my experience, preferably with covering non-clustered indexes).

Resources