Full Text Search Auto-Partition Schemes and Functions

We have some full-text searches running on our SQL Server 2012 Developer (Enterprise) database. We noticed that partition schemes and functions are being (periodically) added to the DB. I can only assume that the partitions are for FTS, as they have the following form:
Scheme:
CREATE PARTITION SCHEME [ifts_comp_fragment_data_space_46093FC3] AS PARTITION [ifts_comp_fragment_partition_function_46093FC3] TO ([FTS], [FTS], [FTS])
Function:
CREATE PARTITION FUNCTION [ifts_comp_fragment_partition_function_46093FC3](varbinary(128)) AS RANGE LEFT FOR VALUES (0x00330061007A00660073003200360036, 0x0067006F00730066006F00720064)
The problem is that our production servers are running SQL Server 2012 Standard which does not support partitions. Thus it adds an extra admin burden on our schema compares (using SSDT) to exclude these partitions every time. When one does (inevitably) creep in it is a pain to remove. We have done some extensive research and have not been able to come up with any answer as to why this is even happening. Any ideas?

Yes, those are internal to the fulltext search functionality. You have no control over them.
However, I would consider it a bug that they show up in your schema compares. You'll never create, alter, or drop them yourselves, and they are completely maintained by SQL Server, so I would file a bug report on http://connect.microsoft.com
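Until the compare tooling ignores these objects, one stopgap is to filter them out of a generated script before it reaches production. The following is only a minimal sketch: the function name `strip_fts_partition_objects` is hypothetical, the `ifts_comp_fragment_` prefix is taken from the names in the question (whether it covers every internal object is an assumption), the sample statements are shortened versions of the ones in the question, and it assumes each statement sits on a single line.

```python
import re

# Matches CREATE PARTITION SCHEME/FUNCTION statements whose object name starts
# with the prefix seen in the question. Assumption: one statement per line.
IFTS_STATEMENT = re.compile(
    r"^\s*CREATE\s+PARTITION\s+(?:SCHEME|FUNCTION)\s+\[?ifts_comp_fragment_",
    re.IGNORECASE,
)


def strip_fts_partition_objects(script: str) -> str:
    """Return the script without the auto-generated full-text partition objects."""
    kept = [line for line in script.splitlines() if not IFTS_STATEMENT.match(line)]
    return "\n".join(kept)


# Shortened, illustrative sample (not a real deployment script).
sample = "\n".join([
    "CREATE TABLE [dbo].[Docs] ([Id] int NOT NULL);",
    "CREATE PARTITION FUNCTION [ifts_comp_fragment_partition_function_46093FC3]"
    "(varbinary(128)) AS RANGE LEFT FOR VALUES (0x0067006F00730066006F00720064)",
    "CREATE PARTITION SCHEME [ifts_comp_fragment_data_space_46093FC3] AS PARTITION "
    "[ifts_comp_fragment_partition_function_46093FC3] TO ([FTS], [FTS])",
])

cleaned = strip_fts_partition_objects(sample)
print(cleaned)
```

A text filter like this does not replace excluding the objects in the compare itself; it only guards against one slipping into a script.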

Related

Visual Studio Integration Services becomes unresponsive

I am developing ETL solutions in Visual Studio and as soon as I select a view from a SQL Server database, the Visual Studio freezes, and clicking anywhere results in the following notification: "Visual Studio is Busy".
It is very frustrating and I cannot finish creating my solution.
Any advice for making it faster and more responsive?

I. What happens when selecting a view as an OLE DB Source?
I created a SQL Server Profiler trace to track all T-SQL commands executed against the AdventureWorks2017 database while I was selecting the [HumanResources].[vEmployee] view as an OLE DB Source.
The trace shows that the following command is executed twice:
set rowcount 1
select * from [HumanResources].[vEmployee]
This means that the OLE DB source limits the result set of the query to a single row and executes the SELECT * command against the selected view in order to extract the required metadata.
It is worth mentioning that the SET ROWCOUNT 1 causes SQL Server to stop processing the query after the specified number of rows are returned. This means that only one row is requested and not all the view's data.
II. Issue's possible reasons
The issue you mentioned mostly happens due to the following reasons:
(1) Third-party extensions installed in Visual Studio
In that case, try starting Visual Studio in safe mode to prevent third-party extensions from loading. You can use the following command:
devenv.exe /safemode
(2) View querying a large amount of data
Visual Studio may freeze if the view returns a huge amount of data or contains bad JOINS. You may solve this using a simple workaround. Alter the view's SQL and add a condition that only returns a few rows (For example SELECT TOP 1). Then, use this view while designing the package. Once done, remove the added condition.
(3) Bad database design
Moreover, it is highly important that your views are well designed and that the underlying tables have the appropriate indexes. Besides, check that you don't have any issues related to the database design. For example:
(a) Index fragmentation
Index fragmentation is reported as a percentage and can be retrieved from a SQL Server DMV (sys.dm_db_index_physical_stats). You can refer to the following article for more information:
How to identify and resolve SQL Server Index Fragmentation
(b) Large binary column
Make sure that the view does not include large binary columns since it highly affects the query execution.
Best Practices for tables with VARBINARY(MAX)
How Your SQL Server Data Type Choices Can Affect Database Performance
(4) Hardware issues
Although I do not think this is the cause in your case, check the available resources on your machine. For example:
(a) Drive out of storage
If using Windows, check the free space on the C: drive (the default system databases directory) and on the drive where the databases are stored, and make sure they are not full.
(b) Server is out of memory
Make sure that your machine is not running out of memory. You can simply use the Task Manager to identify the amount of available memory.
(5) Optimizing Visual Studio performance
The last thing to mention is that there are several recommendations to improve the performance of Visual Studio. Feel free to check them:
Optimize Visual Studio performance

This can sometimes happen when you try to validate a SELECT statement against a huge table. Depending on the RDBMS, some data sources do not do a good job of returning just the metadata during validation and instead run SELECT * against the table, so validation can take what seems like forever.
To check whether this is actually happening, look at the queries running on the RDBMS when you load the package.
Otherwise, try copying the package, switching to the XML view, and rebuilding it piece by piece until you find the issue. Remove the problem from your XML file, save, and redraw in the designer.

Partial database shortcut in SSDT

I'm assuming this is not possible but asking just in case. I have two database projects in my Visual Studio 2013 solution and Database Y mostly just has shortcuts to tables in Database X. Everything worked great until I added partitioning to the definition of Table A in Database X. Since I deploy Database X to SQL Server 2012 Enterprise and deploy Database Y to SQL Server 2012 Standard, and partitioning is not allowed in Standard, the latter deployment fails.
Is there a way to tell the database project for Database Y to ignore the partitioning stuff? Any other ideas on how to keep the tables in sync without using a shortcut?
UPDATE: Here is the error.
Creating [PartitionByReportFileID]... (75,1): SQL72014: .Net SqlClient
Data Provider: Msg 7736, Level 16, State 1, Line 2 Partition function
can only be created in Enterprise edition of SQL Server. Only
Enterprise edition of SQL Server supports partitioning. (74,0):
SQL72045: Script execution error. The executed script: CREATE
PARTITION FUNCTION PartitionByReportFileID
AS RANGE RIGHT
FOR VALUES (90000000, 120000000, 140000000, 160000000, 180000000, 200000000, 220000000, 240000000, 260000000, 280000000, 300000000,
320000000, 340000000, 360000000, 380000000, 400000000, 420000000,
440000000, 460000000, 480000000, 500000000, 520000000, 540000000,
560000000, 580000000, 600000000);
An error occurred while the batch was being executed.

UPD: Well, it's possible, though you probably won't like the approach.
Indeed, SSDT includes partition-related stuff into the deployment script no matter how hard one tries not to allow it. The other way is to create an empty database and then perform a schema compare between the project (source) and that database (target). In the schema compare settings, make sure 2 checks are set on the General tab:
Ignore object placement on partition schemes
Ignore partition schemes
This works in SSDT 2012, verified. From this point on, you can either run an update directly, or generate script and then use it for deployment, with only minor modifications (such as adding the database (re)creation part from the standard deployment script, if needed).
The only drawback with this approach is that previously partitioned tables appear on default filegroup, which is usually PRIMARY unless you change it. That, and post-deployment script functionality isn't included, afaik (assuming you have one, of course).

DB technology for efficient search in tabular data?

We have a repository of tables. Around 200 tables, each table can be thousands of rows, all tables are originally in Excel sheets.
Each table has a different scheme. All data is text or numbers.
We would like to create an application that allows free text search on all tables (we define which columns will be searched in each table) efficiently - speed is important.
The main dilemma is which DB technology we should choose.
We created a mock up by importing all tables to MS SQL Server, and creating a full text index over them. The search is done using the CONTAINS keyword. This solution works well for a small number of tables, but it doesn't scale.
We thought about a NoSQL solution, but we don't yet have any experience in it.
Our limitations (which unfortunately I cannot affect): Windows servers only. But we can install whatever we want on them.
Thank you.

Check out ElasticSearch! It's a search server based on Apache Lucene and has a clean REST API that uses JSON. Although it's usually used as a search index alongside a primary database, it can also be used stand-alone. So you may want to write a backup routine for a few of your tables/data and try it out.
http://www.elasticsearch.org/
http://en.wikipedia.org/wiki/ElasticSearch
Comparison of ElasticSearch and Apache Solr (another Lucene-based search server):
https://docs.google.com/present/view?id=dc6zhtt5_1frfxwfff&pli=1

sql server - full-text search

So let's say I have two databases, one for production purposes and another one for development purposes.
When we copied the development database, the full-text catalog did not get copied properly, so we decided to create the catalog ourselves. We matched all the tables and indexes and created the database, and the search feature seems to be working okay too (but it has not been entirely tested yet).
However, the former catalog had a lot more files in its folder than the one we manually created. Is that fine? I thought they would have the exact same number of files (though the size may vary).

First...when using full text search I would suggest that you don't manually try to create what the wizard does for you. I have to wonder about missing more than just some data. Why not just recreate the indexes?
Second...I suggest that you don't use freetext feature of sql server unless you have no other choice. I used to be a big believer in freetext but was shown an example of creating a Lucene(.net) index and searching it in comparison to creating an index in SQL Server and searching it. Creating a SQL Server index in comparison to creating a Lucene index is considerably slower and hard to maintain. Searching a SQL Server index is considerably less accurate (poor results) in comparison to Lucene. Lucene is like having your own personal Google for searching data.
How? Index your data (only the data you need to search) in Lucene and include the Primary Key of the data that you are indexing for use later. Then search the index using your language and the Lucene(.net) API (many articles written on this topic). In your search results make sure you return the PK. Once you have identified the records you are interested in you can then go get the rest of the data and/or any related data based on the PK that was returned.
Gotchas? Updating the index is also much quicker and easier. However, you have to roll your own for creating the index, updating the index, and searching the index. SUPER EASY to do...but still...there are no wizards or one handed coding here! Also, the index is on the file system. If the file is open and being searched and you try to open it again for another search you will obviously have some issues...so writing some form of infrastructure around opening and reading these indexes needs to be built.
How does this help in SQL Server? You can easily wrap your Lucene search in a CLR function or proc installed in the database, which you can then use as though it were native to your T-SQL queries.

Updating client SQL Server database structure from text file

We have a "master database structure", and need a routine to keep the database structure on client sites up-to-date.
A number of suggestions have been given to a related question, but I am looking for a more specific solution, along these lines:
I would like to generate a text file (XML or other readable format) which describes the entire database structure (this could go into version control). This routine will run in-house, to provide a database schema file to be distributed with the next version of our product.
Then I need a way to update the database structure on the client site so that it corresponds to the master database structure. (In other words, I don't want to have to keep track of numerous change scripts for different versions of the database structure, but a more general routine which can get the client database structure updated to the current master database structure.)
So the main feature I'm looking for could be described as "database structure to text" and "text to database structure".

There are a whole lot of diff tools that can give you the schema, stored procedure, and constraint differences between two databases. You could roll your own, but if you have a complex schema I think that would be more expensive than one of these tools. Many offer a free trial, so you can test them.
The problem is that the master database would have to be online and accessible from the client database installation (or installed there), which might or might not be feasible.
If you won't do that, the only other sane option I can think of is to use the migration idea: keep a list of SQL script + version pairs, plus the current version in each database. This could be consolidated by a separate tool that generates a single script from the client's database version number and the list of changes. And if you don't have the list of changes, you can start with a diff tool run and keep track of them from there.
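As a rough illustration of the consolidation tool described above, here is a minimal sketch. The function name `pending_scripts`, the integer version numbers, and the sample SQL statements are all hypothetical and chosen only for illustration; a real tool would also need to handle batching and transactions.

```python
from typing import List, Tuple


def pending_scripts(current_version: int, scripts: List[Tuple[int, str]]) -> str:
    """Concatenate, in version order, every script newer than the client's version."""
    newer = sorted(
        ((version, sql) for version, sql in scripts if version > current_version),
        key=lambda pair: pair[0],
    )
    return "\n".join(sql for _, sql in newer)


# Hypothetical change list: (version, script) pairs kept with the product.
changes = [
    (3, "ALTER TABLE Orders ADD Notes nvarchar(200) NULL;"),
    (1, "CREATE TABLE Orders (Id int NOT NULL);"),
    (2, "ALTER TABLE Orders ADD Total money NULL;"),
]

# A client currently at version 1 needs scripts 2 and 3, in that order.
upgrade = pending_scripts(1, changes)
print(upgrade)
```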
The text-comparison route you seem to prefer (comparing SQL text dumps of both schemas) looks very hard to get right automatically, and to me it doesn't look like the right path to take.

Several popular strategies are variants of this:
Add a table to the database:
CREATE TABLE Release
(release_number int not null,
applied datetime not null
)
Each release, as part of its release script inserts a row into this table.
You can now find out with a single query which release each client is running, and run all the releases between that one and the release they want to be running.
In addition, you could check that their schema is correct for each version (correct table names, columns, etc.) by doing something like this:
SELECT so.name,
sc.name
FROM sysobjects so
JOIN syscolumns sc ON sc.id = so.id -- match each column to its own table
WHERE so.type = 'U' -- user tables only
ORDER BY 1, 2
then calculate a hash of the result and compare it with a pre-computed hash (generated by running the query on your reference installation) to see if the installation is now correct.
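The hash comparison could look something like the sketch below. It assumes the (table, column) rows from the query have already been fetched into Python; the function name `schema_fingerprint`, the choice of SHA-256, and the sample rows are assumptions made for illustration, since the answer does not name a hash function.

```python
import hashlib
from typing import Iterable, Tuple


def schema_fingerprint(rows: Iterable[Tuple[str, str]]) -> str:
    """Hash (table, column) pairs in a stable order so equal schemas match."""
    digest = hashlib.sha256()
    for table, column in sorted(rows):
        # Separators keep ("ab", "c") distinct from ("a", "bc").
        digest.update(f"{table}\x1f{column}\n".encode("utf-8"))
    return digest.hexdigest()


# Hypothetical query results from the reference and two client installations.
reference = [("Orders", "Id"), ("Orders", "Total"), ("Release", "applied")]
client_ok = [("Release", "applied"), ("Orders", "Total"), ("Orders", "Id")]
client_bad = [("Orders", "Id"), ("Release", "applied")]

print(schema_fingerprint(client_ok) == schema_fingerprint(reference))
print(schema_fingerprint(client_bad) == schema_fingerprint(reference))
```

Sorting before hashing means the comparison does not depend on the order in which the rows were returned. A check on names alone will not catch differences in data types or constraints unless those columns are added to the query.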
