I'm creating a news portal site that stores many news items. Every news item has HTML data. I'm using SQL Server 2005. I have two choices:
Save the news data to an ntext field.
Save the news data to an HTML file and save the file name to an nvarchar field.
Which way gives the best performance and the quickest search? If I choose the second way, then to search the news I have to loop over every file and search each one.
Which is best?
Is there another way?
EDIT
My news count may grow to more than 100,000. Right now the count is 1,000, but the SQL Server database size is 60 MB.
Use nvarchar(max), not ntext, for storage. Use full-text search for searching. Use FILESTREAM storage (available from SQL Server 2008 onward) if the content consists of documents that have to be accessed through the Win32 API.
Querying varbinary(max) and xml Columns (Full-Text Search)
Best Practices for Integrated Full Text Search
SQL Server 2005 Full-Text Queries on Large Catalogs: Lessons Learned
Using FILESTREAM with Other SQL Server Features
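To illustrate why an index beats opening and scanning every HTML file on each search, here is a minimal Python sketch of the general idea behind a full-text index. It is not how SQL Server implements full-text search; the names `TagStripper`, `html_to_text`, `build_index`, and `search`, and the sample news items, are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text content of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()
    return " ".join(stripper.parts)


def build_index(news):
    """Map each lowercase word to the set of news ids that contain it."""
    index = defaultdict(set)
    for news_id, html in news.items():
        for word in re.findall(r"\w+", html_to_text(html).lower()):
            index[word].add(news_id)
    return index


def search(index, word):
    """Look the word up once instead of scanning every news item."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample data standing in for rows of the news table.
news = {
    1: "<p>Election results <b>announced</b> today</p>",
    2: "<p>Local team wins the <i>final</i></p>",
    3: "<div>Final election debate tonight</div>",
}
index = build_index(news)
print(search(index, "election"))
print(search(index, "final"))
```

The index is built once when items are stored, so each search is a lookup rather than a pass over all 100,000 items, and tag names such as `b` never end up as search hits.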
Azure's documentation suggests that we should leverage blobs to be able to index documents like MS Word, PDF, etc. We have an Azure SQL Server database of thousands of documents stored in a table's nvarchar(MAX) field. The content of each database record is plain English text. In fact, the application converted the PDF / MS Word files into plain text and stored that in the database.
My question is: would it be possible to index the "documents" stored in the database in the same way as Azure would index blobs? I know how to create an SQL Azure indexer, but I'd like to make sure that the underlying search performs the same against documents stored in a database table as it does against blobs.
Thanks in advance!
This is not currently possible - document extraction can only be done on blobs stored in Azure storage.
I'm new to Elasticsearch and I have a basic question.
I want to load data from a database and search it using Elasticsearch in an MVC.NET project, but because of the data I have in my database table, I can't convert all of it to JSON and search it with Elasticsearch. How should I fill Elasticsearch with data from the database in an MVC.NET project? I don't want the whole solution, because that is impossible, just a general and brief explanation. Thank you very much.
First of all, you need to be able to model your data for the move from SQL to ElasticSearch, since ElasticSearch is a NoSQL, document-oriented database/search engine.
You need an indexer to index SQL data to ElasticSearch.
Get all the columns associated with one record that you want to search in ElasticSearch from your SQL database (use joins if data is in multiple tables).
Use a dedicated stored procedure to get only the needed data, construct a document class, serialize it to JSON, and index it in your ElasticSearch cluster.
Use the ElasticSearch.net client, as it exposes the bulk index APIs very neatly.
Hope this will get you started. Have fun
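As a rough sketch of the steps above (rows in, JSON documents out, one bulk request), the following Python builds the newline-delimited body that the Elasticsearch bulk API accepts: an action line followed by the document source for each record. It is written in Python rather than C# purely for illustration and does not use the ElasticSearch.net client; the `build_bulk_body` function, the `news` index name, and the sample rows are assumptions.

```python
import json


def build_bulk_body(rows, index_name):
    """Turn database rows (dicts with an "id" key) into a bulk request body."""
    lines = []
    for row in rows:
        doc = dict(row)          # copy so the caller's row is not modified
        doc_id = doc.pop("id")   # the primary key becomes the document id
        action = {"index": {"_index": index_name, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk body is newline-delimited and must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical rows, e.g. the result set of the stored procedure.
rows = [
    {"id": 1, "title": "First post", "author": "Ann"},
    {"id": 2, "title": "Second post", "author": "Bob"},
]
body = build_bulk_body(rows, "news")
print(body)
```

In a real project the client library would build and send this body for you; the sketch only shows what "serialize to JSON and bulk index" amounts to.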
I have a column in a database that is encrypted using a symmetric key. The encrypted content is just text. I would like to query this text using full-text search. Is that possible?
I was thinking about using full-text search filters to index the column, but I didn't find any ready-to-use filter.
So, is it possible to develop such a filter? In particular, is it possible to access the encryption key, which is stored in the database, from the filter code and decrypt the text from the column?
Could you recommend a tutorial on how to get started with such development?
From what I understand, there is no support for encrypted indexes. You basically have two options:
You can index partial data in the clear (without encryption) and match the partial data to the fully encrypted data.
Decrypt the data before searching.
Although this post was for SQL Server 2005, it remains true for SQL Server 2008.
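The first option can be sketched as follows, under heavy assumptions: a side table of keywords is kept in the clear, a search hits only that table, and only the matching records are decrypted. The `toy_cipher` function is a placeholder XOR transform and is NOT real encryption; it, the `KEY` value, and the sample records are hypothetical and stand in for the database's symmetric-key functions.

```python
def toy_cipher(data, key):
    """Placeholder XOR transform; NOT real encryption, illustration only."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


KEY = b"demo-key"  # hypothetical key, stands in for the database's symmetric key

# Encrypted column: record id -> ciphertext.
encrypted = {
    1: toy_cipher(b"Patient reports mild headache", KEY),
    2: toy_cipher(b"Invoice overdue since March", KEY),
}

# Side table kept in the clear: keyword -> record ids.
clear_keywords = {
    "headache": {1},
    "invoice": {2},
}


def search(keyword):
    """Look up the clear keyword, then decrypt only the matching records."""
    ids = sorted(clear_keywords.get(keyword.lower(), set()))
    # XOR is its own inverse, so the same placeholder function "decrypts".
    return [(i, toy_cipher(encrypted[i], KEY).decode()) for i in ids]


print(search("invoice"))
```

The trade-off is that whatever is stored in the clear keyword table is no longer protected, so only terms that are acceptable to expose should go there.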
I have a large XML document in an XML column within SQL Server. I basically need to perform a free text search across the elements in the document.
Would you use
A) SQL Free Text Search
B) A stored procedure that traverses the XML and checks each value of each element
C) Use Lucene.NET to build an Index on the fly and search the index?
Users understand this will be slow to some degree. If the stored procedure wasn't a monster to write, I'd lean toward that because it's the least to maintain and decreases overall complexity.
The book "Pro SQL Server 2008 XML" has a section on Full-Text indexing of XML data that may be of interest to you. It mentions that when XML data is indexed, a special "XML Word Breaker" is used to separate text content from the markup. Essentially, this means that only the content is indexed, not the markup. Full-text indexes also support stemming and thesaurus matching.
Just noticed that you are using SQL Server 2005, so you'll have to check if this functionality is supported. I suspect that it is.
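For comparison, the "traverse the XML and check each value" approach from option B, and the idea of searching content rather than markup, can be sketched in a few lines of Python. This is only an illustration of the concept, not the XML Word Breaker itself; the `find_elements_containing` function and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_elements_containing(xml_text, term):
    """Return (tag, text) pairs for elements whose own text contains term."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    hits = []
    for element in root.iter():
        # Only the element's text is checked, never tag or attribute names.
        text = (element.text or "").strip()
        if term in text.lower():
            hits.append((element.tag, text))
    return hits


# Hypothetical document standing in for the value of the XML column.
document = """
<order status="shipped">
  <customer>Alice Smith</customer>
  <note>Deliver to the shipping dock</note>
  <item sku="shipped-123">Blue widget</item>
</order>
"""
print(find_elements_containing(document, "ship"))
print(find_elements_containing(document, "order"))
```

Here "ship" matches only the `note` element even though it also appears in attribute values, and "order" matches nothing because it occurs only as a tag name. The sketch ignores text that follows a child element (`element.tail`) and scans the whole document on every search, which is the cost a full-text index avoids.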
What database field type should I use to store web page files (HTML, PDF, text) in the same field? nvarchar(max)?
Use VARBINARY(MAX). NVARCHAR is only for Unicode content, which won't handle PDFs well at all.
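A small Python sketch of the underlying problem: arbitrary binary content is not guaranteed to be valid text in a Unicode encoding, so it may not survive being treated as a string. This is not a model of how SQL Server stores or validates data; the `survives_as_text` helper is hypothetical and the sample bytes are made up, not a real PDF.

```python
def survives_as_text(data, encoding="utf-16-le"):
    """Return True if the bytes decode as text and re-encode unchanged."""
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        return False


# Genuine text encoded as UTF-16 round-trips without loss.
html_bytes = "<p>Hello, world</p>".encode("utf-16-le")

# Made-up binary payload: the bytes 00 D8 form an unpaired UTF-16 surrogate.
binary_bytes = b"%PDF-1.4" + bytes([0x00, 0xD8, 0x41, 0x00])

print(survives_as_text(html_bytes))
print(survives_as_text(binary_bytes))
```

A binary column has no such constraint, since it stores the bytes exactly as given.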
You'd want to use VARBINARY(MAX) or NVARCHAR(MAX), depending on the type of data being stored and what you want to do with it. If you're going to store files (which it sounds like, especially with mixed extensions), use VARBINARY(MAX). You can full-text index off that data type too -- although PDFs require an additional iFilter (at least they did with our SQL Server 2005 instance -- it may be there by default in 2008).
Keep in mind that you don't want to use IMAGE, as that data type (along with TEXT and NTEXT) is deprecated and is being removed in a future version of SQL Server. Here's the link about that.
Hope this helps.
VARBINARY(MAX). If you're using SQL Server 2008, then FILESTREAM is also an option that should be considered. According to Microsoft's guidelines, consider FILESTREAM when:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
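Of the three guidelines above, only the size criterion is easy to check mechanically. A minimal Python sketch of that check, assuming you can sample the sizes of the objects you plan to store (the `suggest_storage` function and the sample sizes are hypothetical):

```python
ONE_MB = 1024 * 1024


def suggest_storage(sizes_in_bytes):
    """Suggest a storage option from the average object size alone."""
    if not sizes_in_bytes:
        raise ValueError("need at least one object size")
    average = sum(sizes_in_bytes) / len(sizes_in_bytes)
    # Only the "larger than 1 MB on average" guideline is encoded here.
    return "FILESTREAM" if average > ONE_MB else "VARBINARY(MAX)"


# Hypothetical samples: small web pages versus mostly multi-megabyte files.
print(suggest_storage([40_000, 120_000, 75_000]))
print(suggest_storage([3 * ONE_MB, 5 * ONE_MB, 900_000]))
```

The other two guidelines (read-access needs and application architecture) remain judgment calls and should be weighed alongside the result.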