Select query with more than 150 different conditions in WHERE clause - sql-server

I have a table in my SQL Server database with more than 400,000 rows, and I want to select the full names that start with any of several names stored in a .txt file (roughly 150 names). How should the query inside my C# command look? I could write it in the following way, but it would be very long and might cause delays or bugs:
select *
from tableName
where fullName like '%Jack%'
or fullName like '%Wathson%'
--.... and so on

First, SQL Server can handle very long queries. I have created queries of at least 150k characters and they worked without problems; the actual limit is considerably larger than that.
Second, you are correct that a long chain of LIKE conditions is going to take a long time. There is overhead to LIKE.
Third, your patterns do not match your description. If you want names that start with a particular pattern, then remove the wildcard from the beginning of the pattern. This has the added benefit that SQL Server can use a regular index on fullName for the match.
Finally, if you are really looking at initial strings, then you might want to consider a full-text index (here is one place to start). These are usually more efficient than LIKE.
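As a rough sketch of how the suggestions above could be combined (the temp-table name, the column length, and the loading step are assumptions, not part of the question): load the ~150 names from the .txt file into a temp table from the C# side, then join with a prefix-only pattern so SQL Server has a chance to use an index on fullName.
CREATE TABLE #SearchNames (name varchar(100) NOT NULL);  -- length is an assumption

-- ... populate #SearchNames from the .txt file here (e.g. BULK INSERT or
--     parameterized INSERTs issued from the C# code) ...

SELECT t.*
FROM tableName AS t
JOIN #SearchNames AS s
  ON t.fullName LIKE s.name + '%';  -- prefix match only, per the advice above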

Related

How to avoid SQL Server error on ORDER BY with duplicate columns

Although this question references PHP, it is not actually PHP-specific, so I have not flagged it as such.
We have a PHP framework which supports multiple DB back-ends.
There is a generic function in our data object class, which allows you to get records from the underlying table, with a specified criteria and sort order.
It looks something like this:
function GetAll($Criteria, $OrderBy = "") {
    ...
    // Add primary key (column 1) to end of order by list,
    // so that returned order is predictable.
    if ($OrderBy != "") {
        $OrderBy .= ", ";
    }
    $OrderBy .= "1";
    ...
    // Build and run query, returning the result as an array.
}
If you specify an $OrderBy argument of StaffID on a Staff object, the resulting SQL looks something like the following:
SELECT * FROM adminStaff ORDER BY StaffID, 1;
This works fine on a MySQL back-end, and from my searching of the web it should also be fine on most other DB back-ends. However, when using SQL Server, we get the following error message:
A column has been specified more than once in the order by list.
Columns in the order by list must be unique.
This arises because SQL Server disallows the same column appearing multiple times in the ORDER BY clause. In this case StaffID is column 1 and therefore we have multiple instances of the same column.
Is there a way to disable this check in SQL Server? MySQL provides a lot of options to enable/disable strictness checks and incompatible features - does SQL Server provide anything of that nature that would allow the above query to run without errors?
If not, do you have any suggestions for how we could resolve this in our data-object layer? Bear in mind we need to maintain compatibility with existing projects which expect this behaviour, so it is not sufficient to only include the first column when $OrderBy is blank.
The situation is also slightly complicated by the fact that the field list is customisable elsewhere in the data object configuration, so we can't rely on * being used as the field list - it could contain pretty much anything that is valid in a normal SQL field list. However, if that is asking too much, a solution to the simpler case (as outlined above) would be a good start!
In SQL Server you can sort either by column name or by the ordinal position of the column in the SELECT list.
In your case the column StaffID is at ordinal position 1, so SQL Server refuses to sort the same result set on the same column twice.
If you remove the 1 from your query, the problem will be solved.
Avoid using the ordinal position of a column for sorting.
The basic question - is it possible to suppress this SQL Server restriction on ORDER BY column duplication - was answered by Venu: No it is not.
There are various suggestions (mostly from me) about how you could possibly code around this limitation in a generic manner. For any future readers, those answers are probably the most helpful if you are adapting an existing system. (If you are starting from scratch, just try and avoid this situation altogether.)
However, the actual solution that I came to was to add versioning to our internal API for our DBAL. The API version is now 2 but you can call setApiVersion(1) to instruct the back-end to use the old version of the API instead.
v2 is identical to v1* except it no longer automatically adds column 1 to the ORDER BY unless it is completely blank. Therefore, the SQL Server issue is resolved for new (v2) projects, whilst existing projects can be set to use the v1 API and therefore continue to work correctly (but without SQL Server compatibility); the generated SQL for each version is illustrated below.
(* Actually, I've taken this opportunity to make some other breaking changes in v2, but that is not relevant to this answer.)
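Roughly, using the adminStaff example from the question, the difference between the two API versions comes down to what SQL is generated for $OrderBy = "StaffID":
-- v1 behaviour: the primary-key ordinal is always appended.
SELECT * FROM adminStaff ORDER BY StaffID, 1;

-- v2 behaviour: nothing is appended unless $OrderBy is blank.
SELECT * FROM adminStaff ORDER BY StaffID;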
I've come up with a couple of potential solutions at the framework level. All of them have performance implications which would need to be profiled, and in practice that may rule some or all of them out. However, in theory at least, these are ways that a generic solution could be implemented.
Omit the ORDER BY altogether, and do the sorting in code. Would involve parsing the provided ORDER BY string. Would be problematic if ORDER BY contained expressions, but I can't remember ever seeing that in our projects, so can probably be ignored. Probably the slowest solution.
Perform the query without the ORDER BY, limiting the result set to a single row. Use the resulting column list to work out whether column 1 is already in the ORDER BY clause, and therefore whether to add it. Then run the full query. Would require parsing the provided ORDER BY string. Query caching may mean this won't add as much overhead as it appears.
Parse the field list to get the first column name and see if this appears in the ORDER BY clause. If field list contains * or table.* would require a schema lookup. May be too difficult if we need to deal with table aliases and wildcards in combination.
Parse ORDER BY string and see if it contains any primary key. If so it is already uniquely ordered and doesn't require the addition of an extra field. Would require a schema look-up.
Use a sub-select to give us a new instance of the column that we can sort on instead. Not sure whether SQL Server would still complain that this is the 'same' column, though.
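One possible reading of that last sub-select idea, sketched under the assumption that the framework would append the aliased duplicate rather than the ordinal (StaffID_Sort is a made-up alias); as noted above, it is unclear whether SQL Server would still reject this as the 'same' column:
SELECT x.*
FROM (
    SELECT s.*, s.StaffID AS StaffID_Sort  -- second instance of the key column
    FROM adminStaff AS s
) AS x
ORDER BY StaffID, StaffID_Sort;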
Could you just append '--' to your OrderBy parameter when working with SQL Server and just explicitly define the Order By fields where necessary?
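Presumably the intent of the '--' trick is that the framework's automatically appended ", 1" ends up inside a line comment, so SQL Server never sees the duplicate column. With $OrderBy set to "StaffID --" the generated query would look roughly like this:
SELECT * FROM adminStaff ORDER BY StaffID --, 1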

How can I stop MiniProfiler showing "duplicate" SQL query warnings when parameters are different?

I'm using MiniProfiler to check what NPoco is doing with SQL Server but I've noticed it reports duplicate queries even when the SQL parameters have different values.
Eg: if I fetch a string from a database by ID, I might call:
SELECT * FROM PageContent WHERE ID=#ID
...twice on the same page, with two different IDs, but MiniProfiler reports this as a duplicate query even though the results will obviously be different each time.
Is there any way I can get MiniProfiler to consider the SQL parameter values so it doesn't think these queries are duplicated? I'm not sure if this problem is part of MiniProfiler or if it's a problem in how NPoco reports its actions to MiniProfiler, so I'll tag both.
I think that this is by design, and is in fact one of the reasons for the existence of the duplicate query detection.
If you are running that query twice on one page where the only difference is the param value, then you could also run it one time and include both param values in that query.
SELECT * FROM PageContent WHERE ID in (#ID1, #ID2)
So you are doing with two queries what you could do with one (you would, of course, have to filter on the server side, but that is still faster than two queries).
The duplicate query label is not for saying that you are running the absolute identical query more than once (though it would apply there as well). Rather it is highlighting an opportunity for optimizing your query approach and consolidating different queries into one (think about what an N+1 situation would look like).
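For illustration (with made-up literal IDs), this is the N+1 shape the warning is designed to surface, versus the consolidated form:
-- N+1: one round trip per ID
SELECT * FROM PageContent WHERE ID = 1;
SELECT * FROM PageContent WHERE ID = 2;
SELECT * FROM PageContent WHERE ID = 3;

-- consolidated: a single round trip
SELECT * FROM PageContent WHERE ID IN (1, 2, 3);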
If the default functionality doesn't meet your needs, you can always change it! The functionality that calculates duplicateTimings is located in UI/includes.js. You can provide your own version of this file that defines duplicates in a different way (perhaps by looking at parameter values in addition to the command text when detecting duplicates) by turning on CustomUITemplates inside MiniProfiler, and putting your own version of includes.js in there.

SQL server search

I'm going to perform a search in my SQL Server DB (ASP.NET, VS2010, C#): the user types a phrase and I should search for this phrase in several fields. How is this possible? Do we have functions such as CONTAINS() in SQL Server? Can I perform my search using normal queries, or should I build my queries using C# functions?
For instance, I have 3 fields in my table which can contain the user's search phrase. Is it OK to write the following SQL command? (For instance, the user's search phrase is GAME.)
select * from myTable where columnA='GAME' or columnB='GAME' or columnC='GAME'
I have used AND between different conditions, but can I use OR? And how can I search inside my table fields? If one of my fields contains the phrase GAME, how can I find it? columnA='GAME' finds only those fields that are exactly 'GAME', is that right?
I'm a bit confused about my search approach; please help me. Thanks, guys.
OR works fine if you want at least one of the conditions to be true.
If you want to search inside your text strings you can use LIKE
select * from myTable where columnA like '%GAME%' or columnB like '%GAME%' or columnC like '%GAME%'
Note that % is the wildcard.
If you want to find everything that begins with 'GAME' you type LIKE 'GAME%'; if you allow 'GAME' to be in the middle, you need % at both ends.
You can use LIKE instead of equals and then it can contain wildcard characters, so your example could be:
select * from myTable where columnA LIKE '%GAME%' or columnB LIKE '%GAME%' or columnC LIKE '%GAME%'
Further information may be found in MSDN
This is going to do some pretty heavy lifting in terms of what the database has to do, though - I would suggest you consider something like full text search, as I think it would be better suited to your scenario and provide faster results (of course, if you never have many records to search, LIKE would probably do fine). Information on this is also in MSDN.
Don't use LIKE as the other answers suggest: a LIKE with a leading wildcard can't use an index, and will therefore be slow to return and expensive to run. Instead, you have two options:
Option 1: Full-Text Indexes
do we have functions such as CONTAINS() in SQL server?
Yes! You can use the CONTAINS() function in SQL Server. You just have to set up a full-text index that covers the columns you need to search on.
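A minimal sketch of what that setup and query could look like, using the table and column names from the question (the catalog name and the PK_myTable key index name are assumptions; CREATE FULLTEXT INDEX requires a unique key index on the table):
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;

CREATE FULLTEXT INDEX ON myTable (columnA, columnB, columnC)
    KEY INDEX PK_myTable;  -- assumed unique index name

SELECT * FROM myTable
WHERE CONTAINS((columnA, columnB, columnC), 'GAME');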
Option 2: Lucene.Net
Lucene.Net is a popular .NET full-text search library that can be used alongside SQL Server. You can use it to make implementing your search a little easier.

How do I get “select for xml” to output to several files?

Our customer is complaining that our export file is too long; they would like us to split the export into many files with no more than “n” records per file. Is there a way of doing this with “select for xml”?
At present we are using SQL Server 2005 for this project.
(If this is too hard, I can always post process the single large file to split it up)
I don't think there's anything simple'n'easy you can do here.
My approach would probably be to limit the number of rows returned by each SELECT statement (by partitioning the data by some criterion, e.g. by date or location or something), and then put those smaller XML streams into files one by one. Doable, but not very elegant or sophisticated...
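A hedged sketch of that chunking idea on SQL Server 2005 (Orders and OrderID are placeholders for your real table and key; the calling code would loop over @Chunk = 0, 1, 2, ... and write each result to its own file):
DECLARE @ChunkSize int, @Chunk int;
SET @ChunkSize = 1000;  -- "n" records per file
SET @Chunk = 0;         -- incremented by the caller for each file

WITH Numbered AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY OrderID) AS rn
    FROM Orders
)
SELECT *
FROM Numbered
WHERE rn >  @Chunk * @ChunkSize
  AND rn <= (@Chunk + 1) * @ChunkSize
FOR XML AUTO;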

SQL Server; index on TEXT column

I have a database table with several columns; most of them are VARCHAR(x) columns, and some of these columns have an index on them so that I can search quickly for data inside them.
However, one of the columns is a TEXT column, because it contains a very large amount of data (23 kb of plain ASCII text, etc.). I want to be able to search in that column (... WHERE col1 LIKE '%search string%' ...), but currently the query takes forever. I know the query is slow because of this column search, because when I remove that criterion from the WHERE clause the query completes (what I would consider) instantaneously.
I can't add an index on this column because that option is grayed out for that column in the index builder / wizard in SQL Server Management Studio.
What are my options here, to speed up the query search in that column?
Thanks for your time...
Update
Ok, so I looked into the full text search and did all that stuff, and now I would like to run queries. However, when using "contains", it only accepts one word; what if I need an exact phrase? ... WHERE CONTAINS (col1, 'search phrase') ... throws an error.
Sorry, I'm new to SQL Server
Update 2
Sorry, just figured it out; use multiple "contains" clauses instead of one clause with multiple words. Actually, this still doesn't get me what I want (the exact phrase); it only makes sure that all the words in the phrase are present.
Searching TEXT fields is always pretty slow. Give Full Text Search a try and see if that works better for you.
If your queries are like LIKE '%string%' (i.e. you search for a string anywhere inside a TEXT field), then you'll need a FULLTEXT index.
If you search for a substring at the beginning of the field (LIKE 'string%') and use SQL Server 2005 or higher, then you can convert your TEXT into a VARCHAR(MAX), create a computed column and index that column.
See this article in my blog for performance details:
Indexing VARCHAR(MAX)
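The linked post has the details, but as a rough illustration of the computed-column idea for prefix (LIKE 'string%') searches only - the 450-character prefix length and all object names here are assumptions, not necessarily what the article does:
-- assumes col1 has already been converted to VARCHAR(MAX) as described above
ALTER TABLE myTable
    ADD col1_prefix AS CAST(LEFT(col1, 450) AS varchar(450)) PERSISTED;

CREATE INDEX IX_myTable_col1_prefix ON myTable (col1_prefix);

-- Prefix searches can then target the indexed column:
SELECT * FROM myTable WHERE col1_prefix LIKE 'search string%';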
You should be looking at using Full Text Indexing on the column.
You can do complex boolean querying in FTS, like:
contains(yourcol,'"My first string" or "my second string" and "my third string"')
Depending on your query, CONTAINSTABLE or FREETEXTTABLE might give better results.
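A minimal sketch of the CONTAINSTABLE form (myTable, col1 and the ID key column are placeholders; CONTAINSTABLE returns KEY and RANK columns you can join and sort on):
SELECT t.*, ft.[RANK]
FROM myTable AS t
JOIN CONTAINSTABLE(myTable, col1, '"my exact phrase"') AS ft
    ON t.ID = ft.[KEY]  -- ID assumed to be the table's full-text key column
ORDER BY ft.[RANK] DESC;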
If you are connecting through .Net you might want to look at a Google-like full text search.
And since nobody has said it already (maybe because it's obvious): querying LIKE '%string%' bypasses your existing indexes - so it'll run slowly.
Hence why you need to use full text indexing (which is what Quassnoi said).
Correction - I was sure I had learnt this, and always believed it - but after some investigation, using a wildcard at the start seems OK? My old regex queries run better as LIKEs!
