Creating Azure SQL V12 Full Text Index is very slow - sql-server

I'm creating a full text index on 7 columns of a 30 million row table.
Five days in, this query:
SELECT count(*) FROM sys.dm_fts_index_keywords( 5, OBJECT_ID('[dbo].[FormattedAddress]'))
returns 500,000 rows, and the rate seems to be slowing.
This database is a standard S2.
Is there anything I can do to speed things up?

A few things you can do:
Scale up to a higher SKU, say P1, and let the index build complete.
Scale back down to S2.
Please also check the database's resource usage to see whether you are hitting any resource limits. You can't turn off signature verification, as those configuration options are not available in SQL Database.
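Both the scale-up/scale-down and the resource-usage check can be done in T-SQL; a minimal sketch, assuming a database named YourDb:
-- Scale up so the full-text population finishes faster, then scale back down:
ALTER DATABASE [YourDb] MODIFY (EDITION = 'Premium', SERVICE_OBJECTIVE = 'P1');
-- ...wait for the index population to finish, then:
ALTER DATABASE [YourDb] MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S2');
-- Check whether you are hitting resource limits (run in the user database):
SELECT TOP 10 end_time, avg_cpu_percent, avg_data_io_percent, avg_log_write_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;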

Let's try a few things. First, let the query run until it completes. Second, clear the buffer cache and then rerun the query. It might also be a problem that was reportedly fixed in SQL Server 2008. Check to make sure that signature verification is turned off. Also check the memory allocation for full-text search: the full-text service runs independently of the SQL Server service, so there's a chance it's starved for memory.
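For reference, sketches of the buffer-cache clear and the signature-verification check (these apply to an on-premises test server, not Azure SQL Database):
CHECKPOINT;             -- write dirty pages to disk first
DBCC DROPCLEANBUFFERS;  -- then empty the buffer cache (test servers only)
-- Turn off signature verification of full-text filter binaries:
EXEC sp_fulltext_service 'verify_signature', 0;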

Related

How to increase SQL Server select query performance?

I have a table with 102 columns and 43200 rows. The Id column is an identity column and 2 columns have a unique index.
When I just execute
Select *
from MyTable
it takes 8+ minutes over the network.
This table has a Status column that contains 1 or 0. If I select with WHERE Status = 1, I get 31565 rows and the select takes 6+ minutes. For your information, rows with status 1 are completed and will never change again, while status 0 rows are works in progress whose column values are still being changed by different users at different stages.
When I select with Status = 0, it takes 1.43 minutes and returns 11568 rows.
How can I increase performance for completed and WIP status query separately or cumulatively? Can I somehow use caching?
SQL Server takes care of caching, at least as long as there is enough free RAM. When it takes this long to get the data in the first place, you need to find the bottleneck.
RAM: Is there enough to hold the full table? And is SQL Server configured to use it?
Is there an upper limit on RAM usage? If not, SQL Server assumes unlimited RAM, which often ends up caching in the page file and causes massive slowdowns (see the sketch after this list).
You said "8+ minutes over the network". How long does it take when executed locally? Maybe the network is slow.
Hard drive: when the table is too big to be held in RAM it gets read from the hard drive, and HDDs are comparatively slow. Maybe defragmenting the indexes could help here, at least somewhat.
If none of that helps, SQL Profiler can show you where the bottleneck actually is.
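To check or cap the instance's memory as mentioned above, something along these lines is typical; the 8192 MB figure is only an example:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Cap how much RAM the instance may use so it doesn't push the OS into paging:
EXEC sp_configure 'max server memory (MB)', 8192;
RECONFIGURE;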
This is an interesting question, but it's a little open-ended; more info is needed. I totally agree with allmhuran's comment that maybe you shouldn't be using "select * ..." on a large table (it could in fact be posted as an answer; it deserves upvotes).
I suspect there may be design issues. Are you using BLOBs? Is the data at least partially normalized? See https://en.wikipedia.org/wiki/Database_normalization
I suggest creating a nonclustered index on the Status column. It improves queries whose WHERE clause uses this column.
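A minimal sketch, assuming the table is dbo.MyTable as in the question:
-- Nonclustered index to support WHERE Status = ... predicates:
CREATE NONCLUSTERED INDEX IX_MyTable_Status ON dbo.MyTable (Status);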

SQLSTATE[IMSSP]: Tried to bind parameter number 2101. SQL Server supports a maximum of 2100 parameters

I am trying to run this query
$claims = Claim::whereIn('policy_id', $userPolicyIds)->where('claim_settlement_status', 'Accepted')->whereBetween('intimation_date', [$startDate, $endDate])->get();
Here, $userPolicyIds can have thousands of policy ids. Is there any way I can increase the maximum number of parameters in SQL server? If not, could anyone help me find a way to solve this issue?
The whereIn method creates an SQL fragment of the form WHERE policy_id IN (userPolicyIds[0], userPolicyIds[1], userPolicyIds[2]..., userPolicyIds[MAX]). In other words, the entire collection is unrolled into the SQL statement. The result is a huge SQL statement that SQL Server refuses to execute.
This is a well known limitation of Microsoft SQL Server. And it is a hard limit, because there appears to be no option for changing it. But SQL Server can hardly be blamed for having this limit, because trying to execute a query with as many as 2000 parameters is an unhealthy situation that you should not have put yourself into in the first place.
So, even if there was a way to change the limit, it would still be advisable to leave the limit as it is, and restructure your code instead, so that this unhealthy situation does not arise.
You have at least a couple of options:
Break your query down to batches of, say, 2000 items each.
Put your ids into a temporary table and make your query join against that table (a sketch follows below).
Personally, I would go with the second option, since it will perform much better than anything else, and it is arbitrarily scalable.
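A minimal T-SQL sketch of the temporary-table approach; the column names are taken from the raw query shown further down, and the ids and dates are placeholders:
CREATE TABLE #UserPolicyIds (policy_id int PRIMARY KEY);
INSERT INTO #UserPolicyIds (policy_id) VALUES (123456), (654321), (456789);  -- bulk-insert the real ids here
DECLARE @startDate date = '2020-01-01', @endDate date = '2020-12-31';        -- example dates
SELECT c.*
FROM claims AS c
INNER JOIN #UserPolicyIds AS ids ON ids.policy_id = c.policy_id
WHERE c.claim_settlement_status = 'Accepted'
  AND c.intimation_date BETWEEN @startDate AND @endDate;
DROP TABLE #UserPolicyIds;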
I solved this problem by running this raw query
SELECT claims.*, policies.* FROM claims INNER JOIN policies ON claims.policy_id = policies.id
WHERE policy_id IN $userPolicyIds AND claim_settlement_status = 'Accepted' AND intimation_date BETWEEN '$startDate' AND '$endDate';
Here, $userPolicyIds is a string like ('123456','654321','456789'). This query is a bit slow, I'll admit that. But the number of policy ids is always going to be very large and I wanted a quick fix.
Just use the PDO::prepare driver options with
PDO::ATTR_EMULATE_PREPARES => true
https://learn.microsoft.com/en-us/sql/connect/php/pdo-prepare
and split the WHERE IN into pieces: WHERE (column IN (...) OR column IN (...)).

Deleting same number of records from SQL Server database takes either 0.2 sec or 30 sec

I am using SQL Server 2008 R2. I am deleting ~5000 records from a table. While testing performance with the same data I found that the deletion takes either 1 second or 31 seconds.
The test database is confidential, so I cannot share it here.
I have already tried to split the load and delete only 1000 records at a time, but I still see the deviation.
How should I continue my investigation? What could be the reason for the performance difference?
The query is simple, something like: delete from PART where INVOICE_ID = 64225
A DELETE statement uses a lot of system and transaction log resources, which is why it can take a long time. If possible, try to TRUNCATE the table instead.
When you ran the DELETE for the first time, perhaps the data wasn't in the buffer pool and had to be read from storage; the second time you run it, the pages will already be in memory. Other factors include whether the checkpoint process is writing your changed pages out to storage. Post your query plan:
set statistics xml on
select * from sys.objects
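-- sys.objects is just a placeholder; run the statement you are investigating (the DELETE) in its place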
set statistics xml off
Argh! After taking a closer look at the execution plan I noticed that one index was missing. After adding the index, all executions run fast. Note that I then removed the index to test whether the problem comes back, and the performance deviations reappeared.
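For reference, the exact index definition isn't given above; presumably it covered the column in the DELETE's WHERE clause, so a speculative sketch would be:
CREATE NONCLUSTERED INDEX IX_PART_INVOICE_ID ON dbo.PART (INVOICE_ID);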

How to improve performance in SQL Server table with image fields?

I'm having a very particular performance problem at work!
In the system we're using there's a table that holds information about the current workflow process. One of the fields holds a spreadsheet that contains metadata about the process (don't ask me why!! and NO I CAN'T CHANGE IT!!)
The problem is that this spreadsheet is stored in an IMAGE field in an SQL Server 2005 (within a database set with SQL 2000 compatibility).
This table currently has 22K+ rows, and even a simple query like this:
SELECT TOP 100 *
FROM OFFENDING_TABLE
takes 30 seconds to retrieve the data in Query Analyzer.
I'm thinking about updating the compatibility level to SQL 2005 (since I was informed that the app can handle it).
The second thing I'm thinking is to change the data-type of the column to varbinary(max) but I don't know if doing this will affect the application.
Another thing that I'm considering is to use sp_tableoption to set the large value types out of row to 1 as it's currently 0, but I have no information if doing this will improve performance.
Does anyone know how to improve performance in such scenario?
Edited to clarify
My problem is that I have no control over what the application asks of the SQL Server. I did some reflection on it (the app is a .NET 1.1 website) and it uses the offending field for some internal stuff that I have no insight into.
I need to improve the overall performance of this table.
I'd recommend you look into the offending table layout health:
select * from sys.dm_db_index_physical_stats(
db_id(), object_id('offending_table'), null, null, 'detailed');
Things to look for are avg_fragmentation_in_percent, page_count, avg_page_space_used_in_percent, record_count and ghost_record_count. Cues like high fragmentation, a high number of ghost records, or a low page-used percentage indicate problems, and things can be improved quite a bit just by rebuilding the index (i.e. the table) from scratch:
ALTER INDEX ALL ON offending_table REBUILD;
I'm saying this considering that you cannot change the table nor the app. If you were able to change the table and the app, the advice you already got is good advice (don't use SELECT *, don't select without a condition, use the newer varbinary(max) type, etc.).
I'd also look at the Page Life Expectancy performance counter to understand whether the system is memory starved. From your description of the symptoms the system looks I/O bound, which leads me to think there is little page caching going on; more RAM could help, as well as a faster I/O subsystem. On a SQL 2008 system I would also suggest turning page compression on, but on 2005 you can't.
And, just to be sure, make sure the queries are not blocked by contention from the app itself, i.e. that the query doesn't spend 90% of those 30 seconds waiting for a row lock. Look at sys.dm_exec_requests while the query is running and check the wait_time, wait_type and wait_resource. Is it PAGEIOLATCH_XX? Or is it a lock? Also, how does sys.dm_os_wait_stats look on your server; what are the top wait reasons?
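A quick sketch of both checks (the session_id value is just an example):
-- While the slow SELECT is running, see what it is waiting on:
SELECT session_id, status, wait_time, wait_type, wait_resource
FROM sys.dm_exec_requests
WHERE session_id = 53;
-- Server-wide picture of the top wait reasons:
SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;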
First of all - don't ever do a SELECT * in production code - reporting or not.
You have three basic choices:
move that blob field out into a separate table if it's not always needed; probably not practical since you mention you cannot change the schema
be more careful with your SELECT statements to select only those fields that you really need - and omit the blob field
see if you can limit your query to include a WHERE clause and find a way to optimize the query plan by e.g. adding a suitable index to the table (if you can)
There's no magic "make this faster" switch - but you can optimize your query or optimize your table layout, and both help. If you can't change anything - not the table layout, not the indexes, not the queries - you'll have a hard time optimizing anything, I'm afraid.
Just changing the field to VARBINARY(MAX) won't change anything at all - no performance improvement to be expected just from changing the data type.
A short answer is to only do SELECTs against multiple rows when the fields returned do not include the offending image field, i.e. no SELECT *. If you want the value of the image field, retrieve it on a case-by-case basis.
Setting the large value types out of row option should definitely help performance: the row size will be significantly smaller, so SQL Server can do a lot fewer physical reads to get through the table.
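A hedged sketch of both suggestions; the non-image column names are invented for illustration:
-- Select only the columns you need, leaving the image column out:
SELECT TOP 100 WorkflowId, StepName, AssignedTo
FROM OFFENDING_TABLE;
-- The out-of-row option only applies to (max) types, so it matters once the column is varbinary(max);
-- existing values are only moved off-row when they are next updated:
EXEC sp_tableoption 'dbo.OFFENDING_TABLE', 'large value types out of row', 1;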

Have you ever encountered a query that SQL Server could not execute because it referenced too many tables?

Have you ever seen any of these error messages?
-- SQL Server 2000
Could not allocate ancillary table for view or function resolution.
The maximum number of tables in a query (256) was exceeded.
-- SQL Server 2005
Too many table names in the query. The maximum allowable is 256.
If yes, what have you done?
Given up? Convinced the customer to simplify their demands? Denormalized the database?
@everyone wanting me to post the query:
I'm not sure if I can paste 70 kilobytes of code in the answer editing window.
Even if I could, it wouldn't help, since those 70 kilobytes of code reference 20 or 30 views that I would also have to post, because otherwise the code would be meaningless.
I don't want to sound like I am boasting here but the problem is not in the queries. The queries are optimal (or at least almost optimal). I have spent countless hours optimizing them, looking for every single column and every single table that can be removed. Imagine a report that has 200 or 300 columns that has to be filled with a single SELECT statement (because that's how it was designed a few years ago when it was still a small report).
For SQL Server 2005, I'd recommend using table variables and partially building the data as you go.
To do this, create a table variable that represents your final result set you want to send to the user.
Then find your primary table (say, the orders table in your example above) and pull that data, plus a bit of supplementary data that is only one join away (customer name, product name). You can do an INSERT ... SELECT to put this straight into your table variable.
From there, iterate through the table variable and, for each row, do a bunch of small SELECT queries that retrieve all the supplemental data you need for your result set, filling in each column as you go.
Once complete, you can then do a simple SELECT * from your table variable and return this result set to the user.
I don't have any hard numbers for this, but there have been three distinct instances that I have worked on to date where doing these smaller queries has actually worked faster than doing one massive select query with a bunch of joins.
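A hedged sketch of this approach, using invented Orders/Customers/Products/Shipments names since the real schema isn't shown:
-- Final result set to send to the user:
DECLARE @Report TABLE
(
    OrderId      int PRIMARY KEY,
    CustomerName nvarchar(100),
    ProductName  nvarchar(100),
    ShippedDate  datetime
);
-- 1) Pull the primary table plus data that is only one join away:
INSERT INTO @Report (OrderId, CustomerName, ProductName)
SELECT o.OrderId, c.Name, p.Name
FROM Orders AS o
JOIN Customers AS c ON c.CustomerId = o.CustomerId
JOIN Products  AS p ON p.ProductId  = o.ProductId;
-- 2) Fill the remaining columns with small follow-up queries:
UPDATE r
SET ShippedDate = s.ShippedDate
FROM @Report AS r
JOIN Shipments AS s ON s.OrderId = r.OrderId;
-- 3) Return the assembled rows:
SELECT * FROM @Report;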
@chopeen You could change the way you're calculating these statistics and instead keep a separate table of all per-product stats: when an order is placed, loop through the products and update the appropriate records in the stats table. This would shift a lot of the calculation load to the checkout page rather than running everything in one huge query when running a report. Of course there are some stats that aren't going to work as well this way, e.g. tracking customers' next purchases after purchasing a particular product.
This would happen all the time when writing Reporting Services Reports for Dynamics CRM installations running on SQL Server 2000. CRM has a nicely normalised data schema which results in a lot of joins. There's actually a hotfix around that will up the limit from 256 to a whopping 260: http://support.microsoft.com/kb/818406 (we always thought this a great joke on the part of the SQL Server team).
The solution, as Dillie-O alludes to, is to identify appropriate "sub-joins" (preferably ones that are used multiple times) and factor them out into temp-table variables that you then use in your main joins. It's a major PIA and often kills performance. I'm sorry for you.
@Kevin, love that tee -- says it all :-).
I have never come across this kind of situation, and to be honest the idea of referencing > 256 tables in a query fills me with a mortal dread.
Your first question should probably be "Why so many?", closely followed by "Which bits of information do I NOT need?" I'd be worried that the amount of data returned from such a query would begin to impact the performance of the application quite severely, too.
I'd like to see that query, but I imagine it's some problem with some sort of iterator, and while I can't think of any situation where it's possible, I bet it's from a bad while/case/cursor or a ton of poorly implemented views.
Post the query :D
Also, I feel like one of the possible problems could be having a ton (read: 200+) of name/value tables, which could be condensed into a single lookup table.
I had this same problem: my development box runs SQL Server 2008 (where the view worked fine), but on production (SQL Server 2005) the view didn't. I ended up creating additional views to avoid the limitation, using the new views as part of the query in the view that threw the error.
Kind of silly, considering the logical execution is the same...
Had the same issue in SQL Server 2005 (worked in 2008) when I wanted to create a view. I resolved the issue by creating a stored procedure instead of a view.
