INSERT after new id populated in base table - database

I have two pieces to this question, one will be bonus for me if you can answer.
Question #1
I have two SQLITE tables named tbl and tbl_search
Where tbl is a normal table and tbl_search is a FTS4 table
Let's say we are at run#0 of my app, the app runs and starts filling the tbl with data and then at the end, it copies everything from the tbl to tbl_search and job done but, this comes with a performance cost for incremental runs. For incremental runs(run#1), so far, i've been deleting and re-creating the fts table because during normal copy, it used to dump the run#0 items as well, creating duplicate data, plus this was costing a lot on the performance part.
After giving a thought i found 4 methods to tackle this situation:
1) Create a SQL Trigger for tbl to tbl_search copy.
2) Create a temp table in every run and copy its data to tbl_search to avoid duplication and then delete it
3) Use the unique ID's in the ID column of tbl and find the missing ones and replicate only those to tbl_search
4) create another table which will have last run details and use them to put everything after the --AFTER Date column. Let's say last run finished on 04-05-2018 9:40 PM, then replicate everything after that time from tbl to tbl_search.
I found option#3 to be most fruitful in terms of performance and would like to go for it, so how can i do that? Below is an example tbl for reference.
RUN#0
ID FILENAME LABEL_NUMBER
----------------------------------------------------------------------
1 C:/Test_Software/6.avi 11
2 C:/Test_Software/6.avi 10
3 C:/Test_Software/6.avi 8
4 C:/Test_Software/6.avi 6
5 C:/26.avi 10
6 C:/26.avi 8
RUN#1 (incremental)
ID FILENAME LABEL_NUMBER
----------------------------------------------------------------------
7 C:/Test_Software/36.avi 51
8 C:/Test_Software/556.avi 30
I would like to run a query like :
Select ID from tbl, If (ID not present in tbl_search) INSERT into tbl_search
Bonus Question: Out of all the methods that i shared, which one is the fastest and best method that i should choose? Please share WHY and How-To as well, I'll be grateful.

I'm going to answer this on my own as i did enough testing now :
Method #1 : SQL Triggers
Great method, if you don't have loops to take care of(nested loops)
Painful method if you have trillion loops going on before something can be executed into the tables hence, dis-qualified for me
Method #2 : Create a Temp Table
While this is a good method, it won't give you any performance bump, rather
there is going to be duplicating effect. Thus, not advisable.
Method #3 : Use ID
Best method because of the fact that ID's are created as unique key in the Table, so you can simply query the last id, store id somewhere and then compare the other table starting from the last_known id. Got the best performance benefit here.
Method #4 : Create another table - Aux Table
Naah, waste of efforts and resources. not recommended again.
Hope this helps fellow SO folks.
Thanks

Related

Many operations time out and block process in SQL Server

We have a lot of operations that time out in our site log.
After installing Redgate SQL Monitor on the server, we figured out we have many blocked processes and some times deadlock.
With Redgate we realized problem is for a stored procedure. It's a simple stored procedure and just increase view count of product (simple update)
ALTER PROCEDURE [dbo].[SP_IncreaseProductView]
#PID int
AS
BEGIN
UPDATE dbo.stProduct
SET ViewCount = ViewCount + 1
WHERE ID = #PID
END
When that stored procedure is off, everything is fine but some times block process error is back.
This table(product) hasn't any trigger. but its like heart of system and has 12000 records.
It has 3 indexes, 1 clustered and 2 non-clustered, and many statistics
We don't have any transaction. block process mostly happens in update query.
How can I figure out where the problem is?
Sorry for my bad languages
Thanks
Edit:
I think the problem isn't the SP, its about update on product table (Its my opinion). Its large table. I still get block process when SP is off but less.
We have a lot of select and update on this table.
Also i rewrite the increase view count with LINQ to SQL, but still get block process like when SP is on.
Edit 2:
I set profiler and get all query on product table.
530 select (most with join with another 2 table) and 25 update per minute (on product table only).
For now, [SP_IncreaseProductView] is off. Because when its on, we getting block process and operation timed out about every 10 second and web site stopped.
After that (set SP to off) block process still exist but roughly 50 per day.
I would go with Ingaz' second solution, but further optimizations or simplification can be performed:
1) store view count in a product 1:1 table. This is particularly useful when some queries do not need view count
2) view count redundancy
- keep view count in product table and read it from there
- also define view count in another table (just productid and viewcount columns)
- application can update asynchronously directly in the second table
- a job updates in the product table based on data from the second table
This ensures that locking is affecting product table much less than independent updates.
It is expected: I suppose that you have a lot of updates into single small (12K rows) table.
You can:
Work around your problem
Put ROWLOCK hint in your UPDATE
Change database option to READ_COMMITED_SNAPSHOT
Be warned though: it can create another problems.
More complex recipe to eliminate blocking completely
Not for faint of heart.
Create table dbo.stProduct_Increment.
Modify your [SP_IncreaseProductView] to INSERT into increment table.
Create periodic task that UPDATEs your dbo.stProduct and clear increment table.
Create view that combines stProduct and stProduct_Increment.
Modify all SELECT statements for stProduct to created view.

Copy data from one column to another in oracle table

My current project for a client requires me to work with Oracle databases (11g). Most of my previous database experience is with MSSQL Server, Access, and MySQL. I've recently run into an issue that seems incredibly strange to me and I was hoping someone could provide some clarity.
I was looking to do a statement like the following:
update MYTABLE set COLUMN_A = COLUMN_B;
MYTABLE has about 13 million rows.
The source column is indexed (COLUMN_B), but the destination column is not (COLUMN_A)
The primary key field is a GUID.
This seems to run for 4 hours but never seems to complete.
I spoke with a former developer that was more familiar with Oracle than I, and they told me you would normally create a procedure that breaks this down into chunks of data to be commited (roughly 1000 records or so). This procedure would iterate over the 13 million records and commit 1000 records, then commit the next 1000...normally breaking the data up based on the primary key.
This sounds somewhat silly to me coming from my experience with other database systems. I'm not joining another table, or linking to another database. I'm simply copying data from one column to another. I don't consider 13 million records to be large considering there are systems out there in the orders of billions of records. I can't imagine it takes a computer hours and hours (only to fail) at copying a simple column of data in a table that as a whole takes up less than 1 GB of storage.
In experimenting with alternative ways of accomplishing what I want, I tried the following:
create table MYTABLE_2 as (SELECT COLUMN_B, COLUMN_B as COLUMN_A from MYTABLE);
This took less than 2 minutes to accomplish the exact same end result (minus dropping the first table and renaming the new table).
Why does the UPDATE run for 4 hours and fail (which simply copies one column into another column), but the create table which copies the entire table takes less than 2 minutes?
And are there any best practices or common approaches used to do this sort of change? Thanks for your help!
It does seem strange to me. However, this comes to mind:
When you are updating the table, transaction logs must be created in case a rollback is needed. Creating a table, that isn't necessary.

SQL Query to copy tables between servers with auto insert / update depending on Identity column

Let's say I have 2 servers, and one identical table per server. In each tables, I have identity increment on (by 1 if u ask), and there is 'time' column to note when was the record updated/inserted.
so kinda like this:
ID Content Time
1 banana 2011-01-01 09:59:23.000
2 apple 2011-01-02 12:41:01.000
3 pear 2011-04-05 04:05:44.000
I want to copy (insert/update) all the contents from one table to another periodically with this requirements:
a. copy (insert/update) only before certain MONTH. i.e before August 2011. this is easy though.
b. insert only if records is really new (maybe if the ID isn't exist?)
c. update if you find the 'Time' column is newer (basically that means there is an update at that record) than the last performed copy (I save the date/time of last copy too)
I could do all that by building a program and check it record by record, but with hundred thousands records, it would be pretty slow.
Could it be done using just query?
Btw I'm using this query to copy between servers and I'm using SQL Server 2005
INSERT OPENQUERY(TESTSERVER, 'SELECT * FROM Table1')
SELECT * FROM Table1
thx for da help :)
My strategy in a situation like this is:
first do an outer join and determine the state of the data into a temp table
e.g.
#temp
id state
1 update
2 copy
3 copy
4 insert
Then run n statements joining the three tables together, one for each of the states. Sometimes you can eliminate the multiple statements by entering empty rows in the target with the correct keys. Then you only need to do a more complex update/copy.
However since these are cross server - I'd suggest the following strategy - copy the source table over entirely to the other server, and then do the above on the other server.
Only after this do I do performance analysis to optimise if needed.

What is a maintainable way to store large text fields without sacrificing performance?

I have been dancing around this issue for awhile but it keeps coming up. We have a system and our may of our tables start with a description that is originally stored as an NVARCHAR(150) and I then we get a ticket asking to expand the field size to 250, then 1000 etc, etc...
This cycle is repeated on ever "note" field and/or "description" field we add to most tables. Of course the concern for me is performance and breaking the 8k limit of the page. However, my other concern is making the system less maintainable by breaking these fields out of EVERY table in the system into a lazy loaded reference.
So here I am faced with these same to 2 options that have been staring me in the face. (others are welcome) please lend me your opinions.
Change all may notes and/or descriptions to NVARCHAR(MAX) and make sure we do exclude these fields in all listings. Basically never do a: SELECT * FROM [TableName] unless is it only retrieving one record.
Remove all notes and/or description fields and replace them with a forign key reference to a [Notes] table.
CREATE TABLE [dbo].[Notes] (
[NoteId] [int] NOT NULL,
[NoteText] [NVARCHAR](MAX)NOT NULL )
Obviously I would prefer use option 1 because it will change so much in our system if we go with 2. However if option 2 is really the only good way to proceed, then at least I can say these changes are necessary and I have done the homework.
UPDATE:
I ran several test on a sample database with 100,000 records in it. What I find is that the because of cluster index scans the IO required for option 1 is "roughly" twice that of option 2. If I select a large number of records (1000 or more) option 1 is twice as slow even if I do not include the large text field in the select. As I request less rows the lines blur more. I a web app where page sizes of 50 or so are the norm, so option 1 will work, but I will be converting all instances to option 2 in the (very) near future for scalability.
Option 2 is better for several reasons:
When querying your tables, the large
text fields fill up pages quickly,
forcing the database to scan more
pages to retrieve data. This is
especially taxing when you don't
actually need to return the text
data.
As you mentioned, it gives you
a clean break to change the data
type in one swoop. Microsoft has
deprecated TEXT in SQL Server 2008,
so you should stick with
VARCHAR/VARBINARY.
Separate filegroups. Having
all your text data in a slower,
cheaper storage location might be
something you decide to pursue in
the future. If not, no harm, no
foul.
While Option 1 is easier for now, Option 2 will give you more flexibility in the long-term. My suggestion would be to implement a simple proof-of-concept with the "notes" information separated from the main table and perform some of your queries on both examples. Compare the execution plans, client statistics and logical I/O reads (SET STATISTICS IO ON) for some of your queries against these tables.
A quick note to those suggesting the use of a TEXT/NTEXT from MSDN:
This feature will be removed in a
future version of Microsoft SQL
Server. Avoid using this feature in
new development work, and plan to
modify applications that currently use
this feature. Use varchar(max),
nvarchar(max) and varbinary(max) data
types instead. For more information,
see Using Large-Value Data Types.
I'd go with Option 2.
You can create a view that joins the two tables to make the transition easier on everyone, and then go through a clean-up process that removes the view and uses the single table wherever possible.
You want to use a TEXT field. TEXT fields aren't stored directly in the row; instead, it stores a pointer to the text data. This is transparent to queries, though - if you ask for a TEXT field, it will return the actual text, not the pointer.
Essentially, using a TEXT field is somewhat between your two solutions. It keeps your table rows much smaller than using a varchar, but you'll still want to avoid asking for them in your queries if possible.
The TEXT/NTEXT data type has practically unlimited length while taking up next to nothing in your record.
It comes with a few strings attached, like special behavior with string functions, but for a secondary "notes/description" type of field these may be less of a problem.
Just to expand on Option 2
You could:
Rename existing MyTable to MyTable_V2
Move the Notes column into a joined Notes table (with 1:1 joining ID)
Create a VIEW called MyTable that joins MyTable_V2 and Notes tables
Create an INSTEAD OF trigger on MyTable view which saves the Notes column into the Notes table (IF NULL then delete any existing Notes row, if NOT NULL then Insert if not found, otherwise Update). Perform appropriate action on MyTable_V2 table
Note: We've had trouble doing this where there is a Computed column in MyTable_V2 (I think that was the problem, either way we've hit snags when doing this with "unusual" tables)
All new Insert/Update/Delete code should be written to operate directly on MyTable_V2 and Notes tables
Optionally: Have the INSERT OF trigger on MyTable log the fact that it was called (it can do this minimally, UPDATE a pre-existing log table row with GetDate() only if existing row's date is > 24 hours old - so will only do an update once a day).
When you are no longer getting any log records you can drop the INSTEAD OF trigger on MyTable view and you are now fully MyTable_V2 compliant!
Huge amount of hassle to implement, as you surmised.
Alternatively trawl the code for all references to MyTable and change them to MyTable_V2, put a VIEW in place of MyTable for SELECT only, and not create the INSTEAD OF trigger.
My plan would be to fix all Insert/Update/Delete statements referencing the now deprecated MyTable. For me this would be made somewhat easier because we use unique names for all tables and columns in the database, and we use the same names in all application code, so making sure I had found all instances by a simple FIND would be high.
P.S. Option 2 is also preferable if you have any SELECT * lying around. We have had clients whos application performance has gone downhill fast when they added large Text/Blob columns to existing tables - because of "lazy" SELECT * statements. Hopefully that isn;t the case in your shop though!

TSQL "LIKE" or Regular Expressions?

I have a bunch (750K) of records in one table that I have to see they're in another table. The second table has millions of records, and the data is something like this:
Source table
9999-A1B-1234X, with the middle part potentially being longer than three digits
Target table
DescriptionPhrase9999-A1B-1234X(9 pages) - yes, the parens and the words are in the field.
Currently I'm running a .net app that loads the source records, then runs through and searches on a like (using a tsql function) to determine if there are any records. If yes, the source table is updated with a positive. If not, the record is left alone.
the app processes about 1000 records an hour. When I did this as a cursor sproc on sql server, I pretty much got the same speed.
Any ideas if regular expressions or any other methodology would make it go faster?
What about doing it all in the DB, rather than pulling records into your .Net app:
UPDATE source_table s SET some_field = true WHERE EXISTS
(
SELECT target_join_field FROM target_table t
WHERE t.target_join_field LIKE '%' + s.source_join_field + '%'
)
This will reduce the total number of queries from 750k update queries down to 1 update.
First I would redesign if at all possible. Better to add a column that contains the correct value and be able to join on it. If you still need the long one. you can use a trigger to extract the data into the column at the time it is inserted.
If you have data you can match on you need neither like '%somestuff%' which can't use indexes or a cursor both of which are performance killers. This should bea set-based task if you have designed properly. If the design is bad and can't be changed to a good design, I see no good way to get good performance using t-SQl and I would attempt the regular expression route. Not knowing how many different prharses and the structure of each, I cannot say if the regular expression route would be easy or even possible. But short of a redesign (which I strongly suggest you do), I don't see another possibility.
BTW if you are working with tables that large, I would resolve to never write another cursor. They are extremely bad for performance especially when you start taking about that size of record. Learn to think in sets not record by record processing.
One thing to be aware of with using a single update (mbeckish's answer) is that the transaction log (enabling a rollback if the query becomes cancelled) will be huge. This will drastically slow down your query. As such it is probably better to proces them in blocks of 1,000 rows or such like.
Also, the condition (b.field like '%' + a.field + '%') will need to check every single record in b (millions) for every record in a (750,000). That equates to more than 750 billion string comparisons. Not great.
The gut feel "index stuff" won't help here either. An index keeps things in order, so the first character(s) dictate the position in the index, not the ones you're interested in.
First Idea
For this reason I would actually consider creating another table, and parsing the long/messy value into something nicer. An example would be just to strip off any text from the last '(' onwards. (This assumes all the values follow that pattern) This would simplify the query condition to (b.field like '%' + a.field)
Still, an index wouldn't help here either though as the important characters are at the end. So, bizarrely, it could well be worth while storing the characters of both tables in reverse order. The index on you temporary table would then come in to use.
It may seem very wastefull to spent that much time, but in this case a small benefit would yield a greate reward. (A few hours work to halve the comparisons from 750billion to 375billion, for example. And if you can get the index in to play you could reduce this a thousand fold thanks to index being tree searches, not just ordered tables...)
Second Idea
Assuming you do copy the target table into a temp table, you may benefit extra from processing them in blocks of 1000 by also deleting the matching records from the target table. (This would only be worthwhile where you delete a meaningful amount from the target table. Such that after all 750,000 records have been checked, the target table is now [for example] half the size that it started at.)
EDIT:
Modified Second Idea
Put the whole target table in to a temp table.
Pre-process the values as much as possible to make the string comparison faster, or even bring indexes in to play.
Loop through each record from the source table one at a time. Use the following logic in your loop...
DELETE target WHERE field LIKE '%' + #source_field + '%'
IF (##row_count = 0)
[no matches]
ELSE
[matches]
The continuous deleting makes the query faster on each loop, and you're only using one query on the data (instead of one to find matches, and a second to delete the matches)
Try this --
update SourceTable
set ContainsBit = 1
from SourceTable t1
join (select TargetField from dbo.TargetTable t2) t2
on charindex(t1.SourceField, t2.TargetField) > 0
First thing is to make sure you have an index for that column on the searched table. Second is to do the LIKE without a % sign on the left side. Check the execution plan to see if you are not doing a table scan on every row.
As le dorfier correctly pointed out, there is little hope if you are using a UDF.
There are lots of ways to skin the cat - I would think that first it would be important to know if this is a one-time operation, or a regular task that needs to be completed regularly.
Not knowing all the details of you problem, if it was me, at this was a one-time (or infrequent operation, which it sounds like it is), I'd probably extract out just the pertinent fields from the two tables including the primary key from the source table and export them down to a local machine as text files. The files sizes will likely be significantly smaller than the full tables in your database.
I'd run it locally on a fast machine using a routine written in something like 'C'/C++ or another "lightweight" language that has raw processing power, and write out a table of primary keys that "match", which I would then load back into the sql server and use it as a basis of an update query (i.e. update source table where id in select id from temp table).
You might spend a few hours writing the routine, but it would run in a fraction of the time you are seeing in sql.
By the sounds of you sql, you may be trying to do 750,000 table scans against a multi-million records table.
Tell us more about the problem.
Holy smoke, what great responses!
system is on disconnected network, so I can't copy paste, but here's the retype
Current UDF:
Create function CountInTrim
(#caseno varchar255)
returns int
as
Begin
declare #reccount int
select #reccount = count(recId) from targettable where title like '%' + #caseNo +'%'
return #reccount
end
Basically, if there's a record count, then there's a match, and the .net app updates the record. The cursor based sproc had the same logic.
Also, this is a one time process, determining which entries in a legacy record/case management system migrated successfully into the new system, so I can't redesign anything. Of course, developers of either system are no longer available, and while I have some sql experience, I am by no means an expert.
I parsed the case numbers from the crazy way the old system had to make the source table, and that's the only thing in common with the new system, the case number format. I COULD attempt to parse out the case number in the new system, then run matches against the two sets, but with a possible set of data like:
DescriptionPhrase1999-A1C-12345(5 pages)
Phrase/Two2000-A1C2F-5432S(27 Pages)
DescPhraseThree2002-B2B-2345R(8 pages)
Parsing that became a bit more complex so I thought I'd keep it simpler.
I'm going to try the single update statement, then fall back to regex in the clr if needed.
I'll update the results. And, since I've already processed more than half the records, that should help.
Try either Dan R's update query from above:
update SourceTable
set ContainsBit = 1
from SourceTable t1
join (select TargetField
from dbo.TargetTable t2) t2
on charindex(t1.SourceField, t2.TargetField) > 0
Alternatively, if the timeliness of this is important and this is sql 2005 or later, then this would be a classic use for a calculated column using SQL CLR code with Regular Expressions - no need for a standalone app.

Resources