I have an application that almost continuously works with inserting or updating data. Since multiple requests are handled asynchronous I wrote my queries like below. I used an example based on SO, but that's not what I'm actually doing.
DECLARE #rows int;
INSERT INTO [user] ([username],[reputation])
SELECT [username],[reputation]
FROM (
SELECT [username]=:user,[reputation]=:rep
) A
WHERE A.[username] NOT IN (
SELECT [username]
FROM [user]
);
SET #rows = ##ROWCOUNT;
IF (#rows=0) BEGIN
UPDATE [user]
SET [reputation]=:rep, [updated]=GetDate()
WHERE [username]=:user
END;
This is passed in total to the database with PHP PDO. Because of the amount of data and other processing factors it's heavy on the (cheap) VPS it's running on. It's not really a problem if these processes run slow or get delayed, but on the other hand this data should be available via a website and then the queries on the data should be quick.
I was thinking about replicating parts of the processed data to a second server and running the website on that database. But I'm wondering how that would actually work with a query like above.
I'm guessing the UPDATE query will only be in the transaction log when #rows=0, so that won't be a problem.
But would the first part only send INSERT INTO [user] ([username],[reputation]) VALUES ('Hugo Delsing', '10k') or the entire query with the WHERE NOT IN () query?
Most of the time a user would exists, so if it only runs the new inserts that won't be a problem. But if it would run the entire query each time the benefits would be small.
Obviously I could wrap the first part up in another check if exists(select 1 from [user] where [username] = :user) to make sure it only runs when there is no user, but I'm wondering if that is necessary.
Bonus question, but feel free to ignore because it's a bit broad: Would replicating be the way to go or does MS SQL offer other/better solutions for something like this?
Related
I have a stored procedure that relies on a query to a linked server.
This stored procedure is roughly structured as follows:
-- Create local table var to stop query from needing round trips to linked server
DECLARE #duplicates TABLE (eid NVARCHAR(6))
INSERT INTO #duplicates(eid)
SELECT eid FROM [linked_server].[linked_database].[dbo].[linked_table]
WHERE es = 'String'
-- Update on my server using data from linked server
UPDATE [my_server].[my_database].[dbo].[my_table]
-- Many things, including
[status] = CASE
WHEN
eid IN (
SELECT eid FROM #duplicates
)
THEN 'String'
ELSE es
END
FROM [my_server].[another_database].[dbo].[view]
-- This view obscures sensitive information and shows only the data that I have permission to see
-- Many other things
The query itself is much more complex, but the key idea is building this temporary table from a linked server (because it takes the query 5 minutes to run if I don't, versus 3 seconds if I do).
I've recently had an issue where I ended up with updates to my table that failed to get checked against the linked server for duplicate information.
The logical chain of events is this:
Get all of the data from the original view
The original view contains maybe 3000 records, of which maybe 30 are
duplicates of the entity in question, but with 1 field having a
different value.
I then have to grab data from a different server to know which of
the duplicates is the correct one.
When the stored procedure runs, it updates each record.
ERROR STEP - when the stored procedure hits a duplicate record, it
updates my_table again - so es gets changed multiple times in a row.
The temp table was added after the fact when we realized incorrect es values were being introduced to my_table.
'my_database` does not contain the data needed to determine which is the correct tuple, hence the requirement for the linked server.
As far as I can tell, we had a temporary network interruption or a connection timeout that stopped my_server from getting the response back from linked_server, and it just passed an empty table to the rest of the procedure.
So, my question is - how can I guard against this happening?
I can't just check if the table is empty, because it could legitimately be empty. I need to definitively know if that initial SELECT from linked_server failed, if it timed out, or if it intentionally returned nothing.
without knowing the definition of the table you're querying you could get into an issue where your data is to long and you get a truncation error on your table.
Better make sure and substring it...
DECLARE #duplicates TABLE (eid NVARCHAR(6))
INSERT INTO #duplicates(eid)
SELECT SUBSTRING(eid,1,6) FROM [linked_server].[linked_database].[dbo].[linked_table]
WHERE es = 'String'
-- Update on my server using data from linked server
UPDATE [my_server].[my_database].[dbo].[my_table]
-- Many things, including
[status] = CASE
WHEN
eid IN (
SELECT eid FROM #duplicates
)
THEN 'String'
ELSE es
END
FROM [my_server].[another_database].[dbo].[view]
I had a similar problem where I needed to move data between servers, could not use a network connection so I ended up doing BCP out and BCP in. This is fast, clean and takes away the complexity of user authentication, drivers, trust domains. also it's repeatable and can be used for incremental loading.
In my SQL Server 2012 database, I have a linked server reference to a second SQL Server database that I need to pull records from and update accordingly.
I have the following update statement that I am trying to run:
UPDATE
Linked_Tbl
SET
Transferred = 1
FROM
MyLinkedServer.dbo.MyTable Linked_Tbl
JOIN
MyTable Local_Tbl ON Local_Tbl.LinkedId = Linked_Tbl.Id
JOIN
MyOtherTable Local_Tbl2 ON Local_Tbl.LocalId = Local_Tbl2.LocalId
Which I had to stop after an hour of running as it was still executing.
I've read online and found solutions stating that the best solution is to create a stored procedure on the Linked Server itself to execute the update statement rather than run it over the wire.
The problems I have are:
I don't have the ability to create any procedures on the other server.
Even if I could create that procedure, I would need to pass through all the Ids to the stored procedure for the update and I'm not sure how to do that efficiently with thousands of Ids (this, obviously, is the smaller of the issues, though since I can't create that procedure in the first place).
I'm hoping there are other solutions people may have managed to come up with given that it's often the case you don't have permissions to make changes to a different server.
Any ideas??
I am not sure, whether it can give more performance, you an try:
UPDATE
Linked_Tbl
SET
Transferred = 1
FROM OPENDATASOURCE([MyLinkedServer],'select Id, LocalId,Transferred from remotedb.dbo.MyTable') AS Linked_Tbl
JOIN MyTable Local_Tbl
ON Local_Tbl.LinkedId = Linked_Tbl.Id
JOIN MyOtherTable Local_Tbl2
ON Local_Tbl.LocalId = Local_Tbl2.LocalId
I'm working on a data virtualization solution. The user is able to write his own SQL queries as filters for a query i make. I would like not having to run this filter query every time i select something from the database(It will likely be a complex series of joins).
My idea was to use a # temp table at script level and keep the connection alive. This #temp table would then be selected from but updated only when the user changes the filter. The idea being i can actually use it from stored procedures and the table is scoped to that connection.
I got the idea from someone who suggested to use dynamic sql and ## global temp tables named with the connection process ID so to make each connection have a unique global temp table. This was to overcome sharing temp tables across stored procedures. But it seems a bit clumsy.
I did a quick test with the below code and seemed to work fine
-- Run script at connection open from some app
SELECT * INTO #test
FROM dataTable
-- Now we can use stored procedures with #test table
EXECUTE selectFromTempTable
EXECUTE updateTempTable #sqlFilterString
EXECUTE selectFromTempTable
Only real problem i can see is the connection have to be kept alive for the duration which can be a few hours maybe. A single user can have multiple connections running at the same time. The number of users on a single database server would be like max 20.
If its a huge issue i could make it so the application can close and open them as needed so each user only have 1 connection open at a time. And maybe even then close it if not in use, and reopen when needed again with the delay of having to wait for the query to run.
Would this be bad practice? or kill any performance benefit from not running the filter query? This is on SQL Server 2008 and up.
I think I would create a permanent table, using the spid (process ID) as a key value. Each connection has its own process ID, so anyone can use it to identify their entries in the table:
create table filter(
spid int,
filternum int,
filterstring varchar(255),
<other cols> );
create unique index filterindx on filter(spid, filternum);
Then when a user creates filter entries:
delete from filter where spid = ##spid
insert into filter(spid, filternum, filterstring) select ##spid, 1, 'some sql thing'
insert into filter(spid, filternum, filterstring) select ##spid, 2, 'some other sql thing'
Then you can access each user's filter values by selecting where spid = ##spid etc
I am deleting login records in my database that don't have a corresponding logout record, but right now it's very slow It does this:
First it gets the queries to loop over to check to delete
Next it needs to find out if the next record for that user is a login or logout, if it's a login, I delete it.
To get the next record of that type it does this query of query:
<cfquery dbtype="query" name="getnext" maxrows="1">
SELECT * FROM getlogs WHERE id > #id# AND logType = 'login'
</cfquery>
But it's slow, doing it thousands of times makes it take about 56 seconds.
What would be a faster way to do this? Would another cfloop inside my loop (basicly a loop until I get to the row I want) be faster? Is there another way?
This sounds like something that can be done entirely in one query -- perhaps something like this:
delete from login_table t
where exists (
select id
from login_table
where id > t.id
and logtype = 'login'
)
This has nothing to do with ColdFusion per se; the same approach would apply in any environment. If this is a maintenance function that has no synchronous dependence on your application, you could even stick it into a stored procedure invoked automatically by a recurring "cleanup" task in the database itself.
Your best bet is to do it all in sql, using cursors or a temp table. That saves the roundtrips between the CF and sql servers.
I want to insert some data on the local server into a remote server, and used the following sql:
select * into linkservername.mydbname.dbo.test from localdbname.dbo.test
But it throws the following error
The object name 'linkservername.mydbname.dbo.test' contains more than the maximum number of prefixes. The maximum is 2.
How can I do that?
I don't think the new table created with the INTO clause supports 4 part names.
You would need to create the table first, then use INSERT..SELECT to populate it.
(See note in Arguments section on MSDN: reference)
The SELECT...INTO [new_table_name] statement supports a maximum of 2 prefixes: [database].[schema].[table]
NOTE: it is more performant to pull the data across the link using SELECT INTO vs. pushing it across using INSERT INTO:
SELECT INTO is minimally logged.
SELECT INTO does not implicitly start a distributed transaction, typically.
I say typically, in point #2, because in most scenarios a distributed transaction is not created implicitly when using SELECT INTO. If a profiler trace tells you SQL Server is still implicitly creating a distributed transaction, you can SELECT INTO a temp table first, to prevent the implicit distributed transaction, then move the data into your target table from the temp table.
Push vs. Pull Example
In this example we are copying data from [server_a] to [server_b] across a link. This example assumes query execution is possible from both servers:
Push
Instead of connecting to [server_a] and pushing the data to [server_b]:
INSERT INTO [server_b].[database].[schema].[table]
SELECT * FROM [database].[schema].[table]
Pull
Connect to [server_b] and pull the data from [server_a]:
SELECT * INTO [database].[schema].[table]
FROM [server_a].[database].[schema].[table]
I've been struggling with this for the last hour.
I now realise that using the syntax
SELECT orderid, orderdate, empid, custid
INTO [linkedserver].[database].[dbo].[table]
FROM Sales.Orders;
does not work with linked servers. You have to go onto your linked server and manually create the table first, then use the following syntax:
INSERT INTO [linkedserver].[database].[dbo].[table]
SELECT orderid, orderdate, empid, custid
FROM Sales.Orders
WHERE shipcountry = 'UK';
I've experienced the same issue and I've performed the following workaround:
If you are able to log on to remote server where you want to insert data with MSSQL or sqlcmd and rebuild your query vice-versa:
so from:
SELECT * INTO linkservername.mydbname.dbo.test
FROM localdbname.dbo.test
to the following:
SELECT * INTO localdbname.dbo.test
FROM linkservername.mydbname.dbo.test
In my situation it works well.
#2Toad: For sure INSERT INTO is better / more efficient. However for small queries and quick operation SELECT * INTO is more flexible because it creates the table on-the-fly and insert your data immediately, whereas INSERT INTO requires creating a table (auto-ident options and so on) before you carry out your insert operation.
I may be late to the party, but this was the first post I saw when I searched for the 4 part table name insert issue to a linked server. After reading this and a few more posts, I was able to accomplish this by using EXEC with the "AT" argument (for SQL2008+) so that the query is run from the linked server. For example, I had to insert 4M records to a pseudo-temp table on another server, and doing an INSERT-SELECT FROM statement took 10+ minutes. But changing it to the following SELECT-INTO statement, which allows the 4 part table name in the FROM clause, does it in mere seconds (less than 10 seconds in my case).
EXEC ('USE MyDatabase;
BEGIN TRY DROP TABLE TempID3 END TRY BEGIN CATCH END CATCH;
SELECT Field1, Field2, Field3
INTO TempID3
FROM SourceServer.SourceDatabase.dbo.SourceTable;') AT [DestinationServer]
GO
The query is run on DestinationServer, changes to right database, ensures the table does not already exist, and selects from the SourceServer. Minimally logged, and no fuss. This information may already out there somewhere, but I hope it helps anyone searching for similar issues.