How to use SQLAlchemy to create a temp table for MSSQL? - sql-server

I've tried the following method of creating a temp table for MSSQL using SQLA:
table_name = "#foo"
meta = MetaData(bind = session.bind)
table = Table(quoted_name(table_name, quote=False),
meta,
Column('a_number', Integer),
Column('device_Id', Integer),
Column('cost', Integer)
)
table.create()
There are no errors when I execute this, but there are errors if I follow it up with SQL statements that try to access the table. (The errors indicate #foo doesn't exist)
Also, if I look at the temp tables in my MSSQL session, there's no mention of the table, further evidence that it doesn't exist.
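One quick way to confirm from the current MSSQL session whether #foo actually exists is to ask tempdb directly, for example:
SELECT OBJECT_ID('tempdb..#foo') AS temp_table_object_id;  -- non-NULL only if #foo exists for this session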
Note that I don't think this is a connection issue: if I comment out the above table.create() and 'manually' create the table instead, as in session.execute("create #foo.."), that succeeds, and so do the subsequent insert and read. So I think I'm on the same connection the whole time. I can also single-step through this in a debugger and intermittently request my MSSQL session ID, and it comes back the same (meaning I'm on the same session from MSSQL's point of view too).
A later test: I enabled full SQLAlchemy debugging and I noticed that table.create() was causing a "commit" to be issued after the create table statement. Somehow, this commit was causing the temp table to become inaccessible. I experimented and found that if this commit is not emitted, then table.create() works and the temp table can be accessed in subsequent statements.
Here's my "work around" until I figure out why the commit is being emitted and/or why the commit is causing the temp table to "go away":
table_name = "#foo"
meta = MetaData(bind = session.bind)
table = Table(quoted_name(table_name, quote=False),
meta,
Column('a_number', Integer),
Column('device_Id', Integer),
Column('cost', Integer)
)
session.execute(CreateTable(table))
In the above approach, CreateTable is returning the actual SQL creation syntax and it's then executed via session.execute (which does not issue a commit)

A couple of points:
-> Check that you are creating the engine properly. See the link and look for the Microsoft SQL Server heading. Link: http://docs.sqlalchemy.org/en/latest/core/engines.html
-> Check that your metadata is bound to the engine.

Related

How to do an inner join rather than for each loop in SSIS?

On the ETL server I have a DW user table.
On the prod OLTP server I have the sales database. I want to pull the sales only for users that are present in the user table on the ETL server.
Presently I am using an Execute SQL task to fetch the DW users into an SSIS System.Object variable, then using a Foreach loop to loop through each item (userid) in this variable and, via a Data Flow task, fetch the OLTP sales for each user and dump them into the DW staging table. The Foreach loop is taking a long time to run.
I want to be able to do an inner join so that the response is quicker, but I can't do this since they are on separate servers. Neither can I use a global temp table to make the inner join, for the same reason.
I tried to collect the DW users into a comma-separated string variable and then use it (via STRING_SPLIT) to query the OLTP server, but this is also taking a lot of time in the pre-execute phase (not sure why exactly), even for a small number of users.
I am also aware of the Lookup transform, but that too would result in all OLTP rows being brought to the DW ETL server to test the lookup condition.
Is there any alternate approach to be able to do an inner join by taking the list of users into the source?
Note: I do not have write permissions on the OLTP db.
Based on the comments, I think we can use a temporary table to solve this.
Can you help me understand this restriction? "Neither can I use a global temp table to make the inner join, for the same reason."
The restriction is that the OLTP server and the DW server are separate, so there can't be a global temp table common to both servers. Hope that makes sense.
The general pattern we're going to follow is:
Execute SQL Task to create a temporary table on the OLTP server
A Data Flow task to populate the new temporary table. Source = DW. Destination = OLTP. Ensure Delay Validation = True
Modify the existing Data Flow: change the source to a query that uses the temporary table, i.e. SELECT S.* FROM oltp.sales AS S WHERE EXISTS (SELECT * FROM #SalesPerson AS SP WHERE SP.UserId = S.UserId); Ensure Delay Validation = True
A long form answer on using temporary tables (global to set the metadata, regular thereafter)
I don't use temp tables in SSIS
Temporary tables live in tempdb. Your OLTP and DW connection managers likely do not point to tempdb. To be able to reference a temporary table, local or global, in SSIS you need to either define an additional connection manager for the same server that points explicitly at tempdb, so you can use the drop-down in the source/destination components (technically accurate but dumb), or use an SSIS Variable to hold the name of the table and use the "from variable" option in the source/destination component (best option, maximum flexibility).
Soup to nuts example
I will use WideWorldImporters as my OLTP system and WideWorldImportersDW as my DW system.
One-time task
Open SQL Server Management Studio, SSMS, and connect to your OLTP system. Define a global temporary table with a unique name and the expected structure. Leave your connection open so the table structure remains intact during initial development.
I used the following statement.
DROP TABLE IF EXISTS ##SO_70530036;
CREATE TABLE ##SO_70530036(EmployeeId int NOT NULL);
Keep track of your query because we'll use it later on but as I advocate in my SSIS answers, perform the smallest task, test that it works and then go on to the next. It's the only way to debug.
Connection Managers
Define two OLE DB Connection Managers. WWI_DW points to the named instance DEV2019UTF8 and WWI_OLTP points to DEV2019EXPRESS. Right click on WWI_OLTP and select Properties. Find the property RetainSameConnection and flip that from the default of False to True. This ensures the same connection is used throughout the package. As temporary tables go out of scope when the connection goes away, closing and reopening a connection in a package will result in a fatal error.
These two databases are on different instances, so we can't cheat and directly commingle data.
Variables
Define 4 variables in SSIS, all of type String.
TempTableName - I used a value of ##SO_70530036 but use whatever value you specified in the One-time task section.
QuerySourceEmployees - This will be the query you run to generate the candidate set of data to go into the temporary table. I used SELECT TOP (3) E.[WWI Employee ID] AS EmployeeId FROM Dimension.Employee AS E WHERE E.[Is SalesPerson] = CAST(1 AS bit);
QueryDefineTables - Remember the drop/create statements from the one-time task? We're going to use the essence of them but use the expression builder to let us dynamically swap in the table name. I clicked the ellipses, ..., on the Expression section and used the following: "DROP TABLE IF EXISTS " + @[User::TempTableName] + "; CREATE TABLE " + @[User::TempTableName] + "( EmployeeId int NOT NULL);" You should be able to copy the Value from the row and paste it into SSMS to confirm it works.
QuerySales - This is the actual query you're going to use to pull your filtered set of sales data. Again, we'll use the Expression to allow us to dynamically reference the temporary table name. The prettified version of the expression would look something like
"SELECT
SI.InvoiceID
, SI.SalespersonPersonID
, SO.OrderID
, SOL.StockItemID
, SOL.Quantity
, SOL.OrderLineID
FROM
Sales.Invoices AS SI
INNER JOIN
Sales.Orders AS SO
ON SO.OrderID = SI.OrderID
INNER JOIN
Sales.OrderLines AS SOL
ON SO.OrderID = SOL.OrderID
WHERE
EXISTS (SELECT * FROM " + @[User::TempTableName] + " AS TT WHERE TT.EmployeeID = SI.SalespersonPersonID);"
Again, you should be able to pull the Value from the three queries and run them independently and verify they work.
Execute SQL Task
Add an Execute SQL Task to the Control Flow. I named mine SQL Create temporary table. My Connection Manager is WWI_OLTP, I changed the SQLSourceType to Variable, and the SourceVariable is User::QueryDefineTables.
Every time your package runs, the first thing it will do is create the temporary table, which is good because SSIS is a metadata-driven ETL engine and the next two steps would fail if the table didn't exist.
Data Flow Task - Prime the pump
This data flow is where we'll transfer DW data back to the OLTP system so we can filter in the source system.
Drag a Data Flow Task onto the Control Flow. I named mine DFT Load Temp and before you click into it, right click on the Task and find the DelayValidation property and change this from the default of False to True. Normally, a package validates all metadata before actual execution begins as the idea is you want to know everything is good before any data starts moving. Since we're using temporary tables, we need to tell the execution engine "trust us, it'll be ready"
Double click inside the Data Flow Task.
Add an OLE DB Source. I named mine OLESRC SourceEmployees and I use the connection manager WWI_DW. The data access mode changes to SQL command from variable, and then I select my variable User::QuerySourceEmployees.
Add an OLE DB Destination. I named mine OLEDST TempTableName and double clicked to configure it. The Connection Manager is WWI_OLTP and again, since the table lives in tempdb, we can't select it from the drop down. Change the Data access mode to Table name or view name variable - fast load and then select your variable name User::TempTableName. Click the Mapping tab and ensure source columns map to destination columns.
Data Flow Task - Transfer data
Finally, we will pull our source data, nicely filtered against the data from our target system.
Add an OLE DB Source. I named it OLESRC QuerySales. The Connection Manager is WWI_OLTP. Data access mode again changes to SQL command from variable and the variable name is User::QuerySales
From here, do whatever else you need to do to make the magic happen.
Instead of the 270k rows an unfiltered query returns, I have 67k, as there are only 3 employees in the temporary table.
But wait, there's more!
Close out visual studio, open it back up and try to touch something in the data flows. Suddenly, there are red Xs everywhere! Any time you close a data flow component, it fires a revalidate metadata operation and guess what, it can't do that as the connection to the temporary table is gone.
The package will run fine and it will not throw VS_NEEDSNEWMETADATA, but editing/maintenance becomes a pain.
If you switched from global temporary table to local, switch the table name variable's value back to a global and then run the define statement in SSMS. Once that's done, then you can continue editing the package.
I assure you, the local temporary table does work once you have the metadata set and you use queries via variables for source/destination.
No need for the global temporary table hack, or the SET FMTONLY OFF hack (which no longer works).
Just specify the result set metadata in the SQL query with WITH RESULT SETS, e.g.
EXEC ('
create table #t
(
ID INT,
Name VARCHAR(150),
Number VARCHAR(15)
)
insert into #t (Id, Name, Number)
select object_id, name, 12
from sys.objects
select * from #t
')
WITH RESULT SETS
(
(
ID INT,
Name VARCHAR(150),
Number VARCHAR(15)
)
)
If you need to parameterize the query, there's a bit of a catch because there are some limitations in how SSIS discovers parameters. SSIS runs sp_describe_undeclared_parameters, which doesn't really work with batches that call sp_executesql, because sp_executesql handles parameters in a very particular way, one which you couldn't replicate with a user stored procedure.
So to parameterize the query you'll either need to pass the parameter values into the query using the "query from variable" approach and SSIS expressions, or push all this T-SQL into a stored procedure.
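As a hedged sketch only (the procedure name and parameter below are made up for illustration), the stored procedure route might look like this; SSIS then only has to call the procedure with an ordinary parameter:
-- Hypothetical wrapper: the batch from the example above, with the filter as a parameter
CREATE PROCEDURE dbo.usp_ObjectsByMinId
    @MinObjectId int
AS
BEGIN
    SET NOCOUNT ON;
    CREATE TABLE #t
    (
        ID INT,
        Name VARCHAR(150),
        Number VARCHAR(15)
    );
    INSERT INTO #t (ID, Name, Number)
    SELECT object_id, name, 12
    FROM sys.objects
    WHERE object_id >= @MinObjectId;   -- the hypothetical parameter
    SELECT ID, Name, Number FROM #t;
END
GO
The SSIS source query then becomes something like EXEC dbo.usp_ObjectsByMinId ? with the value mapped in the Parameters dialog, or the whole EXEC string can be built in a variable as described above.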

How to remove dirty data in YugabyteDB (PostgreSQL)

I tried to add a column to a table with the TablePlus GUI, but there was no response for a long time.
So I turned to the db server, but got some errors.
Maybe some inconsistent data was generated during the operation through TablePlus.
I am new to PostgreSQL and don't know what to do next.
-----updated------
I did some operations as @Dri372 suggested, and made some progress.
The reason it fails for tables sys_role and s2 is that the tables are not empty; they have some records.
If I run SQL like this: create table s3 AS SELECT * FROM sys_role; alter table s3 add column project_code varchar(50); it succeeds.
Now how can I still work on the table sys_role?

How does SQL Server handle failed query to linked server?

I have a stored procedure that relies on a query to a linked server.
This stored procedure is roughly structured as follows:
-- Create local table var to stop query from needing round trips to linked server
DECLARE @duplicates TABLE (eid NVARCHAR(6))
INSERT INTO @duplicates(eid)
SELECT eid FROM [linked_server].[linked_database].[dbo].[linked_table]
WHERE es = 'String'
-- Update on my server using data from linked server
UPDATE [my_server].[my_database].[dbo].[my_table]
SET
-- Many things, including
[status] = CASE
    WHEN eid IN (
        SELECT eid FROM @duplicates
    )
    THEN 'String'
    ELSE es
END
FROM [my_server].[another_database].[dbo].[view]
-- This view obscures sensitive information and shows only the data that I have permission to see
-- Many other things
The query itself is much more complex, but the key idea is building this temporary table from a linked server (because it takes the query 5 minutes to run if I don't, versus 3 seconds if I do).
I've recently had an issue where I ended up with updates to my table that failed to get checked against the linked server for duplicate information.
The logical chain of events is this:
Get all of the data from the original view. The original view contains maybe 3000 records, of which maybe 30 are duplicates of the entity in question, but with 1 field having a different value.
I then have to grab data from a different server to know which of the duplicates is the correct one.
When the stored procedure runs, it updates each record.
ERROR STEP - when the stored procedure hits a duplicate record, it updates my_table again - so es gets changed multiple times in a row.
The temp table was added after the fact when we realized incorrect es values were being introduced to my_table.
my_database does not contain the data needed to determine which is the correct tuple, hence the requirement for the linked server.
As far as I can tell, we had a temporary network interruption or a connection timeout that stopped my_server from getting the response back from linked_server, and it just passed an empty table to the rest of the procedure.
So, my question is - how can I guard against this happening?
I can't just check if the table is empty, because it could legitimately be empty. I need to definitively know if that initial SELECT from linked_server failed, if it timed out, or if it intentionally returned nothing.
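For what it's worth, one standard way to make a failed remote read fail loudly, rather than silently yielding an empty table, is SET XACT_ABORT ON plus a TRY/CATCH around the linked-server SELECT. The sketch below reuses the names from the question and is only illustrative:
SET XACT_ABORT ON;  -- make linked-server/OLE DB errors un-ignorable

DECLARE @duplicates TABLE (eid NVARCHAR(6));
DECLARE @remote_rows INT;

BEGIN TRY
    INSERT INTO @duplicates (eid)
    SELECT eid
    FROM [linked_server].[linked_database].[dbo].[linked_table]
    WHERE es = 'String';

    SET @remote_rows = @@ROWCOUNT;  -- reaching this line means the remote read succeeded; 0 rows is then a genuinely empty result
END TRY
BEGIN CATCH
    -- The remote read failed (timeout, network, security): stop before any UPDATE runs
    THROW;
END CATCH;

-- ...the UPDATE only runs after a successful remote read...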
Without knowing the definition of the table you're querying, you could get into an issue where your data is too long and you get a truncation error on your table.
Better to make sure and SUBSTRING it...
DECLARE @duplicates TABLE (eid NVARCHAR(6))
INSERT INTO @duplicates(eid)
SELECT SUBSTRING(eid,1,6) FROM [linked_server].[linked_database].[dbo].[linked_table]
WHERE es = 'String'
-- Update on my server using data from linked server
UPDATE [my_server].[my_database].[dbo].[my_table]
SET
-- Many things, including
[status] = CASE
    WHEN eid IN (
        SELECT eid FROM @duplicates
    )
    THEN 'String'
    ELSE es
END
FROM [my_server].[another_database].[dbo].[view]
I had a similar problem where I needed to move data between servers but could not use a network connection, so I ended up doing BCP out and BCP in. This is fast and clean and takes away the complexity of user authentication, drivers, and trust domains. It's also repeatable and can be used for incremental loading.

Replace NULL columns in live database with data from a SQL Server backup

I recently had a horrible blunder.
While attempting to fix an issue we were having with our Exact Synergy system, I was trying to replace the data in two columns for one account with NULL; instead I replaced those two columns in ALL accounts with NULL. Completely restoring from a backup is not an option, so now I am left trying to figure out how to replace the missing data.
I have made a full restore of a recent backup for this database to a test database and have confirmed that the data I need is there. I am trying to figure out how to properly write a query that will replace the data in the two columns.
Since this is a backup of the same database, the tables and columns are all identically named.
The databases are Synergy and Synergy_TESTDB
The owner of the tables is dbo
The table is called Addresses
The columns are called textfield1 and textfield2
What I would like to do is take the data in textfield1 and textfield2 from the backup database and use it to populate the empty, or NULL, columns in the live database.
I am extremely new to SQL, and would appreciate any help.
This is obviously untested. I take no responsibility for you using this code.
That said I'd like to try and help you.
The main point is the three-part database.schema.table naming. I'm assuming you restored the backup to the same server, that you have a primary key on the table, and that Synergy_TESTDB is the restored database:
update target
set target.textfield1 = source.textfield1
from Synergy.dbo.Addresses target
join Synergy_TESTDB.dbo.Addresses source on target.PrimaryKeyCol = source.PrimaryKeyCol
where target.textfield1 IS NULL
update target
set target.textfield2 = source.textfield2
from Synergy.dbo.Addresses target
join Synergy_TESTDB.dbo.Addresses source on target.PrimaryKeyCol = source.PrimaryKeyCol
where target.textfield2 IS NULL
(Sure it could be done in a single update, but I'm trying to keep it simple.)
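For completeness, a combined form of the same idea might look like the following (equally untested, and it still assumes the hypothetical PrimaryKeyCol):
UPDATE target
SET target.textfield1 = COALESCE(target.textfield1, source.textfield1),
    target.textfield2 = COALESCE(target.textfield2, source.textfield2)
FROM Synergy.dbo.Addresses target
JOIN Synergy_TESTDB.dbo.Addresses source ON target.PrimaryKeyCol = source.PrimaryKeyCol
WHERE target.textfield1 IS NULL
   OR target.textfield2 IS NULL
COALESCE keeps any value that is already present and only fills in the NULLs from the backup copy.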
I strongly suggest you try in another test database first.
A good habit to get in to is to use a pattern like this:
BEGIN TRANSACTION
-- Perform updates
-- Examine the results: select * from dbo.Blah ...
-- If results are wrong, we just rollback anyway
ROLLBACK
-- If results are what you want, uncomment the COMMIT and comment out the ROLLBACK
-- COMMIT TRANSACTION

Error when inserting into a linked server

I want to insert some data from the local server into a remote server, and used the following SQL:
select * into linkservername.mydbname.dbo.test from localdbname.dbo.test
But it throws the following error
The object name 'linkservername.mydbname.dbo.test' contains more than the maximum number of prefixes. The maximum is 2.
How can I do that?
I don't think the new table created with the INTO clause supports 4 part names.
You would need to create the table first, then use INSERT..SELECT to populate it.
(See note in Arguments section on MSDN: reference)
The SELECT...INTO [new_table_name] statement supports a maximum of 2 prefixes: [database].[schema].[table]
NOTE: it is more performant to pull the data across the link using SELECT INTO vs. pushing it across using INSERT INTO:
SELECT INTO is minimally logged.
SELECT INTO does not implicitly start a distributed transaction, typically.
I say typically, in point #2, because in most scenarios a distributed transaction is not created implicitly when using SELECT INTO. If a profiler trace tells you SQL Server is still implicitly creating a distributed transaction, you can SELECT INTO a temp table first, to prevent the implicit distributed transaction, then move the data into your target table from the temp table.
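A minimal sketch of that two-step pull, using the same placeholder names as the pull example below:
-- Step 1: SELECT INTO a local temp table across the link (minimally logged,
-- and typically no implicit distributed transaction)
SELECT *
INTO #staging
FROM [server_a].[database].[schema].[table];

-- Step 2: move the rows from the temp table into the real target table locally
INSERT INTO [database].[schema].[table]
SELECT * FROM #staging;

DROP TABLE #staging;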
Push vs. Pull Example
In this example we are copying data from [server_a] to [server_b] across a link. This example assumes query execution is possible from both servers:
Push
Instead of connecting to [server_a] and pushing the data to [server_b]:
INSERT INTO [server_b].[database].[schema].[table]
SELECT * FROM [database].[schema].[table]
Pull
Connect to [server_b] and pull the data from [server_a]:
SELECT * INTO [database].[schema].[table]
FROM [server_a].[database].[schema].[table]
I've been struggling with this for the last hour.
I now realise that using the syntax
SELECT orderid, orderdate, empid, custid
INTO [linkedserver].[database].[dbo].[table]
FROM Sales.Orders;
does not work with linked servers. You have to go onto your linked server and manually create the table first, then use the following syntax:
INSERT INTO [linkedserver].[database].[dbo].[table]
SELECT orderid, orderdate, empid, custid
FROM Sales.Orders
WHERE shipcountry = 'UK';
I've experienced the same issue and I've performed the following workaround:
If you are able to log on to the remote server where you want to insert data (with MSSQL tools or sqlcmd), rebuild your query the other way round:
so from:
SELECT * INTO linkservername.mydbname.dbo.test
FROM localdbname.dbo.test
to the following:
SELECT * INTO localdbname.dbo.test
FROM linkservername.mydbname.dbo.test
In my situation it works well.
@2Toad: For sure INSERT INTO is better / more efficient. However, for small queries and quick operations SELECT * INTO is more flexible because it creates the table on the fly and inserts your data immediately, whereas INSERT INTO requires creating the table (auto-identity options and so on) before you carry out your insert operation.
I may be late to the party, but this was the first post I saw when I searched for the 4 part table name insert issue to a linked server. After reading this and a few more posts, I was able to accomplish this by using EXEC with the "AT" argument (for SQL2008+) so that the query is run from the linked server. For example, I had to insert 4M records to a pseudo-temp table on another server, and doing an INSERT-SELECT FROM statement took 10+ minutes. But changing it to the following SELECT-INTO statement, which allows the 4 part table name in the FROM clause, does it in mere seconds (less than 10 seconds in my case).
EXEC ('USE MyDatabase;
BEGIN TRY DROP TABLE TempID3 END TRY BEGIN CATCH END CATCH;
SELECT Field1, Field2, Field3
INTO TempID3
FROM SourceServer.SourceDatabase.dbo.SourceTable;') AT [DestinationServer]
GO
The query is run on DestinationServer, changes to the right database, ensures the table does not already exist, and selects from the SourceServer. Minimally logged, and no fuss. This information may already be out there somewhere, but I hope it helps anyone searching for similar issues.
