Synapse and on-prem tempdb usage

I've built several pipelines with several Copy Data activities in Synapse. They get data from different sources (SQL, CSV, REST API) and sink it to an on-premises SQL Server.
The pipelines run smoothly and data is processed correctly, no issues there. What I'm facing right now is that the on-prem tempdb eats a lot of space, but it's not clear to me when this space gets flushed or becomes available again. With most Copy Data activities I'm making use of the upsert function, so the checkbox 'Use TempDB' becomes available.
As soon as the table is synchronized I don't need the temp table anymore, of course. Does Synapse or SQL Server automatically truncate this database? Or is this a setting on the tempdb? I've already shrunk the tempdb, but this only helps temporarily, of course...
Let me know what the best option is here, thanks in advance! :)
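For what it's worth, a minimal sketch of how I watch tempdb space while the pipelines run, using the sys.dm_db_file_space_usage DMV; it reports reserved and free space per allocation category in 8 KB pages, so you can see when space is released back after a load finishes:

    SELECT
        SUM(user_object_reserved_page_count)     * 8 / 1024 AS user_objects_mb,
        SUM(internal_object_reserved_page_count) * 8 / 1024 AS internal_objects_mb,
        SUM(version_store_reserved_page_count)   * 8 / 1024 AS version_store_mb,
        SUM(unallocated_extent_page_count)       * 8 / 1024 AS free_mb
    FROM tempdb.sys.dm_db_file_space_usage;

Note that the file size on disk does not shrink when this space is released; the pages are just marked free for reuse inside tempdb.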

Related

What is the purpose of tempdb in SQL Server?

I need some clarification about tempdb in SQL Server, specifically on the following things:
What is its purpose?
Can we create our own tempdb, and how do we make our own database refer to it?
From MSDN:
The tempdb system database is a global resource that is available to all users connected to the instance of SQL Server and is used to hold the following:
- Temporary user objects that are explicitly created, such as: global or local temporary tables, temporary stored procedures, table variables, or cursors.
- Internal objects that are created by the SQL Server Database Engine, for example, work tables to store intermediate results for spools or sorting.
- Row versions that are generated by data modification transactions in a database that uses read-committed using row versioning isolation or snapshot isolation transactions.
- Row versions that are generated by data modification transactions for features, such as: online index operations, Multiple Active Result Sets (MARS), and AFTER triggers.
Operations within tempdb are minimally logged.
This enables transactions to be rolled back. tempdb is re-created every time SQL Server is started so that the system always starts with a clean copy of the database.
Temporary tables and stored procedures are dropped automatically on disconnect, and no connections are active when the system is shut down. Therefore, there is never anything in tempdb to be saved from one session of SQL Server to another. Backup and restore operations are not allowed on tempdb.
Tempdb is a system database and we can't create system databases. Tempdb is a global resource for all databases, which means temp tables, table variables, and the version store for user databases all use tempdb. This is a pretty basic explanation of tempdb. Refer to the link below for how it is used for other purposes, like Database Mail:
https://msdn.microsoft.com/en-us/library/ms190768.aspx
1: It is what it says: temporary storage. For example, when you ask for DISTINCT results, SQL Server must remember which rows it has already sent you. The same goes for a temporary table.
2: Makes no sense. Tempdb is not a per-database thing but a server thing: ONE tempdb regardless of how many databases you have. You can change where it is and how it is laid out (file count, size), but it is never tied to one database (except, obviously, if you only have one database on a SQL Server instance). Having your own tempdb is NOT how SQL Server works. And while we are at it: there is never any need to make a backup of tempdb. When SQL Server starts, tempdb is reinitialized as empty.
And, by the way, this would be obvious if you bothered with such things as being borderline competent, which for me includes reading the documentation of every major technology I work with at least once. You should consider adopting that habit, because it is the only way to know what you are doing.
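A minimal sketch of that point, with a hypothetical database name: a local temp table is always materialized in tempdb, no matter which database you create it from, and it disappears on disconnect at the latest.

    USE SomeUserDatabase;  -- hypothetical; any user database will do
    CREATE TABLE #demo (id INT);

    -- The object lives in tempdb, not in the current database
    -- (the name is suffixed internally, hence the LIKE):
    SELECT name FROM tempdb.sys.objects WHERE name LIKE '#demo%';

    DROP TABLE #demo;  -- dropped automatically on disconnect anyway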

Copying Large Amounts of Data to Replicated Database

I have a local SQL Server database that I copy large amounts of data from and into a remote SQL Server database. Local version is 2008 and remote version is 2012.
The remote DB has transactional replication set-up to one local DB and another remote DB. This all works perfectly.
I have created an SSIS package that empties the destination tables (the remote DB) and then uses a Data Flow object to add the data from the source. For flexibility, I have each table in its own Sequence Container (this allows me to run one or many tables at a time). The data flow settings are set to Keep Identity.
Currently, prior to running the SSIS package, I drop the replication settings and then run the package. Once the package completes, I then re-create the replication settings and reinitialise the subscribers.
I do it this way (deleting the replication and then re-creating) for fear of overloading the server with replication commands. Although most tables are between 10s and 1000s of rows, a couple of them are in excess of 35 million.
Is there a recommended way of emptying and re-loading the data of a large replicated database?
I don't want to replicate my local DB to the remote DB as that would not always be appropriate and doing a back and restore of the local DB would also not work due to the nature of the more complex permissions, etc. on the remote DB.
It's not the end of the world to drop and re-create the replication settings each time as I have it all scripted. I'm just sure that there must be a recommended way of managing this...
Don't do it that way; empty/reload is bad. Try to update the tables via MERGE. This way you avoid the empty-and-reload, which would otherwise result in two replicated operations (a delete and an insert) for every row. Load the new data into staging tables on the other server (not replicated), then merge them into the replicated tables. If a lot of the data is unchanged, this will seriously reduce the replication load.
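A minimal sketch of that merge step, with hypothetical table and column names; the staging table is loaded by SSIS and is not part of the publication, so only the rows the MERGE actually touches generate replicated commands.

    MERGE dbo.TargetTable AS t              -- replicated target (hypothetical)
    USING staging.TargetTable_Load AS s     -- non-replicated staging load
        ON t.Id = s.Id
    WHEN MATCHED AND (t.Col1 <> s.Col1 OR t.Col2 <> s.Col2) THEN
        UPDATE SET Col1 = s.Col1, Col2 = s.Col2
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Col1, Col2) VALUES (s.Id, s.Col1, s.Col2)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;

For a mostly-unchanged 35-million-row table this replicates next to nothing, instead of tens of millions of delete-and-insert commands.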

T-SQL: advice on copying data across to another database

I need advice on copying daily data to another server.
Just to give you a picture of the situation, I will explain a little. There are workstations posting transactions to 2 database servers (DB1 and DB2). These DB servers are hosted on 2 separate physical servers and are linked. Daily transactions are 50,000 for now but will increase soon. There may be days when some workstations are down (operational but unable to post data) and their transactions get posted a few days later.
So, what I do is run a query across those 2 linked servers. The daily query output contains ~50,000 records and takes a minimum of 15 minutes to fetch, as linked servers have performance problems. I will create a stored procedure and schedule it to run at 2 AM.
My concern starts from here: the output will be copied across to another data warehouse (DW). This is our client's territory, which I do not know much about. This DW will be linked to these DB servers to make it possible to send the data (produced by my stored procedure) across.
Now, what would you do to copy the data across?
1: Create a dummy table on DB1 to hold the stored procedure output on the same server, so it is available and we do not need to rerun the stored procedure; the client then retrieves it later.
2: Use a "select into" statement to copy the content to the remote DW table. I do not know what happens with this one while fetching and sending the data across to the DW. Remember, it takes ~15 minutes for my stored procedure to fetch the data.
3: Post the data (retrieved by the stored procedure) as an XML file over FTP.
Please tell me if there is a way of setting an alert or notification on jobs.
I just want to take precautions so it will be easier to track when something goes wrong.
Any advice is appreciated very much. Thank you. Oz.
When it comes to copying data in SQL Server, you need to look at High Availability solutions; depending on the version and edition of your SQL Server, you will have different options.
http://msdn.microsoft.com/en-us/library/ms190202(v=sql.105).aspx
If you just need to move data for specific tables, you have options like an SSIS job or SQL Server Replication.
If you are looking to have all tables in a given database copied to another server, you should use Log Shipping, which allows you to copy the entire content of the source database to another location. Because this is done at small intervals, your load will be distributed over a larger period of time instead of one large transaction running at once.
Another great alternative is SQL Server Replication. This option captures transactions on the source and pushes them to the target. This model requires a publisher (the source), a distributor (which can be the source or another server), and a subscriber (the target).
You can also create an SSIS job that runs on a frequent basis and moves just a specified amount of data.
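On the alerting question raised above: SQL Server Agent can e-mail an operator when a job fails. A minimal sketch, assuming Database Mail is already configured; the operator and job names here are hypothetical.

    USE msdb;
    GO
    -- Create an operator to receive the notifications.
    EXEC dbo.sp_add_operator
        @name = N'DW Load Operator',              -- hypothetical name
        @email_address = N'dba@example.com';
    GO
    -- Have the nightly job notify that operator on failure.
    EXEC dbo.sp_update_job
        @job_name = N'Nightly DW Extract',        -- hypothetical job name
        @notify_level_email = 2,                  -- 2 = notify on failure
        @notify_email_operator_name = N'DW Load Operator';
    GO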

Anything like a SQL Server proxy that will cache data accessed locally?

A wild shot, this, but does anyone know of a kind of proxy that will cache the data used in a SQL Server query locally, so the same query can be run again offline?
What I'm doing here is developing stored procedures against a (very large) database of several dozen tables. I'd like to have the data that was used in returning the result set cached locally, so that when I'm not connected to the network, I can continue refining the stored proc business logic and have it return data from the local cached copies of the tables. I only want a portion of each table locally.
I could SSIS a chunk of each of the tables to similarly named copies locally, but finding all the related data is a chunk of work in itself. I was wondering if there's anything out there that does something similar.
(I want to cache the data locally in the same table structure as the networked SQL Server, so I can continue tweaking the SQL against it).
Disclaimer: I googled "SQL Server cache query data locally" and variations...
Only "MTCache" turned up, and I can't find anything beyond the research paper.
Thanks,
Ray
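Failing a ready-made proxy, a minimal sketch of the manual route mentioned in the question: pull a filtered slice of a table into an identically named local copy over a linked server (the server, database, and filter here are hypothetical).

    -- Creates dbo.Orders locally with the same column structure as the
    -- remote table, holding just the slice matching the filter.
    SELECT *
    INTO dbo.Orders
    FROM [PRODSERVER].[BigDb].dbo.Orders       -- hypothetical linked server
    WHERE OrderDate >= '20140101';             -- hypothetical filter

SELECT ... INTO copies the column structure but not keys, indexes, or constraints, so those would need to be re-added if the stored procedures depend on them.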

What type of database replication should I use?

I have 2 databases, one on local server and one on a remote server.
I created a transactional replication publication on the local DB, which feeds the remote DB every minute with whatever updates it gets. So far, this is working perfectly.
However, the local DB needs to be cleaned (all its information deleted) daily. THIS is the part I'm having trouble with: I was expecting a replication mode that would feed the remote DB only the inserts and ignore the part where the local DB gets cleaned. At the moment, the remote DB is also getting cleaned.
Would a different kind of replication help me achieve what I want, or is replication no longer the way to do it?
Have a look at this SO question here
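One option worth knowing about: transactional replication lets you change how each command type is delivered per article. A minimal sketch, with hypothetical publication and article names, that stops DELETE statements from being propagated so the daily cleanup stays local; run it in the publication database at the publisher.

    EXEC sp_changearticle
        @publication = N'MyPublication',   -- hypothetical publication name
        @article     = N'MyTable',         -- hypothetical article name
        @property    = N'del_cmd',
        @value       = N'NONE',            -- do not replicate DELETEs
        @force_invalidate_snapshot = 1;

The caveat is that no deletes at all will replicate for that article afterwards, so the remote copy keeps growing and can drift from the local one.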
