I have a range-partitioned (day-wise) table managed with the pg_partman extension, and a replica of the same table is created in a different database.
I want to copy the partition for a specific day into that other database automatically, on a daily basis. What would be the ideal way to approach it?
I am evaluating the approaches below:
Using pg_dump to copy a specific partition
Copy partition using dblink extension
By running a daily cron job to copy all the rows which are yet to be copied for a specific day using dblink
I feel the second option is good enough, but I have to provide the specific name of the partition I want to copy, which I found a bit difficult to derive and append to the dblink query.
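For illustration, here is a rough sketch of deriving yesterday's partition name and pulling it across with dblink, run from the target database. The parent table name (events), its column list, and the connection string are assumptions, and pg_partman's default daily suffix (_pYYYY_MM_DD) is assumed:

-- requires: CREATE EXTENSION dblink;
DO $$
DECLARE
    -- yesterday's child partition, using pg_partman's default daily naming
    part_name  text := format('events_p%s', to_char(current_date - 1, 'YYYY_MM_DD'));
    -- query to run against the source database
    remote_sql text := format('SELECT id, created_at, payload FROM %I', part_name);
BEGIN
    EXECUTE format(
        'INSERT INTO %I
           SELECT * FROM dblink(%L, %L)
             AS t(id bigint, created_at timestamptz, payload text)',
        part_name,
        'host=source_host dbname=source_db user=repl',
        remote_sql);
END
$$;

The block can then be scheduled once a day with cron (via psql) or pg_cron; if the matching child partition does not yet exist on the target, inserting into the parent table instead lets PostgreSQL route the rows, assuming native (declarative) partitioning.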
I have a directory in HDFS where .csv files with a fixed structure and column names are dumped at the end of every day.
I have a Hive table to which new data should be appended at the beginning of every day, using the previous day's .csv file. How do I accomplish this?
I can suggest using cron jobs. You create a script that updates the tables, and you configure a cron job to execute that script at a specific time of day (in your case, the beginning of the day); the tables will then be updated automatically.
PS: this solution only applies if your server is in production, i.e. the cron job should run on a server that is up 24/7; otherwise, you should use Anacron.
Build a Hive table on top of that directory in HDFS. Once new files have been dumped into the table location, a SELECT from that table will pick them up. I'd suggest changing the process that dumps the files so it writes into date subfolders, and creating a table partitioned by date. All you need after that is to run the recover-partitions command before selecting from the table, as sketched below.
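A rough sketch, with placeholder table name, columns, and HDFS path, assuming one date subfolder per day:

-- external table over the daily CSV dumps, partitioned by date subfolder
CREATE EXTERNAL TABLE IF NOT EXISTS daily_events (
    id     BIGINT,
    name   STRING,
    amount DOUBLE
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/daily_events';

-- register any newly added subfolders (e.g. /data/daily_events/dt=2024-01-15)
MSCK REPAIR TABLE daily_events;

MSCK REPAIR TABLE is Hive's recover-partitions command (on some distributions the equivalent is ALTER TABLE ... RECOVER PARTITIONS).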
I am developing an SSIS package and migrating billions of records into the destination table. I have currently created 5 staging tables and used a split function to segregate the records by year into the staging tables.
I am planning to partition the main destination table and then switch the 5 staging tables into it. I would like to know whether it is better for SSIS to write directly to the partitioned table, or to write to the staging tables and then perform a switch into the partitioned table.
Could somebody tell me which is the preferred approach?
After some discussion back and forth (see the discussion), it seems that in this case staging tables are not the best method. You already have partitioning set up on a single table, and splitting the load into many staging tables is extra work that is not needed.
I'm currently trying to build a data flow in SSIS to select all records from a mapping table where an ID column exists in the related Item table. There are two complications:
The two tables are currently in different databases on different servers.
The databases are in Azure, for which I've read Linked Servers are not supported.
To be more clear, the job migrates data from the Staging environment to Production. I only want to push lookup records into production if the associated Item IDs are already there. Here's some pseudo-T-SQL to give a clear picture of what I'm trying to achieve:
SELECT *
FROM [Staging_Server].[SourceDB].[dbo].[Lookup] L
WHERE L.[ID] IN (
SELECT P.[Item]
FROM [Production_Server].[TargetDB].[dbo].[Item] P
)
I haven't found a good way to create this in SSIS. I think I've created a work-around that involves sorting both tables and performing a merge join, but sorting both sides is an unnecessary hit on performance. I'm looking for a more direct and intuitive design for this seemingly simple data flow.
Doing this in a data flow, you'd have your source query, without the filter, fed into a Lookup component that plays the role of the subquery.
The challenge with this is that SSIS is likely on-premises, which means you are going to pull all of your data out of the staging Azure instance to the server running SSIS and then push it back to the production Azure instance.
That's a lot of network activity and as I'm reading the Azure pricing guide, I guess as long as you have the appropriate DTUs, you'd be fine. Back in the day, you were charged for Reads and not Writes so the idiom was to just push all your data to target server and then do the comparison there, much as ElendaDBA mentions. Only suggestion I'd make on the implementation is to avoid temporary tables or ad-hoc creation/destruction of them. Just implement as a physical table and truncate and reload prior to transmission to production.
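A rough sketch of that pattern, with made-up table and column names: a permanent work table on the staging server is truncated and reloaded with the production Item IDs each run, and the source query filters against it.

-- permanent work table on the staging server (created once)
CREATE TABLE dbo.ProdItemKeys (ItemID INT NOT NULL PRIMARY KEY);

-- each run: clear and reload with the current production IDs
TRUNCATE TABLE dbo.ProdItemKeys;
-- (an SSIS data flow copies the [TargetDB].[dbo].[Item] IDs into dbo.ProdItemKeys here)

-- source query for the data flow that pushes to production
SELECT L.*
FROM   [SourceDB].[dbo].[Lookup] AS L
WHERE  L.[ID] IN (SELECT K.ItemID FROM dbo.ProdItemKeys AS K);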
You could create a temp table on the staging server to copy the production data into, then write a query joining the two tables. After the SSIS package runs, you can drop the temp table on the staging server.
We have 2 servers with a database containing the same tables and structures but different data. One is the testing environment, the other one contains productive data.
We want to copy the data from the productive database to the testing database. What is the best approach to achieve this?
If I delete the data first, will I be able to insert the data? Or will the primary keys continue counting from where they were? What about inserting the primary keys for tables that use autonumbering?
It will depend on your specific needs and data structure, but here are some options to think over (prioritised in order of what I would recommend):
A simple backup and restore will be the easiest and quickest solution;
Using a data scripting tool (like Red-Gate's Data Compare) could solve your needs;
A SSIS package could be developed to pump data back and forth between the two instances; or
Write your own script using the SET IDENTITY_INSERT ON / OFF command for the identity-seeded tables (a sketch follows this list).
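A rough sketch of the last option, with placeholder server, database, table, and column names:

-- refresh one identity-seeded table on the test server from production
SET IDENTITY_INSERT dbo.Customer ON;

DELETE FROM dbo.Customer;                      -- clear the existing test rows

INSERT INTO dbo.Customer (CustomerID, Name, CreatedOn)
SELECT CustomerID, Name, CreatedOn
FROM   [ProdServer].[ProdDB].[dbo].[Customer]; -- e.g. over a linked server

SET IDENTITY_INSERT dbo.Customer OFF;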
I'm using SQL Server 2008. My database is almost 2GB in size. 90% of it is one table (as per sp_spaceused) that I don't need for most of my work.
I was wondering if it was possible to take this table and have it backed up in a separate file, allowing me to transfer the important data on a more frequent basis than this one table.
My guess is the easiest way to do this is create a new database, create the table there, copy the table contents to the new database, drop the table relationships, drop the table, create a view pointing to the other database and use that view in my applications.
However, I was wondering if you had any pointers to different strategies that I may not be aware of at this point.
Create the table in a different FileGroup.
Here's a link with some good examples.
This creates a second physical file for just that table. It can be placed on a different physical drive for performance. You can do a backup or restore of just specific filegroups, which is what it sounds like you need.
This is one example of the larger topic of "Data Partitioning", which involves various methods of dividing large tables across multiple files.
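A rough sketch of that approach (database name, file path, and table definition are placeholders):

-- add a second filegroup with its own physical file
ALTER DATABASE MyDb ADD FILEGROUP BigTableFG;
ALTER DATABASE MyDb ADD FILE (
    NAME = 'MyDb_BigTable',
    FILENAME = 'D:\SQLData\MyDb_BigTable.ndf'
) TO FILEGROUP BigTableFG;

-- create (or recreate) the large table on that filegroup
CREATE TABLE dbo.BigHistory (
    Id      INT IDENTITY(1,1) PRIMARY KEY,
    Payload VARBINARY(MAX)
) ON BigTableFG;

-- back up only the primary filegroup, leaving the big table out
BACKUP DATABASE MyDb FILEGROUP = 'PRIMARY'
    TO DISK = 'D:\Backups\MyDb_primary.bak';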
I suggest the filegroup solution. However, to copy a table from one database to another you can use this trick:
SELECT * INTO MyNewDatabase..MyTable FROM MyOldDatabase..MyTable