I would like to copy all the data from a table in one DB to a table in another DB.
I have created two connections, but I am not sure what I need to do regarding the actual SQL to make this happen.
$one = DB::connection('mysql');
$two = DB::connection('mysql_2');
DB::statement("CREATE TABLE {$one}.products LIKE {$two}.products;");
Not completely unexpectedly, the response to this is
Object of class Illuminate\Database\MySqlConnection could not be converted to string
Is this even possible between two DBs?
Besides inserting each row into the new table, what are some other options for copying a large set of data from one DB to another?
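For reference, if both schemas happened to live on the same MySQL server, I believe the copy could be done in plain SQL along these lines (the database names here are placeholders), but I am not sure how to express something like that across two Laravel connections:
-- assumed layout: both schemas on one MySQL server; db_one / db_two are placeholders
CREATE TABLE db_one.products LIKE db_two.products;
INSERT INTO db_one.products
SELECT * FROM db_two.products;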
I'm SUPER new to Snowflake and Snowpark, but I do have respectable SQL and Python experience. I'm trying to use Snowpark to do my data prep and eventually use it in a data science model. However, I cannot write to the database I'm pulling from -- I need to create all tables in a second DB.
I've created code blocks to represent both input and output DBs in their own sessions, but I'm not sure that's helpful, since I have to be in the first session in order to even get the data.
I use code similar to the following to create a new table while in the session for the "input" DB:
my_table = session.table("<SCHEMA>.<TABLE_NAME>")
my_table.to_pandas()
table_info = my_table.select(col("<col_name1>"),
                             col("<col_name2>"),
                             col("<col_name3>").alias("<new_name>"),
                             col("<col_name4>"),
                             col("<col_name5>"))
table_info.write.mode('overwrite').save_as_table('MAINTABLE')
I need to save the table MAINTABLE to a secondary database that is different from the one where the data was pulled from. How do I do this?
It is possible to provide a fully qualified name:
table_info.write.mode('overwrite').save_as_table('DATABASE_NAME.SCHEMA_NAME.MAINTABLE')
From the DataFrameWriter.save_as_table documentation:
Parameters:
table_name – A string or list of strings that specify the table name or fully-qualified object identifier (database name, schema name, and table name).
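If you ever want the plain-SQL equivalent, a CTAS with the fully qualified target name does the same thing; the database, schema, and column names below are placeholders:
-- placeholders: TARGET_DB / TARGET_SCHEMA and SOURCE_DB.SOURCE_SCHEMA.SOURCE_TABLE
CREATE OR REPLACE TABLE TARGET_DB.TARGET_SCHEMA.MAINTABLE AS
SELECT col_name1,
       col_name2,
       col_name3 AS new_name,
       col_name4,
       col_name5
FROM SOURCE_DB.SOURCE_SCHEMA.SOURCE_TABLE;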
I have source and destination tables. The source table, on one server, has almost 30 million records. Now I want to copy this data to another table on another server. And this copy is not a one-time operation: whenever the data changes in the source table, I need to insert/update/delete in the destination by comparing on a key.
Solutions that I tried
Step 1. There is already a linked connection between the two servers. To insert data into the destination table, I used the OPENROWSET function like this to copy data from one server to the other:
INSERT INTO dbo.DestinationTable
SELECT *
FROM OPENROWSET('SQLOLEDB', '<provider string>',
                '<query from source table>')
Step 2. After this, to apply the recent changes (delta mode) to the destination table, I use a MERGE statement (a simplified sketch is below).
I have created a procedure for steps 1 & 2.
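For reference, the MERGE is roughly of this shape (the key and column names are placeholders, and I am simplifying how the remote rows are pulled across):
MERGE INTO dbo.DestinationTable AS tgt
USING OPENROWSET('SQLOLEDB', '<provider string>',
                 'SELECT KeyColumn, Col1, Col2 FROM SourceDb.dbo.SourceTable') AS src
    ON tgt.KeyColumn = src.KeyColumn
WHEN MATCHED AND (tgt.Col1 <> src.Col1 OR tgt.Col2 <> src.Col2) THEN
    UPDATE SET Col1 = src.Col1, Col2 = src.Col2
WHEN NOT MATCHED BY TARGET THEN
    INSERT (KeyColumn, Col1, Col2) VALUES (src.KeyColumn, src.Col1, src.Col2)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;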
Problem
But the problem is that, since the data is huge, it is taking a lot of time (more than 2 hours) for the insert and for the MERGE statement.
Is anybody aware of how I can achieve this in less time? Please suggest.
Thanks
Do you know how to transfer only new records between two different databases (i.e. Oracle and MSSQL) using SSIS? There is no problem transferring only new data between two tables in the same database and server, but is it possible to do such an operation between completely different servers and databases?
PS. I know about the solution using Lookup, but it is not very efficient if you need to check and add a lot of records (50k and more) several times per day. I would like to operate on new data only.
You have several options:
Timestamp based solution
If you have a column which stores the insertion time in the source system, you can select only the new records created since the last load. With the same logic you can transfer modified records too: just stamp the records with the current timestamp whenever they change.
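A sketch of the source query for this, assuming the source table has a ModifiedAt column and the last load time is kept in a control table (all names here are placeholders; in SSIS the value would typically end up in a package variable):
-- sketch only: LoadControl, LoadedUpTo, SourceTable and ModifiedAt are placeholder names
DECLARE @LastLoadTime datetime
SET @LastLoadTime = (SELECT MAX(LoadedUpTo) FROM dbo.LoadControl)

SELECT *
FROM dbo.SourceTable
WHERE ModifiedAt > @LastLoadTime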
Sequence based solution
If there is a sequence in the source table, you can load the new records based on that sequence. Query the last value from the destination system, then load everything which is larger than that value.
CDC based solution
If you have CDC (Change Data Capture) in your source system, you can track the changes and you can load them based on the CDC entries.
Full load
This is the most resource-hungry solution: you have to copy all data from the source to the destination. If you do not have any column which marks the new records, you should use this solution.
You have several options to achieve this:
TRUNCATE the destination table and reload it from source
Use a Lookup component to determine which records are missing
Load all data from source to a temporary table and write a query which retrieves the new/changed records.
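A sketch of that last option (staging the data, then querying for the new rows), assuming Id is the business key; all names here are placeholders:
-- insert only the rows that the destination does not have yet
INSERT INTO dbo.DestinationTable (Id, Col1, Col2)
SELECT s.Id, s.Col1, s.Col2
FROM dbo.StagingTable AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.DestinationTable AS d
                  WHERE d.Id = s.Id);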
Summary
If you have at least one column which marks the new/modified records, you can use it to implement a differential/incremental load with SSIS. If you do not have any clue which columns/rows have changed, you have to load (or at least query) all of them.
There is no one-query (INSERT .. SELECT) solution across multiple servers that avoids transferring all the data. (Please note that a multi-server query using Linked Servers still transfers the data from the source system.)
What about variables? Is it possible to use the same variable between different databases and servers in SSIS?
I would like to take the last id number from a destination table and pass it over to the source table (different server!).
I can set a variable within the scope of a single database like this:
DECLARE @Last int
SET @Last = (SELECT TOP 1 Id FROM dbo.Table_1 ORDER BY Id DESC)
SELECT *
FROM dbo.Table_2
WHERE Id > @Last;
However, it works only between two tables in the same database (as a SQL command). I can create a variable for an entire SSIS package in Variables --> Add variable, but I don't know whether it is possible to use the variable in a similar way as above - to keep the last id from the destination table and pass it to the source server as a lower limit for the data.
I have a desktop application through which data is entered; it is captured in an MS Access DB. The application is used by multiple users (at different locations). The idea is to download the data entered for that particular day into an Excel sheet and load it into a centralized server, which is an MSSQL Server instance.
i.e. data (in the form of Excel sheets) will come from multiple locations and be saved into a shared folder on the server, and this data needs to be loaded into SQL Server.
There is an ID column with IDENTITY in the MSSQL Server table, which is the primary key column, and there are no other columns in the table which contain unique values. Though the data is coming from multiple sources, we need to maintain a single auto-incrementing series (IDENTITY).
Suppose, if there are 2 sources,
Source1: Has 100 records entered for the day.
Source2: Has 200 records entered for the day.
When they get loaded into the destination (SQL Server), the table should have 300 records, with ID column values from 1 to 300.
Also, for the next day, when the data comes in from the sources, the destination has to continue loading from ID 301.
The issue is that there may be requests to change data at the source which has already been loaded into the central server. So how do I update that row in the central server, given that the ID column value will not be the same in the source and the destination? As mentioned earlier, ID is the only unique-value column in the table.
Please suggest some ideas for doing this, or whether I have to take a different approach to accomplish this task.
Thanks in advance!
Krishna
Okay, so first I would suggest .NET and doing it through a file stream reader, dumping it into the disconnected layer of ADO.NET in a DataSet with multiple DataTables from the different sources. But... you mentioned SSIS, so I will go that route.
Create an SSIS project in Business Intelligence Development Studio (BIDS).
If you know for a fact you are just importing a bunch of Excel files, you can either create many 'Data Flow Tasks' or put many source-to-destination flows in a single 'Data Flow Task' - up to you.
a. Personally, I would create a table in a database for each Excel file location and have their columns map up. I will explain why later.
b. In a data flow task, select 'Excel Source' as the source. Set the appropriate file location in a new connection by double-clicking the Excel Source.
c. Choose an ADO NET Destination and drag the blue line from the Excel Source to this endpoint.
d. Map your destination to the corresponding SQL table.
e. Repeat as needed for each Excel source.
Set up the SSIS package to run automatically from SQL Server through SQL Server Management Studio. Remember to connect to an Integration Services instance, not a database instance.
Okay, now you have a bunch of tables instead of one big one, right? I did that for a reason: these should be entry points, and the logic to determine dupes and import time I would leave to other tables.
I would set up another two tables for the combination of logic and for auditing later.
a. Create a table like 'Imports' or similar. Have the columns be the same as the staging tables, except add two more columns: 'ExcelFileLocation' and 'DateImported'. Also create an 'identity' column as the first column, have it seed on the default of (1,1), and make it the primary key.
b. Create a second table like 'ImportDupes' or similar, repeat the process above for the columns.
c. Create a unique constraint on the first table, on the value or set of values that makes an import row unique.
d. Write a procedure in SQL to do inserts from the MANY tables that match up to the Excel files into the ONE 'Imports' table. In each of those inserts, do a process similar to:
BEGIN TRY
    INSERT INTO Imports (datacol1, datacol2, ExcelFileLocation, DateImported)
    SELECT datacol1, datacol2, '<location of file>', GETDATE()
    FROM TableExcel1
END TRY
-- if the insert breaks the unique constraint, put the rows into the second table
BEGIN CATCH
    INSERT INTO ImportDupes (datacol1, datacol2, ExcelFileLocation, DateImported)
    SELECT datacol1, datacol2, '<location of file>', GETDATE()
    FROM TableExcel1
END CATCH

-- repeat the above for EACH Excel staging table

-- clean up the individual staging tables for the next import cycle, for EACH Excel table
TRUNCATE TABLE TableExcel1
e. Schedule the procedure to run automatically.
You now have two tables, one for successful imports and one for duplicates.
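For concreteness, the 'Imports' and 'ImportDupes' tables described above might look roughly like this; the data columns are placeholders for whatever the Excel files actually contain:
-- sketch only: DataCol1/DataCol2 stand in for the real Excel columns
CREATE TABLE dbo.Imports (
    ImportId          int IDENTITY(1,1) PRIMARY KEY,
    DataCol1          varchar(100),
    DataCol2          varchar(100),
    ExcelFileLocation varchar(260),
    DateImported      datetime DEFAULT (GETDATE()),
    CONSTRAINT UQ_Imports_Data UNIQUE (DataCol1, DataCol2)  -- whatever makes an import row unique
);

CREATE TABLE dbo.ImportDupes (
    ImportDupeId      int IDENTITY(1,1) PRIMARY KEY,
    DataCol1          varchar(100),
    DataCol2          varchar(100),
    ExcelFileLocation varchar(260),
    DateImported      datetime DEFAULT (GETDATE())
);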
The reason I did what I did is twofold:
A lot of the time you need to know more than just the data itself: when it came in, what source it came from, whether it was a duplicate, and, if you do this for millions of rows, whether it can be indexed easily.
This model is easier to take apart and automate. It may be more work to set up, but if a piece breaks you can see where, and you can easily stop the import for one location by turning off the code in that section.
As usual, some background information first:
Database A (Access database) - Holds a table that has information I need from only two columns. The information from these two columns is needed for an application that will be used by people that cannot access database A.
Database B (Access database) - Holds a table that contains only two columns (mirroring what we need from the table in Database A). Database B is accessible to all users of the application. One issue is that one of the column names is not the same as it is in the table from Database A.
What I need to do is transfer the necessary data via a utility that will run automatically, say once a week (the two databases don't need to be totally in sync, just close). The transfer utility will be run from a user account that has access to both databases (obviously).
Here's the approach I've taken (again if there is a better way, please suggest away):
Grab the data from database A. It is only the two columns from the necessary table.
Write the data out to a [tablename].txt file using a DataReader object and a StreamWriter object. I've done this so I can use a schema.ini file and force the data columns to have the same names as they will have in Database B.
Create a DataSet object, containing a DataTable that mirrors the table from Database B.
Suck the information from the .txt file into the DataTable using the Microsoft.Jet.OLEDB.4.0 provider with extended properties of text, hdr=yes and fmt=delimited (to match how I have the schema.ini file setup and the .txt file setup). I'm using a DataAdapter to fill the DataTable.
Create another DataSet object, containing a DataTable that mirrors the table from Database B.
Suck in the information from Database B so that it contains all the current data found in the table that needs to be updated from Database A. Again I'm using a DataAdapter to fill this DataTable (a different one from Step 5, since they are both using different data sources).
Merge the DataTable that holds the data from Database A (or the .txt file, technically) into the DataTable that mirrors Database B.
Update Database B's table with the changes.
I've written update, delete and insert commands manually for the DataAdapter that is responsible for talking to Database B. However, this logic is never used because the DataSet-From-Database-B.Merge(Dataset-From-TxtFile[tableName]) doesn't flip the HasChanges flag. This means the DataSet-From-Database-B.Update doesn't fire any of the commands.
So is there any way I can get the data from DataSet-From-TxtFile to merge and apply to Database B using the method I'm using? Am I missing a crucial step here?
I know I could always delete all the records from Database B's table and then just insert all the records from the text file (even if I had to loop through each record in the DataSet and apply row.SetAdded to ensure it triggers the HasChanges flag), but I'd rather have it apply ONLY the changes each time.
I'm using C# and the .NET 2.0 Framework (which I realize means I can use DataTables and TableAdapters instead of DataSets and DataAdapters since I'm only dealing with a single table, but anyway).
TIA
Setting aside for a moment that I would use SQL Server and have only a single table, with multiple views controlling who could see what information in it, to avoid the whole synchronization problem...
I think that #Mitchel is correct here. Just write a program that connects to both databases and loads table A and table B, respectively. Then, for each element (column pair) in A, make sure it is in B; if not, insert it into B. Then, for each element in B, make sure it is in A; if not, remove it from B. Then save B. I don't see the need to go to a file first.
Pseudocode:
DataTable A = load table from A
DataTable B = load table from B

foreach row in A
    col1 = row[col1]
    col2 = row[col2]
    matchRow = B.select("col1 = " + col1 + " and col2 = " + col2)
    if no matchRow exists
        add new row to B with col1, col2
    end
end

foreach row in B
    col1 = row[col1]
    col2 = row[col2]
    matchRow = A.select("col1 = " + col1 + " and col2 = " + col2)
    if no matchRow exists
        remove row from B
    end
end

update B
Why not simply use a data reader and loop through the records, doing manual inserts into database B where needed, rather than working with datasets, merging, etc.?