I have two tables in SQL Server, and both tables have the same headers, meaning the same columns. I imported them from Excel, and because the data is more than 1 million rows, I could not import it as one table: Excel only allows around one million rows per sheet.
So now I have one table with a bit under a million rows and one with about 400,000 rows, when it should really be one table.
I have them both imported into SQL Server, and I want them combined into one table, like a union.
The question is how to do it.
I just want to put one of them below the other, since the column headers are exactly the same.
What you should have done was import the first sheet, creating the table at the same time, and then import the second sheet into the existing table in a separate import process. Or, if you were using SSIS, you could have used a Union All transformation to combine the two datasets and then insert all the data into a single table.
You can, however, easily get the data into one table. Assuming you want to retain Table1 and that Table1 and Table2 do indeed have the same definitions (and don't have IDENTITY columns) you can just do the following:
INSERT INTO dbo.Table1
SELECT *
FROM dbo.Table2;
DROP TABLE dbo.Table2;
Now all your data is in one table, Table1.
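If the tables do have an IDENTITY column, here is a hedged variation (Id, Col1 and Col2 are illustrative column names, not from your actual schema): list the columns explicitly, and either let Table1 generate fresh identity values or use IDENTITY_INSERT to keep the originals.
-- Option 1: let Table1 assign new identity values (Id is regenerated).
INSERT INTO dbo.Table1 (Col1, Col2)
SELECT Col1, Col2
FROM dbo.Table2;
-- Option 2: keep the original identity values from Table2.
SET IDENTITY_INSERT dbo.Table1 ON;
INSERT INTO dbo.Table1 (Id, Col1, Col2)
SELECT Id, Col1, Col2
FROM dbo.Table2;
SET IDENTITY_INSERT dbo.Table1 OFF;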
Is there any way of converting the a.ROWID > b.ROWID comparison in the code below to Snowflake? The code below is Oracle. I need to take the ROWID to Snowflake, but Snowflake does not maintain a ROWID. Is there any way to achieve the same result and work around the ROWID issue?
DELETE FROM user_tag.user_dim_default a
WHERE EXISTS (SELECT 1
FROM rev_tag.emp_site_weekly b
WHERE a.number = b.ID
AND a.accountno = b.account_no
AND a.ROWID > b.ROWID)
This Oracle code seems very broken, because ROWID is a table-specific pseudo column, so comparing its values across two tables makes little sense. Unless there is some alignment magic happening, like rev_tag.emp_site_weekly being written at the same time rows are inserted into user_tag.user_dim_default. But even then I can imagine data flows where this will not get what you want.
So, as with most things Snowflake, "there is no free lunch": the part of the data life cycle that relies on ROWID needs to be implemented explicitly.
Which implies that if you want two sequences, you should define them explicitly, one on each table. And if you want the rows to be related to each other, it sounds like a multi-table insert or MERGE should be used, so you can access the first table's sequence value and relate it in the second.
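For illustration, here is a minimal sketch of what that could look like, assuming both tables are given an explicit load_seq column populated from a sequence at insert time (load_seq is a hypothetical column standing in for Oracle's ROWID, not part of the original schema):
-- Delete the later-loaded match, using an explicit sequence column
-- (load_seq) in place of Oracle's ROWID.
DELETE FROM user_tag.user_dim_default
USING rev_tag.emp_site_weekly b
WHERE user_tag.user_dim_default.number = b.ID
  AND user_tag.user_dim_default.accountno = b.account_no
  AND user_tag.user_dim_default.load_seq > b.load_seq;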
ROWID is an internal hidden column used by the database for specific DB operations. Depending on the vendor, you may have additional columns such as a transaction ID or a logical delete flag. Be very careful to understand the behavior of these columns and how they work. They may not be in order, they may not be sequential, and they may change in value while a database maintenance job runs alongside your code, or while someone else runs an update on the table. Some of these internal columns may even hold the same value for more than one row.
When joining tables, the ROWID on one table has no relation to the ROWID on another table. When writing dedup logic or delete-before-insert logic, you should use the primary key, combined with an audit column that holds the date of insert or date of last update. Check the data model or ERD diagram for the PK/FK relationships between the tables and what audit columns are available.
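As a hedged sketch of that dedup pattern in Snowflake, keeping only the newest row per business key (last_update_dt is a hypothetical audit column; note this rebuilds the table, so grants and constraints would need re-checking):
-- Keep one row per (number, accountno), preferring the latest update.
CREATE OR REPLACE TABLE user_tag.user_dim_default AS
SELECT *
FROM user_tag.user_dim_default
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY number, accountno
    ORDER BY last_update_dt DESC) = 1;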
I have a Lookup Transformation on a table with 30 columns, but I am only using two of them: the ID column for the join and the Update column as output.
On the connection, should I enter a query (Select ID, Update From T1) or use the table in the drop-down?
Would using the table in the drop-down be like doing Select * From T1, or is SSIS clever enough to know I only need 2 columns?
I'm thinking I should go with the query Select ID, Update From T1.
On the connection should I enter a query Select ID, Update From T1 or Use Table in the drop down?
It is best to specify which columns you want.
Using table in Drop down, would this be like doing Select * From T1
Yes, it is a SELECT *.
or is SSIS clever enough to know I only need 2 columns?
Nope.
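One side note if you go with the query: Update is a reserved keyword in T-SQL, so it is safest to bracket it, for example:
-- Pull only the two columns the Lookup needs; [Update] is bracketed
-- because UPDATE is a reserved word.
SELECT ID, [Update]
FROM T1;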
Keep in mind that Lookups are good for pulling data from dimension tables, where the row count and record set are small. If you are dealing with large amounts of unique data, it is better to perform a MERGE JOIN instead. The performance difference can be substantial: a Lookup on 20K rows of data can run in the tens of minutes, while a MERGE JOIN runs within seconds.
Lookups have the drawback of behaving like correlated sub-queries, in that they fire off a query to the server for every row passing through them. You can have the Lookup cache the data, which means SSIS will store the results in memory and check the memory before going to the server for all subsequent rows passing through the Lookup. As a result, caching is only effective when a small cache set matches a large number of incoming records; Lookups are not optimal when there is a large number of distinct IDs to look up, and at that point caching the data is almost pointless.
This is where you would switch over to using a MERGE JOIN. Note: you will need to perform a SORT on both of the data flows before the MERGE JOIN because the MERGE JOIN component requires the incoming rows to be sorted.
When handled incorrectly, a single poorly placed Lookup can bring an entire package to its knees; Lookups can be huge performance bottlenecks. Handled correctly, though, a Lookup can simplify the design of the dataflow and speed development by removing the extra work required to MERGE JOIN data flows.
The bottom line to all of this is that you want the Lookup performing the fewest number of queries against the server.
If you need only two columns from the lookup table, then it is better to use a select query than to select the table from the drop-down list, but the columns specified must contain the primary key (ID), because reading all columns will consume more resources, even if the effect may not be meaningful on small tables.
You can refer to the following answer on the Database Administrators community for more information:
SSIS OLE DB Source Editor Data Access Mode: “SQL command” vs “Table or view”
Note that what @JWeezy mentioned about lookups on large tables is right. Lookups are not designed for large tables; I would use SQL JOINs instead.
I am new to SSIS and I hope someone can point me in the right direction!
I need to move data from one database to another. I have written a query that takes data from a number of tables (SOURCE). I then use a conditional split (condition: Id = id) to route rows into a number of tables in the destination database. Here is my problem: I need to populate another table that takes the 'id' values from the three destination tables and uses them as attributes in a fourth table, along with additional data from SOURCE.
I think I need to pass the id values as parameters, but there does not seem to be a way to do this when inserting with an ADO NET Destination.
The fourth table will hold the inserted id values (auto-incremented) from table1, table2 and table3.
Am I going about this correctly or is there a better way?
Thanks in advance!
I know of no way to get the IDENTITY values of rows inserted in a Dataflow destination for use in the same Dataflow.
Probably the way to do what you want is to make a fourth branch in your dataflow, inserting the columns that you do have into the fourth table and leaving the foreign keys (the ids from the other 3 tables) blank.
Then after the Dataflow, use an ExecuteSQL task to call a stored procedure that populates the missing columns in the fourth table by looking up their ids in the other three tables.
If your fourth table doesn't have the values you need to lookup the ids in the other three tables, then you can have the dataflow go to a staging table that does have those values, and populate the fourth table from the staging table while looking up the ids from the corresponding values.
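For illustration, here is a rough sketch of such a stored procedure, assuming the fourth (or staging) table carries the natural-key values needed for the lookups; every table and column name here is hypothetical:
CREATE PROCEDURE dbo.FillFourthTableIds
AS
BEGIN
    -- Look up the identity values generated in the three parent tables
    -- and fill in the foreign keys the dataflow left blank.
    UPDATE f
    SET f.Table1Id = t1.Id,
        f.Table2Id = t2.Id,
        f.Table3Id = t3.Id
    FROM dbo.FourthTable AS f
    JOIN dbo.Table1 AS t1 ON t1.NaturalKey1 = f.NaturalKey1
    JOIN dbo.Table2 AS t2 ON t2.NaturalKey2 = f.NaturalKey2
    JOIN dbo.Table3 AS t3 ON t3.NaturalKey3 = f.NaturalKey3
    WHERE f.Table1Id IS NULL;
END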
I have a database with 51 tables all with the same schema (one table per state). Each table has a couple million rows and about 50 columns.
I've normalized the columns into 6 other tables, and now I want to import all of the data from those 51 tables into the 6 new tables. The column names are all the same, and so I'm hoping I can automate the process of importing all the data.
I'm assuming what I'll need to do is:
Select the names of all the tables that are in the raw schema
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'raw'
Iterate over all the results
Grab all rows from that table, and insert the appropriate columns into the appropriate new tables
Delete the rows from the raw table
Is there anything I'm missing? Also, is there any way to have this run on the SQL Server so I don't have to have my SQL Server Management Studio open the whole time?
Yes, obviously, you can automate it with T-SQL (a sketch follows after the list below), but I recommend you use SSIS in this case. As you say, the structure of all the tables is the same, so you can build one ETL process and then just change the table name in the source. Consequently, you will have the following advantages:
Solve the issue with a couple of clicks
Low risk of errors
You will be able to use the built-in data transformations
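If you do go the T-SQL route instead, here is a minimal sketch of the loop (dbo.Target and the column list are illustrative names); you can schedule it, or the SSIS package, as a SQL Server Agent job so Management Studio does not have to stay open:
-- Iterate over the raw tables, move their rows, then clear each one.
DECLARE @name sysname, @sql nvarchar(max);
DECLARE tbls CURSOR LOCAL FAST_FORWARD FOR
    SELECT TABLE_NAME
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = 'raw';
OPEN tbls;
FETCH NEXT FROM tbls INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Insert then delete; wrap in a transaction if you need all-or-nothing.
    SET @sql = N'INSERT INTO dbo.Target (Col1, Col2) '
             + N'SELECT Col1, Col2 FROM raw.' + QUOTENAME(@name) + N'; '
             + N'DELETE FROM raw.' + QUOTENAME(@name) + N';';
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM tbls INTO @name;
END
CLOSE tbls;
DEALLOCATE tbls;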
I have one HBase table named Table1, which has rows T1, T2, T3, ..., Tn; the new table is named Table2. How can I copy all the data in (T1, T3, T5, ...) from Table1 to Table2? Getting the rows one by one and putting them into the new table is too slow.
CopyTable is one utility that runs MapReduce to copy a table, and it is much faster. However, it does not support a selective copy (odd rows, in your case); it only supports a time range for copying portions of an HBase table. So one option could be to run CopyTable first to copy all the data and then delete the unwanted rows one by one. Another option could be to use Hive, if you don't want to deal with HBase tables directly and are more comfortable with SQL.
Example of CopyTable -
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=sample_new sample_old
Here sample_old is the table to be copied and sample_new is the new table.
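If you take the Hive route instead, here is a very rough sketch of the selective copy. It assumes both HBase tables are already mapped as Hive tables named table1_hive and table2_hive, that the row key is exposed as a column named rowkey, and that the keys really look like T1, T2, ...; all of these are assumptions, not givens:
-- Copy only the odd-numbered row keys between the Hive-mapped tables.
INSERT INTO TABLE table2_hive
SELECT *
FROM table1_hive
WHERE CAST(substr(rowkey, 2) AS INT) % 2 = 1;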