I have a table with a few columns: Name, stage1, stage2, stage3, stage4, stage5. I want to insert values into these columns, but each time a new row is inserted into the table, the number of stages to be entered for that row is undefined. For example:
Suppose in row 1 only stage1 and stage2 are defined, while in row 2 stage1, stage2, stage3, and stage4 are defined, and so on.
Problem
I am not able to insert a new row in the table because of the uneven distribution of values for each name.
You basically want to use a relational database for unstructured data; I assume this because you tagged SQL Server.
This is what document databases and NoSQL databases were designed for.
However, you can emulate this if you want by storing the data in a single column. Within that column you can store either JSON or XML. JSON is going to be hard to query, but SQL Server supports an XML column type that you can query with XPath.
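For example, a hedged sketch of the XML approach (the table, column, and element names are assumptions, not from the question):

-- Hypothetical table: one row per name, variable number of <stage> elements.
CREATE TABLE dbo.NameStages
(
    Name   nvarchar(100) NOT NULL,
    Stages xml           NULL
);

INSERT INTO dbo.NameStages (Name, Stages)
VALUES (N'Alice', N'<stages><stage n="1">10</stage><stage n="2">20</stage></stages>'),
       (N'Bob',   N'<stages><stage n="1">5</stage><stage n="2">7</stage><stage n="3">9</stage><stage n="4">11</stage></stages>');

-- Pull out an individual stage with XQuery/XPath:
SELECT Name,
       Stages.value('(/stages/stage[@n="2"])[1]', 'int') AS Stage2
FROM dbo.NameStages;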
The other option is to rotate your data 90 degrees. Instead of each stage being a column (stage1, stage2, stage3, ...), create a row for each stage; each row would have a stageNumber field or some such, as sketched below. You could later pivot this data to display it with stages as columns in Excel, a pivot table, or whatever.
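A minimal sketch of that normalized design (all names here are illustrative):

CREATE TABLE dbo.NameStage
(
    Name        nvarchar(100) NOT NULL,
    StageNumber int           NOT NULL,
    StageValue  int           NOT NULL,
    CONSTRAINT PK_NameStage PRIMARY KEY (Name, StageNumber)
);

-- Each name gets exactly as many rows as it has defined stages:
INSERT INTO dbo.NameStage (Name, StageNumber, StageValue)
VALUES (N'Alice', 1, 10), (N'Alice', 2, 20),
       (N'Bob', 1, 5), (N'Bob', 2, 7), (N'Bob', 3, 9), (N'Bob', 4, 11);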
Related
Is there any way to convert the final a.ROWID > b.ROWID predicate in the code below to Snowflake? The code below is Oracle, and I need to carry its ROWID logic over to Snowflake, but Snowflake does not maintain a ROWID. Is there any way to achieve the same result and work around the ROWID issue?
DELETE FROM user_tag.user_dim_default a
WHERE EXISTS (SELECT 1
              FROM rev_tag.emp_site_weekly b
              WHERE a.number = b.ID
                AND a.accountno = b.account_no
                AND a.ROWID > b.ROWID)
This Oracle code seems very broken, because ROWID is a table-specific pseudo column, so comparing its value between two tables is meaningless. Unless there is some aligned magic happening, like every insert into user_tag.user_dim_default also writing to rev_tag.emp_site_weekly; but even then I can imagine data flows where this will not get what you want.
As with most things Snowflake, "there is no free lunch", so the data life cycle that is relying on ROWID needs to be implemented explicitly.
Which implies that if you want to use two sequences, you should define one explicitly on each table. And if you want the rows to be related to each other, it sounds like a multi-table insert or MERGE should be used, so you can access the first table's sequence value and relate it in the second.
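A hedged sketch of that multi-table insert idea in Snowflake (the sequence name, the staging table, and its columns are assumptions; only the two target tables and their key columns come from the question):

-- One sequence value is generated per source row and written to both
-- tables, so the rows stay related without any ROWID.
CREATE SEQUENCE shared_seq;

INSERT ALL
    INTO user_tag.user_dim_default (number, accountno) VALUES (new_id, acct)
    INTO rev_tag.emp_site_weekly   (ID, account_no)    VALUES (new_id, acct)
SELECT shared_seq.nextval AS new_id,
       s.acct
FROM my_staging_table AS s;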
ROWID is an internal hidden column used by the database for specific internal operations. Depending on the vendor, you may have additional columns such as a transaction ID or a logical delete flag. Be very careful to understand the behavior of these columns and how they work. They may not be in order, they may not be sequential, and they may change in value as a database maintenance job runs while your code is running, or while someone else runs an update on the table. Some of these internal columns may even hold the same value for more than one row.
When joining tables, the ROWID on one table has no relation to the ROWID on another table. When writing dedup logic or delete-before-insert logic, you should use the primary key, combined with an audit column that holds the date of insert or date of last update. Check the data model or ERD diagram for the PK/FK relationships between the tables and which audit columns are available.
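As a hedged illustration against the question's table (the last_update_dt audit column is an assumption, and it must be NOT NULL for NOT IN to behave as intended):

-- Keep only the most recent row per primary key, deleting older duplicates.
DELETE FROM user_tag.user_dim_default
WHERE (number, accountno, last_update_dt) NOT IN
      (SELECT number, accountno, MAX(last_update_dt)
       FROM user_tag.user_dim_default
       GROUP BY number, accountno);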
I am new to SSIS and have been tasked with taking records from a source table and inserting new or updating existing records in the target. I estimate in the region of 10-15 records per day at most.
I have investigated the Lookup Transformation object and this looks like it will do the job.
Both source and target table columns are identical. However, there is no unique key/value in target or source to perform the lookup on. The existing fields I have to work with and do the lookup on are Date or Load_Date...
How can I add only new records or update existing records in the target without a key/ID field? Do I really need to add another ID column to compare source and target with? And if so, can someone tell me how to do this? The target table must not have a key/ID field, so if one were used, it would need to be dropped after any inserts/updates are done.
Could this be achieved using Load_Date? Currently all Load_Date values are NULL in the target table, so would it be as simple as matching on Load_Date: if Load_Date is already populated, do not load; if Load_Date is NULL, load the record?
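For reference, the Load_Date idea corresponds roughly to this T-SQL (dbo.SourceTable, dbo.TargetTable, and the non-key columns are placeholders):

-- Insert only source rows whose Load_Date is not already present in
-- the target; rows whose Load_Date matches an existing one are skipped.
INSERT INTO dbo.TargetTable (Col1, Col2, [Date], Load_Date)
SELECT s.Col1, s.Col2, s.[Date], s.Load_Date
FROM dbo.SourceTable AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.TargetTable AS t
                  WHERE t.Load_Date = s.Load_Date);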
Thanks.
I have two tables in SQL Server, and both of those tables have the same headers, meaning the same columns. But since I added them from Excel, I was not able to import them as one table, because the data is more than 1 million rows.
So now I have one table with a bit less than a million rows and one with about 400,000 rows, and they really should be one table, but Excel only allows around one million rows per sheet.
I have them both imported into SQL Server and I want them combined into one table, like a union.
The question is how to do it.
I just want to append one of them below the other, since the column headers are exactly the same.
What you should have done was import the first sheet and create the table at the same time, then import the second sheet into the existing table in a separate import process. Or, if you were using SSIS, you could have used a Union All transformation to combine the two datasets into one and then insert all the data into a single table.
You can, however, easily get the data into one table. Assuming you want to retain Table1, and that Table1 and Table2 do indeed have the same definitions (and don't have IDENTITY columns; see the variant below if they do), you can just do the following:
-- Append all rows from Table2 to Table1; identical column definitions
-- make SELECT * safe here.
INSERT INTO dbo.Table1
SELECT *
FROM dbo.Table2;

-- Once the rows are copied, the second table is no longer needed.
DROP TABLE dbo.Table2;
Now all your data is in one table, Table1.
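If Table1 does have an IDENTITY column, a hedged variant (the Id, Col1, and Col2 column names are assumptions) preserves the imported values:

-- IDENTITY_INSERT allows explicit values into the IDENTITY column;
-- an explicit column list is required while it is ON.
SET IDENTITY_INSERT dbo.Table1 ON;

INSERT INTO dbo.Table1 (Id, Col1, Col2)
SELECT Id, Col1, Col2
FROM dbo.Table2;

SET IDENTITY_INSERT dbo.Table1 OFF;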
I have two tables.
1) A staging table with multiple columns, Date being one of them:
Date
9/1/2018
2) A date dimension table which has only one column, called Date:
Date
1/1/2018
2/1/2018
3/1/2018
4/1/2018
I am writing logic in SSIS that checks the staging table against the dimension table and inserts the missing dates into the dimension table.
To do this, I use the following logic.
The Lookup component receives the correct one-row input from the staging table but returns a value of NULL, so the insertion fails due to constraints.
I do have "Redirect rows to no match output" enabled inside the Lookup on screen 1.
Kindly help me with this.
The solution is to change the Lookup operation to "Add as new column":
That's not the problem. The problem is when a date gets past the Lookup and is a duplicate.
Run it through an Aggregate (Group By) transformation on the date column before inserting into the dimension.
Make sure you are using the correct date. There will be no Lookup matches for the records you want to insert (therefore that column is NULL by default). You shouldn't even add any columns from the Lookup for this use.
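In plain T-SQL, the combined dedup-and-insert the answers describe is roughly this (dbo.Staging and dbo.DateDimension are placeholder names):

-- DISTINCT plays the role of the Aggregate/Group By; NOT EXISTS plays
-- the role of the Lookup's "no match" output.
INSERT INTO dbo.DateDimension ([Date])
SELECT DISTINCT s.[Date]
FROM dbo.Staging AS s
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.DateDimension AS d
                  WHERE d.[Date] = s.[Date]);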
I am new to SSIS and I hope someone can point me in the right direction!
I need to move data from one database to another. I have written a query that takes data from a number of tables (SOURCE). I then use a Conditional Split (condition: Id == id) to route rows to a number of tables in the destination database. Here is my problem: I need another table populated which takes the 'id' values from the three tables and uses them in a fourth table as attributes, along with additional data from SOURCE.
I think I need to pass the id values as parameters, but there does not seem to be a way to do this when inserting with an ADO NET Destination.
The fourth table will have the inserted id values (auto-incremented) from table1, table2, and table3.
Am I going about this correctly or is there a better way?
Thanks in advance!
I know of no way to get the IDENTITY values of rows inserted in a Dataflow destination for use in the same Dataflow.
Probably the way to do what you want is to add a fourth branch in your Dataflow, inserting the columns that you have into the fourth table and leaving the foreign keys (the ids from the other three tables) blank.
Then, after the Dataflow, use an Execute SQL Task to call a stored procedure that populates the missing columns in the fourth table by looking up the ids in the other three tables.
If your fourth table doesn't have the values you need to look up the ids in the other three tables, then you can have the Dataflow write to a staging table that does have those values, and populate the fourth table from the staging table while looking up the ids from the corresponding values.
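A minimal sketch of that fix-up procedure (every table and column name here, Table1 through Table4 and the Id/BusinessKey columns, is an assumption for illustration):

CREATE PROCEDURE dbo.PopulateTable4Ids
AS
BEGIN
    SET NOCOUNT ON;

    -- Fill in the ids the Dataflow left blank by matching on the
    -- business keys that were carried through to Table4.
    UPDATE t4
    SET t4.Table1Id = t1.Id,
        t4.Table2Id = t2.Id,
        t4.Table3Id = t3.Id
    FROM dbo.Table4 AS t4
    JOIN dbo.Table1 AS t1 ON t1.BusinessKey1 = t4.BusinessKey1
    JOIN dbo.Table2 AS t2 ON t2.BusinessKey2 = t4.BusinessKey2
    JOIN dbo.Table3 AS t3 ON t3.BusinessKey3 = t4.BusinessKey3
    WHERE t4.Table1Id IS NULL;   -- only rows still missing their ids
END;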