What to do when we may need to save slave data first - database

In a one-to-many relationship, what's the best way to handle data so it's flexible enough for the user to save the slave data before saving the master table data?
reserving the row ID of the master so that I can save the slave data with the reserved master id
saving the slave data in a temporary table, so that when we save the master data we can "import" the data from the temporary table
other??
Example: in a ticket/upload-multiple-files form where the user has the possibility to upload the files before sending the ticket information:
Master table
PK
ticket description
Slave table
PK
Master_FK
File

Are your ids autogenerated?
You have several choices all with possible problems.
First, don't define a FK relationship. But then how do you account for records in a partial state and those that never get married up to the real record? And how do you intend to marry up the records when the main record is inserted?
Second, insert a record into the master table first where everything is blank except the id. This pushes enforcement of all required fields onto the user application, which I'm not wild about from a data integrity standpoint.
Third, and most complex but probably safest: use 3 tables. Create the master record in a table that contains only the master record id, and return that to your application on opening the form to create a new record. Create a pk/fk relationship from both the original master table and the foreign key table to this new table. Remove the autogeneration of the id from the original master table and insert the id from the new master table instead when you insert the record. Insert the new master table id when you insert records into the original FK table as well. At least this way you can continue to have all the required fields marked as required in the database, but the relationship is between the new table and the other table, not between the original table and the other table. This won't affect querying (as long as you have proper indexing), but it will make things more complicated if you delete records, as you could leave some hanging if you aren't careful. Also, you would have to consider whether there are other processes (such as data imports from another source) that insert records into the main table; these would have to be adjusted, as the id would no longer be autogenerated.
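A minimal T-SQL sketch of that third option, using hypothetical names taken from the ticket example in the question (TicketId as the new id-only table, Ticket as the original master, TicketFile as the slave):

-- TicketId exists only to reserve ids; Ticket holds the real data.
CREATE TABLE TicketId (
    Id INT IDENTITY(1,1) PRIMARY KEY          -- the only autogenerated id
);
CREATE TABLE Ticket (
    Id INT NOT NULL PRIMARY KEY
        REFERENCES TicketId (Id),             -- no longer autogenerated here
    TicketDescription VARCHAR(500) NOT NULL   -- required fields stay required
);
CREATE TABLE TicketFile (
    Id INT IDENTITY(1,1) PRIMARY KEY,
    TicketId INT NOT NULL REFERENCES TicketId (Id),  -- FK targets the id table
    FileName VARCHAR(260) NOT NULL
);
-- On opening the form: reserve an id and hand it to the application.
INSERT INTO TicketId DEFAULT VALUES;
SELECT SCOPE_IDENTITY() AS ReservedId;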

In Oracle (maybe others?) you can defer a constraint's validation until COMMIT time.
So you could insert the child rows first. (You'd need the parent key, obviously.)
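A sketch of that in Oracle, with hypothetical table and column names (the key part is DEFERRABLE INITIALLY DEFERRED):

-- The FK is only checked at COMMIT, so child rows can go in first.
ALTER TABLE slave_table
    ADD CONSTRAINT fk_slave_master
    FOREIGN KEY (master_id) REFERENCES master_table (id)
    DEFERRABLE INITIALLY DEFERRED;

INSERT INTO slave_table (id, master_id, file_name) VALUES (1, 42, 'a.pdf');
INSERT INTO master_table (id, ticket_description) VALUES (42, 'Broken printer');
COMMIT;  -- the constraint is validated here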

Why can't you create the master row and flag it as incomplete?
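A hypothetical sketch of that approach, assuming a Ticket master table and a scheduled cleanup of abandoned drafts:

-- IsComplete marks drafts; queries elsewhere filter on IsComplete = 1.
ALTER TABLE Ticket ADD IsComplete BIT NOT NULL DEFAULT 0;

-- A scheduled job can purge abandoned drafts later
-- (CreatedAt is assumed to exist on the table).
DELETE FROM Ticket
WHERE IsComplete = 0
  AND CreatedAt < DATEADD(DAY, -1, GETDATE());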

In the case of an upload, you will have to create temporary storage for the not-yet-committed upload. So once the upload starts, you save all new files in a separate table. Once the user is ready to submit the ticket, you save the ticket and append the files from the temp table.
Also, you can create a fake record, if possible, with some fixed id in the master table. You then have to make sure that the fake record does not appear in queries in other places.
Third, you can create a stored procedure which generates an id for the primary table and increments the identity counter. If the user aborts the operation, the reserved id will not affect anything. It is just as if you created a master record and then deleted it. You can create temporary records in the master table as well.
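On SQL Server 2012+ a SEQUENCE (a swapped-in alternative to manipulating the identity counter) gives the same reservation behavior; a hypothetical sketch:

CREATE SEQUENCE TicketIdSeq AS INT START WITH 1;
GO
CREATE PROCEDURE ReserveTicketId
AS
BEGIN
    -- Hand the caller an id; if the user aborts, the gap is harmless.
    SELECT NEXT VALUE FOR TicketIdSeq AS ReservedId;
END;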


SSIS flat file with joins

I have a flat file which has the following columns:
Device Name
Device Type
Device Location
Device Zone
which I need to insert into a SQL Server table called Devices.
The Devices table has the following structure:
DeviceName
DeviceTypeId (foreign key from DeviceType table)
DeviceLocationId (foreign key from DeviceLocation table)
DeviceZoneId (foreign key from DeviceZone table)
The DeviceType, DeviceLocation and DeviceZone tables are already prepopulated.
Now I need to write an ETL process which reads the flat file and, for each row, gets the DeviceTypeId, DeviceLocationId and DeviceZoneId from the corresponding tables and inserts the row into the Devices table.
I am sure this is not new, but it's been a while since I worked on such SSIS packages, and help would be appreciated.
Load the flat file content into a staging table and write a stored procedure to handle the inserts and updates in T-SQL.
Having FK relationships between the destination tables can cause a lot of trouble with a single data flow and a multicast.
The problem is that you have no control over the order of the inserts, so the child record could be inserted before the parent.
Also, for identity columns on the tables, you cannot retrieve the identity value from one stream and use it in another without subsequent merge joins.
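A hedged sketch of the stored-procedure approach, assuming a staging table DeviceStaging loaded by the data flow and lookup tables that expose a Name column (adjust to your actual schema):

-- Resolve each flat-file value to its id and insert in one statement.
INSERT INTO Devices (DeviceName, DeviceTypeId, DeviceLocationId, DeviceZoneId)
SELECT s.DeviceName,
       dt.DeviceTypeId,
       dl.DeviceLocationId,
       dz.DeviceZoneId
FROM DeviceStaging AS s
JOIN DeviceType     AS dt ON dt.Name = s.DeviceType
JOIN DeviceLocation AS dl ON dl.Name = s.DeviceLocation
JOIN DeviceZone     AS dz ON dz.Name = s.DeviceZone;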
The simplest way to do that is by using a Lookup Transformation to get the ID for each value. Be aware that duplicates may lead to a problem: you have to make sure that a value is not found multiple times in the foreign tables.
Also, make sure to redirect rows that have no match to a staging table so you can check them later.
You can refer to the following article for a step by step guide to Lookup Transformation:
An Overview of the LOOKUP TRANSFORMATION in SSIS

SSIS only extract Delta changes

After some advice. I'm using SSIS / SQL Server 2014. I have a nightly SSIS package that pulls data from non-SQL Server databases into a single table (the SQL table is truncated beforehand each time), and I then extract from this table to create a daily csv file.
Going forward, I only want to extract to csv on a daily basis the records that have changed, i.e. the deltas.
What is the best approach? I was thinking of using CDC in SSIS, but as I'm truncating the SQL table before the initial load each time, will this be the best method? Or will I need to have a master table in SQL with an initial load, then import into another table and just extract where there are differences? For info, the table in SQL contains a primary key.
I just want to double check, as CDC assumes the tables are all in SQL Server, whereas my data is coming from outside SQL Server first.
Thanks for any help.
The primary key on that table is your saving grace here. Obviously enough, the SQL Server database that you're pulling the disparate data into won't know from one table flush to the next which records have changed, but if you add two additional tables, and modify the existing table with an additional column, it should be able to figure it out by leveraging HASHBYTES.
For this example, I'll call the new table SentRows, but you can use a more meaningful name in practice. We'll call the new column in the old table HashValue.
Add the column HashValue to your table as a varbinary data type. NOT NULL as well.
Create your SentRows table with columns for all the columns in the main table's primary key, plus the HashValue column.
Create a RowsToSend table that's structurally identical to your main table, including the HashValue.
Modify your queries to populate the HashValue by applying HASHBYTES to all of the non-key columns in the table; a sketch appears at the end of this answer. (This will be horribly tedious. Sorry about that.)
Send out your full data set.
Now move all of the key values and HashValues to the SentRows table. Truncate your main table.
On the next pull, compare the key values and HashValues from SentRows to the new data in the main table.
Primary key match + hash match = Unchanged row
Primary key match + hash mismatch = Updated row
Primary key in incoming data but missing from existing data set = New row
Primary key not in incoming data but in existing data set = Deleted row
Pull out any changes you need to send to the RowsToSend table.
Send the changes from RowsToSend.
Move the key values and HashValues to your SentRows table. Update hashes for changed key values, insert new rows, and decide how you're going to handle deletes, if you have to deal with deletes.
Truncate the SentRows table to get ready for tomorrow.
If you'd like (and you'll thank yourself later if you do), add a column to the SentRows table with a default of GETDATE(), which will tell you when the row was added.
And away you go. Nothing but deltas from now on.
Edit 2019-10-31:
Step by step (or TL;DR):
1) Flush and Fill MainTable.
2) Compare keys and hashes on MainTable to keys and hashes on SentRows to identify new/changed rows.
3) Move new/changed rows to RowsToSend.
4) Send the rows that are in RowsToSend.
5) Move all the rows from RowsToSend to SentRows.
6) Truncate RowsToSend.
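A minimal T-SQL sketch of the hash computation and the comparison, with hypothetical column names (MainTable(Id, ColA, ColB, HashValue)). Note that before SQL Server 2016, HASHBYTES input is limited to 8000 bytes:

-- Compute the hash over the non-key columns.
-- CONCAT treats NULLs as empty strings; '|' keeps values delimited.
UPDATE MainTable
SET HashValue = HASHBYTES('SHA2_256', CONCAT(ColA, '|', ColB));

-- Classify incoming rows against what was sent last time.
SELECT m.Id,
       CASE WHEN s.Id IS NULL               THEN 'New'
            WHEN s.HashValue <> m.HashValue THEN 'Updated'
            ELSE                                 'Unchanged'
       END AS RowState
FROM MainTable AS m
LEFT JOIN SentRows AS s ON s.Id = m.Id;

-- Deleted rows: keys sent last time that did not arrive today.
SELECT s.Id
FROM SentRows AS s
LEFT JOIN MainTable AS m ON m.Id = s.Id
WHERE m.Id IS NULL;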

How to synchronize data between a file and a database using Spring Batch?

I have to synchronize data from a file (Excel) to a database (MySQL) using Spring Batch.
The file will be processed record by record. Adding and updating database records works fine but I wonder how to detect and delete entries from the database that were removed from the file?
I am considering implementing this:
read the file record-by-record
create or update the record in the database and remember the primary key
remove all records with different primary keys (final step after all records have been processed)
Do you know how to collect and pass all processed primary keys to a final step?
Or do you recommend another implementation?
Thanks,
Patrick
Update: I'm not allowed to alter the database tables.
Use a column to mark updated/added records.
After the main step, create a new step where you delete the records that are not marked.
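A hypothetical sketch of that delete step in SQL, assuming the marker is a run id stamped during the create/update step:

-- last_seen_run_id is the hypothetical marker column;
-- :currentRunId would be supplied as a job parameter.
DELETE FROM target_table
WHERE last_seen_run_id <> :currentRunId;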
If DB schema modification is not an option:
Step 1. Dump the primary keys from the DB to a CSV file (original.csv)
Step 2. Create/update the DB and store the primary keys of the updated data to a CSV file (updated.csv)
After step 2, create a differential file: original minus updated (diff.csv)
Step 3. Read diff.csv and delete records by PK
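If a scratch table is permitted (the restriction above was on altering existing tables), the same diff can be done set-based in SQL instead of via CSV files; a hypothetical sketch:

-- processed_keys is a hypothetical scratch table filled with every PK
-- touched during the create/update step.
DELETE FROM target_table
WHERE id NOT IN (SELECT id FROM processed_keys);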

Cascade Delete Use Case

I am pretty new to Business Analysis. I have to write requirements that cover both (for now) cascade delete for two tables and explicit deletes for the rest of the tables.
I need some guidance for how to write the requirements for cascade deletion.
Delete child entities on parent deletion.
Delete collection members if the collection entity is deleted.
Actually it is hard to understand the task without context, and it also smells like university/college homework (we had one very similar to this).
Use the ON DELETE CASCADE option to specify whether you want rows deleted in a child table when corresponding rows are deleted in the parent table. If you do not specify cascading deletes, the default behavior of the database server prevents you from deleting data in a table if other tables reference it.
If you specify this option, later when you delete a row in the parent table, the database server also deletes any rows associated with that row (foreign keys) in a child table. The principal advantage to the cascading-deletes feature is that it allows you to reduce the quantity of SQL statements you need to perform delete actions.
For example, the all_candy table contains the candy_num column as a primary key. The hard_candy table refers to the candy_num column as a foreign key. The following CREATE TABLE statement creates the hard_candy table with the cascading-delete option on the foreign key:
CREATE TABLE all_candy
  (candy_num SERIAL PRIMARY KEY,
   candy_maker CHAR(25));

CREATE TABLE hard_candy
  (candy_num INT,
   candy_flavor CHAR(20),
   FOREIGN KEY (candy_num) REFERENCES all_candy
       ON DELETE CASCADE);
Because ON DELETE CASCADE is specified for the dependent table, when a row of the all_candy table is deleted, the corresponding rows of the hard_candy table are also deleted. For information about syntax restrictions and locking implications when you delete rows from tables that have cascading deletes, see Considerations When Tables Have Cascading Deletes.
Source: http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqls.doc/sqls292.htm
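To illustrate the example above, deleting a parent row now removes its children automatically (candy_num 4 is a hypothetical value):

DELETE FROM all_candy
WHERE candy_num = 4;   -- matching hard_candy rows are deleted as well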
You don't write use cases for functionality - that is the reason why it is hard to properly answer your question: we don't know the actor who interacts with the system, and of course we know nothing about the system, so we cannot tell you how to write a description of their interactions.
You should write your use cases first and from them derive the functionality.

Table Structure for Multiple History

I want to create a table to keep history of the amendments & history of the object.
For that I have created a two-column primary key (Id & UpdateDate).
I have 3 more date columns to maintain history & a Status column for the actual object history:
Status, StatusFrom, StatusTo, UpdateDate & NextUpdateDate
UpdateDate & NextUpdateDate are for maintaining the history of amendments.
Is there any better way to maintain the actual history of the record & the amendment history of the record?
You're creating what is known as an "audit table". There are many ways to do this; a couple of them are:
Create a table with appropriate key fields and before/after fields for all columns that you're interested in on the source table, along with a timestamp so you know when the change was made.
Create a table with appropriate key fields, a modification timestamp, a field name, and before/after columns.
Method (1) has the problem that you end up with a lot of fields in the audit table - basically two for every field in your source table. In addition, if only one or two fields on the source table change then most of the fields on the audit table will be NULL which may waste space (depending on your database). It also requires a lot of special-purpose code to figure out which field changed when you go back to process the audit table.
Method (2) has the problem that you end up creating a separate row in the table for each field that is changed on your source table, which can result in a lot of rows in the audit table (one row for each field changed). Because each field change results in a new row being written to the audit table you also have the same key values in multiple rows which can use up a bunch of space just for the keys.
Regardless of how the audit table is structured it's usual to use a trigger to maintain them.
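For example, a minimal SQL Server trigger sketch in the spirit of method (1), tracking a single Status column for brevity; the table and column names are hypothetical:

CREATE TABLE RecordAudit
   (AuditId   INT IDENTITY(1,1) PRIMARY KEY,
    RecordId  INT NOT NULL,
    OldStatus VARCHAR(20),
    NewStatus VARCHAR(20),
    ChangedAt DATETIME2 NOT NULL DEFAULT SYSDATETIME());
GO
CREATE TRIGGER trg_Record_Audit
ON Record
AFTER UPDATE
AS
BEGIN
    -- deleted holds the before image, inserted the after image;
    -- joining on the key handles multi-row updates correctly.
    INSERT INTO RecordAudit (RecordId, OldStatus, NewStatus)
    SELECT d.Id, d.Status, i.Status
    FROM deleted AS d
    JOIN inserted AS i ON i.Id = d.Id;
END;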
I hope this helps.
