I need to improve the performance of data loading. The current algorithm does a full select from a table:
select Field1, Field2,...,FieldN from Table1 order by FieldM
The new data is read from a text file (one text-file line per DataTable row).
The table has a primary key consisting of two fields. For each line of the text file, the code locates the matching row by these two fields (i.e. by the primary key):
query.Locate('Field1;Field2',VarArrayOf([Value1,Value2]),[]);
If Locate returns True, it edits the row, otherwise it appends a new one.
Since the table contains about 200,000 rows, each Locate operation takes a noticeable amount of time, so the code manages to update only about 5-6 rows per second.
What things should I consider to improve it?
Should I perhaps replace the Locate calls over this huge select with separate queries?
DON'T use Locate(). When you call Locate(), Delphi searches for the row on the client side by scanning the whole row set returned by your query, and that takes a LOT of time.
If you have access to the MSSQL server to create stored procedures, then create the following procedure and simply run it for each line of your text file, without any conditions (use TADOStoredProc.ExecProc in Delphi). In that case you don't need the initial select or the Locate call at all: the procedure updates the record if Field1 and Field2 are found and inserts a new one if they are not.
CREATE PROCEDURE dbo.update_table1
    @Field1 int, -- key 1
    @Field2 int, -- key 2
    @Field3 int, -- data fields
    @Field4 int
AS
SET NOCOUNT ON;
update table1 set Field3 = @Field3, Field4 = @Field4
where Field1 = @Field1 and Field2 = @Field2;
IF (@@ROWCOUNT = 0)
BEGIN
    insert into table1 (Field1, Field2, Field3, Field4)
    values (@Field1, @Field2, @Field3, @Field4);
END
GO
Here is Delphi code to invoke this stored procedure with ADO:
......
var
  ADOStoredP: TADOStoredProc;
......
begin
  ........
  ADOStoredP := TADOStoredProc.Create(nil);
  try
    ADOStoredP.Connection := DataMod.SQL_ADOConnection; // your ADO connection instance here
    ADOStoredP.ProcedureName := 'update_table1';
    ADOStoredP.Parameters.CreateParameter('@Field1', ftInteger, pdInput, 0, 0);
    ADOStoredP.Parameters.CreateParameter('@Field2', ftInteger, pdInput, 0, 0);
    ADOStoredP.Parameters.CreateParameter('@Field3', ftInteger, pdInput, 0, 0);
    ADOStoredP.Parameters.CreateParameter('@Field4', ftInteger, pdInput, 0, 0);
    while {not at the end of your text file} do // your text file loop here
    begin
      ADOStoredP.Parameters.ParamByName('@Field1').Value := {Field1 value from the text file};
      ADOStoredP.Parameters.ParamByName('@Field2').Value := {Field2 value from the text file};
      ADOStoredP.Parameters.ParamByName('@Field3').Value := {Field3 value from the text file};
      ADOStoredP.Parameters.ParamByName('@Field4').Value := {Field4 value from the text file};
      ADOStoredP.ExecProc;
    end;
  finally
    ADOStoredP.Free;
  end;
  ........
end;
If it is possible, send the text file to the server running SQL Server and use OPENROWSET(BULK) to open it (see example "E. Using the OPENROWSET BULK provider with a format file to retrieve rows from a text file" in the documentation).
If you cannot send the text file to the server, then create a temporary or persistent DB table and use INSERT to load all the text file rows into it.
If you are using SQL Server 2008 or later, use the MERGE statement; on an older SQL Server version you can use two SQL commands instead: UPDATE and INSERT. In either case, use (1) OPENROWSET or (2) the DB table as the data source.
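For illustration, here is a minimal MERGE sketch along those lines. It assumes SQL Server 2008 or later and that the text file is exposed through OPENROWSET(BULK ...) with a format file that names the columns Field1..Field4; the file and format-file paths are placeholders, and Table1/Field1..Field4 are the names from the question:
MERGE Table1 AS t
USING (
    SELECT Field1, Field2, Field3, Field4
    FROM OPENROWSET(BULK 'C:\import\data.txt', FORMATFILE = 'C:\import\data.fmt') AS f
) AS s
ON t.Field1 = s.Field1 AND t.Field2 = s.Field2   -- the two-field primary key
WHEN MATCHED THEN
    UPDATE SET t.Field3 = s.Field3, t.Field4 = s.Field4
WHEN NOT MATCHED THEN
    INSERT (Field1, Field2, Field3, Field4)
    VALUES (s.Field1, s.Field2, s.Field3, s.Field4);
On SQL Server 2005 the same effect takes two statements: an UPDATE joined to the source, followed by an INSERT of the source rows that did not match any existing key.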
I have a data set and want to import it into my database with these conditions:
If a record cannot be imported, it is written to a log.
Even though some records cannot be imported (because they already exist), the remaining importable records are still inserted and processing continues.
Currently I use the BCP utility to import data into the table from the csv file with:
bcp table_name IN C:\Users\09204086121\Desktop\data.csv -T -c -o C:\Users\09204086121\Desktop\logOut.log -e C:\Users\09204086121\Desktop\errOut.log
This only satisfies condition 1 above.
What I need is: when a record has an error (duplicate primary key, ...), write it to the log (1) and continue inserting the other, valid records into the table (2).
I came up with the idea of combining a trigger with bcp: after creating the trigger and adding the -h "FIRE_TRIGGERS" hint to the bcp command, the insert ignores records that have the same key, but it does not write them to the log.
This is my trigger:
ALTER TRIGGER [PKGORDERCOMMON].[T_ImportData] ON [PKGORDERCOMMON].[IF_R_BUNRUI1]
INSTEAD OF INSERT
AS
BEGIN
--Insert non duplicate records
INSERT INTO [IF_R_BUNRUI1]
(
SYSTEM_KB,
BUNRUI1_CD,
BUNRUI1_KANJI_NA,
BUNRUI1_KANA_NA,
CREATE_TS
)
SELECT SYSTEM_KB,
BUNRUI1_CD,
BUNRUI1_KANJI_NA,
BUNRUI1_KANA_NA,
CREATE_TS
FROM inserted i
WHERE NOT EXISTS
(
SELECT *
FROM [IF_R_BUNRUI1] c
WHERE c.BUNRUI1_CD = i.BUNRUI1_CD
AND c.SYSTEM_KB = i.SYSTEM_KB
);
END;
Is there anyone who can help me?
BCP is not meant for what you are asking it to do (separating good and bad records). For instance, the bcp -e option has a limit on how many error records it will report. I'm not sure whether this limit is tied to the "max errors" option, but regardless, there is a limit.
Your best option is to load all the records and address the bad data in T-SQL.
Load all the records in a way that ignores conversion errors. Either:
load each entire line from the file into a single, large varchar column, then parse out the columns and QC the data as needed,
or
load all columns from the source file into generic varchar columns sized large enough to accommodate your source data.
Either way, when the load is done, use T-SQL to inspect your data and split it into good and bad records, as in the sketch below.
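As an illustration of that split, here is a minimal T-SQL sketch using the IF_R_BUNRUI1 table from the question; the staging table stg_IF_R_BUNRUI1, the ImportErrorLog table, the varchar sizes, and the comma delimiter are all assumptions to adjust to your file:
-- hypothetical staging table: every column is a wide varchar so bad data cannot fail the load
CREATE TABLE dbo.stg_IF_R_BUNRUI1 (
    SYSTEM_KB        varchar(100),
    BUNRUI1_CD       varchar(100),
    BUNRUI1_KANJI_NA varchar(400),
    BUNRUI1_KANA_NA  varchar(400),
    CREATE_TS        varchar(100)
);

BULK INSERT dbo.stg_IF_R_BUNRUI1
FROM 'C:\Users\09204086121\Desktop\data.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- (1) log the rows that would violate the primary key
INSERT INTO dbo.ImportErrorLog (SYSTEM_KB, BUNRUI1_CD, ErrorReason)
SELECT s.SYSTEM_KB, s.BUNRUI1_CD, 'Duplicate key'
FROM dbo.stg_IF_R_BUNRUI1 s
WHERE EXISTS (SELECT 1 FROM PKGORDERCOMMON.IF_R_BUNRUI1 c
              WHERE c.SYSTEM_KB = s.SYSTEM_KB AND c.BUNRUI1_CD = s.BUNRUI1_CD);

-- (2) insert only the clean rows
INSERT INTO PKGORDERCOMMON.IF_R_BUNRUI1
    (SYSTEM_KB, BUNRUI1_CD, BUNRUI1_KANJI_NA, BUNRUI1_KANA_NA, CREATE_TS)
SELECT s.SYSTEM_KB, s.BUNRUI1_CD, s.BUNRUI1_KANJI_NA, s.BUNRUI1_KANA_NA, s.CREATE_TS
FROM dbo.stg_IF_R_BUNRUI1 s
WHERE NOT EXISTS (SELECT 1 FROM PKGORDERCOMMON.IF_R_BUNRUI1 c
                  WHERE c.SYSTEM_KB = s.SYSTEM_KB AND c.BUNRUI1_CD = s.BUNRUI1_CD);
Duplicates inside the CSV itself, or values that fail type conversion, can be detected and logged with similar queries against the staging table before the final insert.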
I have a database in SQL Server. Basically, the table consists of a number of XML documents that represent the same table's data at a given point in time (like a backup history). What is the best method to cut off all the old backups (older than 3 months), remove them from the DB, and save them archived?
There is no such export feature out of the box in SQL Server.
Assuming that your table can be pretty big (since it looks like you save an image of the table every minute) and that you want to do it all from inside SQL Server, I suggest doing the cleanup in chunks.
The usual way in SQL Server to delete in chunks is to use DELETE in combination with the OUTPUT clause.
The easiest way to archive and remove is then to have the OUTPUT go into a table in another database created for that sole purpose.
So your steps would be:
Create a new database (ArchiveDatabase).
Create an archive table (ArchiveTable) in ArchiveDatabase with the same structure as the table you want to clean up.
In a while loop, perform the DELETE ... OUTPUT.
Back up the ArchiveDatabase.
TRUNCATE the ArchiveTable in ArchiveDatabase.
The DELETE/OUTPUT loop will look something like this:
declare @RowsToDelete int = 1000
declare @DeletedRowsCNT int = 1000
while @DeletedRowsCNT = @RowsToDelete
begin
    delete top (@RowsToDelete)
    from MyDataTable
    output deleted.* into ArchiveDatabase.dbo.ArchiveTable
    where dt < dateadd(month, -3, getdate())   -- rows older than 3 months
    set @DeletedRowsCNT = @@ROWCOUNT
end
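For completeness, the surrounding steps (1, 2, 4 and 5) could look something like the sketch below; the backup path is a placeholder, and SELECT TOP (0) ... INTO only copies the column structure, not indexes or constraints:
CREATE DATABASE ArchiveDatabase;

-- empty copy of the source table's columns to receive the OUTPUT rows
SELECT TOP (0) * INTO ArchiveDatabase.dbo.ArchiveTable FROM MyDataTable;

-- ... run the DELETE/OUTPUT loop above ...

BACKUP DATABASE ArchiveDatabase TO DISK = 'D:\Backups\ArchiveDatabase.bak';
TRUNCATE TABLE ArchiveDatabase.dbo.ArchiveTable;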
PROBLEM SUMMARY
I have to write I/U/D-statement-generating triggers for a bucardo/SymmetricDS-inspired, homemade bidirectional replication system between Sybase ADS and PostgreSQL 11 groups of nodes. The idea is to use BEFORE triggers on any PostgreSQL or Sybase DB that create Insert/Update/Delete commands based on the command executed against a replicated source table. For example, an INSERT INTO PERSON (first_name, last_name, gender, age, ethnicity) VALUES ('John', 'Doe', 'M', 42, 'C') is turned into a corresponding INSERT statement, an UPDATE statement is built dynamically from the OLD and NEW values, and a DELETE command is built from the OLD values. All of these are then run, command by command, on a destination at some interval.
I know this is difficult and hardly anyone does this, but it is for a job, I have no other options, and I can't push back and offer a different solution. I have no teammates or other human resources to help, outside of SO and something like Codementors, which was not very helpful. My strategy is to copy parts of the bucardo/SymmetricDS approach: capture the OLD and NEW values and generate a statement/command to run on the destination. Right now I am snapshotting the whole table to a CSV instead of working command by command, but working command by command and looping through a table that generates and stores the commands would make the job much easier.
One big issue is that the tables come from Sybase ADS with a mixed key/index structure (many tables have NO PK), and that structure is mirrored in PostgreSQL, so I am trying to write PK-less statements, i.e. all-column commands, to get around the tables without primary keys. Also, only certain columns of certain tables will be replicated, so I have a column in a metadata table where the column names are stored delimited by ';'; the plan is to split that into an array and link the column names to the values in order to generate a full I/U/D command, hopefully. I am open to other strategies, but this is a big solo project and I have gone at it many ways with much difficulty.
I mostly come from a DBA background and have some programming experience with the fundamentals, so I am mostly pseudocoding each major sequence, googling for syntax piece by piece, and adjusting as I go or when I hit a language limitation. I am thankful for any help given, as I am getting a bit desperate and discouraged.
WHAT I HAVE TRIED
I have to do this for both Sybase ADS and PostgreSQL, but this question is initially about ADS since it is more challenging and older.
The goal for both platforms is to have one "Log" table that tracks row changes for each of the replicated tables and ultimately generates a command dynamically. I am trying to write trigger statements like:
CREATE TRIGGER PERSON_INSERT
ON PERSON
BEFORE
INSERT
BEGIN
    INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues)
    select ID, 'INSERT', 'READY', NOW(), ''first_name';'last_name';'gender';'age';'ethnicity'' from __new;
END;

CREATE TRIGGER PERSON_UPDATE
ON PERSON
BEFORE
UPDATE
BEGIN
    INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues)
    select ID, 'UPDATE', 'READY', NOW(), ''first_name';'last_name';'gender';'age';'ethnicity'' from __new;
    UPDATE Backlog SET OldValues = (select ''first_name';'last_name';'gender';'age';'ethnicity'' from __old)
    where SourceTableID = (select ID from __old);
END;

CREATE TRIGGER PERSON_DELETE
ON PERSON
BEFORE
DELETE
BEGIN
    INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, OldValues)
    select ID, 'DELETE', 'READY', NOW(), ''first_name';'last_name';'gender';'age';'ethnicity'' from __old;
END;
but I would like the "''first_name';'last_name';'gender';'age';'ethnicity''" part to come from another table as a value, to make it dynamic, since multiple tables will write their values and statement info to the single log table. It could then be put into a variable and split so that each column name is linked to its corresponding value, allowing the I/U/D statements to be built and then executed on the destination one at a time.
ATTEMPTED INCOMPLETE SAMPLE TRIGGER CODE
CREATE TRIGGER PERSON_INSERT
ON PERSON
BEFORE
INSERT
BEGIN
    --Declare @Columns string
    --@Columns = select Columns from metatable where tablename = 'PERSON'
    --String-split @Columns on ';' into an array to correspond to the NEW values
    --@NewValues = @['@Columns=' + NEW.@Columns + '']
    INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues)
    select ID, 'INSERT', 'READY', NOW(), ''first_name';'last_name';'gender';'age';'ethnicity'' from __new;
END;

CREATE TRIGGER PERSON_UPDATE
ON PERSON
BEFORE
UPDATE
BEGIN
    --Declare @Columns string
    --@Columns = select Columns from metatable where tablename = 'PERSON'
    --String-split @Columns on ';' into an array to correspond to the NEW and OLD values
    --@NewValues = @['@Columns=' + NEW.@Columns + '']
    --@OldValues = @['@Columns=' + OLD.@Columns + '']
    INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, NewValues)
    select ID, 'UPDATE', 'READY', NOW(), ''first_name';'last_name';'gender';'age';'ethnicity'' from __new;
    UPDATE Backlog SET OldValues = (select ''first_name';'last_name';'gender';'age';'ethnicity'' from __old)
    where SourceTableID = (select ID from __old);
END;

CREATE TRIGGER PERSON_DELETE
ON PERSON
BEFORE
DELETE
BEGIN
    --Declare @Columns string
    --@Columns = select Columns from metatable where tablename = 'PERSON'
    --String-split @Columns on ';' into an array to correspond to the OLD values
    --@OldValues = @['@Columns=' + OLD.@Columns + '']
    INSERT INTO Backlog (SourceTableID, TriggerType, Status, CreateTimeDate, OldValues)
    select ID, 'DELETE', 'READY', NOW(), ''first_name';'last_name';'gender';'age';'ethnicity'' from __old;
END;
CONCLUSION
For each row inserted, updated, or deleted, I am trying to generate a corresponding 'INSERT INTO PERSON (' + @Columns + ') VALUES (' + @NewValues + ')' type statement (or an UPDATE or DELETE) in a COMMAND column of the log table. A foreach service will then run each command value, ordered by create time, as the main replication service.
To be clear, I am trying to make my sample trigger code write all the old and new values to a column dynamically, without hardcoding the columns in each trigger (since the same pattern will be used for multiple tables), writing the values into a single column delimited by a comma or semicolon.
An even bigger wish or goal behind this is to find a way to save/script each I/U/D command and then be able to run those commands on subscriber servers/DBs on both the PostgreSQL and Sybase platforms, thereby building my own log-based replication.
It is a complex but solvable problem that will take time and careful planning to implement. I think what you are looking for is the EXECUTE IMMEDIATE command in ADS SQL syntax. With it you can build a dynamic statement as a string and then execute it once construction of the SQL statement is finished. Save each desired column value to a temp table by carefully constructing the statement as a string, then run it with EXECUTE IMMEDIATE. For example:
DECLARE TableColumns Cursor ;
DECLARE FldName Char(100) ;
...
OPEN TableColumns AS SELECT *
  FROM system.columns
  WHERE parent = @cTableName
    AND field_type < 21   //ADS_ROWVERSION
    AND field_type <> 6   //ADS_BINARY
    AND field_type <> 7 ; //ADS_IMAGE
WHILE FETCH TableColumns DO
  FldName = Trim( TableColumns.Name ) ;
  StrSql = 'SELECT New.[' + Trim( FldName ) + '] newVal ' +
           'INTO #myTmpTable FROM __new n' ;
After constructing the statement as a string it can then be executed like this:
EXECUTE IMMEDIATE STRSQL ;
You can pick up old and new values from the __old and __new temp tables that are always available to triggers. Insert the values into the temp table #myTmpTable and then use it to update the target. Remember to drop #myTmpTable at the end.
Furthermore, I would think you can create a function in the data dictionary (DD) that is called from each trigger on the tables you want to track, instead of writing a long trigger for each table; cTableName can then be a parameter passed to that function. That would make maintenance a little easier.
I have a very flat, simple log file (6 rows, one of which is blank) that I want to insert into a simple 5-column SQL Server table.
Please excuse my SQL ignorance, as I am not well educated on this topic.
Below is the .log file content:
-----------Log File content start----------
07/30/2016 00:02:03 : BATCH CLOSE SUMMARY
MerchantID - 000022673665
TerminalID - 013
BatchItemCount - 650
NetBatchTotal - 5095.00
----------Log file content end-------------
Below is the simple SQL Server table layout:
CREATE TABLE dbo.CCClose
(
CloseTime NVARCHAR(50) NOT NULL,
MercID NVARCHAR(50) NOT NULL,
TermID NVARCHAR(50) NOT NULL,
BatchCount NVARCHAR(30) NOT NULL,
NetBatcTotal NVARCHAR(50) NOT NULL
);
I'm hoping to somehow have each row examined by SQL, for example:
if .log file like 'Batch close Summary' then insert into CloseTime else
if .log file like 'MerchantID' then insert into MercID else
if .log file like 'BatchItemCount' then insert into BatchCount else
if .log file like 'NetBatchTotal' then insert into NetBatchTotal
Of course it would be great if the proper formatting for each column were in place, but at this point I'm just looking at getting the .log file data loaded from a directory of these logs.
I plan to use Crystal Reports to build on the SQL Server tables.
This is not going to be a simple process. You can probably do it with bulk insert. The idea is to read it into a staging table, using:
a record terminator of something like "----------Log file content end-------------" + newline
a field separator of a newline
a staging table with several columns of varchars
Then process the staging table to extract the values (and types) that you want. There are probably other options, if you set up a format file, but that adds another level of complexity.
I would read the file into a staging table, with one row per line of the file. Then I would (see the sketch after this list):
use window functions to assign a record number to rows, based on the "content start" lines
aggregate based on the record number
extract the values using aggregations, string functions, and conversions
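As an illustration only, here is a minimal T-SQL sketch of that staging approach; the staging table CCCloseStaging, the file path, and the parsing expressions are assumptions, and it processes one log file at a time so the conditional aggregation does not depend on row order:
-- hypothetical staging table: one row per log-file line
CREATE TABLE dbo.CCCloseStaging (LineTxt nvarchar(4000) NULL);

-- load one log file (placeholder path); assumes CRLF line endings and no tab characters in the lines
BULK INSERT dbo.CCCloseStaging
FROM 'C:\Logs\BatchClose\20160730.log'
WITH (ROWTERMINATOR = '\n');

-- one record per file: pick each value out of the line that starts with its label
INSERT INTO dbo.CCClose (CloseTime, MercID, TermID, BatchCount, NetBatcTotal)
SELECT
    MAX(CASE WHEN LineTxt LIKE '%BATCH CLOSE SUMMARY%' THEN LEFT(LineTxt, 19) END),
    MAX(CASE WHEN LineTxt LIKE 'MerchantID%'     THEN LTRIM(SUBSTRING(LineTxt, CHARINDEX('-', LineTxt) + 1, 50)) END),
    MAX(CASE WHEN LineTxt LIKE 'TerminalID%'     THEN LTRIM(SUBSTRING(LineTxt, CHARINDEX('-', LineTxt) + 1, 50)) END),
    MAX(CASE WHEN LineTxt LIKE 'BatchItemCount%' THEN LTRIM(SUBSTRING(LineTxt, CHARINDEX('-', LineTxt) + 1, 30)) END),
    MAX(CASE WHEN LineTxt LIKE 'NetBatchTotal%'  THEN LTRIM(SUBSTRING(LineTxt, CHARINDEX('-', LineTxt) + 1, 50)) END)
FROM dbo.CCCloseStaging;

TRUNCATE TABLE dbo.CCCloseStaging;   -- repeat load/insert/truncate for each file in the directory
A small SQL Agent job or script that loops over the .log files in the directory can then repeat the load/insert/truncate steps per file.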
I'm still fairly new to T-SQL and SQL Server 2005. I need to import a column of integers from a table in database1 into an identical table (only missing the column I need) in database2. Both are SQL Server 2005 databases. I've tried the built-in import wizard in Server Management Studio, but it forces me to copy the entire table, which causes errors due to constraints and 'read-only' columns (whatever 'read-only' means in SQL 2005). I just want to grab a single column and copy it to the other table.
There must be a simple way of doing this. Something like:
INSERT INTO database1.myTable columnINeed
SELECT columnINeed from database2.myTable
Inserting won't do it, since it will append new rows to the table. What it sounds like you're trying to do is populate a new column on the existing rows.
I'm not sure the syntax is exactly right, but if I understood you correctly, this will do what you're after.
Create the column allowing nulls in database2.
Perform an update:
UPDATE t2
SET t2.colname = t1.colname
FROM database2.dbo.tablename t2
INNER JOIN database1.dbo.tablename t1
    ON t2.keycol = t1.keycol
There is a simple way very much like this as long as both databases are on the same server. The fully qualified name is dbname.owner.table - normally the owner is dbo and there is a shortcut for ".dbo." which is "..", so...
INSERT INTO Database1..MyTable
(ColumnList)
SELECT FieldsIWant
FROM Database2..MyTable
First create the column if it doesn't exist:
ALTER TABLE database2..targetTable
ADD targetColumn int null -- or whatever column definition is needed
and since you're using Sql Server 2005 you can use the new MERGE statement.
The MERGE statement has the advantage of being able to treat all situations in one statement like missing rows from source (can do inserts), missing rows from destination (can do deletes), matching rows (can do updates), and everything is done atomically in a single transaction. Example:
MERGE database2..targetTable AS t
USING (SELECT PrimaryKeyCol, sourceColumn FROM sourceDatabase1..sourceTable) AS s
ON t.PrimaryKeyCol = s.PrimaryKeyCol -- or whatever the match should be based on
WHEN MATCHED THEN
    UPDATE SET t.targetColumn = s.sourceColumn
WHEN NOT MATCHED THEN
    INSERT (targetColumn, [other columns ...]) VALUES (s.sourceColumn, [other values ..]);
The MERGE statement was introduced to solve cases like yours and I recommend using it, it's much more powerful than solutions using multiple sql batch statements that basically accomplish the same thing MERGE does in one statement without the added complexity.
You could also use a cursor. Assuming you want to iterate over all the records in the first table and populate the second table with new rows, something like this would be the way to go:
DECLARE @FirstField nvarchar(100)
DECLARE ACursor CURSOR FOR
SELECT FirstField FROM FirstTable

OPEN ACursor
FETCH NEXT FROM ACursor INTO @FirstField
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO SecondTable ( SecondField ) VALUES ( @FirstField )
    FETCH NEXT FROM ACursor INTO @FirstField
END
CLOSE ACursor
DEALLOCATE ACursor
MERGE is only available in SQL Server 2008, not SQL Server 2005.
insert into Test2.dbo.MyTable (MyValue) select MyValue from Test1.dbo.MyTable
This assumes a great deal: first, that the destination table is empty; second, that the other columns are nullable. You may need an UPDATE instead, and for that you will need a common key.