String or binary data would be truncated when using batched inserts - SQL Server

I'm using SQL Server 2008 R2 and am trying to insert some records from one table to another using the form:
INSERT INTO table2
SELECT ... FROM table1
and am receiving the dreaded error:
"String or binary data would be truncated"
table1 is a view into another database from which we are taking records; we do some processing of the data before inserting it into table2, which is a table in the local database.
I assumed that this message meant that one of the fields I'm populating is too small to hold the data I'm inserting, but it doesn't appear to be the case.
The select does some parsing of the fields from table1 and inserts those values into table2. However, the inserted values are smaller than the original fields (each being only part of the original field), and I have checked to confirm that they all fit into the columns they are being inserted into.
I'm struggling to find the real cause of this problem as the fields in table2 are all of sufficient size and type to hold the data that will be added to them.
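For reference, a check along these lines (the column names and lengths here are placeholders, not my actual schema) is one way to confirm whether any parsed value exceeds its target column:
-- Hypothetical names: KeyColumn/SourceField/ParsedValue stand in for the real
-- parsed columns; 50 stands in for the defined length of the target column in table2.
SELECT t1.KeyColumn,
       LEN(t1.ParsedValue) AS parsed_length
FROM (
    SELECT KeyColumn,
           LTRIM(RTRIM(SUBSTRING(SourceField, 1, 100))) AS ParsedValue
    FROM dbo.table1
) AS t1
WHERE LEN(t1.ParsedValue) > 50;
(Note that LEN ignores trailing spaces, so DATALENGTH may be the safer comparison if trailing whitespace or nvarchar sizing could be involved.)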
Additionally, when I limited the number of records to be added (doing it in batches to try to find the offending record), it would run without error for 2-3 batches of 10k records before hitting this error. So I thought the anomalous data must be somewhere in the next 10k records.
However, when I reduced the batch size to 6k records, instead of getting the error on the second batch (records 6k-12k), which is where the problem data should have been, I was able to run the insert without error another 10 times (60k records in total) before the error happened again.
Then I reduced the batch size to 3k records and was able to run it another 4 times (12k records) before seeing the error.
I kept reducing the size of the batch and each time I could insert more records using multiple smaller batches than I could with a single bigger batch.
This would suggest that it's not actually the size of the data being inserted that is causing the problem, but some other aspect.
It has now got to the point where I can no longer insert a single record. Inspecting the record to be inserted confirms that it complies with the sizes of all the fields and should not be causing this error.
Given that initially I couldn't insert the next 10k records because of this error, but have since managed to insert over 100k records (in ever-decreasing batch sizes), can anyone suggest where the real problem lies?
The question is:
Is there some system table, log or some other limit that may be blocking these inserts?
Any suggestions would be greatly appreciated.

Related

Reading a table from Oracle is 10 times slower than reading from SQL Server

I have a table with just 5 columns and 8500 rows. The following simple query takes around 8 seconds on an Oracle database, whereas if I import the same table into a SQL Server database it takes less than 1 second.
SELECT CustomerList.* FROM CustomerList ORDER BY TypeID, OrderNr, Title
QUESTION: I am completely new to databases and have started acquiring knowledge about them, but 8 seconds for 8500 records is way too long. Is there anything that I can try to resolve the issue?
UPDATE: I exported the table from the Oracle database as a text file and then imported the text file into another fresh Oracle database to create the same table. When I executed the above query on this newly created database, the execution time was again the same as before (i.e. around 8 seconds).
Regarding the High Water Mark (HWM): in Oracle, space for a table's rows is allocated in big chunks called 'extents'. When an extent is filled up with rows of data, a new extent is allocated. The HWM is the pointer to the highest allocated address.
If rows are deleted, the space occupied remains allocated to that table and available for new rows without having to acquire more space for them, and the HWM remains. Even if you delete ALL of the rows (a simple DELETE FROM MYTABLE), all of the space remains allocated to the table and available for new rows, and the HWM remains.
So say you have a table with 1 billion rows. Then you delete all but one of those rows. You still have the space for 1 billion rows, and the HWM is set accordingly. Now, if you select from that table without a WHERE condition that would use an index (thus forcing a Full Table Scan, or FTS), Oracle still has to scan that billion-row space to find all of the rows, which could be scattered across the whole space. But when you insert those rows into another database (or even another table in the same database) you only need enough space for those rows, so selecting against the new table is accordingly faster.
That is ONE possible cause of your issue.
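If the HWM does turn out to be the culprit, these are sketches of the usual ways to release the space in Oracle (CustomerList is the table from the question; SHRINK SPACE assumes the segment lives in an ASSM tablespace, and MOVE requires the indexes to be rebuilt afterwards):
-- Option 1: if the rows are no longer needed, TRUNCATE deallocates the space
-- and resets the HWM.
TRUNCATE TABLE CustomerList;

-- Option 2: keep the rows but compact the segment and lower the HWM
-- (requires an ASSM tablespace).
ALTER TABLE CustomerList ENABLE ROW MOVEMENT;
ALTER TABLE CustomerList SHRINK SPACE;

-- Option 3: rebuild the segment into fresh extents; indexes become unusable
-- and must be rebuilt afterwards.
ALTER TABLE CustomerList MOVE;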

What does the rows_produced count represent in the Snowflake QUERY_HISTORY view when the query is a MERGE from a file

I am executing a MERGE query to perform a CDC operation. I have a target table which holds around 50 million records, and the incoming file which is the source for the MERGE contains 230 records. There is a simple join between the ID of the table and the id column from the file data. After execution, the history view shows 200 records inserted and 30 records updated. However, it shows rows_produced as 5K. I need to understand what rows_produced means in this case. Does it show the rows returned as part of the join? If so, it should match the row count of the file.
I believe that rows_produced is the total number of records that were created when the underlying micropartitions were written out.
For example, if you updated 1 record, you are actually recreating the entire micropartition of data that this 1 record exists in (micropartitions are immutable, so therefore never updated). If that 1 record exists in a micropartition that contains 100 records, then you'd get an output that has 1 record updated, but 100 rows_produced.
This information is "interesting" but not helpful when you are trying to verify the outcome of your MERGE statement. Using the insert, update, and delete counts for the MERGE is the accurate way to look at that.
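As a hedged sketch, assuming the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view (which has some ingestion latency) and its documented counter columns, reading the DML counts rather than rows_produced looks roughly like this; the filter values are only illustrative:
-- Illustrative query: read the MERGE outcome from the insert/update/delete
-- counters rather than ROWS_PRODUCED (filter values are placeholders).
SELECT query_id,
       query_type,
       rows_inserted,
       rows_updated,
       rows_deleted,
       rows_produced
FROM   snowflake.account_usage.query_history
WHERE  query_type = 'MERGE'
  AND  start_time >= DATEADD(day, -1, CURRENT_TIMESTAMP())
ORDER BY start_time DESC;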

Find out the recently selected rows from an Oracle table, and can I update a LAST_ACCESSED column whenever the table is accessed

I have a database table which has more than 1 million records uniquely identified by a GUID column. I want to find out which of these rows were selected or retrieved in the last 5 years. The select query can happen from multiple places: sometimes a row is returned on its own, sometimes as part of a set of rows. There is a select query that does the fetching over a JDBC connection from Java code, and a SQL procedure also fetches data from the table.
My intention is to clean up the table: I want to delete all rows which were never used (retrieved via a select query) in the last 5 years.
Does Oracle have any built-in metadata which can give me this information?
My alternative solution was to add a LAST_ACCESSED column and update it whenever I select a row from this table. But this is a costly operation for me in terms of the time taken by the whole process: at least 1000-10000 records will be selected from the table in a single operation. Is there any more efficient way to do this rather than updating the table after reading it? Mine is a multi-threaded application, so updating such a large data set may result in deadlocks or long waits for the next read query.
Any elegant solution to this problem?
Oracle Database 12c introduced a new feature called Automatic Data Optimization that brings you Heat Maps to track table access (modifications as well as read operations). Be careful: the feature currently has to be licensed under the Advanced Compression Option or the In-Memory Option.
Heat Maps track whenever a database block has been modified or whenever a segment, i.e. a table or table partition, has been accessed. They do not track select operations per individual row, nor per individual block, because the overhead would be too heavy (data is generally read often and concurrently; having to keep a counter for each row would quickly become very costly). However, if you have your data partitioned by date, e.g. a new partition for every day, you can over time easily determine which days are still read and which ones can be archived or purged. Note that Partitioning is also an option that needs to be licensed.
Once you have reached that conclusion you can then either use In-Database Archiving to mark rows as archived or just go ahead and purge the rows. If you happen to have the data partitioned you can do easy DROP PARTITION operations to purge one or many partitions rather than having to do conventional DELETE statements.
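As a rough sketch, assuming the feature is licensed and the table is called MYTABLE in schema MYSCHEMA (placeholder names), enabling Heat Map tracking and reading the segment-level statistics would look roughly like this (verify the dictionary view columns against your release):
-- Enable Heat Map tracking instance-wide (requires ALTER SYSTEM privilege).
ALTER SYSTEM SET heat_map = ON;

-- Later: check when each segment of MYTABLE was last written, full-scanned,
-- or looked up via an index.
SELECT object_name,
       subobject_name,
       segment_write_time,
       full_scan,
       lookup_scan
FROM   dba_heat_map_segment
WHERE  owner = 'MYSCHEMA'
  AND  object_name = 'MYTABLE';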
I couldn't use any built-in solutions. I tried the following:
1) the DB audit feature for select statements;
2) adding a trigger to update a date column whenever a select query is executed on the table.
Both were discarded: auditing uses up a lot of space and has a performance hit, and the trigger similarly had a performance hit.
Finally I resolved the issue by maintaining a separate table into which entries older than 5 years that are still used or selected in a query are inserted. When deleting, I cross-check this table and avoid deleting entries present in it.
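A minimal sketch of that purge, assuming the main table is called MYTABLE with a GUID key, the "still in use" keys are kept in a table called STILL_USED, and CREATED_DATE is a hypothetical column standing in for however "older than 5 years" is determined:
-- Hypothetical names: MYTABLE (the 1M+ row table), STILL_USED (keys selected
-- within the last 5 years that must be kept), CREATED_DATE (age column).
DELETE FROM mytable t
WHERE  t.created_date < ADD_MONTHS(SYSDATE, -60)
  AND  NOT EXISTS (SELECT 1
                   FROM   still_used s
                   WHERE  s.guid_id = t.guid_id);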

SSIS - Error Output - Redirect row

I've got a question about the result I'm getting from the execution of a task in SSIS.
First of all, this query has been executed from Access. The original source is a set of tables in Oracle and the destination is a local table in Access. This table has a composite primary key. When I execute the query from Access I get over one million records as a result, but before inserting this result into the table, Access shows me a message informing me that 26 records violate the primary key constraint (they are repeated), so they are not taken into account.
I have created the destination table in SQL Server with the same primary key and I am using the same source used in Access (the same query), but when the data flow begins to work, more than 200,000 records are immediately redirected to the error output. And, of course, I was expecting the same result as in Access, with only 26 records treated as errors.
These are the messages from Access:
This is my SSIS configuration, and its result:
I have tried to explain this as clearly as possible, but English is not my mother tongue.
If you need me to clarify anything, please ask.
Regards.
I'll make the assumption that you're using the default configuration for the OLE DB Destination. This means that Rows per batch is empty (-1) and Maximum insert commit size is 2147483647.
Rows per batch
Specify the number of rows in a batch. The default value of this property is –1, which indicates that no value has been assigned.
Maximum insert commit size
Specify the batch size that the OLE DB destination tries to commit during fast load operations. The value of 0 indicates that all data is committed in a single batch after all rows have been processed.
If the rows are offered to the OLE DB Destination in batches of 200,000, all those rows will be inserted in one batch/transaction. If the batch contains a single error, the whole batch will fail.
Changing Rows per batch to 1 will solve this problem, but it will have a performance impact since each row has to be inserted separately.
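To see how many source rows actually collide on the composite key before SSIS ever touches them, a grouping query along these lines can be run against the Oracle source (key_col1/key_col2 and source_table are placeholders; substitute the actual source query as the inline view):
-- Hypothetical composite key (key_col1, key_col2): list key values that occur
-- more than once in the source result set.
SELECT key_col1,
       key_col2,
       COUNT(*) AS occurrences
FROM (
    SELECT key_col1, key_col2 FROM source_table
) q
GROUP BY key_col1, key_col2
HAVING COUNT(*) > 1;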

Copy data from one column to another in oracle table

My current project for a client requires me to work with Oracle databases (11g). Most of my previous database experience is with MSSQL Server, Access, and MySQL. I've recently run into an issue that seems incredibly strange to me and I was hoping someone could provide some clarity.
I was looking to do a statement like the following:
update MYTABLE set COLUMN_A = COLUMN_B;
MYTABLE has about 13 million rows.
The source column is indexed (COLUMN_B), but the destination column is not (COLUMN_A)
The primary key field is a GUID.
This runs for 4 hours but never seems to complete.
I spoke with a former developer who was more familiar with Oracle than I am, and they told me you would normally create a procedure that breaks this down into chunks of data to be committed (roughly 1000 records or so). This procedure would iterate over the 13 million records, commit 1000 records, then commit the next 1000, and so on, normally breaking the data up based on the primary key.
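A minimal sketch of that chunked approach (table and column names are from the example above, the batch size is arbitrary, and this is not the exact procedure the developer described):
-- Illustrative only: copy COLUMN_B into COLUMN_A roughly 1000 rows at a time,
-- committing after each chunk so the undo generated per transaction stays small.
BEGIN
  LOOP
    UPDATE mytable
       SET column_a = column_b
     WHERE DECODE(column_a, column_b, 0, 1) = 1  -- NULL-safe "not yet copied"
       AND ROWNUM <= 1000;

    EXIT WHEN SQL%ROWCOUNT = 0;  -- nothing left to copy
    COMMIT;                      -- commit the chunk just updated
  END LOOP;
  COMMIT;
END;
/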
This sounds somewhat silly to me coming from my experience with other database systems. I'm not joining another table or linking to another database; I'm simply copying data from one column to another. I don't consider 13 million records to be large, considering there are systems out there with billions of records. I can't imagine it takes a computer hours and hours (only to fail) to copy a simple column of data in a table that as a whole takes up less than 1 GB of storage.
In experimenting with alternative ways of accomplishing what I want, I tried the following:
create table MYTABLE_2 as (SELECT COLUMN_B, COLUMN_B as COLUMN_A from MYTABLE);
This took less than 2 minutes to accomplish the exact same end result (minus dropping the first table and renaming the new table).
Why does the UPDATE, which simply copies one column into another column, run for 4 hours and fail, while the CREATE TABLE, which copies the entire table, takes less than 2 minutes?
And are there any best practices or common approaches used to do this sort of change? Thanks for your help!
It does seem strange to me. However, this comes to mind:
When you are updating the table, transaction log (rollback) information must be written in case a rollback is needed. When creating a table, that isn't necessary.
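For completeness, a sketch of the drop-and-rename variant the asker alludes to; guid_id is a placeholder for the real primary key column, and in practice every other column, constraint, index, and grant would have to be carried over as well:
-- Sketch of the CTAS swap (names from the question; re-create constraints,
-- indexes, and grants on the new table before or after the rename).
CREATE TABLE mytable_new AS
  SELECT guid_id,
         column_b,
         column_b AS column_a   -- the "copied" column
  FROM   mytable;

DROP TABLE mytable;
ALTER TABLE mytable_new RENAME TO mytable;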
