Pentaho 'Data truncation' error when inserting into SQL Server - sql-server

I'm developing a Pentaho job to get data from BigQuery and insert it into SQL Server. The job is quite simple, as you can see below, but during the insert into the SQL Server table a 'Data truncation' error is thrown. I checked the max length for this column: it is just 64, while in the database the column is nvarchar(500). To see what the failing records look like, I log them to a text file, also shown below. I've spent 3 days on this problem and still have no answer. Please do guide me.
What I have done so far:
A String cut step to take a substring
A String operations step to trim
A LEFT() function in the SELECT statement
REGEXP_REPLACE(uuid, ' ', '') in the SELECT statement to remove spaces
Everything I have tried produces the same error.
Pentaho job
Error records in text file

I have solved this problem. It was my silly mistake: I just recreated the table with a larger length for that column.
My case
post_name nvarchar(50) -> nvarchar(150)
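Recreating the table isn't strictly necessary; the column can be widened in place. A sketch, assuming a table named dbo.posts (the real table name isn't given above):
-- dbo.posts is a placeholder; restate your column's actual NULL-ability.
ALTER TABLE dbo.posts ALTER COLUMN post_name NVARCHAR(150) NULL;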

Related

How does SQL Server handle failed query to linked server?

I have a stored procedure that relies on a query to a linked server.
This stored procedure is roughly structured as follows:
-- Create local table var to stop query from needing round trips to linked server
DECLARE @duplicates TABLE (eid NVARCHAR(6))
INSERT INTO @duplicates(eid)
SELECT eid FROM [linked_server].[linked_database].[dbo].[linked_table]
WHERE es = 'String'
-- Update on my server using data from linked server
UPDATE [my_server].[my_database].[dbo].[my_table]
SET
-- Many things, including
[status] = CASE
WHEN
eid IN (
SELECT eid FROM @duplicates
)
THEN 'String'
ELSE es
END
FROM [my_server].[another_database].[dbo].[view]
-- This view obscures sensitive information and shows only the data that I have permission to see
-- Many other things
The query itself is much more complex, but the key idea is building this temporary table from a linked server (because it takes the query 5 minutes to run if I don't, versus 3 seconds if I do).
I've recently had an issue where I ended up with updates to my table that failed to get checked against the linked server for duplicate information.
The logical chain of events is this:
Get all of the data from the original view. The original view contains maybe 3000 records, of which maybe 30 are duplicates of the entity in question, but with 1 field having a different value.
I then have to grab data from a different server to know which of the duplicates is the correct one.
When the stored procedure runs, it updates each record.
ERROR STEP - when the stored procedure hits a duplicate record, it updates my_table again - so es gets changed multiple times in a row.
The temp table was added after the fact when we realized incorrect es values were being introduced to my_table.
my_database does not contain the data needed to determine which is the correct tuple, hence the requirement for the linked server.
As far as I can tell, we had a temporary network interruption or a connection timeout that stopped my_server from getting the response back from linked_server, and it just passed an empty table to the rest of the procedure.
So, my question is - how can I guard against this happening?
I can't just check if the table is empty, because it could legitimately be empty. I need to definitively know if that initial SELECT from linked_server failed, if it timed out, or if it intentionally returned nothing.
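One pattern that might help: wrap the remote SELECT in TRY/CATCH and set an explicit success flag, so an empty-but-successful fetch can be told apart from a failed one. A minimal sketch (the @fetchSucceeded flag is hypothetical, object names are taken from the question, and some severe connection errors abort the batch before CATCH can run, so test it against your actual failure mode):
DECLARE @duplicates TABLE (eid NVARCHAR(6));
DECLARE @fetchSucceeded BIT = 0;  -- hypothetical flag, not part of the original procedure

BEGIN TRY
    INSERT INTO @duplicates (eid)
    SELECT eid
    FROM [linked_server].[linked_database].[dbo].[linked_table]
    WHERE es = 'String';

    SET @fetchSucceeded = 1;  -- reached only if the remote SELECT completed
END TRY
BEGIN CATCH
    THROW;  -- surface the failure instead of silently continuing with an empty table
END CATCH;

IF @fetchSucceeded = 1
BEGIN
    -- The UPDATE that depends on @duplicates goes here.
    PRINT 'Remote fetch completed; safe to update.';
END;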
Without knowing the definition of the table you're querying, you could get into an issue where your data is too long and you get a truncation error on your table.
Better make sure and substring it...
DECLARE @duplicates TABLE (eid NVARCHAR(6))
INSERT INTO @duplicates(eid)
SELECT SUBSTRING(eid,1,6) FROM [linked_server].[linked_database].[dbo].[linked_table]
WHERE es = 'String'
-- Update on my server using data from linked server
UPDATE [my_server].[my_database].[dbo].[my_table]
SET
-- Many things, including
[status] = CASE
WHEN
eid IN (
SELECT eid FROM @duplicates
)
THEN 'String'
ELSE es
END
FROM [my_server].[another_database].[dbo].[view]
I had a similar problem where I needed to move data between servers and could not use a network connection, so I ended up doing a BCP out and a BCP in. This is fast and clean, and it takes away the complexity of user authentication, drivers, and trust domains. It's also repeatable and can be used for incremental loading.
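For reference, a minimal bcp round trip might look like the following. The file path and server names are assumptions; -T uses a trusted connection and -n keeps the data in SQL Server's native format, so no field or row terminators are needed:
bcp linked_database.dbo.linked_table out C:\temp\linked_table.dat -S source_server -T -n
bcp my_database.dbo.my_table in C:\temp\linked_table.dat -S my_server -T -n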

SSIS package breaks accented characters on SELECT

I have a SQL Server SSIS package that inserts data into a table according to data in another table. The thing is, this job breaks the accents of the varchar data in the process. I suppose it has something to do with encoding.
Simplified, my package does the following:
Select data with an OLE DB Source through SQL command data access mode
SELECT id, name, lastname
FROM Client
<WHERE id = 1>
Insert the selected data with an OLE DB Destination through mapping to ClientCopy table.
I noticed that previewing the data in the first step was already returning the broken accents. Because of this, the data inserted in ClientCopy obviously has those broken accents.
id name lastname
1 Andr‚ BriŠre
This same query when executed in SQL Server returns the data correctly so I'm a bit lost right now.
id name lastname
1 André Brière
Thanks for helping me out!
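One quick way to test the encoding suspicion (a sketch against the Client table from the question) is to inspect the raw bytes behind the displayed characters: 0xE9/0xE8 for é/è means proper Windows-1252 data, while 0x82/0x8A means the text was converted through an OEM code page such as CP850, which is exactly what renders 'André' as 'Andr‚' and 'Brière' as 'BriŠre':
-- Compare the stored bytes with what the preview displays.
SELECT name,
       lastname,
       CAST(name AS VARBINARY(64)) AS name_bytes,
       CAST(lastname AS VARBINARY(64)) AS lastname_bytes
FROM Client
WHERE id = 1;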

Dynamic SQL Insert: Column name or number of supplied values does not match table definition

I've encountered some strange behavior with a dynamic SQL query.
In a stored procedure I construct an insert query string out of multiple strings, because of the length restriction on a single nvarchar. I execute the insert query in the SP like this:
EXEC(@QuerySelectPT+@QueryFromPT+@QueryFromPT)
If I print each part of the query, put the parts together, and execute them manually in Management Studio, the query works fine and inserts the data. But if I execute the query via EXEC() in the stored procedure, I get a
Column name or number of supplied values does not match table definition.
error message.
I did multiple checks on the number and spelling of columns in my query and in my insert table, but I have not found any differences so far.
Any advice?
Your column count for the INSERT differs from the column count of the SELECT. Print the statement before the EXEC and find the error.
A shot in the dark, but since you say the queries are valid and the manually assembled query works, the issue could be caused by string truncation.
Could you try:
EXEC(CAST(@QuerySelectPT AS VARCHAR(MAX))+@QueryFromPT+@QueryFromPT);
Also, as Management Studio's Messages tab and SELECT output are limited to 4000 characters (I think), you can test whether the whole query is assembled correctly like this:
SELECT CAST(@QuerySelectPT+@QueryFromPT+@QueryFromPT AS XML)
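If truncation is the culprit, it's easy to reproduce in isolation. A minimal sketch (variable names are illustrative only):
DECLARE @a NVARCHAR(4000) = REPLICATE(N'x', 4000);
DECLARE @b NVARCHAR(4000) = N'y';
SELECT LEN(@a + @b) AS truncated_len;                   -- 4000: both operands are non-MAX, so the result is capped
SELECT LEN(CAST(@a AS NVARCHAR(MAX)) + @b) AS full_len; -- 4001: widening one operand first preserves the full string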

CASE Statement in SQL Server to get output I need

So a friend of mine told me to check the almost 5 million email addresses that got 'hacked' on Gmail.
I downloaded the file, a text file with almost 5 million emails (4,804,288). I figured I'd just open it in my text editor and Ctrl+F my email address. Well, it took forever just to open the .txt document, and then it crashed. I then exported it to Excel, but Excel has a limit of a bit over 1 million rows. Since I'm studying SQL, I figured I'd just load it into SQL Server and query it with a stored procedure. It should be cool.
So.. what did I do?
Created a table called 5Mil.
And bulk inserted the info from the .txt file:
BULK INSERT [dbo].[5Mil]
FROM 'C:\list\google.txt'
WITH (fieldterminator = ',', rowterminator = '@gmail.com')
GO
First question: since the txt file had one email per line without a ',' at the end, the only way I could load the info was with rowterminator = '@gmail.com', which truncated '@gmail.com' and left only the username part of each email.
Maybe someone can help me understand how to import the information including the '@gmail.com'.
I was able to import the email addresses, one per row; total rows: 4804288.
So far so good.
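As for keeping the '@gmail.com' part: since each line holds a single value, one approach (a sketch, untested against this exact file) is to drop the field terminator and end rows at the line feed instead, so the whole line lands in the single Emails column. If the file uses Windows line endings you may need to trim a trailing carriage return afterwards:
BULK INSERT [dbo].[5Mil]
FROM 'C:\list\google.txt'
WITH (rowterminator = '0x0A')
GO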
I am currently learning CTEs, so I figured I'd apply one to my stored procedure.
This is what I did.
CREATE PROC googlemails
@email VARCHAR(MAX)
AS
WITH CTE AS
(
SELECT Emails
FROM dbo.[5Mil]
WHERE Emails LIKE '%'+@email+'%'
)
SELECT
CASE
WHEN Emails IS NOT NULL
THEN Emails
ELSE 'you are safe'
END AS 'Google Email'
FROM CTE
When I run the procedure and it finds emails, it lists them properly. But when I enter an email address that's not in the list, I get
Google Email
and nothing else: the CTE returns no rows, so the CASE expression is never evaluated. What I want is to show 'you are safe', letting the user know that the email was not part of the 5 million 'hacked'.
What would be the proper way to use the CASE statement here? Or, as always, other ways of accomplishing this task, for learning purposes.
Thank you.
SELECT CASE WHEN EXISTS (SELECT 1 FROM [5Mil] WHERE Emails LIKE '%'+@email+'%')
THEN 'hacked'
ELSE 'not hacked' END
A CTE is not appropriate here; CTEs are mostly for simplifying large queries and for recursive queries.
As I understand it, you now have a table called 5Mil with one column called Email, each row holding a username without the "@gmail.com". It would be unnecessarily redundant to append "@gmail.com" to each one.
Soooo.... why not just search for the username part? Using LIKE '%email%' is slow and inefficient. It's much simpler to do this in your stored procedure (similar to dudNumber4's answer, but without LIKE):
--SET @Email = 'someusername' -- note no "@gmail.com"
IF EXISTS (SELECT 1 FROM [5Mil] WHERE Email = @Email)
SELECT 'Hacked'
ELSE
SELECT 'Not hacked'
If you create an index on 5Mil.Email, the search will be almost instant.
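For example (a sketch; the index name is made up, the table and column names are the ones used above):
CREATE NONCLUSTERED INDEX IX_5Mil_Email ON [5Mil] (Email);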

Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)"

I'm trying to load my database with tons of data from a .csv file sized 1.4 GB, but when I run my code I get errors.
Here's my code:
USE [Intradata NYSE]
GO
CREATE TABLE CSVTest1
(Ticker varchar(10) NULL,
dateval date NULL,
timevale time(0) NULL,
Openval varchar(10) NULL,
Highval varchar(10) NULL,
Lowval varchar(10) NULL,
Closeval varchar(10) NULL,
Volume varchar(10) NULL
)
GO
BULK
INSERT CSVTest1
FROM 'c:\intramerge.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
--Check the content of the table.
SELECT *
FROM CSVTest1
GO
--Drop the table to clean up database.
DROP TABLE CSVTest1
GO
I'm trying to build a database with lots of stock quotes, but I get this error message:
Msg 4832, Level 16, State 1, Line 2
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 2
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 2
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
I do not understand much SQL, but I hope to catch a thing or two. I hope someone sees what might be very obvious.
Resurrecting an old question, but in case this helps someone else: after much trial and error I was finally (finally!) able to get rid of this error by changing this:
ROWTERMINATOR = '\n'
to this:
ROWTERMINATOR = '0x0A'
(When '\n' is specified, BULK INSERT actually looks for the pair '\r\n', so a file that ends its lines with a bare line feed never matches; '0x0A' targets the line-feed byte itself.)
I had the same issue.
Solution:
Inspect the CSV or text file in a text editor like Notepad++. The last line might be incomplete; remove it.
I got the same error when I had a different number of delimited fields in my CSV than columns in my table. Check whether you have the right number of fields in intramerge.csv.
Methods to determine which rows have issues:
Open the CSV in a spreadsheet, add a filter to all columns, and look for empty values; those are the rows with fewer columns.
Use https://csvlint.com to create your validation rules; it can detect the problems in your CSV as well.
This is my solution: just give up.
I always end up using SSMS and [ Tasks > Import Data ].
I have never managed to get a real-world .csv file to import using this method. It is an utterly useless function that only works on pristine datasets that don't exist in the real world. Perhaps I've never had any luck because the datasets I deal with are quite messy and generated by third parties.
And if it goes wrong, it doesn't give any clue as to why. Microsoft, you sadden me with your utter incompetence in this area.
Perhaps add some error messages, so it says why it rejected the file? Which line did it fail on? Which column did it fail on? It's almost impossible to fix the issue if you don't know why it failed!
This is an old question, but it seems my finding may enlighten some other people having a similar issue.
The default SSIS timeout value appears to be 30 seconds, so any service-bound or IO-bound operation in your package that runs well beyond that value causes a timeout. Increasing the timeout value (change it to "0" for no timeout) resolves the issue.
I got this error when my format file (i.e. the one specified using the FORMATFILE param) had a column width smaller than the actual column size (e.g. varchar(50) instead of varchar(100)).
I got this exception when the char field in my SQL table was too small for the incoming text. Try making the column bigger.
This might be a bad idea with a full 1.5GB, but you can try it on a subset (start with a few rows):
CREATE TABLE CSVTest1
(Ticker varchar(MAX) NULL,
dateval varchar(MAX) NULL,
timevale varchar(MAX) NULL,
Openval varchar(MAX) NULL,
Highval varchar(MAX) NULL,
Lowval varchar(MAX) NULL,
Closeval varchar(MAX) NULL,
Volume varchar(MAX) NULL
)
... do your BULK INSERT, then
SELECT MAX(LEN(Ticker)),
MAX(LEN(dateval)),
MAX(LEN(timevale)),
MAX(LEN(Openval)),
MAX(LEN(Highval)),
MAX(LEN(Lowval)),
MAX(LEN(Closeval)),
MAX(LEN(Volume))
This will help tell you whether your column-size estimates are way off. You might also find your columns are out of order, or the BULK INSERT might still fail for some other reason.
I encountered a similar issue, but in this case the file being loaded contained some blank lines. Removing the blank lines solved it.
Alternatively, as the file was delimited, I added the correct number of delimiters to the blank lines, which again allowed the file to import successfully - use this option if the blank lines need to be loaded.
This can also happen if your file's columns are separated with ";" but you use "," as the FIELDTERMINATOR (or the other way around).
I just want to share my solution: the problem was the size of the table columns; use varchar(255) and all should work.
The bulk insert will not tell you whether the imported values "fit" the field format of the target table.
For example: I tried to import decimal values into a float field, but as the values all had a comma as the decimal point, SQL Server was unable to insert them into the table (it was expecting a point).
These unexpected results often happen when the provided CSV value is an export from an Excel file: your computer's regional settings decide which decimal separator is used when saving an Excel file as CSV, so CSVs provided by different people will produce different results.
Solution: import all fields as VARCHAR, and deal with the values afterwards.
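For instance, once everything has landed in VARCHAR staging columns, the comma decimals can be converted explicitly. A sketch with hypothetical staging names (price_raw, dbo.staging_table); TRY_CONVERT requires SQL Server 2012 or later:
-- Returns NULL instead of failing when a value still doesn't parse.
SELECT TRY_CONVERT(float, REPLACE(price_raw, ',', '.')) AS price
FROM dbo.staging_table;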
For anyone who happens to come across this post, my problem was a simple syntax oversight. I had this inline with some Python and brought it straight into SSMS:
BULK
INSERT access_log
FROM '[my path]'
WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n');
The problem, of course, was the double backslashes, which were needed for the way I had embedded this as a string in the Python script. Correcting them to '\t' and '\n' fixed it.
The same happened with me. It turned out to be due to duplicate column names; renaming the columns to be unique fixed it.
Look at your file: if there are any special characters or spaces at the end of it, remove them and try again.
I came across another potential reason: I got this error when my table column was an int but the values in the csv file contained commas. Changing the cells to plain number formatting let the data import.
In my case I was using a txt file to import data into SQL Server. All the columns matched and I couldn't find what was wrong. In the end it was an encoding problem.
Solution: use Notepad++ to change the file to the right encoding.
I was getting this error when I tried to pass NULL for int columns, even though those columns were nullable.
So I opened the csv file in an editor, replaced all NULL values with empty values, and it worked.
Before Data:
636,NULL,NULL,1,5,K0007,105,NULL,2023-02-15 11:27:11.563
After Data:
636,,,1,5,K0007,105,,2023-02-15 11:27:11.563
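A related note (a sketch with hypothetical file and table names): if the target columns have DEFAULT constraints, the KEEPNULLS option makes empty fields load as NULL instead of the column default:
BULK INSERT dbo.target_table
FROM 'C:\data\file.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0A', KEEPNULLS);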
