I am copying data from an Excel sheet to SQL Server tables.
In some of the sheets the data is larger than the table's schema in SQL allows, i.e. a table column has the data type nvarchar(50), whereas my Excel sheet has more than 50 characters in some of the cells.
Now while copying, the rows that contain such data are not being inserted into the database. Instead, I would like to insert those rows by truncating the extra characters. How do I do this?
You can use Java's substring method with a check on the length of the string, something like:
row1.foobar.length() > 50 ? row1.foobar.substring(0,50) : row1.foobar
This uses Java's String length method to test whether the value is longer than 50 characters. If it is, it uses the substring method to take the characters between index 0 and 50 (so the first 50 characters); if not, it returns the whole string.
If you pop this into a tMap or a tJavaRow component, you should be able to limit strings to 50 characters (or whatever length you need, with some tweaking).
If you'd prefer to remove any rows not compliant with your database schema then you should define your job's schema to match the database schema and then use a tSchemaComplianceCheck component to filter out the rows that don't match that schema.
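If you'd rather truncate on the database side instead of in the job, the same idea can be expressed in T-SQL with the LEFT function. A minimal sketch, assuming hypothetical staging and target tables (dbo.StagingData, dbo.TargetTable) with an nvarchar(50) target column:

-- Table and column names are placeholders for illustration only
INSERT INTO dbo.TargetTable (foobar)
SELECT LEFT(s.foobar, 50)   -- keep at most the first 50 characters
FROM dbo.StagingData AS s;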
I'm trying to insert some values that contain Russian characters from SSIS.
Say I have MYDATA.TXT (comma separated) with data as below:
REM,DES,ID
FR1,Головка,8
GY2,6-гр,9
MO0,Болт,2
The 1st row is the column headers. I'm using this in a Flat File source. After executing the task, the values in my table are different, something like Головка.
After some research I found that I have to use N before the text and the column should be NVARCHAR, but I'm not sure how to do this in SSIS. I have around 1.2 million records; should I prefix the values with N for all rows, or is there another way in SSIS?
Try changing the code page to 65001 (UTF-8).
In the Advanced tab, change the data type of the DES column to Unicode string [DT_WSTR].
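To see why the N prefix and the NVARCHAR type matter, here is a small T-SQL repro; the temporary table is hypothetical and only for illustration:

-- Under a Latin collation, VARCHAR cannot hold Cyrillic and degrades it to '?',
-- while NVARCHAR with an N'' literal preserves it
CREATE TABLE #Demo (DesAnsi VARCHAR(50), DesUnicode NVARCHAR(50));
INSERT INTO #Demo VALUES ('Головка', N'Головка');
SELECT DesAnsi, DesUnicode FROM #Demo;   -- DesAnsi: '???????', DesUnicode: 'Головка'
DROP TABLE #Demo;

In SSIS you don't need to prefix each of the 1.2 million values with N; marking the column as DT_WSTR (with an NVARCHAR target column) has the same effect for every row.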
In SQL Server 2016 I have a relational dimension table with a field set to varchar(MAX). Some of the data in that field is over 2,000 characters. When this data is processed by SSAS, the field is truncated; it seems to truncate at 2,050 characters. I have searched the XML for the whole cube for 2050 (or 2,050), but it doesn't show up.
In the Data Source View the field length is -1, which as I understand it means unlimited. In the dimension definition the field is WChar and the DataSize is 50,000.
I can't for the life of me find why this field is being truncated. Where else can I look?
UPDATE: The issue was with Excel. When we view this data using PowerBI the field is not truncated. So the data in SSAS is fine.
I faced this issue while importing an Excel file with a field containing more than 255 characters, and solved it using Python.
Simply import the Excel file into a pandas DataFrame and calculate the length of each of those string values per row.
Then sort the DataFrame in descending order of that length. This lets SSIS allocate the maximum space for that field, since it scans only the first 8 rows to determine storage:
import pandas as pd
from pandas import ExcelWriter

# f holds the path of the Excel file being cleaned
df = pd.read_excel(f, sheet_name=0, skiprows=1)
df = df.drop(df.columns[[0]], axis=1)              # drop the first (unused) column
df['length'] = df['Item Description'].str.len()    # character count per row
df.sort_values('length', ascending=False, inplace=True)  # longest values first
writer = ExcelWriter('Clean/Cleaned_' + f[5:])
df.to_excel(writer, sheet_name='Billing', index=False)
writer.save()  # on pandas >= 2.0, use writer.close() instead
Can someone tell me how to write SQL Server code that looks in a varchar text column to see whether it contains a number in a given range within the text?
For example, I'm looking for values that contain any number between 100000 and 999999. The column may have a value like
this field contains a number `567391`
so I want to select that one, but not if it had
this field contains a number `5391`
For your given example, you can check the digits:
where col like '%[^0-9][1-9][0-9][0-9][0-9][0-9][0-9][^0-9]%'
This is not a generic solution, but it works for your example. In general, parsing strings in SQL Server is difficult. It is better to extract the values you are interested in when loading the data, so the relevant values end up in their own columns.
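As a runnable sketch of that pattern against hypothetical sample data; padding the column with spaces also covers a number sitting at the very start or end of the string, which the raw pattern would otherwise miss:

CREATE TABLE #T (col VARCHAR(200));
INSERT INTO #T VALUES
    ('this field contains a number 567391'),   -- 6 digits: should match
    ('this field contains a number 5391');     -- 4 digits: should not match

SELECT col
FROM #T
WHERE ' ' + col + ' ' LIKE '%[^0-9][1-9][0-9][0-9][0-9][0-9][0-9][^0-9]%';
DROP TABLE #T;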
In the Data Flow Task of an SSIS package I have a Flat File source. The text in this file is encoded as UTF-8.
When I use a Lookup task to check a dimension table in the database, I get no match for text with special characters. The Lookup task finds matches for many rows, but not for those with characters like "ü, ä, ö...". This is not correct, because those rows are in the table, so they should match.
What I've done so far is use a conversion task to convert the string from UTF-8 to Windows ANSI (code page 1252), with the exact same string length as the column in the dimension table. The column in the dimension table has the collation "Latin1_General_CI_AS". I've also set the DefaultCodepage property of the Lookup task to 1252.
Does anyone know what I've missed?
Thanks.
Hi all,
I am using SQL Server Express to store some data, but it also stores spaces with the data. For example, if I have an nchar(20) column in a table and I store "computer" (8 characters) in this column, the remaining characters (20 - 8 = 12) are filled with blank spaces. Is there any way to overcome this? When I show this data in a flow document (center alignment), the padding produces alignment errors.
Thanks for the help.
You can use the NVARCHAR data type instead. NVARCHAR is a variable-length data type and will only store the actual data.
If you don't have control over the data types, you'll need to trim off the extra spaces manually. In T-SQL you can do this with the RTRIM function.
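A quick sketch of both points, using a local variable to stand in for an nchar(20) column:

-- nchar(20) always stores 20 characters; RTRIM strips the trailing pad spaces
DECLARE @fixed NCHAR(20) = N'computer';
SELECT
    DATALENGTH(@fixed) AS bytes_stored,   -- 40 bytes: 20 characters x 2
    LEN(@fixed)        AS logical_len,    -- 8: LEN ignores trailing spaces
    RTRIM(@fixed)      AS trimmed;        -- 'computer' without the padding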