In SQL Server 2016 I have a relational dimension table that has a field set to varchar(MAX). Some of the data in that field is over 2k characters. When this data is processed by SSAS the field is truncated. It seems to be truncating at 2,050. I have searched the XML for the whole cube to see if I can find 2050 (or 2,050) but it doesn't show up.
In the Data Source View the field length is -1. My understanding is that this means unlimited. In the dimension definition the field is WChar and the DataSize is 50,000.
I can't for the life of me find why this field is being truncated. Where else can I look?
UPDATE: The issue was with Excel. When we view this data using Power BI the field is not truncated, so the data in SSAS is fine.
I have faced this issue while importing an Excel file with a field containing more than 255 characters, and I solved it using Python.
Simply import the Excel file into a pandas DataFrame and calculate the length of each of those string values per row.
Then sort the DataFrame in descending order of that length. This lets SSIS allocate the maximum space for that field, because it only scans the first 8 rows when deciding how much storage to allocate:
import pandas as pd
from pandas import ExcelWriter

# f is the path of the Excel file being cleaned
df = pd.read_excel(f, sheet_name=0, skiprows=1)
df = df.drop(df.columns[[0]], axis=1)  # drop the unused first column
df['length'] = df['Item Description'].str.len()  # length of the long text field
df.sort_values('length', ascending=False, inplace=True)  # longest rows first
writer = ExcelWriter('Clean/Cleaned_' + f[5:])
df.to_excel(writer, sheet_name='Billing', index=False)
writer.save()
I want to know whether there is any restriction on the size of a PySpark DataFrame column.
When I read a JSON file into a DataFrame using PySpark with
df = spark.read.option('multiline', True).json('path')
display(df)
it throws an error and the tasks fail during execution.
However, if I read the same file as plain text using
df = spark.read.option('multiline', True).text('path')
it is able to read the data.
My JSON contains a field that holds the entire 2 GB of data; this field is a nested array 3 to 4 levels deep. Any help is appreciated.
As suggested by @Lamanus, transform the JSON so that the large nested array is split into several rows, as in the sketch below.
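A minimal sketch of that idea, assuming the oversized field is an array column called payload (a placeholder name, since your real schema isn't shown here):
from pyspark.sql import functions as F

# read the multiline JSON as in the question
df = spark.read.option('multiline', True).json('path')

# explode() turns each element of the 'payload' array into its own row,
# so no single row has to carry the whole 2 GB value
exploded = df.select(F.explode('payload').alias('item'))

# if the elements are structs, flatten them one level (repeat for deeper levels)
flattened = exploded.select('item.*')
display(flattened)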
Also try this alternative approach with Delta Lake. Per the official documentation, Delta Lake is well suited to optimizing large datasets and to processing large JSON files in Azure Databricks.
I reproduced the same scenario in my environment and it read the file successfully.
# read the multiline JSON file
df = spark.read.option('multiline', True).json('dbfs:/FileStore/data__13_.json')
# write the DataFrame into a Delta Lake table
df.write.format("delta").mode("overwrite").save("/mnt/delta/sampl1")
# read it back from the Delta location and display it
sd1 = spark.read.format("delta").load("/mnt/delta/sampl1")
display(sd1)
While migrating data from SQL Server nvarchar(max) to Snowflake VARCHAR(16777216) I am getting the issue below; the error is thrown for only one record. I appreciate any help on this.
" Max LOB size (16777216) exceeded, actual size of parsed column is 62252375 File 'XXX.csv', line 24190978, character 62238390 Row 24190977, column "TRANSIENT_STAGE_TABLE"[notes:4] "
This is a hard limit:
https://docs.snowflake.com/en/sql-reference/data-types-text.html#data-types-for-text-strings
You may try to split the data into smaller columns while exporting it from MS SQL Server.
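For example, a small pandas sketch of that kind of split before the CSV export; the column names, chunk size, and sample data below are placeholders, not part of the original setup:
import pandas as pd

# placeholder data; in practice df would come from the SQL Server export
df = pd.DataFrame({'id': [1], 'notes': ['x' * 20_000_000]})

CHUNK = 8_000_000  # characters per piece, kept well below the 16,777,216-byte limit

def split_text(value, size=CHUNK):
    # break one long string into a list of smaller strings
    value = '' if value is None else str(value)
    return [value[i:i + size] for i in range(0, max(len(value), 1), size)]

pieces = df['notes'].apply(split_text)
max_parts = int(pieces.map(len).max())

# one column per chunk: notes_1, notes_2, ... each small enough for one VARCHAR
for n in range(max_parts):
    df['notes_' + str(n + 1)] = pieces.map(lambda p, n=n: p[n] if n < len(p) else None)

df.drop(columns=['notes']).to_csv('notes_split.csv', index=False)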
I am setting up a SQL Azure database and need to write data into it on a daily basis. I am using 64-bit R version 3.3.3 on Windows 10. Some of the columns contain text of more than 4,000 characters. Initially, I imported some data from a CSV into the SQL Azure database using Microsoft SQL Server Management Studio. I set the text columns to ntext, because when I tried nvarchar the maximum was 4,000 and some values got truncated even though they were only about 1,100 characters long.
In order to append to the database, I first save the records in a temp table where I have predefined the varTypes:
# every column except the first is saved as NTEXT
varTypesNewFile <- c("Numeric", rep("NTEXT", ncol(newFileToAppend) - 1))
names(varTypesNewFile) <- names(newFileToAppend)
sqlSave(dbhandle, newFileToAppend, "newFileToAppendTmp", rownames = F, varTypes = varTypesNewFile, safer = F)
and then append them by using:
insert into mainTable select * from newFileToAppendTmp
If the text is not too long, the above works. However, I sometimes get the following error during the sqlSave command:
Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
'Calloc' could not allocate memory (1073741824 of 1 bytes)
My questions are:
How can I counter this issue?
Is this the format I should be using?
Additionally, even when the above works, it takes about an hour to upload roughly 5,000 records. Isn't that too long? Is this the normal amount of time it should take? If not, what could I do better?
RODBC is very old, and can be a bit flaky with NVARCHAR columns. Try using the RSQLServer package instead, which offers an alternative means to connect to SQL Server (and also provides a dplyr backend).
I am copying data from an Excel sheet to SQL Server tables.
In some of the sheets the data is larger than the column sizes defined in the table's schema in SQL.
For example, a table column has the data type nvarchar(50), whereas my Excel sheet has data of more than 50 characters in some of the cells.
Now while copying, the rows that contain such data are not being inserted into the database. Instead, I would like to insert those rows by truncating the extra characters. How do I do this?
You can use Java's substring method combined with a check on the length of the string, something like:
row1.foobar.length() > 50 ? row1.foobar.substring(0,50) : row1.foobar
This uses Java's String length method to test whether the string is longer than 50 characters. If it is, it uses the substring method to take the characters between 0 and 50 (i.e. the first 50 characters); if it's not, it returns the whole string.
If you pop this in a tMap or a tJavaRow then you should be able to limit strings to 50 characters (or whatever length you want, with some tweaking).
If you'd prefer to remove any rows not compliant with your database schema then you should define your job's schema to match the database schema and then use a tSchemaComplianceCheck component to filter out the rows that don't match that schema.
Hi all,
I am using SQL Server Express to store some data, but it also stores trailing spaces with the data. For example, if I have an nchar(20) column in a table and I store "computer" (8 characters) in it, the remaining characters (20 - 8 = 12) are filled with blank spaces. Is there any way to overcome this problem? When I show this data in a flow document (center alignment), it produces an alignment error.
Thanks for help
You can use the NVARCHAR data type instead. The NVARCHAR type is a variable length data type and will only store the actual data.
If you don't have control over the data types, then you'll need to trim off the extra characters manually. In T-SQL you can do this with the RTRIM function.
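As a rough illustration only, here is RTRIM applied while reading the padded NCHAR column, issued from Python via pyodbc; the connection string, table, and column names are made-up placeholders, not part of the original question:
import pyodbc

# placeholder connection string for a local SQL Server Express instance
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=.\\SQLEXPRESS;DATABASE=MyDb;Trusted_Connection=yes;')
cursor = conn.cursor()

# RTRIM removes the trailing spaces that NCHAR(20) pads onto values like 'computer'
cursor.execute('SELECT RTRIM(MachineName) FROM dbo.Machines')
for (name,) in cursor.fetchall():
    print(repr(name))  # 'computer' instead of 'computer            '

conn.close()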