How to import a huge blob into a SQL Server database? - sql-server

I have a .csv file with one column of blob data (exported from Cassandra) that contains binary data. That data can be huge, much more than 8,000 bytes.
I tried setting the source and destination data types to DT_BYTES -> binary/varbinary in the SQL Server Import Wizard, but it failed with an error saying the data would be truncated.
How can I import such data?

You need to set the column type to varbinary(max), not plain varbinary, so that the column will accept more than 8,000 bytes. See the following Microsoft documentation:
varbinary [ ( n | max) ]
Variable-length binary data. n can be a value from 1 through 8,000.
max indicates that the maximum storage size is 2^31-1 bytes.
The storage size is the actual length of the data entered + 2 bytes.
The data that is entered can be 0 bytes in length.
The ANSI SQL synonym for varbinary is binary varying.
For Integration Services data types, see the following link. What you want is DT_IMAGE:
DT_IMAGE
A binary value with a maximum size of 2^31-1 (2,147,483,647) bytes.
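As a rough sketch of the destination side (the table and column names here are made-up examples, not from the question), the target column just needs to be varbinary(max), and DATALENGTH lets you verify after the import that nothing was truncated:
CREATE TABLE dbo.CassandraImport
(
    id int IDENTITY(1,1) PRIMARY KEY,
    payload varbinary(max)  -- accepts values far beyond the 8,000-byte limit
);
-- after the SSIS import, check the largest value actually stored
SELECT MAX(DATALENGTH(payload)) AS largest_blob_bytes FROM dbo.CassandraImport;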

Related

Varchar(MAX) field being truncated in SSAS cube

In SQL Server 2016 I have a relational dimension table that has a field set to varchar(MAX). Some of the data in that field is over 2k characters. When this data is processed by SSAS the field is truncated. It seems to be truncating at 2,050. I have searched the XML for the whole cube to see if I can find 2050 (or 2,050) but it doesn't show up.
In the Data Source View the field length is -1. My understanding is that this means unlimited. In the dimension definition the field is WChar and the DataSize is 50,000.
I can't for the life of me find why this field is being truncated. Where else can I look?
UPDATE: The issue was with Excel. When we view this data using Power BI the field is not truncated, so the data in SSAS is fine.
I faced this issue while importing an Excel file with a field containing more than 255 characters, and I solved it using Python.
Simply import the Excel file into a pandas data frame, then calculate the length of each of the string values per row.
Then sort the dataframe in descending order by that length. This enables SSIS to allocate maximum space for the field, since it scans the first 8 rows to decide on storage:
import pandas as pd
from pandas import ExcelWriter

f = 'Data/source.xlsx'  # path to the source workbook (example)
df = pd.read_excel(f, sheet_name=0, skiprows=1)
df = df.drop(df.columns[[0]], axis=1)                    # drop the first (unnamed) column
df['length'] = df['Item Description'].str.len()          # character count per row
df.sort_values('length', ascending=False, inplace=True)  # longest values first
writer = ExcelWriter('Clean/Cleaned_' + f[5:])
df.to_excel(writer, sheet_name='Billing', index=False)
writer.save()  # on newer pandas versions, use writer.close()

Length vs Precision

When I describe a table with an nvarchar data type, what's the difference between Precision and Length? I see that Length is always double the Precision. For example, for nvarchar(64), the precision is 64 and the length is 128.
CREATE TABLE T(X nvarchar(64))
EXEC sp_columns 'T'
Precision has no meaning for text data.
As for the Length property, I think you confuse it with the Size reported by SQL Server Management Studio, which is the size of a column in bytes. The Length of an nvarchar(64) column is 64 while Size is 128.
The size of unicode types (nchar, nvarchar) is double the number of characters because Unicode uses two bytes for each character.
You can get these values using the LEN function for the number of characters and the DATALENGTH function for the number of bytes, e.g.:
select len(N'some value'), datalength(N'some value')
which returns 10 and 20.
EDIT
From your comments I see you use sp_columns to get at the table's schema info. You shouldn't use the catalog stored procedures; use the catalog views instead.
As the documentation states, catalog stored procedures exist to support ODBC applications, so their results are limited and may need interpretation, as you found out. sp_columns doesn't differentiate between character length and data length, for example.
Schema views like those in the INFORMATION_SCHEMA or sys schemas return detailed and unambiguous information. For example, INFORMATION_SCHEMA.COLUMNS returns the character length in CHARACTER_MAXIMUM_LENGTH and the byte size in CHARACTER_OCTET_LENGTH. It also includes collation and character set information not returned by sp_columns.
The INFORMATION_SCHEMA views are defined by ISO, so they don't include some SQL Server-specific info such as whether a column is a computed column, stored in a filestream, or replicated. You can get that info from the system object catalog views like sys.columns.
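For example, running the following against the table T created above returns both measures side by side:
SELECT COLUMN_NAME,
       DATA_TYPE,
       CHARACTER_MAXIMUM_LENGTH,  -- 64 characters for nvarchar(64)
       CHARACTER_OCTET_LENGTH     -- 128 bytes (two bytes per character)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'T';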
NVARCHAR doesn't have precision. Precision is used for decimal types, and length is the character length.
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes. The ISO synonyms for nvarchar are national char varying and national character varying.
From the source:
Precision is the number of digits in a number.
Length for a numeric data type is the number of bytes that are used to store the number.
Length for a character string or Unicode data type is the number of characters.
NVARCHAR doesn't have precision, and the length would be the character length. Try the following SQL:
SELECT LEN('Whats the length of this string?')
You may be confusing this with a numeric or decimal type; see the chart here.
Three years later but still relevant: I had a similar issue.
I encountered a UDDT 'd_style'; using Alt+F1 I saw it is defined as 'nvarchar, length=2, prec=1'.
And indeed, I cannot insert e.g. 'EU' into this field, as the data would be truncated.
So length 2 does not mean 2 characters here. That is because each character is saved as 2 bytes, thus prec = length/2.
In this case, 'precision' is indeed the maximum number of characters allowed.
However, when you create a new table with the nvarchar datatype, you simply enter the desired length in characters (e.g. my_field nvarchar(2)) if you want to be able to insert the example value 'EU'.
Mind that this is all quite 'practical'; in theory, precision is only applicable to numeric values, where it is the number of digits in a number. So we shouldn't be talking about 'precision' for nvarchar at all.
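A quick sanity check of that last point (the table name is a made-up example):
CREATE TABLE dbo.StyleTest (my_field nvarchar(2));  -- length given in characters
INSERT dbo.StyleTest VALUES (N'EU');                -- fits: 2 characters
SELECT LEN(my_field) AS chars, DATALENGTH(my_field) AS bytes
FROM dbo.StyleTest;                                 -- returns 2 and 4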
Indeed, this is a confusing topic in SQL Server; even the documentation is inconsistent.
For example see the syntax definition of CREATE TABLE
<data type> ::=
[ type_schema_name . ] type_name
[ ( precision [ , scale ] | max |
[ { CONTENT | DOCUMENT } ] xml_schema_collection ) ]
You see "precision", but don't see "length | precision".
And as others pointed out, everywhere else in the documentation, they refer to the same attribute as length, when talking about character datatypes. Except where they don't, like the documentation of sp_help and companion.
Length int Column length in bytes.
Prec char(5) Column precision.
Scale char(5) Column scale.
Here length is the storage length. (Most people would call it size.)
So unfortunately there is no correct or final answer to your question: you must always consider the context, and consult the documentation when in doubt.

SQL Server 2008 R2 Varbinary Max Size

What is the max size of a file that I can insert using varbinary(max) in SQL Server 2008 R2? I tried to change the max length of the column to more than 8,000 bytes but it won't let me, so I'm guessing the max is 8,000 bytes; but from this article on MSDN, it says that the max storage size is 2^31-1 bytes:
varbinary [ ( n | max) ]
Variable-length binary data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of the data entered + 2 bytes. The data that is entered can be 0 bytes in length. The ANSI SQL synonym for varbinary is binary varying.
So how can I store larger files in a varbinary field? I'm not considering FILESTREAM, since the files I want to save are from 200 KB to 1 MB max. The code I'm using:
UPDATE [table]
SET file = ( SELECT * FROM OPENROWSET ( BULK 'C:\A directory\A file.ext', SINGLE_BLOB) alias)
WHERE idRow = 1
I have been able to execute that code successfully for files of 8,000 bytes or less; with a file of 8,001 bytes it fails. The table has a column called "file" of type varbinary(8000), which as I said, I can't change to a bigger value.
I cannot reproduce this scenario. I tried the following:
USE tempdb;
GO
CREATE TABLE dbo.blob(col VARBINARY(MAX));
INSERT dbo.blob(col) SELECT NULL;
UPDATE dbo.blob
SET col = (SELECT BulkColumn
FROM OPENROWSET( BULK 'C:\Folder\File.docx', SINGLE_BLOB) alias
);
SELECT DATALENGTH(col) FROM dbo.blob;
Results:
--------
39578
If this is getting capped at 8K then I would guess that one of the following is true:
The column is actually VARBINARY(8000) (see the ALTER TABLE sketch after this list).
You are selecting the data in Management Studio and analyzing the length of the data displayed there. Results to Text output is limited to a maximum of 8,192 characters, so using DATALENGTH() directly against the column is a much better approach.
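If the first case applies, a minimal fix, reusing the placeholder table and column names from the question, is to widen the column:
ALTER TABLE [table] ALTER COLUMN [file] varbinary(max);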
I would dare to say: use FILESTREAM for files bigger than 1 MB, based on the following from MS TechNet | FILESTREAM Overview:
In SQL Server, BLOBs can be standard varbinary(max) data that stores the data in tables, or FILESTREAM varbinary(max) objects that store the data in the file system. The size and use of the data determines whether you should use database storage or file system storage. If the following conditions are true, you should consider using FILESTREAM:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.
"SET TEXTSIZE" Specifies the size of varchar(max), nvarchar(max), varbinary(max), text, ntext, and image data returned by a SELECT statement.
SELECT @@TEXTSIZE
The SQL Server Native Client ODBC driver and SQL Server Native Client OLE DB Provider for SQL Server automatically set TEXTSIZE to 2147483647 when connecting. The maximum setting for SET TEXTSIZE is 2 gigabytes (GB), specified in bytes. A setting of 0 resets the size to the default (4 KB).
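A quick sketch of inspecting and adjusting the session setting, with the values from the documentation quoted above:
SELECT @@TEXTSIZE;  -- 2147483647 when connected through the native clients
SET TEXTSIZE 8000;  -- cap LOB data returned by SELECT at 8,000 bytes
SET TEXTSIZE 0;     -- reset to the default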
As mentioned, for big files you should prefer FILESTREAM.

What data type to use instead of the 'ntext' data type?

I want to write a trigger for one of my tables which has an ntext datatype field, and as you know, a trigger can't be written for the ntext datatype.
Now I want to replace the ntext with the nvarchar datatype. The ntext maximum length is 2,147,483,647 characters, whereas nvarchar is limited to 4,000 characters.
What datatype can I use instead of ntext?
Or is there any way to write a trigger when I have an ntext datatype?
I should mention that my database was designed back on SQL Server 2000 and is full of data.
You're out of luck with SQL Server 2000, but you can possibly chain together a bunch of nvarchar(4000) variables, as sketched below. It's a hack, but it may be the only option you have. I would also do an assessment of your data to see what the largest value in that column actually is. A lot of the time, columns are made in anticipation of large data, but in the end they never receive it.
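A minimal sketch of the chaining idea (the table, column, and key names here are assumptions; note that in SQL Server 2000 the inserted pseudo-table does not expose ntext columns, so the trigger has to read the base table):
DECLARE @id int
SET @id = 1  -- example key of the row being processed
DECLARE @part1 nvarchar(4000), @part2 nvarchar(4000)
SELECT @part1 = SUBSTRING(Body, 1, 4000),     -- SUBSTRING accepts ntext input
       @part2 = SUBSTRING(Body, 4001, 4000)
FROM dbo.Docs
WHERE id = @id
-- work with @part1 and @part2, chaining further variables for longer values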
On MSDN I see this:
Important
ntext, text, and image data types will be removed in a future version of Microsoft SQL Server. Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead.
Fixed and variable-length data types for storing large non-Unicode and Unicode character and binary data. Unicode data uses the UNICODE UCS-2 character set.
So nvarchar(max) is preferred. You can see the details below:
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes. The ISO synonyms for nvarchar are national char varying and national character varying.
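A hedged sketch of the usual migration on SQL Server 2005 and later (table and column names are assumptions):
ALTER TABLE dbo.MyTable ALTER COLUMN Notes nvarchar(max);
-- optional: rewriting the values moves existing data out of the old LOB format
UPDATE dbo.MyTable SET Notes = Notes;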

VarBinary vs Image SQL Server Data Type to Store Binary Data?

I need to store binary files in the SQL Server database. Which is the better data type, varbinary or image?
Since image is deprecated, you should use varbinary.
per Microsoft (thanks for the link, @Christopher):
ntext, text, and image data types will be removed in a future version of Microsoft SQL Server. Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead.
Fixed and variable-length data types for storing large non-Unicode and Unicode character and binary data. Unicode data uses the UNICODE UCS-2 character set.
varbinary(max) is the way to go (introduced in SQL Server 2005)
There is also the rather spiffy FileStream, introduced in SQL Server 2008.
https://learn.microsoft.com/en-us/sql/t-sql/data-types/ntext-text-and-image-transact-sql
image
Variable-length binary data from 0 through 2^31-1 (2,147,483,647) bytes.
It IS still supported to use the image datatype, but be aware of:
https://learn.microsoft.com/en-us/sql/t-sql/data-types/binary-and-varbinary-transact-sql
varbinary [ ( n | max ) ]
Variable-length binary data. n can be a value from 1 through 8,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size is the actual length of the data entered + 2 bytes. The data that is entered can be 0 bytes in length. The ANSI SQL synonym for varbinary is binary varying.
So both are equal in size (2 GB). But be aware of:
https://learn.microsoft.com/en-us/sql/database-engine/deprecated-database-engine-features-in-sql-server-2016#features-not-supported-in-a-future-version-of-sql-server
Though the end of the image datatype is still not determined, you should use the future-proof equivalent (a conversion sketch below).
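A one-line conversion sketch (table and column names are assumptions):
ALTER TABLE dbo.Files ALTER COLUMN Content varbinary(max);
-- optional: rewriting the values stores existing rows in the varbinary(max) format
UPDATE dbo.Files SET Content = Content;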
But you should also ask yourself: why store BLOBs in a column at all?
https://learn.microsoft.com/en-us/sql/relational-databases/blob/compare-options-for-storing-blobs-sql-server
