We are using MS-SQL and Oracle as our databases.
We have used Hibernate annotations to create tables; in the annotated entity class we declared the column definition as
@Column(name = "UCAALSNO", nullable = false, columnDefinition = "nvarchar(20)")
and this works fine for MS-SQL.
But when it comes to Oracle, nvarchar throws an exception, as Oracle supports only NVARCHAR2.
How can we write the annotation so that the nvarchar data type works on both databases?
You could use NCHAR:
In MSSQL:
nchar [ ( n ) ]
Fixed-length Unicode string data. n defines the string length and must
be a value from 1 through 4,000. The storage size is two times n
bytes. When the collation code page uses double-byte characters, the
storage size is still n bytes. Depending on the string, the storage
size of n bytes can be less than the value specified for n. The ISO
synonyms for nchar are national char and national character.
while in Oracle:
NCHAR
The maximum length of an NCHAR column is 2000 bytes. It can hold up to
2000 characters. The actual data is subject to the maximum byte limit
of 2000. The two size constraints must be satisfied simultaneously at
run time.
NCHAR occupies a fixed amount of space, so for very large tables there can be a considerable storage difference between an NCHAR and an NVARCHAR column; you should take this into consideration.
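If you go the NCHAR route, the same DDL is accepted verbatim by both databases. A minimal sketch, reusing the question's column name in a hypothetical table:
CREATE TABLE uca_case (
    UCAALSNO NCHAR(20) NOT NULL  -- valid on both MS-SQL and Oracle; always 20 characters
);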
I usually have incremental DB schema migration scripts for my production DBs and I only rely on Hibernate DDL generation for my integration testing in-memory databases (e.g. HSQLDB or H2). This way I choose the production schema types first and the "columnDefinition" only applies to the testing schema, so there is no conflict.
You might want to read this too, which sidesteps the N(VAR)CHAR2 complexity altogether, so you might consider setting a Unicode default character set instead:
Given that, I'd much rather go with the approach that maximizes
flexibility going forward, and that's converting the entire database
to Unicode (AL32UTF8 presumably) and just using that.
Although you might be advised to use VARCHAR2, VARCHAR has been a synonym for VARCHAR2 for a long time now.
So quoting a DBA opinion:
The Oracle 9.2 and 8.1.7 documentation say essentially the same thing, so even though Oracle continually discourages the use of VARCHAR, so far they haven't done anything to change its parity with VARCHAR2.
I'd say give it a try for VARCHAR too, as it's supported on most DBs.
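Under that assumption, the same hypothetical table from above could be declared portably like this (note that on Oracle, which treats VARCHAR as VARCHAR2, the 20 defaults to bytes rather than characters):
CREATE TABLE uca_case (
    UCAALSNO VARCHAR(20) NOT NULL  -- accepted by MS-SQL, and by Oracle as VARCHAR2
);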
I have a column (CLOB type) in a database which holds JSON strings. The size of these JSON strings can be quite variable. Where the strings are shorter than 4000 characters, I have heard that Oracle treats these CLOBs as VARCHAR internally. However, I am curious how exactly this process works; my interest is in performance and in being able to visually inspect the stored JSON.
If a CLOB in the DB has 50 characters, does Oracle treat this single object as a VARCHAR2(50)? Do all CLOBs stored in the column need to be less than 4000 characters for Oracle to treat the whole column as a VARCHAR? How does this all work?
Oracle does not always treat short CLOB values as VARCHAR2 values. It only does this if you allow it to do so, using the CLOB storage option of ENABLE STORAGE IN ROW. E.g.,
CREATE TABLE clob_test (
    id NUMBER NOT NULL PRIMARY KEY,
    v1 VARCHAR2(60),
    c1 CLOB
) LOB (c1) STORE AS (ENABLE STORAGE IN ROW);
In this case, Oracle will store the data for C1 in the table blocks, right next to the values for ID and V1. It will do this as long as the length of the CLOB value is below roughly 4000 bytes (that is, 4000 minus the system control information that takes space inside the in-row CLOB).
In this case, the CLOB data will be read like a VARCHAR2 (e.g., the storage CHUNK size becomes irrelevant).
If the CLOB grows too big, Oracle will quietly move it out of the block into separate storage, like any big CLOB value.
If a CLOB in the DB has 50 characters does Oracle treat this single object as VARCHAR2(50)?
Basically, if the CLOB was created with ENABLE STORAGE IN ROW. This option cannot be altered after the fact. I wouldn't count on Oracle treating the CLOB exactly like a VARCHAR2 in every respect. E.g., there is system control information stored in the in-row CLOB that is not stored in a VARCHAR2 column. But for many practical purposes, including performance, they're very similar.
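If you want to check how an existing LOB column was set up, the USER_LOBS data dictionary view exposes the in-row setting; for the clob_test table above:
SELECT table_name, column_name, in_row
FROM   user_lobs
WHERE  table_name = 'CLOB_TEST';  -- IN_ROW = 'YES' means short values live in the table blocks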
Do all CLOBs stored in the column need to be less than 4000 characters for Oracle to treat the whole column as a VARCHAR?
No. It's on a row-by-row basis.
How does this all work?
I explained what I know as best I could. Oracle doesn't publish its internal algorithms.
We are migrating some data from SQL Server to Oracle. For columns defined as NVARCHAR in SQL Server, we started creating NVARCHAR2 columns in Oracle, thinking them to be similar. But it looks like they are not.
I have read a couple of posts on Stack Overflow and want to confirm my findings:
Oracle VARCHAR2 already supports Unicode if the database character set is, say, AL32UTF8 (which is true in our case).
SQL Server VARCHAR does not support Unicode. SQL Server explicitly requires columns to be of NCHAR/NVARCHAR type to store Unicode data (specifically in the two-byte UCS-2 format).
Hence, would it be correct to say that SQL Server NVARCHAR columns can/should be migrated as Oracle VARCHAR2 columns?
Yes, if your Oracle database is created using a Unicode character set, an NVARCHAR in SQL Server should be migrated to a VARCHAR2 in Oracle. In Oracle, the NVARCHAR data type exists to allow applications to store data using a Unicode character set when the database character set does not support Unicode.
One thing to be aware of in migrating, however, is character length semantics. In SQL Server, an NVARCHAR(20) allocates space for 20 characters, which requires up to 40 bytes in UCS-2. In Oracle, by default, a VARCHAR2(20) allocates 20 bytes of storage. In the AL32UTF8 character set, that is potentially only enough space for 6 characters, though most likely it will handle much more (a single character in AL32UTF8 requires between 1 and 3 bytes). You probably want to declare your Oracle types as VARCHAR2(20 CHAR), which indicates that you want to allocate space for 20 characters regardless of how many bytes that requires. That tends to be much easier to communicate than trying to explain why some 20-character strings are allowed while other 10-character strings are rejected.
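A short sketch of the two declarations side by side (hypothetical table name; BYTE is the default unless NLS_LENGTH_SEMANTICS says otherwise):
CREATE TABLE semantics_demo (
    byte_col VARCHAR2(20 BYTE),  -- 20 bytes: as few as 6 multi-byte characters in AL32UTF8
    char_col VARCHAR2(20 CHAR)   -- 20 characters, regardless of how many bytes they need
);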
You can change the default length semantics at the session level, so that any tables you create without specifying length semantics will use character rather than byte semantics:
ALTER SESSION SET nls_length_semantics=CHAR;
That lets you avoid typing CHAR every time you define a new column. It is also possible to set this at the system level, but doing so is discouraged by the NLS team; apparently, not all the scripts Oracle provides have been thoroughly tested against databases where NLS_LENGTH_SEMANTICS has been changed, and probably very few third-party scripts have been.
I'm new to Microsoft SQL. I'm planning to store text in Microsoft SQL Server, and there will be special international characters. Is there a data type specific to Unicode, or am I better off encoding my text with references to the Unicode code points (i.e. \u0056)?
Use NVARCHAR/NCHAR (MSDN link). There used to be an NTEXT data type as well, but it's deprecated now in favour of NVARCHAR.
The columns take up twice as much space as their non-Unicode counterparts (CHAR and VARCHAR).
Then, when "manually" inserting into them, use the N prefix to indicate Unicode text:
INSERT INTO MyTable(SomeNvarcharColumn)
VALUES (N'français')
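Without the N prefix, the literal is first converted through the database's default code page, and characters that code page cannot represent are silently replaced (typically with '?'). A sketch against the same hypothetical table:
INSERT INTO MyTable(SomeNvarcharColumn)
VALUES ('français')  -- no N prefix: characters may be lost, depending on the collation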
When you say special international characters, what do you mean? If special means they aren't common, just occasional, then the overhead of NVARCHAR might not make sense for a table with a very large number of rows or a lot of indexing.
I'm all for using Unicode where appropriate, but understanding when it is appropriate is important.
If you are mixing data with different implied code pages (Japanese and Chinese in the same database), or you just want to be forward-looking for internationalization and localization, then you want the column to be Unicode: use the nvarchar data type and that's perfectly fine. Just note that Unicode is not going to magically solve all sorting problems for you.
If you know that you will always be storing mainly ASCII with only occasional foreign characters, just store your UTF-8 or HTML-encoded data in VARCHAR. If your data is all in Japanese and in code page 932 (or any other single code page), you can still store double-byte characters in VARCHAR; they still take up two bytes. My point is that when you are already in a DBCS collation, international characters are no longer "special". It's not just the data storage, but any indexes, as well as the working set when dealing with such a column in queries and other dataflows.
And do not make a blanket rule that all character data should be NVARCHAR; it's a waste for the many columns which are codes or identifiers.
Any time you have a column, go through the same questions:
What is the type of data?
What is the range?
Are NULLs allowed?
What is the limit of the size?
Are there any constraints I should apply now to stop bad data getting in from the beginning?
People have had success using the following code to force Unicode at insert time:
INSERT INTO <table> (text) VALUES (N'<text here>')
Character set behaviour for tables and the strings inside them is determined at the database level: if your database has a Unicode collation, strings inside the tables are Unicode. In addition, string columns must use the nvarchar or nchar data types to be able to store Unicode strings. But this works only if your database has a UTF-8 or Unicode character set or collation. Read this link for more information: Unicode and SQL Server.
SQL Server Text type vs. varchar data type:
As a rule of thumb, if you ever need your text value to exceed 200
characters AND do not use join on this column, use TEXT.
Otherwise use VARCHAR.
Assume my data is now 4000 characters AND I do not join on this column. By that quote, it is more advantageous to use TEXT/varchar(max) than varchar(4000).
Why so? (What advantage does TEXT/varchar(max) have over plain varchar in this case?)
TEXT is deprecated, use nvarchar(max), varchar(max), and varbinary(max) instead: http://msdn.microsoft.com/en-us/library/ms187993.aspx
I disagree with the 200-character rule because it isn't explained, unless it relates to the deprecated "text in row" option.
If your data is always 4000 characters long, use char(4000): it is fixed-length.
TEXT is deprecated.
BLOB types are slower
In old versions of SQL Server (2000 and earlier?) there was a maximum row length of 8 KB (8060 bytes). If you used varchar for lots of long text columns, they counted against this limit, whereas text columns did not, so you could keep more text in a row.
This limitation has been worked around in more recent versions of SQL Server.
This MSDN page includes the statement:
SQL Server 2005 supports row-overflow storage which enables variable
length columns to be pushed off-row. Only a 24-byte root is stored in
the main record for variable length columns pushed out of row; because
of this, the effective row limit is higher than in previous releases
of SQL Server. For more information, see the "Row-Overflow Data
Exceeding 8 KB" topic in SQL Server 2005 Books Online.
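Putting that together: on SQL Server 2005 and later, a varchar(max) column stays in-row while the value fits on the page and is pushed off-row (leaving the 24-byte root mentioned in the quote) when it doesn't. A minimal sketch with a hypothetical table:
CREATE TABLE docs (
    id  INT IDENTITY PRIMARY KEY,
    doc VARCHAR(MAX)  -- in-row when small, moved off-row automatically when it grows
);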
What should I know about the basic data types in SQL Server?
In my database I need:
Flags/bits (I assume I should use a byte)
64-bit ids/ints
A variable-length string: it could be 5 letters or 10,000 (for descriptions, but I plan to allow unlimited-length usernames)
Is there a TEXT type in SQL Server? I don't want to use varchar(limit) unless I can use something ridiculously high, like 128k. How do I specify 1-byte to 8-byte ints?
For 1), use BIT: it's a single bit, and up to eight BIT columns are packed into a single byte.
For 2), use BIGINT, a 64-bit signed int.
For 3), definitely do NOT use TEXT/NTEXT; those are deprecated as of SQL Server 2005 and up.
Use VARCHAR(MAX) or NVARCHAR(MAX) for up to 2 GB of textual information instead.
Here's the list of the SQL Server 2008 data types:
http://msdn.microsoft.com/en-us/library/ms187594.aspx
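As a sketch pulling those suggestions together (hypothetical table and column names; per the linked list, the integer widths are TINYINT 1 byte, SMALLINT 2, INT 4, BIGINT 8):
CREATE TABLE users (
    id        BIGINT IDENTITY PRIMARY KEY,  -- 64-bit signed id
    is_active BIT NOT NULL,                 -- up to eight BIT columns share one byte
    username  NVARCHAR(MAX) NOT NULL        -- variable-length Unicode, up to 2 GB
);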
Flags/bits (I assume I should use a byte)
Use BIT, which is exactly that: one bit.
64-bit ids/ints
BIGINT is a 64-bit signed integer.
A variable-length string: it could be 5 letters or 10,000 (for descriptions, but I plan to allow unlimited-length usernames)
varchar(max) holds up to 2 GB; otherwise varchar(8000) is the conventional limit.
Microsoft even put it all into a nice handy web page for you.
Here is the document.
http://msdn.microsoft.com/en-us/library/aa258271%28SQL.80%29.aspx
Others have already provided good answers to your question. If you are doing any .NET development and need to map SQL data types to CLR data types, the following link will be quite useful.
http://msdn.microsoft.com/en-us/library/bb386947.aspx
Randy