Case-sensitive Sybase query: Invalid column name

Details:
2 databases: Sybase version 15 and Sybase version 16
1 table each (identical): AuthRole with columns id, rolename and description
Tried both jTDS and jconn drivers
Query:
SELECT t1.roleName FROM AuthRole t1;
Results:
Sybase 15: rows returned successfully. 'roleName' could be upper, lower or a mix of case, i.e. not case-sensitive.
Sybase 16: Invalid column name 'roleName'. It will only work with 'rolename', which is the exact case of the column. Anyone know why this would happen and how to resolve it?

If on ASE 15 both queries work - with "rolename" and "roleName" - that means the sort order in this database is case-insensitive.
If on ASE 16 "rolename" is different from "roleName" - that means the sort order in this database is case-sensitive.
You can check this by querying:
if "a" = "A" print "Case insensitive" else print "Case sensitive"
This setting is static for the whole server (and for all the databases that the server contains), but can be changed. Of course, changing the sort order is a time-consuming process, as it requires rebuilding all indexes based on character types.
You can check the server sortorder setting:
exec sp_configure 'sortorder id'
The information about sort order should be visible in the ASE errorlog when the database server starts:
00:0002:00000:00002:2017/07/04 16:49:26.35 server ASE's default unicode sort order is 'binary'.
00:0002:00000:00002:2017/07/04 16:49:26.35 server ASE's default sort order is:
00:0002:00000:00002:2017/07/04 16:49:26.35 server 'bin_iso_1' (ID = 50)
00:0002:00000:00002:2017/07/04 16:49:26.35 server on top of default character set:
00:0002:00000:00002:2017/07/04 16:49:26.35 server 'iso_1' (ID = 1).
In my example the sort order is binary - which is case sensitive.
Information on how to change the sort order for the server is in the ASE manual. Basically, to change the sort order you need to:
add the new sort order using the charset program,
change the config parameter 'sortorder id'
reboot the ASE server (the server boots, rebuilds the disk devices and then it shuts down)
reboot the ASE server again
indexes that are built on character types are marked as invalid and need to be rebuilt
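A hedged sketch of that last step, using the standard ASE tools for this scenario (run in each affected database; the table name is the one from the question):
-- list the tables whose indexes were marked suspect by the sort-order change
exec sp_indsuspect
-- check and rebuild the suspect indexes of one table; repeat per table
dbcc reindex('AuthRole')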

Sounds like an issue with the sort order, e.g.:
ASE 15 is configured with a case-insensitive sort order
ASE 16 is configured with a case-sensitive sort order
You should be able to confirm the above by running sp_helpsort.
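For example:
exec sp_helpsort
The sort order name and description in the output should indicate whether it is binary or case-insensitive (case-insensitive sort orders typically have 'nocase' in the name).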
In ASE, case (in)sensitivity applies to data as well as identifiers (eg, table/column names).
To get ASE 16 to function like ASE 15, the DBA will need to change the sort order in the ASE 16 dataserver (I'd suggest they also verify the character set while they're at it).
Keep in mind that changing the sort order (and/or character set) is a dataserver-wide configuration and will require (at a minimum) a rebuild of all indexes and re-running of update index statistics. [For more info the DBA should refer to the ASE System Administration Guide, Chapter on Configuring Character Sets, Sort Orders and Languages.]

Off the top of my head:
In older versions of Sybase ASE you had to carefully set the case sensitivity at server installation time. The installer defaults to case-sensitive. Maybe the admin who installed ASE 15 noticed this (and changed the default to case-insensitive), whereas the admin who installed your ASE 16 didn't.
Yes, case sensitivity is a property of the server. You can change it at a later time with sp_configure or ALTER DATABASE, or both (I don't remember and I don't have the time to look it up). You can also use a graphical admin tool to change the server default sort order.
In any case, only databases created after that configuration change will be affected. Confusingly, older databases will still be case-sensitive, or lots of warnings will be issued. This is because in your older tables all primary keys (PKs) are implemented as indices that assume case sensitivity, and the PKs and PK indices cannot be changed by an installer or a config wizard.
In fact, you have to drop and re-create the indices and run dbcc something (again I don't remember).
For small databases, this drop-and-recreate of indices can of course be done (use a script or a database reengineering tool to do so). For larger databases this can take some time.
Maybe it's different for ASE 16 - check the documentation.

Related

SQL Server 2019 CHARINDEX returns weird result

When I run the following query in SQL Server 2019, the result is 1, whereas it should be 0.
select CHARINDEX('αρ', 'αυρ')
What could be the problem?
As was mentioned in the comments, it may be because you have not declared your string literals as Unicode strings but are using Unicode characters in the strings. SQL Server will be converting the strings to another code page and doing a bad job of it. Try running this query to see the difference.
SELECT 'αρ', 'αυρ', N'αρ', N'αυρ'
On my server, this gives the following output:
a? a?? αρ αυρ
Another issue is that CHARINDEX uses the collation of the input, which I think is probably not set correctly in this instance. You can force a collation by setting it on one of the inputs. It is also possible to set it at the instance, database and column level.
There are different collations that may be applicable. These have different features; for example, some are case-sensitive and some are not. Also, not all collations are installed with every SQL Server instance. It would be worth running SELECT * FROM sys.fn_helpcollations() to see the descriptions of all the installed ones.
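For instance, to list only the Greek collations and their descriptions:
SELECT [name], [description]
FROM sys.fn_helpcollations()
WHERE [name] LIKE 'Greek%'
ORDER BY [name];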
If you change your query to this you should get the result you are looking for.
SELECT CHARINDEX(N'αρ' COLLATE Greek_BIN, N'αυρ')
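If the problem data lives in a table column rather than a literal, the same fix can be applied at the column level. A sketch, where the table and column names are hypothetical:
ALTER TABLE dbo.MyTable
ALTER COLUMN MyColumn NVARCHAR(100) COLLATE Greek_BIN;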

JDBC Driver Class for MS SQL Server is not found in Hue

We installed a clustered Hadoop server and we use Hue as our interface; our goal is sqooping data from MS SQL Server to Hadoop. We found a tutorial here
However, I get the following error in Hue
I found the solution with the help of http://capnjosh.com/
If you’re using the Sqoop stuff in the web-based interface, you’re actually using Sqoop2
You have to download and install the JDBC driver for SQL Server yourself
– curl -L 'http://download.microsoft.com/download/0/2/A/02AAE597-3865-456C-AE7F-613F99F850A8/sqljdbc_4.0.2206.100_enu.tar.gz' | tar xz
– sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop2/
– while you're at it, you may as well put it in the sqoop directory too: sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop/
Sqoop2 home directory is /var/lib/sqoop2/
restart Sqoop2 service after copying in the JDBC driver file
"Connector" is a Sqoop thing for how it communicates with various processes in Hadoop. Unless you've got a lot more experience, it should just be "generic-jdbc-connector"
Class Path is "com.microsoft.sqlserver.jdbc.SQLServerDriver"
connection string in "Manage Connections" is like this: jdbc:sqlserver://192.168.1.102:1433 (though port number defaults to 1433)
For the Actions of the job:
Schema name: I just leave this blank and instead paste in the TSQL query I want
– if you specify a TSQL statement below, then this needs to be blank
Table name: I leave this blank and instead do it all in the TSQL.
– if you specify a TSQL statement below, then this needs to be blank
Table SQL statement: Paste in your query (you can craft it in SSMS and paste it in here). Then, append this to the end of it: +and+${CONDITIONS}. ${CONDITIONS} expands out to be some range of values of the Partition column name you can specify below this field (see the example query after this list).
Table Column names: put them in if you want to limit the columns that actually get extracted.
Partition column name: Make sure this column is indexed somehow – Sqoop first queries the min and max values, then issues a series of queries that return evenly-distributed portions of all rows based on this column value. E.g. for a transactions table, I specify the transaction date column in Partition column name; Sqoop gets the min and max dates; Sqoop then issues a series of queries replacing ${CONDITIONS} with "where transDate >= '2015-01-01' and transDate < '2015-04-01'" (moving that window for each query). Each query can be sent from any node in your cluster (though I bet you can restrict which nodes those are).
Nulls in partition Column: if you do have nulls, this helps Sqoop.
You can manually specify the query Sqoop uses to get the min/max of the partition column (by default it looks like "select min(<column>), max(<column>) from (<query>)").
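To illustrate the ${CONDITIONS} placement, here is a hypothetical Table SQL statement (table and column names are made up):
SELECT transID, transDate, amount
FROM dbo.Transactions
WHERE region = 'EU' AND ${CONDITIONS}
Sqoop substitutes ${CONDITIONS} with a range predicate on the partition column for each parallel query it issues.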
if you mess with the connection you create in Hue/Sqoop2, note you have to type in the password again
if you get errors, don’t fight it – you have to log in via SSH and look at /var/log/sqoop2/sqoop2.log
if your jobs are failing, and in SQL Server Profiler on the SQL Server you’re querying you only see queries with “where … (1 = 0)…” in them, check your firewall rules: all the nodes in the cluster need to be able to talk out to the SQL Server instance. Yeah, Sqoop will distribute the various partitioned queries across your cluster :)

SQL Server 2012- Server collation and database collation

I have SQL Server 2012 installed that is used for a few different applications. One of our applications needs to be installed, but the company is saying that:
The SQL collation isn't correct, it needs to be: SQL_Latin1_General_CP1_CI_AS
You can just uninstall the SQL Server Database Engine & upon reinstall select the right collation.
What possible reason would this company have to want to change the collation of the database engine itself?
Yes, you are able to set the collation at the database level. To do so, here is an example:
USE master;
GO
ALTER DATABASE <DatabaseName>
COLLATE SQL_Latin1_General_CP1_CI_AS;
GO
You can alter the database collation even after you have created the database, using the following query:
USE master;
GO
ALTER DATABASE Database_Name
COLLATE Your_New_Collation;
GO
For more information on database collation, read here.
What possible reason would this company have to want to change the collation of the database engine itself?
The other two answers are speaking in terms of Database-level Collation, not Instance-level Collation (i.e. the "database engine itself"). The most likely reason that the vendor has for wanting a highly specific Collation (not just a case-insensitive one of your choosing, for example) is that, like most folks, they don't really understand how Collations work. What they do know is that their application works (i.e. does not get Collation conflict errors) when the Instance and Database both have a Collation of SQL_Latin1_General_CP1_CI_AS, which is the Collation of the Instance and Database that they develop the app on, since that is the default Collation when installing on an OS having English as its language.
I'm guessing that they have probably had some customers report problems that they didn't know how to fix, but narrowed it down to those Instances not having SQL_Latin1_General_CP1_CI_AS as the Instance / Server -level Collation. The Instance-level Collation controls not just tempdb meta-data (and default column Collation when no COLLATE keyword is specified when creating local or global temporary tables), which has been mentioned by others, but also name resolution for variables / parameters, cursors, and GOTO labels. Even if unlikely that they would be using GOTO statements, they are certainly using variables / parameters, and likely enough to be using cursors.
What this means is that they likely had problems in one or more of the following areas:
Collation conflict errors related to temporary tables:
tempdb being in the Collation of the Instance does not always mean that there will be problems, even if the COLLATE keyword was never used in a CREATE TABLE #[#]... statement. Collation conflicts only occur when attempting to combine or compare two string columns. So assuming that they created a temporary table and used it in conjunction with a table in their Database, they would need to be JOINing on those string columns, or concatenating them, or combining them via UNION, or something along those lines. Under these circumstances, an error will occur if the Collations of the two columns are not identical.
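A minimal sketch of such a conflict, assuming a hypothetical dbo.Customers table in a Database whose Collation differs from the Instance-level Collation:
CREATE TABLE #Temp (CustomerName VARCHAR(50)); -- no COLLATE: column takes the Instance Collation

SELECT c.CustomerName
FROM dbo.Customers c -- column uses the Database Collation
INNER JOIN #Temp t
        ON t.CustomerName = c.CustomerName; -- Msg 468: cannot resolve the collation conflict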
Unexpected behavior:
Comparing a string column of a table to a variable or parameter will use the Collation of the column. Given their requirement for you to use SQL_Latin1_General_CP1_CI_AS, this vendor is clearly expecting case-insensitive comparisons. Since string columns of temp tables (that were not created using the COLLATE keyword) take on the Collation of the Instance, if the Instance is using a binary or case-sensitive Collation, then their application will not be returning all of the data that they were expecting it to return.
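A sketch of that behavior, assuming the Instance uses a binary or case-sensitive Collation (the temp table is hypothetical):
CREATE TABLE #Names (LastName VARCHAR(50)); -- takes the case-sensitive Instance Collation
INSERT INTO #Names (LastName) VALUES ('SMITH');

SELECT * FROM #Names WHERE LastName = 'smith'; -- returns no rows here: 'SMITH' <> 'smith'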
Code compilation errors:
Since the Instance-level Collation controls resolution of variable / parameter / cursor names, if they have inconsistent casing in any of their variable / parameter / cursor names, then errors will occur when attempting to execute the code. For example, doing this:
DECLARE @CustomerID INT;
SET @customerid = 5;
would get the following error:
Msg 137, Level 15, State 1, Line XXXXX
Must declare the scalar variable "@customerid".
Similarly, they would get:
Msg 16916, Level 16, State 1, Line XXXXX
A cursor with the name 'Customers' does not exist.
if they did this:
DECLARE customers CURSOR FOR SELECT 1 AS [Bob];
OPEN Customers;
These problems are easy enough to avoid, simply by doing the following:
Specify the COLLATE keyword on string columns when creating temporary tables (local or global). Using COLLATE DATABASE_DEFAULT is handy if the Database itself is not guaranteed to have a particular Collation. But if the Collation of the Database is always the same, then you can specify either DATABASE_DEFAULT or the particular Collation. Though I suppose DATABASE_DEFAULT works in both cases, so maybe it's the easier choice (see the sketch after this list).
Be consistent in casing of identifiers, especially variables / parameters. And to be more complete, I should mention that Instance-level meta-data is also affected by the Instance-level Collation (e.g. names of Logins, Databases, server-Roles, SQL Agent Jobs, SQL Agent Job Steps, etc). So being consistent with casing in all areas is the safest bet.
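A sketch of the first point; the temp table column stays in step with the current Database regardless of the Instance Collation:
CREATE TABLE #Temp
(
    CustomerName VARCHAR(50) COLLATE DATABASE_DEFAULT -- matches the current Database, not the Instance
);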
Am I being unfair in assuming that the vendor doesn't understand how Collations work? Well, according to a comment made by the O.P. on M.Ali's answer:
I got this reply from him: "It's the other way around, you need the new SQL instance collation to match the old SQL collation when attaching databases to it. The collation is used in the functioning of the database, not just something that gets set when it's created."
the answer is "no". There are two problems here:
No, the Collations of the source and destination Instances do not need to match when attaching a Database to a new Instance. In fact, you can even attach a system DB to an Instance that has a different Collation, thereby having a mismatch between the attached system DB and the Instance and the other system DBs.
It's unclear if "database" in that last sentence means actual Database or the Instance (sometimes people use the term "database" to refer to the RDBMS as a whole). If it means actual "Database", then that is entirely irrelevant because the issue at hand is the Instance-level Collation. But, if the vendor meant the Instance, then while true that the Collation is used in normal operations (as noted above), this only shows awareness of simple cause-effect relationship and not actual understanding. Actual understanding would lead to doing those simple fixes (noted above) such that the Instance-level Collation was a non-issue.
If needing to change the Collation of the Instance, please see:
Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?
For more info on working with Collations / encodings / Unicode / etc, please visit:
Collations.Info

SQL Server default character encoding

By default - what is the character encoding set for a database in Microsoft SQL Server?
How can I see the current character encoding in SQL Server?
Encodings
In most cases, SQL Server stores Unicode data (i.e. that which is found in the XML and N-prefixed types) in UCS-2 / UTF-16 (storage is the same; UTF-16 merely handles Supplementary Characters correctly). This is not configurable: there is no option to use either UTF-8 or UTF-32 (see the UPDATE section at the bottom re: UTF-8 starting in SQL Server 2019).
Whether or not the built-in functions can properly handle Supplementary Characters, and whether or not those are sorted and compared properly, depends on the Collation being used. The older Collations (names starting with SQL_, e.g. SQL_Latin1_General_CP1_CI_AS, or with no version number in the name, e.g. Latin1_General_CI_AS) equate all Supplementary Characters with each other (due to having no sort weight). Starting in SQL Server 2005, the 90 series Collations (those with _90_ in the name) could at least do a binary comparison on Supplementary Characters so that you could differentiate between them, even if they didn't sort in the desired order. That also holds true for the 100 series Collations introduced in SQL Server 2008. SQL Server 2012 introduced Collations with names ending in _SC that not only sort Supplementary Characters properly, but also allow the built-in functions to interpret them as expected (i.e. treating the surrogate pair as a single entity). Starting in SQL Server 2017, all new Collations (the 140 series) implicitly support Supplementary Characters, hence there are no new Collations with names ending in _SC.
Starting in SQL Server 2019, UTF-8 became a supported encoding for CHAR and VARCHAR data (columns, variables, and literals), but not TEXT (see UPDATE section at the bottom re: UTF-8 starting in SQL Server 2019).
Non-Unicode data (i.e. that which is found in the CHAR, VARCHAR, and TEXT types — but don't use TEXT, use VARCHAR(MAX) instead) uses an 8-bit encoding (Extended ASCII, DBCS, or EBCDIC). The specific character set / encoding is based on the Code Page, which in turn is based on the Collation of a column, or the Collation of the current database for literals and variables, or the Collation of the Instance for variable / cursor names and GOTO labels, or what is specified in a COLLATE clause if one is being used.
To see how locales match up to collations, check out:
Windows Collation Name
SQL Server Collation Name
To see the Code Page associated with a particular Collation (this is the character set and only affects CHAR / VARCHAR / TEXT data), run the following:
SELECT COLLATIONPROPERTY( 'Latin1_General_100_CI_AS' , 'CodePage' ) AS [CodePage];
To see the LCID (i.e. locale) associated with a particular Collation (this affects the sorting & comparison rules), run the following:
SELECT COLLATIONPROPERTY( 'Latin1_General_100_CI_AS' , 'LCID' ) AS [LCID];
To view the list of available Collations, along with their associated LCIDs and Code Pages, run:
SELECT [name],
COLLATIONPROPERTY( [name], 'LCID' ) AS [LCID],
COLLATIONPROPERTY( [name], 'CodePage' ) AS [CodePage]
FROM sys.fn_helpcollations()
ORDER BY [name];
Defaults
Before looking at the Server and Database default Collations, one should understand the relative importance of those defaults.
The Server (Instance, really) default Collation is used as the default for newly created Databases (including the system Databases: master, model, msdb, and tempdb). But this does not mean that any Database (other than the 4 system DBs) is using that Collation. The Database default Collation can be changed at any time (though there are dependencies that might prevent a Database from having its Collation changed). The Server default Collation, however, is not so easy to change. For details on changing all collations, please see: Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?
The server/Instance Collation controls:
local variable names
CURSOR names
GOTO labels
Instance-level meta-data
The Database default Collation is used in three ways:
as the default for newly created string columns. But this does not mean that any string column is using that Collation. The Collation of a column can be changed at any time. Here knowing the Database default is important as an indication of what the string columns are most likely set to.
as the Collation for operations involving string literals, variables, and built-in functions that do not take string inputs but produce a string output (e.g. IF (@InputParam = 'something')). Here knowing the Database default is definitely important as it governs how these operations will behave.
Database-level meta-data
The column Collation is either specified in the COLLATE clause at the time of the CREATE TABLE or an ALTER TABLE {table_name} ALTER COLUMN, or if not specified, taken from the Database default.
Since there are several layers here where a Collation can be specified (Database default / columns / literals & variables), the resulting Collation is determined by Collation Precedence.
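For example, an explicit COLLATE clause has the highest precedence and overrides both the column and Database defaults:
SELECT 1 AS [Match] WHERE 'abc' = 'ABC' COLLATE Latin1_General_100_CS_AS; -- no row (case-sensitive)
SELECT 1 AS [Match] WHERE 'abc' = 'ABC' COLLATE Latin1_General_100_CI_AS; -- one row (case-insensitive)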
All of that being said, the following query shows the default / current settings for the OS, SQL Server Instance, and specified Database:
SELECT os_language_version,
---
SERVERPROPERTY('LCID') AS 'Instance-LCID',
SERVERPROPERTY('Collation') AS 'Instance-Collation',
SERVERPROPERTY('ComparisonStyle') AS 'Instance-ComparisonStyle',
SERVERPROPERTY('SqlSortOrder') AS 'Instance-SqlSortOrder',
SERVERPROPERTY('SqlSortOrderName') AS 'Instance-SqlSortOrderName',
SERVERPROPERTY('SqlCharSet') AS 'Instance-SqlCharSet',
SERVERPROPERTY('SqlCharSetName') AS 'Instance-SqlCharSetName',
---
DATABASEPROPERTYEX(N'{database_name}', 'LCID') AS 'Database-LCID',
DATABASEPROPERTYEX(N'{database_name}', 'Collation') AS 'Database-Collation',
DATABASEPROPERTYEX(N'{database_name}', 'ComparisonStyle') AS 'Database-ComparisonStyle',
DATABASEPROPERTYEX(N'{database_name}', 'SQLSortOrder') AS 'Database-SQLSortOrder'
FROM sys.dm_os_windows_info;
Installation Default
Another interpretation of "default" could mean what default Collation is selected for the Instance-level collation when installing. That varies based on the OS language, but the (horrible, horrible) default for systems using "US English" is SQL_Latin1_General_CP1_CI_AS. In that case, the "default" encoding is Windows Code Page 1252 for VARCHAR data, and as always, UTF-16 for NVARCHAR data. You can find the list of OS language to default SQL Server collation here: Collation and Unicode support: Server-level collations. Keep in mind that these defaults can be overridden; this list is merely what the Instance will use if not overridden during install.
UPDATE 2018-10-02
SQL Server 2019 introduces native support for UTF-8 in VARCHAR / CHAR datatypes (not TEXT!). This is accomplished via a set of new collations, the names of which all end with _UTF8. This is an interesting capability that will definitely help some folks, but there are some "quirks" with it, especially when UTF-8 isn't being used for all columns and the Database's default Collation, so don't use it just because you have heard that UTF-8 is magically better. UTF-8 was designed solely for ASCII compatibility: to enable ASCII-only systems (i.e. UNIX back in the day) to support Unicode without changing any existing code or files. That it saves space for data using mostly (or only) US English characters (and some punctuation) is a side-effect. When not using mostly (or only) US English characters, data can be the same size as UTF-16, or even larger, depending on which characters are being used. And, in cases where space is being saved, performance might improve, but it might also get worse.
For a detailed analysis of this new feature, please see my post, "Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?".
If you need to know the default collation for a newly created database use:
SELECT SERVERPROPERTY('Collation')
This is the server collation for the SQL Server instance that you are running.
The default character encoding for a SQL Server database is iso_1, which is ISO 8859-1. Note that the character encoding depends on the data type of a column. You can get an idea of what character encodings are used for the columns in a database, as well as the collations, using this SQL:
SELECT data_type, character_set_catalog, character_set_schema, character_set_name,
       collation_catalog, collation_schema, collation_name, COUNT(*) AS [count]
FROM information_schema.columns
GROUP BY data_type, character_set_catalog, character_set_schema, character_set_name,
         collation_catalog, collation_schema, collation_name;
If it's using the default, the character_set_name should be iso_1 for the char and varchar data types. Since nchar and nvarchar store Unicode data in UCS-2 format, the character_set_name for those data types is UNICODE.
SELECT DATABASEPROPERTYEX('DBName', 'Collation') SQLCollation;
Where DBName is your database name.
I think this is worthy of a separate answer: although internally Unicode data is stored as UTF-16 in SQL Server, this is the little-endian flavour, so if you're calling the database from an external system, you probably need to specify UTF-16LE.
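You can see the byte order by converting an N-prefixed literal to binary; 'A' is U+0041, and the low byte comes first:
SELECT CONVERT(VARBINARY(2), N'A') AS [Bytes]; -- 0x4100: little-endian UTF-16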
You can see the collation settings for each table with code like the following:
SELECT t.name AS TableName, c.name AS ColumnName, c.collation_name
FROM sys.columns c
INNER JOIN sys.tables t ON c.object_id = t.object_id
WHERE t.name = 'name of table';

Export tables from SQL Server to be imported to Oracle 10g

I'm trying to export some tables from SQL Server 2005 and then create those tables and populate them in Oracle.
I have about 10 tables, varying from 4 columns up to 25. I'm not using any constraints/keys so this should be reasonably straightforward.
Firstly I generated scripts to get the table structure, then modified them to conform to Oracle syntax standards (i.e. changed nvarchar to varchar2).
Next I exported the data using SQL Server's export wizard, which created a CSV flat file. However, my main issue is that I can't find a way to force SQL Server to double-quote column names. One of my columns contains commas, so unless I can find a method for SQL Server to quote column names, I will have trouble when it comes to importing this.
Also, am I going the difficult route, or is there an easier way to do this?
Thanks
EDIT: By quoting I'm referring to quoting the column values in the CSV. For example, I have a column which contains addresses like
101 High Street, Sometown, Some
county, PO5TC053
Without changing it to the following, it would cause issues when loading the CSV
"101 High Street, Sometown, Some
county, PO5TC053"
After looking at some options with SQL Developer, and at manually exporting/importing, I found a utility in SQL Server Management Studio that gets the desired results and is easy to use. Do the following:
Go to the source schema on SQL Server
Right click > Export data
Select source as current schema
Select destination as "Oracle OLE provider"
Select properties, then add the service name into the first box, then username and password, be sure to click "remember password"
Enter query to get desired results to be migrated
Enter table name, then click the "Edit" button
Alter mappings, change nvarchars to varchar2, and INTEGER to NUMBER
Run
Repeat process for remaining tables, save as jobs if you need to do this again in the future
Use the SQLDeveloper migration tools
I think quoting column names in Oracle is something you should not use. It causes all sorts of problems.
As Robert has said, I'd strongly advise against quoting column names. The result is that you'd have to quote them not only when importing the data, but also whenever you want to reference that column in a SQL statement - and yes, that probably means in your program code as well. Building SQL statements becomes a total hassle!
From what you're writing, I'm not sure if you are referring to the column names or the data in these columns. (Can SQL Server really have a comma in a column name? I'd be really surprised if there was a good reason for that!) Quoting the column content should be done for any string-like columns (although I found that other characters usually work better, as the need to "escape" quotes becomes another issue). If you're exporting in CSV, that should be an option... but then I'm not familiar with the export wizard.
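If the wizard won't quote the values for you, one workaround is to build the quoting into the export query itself. A sketch, assuming a hypothetical Customers table with an Address column:
SELECT '"' + REPLACE([Address], '"', '""') + '"' AS [Address]
FROM dbo.Customers; -- doubling embedded quotes follows the usual CSV convention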
Another idea for moving the data (depending on the scale of your project) would be to use an ETL/EAI tool. I've been playing around a bit with the Pentaho suite and their Kettle component. It offered a good range of options to move data from one place to another. It may be a bit oversized for a simple transfer, but if it's a big "migration" with the corresponding volume, it may be a good option.
