Here is the case:
There are a lot of columns in my databases on one SQL Server instance that contain the same data, but the data type lengths defined for them are very inconsistent.
For example, I have a column called "name" in schemas stage and dbo in DB1, and the same column in DB2. In each of those places the column has a different data type length:
stage.name is defined as varchar(10),
dbo.name is defined as varchar(20),
column "name" in DB2 is defined as varchar(max)
Is there any tool that can help me fix that?
I mean something other than writing SQL queries against INFORMATION_SCHEMA.COLUMNS and then manually generating ALTER scripts to implement the changes.
As they are in different schemas and even in a different database, do you know for sure that they actually represent the same piece of information? A generic term like name could mean many things in different contexts - e.g. Australian State Names have a very specific max size that I can define, but a City Name could vary hugely. Don't change anything without understanding the actual domain of the data and then ensure the size is appropriate.
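If no dedicated tool is available, the manual route described in the question can at least be scripted. The following is a minimal sketch, not a finished tool: it assumes the metadata rows have already been fetched from INFORMATION_SCHEMA.COLUMNS in each database, the table name customer is hypothetical, and grouping columns purely by name and type is an assumption that ignores the caveat above about whether the columns really hold the same data.

```python
from collections import defaultdict

# Hypothetical rows as they might come back from INFORMATION_SCHEMA.COLUMNS:
# (database, schema, table, column, data_type, max_length, is_nullable).
# A CHARACTER_MAXIMUM_LENGTH of -1 means varchar(max).
ROWS = [
    ("DB1", "stage", "customer", "name", "varchar", 10, True),
    ("DB1", "dbo", "customer", "name", "varchar", 20, True),
    ("DB2", "dbo", "customer", "name", "varchar", -1, True),
]


def widest(lengths):
    """Return the widest length; -1 (max) beats every fixed length."""
    return -1 if -1 in lengths else max(lengths)


def alter_statements(rows):
    """Yield ALTER statements that widen each column to the widest length seen."""
    by_column = defaultdict(list)
    for row in rows:
        # Assumption: same column name and type means same piece of information
        by_column[(row[3], row[4])].append(row)

    for (column, data_type), group in sorted(by_column.items()):
        target = widest([r[5] for r in group])
        size = "max" if target == -1 else str(target)
        for db, schema, table, _, _, length, nullable in group:
            if length == target:
                continue  # already at the target length
            # Restate nullability so ALTER COLUMN does not change it
            null_sql = "NULL" if nullable else "NOT NULL"
            yield (
                f"ALTER TABLE [{db}].[{schema}].[{table}] "
                f"ALTER COLUMN [{column}] {data_type}({size}) {null_sql};"
            )


statements = list(alter_statements(ROWS))
for statement in statements:
    print(statement)
```

The generated statements should be reviewed before running them, since widening everything to the largest length seen is only one possible policy.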
Background
I'm using Azure Data Factory v2 to load data from on-prem databases (for example SQL Server) to Azure Data Lake Gen2. Since I'm going to load thousands of tables, I've created a dynamic ADF pipeline that loads the data as-is from the source, based on parameters for schema, table name, modified date (for identifying increments) and so on. This obviously means I can't specify any type of schema or mapping manually in ADF. This is fine, since I want the data lake to hold a persistent copy of the source data in the same structure. The data is loaded into ORC files.
Based on these ORC files I want to create external tables in Snowflake with virtual columns. I have already created normal tables in Snowflake with the same column names and data types as in the source tables, which I'm going to use in a later stage. I want to use the information schema for these tables to dynamically create the DDL statement for the external tables.
The issue
Since column names are always UPPER case in Snowflake, and it's case-sensitive in many ways, Snowflake is unable to parse the ORC file with the dynamically generated DDL statement as the definition of the virtual columns no longer corresponds to the source column name casing. For example it will generate one virtual column as -> ID NUMBER AS(value:ID::NUMBER)
This will return NULL as the column is named "Id" with a lower case D in the source database, and therefore also in the ORC file in the data lake.
This feels like a major drawback with Snowflake. Is there any reasonable way around this issue? The only options I can think of are to:
1. Load the information schema from the source database to Snowflake separately and use that data to build a correct virtual column definition with correct cased column names.
2. Load the records in their entirety into some variant column in Snowflake, converted to UPPER or LOWER.
Both options add a lot of complexity or even mess up the data. Is there any straightforward way to return only the column names from an ORC file? Ultimately I would need to be able to use something like Snowflake's DESCRIBE TABLE on the file in the data lake.
Unless you set the parameter QUOTED_IDENTIFIERS_IGNORE_CASE = TRUE you can declare your column in the casing you want:
CREATE TABLE "MyTable" ("Id" NUMBER);
If your dynamic SQL carefully uses "Id" and not just Id you will be fine.
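As a rough illustration of the quoting described above, here is a small sketch of how dynamic SQL could keep the source casing. The helper names are hypothetical, and the virtual column helper assumes the column name contains no double quotes, since only the identifier escaping (doubling the quote) is something I am sure of.

```python
def quote_identifier(name):
    """Wrap a name in double quotes so Snowflake keeps its casing.

    A double quote inside the name is escaped by doubling it.
    """
    return '"' + name.replace('"', '""') + '"'


def create_table_ddl(table, columns):
    """Build a CREATE TABLE statement with case-preserving identifiers."""
    cols = ", ".join(f"{quote_identifier(n)} {t}" for n, t in columns)
    return f"CREATE TABLE {quote_identifier(table)} ({cols});"


def virtual_column(name, data_type):
    """Build one virtual column whose path uses the source casing.

    Assumes the name contains no double quotes.
    """
    quoted = quote_identifier(name)
    return f"{quoted} {data_type} AS (value:{quoted}::{data_type})"


print(create_table_ddl("MyTable", [("Id", "NUMBER")]))
print(virtual_column("Id", "NUMBER"))
```

With the normal tables declared this way, the information schema would hold Id rather than ID, so the generated path value:"Id" matches the name stored in the ORC file.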
Found an even better way to achieve this, so I'm answering my own question.
With the query below we can get the path/column names directly from the ORC file(s) in the stage, with a hint of the data type from the source. It filters out columns that only contain NULL values. We will most likely create some type of data type ranking table for the final data type determination for the virtual columns we're aiming to define dynamically for the external tables.
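-- List every path (column name) found in the staged ORC files with its observed type
SELECT f.path AS "ColumnName"
, TYPEOF(f.value) AS "DataType"
, COUNT(1) AS NbrOfRecords
FROM (
SELECT $1 AS value FROM @<db>.<schema>.<stg>/<directory>/ (FILE_FORMAT => '<fileformat>')
),
LATERAL FLATTEN(input => value, recursive => true) f
-- Skip NULLs so that columns containing only NULL values drop out
WHERE TYPEOF(f.value) != 'NULL_VALUE'
GROUP BY f.path, TYPEOF(f.value)
ORDER BY 1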
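The data type ranking mentioned above could look roughly like the sketch below. The rank order, the VARCHAR fallback for unranked types, and the sample rows are all assumptions made for illustration. Mapping the winning type name to a real column type (for example the precision and scale of a NUMBER) is deliberately left out.

```python
# Assumed ranking: a higher rank means a wider type that can hold the narrower ones.
TYPE_RANK = {"BOOLEAN": 0, "INTEGER": 1, "DECIMAL": 2, "DOUBLE": 3, "VARCHAR": 4}


def resolve_types(rows):
    """Pick the highest-ranked observed type for each path.

    rows: iterable of (path, typeof_result, record_count) as returned by the query.
    Types missing from TYPE_RANK fall back to VARCHAR in this sketch.
    """
    resolved = {}
    for path, observed, _count in rows:
        candidate = observed if observed in TYPE_RANK else "VARCHAR"
        current = resolved.get(path)
        if current is None or TYPE_RANK[candidate] > TYPE_RANK[current]:
            resolved[path] = candidate
    return resolved


# Hypothetical query output: "Amount" was seen both as INTEGER and as DECIMAL
sample_rows = [
    ("Id", "INTEGER", 1000),
    ("Amount", "INTEGER", 400),
    ("Amount", "DECIMAL", 600),
]
resolved = resolve_types(sample_rows)
print(resolved)
```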
I find that some columns in Informix tables, such as the name column in the sysdbspaces table of the sysmaster database, hold only lower-case letters. Suppose I create a dbspace named SAmple using the onspaces command: the new value in the name column of sysdbspaces is sample, not SAmple, and if I query for the dbspace whose name = 'SAmple', it returns nothing. How do I deal with case sensitivity in cases like this?
Transferring comments into an answer.
You can get upper-case (or mixed-case) user names into the system if you want to by enclosing the name in quotes. With sysdbspaces, I think you're stuck; the names are converted to lower-case, period. Don't use mixed case searches on columns that only contain lower-case values.
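To illustrate the advice about not searching with mixed case, here is a small sketch that folds the search term to lower case before comparing. It uses an in-memory SQLite table as a stand-in for sysmaster:sysdbspaces (an assumption made only so the example runs anywhere), and find_dbspace is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for sysdbspaces, which stores dbspace names in lower case
conn.execute("CREATE TABLE sysdbspaces (name TEXT)")
conn.execute("INSERT INTO sysdbspaces VALUES ('sample')")


def find_dbspace(connection, name):
    """Look up a dbspace after folding the search term to lower case."""
    cursor = connection.execute(
        "SELECT name FROM sysdbspaces WHERE name = ?", (name.lower(),)
    )
    return [row[0] for row in cursor.fetchall()]


# The mixed-case literal finds nothing; the folded search term does
exact = conn.execute("SELECT name FROM sysdbspaces WHERE name = 'SAmple'").fetchall()
folded = find_dbspace(conn, "SAmple")
print(exact, folded)
```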
As an example of mixed-case names in the system catalogue:
CREATE TABLE 'McDonald'.Ronald(t INTEGER NOT NULL)
works and uses the name McDonald in the owner column. You can also play with delimited identifiers (names delimited by double quotes), but you need to set the DELIMIDENT environment variable.
Incidentally, in a MODE ANSI database, you might write:
create table whodunnit.murder_mystery (t integer not null);
and the system catalog will record the owner as WHODUNNIT in caps. Unless you quote the user name, or the user name is informix, it will be case-converted.
I have a database with a column named "State/Province". All the queries and data transfers work properly, but in the "SelectedValue" property of the dropdownlist control, the bind expression throws an error.
When I edit the column name by removing the slash, it works well.
So is using a slash in a column name not a proper way of naming?
Basically, using anything other than:
Letters
Numbers (not at the start of the column name)
Underscore (_)
is not recommended; it is not a good way to name fields, and some data sources might throw errors on other characters.
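The rule above can be checked mechanically. This is a minimal sketch with a hypothetical helper name. It treats only ASCII letters as letters, which is an assumption on my part.

```python
import re

# Letters, digits and underscores only, and no digit in first position
SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_column_name(name):
    """Return True if the name uses only the recommended characters."""
    return SAFE_NAME.fullmatch(name) is not None


for candidate in ["State_Province", "State/Province", "1stName"]:
    print(candidate, is_safe_column_name(candidate))
```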
Some good points about Column Naming convention:
Avoid underscores, they look unnatural and slow the reader down.
Never use a column name that requires [ ]. Shame on Microsoft for
excessive use of ID which requires the use of a table qualifier.
Use Proper Case, descriptive names and don't abbreviate.
Name primary keys with a suffix that denotes its data type.
TableNameID for integer (the preferred choice for all primary keys).
TableNameCode for varchar.
TableNameKey (other data types).
Do not change the spelling of the primary key from a parent table
when it's used in a child table.
Don't use acronyms unless they are well known by programmers or all
employees of your company.
I know it's an old thread, but if you're not the designer of the table and fields and just want to use the data, I would suggest you use:
SELECT * FROM <YOUR TABLE NAME>
You will probably notice that SQL Server Management Studio returns a field name for your column like 'State_Province'.
This is the SQL field name that you can use in your queries.
This is probably a n00blike (or worse) question, but I've always viewed a schema as a table definition in a database. Is this wrong, or not entirely correct? I don't remember much from my database courses.
schema -> floor plan
database -> house
table -> room
A relation schema is the logical definition of a table - it defines what the name of the table is, and what the name and type of each column is. It's like a plan or a blueprint. A database schema is the collection of relation schemas for a whole database.
A table is a structure with a bunch of rows (aka "tuples"), each of which has the attributes defined by the schema. Tables might also have indexes on them to aid in looking up values on certain columns.
A database is, formally, any collection of data. In this context, the database would be a collection of tables. A DBMS (Database Management System) is the software (like MySQL, SQL Server, Oracle, etc) that manages and runs a database.
In a nutshell, a schema is the definition for the entire database, so it includes tables, views, stored procedures, indexes, primary and foreign keys, etc.
This particular posting has been shown to relate to Oracle only and the definition of Schema changes when in the context of another DB.
Probably the kinda thing to just google up but FYI terms do seem to vary in their definitions which is the most annoying thing :)
In Oracle a database is a database. In your head think of this as the data files and the redo logs and the actual physical presence on the disk of the database itself (i.e. not the instance)
A Schema is effectively a user. More specifically, it's a set of tables/procs/indexes etc. owned by a user. Another user has a different schema (the tables he/she owns); however, a user can also see any schemas they have select privileges on. So a database can consist of hundreds of schemas, and each schema of hundreds of tables. You can have tables with the same name in different schemas, which are in the same database.
A Table is a table, a set of rows and columns containing data and is contained in schemas.
Definitions may be different in SQL Server for instance. I'm not aware of this.
A schema seems to behave like a parent object as seen in the OOP world, so it's not a database itself. Maybe this link is useful.
But, In MySQL, the two are equivalent. The keyword DATABASE or DATABASES
can be replaced with SCHEMA or SCHEMAS wherever it appears. Examples:
CREATE DATABASE <=> CREATE SCHEMA
SHOW DATABASES <=> SHOW SCHEMAS
Documentation of MySQL
The terms SCHEMA and DATABASE are DBMS-dependent.
A table is a set of data elements (values) organized using a model of vertical columns (which are identified by their names) and horizontal rows. A database contains one or (usually) more tables, and you store your data in these tables. The tables may be related to one another (see here).
As per https://www.informit.com/articles/article.aspx?p=30669
The names of all objects must be unique within some scope. Every
database must have a unique name; the name of a schema must be unique
within the scope of a single database, the name of a table must be
unique within the scope of a single schema, and column names must be
unique within a table. The name of an index must be unique within a
database.
From the PostgreSQL documentation:
A database contains one or more named schemas, which in turn contain tables. Schemas also contain other kinds of named objects, including data types, functions, and operators. The same object name can be used in different schemas without conflict; for example, both schema1 and myschema can contain tables named mytable. Unlike databases, schemas are not rigidly separated: a user can access objects in any of the schemas in the database he is connected to, if he has privileges to do so.
There are several reasons why one might want to use schemas:
To allow many users to use one database without interfering with each other.
To organize database objects into logical groups to make them more manageable.
Third-party applications can be put into separate schemas so they do not collide with the names of other objects.
Schemas are analogous to directories at the operating system level, except that schemas cannot be nested.
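The point about the same object name living in two schemas can be tried out quickly. The sketch below does not use PostgreSQL. As an assumption made for portability, it uses SQLite's ATTACH DATABASE, where each attached database name qualifies tables in a similar schema.table way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each attached in-memory database gets its own name, used here as a schema stand-in
conn.execute("ATTACH DATABASE ':memory:' AS schema1")
conn.execute("ATTACH DATABASE ':memory:' AS myschema")

# The same table name is created twice without conflict
conn.execute("CREATE TABLE schema1.mytable (id INTEGER)")
conn.execute("CREATE TABLE myschema.mytable (id INTEGER)")
conn.execute("INSERT INTO schema1.mytable VALUES (1)")
conn.execute("INSERT INTO myschema.mytable VALUES (2)")

first = conn.execute("SELECT id FROM schema1.mytable").fetchall()
second = conn.execute("SELECT id FROM myschema.mytable").fetchall()
print(first, second)
```

Each qualified name resolves to its own table, which is the behaviour the quoted documentation describes for schema1 and myschema.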
Contrary to some of the above answers, here is my understanding based on experience with each of them:
MySQL: database/schema :: table
SQL Server: database :: (schema/namespace ::) table
Oracle: database/schema/user :: (tablespace ::) table
Please correct me on whether tablespace is optional or not with Oracle, it's been a long time since I remember using them.
As MusiGenesis put so nicely, in most databases:
schema : database : table :: floor plan : house : room
But, in Oracle it may be easier to think of:
schema : database : table :: owner : house : room
More on schemas:
In SQL 2005 a schema is a way to group objects. It is a container you can put objects into. People can own this object. You can grant rights on the schema.
In 2000 a schema was equivalent to a user. Now it has broken free and is quite useful. You could throw all your user procs in a certain schema and your admin procs in another. Grant EXECUTE to the appropriate user/role and you're through with granting EXECUTE on specific procedures. Nice.
The dot notation would go like this:
Server.Database.Schema.Object
or
myserver01.Adventureworks.Accounting.Beans
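To make the four parts of the dot notation explicit, here is a small sketch that splits such a name into its components. The type and function names are hypothetical, and the sketch assumes that none of the parts contains a dot itself (for example inside brackets).

```python
from typing import NamedTuple


class ObjectName(NamedTuple):
    server: str
    database: str
    schema: str
    object_name: str


def parse_four_part_name(name):
    """Split Server.Database.Schema.Object into its four parts."""
    parts = name.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated parts, got {len(parts)}")
    return ObjectName(*parts)


parsed = parse_four_part_name("myserver01.Adventureworks.Accounting.Beans")
print(parsed.schema, parsed.object_name)
```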
A Schema is a collection of database objects which includes logical structures too.
It has the name of the user who owns it.
A database can have any number of schemas.
The same table name can appear in two different schemas within one database.
A user can view any schema for which they have been assigned select privilege.
I'll try answering based on my understanding, using the following analogy:
A database is like the house
In the house there are several types of rooms. Assuming that you're living in a really big house. You really don't want your living rooms, bedrooms, bathrooms, mezzanines, treehouses, etc. to look the same. They each need a blueprint to tell how to build/use them. In other words, they each need a schema to tell how to build/use a bathroom, for example.
Of course, you may have several bedrooms, each looks slightly different. You and your wife/husband's bedroom is slightly different from your kids' bedroom. Each bedroom is analogous to a table in your database.
A DBMS is like a butler in the house. He manages literally everything.
In Oracle, a schema is one user under one database. For example, scott is one schema in the database orcl.
In one database we may have many schemas like scott.
Schemas contain databases.
Databases are part of a schema.
So, schemas > databases.
Schemas contain views, stored procedure(s), database(s), trigger(s) etc.
A schema is not a plan for the entire database. It is a plan/container for a subset of objects (e.g. tables) inside a database. That is to say, you can have multiple objects (e.g. tables) inside one database which don't necessarily fall under the same functional category, so you can group them under various schemas and give them different user access permissions. That said, I am unsure whether you can have one table under multiple schemas. The Management Studio UI gives a dropdown to assign a schema to a table, making it possible to choose only one schema. I guess if you do it with T-SQL, it might create two (or more) different objects with different object IDs.
A database schema is a way to logically group objects such as tables, views, stored procedures etc. Think of a schema as a container of objects.
And tables are collections of rows and columns.
The combination of all tables makes up a database.