Create external tables without the EXTERNAL keyword? - database

I am using Hive for work. When I created some external tables today, I forgot to type the EXTERNAL keyword, so the HiveQL looked like:
CREATE TABLE year_2012_main (
some BIGINT,
fields BIGINT,
should BIGINT,
beee BIGINT,
here STRING,
buttt STRING,
Iveee STRING,
decide STRING,
tohide STRING,
them BIGINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ' '
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE location '/data/content/year_2012_main';
Then I tried select count(*) from year_2012_main;, and it worked well.
So, just out of curiosity, what's the difference with or without EXTERNAL?

A Hive table that's not external is called a managed table. One of the main differences between an external and a managed table in Hive is that when an external table is dropped, the data associated with it (in your case /data/content/year_2012_main) doesn't get deleted; only the metadata (number of columns, type of columns, terminators, etc.) gets dropped from the Hive metastore. When a managed table gets dropped, both the metadata and the data get dropped.
I have so far always preferred making tables external, because if the schema of my Hive table changes, I can just drop the external table and re-create another external table over the same HDFS data with the new schema. However, most (if not all) changes to a schema can now be made through ALTER TABLE or similar commands, so my recommendation/preference for external tables over managed ones may be more of a legacy concern than a contemporary one.
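To make the difference concrete, here is a minimal sketch (the abbreviated column list and the table name year_2012_main_ext are illustrative):

```sql
-- Same definition, but EXTERNAL: dropping the table removes only the
-- metastore entry; the files under the LOCATION stay on HDFS.
CREATE EXTERNAL TABLE year_2012_main_ext (
  some BIGINT,
  fields BIGINT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/content/year_2012_main';

DROP TABLE year_2012_main_ext;  -- data files in /data/content/year_2012_main are left in place
```

With your managed table, the same DROP TABLE would also delete the files under /data/content/year_2012_main.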
You can learn more about the terminologies here.

Related

When exporting a SQL script, how do I conditionally quote object names that begin with a digit?

Using PowerDesigner 16.5, I am trying to export a SQL script from a model that I have reversed from an existing Oracle schema. Some object names in this schema begin with a digit. When the script is exported, these object names are emitted verbatim (which is normally what you want) but the leading digit in the object name causes a SQL statement to be parsed incorrectly.
For example,
create table MY_SCHEMA.FOO_1234_ACTION
(
MY_ID NUMBER not null,
MY_COLUMN VARCHAR2(32)
)
/
alter table MY_SCHEMA.FOO_1234_ACTION
add constraint 1234_ACTION_PK primary key (MY_ID)
/
In the example above, the constraint name begins with a digit, causing this statement to be mis-parsed, resulting in ORA-00902: invalid datatype. If I manually edit the script to quote the constraint name, the statement is correctly parsed and the table is altered, adding the primary key:
alter table MY_SCHEMA.FOO_1234_ACTION
add constraint "1234_ACTION_PK" primary key (MY_ID)
/
I understand that maybe the name could simply be changed, but I'm trying to establish a baseline from the schema we have in order to iterate it toward what it needs to be.
I have looked at both the options for the Oracle database definition and the options on the model itself that might influence script output (such as naming conventions). Is there any way to make PowerDesigner emit object names with leading digits quoted, while leaving all other names alone?

Creating a history table without using triggers

I have a table A with 3000 records and 25 columns. I want a history table, Table A history, holding all the changes (updates and deletes) so I can look anything up later. I usually use cursors. I thought of using triggers, but I was asked not to use them. Do you have any other suggestions? Many thanks!
If you're using T-SQL / SQL Server and you can't use triggers (which are the only sure way to capture every change), you could use a stored procedure, scheduled in a job to run every x amount of time, that uses a MERGE statement between the two tables to pick up new records or changes. I would not suggest this if you need every single change without question.
CREATE TABLE dbo.TableA (id INT, Column1 nvarchar(30))
CREATE TABLE dbo.TableA_History (id INT, Column1 nvarchar(30), TimeStamp DateTime)
(this code isn't production, just the general idea)
Put the following code inside a stored procedure and use a Sql Server Job with a schedule on it.
MERGE INTO dbo.TableA_History
USING dbo.TableA
ON TableA_History.id = TableA.id AND TableA_History.Column1 = TableA.Column1
WHEN NOT MATCHED BY TARGET THEN
INSERT (id, Column1, TimeStamp) VALUES (TableA.id, TableA.Column1, GETDATE());
So basically if the record either doesn't exist or doesn't match meaning a column changed, insert the record into the history table.
It is possible to create history without triggers in some cases, even if you are not using SQL Server 2016 and system-versioned tables are not available.
In some cases, when you can identify for sure which routines modify your table, you can create history using the OUTPUT ... INTO clause.
For example,
INSERT INTO [dbo].[MainTable]
OUTPUT inserted.[]
,...
,'I'
,GETUTCDATE()
,@CurrentUserID
INTO [dbo].[HistoryTable]
SELECT *
FROM ... ;
In routines that use MERGE, I like that we can use $action:
Is available only for the MERGE statement. Specifies a column of type
nvarchar(10) in the OUTPUT clause in a MERGE statement that returns
one of three values for each row: 'INSERT', 'UPDATE', or 'DELETE',
according to the action that was performed on that row.
It's very handy that we can add the user who is modifying the table. With triggers, you need to use session context or a session variable to pass the user. With a system-versioned table, you need to add an additional column to the main table in order to log the user, as it only logs the current table columns (at least for now).
So, basically, it depends on your data and application. If you have many sources of CRUD over the table, a trigger is the most secure way. If your table is very big and heavily used, using MERGE is not good, as it may cause blocking and harm performance.
In our databases we are using all of the methods depending on the situation:
triggers for legacy
system-versioning for new development
direct OUTPUT in the history, when sure that data is modified only by given set of routines
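As a sketch of the MERGE + $action approach described above (the table, column, and variable names here are assumptions, not from the original post):

```sql
-- Illustrative sketch: log every row the MERGE touches, together with the
-- action performed ('INSERT', 'UPDATE', or 'DELETE') and the acting user.
DECLARE @CurrentUserID INT = 42;  -- assumed to be set by the calling routine

MERGE INTO dbo.MainTable AS t
USING dbo.StagingTable AS s
    ON t.id = s.id
WHEN MATCHED AND t.Column1 <> s.Column1 THEN
    UPDATE SET t.Column1 = s.Column1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, Column1) VALUES (s.id, s.Column1)
OUTPUT $action,            -- which of the three actions touched this row
       inserted.id,
       inserted.Column1,
       GETUTCDATE(),
       @CurrentUserID
INTO dbo.HistoryTable ([Action], id, Column1, ModifiedAtUtc, ModifiedByUserID);
```

Note that for DELETE actions the inserted.* columns are NULL, so a real history table would usually capture the deleted.* columns as well.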

Database insert script for SQL Server

I have a requirement where data from 150 tables with different columns should be copied to another table which has all these columns. I need a script which will do this activity automatically instead of manually inserting one by one.
Any suggestions?
You can get the column names from either sys.columns or INFORMATION_SCHEMA.COLUMNS along with the datatype; then it's just a simple matter of de-duping the columns (based on name) and sorting out any conflicts between differing datatypes to create your destination table.
Once you have that, you can create and execute all your insert statements.
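A sketch of the column-gathering step (the 'SRC_' name pattern for the 150 source tables is an assumption; conflicting datatypes still need a manual decision):

```sql
-- Collect a de-duplicated column list across the source tables.
SELECT c.name            AS column_name,
       MIN(t.name)       AS type_name,   -- arbitrary pick when types conflict
       MAX(c.max_length) AS max_length   -- widest length wins
FROM sys.columns c
JOIN sys.types   t  ON t.user_type_id = c.user_type_id
JOIN sys.tables  tb ON tb.object_id   = c.object_id
WHERE tb.name LIKE 'SRC[_]%'
GROUP BY c.name
ORDER BY c.name;
```

From this result set you can generate both the CREATE TABLE for the destination and the per-table INSERT statements.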
Good luck.

Trigger to log inserted/updated/deleted values SQL Server 2012

I'm using SQL Server 2012 Express and since I'm really used to PL/SQL it's a little hard to find some answers to my T-SQL questions.
What I have: about 7 tables with distinct columns and an additional one for logging inserted/updated/deleted values from the other 7.
Question: how can I create one trigger per table so that it stores the modified data in the Log table, considering I can't use Change Data Capture because I'm using the SQL Server Express edition?
Additional info: there are only two columns in the Logs table that I need help filling; they hold the altered data from all the columns merged, example below:
CREATE TABLE USER_DATA
(
ID INT IDENTITY(1,1) NOT NULL,
NAME NVARCHAR(25) NOT NULL,
PROFILE INT NOT NULL,
DATE_ADDED DATETIME2 NOT NULL
)
GO
CREATE TABLE AUDIT_LOG
(
ID INT IDENTITY(1,1) NOT NULL,
USER_ALTZ NVARCHAR(30) NOT NULL,
MACHINE SYSNAME NOT NULL,
DATE_ALTERERED DATETIME2 NOT NULL,
DATA_INSERTED XML,
DATA_DELETED XML
)
GO
The columns I need help filling are the last two (DATA_INSERTED and DATA_DELETED). I'm not even sure the data type should be XML, but when someone either
INSERTS or UPDATES (new values only): all data inserted/updated in all the columns of USER_DATA should be merged somehow into DATA_INSERTED.
DELETES or UPDATES (old values only): all data deleted/updated in all the columns of USER_DATA should be merged somehow into DATA_DELETED.
Is it possible?
Use the inserted and deleted Tables
DML trigger statements use two special tables: the deleted table and the inserted table. SQL Server automatically creates and manages these tables. You can use these temporary, memory-resident tables to test the effects of certain data modifications and to set conditions for DML trigger actions. You cannot directly modify the data in the tables or perform data definition language (DDL) operations on the tables, such as CREATE INDEX. In DML triggers, the inserted and deleted tables are primarily used to:
Extend referential integrity between tables.
Insert or update data in base tables underlying a view.
Test for errors and take action based on the error.
Find the difference between the state of a table before and after a data modification and take actions based on that difference.
And
OUTPUT Clause (Transact-SQL)
Returns information from, or expressions based on, each row affected
by an INSERT, UPDATE, DELETE, or MERGE statement. These results can be
returned to the processing application for use in such things as
confirmation messages, archiving, and other such application
requirements. The results can also be inserted into a table or table
variable. Additionally, you can capture the results of an OUTPUT
clause in a nested INSERT, UPDATE, DELETE, or MERGE statement, and
insert those results into a target table or view.
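Putting the two together, here is a minimal sketch of one such trigger, written against the USER_DATA / AUDIT_LOG definitions from the question (the trigger name and the use of FOR XML RAW are assumptions; the FOR XML subqueries turn the whole inserted/deleted row sets into single XML values):

```sql
-- Sketch: one AFTER trigger per audited table; repeat per table.
CREATE TRIGGER TR_USER_DATA_AUDIT
ON USER_DATA
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO AUDIT_LOG (USER_ALTZ, MACHINE, DATE_ALTERERED,
                           DATA_INSERTED, DATA_DELETED)
    SELECT SUSER_SNAME(),   -- login that fired the trigger
           HOST_NAME(),     -- client machine
           SYSDATETIME(),
           (SELECT * FROM inserted FOR XML RAW, ROOT('rows')),  -- new values
           (SELECT * FROM deleted  FOR XML RAW, ROOT('rows'));  -- old values
END
```

On an INSERT only DATA_INSERTED is populated, on a DELETE only DATA_DELETED, and on an UPDATE both, which matches the merging behavior asked for.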
Just posting because this is what solved my problem. As user @SeanLange suggested in the comments to my post, I should use an "audit", which I didn't know existed.
Googling it led me to a Stack Overflow answer whose first link is a procedure that creates triggers and "shadow" tables, doing roughly what I needed (it doesn't merge all values into one column, but it does the job).

How to dynamically exclude non-copyable fields in trigger tables

Background: I am trying to write an after-update trigger that stores the changed values dynamically in another table. The trigger should be generic and easy to transfer to other tables, and it shouldn't cause problems if I add additional columns. (If my whole code is required to solve this, I'll update the question.)
While trying to do this, I encountered the following issue: I want to store the inserted table in a temporary table, which I do this way:
SELECT *
INTO #tempINSERTED
FROM INSERTED
But the original table contains both ntext and timestamp columns, which aren't allowed in temporary tables.
Another approach I tried was looping through the system view INFORMATION_SCHEMA.COLUMNS and building a SQL statement as a string that excludes non-copyable columns, but that way I cannot access the inserted table. I already figured out that I cannot access inserted if I use sp_executesql.
So my question: is there a way to access the inserted table while excluding non-copyable columns such as ntext, text, and image?
Thanks in advance
You want triggers to run fast, so the better approach is to generate the CREATE TRIGGER code rather than looping through the fields inside the trigger itself. If the table schema changes, you will then need to regenerate the trigger.
For your #tempINSERTED table you can use nvarchar(max) in place of ntext, varchar(max) in place of text, and varbinary(max) in place of image. You can also use binary(8) or bigint in place of timestamp.
I would suggest using a table variable instead of a #temptable, i.e.:
DECLARE @tempTable TABLE (
    fieldname int -- and so on
);
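For the code-generation step, something along these lines could build the explicit column list (the object name dbo.MyTable is illustrative):

```sql
-- Sketch: build a comma-separated column list for the generated trigger,
-- skipping types that can't be copied into the temp table / table variable.
SELECT STUFF((
    SELECT ', ' + QUOTENAME(c.name)
    FROM sys.columns c
    JOIN sys.types t ON t.user_type_id = c.user_type_id
    WHERE c.object_id = OBJECT_ID('dbo.MyTable')
      AND t.name NOT IN ('ntext', 'text', 'image', 'timestamp')
    ORDER BY c.column_id
    FOR XML PATH('')
), 1, 2, '') AS column_list;
```

The resulting string can then be spliced into the generated trigger body, so the trigger itself does a plain `SELECT <column_list> INTO #tempINSERTED FROM inserted` with no dynamic SQL at run time.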