Netezza import from external file error: Unsupported external table reference, unable to derive shape

I'm working to create a new table from an external file in Netezza, but am getting the following error:
Unsupported external table reference, unable to derive shape
I get the same error whether trying to create a new table or insert into an existing table. Here is the SQL I'm using:
select * from external 'FILEPATH.txt' using (delim '|');

You need to define the column layout in your query; then the query will run:
SYSTEM.ADMIN(ADMIN)=> select * from external '/tmp/testfile.txt' (v1 int, v2 int) using (delim '|');
V1 | V2
----+----
3 | 4
3 | 6
(2 rows)
Note that when inserting into an existing table you don't need to specify the column types:
SYSTEM.ADMIN(ADMIN)=> create table test (v1 int, v2 int);
CREATE TABLE
SYSTEM.ADMIN(ADMIN)=> insert into test select * from external '/tmp/testfile.txt' using (delim '|');
INSERT 0 2
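If the goal is a brand-new table in one step, the same external table reference should also work with CREATE TABLE ... AS (a sketch, assuming the same sample file and layout as above):
create table test2 as
select * from external '/tmp/testfile.txt' (v1 int, v2 int) using (delim '|');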

Related

Snowflake - Keeping target table schema in sync with source table variant column value

I ingest data into a table source_table with AVRO data. There is a column in this table say "avro_data" which will be populated with variant data.
I plan to copy data into a structured table target_table where columns have the same name and datatype as the avro_data fields in the source table.
Example:
select avro_data from source_table
{"C1":"V1", "C2", "V2"}
This will result in
select * from target_table
------------
| C1 | C2 |
------------
| V1 | V2 |
------------
My question is: when the schema of avro_data evolves and new fields get added, how can I keep the schema of target_table in sync by adding the equivalent columns?
Is there anything out of the box in Snowflake to achieve this, or has someone written code to do something similar?
Here's something to get you started. It shows how to take a variant column and parse out the internal columns. It uses a table in the Snowflake sample data database, which is not identical in every account, so you may need to adjust the table name and column name.
SELECT DISTINCT
       REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'), '(\\w+)', '"\\1"') AS path_name,   -- generates paths with levels enclosed by double quotes (ex: "path"."to"."element") and strips any bracket-enclosed array element references (like "[0]")
       DECODE(SUBSTR(TYPEOF(f.value), 1, 1), 'A', 'ARRAY', 'B', 'BOOLEAN', 'I', 'FLOAT', 'D', 'FLOAT', 'STRING') AS attribute_type,   -- generates column datatypes of ARRAY, BOOLEAN, FLOAT, and STRING only
       REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'), '[^a-zA-Z0-9]', '_') AS alias_name -- generates column aliases based on the path
FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."JCUSTOMER",
     LATERAL FLATTEN("CUSTOMER", RECURSIVE=>true) f
WHERE TYPEOF(f.value) != 'OBJECT'
  AND NOT CONTAINS(f.path, '[');
This is a snippet of code modified from here: https://community.snowflake.com/s/article/Automating-Snowflake-Semi-Structured-JSON-Data-Handling. The blog author attributes credit to a colleague for this section of code.
While the current incarnation of the stored procedure will create a view from the internal columns in a variant, an alternate version could create and/or alter a table to keep it in sync with changes.
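As a sketch of that alternate version: the paths and types discovered above can be compared against INFORMATION_SCHEMA.COLUMNS to generate ALTER TABLE statements for any new fields. The target_table name below is a placeholder for illustration, not part of the original answer:
SELECT 'ALTER TABLE target_table ADD COLUMN ' || p.alias_name || ' ' || p.attribute_type || ';' AS ddl
FROM (
    -- the path/type discovery query from above goes here
    SELECT ...
) p
WHERE p.alias_name NOT IN (
    SELECT column_name
    FROM information_schema.columns
    WHERE table_name = 'TARGET_TABLE'
);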

Can table columns be created by copying the datatype from another column? (for example, like %TYPE in Oracle)

For example, this is possible in Oracle. I wanted to know if Snowflake has a similar concept.
CREATE TABLE Purchases
(
purchase_date calendar.date%type,
customer_nr customer.customer_nr%type,
purchase_amount numeric(10,2)
)
I'm afraid there's no way to do that right now. You can use SYSTEM$TYPEOF to check a column's type, but it can't be used in a CREATE TABLE statement.
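For completeness, a quick look at SYSTEM$TYPEOF in action (the purchases table and purchase_amount column are just illustrative names):
SELECT SYSTEM$TYPEOF(purchase_amount) FROM purchases LIMIT 1;
-- returns something like NUMBER(10,2)[SB4]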
The column-type referencing in your example is not available. However, you can build a table by joining one or more tables and/or views together, building the column list from columns of any of the joined tables plus any you explicitly add. The key is to join on 1 = 2 (or FALSE), so no rows are copied.
Example
CREATE OR REPLACE TEMP TABLE TMP_X
AS
SELECT A."name" AS NAME
,A."owner" AS OWNER
,B.STG_ARRAY
,NULL::NUMERIC(10,2) AS PURCHASE_AMOUNT
,NULL AS COMMENT
FROM TABLE_A A
JOIN TABLE_B B
ON 1 = 2
;
* NAME - takes its datatype from the A."name" column
* OWNER - takes its datatype from the A."owner" column
* STG_ARRAY - takes its datatype from the B.STG_ARRAY column
* PURCHASE_AMOUNT - takes the explicitly specified datatype NUMERIC(10,2)
* COMMENT - no explicit datatype, so it takes the default datatype of VARCHAR(16777216)
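You can confirm which datatypes the columns inherited with a DESCRIBE:
DESC TABLE TMP_X;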

How to efficiently replace long strings by their index for SQL Server inserts?

I have a very large DataTable object which I need to import from a client into an MS SQL Server database via ODBC.
The original data table has two columns:
* The first column is the office location (quite a long string)
* The second column is a booking value (integer)
Now I am looking for the most efficient way to insert this data into an external SQL Server. My goal is to automatically replace each office location with an index instead of the full string, because each location occurs VERY often in the initial table.
Is this possible via a trigger or via a view on the SQL Server?
In the end I want to insert the data without touching it in my script, because that is very slow for this large amount of data, and leave the optimization to SQL Server.
I expect that if I INSERT the data including the office location, SQL Server looks up the index for an already imported location and uses just that index. And if the location does not already exist in the index table / view, it should create a new entry there and then use the new index.
Here is a sample of the data I need to import via ODBC into the SQL Server:
OfficeLocation | BookingValue
EU-Germany-Hamburg-Ostend1 | 12
EU-Germany-Hamburg-Ostend1 | 23
EU-Germany-Hamburg-Ostend1 | 34
EU-France-Paris-Eifeltower | 42
EU-France-Paris-Eifeltower | 53
EU-France-Paris-Eifeltower | 12
What I need on the SQL Server as a result is something like these 2 tables:
Bookings:
OId | BookingValue
1   | 12
1   | 23
1   | 34
2   | 42
2   | 53
2   | 12
Locations:
OfficeLocation             | OId
EU-Germany-Hamburg-Ostend1 | 1
EU-France-Paris-Eifeltower | 2
My initial idea was to write the data into a temp table and have something like an intelligent TRIGGER (or a VIEW?) react to any INSERT into this table to create the 2 desired (optimized) tables.
Any hints are more than welcome!
Yes, you can create a view with an INSERT trigger to handle this. Something like:
CREATE TABLE dbo.Locations (
OId int IDENTITY(1,1) not null PRIMARY KEY,
OfficeLocation varchar(500) not null UNIQUE
)
GO
CREATE TABLE dbo.Bookings (
OId int not null,
BookingValue int not null
)
GO
CREATE VIEW dbo.CombinedBookings
WITH SCHEMABINDING
AS
SELECT
OfficeLocation,
BookingValue
FROM
dbo.Bookings b
INNER JOIN
dbo.Locations l
ON
b.OId = l.OId
GO
CREATE TRIGGER CombinedBookings_Insert
ON dbo.CombinedBookings
INSTEAD OF INSERT
AS
INSERT INTO Locations (OfficeLocation)
SELECT OfficeLocation
FROM inserted where OfficeLocation not in (select OfficeLocation from Locations)
INSERT INTO Bookings (OId,BookingValue)
SELECT OId, BookingValue
FROM
inserted i
INNER JOIN
Locations l
ON
i.OfficeLocation = l.OfficeLocation
As you can see, we first add to the locations table any missing locations and then populate the bookings table.
A similar trigger can cope with Updates. I'd generally let the Locations table just grow and not attempt to clean it up (for no longer referenced locations) with triggers. If growth is a concern, a periodic job will usually be good enough.
Be aware that some tools (such as bulk inserts) may not invoke triggers, so those will not be usable with the above view.
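To sketch the usage (a hypothetical session based on the question's sample data): rows are inserted through the view, and the trigger routes them into the two tables:
INSERT INTO dbo.CombinedBookings (OfficeLocation, BookingValue)
VALUES ('EU-Germany-Hamburg-Ostend1', 12),
       ('EU-France-Paris-Eifeltower', 42);

SELECT * FROM dbo.Locations; -- one row per distinct location, each with its OId
SELECT * FROM dbo.Bookings;  -- one row per booking, holding the OId instead of the string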

Hive: How to insert data in a column of type array <string>

I have a table with the following schema:
CREATE TABLE `student_details`(
`id_key` string,
`name` string,
`subjects` array<string>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'path'
When trying to insert values into the table, I get an error message.
Tried:
INSERT INTO student_details values ('AA87U','BRYAN',array('ENG','CAL_1','CAL_2','HST','MUS'));
Error:
FAILED: SemanticException [Error 10293]: Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values
This doesn't make sense to me. I tried looking online and found a similar question: ExternalLink. That solution doesn't make sense to me either.
Any help please.
You can't insert a complex type directly in Hive using a VALUES clause.
On older Hive versions you have to select from a dummy table, like below:
INSERT INTO student_details select 'AA87U','BRYAN', array('ENG','CAL_1','CAL_2','HST','MUS') from dummy;
For Hive 2+, you can run it without a dummy table:
INSERT INTO student_details select 'AA87U','BRYAN', array('ENG','CAL_1','CAL_2','HST','MUS');
You have to first create a dummy table with one row:
create table dummy(a int);
insert into dummy values (1);
Then you can do this:
INSERT INTO student_details select 'AA87U','BRYAN', array('ENG','CAL_1','CAL_2','HST','MUS') from dummy;
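If the rows come from a file or another table rather than literals, a common alternative (a sketch; the staging_student_details table and its subjects_csv column are hypothetical names) is to build the array from a delimited string with split():
INSERT INTO student_details
SELECT id_key,
       name,
       split(subjects_csv, ',')  -- 'ENG,CAL_1,CAL_2' becomes array('ENG','CAL_1','CAL_2')
FROM staging_student_details;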

Temporary tables in HANA

Is it possible to write a script in HANA that creates a temporary table based on an existing table (with no need to define the columns and column types hard-coded)? For example, instead of:
create local temporary table #mytemp (id integer, name varchar(20));
can I create a temporary table with the same column definitions that contains the same data? If so, I'd be glad to get some examples.
I've been searching the internet for 2 days and couldn't find anything useful.
Thanks
Creating local temporary tables based on dynamic structure definition is not supported in SQLScript.
The question would be: what do you want to use it for?
Instead of a local temporary table, you can use a table variable in most cases.
By querying the sys.table_columns view, you can get the list and properties of the source table's columns, build a dynamic CREATE script, and then execute it to create the table.
You can find SQL code for a sample case at Create Table Dynamically on HANA Database.
To read the table columns:
select * from sys.table_columns where table_name = 'TABLENAME';
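A minimal sketch of that idea, assuming a source table named MYTABLE (the names and the simplified length handling are illustrative, not from the original answer):
DO BEGIN
    DECLARE v_ddl NVARCHAR(5000);
    -- assemble a CREATE statement from the column metadata
    SELECT 'CREATE LOCAL TEMPORARY TABLE #MYTEMP ('
           || STRING_AGG('"' || column_name || '" ' || data_type_name
                || CASE WHEN data_type_name IN ('VARCHAR','NVARCHAR') THEN '(' || length || ')' ELSE '' END,
                ', ' ORDER BY position)
           || ')'
      INTO v_ddl
      FROM sys.table_columns
     WHERE table_name = 'MYTABLE';
    EXEC :v_ddl;   -- create the temp table via dynamic SQL
END;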
This seems to work in the HANA version I have (I'm not sure how to find out which version that is).
PROCEDURE "xxx.yyy.zzz::MY_TEST"(
OUT "OUT_COL" NVARCHAR(200)
)
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER
AS
BEGIN
create LOCAL TEMPORARY TABLE #LOCALTEMPTABLE
as
(
SELECT distinct 'Cola' as out_col
FROM "SYNONYMS1"
);
select * from #LOCALTEMPTABLE ;
DROP TABLE #LOCALTEMPTABLE;
END
Newer HANA versions (e.g. HANA 2 SPS 04 Patch 5, build 4.4.17) support your request:
create local temporary table #tempTableName like "tableTypeName";
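Note that LIKE copies only the structure. If the source name refers to an actual table, you could follow it with a plain INSERT to copy the data as well (a sketch using the same placeholder names):
insert into #tempTableName select * from "tableTypeName";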
This should inherit the data types and the exact values from whatever query is in the parentheses:
CREATE LOCAL COLUMN TEMPORARY TABLE #mytemp AS (
SELECT
"COLUMN1",
"COLUMN2",
"COLUMN3"
FROM MyTable
);
-- Now you can add the rest of your query here as such:
SELECT * FROM #mytemp
I suppose you can just write:
create column table #MyTempTable as ( select * from MySourceTable);
