I want to load a plain file into Greenplum database using external tables.
Can I specify input format for timestamps/date/time fields? (If you know the answer for PostgreSQL, please reply as well)
For example, with Oracle I can use DATE_FORMAT DATE MASK 'YYYYMMDD' to tell how to parse the date. For Netezza I can specify DATESTYLE 'YMD'. For Greenplum I cannot find the answer. I can describe fields as char, and then parse them during the load, but this is an ugly workaround.
Here is my tentative code:
CREATE EXTERNAL TABLE MY_TBL (X date, Y time, Z timestamp )
LOCATION (
'gpfdist://host:8001/file1.txt',
'gpfdist://host:8002/file2.txt'
) FORMAT 'TEXT' (DELIMITER '|' NULL '')
It appears that you can:
SET DATESTYLE = 'YMD';
before SELECTing from the table. This will affect the interpretation of all dates, though, not just those from the file. If you consistently use unambiguous ISO dates elsewhere that will be fine, but it may be a problem if (for example) you need to also accept 'D/M/Y' date literals in the same query.
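If that side effect is a concern, a minimal sketch (assuming you copy the rows into an ordinary heap table, here called my_target, which is not in the original question) is to scope the setting to the loading transaction with SET LOCAL so it reverts automatically:
BEGIN;
SET LOCAL DATESTYLE = 'YMD';   -- reverts at COMMIT/ROLLBACK; other sessions and later statements unaffected
INSERT INTO my_target          -- hypothetical destination table
SELECT * FROM my_tbl;          -- the external table from the question
COMMIT;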
This is specific to GreenPlum's CREATE EXTERNAL TABLE and does not apply to SQL-standard SQL/MED foreign data wrappers, as shown below.
What surprises me is that PostgreSQL proper (which does not have this CREATE EXTERNAL TABLE feature) always accepts ISO-style YYYY-MM-DD and YYYYMMDD dates, irrespective of DATESTYLE. Observe:
regress=> SELECT '20121229'::date, '2012-12-29'::date, current_setting('DateStyle');
date | date | current_setting
------------+------------+-----------------
2012-12-29 | 2012-12-29 | ISO, MDY
(1 row)
regress=> SET DateStyle = 'DMY';
SET
regress=> SELECT '20121229'::date, '2012-12-29'::date, current_setting('DateStyle');
date | date | current_setting
------------+------------+-----------------
2012-12-29 | 2012-12-29 | ISO, DMY
(1 row)
... so if GreenPlum behaved the same way, you should not need to do anything to get these YYYYMMDD dates to be read correctly from the input file.
Here's how it works with a PostgreSQL file_fdw SQL/MED foreign data wrapper:
CREATE EXTENSION file_fdw;
COPY (SELECT '20121229', '2012-12-29') TO '/tmp/dates.csv' CSV;
SET DateStyle = 'DMY';
CREATE SERVER csvtest FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE csvtest (
date1 date,
date2 date
) SERVER csvtest OPTIONS ( filename '/tmp/dates.csv', format 'csv' );
SELECT * FROM csvtest ;
date1 | date2
------------+------------
2012-12-29 | 2012-12-29
(1 row)
The CSV file contents are:
20121229,2012-12-29
so you can see that Pg will always accept ISO dates for CSV, irrespective of datestyle.
If GreenPlum doesn't, please file a bug. The idea of DateStyle changing the way a foreign table is read after creation is crazy.
Yes, you can.
You do this by declaring the field in the external table as type text and then applying a transformation in the INSERT statement. You can also use gpload and define the transformation there. Both approaches are similar to the workaround described in the question.
Here is a simple file with an integer and a date expressed as year month day, separated by a space:
date1.txt
1|2012 10 12
2|2012 11 13
Start gpfdist:
gpfdist -p 8010 -d ./ -l ./gpfdist.log &
Use psql to create the external table, the target table, and load the data:
psql test
test=# create external table ext.t2( i int, d text )
location ('gpfdist://walstl-mbp.local:8010/date1.txt')
format 'TEXT' ( delimiter '|' )
;
test=# select * from ext.t2;
 i |     d
---+------------
1 | 2012 10 12
2 | 2012 11 13
(2 rows)
Now, create the table that the data will be loaded into:
test=# create table test.t2 ( i int, d date )
;
And load the table:
test=# insert into test.t2 select i, to_date(d,'YYYY MM DD') from ext.t2 ;
test=# select * from test.t2;
i | d
---+------------
1 | 2012-10-12
2 | 2012-11-13
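The same approach extends to the time and timestamp columns from the original question: declare them as text in the external table and parse them in the INSERT. A minimal sketch, assuming a hypothetical external table ext.t3 with a text column ts holding values like '20121012 13:45:00':
test=# insert into test.t3
       select i, to_timestamp(ts, 'YYYYMMDD HH24:MI:SS')
       from ext.t3;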
How to create a table with a timestamp column that defaults to DATETIME('now')?
Like this:
CREATE TABLE test (
id INTEGER PRIMARY KEY AUTOINCREMENT,
t TIMESTAMP DEFAULT DATETIME('now')
);
This gives an error.
As of version 3.1.0 you can use CURRENT_TIMESTAMP with the DEFAULT clause:
If the default value of a column is CURRENT_TIME, CURRENT_DATE or CURRENT_TIMESTAMP, then the value used in the new row is a text representation of the current UTC date and/or time. For CURRENT_TIME, the format of the value is "HH:MM:SS". For CURRENT_DATE, "YYYY-MM-DD". The format for CURRENT_TIMESTAMP is "YYYY-MM-DD HH:MM:SS".
CREATE TABLE test (
id INTEGER PRIMARY KEY AUTOINCREMENT,
t TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
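A quick sanity check (a sketch; actual output depends on when you run it): inserting a row without supplying t shows the default being filled in with the UTC text representation described above.
INSERT INTO test DEFAULT VALUES;
SELECT t FROM test;  -- returns the current UTC time as 'YYYY-MM-DD HH:MM:SS'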
According to Dr. Hipp in a recent mailing-list post:
CREATE TABLE whatever(
....
timestamp DATE DEFAULT (datetime('now','localtime')),
...
);
It's just a syntax error; you need parentheses: (DATETIME('now'))
The documentation for the DEFAULT clause says:
If the default value of a column is an expression in parentheses, then the expression is evaluated once for each row inserted and the results used in the new row.
If you look at the syntax diagram you'll also notice the parentheses around 'expr'.
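So the statement from the question works once the default expression is wrapped in parentheses:
CREATE TABLE test (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    t TIMESTAMP DEFAULT (DATETIME('now'))
);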
This is a full example based on the other answers and comments to the question. In the example, the timestamp (the created_at column) is stored as a Unix epoch in UTC and converted to the local timezone only when necessary.
Using a Unix epoch saves storage space: a 4-byte integer versus a 24-byte string when stored as an ISO 8601 string (see datatypes). If 4 bytes is not enough, that can be increased to 6 or 8 bytes.
Storing the timestamp in UTC makes it convenient to show a reasonable value in multiple timezones.
The SQLite version is 3.8.6, which ships with Ubuntu 14.04 LTS.
$ sqlite3 so.db
SQLite version 3.8.6 2014-08-15 11:46:33
Enter ".help" for usage hints.
sqlite> .headers on
create table if not exists example (
id integer primary key autoincrement
,data text not null unique
,created_at integer(4) not null default (strftime('%s','now'))
);
insert into example(data) values
('foo')
,('bar')
;
select
id
,data
,created_at as epoch
,datetime(created_at, 'unixepoch') as utc
,datetime(created_at, 'unixepoch', 'localtime') as localtime
from example
order by id
;
id|data|epoch |utc |localtime
1 |foo |1412097842|2014-09-30 17:24:02|2014-09-30 20:24:02
2 |bar |1412097842|2014-09-30 17:24:02|2014-09-30 20:24:02
The local time is correct, as I'm at UTC+2 (DST) at the moment of the query.
It may be better to use REAL type, to save storage space.
Quote from 1.2 section of Datatypes In SQLite Version 3
SQLite does not have a storage class set aside for storing dates
and/or times. Instead, the built-in Date And Time Functions of SQLite
are capable of storing dates and times as TEXT, REAL, or INTEGER
values
CREATE TABLE test (
id INTEGER PRIMARY KEY AUTOINCREMENT,
t REAL DEFAULT (datetime('now', 'localtime'))
);
see column-constraint .
And insert a row without providing any value.
INSERT INTO "test" DEFAULT VALUES;
It is a syntax error because you did not write parentheses.
If you write
Select datetime('now')
it will give you UTC time, but if you use it as a column default you must wrap it in parentheses:
(datetime('now')) for UTC time.
The same applies to local time:
Select datetime('now','localtime')
becomes, as a default,
(datetime('now','localtime'))
If you want millisecond precision, try this:
CREATE TABLE my_table (
timestamp DATETIME DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);
This will save the timestamp as text, though.
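A quick check sketch: inserting a row and reading it back shows the fractional seconds preserved in the stored text.
INSERT INTO my_table DEFAULT VALUES;
SELECT timestamp FROM my_table;  -- text like 'YYYY-MM-DDTHH:MM:SS.SSSZ', with milliseconds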
This alternative example stores the local time as an Integer, saving roughly 20 bytes per value compared with an ISO 8601 string. The work is done in the field default, an Update trigger, and a View.
strftime must use '%s' (single quotes) because "%s" (double quotes) threw a 'Not Constant' error for me.
Create Table Demo (
idDemo Integer Not Null Primary Key AutoIncrement
,DemoValue Text Not Null Unique
,DatTimIns Integer(4) Not Null Default (strftime('%s', DateTime('Now', 'localtime'))) -- get Now/UTC, convert to local, convert to string/Unix Time, store as Integer(4)
,DatTimUpd Integer(4) Null
);
Create Trigger trgDemoUpd After Update On Demo Begin
Update Demo Set
DatTimUpd = strftime('%s', DateTime('Now', 'localtime')) -- same as DatTimIns
Where idDemo = new.idDemo;
End;
Create View If Not Exists vewDemo As Select -- convert Unix-Times to DateTimes so not every single query needs to do so
idDemo
,DemoValue
,DateTime(DatTimIns, 'unixepoch') As DatTimIns -- convert Integer(4) (treating it as Unix-Time)
,DateTime(DatTimUpd, 'unixepoch') As DatTimUpd -- to YYYY-MM-DD HH:MM:SS
From Demo;
Insert Into Demo (DemoValue) Values ('One'); -- activate the field Default
-- WAIT a few seconds --
Insert Into Demo (DemoValue) Values ('Two'); -- same thing but with
Insert Into Demo (DemoValue) Values ('Thr'); -- later time values
Update Demo Set DemoValue = DemoValue || ' Upd' Where idDemo = 1; -- activate the Update-trigger
Select * From Demo; -- display raw audit values
idDemo DemoValue DatTimIns DatTimUpd
------ --------- ---------- ----------
1 One Upd 1560024902 1560024944
2 Two 1560024944
3 Thr 1560024944
Select * From vewDemo; -- display automatic audit values
idDemo DemoValue DatTimIns DatTimUpd
------ --------- ------------------- -------------------
1 One Upd 2019-06-08 20:15:02 2019-06-08 20:15:44
2 Two 2019-06-08 20:15:44
3 Thr 2019-06-08 20:15:44
I ingest data into a table source_table with AVRO data. There is a column in this table say "avro_data" which will be populated with variant data.
I plan to copy data into a structured table target_table where columns have the same name and datatype as the avro_data fields in the source table.
Example:
select avro_data from source_table
{"C1":"V1", "C2":"V2"}
This will result in
select * from target_table
------------
| C1 | C2 |
------------
| V1 | V2 |
------------
My question is when schema of the avro_data evolves and new fields get added, how can I keep schema of the target_table in sync by adding equivalent columns in the target table?
Is there anything out of the box in snowflake to achieve this or if someone has created any code to do something similar?
Here's something to get you started. It shows how to take a variant column and parse out the internal columns. This uses a table in the Snowflake sample data database, which is not always the same in every account, so you may need to adjust the table name and column name.
SELECT DISTINCT regexp_replace(regexp_replace(f.path,'\\[(.+)\\]'),'(\\w+)','"\\1"') AS path_name, -- This generates paths with levels enclosed by double quotes (ex: "path"."to"."element"). It also strips any bracket-enclosed array element references (like "[0]")
DECODE (substr(typeof(f.value),1,1),'A','ARRAY','B','BOOLEAN','I','FLOAT','D','FLOAT','STRING') AS attribute_type, -- This generates column datatypes of ARRAY, BOOLEAN, FLOAT, and STRING only
REGEXP_REPLACE(REGEXP_REPLACE(f.path, '\\[(.+)\\]'),'[^a-zA-Z0-9]','_') AS alias_name -- This generates column aliases based on the path
FROM
"SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."JCUSTOMER",
LATERAL FLATTEN("CUSTOMER", RECURSIVE=>true) f
WHERE TYPEOF(f.value) != 'OBJECT'
AND NOT contains(f.path, '[');
This is a snippet of code modified from here: https://community.snowflake.com/s/article/Automating-Snowflake-Semi-Structured-JSON-Data-Handling. The blog author attributes credit to a colleague for this section of code.
While the current incarnation of the stored procedure will create a view from the internal columns in a variant, an alternate version could create and/or alter a table to keep it in sync with changes.
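For the table-sync variant, here is a minimal sketch (not from the linked article; it reuses source_table, target_table, and avro_data from the question, and "C3" is a hypothetical newly added field). A stored procedure could compare the paths reported by the query above against the target table's columns and emit statements like:
-- add the newly discovered column, with the type inferred as attribute_type above
ALTER TABLE target_table ADD COLUMN "C3" STRING;

-- then (re)load the structured table from the variant column
INSERT INTO target_table ("C1", "C2", "C3")
SELECT avro_data:"C1"::STRING,
       avro_data:"C2"::STRING,
       avro_data:"C3"::STRING
FROM source_table;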
On SQL server 2012. I have normalized tables that will consist of "notes". Each "note" record can have many notelines tied to it with a foreign key. I'm looking for a SQL statement that will parse a block of text and, for each line within that text, insert a separate record.
I'm guessing some sort of "WHILE loop" for each block of text but can't get my head around how it would work.
To be clear: The end result of this would be to just paste the block of text into the query and execute so that I can get each individual line of it into the note without messing around creating multiple insert statements.
I agree with Zohar Peled that this is probably not the right way to normalize these tables. If this is still something you need to do, then:
In SQL Server 2016+ you can use string_split().
Prior to that version; using a CSV Splitter table valued function by Jeff Moden:
table setup:
create table dbo.note (
id int not null identity(1,1) primary key
, created datetime2(2) not null
/* other cols */
);
create table dbo.note_lines (
id int not null identity(1,1) primary key
, noteId int not null foreign key references dbo.note(id)
, noteLineNumber int not null
, noteLineText varchar(8000) not null
);
insert statements:
declare @noteId int;
insert into dbo.note values (sysutcdatetime());
set @noteId = scope_identity();
declare @note_txt varchar(8000) = 'On SQL server 2012. I have normalized tables that will consist of "notes". Each "note" record can have many notelines tied to it with a foreign key. I''m looking for a SQL statement that will parse a block of text and, for each line within that text, insert a separate record.
I''m guessing some sort of "WHILE loop" for each block of text but can''t get my head around how it would work.
To be clear: The end result of this would be to just paste the block of text into the query and execute so that I can get each individual line of it into the note without messing around creating multiple insert statements.'
insert into dbo.note_lines (noteId, noteLineNumber, noteLineText)
select @noteId, s.ItemNumber, s.Item
from [dbo].[delimitedsplit8K](@note_txt, char(10)) s
And after the insert:
select * from note;
select * from note_lines;
rextester demo: http://rextester.com/SCODAG90159
return (respectively):
+----+---------------------+
| id | created |
+----+---------------------+
| 1 | 2017-04-06 15:16:59 |
+----+---------------------+
+----+--------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | noteId | noteLineNumber | noteLineText |
+----+--------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | 1 | 1 | On SQL server 2012. I have normalized tables that will consist of "notes". Each "note" record can have many notelines tied to it with a foreign key. I'm looking for a SQL statement that will parse a block of text and, for each line within that text, insert a separate record. |
| 2 | 1 | 2 | I'm guessing some sort of "WHILE loop" for each block of text but can't get my head around how it would work. |
| 3 | 1 | 3 | To be clear: The end result of this would be to just paste the block of text into the query and execute so that I can get each individual line of it into the note without messing around creating multiple insert statements. |
+----+--------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
splitting strings reference:
Tally OH! An Improved SQL 8K “CSV Splitter” Function - Jeff Moden
Splitting Strings : A Follow-Up - Aaron Bertrand
Split strings the right way – or the next best way - Aaron Bertrand
string_split() in SQL Server 2016 : Follow-Up #1 - Aaron Bertrand
Note: the function used in the example is designed for a limit of 8,000 characters; if you need more, there are alternatives in the links above.
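For SQL Server 2016+, here is a minimal sketch of the string_split() route mentioned above, reusing the same tables and variables. Note that string_split() does not return an ordinal before SQL Server 2022, so the line number is approximated with row_number() and the ordering of lines is not guaranteed; the delimitedsplit8K function used above returns a reliable ItemNumber.
insert into dbo.note_lines (noteId, noteLineNumber, noteLineText)
select @noteId
     , row_number() over (order by (select null))
     , s.value
from string_split(@note_txt, char(10)) s;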
I have columns with numeric(5,2) data type.
When the actual data is 1.25, it displays correctly as 1.25.
When the actual data is 1.00, it displays as 1.
Can anyone tell me why? Is there something that I need to set to have it so the two decimal 0's display?
I think this may be an issue specific to pgadmin4. Consider:
> createdb test
> psql -d test
psql (9.4.9)
Type "help" for help.
test=# create table mytest(id serial not null primary key, name varchar(30),salary numeric(5,2));
CREATE TABLE
test=# select * from mytest;
id | name | salary
----+------+--------
(0 rows)
test=# insert into mytest(name,salary) values('fred',10.3);
INSERT 0 1
test=# insert into mytest(name,salary) values('mary',11);
INSERT 0 1
test=# select * from mytest;
id | name | salary
----+------+--------
1 | fred | 10.30
2 | mary | 11.00
(2 rows)
test=# select salary from mytest where name = 'mary';
salary
--------
11.00
(1 row)
This example is with version 9.4, as you can see, but it would be a simple test to see whether the problem lies with 9.6 or with pgAdmin4. In pgAdmin3 the value is displayed correctly, with the decimal places.
The last time I tried pgAdmin4 it had a number of annoying issues that sent me scurrying back to pgAdmin3 for the time being. However, there is an issue tracker where you can seek confirmation of the bug: https://redmine.postgresql.org/projects/pgadmin4
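If you just need the trailing zeros to appear regardless of the client, a workaround sketch (it does not fix the pgAdmin4 display itself) is to format the value server-side:
select name, to_char(salary, 'FM999990.00') as salary
from mytest;  -- fred returns '10.30' and mary '11.00', as text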
This is a bug with pgAdmin4 and already reported https://redmine.postgresql.org/issues/2039
I'm trying to take a raw data set that adds columns for new data and convert it to a more traditional table structure. The idea is to have the script pull the column name (the date) and put that into a new column and then stack each dates data values on top of each other.
Example
| Store   | 1/1/2013 | 2/1/2013 |
|---------|----------|----------|
| XYZ INC | $1000    | $2000    |
To
| Store   | Date     | Value |
|---------|----------|-------|
| XYZ INC | 1/1/2013 | $1000 |
| XYZ INC | 2/1/2013 | $2000 |
thanks
There are a few different ways that you can get the result that you want.
You can use a SELECT with UNION ALL:
select store, '1/1/2013' date, [1/1/2013] value
from yourtable
union all
select store, '2/1/2013' date, [2/1/2013] value
from yourtable;
See SQL Fiddle with Demo.
You can use the UNPIVOT function:
select store, date, value
from yourtable
unpivot
(
value
for date in ([1/1/2013], [2/1/2013])
) un;
See SQL Fiddle with Demo.
Finally, depending on your version of SQL Server you can use CROSS APPLY:
select store, date, value
from yourtable
cross apply
(
values
('1/1/2013', [1/1/2013]),
('2/1/2013', [2/1/2013])
) c (date, value)
See SQL Fiddle with Demo. All versions will give a result of:
| STORE | DATE | VALUE |
|---------|----------|-------|
| XYZ INC | 1/1/2013 | 1000 |
| XYZ INC | 2/1/2013 | 2000 |
Depending on the details of the problem (e.g. source format, number and variability of dates, how often you need to perform the task), it may well be much easier to use some other language to parse the data and perform either a reformatting step or the direct insert into the final table.
That said, if you're interested in a purely SQL solution, it sounds like you're looking for dynamic pivot functionality; the keywords are dynamic SQL and unpivot. The details vary based on which RDBMS you're using and exactly what the specs are for the initial data set; a sketch for SQL Server follows.
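A minimal sketch of the dynamic-SQL route for SQL Server, assuming the source is a table named dbo.yourtable and every column other than Store is a date-named value column (both assumptions, not stated in the question):
declare @cols nvarchar(max), @sql nvarchar(max);

-- build the bracketed column list, e.g. [1/1/2013],[2/1/2013]
select @cols = stuff((
        select ',' + quotename(c.name)
        from sys.columns c
        where c.object_id = object_id('dbo.yourtable')
          and c.name <> 'Store'
        for xml path('')), 1, 1, '');

-- plug the list into the UNPIVOT query and run it
set @sql = N'select store, [date], [value]
from dbo.yourtable
unpivot ([value] for [date] in (' + @cols + ')) u;';

exec sp_executesql @sql;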
I would use a scripting language (Perl, Python, etc.) to generate an INSERT statement for each date column you have in the original data and transpose it into a row keyed by Store and Date. Then run the inserts into your normalized table.