What is the difference between table and external table in Netezza?

What is the difference between a table and an external table in Netezza? Does it always read the data file in the backend? After loading data, is it required to copy the data again from the external table to a normal database table?

This is covered pretty well in a lot of blogs and tech sites, like this one: http://tennysusantobi.blogspot.no/2012/08/netezza-external-tables.html
Basically, external tables are just a definition residing in Netezza, allowing it to query data from (usually) local text files without having to physically load them into a database in Netezza. They are also used to export data easily (as covered in the link).

Tables:
Both the definition and the data reside in the database. More precisely, the data is stored physically on each data slice based on the distribution key.
External Table:
Only the table definition resides in the database, not the actual data. The data resides in the file itself.
It is mainly used to load/unload data. It can also be used to back up Netezza tables or to transfer data from one Netezza box to another.
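As a rough sketch (the table, file, and column names below are invented, and the exact options depend on your setup), unloading and loading with external tables looks roughly like this:
-- Unload: write the result of a query to a flat file via a transient external table
CREATE EXTERNAL TABLE '/tmp/customers.csv' USING (DELIMITER ',') AS
SELECT * FROM customers;
-- Load: define an external table over the file, then copy its rows into a normal table
CREATE EXTERNAL TABLE ext_customers SAME AS customers
USING (DATAOBJECT('/tmp/customers.csv') DELIMITER ',');
INSERT INTO customers SELECT * FROM ext_customers;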

Related

Flink difference between view vs temporary table vs table

What is the difference between a view, a temporary table, and a table, and what are their use cases? Trying to understand when to use which.
You can read more on this topic at https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/common/#temporary-vs-permanent-tables
Temporary tables are always stored in memory and only exist for the duration of the Flink session they are created within. These tables are not visible to other sessions. They are not bound to any catalog or database but can be created in the namespace of one. Temporary tables are not dropped if their corresponding database is removed.
Tables can be either virtual (VIEWS) or regular (TABLES). VIEWS can be created from an existing Table object, usually the result of a Table API or SQL query. TABLES describe external data, such as a file, database table, or message queue.
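A small Flink SQL sketch of the three (the table names and connector options below are illustrative, not from the docs):
-- Permanent table: describes external data and is registered in a catalog
CREATE TABLE orders (order_id BIGINT, amount DECIMAL(10, 2))
WITH ('connector' = 'filesystem', 'path' = '/data/orders', 'format' = 'csv');
-- Temporary table: exists only for the current session and is not persisted in any catalog
CREATE TEMPORARY TABLE scratch_orders (order_id BIGINT, amount DECIMAL(10, 2))
WITH ('connector' = 'datagen');
-- View: a virtual table defined by a query over other tables
CREATE VIEW big_orders AS SELECT * FROM orders WHERE amount > 100;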

What is the difference between external tables and global temporary tables in Oracle?

I have worked with external tables in Oracle; they can be created on a file containing data (with many other conditions). How are global temporary tables different from external tables?
An external table gets its content from e.g. a CSV file. The database itself does not store any data. Its content is visible to all sessions (= connections) to the server (provided the necessary access privileges exist). The data exists independently of the database and is only deleted (or changed) if the file is changed externally (as far as I know, Oracle cannot write to an external table, only read from it, but I haven't used them for ages, so maybe this changed in Oracle 18 or later).
The data for a temporary table is stored and managed inside the database, but each session keeps its own copy of the data in the table. The data is automatically removed by Oracle when the session is disconnected or if the transaction is ended (depending on the definition of the temporary table). Data in a temporary table never survives a restart of the database server.
Broadly, an external table is a placeholder definition that points to a file somewhere on the OS. These are generally used (but not limited to) when an external interface sends you data in files. You could either load the data into a normal table using SQL*Loader, or use an external table to point to the file itself and simply query that table to read from the file. There are some limitations, though; for example, you cannot update an external table.
GTTs (global temporary tables) are used when you want to keep some on-the-fly information in a table such that it is visible only in the current session. There are good articles on both of these table types if you want to go into more detail.
One more thing: access to a GTT is generally faster than access to an external table.
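For illustration only (the directory, table, and column names are invented), the two are created quite differently:
-- External table: definition in the database, data stays in a file on the server
CREATE DIRECTORY data_dir AS '/data/feeds';
CREATE TABLE customers_ext (customer_id NUMBER, name VARCHAR2(100))
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (RECORDS DELIMITED BY NEWLINE FIELDS TERMINATED BY ',')
  LOCATION ('customers.csv')
);
-- Global temporary table: definition shared by all sessions, data private to each session
CREATE GLOBAL TEMPORARY TABLE session_scratch (customer_id NUMBER, score NUMBER)
ON COMMIT PRESERVE ROWS;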

Is there any way to store files in a NoSQL db and store the link to the file (in NoSQL) in my PostgreSQL db?

I am working on an assignment which needs to store the files uploaded by the user. I thought of storing it in the filesystem and store the path in the DB. Usually the files will be within 5MB.
I am wondering whether I can store the file in a NoSQL db and keep a reference to the NoSQL db (the file) in my Postgres DB. Kindly help, thanks in advance.
Are you looking for FOREIGN DATA WRAPPERS in PostgreSQL?
PostgreSQL allows you to store and access data that resides outside PostgreSQL's own storage via foreign data wrappers (FDWs). FDWs are like extensions to PostgreSQL. You can piggyback on PostgreSQL's SQL support while the data itself lives behind the wrapper. Such tables are called FOREIGN TABLES.
From the docs:
A foreign table can be used in queries just like a normal table, but a foreign table has no storage in the PostgreSQL server.
There are various FDWs available for NoSQL databases. Refer to the link for more about FDWs.
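As a rough sketch (the wrapper name and its options depend on which FDW you install; mongo_fdw and all object names below are just examples), a foreign table is set up and queried like this:
CREATE EXTENSION mongo_fdw;
CREATE SERVER mongo_server FOREIGN DATA WRAPPER mongo_fdw
  OPTIONS (address '127.0.0.1', port '27017');
CREATE USER MAPPING FOR CURRENT_USER SERVER mongo_server
  OPTIONS (username 'app', password 'secret');
-- The foreign table has no storage in PostgreSQL; the rows live in the NoSQL store
CREATE FOREIGN TABLE uploaded_files (_id NAME, path TEXT)
  SERVER mongo_server OPTIONS (database 'files_db', collection 'uploads');
SELECT * FROM uploaded_files;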

How/where is table structure (not data) stored in SQL Server?

I know that the data in SQL Server is stored in data pages, but I don't know where the table structure is stored. I came across this statement about TRUNCATE:
"TRUNCATE removes the data by deallocating the data pages. TRUNCATE removes all rows from a table, but the table structure and its columns remain."
This made me realize that the table structure and column information are stored outside the pages (or data pages in particular). So, how/where is the table structure (not data) stored in SQL Server?
Thank You.
You can access SQL Server metadata through INFORMATION_SCHEMA. Below are the most useful views and their contents:
INFORMATION_SCHEMA.TABLES: Contains information about the schemas, tables and views in the server.
INFORMATION_SCHEMA.COLUMNS: Full information about the table columns, such as data type, whether it is nullable...
INFORMATION_SCHEMA.VIEWS: Contains information about the views and the code for creating them again.
INFORMATION_SCHEMA.KEY_COLUMN_USAGE: Information about foreign keys, unique keys, primary keys...
To use them, simply query them like any other view: SELECT * FROM INFORMATION_SCHEMA.TABLES
For a full reference go to MSDN: https://msdn.microsoft.com/en-us/library/ms186778.aspx
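For example, to see the structure of a single table (the table name 'Orders' is just a placeholder):
SELECT COLUMN_NAME, DATA_TYPE, IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Orders'
ORDER BY ORDINAL_POSITION;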
There are system tables that store all of the metadata about the database. These tables are not directly queryable (except when using the DAC) but there are numerous views and functions built atop these tables. These are referred to as the Catalog Views.
So, for instance, there is the sys.columns view which describes each column in the database. It's a view built atop the syscolpars table, which is one of the system tables mentioned above that you cannot directly query.
There are also the INFORMATION_SCHEMA views which hespi mentions. These are meant to be a "standard" way of accessing metadata supported by all SQL database systems. Unfortunately, support for them is not 100%, and because they're meant to be cross-platform, they do not tend to reveal advanced features that are product specific.
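A comparable query against the catalog views (again with a placeholder table name):
SELECT t.name AS table_name, c.name AS column_name, ty.name AS data_type, c.is_nullable
FROM sys.tables t
JOIN sys.columns c ON c.object_id = t.object_id
JOIN sys.types ty ON ty.user_type_id = c.user_type_id
WHERE t.name = 'Orders';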
A SQL Server Database consists of 2 Files (usually):
Master Data File (*.mdf)
Transaction Log File (*.ldf)
The Master Data File contains schema and data information.
The Transaction Log File contains log information for actions in your DB.
If you run select * from sys.database_files in your DB, it will show you the file names, location, size, etc.

Database Problems

I have a database schema in an Oracle database. I also have data dumps from third-party vendors. I load their data using SQL*Loader scripts on a Linux machine.
We also have batch updates everyday.
The data is assumed to be free from errors. E.g., if on the first day a record 'A' is inserted into the db, the assumption is that 'A' will not occur in later loads. If we do receive a record named 'A' again, we get a primary key violation.
Question: To avoid these violations, should we build an analyzer to detect such data errors, or are there better solutions?
I built an ETL system for a company that had daily feeds of flat files containing line of business transaction data. The data was supposed to follow a documented schema but in practice there were lots of different types of violations from day to day and file to file.
We built SQL staging tables with every column nullable and varchars sized bigger than ought to be needed, and loaded the flat file data into these staging tables using efficient bulk-loading utilities. Then we ran a series of data consistency checks within the context of the database to ensure that the raw (staged) data could be cross-loaded to the proper production tables.
Nothing got out of the staging table environment until all of the edits were passed.
The advantage of loading the flat files into staging tables is that you can take advantage of the RDBMS to perform set actions and to easily compare new values with existing values from previous files, all without having to build special flat file handling code.
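A minimal sketch of the idea in Oracle SQL (table and column names are invented; the real consistency checks were business-specific):
-- Staging table: every column nullable and generously sized, no constraints
CREATE TABLE stg_transactions (txn_id VARCHAR2(100), txn_date VARCHAR2(50), amount VARCHAR2(50));
-- After bulk-loading the flat file into staging (e.g. with SQL*Loader), check for
-- keys that would collide with the production table before cross-loading
SELECT s.txn_id FROM stg_transactions s JOIN transactions t ON t.txn_id = s.txn_id;
-- Move only the rows that pass the edits into production
INSERT INTO transactions (txn_id, txn_date, amount)
SELECT s.txn_id, TO_DATE(s.txn_date, 'YYYY-MM-DD'), TO_NUMBER(s.amount)
FROM stg_transactions s
WHERE NOT EXISTS (SELECT 1 FROM transactions t WHERE t.txn_id = s.txn_id);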
