Flink difference between view vs temporary table vs table - apache-flink

What is the difference between a view, a temporary table and a table, and what are their use cases? I am trying to understand when to use which.

You can read more on this topic at https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/common/#temporary-vs-permanent-tables
Temporary tables are always stored in memory and only exist for the duration of the Flink session they are created within. These tables are not visible to other sessions. They are not bound to any catalog or database but can be created in the namespace of one. Temporary tables are not dropped if their corresponding database is removed.
Tables can be either virtual (VIEWS) or regular (TABLES). VIEWS can be created from an existing Table object, usually the result of a Table API or SQL query. TABLES describe external data, such as a file, database table, or message queue.
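As a rough sketch (the table names, paths and connector choices below are only illustrative), the three concepts look like this in Flink SQL:
-- regular (permanent) table: describes external data and is registered in the current catalog/database
CREATE TABLE orders (
  order_id BIGINT,
  amount DOUBLE
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/orders',
  'format' = 'csv'
);

-- temporary table: same idea, but only visible to this session and never persisted in a catalog
CREATE TEMPORARY TABLE orders_tmp (
  order_id BIGINT,
  amount DOUBLE
) WITH (
  'connector' = 'datagen'
);

-- view: a virtual table defined by a query over existing tables
CREATE VIEW large_orders AS
SELECT order_id, amount FROM orders WHERE amount > 100;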

Related

what is the difference between external tables and global temporary tables in oracle?

I have worked with external tables in Oracle; they can be created on a file containing data (with many other conditions). So how are global temporary tables different from external tables?
An external table gets its content from e.g. a CSV file. The database itself does not store any data. Its content is visible to all sessions (= connections) to the server (provided the necessary access privileges exist). The data exists independently of the database and is only deleted (or changed) if the file is changed externally (as far as I know Oracle cannot write to an external table, only read from it, but I haven't used them for ages, so maybe this changed in Oracle 18 or later).
The data for a temporary table is stored and managed inside the database, but each session keeps its own copy of the data in the table. The data is automatically removed by Oracle when the session is disconnected or if the transaction is ended (depending on the definition of the temporary table). Data in a temporary table never survives a restart of the database server.
Broadly, an external table is a placeholder definition that points to a file somewhere on the OS. These are generally used (but not limited to) when you have an external interface sending you data in files. You could either load the data into a normal table using sqlldr, or you could use an external table to point to the file itself and simply query the table to read from the file. There are some limitations, though, such as not being able to update an external table.
GTTs (global temporary tables) are used when you want to keep some on-the-fly information in a table such that it is only visible in the current session. There are good articles on both kinds of tables if you want to go into more detail.
One more thing: access to a GTT will generally be faster than access to an external table.
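For illustration only (the directory object, file name and columns are assumptions), the two kinds of tables are created roughly like this in Oracle:
-- external table: the definition points at a file on the server; Oracle only reads it
CREATE TABLE sales_ext (
  sale_id NUMBER,
  amount  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir          -- a DIRECTORY object that must already exist
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('sales.csv')
);

-- global temporary table: data is per-session; rows vanish on commit
-- (use ON COMMIT PRESERVE ROWS to keep them until the session ends)
CREATE GLOBAL TEMPORARY TABLE sales_gtt (
  sale_id NUMBER,
  amount  NUMBER
) ON COMMIT DELETE ROWS;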

How to create temp tables in SQL to be used in several ADF activities?

I need to create a global temp table in my SQL Server while executing an Azure Data Factory pipeline. This table will be used in several activities.
I already tried several approaches, including one using the Stored Procedure activity targeting the sys.sp_executesql SP with the CREATE TABLE statement as the parameter. With this approach the table is actually created, but it's automatically dropped a second later, and I don't understand why.
This is the script used to create the temp table:
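-- the ## prefix makes this a global temporary table; SQL Server drops it once the creating session ends and nothing else references it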
CREATE TABLE ##tempGL
(
GLAccount NVARCHAR(15),
GLSubAccount NVARCHAR(15)
)
So, how can I create a SQL Server temp table from an Azure Data Factory pipeline activity that persists until I drop it?
I have been struggling with this myself. Apparently this is by design (see the quote below from a Microsoft employee) and it is not possible to achieve this using Azure Data Factory, even though the documentation mentions that it is possible.
That is by design. We won’t keep connection between 2 activities.
If you use a real table instead of temporary table. Then you will get the expected result.
The suggestion is don’t used temporary table in ADF if the data need more than 1 activities to access.
https://github.com/MicrosoftDocs/azure-docs/issues/35449#issuecomment-517451867
The reason this happens is the session is dropped when a pipeline activity ends, which causes the temporary table to also be dropped.
Global temporary tables are automatically dropped when the session that created the table ends and all other tasks have stopped referencing them. The association between a task and a table is maintained only for the life of a single Transact-SQL statement. This means that a global temporary table is dropped at the completion of the last Transact-SQL statement that was actively referencing the table when the creating session ended.
https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql?view=sql-server-2017#temporary-tables
Hopefully Microsoft fixes this at some point and makes it possible to use temporary tables across activities with Azure Data Factory.
I have raised this as a suggestion for Azure here https://feedback.azure.com/forums/270578-data-factory/suggestions/38287108-persist-global-temporary-tables-between-activities
For anyone reading this that might want this feature, please upvote that suggestion.
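If you follow that advice and use a real table instead, a minimal sketch (the dbo.stagingGL name is made up; the columns mirror the script above) is to create and drop a permanent staging table around the pipeline run:
-- run once at the start of the pipeline (e.g. in a Stored Procedure activity)
IF OBJECT_ID('dbo.stagingGL', 'U') IS NOT NULL
    DROP TABLE dbo.stagingGL;

CREATE TABLE dbo.stagingGL
(
    GLAccount    NVARCHAR(15),
    GLSubAccount NVARCHAR(15)
);

-- ... other activities read from and write to dbo.stagingGL ...

-- run once at the end of the pipeline
DROP TABLE dbo.stagingGL;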

How/where is the table structure (not data) stored in SQL Server?

I know that the data in SQL Server is stored in data pages, but I don't know where the table structure is stored. I came across this statement about TRUNCATE:
"TRUNCATE removes the data by deallocating the data pages. TRUNCATE removes all rows from a table, but the table structure and columns remain."
This made me realize that table structure and column information are stored outside the pages (or data pages in particular). So, how/where is the table structure (not data) stored in SQL Server?
Thank You.
You can access SQL Server metadata through INFORMATION_SCHEMA. The following are the most useful views and their contents:
INFORMATION_SCHEMA.TABLES: Contains information about the schemas, tables and views in the server.
INFORMATION_SCHEMA.COLUMNS: Full information about table columns, such as data type and whether they are nullable.
INFORMATION_SCHEMA.VIEWS: Contains information about the views and the code to recreate them.
INFORMATION_SCHEMA.KEY_COLUMN_USAGE: Information about foreign keys, unique keys, primary keys...
To use them, simply query them like any other view: SELECT * FROM INFORMATION_SCHEMA.TABLES
For a full reference go to MSDN: https://msdn.microsoft.com/en-us/library/ms186778.aspx
There are system tables that store all of the metadata about the database. These tables are not directly queryable (except when using the DAC) but there are numerous views and functions built atop these tables. These are referred to as the Catalog Views.
So, for instance, there is the sys.columns view which describes each column in the database. It's a view built atop the syscolpars table, which is one of the system tables mentioned above that you cannot directly query.
There are also the INFORMATION_SCHEMA views which hespi mentions. These are meant to be a "standard" way of accessing metadata supported by all SQL database systems. Unfortunately, support for them is not 100%, and because they're meant to be cross-platform, they do not tend to reveal advanced features that are product specific.
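For example, this kind of query against the catalog views lists every user table with its columns and data types (joining sys.tables, sys.columns and sys.types):
SELECT t.name  AS table_name,
       c.name  AS column_name,
       ty.name AS data_type,
       c.max_length,
       c.is_nullable
FROM sys.tables t
JOIN sys.columns c  ON c.object_id = t.object_id
JOIN sys.types   ty ON ty.user_type_id = c.user_type_id
ORDER BY t.name, c.column_id;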
A SQL Server Database consists of 2 Files (usually):
Master Data File (*.mdf)
Transaction Log File (*.ldf)
The Master Data File contains schema and data information.
The Transaction Log File contains log information for actions in your DB.
If you run select * from sys.database_files in your DB it will show you the filenames, location, size, etc..

What is the difference between table and external table in Netezza?

What is the difference between a table and an external table in Netezza? Does it always read the data file in the backend? After loading data, is it required to copy the data again from the external table to a normal database table?
This is covered pretty well in lot of blogs and tech sites, like this one : http://tennysusantobi.blogspot.no/2012/08/netezza-external-tables.html
Basically, external tables are just a definition residing in Netezza, allowing it to query data from (usually) local text files without having to physically load them into a database in Netezza. They are also used to export data easily (as covered in the link).
Tables:
Both the definition and the data reside in the database. More precisely, the data is stored physically in each data slice based on the distribution key.
External table:
Only the table definition resides in the database, not the actual data. The data resides in the file itself.
It is mainly used to load/unload data. It can also be used to back up Netezza tables or to transfer data from one Netezza box to another.
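As a rough sketch (file path and table names are invented), unloading and reloading through an external table looks like this in Netezza:
-- unload: create an external table backed by a flat file, copying the structure of an existing table
CREATE EXTERNAL TABLE customers_ext SAMEAS customers
USING (DATAOBJECT('/tmp/customers.dat') DELIMITER '|');

-- the unload happens when you insert into the external table
INSERT INTO customers_ext SELECT * FROM customers;

-- reload: querying the external table reads the file again
INSERT INTO customers_copy SELECT * FROM customers_ext;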

Migrate multiple Access DB (with same function but different data) to single SQL Server DB

Situation
I have 5 Access DB files, each with 10 tables, 40 queries and 8 macros. All 5 Access DB files have the same table names, table structure, queries and macros. The only difference is the data contained in the tables. If it matters, some tables in each database have anywhere from a few hundred to 100K+ rows.
What I am trying to achieve
I am migrating these 5 Access DB files to a single SQL Server (2008) database. Edit: After migrating, I need to know which tables belong to which database, since each original Access DB is associated with a company department, so I need to keep track of this.
My Solutions or Options
Tables will be imported to SQL Server as tables. Queries will be imported as stored procedures. Macros will be imported as new stored procedures.
1. Import each Access DB's tables and queries into the SQL Server DB and rename each table and query, giving them a prefix to identify which tables belong to which database.
2. Same as #1, but only import the tables. As for the queries, only import one set of queries (40 queries) and modify them to dynamically select, insert, update or delete from the tables.
3. Import table A from the 1st Access DB, table A from the 2nd Access DB, table A from the 3rd Access DB and so on, into one new table in SQL Server, and give the rows a unique identifier to tell which row of data belongs to which database.
What do you think is the best approach? Please tell me if there is better way to do this than what I have listed. Thanks!
I would migrate them to MS SQL like so:
Import all tables from database 1 into the corresponding SQL Server tables, but add a new primary key with the name of the old one, rename the old PK, and add an identifier for the database.
Update all foreign keys to the new PK field using the old PK and the identifier.
Repeat for databases 2-5.
Either delete the identifier or keep it, depending on whether you need to know where the rows came from (same for the old primary keys).
Only import the queries/macros once, as they are the same.
When doing it this way, you keep the PK-FK relations and the queries intact and still know where the rows came from.
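A rough T-SQL sketch of that idea for one table (the table and column names below are invented; the child table is assumed to carry the same OldCustomerId/SourceDb pair):
-- new target table: surrogate PK plus an identifier for the source Access DB
CREATE TABLE dbo.Customer
(
    CustomerId    INT IDENTITY(1,1) PRIMARY KEY, -- new surrogate key
    OldCustomerId INT           NOT NULL,        -- primary key value from the Access DB
    SourceDb      NVARCHAR(50)  NOT NULL,        -- which of the 5 Access DBs the row came from
    CustomerName  NVARCHAR(100) NULL
);

-- after importing a child table the same way, rewire its foreign key to the new surrogate key
UPDATE o
SET    o.CustomerId = c.CustomerId
FROM   dbo.[Order] AS o
JOIN   dbo.Customer AS c
  ON   c.OldCustomerId = o.OldCustomerId
 AND   c.SourceDb      = o.SourceDb;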
I would say number 3. You would get no code duplication and much easier maintenance.
One example of easier maintenance is performance tuning. You say the queries are the same in the 5 Access DBs: say you detect that one of the queries runs too slowly and you decide you need to create an index on an underlying table. In options #1 and #2 this would mean recreating the same index on 5 "twin" tables.
In Access, for each of these databases, you could assign a department ID field (a new field) with its own identifier value (each department gets a different value), and then add this value to each of the tables to be imported. Create a new table that holds the department information, then create a join table that connects these tables. Thus, each department is differentiated from the others.
