In citusDB, how do I find out how my data is distributed? - distributed

In CitusDB, I can create an empty table with:
CREATE TABLE table1 (col1 text, col2 text);
I can tell table1 how to partition the data, which will later be loaded into the table, by running this:
SELECT create_distributed_table('table1', 'col1');
In this moment, I will then know how my table is distributed across CitusDB nodes.
However, if I come across a new table that I didn't create, but I know it is distributed, how do I know what column the table is distributed on?

You want to use the citusDB column_to_column function described in the citus db docs: http://docs.citusdata.com/en/v9.3/develop/api_udf.html
SELECT column_to_column_name(logicalrelid, partkey) AS dist_col_name
FROM pg_dist_partition
WHERE logicalrelid='<table>'::regclass;

Related

Update SQL Table Based On Composite Primary Key

I have an ETL process (CSV to SQL database) that runs daily, but the data in the source sometimes changes, so I want to have it run again the next day with an updated file.
How do I write a SQL statement to find all the differences?
For example, let's say Table_1 has a composite PRIMARY KEY consisting of FK_1, FK_2 and FK_3.
Do I do this in SQL or in the ETL process?
Thanks.
Edit
I realize now this question is too broad. Disregard.
You can use EXCEPT to find which are the IDs which are missing. For example:
SELECT FK_1, FK_2, FK_2
FROM new_data_table
EXCEPT
SELECT FK_1, FK_2, FK_2
FROM current_data_table;
It will be better (in performance prospective) to materialized these IDs and then to join this new table to the new_data_table in order to insert all of the columns.
If you need to do this in one query, you can use simple LEFT JOIN. For example:
INSERT INTO current_data_table
SELECT A.*
FROM new_data_table A
LEFT JOIN current_data_table B
ON A.FK_1 = B.FK_1
AND A.FK_2 = B.FK_2
AND A.FK_3 = B.FK_3
WHRE B.[FK_1] IS NULL;
The idea is to get all records in the new_data_table for which, there is no match in the current_data_table table (WHRE B.[FK_1] IS NULL).

Replicate some columns of table

I have a simple model in postgresql database and I want to replicate rows of this table in another database in another machine.
The replication should be for some columns of this table and not for all columns.
What is the solution?
If by replication you mean import, look into foreign data wrappers:
http://wiki.postgresql.org/wiki/Foreign_data_wrappers
One of them ought to do the trick.
If you truly mean replication, then… if the other DB is not using Postgres, you could imagine using the above and triggers to keep the changes in sync, assuming of course that Postgres remains the master. If it is using Postgres, there are plenty of additional options to choose from:
http://www.postgresql.org/docs/current/static/high-availability.html
Take a look at the db_link module. You could try something like:
create extension dblink;
then once you have that installed:
select dblink_connect('myconn', 'hostaddr=127.0.0.1 port=5432 dbname=gis user=ubuntu password=ubuntu');
create table some_columns_table as select * from dblink('myconn','select col1, col2 from all_columns_table') AS t(col1 int, col2 text, col3 text);
select dblink_disconnect('myconn);

Merge query using two tables in SQL server 2012

I am very new to SQL and SQL server, would appreciate any help with the following problem.
I am trying to update a share price table with new prices.
The table has three columns: share code, date, price.
The share code + date = PK
As you can imagine, if you have thousands of share codes and 10 years' data for each, the table can get very big. So I have created a separate table called a share ID table, and use a share ID instead in the first table (I was reliably informed this would speed up the query, as searching by integer is faster than string).
So, to summarise, I have two tables as follows:
Table 1 = Share_code_ID (int), Date, Price
Table 2 = Share_code_ID (int), Share_name (string)
So let's say I want to update the table/s with today's price for share ZZZ. I need to:
Look for the Share_code_ID corresponding to 'ZZZ' in table 2
If it is found, update table 1 with the new price for that date, using the Share_code_ID I just found
If the Share_code_ID is not found, update both tables
Let's ignore for now how the Share_code_ID is generated for a new code, I'll worry about that later.
I'm trying to use a merge query loosely based on the following structure, but have no idea what I am doing:
MERGE INTO [Table 1]
USING (VALUES (1,23-May-2013,1000)) AS SOURCE (Share_code_ID,Date,Price)
{ SEEMS LIKE THERE SHOULD BE AN INNER JOIN HERE OR SOMETHING }
ON Table 2 = 'ZZZ'
WHEN MATCHED THEN UPDATE SET Table 1.Price = 1000
WHEN NOT MATCHED THEN INSERT { TO BOTH TABLES }
Any help would be appreciated.
http://msdn.microsoft.com/library/bb510625(v=sql.100).aspx
You use Table1 for target table and Table2 for source table
You want to do action, when given ID is not found in Table2 - in the source table
In the documentation, that you had read already, that corresponds to the clause
WHEN NOT MATCHED BY SOURCE ... THEN <merge_matched>
and the latter corresponds to
<merge_matched>::=
{ UPDATE SET <set_clause> | DELETE }
Ergo, you cannot insert into source-table there.
You could use triggers for auto-insertion, when you insert something in Table1, but that will not be able to insert proper Shared_Name - trigger just won't know it.
So you have two options i guess.
1) make T-SQL code block - look for Stored Procedures. I think there also is a construct to execute anonymous code block in MS SQ, like EXECUTE BLOCK command in Firebird SQL Server, but i don't know it for sure.
2) create updatable SQL VIEW, joining Table1 and Table2 to show last most current date, so that when you insert a row in this view the view's on-insert trigger would actually insert rows to both tables. And when you would update the data in the view, the on-update trigger would modify the data.

Populating a table with fields from two other tables

I have two tables in Filemaker:
tableA (which includes fields idA (e.g. a123), date, price) and
tableB (which includes fields idB (e.g. b123), date, price).
How can I create a new table, tableC, with field id, populated with both idA and idB, (with the other fields being used for calculations on the combined data of both tables)?
The only way is to script it (for repeating uses) or do it 'manually', if this is an ad-hoc process. Details depend on the situation, so please clarify.
Update: Sorry, I actually forgot about the question. I assume the ID fields do not overlap even across tables and you do not need to add the same record more than once, but update it instead. In such a case the simplest script would be like that:
Set Variable[ $self, Get( FileName ) ]
Import Records[ $self, Table A -> Table C, sync on ID, update and add new ]
Import Records[ $self, Table B -> Table C, sync on ID, update and add new ]
The Import Records step is managed via rather elaborate dialog, but the idea is that you import from the same file (you can just type file:<YourFileName> there), the format is FileMaker Pro, and then set the field mapping. Make sure to choose the Update matching records and Add remaining records options and select the ID fields as key files to sync by.
It would be a FileMaker script. It could be run as a script trigger, but then it's not going to be seamless to the user. Your best bet is to create the tables, then just run the script as needed (manually) to build Table C. If you have FileMaker Server, you could schedule this script to be run periodically to keep Table C up-to-date.
Maybe you can use the select into statement.
I'm unsure if you wish to use calculated field from TableA and TableB or if your intension was to only calculate fields from the same table?
If tableA.IdA exists also in tableB.IdA, you could join the two tables and select into.
Else, you run the statement once for each table.
Select into statement
Select tableA.IdA, tableA.field1A, tableA.field2A, tableA.field1A * tableB.field2A
into New_Table from tableA
Edit: missed the part where you mentioned FileMaker.
But maybe you could script this on the db and just drop the table.

How to optimize oracle query for repeated full table sorts?

I have a database infrastructure where we are regularly (at least once a day) replicating the full content of tables from a source database to approximately 20 target databases. Due to the replication code in use (we have to use regular oracle queries, no control or direct access to source database) - this results in 20 full-table sorts of the source table.
Is there any way to optimize for this in the query? I'm looking for something that would basically tell oracle "I'm going to be repeatedly sorting this entire table"? MySQL had an option with myisamchk where you could tell it to sort a table and keep it in sorted order, but obviously that wouldn't apply here for multiple reasons.
Currently, there are also some intermediate tables involved (sync from A to B, then from B to C.) We do have control over the intermediate tables, so if there are tuning options there, that would be useful as well.
Generally, the queries are almost all of the very simplistic form:
select a, b, c, d, e, ... z from tbl1 order by a, b, c, d, e, ... z;
I'm aware of streams, but as described above, the primary source tables are outside of our control, so we won't be able to use streams there. (Additionally, those source tables are rebuilt completely from a snapshot daily, so streams wouldn't really work anyway.)
you could look into the multi-table INSERT feature. It should perform a single FULL SCAN and will insert into multiple tables. Consider (10gR2):
SQL> CREATE TABLE t1 (ID NUMBER);
Table created
SQL> CREATE TABLE t2 (ID NUMBER);
Table created
SQL> INSERT ALL
2 INTO t1 VALUES (d_id)
3 INTO t2 VALUES (d_id)
4 /* your select goes here */
5 SELECT ROWNUM d_id FROM dual d CONNECT BY LEVEL <= 5;
10 rows inserted
SQL> SELECT COUNT(*) FROM t1;
COUNT(*)
----------
5
SQL> SELECT COUNT(*) FROM t2;
COUNT(*)
----------
5
You will have to check if it works over database links.
Some things that would help the sorting issue is to have indexes on the columns that you are sorting on (and also joining the tables on, if they're not there already). You could also create materialized views which are already sorted, and Oracle would keep the sorted results cached.
You don't say exactly how the replication is done or the data volumes involved (or why you are sorting the data).
If the aim is to minimise the impact on the source database, your best bet may be to extract into an intermediate file and load the file into the destination databases. The sort could be done on the intermediate file (if plain text), or as part of either the export or import into the destination databases.
In source database :
create table export_emp_info
organization external
( type oracle_datapump
default directory DATA_PUMP_DIR
location ('emp.dmp')
) as select emp_id, emp_name, dept_id from emp order by dept_id
/
Copy file then, import in dest database:
create table import_emp_info
(EMP_ID NUMBER(12),
EMP_NAME VARCHAR2(100),
DEPT_ID NUMBER)
organization external
( type oracle_datapump
default directory DATA_PUMP_DIR
location ('emp.dmp')
)
/
insert into emp_info select * from import_emp_info;
If you don't want or can't have the external table on the source db, you can use a straight expdp of the emp table (possibly using NETWORK_LINK if you have limited access to the source database directory structure) and QUERY to do the ordering.
You could load data from source table A to an intermediate table B and then do a partition exchange between B and destination table C. Exact replication, no sorting involved.
This I/U/D form of replication is what the MERGE command is there for. It's very doubtful that an expensive sort-merge would be required, and I'd expect to see hash joins instead. As long as the hash table can be stored in memory the hash join is barely more expensive than scanning the tables.
A handy optimisation is to store a hash value based on the non-key attributes, so that you can join between source and target tables on the key column(s) and compare small hash values instead of the full set of columns - change detection made easy.

Resources