efficiently move data between schemas of the same database in postgresql

How can I most efficiently move data between similar tables (same number of columns and the same data types; if they are not the same, I hope it can be handled with a view) in different schemas of the same PostgreSQL database?
EDIT
Sorry for the vagueness. I intend to use the additional schemas as archives for data that is not often needed (to improve performance). To be more precise, data older than 2 years is to be archived. It is okay to take the server offline, but for no more than a day, at most two. It is accounting software for a medium-sized company. By liberal estimates the number of records in a year won't go near a million.

insert into target_schema.table_one (col1, col2, col3)
select col1, col2, col3
from source_schema.other_table
where <some condition to select the data to be moved>;
If you really want to "move" the data (i.e. delete the rows from the source table), you can use truncate on the source table once everything has been copied. If the table is the target of a foreign key you cannot use truncate; in that case you need to use:
delete from source_schema.other_table
where <some condition to select the data to be moved>;
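When truncate is applicable (no referencing foreign keys and all rows were copied), the statement is simply:

truncate table source_schema.other_table;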
You can combine both steps into a single statement, if you want to:
with deleted_data as (
delete from source_schema.other_table
where <some condition to select the data to be moved>
returning *
)
insert into target_schema.table_one (col1, col2, col3)
select col1, col2, col3
from deleted_data;
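Applied to the archiving scenario from the question, a minimal sketch could look like this (the invoices table, its invoice_date column and the archive schema are hypothetical; select * assumes both tables have matching column lists):

with moved_rows as (
    delete from public.invoices
    where invoice_date < current_date - interval '2 years'
    returning *
)
insert into archive.invoices
select * from moved_rows;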

Related

Indexing a single-use temporary table

A colleague works in a business which uses Microsoft SQL Server. Their team creates stored procedures that are executed daily to create data extracts. The underlying tables are huge (some have billions of rows), so most stored procedures are designed such that first they extract only the relevant rows of these huge tables into temporary tables, and then the temp tables are joined with each other and with other smaller tables to create a final extract. Something similar to this:
SELECT COL1, COL2, COL3
INTO #TABLE1
FROM HUGETABLE1
WHERE COL4 IN ('foo', 'bar');
SELECT COL1, COL102, COL103
INTO #TABLE2
FROM HUGETABLE2
WHERE COL14 = 'blah';
SELECT COL1, COL103, COL306
FROM #TABLE1 AS T1
JOIN #TABLE2 AS T2
ON T1.COL1 = T2.COL1
LEFT JOIN SMALLTABLE AS ST
ON T1.COL3 = ST.COL3
ORDER BY T1.COL1;
Generally, the temporary tables are not modified after their creation (so no subsequent ALTER, UPDATE or INSERT operations). For the purpose of this discussion, let's assume the temporary tables are only used once later on (so only one SELECT query would rely on them).
Here is the question: is it a good idea to index these temporary tables after they are created and before they are used in the subsequent query?
My colleague believes that creating an index will make the join and the sort operations faster. I believe, however, that the total time will be larger, because index creation takes time. In other words, I assume that except for edge cases (like a temporary table which itself is extremely large, or the final SELECT query is very complex), SQL Server will use the statistics it has on the temporary tables to optimize the final query, and in doing so it will effectively index the temp tables as it sees fit.
In other words, I tend to think that creating an index is only useful if you know the table will be used often; a single-use temporary table that is dropped once the stored procedure is complete is not worth indexing.
Neither of us knows enough about the SQL Server optimizer to know in what ways we are right or wrong. Can you please help us better understand which of our assumptions are closer to the truth?
Your colleague is probably correct: even if a table is going to be used in a single query, without seeing that query (and even if we do see it, we still don't have a great idea of what its execution plan looks like) we have no idea how many times SQL Server will need to find data within the various columns of each of those tables for joins, sorts, etc.
However, we'll never know for sure until it's actually done both ways and the results measured and compared.
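A minimal way to compare both variants, using the temp-table and column names from the question above (the clustered-index choice on the join column is an assumption; compare elapsed time and logical reads for each run):

-- Measure per-statement timing and I/O for each variant
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Variant 1: run the final SELECT from the question against the unindexed temp tables (heaps)

-- Variant 2: index the join column after loading, then run the same final SELECT again
CREATE CLUSTERED INDEX IX_TABLE1_COL1 ON #TABLE1 (COL1);
CREATE CLUSTERED INDEX IX_TABLE2_COL1 ON #TABLE2 (COL1);

Variant 2's total should include the index-creation time itself, which is exactly the trade-off being debated.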
If you are doing daily data extracts with billions of rows, I would recommend you use staging tables instead of temporary tables. This will isolate your extracts from other workloads using tempdb.
Here is the question: is it a good idea to index these temporary tables after they are created and before they are used in the subsequent query?
Create the index after loading the data into the temp table. This will eliminate fragmentation, and statistics will be created.
The optimizer will use statistics to generate the optimal plan, so if you don't have statistics it can dramatically affect your query performance, especially for large datasets.
The example below shows a before-and-after comparison of when the index is created on a temp table:
/* Create index after data load into temp table -- stats is created */
CREATE TABLE #temp ( [text] varchar(50), [num] int);
INSERT INTO #temp([text], [num]) VALUES ('aaa', 1), ('bbb', 2) , ('ccc',3);
CREATE UNIQUE CLUSTERED INDEX [IX_num] ON #temp (num);
DBCC SHOW_STATISTICS ('tempdb..#temp', 'IX_num');
/* Create index before data load into temp table -- stats is not created */
CREATE TABLE #temp_nostats ( [text] varchar(50), [num] int);
CREATE UNIQUE CLUSTERED INDEX [IX_num] ON #temp_nostats (num);
INSERT INTO #temp_nostats([text], [num]) VALUES ('aaa', 1), ('bbb', 2) , ('ccc',3);
DBCC SHOW_STATISTICS ('tempdb..#temp_nostats', 'IX_num');
You need to test whether the index will help you or not. You also need to balance how many indexes you have, because too many indexes can hurt your performance as well.

MS Access VBA/SQL: Import CSV and Compare to Existing Records

I need to import sales data from an external source into an Access database. The system that generates the sales reports allows me to export data within a specified date range, but that data may change due to updates, late reported data, etc. I want to loop through each line of the CSV and see if that row already exists. If it does, ignore it; if it doesn't add a new record to the sales table.
Unless I'm misunderstanding it, I don't believe I can use DoCmd.TransferText as the data structure does not match the table I'm importing it to - I am only looking at importing several of the columns in the file.
What is my best option to (1) access the data within my file to loop through, and (2) compare the contents of a given row against a given table to see if it already exists?
Consider directly querying the csv file with Access SQL, selecting the needed columns, and running one of the NOT IN / NOT EXISTS / LEFT JOIN ... IS NULL query forms to avoid duplicates:
INSERT INTO [myTable] (Col1, Col2, Col3)
SELECT t.Col1, t.Col2, t.Col3
FROM [text;HDR=Yes;FMT=Delimited(,);Database=C:\Path\To\Folder].myFile.csv t
WHERE NOT EXISTS
(SELECT 1 FROM [myTable] m
WHERE t.Col1 = m.Col1); -- ADD COMPARISON FIELD(S) IN WHERE CLAUSE
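The LEFT JOIN ... IS NULL form mentioned above is an equivalent alternative (same hypothetical file path and comparison field as in the query above):

INSERT INTO [myTable] (Col1, Col2, Col3)
SELECT t.Col1, t.Col2, t.Col3
FROM [text;HDR=Yes;FMT=Delimited(,);Database=C:\Path\To\Folder].myFile.csv t
LEFT JOIN [myTable] m ON t.Col1 = m.Col1
WHERE m.Col1 IS NULL; -- keep only rows that do not already exist in myTable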

what is the best practice for creating reports in SSRS that break down the same data all possible ways?

I need to create an SSRS report where users need to look at the data in many different ways (grouping, comparing, etc.). Using Visual Studio 2010. SQL Server 2012.
I am using one stored procedure for that, dumping data into a CTE and doing all the grouping from there.
My question is:
Is there any way I can dump data from one stored procedure into a #TempTable, then from that #TempTable into multiple views (grouped the way I need), and then have SSRS query that data from the different views (using the views as datasets)?
Something like that:
CREATE PROCEDURE ProcName
(
#DateFrom datetime,
#DateTo datetime
)
AS
BEGIN
CREATE TABLE #TempTable (...)
INSERT INTO #TempTable
SELECT Col1,
Col2,
Col3,
col4 ...
FROM Table1 INNER JOIN Table2
CREATE VIEW MyView1
AS
SELECT Col1,
Col2
FROM #TempTable
GROUP BY Col1
CREATE VIEW MyView2
AS
SELECT Col3,
Col4
FROM #TempTable
GROUP BY Col3
CREATE VIEW MyView3
AS
SELECT Col1,
Col4
FROM #TempTable
GROUP BY Col2
END -- End of SP
GO
And what would be the BEST way in terms of performance to create reports where users need to break down the data all possible ways?
Right now I am trying to do every calculation in SSMS, then bring this data into SSRS. Performance is good, but sometimes I have up to 100 columns. It gets messy.
First of all, SSRS is not a very dynamic environment. It's difficult to predict every way a user will want to break out the data. So my suggestion would be to offer a PowerPivot of the data in Excel as an alternative. That way, they can easily add and remove row/column groupings and see the subtotals. The slicers are also very handy.
More specifically to your question, I think in general the best practice would be to let the stored procedure just return all the raw data. SSRS is very good at grouping and sorting the data. That puts less strain on the database server at the cost of slightly more processing time in the report. Of course, if you reach a point where there's just too much to process at run-time, you'll have to re-balance that workload or get more resources allocated.
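A minimal sketch of that approach, reusing the table and parameter names from the question (the procedure name, join key and date column are assumptions): the procedure returns ungrouped rows and each SSRS dataset/tablix does its own grouping.

CREATE PROCEDURE dbo.GetReportData
    @DateFrom datetime,
    @DateTo   datetime
AS
BEGIN
    SET NOCOUNT ON;

    -- Return the raw, ungrouped rows; SSRS row/column groups handle the rest
    SELECT t1.Col1,
           t1.Col2,
           t2.Col3,
           t2.Col4
    FROM Table1 AS t1
    INNER JOIN Table2 AS t2
        ON t2.Col1 = t1.Col1          -- assumed join key
    WHERE t1.SomeDate >= @DateFrom    -- assumed date column
      AND t1.SomeDate <  @DateTo;
END
GO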
Sometimes you have to get creative. For example, I have a report with almost 300 columns and over 100K rows. Normally it would take over 10 minutes to run or to export. So I turned on caching and scheduled a subscription to export it as a .csv each day. That way the data is available to open in Excel and the report is quick to re-run throughout the day.

MS SQL query performance - in vs table variable

I have some list of strings (the number of strings varies from 10 to 100) and I need to select values from some large table (100K-5M records) as efficiently as I can.
It seems to me that I basically have 3 options: an 'in' clause, a table variable, or a temp table.
Something like this:
select col1, col2, col3, name from my_large_table
where index_field1 = 'xxx'
and index_field2 = 'yyy'
and name in ('name1', 'name2', 'name3', ... 'nameX')
or
declare @tbl table (name nvarchar(50))
insert @tbl(name) values ('name1'), ('name2'), ('name3'), ... ('nameX')
select col1, col2, col3, name
from my_large_table inner join @tbl as tbl on (tbl.name = my_large_table.name)
where index_field1 = 'xxx'
and index_field2 = 'yyy'
The large table has clustered index on (index_field1, index_field2, name, index_field3).
Actually for each set of names I have 4-5 queries from the large table:
select, then update and/or insert and/or delete according to some logic - each time constraining the query on this set of names.
The name set and the queries are built dynamically in the .net client, so there are no concerns about readability, code simplicity or the like. The only goal is to reach the best performance, since this batch will be executed many times.
So the question is - should I use 'in' clause, table variable or something else to write my condition?
As already mentioned, you should avoid using table variables for pretty large data sets, as they do not allow indexes.
If I got it correctly, you have multiple queries using the same set of names, so I would suggest the following approach:
1) create a persistent table (BufferTable) to hold the word list: PkId, SessionId, Word.
2) for each session using some set of words, bulk insert your words into it (SessionId is unique to each batch of queries). This should be very fast for tens to hundreds of words.
3) write your queries like the one below:
select col1, col2, col3, name
from my_large_table LT
join BufferTable B ON B.SessionId = @SessionId AND LT.name = B.Word
where LT.index_field1 = 'xxx'
and LT.index_field2 = 'yyy'
4) An index on (SessionId, Word) is required for best performance.
This way, you do not have to push the words for every query.
BufferTable is best emptied periodically, as deletes are expensive (truncating it when nobody is doing anything on it is an option).
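A minimal sketch of the BufferTable and the index described above (the column types are assumptions):

-- One row per word per query session
CREATE TABLE dbo.BufferTable (
    PkId      int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SessionId uniqueidentifier  NOT NULL,
    Word      nvarchar(50)      NOT NULL
);

-- Lets each batch's words be found with a seek on (SessionId, Word)
CREATE INDEX IX_BufferTable_SessionId_Word
    ON dbo.BufferTable (SessionId, Word);

Each batch inserts its names under a fresh SessionId, runs its 4-5 queries joined to BufferTable on that SessionId, and removes its rows afterwards.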

improving views performance in sql server 2012

I have one view vwBalance which returns more than 150,000,000 rows; below is the code:
SELECT *
FROM dbo.balance INNER JOIN
dbo.NomBalance ON dbo.balance.IdNomBil = dbo.NomBalance.Id
But I want to transpose the values returned, so I use PIVOT like this:
SELECT An, cui, caen, col1, col2, ... col100
FROM (SELECT cui, valoare, cod_campbil, caen, An
      FROM vwBilant WITH (NOLOCK)) p
PIVOT (MAX(valoare) FOR cod_campbil IN (col1, col2, ... col100)) AS pvt
The questions are:
Should I use a query hint inside the view vwBalance? Could such a hint improve, or could it block, the transpose operation?
Is it a problem if I use the NOLOCK hint instead of other query hints?
Are there better ways to improve transposing so many columns?
Thanks!
I can give the following advice:
you can use the READPAST hint if it does not break your business logic
you can create a clustered index on this view. That materializes the view, but the performance of data-changing operations will decrease for all tables used in the view.
also, you should check indexes for fields that you use in join and where clauses.
and you can use preprocessing: insert these values into another table (for example at night). In that case you can use a columnstore index or just page compression for that table, and you can also use page compression for all tables that are used in this view.
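A minimal sketch of the preprocessing idea, reusing the names from the question (the dbo.BalancePivoted target table is hypothetical; page compression is shown because on SQL Server 2012 a nonclustered columnstore index would make the table read-only):

-- Nightly job: materialize the pivoted result once and report off this table
SELECT An, cui, caen, col1, col2 /* ... col100 */
INTO dbo.BalancePivoted
FROM (SELECT cui, valoare, cod_campbil, caen, An
      FROM vwBilant) p
PIVOT (MAX(valoare) FOR cod_campbil IN (col1, col2 /* ... col100 */)) AS pvt;

-- Compress the preprocessed table to reduce I/O for the reporting queries
ALTER TABLE dbo.BalancePivoted REBUILD WITH (DATA_COMPRESSION = PAGE);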
