Using SQL Server Export Wizard in batches - sql-server

I'm using SQL Server Export Wizard to migrate 2 million rows over to a Postgres database. After 10 hours, I got to 1.5 million records and it quit. Argh.
So I'm thinking the safest way to get this done is to do it in batches. 100k rows at a time. But how do you do that?
Conceptually, I'm thinking:
SELECT * FROM invoices WHERE RowNum BETWEEN 300001 AND 400000
But RowNum doesn't exist, right? Do I need to create a new column and somehow get a +1 incremental ID in there that I can use in a statement like this? There is no primary key and there are no columns with unique values.
Thanks!

The rows are invoices, so I created a new variable 'Quartile' that divides the invoice dollar values into quartiles using:
SELECT *,
NTILE(4) OVER(ORDER BY TOTAL_USD) AS QUARTILE
INTO invoices2
FROM invoices
This created four groups of 500k rows each. Then in the export wizard, I asked to:
SELECT * FROM invoices2 WHERE QUARTILE = 1 -- (or 2, 3, 4 etc)
And I'm going to send each group of 500k rows to its own Postgres table and then merge them in pgAdmin. That way, if any one batch crashes, I can just redo that smaller grouping without affecting the integrity of the others. Does that make sense? Maybe it would have been just as easy to create an incrementing primary key?
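For comparison, the incrementing-ID route mentioned above could be sketched like this (invoices_numbered is a hypothetical staging table; ORDER BY (SELECT NULL) just asks for some arbitrary row order, which is fine for a one-off export because the numbers are frozen once materialized):

```sql
-- Materialize a copy of the table with a sequential RowNum column
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum
INTO invoices_numbered
FROM invoices;

-- Then export fixed-size batches, e.g. the fourth 100k:
SELECT * FROM invoices_numbered
WHERE RowNum BETWEEN 300001 AND 400000;
```

Since the numbering lives in the copied table, every batch query sees the same RowNum values, so the batches can't overlap or miss rows.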
Update:
All four batches transferred successfully. Worth noting that the total transfer time was 4x faster when sending the 2M rows as four simultaneous batches of 500k--4 hours instead of 16! Combined them back into a single table using the following query in pgAdmin:
--Combine tables back into one, total row count matches original
SELECT * INTO invoices_all FROM (
SELECT * FROM quar1
UNION ALL
SELECT * FROM quar2
UNION ALL
SELECT * FROM quar3
UNION ALL
SELECT * FROM quar4
) AS tmp
And checked the sums of all columns that had to be converted from SQL Server "money" to Postgres "numeric":
--All numeric sums match those from original
SELECT SUM("TOTAL_BEFORE_TP_TAX")
,SUM("TP_SELLER")
,SUM("TOTAL_BEFORE_TAX")
,SUM("TAX_TOTAL")
,SUM("TOTAL")
,SUM("TOTAL_BEFORE_TP_TAX_USD")
,SUM("TP_SELLER_USD")
,SUM("TOTAL_BEFORE_TAX_USD")
,SUM("TAX_TOTAL_USD")
,SUM("TOTAL_USD")
FROM public.invoices_all

Related

Save result of select statement into wide table SQL Server

I have read about the possibility of creating wide tables (30,000 columns) in SQL Server.
But how do I actually save the result of a select statement (one that has 1024+ columns) into a wide table?
Because if I do:
Select *
Into wide_table
From (
**Select statement with 1024+ columns**
) b
I get: CREATE TABLE failed because column 'c157' in table 'wide_table' exceeds the maximum of 1024 columns.
And, will I be able to query that table and all its columns in a regular manner?
Thank you for your help!
You are right that you are allowed to create a table with 30,000 columns, but you can SELECT or INSERT 'only' 4,096 columns in one statement.
So in the case of SELECT you would need to fetch the columns in parts and concatenate the results. None of that seems practical or performance-efficient.
If you are going to have so many columns, it may be better to UNPIVOT the data and normalize it further.
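A minimal UNPIVOT sketch of that idea, assuming a hypothetical wide_table with an id key and measure columns c1, c2, c3 that share one data type (UNPIVOT requires the unpivoted columns to have compatible types):

```sql
-- Hypothetical names: wide_table(id, c1, c2, c3), all cN of the same type
SELECT id, col_name, col_value
FROM wide_table
UNPIVOT (
    col_value FOR col_name IN (c1, c2, c3)
) AS u;
```

Each source row becomes up to three rows of (id, column name, value), which keeps the result narrow no matter how many measure columns the source grows.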

SQL query runs into a timeout on a sparse dataset

For sync purposes, I am trying to get a subset of the existing objects in a table.
The table has two fields, [Group] and Member, which are both stringified Guids.
All rows together may be too large to fit into a DataTable; I already encountered an OutOfMemory exception. But I have to check that everything I need right now is in the DataTable. So I take the Guids I want to check (they come in chunks of 1,000) and query only for the related objects.
So, instead of filling my datatable once with all
SELECT * FROM Group_Membership
I am running the following SQL query against my SQL database to get related objects for one thousand Guids at a time:
SELECT *
FROM Group_Membership
WHERE
[Group] IN (@Guid0, @Guid1, @Guid2, @Guid3, @Guid4, @Guid5, ..., @Guid999)
The table in question now contains a total of 142 entries, and the query already times out (CommandTimeout = 30 seconds). On other tables, which are not as sparsely populated, similar queries don't time out.
Could someone shed some light on the logic of SQL Server and whether/how I could hint it into the right direction?
I already tried to add a nonclustered index on the column Group, but it didn't help.
I'm not sure that WHERE IN can make full use of an index on [Group], if it can use it at all. However, if you had a second table containing the GUID values, and that column had an index, then a join might perform very fast.
Create a temporary table for the GUIDs and populate it:
CREATE TABLE #Guids (
Guid varchar(255)
)
INSERT INTO #Guids (Guid)
VALUES
(@Guid0), (@Guid1), (@Guid2), (@Guid3), (@Guid4), ...
CREATE INDEX Idx_Guid ON #Guids (Guid);
Now try rephrasing your current query using a join instead of a WHERE IN (...):
SELECT *
FROM Group_Membership t1
INNER JOIN #Guids t2
ON t1.[Group] = t2.Guid;
As a disclaimer, if this doesn't improve the performance, it could be because your table has low cardinality. In such a case, an index might not be very effective.

sql select performance (Microsoft)

I have a very big table with many rows (50 million) and more than 500 columns. Its indexes are on period and client. For one period I need to keep the client and another column (not an index). It takes too much time, so I'm trying to understand why:
If I do:
select count(*)
from table
where cd_periodo=201602
It takes less than 1 sec and returns the number 2 million.
If I select the period into a temp table, it also takes almost no time (2 secs):
select cd_periodo
into #table
from table
where cd_periodo=201602
But if I select another column that is not part of an index, it takes more than 3 minutes:
select not_index_column
into #table
from table
where cd_periodo=201602
Why is this happening? I'm not doing any filter on the column.
When you select an indexed column, the reader doesn't have to process the entire table and read each whole row: the index already holds the value, so it can be returned without actually fetching the row.
When you select a non-indexed column, the opposite happens: the reader has to read the whole table in order to get that column's value.
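One way to get the fast index-only behavior for the extra column is a covering index. A sketch, assuming the table and column names from the question (the index name is made up; INCLUDE stores the column in the index leaf pages without making it part of the key):

```sql
-- Hypothetical index name; cd_periodo is the filter column,
-- not_index_column is carried along in the leaf pages only
CREATE NONCLUSTERED INDEX IX_table_cd_periodo_covering
ON dbo.[table] (cd_periodo)
INCLUDE (not_index_column);
```

With this in place, the SELECT ... WHERE cd_periodo = 201602 can be answered entirely from the index, at the cost of extra storage and slower writes.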

How to SELECT LIMIT in ASE 12.5? LIMIT 10, 10 gives syntax error?

How can I LIMIT the result returned by a query in Adaptive Server IQ/12.5.0/0306?
The following gives me a generic error near LIMIT:
SELECT * FROM mytable LIMIT 10, 10;
Any idea why? This is my first time with this DBMS.
Sybase IQ uses the row_count option to limit the number of rows returned.
You will have to set the option at the start of your session and decide whether it should be TEMPORARY (connection-scoped) or permanent. See the ROW_COUNT and SET OPTION documentation.
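A sketch of that approach, assuming the IQ option syntax (TEMPORARY scopes the setting to the current connection; verify the exact spelling against your IQ version's docs):

```sql
SET TEMPORARY OPTION ROW_COUNT = 10;   -- cap result sets at 10 rows
SELECT * FROM mytable ORDER BY Id;
SET TEMPORARY OPTION ROW_COUNT = 0;    -- 0 removes the cap again
```

Note this caps every result set on the connection until it is reset, so it limits rows but does not by itself give you the "skip the first 10" half of LIMIT 10, 10.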
The LIMIT statement is not supported in Sybase IQ 12. I don't think there is a simple or clean solution, much like in old SQL Server versions. But there are some approaches that work for SQL Server 2000 and should also work for Sybase IQ 12. I can't promise the queries below will work on copy & paste.
Subquery
SELECT TOP 10 *
FROM mytable
WHERE Id NOT IN (
SELECT TOP 10 Id FROM mytable ORDER BY OrderingColumn
)
ORDER BY OrderingColumn
Basically, it fetches 10 rows but also skips the first 10 rows. For this to work, rows must be unique and the ordering matters: an Id must not appear more than once in the results, otherwise you can filter out valid rows.
Asc-Desc
Another workaround depends on ordering. It uses the ordering to fetch 10 rows for the second page, but you have to take special care with the last page (the simple formula page * rows-per-page does not work properly there).
SELECT *
FROM
(
SELECT TOP 10 *
FROM
(
SELECT TOP 20 * -- (page * rows per page)
FROM mytable
ORDER BY Id
) AS t1
ORDER BY Id DESC
) AS t2
ORDER BY Id ASC
I've found some reports of subqueries in the FROM clause not working in ASE 12, so this approach may not be possible.
Basic iteration
In this scenario you simply iterate through the rows. Let's assume the Id of the tenth row is 15; then the query selects the next 10 rows after the tenth row. This only works when you order by Id: ordering by any other column breaks it.
SELECT TOP 10 *
FROM mytable
WHERE Id > 15
ORDER BY Id
Here is an article about other workarounds in SQL Server 2000. Some should also work in similar ways in Sybase IQ 12:
http://www.codeproject.com/Articles/6936/Paging-of-Large-Resultsets-in-ASP-NET
All of these are workarounds. If you can, try to migrate to a newer version.

How to Select row range wise in SQL Server

I have a very large database in SQL Server (around 20 million rows). I have to take a backup of that database in .csv format, but .csv supports only 1 million rows in a single file, so I can't back up the whole database at once; I have to break it into 20 parts.
For that I have to select rows 1 to 1 million, then 1 million to 2 million, and so on.
I can take the first 1 million using a SELECT TOP query, but I'm not able to retrieve rows 1 million to 2 million and beyond.
So please help me do this. What tricks should I use, or what should I do, to take the backup in .csv files?
Try the following:
select tab.col1, tab.col2, ..., tab.coln
from (
select a.*, row_number() over(order by Id) as rn
from table as a) as tab
where tab.rn between limit1 and limit2
order by Id
Use the LIMIT clause (note that LIMIT is MySQL syntax; SQL Server itself does not support it):
http://dev.mysql.com/doc/refman/5.0/en/select.html
I.e.:
/* rows 0-1 million */
SELECT * FROM tbl LIMIT 0,1000000;
/* rows 1-2 million */
SELECT * FROM tbl LIMIT 1000000,1000000;
/* etc (these might be off by 1) */
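On SQL Server 2012 and later, OFFSET ... FETCH does this natively (it needs an ORDER BY on a deterministic column for stable pages); a sketch, assuming the table has an Id column to order by:

```sql
/* rows 1,000,001 - 2,000,000 */
SELECT *
FROM tbl
ORDER BY Id
OFFSET 1000000 ROWS
FETCH NEXT 1000000 ROWS ONLY;
```

Repeating this with the OFFSET stepped by 1,000,000 gives each of the 20 export slices without overlap.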
