I have a very large database in SQL Server (around 20 million rows). I have to take a backup of that database in .csv format, but .csv supports only 1 million rows in a single file, so I cannot back up the whole database into one file. I have to break the database into 20 parts.
For that I have to select rows 1 to 1 million, then 1 million to 2 million, and so on.
I can get the first 1 million with a SELECT TOP query, but I am not able to retrieve rows 1 million to 2 million and beyond.
Please help me: what tricks should I use, or what should I do, to take the backup as .csv files?
Try the following (limit1 and limit2 are your range bounds):
select tab.col1, tab.col2, ..., tab.coln
from (
    -- number every row once, ordered by Id
    select a.*, row_number() over (order by Id) as rn
    from yourtable as a
) as tab
where tab.rn between limit1 and limit2
order by Id
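For example, to pull the second million rows for the second .csv file (a sketch assuming an integer Id column; yourtable is a placeholder, and in practice you would list explicit columns instead of * to keep the helper rn column out of the export):
select tab.*
from (
    select a.*, row_number() over (order by Id) as rn
    from yourtable as a
) as tab
where tab.rn between 1000001 and 2000000
order by tab.Id
Each ranged SELECT can then be written to its own .csv, for example with the bcp utility's queryout mode or the Import/Export Wizard.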
Use the LIMIT clause:
http://dev.mysql.com/doc/refman/5.0/en/select.html
I.e.:
/* rows 0-1 million */
SELECT * FROM tbl LIMIT 0,1000000;
/* rows 1-2 million */
SELECT * FROM tbl LIMIT 1000000,1000000;
/* etc (these might be off by 1) */
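Note that LIMIT is MySQL syntax; the question is about SQL Server, where the closest equivalent (on SQL Server 2012 and later, as covered further down) is OFFSET/FETCH. A sketch, assuming an id column to sort on, since OFFSET/FETCH requires an ORDER BY:
/* rows 1-2 million, SQL Server 2012+ */
SELECT * FROM tbl
ORDER BY id
OFFSET 1000000 ROWS FETCH NEXT 1000000 ROWS ONLY;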
I'm using SQL Server Export Wizard to migrate 2 million rows over to a Postgres database. After 10 hours, I got to 1.5 million records and it quit. Argh.
So I'm thinking the safest way to get this done is to do it in batches. 100k rows at a time. But how do you do that?
Conceptually, I'm thinking:
SELECT * FROM invoices WHERE RowNum BETWEEN 300001 AND 400000
But RowNum doesn't exist, right? Do I need to create a new column and somehow get a +1 incremental ID in there that I can use in a statement like this? There is no primary key and there are no columns with unique values.
Thanks!
The rows are invoices, so I created a new variable 'Quartile' that divides the invoice dollar values into quartiles using:
SELECT *,
NTILE(4) OVER(ORDER BY TOTAL_USD) AS QUARTILE
INTO invoices2
FROM invoices
This created four groups of 500k rows each. Then in the export wizard, I asked to:
SELECT * FROM invoices2 WHERE QUARTILE = 1 -- (or 2, 3, 4 etc)
And I'm going to send each group of 500k rows to its own Postgres table and then merge them over on pgAdmin. That way, if any one crashes, I can just do that smaller grouping over again without affecting the integrity of the others. Does that make sense? Maybe it would have been just as easy to create an incrementing primary key?
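If the incrementing-key route sounds easier, a minimal sketch (assuming SQL Server on the source side; RowNum is a made-up column name):
-- Add an auto-incrementing column; the numbering order is arbitrary,
-- which is fine here since no unique ordering exists anyway
ALTER TABLE invoices ADD RowNum INT IDENTITY(1,1);

-- Then batch on fixed ranges of it, e.g.:
SELECT * FROM invoices WHERE RowNum BETWEEN 300001 AND 400000;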
Update:
All four batches transferred successfully. Worth noting that the total transfer time was 4x faster when sending the 2M rows as four simultaneous batches of 500k: 4 hours instead of 16! I combined them back into a single table using the following query in pgAdmin:
--Combine tables back into one; total row count matches the original
SELECT * INTO invoices_all FROM (
    SELECT * FROM quar1
    UNION ALL
    SELECT * FROM quar2
    UNION ALL
    SELECT * FROM quar3
    UNION ALL
    SELECT * FROM quar4
) AS tmp
And I checked the sums of all variables that had to be converted from SQL Server "money" to Postgres "numeric":
--All numeric sums match those from original
SELECT SUM("TOTAL_BEFORE_TP_TAX")
,SUM("TP_SELLER")
,SUM("TOTAL_BEFORE_TAX")
,SUM("TAX_TOTAL")
,SUM("TOTAL")
,SUM("TOTAL_BEFORE_TP_TAX_USD")
,SUM("TP_SELLER_USD")
,SUM("TOTAL_BEFORE_TAX_USD")
,SUM("TAX_TOTAL_USD")
,SUM("TOTAL_USD")
FROM PUBLIC.invoices_all
I want to check whether a user has produced more than 1000 log rows. Given the queries below, I let SQL Server Management Studio display the estimated execution plans.
select count(*) from tbl_logs where id_user = 3
select 1 from tbl_logs where id_user = 3 having count(1) > 1000
I thought the second one should be better because it can return as soon as SQL Server has found 1000 rows, whereas the first one has to compute the actual row count.
Also, when I profile the queries, they are equal in terms of Reads, CPU and Duration.
What would be the most efficient query for my task?
This query should also improve performance:
-- filter for the user in question; (select null) satisfies the required
-- ORDER BY without imposing any real ordering
select 1 from tbl_logs where id_user = 3
order by (select null) offset 1000 rows fetch next 1 rows only
You get a 1 when more than 1,000 rows exist, and an empty result set when they don't.
It only fetches the first 1,001 rows, as Alexander's answer does, but it has the advantage that it doesn't need to re-count the rows already fetched.
If you want the result to be exactly 1 or 0, then you could read it like this:
with Row_1001 as (
select 1 as Row_1001 from tbl_logs order by 1 offset (1000) rows fetch next (1) rows only
)
select count(*) as More_Than_1000_Rows_Exist from Row_1001
I think some performance improvement can be achieved this way:
select 1
from (
    select top 1001 1 as val
    from tbl_logs
    where id_user = 3
) cnt
having count(*) > 1000
In this example, the derived table fetches only the first 1001 rows (if that many exist) and the outer query performs a logical check on the count.
However, it will not reduce reads if tbl_logs is tiny: an index seek on such a small index has only a few pages to fetch anyway.
How can I LIMIT the result returned by a query in Adaptive Server IQ/12.5.0/0306?
The following gives me a generic error near LIMIT:
SELECT * FROM mytable LIMIT 10, 10;
Any idea why? This is my first time with this DBMS.
Sybase IQ uses the ROW_COUNT option to limit the number of rows returned.
You will have to set the row count at the beginning of your statement and decide whether the setting should be TEMPORARY (connection-local) or not.
See the documentation: ROW_COUNT and SET options.
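For example, something like this should work (a sketch based on the linked docs; mytable and OrderingColumn stand in for your table and sort column):
-- TEMPORARY keeps the setting local to this connection
SET TEMPORARY OPTION ROW_COUNT = 10;
SELECT * FROM mytable ORDER BY OrderingColumn;
-- 0 removes the limit again
SET TEMPORARY OPTION ROW_COUNT = 0;
Note that ROW_COUNT only caps the number of rows returned; it has no offset, so on its own it can replace LIMIT 10 but not LIMIT 10, 10.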
The LIMIT clause is not supported in Sybase IQ 12. I don't think there is a simple or clean solution, much like in old SQL Server versions. But there are some approaches that work for SQL Server 2000 and should also work for Sybase IQ 12. I can't promise the queries below will work with a simple copy & paste.
Subquery
SELECT TOP 10 *
FROM mytable
WHERE Id NOT IN (
    SELECT TOP 10 Id FROM mytable ORDER BY OrderingColumn
)
ORDER BY OrderingColumn
Basically, it fetches 10 rows while skipping the first 10. For this to work, Id values must be unique and the ordering matters: if an Id can appear more than once in the results, you may filter out valid rows.
Asc-Desc
Another workaround depends on ordering. It uses ordering to fetch the 10 rows for the second page, and you have to take special care of the last page (the simple formula page * rows-per-page does not work properly there).
SELECT *
FROM (
    SELECT TOP 10 *
    FROM (
        SELECT TOP 20 * -- (page * rows per page)
        FROM mytable
        ORDER BY Id
    ) AS t1
    ORDER BY Id DESC
) AS t2
ORDER BY Id ASC
I've found some information saying that subqueries in the FROM clause do not work in ASE 12, so this approach may not be possible.
Basic iteration
In this scenario you just iterate through the rows. Let's assume the Id of the tenth row is 15; the query then selects the next 10 rows after it. Bad things happen when you order by a column other than Id, so that is not possible with this approach.
SELECT TOP 10 *
FROM mytable
WHERE Id > 15
ORDER BY Id
Here is an article about other workarounds in SQL Server 2000. Some should also work in similar ways in Sybase IQ 12.
http://www.codeproject.com/Articles/6936/Paging-of-Large-Resultsets-in-ASP-NET
All of these are workarounds. If you can, try to migrate to a newer version.
I selected the first 50,000 rows with TOP 50000; now I want to get the next group of 50,000. I can't use ROW_NUMBER since the entries do not have an id. I'm using SQL Server 2012 with SQL Server Management Studio.
How can I get the entries that come after the first 50,000?
With SQL Server 2012, you get the new OFFSET and FETCH NEXT keywords on the ORDER BY clause.
So you can select the first 50'000 rows with:
SELECT (list of columns)
FROM dbo.YourTable
WHERE (some condition)
ORDER BY (some column)
OFFSET 0 ROWS FETCH NEXT 50000 ROWS ONLY
and then the next 50'000 later with:
SELECT (list of columns)
FROM dbo.YourTable
WHERE (some condition)
ORDER BY (some column)
OFFSET 50000 ROWS FETCH NEXT 50000 ROWS ONLY
(just a side-note: a page size of 50'000 rows seems overly large - how about 1'000 or something?)
See Using the new OFFSET and FETCH NEXT options for more details and background information.
If you have some way to ensure a consistent order of the results, I can't see why you can't use ROW_NUMBER with a common table expression like this:
;WITH CTE AS
(
    SELECT *, ROW_NUMBER() OVER (ORDER BY something_unique DESC) AS [Row]
    FROM my_table
)
-- To get the first 50k rows:
SELECT * FROM CTE WHERE [Row] BETWEEN 1 AND 50000
-- To get the next 50k rows:
--SELECT * FROM CTE WHERE [Row] BETWEEN 50001 AND 100000
It's not very efficient though, as you construct the row number over the whole table even though you only need 50k. If possible, you might want to consider adding an extra column to the source table to hold some artificial numbering; see the sketch below.
As noted in the comments, just using TOP X without ordering the results first will give inconsistent results, since order isn't guaranteed unless explicitly specified.
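One way to add that artificial numbering in SQL Server is SELECT ... INTO with the IDENTITY() function (a sketch; my_table_numbered and RowId are made-up names, and it assumes my_table has no existing IDENTITY column):
-- Copy the data once into a staging table with a generated row id;
-- without an ORDER BY the numbering order is arbitrary, which matches
-- a table that has no unique column to sort on anyway
SELECT IDENTITY(INT, 1, 1) AS RowId, *
INTO my_table_numbered
FROM my_table;

-- Then page on fixed ranges of RowId:
SELECT * FROM my_table_numbered WHERE RowId BETWEEN 50001 AND 100000;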
I have a system that needs to suck up an entire MS SQL database. Currently it does so with something like:
select top 1000 * from [table] where id > 0 order by id;
Then, for the next chunk:
select top 1000 * from [table] where id > 1000 order by id;
And then:
select top 1000 * from [table] where id > 2000 order by id;
And so forth.
In MySQL, I've learned that doing LIMIT and OFFSET queries is brutally slow because the database has to first sort the results, then scan over the OFFSET count. When that count gets big, life starts to suck as the read count skyrockets.
My question is this: does the same problem apply to TOP? Put another way, can I expect a really high read count when I run these queries on a database with, say, 10,000,000 records, at the point where id > 9,999,000? If so, are there any ways to handle this better?
It will be very fast if ID is indexed. If that column is not indexed, it will cause a full table scan.
I would suggest the following in addition:
select * from [table] where id > 0 and id <= 1000 order by id;
This way, even if some ids are missing, each batch covers a fixed, non-overlapping id range, so you don't get bugs (duplicates).
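To drive the whole export this way, a minimal T-SQL sketch (assuming an indexed integer id column; the @batch size and [table] name are placeholders):
DECLARE @start INT = 0;
DECLARE @batch INT = 1000;
DECLARE @max INT;

SELECT @max = MAX(id) FROM [table];

WHILE @start < @max
BEGIN
    -- Each pass covers a fixed, non-overlapping id range, so gaps in id
    -- or rows inserted between passes cannot produce duplicates
    SELECT * FROM [table]
    WHERE id > @start AND id <= @start + @batch
    ORDER BY id;

    SET @start = @start + @batch;
END
If ids are sparse, some batches will simply return fewer than @batch rows, which is harmless.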