I have a query that returns around 160 million rows and takes about 4 hours to import into Power BI.
Ever since my company moved its server to Azure, I can never import this query successfully. It starts loading, gets to roughly a million rows, and after a minute or two this error always pops up.
I tried:
changing the command timeout to 200 minutes; it still errors out after a minute or two of loading, sometimes within 10 seconds
selecting only the top 1,000 rows in the query, which completes without error; but when I switch back to the original query, it always fails
Attaching the error message. I have talked to the data engineers on my team and they don't seem to have a clue. Does anyone have any idea how to fix this?
We had the same problem and went for a delta lake in combination with option 1.
You have to ask yourself first why you are importing so much data; always keep your model as small as possible. I can't imagine you are looking at every row. If the detail is needed, you could use a combination of DirectQuery for the details and an imported aggregate for your reporting. But load aggregates instead of everything (a rough sketch follows after this list).
Maybe you can load less history, like only the last two years.
You could look into loading incrementally, one partition at a time.
You can try increasing the DTUs for your server.
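As a rough illustration of the aggregate idea above (table and column names here are made up), the import query could pre-aggregate on the server instead of pulling raw rows:

-- Hypothetical pre-aggregation: import day-level totals instead of 160M raw rows
select
    cast(OrderDate as date) as OrderDay,
    ProductId,
    sum(Amount) as TotalAmount,
    count(*) as RowCnt
from dbo.FactSales
group by cast(OrderDate as date), ProductId;

A detail table of that size often collapses into a few hundred thousand aggregate rows, and the raw detail can stay behind a DirectQuery table for drill-through.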
Related
I have a Power BI report pulling from SQL Server that needs to be set up for incremental refresh due to the large data pull. As the load is fairly complex (and PQuery editor is tedious and often breaks folding), I need to use a SQL query (aka a "native query" in PBI speak) while retaining query folding (so that incremental refresh works).
I've been using the nice...
Value.NativeQuery( Source, query, null, [EnableFolding = true])
... trick found here to get that working.
BUT it only seems to work if the native query finishes fairly quickly. When my WHERE clause only pulls data for this year, it's no problem. When I remove the date filter from the WHERE clause (so as not to conflict with the incremental refresh filter), or simply push the year farther back, it takes longer, which seemingly causes PBI to determine that:
"We cannot fold on top of this native query. Please modify the native query or remove the 'EnableFolding' option."
The error above comes up after a few minutes, both in the PQuery editor and if I try to "bypass" it by quickly clicking Close & Apply. And unfortunately, the underlying SQL is probably about as good as it gets, due to our not-so-great data structures. I've tried tricking PBI's apparent time-out via an OPTION (FAST 1) hint in the script, but it just can't pull anything quickly enough.
I'm now stuck. This seems like a silly barrier, as all I need is for that first import to complete; it obviously can query fold for the shorter loads. How do I work past this?
In retrospect, it's silly that I didn't try this initially, but even though the Value.NativeQuery() M step doesn't accept a command timeout option, you can still set CommandTimeout manually in a preceding Sql.Database() step and it carries forward. I also removed some common table expressions from my query, which were also breaking query folding (not sure why, but it was an easy fix: I saved my complex logic as a view in SQL Server itself and just joined to that, roughly as sketched below). Takes a while to run now, but doesn't time out!
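The view was nothing fancy; a minimal sketch of the pattern, with made-up names (the point is just that the CTE logic moves into SQL Server and the native query joins to it):

-- Hypothetical view replacing a folding-breaking common table expression
create view dbo.vw_ComplexLogic
as
select o.OrderId,
       o.CustomerId,
       sum(l.Quantity * l.UnitPrice) as OrderTotal
from dbo.Orders o
join dbo.OrderLines l
  on l.OrderId = o.OrderId
group by o.OrderId, o.CustomerId;

The native query then simply selects from dbo.vw_ComplexLogic, and folding (and therefore incremental refresh) keeps working.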
I have a coding problem I can't solve.
The requirement is to show the progress of a SQL query. I am using Spring Boot and AngularJS, so the idea is to show the query's progress in the UI; it does not need to be real time, but that would be preferred. Basically, the user clicks a button in the UI, which triggers an API call to retrieve data from the database and then return the completed data to the UI.
We have a computationally expensive SQL query that takes a long time to finish. When we want to retrieve about 10 million rows, it takes roughly 15 minutes. We want to show the query's progress so the user has an idea of how long it will take. So the idea is to check how many rows have completed:
Say 1 million rows retrieved: it should return 10%; the next million, 20%; and so on.
I have no idea how to approach this and need some suggestions.
Thanks in Advance.
Assuming this is one long select call, you can run the following query to get the stats on a long-running statement:
select * from v$session_longops where time_remaining > 0 and sid = ???
The sid value should be replaced with the Oracle session ID of the session that is running the long query; you can determine that by looking in v$session. You will need to execute the above query from a separate thread (or whatever the concurrent unit of execution is in Spring).
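For example, something along these lines joins the two views and computes a percentage in one go (the username filter is an assumption; use whatever identifies your application's session):

-- Locate the application's Oracle session and read its long-op progress
select s.sid,
       l.opname,
       l.sofar,
       l.totalwork,
       round(100 * l.sofar / nullif(l.totalwork, 0), 1) as pct_done
from v$session s
join v$session_longops l
  on l.sid = s.sid
 and l.serial# = s.serial#
where s.username = 'APP_USER'   -- assumption: the schema your app connects as
  and l.time_remaining > 0;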
Thanks everyone. I tried OldProgrammer's solution but couldn't get it to work: somehow I just can't retrieve the Hibernate session ID, and all the related posts I found were either too old or manually create a new session, which I don't want.
I did a workaround for this problem, not a real solution though.
For that complex SQL, I first run a total count, then split the retrieval into 100 chunks (pagination; a sketch follows at the end). As each chunk finishes, I update the completion percentage. Meanwhile, the UI polls the progress of the SQL from a different API endpoint on a timer; currently I have it set to every 2 seconds.
The outcome is like this: for a fairly small number of rows, the progress jumps, say, from 2% to 6% on every API call; for a large number, it only moves from 1% to 2% after a few progress checks.
This is not a clean solution, but it serves its purpose: it gives the user some idea of how long the query will take to finish.
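For reference, each chunk is plain offset pagination over the same query; a sketch of the idea (Oracle 12c+ syntax, with placeholder names and bind variables):

-- Chunk k of 100: a stable ORDER BY is required for paging to be correct
select *
from my_complex_result        -- stands in for the expensive query or a view over it
order by id
offset :k * :chunk_size rows
fetch next :chunk_size rows only;

As each chunk returns, the service bumps a progress counter that the polling endpoint reads back to the UI.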
I have a small C# application that uses Entity Framework 6 to parse text files into a database structure.
In general, the file content is parsed into 3 tables:
Table1 --(1-n)-- Table2 --(1-n)-- Table3
The application worked for months without any issues in the Dev, Stage, and Production environments.
Last week it stopped working on Stage, and now I am trying to figure out why.
One file contains ~100 entries for Table1, ~2000 entries for Table2, and ~2000 entries for Table3.
.SaveChanges() is called after each file.
I get the following timeout exception:
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. The statement has been terminated.
AutoDetectChangesEnabled is set to false.
Because there is a 4th table where I execute one UPDATE statement after each file, there was a transaction around the whole thing; I removed the 4th table and the transaction logic, but the problem persists.
To test whether it's just a performance issue, I set Database.CommandTimeout = 120, without any effect; it still runs into the timeout after 2 minutes.
(Before the issue, one file was stored in about 5 seconds, which is absolutely fine.)
If I look at SQL Server using SQL Server Profiler, I can see the following after .SaveChanges() is called:
[screenshot: SQL Server Profiler trace]
Only the first few INSERT statements for Table3 show up (always the first 4-15 statements, all of them shortly after .SaveChanges()).
After that: no new entries until the timeout occurs.
I have absolutely no idea what to check, because there is no error or anything like that in the code.
If I look at SQL Server, there is absolutely no reason for it to delay the queries (CPU, memory, and disk space are all fine).
I would be glad for any comments on this; if you want more info, please let me know.
Best Regards
Fixed it by rebuilding fragmented indexes in Table1.
The following article was helpful to understand how to take care of fragmented indexes:
https://solutioncenter.apexsql.com/why-when-and-how-to-rebuild-and-reorganize-sql-server-indexes/
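In case it helps others, a minimal sketch of that kind of check and rebuild (schema and table names assumed; the ~30% threshold is the usual rule of thumb from the article):

-- Check fragmentation of the indexes on Table1
select i.name, ps.avg_fragmentation_in_percent
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.Table1'), null, null, 'LIMITED') ps
join sys.indexes i
  on i.object_id = ps.object_id
 and i.index_id = ps.index_id;

-- Rebuild whatever is heavily fragmented (above roughly 30%)
alter index all on dbo.Table1 rebuild;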
(If some mod is still thinking this is no valid answer, any explanation would be great)
I have a SQL connection to a table on my SQL Server, which I have imported with the following line:
master_table <- RxSqlServerData(etc...)
Then, my goal is to save/import this table using rxImport into a .xdf file, which I have called readTest <- 'read_test.xdf'.
The table is quite large, so I have set this in my rxImport:
rxImport(master_table, outFile=readTest, rowsPerRead=100000,reportProgress=1)
However, it has been running for 10 minutes now, and no progress of rows being read/imported is printed to the screen. Did I do this correctly? I wanted output similar to the "progress" that is printed when an ML algorithm like RxForest runs.
Thanks.
It's possible that the connection to your SQL Server database is relatively slow; reportProgress will only show progress when a batch of rows is complete. If the rows are relatively large, you could see nothing returned on the terminal for quite some time.
For best performance with rxImport(), make rowsPerRead the largest size that your local machine's memory can handle. This will make progress reports less frequent, but it will give you a faster import time. The only case where this isn't true is when importing an XDF file.
I am using SQL Server 2008 R2 and deleting ~5000 records from a table. While testing performance with the same data, I found that the deletion takes either 1 second or 31 seconds.
The test database is confidential, so I cannot share it here.
I have already tried splitting the load and deleting only 1,000 records at a time (roughly as sketched below the query), but I still see the same deviation.
How should I continue my investigation? What could be the reason for the performance difference?
The query is simple, something like: delete from PART where INVOICE_ID = 64225
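The batched variant mentioned above looked roughly like this:

-- Delete in batches of 1,000 rows to keep each transaction small
while 1 = 1
begin
    delete top (1000) from PART where INVOICE_ID = 64225;
    if @@rowcount = 0 break;
end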
A DELETE statement uses a lot of system and transaction log resources, and that's why it can take a long time. If possible, try to TRUNCATE the table instead.
When you ran the DELETE for the first time, perhaps the data wasn't in the buffer pool and had to be read off storage; the second time you run it, the data will already be in memory. Other factors include whether the checkpoint process is writing your changed pages off to storage. Post your query plan:
set statistics xml on
select * from sys.objects
set statistics xml off
Argh! After taking a closer look at the execution plan, I noticed that one index was missing. After adding the index, all executions run fast. Note that afterwards I removed the index to test whether the problem would come back, and the performance deviations reappeared. The index was something like the sketch below.
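For completeness, a plain nonclustered index on the filter column (index name made up):

-- The DELETE filters on INVOICE_ID, so the plan wanted an index on it
create nonclustered index IX_PART_INVOICE_ID on PART (INVOICE_ID);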