measure data usage of the c# application - sql-server

I have a C# application that communicates frequently with an MS SQL Server instance on a remote server. The application runs almost 24/7. I have noticed that in one month the data usage is 20 GB, which I find too much for SQL queries.
How can I calculate how much data would be used by reading, for example, only an Int32 column from the DB? I guess each query would be a minimum of 4 bytes, but there is likely some overhead for establishing communication with the remote server. It is hard for me to imagine how SQL queries could use around 800 MB per day.

How can I calculate how much data would be used by reading, for example, only an Int32 column from the DB?
Enable .NET's SqlConnection statistics, and then examine the results with SqlConnection.RetrieveStatistics.
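A minimal sketch of what that looks like (the connection string, table and column names here are placeholders, not anything from your application):

```csharp
using System;
using System.Collections;
using System.Data.SqlClient;

class DataUsageSketch
{
    static void Main()
    {
        using (var connection = new SqlConnection("Server=remote;Database=MyDb;Integrated Security=true"))
        {
            connection.StatisticsEnabled = true;   // start collecting per-connection counters
            connection.Open();

            using (var command = new SqlCommand("SELECT SomeIntColumn FROM dbo.SomeTable", connection))
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read()) { /* consume the rows as usual */ }
            }

            // RetrieveStatistics returns an IDictionary of counters; "BytesSent" and
            // "BytesReceived" are the ones that add up to your network data usage.
            IDictionary stats = connection.RetrieveStatistics();
            Console.WriteLine("Bytes sent: {0}, bytes received: {1}",
                stats["BytesSent"], stats["BytesReceived"]);
        }
    }
}
```

Measured this way you will also see the protocol overhead (TDS packet framing, column metadata, login handshakes and so on), which is why a query that returns a single Int32 costs far more than 4 bytes on the wire.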

Related

Power BI dealing with 16GB CSV file

I have a 16GB CSV that I have imported into Power BI Desktop. The workstation I am using is an Azure VM running Windows Server 2016 (64GB memory). The import of the file takes a few seconds; however, when I try to filter the data set in the query editor to a specific date range, it takes a fairly long time (it is still running and has been around 30 minutes so far). The source file (16GB CSV) is being read from a RAM disk that has been created on the VM.
What is the best approach/practice when working with data sets of this size? Would I get better performance importing the CSV into SQL Server and then using Direct Query when filtering the data set to a date range? I would have thought it would run fairly quickly with my current setup, as I have 64GB of memory available on that VM.
When the data size is significant, you also need appropriate computing power to process it. When you import these rows into Power BI, Power BI itself needs this computing power. If you import the data into SQL Server (or Analysis Services, or something else) and you use Direct Query or Live Connection, you can delegate the computations to the database engine. With Live Connection all your modeling is done on the database engine, while with Direct Query modeling is also done in Power BI and you can add computed columns and measures. So if you use Direct Query, you still must be careful about what is computed where.
You ask for "the best", which is always a bit vague. You must decide for yourself depending on many other factors. Power BI is Analysis Services by itself (when you run Power BI Desktop you can see the Microsoft SQL Server Analysis Services child process running), so importing the data in Power BI should give you similar performance as if it was imported in SSAS. To improve the performance in this case, you need to tune your model. If you import the data in SQL Server, you need to tune the database (proper indexing and modeling).
So to reach a final decision you must test these solutions, consider pricing and hardware requirements and depending on that, decide what is the best for your case.
Recently, Microsoft made a demo with 1 trillion rows of data. You may want to take a look at it. I will also recommend to take a look at aggregations, which could help you improve the performance of your model.

SSIS Transferring Data to an Oracle DB is Extremely Slow

We are transferring data to an Oracle Database from two different sources and it's extremely slow.
Please see notes and images below. Any suggestions?
Notes:
We're using the Microsoft OLE DB Provider for Oracle.
One data source is SQL Server and includes about 5M records.
The second data source is Oracle and includes about 700M records.
When trying to transfer the SQL Server data, we broke it up into five "Data Flow Tasks" in the "Control Flow". Each "Data Flow Task" in turn uses an "OLE DB Source", which internally uses a "SQL command" that effectively selects 1M of the 5M records. When we ran this package, the first data flow task ran for about 3 hours and had transferred only about 50,000 records by the time we ended the process.
We had similar experience with the Oracle data as well.
For some reason saving to an Oracle destination is extremely slow.
Interestingly, we once transferred the same 700M records from Oracle to SQL Server (so the opposite direction) and it worked as expected in about 4.5 to 5 hours.
On the Oracle side you can examine v$session to see where the time is being spent (if AWR is licensed on the Oracle instance you can use DBA_HIST_ACTIVE_SESS_HISTORY or v$active_session_history).
I work on Oracle performance problems every day (over 300 production Oracle instances), so I feel qualified to say that I can't give you a specific answer to your question, but I can point you in the right direction.
Typical process mistakes that make inserts slow:
not using array insert (a rough sketch of array binding follows this list)
connecting to the DB for each insert (sounds strange? Believe me, I've seen DataStage and other ETL tools set up this way)
app server/client not on the same local area network as the Oracle instance
indexes on the table(s) being inserted into (especially problematic with bitmap indexes); these require an index update plus a table update per statement
redo log files too small on the Oracle instance (driving up redo log file switching)
log_buffer parameter on the DB side too small
not enough DB writers (see the db_writer_processes initialization parameter)
committing too often
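To illustrate the first point, here is a minimal sketch of array binding using Oracle's managed ADO.NET provider (Oracle.ManagedDataAccess); the connection string, table and column names are invented for the example. The idea is one statement execution and one round trip for a whole batch of rows instead of one per row:

```csharp
using Oracle.ManagedDataAccess.Client;

class ArrayInsertSketch
{
    static void InsertBatch(string connectionString, int[] ids, string[] names)
    {
        using (var conn = new OracleConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            using (var cmd = conn.CreateCommand())
            {
                // Hypothetical target table; the bind variables map to the arrays below.
                cmd.CommandText = "INSERT INTO target_table (id, name) VALUES (:id, :name)";
                cmd.ArrayBindCount = ids.Length;  // bind every row of the batch at once
                cmd.Parameters.Add(new OracleParameter("id", OracleDbType.Int32) { Value = ids });
                cmd.Parameters.Add(new OracleParameter("name", OracleDbType.Varchar2) { Value = names });
                cmd.ExecuteNonQuery();            // one round trip for the whole array
                tx.Commit();                      // commit once per batch, not per row
            }
        }
    }
}
```

Batching like this is the same idea that the "Fast Load"/bulk options in the SSIS destinations (discussed in the other answers) expose; row-by-row inserts with per-row commits pay most of the overheads in the list above at once.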
Not an answer, just a bunch of observations and questions...
Any one of the components in the data pipeline could be the bottleneck.
You first need to observe the row counts when running interactively in SSIS and see if there is any obvious clogging going on - i.e. do you have a large rowcount right before your Data conversion transformation and a low one after? Or is it at the Oracle destination? Or is it just taking a long time to come out of SQL? A quick way to check the SQL side is to dump it to a local file instead - that mostly measures the SQL select performance without any blocking from Oracle.
When you run your source query in SQL Server, how long does it take to return all rows?
Your data conversion transformation can be performed in the source query. Every transformation requires setting up buffers, memory, etc., and can slow down and block your data flow. Avoid these and do the conversion in the source query instead.
There are various buffers and configuration settings in the Oracle driver, already addressed in detail by @RogerCornejo. For read performance out of Oracle, I have found that altering FetchBufferSize made a huge difference, but you are doing writes here, so that's not the case.
Lastly, where are the two database servers and the SSIS client tool situated network-wise? If you are running this across three different servers, then you have network throughput to consider.
If you use a linked server as suggested, note that SSIS doesn't do any processing at all, so you take that whole piece out of the equation.
And if you're just looking for the fastest way to transfer data, you might find that dumping to a file and bulk inserting is the fastest.
Thank you all for your suggestions. For those who may run into a similar problem in the future, I'm posting what finally worked for me. The answer was ... switching the provider. The ODBC or Attunity providers were much faster, by a factor of almost 800X.
Remember that my goal was to move data from a SQL Server Database to an Oracle database. I originally used an OLE DB provider for both the source and destination. This provider works fine if you are moving data from SQL Server to SQL Server because it allows you to use the "Fast Load" option on the destination which in turn allows you to use batch processing.
However, the OLE DB provider doesn't allow the "Fast Load" option with an Oracle DB as the destination (couldn't get it to work and read elsewhere that it doesn't work). Because I couldn't use the "Fast Load" option I couldn't batch and instead was inserting records row by row which was extremely slow.
A colleague suggested trying ODBC and others suggested trying Microsoft's Attunity Connectors for Oracle. I didn't think the difference would be so great, because in my experience ODBC had similar (and sometimes worse) performance than OLE DB (I hadn't tried Attunity). BUT... that was when moving data from and to a SQL Server database, or staying in the Microsoft world.
When moving data from a SQL Server database to an Oracle database, there was a huge difference! Both ODBC and Attunity out performed OLE DB dramatically.
Here are my summarized performance test results for inserting 5.4M records from a SQL Server database into an Oracle database.
When doing all the work on one local computer:
OLE DB source and destination inserted 12 thousand records per minute, which would have taken approx. 7 hours to complete.
ODBC source and destination inserted 9 million records per minute, which took only approx. 30 seconds to complete.
When moving data from one network/remote computer to another network/remote computer:
OLE DB source and destination inserted 115 records per minute, which would have taken approx. 32 days to complete.
ODBC source and destination inserted 1 million records per minute, which took only approx. 5 minutes to complete.
Big difference!
Now, why it took only 30 seconds when working locally and 5 minutes remotely is another issue for another day, but for now I have something workable (it should be slower over the network, but I'm surprised it's that much slower).
Thanks again to everyone!
Extra notes:
My OLE DB results were similar with either Microsoft's or Oracle's OLE DB provider for Oracle databases.
Attunity was a little faster than ODBC. I didn't get to test on remote servers or on a larger data set, but locally it was consistently about 2 to 3 seconds faster than ODBC. Those seconds could add up on a large data set, so take note.

SQL Server 2014 standard edition slows the machine when Database size grows

I have a scenario where an application server saves 15k rows per second into a SQL Server database. During the first hours the machine is still usable, but once the database size grows to ~20 GB, it seems the machine becomes unusable.
I saw some topics/forums/answers/blogs suggesting to limit the max memory usage of SQL Server. Any thoughts on this?
Btw, I'm using SqlBulkCopy to insert the rows into the database.
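For context, the bulk copy calls look roughly like the sketch below; the connection string, table name, columns and batch size are placeholders rather than the real schema:

```csharp
using System.Data;
using System.Data.SqlClient;

class BulkInsertSketch
{
    // Called once the application has accumulated a batch of rows in a DataTable.
    static void Flush(DataTable batch)
    {
        // Placeholder connection string and destination table.
        using (var bulk = new SqlBulkCopy("Server=.;Database=Telemetry;Integrated Security=true"))
        {
            bulk.DestinationTableName = "dbo.Readings";
            bulk.BatchSize = 10000;       // rows sent per round trip
            bulk.BulkCopyTimeout = 0;     // don't time out on large batches
            bulk.WriteToServer(batch);    // columns map by position to dbo.Readings
        }
        batch.Clear();
    }
}
```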
I have two suggestions for you:
1 - Database settings:
When you create the database, try to use a large initial size, and consider having a bigger autogrowth percentage/size.
You will want to minimize the times your filegroups need to grow.
2 - Server settings:
In your SQL Server settings I would recommend that you remove one logical processor from SQL Server. The OS will use this processor when SQL Server is busy with heavy loads on the other processors. In my experience, this usually gives a nice boost to the OS.

Offloading SQL Report Server processing to a dedicated server - is it worth it?

Not sure if this is a SO or a ServerFault question, so please feel free to move if it's not in the right place:
I have a client with a large database containing a table with around 30-35 million rows running on a SQL 2008 R2 server (the server is pretty high spec: 16 cores, 92 GB RAM, RAID, etc.). There are other tables this table may join on, but it is the main driver of several reports.
Their SSRS instance/database and the query source database are both running on the same box/SQL instance.
They regularly run ad-hoc reports from this database (which have undergone extensive optimisation), many of which may end up touching a lot of the data in the table. After looking at the report server stats it appears that the data fetch doesn't actually take that long, but a lot of data is returned and report processing takes a fair while: it can take up to 20-30 minutes to process some of the larger reports, which can have tens of thousands of pages (the data fetch in these cases is less than 10 seconds).
(Note: I realise that there is never really a need to run off 25,000 pages, but the client insists and won't listen to reason... something about Excel spreadsheets *FACEPALM!*)
At the moment they are concerned about a couple of performance issues that crop up sporadically and the culprit may be the ad-hoc reporting.
We are looking at offloading the report processing anyway, so I thought that this would be an ideal opportunity - but before doing so I'm wondering how much relief this will give the SQL server.
If I move the SSRS app and database onto another SQL host and remotely query the data (network conditions should be ideal as this is datacentre based), will I see any performance gains?
This is mainly based on guesswork at this stage but I see the following being the factors that could affect performance:
I/O for moving a shedload of rows from the query source to RS temp DB
CPU load when the report server is crunching all the data
In moving to another host I see these factors being reduced for the SQL server. The new server will be solely responsible for report processing (and should also be high spec), so hopefully there will be no contention when processing reports.
Do I sound like I am on the right track in my assumptions? Is there anything else that I may have missed which could adversely affect performance or improve performance?
Thanks in advance
You should look at transactional replication to send data from the main server to a database on the reporting server. Querying the tables directly over the network will only slow things down even more.

Fast interaction with database

I'm developing a project which gets some data from a hardware device every 100 milliseconds. I'm using Visual Studio 2010 and C#. The data size is about 50KB in each round. The customer wants to log all the data in the database for statistical purposes.
I prefer using SQL Server 2005+ since I'm familiar with it, and the project should be done in about 15 days; it's a small project.
Is this a reasonable speed for such a data size to be inserted into the DB? Do you suggest any generic approaches to speed up the interactions (using SQL commands, EF, or other technologies which could have a positive effect on speed)?
If this is way too fast for SQL Server to handle, what do you suggest I should use which:
1-has a quick learning curve.
2-could accept queries for statistical data.
3-could satisfy my interaction speed needs.
I'm thinking about System.Data.SQLite if it's a no-go on SQL Server, but I don't know about its learning curve and speed enhancements.
500KB per second is nothing. I work with SQL databases that do gigabytes per second; it all depends on the hardware and server configuration underneath, but let's say you were to run this on a standard office desktop: you will be fine. Even then, I would say you can start thinking about new hardware if you are looking at 20MB per second or more.
As for the second part of your question: since you are using C#, I suggest you use SQL Server 2008 and a table-valued parameter (TVP). Buffer the data in the application, in a DataSet or DataTable, until you have, say, 10K rows, and then call the proc to do the insert, passing it the DataTable as a parameter. This will save hundreds or thousands of ad-hoc inserts.
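A minimal sketch of that pattern, assuming a user-defined table type dbo.ReadingType and a stored procedure dbo.InsertReadings already exist on the server (both names are made up here):

```csharp
using System.Data;
using System.Data.SqlClient;

class TvpSketch
{
    // Call this once the application-side buffer has accumulated ~10K rows.
    static void FlushBatch(string connectionString, DataTable buffer)
    {
        // buffer's columns must match the columns of the table type dbo.ReadingType.
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.InsertReadings", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            var p = cmd.Parameters.AddWithValue("@Readings", buffer);
            p.SqlDbType = SqlDbType.Structured;   // pass the whole DataTable as a TVP
            p.TypeName = "dbo.ReadingType";       // user-defined table type on the server
            conn.Open();
            cmd.ExecuteNonQuery();                // one round trip for the whole batch
        }
        buffer.Clear();
    }
}
```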
Hope this is clear; if not, ask and I will try to explain further.
50KB every 100 milliseconds is 500KB a second. These days networks run at gigabit speeds (many megabytes per second) and hard drives can cope with hundreds of MB per second. 500KB is a tiny amount of data, so I'd be most surprised if SQL Server can't handle it.
If you have a slow network connection to the server or some other problem that means it struggles to keep up, then you can try various strategies to improve things. Ideas might be:
Buffer the data locally (and/or on the server) and write it into the database with a separate thread/process (a rough sketch of this appears at the end of this answer). If you're not continually logging 24 hours a day, then a slow server would catch up when you finish logging. Even if you are logging continuously, this would smooth out any bumps (e.g. if your server has periods of "busy time" where it is doing so much else that it struggles to keep up with the data from your logger)
Compress the data that is going to the server so there's less data to send/store. If the packets are similar you may find you can get huge compression ratios.
If you don't need everything in each packet, strip out anything "uninteresting" from the data before uploading it.
Possibly batching the data might help - by collecting several packets and sending them all at once you might be able to minimise transmission overheads.
Just store the data directly to disk and use the database just to index the data.
... So I'd advise writing a prototype and seeing how much data you can chuck at your database over your network before it struggles.
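To make the buffering idea from the first bullet concrete, here is a rough sketch using a background task and SqlBulkCopy; the dbo.PacketLog table, its columns, and the batch size are invented for the example:

```csharp
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

class PacketLogger
{
    private readonly BlockingCollection<byte[]> _queue = new BlockingCollection<byte[]>();
    private readonly string _connectionString;
    private readonly Task _writer;

    public PacketLogger(string connectionString)
    {
        _connectionString = connectionString;
        _writer = Task.Run(() => WriteLoop());   // database writes happen off the acquisition thread
    }

    // Called every 100 ms by the acquisition code; returns immediately.
    public void Enqueue(byte[] packet) { _queue.Add(packet); }

    // Call when logging stops, to flush whatever is left in the queue.
    public void Complete()
    {
        _queue.CompleteAdding();
        _writer.Wait();
    }

    private void WriteLoop()
    {
        var batch = new DataTable();
        batch.Columns.Add("LoggedAt", typeof(DateTime));
        batch.Columns.Add("Payload", typeof(byte[]));

        foreach (var packet in _queue.GetConsumingEnumerable())
        {
            batch.Rows.Add(DateTime.UtcNow, packet);
            if (batch.Rows.Count >= 100)         // roughly 10 seconds of data per round trip
                Flush(batch);
        }
        if (batch.Rows.Count > 0) Flush(batch);
    }

    private void Flush(DataTable batch)
    {
        using (var bulk = new SqlBulkCopy(_connectionString))
        {
            // Hypothetical destination table whose columns line up with the DataTable above.
            bulk.DestinationTableName = "dbo.PacketLog";
            bulk.WriteToServer(batch);
        }
        batch.Clear();
    }
}
```

The Enqueue call returns immediately, so a slow network or a busy server only grows the in-memory queue instead of stalling the 100 ms acquisition loop.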
