Best method for data retrieval for charting - sql-server

I have an application developed in C#/ASP.NET using SQL Server. I am now in a position where I can start bolting the charting tools onto my application. The application displays charts of the customer's data, which is 1-30 minute time-series data and could span over 12 months, so sometimes a lot of data. My question is: what is the most efficient and secure method for retrieving and preparing the data for the chart? That is to say, I want the data ready to go when the customer logs in, with as little lag as possible when they view their charts. It is a simple chart with time-series data, but there can be a lot of it.
I want to make sure the performance is great; if I have several hundred users all getting data on the fly, I am going to get lag on the DB.
One option I had in mind is creating an XML file for the chart data using SQL Server, dropping that into a customer folder, and updating it on a regular basis. The chart then uses the XML file, which I can bolt straight into the charting tool.
The other option is direct access to the data using a query or stored procedure.
What would you use to solve a similar problem?

You don't specify which version of SQL Server you're using, but FOR XML has been available since SQL Server 2000 (FOR XML PATH since SQL Server 2005). https://msdn.microsoft.com/en-gb/library/ms178107.aspx
I'm using it to fetch complex structured data in a single round-trip, where fetching the same data using LINQ to SQL or EF would involve a lot of round-trips.
It may be difficult or inefficient to select your data directly into the structure required by your charting tool, but you could use XSLT to restructure the data before saving the file to disk.
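For what it's worth, a minimal sketch of that approach in C#, assuming a placeholder table dbo.ChartReadings(CustomerId, ReadingTime, Value): the FOR XML query returns the whole series in one round-trip, and the result is cached to a per-customer file that the chart can read directly.

```csharp
using System.Data.SqlClient;
using System.Xml;

static void CacheChartXml(string connectionString, int customerId, string cacheFilePath)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        @"SELECT ReadingTime, Value
          FROM dbo.ChartReadings
          WHERE CustomerId = @customerId
          ORDER BY ReadingTime
          FOR XML PATH('point'), ROOT('series')", conn))
    {
        cmd.Parameters.AddWithValue("@customerId", customerId);
        conn.Open();

        // ExecuteXmlReader streams the FOR XML result; save it as the customer's cache file.
        using (XmlReader reader = cmd.ExecuteXmlReader())
        {
            var doc = new XmlDocument();
            doc.Load(reader);
            doc.Save(cacheFilePath);
        }
    }
}
```

A scheduled job (SQL Agent or a background task in the web app) could refresh these files on the interval you already have in mind, so a login only ever reads a static file.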

Related

Suggested method to handle large dataset for SSRS and Excel

I have a situation where I have been asked to create some reports on a large dataset: around 350 million records in SQL Server 2012. The idea is that I create a report using different reporting tools to show the relative pros and cons of each. We have the options of using Qliksense, SSRS and Excel (I know this is not a reporting tool).
I've created a Qliksense model which handles the data volume reasonably well. The data takes a few seconds to process when we change dimensions or measures but it's manageable. However, I am unsure about how to handle this with SSRS and especially Excel. Clearly we need to summarise the data. This is fine as we always intended to show data summarised/aggregated by some dimension.
What I could do is create a summarised data set, but the issue is that we ideally want to be able to see this data by different dimensions. Short of creating a table for STATE and another for PRODUCT and another for SALESMAN etc., the only other way I can think of is to build some kind of cube. Or perhaps one of the clever folk frequenting this site can suggest a better way? If the solution could handle some way to drill down, all the better.
I'd appreciate advice on how best to manage this large volume of data in such a way that I am able to report on it.
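One way to keep a single summary set that still covers several dimensions is GROUPING SETS. A sketch under assumed names (dbo.Sales with State, Product, Salesman and SaleAmount columns), loading the result into a DataTable that SSRS or Excel could consume:

```csharp
using System.Data;
using System.Data.SqlClient;

static DataTable LoadSalesSummary(string connectionString)
{
    // One pass over the fact table returns summary rows per State, per Product, per Salesman
    // and per State/Product combination; columns outside a given grouping set come back NULL.
    const string summarySql = @"
        SELECT State, Product, Salesman,
               SUM(SaleAmount) AS TotalSales,
               COUNT_BIG(*)    AS SaleCount
        FROM dbo.Sales
        GROUP BY GROUPING SETS ((State), (Product), (Salesman), (State, Product))";

    using (var conn = new SqlConnection(connectionString))
    using (var adapter = new SqlDataAdapter(summarySql, conn))
    {
        var summary = new DataTable("SalesSummary");
        adapter.Fill(summary);   // thousands of summary rows instead of 350 million detail rows
        return summary;
    }
}
```

For proper drill-down an SSAS cube (or tabular model) is still the more natural fit, but this keeps SSRS and Excel away from the detail rows.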

Database from Excel Sheets

I have a problem statement.
A client of mine has three years of data in very complicated Excel sheets. They are currently doing data entry, reporting and basically everything in Excel, and these Excel files are shared with each other whenever there is a need to share the data.
They eventually want to move to a web-based solution.
The first step of the solution agreed is to build a routine and a database. The routine will import the data from the Excel files into the database. The database itself will need to be designed in such a way that the same database can be used as the back end for a web site.
The said routine will be used as a one-time activity to import all the data for the last three years, and will also be used on a periodic basis so they keep importing data until the web front end is ready or agreed upon.
The de facto method for me is to analyze the Excel sheets, design a normalized database, and build a service in Java that will use POI to read the Excel sheets, apply business logic and insert the data into the database. The problem here is that any time they change the Excel layout or even the data format, the routine will need to be rewritten.
I am willing to go any way, maybe using an ETL tool.
Please recommend.
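Whichever stack you pick (Java/POI or .NET), keeping the sheet-to-table mapping in configuration rather than code softens the "they changed the Excel layout" problem. A rough .NET sketch using EPPlus, with placeholder headers and column names:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using OfficeOpenXml;

static void ImportSheet(string excelPath)
{
    // Excel header -> database column; in practice this map would live in a config file.
    var columnMap = new Dictionary<string, string>
    {
        { "Customer Name", "CustomerName" },
        { "Invoice Date",  "InvoiceDate" },
        { "Amount",        "Amount" }
    };

    using (var package = new ExcelPackage(new FileInfo(excelPath)))
    {
        ExcelWorksheet ws = package.Workbook.Worksheets.First();
        int lastRow = ws.Dimension.End.Row;
        int lastCol = ws.Dimension.End.Column;

        // Resolve each mapped header to its column index in this particular file.
        var headerIndex = new Dictionary<string, int>();
        for (int col = 1; col <= lastCol; col++)
        {
            string header = ws.Cells[1, col].Text.Trim();
            if (columnMap.ContainsKey(header))
                headerIndex[columnMap[header]] = col;
        }

        for (int row = 2; row <= lastRow; row++)
        {
            // One record keyed by database column name, ready for the insert logic.
            var record = headerIndex.ToDictionary(kv => kv.Key, kv => ws.Cells[row, kv.Value].Text);
            // InsertRecord(record);  // hypothetical data-access call
        }
    }
}
```

When the client renames or reorders columns, only the map changes; the reader and the insert logic stay untouched. An ETL tool such as SSIS gives you much the same separation out of the box.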

How to achieve High Performance Excel VSTO to SQL Server?

I'm working on an Excel 2010 VSTO solution (doing code-behind for an Excel workbook in Visual Studio 2010) and am required to interact with a centralised SQL Server 2008 R2 data source for both read and write operations.
The database will contain up to 15,000 rows in the primary table plus related rows. Ideally, the spreadsheets will be populated from the database, used asynchronously, and then uploaded to update the database. I'm concerned about performance around the volume of data.
The spreadsheet would be made available for download via a Web Portal.
I have considered two solutions so far:
1. A WCF Service acting as a data access layer, which the workbook queries in-line to populate its tables with the entire requisite data set. Changes would be posted to the WCF service once updates are triggered from within the workbook itself.
2. Pre-loading the workbook with its own data in hidden worksheets on download from the web portal. Changes would be saved by uploading the modified workbook via the web portal.
I'm not too fussed about optimisation until we have our core functionality working, but I don't want to close myself off to any good options in terms of architecture down the track.
I'd like to avoid a scenario where we have to selectively work with small subsets of data at a time to avoid slowdowns; integrating that kind of behaviour into a spreadsheet sounds like needless complexity.
Perhaps somebody with some more experience in this area can recommend an approach that won't shoot us in the foot?
Having done similar:
Make sure that all of your data access code runs on a background thread. Excel functions and addins (mostly) operate on the UI thread, and you want your UI responsive. Marshalling is non-trivial (in Excel '03 it required some pinvoke, may have changed in '10), but possible.
Make your interop operations as chunky as possible. Each interop call has significant overhead, so if you are formatting programmatically, format as many cells as possible at once (using ranges; see the sketch at the end of this answer). This advice also applies to data changes: you only want diffs updating the UI, which means keeping a copy of your dataset in memory to detect diffs. If lots of UI updates come in at once, you may want to insert some artificial pauses (throttling) to let the UI thread show progress as you are updating.
You do want a middle-tier application to talk to SQL Server and the various Excel instances. This allows you to do good things later on, like caching, load balancing, hot failover, etc. This can be a WCF service as you suggested, or a Windows Service.
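A rough sketch of the first two points together, with hypothetical names (Globals.ThisWorkbook and the loadRows delegate standing in for your WCF data access): the query runs off the UI thread, the result is marshalled back, and the whole block is written with a single Range.Value2 assignment rather than cell-by-cell interop calls.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;
using Excel = Microsoft.Office.Interop.Excel;

void RefreshSheetAsync(Func<object[,]> loadRows)
{
    // Capture (or create) the synchronization context on the Excel main thread before going async.
    SynchronizationContext uiContext =
        SynchronizationContext.Current ?? new WindowsFormsSynchronizationContext();

    Task.Factory.StartNew(() =>
    {
        object[,] buffer = loadRows();   // WCF call etc., safely off the UI thread

        uiContext.Post(_ =>
        {
            // Back on the UI thread: one COM round-trip for the entire block.
            var ws = (Excel.Worksheet)Globals.ThisWorkbook.Application.ActiveSheet;
            Excel.Range target = ws.Range["A1"].Resize[buffer.GetLength(0), buffer.GetLength(1)];
            target.Value2 = buffer;
        }, null);
    });
}
```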
If your spreadsheet file has to be downloaded from the server, you can use EPPlus on the server to generate the spreadsheet; it will be much faster than VSTO. Then you can use WCF from the add-in in the Excel app to upload the data. Reading data using a range will take much less time than writing if you don't have any formulas in your sheet. Also, in WCF you can use batching to update the 15,000 rows; it should take approximately 2-5 minutes for the entire operation.
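A minimal sketch of the server-side generation half of that suggestion, assuming EPPlus and whatever DataTable the middle tier has already loaded from SQL Server:

```csharp
using System.Data;
using OfficeOpenXml;

static byte[] BuildWorkbook(DataTable sourceTable)
{
    using (var package = new ExcelPackage())
    {
        ExcelWorksheet ws = package.Workbook.Worksheets.Add("Data");
        ws.Cells["A1"].LoadFromDataTable(sourceTable, true);   // true = emit a header row
        return package.GetAsByteArray();                       // stream this to the portal client
    }
}
```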

Dataset retrieving data from another dataset

I work with an application that is switching from file-based data storage to a database. It has a very large amount of code written specifically against the file-based system. To make the switch, I am implementing functionality that will work like the old system; the plan is then to make more optimal use of the database in new code.
One problem is that the file-based system often read single records, and read them repeatedly for reports. This has become a lot of queries to the database, which is slow.
The idea I have been trying to flesh out is using two datasets. One dataset to retrieve an entire table, and another dataset to query against the first, thereby decreasing communication overhead with the database server.
I've tried to look at the DataSource property of TADODataSet but the dataset still seems to require a connection, and it asks the database directly if Connection is assigned.
The reason I would prefer to get the result in another dataset, rather than navigating the first one, is that a good amount of logic for emulating the old system has already been implemented. This logic is based on having a dataset containing only the results as queried with the old interface.
The functionality only has to support reading data, not writing it back.
How can I use one dataset to supply values for another dataset to select from?
I am using Delphi 2007 and MSSQL.
You can use a ClientDataSet/DataSetProvider pair to fetch data from an existing DataSet. You can use filters on the source dataset, filters on the ClientDataSet and provider events to trim the dataset only to the interesting records.
I've used this technique with success in a couple of migration projects, and to mitigate a similar situation where an old SQL Server 7 database was queried thousands of times to retrieve individual records, with painful performance costs. Querying it only once and then fetching individual records from the client dataset was, at the time, not only an elegant solution but a great performance boost for that particular application: the best example was an 8-hour process reduced to 15 minutes... the poor users loved me that time.
A ClientDataSet is just a TDataSet you can seamlessly integrate into existing code and UI.

Custom Database integration with MOSS 2007

Hopefully someone has been down this road before and can offer some sound advice as far as which direction I should take. I am currently involved in a project in which we will be utilizing a custom database to store data extracted from Excel files based on pre-established templates (to maintain consistency). We currently have a process (written in C#.Net 2008) that can extract the necessary data from the spreadsheets and import it into our custom database. What I am primarily interested in is figuring out the best method for integrating that process with our portal.
What I would like to do is let SharePoint keep track of the metadata about the spreadsheet itself and let the custom database keep track of the data contained within the spreadsheet. So, one thing I need is a way to link spreadsheets from SharePoint to the custom database and vice versa. As these spreadsheets will be updated periodically, I need a tried and true way of ensuring that the data remains synchronized between SharePoint and the custom database. I am also interested in finding out how to use the data from the custom database to create reports within the SharePoint portal. Any and all information will be greatly appreciated.
I have actually written a similar system in SharePoint for a large Financial institution as well.
The way we approached it was to have an event receiver on the Document library. Whenever a file was uploaded or updated the event receiver was triggered and we parsed through the data using Aspose.Cells.
The key to matching data in the excel sheet with the data in the database was a small header in a hidden sheet that contained information about the reporting period and data type. You could also use the SharePoint Item's unique ID as a key or the file's full path. It all depends a bit on how the system will be used and your exact requirements.
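A skeleton of that event-receiver approach, with the parsing and database work left as comments since they depend on your Excel library and schema:

```csharp
using Microsoft.SharePoint;

public class SpreadsheetSyncReceiver : SPItemEventReceiver
{
    public override void ItemUpdated(SPItemEventProperties properties)
    {
        base.ItemUpdated(properties);

        SPListItem item = properties.ListItem;
        byte[] content = item.File.OpenBinary();   // the uploaded/updated spreadsheet

        // 1. Open 'content' with your Excel library (e.g. Aspose.Cells) and read the hidden
        //    header sheet to find the reporting period and data type.
        // 2. Upsert the rows into the custom database, storing item.UniqueId (or item.Url)
        //    as the key that links the database records back to this SharePoint document.
    }
}
```

Handle ItemAdded the same way so the initial upload and later updates follow one code path.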
I think this might be awkward. The Business Data Catalog (BDC) functionality will enable you to tightly integrate with your database, but simultaneously trying to remain perpetually in sync with a separate spreadsheet might be tricky. I guess you could do it by catching the update events for the document library that handles the spreadsheets themselves and subsequently pushing the right info into your database. If you're going to do that, though, it's not clear to me why you can't choose just one or the other:
1. Spreadsheets in a document library, or
2. BDC integration with your database
If you go with #1, then you still have the ability to search within the documents themselves and updating them is painless. If you go with #2, you don't have to worry about sync'ing with an actual sheet after the initial load, and you could (for example) create forms as needed to allow people to modify the data.
Also, depending on your use case, you might benefit from the MOSS server-side Excel services. I think the "right" decision here might require more information about how you and your team expect to interact with these sheets and this data after it's initially uploaded into your SharePoint world.
So... I'm going to assume that you are leveraging Excel because it is an easy way to define, build, and test the math required. Your spreadsheet has a set of input data elements, a bunch of math, and then there are some output elements. Have you considered using Excel Services? In this scenario you would avoid running a batch process to generate your output elements. Instead, you can call Excel Services directly in SharePoint and run through your calculations. More information is available online.
You can also surface information in SharePoint directly from the spreadsheet. For example, if you have a graph in the spreadsheet, you can link to that graph and expose it. When the data changes, so does the graph.
There are also some High Performance Computing (HPC) Excel options coming out from Microsoft in the near future. If your spreadsheet is really, really big then the Excel Services route might not work. There is some information available online (search for HPC excel - I can't post the link).
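For reference, a rough sketch of driving a published workbook through Excel Web Services in MOSS 2007; the web-reference namespace, sheet names and cell addresses are all placeholders, and the method names come from the ExcelService.asmx proxy:

```csharp
using System.Net;
// Web reference added against http://<server>/_vti_bin/ExcelService.asmx
using ExcelServicesSample.ExcelWs;

static object RunModel(string workbookUrl, double input)
{
    var es = new ExcelService { Credentials = CredentialCache.DefaultCredentials };

    Status[] status;
    string sessionId = es.OpenWorkbook(workbookUrl, "en-US", "en-US", out status);
    try
    {
        es.SetCellA1(sessionId, "Inputs", "B2", input);                // push an input value
        es.CalculateWorkbook(sessionId, CalculateType.Recalculate);    // run the workbook's math
        return es.GetCellA1(sessionId, "Outputs", "C5", true, out status);  // read a result cell
    }
    finally
    {
        es.CloseWorkbook(sessionId);
    }
}
```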
