I'm working on an Excel 2010 VSTO solution (doing code-behind for an Excel workbook in Visual Studio 2010) and am required to interact with a centralised SQL Server 2008 R2 data source for both read and write operations.
The database will contain up to 15,000 rows in the primary table plus related rows. Ideally, the spreadsheets will be populated from the database, used asynchronously, and then uploaded to update the database. I'm concerned about performance around the volume of data.
The spreadsheet would be made available for download via a Web Portal.
I have considered two solutions so far:
A WCF Service acting as a data access layer, which the workbook queries in-line to populate its tables with the entire requisite data set. Changes would be posted to the WCF service once updates are triggered from within the workbook itself.
Pre-loading the workbook with its own data in hidden worksheets on download from the web portal. Changes would be saved by uploading the modified workbook via the web portal.
I'm not too fussed about optimisation until we have our core functionality working, but I don't want to close myself off to any good options in terms of architecture down the track.
I'd like to avoid a scenario where we have to selectively work with small subsets of data at a time to avoid slowdowns; integrating that kind of behaviour into a spreadsheet sounds like needless complexity.
Perhaps somebody with some more experience in this area can recommend an approach that won't shoot us in the foot?
Having done similar:
Make sure that all of your data access code runs on a background thread. Excel functions and add-ins (mostly) operate on the UI thread, and you want your UI responsive. Marshalling is non-trivial (in Excel '03 it required some P/Invoke; it may have changed in '10), but possible.
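For illustration, here is a minimal sketch of that pattern in the workbook's code-behind, assuming .NET 4 (the Excel 2010 / VS 2010 combination) and hypothetical helpers FetchRowsFromService and WriteRowsToSheet; capturing a SynchronizationContext at startup is one way to get back onto the UI thread without P/Invoke:

    // ThisWorkbook.cs in the VSTO project (helper names are made up for the sketch).
    using System.Threading;
    using System.Threading.Tasks;

    public partial class ThisWorkbook
    {
        // Captured on Excel's UI thread during startup; VSTO loads Windows Forms,
        // so a WindowsFormsSynchronizationContext is the usual fallback.
        private SynchronizationContext uiContext;

        private void ThisWorkbook_Startup(object sender, System.EventArgs e)
        {
            uiContext = SynchronizationContext.Current
                        ?? new System.Windows.Forms.WindowsFormsSynchronizationContext();
        }

        private void ThisWorkbook_Shutdown(object sender, System.EventArgs e) { }

        public void RefreshData()
        {
            // Slow data access happens off the UI thread...
            Task.Factory.StartNew(() => FetchRowsFromService())
                .ContinueWith(t =>
                {
                    // ...and only the interop work is marshalled back to the UI thread.
                    uiContext.Post(_ => WriteRowsToSheet(t.Result), null);
                });
        }

        // Hypothetical helpers: the WCF call and the chunky Range writes go here.
        private object[,] FetchRowsFromService() { return new object[0, 0]; }
        private void WriteRowsToSheet(object[,] rows) { }
    }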
Make your interop operations as chunky as possible. Each interop call has significant overhead, so if you are formatting programmatically, format as many cells as possible at once (using ranges). This advice also applies to data changes: you only want diffs updating the UI, which means keeping a copy of your dataset in memory to detect diffs. If lots of UI updates come in at once, you may want to insert some artificial pauses (throttling) to let the UI thread show progress as you are updating.
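For example, a rough sketch of a single "chunky" write, pushing a whole block of values (and its formatting) through one Range assignment instead of cell by cell; the sheet name and sizes are made up:

    // Inside the workbook's code-behind (using Excel = Microsoft.Office.Interop.Excel;)
    private void WriteBlock()
    {
        int rowCount = 15000, colCount = 10;                 // example sizes
        object[,] values = new object[rowCount, colCount];   // filled from the in-memory dataset
        // ... populate values[r, c] from your dataset here ...

        Excel.Worksheet sheet = (Excel.Worksheet)Globals.ThisWorkbook.Worksheets["Data"];  // example sheet
        Excel.Range target = sheet.Range["A1"].Resize[rowCount, colCount];

        target.Value2 = values;               // one interop round-trip instead of 150,000
        target.NumberFormat = "#,##0.00";     // format the whole range at once, too
    }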
You do want a middle-tier application to talk to SQL Server and the various Excel instances. This allows you to do good things later on, like caching, load balancing, hot failover, etc. This can be a WCF service as you suggested, or a Windows service.
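As a sketch, the contract for such a middle tier might look something like the WCF interface below (the operation, type and member names are placeholders, not a prescribed design):

    using System.Collections.Generic;
    using System.Runtime.Serialization;
    using System.ServiceModel;

    [ServiceContract]
    public interface IWorkbookDataService
    {
        // Pull the full working set for the workbook in one call.
        [OperationContract]
        List<ProductRow> GetRows(string datasetName);

        // Push back only the rows that changed (the diff kept in memory client-side).
        [OperationContract]
        void SubmitChanges(string datasetName, List<ProductRow> changedRows);
    }

    [DataContract]
    public class ProductRow
    {
        [DataMember] public int Id { get; set; }
        [DataMember] public string Category { get; set; }
        [DataMember] public decimal Cost { get; set; }
    }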
If your spreadsheet file has to be downloaded from the server, you can use EPPlus on the server to generate the spreadsheet; it will be much faster than VSTO. Then you can use WCF from the add-in in the Excel app to upload the data. Reading data using a range will take much less time than writing if you don't have any formulas in your sheet. Also, in WCF you can batch the updates; updating 15,000 rows should take approximately 2-5 minutes for the entire operation.
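For instance, a minimal EPPlus sketch for building the downloadable workbook on the server (the sheet name and DataTable source are assumptions):

    using System.Data;
    using OfficeOpenXml;   // EPPlus

    public static class WorkbookBuilder
    {
        // Builds the downloadable workbook entirely on the server - no Excel interop involved.
        public static byte[] BuildWorkbook(DataTable rows)
        {
            using (var package = new ExcelPackage())
            {
                ExcelWorksheet sheet = package.Workbook.Worksheets.Add("Data");   // example sheet name

                // Headers plus up to ~15,000 rows written in one call.
                sheet.Cells["A1"].LoadFromDataTable(rows, true);

                return package.GetAsByteArray();   // stream this back from the web portal
            }
        }
    }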
I'm one of several analysts and data engineers working within a Snowflake database. We all often have to write small ad-hoc bits of code to check tables and views. These are often quite repetitive tasks (e.g. filtering data based on a certain reference, or joining FACT and DIM tables to add context).
I'd like to create a worksheet that we can all periodically add useful bits of code to, just to save us time making joins or to give us a good starting place for writing longer bits of code.
I've previously used SQL Server where I was able to save template files with useful bits of code. I could open these files directly within SQL Server so it was really easy to open, edit and run these files. What are some similar features in Snowflake?
Thanks
Yes, but you must use the new web UI, Snowsight. In the new UI you can share worksheets with team members and give them edit access. You can also import your worksheets from the classic UI.
Keep in mind that team editing doesn't work like Google Docs, where you can see each other typing, so be mindful of that. However, every version of the worksheet is automatically saved and available from the upper right of the interface.
https://docs.snowflake.com/en/user-guide/ui-snowsight-gs.html
https://docs.snowflake.com/en/user-guide/ui-snowsight-worksheets-gs.html
Can I use Master Data Services to import data via the Excel add-in, mainly measures (numbers/values)?
Short version:
Looking for the best way to comfortably input data into a SQL Server table with immediate feedback for the user.
Set-up:
We have a data warehouse (DWH) based on SQL Server 2012.
Everything is set up with the tools from the MS BI suite (SSIS, SSAS, SSRS and so on).
The departments access the BI cubes via Excel. They prefer to do everything in Excel if possible.
Most sources for the DWH are databases, but one use case has Excel files as a source.
Use case with Excel files as a source
As-Is:
We have several Excel files placed in a network folder.
Each Excel file is edited by a different user.
The files are ingested by an SSIS process looping through the files on a daily basis.
The contents of the Excel files look like this (fake data):
Header: Category | Product | Type | ... | Month   | abc_costs | xyz_costs | abc_budget | xyz_budget | ...
Data:   A        | Soup    | Beta | ... | 2017-06 | 16656     | 89233     | 4567       | 34333      | ...
Data Flow:
source.Excel -> 1.-> dwh.Stage -> 2.-> dwh.intermediateLayer -> 3.-> dwh.FactTable
Steps 1 to 3 are SSIS ETL packages.
Step 3 looks up the surrogate keys from the dimensions and saves them as foreign keys in the fact table, based on the "codes" provided by the Excel file (e.g. the code 'A' for Category).
Problems:
Step 1 "ingesting the Excel-files" is very error-prone.
Users can easily misstype the codes and numbers can be in the wrong
format.
Error messages regarding excel-sources are often missleading &
debugging Excel-sources in SSIS becomes a pain.
Sometimes Users leave Excel file open and a temporary Lock-File
blocks the whole ingestion process.
Requirements:
I want to avoid the problems that come up when ingesting Excel files.
It should be possible to validate data input and give quick feedback to the user.
As BI developers, we would prefer to avoid a solution that involves web development.
Excel-like input is preferred by the users.
Idea:
As Master Data Services comes with an Excel add-in that allows data manipulation, we thought it could be used for this data-input scenario as well. That would also give us the opportunity to test MDS at the same time.
But I'm not sure whether this use case fits Master Data Services.
Doing research, I could not find any MDS example showing how measures are entered via the Excel add-in (the samples are about modelling and managing entities).
Can anybody clarify whether this use case fits MDS?
If it does not fit MDS, what would be a good choice that fits into this BI ecosystem (preferably Excel-based)? [LightSwitch, InfoPath, PowerApps or, if there is no other option, web development; I am a bit confused about the options.]
Keep in mind, an Entity in MDS does not represent a table in the database. This means when you load data in MDS, there are underlying tables populated with the data and metadata to keep track of changes, for example.
Using the Excel plugin to import data into MDS and then exposing the data to another system can work, considering the following:
Volume of data. The Excel plugin handles large volumes in batches, so the process can become tedious.
Model setup. You need to configure the model properly, with the entities and attributes well defined. The MDS architecture is a 'pseudo data warehouse' where the entities can be considered 'facts' and the domain-based attributes 'dimensions'. This is an oversimplification of the system, but once you define a model you will understand what I mean.
A nice piece of functionality is subscription views. Once you have the data in MDS, you can expose it with subscription views, which combine entities with domain-based attributes in one view.
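Downstream code then just queries that view like any other table; a small sketch, with the connection string, view and column names invented for illustration:

    using System.Data.SqlClient;

    public static class SubscriptionViewReader
    {
        // Reads an MDS subscription view from downstream code.
        // Connection string, view and column names are made up for this example.
        public static void ReadCostLines()
        {
            const string sql = "SELECT [Code], [Name] FROM mdm.[CostLines_View]";

            using (var connection = new SqlConnection(@"Server=.;Database=MDS;Integrated Security=true"))
            using (var command = new SqlCommand(sql, connection))
            {
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // e.g. hand these rows to the staging / fact load
                        string code = reader.GetString(0);
                        string name = reader.GetString(1);
                    }
                }
            }
        }
    }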
Considering your requirements:
I want to avoid the problems that come up when ingesting Excel files.
This is possible; just keep in mind that the Excel plugin has its own rules. So Excel effectively becomes the 'input form' of MDS, where data is input and committed. The user will need to have a connection set up to MDS using the credential manager, etc.
It should be possible to validate data input and give quick feedback to the user.
This can easily be handled with domain-based attributes and business rules.
As BI developers, we would prefer to avoid a solution that involves web development. Excel-like input is preferred by the users.
Keep in mind, the MDS plugin determines how the Excel sheet looks and feels; no customization is possible. So your entity definitions need to be correct to facilitate a good user experience.
I have worked on a DWH project in which an MDS instance was used as a single source of truth for many dimensions. Most of the data was fairly static and read-only (lists of states, countries, currencies, etc.) and was maintained via the Excel plug-in. There was also some more volatile data which was imported via MDS import procedures.
In order to expose the MDS data to the warehouse, views were created that pointed directly to the MDS database. I have even written a SQL script that refreshed these views, depending on the MDS metadata and settings stored in the warehouse. Unfortunately, I don't have it around anymore, but it's all quite transparent there.
Everything was very much alive. Can't recall any problems with queries that involved these MDS views.
I have a problem statement.
A client of mine has three years of data in very complicated Excel sheets. They are currently doing data entry, reporting and basically everything in Excel, and these files are shared with each other whenever there is a need to share the data.
They want eventually to move to a web based solution.
The first agreed step of the solution is to build a routine and a database. The routine will import the data from the Excel files into the database. The database itself will need to be designed in such a way that it can later be used as the back end for a website.
The routine will be used as a one-time activity to import all the data for the last three years, and then on a periodic basis so that they can keep importing data until the web front end is ready or agreed upon.
The default method for me would be to analyze the Excel sheets, design a normalized database, and write a Java service that uses POI to read the Excel sheets, apply the business logic and insert the data into the database. The problem here is that any time they change the Excel layout or even the data format, the routine will need to be rewritten.
I am open to any approach, maybe even an ETL tool.
Please recommend.
I'm building an application that will use Access as a back end and will rely on importing Excel sheets. There will be a lot of reporting as well.
The app will only be used by one or two people.
Would you build this in Access forms? I also know Winforms and C#.
What criteria would you use for your decision making? Why would you choose one approach over another? What more information would you need to make a decision?
When considering an Access solution, the number of people using the application matters for data storage and retrieval. The front-end pieces should be segregated into a separate db file, and each user should have their own copy of that front-end db file; the back-end db file contains only data. You should view that type of split as an absolute requirement if there is any possibility that more than one user will ever use the application at the same time.
If a split Access application is unacceptable, use something other than Access for the front-end part.
As far as the suitability of Access for the front-end, you haven't described any features which could not be quickly and easily implemented with Access features. Notice you didn't provide any details about the purpose of the forms. For all we know, they may be used only to drive imports from Excel and select among reports, perhaps with form fields to select criteria for those reports. If that's all you need, it is an almost trivial task in Access.
Hopefully someone has been down this road before and can offer some sound advice as far as which direction I should take. I am currently involved in a project in which we will be utilizing a custom database to store data extracted from Excel files based on pre-established templates (to maintain consistency). We currently have a process (written in C# .NET 2008) that can extract the necessary data from the spreadsheets and import it into our custom database.
What I am primarily interested in is figuring out the best method for integrating that process with our portal. What I would like to do is let SharePoint keep track of the metadata about the spreadsheet itself and let the custom database keep track of the data contained within the spreadsheet. So, one thing I need is a way to link spreadsheets from SharePoint to the custom database and vice versa. As these spreadsheets will be updated periodically, I need a tried-and-true way of ensuring that the data remains synchronized between SharePoint and the custom database.
I am also interested in finding out how to use the data from the custom database to create reports within the SharePoint portal. Any and all information will be greatly appreciated.
I have actually written a similar system in SharePoint for a large Financial institution as well.
The way we approached it was to have an event receiver on the Document library. Whenever a file was uploaded or updated the event receiver was triggered and we parsed through the data using Aspose.Cells.
The key to matching data in the excel sheet with the data in the database was a small header in a hidden sheet that contained information about the reporting period and data type. You could also use the SharePoint Item's unique ID as a key or the file's full path. It all depends a bit on how the system will be used and your exact requirements.
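Roughly, such an event receiver could look like the sketch below; the hidden sheet name, cell positions and the use of ItemUpdated are illustrative rather than the exact implementation we used:

    using System.IO;
    using Aspose.Cells;
    using Microsoft.SharePoint;

    public class SpreadsheetImportReceiver : SPItemEventReceiver
    {
        // Fires after a file is updated in the document library (ItemAdded covers uploads).
        public override void ItemUpdated(SPItemEventProperties properties)
        {
            SPFile file = properties.ListItem.File;

            using (Stream stream = file.OpenBinaryStream())
            {
                var workbook = new Workbook(stream);

                // Hidden sheet with a small header identifying the upload (example layout).
                Worksheet meta = workbook.Worksheets["Meta"];
                string reportingPeriod = meta.Cells["B1"].StringValue;
                string dataType = meta.Cells["B2"].StringValue;

                // SharePoint's own unique ID can serve as the link back to the item.
                string itemKey = properties.ListItem.UniqueId.ToString();

                // ... parse the data sheets and push rows + itemKey into the custom database ...
            }
        }
    }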
I think this might be awkward. The Business Data Catalog (BDC) functionality will enable you to tightly integrate with your database, but simultaneously trying to remain perpetually in sync with a separate spreadsheet might be tricky. I guess you could do it by catching the update events for the document library that handles the spreadsheets themselves and subsequently pushing the right info into your database. If you're going to do that, though, it's not clear to me why you can't choose just one or the other:
Spreadsheets in a document library, or
BDC integration with your database
If you go with #1, then you still have the ability to search within the documents themselves and updating them is painless. If you go with #2, you don't have to worry about sync'ing with an actual sheet after the initial load, and you could (for example) create forms as needed to allow people to modify the data.
Also, depending on your use case, you might benefit from the MOSS server-side Excel services. I think the "right" decision here might require more information about how you and your team expect to interact with these sheets and this data after it's initially uploaded into your SharePoint world.
So... I'm going to assume that you are leveraging Excel because it is an easy way to define, build, and test the math required. Your spreadsheet has a set of input data elements, a bunch of math, and then some output elements. Have you considered using Excel Services? In this scenario you would avoid running a batch process to generate your output elements. Instead, you can call Excel Services directly in SharePoint and run through your calculations. More information is available online.
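As a rough sketch of that flow, using a proxy generated from the Excel Web Services endpoint (ExcelService.asmx); the workbook path, sheet and range names are placeholders, and the exact proxy signatures depend on the generated reference:

    // Proxy generated from http://<server>/_vti_bin/ExcelService.asmx
    // (type and method names below come from that generated web reference).
    public static object RunCalculation()
    {
        var service = new ExcelService();
        service.Credentials = System.Net.CredentialCache.DefaultCredentials;

        Status[] status;
        string sessionId = service.OpenWorkbook(
            "http://portal/Docs/CalcModel.xlsx", "en-US", "en-US", out status);

        // Push the inputs, recalculate, then read the outputs back.
        service.SetCellA1(sessionId, "Inputs", "B2", 42);
        service.CalculateA1(sessionId, "Outputs", "B10");
        object result = service.GetCellA1(sessionId, "Outputs", "B10", true, out status);

        service.CloseWorkbook(sessionId);
        return result;
    }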
You can also surface information in SharePoint directly from the spreadsheet. For example, if you have a graph in the spreadsheet, you can link to that graph and expose it. When the data changes, so does the graph.
There are also some High Performance Computing (HPC) Excel options coming out from Microsoft in the near future. If your spreadsheet is really, really big then the Excel Services route might not work. There is some information available online (search for HPC excel - I can't post the link).