Custom Database integration with MOSS 2007

Hopefully someone has been down this road before and can offer some sound advice about which direction I should take. I am currently involved in a project in which we will be using a custom database to store data extracted from Excel files based on pre-established templates (to maintain consistency). We currently have a process (written in C#.Net 2008) that extracts the necessary data from the spreadsheets and imports it into our custom database.

What I am primarily interested in is figuring out the best method for integrating that process with our portal. I would like SharePoint to keep track of the metadata about the spreadsheet itself and let the custom database keep track of the data contained within the spreadsheet. So, one thing I need is a way to link spreadsheets in SharePoint to the custom database and vice versa. As these spreadsheets will be updated periodically, I also need a tried-and-true way of ensuring that the data remains synchronized between SharePoint and the custom database. Finally, I am interested in finding out how to use the data from the custom database to create reports within the SharePoint portal. Any and all information will be greatly appreciated.

I have actually written a similar system in SharePoint for a large financial institution.
The way we approached it was to have an event receiver on the document library. Whenever a file was uploaded or updated, the event receiver fired and we parsed the data using Aspose.Cells.
The key to matching data in the Excel sheet with the data in the database was a small header in a hidden sheet that contained information about the reporting period and data type. You could also use the SharePoint item's unique ID as a key, or the file's full path. It all depends a bit on how the system will be used and on your exact requirements.
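As a rough illustration of the event-receiver approach, here is a minimal sketch of a MOSS 2007 (WSS 3.0) item event receiver. The SpreadsheetImporter helper is hypothetical; it stands in for whatever parsing/upsert logic already exists in the C# import process.

```csharp
using System;
using Microsoft.SharePoint;

// Bound to the document library; fires after a spreadsheet is added or updated.
public class SpreadsheetImportReceiver : SPItemEventReceiver
{
    public override void ItemAdded(SPItemEventProperties properties)
    {
        ImportToDatabase(properties.ListItem);
    }

    public override void ItemUpdated(SPItemEventProperties properties)
    {
        ImportToDatabase(properties.ListItem);
    }

    private static void ImportToDatabase(SPListItem item)
    {
        // The item's UniqueId (a GUID) is a convenient link between the
        // SharePoint document and the rows in the custom database.
        Guid sharePointKey = item.UniqueId;
        byte[] fileBytes = item.File.OpenBinary();

        // Hypothetical helper: parse the workbook (e.g. with Aspose.Cells or
        // your existing C# routine) and upsert the rows keyed by sharePointKey.
        SpreadsheetImporter.Import(sharePointKey, fileBytes);
    }
}
```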

I think this might be awkward. The Business Data Catalog (BDC) functionality will enable you to tightly integrate with your database, but simultaneously trying to remain perpetually in sync with a separate spreadsheet might be tricky. I guess you could do it by catching the update events for the document library that handles the spreadsheets themselves and subsequently pushing the right info into your database. If you're going to do that, though, it's not clear to me why you can't choose just one or the other:
1. Spreadsheets in a document library, or
2. BDC integration with your database
If you go with #1, you still have the ability to search within the documents themselves, and updating them is painless. If you go with #2, you don't have to worry about syncing with an actual sheet after the initial load, and you could (for example) create forms as needed to allow people to modify the data.
Also, depending on your use case, you might benefit from the MOSS server-side Excel services. I think the "right" decision here might require more information about how you and your team expect to interact with these sheets and this data after it's initially uploaded into your SharePoint world.

So... I'm going to assume that you are leveraging Excel because it is an easy way to define, build, and test the math required. Your spreadsheet has a set of input data elements, a bunch of math, and then some output elements. Have you considered using Excel Services? In this scenario you would avoid running a batch process to generate your output elements. Instead, you can call Excel Services directly in SharePoint and run through your calculations. More information is available online.
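For context, MOSS 2007 Excel Services can be driven from code through its Excel Web Services endpoint (_vti_bin/ExcelService.asmx). The sketch below assumes a web reference has been added to that endpoint and that the generated proxy class is called ExcelService; the site URL, sheet names and cell addresses are placeholders, and exact member signatures can differ slightly depending on how the proxy is generated.

```csharp
using System.Net;

class ExcelCalcClient
{
    // Opens the workbook on the server, pushes an input, recalculates,
    // and reads an output cell back, all without opening Excel locally.
    public static object Calculate(string workbookUrl, double inputValue)
    {
        var svc = new ExcelService
        {
            Url = "http://portal/_vti_bin/ExcelService.asmx",   // placeholder site
            Credentials = CredentialCache.DefaultCredentials
        };

        Status[] status;
        string sessionId = svc.OpenWorkbook(workbookUrl, "en-US", "en-US", out status);
        try
        {
            svc.SetCellA1(sessionId, "Inputs", "B2", inputValue);                   // placeholder cell
            svc.CalculateWorkbook(sessionId, CalculateType.Recalculate);
            return svc.GetCellA1(sessionId, "Outputs", "B10", true, out status);    // placeholder cell
        }
        finally
        {
            svc.CloseWorkbook(sessionId);
        }
    }
}
```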
You can also surface information in SharePoint directly from the spreadsheet. For example, if you have a graph in the spreadsheet, you can link to that graph and expose it. When the data changes, so does the graph.
There are also some High Performance Computing (HPC) Excel options coming out from Microsoft in the near future. If your spreadsheet is really, really big then the Excel Services route might not work. There is some information available online (search for HPC excel - I can't post the link).

Can I use Master Data Services to import data via the Excel add-in? Mainly measures (numbers/values)

Short version:
Looking for the best way to comfortably input data into a SQL Server table with immediate feedback for the user.
Set-up:
We have a data warehouse (DWH) based on SQL Server 2012.
Everything is set up with the tools from the MS BI suite (SSIS, SSAS, SSRS and so on).
The departments access the BI cubes via Excel. They prefer to do everything in Excel if possible.
Most sources for the DWH are databases, but one use case has Excel files as a source.
Use case with Excel files as a source
As-is:
We have several Excel files placed in a network folder.
Each Excel file is edited by a different user.
The files are ingested by an SSIS process looping through the files on a daily basis.
The contents of the Excel files look like this (fake data):
Header: Category | Product | Type | ... | Month   | abc_costs | xyz_costs | abc_budget | xyz_budget | ...
Data:    A        | Soup    | Beta | ... | 2017-06 | 16656     | 89233     | 4567       | 34333      | ...
Data Flow:
source.Excel -> (1) -> dwh.Stage -> (2) -> dwh.IntermediateLayer -> (3) -> dwh.FactTable
Steps 1 to 3 are SSIS ETL packages.
Step 3 looks up the surrogate keys from the dimensions and saves them as foreign keys in the fact table, based on the "codes" provided by the Excel file (a code can be, e.g., 'A' for Category).
Problems:
Step 1, "ingesting the Excel files", is very error-prone.
Users can easily mistype the codes, and numbers can be in the wrong format.
Error messages regarding Excel sources are often misleading, and debugging Excel sources in SSIS becomes a pain.
Sometimes users leave an Excel file open, and the temporary lock file blocks the whole ingestion process.
Requirements
I want to avoid the problems that come up when ingesting Excel files.
It should be possible to validate data input and give quick feedback to the user.
As BI developers, we will try to avoid a solution that would involve web development in the first place.
Excel-like input is preferred by the users.
Idea:
As Master Data Services comes with an Excel add-in that allows data manipulation, we thought it could be used for this data-input scenario as well. That would also give us the opportunity to test MDS at the same time.
But I'm not sure whether this use case fits Master Data Services.
Doing some research, I could not find any MDS example showing how measures are entered via the Excel add-in (the samples are about modelling and managing entities).
Can anybody clarify whether this use case fits MDS?
If it does not fit MDS, what would be a good choice that fits into this BI ecosystem (preferably Excel-based)? [LightSwitch, InfoPath, PowerApps, or, if there is no other option, web development; I am a bit confused about the options]
Keep in mind that an entity in MDS does not represent a table in the database. This means that when you load data into MDS, there are underlying tables populated with the data, plus metadata to keep track of changes, for example.
Using the Excel plugin to import data into MDS and then exposing the data to another system can work, considering the following:
Volume of data. The Excel plugin handles large volumes in batches, so the process can become tedious.
Model setup. You need to configure the model properly, with the entities and attributes well defined. The MDS architecture is a 'pseudo data warehouse', where the entities can be considered 'facts' and the domain-based attributes 'dimensions'. This is an oversimplification of the system, but once you define a model you will understand what I mean.
A nice piece of functionality is subscription views. Once you have the data in MDS, you can expose it with subscription views, which combine entities with their domain-based attributes in one view.
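Once a subscription view has been defined, it is just a view in the MDS database, so it can be read by an SSIS data flow or by any .NET code. A minimal sketch, assuming a hypothetical subscription view named mdm.CostInput with the columns shown (your view and column names will differ):

```csharp
using System;
using System.Data.SqlClient;

class SubscriptionViewReader
{
    public static void Dump(string connectionString)
    {
        // mdm.CostInput is a hypothetical subscription view exposing the
        // measures entered via the Excel add-in.
        const string sql =
            "SELECT Code, Name, Month, abc_costs, xyz_costs FROM mdm.CostInput";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // In practice these rows would feed the DWH staging layer
                    // instead of the Excel-file loop in SSIS.
                    Console.WriteLine("{0}\t{1}", reader["Code"], reader["abc_costs"]);
                }
            }
        }
    }
}
```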
Considering your requirements:
I want to avoid the problems coming up when ingesting Excel-files.
This is possible; just keep in mind that the Excel plugin has its own rules. Excel effectively becomes the 'input form' of MDS, where data is entered and committed. The user will need to have a connection set up to MDS, using the credential manager, etc.
It should be possible to validate data input and give quick feedback to the user.
This can easily be handled with domain-based attributes and business rules.
As BI developers, we will try to avoid a solution that would involve web development in the first place. Excel-like input is preferred by the users.
Keep in mind that the MDS plugin determines how the Excel sheet looks and feels; no customization is possible. So your entity definitions need to be correct to facilitate a good user experience.
I have worked on a DWH project in which an MDS instance was used as the single source of truth for many dimensions. Most of the data was fairly read-only (lists of states, countries, currencies, etc.) and was maintained via the Excel plug-in. There was also some more volatile stuff which was imported via MDS import procedures.
In order to expose the MDS data to the warehouse, views were created that pointed directly to the MDS database. I even wrote a SQL script that refreshed these views, depending on the MDS metadata and settings stored in the warehouse. Unfortunately, I don't have it around anymore, but it's all quite transparent there.
Everything was very much alive; I can't recall any problems with queries that involved these MDS views.

Database from Excel Sheets

I have a problem statement.
A client of mine has three years of data in very complicated Excel sheets. They are currently doing data entry, reporting and basically everything else in Excel, and these Excel files are shared with each other whenever there is a need to share the data.
They eventually want to move to a web-based solution.
The first step of the agreed solution is to build a routine and a database. The routine will import the data from the Excel files into the database. The database itself needs to be designed in such a way that it can later serve as the back end for a web site.
The routine will be used as a one-time activity to import all the data for the last three years, and then on a periodic basis, so that they keep importing data until the web front end is ready or agreed upon.
The de facto method for me is to analyze the Excel sheets, design a normalized database, and write a service in Java that uses POI to read the Excel sheets, apply business logic, and insert the data into the database. The problem here is that any time they change the Excel layout, or even the data format, the routine will need to be rewritten.
I am willing to go any way, maybe use an ETL tool.
Please recommend.

How to achieve High Performance Excel VSTO to SQL Server?

I'm working on an Excel 2010 VSTO solution (doing code-behind for an Excel workbook in Visual Studio 2010) and am required to interact with a centralised SQL Server 2008 R2 data source for both read and write operations.
The database will contain up to 15,000 rows in the primary table plus related rows. Ideally, the spreadsheets will be populated from the database, used asynchronously, and then uploaded to update the database. I'm concerned about performance around the volume of data.
The spreadsheet would be made available for download via a Web Portal.
I have considered two solutions so far:
A WCF Service acting as a data access layer, which the workbook queries in-line to populate its tables with the entire requisite data set. Changes would be posted to the WCF service once updates are triggered from within the workbook itself.
Pre-loading the workbook with its own data in hidden worksheets on download from the web portal. Changes would be saved by uploading the modified workbook via the web portal.
I'm not too fussed about optimisation until we have our core functionality working, but I don't want to close myself off to any good options in terms of architecture down the track.
I'd like to avoid a scenario where we have to selectively work with small subsets of data at a time to avoid slowdowns; integrating that kind of behaviour into a spreadsheet sounds like needless complexity.
Perhaps somebody with some more experience in this area can recommend an approach that won't shoot us in the foot?
Having done similar:
Make sure that all of your data access code runs on a background thread. Excel functions and add-ins (mostly) operate on the UI thread, and you want your UI to stay responsive. Marshalling is non-trivial (in Excel '03 it required some P/Invoke; that may have changed in '10), but possible.
Make your interop operations as chunky as possible. Each interop call has significant overhead, so if you are formatting programmatically, format as many cells as possible at once (using ranges). This advice also applies to data changes: you only want diffs updating the UI, which means keeping a copy of your dataset in memory to detect the diffs. If lots of UI updates come in at once, you may want to insert some artificial pauses (throttling) to let the UI thread show progress as you are updating.
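To illustrate the "chunky" interop point, here is a minimal sketch that writes a whole block of values with a single interop call instead of one call per cell. The worksheet reference and the shape of the data are placeholders.

```csharp
using Excel = Microsoft.Office.Interop.Excel;

static class BulkWriter
{
    // Writes a 2D block of values starting at A1 with one interop round-trip.
    public static void WriteBlock(Excel.Worksheet sheet, object[,] values)
    {
        int rows = values.GetLength(0);
        int cols = values.GetLength(1);

        Excel.Range target = sheet.Range[
            sheet.Cells[1, 1],
            sheet.Cells[rows, cols]];

        // One crossing of the interop boundary for rows * cols cells,
        // instead of rows * cols separate calls.
        target.Value2 = values;
    }
}
```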
You do want a middle-tier application to talk to SQL Server and the various Excel instances. This allows you to do good things later on, like caching, load balancing, hot failover, etc. This can be a WCF service, as you suggested, or a Windows service.
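For the WCF option, the contract can stay very small. The sketch below is only an assumption about what the service might expose; the type and operation names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public class WorkbookRow          // hypothetical row shape
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Product { get; set; }
    [DataMember] public decimal Amount { get; set; }
}

[ServiceContract]
public interface IWorkbookDataService
{
    // Pull the data set (or just the rows changed since the last sync).
    [OperationContract]
    List<WorkbookRow> GetRows(int lastSyncVersion);

    // Push only the modified rows back in one batch.
    [OperationContract]
    void SaveChanges(List<WorkbookRow> changedRows);
}
```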
If the spreadsheet file has to be downloaded from the server, you can use EPPlus on the server to generate the spreadsheet; it will be much faster than VSTO. Then you can use WCF from the add-in in the Excel app to upload the data. Reading data using a range takes much less time than writing, provided you don't have any formulas in your sheet. Also, in WCF you can batch the updates; updating 15,000 rows should take approximately 2 to 5 minutes for the entire operation.
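For reference, server-side generation with EPPlus stays short. This is a minimal sketch, assuming the classic OfficeOpenXml API and a DataTable as the source of the rows; the worksheet name is arbitrary.

```csharp
using System.Data;
using OfficeOpenXml;   // EPPlus

static class WorkbookBuilder
{
    // Builds the downloadable workbook on the server from a DataTable.
    public static byte[] Build(DataTable rows)
    {
        using (var package = new ExcelPackage())
        {
            ExcelWorksheet sheet = package.Workbook.Worksheets.Add("Data");

            // Dump the whole table in one call, including a header row.
            sheet.Cells["A1"].LoadFromDataTable(rows, true);

            // Stream these bytes back through the web portal download.
            return package.GetAsByteArray();
        }
    }
}
```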

Platform for an Access 2007 DB that may import Excel and will require reports

I'm building an application that will use Access as a back end and will rely on importing Excel sheets. There is lots of reporting as well.
The app will only be used by one or two people.
Would you build this in Access forms? I also know Winforms and C#.
What criteria would you use for your decision making? Why would you choose one approach over another? What more information would you need to make a decision?
When considering an Access solution, the number of people using the application is an issue for data storage and retrieval. The front-end pieces should be segregated into a separate db file, and each user should have their own copy of that front-end db file; the back-end db file contains only data. You should view that type of split as an absolute requirement if there is any possibility that more than one user will ever use the application at the same time.
If a split Access application is unacceptable, use something other than Access for the front-end part.
As far as the suitability of Access for the front end goes, you haven't described any features which could not be quickly and easily implemented with Access. Note that you didn't provide any details about the purpose of the forms. For all we know, they may be used only to drive imports from Excel and to select among reports, perhaps with form fields to select criteria for those reports. If that's all you need, it is an almost trivial task in Access.

Updating data from several different sources

I'm in the process of setting up a database with customer information. The database will handle customer data (customer ID, address, phone number, etc.) as well as some basic information about which kind of advertisement a specific customer has been subjected to, and how they reacted to it.
The data will be maintained from a central data warehouse, but additional information about customers and the advertisements will also be updated from other sources. For example, if an external advertising agency runs a campaign, I want them to be able to feed back data about opt-outs, e-mail bounces, etc. I guess what I need is an API which can easily be handed out to any number of agencies.
My first thought was to set up a web service API for all external sources, but since we'll probably be talking large amounts of data (millions of records per batch) I'm not sure a web service is the best option.
So my question is, what's the best practice here? I need a solution simple enough for advertisement agencies (likely with moderately skilled IT-people) to make use of. Simplicity is of the essence – by which I mean “simplicity over performance” in this case. If the set up gets too complex, it won't work.
The system will very likely be based on Microsoft technology.
Any suggestions?
The process you're describing is commonly referred to as data integration using ETL processes. ETL stands for Extract, Transform, Load. The idea is to build up your central data warehouse by extracting information from a lot of different data sources, transforming it, and then loading it into your data warehouse.
A variety of (also graphical) tools exist to implement such a process. Since you said you'll probably be running a Microsoft stack, I suggest having a look at SQL Server Integration Services (SSIS).
Regarding your suggestion to implement the integration using a web service, I don't think that's a good idea. Similarly, I don't think shifting the burden of data integration onto your customers is a good idea either. You should agree with your customers on some form of data exchange format; it could be as simple as a CSV file, or XML, Excel sheets, Access databases: use whatever suits your needs.
Any modern ETL tool like SSIS is capable of working with those different data sources.
