I have a large dataset, say 1,000,000,000 rows, that lives on a server. I need a user to be able to consume (i.e. "run queries upon") that data seamlessly, over the web, from within Access and/or Excel. Additionally, I need to filter the data on the server-side according to the user connected to it.
My current approach is to create a webservice that looks like an ODBC data source and connect to it from Excel.
Questions:
Is this the best way?
If so, what's the best way to create a custom ODBC data source?
I really think that it is not the best way. I don't know your scenario, but I would prefer another approach.
There is a discussion about that: Creating a custom ODBC driver
One of the suggestions there was to use a BI approach instead.
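For example, one way to go (a sketch, not a drop-in solution) is to publish the data as an OData feed with WCF Data Services and let Excel consume it through PowerPivot or the OData data-feed import. The entity names, the OwnerLogin column, and the SalesEntities EF context below are placeholders for whatever your model looks like; the query interceptor is what does the per-user filtering on the server:

using System;
using System.Linq.Expressions;
using System.Data.Services;
using System.Data.Services.Common;
using System.Web;

// Sketch of an OData endpoint that Excel/PowerPivot can consume over HTTP.
// "SalesEntities", the "Orders" set, and OwnerLogin are hypothetical names.
public class SalesDataService : DataService<SalesEntities>
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        // Expose only what the user is allowed to query, and cap the page size
        // so a 1,000,000,000-row table is streamed in manageable chunks.
        config.SetEntitySetAccessRule("Orders", EntitySetRights.AllRead);
        config.SetEntitySetPageSize("Orders", 500);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
    }

    // Applied automatically to every query against Orders: rows are filtered
    // on the server according to the identity of the connected user.
    [QueryInterceptor("Orders")]
    public Expression<Func<Order, bool>> OnQueryOrders()
    {
        string user = HttpContext.Current.User.Identity.Name;
        return o => o.OwnerLogin == user;
    }
}

Excel then queries the feed over HTTP and pages through the results, so nothing ODBC-specific has to be installed or written on the client.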
I've been scouring for information on how to add and query data from an ACCDB within Excel VBA. I've come across many answers: OpenDatabase() (from my coworker), database connections, and using an Access.Application object. What I couldn't figure out is whether there is a benefit to using the Access object instead of creating a connection to the database with a connection string and so on. I did read that with the Access Application object I didn't need to have the Access engine on the computer running the VBA, and I opted to do this for that reason. Plus, it looked a lot simpler than using a connection string and going that route. I've implemented the Access object and it worked like a charm. So my question is: what are the benefits or disadvantages of the Access object approach versus doing it another way? Thanks all!
Is the 10k an incremental addition to the DB, or is your CSV input increasing by 10k?
If it's the former, then yes, storing it in a database is a good idea, and I would use the DAO route. You'll notice that not many people are fans of firing up the Access application from Excel, mainly because you're not really using MS Access's features (it's much more than a data store).
As an alternative, skip Excel and put your macro inside Access, since you have the app. There are a lot of goodies in Access that you can take advantage of.
However, if your CSV always arrives at full volume (the whole dataset each time), you may just want to process the data yourself within Excel/VBA. I assume that the "other" table is a reference table.
For my new project I'm planning to use JSON data stored as text files rather than fetching data from the database. My concept is to save a JSON file on the server whenever the admin creates a new entry in the database.
As there is no security issue here, will this approach make user access to the data faster, or should I go with the usual database queries?
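To make the concept concrete, the write-through step I'm imagining looks roughly like this (the Article type, the output folder, and the Json.NET serializer are just placeholders, not a final design):

using System.IO;
using Newtonsoft.Json;

// Hypothetical write-through step: after the admin saves a row to the database,
// the same record is serialized to a JSON file that the site can serve statically.
public class Article
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
}

public static class JsonExporter
{
    public static void Export(Article article, string outputFolder)
    {
        string json = JsonConvert.SerializeObject(article, Formatting.Indented);
        string path = Path.Combine(outputFolder, article.Id + ".json");
        File.WriteAllText(path, json); // overwritten whenever the admin edits the entry
    }
}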
JSON is typically used as a way to format the data for the purpose of transporting it somewhere. Databases are typically used for storing data.
What you've described may be perfectly sensible, but you really need to say a little bit more about your project before the community can comment on your approach.
What's the pattern of access? Is it always read-only for the user, editable only by site administrator for example?
You shouldn't worry about performance early on. Worry more about ease of development, maintenance, and reliability; you can always optimise afterwards.
You may want to look at http://www.mongodb.org/. MongoDB is a document-centric store that uses JSON as its storage format.
JSON in combination with jQuery is a great option for fast, smooth web page updates, but ultimately it still comes down to the same database query.
Just make sure your query is efficient. Use a stored proc.
JSON is just the way the data is sent from the server (a web controller in MVC, or code-behind in standard C#) to the client (jQuery or JavaScript).
Ultimately the database will be queried the same way.
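A minimal MVC sketch of what that looks like (the repository type here is a stand-in for whatever data access you already have):

using System.Collections.Generic;
using System.Web.Mvc;

// "ArticlesRepository" stands in for your existing data-access layer;
// its GetLatest call is where the normal database query (or stored proc) runs.
public class ArticlesRepository
{
    public IEnumerable<object> GetLatest(int count)
    {
        // e.g. run a stored procedure via ADO.NET or an ORM and map the rows
        return new List<object>();
    }
}

public class ArticlesController : Controller
{
    private readonly ArticlesRepository _repository = new ArticlesRepository();

    public JsonResult Latest()
    {
        var articles = _repository.GetLatest(10);             // the usual DB query
        return Json(articles, JsonRequestBehavior.AllowGet);  // JSON is only the transport format
    }
}

Swapping a view for Json() changes the wire format, not the amount of work the database does.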
You should stick with the classic method (a database), because otherwise you'll face many problems with concurrency and with having too many files to handle.
I think you should go with the usual database queries.
If you use JSON files you'll have to sync them with the DB (which means extra work) and you'll face I/O problems (if your site is super busy).
I'm building an application that will use Access as a back end and will rely on importing Excel sheets. Lots of reporting as well.
The app will only be used by one or two people.
Would you build this in Access forms? I also know Winforms and C#.
What criteria would you use for your decision making? Why would you choose one approach over another? What more information would you need to make a decision?
When considering an Access solution, the number of people using the application is mainly an issue of data storage and retrieval. The front-end pieces should be segregated into a separate db file, and each user should have their own copy of that front-end db file; the back-end db file contains only the data. You should view that type of split as an absolute requirement if there is any possibility that more than one user will ever use the application at the same time.
If a split Access application is unacceptable, use something other than Access for the front-end part.
As far as the suitability of Access for the front end goes, you haven't described any features that could not be quickly and easily implemented with Access. Note that you haven't provided any details about the purpose of the forms. For all we know, they may be used only to drive imports from Excel and to select among reports, perhaps with form fields to select criteria for those reports. If that's all you need, it is an almost trivial task in Access.
Would it be possible to write a generic service to expose the contents of a number of SQLite databases, without knowing the structure of the databases at design-time?
I've been reading this series of blog posts about custom data service providers; would this seem like a valid starting point?
If this is possible, would it be possible for us to be able to display the contents of a particular table in the SQLite database in a Silverlight client within a grid?
The purpose of this project is to allow our users to navigate the contents of the SQLite databases, in the same way as using a native query tool.
Yes, you can write a custom data service provider to do this. When WCF Data Services asks for metadata, you can look at the table schema and return whatever the structure of the table happens to be. In fact, you can even change the metadata across requests if changes are made to the underlying table.
Here's the link that should help you get started: http://blogs.msdn.com/b/alexj/archive/2010/01/07/data-service-providers-getting-started.aspx
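As a very rough illustration of the metadata step (using the System.Data.SQLite ADO.NET provider; the database path is a placeholder), the provider can discover tables and columns at request time instead of design time:

using System;
using System.Data;
using System.Data.SQLite;

// Rough sketch of runtime schema discovery for a custom data service provider.
public static class SqliteSchemaReader
{
    public static void DumpSchema(string databasePath)
    {
        using (var connection = new SQLiteConnection("Data Source=" + databasePath))
        {
            connection.Open();

            // One row per table in the database.
            DataTable tables = connection.GetSchema("Tables");
            foreach (DataRow table in tables.Rows)
            {
                string tableName = (string)table["TABLE_NAME"];
                Console.WriteLine(tableName);

                // PRAGMA table_info lists each column's name and declared type,
                // which is what you would map to resource types/properties.
                // The table name comes from GetSchema, so concatenation is safe here.
                using (var command = new SQLiteCommand("PRAGMA table_info(" + tableName + ")", connection))
                using (SQLiteDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine("  {0} ({1})", reader["name"], reader["type"]);
                }
            }
        }
    }
}

In a real provider you would build the resource types and properties from those columns rather than printing them.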
Hope this helps. Thanks, Pratik
Hopefully someone has been down this road before and can offer some sound advice on which direction I should take. I am currently involved in a project in which we will be utilizing a custom database to store data extracted from Excel files based on pre-established templates (to maintain consistency). We currently have a process (written in C#.Net 2008) that can extract the necessary data from the spreadsheets and import it into our custom database.

What I am primarily interested in is figuring out the best method for integrating that process with our portal. What I would like to do is let SharePoint keep track of the metadata about the spreadsheet itself and let the custom database keep track of the data contained within the spreadsheet. So, one thing I need is a way to link spreadsheets in SharePoint to the custom database and vice versa. As these spreadsheets will be updated periodically, I need a tried and true way of ensuring that the data remains synchronized between SharePoint and the custom database.

I am also interested in finding out how to use the data from the custom database to create reports within the SharePoint portal. Any and all information will be greatly appreciated.
I have actually written a similar system in SharePoint for a large financial institution as well.
The way we approached it was to have an event receiver on the document library. Whenever a file was uploaded or updated, the event receiver was triggered and we parsed through the data using Aspose.Cells.
The key to matching data in the excel sheet with the data in the database was a small header in a hidden sheet that contained information about the reporting period and data type. You could also use the SharePoint Item's unique ID as a key or the file's full path. It all depends a bit on how the system will be used and your exact requirements.
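A stripped-down sketch of that receiver is below; the hidden "Meta" sheet name, its cell addresses, and the .xlsx check are placeholders, and the actual database import is omitted:

using System.IO;
using Aspose.Cells;
using Microsoft.SharePoint;

// Rough sketch of the document-library event receiver described above.
// Aspose.Cells parses the uploaded workbook; the hidden header sheet keys the DB import.
public class SpreadsheetImportReceiver : SPItemEventReceiver
{
    public override void ItemAdded(SPItemEventProperties properties)
    {
        ImportWorkbook(properties);
    }

    public override void ItemUpdated(SPItemEventProperties properties)
    {
        ImportWorkbook(properties);
    }

    private void ImportWorkbook(SPItemEventProperties properties)
    {
        SPFile file = properties.ListItem.File;
        if (file == null || !file.Name.EndsWith(".xlsx"))
            return;

        using (Stream stream = file.OpenBinaryStream())
        {
            var workbook = new Workbook(stream);

            // The hidden header sheet identifies the reporting period and data type,
            // which is how rows in the custom database are matched back to this file.
            Worksheet meta = workbook.Worksheets["Meta"];
            string reportingPeriod = meta.Cells["B1"].StringValue;
            string dataType = meta.Cells["B2"].StringValue;

            // The SharePoint item's unique ID works as the link back to the document.
            string itemId = properties.ListItem.UniqueId.ToString();

            // ... walk the data sheets and push the values into the custom database ...
        }
    }
}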
I think this might be awkward. The Business Data Catalog (BDC) functionality will enable you to tightly integrate with your database, but simultaneously trying to remain perpetually in sync with a separate spreadsheet might be tricky. I guess you could do it by catching the update events for the document library that handles the spreadsheets themselves and subsequently pushing the right info into your database. If you're going to do that, though, it's not clear to me why you can't choose just one or the other:
1. Spreadsheets in a document library, or
2. BDC integration with your database
If you go with #1, then you still have the ability to search within the documents themselves and updating them is painless. If you go with #2, you don't have to worry about sync'ing with an actual sheet after the initial load, and you could (for example) create forms as needed to allow people to modify the data.
Also, depending on your use case, you might benefit from the MOSS server-side Excel services. I think the "right" decision here might require more information about how you and your team expect to interact with these sheets and this data after it's initially uploaded into your SharePoint world.
So... I'm going to assume that you are leveraging Excel because it is an easy way to define, build, and test the math required. Your spreadsheet has a set of input data elements, a bunch of math, and then some output elements. Have you considered using Excel Services? In this scenario you would avoid running a batch process to generate your output elements; instead, you can call Excel Services directly from SharePoint and run through your calculations. More information is available online.
You can also surface information in SharePoint directly from the spreadsheet. For example, if you have a graph in the spreadsheet, you can link to that graph and expose it. When the data changes, so does the graph.
There are also some High Performance Computing (HPC) Excel options coming out from Microsoft in the near future. If your spreadsheet is really, really big then the Excel Services route might not work. There is some information available online (search for HPC excel - I can't post the link).