Creating an Excel database

I am building an Excel database that will hold approximately 1,000,000 rows of data per year.
I have run into a problem. Data will be entered into the same file by approximately 12 different people, which means the file is read-only for most of the day. What options do I have to allow all of those people to enter data simultaneously?
Also, each person will be entering numbers for different categories, which would require a different data-entry interface for each of them.

Excel 2013 will allow multiple users to edit the same file simultaneously if (1) you have the business/enterprise version and (2) the file is kept on the OneDrive cloud (which is included in the business version).
All the users will be able to see the other users' edits in real time. Each user's cursor shows up as a different color.
Alternatively, consider using QUICKBASE.COM. It is an easy-to-use cloud-based database designed for multiple simultaneous users. If you can figure out how to use Excel, you'll be able to figure out how to use QUICKBASE. It's pretty straightforward.

If company security policy excludes a cloud solution, 1 million records is unwieldy for either Excel or Access. Check the system limits for both and you will see what you are up against.
If you are still forced to use Excel, each user can be set up with their own data entry template, with VBA to copy their additions to a master Excel file (a sketch follows below). This would overcome the read-only issue. If you are looking at updating records once they have been added, you are opening a new can of worms from a coding perspective.
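A minimal VBA sketch of that template-to-master idea follows. The master path, sheet names and five-column layout are assumptions for illustration, and in practice you would add more error handling for the moments when another user has the master file open.
Sub PushEntriesToMaster()
    ' Hypothetical sketch: append the rows a user entered on their template sheet
    ' ("Entry") to the shared master workbook. Path, sheet names and the
    ' five-column layout are assumptions for illustration only.
    Const MASTER_PATH As String = "\\server\share\MasterData.xlsx"   ' assumed location
    Dim wbMaster As Workbook, src As Worksheet, dst As Worksheet
    Dim lastSrcRow As Long, lastDstRow As Long

    Set src = ThisWorkbook.Worksheets("Entry")
    lastSrcRow = src.Cells(src.Rows.Count, 1).End(xlUp).Row
    If lastSrcRow < 2 Then Exit Sub                       ' nothing below the header row

    Set wbMaster = Workbooks.Open(MASTER_PATH)
    If wbMaster.ReadOnly Then                             ' someone else has it open right now
        wbMaster.Close SaveChanges:=False
        MsgBox "Master file is in use - try again in a minute."
        Exit Sub
    End If

    Set dst = wbMaster.Worksheets("Data")
    lastDstRow = dst.Cells(dst.Rows.Count, 1).End(xlUp).Row

    ' Copy the new rows (columns A:E assumed) beneath the existing master data.
    src.Range(src.Cells(2, 1), src.Cells(lastSrcRow, 5)).Copy
    dst.Cells(lastDstRow + 1, 1).PasteSpecial xlPasteValues
    Application.CutCopyMode = False

    wbMaster.Close SaveChanges:=True
    src.Range(src.Cells(2, 1), src.Cells(lastSrcRow, 5)).ClearContents   ' reset the template
End Sub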

Related


Can I use Master Data Services to import data via the Excel add-in? Mainly measures (numbers/values)
Short version:
Looking for the best way to comfortably input data into a SQL Server table with immediate feedback for the user.
Set-up:
We have a data warehouse (DWH) based on SQL Server 2012.
Everything is set up with the tools from the MS BI suite (SSIS, SSAS, SSRS and so on).
The departments access the BI cubes via Excel. They prefer to do everything in Excel if possible.
Most sources for the DWH are databases, but one use case has Excel files as a source.
Use case with Excel files as a source
As-is:
We have several Excel files placed in a network folder.
Each Excel file is edited by a different user.
The files are ingested by an SSIS process that loops through the files on a daily basis.
The contents of the Excel files look like this (fake data):
Header: Category | Product | Type | ... | Month   | abc_costs | xyz_costs | abc_budget | xyz_budget | ...
Data:   A        | Soup    | Beta | ... | 2017-06 | 16656     | 89233     | 4567       | 34333
Data flow:
source.Excel -> 1. -> dwh.Stage -> 2. -> dwh.intermediateLayer -> 3. -> dwh.FactTable
Steps 1 to 3 are SSIS ETL packages.
Step 3 looks up the surrogate keys from the dimensions and saves them as foreign keys in the fact table, based on the codes provided by the Excel file (a code can be, e.g., 'A' for Category).
Problems:
Step 1, ingesting the Excel files, is very error-prone.
Users can easily mistype the codes, and numbers can be in the wrong format.
Error messages regarding Excel sources are often misleading, and debugging Excel sources in SSIS becomes a pain.
Sometimes users leave an Excel file open, and a temporary lock file blocks the whole ingestion process.
Requirements
I want to avoid the problems that come up when ingesting Excel files.
It should be possible to validate the data input and give quick feedback to the user.
As BI developers we will try to avoid a solution that involves web development in the first place.
Excel-like input is preferred by the users.
Idea:
As Master Data Services comes with an Excel add-in that allows data manipulation, we thought it could be used for this data-input scenario as well.
That would also give us the opportunity to test MDS at the same time.
But I am not sure whether this use case fits Master Data Services.
Researching it, I could not find any MDS example showing how measures are entered via the Excel add-in [the samples are about modelling and managing entities].
Can anybody clarify whether this use case fits MDS?
If it does not fit MDS, what would be a good choice that fits into this BI ecosystem (preferably Excel-based)? [LightSwitch, InfoPath, PowerApps, or, if there is no other option, web development. I am a bit confused about the options.]
Keep in mind that an entity in MDS does not represent a table in the database. When you load data into MDS, underlying tables are populated with the data, plus metadata to keep track of changes, for example.
Using the Excel plugin to import data into MDS and then exposing the data to another system can work, considering the following:
Volume of data. The Excel plugin handles large volumes in batches, so the process can become tedious.
Model setup. You need to configure the model properly, with the entities and attributes well defined. The MDS architecture is a 'pseudo data warehouse', where the entities can be considered 'facts' and the domain-based attributes 'dimensions'. This is an oversimplification of the system, but once you define a model you will understand what I mean.
A nice piece of functionality is subscription views. Once you have the data in MDS, you can expose it with subscription views, which combine entities with domain-based attributes in one view.
Considering your requirements:
I want to avoid the problems coming up when ingesting Excel-files.
This is possible; just keep in mind that the Excel plugin has its own rules. Excel effectively becomes the 'input form' for MDS, where data is entered and committed. The user will need to have a connection set up to MDS using the credential manager, etc.
It should be possible to validate data input and give a quick feedback
to the user
This can easily be handled with domain-based attributes and business rules.
As BI-Developers we will try to avoid a solution that
would involve webdevelopment in the first place. Excel-like input is
preferred by the users.
Keep in mind that the MDS plugin determines how the Excel sheet looks and feels; no customization is possible. So your entity definitions need to be correct to facilitate a good user experience.
I have worked on a DWH project in which an MDS instance was used as a single source of truth for many dimensions. Most of the data was rather read-only (lists of states, countries, currencies, etc.) and was maintained via the Excel plug-in. There was also some more volatile stuff, which was imported via MDS import procedures.
In order to expose the MDS data to the warehouse, views were created that pointed directly to the MDS database. I even wrote a SQL script that refreshed these views based on the MDS metadata and settings stored in the warehouse. Unfortunately, I don't have it around anymore, but it's all quite transparent there.
Everything was very much alive; I can't recall any problems with queries that involved these MDS views.

How and where to store the current customer purchasing history data?

I am working on a project that requires showing the transaction history of a customer and whether the product the customer buys is under warranty or not. I need to use the data from the current system, which can provide a Web API that returns a .csv file. How can I make use of the current system's data?
The solution I have in mind is to download all the .csv files and write scripts to insert every record into a database I have built, which contains the necessary tables and relations to hold the data I retrieve. Then I would have the new database I want. Because I have never done this before, I want to know whether it is feasible.
One more question: should I store the data locally or use a cloud database like Firebase?
High-end databases like SQL Server and Oracle come with utilities that allow you to read directly from a CSV file; check the docs. Having done this many times, the best procedure I have found is to read the file into a single holding table. This gives you the chance to examine the data, find any unexpected quirks or missing fields, and correct the data where possible.
Then write the scripts to move the data from the holding table into the proper tables you have designed (a sketch follows below). This must be done in a logical order: for example, move the customer data before the purchase transactions, so that any errors you get will not be because you tried to store a transaction before you stored the customer. (You will have referential integrity set up, yes?) This gives you more chances to correct or adjust the data, or simply to identify problems, more or less at your leisure.
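To keep the sketch in the same vein as the rest of this thread, here is one way that holding-table-to-final-tables move could look from VBA with ADO; the connection string, table names and column names are all invented for illustration, and the same two INSERT ... SELECT statements could just as well be run directly in SQL Server Management Studio.
Sub MoveHoldingToFinal()
    ' Hypothetical sketch: move rows from a holding table into the final tables,
    ' parents before children, inside one transaction. All object names are assumptions.
    Dim cn As Object
    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=SQLOLEDB;Data Source=MyServer;" & _
            "Initial Catalog=SalesDb;Integrated Security=SSPI;"

    cn.BeginTrans
    On Error GoTo RollbackAll

    ' 1. Customers first, so the foreign key on the transactions can be satisfied.
    cn.Execute "INSERT INTO dbo.Customer (CustomerCode, CustomerName) " & _
               "SELECT DISTINCT h.CustomerCode, h.CustomerName " & _
               "FROM dbo.HoldingImport h " & _
               "WHERE NOT EXISTS (SELECT 1 FROM dbo.Customer c " & _
               "                  WHERE c.CustomerCode = h.CustomerCode);"

    ' 2. Then the purchase transactions, joined back to the customer keys.
    cn.Execute "INSERT INTO dbo.Purchase (CustomerId, ProductCode, PurchaseDate, Amount) " & _
               "SELECT c.CustomerId, h.ProductCode, h.PurchaseDate, h.Amount " & _
               "FROM dbo.HoldingImport h " & _
               "JOIN dbo.Customer c ON c.CustomerCode = h.CustomerCode;"

    cn.CommitTrans
    cn.Close
    Exit Sub

RollbackAll:
    cn.RollbackTrans           ' leave the final tables untouched if anything fails
    cn.Close
    MsgBox "Load failed: " & Err.Description
End Sub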
Whether or not to store the data in the cloud is strictly according to the preferences of your employer.

Publish SQL Server data to clients from saas website with multi-tenant database?

We maintain a Software as a Service (SaaS) web application that sits on top of a multi-tenant SQL Server database. There are about 200 tables in the system, the biggest with just over 100 columns; at last look the database was about 10 gigabytes in size. We have about 25 client companies using the application every day, entering their data and running reports.
The single-instance architecture is working very effectively for us: we're able to design and develop new features that are released to all clients every month. Each client's experience can be configured through the use of feature toggles, data dictionary customization, CSS skinning, etc.
Our typical client is a corporation with several branches, one head office and sometimes their own in-house IT software development teams.
The problem we're facing now is that a few of the clients are undertaking their own internal projects to develop reporting, data warehousing and dashboards based on the data presently stored in our multi-tenant database. We see it as likely that the number and sophistication of these projects will increase over time and we want to cater for it effectively.
At present, we have a "lite" solution whereby we expose a secured XML webservice that clients can call to get a full download of their records from a table. They specify the table, and we map that to a purpose-built stored proc that returns a fixed number of columns. Currently clients are pulling about 20 tables overnight into a local SQL database that they manage. Some clients have tens of thousands of records in a few of these tables.
This "lite" approach has several drawbacks:
1) Each client needs to develop and maintain their own data-pull mechanism, deal with all the logging, error handling etc.
2) Our database schema is constantly expanding and changing. The stored procs they are calling have a fixed number of columns, but occasionally when we expand an existing column (e.g. turn a varchar(50) into a varchar(100)) their pull will fail because it suddenly exceeds the column size in their local database.
3) We are starting to amass hundreds of different stored procs built for each client and their specific download expectations, which is a management hassle.
4) We are struggling to keep up with client requests for more data. We provide a "shell" schema (i.e. a copy of our database with no data in it) and ask them to select the tables they need to pull. They invariably say "all of them" which compounds the changing schema problem and is a heavy drain on our resources.
Sorry for the long-winded question, but what I'm looking for is an approach to this problem that other teams have had success with. We want to securely expose all of their data to them in a way they can most easily use, without getting caught in a constant process of negotiating data exchanges and cleaning up after schema changes.
What's worked for you?
Thanks,
Michael
I've worked for a SaaS company that went through a similar exercise some years back, and web services are probably the best solution here. Incidentally, one of your "drawbacks" is actually a benefit: customers should be encouraged to do their own data pulls, because each customer's needs regarding timing and amount of data will be different.
Instead of a "lite" solution, you should look at building out a WSDL with separate CRUD calls for each table and good filtering capabilities. Also, make sure you have change times for records on each table; this way a customer can hit each table and immediately pull only the records that have been updated since the last time they pulled (see the sketch below).
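For illustration only, a client-side sketch of that incremental pull in VBA; the endpoint URL, the modifiedSince parameter, the bearer-token header and the idea of remembering the last pull time with SaveSetting are all assumptions about how such a service and client might be shaped, not a description of any existing API.
Sub PullChangedInvoices()
    ' Hypothetical sketch: pull only the records changed since the last successful pull.
    ' URL, query parameter and auth header are assumptions, not a real API.
    Dim http As Object, lastPull As String, url As String

    lastPull = GetSetting("SaasPull", "Invoices", "LastPull", "2000-01-01T00:00:00")
    url = "https://saas.example.com/api/export/Invoices?modifiedSince=" & lastPull

    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", url, False
    http.setRequestHeader "Authorization", "Bearer <token>"    ' assumed auth scheme
    http.Send

    If http.Status = 200 Then
        ' ...parse http.responseText and upsert the rows into the local SQL database...
        SaveSetting "SaasPull", "Invoices", "LastPull", _
                    Format(Now, "yyyy-mm-dd") & "T" & Format(Now, "hh:nn:ss")
    End If
End Sub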
Will it be easy? Not a chance, but if you want scalability, it's the only route to go.
Good luck.

Storing data in text files instead of SQL Server

I'm intending to use both SQL Server and simple text files to save my data.
Information like user data will be stored in SQL Server. The RSS feed for each user will be stored in a folder whose name is the user id, and inside this folder I put the files that store the data. Each file can hold only 20 lines; if there are more than 20, I make a new file.
When I need to read this data I simply open the last file in the user's folder.
I need to know the advantages and disadvantages of using this method.
Thanks
I would suggest you store the text file data in either a VARCHAR(8000) column or a blob, inside a table in the database.
The advantages of storing it in the database are:
All your data is stored in a single place. It is very easy for you to back it up and restore it somewhere else if required.
A database comes with concurrency control by default; if you have, say, multiple users trying to access the same row or the same table, the database handles it inherently.
When you go for a hybrid approach of files plus database, you are going for distributed storage, and you always have to make sure the two stay consistent.
If you want to store just the latest text file content, go for an UPDATE. If you want to keep the history of earlier text file contents, go for SCD Type 2 style storage or a historical table containing the previous text file data.
A database is a single contained unit, and you can do many things with it: transparent data encryption, masking, access control and all the security-related features in one place. In a hybrid approach, you have to manage security in two places.
When all your data is in a single place, and once you have proper indexes, you can write queries and cover many different reporting use cases using SQL. But if the data is distributed, you have to work out how you will handle the different reporting use cases.
The question is not quite the right one to ask.
You should start by clarifying the requirements for the application. Answer the following questions for yourself:
What types of queries need to be executed (selects, updates, reports)?
How many users will there be, and how often will their requests come in? Does the data have to be synchronized across users (concurrency)?
Is there a need for authentication, authorization or localization?
Is there a need to support a modification history?
Etc.
Databases usually have all of these mechanisms built in, so you do not have to implement them in your application.
Depending on your application's needs, you can then decide on a strategy for storing the data: a database, files, or both.

Concurrently access database with Excel as frontend - doable?

Suppose you have a database whose largest tables contain about 200,000 rows and are frequently modified. The client wants Excel to connect to the database via ODBC and act as a frontend for managing the data. The data should be modifiable by up to 25 users concurrently.
My first instinct would be to recommend something else, for example a web frontend. But suppose the client insists on the Excel solution: would you regard it as doable, and what pitfalls would you see in it?
My doubts would be about:
data integrity (how to manage users modifying the same data at the same time)
large amounts of data moved unnecessarily (when opening the Excel workbook I imagine the whole database has to be transferred)
security (showing only parts of the data to the appropriate users in a secure way would be challenging; see the previous point)
using a tool (Excel) for something at which it doesn't excel (pardon the pun)
I do this all the time. No, you don't have to bring in the whole database or even the whole table. I use ADO and VBA and send SQL statements via the Command object. For example, I have a royalty database with an Excel front end.
The user types in an invoice number, and a SELECT statement retrieves that one record and populates some custom classes. The user enters or modifies some data and clicks 'Save'. Then the class has a method that writes the record back to the database with an UPDATE or INSERT, depending on the situation.
At the end of the month, the user enters a date range and retrieves some records into a report; again, just a SELECT statement filling some classes and outputting to a sheet.
Use transactions so you can roll back if you hit any record-locking problems, but with 25 users you probably won't.
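A trimmed-down, hypothetical version of that fetch-and-save pattern; the connection string, table and column names are invented, and the real workbook described above wraps this in custom classes rather than a single routine.
Sub SaveRoyaltyRecord(ByVal invoiceNo As String, ByVal amount As Currency)
    ' Hypothetical sketch of the save-one-record pattern described above.
    ' Connection string, table and column names are assumptions.
    Const adVarChar As Long = 200, adCurrency As Long = 6, adParamInput As Long = 1
    Dim cn As Object, cmd As Object, affected As Long

    Set cn = CreateObject("ADODB.Connection")
    cn.Open "Provider=SQLOLEDB;Data Source=MyServer;" & _
            "Initial Catalog=Royalties;Integrated Security=SSPI;"

    ' Try to UPDATE the existing record first.
    Set cmd = CreateObject("ADODB.Command")
    Set cmd.ActiveConnection = cn
    cmd.CommandText = "UPDATE dbo.Royalty SET Amount = ? WHERE InvoiceNo = ?"
    cmd.Parameters.Append cmd.CreateParameter("amt", adCurrency, adParamInput, , amount)
    cmd.Parameters.Append cmd.CreateParameter("inv", adVarChar, adParamInput, 20, invoiceNo)
    cmd.Execute affected

    ' Fall back to an INSERT if the invoice was not there yet.
    If affected = 0 Then
        Set cmd = CreateObject("ADODB.Command")
        Set cmd.ActiveConnection = cn
        cmd.CommandText = "INSERT INTO dbo.Royalty (InvoiceNo, Amount) VALUES (?, ?)"
        cmd.Parameters.Append cmd.CreateParameter("inv", adVarChar, adParamInput, 20, invoiceNo)
        cmd.Parameters.Append cmd.CreateParameter("amt", adCurrency, adParamInput, , amount)
        cmd.Execute
    End If
    cn.Close
End Sub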
At first glance I would suggest treating Excel a bit like a web page: pull only the required data, and use a specific form for editing that updates one record at a time via ADO. You need only lock a single record, and only for the fraction of a second it takes to update it. You can check whether the record has changed since it was opened for editing (a sketch follows below), and users can be told that they must not open a record for editing and then leave it sitting around in the edit form, or they may lose their changes.
It is usually quite unlikely for such a small group to need to change the same record at the same time.
I do not think you will have much trouble with 25 concurrent users.
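One common way to do that "has the record changed since I opened it" check is to remember a last-modified value when the record is loaded and require it to still match in the UPDATE's WHERE clause. A rough sketch follows, with all table and column names invented; in practice a rowversion column is a more robust marker than a datetime.
Function TrySaveCustomer(cn As Object, ByVal id As Long, _
                         ByVal newName As String, ByVal loadedStamp As Date) As Boolean
    ' Hypothetical sketch: optimistic concurrency via a LastModified column.
    ' If another user saved the record first, the UPDATE matches zero rows.
    Dim affected As Long
    cn.Execute "UPDATE dbo.Customer " & _
               "SET CustomerName = '" & Replace(newName, "'", "''") & "', " & _
               "    LastModified = GETDATE() " & _
               "WHERE CustomerId = " & id & _
               "  AND LastModified = '" & Format(loadedStamp, "yyyy-mm-dd hh:nn:ss") & "'", affected
    TrySaveCustomer = (affected > 0)   ' False means someone else changed it: reload and retry
End Function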
