I have a database that holds events from my application, and this table will be queried later on for charting purposes.
My table is denormalized, like this:
EventTable:
-EventId
-UserId
-UserName
-EventType
-DataTime
-CompanyId
-CompanyName
As it is going to be used for charting, I will be aggregating/pivoting/etc a lot.
Currently everything is being stored in SQL Server.
I would like to know if you have any suggestions as to what I should use to facilitate data extraction from this table.
Sorry if this is not clear enough; English is not my native language.
UPDATE:
I will be creating the charts myself in a web application. My plan is to query the database using stored procedures and generate the charts using a JavaScript framework like Kendo UI.
Some of the queries that I'll be performing are (see the sketch after the list):
Quantity of events per event type that each company had during a period
Distribution of event types that a particular company had during a period
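For example, the first one could be a grouped aggregate along these lines (just a sketch against the EventTable columns listed above; @From and @To are assumed to be parameters of the stored procedure):
SELECT  CompanyId,
        CompanyName,
        EventType,
        COUNT(*) AS EventCount      -- quantity of events per event type and company
FROM    EventTable
WHERE   DataTime >= @From AND DataTime < @To
GROUP BY CompanyId, CompanyName, EventType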
Related
I'm building a digital product for a big community of users (2 million +), using Express + GraphQL for the API server and React + Apollo for the web app. Then I'm going to build mobile applications using React Native when the web part is completed.
Right now I'm struggling with how to develop the part that is going to gather all the statistics for the user-generated content on the platform. To simplify things, let's say I have to record:
unique user views of each article
total number of views of each article
visits on each user profile
I have a couple of questions for those who have previous experience developing such systems to gather data.
How should I record the raw data?
Should I create a kind of log in a database and use it later to generate aggregate data depending on my needs?
Something like (article view example):
{
'user_id' : String,
'article_id' : String,
'date' : Date,
}
or should I use a different approach? And which database do you recommend? Right now I'm thinking about using MongoDB since I'm already using it for the rest of the application.
Indeed, there is no single "right" solution, but there are a few approaches to choose from. I'd like to suggest the combined approach used in several of my projects: store the most significant (and queryable) part of the data as structured columns, but also keep the raw data as semi-structured. A DBMS like SQL Server (faster, but limited in its free edition) or PostgreSQL (slower, but possibly sufficient) can do the job.
You may take a look at the chapter "Semi-structured data and high load" in my book for more details.
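For illustration, a minimal sketch of that combined layout in T-SQL, using the article-view example from the question (all names are hypothetical; the JSON functions assume SQL Server 2016 or later):
CREATE TABLE dbo.ArticleViewLog (
    ViewId     BIGINT IDENTITY(1,1) PRIMARY KEY,
    UserId     NVARCHAR(24)  NOT NULL,            -- structured, queryable part
    ArticleId  NVARCHAR(24)  NOT NULL,
    ViewedAt   DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
    RawPayload NVARCHAR(MAX) NULL                 -- everything else, kept as raw JSON
        CHECK (RawPayload IS NULL OR ISJSON(RawPayload) = 1)
);

-- Aggregates run against the structured columns...
SELECT ArticleId, COUNT(*) AS TotalViews, COUNT(DISTINCT UserId) AS UniqueViewers
FROM dbo.ArticleViewLog
GROUP BY ArticleId;

-- ...while occasional ad hoc questions can still reach into the raw part.
SELECT COUNT(*)
FROM dbo.ArticleViewLog
WHERE JSON_VALUE(RawPayload, '$.referrer') = 'newsletter';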
Environment: SQL Server 2012
I am trying to help build a solution which includes data masking and encryption for our organisation.
Currently, we do not have any data masking in place, hence the need.
We are in the process of identifying which data is sensitive, as well as combinations of non-sensitive data that could lead to the identity of a person.
One approach would be to use a tool like Redgate SQL Data Generator or DataVeil, which could generate fictitious data for the fields we want in the Dev or UAT environments.
The other would be to use some kind of function that masks some characters as xxxx or **** based on the length.
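For instance, a hypothetical helper of that kind might look like this (the name and the masking rule are made up purely for illustration; it keeps a few leading characters and pads the rest):
CREATE FUNCTION dbo.fn_MaskString (@value NVARCHAR(100), @visibleChars INT)
RETURNS NVARCHAR(100)
AS
BEGIN
    IF @value IS NULL RETURN NULL;
    -- keep the first @visibleChars characters, replace the rest with 'x'
    RETURN LEFT(@value, @visibleChars)
         + REPLICATE(N'x', CASE WHEN LEN(@value) > @visibleChars
                                THEN LEN(@value) - @visibleChars ELSE 0 END);
END;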
In the production environment, as per my understanding, since masking is irreversible, encryption needs to happen instead, which I will learn more about in the coming weeks.
The above scenario would work where every user sees the same data in UAT and Dev when the data is generated by a tool or masked using T-SQL code, with production access based on access to the encryption key.
Please correct me on anything above you think doesn’t look right.
Next is user-based access using views. There is not much material out there on security using views, so I am asking how we could implement it if we take this route instead of the approaches mentioned above.
I understand that the users could be granted access to the underlying tables through views.
What about existing queries, SSRS reports, and the cube?
How could that work with views? Do I change every query? I am a little lost here.
The View option can be done by creating a new 'mask' view that includes all the columns from the source tables, and replaces sensitive columns with a dummy fixed value.
For example:
create view vMaskPeople
as
SELECT ID, DateCreated, 'Sample Name' as FullName, 'Sample Telephone' as Phone
FROM People
If you need more unique sample data, partially mask the columns, like:
SELECT ID, DateCreated,
Left(FullName,3)+'XXXXXX' as FullName,
'XXX-XXXX-'+Right(Phone,4) as Phone
If you are not able to somehow rig the Dev environments to use the new mask view, you could rename the source 'People' table to something like 'People1' and then name the mask view 'People'.
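A hedged sketch of that rename-and-swap step, reusing the partial-mask columns from above (schema and names are assumptions):
EXEC sp_rename 'dbo.People', 'People1';    -- existing code keeps querying 'People'...
GO
CREATE VIEW dbo.People                      -- ...which now resolves to the mask view
AS
SELECT ID, DateCreated,
       LEFT(FullName, 3) + 'XXXXXX'  AS FullName,
       'XXX-XXXX-' + RIGHT(Phone, 4) AS Phone
FROM dbo.People1;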
You mentioned SQL Data Generator, which creates a fresh data set from scratch, but here at Redgate we also have Data Masker, which allows you to take an existing database and specify masking rules, which sounds like it might suit your scenario better.
Can I use Master Data Services to import data via the Excel add-in, mainly measures (numbers/values)?
Short version:
Looking for the best way to comfortably input data into a SQL Server table with immediate feedback for the user.
Set-up:
We have a data warehouse (DWH) based on SQL Server 2012.
Everything is set up with the tools from the MS BI suite (SSIS, SSAS, SSRS, and so on).
The departments access the BI cubes via Excel. They prefer to do everything in Excel if possible.
Most sources for the DWH are databases, but one use case has Excel files as a source.
Use-Case with Excel files as a source
As-Is:
We have several Excel-files placed in a network folder.
Each Excel file is edited by a different user.
The files are ingested by an SSIS process that loops through the files on a daily basis.
The contents of the Excel files look like this (fake data):
Category | Product | Type | ... | Month   | abc_costs | xyz_costs | abc_budget | xyz_budget | ...
A        | Soup    | Beta | ... | 2017-06 | 16656     | 89233     | 4567       | 34333      | ...
Data Flow:
source.Excel -> 1.-> dwh.Stage -> 2.-> dwh.intermediateLayer -> 3.-> dwh.FactTable
Step 1 to 3 are SSIS ETL-Packages.
Step 3 looks up the surrogate keys from the dimensions and saves them as foreign keys in the fact table, based on the "Codes" provided by the Excel file (a code can be, e.g., 'A' for Category).
Problems:
Step 1, "ingesting the Excel files", is very error-prone.
Users can easily mistype the codes, and numbers can be in the wrong format.
Error messages regarding Excel sources are often misleading, and debugging Excel sources in SSIS becomes a pain.
Sometimes users leave an Excel file open, and a temporary lock file blocks the whole ingestion process.
Requirements
I want to avoid the problems coming up when ingesting Excel files.
It should be possible to validate data input and give quick feedback to the user.
As BI developers we will try to avoid a solution that would involve web development in the first place.
Excel-like input is preferred by the users.
Idea:
As Master Data Services comes with an Excel add-in that allows data manipulation, we thought it could be used for this data-input scenario as well.
That would give us the opportunity to test MDS at the same time.
But I'm not sure if this use case fits Master Data Services.
Doing research, I could not find any MDS example showing how measures are entered via the Excel add-in (the samples are about modelling and managing entities).
Can anybody clarify whether this use case fits MDS?
If it does not fit MDS, what would be a good choice that fits into this BI ecosystem (preferably Excel-based)? [LightSwitch, InfoPath, PowerApps, or, if there is no other option, web development; I am a bit confused about the options.]
Keep in mind, an Entity in MDS does not represent a table in the database. This means when you load data in MDS, there are underlying tables populated with the data and metadata to keep track of changes, for example.
Using the Excel plugin to import data into MDS, and then expose the data to another system can work, considering the following:
Volume of data. The Excel plugin handles large volumes in batches, so the process can become tedious.
Model setup. You need to configure the model properly with the Entities and Attributes well defined. The MDS architecture is 'pseudo data warehouse' where the entities can be considered 'facts' and the domain based attributes 'dimensions'. This is an oversimplification of the system but once you define a model you will understand what I mean.
A nice functionality is subscription views. Once you have the data in MDS, then you can expose it with subscription views which combines entities with domain based attributes in one view.
Considering your requirements:
I want to avoid the problems coming up when ingesting Excel-files.
This is possible, just keep in mind the Excel plugin has its own rules. So Excel effectively becomes the 'input form' of MDS, where data is input and committed. The user will need to have a connection set up to MDS using the credential manager etc.
It should be possible to validate data input and give quick feedback to the user
This can easily be handled with domain-based attributes and business rules.
As BI developers we will try to avoid a solution that would involve web development in the first place. Excel-like input is preferred by the users.
Keep in mind, the MDS plugin determines how the Excel sheet looks and feels. No customization is possible, so your entity definitions need to be correct to facilitate a good user experience.
I have worked on a DWH project in which an MDS instance was used as a single source of truth for many dimensions. Most of the data have been rather read-only (lists of states, countries, currencies, etc.) and were maintained via the Excel plug-in. There was also some more volatile stuff which was imported via MDS import procedures.
In order to expose the MDS data to the warehouse, views were created that pointed directly to the MDS database. I have even written a SQL script that refreshed these views, depending on the MDS metadata and settings stored in the warehouse. Unfortunately, I don't have it around anymore, but it's all quite transparent there.
Everything was very much alive. Can't recall any problems with queries that involved these MDS views.
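As a rough illustration only (every name below is hypothetical; the real target would be whichever subscription view or MDS object your model exposes), such a pass-through view was little more than:
CREATE VIEW dbo.DimCountry
AS
SELECT CountryCode, CountryName
FROM   MdsDb.mdm.CountrySubscriptionView;   -- hypothetical MDS subscription view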
I have an application developed in C#/ASP.NET using SQL Server. I am now in a position where I can start bolting the charting tools onto my application. The application displays charts showing the customer's data, which is 1- to 30-minute time-series data and could span 12 months, so (sometimes) a lot of data. My question is about the most efficient and secure method for retrieving and preparing the data for the chart. That is to say, I want the data ready to go when the customer logs in, and I want there to be as little lag as possible when the customer logs in to view their charts. It is a simple chart with time-series data, but there can be a lot of it.
I want to make sure the performance is great; if I have several hundred users all getting data on the fly, I am going to get lag on the DB.
I had in mind either creating an XML file for the chart data using SQL Server, dropping that into a customer folder, and updating it on a regular basis; the chart would then use the XML file, which I can bolt straight into the chart.
The other option was direct access to the data using a query or stored procedure.
What would you guys use to solve a similar problem?
You don't specify which version of SQL Server you're using, but FOR XML has been available since SQL Server 2000 (FOR XML PATH since 2005). https://msdn.microsoft.com/en-gb/library/ms178107.aspx
I'm using it to fetch complex structured data in a single round trip, where fetching the same data using LINQ-to-SQL or EF would involve a lot of round trips.
It may be difficult or inefficient to select your data directly into the structure required by your charting tools, but you could use XSLT to restructure the data before saving the file to disc.
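As a sketch of that approach (the table, column, and parameter names below are assumptions standing in for the time-series table described in the question), a stored procedure could emit the chart payload in one statement:
SELECT  r.ReadingTime  AS '@time',
        r.ReadingValue AS '@value'
FROM    dbo.CustomerReadings AS r           -- hypothetical time-series table
WHERE   r.CustomerId = @CustomerId
  AND   r.ReadingTime >= @PeriodStart
  AND   r.ReadingTime <  @PeriodEnd
ORDER BY r.ReadingTime
FOR XML PATH('point'), ROOT('series');       -- <series><point time="..." value="..."/>...</series>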
I'm working on creating a better way to view the history of items held at the public library I work at. I have been using a combination of the built in functions of SirsiDynix (our library software), Excel, and Autohotkey to extract, manipulate, and display the data. I am currently stuck on designing a way to view the change in status of an item over time since the system as it is only shows based on the last transaction. For example, if I have the following item:
0000519227318 005.54 WAL 101 EXCEL 2013 TIPS, TRICKS & TIMESAVERS Walkenbach, John, author WE-WH 2013 7 7/13/2013 6/29/2015 35
I can tell you it was created on 7/13/2013, last checked out on 6/29/2015, and has been checked out a total of 7 times. But I am unable to tell you anything about the length of those checkouts, or when they occurred, or if the book had been missing for a year in the middle of that time period.
With Autohotkey and the SirsiDynix Director's Station I have been able to create "daily snapshot" csv files that indicate where an item is every day. However I am having trouble figuring out how to consolidate that information. Originally I was planning to simply add an additional column to the end of the record every day so that after the general item information you would have a series of numbers listing the changing location. The coding I have for AHK to do this is somewhat slow and I'm still working on how I would best display it in Excel regardless. However it occurred to me that there may be a much better way to handle this that could fully automate the process.
So I'm asking whether there are suggestions for either a simple database system to use or an improvement to my current method that could assist me. The queries I plan to do are primarily to be able to type in an item number and have a chart display the status of that item over time, hopefully with something that could also show whenever the total number of checkouts has increased. I have been looking at stock market charts as examples, but since many of those users seem to want open/close/high/low values, the responses they get seem beyond what I may need. Additional queries, such as items with the longest time on the shelf relative to total time, would be useful although not initially required.
Any help as to what direction I may want to go would be appreciated. I have basic understanding of AutoHotKey and Excel, and I briefly used MySQL several years ago so I have a general feel of how a database can be used.
Not too familiar with your specific software or AutoHotkey, but for an efficient, secure, and scalable solution, consider any type of relational database management system (RDBMS), whether a server-level enterprise system (some open-source, some proprietary) such as Oracle, SQL Server, DB2, PostgreSQL, or MySQL, or a file-level database such as SQLite or MS Access. One main thing is to try to move away from the concept of flat-file spreadsheets and applications. Excel is simply not a database and should only be used as an end-use document for reporting or graphics/analytics using retrieved database content.
With a relational database you can maintain data across normalized, related tables linked together by primary and foreign keys. Essentially, you want to build a Library Management System, which could comprise the following tables (a DDL sketch follows the list):
Items - unique list of items: ISBN, Catalog, Title, Author, Publisher, Category (Fiction, Nonfiction, Reference, Media); Primary Key: ItemID (autonumber/auto-increment)
Stock - copies of items: Condition, Missing/Damaged Status, Cost, and Inventory Quantity; one-to-many relationship with Items; Primary Key: StockID, Foreign Key: ItemID
Checkouts - full history of checkout records, including Stock item, CheckoutDate, CheckinDate, Notes; one-to-many relationship with Stock (each checkout references one stock copy); Primary Key: CheckoutID, Foreign Key: StockID
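A minimal T-SQL sketch of that schema (names and types are illustrative assumptions; trim or extend as needed):
CREATE TABLE Items (
    ItemID    INT IDENTITY(1,1) PRIMARY KEY,
    ISBN      VARCHAR(20),
    Catalog   VARCHAR(20),
    Title     NVARCHAR(255) NOT NULL,
    Author    NVARCHAR(255),
    Publisher NVARCHAR(255),
    Category  NVARCHAR(50)        -- Fiction, Nonfiction, Reference, Media
);
CREATE TABLE Stock (
    StockID   INT IDENTITY(1,1) PRIMARY KEY,
    ItemID    INT NOT NULL REFERENCES Items(ItemID),
    Condition NVARCHAR(50),
    Status    NVARCHAR(50),       -- e.g. Missing/Damaged
    Cost      DECIMAL(10,2),
    Quantity  INT
);
CREATE TABLE Checkouts (
    CheckoutID   INT IDENTITY(1,1) PRIMARY KEY,
    StockID      INT NOT NULL REFERENCES Stock(StockID),
    CheckoutDate DATE NOT NULL,
    CheckinDate  DATE,            -- NULL while still checked out
    Notes        NVARCHAR(MAX)
);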
Now with this schema, you can better manage each item throughout its life cycle (newly arrived, checked-out periods, and retired/discarded/basement-stored) with easy queries for real-time reporting. Additionally, you can use any general-purpose language (Java, C#, PHP, Python, Perl, VB) that can connect to any of the aforementioned RDBMSs to build the interface or tool for this backend system. A host of free consoles is also available, including:
PhpMyAdmin (PHP/MySQL)
PgAdmin (PostgreSQL)
SQLite-manager (Firefox addin/SQLite)
Management Studio (SQL Server)
MS Access (Jet/ACE engine and MS Office) - it is a common misconception to call MS Access a database when it is actually a GUI that connects by default to a Windows technology, the Jet/ACE SQL engine; this default can be switched out for any of the aforementioned RDBMSs
Here, Excel can connect via ODBC/OLEDB in VBA for reporting on checked-out item status, history, and current shelf stock. Depending on the RDBMS, you can even build triggers so that as soon as an item is checked out, a record is added to the Checkouts table, or you can code that in your tool or scripts. Finally, outputs into txt, csv, xml, or pdf reports and email attachments to co-workers, the board, etc., can be integrated.
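For example, the item-history chart mentioned in the question could be fed by a query like this against the sketch above (treating the long barcode from the question's example as the lookup key; swap in whichever identifier you actually store):
SELECT  i.Title,
        c.CheckoutDate,
        c.CheckinDate,
        DATEDIFF(day, c.CheckoutDate, COALESCE(c.CheckinDate, GETDATE())) AS DaysOut
FROM    Items i
JOIN    Stock s     ON s.ItemID  = i.ItemID
JOIN    Checkouts c ON c.StockID = s.StockID
WHERE   i.ISBN = '0000519227318'
ORDER BY c.CheckoutDate;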