How to find database transactional discrepancies


Background:
I have an application for sales and store transaction management. It has been working well and meeting the customer's requirements for a long time. (Dev setup: VB6, SQL Server 2008 R2, Crystal Reports 2010, etc.)
Problem Statement :
Recently, I received a complaint from my customer that the store values of a product are not accurate when the reports are compared with the physical store.
Corrective Actions Taken to Resolve the Issue:
I checked the application's transactions by testing sales, sales returns, stock in/out and stock transfers from one store to another, but found no evidence that explains the issue. For now, to keep the business process running, I have to manually compare and rebalance the stock/store values.
Expected Solution :
How can I track down this type of data discrepancy, given that all the mandatory checks are in place?
In addition, is there any tool that can monitor for this type of data error automatically?
Also, is it possible to use AI (data science) analysis tools to predict data discrepancies?
Please help with your suggestions and experiences. Regards.
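One general way to look for this kind of discrepancy is to recompute each stock balance from the transaction history and compare it with the stored balance. The sketch below is only an illustration of that idea. It uses Python's built-in sqlite3 rather than SQL Server, and the tables `stock_balance` and `stock_ledger` and their columns are hypothetical, not the poster's actual schema.

```python
import sqlite3


def find_stock_discrepancies(conn):
    """Return (product_id, store_id, stored_qty, ledger_qty) rows that disagree."""
    return conn.execute(
        """
        SELECT b.product_id, b.store_id, b.qty_on_hand,
               COALESCE(SUM(l.qty_change), 0) AS ledger_qty
        FROM stock_balance AS b
        LEFT JOIN stock_ledger AS l
               ON l.product_id = b.product_id AND l.store_id = b.store_id
        GROUP BY b.product_id, b.store_id, b.qty_on_hand
        HAVING b.qty_on_hand <> COALESCE(SUM(l.qty_change), 0)
        ORDER BY b.product_id, b.store_id
        """
    ).fetchall()


def build_demo():
    """Create a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stock_balance (product_id INTEGER, store_id INTEGER, qty_on_hand INTEGER);
        -- Every movement is one row: positive = stock in, negative = stock out.
        CREATE TABLE stock_ledger (product_id INTEGER, store_id INTEGER, qty_change INTEGER, kind TEXT);
        INSERT INTO stock_balance VALUES (1, 1, 8), (2, 1, 5);
        INSERT INTO stock_ledger VALUES
            (1, 1, 10, 'stock in'), (1, 1, -2, 'sale'),
            (2, 1, 10, 'stock in'), (2, 1, -4, 'sale');
        """
    )
    return conn


demo_conn = build_demo()
discrepancies = find_stock_discrepancies(demo_conn)
print(discrepancies)
```

Run on a schedule, a query of this shape would at least narrow down which product and store drifted and roughly when, which is usually the first step toward finding the code path that caused it.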

Related

Best Way to Pull in Live Data From 'Root' Database On Demand

Let me start by apologizing as I'm afraid this might be more of a "discussion" than an "answerable" question...but I'm running out of options.
I work for the Research Dept. for my city's public schools and am in charge of a reporting web site. We use a third-party vendor (Infinite Campus/IC) solution to track information on our students -- attendance, behavior, grades, etc. The IC database sits in a cloud and they replicate the data to a local database controlled by our IT Dept.
I have created a series of SSIS packages that pull in data each night from our local database, so the reporting data is through the prior school day. This has worked well, but recently users have requested that some of the data be viewed in real-time. My database sits on a different server than the local IC database.
My first solution was to create a linked server from my server to the local IC server, and this was slow but worked. Unfortunately, this put a strain on the local IC database, my IT Dept. freaked out and told me I could no longer do that.
My next & current solution was to create an SSIS package that would be called by a stored procedure. The SSIS package would query the local IC database and bring in the needed data to my database. This has been working well and is actually much quicker than using the linked server. It takes about 30 seconds to pull in the data, process it and spit it out on the screen as opposed to the 2-3 minutes the linked server took. It's been in place for about a month or so.
Yesterday, this live report turned into a parking lot -- the report says "loading" and just sits like that for hours. It eventually will bring back the data. I discovered the department head that I created this report for sent out an e-mail to all schools (approximately 160) encouraging them to check out the report. As far as I can tell, about 90 people tried to run the report at the same time, and I guess this is what caused the traffic jam.
So my question is...is there a better way to pull in this data from the local IC database? I'm kind of limited with what I can do, because I'm not in our IT Dept. I think if I presented a solution to them, they may work with me, but it would have to be minimal impact on their end. I'm good with SQL queries but I'm far from a db admin so I don't really know what options are available to me.
UPDATE
I talked to my IT Dept about doing transactional replication on the handful of tables that I needed, and as suspected it was quickly shot down. What I decided to do was set up an SSIS package that is called via Job Scheduler and runs every 5 minutes. The package only takes about 25-30 seconds to execute. On the report, I've put a big "Last Updated 3/29/2018 5:50 PM" at the top of the report along with a message explaining the report gets updated every 5 minutes. So far this morning, the report is running fantastically and the users I've checked in with seem to be satisfied. I still wish my IT team was more open to replicating, but I guess that is a worry for another day.
Thanks to everybody who offered solutions and ideas!!
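The "refresh every 5 minutes and show a Last Updated stamp" idea from the update can also be expressed in application code, so that many simultaneous viewers share one pull instead of each triggering their own. This is a minimal sketch of that related technique, not the scheduled SSIS job described above. The class name `TimedSnapshot` and the `fake_pull` loader are hypothetical.

```python
import threading
import time
from datetime import datetime


class TimedSnapshot:
    """Serve a cached copy of report data, refreshing it at most once per interval."""

    def __init__(self, loader, max_age_seconds=300, clock=time.monotonic):
        self._loader = loader            # hypothetical function that pulls the data
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the demo can fake time
        self._lock = threading.Lock()    # only one caller refreshes at a time
        self._data = None
        self._loaded_at = None           # clock value at the last refresh
        self.last_updated = None         # wall-clock time to show on the report

    def get(self):
        with self._lock:
            now = self._clock()
            if self._loaded_at is None or now - self._loaded_at >= self._max_age:
                self._data = self._loader()
                self._loaded_at = now
                self.last_updated = datetime.now()
            return self._data, self.last_updated


pull_log = []


def fake_pull():
    """Stand-in for the expensive query against the source database."""
    pull_log.append("pull")
    return ["row-%d" % len(pull_log)]


fake_now = [0.0]
snapshot = TimedSnapshot(fake_pull, max_age_seconds=300, clock=lambda: fake_now[0])

# 90 viewers inside the same 5-minute window trigger a single pull.
for _ in range(90):
    snapshot.get()
pulls_in_first_window = len(pull_log)

# After the window has passed, the next request refreshes the data.
fake_now[0] = 301.0
latest_rows, latest_stamp = snapshot.get()
```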

One option which I've done in the past is an "ETL on the Fly" method.
You set up an SSIS package as a data flow, but it writes to a DataReader Destination, which then becomes the source for your SSRS report. In effect, when the SSRS report is run, it automatically runs the SSIS package and fetches the data. It can pass parameters into the SSIS package as well.
There's a bit of extra config involved but this is straightforward.
This article goes through it -
https://www.mssqltips.com/sqlservertip/1997/enable-ssis-as-data-source-type-on-sql-server-reporting-services/

Rebuilding an unstable tool from scratch (Currently Access based - can go anywhere)

I have inherited a custom built tool that is poorly designed and unstable, and I have a great opportunity to rebuild it from scratch. This is an internal tool only that works almost entirely in Access, and its purpose is to provide higher detail on parts that cost the company over a certain dollar amount.
How it works:
1) The raw data (new part numbers) gets pulled nightly from the EDW via macros in Access.
2) The same macros then join two tables (part numbers from one, names from another). Any part under a certain dollar amount is removed, and the new data is appended to the existing Access database.
3) During the day employees can then open a custom Access form to add more details about the part. Different questions are asked depending on the part category.
4) The completed form is forwarded to management, and the information entered is retained in the Access database – it does not write back to the EDW.
5) Managers can also pull some basic reports from the database, based on overall costs.
The problems:
1) Currently everyone has to have Access installed on their work stations, and whenever there is an update the new database gets pushed to their stations. This is not considered an ideal situation by management or IT.
2) If anyone has left the tool open accidentally at the end of the day the database is locked out, therefore the macros cannot run and the tool cannot be updated with new part numbers.
3) If the tool cannot update for a few days in a row the database can become corrupted. We can restore from the last good backup, but in the past this has resulted in the loss of multiple days of work.
Ideally we want to take the tool completely out of Access. I am building a SharePoint site that can host the tool, which (if I can get it right) will eliminate the need of Access on end-user stations with a database push. However the SharePoint form would need read/write ability.
The big question is: How do I build this?
I have a completely open path of possibilities: I can design it to work any way I want, using any tools or platform I want, as long as it works. It does not have to update automatically, as I already run a number of SQL scripts at the start of my day and adding one more is inconsequential.
The resources I have at my immediate disposal are: SharePoint (with designer), Access, Toad, and SQL Server. The database can be hosted on a shared network drive.
I am a recent college graduate with basic SQL knowledge. I have about a year to produce a final product, but would like to get it up and running far sooner if possible.
Any advice on what direction to pursue would be very helpful, thank you.

Caveat: I've never worked with SQL Server, so I don't know all of its capabilities (I'm an Oracle developer).
What I'd do in your situation is something like the following (although not necessarily in this exact order):
1) Get a SQL Server database set up to host your tables.
2) Create the tables etc.
3) Migrate test data across (I'm assuming you have a dev/uat/test environment for your current system! If you haven't, make sure you set up at least a separate test environment to prod for your new db!)
4) Write stored procs to do the work for adding new parts, updating existing data, etc.
5) Set up an automated job on the db (I'm assuming SQL Server can do this!) to do the overnight processing.
6) Create a separate db user with the necessary permissions to call the stored procedures.
7) Get your frontend to call the stored procs with the relevant parameters, using the db user you created in step 6 to connect to the db.
You'd also have to think about transaction control to try and mitigate the case where users go home at the end of the day without committing their work - Does the db handle the commits/rollbacks or does Sharepoint?
Once you've worked out everything in your test environment, it's then a case of creating the prod db, users and objects, and then working out the best way of migrating the prod data across.
Good luck.
Don't forget to get backups for the new db set up as well.

Effect of stored procedures on network traffic in Access/SQL setup

I am currently administering/developing an Access 2010 frontend/SQL backend database. We are trying to improve frontend performance, and one solution that has been suggested is pushing a lot of the VBA that is running the front end down into stored procedures on the server. I'm fairly proficient in VBA, but very new to SQL and network architecture. Everything I've turned up on google has been information about splitting the database, which is already done, rather than information about network loads resulting from running stored procedures vs running VBA.
What is the difference in network traffic between the current setup and pushing this action down to a stored procedure?
As a specific example, if I'm populating a form in the current setup, there are a few queries run to provide data to different elements on the form. With the current architecture, does Access retrieve the queried tables from the backend, query them client-side and then populate the data? How would that be different in terms of network traffic from, say, executing a SP when the form loads, and only transferring the data necessary for displaying the form?
The end goal is to reduce the chattiness between Access and SQL, and I'm mostly trying to figure out exactly what is happening where.

As a general rule, if you launch a form open with a where clause to restrict the form to one record, then using a bound form, or adopting a stored procedure will NOT result in any difference or reduction in network traffic.
Any local Access query based on a table will simply request the one record. There is no concept of “local” processing in this regard, EVEN with a linked table. Note the word “table”, singular, here.
Access does not and will not pull down a whole table unless you have forms and queries without any “where” clause to restrict the data pulled.
In other words, if you have a poorly designed form and you dump that design for one in which you ONLY pull down the one record, then of course the new setup will result in reduced network traffic.
However the above reduction is NOT DUE to adopting the stored procedure but ONLY that of adopting a design in which you restrict the records requested into the form.
So doing something poorly and then improving that process is NOT a justification to adopt stored procedures.
Thus, in the case of pulling records into a form, using a stored procedure will NOT improve performance. Worse, binding a form to a stored procedure results in a form that is READ ONLY anyway!
So stored procedures don’t necessarily improve response time or reduce network traffic when loading a record into a form.
If you have to do large amounts of recordset processing, then of course adopting a stored procedure can save network traffic. So in place of some VBA code that processes 100,000 payroll records, then yes, moving such code server side will help. However, processing 100,000 payroll records is NOT a common task and is NOT a user interface issue in most cases anyway. In other words, such processing is NOT done interactively by users waiting for a form to load, so it is not what makes a form slow to load.
SQL server is indeed a high performance system, and also a system that can scale to many users.
If you write your application in c++, or VB or in your case with ms-access, in GENERAL the performance of all of these tools will BE THE SAME.
In other words...sql server is rather nice, and is a standard system used in the IT industry.
However, sql server will NOT solve your performance issues without efforts on your part. And, it turns out that MOST of those same efforts also make your non sql server Access applications run better.
In fact, we see many posts that mention that moving the back end data to sql server actually slowed things down. (And in fact, for a single user on a single machine, Access JET (now called ACE) is FASTER THAN sql server on the same machine in most cases.)
A few things:
Having a table with 75k records is quite small. Let’s assume you have 12 users. With just a 100% file-based system (JET), and no sql server, the performance of that system should really have screamed.
I have some applications out there with 50, or 60 HIGHLY related tables. With 5 to 10 users on a network, response time is instant. I don't think any form load takes more than one second. Many of those 60+ tables are highly relational and in the 50 to 75k records range.
So, with my 5 users I see no reason why I can’t scale to 15 users with such small tables in the 75,000 record range. And this is without SQL server.
If the application did not perform with such small tables of only 75k records, then upsizing to sql server will do absolutely nothing to fix performance issues. In fact, in the sql server newsgroups you see weekly posts by people who find that upgrading to sql server actually slowed things down.
I have even seen some very cool numbers showing that some queries were actually MORE EFFICIENT in terms of network use with JET than with sql server.
My point here is that technology will NOT solve performance problems. Good designs that make careful use of limited bandwidth resources are the key here. So, if the application was not written with good performance in mind, then you are kind of stuck with a poor design!
I mean, when using a JET file share, if you grab an invoice from the 75k record table, only the one record is transferred down the network (and sql server will also only transfer one record). So, at this point, you really will NOT notice any performance difference by upgrading to SQL Server. There is no magic here. And adopting a SQL stored procedure will be even a GREATER waste of time!
And adopting a stored procedure in place of above will NOT gain you performance either!
Sql server is a more robust and more scalable product than JET. And security, backup and a host of other reasons make sql server a good choice. However, sql server will NOT solve a performance problem when dealing with such small tables as 75k records.
Of course, when efforts are made to utilize sql server, then significant advances in performance can be realized.
I will give a few tips...these apply when using ms-access as a file share (without a server), or even odbc to sql server:
** Ask the user what they need before you load a form!
The above is so simple, but so often I see the above concept ignored. For example, when you walk up to an instant teller machine, does it download every account number and THEN ASK YOU what you want to do?
In Access, it is downright silly to open up a form attached to a table WITHOUT FIRST asking the user what they want! So, if it is a customer invoice, get the invoice number, and then load up the form with the ONE record. How can one record be slow? When the user is done editing the record, the form is closed, and you are back at the prompt, ready to do battle with the next customer.
You can read up on how this "flow" of a good user interface works here (and this applies to both JET, and sql server applications):
http://www.kallal.ca/Search/index.html
My only point here is to restrict the form to only the ONE record the user needs. You don't need a stored procedure to accomplish this task, nor do you gain anything by using one. I am always dismayed at how often a developer builds a nice form, attaches it to some huge table, opens it, and then tells the users to go have at it and have fun. Don't we have any kind of concern for those poor users? Often, the user will not even know how to search for something!
So prompting the user first also makes a HUGE leap forward in usability. And the big bonus is reduced network traffic too! Gosh, better and faster, and less network traffic! What more do we want?
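As a rough illustration of "ask first, then load the ONE record", here is a small sketch. It uses Python's built-in sqlite3 as a stand-in, not Access or SQL Server, and the `invoices` table and `load_invoice` function are hypothetical. The point is only the shape of the request: a parameterized "where" on the key, compared with pulling the whole table.

```python
import sqlite3


def load_invoice(conn, invoice_id):
    """Fetch only the one invoice the user asked for (parameterized WHERE)."""
    return conn.execute(
        "SELECT invoice_id, customer, total FROM invoices WHERE invoice_id = ?",
        (invoice_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
# 75,000 rows, the table size discussed above.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(i, "customer-%d" % i, i * 1.5) for i in range(1, 75001)],
)

# What a form opened with a "where" clause needs: a single row.
one_row = load_invoice(conn, 1234)

# What a form attached to the bare table asks for: every row.
all_rows = conn.execute("SELECT invoice_id, customer, total FROM invoices").fetchall()
```

In Access itself the equivalent is opening the bound form with a "where" condition on the invoice number; no stored procedure is involved in either case.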
** USE CAUTION with queries that require more than one linked table
JET has a really difficult time joining odbc tables together. Often the Access data engine (JET/ACE) does a good job, but often such joins are slow. However, most forms for editing data are NOT based on a multi-table query (so again, a stored procedure will not speed up form load for editing of data).
The simple solution for such multiple joins (for both forms and reports) is to build the join on the sql server side as a view, and then link to that view.
This view approach is MUCH less work than a stored procedure and results in the joins occurring server side. The resulting views are also updatable, as opposed to READ ONLY when you adopt stored procedures. And performance of such views will again equal that of a stored procedure in THIS context.
So once again, adopting stored procedures DOES NOT help and costs more developer time than simply using a view. Really this just amounts to people suggesting that you rack up bills and use developer time to create something that yields nothing over a view except more billable hours.
I don't think it needs pointing out that if the query in question already runs well, then the above can be ignored, but just keep in mind that local queries with more than one table based on links to sql server can often run slow. So, just be aware of the above.
This view trick also applies well to combo boxes.
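The view approach can be sketched as follows. This uses Python's built-in sqlite3 purely as a stand-in, with hypothetical `customers` and `invoices` tables and a hypothetical view name. One difference to keep in mind: SQLite views are read-only, so the updatability mentioned above is not demonstrated here, only the idea that the join lives in the database and the client filters a single object.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO invoices VALUES (10, 1, 99.5), (11, 2, 20.0), (12, 1, 5.0);

    -- The join is defined once, inside the database engine.
    CREATE VIEW invoice_with_customer AS
        SELECT i.invoice_id, c.name AS customer_name, i.total
        FROM invoices AS i
        JOIN customers AS c ON c.customer_id = i.customer_id;
    """
)

# The client treats the view like one table and still restricts to one record.
rows = conn.execute(
    "SELECT invoice_id, customer_name, total "
    "FROM invoice_with_customer WHERE invoice_id = ?",
    (12,),
).fetchall()
```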
So one can continue to use bound forms to a linked table but one simply needs to restrict the form to the ONE RECORD you need.
You can safely open up to a single invoice form etc. but simply ENSURE you open such forms (openForm) by restricting records via the "where" clause. No view, or stored procedure is required here.
Bound forms are way less work than unbound forms, and performance is generally just as good anyway when done right.
Avoid loading large combo boxes. A combo box is good for about 100 entries. After that you are torturing the user (why should they have to look through 100s of entries?). So, keep things like combo boxes down to a minimum size. This is both faster and, MORE importantly, kinder to your users.
After all, at the end of the day what we really want is to treat users well. It seems that treating the users well and reducing the bandwidth (amount of data) go hand in hand.
So, better applications treat the users well and run faster! (this is good news!)
So, #1 tip is to reduce the data that you transfer into a form.
Using stored procedures is not required in the vast majority of cases and will not reduce bandwidth requirements any more than adopting where clauses and views.

Database Snapshot SQL Server 2000

We have a large database that receives records concerning several hundred thousand persons per year. For a multitude of reasons I won't get into, when information is entered into the system for a specific person, the individual entering the data will often be unable to verify whether or not this person is already in the database. Due to legal requirements, we have to strive towards each individual in our database having a unique identifier (and no individual should have two or more). Because of data collection issues, one individual will often be assigned many different unique identifiers.
We have various automated and manual processes that mostly clean up the database on a set schedule and merge unique identifiers for persons who have had multiple assigned to them.
Where we're having problems is that we are also legally required to generate reports at year end. We have a set of year-end reports we always generate, but every year several dozen ad hoc reports are also requested by decision makers. Where things get troublesome is that, because of the continuous merging of unique identifiers, our data is not static. Any reports generated at year end are based on the data as it existed on the last day of the year. If a decision maker requests a report 3 weeks later, whatever we give them can (and often will) conflict directly with our legally required year-end reports. Sometimes we'll merge up to 30,000 identifiers in a month, which can greatly change the results of any query.
It is understood/accepted that our database is not static, but we are being asked to come up with a method for generating ad hoc reports based off of a static snapshot of the database. So if a report is requested on 1/25 it will be based off the exact same dataset as our year end reports.
After doing some research I'm familiar with database snapshots, but we have a SQL Server 2000 database and we have little ability to get that changed in the short-to-medium term and database snapshots are a new feature in the 2005 edition. So my question would be what is the best way to create a queryable snapshot of a database in SQL Server 2000?

Can you simply take a backup of the database on 12/31 and restore it under a different name?

You either need to take a snapshot and work off it (to another db or external file-based system, like Access or Excel) or, if there's enough date information stored, work from your live copy using the date value to distinguish previously reported data from new.
You're better off working from a snapshot because the date approach won't always work. Ideally, you'd export your live database at the end of the year somewhere (anywhere, really) else.
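The "copy the database at year end and report off the copy" idea can be illustrated with a small sketch. It uses Python's built-in sqlite3 and its `Connection.backup` method as a stand-in for a SQL Server backup and restore under a different name. The `persons` table and the `take_snapshot` helper are hypothetical.

```python
import sqlite3


def take_snapshot(live_conn, snapshot_path):
    """Copy the live database into a separate database used only for reporting."""
    snap_conn = sqlite3.connect(snapshot_path)
    live_conn.backup(snap_conn)  # full copy of the database as it exists right now
    return snap_conn


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE persons (record_id INTEGER PRIMARY KEY, unique_id INTEGER)")
live.executemany("INSERT INTO persons VALUES (?, ?)", [(1, 100), (2, 101), (3, 102)])
live.commit()

# Taken on the last day of the year.
year_end = take_snapshot(live, ":memory:")

# Weeks later, a merge on the live database folds identifier 101 into 100.
live.execute("UPDATE persons SET unique_id = 100 WHERE unique_id = 101")
live.commit()

count_sql = "SELECT COUNT(DISTINCT unique_id) FROM persons"
live_count = live.execute(count_sql).fetchone()[0]
snapshot_count = year_end.execute(count_sql).fetchone()[0]
```

Ad hoc reports pointed at the copy keep agreeing with the year-end reports, because later merges only touch the live database.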

Backing up SQL Database for Reports

I'm looking for some help/suggestions for backing up two large databases to one server dedicated to reports. The situation is:
My company has two databases for its internal website. One for the UK and one for Europe. Both are mirrored for DR.
I have a server based in Europe which is dedicated to Microsoft Reporting Services, where we run reports based on the data collected in those two databases.
We do not want to point reporting services to the live databases for performance/security reasons so we currently backup both databases on a daily basis and restore them to our Reporting Services server.
However this means we are putting a strain on our networks by backing up the entire databases, and also the data is only up-to-date by midnight yesterday.
Our aim is to have the data no more than 15 minutes out of date. It has been suggested that we look at Log Shipping, so I wondered whether anyone has experience setting this up, what the pros and cons are, and whether there is a better alternative.
Any help would be greatly appreciated,
Thanks

We developed a similar environment. We used Mirroring to get the data off to our reporting server and created an automated routine to create Snapshots of the database every 15 min. These snapshots only take 1 to 2 seconds to create in our environment and give us a read only copy of the database. Let me know if you would like me to go into deeper detail.
Note we are running Enterprise on both servers.

Log shipping is a great solution for this. We've got articles about it over at SQLServerPedia's Log Shipping section, and I've got a video tutorial on there talking you through your different options. One thing to keep in mind about log shipping is that when the restores happen, your users will be kicked out of the reporting database.
Replication doesn't have that problem, but replication is nowhere near "set-it-and-forget-it" - it's time-intensive to manage, and isn't quite as reliable as you'd like it to be. In addition, you may have to make schema modifications in order to use replication. Log shipping is more automatic & stable, but at the cost of kicking users out at restore time.
You can minimize that by having two log shipping schedules - one for daytime during business hours, and one for the rest. During business hours, you only restore the data once per hour (or less), and the rest of the time you do it every 15 minutes.
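The two-schedule idea amounts to choosing the restore interval from the time of day. A minimal sketch of that rule follows. The business hours (weekdays, 08:00 to 18:00) and the function name are assumptions for illustration, and in practice this would be configured as job schedules on the server rather than written as code.

```python
from datetime import datetime, time

# Assumed business hours; adjust to the real reporting window.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(18, 0)


def restore_interval_minutes(now):
    """Return how often log restores should run at the given moment."""
    is_weekday = now.weekday() < 5  # Monday=0 .. Friday=4
    in_hours = BUSINESS_START <= now.time() < BUSINESS_END
    # Restore rarely while people are running reports, often otherwise.
    return 60 if (is_weekday and in_hours) else 15


daytime = restore_interval_minutes(datetime(2018, 3, 29, 10, 0))   # Thursday morning
evening = restore_interval_minutes(datetime(2018, 3, 29, 22, 0))   # Thursday night
weekend = restore_interval_minutes(datetime(2018, 3, 31, 10, 0))   # Saturday morning
```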

You should look at replication as an alternative to backups.

I would recommend that you look into using Transactional Replication.
It sounds as though you are looking to implement a scenario that is similar to what we are currently implementing ourselves.
We use Transactional Replication (albeit in real time; you would most likely wish to synchronize your environment on a less frequent schedule) to offload a copy of our live production database to another server for reporting purposes.
Offloading reporting data is a common replication scenario and is described here in the Microsoft Replication documentation.
http://msdn.microsoft.com/en-us/library/ms151784.aspx
Brent is right that there is indeed an element of configuration required with Replication, along with security considerations that would need to be addressed. However, there are a number of key advantages to using Replication in my opinion, including:
Reduced latency in comparison to log shipping.
The ability to Publish only the Articles (tables) that are required for reporting.
Reduced storage requirements.
Less data being published means less network traffic.
Access to your reporting data/database at all times.
For example, in our environment, we decided to replicate only the specific tables (articles) from our production database that we actually require for reporting.
I hope what I have described is clear and makes sense but please do feel free to contact me if you have any queries.
