As Wikipedia says:

Database triggers are commonly used to:
- audit changes (e.g. keep a log of the users and roles involved in changes)
- enhance changes (e.g. ensure that every change to a record is time-stamped by the server's clock)
- enforce business rules (e.g. require that every invoice have at least one line item)

ref: database triggers - wikipedia
But we can easily do these things inside the business layer using a common programming language (especially with OOP). So why are database triggers still necessary in modern software architecture? Why do we really need them?
That might work if all data is changed by your application only. But there are other cases, which I have seen very frequently:
- There are other applications (like batch jobs doing imports) which do not use the business layer.
- You cannot easily use plain SQL scripts as a means for hotfixes.
Apart from that, in some cases you can even combine both worlds: define a trigger in the database and implement it in Java. PostgreSQL, for example, supports triggers written in Java. In Oracle, you can call a Java method from a PL/SQL trigger. In MS SQL Server, you can define CLR-based triggers.
This way not every programmer needs to learn PL/SQL, and data integrity is enforced by the database.
Think about performance. If all of this is done from the application, there are most likely a lot of extra SQL*Net round trips slowing the application down. Having those actions defined in the database also makes sure they are always enforced, not only when the application is used to access the data.
When the database is in control, your rules are defined in one central location, the database, instead of in many locations in the application.
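For concreteness, here is a minimal sketch of what such a centrally enforced rule could look like as a SQL Server audit trigger; the Invoices and InvoiceAudit tables are hypothetical stand-ins:

    -- Hypothetical audit trigger: logs who changed an invoice and when.
    -- It fires no matter which application, script, or ad-hoc query does the update.
    CREATE TRIGGER trg_Invoices_Audit
    ON dbo.Invoices
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.InvoiceAudit (InvoiceId, ChangedBy, ChangedAt)
        SELECT i.InvoiceId, SUSER_SNAME(), GETDATE()
        FROM inserted AS i;  -- 'inserted' holds the new row versions
    END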
Yes, you can completely omit database triggers.
However, if you can't guarantee that your database will only ever be accessed through the application layer (and you can't), then you need them. Yes, you can perform all your database logic in the application layer, but if you have a table that needs X done to it whenever it is updated, a trigger is the only way to guarantee it. Otherwise, people accessing your database directly, outside your application, will break your application.
There is nothing else you can do. If you need a trigger, use one. Do not assume that all connections to your database will be through your application...
When a web app queries some information frequently, how can I improve performance by caching the query results?
(The information is something like the top news on a website; my database is SQL Server 2008 and the application runs on Tomcat.)
I can suggest the following:
In your database you can use indexed views; please check: How to mimick Oracle Materialized Views on MS SQL Server?.
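To give a flavor of that approach, here is a minimal sketch of an indexed view in SQL Server; the table and column names are made up for illustration:

    -- Hypothetical indexed view caching an aggregate. SCHEMABINDING and
    -- COUNT_BIG(*) are required before the view can be indexed.
    CREATE VIEW dbo.TopNewsSummary
    WITH SCHEMABINDING
    AS
    SELECT CategoryId, COUNT_BIG(*) AS ArticleCount
    FROM dbo.NewsArticles
    GROUP BY CategoryId;
    GO
    -- The unique clustered index materializes the view's result set, so
    -- reads hit the stored aggregate instead of re-scanning the base table.
    CREATE UNIQUE CLUSTERED INDEX IX_TopNewsSummary
    ON dbo.TopNewsSummary (CategoryId);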
If you use JPA or Hibernate, it can cache entities (objects):
http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/performance.html#performance-cache
http://en.wikibooks.org/wiki/Java_Persistence/Caching
If you're looking for a cache system that lives outside the database and the ORM, you can look at memcached or Ehcache:
http://memcached.org/
http://ehcache.org/
An option, though not a recommended one, is to manage a cache in your application yourself. For example, you can store the list of countries in the ServletContext (also known as the application context), but then you need to implement the cache's business logic (updating, deleting and inserting objects) yourself, and you need to be careful with heap memory.
You can use a combination of the above strategies; it depends on the context of your business.
Best regards,
Ernesto.
This is a pretty general question and, as you'd expect, there are many options.
Closest to the UI, your web platform might have 'content caching.' ASP.NET, for example, will cache portions of a page for specified periods of time.
You could use a caching tool like memcached and cache a recordset (or whatever the stand-alone Java data structure is).
Some ORMs provide caching too.
And (probably not finally) you could define structures in your database to 'cache' results: run the complex queries ahead of time and save the results into tables that are queried more often but are cheaper to query.
Just some ideas.
The answer for a really big site is: all of the above.

We do all our queries via stored procs. That helps because the query is compiled and one execution plan is reused.

We have a wickedly complicated table-valued function. It's so expensive that we built a cache table. The table has the same general format as the function's output, but with two extras: one is an expiry time, the other is a search key, which is the function's parameters concatenated together. Whenever we're about to query that table, we run a proc to check whether the data is stale. If it is, we start a transaction, delete the rows, then run the function and insert the fresh rows. This means we run the function maybe 2 or 3% of the times we used to, and the proc call we make to check for staleness is much cheaper. Whenever the app updates the relevant data, it marks the cache rows as stale, but it doesn't delete them; we leave that to the cache-check proc. Why? Well, maybe nobody will need that data right now, so: less DB hit.

Then we hit the second layer: we cache many recordsets in memcached, including the results of all the procs that call that function, and many more. That actually happens in our ASP layer, which we still have. ADO recordsets can be persisted to XML natively, and the XML then goes into memcached as a string.
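Going back to the cache-table layer, here is a hedged sketch of that pattern, not the poster's actual code: every name below is hypothetical, and dbo.ExpensiveFunction stands in for the costly table-valued function.

    -- Cache table mirroring the function's output, plus a search key
    -- (the concatenated parameters) and an expiry time.
    CREATE TABLE dbo.FnResultCache (
        SearchKey  varchar(200) NOT NULL,  -- concatenated function parameters
        RowNum     int          NOT NULL,
        ExpiresAt  datetime     NOT NULL,
        ResultData varchar(max) NOT NULL,  -- stands in for the function's columns
        CONSTRAINT PK_FnResultCache PRIMARY KEY (SearchKey, RowNum)
    );
    GO
    CREATE PROCEDURE dbo.GetCachedResult
        @SearchKey varchar(200)
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Rerun the expensive function only when the cached rows are missing or stale.
        IF NOT EXISTS (SELECT 1 FROM dbo.FnResultCache
                       WHERE SearchKey = @SearchKey AND ExpiresAt > GETDATE())
        BEGIN
            BEGIN TRANSACTION;
            DELETE FROM dbo.FnResultCache WHERE SearchKey = @SearchKey;
            INSERT INTO dbo.FnResultCache (SearchKey, RowNum, ExpiresAt, ResultData)
            SELECT @SearchKey,
                   ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
                   DATEADD(MINUTE, 10, GETDATE()),   -- arbitrary 10-minute lifetime
                   f.ResultData
            FROM dbo.ExpensiveFunction(@SearchKey) AS f;
            COMMIT TRANSACTION;
        END
        SELECT ResultData FROM dbo.FnResultCache WHERE SearchKey = @SearchKey;
    END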
My sister is going to start taking classes to try to learn how to become a web developer. She's sent me the class lists for a couple of candidate schools for me to help guide her decision.
One of the schools mentions Microsoft Access as the primary tool used in the database classes including relational algebra, SQL, database management, etc.
I'm wondering - if you learn Microsoft Access will you be able to easily pick up another more socially-acceptable database technology later like MySQL, Postgres, etc? My experience with Access was not pleasant and I picked up a whole lot of bad practices when I played around with it during my schooling years.
Basically: Does Microsoft Access use standards-compliant SQL? Do you learn the necessary skills for other databases by knowing how Microsoft Access works?
Access, I would say, has a lot more peculiarities than 'actual' database software. Access can easily be used as a front end for SQL databases; that's part of the program.
Let's assume the class is using databases built in Access. Then let's break it down into the parts of a database:
Tables
Access uses a simplified model for field types. Basically you can have the typical number, text fields, etc. You can fix the number of decimals, for instance, like you could with SQL. You won't see types like varchar(x), though; you will just pick a text field and set the field size to "8", etc. However, like a real database, it will enforce the limits you've set. Access supports OLE objects, but that quickly becomes a mess. An Access database is stored as a single file, which can become incredibly large and bloat quickly. So if you use it for more than storing address books, text databases, or linking to external sources via code, you have to be careful about how much information you store, simply because the file will get too big to use.
Queries
Access implements a lot of things along the lines of SQL. I'm not aware that it is fully SQL-compliant, though I believe you can export your Access database into something a SQL database can use. In code, you interact with an Access database via DAO, ADO, ADODB and the Jet or ACE engines (some are outdated, but they still work on older databases). Once you get to just writing queries, many things are similar: the typical commands (SELECT, FROM, WHERE, ORDER BY, GROUP BY, HAVING, etc.) are all there and work as you'd see them work elsewhere. The peculiar things happen when you use calculated expressions or complicated joins (Access does not implement some kinds of joins, but you will see arguably the most important: inner join/union). For instance, the behavior of DISTINCT in Access differs from other database engines. You are also limited in how you use aggregate functions (SUM/MAX/MIN/AVG). In essence, Access works for a lot of tasks, but it is incredibly picky, and you will have to write queries just to work around problems that you wouldn't have in a real database.
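To illustrate the flavor of those differences, here is the same hypothetical query written twice, once in Access (Jet/ACE) SQL and once in SQL Server T-SQL; the table and columns are invented:

    -- Access (Jet/ACE) SQL: date literals go in #...#, and IIf() fills in
    -- for the CASE expression Access lacks.
    SELECT TOP 10 CustomerName, IIf(Balance > 0, 'owes', 'clear') AS Status
    FROM Customers
    WHERE OrderDate > #01/01/2011#;

    -- SQL Server equivalent of the same query.
    SELECT TOP 10 CustomerName,
           CASE WHEN Balance > 0 THEN 'owes' ELSE 'clear' END AS Status
    FROM Customers
    WHERE OrderDate > '2011-01-01';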
Forms/Reports
I think the key feature of Access is that it is much more approachable to users who are not computer experts. You can easily navigate the tables, and drag and drop to create forms and reports. So even though it's officially not a database in my book, it can be very useful, particularly if few people will be using the database and they strongly prefer ease of use and light setup over a more 'enterprise-level' solution. You don't need Crystal Reports or someone to code a lot of stuff to make an Access database produce results and allow users to add data as needed.
Why Access isn't a database
It's not meant to handle lots of concurrent connections. One person can hold the lock and there's no negotiating about it: if one person is editing certain parts of the database, it will lock all other users out, or at least limit them to read-only. Also, if you try to use Access with a lot of users or send it many requests via code, it will break after about 10-20 concurrent connections. It's just not meant for the kinds of things Oracle and MySQL are built for. It's meant for the 'everyman' computer user, if you will, but it has a lot of useful things programmers can exploit to make the user experience much better.
So will this be useful for you to learn more about?
I don't see how it would be a bad thing. It's an environment that you can more easily see the relational algebra and understand how to organize your data appropriately. It's a similar argument to colleges that teach Java, C++, or Python and why each has its merits. Since you can immediately move from Access to Access being the front-end (you load links to the tables) for accessing a SQL database, I'm sure you could teach a very good class with it.
MS Access is a good sandpit in which to build databases and learn the basics of database design and structure.
MS Access's SQL implementation is just about equivalent to SQL 1.x syntax. Again, Access is a great app for learning the interaction between queries, tables, and views.
Make sure she doesn't get used to the macros available in Access, as their structure doesn't translate to mainstream RDBMSs. The closest equivalent is stored procedures (sprocs) in a professional RDBMS, but sprocs have a thousandfold more utility and functionality than any Access macro could provide.
Have her play with MS Access to get a look and feel for a DBMS, but once she gets comfortable with database design, have her migrate to MS SQL Express or MySQL, or both. SQL Express is as close to the real thing as you can get without paying for MS SQL Standard. MySQL is good for LAMP web infrastructures.
Search and destroy / capturing illegal data...
The Environment:
I manage a few very "open" databases. The type of access is usually full select/insert/update/delete. The mechanism for accessing the data is usually linked tables (to SQL Server) in custom-built MS Access databases.
The Rules
No social security numbers, etc. (think FERPA/HIPAA).
The Problem
Users enter/hide the illegal data in creative ways (e.g., an SSN in the middle-name field); administrative/disciplinary control is weak/ineffective. The general attitude (even from most of the bosses) is that security is a hassle, and if you find a way around it then good for you, etc. I need a (better) way to find the data after it has been entered.
What I've Tried
Initially, I made modifications to the various custom-built user interfaces folks had (the ones I was aware of), all the way down to the table structures they were linking to on our database server. The SSNs, for example, no longer had a field of their own. And yet... I continue to find them buried in other data fields.
After a secret audit that some folks at my institution did, where they found this buried data, I wrote some SQL that (literally) checks every character in every field of every table in the database, looking for anything that matches an SSN pattern. It takes a long time to run, and the users are finding ways around my pattern definitions.
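For illustration, a single-column version of that kind of scan might look like the T-SQL below; the table and column are hypothetical, and the full audit would loop over every text column via INFORMATION_SCHEMA.COLUMNS and dynamic SQL:

    -- Flag rows whose middle-name field hides something shaped like an SSN,
    -- either dashed (123-45-6789) or as a bare 9-digit run.
    SELECT PersonId, MiddleName
    FROM dbo.People
    WHERE MiddleName LIKE '%[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]%'
       OR PATINDEX('%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', MiddleName) > 0;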
My Question
Of course, a real solution would entail policy enforcement. That has to be addressed (way) above my head, however; it is beyond the scope and authority of my position.
Are you aware of, or do you use, any (free or commercial) tools targeted at auditing for FERPA and HIPAA data? (Or, if not those policies specifically, then just data patterns in general?)
I'd like to find something that I can run on a schedule and that stays updated with new pattern definitions.
I would monitor the users, in two ways.
The same users are likely to be entering the same data, so track who is getting around the roadblocks and identify them. Ensure that they are documented as fouling the system, so that they are disciplined appropriately. Their efforts create risk (monetary and legal, which becomes monetary) for the entire organization.
Look at the queries that users issue. If they are successful in searching for the information, then it is somehow stored in the repository.
If you are unable to track users, begin instituting passwords.
In the long run, though, your organization needs to upgrade its users.
In the end you are fighting an impossible battle unless you have support from management. If it's illegal to store an SSN in your DB, then this rule must have explicit support from the top. #Iterator is right, record who is entering this data and document their actions: implement an audit trail.
Search across the audit trail, not the database itself. This should be quicker: you only have one day (or one hour, or ...) of data to search. Record and publish each violation.
You could tighten up some validation. No numeric field, I'd guess, needs to be as long as an SSN. No name field needs numbers in it. No address field needs more than 5 or 6 digits in it (how many houses are there on Route 66?). Hmm, could a phone number be used to represent an SSN? The trouble is you can't stop someone from entering acaaabdf etc. (encoding 131126, etc.); there's always a way to defeat your checks.
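As a sketch of that tightened validation (hypothetical table and columns; note it only catches digits, so lettered encodings like the one above still slip through):

    -- Reject any digit in the name fields at the database level.
    ALTER TABLE dbo.People
        ADD CONSTRAINT CK_People_NoDigitsInNames
        CHECK (FirstName NOT LIKE '%[0-9]%' AND LastName NOT LIKE '%[0-9]%');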
You'll never achieve perfection, but you can at least catch the accidental offender.
One other suggestion: you can post a new question asking about machine learning plugins (essentially statistical pattern recognition) for your database of choice (MS Access). By flagging some of the database updates as good/bad, you may be able to leverage an automated tool to find the bad stuff and bring it to your attention.
This is akin to spam filters that find the bad stuff and remove it from your attention. However, to get good answers on this, you may need to provide a bit more detail in the question, such as the number of samples you have (if it's not many, an ML plugin would not be useful), your programming skills (for what's known as feature extraction), and so on.
Despite this suggestion, I believe it's better to target the user behavior than to build a smarter mousetrap.
Would anyone please explain which is better to use: SQLite or SQL Server? I was using an XML file as data storage (add, delete, update). Someone suggested using SQLite for faster operations, but I am not familiar with SQLite; I know SQL Server.
SQLite is a great embedded database that you deploy along with your application. If you're writing a distributed application that customers will install, then SQLite has the big advantage of not having any separate installer or maintenance--it's just a single dll that gets deployed along with the rest of your application.
SQLite also runs in process and reduces a lot of the overhead that a database brings--all data is cached and queried in-process.
SQLite integrates with your .NET application better than SQL Server does. You can write custom functions in any .NET language that run inside the SQLite engine yet still within your application's calling process and address space, and can therefore call back into your application to integrate additional data or perform actions while executing a query. This very unusual ability makes certain actions significantly easier.
SQLite is generally a lot faster than SQL Server.
However, SQLite only supports a single writer at a time (meaning the execution of an individual transaction). SQLite locks the entire database when it needs a lock (either read or write) and only one writer can hold a write lock at a time. Due to its speed this actually isn't a problem for low to moderate size applications, but if you have a higher volume of writes (hundreds per second) then it could become a bottleneck. There are a number of possible solutions like separating the database data into different databases and caching the writes to a queue and writing them asynchronously. However, if your application is likely to run into these usage requirements and hasn't already been written for SQLite, then it's best to use something else like SQL Server that has finer grained locking.
UPDATE: SQLite 3.7.0 added a new journal mode called Write-Ahead Logging (WAL) that supports concurrent reading while writing. In our internal multi-process contention test, the timing went from 110 seconds to 8 seconds for the exact same sequence of contentious reads/writes.
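For reference, switching a database to that journal mode is a one-line pragma (SQLite 3.7.0 or later):

    -- Enable write-ahead logging; the setting persists in the database file.
    PRAGMA journal_mode=WAL;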
The two are in different leagues altogether. One is built for enterprise-level data management, the other for mobile devices (embedded or serverless environments). Though SQLite deployments can hold many hundreds of GBs of data, that is not what it is built for.
Updated to reflect the updated question: please read this blog post on SQLite. I hope it helps you and redirects you to resources for programmatically accessing SQLite from .NET.
I'm aware of some of the test data generators out there, but most seem to just fill name and address style databases [feel free to correct me].
We have a large integrated and normalised application - e.g. invoices have part numbers linked to stocking tables, customer numbers linked to customer tables, change logs linked to audit information, etc which are obviously difficult to fill randomly. Currently we obfuscate real life data to get test data (but not very well).
What tools/methods do you use to create large volumes of data to test with?
Where I work we use RedGate Data Generator to generate test data.
Since we work in the banking domain, we have to handle nominative data (credit card numbers, personal IDs, phone numbers), so we developed an application that can mask these database fields, letting us work with them as if they were real data.
I can say that with Redgate you can get close to what your real data looks like on a production server, since you can customize every field of every table in your DB.
You can generate data plans with VSTS Database Edition (with the latest 2008 Power tools).
It includes a Data Generation Wizard which allows automated data generation by pointing it at an existing database, so you get something that is realistic but contains entirely different data.
I've rolled my own data generator that generates random data conforming to regular expressions. The basic idea is to use your validation rules twice: first to generate valid random data, then to validate new input in production.
I've started a rewrite of the utility, as it seems like a nice learning project. It's available on Google Code.
I just completed a project creating 3,500,000+ health insurance claim lines. Due to HIPAA and PHI restrictions, using even scrubbed real data is a PITA. I used a tool called Datatect for this (http://www.datatect.com/).
Some of the things I like about this tool:
- Uses ODBC, so you can generate data into any ODBC data source. I've used this for Oracle, SQL and MS Access databases, flat files, and Excel spreadsheets.
- Extensible via VBScript. You can write hooks at various parts of the data generation workflow to extend the abilities of the tool. I used this feature to "sync up" dependent columns in the database, and to control the frequency distribution of values to align with real-world observed frequencies.
- Referentially aware. When populating foreign key columns, it pulls valid keys from the parent table.
The Red Gate product is good...but not perfect.
I found that I did better when I wrote my own tools to generate the data. The Red Gate product is fine when I want to generate, say, customers... but it's not great if you want to simulate the randomness that customers engage in, like creating orders: some with one item, some with multiple items.
Homegrown tools will provide the most 'realistic' data, I think.
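As a sketch of the kind of homegrown randomness described above, here is one way to give each test customer a varying number of orders in T-SQL; the Customers and Orders tables are hypothetical:

    -- For every customer, emit a random subset of up to five order rows with
    -- random dates in the past year. Each candidate row is kept or dropped
    -- independently, so order counts vary from customer to customer.
    WITH Nums AS (
        SELECT n FROM (VALUES (1), (2), (3), (4), (5)) AS v(n)
    )
    INSERT INTO dbo.Orders (CustomerId, OrderDate)
    SELECT c.CustomerId,
           DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 365, GETDATE())
    FROM dbo.Customers AS c
    JOIN Nums ON Nums.n <= ABS(CHECKSUM(NEWID())) % 5 + 1;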
Joel also mentioned RedGate in podcast #11