When developing a new system - should the db schema always be discussed with the stakeholders? [closed] - database

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I'm several layers/levels above the people involved in the project I'm about to describe.
The general requirement is for a web based issue management system. The system is a small part of a much larger project.
The lead pm has a tech pm who is supposed to handle this portion of the project. The lead pm asked me if it's normal for the help information to not be in the context of where the help was requested. The lead pm was providing feedback about the site and wanted modal dialogs and such for error messages and wanted me to take a look. I'm looking at the system and I'm thinking...
a new app was developed in ColdFusion!?!?
the app has extremely poor data validation
the app data validation page navigates away from the data entry form
the app help page navigates away from the form
the db schema was not discussed between the developer and the pm
the db schema was not discussed because it does not exist
there is a menu page - i.e. once you go to a page, you have to go back to main menu and then go to the next page you want
the lead pm does not know what the dbms is...
there is a tech pm and she does not know what a dbms is...
the lead pm has wanted to fire the tech pm for a long time, but the tech pm is protected...
the lead pm suggested that the exact functionality desired exists in several proprietary projects (several of which are open source - bugtracker, bugzilla, etc.), but the tech pm and dev wouldn't listen.
I have two questions.
Do I
fire the dev?
fire the tech pm and the person protecting her?
fire the lead pm?
download and configure bugtracker/bugzilla for them and then fire all of them?
download and configure bugtracker/bugzilla for them and then go have a beer to forget my sorrows?
and isn't it SOP for the db schema to be discussed and rigorously thought through very early in the project?
EDIT:
I used to work with a wide variety of clients with disparate levels of technical knowledge (and intelligence). I always discussed the db schema with the stakeholder. If they didn't know what a schema was, I would teach them. If they didn't have the background to understand, I would still discuss the schema with them - even if they didn't realize we were talking about the schema. In most of the projects I've been directly involved in, the data is the most important part of the system. Thoroughly hashing out the schema/domain model has been critical in getting to a good understanding of the system and what things can be done and reported on. I have high regard for the opinions of the posters on SO. It's interesting to note that my approach is not the usual course.
BTW - the sad thing is that the project uses taxpayer funds and the IT portion is a collaboration with a prestigious university... the dev and tech pm are long-time employees - they are not inexperienced. I find it especially sad when I know intelligent and hard-working people who are jobless and people like these are employed.
When I was younger, I would report this type of ineptitude up the chain and expect appropriate action. Now that I'm up the chain, I find myself not wanting to micro-manage other people's responsibilities.
My resolution was to have two beers and get back to my responsibilities...

Okay, the first thing, to answer your question: No NO, a thousand times NO! The users are not people you should be discussing db schemata with; in general, you'd as well discuss calculus with a cow. Even if they have the technical background, what about the next time the requirements change; should they be involved in the schema update?
More generally, this sounds like a case where the technical leads let the problem get out of touch with the "customers" or stakeholders. If you're being asked to actually fix the problem, I'd suggest you need to build a GUI prototype of some sort, maybe even just a storyboard, and walk through that. Then you'll have an idea where things stand.
Extended: yes, it WOULD be normal to discuss the DB schema within the project. I'd say you do need to think seriously about some, um, major counseling with the leads.
Extended more: I understand your point, but the thing is that the database schema is an implementation detail. We're so used to databases we let ourselves lose track of that, and end up with applications that, well, look like databases. But the database isn't what delivers customer value; it's whether the customer can do the things they want. If you tie the ways the customer sees the application to the DB schemata, then you tie them to one implementation; a change, such as denormalizing a table in order to make a more efficient system, becomes something you have to show the customer. Better to show them the observables, and keep these details to ourselves.
But I suspect we're having a terminology clash, too. I would have agreed with you on "domain model." If, by db schema, you mean only those tables and relations visible in the user's view of the system, the "use cases" if you will, then we'd be agreeing.

The DATA should be discussed with the stakeholders, absolutely yes. The DB SCHEMA should NOT be discussed with the stakeholders except under special circumstances, where the stakeholders are all "database savvy".
So how can you discuss the DATA without discussing the DB Schema? This is the primary use that I've found for Entity-Relationship (ER) diagrams, and the ER model in general. A lot of database designers tend to treat ER as a watered down version of relational data modeling (RDM). In my experience, it can be used much more profitably if you don't think of it as watered down RDM.
What is one difference between ER and RDM? In RDM, a many to many relationship requires a junction box in the middle. This junction box holds foreign keys that link the junction box to the participants in the many to many relationship.
In ER, when applied strictly, junction boxes are unnecessary in many to many relationships. You just indicate the relationship as a line, and indicate the possibility of "many" at both ends of the line. In fact, ER diagrams don't need foreign keys at all. The concept of linkage by means of foreign keys can be left out of the discussion with most users.
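To make the contrast concrete, here is a minimal sketch of what that single many-to-many line turns into once it reaches a relational schema. The student/course example, the table names, and the use of SQLite through Python's sqlite3 module are all assumptions made for illustration; nothing here comes from the project in the question.

```python
import sqlite3

def build_enrollment_db():
    """Create a hypothetical many-to-many schema: students <-> courses."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE student (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
        CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        -- The junction table. In the ER diagram this is only a line
        -- with "many" marked at both ends.
        CREATE TABLE enrollment (
            student_id INTEGER NOT NULL REFERENCES student(id),
            course_id  INTEGER NOT NULL REFERENCES course(id),
            PRIMARY KEY (student_id, course_id)
        );
    """)
    conn.executemany("INSERT INTO student VALUES (?, ?)",
                     [(1, "Ada"), (2, "Grace")])
    conn.executemany("INSERT INTO course VALUES (?, ?)",
                     [(10, "Databases"), (20, "Compilers")])
    conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                     [(1, 10), (1, 20), (2, 10)])
    conn.commit()
    return conn

def courses_for(conn, student_name):
    """Follow the junction table from a student to their course titles."""
    rows = conn.execute("""
        SELECT c.title
        FROM course c
        JOIN enrollment e ON e.course_id = c.id
        JOIN student s    ON s.id = e.student_id
        WHERE s.name = ?
        ORDER BY c.title
    """, (student_name,)).fetchall()
    return [title for (title,) in rows]
```

A subject matter expert only needs to hear "a student can take many courses, and a course has many students". The enrollment table and its two foreign keys are the part that can stay out of that conversation.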
Data normalization is utterly irrelevant to ER diagramming. A well built ER diagram will have very little harmful redundancy in it, but that's largely serendipity and not the result of careful planning.
The "entities" and "relationships" in a stakeholder oriented ER diagram should only include entities that are understood by the subject matter experts, and not include entities or relationships that are added in the course of logical database design.
Values to be held in a database and served up on demand can be connected to attributes, and attributes can in turn be connected to either entities or relationships among entities. In addition, attributes can be tied to domains, the set of possible values that each attribute can take on. Some values stored in databases, like foreign keys, should be left out of discussions with most stakeholders.
Stakeholders who understand the data generally have an intuitive grasp of these concepts, although the terms "entity", "relationship", "attribute", and "domain", may be unfamiliar to them. Stakeholders who do not understand the subject matter data require special treatment.
The beauty of ER models and diagrams is that they can be used to talk about data not only in databases, but also as the data appears in forms that users can see. If you have any stakeholders that don't understand forms and form fill out, my suggestion is that you try to keep them away from computers, if that's still possible.
It's possible to turn a well built ER diagram into a moderately well built relational schema by a fairly mechanical process. A more creative design process might result in a "better" schema that's logically equivalent. A few technical stakeholders need to understand the relational schema and not merely the ER diagram. Don't show the relational schema to people who don't need to know it.
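As a rough illustration of how mechanical that translation can be, the sketch below turns a tiny ER description into CREATE TABLE statements. The input format, the function names, and the issue-tracker entities are hypothetical, and the rules are deliberately naive: every attribute becomes a TEXT column, and relationships between an entity and itself are not handled.

```python
import sqlite3

def er_to_sql(entities, relationships):
    """Translate a tiny ER description into CREATE TABLE statements.

    entities:      {entity_name: [attribute_name, ...]}
    relationships: [(rel_name, left, right, cardinality)], where
                   cardinality is "1:N" (one left, many right) or "M:N".
    """
    extra_columns = {name: [] for name in entities}
    junction_tables = []
    for rel_name, left, right, cardinality in relationships:
        if cardinality == "1:N":
            # The "many" side carries a foreign key to the "one" side.
            extra_columns[right].append(
                f"{left}_id INTEGER REFERENCES {left}(id)")
        elif cardinality == "M:N":
            # A many-to-many line becomes a junction table.
            junction_tables.append(
                f"CREATE TABLE {rel_name} ("
                f"{left}_id INTEGER NOT NULL REFERENCES {left}(id), "
                f"{right}_id INTEGER NOT NULL REFERENCES {right}(id), "
                f"PRIMARY KEY ({left}_id, {right}_id))")
        else:
            raise ValueError(f"unsupported cardinality: {cardinality}")
    tables = []
    for name, attributes in entities.items():
        columns = ["id INTEGER PRIMARY KEY"]
        columns += [f"{attr} TEXT" for attr in attributes]
        columns += extra_columns[name]
        tables.append(f"CREATE TABLE {name} ({', '.join(columns)})")
    return tables + junction_tables

def create_schema(statements):
    """Run the generated statements against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    for statement in statements:
        conn.execute(statement)
    return conn
```

Calling er_to_sql with entities customer, issue, and tag, plus a 1:N "reports" relationship and an M:N "labelled" relationship, yields three entity tables and one junction table. The "more creative" design work mentioned above (choosing types, constraints, and indexes) is exactly what a generator like this leaves out.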

Well, first you probably should review very carefully the relationship between the tech pm and her sponsor. I'm surprised you say the tech pm is protected when you later imply you can fire the protector. Either she is, or she is not protected. If you can fire the protector, then she is NOT protected.
So it sounds like no-one is protected, and worse - NO-ONE is communicating. I'd recommend the following: call a meeting with the lead pm, the tech pm and the dev. Once together, ask each in turn: "without referencing anything except YOUR work (i.e. you can't blame anyone else for the duration of this exercise), tell me in 5 minutes or less why I should NOT fire you today".
I realize this is extreme advice, but you have described a HORRIBLE solution to a classic problem. Every aspect of this project and the resulting "code" sounds like a disaster. You probably should have had a greater hand in the oversight of this mess, but you didn't (for whatever reason). I realize that you should expect hired professionals at the PM level to do better than this.
Hence my recommendation for a SEVERE shake-up of the team. Once you put the fear of unemployment on the table (and I'd tell them that you are writing up the failure to communicate for each one), then REQUIRE them to post plans for immediate communication improvement PLUS detailed timelines for fixing the mess by the end of the week.
Then get off your own bum because you're now the LEAD-lead PM on this project.
If they shape up and pull off a comeback on this disaster, then slowly start increasing their responsibilities again. If not... there's always a door.
Cheers,
-R

"the lead pm suggested that the exact functionality desired exists in several proprietary projects (several of which are open source - bugtracker, bugzilla, etc.), but the tech pm and dev wouldn't listen."
If this is true, tell the lead pm to be more assertive; then tell him/her to install bugzilla and be done with it. If the tech pm and dev weren't listening because of stubbornness, they need a little chat...
Either way, I'd say you have a problem with your organization... How many thousands of dollars were lost because of a case of "not developed here"? However, given that it reached the point of implementation, there are problems further upstream than the development level...
As far as discussing the db schema with everybody, I'd say no. Everyone who can positively contribute should be involved after the application requirements have been gathered.

Wow, sounds like a disaster. Let me address your points in rough order:
First, people develop in languages they find comfortable. If someone is still comfortable in an older environment when much better alternatives exist, it is a sure sign that they have little appetite for skill acquisition.
Data validation prevents people from going too far down a path only to find it is a blind alley. Lack of validation means the developer isn't thinking about the user. Also, it is not something tacked on at the end...it simply doesn't work that way.
Web "dialogs" cannot be "modal" in the sense you are thinking. However, it is easy enough to pop up an additional window. Help on a page should almost always use a pop up window of this sort.
Data validation should NEVER navigate away from the page where data is entered - this is horrible UI design.
The DB schema is kind of the least of your problems. If the developer is responsible for delivering functionality and is clearly competent in data schema design, I wouldn't think it critical to discuss the nuances of the schema with the lead PM. It should be discussed among various code-level stakeholders and it must be capable of handling the requirements of the work. However, the important thing from the PM's perspective isn't the schema so much as the operational aspects. Of course, if you have no faith in the developer's ability to construct a good db schema, all bets are off.
If you seriously don't know what the dbms is, you may have a serious problem. Do you have a standard? If everyone else in the extended project is using MS SQL Server and this guy chose Oracle, how do you transfer expertise and staff into or out of this project? This is a sign of an organization out of control.
There are two reasons for ignoring alternative proprietary products. First, they may not truly meet your needs. Second, the tech PM and developer may simply be featherbedding or engaging in some nasty 'not invented here' justification for wasting your resources. The problem is that you aren't likely to have enough insight, at your level, to know the difference between the two.
With respect to firing the Dev...is it possible to help him by sponsoring some additional training? If this person is otherwise a good employee and knows your business well, I'd be very hesitant to fire them when all that is needed is a push in the right direction.
The tech PM sounds like she really isn't doing her job. She is the logical person to point out the flaws I am writing about and pushing for improvement. The real question, vis a vis her position, is whether she can learn to be a better advocate for your organizational interests.
The lead PM sounds too passive as well. Comments made above regarding the tech PM apply here as well.
If bugtracker, etc. really work then it would make sense to go that route. However, you might want to be a bit more circumspect about firing people.
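To illustrate the validation points above (check early, and keep the user on the data entry form), here is a minimal server-side sketch. The field names, the limits, the returned dictionary shape, and the choice of status codes are all assumptions, not details of the app in the question.

```python
import re

# Deliberately loose pattern: just enough to catch obvious typos.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_issue_form(form):
    """Return {field: message} for a hypothetical issue form.

    An empty dict means the submission is valid.
    """
    errors = {}
    title = form.get("title", "").strip()
    if not title:
        errors["title"] = "Title is required."
    elif len(title) > 120:
        errors["title"] = "Title must be 120 characters or fewer."
    email = form.get("reporter_email", "").strip()
    if not EMAIL_RE.match(email):
        errors["reporter_email"] = "Enter a valid email address."
    if form.get("priority") not in {"low", "medium", "high"}:
        errors["priority"] = "Choose low, medium, or high."
    return errors

def handle_submit(form):
    """Re-render the SAME form with inline errors, or accept and redirect."""
    errors = validate_issue_form(form)
    if errors:
        # Same view, the user's values preserved, one message per field.
        return {"status": 422, "view": "issue_form",
                "values": dict(form), "errors": errors}
    return {"status": 303, "redirect": "/issues"}
```

The key design choice is that a failed submission returns the same view with the user's input intact and a message next to each offending field, rather than sending them to a separate error page.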

First off, I agree with Charlie Martin about the db schema.
Second,
It sounds like the developer on the project is very green - is this his/her first programming job? If so, I would only fire the dev if their resume says something else.
I don't know how involved the lead/tech pms are expected to be in a project, but it sounds like the responsibility is dev > tech pm > lead pm. If that is the case, then the tech pm completely dropped the ball. You may want to find out why the ball was dropped and fire/keep her based on that, but a botched job like that is reprimand time where I work.
Finally, imho, the "protection" stuff is b.s. - you need to reward and reprimand people based on their quality and value, not who their aunt is.
Good luck! Cheers!

Wow. I feel your pain.
Looks to me as if the first source of your problem is the tech PM who is "protected". Why is she protected and by whom? I once was on a project where the CEO's secretary became first the business analyst and then (after he quit) the project manager because they were having an affair. She didn't know what language we programmed in and thought requirements were a waste of time. Since she was protected by someone as high as possible in the organization, the only real solution was to look elsewhere for employment.
You seem to think you can fire her and her protector so it may be someone lower than you but above the lead PM so he couldn't do anything about it but you can? Yes, you should fire the two of them.
The lead PM may or may not be salvageable depending on who the protector was. He could have been between a rock and a hard place where he knew what to do but, due to the nature of the relationship between the tech pm and her protector, was unable to exert any influence over her and the people who reported to her. I was in that position once where two of my bosses were having an affair with one of my subordinates and it creates all kinds of organizational havoc (which is why the protector must be fired as well as the tech PM). Give him the benefit of the doubt and discuss with him how he would handle things differently if the tech pm and her protector were out of the way. If you like what you hear, you can keep him, but organizationally you will need to step in and make sure that it is clear this person is in charge and no one will be allowed to ignore him. Once a lead has lost authority, he can only get it back with strong backing from management.
I would also sit down with the lead and the developer and explain exactly what is unacceptable in the project as it currently stands. If the developer feels unable to take direction from the lead (assuming you decide to keep him) or is unable to adjust to a new way of doing business or cannot understand why the code as it stands is unacceptable, cut your losses and get rid of him as well. A new person is likely to work better for the lead if he is salvageable anyway because he won't have a history of ignoring him.

I wouldn't necessarily think that the db schema should always be shared with stakeholders. Most people wouldn't know what to do with that sort of information. If you're trying to make sure that the product fits the requirements, the requirements should be clearly laid out up front and verified throughout the development of the project.
If you're having problems with the dev, that's just par for the course. Someone more trustworthy should have been found. If you hired a poor coder, that was your mistake.
There are a few possible solutions:
Get a better coder. He'll hate working through all the bad code but hopefully he'll slog through it till it's done. Hopefully you're willing to pay him good money.
Keep the coder and make him fix it all. Hire a new PM that can manage him better. That coder knows his code best and it might take less time for him to just improve it. In the long run, you're better off not keeping a bad coder on payroll so lose him when you're done.
Suck it up, buy a beer for everyone involved and start over with open source. You'll probably still need a tech PM to manage the software. You'll also have to forget about doing anything custom at that point. Perhaps a contractor could manage this.
Either way, you're gonna lose some money. Should probably keep a closer eye on things next time.

I tend to think of it this way. The database schema is there to support the application's data storage requirements. What data the application needs to store will be determined by the end user's requirements. If you're not consulting your end user as to their requirements for the application you're obviously headed for trouble, but provided you have a good handle on their requirements (and likely future requirements) then database schema is a technical decision which can be made by the project team without direct input from the end user/client.
An end user is unlikely to understand the intricacies of tables, fields, normalization, etc, but they'll understand "the system needs to do xyz". Talk to the end users in a language they understand, and let your team make the appropriate technical decisions.

My big question is about the relationship between the lead pm and the tech pm's protector: did the lead pm have good reason to fear retaliation from the protector? It's entirely possible that he felt unable to do anything until the situation got bad enough that it was clearly important for people above the protector. In that case, he doesn't deserve any more harsh treatment.
The tech pm is apparently incompetent at her job, and her protector is more interested in favoring her than getting the work done. That suggests to me that they need to be dealt with in some fashion, at minimum with a talk about the importance of getting real work done, at maximum firing both of them.
The dev is likely hunkered down, trying to survive politically, given the climate you've outlined. I can't tell enough about the dev to give any advice.
Therefore, if your description and my amazing psychic powers have given me a clear and accurate picture:
Shield the lead pm from retaliation, and tell him to ditch all the crud and implement an off-the-shelf solution. (If he can't select one himself reliably, he shouldn't be lead pm.)
Discipline the tech pm and her protector. You really don't want to have people wrecking enterprise productivity that way.
The dev is the lead pm's responsibility. Leave it at that. Don't micromanage more than you have to. Have a couple of beers. Get back to your usual work.

Related

Why is convention and consistency important while working with data fields/names?

The issue is about good practice with database, form fields, and coding in general.
We run a content providing platform, much like Buzzfeed and Wired. I am currently implementing the OpenGraph meta tags for each post, so that the post links are nicely presented in external websites such as Facebook.
A co-worker from the marketing team insisted that we should put something else other than title in the 'title' field for marketing reasons.
I argued that the Open Graph meta tags should truthfully represent the content of the link, to preserve consistency and convention - that the meta tags should not be treated as one-offs.
However I couldn't further explain as to why I should! I'm not really good with words myself.
Most of the quarrels involve other workers wanting to 'hack' with perfectly fine APIs or implementations and I have to convince them why it is important to at least stay in the safe zone while possible.
I know convention and consistency are among the most important practices in technology, but I think I just got used to the fact and forgot my university lectures on why that is so.
Could I get some thoughts on this issue?
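For readers unfamiliar with the tags under discussion, here is a minimal sketch of rendering them from a post record. It is not the poster's code: the post dictionary and its keys are hypothetical. The og:title, og:type, og:url, and og:image properties are the ones the Open Graph protocol lists as basic metadata; og:description is optional.

```python
from html import escape

def open_graph_tags(post):
    """Render Open Graph meta tags from a hypothetical post dict."""
    properties = {
        "og:title": post["title"],        # the post's real title
        "og:type": "article",
        "og:url": post["url"],
        "og:image": post["image_url"],
        "og:description": post["summary"],
    }
    return "\n".join(
        # Escape quotes and ampersands so content cannot break the attribute.
        f'<meta property="{name}" content="{escape(value, quote=True)}" />'
        for name, value in properties.items()
    )
```

Deriving every tag from the same post record in one function is what makes "the title field holds the title" the default. A marketing override would then have to be an explicit, visible field (say, a separate share title) rather than a silent change to what title means.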
"A co-worker from the marketing team insisted that we should put something else other than title in the 'title' field for marketing reasons."
That's a valid decision. Your job is to help save costs or make money for the business. It is not your job to maintain the Facebook ecosystem as a whole. That's not what you are paid to do.
If you don't have any business reason why this should not be done you have no case. Such a reason could be that Facebook would penalize this or that this creates some development cost or risk.
If this is not a technical decision at all, and I see no reason it would be in the question, it's his decision anyway. In that case you need to inform him of the concerns that you see and let him decide.
You clearly try to work in a mindset that lets self-discipline prevail over short-term gains and quick-and-dirty hacks. Managing to do that is always beneficial in the long run, but convincing managers and/or sales people to let go of the short-term gain is never easy (on the contrary, most of the time it is simply impossible).
Just want to let you know that there are many, many IT folks who "feel your pain". Don't give up your laudable mindset too easily.
Convention in naming makes for source code that is more readily understandable by others who follow the same conventions. That in turn makes for less costly maintenance. Consistency in choosing "appropriate" names for things has similar benefits. Saying on the tin what's inside (and not something completely different or something way too vague and ambiguous) is the best possible practice in computing, but it is the worst possible one in marketing.

PCI DSS SAQ D for a small business with one employee

I'm trying to figure out how to properly fill in the PCI SAQ D compliance form for a startup business with a single owner/architect/developer/admin/QA/etc. - all of them are me alone.
It's a web app for selling a particular intangible service. No card information is going to be stored. The reason for SAQ D: I'd prefer to do some validation logic on my server side and have a total review and confirmation page that matches the rest of the UI.
Hosting environment will be AWS Beanstalk + RDS.
When I read it, common sense tells me to ignore statements like "Interview personnel" or "Review policies & procedures", but I expect that large corporate minds are not usually driven by common sense but by rules.
I can hardly imagine a formal process of interviewing myself and documenting what I've asked and what I've said, let alone the benefit of doing that.
Most of the questions in Requirement 8 make no sense either.
Questions that assume the staff is more than one employee make no sense.
Can those be skipped (N/A-ed) or should I formally do the exercise and generate some funny nonsense?
Thank you!
You can N/A those questions.
Remember the SAQ is a SELF Assessment Questionnaire, not a test you are taking. The payment card industry is more concerned about your adherence to the "spirit" of PCI-DSS than about hard-and-fast rules. It's more about protecting cardholder data than it is complying with things that don't apply to your case. (Although anything that does apply should definitely be followed as a hard rule.)
If you did get audited, it would probably only be because you had a breach, which obviously would NOT be because you didn't "interview yourself" and put on a security ID badge when you sat down in front of your development computer :-D and I don't think you'd have any trouble at all getting that point across to the QSA.
Now, having all your security policies and procedures, network diagrams, firewall, etc. documented and reviewed periodically does apply, since for security guidelines to be followed on a continual basis, they must be reviewed on a continual basis. For these, just use common sense. In other words, go over your firewall rules and such at least as often as PCI-DSS requires and ask yourself, "Do I still need this ALLOW SNMP port 161 rule to be in effect?" etc. etc...Oh dear I think I just told you to interview yourself... :-D
Anyway, you get the idea.
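For a one-person shop, that periodic review can be as simple as a dated list and a script that nags you. The sketch below is only an illustration: the rule records, their field names, and the 180-day default are assumptions, so check the review interval that your version of PCI-DSS actually specifies.

```python
from datetime import date, timedelta

def rules_due_for_review(rules, today, max_age_days=180):
    """Return names of rules whose last review is older than max_age_days.

    `rules` is a list of dicts with hypothetical keys "name" and
    "last_reviewed" (a datetime.date). The 180-day default is a
    placeholder, not a quoted requirement.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [rule["name"] for rule in rules if rule["last_reviewed"] < cutoff]

# Hypothetical inventory of firewall rules and when each was last looked at.
EXAMPLE_RULES = [
    {"name": "ALLOW SNMP port 161", "last_reviewed": date(2023, 12, 1)},
    {"name": "ALLOW HTTPS port 443", "last_reviewed": date(2024, 5, 1)},
]
```

Running rules_due_for_review(EXAMPLE_RULES, date(2024, 7, 1)) flags only the SNMP rule, which is the prompt to ask whether it still needs to exist.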
Are you really really sure you need SAQ D? It's a pretty big undertaking if you're starting from scratch. Is the money flowing into your merchant account? If so you could potentially get away with SAQ A, which is going to make your life WAY easier. If not, then you're probably a SAQ D service provider and you'll have no choice but to do SAQ D. In terms of styling and validation you could use an iFrame solution like Braintree; you have quite a lot of control and it reduces your PCI scope significantly.
In my experience, talking with the bank that holds the merchant account is a good place to start. They're keen for secure systems to be developed, so they are likely to give you advice on what you need to do. You could also engage a QSA, but they are not cheap in general.
I don't think (though I'm not 100% sure) interviewing yourself is required; those instructions are for auditors to use to ensure that policy and procedures are being followed. For lone developers, a big problem is code reviews: you will need someone else to do that.

What should every developer know about databases? [closed]

Closed 9 years ago.
Whether we like it or not, many if not most of us developers either regularly work with databases or may have to work with one someday. And considering the amount of misuse and abuse in the wild, and the volume of database-related questions that come up every day, it's fair to say that there are certain concepts that developers should know - even if they don't design or work with databases today.
What is one important concept that developers and other software professionals ought to know about databases?
The very first thing developers should know about databases is this: what are databases for? Not how do they work, nor how do you build one, nor even how do you write code to retrieve or update the data in a database. But what are they for?
Unfortunately, the answer to this one is a moving target. In the heyday of databases, the 1970s through the early 1990s, databases were for the sharing of data. If you were using a database, and you weren't sharing data you were either involved in an academic project or you were wasting resources, including yourself. Setting up a database and taming a DBMS were such monumental tasks that the payback, in terms of data exploited multiple times, had to be huge to match the investment.
Over the last 15 years, databases have come to be used for storing the persistent data associated with just one application. Building a database for MySQL, or Access, or SQL Server has become so routine that databases have become almost a routine part of an ordinary application. Sometimes, that initial limited mission gets pushed upward by mission creep, as the real value of the data becomes apparent. Unfortunately, databases that were designed with a single purpose in mind often fail dramatically when they begin to be pushed into a role that's enterprise wide and mission critical.
The second thing developers need to learn about databases is the whole data centric view of the world. The data centric world view is more different from the process centric world view than anything most developers have ever learned. Compared to this gap, the gap between structured programming and object oriented programming is relatively small.
The third thing developers need to learn, at least in an overview, is data modeling, including conceptual data modeling, logical data modeling, and physical data modeling.
Conceptual data modeling is really requirements analysis from a data centric point of view.
Logical data modeling is generally the application of a specific data model to the requirements discovered in conceptual data modeling. The relational model is used far more than any other specific model, and developers need to learn the relational model for sure. Designing a powerful and relevant relational model for a nontrivial requirement is not a trivial task. You can't build good SQL tables if you misunderstand the relational model.
Physical data modeling is generally DBMS specific, and doesn't need to be learned in much detail, unless the developer is also the database builder or the DBA. What developers do need to understand is the extent to which physical database design can be separated from logical database design, and the extent to which producing a high speed database can be accomplished just by tweaking the physical design.
The next thing developers need to learn is that while speed (performance) is important, other measures of design goodness are even more important, such as the ability to revise and extend the scope of the database down the road, or simplicity of programming.
Finally, anybody who messes with databases needs to understand that the value of data often outlasts the system that captured it.
Whew!
Good question. The following are some thoughts in no particular order:
Normalization, to at least the second normal form, is essential.
Referential integrity is also essential, with proper cascading delete and update considerations.
Good and proper use of check constraints. Let the database do as much work as possible.
Don't scatter business logic in both the database and middle tier code. Pick one or the other, preferably in middle tier code.
Decide on a consistent approach for primary keys and clustered keys.
Don't over index. Choose your indexes wisely.
Consistent table and column naming. Pick a standard and stick to it.
Limit the number of columns in the database that will accept null values.
Don't get carried away with triggers. They have their use but can complicate things in a hurry.
Be careful with UDFs. They are great but can cause performance problems when you're not aware how often they might get called in a query.
Get Celko's book on database design. The man is arrogant but knows his stuff.
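Several of the points above (referential integrity with cascading deletes, check constraints, limiting nullable columns) can be shown in a few lines. This is a sketch against SQLite through Python's sqlite3 module; the project/issue tables and the helper names are hypothetical, and other DBMSs differ in syntax and defaults.

```python
import sqlite3

def make_db():
    """Build a small hypothetical schema that lets the database enforce rules."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE project (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE issue (
            id         INTEGER PRIMARY KEY,
            -- Referential integrity, with an explicit delete rule.
            project_id INTEGER NOT NULL
                       REFERENCES project(id) ON DELETE CASCADE,
            -- NOT NULL everywhere: no column here accepts nulls.
            title      TEXT NOT NULL CHECK (length(title) > 0),
            -- A check constraint restricts the column to a small domain.
            priority   TEXT NOT NULL
                       CHECK (priority IN ('low', 'medium', 'high'))
        );
    """)
    return conn

def try_execute(conn, sql, params=()):
    """Run one statement; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this schema, an issue with priority 'urgent' or with a project_id that does not exist is rejected by the database itself, and deleting a project removes its issues. Whether a cascade is the right delete rule is a per-relationship decision, which is the "proper consideration" the list item refers to.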
First, developers need to understand that there is something to know about databases. They're not just magic devices where you put in the SQL and get out result sets, but rather very complicated pieces of software with their own logic and quirks.
Second, that there are different database setups for different purposes. You do not want a developer making historical reports off an on-line transactional database if there's a data warehouse available.
Third, developers need to understand basic SQL, including joins.
Past this, it depends on how closely the developers are involved. I've worked in jobs where I was developer and de facto DBA, where the DBAs were just down the aisle, and where the DBAs are off in their own area. (I dislike the third.) Assuming the developers are involved in database design:
They need to understand basic normalization, at least the first three normal forms. Anything beyond that, get a DBA. For those with any experience with US courtrooms (and random television shows count here), there's the mnemonic "Depend on the key, the whole key, and nothing but the key, so help you Codd."
They need to have a clue about indexes, by which I mean they should have some idea what indexes they need and how they're likely to affect performance. This means not having useless indices, but not being afraid to add them to assist queries. Anything further (like the balance) should be left for the DBA.
They need to understand the need for data integrity, and be able to point to where they're verifying the data and what they're doing if they find problems. This doesn't have to be in the database (where it will be difficult to issue a meaningful error message for the user), but has to be somewhere.
They should have the basic knowledge of how to get a plan, and how to read it in general (at least enough to tell whether the algorithms are efficient or not).
They should know vaguely what a trigger is, what a view is, and that it's possible to partition pieces of databases. They don't need any sort of details, but they need to know to ask the DBA about these things.
They should of course know not to meddle with production data, or production code, or anything like that, and they should know that all source code goes into a VCS.
I've doubtless forgotten something, but the average developer need not be a DBA, provided there is a real DBA at hand.
Basic Indexing
I'm always shocked to see a table or an entire database with no indexes, or arbitrary/useless indexes. Even if you're not designing the database and just have to write some queries, it's still vital to understand, at a minimum:
What's indexed in your database and what's not;
The difference between types of scans, how they're chosen, and how the way you write a query can influence that choice;
The concept of coverage (why you shouldn't just write SELECT *);
The difference between a clustered and non-clustered index;
Why more/bigger indexes are not necessarily better;
Why you should try to avoid wrapping filter columns in functions.
Designers should also be aware of common index anti-patterns, for example:
The Access anti-pattern (indexing every column, one by one)
The Catch-All anti-pattern (one massive index on all or most columns, apparently created under the mistaken impression that it would speed up every conceivable query involving any of those columns).
The quality of a database's indexing - and whether or not you take advantage of it with the queries you write - accounts for by far the most significant chunk of performance. 9 out of 10 questions posted on SO and other forums complaining about poor performance invariably turn out to be due to poor indexing or a non-sargable expression.
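To make the point about non-sargable expressions concrete, here is a small sketch that asks SQLite (via Python's sqlite3 module) how it would run two nearly identical queries. The table, index name, and plan helper are assumptions made up for the example, and the exact plan wording differs between engines and versions.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row for sql."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_customer_email ON customer (email)")

# The bare column can be matched against the index.
sargable = plan(conn, "SELECT * FROM customer WHERE email = 'a@example.com'")

# Wrapping the column in a function hides it from the index on email.
non_sargable = plan(conn, "SELECT * FROM customer WHERE lower(email) = 'a@example.com'")

print(sargable)
print(non_sargable)
```

The first plan should mention idx_customer_email, while the second should fall back to scanning the table, even though both queries look equally cheap on the page.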
Normalization
It always depresses me to see somebody struggling to write an excessively complicated query that would have been completely straightforward with a normalized design ("Show me total sales per region.").
If you understand this at the outset and design accordingly, you'll save yourself a lot of pain later. It's easy to denormalize for performance after you've normalized; it's not so easy to normalize a database that wasn't designed that way from the start.
At the very least, you should know what 3NF is and how to get there. With most transactional databases, this is a very good balance between making queries easy to write and maintaining good performance.
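As a sketch of how straightforward "total sales per region" becomes with a normalized design, consider the following. The region/store/sale tables and the sample figures are invented for illustration, with SQLite standing in for a real transactional database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE store  (id INTEGER PRIMARY KEY,
                         region_id INTEGER NOT NULL REFERENCES region(id));
    CREATE TABLE sale   (id INTEGER PRIMARY KEY,
                         store_id INTEGER NOT NULL REFERENCES store(id),
                         amount INTEGER NOT NULL);
    INSERT INTO region VALUES (1, 'North'), (2, 'South');
    INSERT INTO store  VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO sale (store_id, amount) VALUES (1, 100), (2, 50), (3, 70), (1, 25);
""")

# Each fact lives in one place, so the report is two joins and a GROUP BY.
totals = conn.execute("""
    SELECT r.name, SUM(s.amount)
    FROM sale AS s
    JOIN store  AS st ON st.id = s.store_id
    JOIN region AS r  ON r.id = st.region_id
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()

print(totals)
```

If the region name were instead copied into every sale row, or spread across columns, the same report would need string clean-up or per-column sums before any totals could be trusted.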
How Indexes Work
It's probably not the most important, but for sure the most underestimated topic.
The problem with indexing is that SQL tutorials usually don't mention them at all and that all the toy examples work without any index.
Even more experienced developers can write fairly good (and complex) SQL without knowing more about indexes than "An index makes the query fast".
That's because SQL databases do a very good job working as black-box:
Tell me what you need (gimme SQL), I'll take care of it.
And that works perfectly to retrieve the correct results. The author of the SQL doesn't need to know what the system is doing behind the scenes--until everything becomes sooo slooooow.....
That's when indexing becomes a topic. But that's usually very late and somebody (some company?) is already suffering from a real problem.
That's why I believe indexing is the No. 1 topic not to forget when working with databases. Unfortunately, it is very easy to forget it.
Disclaimer
The arguments are borrowed from the preface of my free eBook "Use The Index, Luke". I am spending quite a lot of my time explaining how indexes work and how to use them properly.
I just want to point out an observation: the majority of responses seem to assume that "database" is interchangeable with "relational database". There are also object databases and flat-file databases. It is important to assess the needs of the software project at hand. From a programmer's perspective the database decision can be delayed until later. Data modeling, on the other hand, can be done early on and lead to much success.
I think data modeling is a key component and is a relatively old concept yet it is one that has been forgotten by many in the software industry. Data modeling, especially conceptual modeling, can reveal the functional behavior of a system and can be relied on as a road map for development.
On the other hand, the type of database required can be determined based on many different factors, including environment, user volume, and available local hardware such as hard drive space.
Avoiding SQL injection and how to secure your database
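A minimal sketch of the injection problem and the usual fix (bound parameters), using Python's sqlite3 module. The account table and the two lookup functions are hypothetical and exist only to show the contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (username TEXT PRIMARY KEY, secret TEXT)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", "a1"), ("bob", "b2")])

def find_unsafe(conn, username):
    # Vulnerable: user input is spliced straight into the SQL text.
    sql = "SELECT username FROM account WHERE username = '" + username + "'"
    return conn.execute(sql).fetchall()

def find_safe(conn, username):
    # The driver passes the value separately; it is never parsed as SQL.
    return conn.execute(
        "SELECT username FROM account WHERE username = ?", (username,)
    ).fetchall()

attack = "nobody' OR '1'='1"
print(find_unsafe(conn, attack))  # the injected OR clause matches every row
print(find_safe(conn, attack))    # treated as a literal name: no match
```

Parameter binding covers only the injection half of the advice; permissions, network exposure, and credential handling still need separate attention.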
Every developer should know that this is false: "Profiling a database operation is completely different from profiling code."
There is a clear Big-O in the traditional sense. When you do an EXPLAIN PLAN (or the equivalent) you're seeing the algorithm. Some algorithms involve nested loops and are O( n ^ 2 ). Other algorithms involve B-tree lookups and are O( n log n ).
This is very, very serious. It's central to understanding why indexes matter. It's central to understanding the speed-normalization-denormalization tradeoffs. It's central to understanding why a data warehouse uses a star-schema which is not normalized for transactional updates.
If you're unclear on the algorithm being used do the following. Stop. Explain the Query Execution plan. Adjust indexes accordingly.
Also, the corollary: More Indexes are Not Better.
Sometimes an index focused on one operation will slow other operations down. Depending on the ratio of the two operations, adding an index may have good effects, no overall impact, or be detrimental to overall performance.
I think every developer should understand that databases require a different paradigm.
When writing a query to get at your data, a set-based approach is needed. Many people with an iterative background struggle with this. And yet, when they embrace it, they can achieve far better results, even though the solution may not be the one that first presented itself in their iterative-focussed minds.
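To show the difference in mindset, here is a sketch of the same price change written row by row and then as a single set-based statement. The product table, the 10% increase, and the function names are assumptions made for the example; SQLite via Python's sqlite3 stands in for the real database.

```python
import sqlite3

def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product "
                 "(id INTEGER PRIMARY KEY, category TEXT, price_cents INTEGER)")
    conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                     [(1, "book", 1000), (2, "book", 2500), (3, "toy", 400)])
    return conn

def raise_iteratively(conn):
    # Iterative habit: fetch the rows, loop, issue one UPDATE per row.
    rows = conn.execute(
        "SELECT id, price_cents FROM product WHERE category = 'book'").fetchall()
    for product_id, price in rows:
        conn.execute("UPDATE product SET price_cents = ? WHERE id = ?",
                     (price * 110 // 100, product_id))

def raise_set_based(conn):
    # Set-based: describe the whole change once and let the engine apply it.
    conn.execute("UPDATE product SET price_cents = price_cents * 110 / 100 "
                 "WHERE category = 'book'")

def prices(conn):
    return conn.execute(
        "SELECT id, price_cents FROM product ORDER BY id").fetchall()
```

Both functions leave the table in the same state, but the set-based version is one statement and one round trip no matter how many rows match.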
Excellent question. Let's see: first, no one who does not thoroughly understand joins should consider querying a database. That's like driving a car without knowing where the steering wheel and brakes are. You also need to know datatypes and how to choose the best one.
Another thing that developers should understand is that there are three things you should have in mind when designing a database:
Data integrity - if the data can't be relied on you essentially have no data - this means do not put required logic in the application as many other sources may touch the database. Constraints, foreign keys and sometimes triggers are necessary to data integrity. Don't fail to use them because you don't like them or don't want to be bothered to understand them.
Performance - it is very hard to refactor a poorly performing database and performance should be considered from the start. There are many ways to do the same query and some are known to be faster almost always, it is short-sighted not to learn and use these ways. Read some books on performance tuning before designing queries or database structures.
Security - this data is the life-blood of your company, it also frequently contains personal information that can be stolen. Learn to protect your data from SQL injection attacks and fraud and identity theft.
When querying a database, it is easy to get the wrong answer. Make sure you understand your data model thoroughly. Remember that actual decisions are often made based on the data your query returns. When it is wrong, the wrong business decisions are made. You can kill a company with bad queries, or lose a big customer. Data has meaning; developers often seem to forget that.
Data almost never goes away, think in terms of storing data over time instead of just how to get it in today. That database that worked fine when it had a hundred thousand records, may not be so nice in ten years. Applications rarely last as long as data. This is one reason why designing for performance is critical.
Your database will probably need fields that the application doesn't need to see: things like GUIDs for replication, date-inserted fields, etc. You also may need to store a history of changes and who made them and when, and be able to restore bad changes from this storehouse. Think about how you intend to do this before you come ask a web site how to fix the problem where you forgot to put a where clause on an update and updated the whole table.
Never develop in a newer version of a database than the production version. Never, never, never develop directly against a production database.
If you don't have a database administrator, make sure someone is making backups and knows how to restore them and has tested restoring them.
Database code is code, there is no excuse for not keeping it in source control just like the rest of your code.
Evolutionary Database Design. http://martinfowler.com/articles/evodb.html
These agile methodologies make database change process manageable, predictable and testable.
Developers should know what it takes to refactor a production database in terms of version control, continuous integration and automated testing.
The Evolutionary Database Design process also has administrative aspects; for example, a column is to be dropped only after some lifetime period, and in all databases of the codebase.
At the very least, know that the concept and methodologies of Database Refactoring exist.
http://www.agiledata.org/essays/databaseRefactoringCatalog.html
Classification and process descriptions make it possible to implement tooling for these refactorings too.
About the following comment to Walter M.'s answer:
"Very well written! And the historical perspective is great for people who weren't doing database work at that time (i.e. me)".
The historical perspective is in a certain sense absolutely crucial. "Those who forget history, are doomed to repeat it.". Cfr XML repeating the hierarchical mistakes of the past, graph databases repeating the network mistakes of the past, OO systems forcing the hierarchical model upon users while everybody with even just a tenth of a brain should know that the hierarchical model is not suitable for general-purpose representation of the real world, etcetera, etcetera.
As for the question itself:
Every database developer should know that "Relational" is not equal to "SQL". Then they would understand why they are being let down so abysmally by the DBMS vendors, and why they should be telling those same vendors to come up with better stuff (e.g. DBMSs that are truly relational) if they want to go on sucking hilarious amounts of money out of their customers for such crappy software.
And every database developer should know everything about the relational algebra. Then there would no longer be a single developer left who had to post these stupid "I don't know how to do my job and want someone else to do it for me" questions on Stack Overflow anymore.
From my experience with relational databases, every developer should know:
- The different data types:
Using the correct type for the correct job will make your DB design more robust, your queries faster and your life easier.
- Learn about 1xM and MxM:
This is the bread and butter for relational databases. You need to understand one-to-many and many-to-many relations and apply them when appropriate.
- "K.I.S.S." principle applies to the DB as well:
Simplicity always works best. Provided you have studied how databases work, you will avoid unnecessary complexity, which leads to maintenance and speed problems.
- Indices:
It's not enough to know what they are. You need to understand when to use them and when not to.
also:
Boolean algebra is your friend
Images: Don't store them on the DB. Don't ask why.
Test DELETE with SELECT
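The "Test DELETE with SELECT" tip can be sketched as follows: run the WHERE clause in a SELECT first, check that it matches only what you expect, and then reuse the identical clause in the DELETE. The session table and the condition are hypothetical; note that the clause here is a fixed string written by the developer, not user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session "
             "(id INTEGER PRIMARY KEY, owner TEXT, expired INTEGER)")
conn.executemany("INSERT INTO session VALUES (?, ?, ?)",
                 [(1, "ann", 1), (2, "bob", 0), (3, "cy", 1)])

where = "expired = 1"  # the condition under test

# Step 1: look at exactly which rows the condition selects.
preview = conn.execute(
    "SELECT id, owner FROM session WHERE " + where + " ORDER BY id").fetchall()
print(preview)

# Step 2: only after reviewing the preview, delete with the same condition.
deleted = conn.execute("DELETE FROM session WHERE " + where).rowcount
remaining = conn.execute("SELECT id FROM session ORDER BY id").fetchall()
```

Comparing the preview count with the number of rows actually deleted is a cheap sanity check before committing.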
I would like everyone, both DBAs and developer/designer/architects, to better understand how to properly model a business domain, and how to map/translate that business domain model into both a normalized database logical model, an optimized physical model, and an appropriate object oriented class model, each one of which is (can be) different, for various reasons, and understand when, why, and how they are (or should be) different from one another.
I would say strong basic SQL skills. I've seen a lot of developers so far who know a little about databases but are always asking for tips about how to formulate a quite simple query. Queries are not always that easy and simple. You do have to use multiple joins (inner, left, etc.) when querying a well normalized database.
I think a lot of the technical details have been covered here and I don't want to add to them. The one thing I want to say is more social than technical: don't fall for the "DBA knows best" trap as an application developer.
If you are having performance issues with a query, take ownership of the problem too. Do your own research and push for the DBAs to explain what's happening and how their solutions are addressing the problem.
Come up with your own suggestions too after you have done the research. That is, I try to find a cooperative solution to the problem rather than leaving database issues to the DBAs.
Simple respect.
It's not just a repository
You probably don't know better than the vendor or the DBAs
You won't support it at 3 a.m. with senior managers shouting at you
Consider Denormalization as a possible angel, not the devil, and also consider NoSQL databases as an alternative to relational databases.
Also, I think the Entity-Relationship model is a must-know for every developer, even if you don't design databases. It'll let you understand thoroughly what your database is all about.
Never insert data with the wrong text encoding.
Once your database becomes polluted with multiple encodings, the best you can do is apply some kind of combination of heuristics and manual labor.
Aside from syntax and conceptual options they employ (such as joins, triggers, and stored procedures), one thing that will be critical for every developer employing a database is this:
Know how your engine is going to perform the query you are writing with specificity.
The reason I think this is so important is simply production stability. You should know how your code performs so you're not stopping all execution in your thread while you wait for a long function to complete, so why would you not want to know how your query will affect the database, your program, and perhaps even the server?
This is actually something that has hit my R&D team more times than missing semicolons or the like. The presumption is that the query will execute quickly because it does on their development system with only a few thousand rows in the tables. Even if the production database is the same size, it is more than likely going to be used a lot more, and thus suffer from other constraints like multiple users accessing it at the same time, or something going wrong with another query elsewhere, thus delaying the result of this query.
Even simple things like how joins affect performance of a query are invaluable in production. There are many features of many database engines that make things easier conceptually, but may introduce gotchas in performance if not thought of clearly.
Know your database engine execution process and plan for it.
For a middle-of-the-road professional developer who uses databases a lot (writing/maintaining queries daily or almost daily), I think the expectation should be the same as any other field: You wrote one in college.
Every C++ geek wrote a string class in college. Every graphics geek wrote a raytracer in college. Every web geek wrote interactive websites (usually before we had "web frameworks") in college. Every hardware nerd (and even software nerds) built a CPU in college. Every physician dissected an entire cadaver in college, even if she's only going to take my blood pressure and tell me my cholesterol is too high today. Why would databases be any different?
Unfortunately, they do seem different, today, for some reason. People want .NET programmers to know how strings work in C, but the internals of your RDBMS shouldn't concern you too much.
It's virtually impossible to get the same level of understanding from just reading about them, or even working your way down from the top. But if you start at the bottom and understand each piece, then it's relatively easy to figure out the specifics for your database. Even things that lots of database geeks can't seem to grok, like when to use a non-relational database.
Maybe that's a bit strict, especially if you didn't study computer science in college. I'll tone it down some: You could write one today, completely, from scratch. I don't care if you know the specifics of how the PostgreSQL query optimizer works, but if you know enough to write one yourself, it probably won't be too different from what they did. And you know, it's really not that hard to write a basic one.
The order of columns in a non-unique index is important.
The first column should be the column that has the most variability in its content (i.e. cardinality).
This is to aid SQL Server's ability to create useful statistics on how to use the index at runtime.
Understand the tools that you use to program the database!!!
I wasted so much time trying to understand why my code was mysteriously failing.
If you're using .NET, for example, you need to know how to properly use the objects in the System.Data.SqlClient namespace. You need to know how to manage your SqlConnection objects to make sure they are opened, closed, and when necessary, disposed properly.
You need to know that when you use a SqlDataReader, it is necessary to close it separately from your SqlConnection. You need to understand how to keep connections open when appropriate and how to minimize the number of hits to the database (because they are relatively expensive in terms of computing time).
Basic SQL skills.
Indexing.
Deal with different incarnations of DATE/ TIME/ TIMESTAMP.
JDBC driver documentation for the platform you are using.
Deal with binary data types (CLOB, BLOB, etc.)
For some projects, an Object-Oriented model is better.
For other projects, a Relational model is better.
The impedance mismatch problem, and the common deficiencies of ORMs.
RDBMS Compatibility
Check whether the application needs to run on more than one RDBMS. If yes, it might be necessary to:
avoid RDBMS SQL extensions
eliminate triggers and stored procedures
follow strict SQL standards
convert field data types
change transaction isolation levels
Otherwise, these questions should be treated separately and different versions (or configurations) of the application would be developed.
Don't depend on the order of rows returned by an SQL query.
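A tiny sketch of the rule about row order, using SQLite through Python's sqlite3 module; the city table is made up for the example. Without ORDER BY the engine is free to return rows in whatever order its access path produces, so the only dependable order is one you ask for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT)")
conn.executemany("INSERT INTO city VALUES (?)",
                 [("Oslo",), ("Lima",), ("Cairo",)])

# No ORDER BY: the order is an implementation detail and may change
# with indexes, engine versions, or the chosen query plan.
unordered = [row[0] for row in conn.execute("SELECT name FROM city")]

# ORDER BY: the order is part of the query's contract.
ordered = [row[0] for row in conn.execute("SELECT name FROM city ORDER BY name")]
print(ordered)
```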
Three (things) is the magic number:
Your database needs version control too.
Cursors are slow and you probably don't need them.
Triggers are evil*
*almost always

How to get a customer to understand the importance of a qualified DBA?

I'm part of a software development company where we do custom developed applications for our clients.
Our software uses MS SQL Server and we have encountered some customers which do not have a DBA on staff to manage the databases or if they do, they lack the necessary knowledge to perform their job adequately.
We are in the process of drafting a contract with one of those customers to provide development services for new functionality on our software during the next year, where they have an amount of hours available for customization of our software.
Now they want us to include also a quote for database administration services and the problem is that they are including a clause that says that those services will be provided only when they request it.
My first reaction is that db administration is an ongoing process and not something that they can call us once a month to come for a day or two. I'm talking about a central 1TB+ MSSql Cluster and 100 branch offices with MSSql Workgroup edition.
My question is for any suggestions on how I could argue that there must be a fixed amount of hours every month for dba work and not only when their management thinks they need it (which I’m guessing would only be when they have a problem).
PS: Maybe this will be closed as not programming related. But I'm a programmer and I have this problem. My work is software development, but I don't want to lose this client, and the only solution I can think of is to find a way for the client to understand the scope so we can hire a qualified DBA to provide them with the service they require.
Edit: We are in a Latin American country with clients in the Spanish speaking region. My guess is that in more developed countries there is a culture that knows how delicate the situation is.
This is definitely one of those 'you can lead a horse to water, but you can't make them drink' situations.
My recommendation here would be to quote the DBA services as hourly, and make the rate high enough that you can outsource the work if you decide you want to. When (not if) the SQL servers start to have problems, the firm is on the hook.
I would also recommend that you include in your quote a non-optional 2 hour database technology review once per year. This is your opportunity to say 'You spent XXX on database maintenance this year, most of which was spent fighting fires that could have been easily avoided if you had just spent XXXX/4 and hired a DBA. We care about you as a customer, and we want you to save money, so we really recommend that you commit to using a DBA to perform periodic preventative maintenance'.
I would also recommend that you categorize any support requests as having root cause b/c of database maintenance vs other causes. This will let you put a nice pie chart in front of the customer during their annual review (which they are going to pay you to perform). It is critical to manage the perception so they don't think your code is causing the problems. You might even go so far as to share these metrics (db related issue vs non-db related issue) with them on a quarterly basis.
Sometimes people need to experience pain before they change. The key is to not be in between the hammer and their thumb as they learn the lesson, and hourly quoted work is one way of doing this.
As a side note, this sort of question is of great interest to a large number of developers. I'd say that this sort of thing could impact a programmer's quality of life more than any algorithm or library question ever could. Thanks for asking it!
No DBA on a system that size is a disaster waiting to happen. If they don't understand that, they are not qualified to run a database that size. I'd recommend that they talk to other companies with similar sized databases and have them ask them about their DBAs and what they do for them, and if they think they could survive without them.
Perhaps the link below from MS SQL Tips could give you some good talking points. But people who aren't technical won't respond to a technical explanation of the necessity of a good DBA; you are likely going to have to work toward proving the cost of a bad DBA. Work out the worst case scenarios and see how they feel about them. If you can make it seem like a good financial move (and I think we all know it is) it will be an easy sell.
http://www.mssqltips.com/tip.asp?tip=1278

Trialware/licensing strategies [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I wrote a utility for photographers that I plan to sell online pretty cheap ($10). I'd like to allow the user to try the software out for a week or so before asking for a license. Since this is a personal project and the software is not very expensive, I don't think that purchasing the services of professional licensing providers would be worth it and I'm rolling my own.
Currently, the application checks for a registry key that contains an encrypted string that either specifies when the trial expires or that they have a valid license. If the key is not present, a trial period key is created.
So all you would need to do to get another week for free is delete the registry key. I don't think many users would do that, especially when the app is only $10, but I'm curious if there's a better way to do this that is not onerous to the legitimate user. I write web apps normally and haven't dealt with this stuff before.
The app is in .NET 2.0, if that matters.
EDIT: You can make your current licensing scheme considerably more difficult to crack by storing the registry information in the Local Security Authority (LSA). Most users will not be able to remove your key information from there. A search for LSA on MSDN should give you the information you need.
Opinions on licensing schemes vary with each individual, more among developers than specific user groups (such as photographers). You should take a deep breath and try to see what your target user would accept, given the business need your application will solve.
This is my personal opinion on the subject. There will be vocal individuals that disagree.
The answer to this depends greatly on how you expect your application to be used. If you expect the application to be used several times every day, you will benefit the most from a very long trial period (several months), to create a lock-in situation. For this to work you will have to have a grace period where the software alerts the user that payment will be needed soon. Before the grace period you will have greater success if the software is silent about the trial period.
Whether or not you choose to believe in this quite bold statement is of course entirely up to you. But if you do, you should realize that the less often your application will be used, the shorter the trial period should be. It is also very important that payment is very quick and easy for the user (as little data entry and as few clicks as possible).
If you are very uncertain about the usage of the application, you should choose a very short trial period. You will, in my experience, achieve better results if the application is silent about the fact that it is in trial period in this case.
Though effective for licensing purposes, "call home" features are regarded as a privacy threat by many people. Personally I disagree with the notion that this is in any way bad for a customer who is willing to pay for the software he/she is using. Therefore I suggest implementing a licensing scheme where the application checks the license status (trial, paid) on a regular basis, and helps the user pay for the software when it's time. This might be overkill for a small utility application, though.
For very small, or even simple, utility applications, I argue that upfront payment without trial period is the most effective.
Regarding the security of the solution, you have to make it proportional to the development effort. In my line of work, security is very critical because there are partners and dealers involved, and because the investment made in development is very high. For a small utility application, it makes more sense to price it right and rely on the honest users who will pay for software that addresses their business needs.
There's not much point to doing complicated protection schemes. Basically one of two things will happen:
Your app is not popular enough, and nobody cracks it.
Your app becomes popular, someone cracks it and releases it, then anybody with zero knowledge can simply download that crack if they want to cheat you.
In the case of #1, it's not worth putting a lot of effort into the scheme, because you might make one or two extra people buy your app. In the case of #2, it's not worth putting a lot of effort because someone will crack it anyway, and the effort will be wasted.
Basically my suggestion is just do something simple, like you already are, and that's just as effective. People who don't want to cheat / steal from you will pay up, people who want to cheat you will do it regardless.
If you are hosting your homepage on a server that you control, you could have the downloadable trial version of your software automatically compile to a new binary every night. This compile will replace a hardcoded datetime value in your program for when the software expires. That way the only way to "cheat" is to change the date on your computer, and most people won't do that because of the problems that will create.
Try the Shareware Starter Kit. It was developed by Microsoft and may have some other features you want.
http://msdn.microsoft.com/en-us/vs2005/aa718342.aspx
If you are planning to continue developing your software, you might consider the ransom model:
http://en.wikipedia.org/wiki/Street_Performer_Protocol
Essentially, you develop improvements to the software, and then ask for a certain amount of donations before you release them (without any DRM).
One way to do it that's easy for the user but not for you is to hard-code the expiry date and make new versions of the installer every now and then... :)
If I were you though, I wouldn't make it any more advanced than what you're already doing. Like you say it's only $10, and if someone really wants to crack your system they will do it no matter how complicated you make it.
You could do a slightly more advanced version of your scheme by requiring a net connection and letting a server generate the trial key. If you do something along the lines of sign(hash(unique_computer_id+when_to_expire)) and let the app check with a public key that your server has signed the expiry date it should require a "real" hack to bypass.
This way you can store the unique IDs server-side and refuse to generate an expiry date more than once or twice. Not sure what to use as the unique ID, but there should be some way to get something useful from Windows.
I am facing the very same problem with an application I'm selling for a very low price as well.
Besides obfuscating the app, I came up with a system that uses two keys in the registry, one of which is used to determine the time of installation, the other one the actual license key. The keys are named obscurely and a missing key indicates tampering with the installation.
Of course deleting both keys and reinstalling the application will start the evaluation time again.
I figured it doesn't matter anyway, as someone who wants to crack the app will succeed in doing so, or find a crack by someone who succeeded in doing so.
So in the end I'm only achieving the goal of making it not TOO easy to crack the application, and this is what, I guess, will stop 80-90% of the customers from doing so. And after all: as the application is sold for a very low price, there's no justification for me to invest any more time into this issue than I already have.
Just be cool about the license. Explain up front that this is your passion and a child of your labor. Give people a chance to do the right thing. If someone wants to pirate it, it will happen eventually. I still remember my despair seeing my books on BitTorrent, but it's something you just have to deal with. Don't cave to casual piracy (what you're doing now sounds great) but don't cripple the thing beyond that.
I still believe that there are enough honest people out there to make a for-profit coding endeavor worthwhile.
Don't base the evaluation on "days since install"; instead count the number of days used, or the number of times run, or something similar. People tend to download shareware, run it once or twice, and then forget it for a few weeks until they need it again. By then, the trial may have expired, so they've only had a few tries to get hooked on using your app, even though they've had it installed for a while. Counting launches or days of actual use instead lets them get into a habit of using your app for a task, and also makes a stronger sell (i.e. you've used this app 30 times...).
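A rough sketch of counting days of actual use rather than days since install. The state dictionary, its field names, and the 30-day allowance are assumptions for illustration; where the state is persisted (file, registry, etc.) is left out:

```python
from datetime import date


def record_launch(state: dict, today: date) -> dict:
    """Return updated trial state after one launch of the application."""
    days = set(state.get("days_used", []))
    days.add(today.isoformat())          # the same day is only counted once
    return {
        "days_used": sorted(days),
        "launches": state.get("launches", 0) + 1,
    }


def trial_days_left(state: dict, allowed_days: int = 30) -> int:
    """Days of use remaining; calendar time while idle costs nothing."""
    return max(0, allowed_days - len(state.get("days_used", [])))
```

The `launches` counter is what you would show in a reminder such as "you've used this app 30 times".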
Even better, limiting the features works better than timing out. For example, perhaps your photography app could limit the user to 1 megapixel images, but let them use it for as long as they want.
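As an illustration of a feature limit like the 1 megapixel example, the trial build could clamp the output size before saving. This is only a sketch; `trial_export_size` and its parameters are hypothetical names:

```python
import math


def trial_export_size(width: int, height: int, max_pixels: int = 1_000_000):
    """Return (width, height) scaled down to fit the trial's pixel budget.

    Images already within the limit are returned unchanged; larger ones
    keep their aspect ratio (up to rounding down to whole pixels).
    """
    pixels = width * height
    if pixels <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / pixels)
    return max(1, int(width * scale)), max(1, int(height * scale))
```

The paid version would simply skip this step, so there is no clock or counter to reset.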
Also, consider pricing your app at $20 (or $19.95). Unless there's already a micropayment setup in place (like the iPhone store or Xbox Live or something), people tend to have an aversion to buying things online below a certain price point (which is around $20 depending on the type of app), and people subconsciously assume that if something is inexpensive, it must not be very good. You can actually raise your conversion rate with a higher price (up to a point, of course).
In these sorts of circumstances, I don't really think it matters what you do. If you have some kind of protection, it will stop 90% of your users. The other 10%, if they don't want to pay for your software, will pretty much find a way around the protection no matter what you do.
If you want something a little less obvious you can put a file in System32 that sounds like a system file that the application checks the existence of on launch. That can be a little harder to track down.

Resources