I'm coding a new {monthly|yearly} paid site with the now typical "referral" system: when a new user signs up, they can specify the {username|referral code} of another user (this can be detected automatically if they came through a special URL), which will cause the referrer to earn a percentage of anything the new user pays.
Before reinventing the wheel, I'd like to know if any of you have experience with storing this kind of data in a relational DB. Currently I'm using MySQL, but I believe any good solution should be easily adapted to any RDBMS, right?
I'm looking to support the following features:
Online billing system - once each invoice is paid, earnings for referrals are calculated and they will be able to cash-out. This includes, of course, having the possibility of browsing invoices / payments online.
Paid options vary - they differ in nature and in cost (and costs will change over time), so commissions should be calculated based on each final invoice.
Keeping track of referrals (relationship between users, date on which the referral happened, and any other useful information - any ideas?)
A simple way to access historical referral data (how much has been paid) or accrued commissions.
In the future, I might offer to exchange accrued cash for subscription renewal (covering the whole of the new subscription or just a part of it, having to pay the difference if needed)
Multiple levels - I'm thinking of paying something around 10% of direct referred earnings + 2% the next level, but this may change in the future (add more levels, change percentages), so I should be able to store historical data.
Note that I'm not planning to use this in any other project, so I'm not worried about it being "plug and play".
Have you done any work with similar requirements? If so, how did you handle all this stuff? Would you recommend any particular DB schema? Why?
Is there anything I'm missing that would help make this a more flexible implementation?

Rather marvellously, there's a library of database schemas. Although I can't see something specific to referrals, there may be something related. At least (hopefully) you should be able to get some ideas.
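To make the requirements in the question a bit more concrete, here is a minimal sketch of one possible schema, not a recommendation from experience. It uses SQLite through Python only so that it runs anywhere; the same tables should translate to MySQL. All table, column and function names (`users`, `commission_rates`, `commissions`, `record_commissions`) are made up for illustration, and it assumes levels are numbered 1, 2, ... without gaps. The rate is copied into each commission row so that later percentage changes don't rewrite history.

```python
import sqlite3
from decimal import Decimal

# Hypothetical schema: every name here is an assumption, not an established design.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    referrer_id INTEGER REFERENCES users(id),  -- NULL when nobody referred this user
    referred_at TEXT
);
CREATE TABLE commission_rates (
    level INTEGER NOT NULL,      -- 1 = direct referrer, 2 = the referrer's referrer
    percent TEXT NOT NULL,       -- kept as text so decimals stay exact
    valid_from TEXT NOT NULL,    -- ISO date, so string comparison sorts correctly
    PRIMARY KEY (level, valid_from)
);
CREATE TABLE invoices (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    amount TEXT NOT NULL,
    paid_at TEXT
);
CREATE TABLE commissions (
    id INTEGER PRIMARY KEY,
    invoice_id INTEGER NOT NULL REFERENCES invoices(id),
    beneficiary_id INTEGER NOT NULL REFERENCES users(id),
    level INTEGER NOT NULL,
    percent TEXT NOT NULL,       -- rate copied at payment time, for history
    amount TEXT NOT NULL
);
"""


def record_commissions(conn, invoice_id):
    """Walk up the referral chain and store one commission row per level."""
    user_id, amount, paid_at = conn.execute(
        "SELECT user_id, amount, paid_at FROM invoices WHERE id = ?",
        (invoice_id,),
    ).fetchone()
    # For each level, pick the rate that was in force when the invoice was paid.
    rates = conn.execute(
        """
        SELECT r.level, r.percent FROM commission_rates r
        WHERE r.valid_from = (
            SELECT MAX(c.valid_from) FROM commission_rates c
            WHERE c.level = r.level AND c.valid_from <= ?
        )
        ORDER BY r.level
        """,
        (paid_at,),
    ).fetchall()
    current = user_id
    for level, percent in rates:
        referrer = conn.execute(
            "SELECT referrer_id FROM users WHERE id = ?", (current,)
        ).fetchone()[0]
        if referrer is None:
            break  # the chain ends before this level
        earned = (Decimal(amount) * Decimal(percent) / 100).quantize(Decimal("0.01"))
        conn.execute(
            "INSERT INTO commissions (invoice_id, beneficiary_id, level, percent, amount)"
            " VALUES (?, ?, ?, ?, ?)",
            (invoice_id, referrer, level, percent, str(earned)),
        )
        current = referrer
    conn.commit()
```

Accrued earnings for a user are then a sum over `commissions` for that `beneficiary_id`, and cash-outs or renewals paid from the balance could be recorded as separate rows in a similar table.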


How to design the logic or database when two users choose one product?

Assuming an e-commerce web app receives a high volume of requests, how do I prevent two users from choosing the only product left? Should I check the quantity when the item is added to the shopping list or at payment? Is using a field in the DB to record the quantity of the selected product a bad approach? How do large e-commerce web apps like Amazon deal with this conflict problem?
Several options that I know of:
For an RDBMS that supports ACID, you can use optimistic locking on the product table. Unless many users very often hit the buy button on the same product at nearly the same time, it should work pretty well. (How many users "many" means is something you have to measure. I think 1k should be no problem, but that's just my guess, so don't take it for granted.)
Do not check it, and let users buy. Adjust the business flow to handle it. For example, when a user hits the buy button, tell him his order has been accepted and will be processed, but that there is no guarantee he will be able to buy it. Then, at a later stage, if you find that there is not enough inventory to ship the product to him, send an email to apologise and refund him.
Also, in real business it is common for the product inventory to go negative while orders are still accepted, with the user told he will get the product XXX days later. The business can then produce or order more product from the supplier after receiving the money.
If you buy an iPhone on the Apple web site, it also works like this.
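The optimistic locking option can be sketched roughly as follows. This is only an illustration using SQLite from Python; the `products` table, its `version` column and the helper names `make_demo_db` and `reserve_one` are assumptions made up for the example. The idea is that the `UPDATE` only succeeds if nobody else changed the row since it was read.

```python
import sqlite3


def make_demo_db():
    """In-memory table with a version column used for optimistic locking."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        " id INTEGER PRIMARY KEY,"
        " quantity INTEGER NOT NULL,"
        " version INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO products (id, quantity) VALUES (1, 1)")
    conn.commit()
    return conn


def reserve_one(conn, product_id, max_retries=3):
    """Try to take one unit of stock; return True on success, False otherwise."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT quantity, version FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        if row is None or row[0] <= 0:
            return False  # unknown product or sold out
        _, version = row
        # The WHERE clause on version makes the write fail (0 rows changed)
        # if another buyer updated the row after we read it.
        cur = conn.execute(
            "UPDATE products SET quantity = quantity - 1, version = version + 1"
            " WHERE id = ? AND version = ?",
            (product_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True
        # Someone else got in first: re-read and try again.
    return False
```

With one unit in stock, the first call to `reserve_one(conn, 1)` succeeds and the second returns `False`. Whether to call this when the item goes into the basket or only at payment is the business decision discussed above.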
It really depends on the number of concurrent users. With millions of users, the NoSQL approach is preferred: manage the basket with eventual consistency, then run the buying process with ACID guarantees to ensure the product can actually be sold.
For fewer users, you can rely on an ACID database.
If you are not sure, you may go with a database that has ACID capabilities but can also work in an eventually consistent way, or that supports sharding for scalability. To my knowledge Oracle can do all three: COMMIT NO WAIT, COMMIT, and sharded deployment.

Best way to build a DataMart from multiple external systems?

I'm in the planning stages of building a SQL Server DataMart for mail/email/SMS contact info and history. Each piece of data is located in a different external system. Because of this, email addresses do not have account numbers and SMS phone numbers do not have email addresses, etc. In other words, there isn't a shared primary key. Some data overlaps, but there isn't much I can do except keep the most complete version when duplicates arise.
Is there a best practice for building a DataMart with this data? Would it be an acceptable practice to create a key table with a column for each external key? Then, a unique primary ID can be assigned to tie this to other DataMart tables.
Looking for ideas/suggestions on approaches I may not have yet thought of.
The email address or phone number itself sounds like a suitable business key. Typically a "staging" database is used to load the data from multiple sources and then assign surrogate keys and do other transformations.
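As a rough illustration of the staging step, the sketch below normalizes each business key (email address or phone number) and hands out one surrogate key per distinct value. It is plain Python with made-up names (`KeyMap`, `normalize`, `surrogate_for`); in a real staging database this would be a key table with an identity column, and the normalization rules here are only assumptions.

```python
from itertools import count


def normalize(kind, value):
    """Bring a business key into one canonical form before matching."""
    if kind == "email":
        return value.strip().lower()
    if kind == "phone":
        # Keep digits only, so "+1 (555) 010-0000" and "15550100000" match.
        return "".join(ch for ch in value if ch.isdigit())
    return value.strip()


class KeyMap:
    """Assigns a stable surrogate key to each (kind, business key) pair."""

    def __init__(self):
        self._ids = {}
        self._next_id = count(1)

    def surrogate_for(self, kind, value):
        key = (kind, normalize(kind, value))
        if key not in self._ids:
            self._ids[key] = next(self._next_id)
        return self._ids[key]
```

Duplicates coming from different source systems then collapse onto the same surrogate key, which is the ID the other DataMart tables would reference. Note that this does not link an email address to a phone number; without a shared attribute in the sources, nothing can.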
Are you familiar with data warehouse methods and design patterns? If you don't have previous knowledge or experience then consider hiring some help. BI / data warehouse projects have a very high failure rate and mistakes can be expensive.
More information can be found under "Dealing with keys" in the article on extract, transform, load (ETL).
Well, with no other information to tie the disparate pieces together, your datamart is going to be pretty rudimentary. You'll be able to get the types of data (sms, email, mail), metrics for each type over time ("this week/month/quarter/year we averaged 42.5 sms texts per day, and 8000 emails per month! w00t!"). With just phone numbers and email addresses, your "other datamarts" will likely have to be phone company names, or internet domains. I guess you could link from that into some sort of geographical information (internet provider locations?), or maybe financial information for the companies. Kind of a blur if you don't already know which direction you want to head.
To be honest, this sounds like someone high-up is having a knee-jerk reaction to the "datamart" buzzword coupled with hearing something about how important communication metrics are, so they sent orders on down the chain to "get us some datamarts to run stats on all our e-mails!"
You need to figure out what it is that you or your employer is expecting to get out of this project, and then figure out if the data you're currently collecting gives you a trail to follow to that information. Right now it sounds like you're doing it backwards ("I have this data, what's it good for?"). It's entirely possible that you don't currently have the data you need, which means you'll need to buy it (who knows if you could) or start collecting it, in which case you won't have nice looking graphs and trend-lines for upper-management to look at for some time... falling right in line with the warning dportas gave you in his second paragraph ;)

How to get book metadata?

My application needs to retrieve information about any published book based on a provided ISBN, title, or author. This is hardly a unique requirement: a number of websites, and even software like Book Collector, seem to be able to do this easily. But I have not been able to replicate it.
To clarify, I do not need to search the entire database of books---only a limited subset which have been inputted, as in a book collection. The database would simply allow me to tag the inputted books with the necessary metadata to enable search on that subset of books. So scale is not the issue here---getting the metadata is.
The options I have tried are:
Scrape Amazon. Scraping the regular Amazon pages was not very robust to things like missing authors, and while scraping the smaller mobile pages was faster, they shared the same issues with robustness of extraction. Plus, building this into an application is a clear violation of Amazon's Terms of Service.
Scrape the Library of Congress. While this seems to have fewer legal ramifications, ease and robustness were again issues.
API. While the service is free up to a point, and does a good job of returning the necessary metadata, I need to do this for over 500 books on a daily basis, at which point this service costs money proportional to use. I'd prefer a free or one-time payment solution that allows me to do the same.
Google Book Data API. While this seems to provide the information I need, I cannot display the book preview as their terms of service requires.
Buy a license to a database of books. For example, companies like Ingram or Baker & Taylor provide these catalogs to retailers and libraries. This solution is obviously expensive, so I'm hoping that there's a more elegant solution I've missed. But if not, and someone on SO has had a good experience with a particular database, I'm willing to go with that.
I've tried to describe my approach in detail so others with fewer books can take advantage of the above solutions. But given my requirements, I'm at my wits' end for retrieving book metadata.
Since it is unlikely that you have to retrieve the same 500 books every day, store the data you retrieve in a database and fill it up book by book.
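A minimal sketch of that caching idea, assuming SQLite as the local store. The `fetch` argument stands in for whichever metadata service you end up using; it is a placeholder, not a real API, and `open_cache` and `get_book` are names invented for this example.

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the local metadata cache."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books (isbn TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def get_book(conn, isbn, fetch):
    """Return cached metadata, calling the remote service only on a miss.

    `fetch` is a hypothetical callable taking an ISBN and returning a dict.
    """
    row = conn.execute("SELECT metadata FROM books WHERE isbn = ?", (isbn,)).fetchone()
    if row is not None:
        return json.loads(row[0])
    metadata = fetch(isbn)  # the only place a paid or rate-limited call happens
    conn.execute(
        "INSERT INTO books (isbn, metadata) VALUES (?, ?)",
        (isbn, json.dumps(metadata)),
    )
    conn.commit()
    return metadata
```

Each book costs one remote request the first time it is seen and nothing afterwards, so the daily request count only grows with genuinely new books.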
Instead of scraping Amazon, you can use the API they expose for their affiliate program.
It allows about 3k requests per hour and returns well-formed XML. It requires you to set a link to the book that you show the information about, and you must state that you are an affiliate partner.
This might be what you're looking for. They even offer a complete download!
It seems that a lot of libraries and other organisations make information such as the ISBN available through MAchine-Readable Cataloging (MARC); you can find more information about it here as well.
Now knowing the "right" term to search for I discovered
Maybe this whole MARC thing gives you a new kind of an idea :)

Best practices on what data to collect in an in-app web analytics

In our SaaSy webapp we need to collect Google Analytics-like data (what pages were visited, how many 404s there were, etc.). I wonder if there are any best practices on what pieces of information should be collected (IP, User Agent, etc.) and how these logs should be stored. Requirements on what statistics we're going to display are not yet fixed, but I want to have a starting point.
Tracking for the sake of tracking is pointless. The point of tracking activity on your site is to answer specific business questions, such as how many people are buying your product, how far they are getting in your sales funnel, or whether they complete other events like signing up for a newsletter. What you should be doing is asking the people who make business decisions what it is they need/want to know, and go from there.
Having said that, most ad-hoc reports can be generated with basics like the URL and timestamp. The ability to parse specific variables from the URL and categorize them and their values is handy for campaign tracking. Tracking IP addresses is good for debugging and for finding out what country/region/market the user is coming from. The referring URL is good for tracking where the user came from on the internet (another site, paid vs. organic search, a campaign, etc...).
And then throw a couple of variables into the mix. Allow for the ability to populate variables with arbitrary information (like product IDs, etc...) that can be sent to you and stored, so you can see things like how many times a product was viewed or purchased, how much it cost, etc...
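As a starting point, a logged hit could look something like the sketch below. The field names and the `PageView` and `campaign_variables` helpers are assumptions for illustration, and the `utm_` prefix is just the convention Google Analytics uses for campaign parameters; substitute whatever your own URLs carry.

```python
from dataclasses import dataclass, field
from datetime import datetime
from urllib.parse import parse_qs, urlsplit


@dataclass
class PageView:
    """One logged request: the basics plus a bag of arbitrary variables."""
    url: str
    timestamp: datetime
    ip: str
    user_agent: str
    referrer: str
    status: int                      # HTTP status, so 404s can be counted later
    variables: dict = field(default_factory=dict)  # e.g. product IDs, prices


def campaign_variables(url, prefix="utm_"):
    """Pull campaign-style query parameters out of a URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; keep the first value of each.
    return {name: values[0] for name, values in query.items() if name.startswith(prefix)}


def count_status(views, status):
    """How many logged hits ended with the given HTTP status."""
    return sum(1 for view in views if view.status == status)
```

Storing the raw URL and referrer alongside the parsed variables keeps the option of re-parsing later, once the reporting requirements are actually known.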
But anyway, to answer your question: ultimately, "best practice" is first sitting down with the guys in suits, asking what they want/need to know, and working with them to find out whether what they want to know is just silly or actually actionable. For example, knowing things like the number of pageviews is okay, but how actionable is it really? What's MORE actionable is knowing how many of xyz are being sold, or where on your site people are abandoning you, so you can streamline your site, maybe decide your product or offer sucks and needs to be revisited, etc...
I have to ask: is there a particular reason you wish to create your own tracking tool as opposed to using or investing in one of the many tools already out there? There is Google Analytics (GA), Yahoo Web Analytics (YWA), Omniture SiteCatalyst and Webtrends, to name a few. Some are free, some cost money, but it is an investment that yields real returns if used properly.

Design principles for designing database architecture of financial transaction system?

I want to design a database which will keep records of financial transactions. I want to design it as a product so that it can be used for any type of financial transaction. Are there design principles specific to financial transaction databases that can help me make the database durable in the long term, with minimal architectural-level changes? Some good examples would be a great help too.
Some things particular to financial systems include internal controls (this is a critical accounting term; do some research and really think it through), such as the rule that the person entering a check value can't also approve it. Use stored procs rather than SQL generated by the application, so that you can restrict rights to the procs alone (no dynamic SQL at all, ever, in a financial system) and users can only do what they are authorized to do. Nobody except the production DBA and an alternate should have rights to the tables. You are trying to protect the system from fraud, not just from outside attacks. Security is critical to financial systems.
You also need audit tables to know who changed what data, when, and what the old value was. This is not only an additional way to help find problems if someone got around the internal controls (or the system failed to implement some critical ones) and stole money; it is often critical for undoing a mistake without having to restore from backup. In general, accounting systems often have data fields that the user cannot view, populated through default values or in some other way the user doesn't see.
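One common way to get such an audit trail is a trigger that copies the old and new values into a separate table. The sketch below uses SQLite from Python just to stay runnable; the table, column and trigger names are invented, and because SQLite has no database users, the "who" is assumed to arrive in an `updated_by` column set by the caller. In a server RDBMS you would normally take it from the session instead.

```python
import sqlite3

# Hypothetical tables; amounts are whole cents to avoid floating point.
AUDIT_SCHEMA = """
CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    balance_cents INTEGER NOT NULL,
    updated_by TEXT NOT NULL
);
CREATE TABLE accounts_audit (
    audit_id INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL,
    old_balance_cents INTEGER NOT NULL,
    new_balance_cents INTEGER NOT NULL,
    changed_by TEXT NOT NULL,
    changed_at TEXT NOT NULL
);
CREATE TRIGGER trg_accounts_balance_audit
AFTER UPDATE OF balance_cents ON accounts
BEGIN
    INSERT INTO accounts_audit
        (account_id, old_balance_cents, new_balance_cents, changed_by, changed_at)
    VALUES
        (OLD.id, OLD.balance_cents, NEW.balance_cents, NEW.updated_by, datetime('now'));
END;
"""


def open_ledger():
    """Create an in-memory database with the audited table and its trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(AUDIT_SCHEMA)
    return conn


def audit_trail(conn, account_id):
    """Return (old, new, who) for every recorded balance change, oldest first."""
    return conn.execute(
        "SELECT old_balance_cents, new_balance_cents, changed_by"
        " FROM accounts_audit WHERE account_id = ? ORDER BY audit_id",
        (account_id,),
    ).fetchall()
```

Because the trigger runs inside the database, the audit row is written no matter which application or proc performed the update, and the old value is there to put back if the change turns out to be a mistake.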
Another thing is that you need to view actions in time, so things that look like a natural relationship may need denormalizing to preserve what the cost was at the time the action happened. So if you have an hourly rate table, you would use it as a lookup to get the rate at the time of the action and store that rate with the action, rather than join to it to get the rate when you query.
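The hourly rate example might look like this in practice. It is a sketch with invented names (`hourly_rates`, `work_entries`, `log_work`, `total_cost_cents`), using SQLite from Python: the rate is looked up once, when the work is logged, and copied into the entry.

```python
import sqlite3

RATE_SCHEMA = """
CREATE TABLE hourly_rates (
    employee_id INTEGER NOT NULL,
    rate_cents INTEGER NOT NULL,
    valid_from TEXT NOT NULL,        -- ISO date, so string comparison sorts correctly
    PRIMARY KEY (employee_id, valid_from)
);
CREATE TABLE work_entries (
    id INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL,
    worked_on TEXT NOT NULL,
    hours INTEGER NOT NULL,
    rate_cents INTEGER NOT NULL      -- copied from hourly_rates when the entry is made
);
"""


def log_work(conn, employee_id, worked_on, hours):
    """Store a work entry together with the rate in force on that date."""
    row = conn.execute(
        "SELECT rate_cents FROM hourly_rates"
        " WHERE employee_id = ? AND valid_from <= ?"
        " ORDER BY valid_from DESC LIMIT 1",
        (employee_id, worked_on),
    ).fetchone()
    if row is None:
        raise ValueError("no rate defined for that date")
    conn.execute(
        "INSERT INTO work_entries (employee_id, worked_on, hours, rate_cents)"
        " VALUES (?, ?, ?, ?)",
        (employee_id, worked_on, hours, row[0]),
    )
    conn.commit()


def total_cost_cents(conn, employee_id):
    """Sum cost from the stored rates, never from the current rate table."""
    return conn.execute(
        "SELECT COALESCE(SUM(hours * rate_cents), 0) FROM work_entries WHERE employee_id = ?",
        (employee_id,),
    ).fetchone()[0]
```

A later raise adds a new row to `hourly_rates` but leaves the cost of past entries untouched, which is what a report on old periods needs.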
Financial systems almost always have private data in them, so think about how you are going to protect this data. You will need to be encrypting and decrypting data. You probably want an encrypted backup as well.
This data is the lifeblood of a company, so it is critical that you have a good backup plan and plenty of practice restoring. Off-site backups are critical.
Data integrity is critical. You need the correct datatypes and you need pk/fk relationships, constraints and triggers to enforce the rules. A financial system can't afford to have orphaned records.
You need to consider deletes very carefully. Financial systems often do soft deletes (marking records as deleted) to avoid losing historical data. Yes, XYZ company is no longer a customer, but you don't want to lose the financial history of the orders they placed in the past. I would not even consider using cascade delete in a financial system.
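A soft delete can be as simple as a nullable timestamp plus a view for day-to-day queries. The following is a sketch only, with assumed names (`customers`, `deleted_at`, `active_customers`, `soft_delete_customer`), again using SQLite from Python.

```python
import sqlite3

SOFT_DELETE_SCHEMA = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    deleted_at TEXT                  -- NULL while the customer is active
);
-- Everyday queries go through the view, so "deleted" rows stay out of sight.
CREATE VIEW active_customers AS
    SELECT id, name FROM customers WHERE deleted_at IS NULL;
"""


def soft_delete_customer(conn, customer_id):
    """Mark the customer as deleted without removing the row or its history."""
    conn.execute(
        "UPDATE customers SET deleted_at = datetime('now')"
        " WHERE id = ? AND deleted_at IS NULL",
        (customer_id,),
    )
    conn.commit()
```

Orders that reference the customer keep a valid foreign key, so historical reports still resolve the name of a customer that no longer shows up in the active list.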
Don't just talk to accountants in designing the system, talk to financial people who will run the system and auditors who will audit the results. Read and know thoroughly the published accounting standard for the country you are designing for. Look at tax implications. This is complex stuff.
Think about data warehousing and archiving data. Financial systems often query old data for reports; reporting is big, big, big for financial systems. Think about how to do it effectively without affecting day-to-day data entry.
Depending on what you are actually trying to achieve, to create a "financial transaction" system that is useful you will need to teach yourself about journals, ledgers and other details of accounting. It isn't as simple as logging the actual transactions in a table...
Really, I don't think you will find database design principles for financial systems that are all that different from those of any database system that needs its information to be 100% correct.
Hence, reading the following when working with databases never hurt anyone:
