So I am building a points system app for my high school (50% because I thought it was a great idea, 50% because I want to educate myself about databases).
Essentially, my cafeteria uses a coupon system (physical paper) to provide a currency to buy food. Customers give cash in exchange for these coupons. I want to create a system and an app in which the customer gives cash, and the staff can log that 'customer x has y points in the system, therefore they can buy food worth y points'. For this you obviously need a database that is very durable, and preferably ACID compliant, as transactions like these are important and data loss could be devastating.
The thing is, for the data model I have in mind, I don't have any relational data. If I opted for NoSQL, I could create a JSON doc like this for every student:
{"id": 0000000, "name": "DarthKitten", "balance": 30, "account_frozen": 0}
Whenever a customer deposits or withdraws points, it's a very simple write to 'balance'. There could be a single JSON doc containing the food names and prices (since NoSQL is schemaless), which the cafeteria could modify to their liking through the app. The app would pull this data and subtract the price from the balance. I am also planning to put a Redis cache in front of whichever database I choose to speed up checkout.
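For what it's worth, here is a minimal sketch of what a single balance update could look like against CouchDB's HTTP API, assuming a local database named "students" and the field names from the doc above (both are my assumptions, not a fixed design). CouchDB's _rev token gives an optimistic-concurrency check for free, so two simultaneous updates can't silently overwrite each other:

    import requests

    COUCH = "http://localhost:5984/students"   # hypothetical local CouchDB database

    def adjust_balance(student_id, delta):
        # Fetch the current document, including its _rev (CouchDB's MVCC token).
        doc = requests.get(f"{COUCH}/{student_id}").json()
        if doc.get("account_frozen"):
            raise RuntimeError("account is frozen")
        if doc["balance"] + delta < 0:
            raise RuntimeError("insufficient points")
        doc["balance"] += delta
        # PUT the doc back with the same _rev; CouchDB answers 409 Conflict if
        # someone else updated it in the meantime, so no update is ever lost.
        resp = requests.put(f"{COUCH}/{student_id}", json=doc)
        if resp.status_code == 409:
            raise RuntimeError("conflicting update, retry")
        resp.raise_for_status()
        return doc["balance"]

A Redis cache in front of this would make sense for reads (the menu, balance lookups), but a write like the one above should always hit the durable store first.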
People here on Stack Overflow say you NEED to use an RDBMS for these situations because they are secure and ACID compliant, but when I look at a NoSQL database like CouchDB, which sports high levels of data durability, I see no point in using an RDBMS at all. Thoughts from more experienced devs? And no, I don't want comments about the viability of my school actually using this; whether they do or not, I'm still learning something about programming. Thank you :).
Related
Assuming an e-commerce web app has a high volume of requests, how do I prevent two users from buying the only product left? Should I check the quantity when the product is added to the shopping cart, or at payment? Is using a field in the DB to record the quantity of the selected product a bad approach? How do large e-commerce web apps like Amazon deal with this conflict problem?
Several options that I know of:
For an RDBMS that supports ACID, you can use an optimistic locking technique on the product table (a sketch of this follows below, after the list). Unless many users hit the buy button on the same product at nearly the same time very often, it should work pretty well. (As for how many users 'many' means, you have to measure it. I think 1k should be no problem. Just my guess, don't take it for granted.)
Do not check it and let users buy it. Adjust the business flow to handle it. For example, when a user hits the buy button, tell him his order has been accepted and will be processed, but do not guarantee that he will definitely get the item. Then, at a later stage, when you find there is not enough inventory to ship the product to him, send an email to apologise and refund him.
Also, in real business it is common for the product inventory to go negative while orders are still accepted; the user is simply told he will get the product XXX days later. The business can then produce or order more of the product from the supplier after receiving the money.
If you buy an iPhone on the Apple web site, it works like this too.
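Here is a minimal sketch of option 1 in Python with SQLite (the product table, its columns and the version counter are invented for illustration; the same version-column pattern applies to any RDBMS):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, stock INTEGER, version INTEGER)")
    conn.execute("INSERT INTO product VALUES (1, 5, 0)")
    conn.commit()

    def buy(conn, product_id):
        # Read the current stock together with its version number.
        stock, version = conn.execute(
            "SELECT stock, version FROM product WHERE id = ?", (product_id,)
        ).fetchone()
        if stock <= 0:
            return False
        # The UPDATE only succeeds if nobody changed the row since we read it;
        # otherwise rowcount is 0 and the caller can retry or give up.
        cur = conn.execute(
            "UPDATE product SET stock = stock - 1, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (product_id, version),
        )
        conn.commit()
        return cur.rowcount == 1

    print(buy(conn, 1))   # True while stock remains and no concurrent update won the race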
It really depends on the number of concurrent users here. In the case of millions, a NoSQL approach is preferred to manage the basket with eventual consistency, and the buying process would then go through ACID to ensure the product can actually be sold.
For fewer users, you can rely on an ACID database.
If you are not sure, you may go with a database that has ACID capabilities but can also let you work in an eventually consistent way, or that can implement sharding for scalability purposes. To my knowledge, Oracle can do all three: COMMIT NOWAIT, COMMIT and sharding deployment.
HTH
I am trying to create a social app in which users can follow their friends and get a personalised feed that updates in real time.
Question: Is a graph database the best option for this kind of problem? What is the experience like when the data reaches millions of records? Also, what is the right way to build the feeds; do we keep a Kafka stream for each user? And how do I approach the whole setup without over-engineering: a starting point and the overall flow?
As usual, it depends 100% on how you use these technologies.
Neo4j (a graph database) can store fairly large amounts of data:
A graph database with a quadrillion nodes? Such a monstrous entity is beyond the scope of what technologists are trying to do now. But with the latest release of the Neo4j database from Neo Technology, such a graph is theoretically possible.
There is effectively no limit to the sizes of graphs that people can run with Neo4j 3.0, which was announced today, says Neo Vice President of Products Philip Rathle.
“Before Neo4j 3.0, graph sizes were limited to tens of billions of records,” Rathle says. “Even though they may not have tens of billions of data items to actually store in a graph, just having a ceiling made them nervous.”
By adopting dynamically sized pointers, Neo4j can now scale up to run the biggest graph workloads that customers can throw at it. The company expects some of its customers will begin to put that extra capacity to use, for things such as crunching IoT data, identifying fraud, and generating product recommendations.
Source: https://www.datanami.com/2016/04/26/neo4j-pushes-graph-db-limits-past-quadrillion-nodes/
Start with something simple; Neo4j sounds like a good starting point. Once you start hitting bottlenecks or scaling issues, you can start looking at other solutions. It's very hard to predict where your bottlenecks will be without real-world data.
Realtime feeds at scale are hard to build; first define how realtime you want them to be. Is 1 minute still considered realtime? Maybe 5 minutes?
The number you choose here will directly affect your technology choices.
Either way, more information is needed to give a more detailed answer.
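If you do start with Neo4j, the feed can initially be nothing more than a query over the follow graph, computed when the user opens the app (fan-out-on-read); that avoids maintaining a stream per user until you actually need it. A sketch with the official Python driver, assuming a made-up (:User)-[:FOLLOWS]->(:User)-[:POSTED]->(:Post) model and local connection details:

    from neo4j import GraphDatabase   # pip install neo4j

    # Hypothetical connection details and data model.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    def recent_feed(user_id, limit=20):
        # Compute the feed at request time instead of maintaining a per-user
        # stream; switch to precomputed feeds only when this becomes the bottleneck.
        query = (
            "MATCH (:User {id: $user_id})-[:FOLLOWS]->(:User)-[:POSTED]->(p:Post) "
            "RETURN p.id AS id, p.text AS text, p.created_at AS created_at "
            "ORDER BY p.created_at DESC LIMIT $limit"
        )
        with driver.session() as session:
            return [record.data() for record in session.run(query, user_id=user_id, limit=limit)]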
I would like to implement a simple prototype of this for my local government, which has a lot of corrupt and lazy officials. I feel it would really speed things up and make sure there are no scams in the records, of which there are many at the moment. I have seen many big cities implementing blockchain.
I'm a computer science student and an Android and web programmer, and I've bought a few technical books on blockchain. But I'm not sure how to go about doing something like this.
Is something like this really going to be feasible and secure from hackers? I don't wanna make things worse.
As I understand it, you are planning to use a blockchain as a database to keep records regarding land, vehicles, government services etc., in order to prevent illegal changes being made by government officials themselves or by some other third party.
To get the advantage of having a distributed database, especially in an application like yours, the blockchain should be a public one (rather than a few government institutions running full nodes). In that case, given that your only requirement is protecting records, it would probably be a separate blockchain built for this particular purpose. There should also be some advantage for participants to take part in mining, since it consumes their resources, so I think you will have to come up with an economic model as well. (Even if this is not exactly how you end up doing it, there has to be some reason for people to participate.)
As for how secure the system will be, that depends on your model. If it's going to be a blockchain with few nodes, or even a public one with a large number of nodes, and it is still worth the money to attack the network in order to acquire one or two plots of land, there can be situations where the network undergoes a 51% attack or a selfish mining attack.
So what I can suggest is to study more about blockchains, the consensus algorithms used, how existing blockchains achieve data security, etc., and to think about a model (maybe an economic model) in which the miners gain an advantage by running a node and the cost of attacking the system outweighs whatever could be gained by attacking it.
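To make the tamper-evidence idea concrete, here is a toy hash chain in Python. There is no consensus, no mining and no network here, and the field names are invented; it only demonstrates the property everything else is built around: editing an old record breaks every link after it, so tampering is detectable.

    import hashlib
    import json

    def record_hash(record, prev_hash):
        payload = json.dumps(record, sort_keys=True) + prev_hash
        return hashlib.sha256(payload.encode()).hexdigest()

    def append(chain, record):
        prev_hash = chain[-1]["hash"] if chain else "0" * 64
        chain.append({"record": record, "prev_hash": prev_hash,
                      "hash": record_hash(record, prev_hash)})

    def verify(chain):
        # Any edit to an earlier record changes its hash and breaks every link after it.
        prev_hash = "0" * 64
        for block in chain:
            if block["prev_hash"] != prev_hash or block["hash"] != record_hash(block["record"], prev_hash):
                return False
            prev_hash = block["hash"]
        return True

    chain = []
    append(chain, {"parcel": "A-17", "owner": "Alice"})
    append(chain, {"parcel": "A-17", "owner": "Bob"})
    chain[0]["record"]["owner"] = "Mallory"   # simulated tampering
    print(verify(chain))                       # False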
Our master's thesis project is creating a database schema analyzer. As a foundation for this, we are working on quantifying bad database design.
Our supervisor has tasked us with analyzing a real-world schema of our choosing, such that we can identify some/several design issues. These issues are to be used as a starting point in the schema analyzer.
Finding a good schema is a bit difficult because we do not want a schema that is well designed in all aspects, but one that is more "rare to medium".
We have already scheduled the following schemas for analysis: Wikimedia, Moodle and Drupal. We're not sure which category each fits into. It is not necessary that the schema be open source.
The database engine used is not important, though we would like to focus on SQL Server, PostgreSQL and Oracle.
For now, literature will be deferred, as this task is supposed to give us real-world examples that can be used in the thesis, i.e. "Design X is perceived by us as bad design, which our analyzer identifies and suggests improvements to", instead of coming up with contrived examples.
I will update this post when we have some kind of a tool ready.
Check out the Dell DVD Store; you can use it for free.
The Dell DVD Store is an open source simulation of an online ecommerce site with implementations in Microsoft SQL Server, Oracle and MySQL, along with driver programs and web applications.
Bill Karwin has written a great book about bad designs: SQL Antipatterns.
I'm working on a project including a geographical information system, and in my opinion these designs are often "medium" to "rare".
Here are some examples:
1) Geonames.org
You can find the data and the schema here: http://download.geonames.org/export/dump/ (scroll down to the bottom of the page for the schema; it's in plain text on the site!)
It would be interesting to see how this DB design performs with such a HUGE amount of data!
2) OpenGeoDB
This one is very popular in German-speaking countries (Germany, Austria, Switzerland) because it's a database containing nearly every city/town/village in the German-speaking region, with zip code, name, hierarchy and coordinates.
It comes with a .sql schema, and the table fields are in English, so this shouldn't be a problem.
http://fa-technik.adfc.de/code/opengeodb/
The interesting thing in both examples is how they managed the hierarchy of entities like Country -> State -> County -> City -> Village etc.
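For comparison, one common way to model that kind of hierarchy is a self-referencing table (an adjacency list) queried with a recursive CTE; here is a minimal SQLite sketch with invented names, not how either of the two projects above actually does it:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER REFERENCES place(id))")
    conn.executemany("INSERT INTO place VALUES (?, ?, ?)", [
        (1, "Germany", None),
        (2, "Bavaria", 1),
        (3, "Upper Bavaria", 2),
        (4, "Munich", 3),
    ])

    # Walk up from a city to the country with a recursive CTE.
    rows = conn.execute("""
        WITH RECURSIVE ancestors(id, name, parent_id) AS (
            SELECT id, name, parent_id FROM place WHERE name = 'Munich'
            UNION ALL
            SELECT p.id, p.name, p.parent_id
            FROM place p JOIN ancestors a ON p.id = a.parent_id
        )
        SELECT name FROM ancestors
    """).fetchall()
    print([r[0] for r in rows])   # ['Munich', 'Upper Bavaria', 'Bavaria', 'Germany']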
PS: Maybe you could judge my DB design too ;) DB Schema of a Role Based Access Control
vBulletin has a really bad database schema.
"we are working on quantifying bad database design."
It seems to me like you are developing a model, or process, or apparatus, that takes a relational schema as input and scores it for quality.
I invite you to ponder the following:
Can a physical schema be "bad" while the logical schema is nonetheless "extremely good"? Do you intend to distinguish properly between "logical schema" and "physical schema"? How do you dream of achieving that?
How do you decide that a certain aspect of physical design is "bad"? Take, for example, the absence of some index. If the relvar that the "supposedly desirable index" would be defined on is itself constrained to be a singleton, then what detrimental effects would the absence of that index cause for the system? And if there are no such detrimental effects, then what grounds are there for qualifying the absence of such an index as "bad"?
How do you decide that a certain aspect of logical design is "bad"? Choices in logical design are made as a consequence of what the actual requirements are. How can you make any judgment whatsoever about a logical design without a formalized and machine-readable way to specify what the actual requirements are?
Wow - you have an ambitious project ahead of you. Determining what makes a good database design may be impossible, except in terms of broadly understood principles and guidelines.
Here are a few ideas that come to mind:
I work for a company that does database management for several large retail companies. We have custom databases designed for each of these companies, according to how they intend for us to use the data (for direct mail, email campaigns, etc.), and what kind of analysis and selection parameters they like to use. For example, a company that sells musical equipment in stores and online will want to distinguish between walk-in and online customers, categorize the customers according to the type of items they buy (drums, guitars, microphones, keyboards, recording equipment, amplifiers, etc.), and keep track of how much they spent, and what they bought, over the past 6 months or the past year. They use this information to decide who will receive catalogs in the mail. These mailings are very expensive; maybe one or two dollars per customer, so the company wants to mail the catalogs only to those most likely to buy something. They may have 15 million customers in their database, but only 3 million buy drums, and only 750,000 have purchased anything in the past year.
If you were to analyze the database we created, you would find many "work" tables, that are used for specific selection purposes, and that may not actually be properly designed, according to database design principles. While the "main" tables are efficiently designed and have proper relationships and indexes, these "work" tables would make it appear that the entire database is poorly designed, when in reality, the work tables may just be used a few times, or even just once, and we haven't gone in yet to clear them out or drop them. The work tables far outnumber the main tables in this particular database.
One also has to take into account the volume of the data being managed. A customer base of 10 million may have transaction data numbering 10 to 20 million transactions per week. Or per day. Sometimes, for manageability, this data has to be partitioned into tables by date range, and then a view would be used to select data from the proper sub-table. This is efficient for this huge volume, but it may appear repetitive to an automated analyzer.
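To illustrate the partition-plus-view pattern just described (table and column names invented), here is a tiny SQLite sketch; note that an automated analyzer looking only at structure could easily flag the repeated per-month tables as duplication, even though the layout is deliberate:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # One physical table per date range, plus a view that unions them,
    # so queries can keep addressing a single logical "transactions" object.
    conn.executescript("""
        CREATE TABLE transactions_2024_01 (customer_id INTEGER, amount REAL, tx_date TEXT);
        CREATE TABLE transactions_2024_02 (customer_id INTEGER, amount REAL, tx_date TEXT);
        CREATE VIEW transactions AS
            SELECT * FROM transactions_2024_01
            UNION ALL
            SELECT * FROM transactions_2024_02;
    """)
    conn.execute("INSERT INTO transactions_2024_01 VALUES (42, 19.99, '2024-01-15')")
    conn.execute("INSERT INTO transactions_2024_02 VALUES (42, 5.00, '2024-02-03')")
    print(conn.execute("SELECT COUNT(*) FROM transactions WHERE customer_id = 42").fetchone())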
Your analyzer would need to be user configurable before the analysis began. Some items must be skipped, while others may be absolutely critical.
Also, how does one analyze stored procedures and user-defined functions, etc? I have seen some really ugly code that works quite efficiently. And, some of the ugliest, most inefficient code was written for one-time use only.
OK, I am out of ideas for the moment. Good luck with your project.
If you can get ahold of it, the project management system Clarity has a horrible database design. I don't know if they have a trial version you can download.
I'm coding a new {monthly|yearly} paid site with the now-typical "referral" system: when a new user signs up, they can specify the {username|referral code} of another user (this can be detected automatically if they came through a special URL), which will cause the referrer to earn a percentage of anything the new user pays.
Before reinventing the wheel, I'd like to know if any of you have experience storing this kind of data in a relational DB. Currently I'm using MySQL, but I believe any good solution should be easily adaptable to any RDBMS, right?
I'm looking to support the following features:
Online billing system - once each invoice is paid, earnings for referrals are calculated and referrers will be able to cash out. This includes, of course, the possibility of browsing invoices / payments online.
Paid options vary - they differ in nature and in cost (which will change from time to time), so commissions should be calculated based on each final invoice.
Keeping track of referrals (the relationship between users, the date on which the referral happened, and any other useful information - any ideas?)
A simple way to access historical referral data (how much has been paid) and accrued commissions.
In the future, I might offer to exchange accrued cash for subscription renewal (covering the whole of the new subscription or just part of it, with the user paying the difference if needed).
Multiple levels - I'm thinking of paying something around 10% of directly referred earnings + 2% at the next level, but this may change in the future (more levels, different percentages), so I should be able to store historical data.
Note that I'm not planning to use this in any other project, so I'm not worried about it being "plug and play".
Have you done any work with similar requirements? If so, how did you handle all this stuff? Would you recommend any particular DB schema? Why?
Is there anything I'm missing that would help making this a more flexible implementation?
Rather marvellously, there's a library of database schemas. Although I can't see anything specific to referrals, there may be something related. At least (hopefully) you should be able to get some ideas.
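In case it helps as a starting point, here is one possible sketch of the core tables in SQLite syntax (all names and types are mine, a sketch rather than a recommended schema). The key idea is that commission percentages are versioned by effective date and each computed commission row freezes the percentage it used, so the history stays reproducible when levels or rates change later:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Referrals are their own dated rows; commissions are materialised per paid invoice.
    conn.executescript("""
        CREATE TABLE user (
            id INTEGER PRIMARY KEY,
            username TEXT UNIQUE NOT NULL
        );
        CREATE TABLE referral (
            referred_user_id INTEGER PRIMARY KEY REFERENCES user(id),
            referrer_user_id INTEGER NOT NULL REFERENCES user(id),
            referred_at TEXT NOT NULL
        );
        CREATE TABLE commission_rate (
            level INTEGER NOT NULL,            -- 1 = direct referral, 2 = next level, ...
            percent REAL NOT NULL,
            effective_from TEXT NOT NULL,
            PRIMARY KEY (level, effective_from)
        );
        CREATE TABLE invoice (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES user(id),
            amount REAL NOT NULL,
            paid_at TEXT
        );
        CREATE TABLE commission (
            id INTEGER PRIMARY KEY,
            invoice_id INTEGER NOT NULL REFERENCES invoice(id),
            beneficiary_user_id INTEGER NOT NULL REFERENCES user(id),
            level INTEGER NOT NULL,
            percent_applied REAL NOT NULL,     -- frozen copy of the rate actually used
            amount REAL NOT NULL
        );
    """)

Walking up the referral chain for the multi-level payout is then just a matter of following referral.referrer_user_id as many levels deep as your current rate table defines.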