Resources about building an RDBMS [closed] - database

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I'm looking at implementing an RDBMS. Are there any good resources out there about how a database works internally, and the kinds of things I'd need to know when starting out to build my own? (Please no comments about whether it's a practical idea or not - just imagine it's for a hobby project or something).
Again - interested in the RDBMS design, not the Database design. And efficiency is very important (it seems like it's reasonably easy to design some kind of relational database like structure if I don't care about speed).

There are a few textbooks about this sort of thing out there. When I was in college, we did this for a class project. This book should really help you on your way: Database Systems: The Complete Book
I forgot to mention it, but my code is on Google Code here: cs4420-dbase
Please forgive the fact that it is written in Java; I was outvoted by my teammates on that decision, but the basic ideas are all still there. It handles file creation and handling as well as a simple SQL parser and optimizer. It handles basic indexing (B-tree) and "memory" management. Please forgive the sparse and sometimes strange commenting; many late nights were spent on that project.
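To give a feel for what the file handling and "memory" management mentioned above can look like, here is a minimal sketch in Python. It is not taken from the cs4420-dbase project; the names PageFile, BufferPool, PAGE_SIZE and demo are made up for illustration, and the 4096-byte page size and LRU eviction policy are assumptions.

```python
import os
import tempfile
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size; real systems vary


class PageFile:
    """Reads and writes fixed-size pages in a single file."""

    def __init__(self, path):
        # Open for read/write, creating the file if it does not exist.
        mode = "r+b" if os.path.exists(path) else "w+b"
        self.f = open(path, mode)

    def read_page(self, page_no):
        self.f.seek(page_no * PAGE_SIZE)
        data = self.f.read(PAGE_SIZE)
        # Pages past the end of the file read back as zeros.
        return bytearray(data.ljust(PAGE_SIZE, b"\x00"))

    def write_page(self, page_no, data):
        assert len(data) == PAGE_SIZE
        self.f.seek(page_no * PAGE_SIZE)
        self.f.write(data)
        self.f.flush()

    def close(self):
        self.f.close()


class BufferPool:
    """Keeps up to `capacity` pages in memory, evicting the least recently used."""

    def __init__(self, page_file, capacity):
        self.page_file = page_file
        self.capacity = capacity
        self.frames = OrderedDict()  # page_no -> [bytearray, dirty flag]

    def get(self, page_no):
        if page_no in self.frames:
            self.frames.move_to_end(page_no)  # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_no] = [self.page_file.read_page(page_no), False]
        return self.frames[page_no][0]

    def mark_dirty(self, page_no):
        self.frames[page_no][1] = True

    def _evict(self):
        # Drop the least recently used page, writing it back if modified.
        victim, (data, dirty) = self.frames.popitem(last=False)
        if dirty:
            self.page_file.write_page(victim, bytes(data))


def demo():
    """Write to page 0, force its eviction, and read it back from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        page_file = PageFile(os.path.join(tmp, "data.db"))
        pool = BufferPool(page_file, capacity=2)
        page = pool.get(0)
        page[:5] = b"hello"
        pool.mark_dirty(0)
        pool.get(1)
        pool.get(2)  # pool is full, so page 0 is evicted and written out
        on_disk = bytes(page_file.read_page(0)[:5])
        page_file.close()
        return on_disk
```

A real system would add page pinning, latching and write-ahead logging on top of this; the sketch only shows the basic idea of caching disk pages in a bounded number of in-memory frames.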

I'd suggest starting with Introduction to Database Systems and Transactional Information Systems. They should both have bibliographies to take you further.

Building an RDBMS is not trivial: to make it efficient, you need to combine classic CS knowledge from several fields with deep knowledge of hard drives, OS specifics, filesystems, memory, CPUs, and caches.
A good article about architecture we are required to read is:
http://www.nowpublishers.com/product.aspx?product=DBS&doi=1900000002
For theoretical knowledge about databases I would recommend buying a book on the topic. I can only speak for the book I use, which is Database Systems: An Application-Oriented Approach by Kifer, Bernstein and Lewis.
You might want to look at some open-source databases for ideas.

I recently came to the same question and, like others, struggled a bit to find a book that helps in building an actual RDBMS from scratch (a minimal one, of course). Contrary to other CS areas (operating systems, compilers, etc.), the databases area seems to have fewer resources in this regard, probably because RDBMSs are among the hardest systems to grasp and implement ;-|
Nevertheless, I finally found what appears to be a satisfying answer. Sciore's book "Database Design and Implementation":
http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP000711.html
The first two parts are dedicated to learning to use RDBMS, which you probably know already. But the last two parts cover implementation details; and the interesting thing is that a minimal RDBMS (SimpleDB) is used to illustrate the concepts, and also can serve as the platform to perform programming exercises. The Wiley site has a quote that says it better:
"Comes with SimpleDB, a free-to-download, fully functional simplified database system that is (unlike commercial DB systems) small, easily readable and easily modified. SimpleDB can be used as the platform on which students complete homework projects and implement the concepts covered in the book."
Do not be bothered by the fact that the sample RDBMS is written in Java; that has the advantage (IMHO) of hiding the low-level details of implementing in C/Unix. If, like me, you come from the applications world, you may be unfamiliar with systems programming, but learning the RDBMS implementation concepts in a high-level language like Java can serve as a good bridge for the transition.
The Wiley site lets you buy an electronic version of the book, but the source code is available regardless of whether you buy it. I cannot post more than two links, but just google this term (including the double quotes) and you will easily find the SimpleDB home page (where you can download it):
"The SimpleDB Database System"
If you are unsure about buying the book (which, like other core CS books, is not cheap for a student), you can probably start by reading the code and this introductory article:
http://www.cs.bc.edu/~sciore/papers/SIGCSE07.pdf
If you find it appealing, buying the book may be a good investment.
Hope that helps,
Cheers.

Related

"Free for academic use" license [closed]

I am a researcher in Mathematics at a university, and I released a code toolbox that is mostly for research use, but can have applications in engineering and industry.
I would like to have a license that basically says "you can use this code freely if you are in a university and use it for publishing, studying, teaching, but not if you are in the industry --- in that case, please reach for the wallet".
CC-NC-BY-SA would look perfect to my eyes, but using it for code is heavily discouraged, I suppose for good reasons. None of the other open source licenses seems to do what I want. Writing my own license looks like a legal mess, and I'd rather avoid it.
How would you solve this issue?
Related questions (but not the exact same thing): https://stackoverflow.com/questions/1232666/proper-open-source-license-to-release-academic-code, https://stackoverflow.com/questions/6443110/practicalities-of-licensing-academic-software
I assume the toolbox was written by you and that you want to share it with other academics. In case someone from industry is making millions, you want your share (well, the sunny-day example).
I don't know any license that comes close to that.
CC-NC-BY-SA is discouraged mainly for two reasons:
CC licenses are not for software. They know nothing about the two most prominent forms of software: Object Code (binary, compiled) and the Source Code (author version).
Non-Commercial. It is undefined what this term means, especially legally.
So you use your moral and subjective day-to-day terms, and every human on the planet comes pretty close to understanding what you mean, but putting this into a copyright-related software license is problematic.
I'm not a lawyer, but probably some kind of passive licensing would suit your needs. There simply is no license; you state your terms:
"you can use this code freely if you are in a university and use it for publishing, studying, teaching, but not if you are in the industry --- in that case, please reach for the wallet"
and then say this is in your own words, you decide about the meaning in case it's unclear now or in the future. (if you talk with your lawyer, a suggestion will come up that you should disclaim warranties and such which is normally suggested.)
Most academic users I bet are fine with this. Commercial users are pressured for more clarification, so you can run contracts then. Job done.
The other route would be you release under a strong copyleft license like the AGPL. This would engage user-rights (so you give a lot), however this would be the typical something-for-something, because they need to offer the software as well to all of their users under AGPL, including their changes and add-ons.
Additionally you can offer "commercial" licenses (AGPL does not forbid commercial use, however it requires to preserve the freedom of the software) as long as you're the copyright owner.
Probably either the slightly suspicious "I name no license" policy, or something that's okay for you to give (strong copyleft) + X, might do it. There are pros and cons for either of these two paths, so chew on it a bit and maybe you will get a third idea that works for you.
However, I am not aware of any existing license that covers your case. You might find some through research because I'm normally not interested in licenses that are for some user-groups only (e.g. only academics, only non-commercial), because the borders are not clear.

Beginner's resources/introductions to classification algorithms [closed]

everybody. I am entirely new to the topic of classification algorithms, and need a few good pointers about where to start some "serious reading". I am right now in the process of finding out, whether machine learning and automated classification algorithms could be a worthwhile thing to add to some application of mine.
I already scanned through "How to Solve It: Modern Heuristics" by Z. Michalewicz and D. Fogel (in particular, the chapters about linear classifiers using neural networks), and on the practical side, I am currently looking through the WEKA toolkit source code. My next (planned) step would be to dive into the realm of Bayesian classification algorithms.
Unfortunately, I am lacking a serious theoretical foundation in this area (let alone, having used it in any way as of yet), so any hints at where to look next would be appreciated; in particular, a good introduction of available classification algorithms would be helpful. Being more a craftsman and less a theoretician, the more practical, the better...
Hints, anyone?
I've always found Andrew Moore's Tutorials to be very useful. They're grounded in solid statistical theory and will be very useful in understanding papers if you choose to read them in the future. Here's a short description:
These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and cased-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning
The answer referring to Andrew Moore's tutorials is a good one. I'd like to augment it, however, by suggesting some reading on the need which drives the creation of many classification systems in the first place: identification of causal relationships. This is relevant to many modeling problems involving statistical inference.
The best current resource I know of for learning about causality and classifier systems (especially Bayesian classifiers) is Judea Pearl's book "Causality: models, reasoning, and inference".
Overview of Machine Learning
To get a good overview of the field, watch the video lectures of Andrew Ng's Machine Learning course.
This course (CS229) -- taught by Professor Andrew Ng -- provides a broad introduction to machine learning and statistical pattern recognition. Topics include supervised learning, unsupervised learning, learning theory, reinforcement learning and adaptive control. Recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing are also discussed.
Classifiers
As for which classifier you should use, I'd recommend first starting with Support Vector Machines (SVM) for general applied classification tasks. They'll give you state-of-the-art performance, and you don't really need to understand all of the theory behind them to just use the implementation provided by a package like WEKA.
If you have a larger data-set, you might want to try using Random Forests. There's also an implementation of this algorithm in WEKA, and they train much faster on large data. While they're less broadly used than SVMs, their accuracy tends to match or nearly match the accuracy you could get from one.
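For a hands-on feel of what a classifier does before reaching for WEKA, here is a minimal sketch in plain Python. Note that it is neither an SVM nor a Random Forest: it is a Gaussian naive Bayes classifier, chosen only because it fits in a few lines and ties in with the Bayesian classifiers the question mentions. The function names train and predict and the toy data are made up for illustration.

```python
import math
from collections import defaultdict


def train(samples, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance away from zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[y] = (n / len(samples), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""

    def log_posterior(params):
        prior, means, variances = params
        total = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the Gaussian density of v under this class.
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    return max(model, key=lambda y: log_posterior(model[y]))


# Toy data: two well-separated clusters in two dimensions.
samples = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (4.8, 5.1), (5.3, 4.9)]
labels = ["a", "a", "a", "b", "b", "b"]
model = train(samples, labels)
```

Calling predict(model, (1.0, 1.0)) picks the class whose cluster lies near that point. The "naive" part is the assumption that features are independent given the class, which is rarely true but often works well enough as a baseline.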

Intelligent agents "tutorial" [closed]

I've recently come across Intelligent Agents by reading this book:
link text
I'm interested in finding a good book for beginners, so I can start to implement such a system.
I've also tried reading "Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence" (can't find it on Amazon), but it's not what I'm looking for.
Thanks for the help :).
The agent view point is simply an abstraction of convenience. There is nothing magical about agents. It is a way of thinking about software processes that may be migrated from one system to another.
So, yes, if you want your agents to be intelligent, then you need to understand AI algorithms.
There are numerous classic books:
David MacKay's classic (for free here)
Norvig's AIMA, of which a new version came out recently
Bishop's Neural Networks for Pattern Recognition
Bishop's Pattern Recognition and Machine Learning
The first two are the easiest; the second also covers more than machine learning. However, there is little "pragmatic" or "engineering" material in them, and the math is quite demanding, as it is in the whole field. I guess you will do best with O'Reilly's Programming Collective Intelligence, because it focuses on programming.
The book you have linked is actually a collection of invited research papers, which means it is quite an advanced book if you are just starting in Intelligent Systems.
Actually, there are two interpretations of Intelligent Systems:
(a) Artificial Intelligence, studied mainly by the Computer Science community. AI deals with machine learning, knowledge representation and reasoning, learning and planning methods. AI is about developing algorithms. The standard reference for AI is "Artificial Intelligence: A Modern Approach".
Although you are referring to this interpretation, in case you are interested, here is the second one:
(b) Intelligent Control Systems, studied mainly by Electrical Engineers. This field deals with designing intelligent systems that are able to adapt to changes in the environment, able to learn, able to make intelligent decisions, etc. It develops mathematical models of "intelligence" that can be applied to real-world systems, so as to optimize their performance (or some other measure). The tools used are mostly adaptive control, neural networks and optimization methods. There isn't an easy-to-follow book on this subject; however, some excellent articles are here and here. Also, an excellent reference on neural networks is "Neural Networks: A Comprehensive Foundation".

Open Source C++ Object Oriented Database [closed]

Is there an open source object oriented database for C++ available?
I have looked at object-relational mapping (ORM) libraries like those posted here:
https://stackoverflow.com/questions/74141/good-orm-for-c-solutions
and these were interesting as well:
Object-oriented-like structures in relational databases
http://en.wikipedia.org/wiki/List_of_object-relational_mapping_software#C.2B.2B
My experience so far has been painful. The solutions don't appear to be mature and I've had difficulty even compiling some of them, and the documentation and support can be sparse.
I suppose at some level I'm trying to avoid learning SQL (I'm not a database developer). On the other hand, my gut feeling is that ORMs are an architectural 'workaround' in that they are creating a layer above a database system that inherently doesn't support objects.
My ideal database library would allow the following:
Allow one to specify the object hierarchy tree based on class names, perhaps in XML or just in C++.
Allow one to specify which fields of those classes should be persistent.
Provide an API to create, update, delete, retrieve the hierarchy of objects.
Ideally, provide an API for the in-memory tree itself, including concurrent access to tree nodes.
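To make the wish list above concrete, here is a rough sketch of such an API. It is written in Python rather than C++ purely to keep it short, it is not based on any existing library, and every name in it (Node, persistent_fields, Device, Port) is hypothetical. It covers declaring persistent fields and saving and loading the tree, but not concurrent access to tree nodes.

```python
import json


class Node:
    """Base class for objects in the persistent tree."""

    persistent_fields = ()  # names of the attributes that get saved
    registry = {}           # class name -> class, used when loading

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Node.registry[cls.__name__] = cls

    def __init__(self, **fields):
        self.children = []
        for name, value in fields.items():
            setattr(self, name, value)

    def add(self, child):
        self.children.append(child)
        return child

    def to_dict(self):
        # Only the declared persistent fields are written out.
        return {
            "class": type(self).__name__,
            "fields": {n: getattr(self, n, None) for n in self.persistent_fields},
            "children": [c.to_dict() for c in self.children],
        }

    @staticmethod
    def from_dict(data):
        # Look up the class by name and rebuild the subtree recursively.
        node = Node.registry[data["class"]](**data["fields"])
        for child in data["children"]:
            node.add(Node.from_dict(child))
        return node


class Device(Node):
    persistent_fields = ("name",)


class Port(Node):
    persistent_fields = ("number", "enabled")


# Build a small tree, serialize it to JSON text, and load it back.
root = Device(name="switch-1", uptime=123)  # uptime is transient, not saved
root.add(Port(number=1, enabled=True))
text = json.dumps(root.to_dict())
copy = Node.from_dict(json.loads(text))
```

After the round trip, copy has the same class hierarchy and persistent fields as root, while the transient uptime attribute is gone. A C++ version would need some substitute for the reflection used here, such as macros, code generation or an external schema file.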
I once worked on an embedded system that had such a custom database and API.
I'm almost at the point where I'm just going to create my own and open source it.
Just wondering if there is anything off the shelf I can use.
I saw this:
http://en.wikipedia.org/wiki/Comparison_of_object_database_management_systems
and am trying to figure out whether this might work:
http://www.fastdb.org/fastdb.html
Thanks in advance.
I'm not going to make any recommendations, because I don't know of a high-quality FOSS OO database. I would however make the following observations:
OO databases are not a way of avoiding SQL - you need both. Frankly, if you don't know SQL pretty well, your life as a professional programmer is likely to be unhappy.
OO databases are mature - they have been around for well over 20 years. I personally first used one on a large project in the finance industry 15 years ago.
OO databases are best used where relational databases fail - I've used them in complex financial instrument modeling, oil-pipeline optimisation and telco work.
ORM databases take the bad parts of the OO and the relational models and make something even worse of them.
My favourite commercial OODB is ObjectStore, but I haven't done any work with it for quite a while now.
Hope that is vaguely helpful.
Honestly, unless you're into "bleeding edge", I would stay away from OO databases. In almost all cases, they're not well supported, immature, and have various support issues client side.
The problem is, only the relational databases (and certain non-relational ones) get 99% of the attention, and thus end up far more mature. ORM may be a workaround, but if you want reliability, it's really what you need.
UPDATE:
To clarify, I'm sure there are some very reliable open source OODB's out there, but my requirements for "reliability" are more than just whether it doesn't crash and doesn't corrupt data. They include reliability of the client connectors, reliability of the integration with the object models of popular languages, etc...
This is about open source OODB's, not commercial ones.
This is a good OO database; I am currently working with it:
http://www.garret.ru/goods.html

Database Internals - Where to Begin? [closed]

So let's say that you want to learn some stuff about database internals. What's the best source code to look at? The best books to buy?
I was talking about this with a buddy the other day and he recommended:
Art of Computer Programming, Volume 3: Sorting and Searching
What other books would help me learn about all the File IO and memory issues, pages, locking, etc.... ?
Textbook: Database Management Systems by Ramakrishnan and Gehrke.
Or: Architecture of a Database System by Hellerstein, Stonebraker, and Hamilton.
Production Code: PostgreSQL
(I like the PG code better than SQLite's; it's far more complete and, I think, better organized. SQLite is awesome for what it does, but there is a lot it doesn't take on.)
Extra Credit: Readings in Database Systems, 4th edition edited by Hellerstein.
If you are really serious, and although it is a tough read, there is none other than the book by the late, great Jim Gray and Reuter:
Transaction Processing: Concepts and Techniques
Again, if you are serious, do not bother with anything else. It's out of this world, and certainly beyond the MySQL chasing done by IBM or Oracle.
The SQLite source is very approachable to learn about database implementations.
PostgreSQL is a very well written piece of software, with higher complexity than SQLite.
A colleague and I got a great deal of information out of Database in Depth: Relational Theory for Practitioners. It is very low-level stuff, but it sounds like that is the sort of thing you are looking for.
Take a look at Database Systems: The Complete Book by Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer D. Widom. It is specifically about the internals of the DBMS.
The answer by SquareCog also contains sensible suggestions; I've not looked at the two books mentioned (though the Stonebraker "Architecture" book is only 136 pages according to Amazon, which seems a tad lightweight).
Here's an interesting read about SQLOS, which drives Microsoft SQL Server 2005+.
In depth information about internals is database specific, here's a source on SQL Server 2008:
http://www.amazon.com/Microsoft%C2%AE-SQL-Server%C2%AE-2008-Internals/dp/0735626243
Not everybody likes his style, but I find that Joe Celko does a fine job of explaining the set-based logic that drives SQL databases. If you already have a little SQL experience under your belt, you should read SQL for Smarties.
Make sure that whatever you get covers relational algebra and relational calculus. No point delving into database internals if you don't have the basic theoretical background. Past that, any college style databases textbook will probably suffice.
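As a small taste of the relational algebra mentioned above, here is a sketch of three of its operators (selection, projection and natural join) over relations represented as lists of Python dicts. The representation, the function names and the sample data are all made up for illustration; a real engine works on pages and iterators rather than in-memory lists.

```python
def select(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the given attributes, dropping duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    result = []
    # Nested-loop join: compare every left row with every right row.
    for l_row in left:
        for r_row in right:
            if all(l_row[a] == r_row[a] for a in common):
                result.append({**l_row, **r_row})
    return result


employees = [
    {"name": "Ann", "dept_id": 1},
    {"name": "Bob", "dept_id": 2},
    {"name": "Cy", "dept_id": 1},
]
departments = [
    {"dept_id": 1, "city": "Oslo"},
    {"dept_id": 2, "city": "Rome"},
]

# Names of employees whose department is located in Oslo.
oslo_names = project(
    select(natural_join(employees, departments), lambda r: r["city"] == "Oslo"),
    ["name"],
)
```

This corresponds to the SQL query SELECT DISTINCT name FROM employees NATURAL JOIN departments WHERE city = 'Oslo'. Much of what a query optimizer does is rearrange such operator trees, for example by pushing the selection below the join so that fewer rows are joined.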

Resources