As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I would like to learn database systems implementation in depth. Is there a simple open-source database implementation for educational purposes whose code I can go through? There are a lot of educational OS implementations (Minix, Pintos...), and I am wondering if there are similar systems for database education as well.
I have read a few textbooks, but they mainly focus on theory and concepts.
Thanks a lot!
Alfred
Then find some educational material :)
When I was learning database concepts, my professor asked us to code a simple DBMS. One important reference is RedBase:
http://infolab.stanford.edu/~widom/cs346/
Hope that helps.
MySQL, PostgreSQL, and SQLite are all open source. You can find their source code and related documentation.
Also check out the NoSQL family of databases.
What makes you think implementing a database is simple?
What parts of the database interest you? Storage management? Indexing? Query Language? Query Planning? Transactions?
Modern (even "toy") relational systems have all of those components, which makes them rather complex from the outset. Other DBs, such as dbm-based databases, are much simpler. Then you have things like Lucene, which is a database for documents and free-form text: conceptually simple, but a lot of effort goes into scaling.
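To illustrate why a full-text engine like Lucene is conceptually simple, here is a minimal Python sketch of its core data structure, an inverted index. It is not Lucene's API or file format: the `InvertedIndex` class and the regex tokenizer are hypothetical, and real engines add ranking, compression, and segment merging on top.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Simplistic analyzer: lowercase, then split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())

class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: return the documents that contain every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

index = InvertedIndex()
index.add(1, "B+ trees keep keys sorted")
index.add(2, "Lucene indexes free form text")
index.add(3, "Free text search with an inverted index")
print(sorted(index.search("free text")))  # [2, 3]
```

Queries use AND semantics here; an OR query would take the union of the posting sets instead.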
You can look at implementations of SPARQL if you're curious about query languages, as they work against RDF triple stores (which aren't super complicated).
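A query against an RDF triple store boils down to matching triple patterns and joining the variable bindings. The sketch below is a hypothetical in-memory store (not any real SPARQL engine's API) that treats strings starting with `?` as variables and joins patterns left to right, roughly what a SPARQL basic graph pattern does.

```python
class TripleStore:
    """A toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def _match(self, pattern, binding):
        # Yield the binding extended by every triple that fits one pattern.
        for triple in self.triples:
            extended = dict(binding)
            for term, value in zip(pattern, triple):
                if term.startswith("?"):
                    # A variable matches if unbound or already bound to this value.
                    if extended.setdefault(term, value) != value:
                        break
                elif term != value:
                    break
            else:
                yield extended

    def query(self, patterns):
        # Join the patterns left to right, like a SPARQL basic graph pattern.
        bindings = [{}]
        for pattern in patterns:
            bindings = [b for old in bindings for b in self._match(pattern, old)]
        return bindings

store = TripleStore()
store.add("alice", "knows", "bob")
store.add("bob", "knows", "carol")
store.add("carol", "name", "Carol")

# Roughly: SELECT ?a ?c WHERE { ?a knows ?b . ?b knows ?c }
result = store.query([("?a", "knows", "?b"), ("?b", "knows", "?c")])
print(result)  # [{'?a': 'alice', '?b': 'bob', '?c': 'carol'}]
```

Real engines index the triples (for example by subject, predicate, and object) instead of scanning the whole set for every pattern.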
There are also things like Prevayler, which is an in-memory database using a concept called prevalence. Probably the simplest of all of them, really, when you get down to it.
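Prevalence keeps all data as ordinary in-memory objects and makes them durable by journaling every state-changing command before applying it; on restart, the journal is replayed. The Python sketch below shows that idea only. It is not Prevayler's (Java) API, and the `PrevalentSystem` class and JSON command format are assumptions for illustration.

```python
import json
import os
import tempfile

class PrevalentSystem:
    """All data lives in memory; every change is a command logged to a journal."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.accounts = {}
        # Recover the in-memory state by replaying every recorded command.
        if os.path.exists(journal_path):
            with open(journal_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, command):
        if command["op"] == "deposit":
            name = command["account"]
            self.accounts[name] = self.accounts.get(name, 0) + command["amount"]

    def execute(self, command):
        # Write-ahead: make the command durable before changing memory.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps(command) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(command)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "journal.log")
    system = PrevalentSystem(path)
    system.execute({"op": "deposit", "account": "alice", "amount": 50})
    system.execute({"op": "deposit", "account": "alice", "amount": 25})
    restarted = PrevalentSystem(path)  # simulate a restart from the journal
    print(restarted.accounts)  # {'alice': 75}
```

Prevalence systems typically also take periodic snapshots of the whole object graph, so recovery does not have to replay the journal from the very beginning.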
Closed 11 years ago.
What are the pros and cons of using software analysis patterns (in general)?
I need this information for study purposes. This question was asked in a lecture of the Software Modeling course, so I think it may come up again.
Anyway, this question intrigued me, because I (mostly) know what the pros of using analysis patterns are. But what about the cons?
It's more like using the right tool for the right job.
So the pros and cons depend on how well you use patterns in your design.
One benefit these patterns provide is that we need not reinvent the wheel: someone has already found a solution to a problem and published it for others to use.
Hence, the pros are as follows (but not limited to):
Less time is wasted.
We get robust solution without spending much effort.
Highly scalable.
Common understanding among developers.
The cons mostly come from using patterns for over-engineering, i.e. making a simple problem more complex when using a pattern could be avoided, OR using one pattern (say, Pattern1) in place of another (say, Pattern2).
In general, it depends on how you use them.
You might like to see the following links:
Categories of design patterns
Does functional programming replace GoF design patterns?
Examples of GoF Design Patterns in Java's core libraries
Closed 10 years ago.
The National Park Service's Natural Sounds Program collects multiple terabytes of data each year measuring soundscapes. In your opinion, what is the best available scripting language to manage massive amounts of files and file types? We would like to easily design and run efficient, user-friendly scripts to search for and retrieve/create copies of files that may be located in different directories according to a single static hierarchy. The OS will most likely be Windows. Thanks!
Use the one your developers are most familiar with. The productivity gains you'll get from that will almost certainly beat out any advantages that one language may have over another.
Use Python. It's easy to learn, and everyone can switch to it easily.
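As a starting point in Python, here is a minimal sketch of the search-and-copy task from the question using only the standard library (`pathlib` and `shutil`), which behave the same on Windows. The function names, the glob pattern, and the directory layout in the demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

def find_files(root, pattern):
    """Return every file under root (recursively) whose name matches the glob pattern."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())

def copy_matches(root, pattern, dest):
    """Copy matching files into dest, mirroring their layout relative to root."""
    copied = []
    for src in find_files(root, pattern):
        target = Path(dest) / src.relative_to(root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied

# Demo on a throwaway directory tree with a hypothetical site/year layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data"
    for rel in ["site01/2010/a.wav", "site02/b.wav", "site02/notes.txt"]:
        f = root / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.write_bytes(b"placeholder")
    out = Path(tmp) / "out"
    copied = copy_matches(root, "*.wav", out)
    names = [p.relative_to(out).as_posix() for p in copied]
    print(names)  # ['site01/2010/a.wav', 'site02/b.wav']
```

`Path.rglob` walks the whole hierarchy on every call; for repeated searches over terabytes of files, caching file metadata in a database is likely to be faster than re-walking the tree each time.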
The size of the files doesn't much matter when you're searching directories or searching for metadata outside the files. Even so, you rarely need to read an entire sound sample file to strip off the metadata.
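If the recordings are WAV files (an assumption; the question does not say which formats are used), Python's standard `wave` module reads the format information from the file header without loading the sample data. A minimal sketch, where `wav_metadata` is a hypothetical helper name:

```python
import os
import tempfile
import wave

def wav_metadata(path):
    """Read format information from a WAV header without loading the samples."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_width_bytes": w.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }

# Demo: write a one-second silent mono file, then read back its header.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "silence.wav")
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * 8000)
    meta = wav_metadata(path)
    print(meta)
```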
Also, if you're doing this frequently, you might want to consider the following:
Extract all the metadata to a relational database.
Use the relational database as a complex "index" to the sound sample files.
Each file addition or change would be done through an application that synchronizes file changes with database updates, to ensure that the database index actually matches the filesystem.
The bulk of your searches might become SQL queries.
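A minimal sketch of that metadata index, using Python's built-in `sqlite3` module as a stand-in for whatever relational database you choose. The table layout, site codes, and paths are hypothetical, and `register` is an assumed hook that your file-handling application would call on every add or change.

```python
import sqlite3

# ":memory:" keeps the demo self-contained; pass a file path for a real index.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE recordings (
        path TEXT PRIMARY KEY,
        site TEXT NOT NULL,
        recorded_on TEXT NOT NULL,  -- ISO 8601 date, so text order = date order
        sample_rate_hz INTEGER,
        duration_s REAL
    )
""")
conn.execute("CREATE INDEX idx_site_date ON recordings (site, recorded_on)")

def register(conn, path, site, recorded_on, sample_rate_hz, duration_s):
    """Hypothetical hook: call on every file add/change to keep the index in sync."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO recordings VALUES (?, ?, ?, ?, ?)",
            (path, site, recorded_on, sample_rate_hz, duration_s),
        )

register(conn, "sounds/YELL01/2010-06-01.wav", "YELL01", "2010-06-01", 48000, 3600.0)
register(conn, "sounds/YELL01/2010-07-15.wav", "YELL01", "2010-07-15", 48000, 3600.0)
register(conn, "sounds/GRCA02/2010-06-10.wav", "GRCA02", "2010-06-10", 44100, 1800.0)

# A search that would otherwise mean walking the directory tree becomes a query.
rows = conn.execute(
    "SELECT path FROM recordings WHERE site = ? AND recorded_on BETWEEN ? AND ? "
    "ORDER BY recorded_on",
    ("YELL01", "2010-06-01", "2010-06-30"),
).fetchall()
print(rows)  # [('sounds/YELL01/2010-06-01.wav',)]
```

The composite index on `(site, recorded_on)` lets queries like this one avoid scanning the whole table.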
I don't really know what you are going to be looking for in a scripting language, but Eric is right that you should use something all your developers are familiar with. However, if you don't have developers (yet) and are designing the project (and team) from the ground up, I would consider C++ or .NET (C# or VB).
While C++ offers more powerful programming and performance, C# and VB.NET offer quicker production. Despite .NET's production advantage, I would think that for massive amounts of files and file types, you will have the best overall satisfaction from C++. In my opinion, the best user-friendly design requires very little user input other than clicking buttons or selecting options from a list.
Closed 10 years ago.
I am looking for a really simple database implementation; basically one without a complex SQL parsing engine. What I am looking for is something demonstrating B+ trees and ACID storage (suitable for educational purposes). What I have found so far in my searches is hamster-db. I am looking for something even simpler, with a smaller code base. If you know of any such open-source project, please let me know.
The University of Wisconsin Databases group uses their own small relational database, minirel, to teach the undergraduate databases class. I just took it, actually; it's enlightening. My semester's assignments are posted publicly. I'm sure the faculty would be willing to part with the source code used at each step.
In the undergraduate class, we do not implement B+ trees or ACID components, but it appears that the larger project does include them.
You can try looking at OrientDB. I don't know if it's simpler than hamster-db, but it's open source, uses a mix of Red-Black Tree and B+Tree algorithms, and supports ACID.
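To see the core of a B+ tree in a code base small enough to read in one sitting, here is a minimal in-memory sketch in Python. It is an illustration only, not code from hamster-db, minirel, or OrientDB: it supports insert, point lookup, and an ordered scan over linked leaves, and omits deletion, paging to disk, and everything ACID-related. `max_keys` is a hypothetical parameter kept small so splits happen early.

```python
import bisect

class Leaf:
    def __init__(self):
        self.keys, self.values, self.next = [], [], None

class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

class BPlusTree:
    """In-memory B+ tree: values live only in leaves, which are linked for scans."""

    def __init__(self, max_keys=3):
        self.max_keys = max_keys
        self.root = Leaf()

    def _find_leaf(self, key):
        node = self.root
        while isinstance(node, Internal):
            # bisect_right: keys equal to a separator go to the right child.
            node = node.children[bisect.bisect_right(node.keys, key)]
        return node

    def get(self, key):
        leaf = self._find_leaf(key)
        i = bisect.bisect_left(leaf.keys, key)
        if i < len(leaf.keys) and leaf.keys[i] == key:
            return leaf.values[i]
        return None

    def insert(self, key, value):
        split = self._insert(self.root, key, value)
        if split:  # the root split, so the tree grows by one level
            sep, right = split
            self.root = Internal([sep], [self.root, right])

    def _insert(self, node, key, value):
        # Returns (separator, new_right_sibling) if node split, else None.
        if isinstance(node, Leaf):
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.values[i] = value  # overwrite an existing key
                return None
            node.keys.insert(i, key)
            node.values.insert(i, value)
            if len(node.keys) <= self.max_keys:
                return None
            mid = len(node.keys) // 2
            right = Leaf()
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            right.values, node.values = node.values[mid:], node.values[:mid]
            right.next, node.next = node.next, right
            return right.keys[0], right  # copy the first right key up
        i = bisect.bisect_right(node.keys, key)
        split = self._insert(node.children[i], key, value)
        if split is None:
            return None
        sep, child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, child)
        if len(node.keys) <= self.max_keys:
            return None
        mid = len(node.keys) // 2
        sep_up = node.keys[mid]  # push the middle key up; neither half keeps it
        right = Internal(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return sep_up, right

    def items(self):
        # Descend to the leftmost leaf, then follow the leaf chain.
        node = self.root
        while isinstance(node, Internal):
            node = node.children[0]
        while node:
            yield from zip(node.keys, node.values)
            node = node.next

tree = BPlusTree(max_keys=3)
for k in [15, 3, 9, 1, 20, 7, 12, 5, 18, 2, 11, 6, 14, 8, 19, 4, 17, 10, 13, 16]:
    tree.insert(k, f"v{k}")
print(tree.get(7), tree.get(99))  # v7 None
```

What makes this a B+ tree rather than a plain B-tree is that values live only in the leaves and leaf splits copy the separator key up, so a range scan is just a walk along the leaf chain.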
Closed 10 years ago.
What's your favorite open source database design/modeling tool?
I'm looking for one that supports several databases, especially Firebird SQL, but I can't find one on Google.
I've used DBDesigner before. It is an open source tool. You might check that out. Not sure if it fits your needs.
Best of luck!
Do you mean design as in 'graphic representation of tables' or just plain old 'engineering kind of design'? If it's the latter, use FlameRobin; version 0.9.0 has just been released.
If it's the former, then use DBDesigner. Yup, that uses Java.
Or maybe you meant something more like MS Access. Then Kexi should be right for you.
S.Lott inserted a comment, but it should be an answer: see the same question.
EDIT
Since it wasn't as obvious as I intended it to be, here follows a verbatim copy of S.Lott's answer in the other question:
I'm a big fan of ARGO UML from Tigris.org. Draws nice pictures using standard UML notation. It does some code generation, but mostly Java classes, which isn't SQL DDL, so that may not be close enough to what you want to do.
You can look at the Data Modelling Tools list and see if anything there is better than Argo UML. Many of the items on this list are free or cheap.
Also, if you're using Eclipse or NetBeans, there are many design plug-ins, some of which may have the features you're looking for.
The DB Designer Fork project claims that it can generate Firebird SQL scripts.
I like the Clay Eclipse plugin. I've only used it with MySQL, but it claims Firebird support.
You may want to look at IBExpert Personal Edition. While not open source, this is a very good tool for designing, building, and administering Firebird and InterBase databases.
The Personal Edition is free, but some of the more advanced features are not available. Still, even without the slick extras, the free version is very powerful.
Closed 10 years ago.
What's the widest overview of different replication methods and problems, and where can I find the deepest analysis of them?
I would start here: Wikipedia's replication article. Then read a couple of related papers on general replication techniques, such as the replicated distributed state machine approach (Paxos (pdf)) and epidemic replication (Google 'Epidemic Algorithms for Replicated Database Maintenance').
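Epidemic replication, as in the paper named above, spreads updates by having replicas periodically reconcile with randomly chosen peers (anti-entropy). The sketch below is a simplified simulation of that idea, not the paper's exact algorithm: it assumes each write carries a timestamp and resolves conflicts by last-writer-wins, and the `Replica` class and `gossip_until_consistent` function are hypothetical. It does not illustrate Paxos, which instead makes replicas agree on a single sequence of operations.

```python
import random

class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        self._merge_entry(key, (timestamp, value))

    def _merge_entry(self, key, entry):
        # Last-writer-wins: keep whichever entry has the larger timestamp.
        current = self.store.get(key)
        if current is None or entry > current:
            self.store[key] = entry

    def anti_entropy(self, other):
        # Push-pull exchange: afterwards both replicas hold the newest entries.
        for key, entry in list(other.store.items()):
            self._merge_entry(key, entry)
        for key, entry in list(self.store.items()):
            other._merge_entry(key, entry)

def gossip_until_consistent(replicas, rng, max_rounds=100):
    """Each round, every replica reconciles with one randomly chosen peer."""
    for round_no in range(1, max_rounds + 1):
        for r in replicas:
            peer = rng.choice([p for p in replicas if p is not r])
            r.anti_entropy(peer)
        if all(r.store == replicas[0].store for r in replicas):
            return round_no
    return None

replicas = [Replica(f"r{i}") for i in range(8)]
replicas[0].write("x", "old", timestamp=1)
replicas[5].write("x", "new", timestamp=2)
replicas[3].write("y", "hello", timestamp=1)
rounds = gossip_until_consistent(replicas, random.Random(42))
print(rounds, replicas[7].store)
```

Because entries are `(timestamp, value)` tuples, equal timestamps fall back to comparing values, an arbitrary but deterministic tie-break.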
For a practical overview, perhaps consider investigating the source code for PostgreSQL, which seems to have some replication technologies built in. This presentation purports to have some details.
However, given that you're talking about deep analysis, the best approach is to make sure that you have a very sound understanding of fundamental distributed database systems issues. My copy of Date's Introduction to Database Systems has a few pages on distributed databases and their attendant issues. I should think a textbook dedicated to distributed databases would have much more detail - this one, for example, looks promising.
You can go much deeper if you read Ken Birman's work on Virtual Synchrony, and most things that Leslie Lamport has ever written. These will attack the problem from the perspective of a general distributed systems approach.
Good luck!
In my opinion, you should pick a mainstream database (such as Oracle) and study everything it offers and go from there.
Oracle offers:
Replication
Data Guard (standby databases and beyond: physical, logical)
Real Application Clusters (multiple instances, one DB)
and more!
A bit of hands-on experience would not hurt: you can download a PC version and try various replication approaches on one PC!
Enjoy!
Though it is MS SQL Server specific, you should have a look at "Pro SQL Server 2005 Replication" (Sujoy P. Paul, Apress). I owe this guy many quiet nights... I guess you can find some extracts of this book as PDF files.
Wikipedia has some overview on the matter:
http://en.wikipedia.org/wiki/Multi-master_replication
http://en.wikipedia.org/wiki/Lazy_replication