Using Spotlight as the "database" of an application

I'm developing an OS X application to organize "things" (as iTunes is to music and iPhoto to photos). Instead of having my own database and index, I'm considering using Spotlight to essentially serve this purpose.
Has anyone tried this? Is it wise?
The main benefit, as I see it, would be simplicity and avoiding redundancy. It seems a bit wasteful to implement my own index machinery when OS X comes with one built in.
I have little experience working with Spotlight, however. From a user's perspective, I do know that it was slow and imprecise in older versions of OS X. I also have a gut feeling that, since it's aimed at searching the whole filesystem, using it for "local" purposes will feel hackish.
Obviously, my application's index needs to be constantly up to date. Can mdimport be used for this?
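For context, here is roughly what I imagine driving the index would look like, sketched in Python (the folder and file names are made up; mdimport forces an immediate import of a file, and mdfind queries the index):

    import subprocess

    NOTES_DIR = "/Users/me/Notes"  # hypothetical app data folder

    # Ask Spotlight to (re)import a file right away instead of waiting
    # for its normal change notifications, e.g. after a bulk write.
    subprocess.run(["mdimport", f"{NOTES_DIR}/todo.txt"], check=True)

    # Query the index, restricted to the app's own folder so the rest
    # of the filesystem doesn't leak into the results.
    result = subprocess.run(
        ["mdfind", "-onlyin", NOTES_DIR, "groceries"],
        capture_output=True, text=True, check=True)
    print(result.stdout.splitlines())  # matching file paths, one per line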

Several apps ship this way. I believe there is at least one company that puts all its customer data into text files so it can use Spotlight to find information. I save notes with keywords all the time, fully confident that Spotlight will be able to find them later!
In general, you don't need to prod Spotlight to keep the index up to date. It is very good about watching file changes and indexing rapidly.
The key, really, is figuring out your file format. If you go with something that Spotlight can index -- say, text files -- then you don't have to write an importer. If not, you do. Also, have a look at Core Data as it has excellent Spotlight support, too.
One caveat: some users manually turn off Spotlight indexing on a particular volume. Rare, but possible.


My game requires a database, both for official servers and private ones, but which one should I get?

This is a bit text-heavy, I apologise. There is a TL;DR below.
Intro to myself: I know you've all probably heard the same story. Guy wants to make a game, etc. Going into it semi-blind, and does not know much about what he's doing. I confess to being somewhat like this.
Next, I'm terrible with terminology and have a brain like a fish when it comes to long nouns. This is largely due to my (mild) dyslexia, so please forgive me.
I did pre-university computer science at college some 4 years back. Did some MySQL, some VB.net (Windows Forms are fun) and JavaScript. I've been following along loosely since then, but haven't done much, so I'm quite rusty.
My level of skill could be called enthusiast at best and muddle-headedly incompetent at worst. I've been dabbling in C# since a friend and I use Unity a lot. To be honest, college was a little trauma-inducing for me and soured my ability to code (minor fear/stage fright, easily distracted), but to get over that I've decided to head out on my own and make something for myself.
Whether I succeed with the project or not, no matter how big or small it is, isn't important, just so long as I'm creating something I'm satisfied with and can progress in my skills and learn; that is all that matters. The rest is just a bonus.
With that in mind, I've formulated a project I hope and believe will retain my interest long enough to get some education in.
So without further ado, let's get to why I'm here.
I want to make a space game, a massive space game. Mostly, if not all, procedural, but with realistic star counts spanning an entire galaxy. This can range from a thousand stars to 100 trillion (I'd like to go as far as 150 trillion for stress testing, then cap at 100 trillion), much like in real life (Segue 2: ~1,000 stars; IC 1101: ~100 trillion). Sounds nuts, but games like Space Engine have already done this, and that was developed by one person.
Is this practical? I don't really care all that much whether this sort of game is practical or achievable, so long as it's realistic-ish on this aspect and a few others. Plus, the thought of "this could work, you just have to be creative and put the effort in" attracts me.
So what's the issue? Why can't I use a normal save file? The issue comes from the game I'm trying to make being somewhat similar to a 4X. It's not exactly that, but it's close enough that there is a lot of information flying around: information that needs to be stored, synced and in some cases processed.
On a small scale a single home/gaming computer can do this, so a small single-player game is fine. However, that's not what I want to achieve; the game I would like to create allows multiple people to play over long spans of time, across a galaxy, no matter what size that galaxy is.
This is where I started my research. I need a way to store this information and so I looked up various database engines like PostgreSQL, and others.
I think to myself: for what I'm doing, as far as the world map is concerned, a standard SQL database should be fine for the most part... right? My confidence started shaking when I looked up what game devs are using these days... NoSQL? What on earth is this? Why is Google so persistent about it? So I looked into it more. I then found out about NewSQL and realised I was starting to run around in circles. The rabbit hole I did descend.
The flexibility of NoSQL is attractive, but the lack of ACID, in terms of fast and reliable queries (if I've read these articles right, I could be wrong), is a massive downside if you're trying to look up 1 of a trillion stars or 1 of a quadrillion planets on the map quickly. This is also not taking into account the fact that a game is running alongside it, sending and creating new data all the time.
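To make "fast lookup" concrete, this is the access pattern I mean (a toy sketch with Python and SQLite; any SQL engine with a primary-key index answers this kind of point query in logarithmic time):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE stars (
            id   INTEGER PRIMARY KEY,   -- B-tree indexed: point lookups stay fast
            x    REAL, y REAL, z REAL,  -- position in the galaxy
            kind TEXT                   -- spectral class etc.
        )""")
    conn.executemany(
        "INSERT INTO stars (id, x, y, z, kind) VALUES (?, ?, ?, ?, ?)",
        ((i, i * 0.1, i * 0.2, i * 0.3, "G") for i in range(100_000)))

    # Finding 1 star among N uses the primary-key index, so the cost grows
    # with log(N), not N; the same query shape works at any scale.
    star = conn.execute("SELECT * FROM stars WHERE id = ?", (73_542,)).fetchone()
    print(star)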
Now, I could be making a big issue out of all this, as my scale of work might be too small to affect even the slowest of systems, but how can I really know?
There's also the issue that I don't know for sure whether, somewhere down the line, I'll need to create a new datatype for some new feature and mess things up in my existing test data, because the database is old SQL and you kind of have to deal with a rigid schema. I need that flexibility to mess around and not worry that a table somewhere is going to break, or that I have to bloat the DB with empty or null values for non-applicable entries.
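That said, from what I've read even plain SQL engines can loosen the schema with a JSON column these days; here is a sketch using SQLite's JSON1 functions (shipped by default in current builds; PostgreSQL's JSONB is the same idea):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Fixed columns for what every star has; a JSON blob for everything else.
    conn.execute("CREATE TABLE stars (id INTEGER PRIMARY KEY, attrs TEXT)")
    conn.execute("""INSERT INTO stars VALUES (1, '{"kind": "G"}')""")

    # A new feature needs a new attribute? No ALTER TABLE, no NULL bloat:
    conn.execute("""UPDATE stars
                    SET attrs = json_set(attrs, '$.has_ringworld', 1)
                    WHERE id = 1""")
    print(conn.execute(
        "SELECT json_extract(attrs, '$.has_ringworld') FROM stars").fetchone())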
Why am I messing around with DBs before the actual game? Well, if I can't even create the chess board, there's no point trying to invent chess, really... I can make things look pretty later; first I need the data to manipulate.
TL;DR:
In light of everything mentioned, I've compiled a small shopping list of what I'm looking for to store my data:
- It must allow me to integrate it into a single application, so my end users can deploy their own private servers in a relatively simple manner. What I mean by this is more a licensing thing, plus ease of integration/use. Simply grabbing some open-source software and stuffing it in pre-packaged with the server could land me in some issues, I believe?
- It should not be a cloud database service, since for this project I would like to allow "modding" of private servers.
- Relatively scalable; nothing amazing, just something simple yet effective. I like the idea of allowing my users the choice to deploy the server software on more than one system, with the option to either run a new map or chain to an existing one as a slave/cluster thingy, for map chunk loading/dynamic load balancing for busy areas of the map.
- It needs to deal with large-ish amounts of information fast. I'm not expecting a lot of users, but I am expecting large volumes of data per user. Even if that's not the case, it's better to be prepared than to be caught with your pants down. I don't want to fix problems I can avoid.
- Forgiving enough to deal with a game development process filled with possible changes in data etc.
- Reliable data querying. The map, other than stellar drift and celestial orbits, should be relatively static, but it will become an issue if many users on the same map start looking up planets or stars all at the same time.
That's all I can think of for now; I'll keep this updated if I think of anything else.
So far, for all I know (which isn't much), standard old-style SQL could be the ticket and I'm just making work for myself, but with all the new technology out there, like NoSQL and NewSQL, it's hard to know what the better option for my circumstances will be. My experience with databases boils down to the classes I did on MySQL and the research I've done recently, so I'm not familiar with the performance and abilities of other software.
So right now my head is spinning, mystified from over-research, and I would like to know my options in terms of specific database engines on the market suitable for my situation.
So I take it to you, great people of Stack Overflow: please put me out of my confusion!

Good (CMS-based?) platform for simple database apps

I need to implement yet another database website. Let's say roughly 5 tables, 25 columns, and (eventually) thousands to tens of thousands of rows. Easy data entry and maintenance are more important than presentation of the data to non-privileged users. It's a niche site, so performance is not a concern. We'll have no trouble finding somewhere to host it.
So: what's a good platform for this? Intuitively I feel that there ought to be some platform that allows this to be done with no code written, some web version of MS Access. Obviously I'm happy to code business rules and any special logic that distinguishes this from every other database app.
I've looked at Drupal (with Views) and it looks possible, but only with quite a bit of effort. I'll look at Alfresco next. A CMS-y platform helps because you can nicely integrate static content, and you get nice styling, plugins, etc.
Really good data entry (tracking changes, logging, the ability to roll back, mass imports...) would be great. If authorised users could do arbitrary SQL queries (yes, I know...) that would be a big bonus. Image management support would be a small bonus.
Django is what you are looking for. In fact, you could probably set up what you're asking for without much coding at all, just configuration.
Once complete, authorised users can add 'rows' with a nice but simple GUI, or, of course, you can batch import via database commands.
I'm a Python newbie, and I've already created 2 Django-based sites. I have created more than a dozen Drupal-based sites, and Django is easier and produces significantly faster sites.
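To give a flavour of how little code that is, here is a minimal sketch (the model and its fields are hypothetical placeholders, not taken from the question):

    # models.py: one class per table; Django generates the schema.
    from django.db import models

    class Item(models.Model):  # hypothetical "thing" being catalogued
        name = models.CharField(max_length=200)
        acquired = models.DateField()
        notes = models.TextField(blank=True)
        photo = models.ImageField(upload_to="photos/", blank=True)

    # admin.py: registering the model buys a full data-entry GUI,
    # with a per-object change history, for free.
    from django.contrib import admin
    admin.site.register(Item)

Each model registered this way gets list views, search, add/edit forms and a change log in the built-in admin, which covers much of the "web version of MS Access" feel being asked for.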
Your need sits somewhat between two chairs: a bespoke application and a CMS-based one. I'd advocate the CMS approach if, and only if, you feel the need for content-structure customization will grow in the future, slowly removing the need for direct SQL queries.
I am biased, having worked with eZ Publish for many years now, but it natively satisfies the requirements you expressed:
Really good data entry (tracking changes, logging, ability to roll back, mass imports...)
[...] Image management support a small bonus.
An idea of the content-editing feel can be had here:
http://ez.no/Demos-Videos/eZ-Publish-Administration-Interface-Video-Tutorial
and you can download and test-drive eZ Publish Community Edition here: http://share.ez.no/latest
It is a PHP-based solution with a strong professional community (http://share.ez.no) and over 1100 add-ons available on http://projects.ez.no. The underlying libs mostly rely on Apache Zeta Components, a high-quality, robust set of PHP5 libraries.
One last note: the content model is abstracted, meaning you would not have to create a new table every time a new type of content needs to be stored. A simple content-class definition from the administration interface and the rest is taken care of, including the editing interface for the new content type. That might remove the need for hardcore SQL queries?
Hope it helped,
Drupal can do most of what you need (I don't know of a module that will let you enter arbitrary SQL queries), but you will end up with some overhead of tables and modules you don't really need. It's up to you to decide if that's a problem or not. I don't think the overhead would hurt performance in your case.
The advantages of using Drupal would be the large community, the stability of the platform and the flexibility to add more functionality when needed. Also, the large user base ensures that most code has been tested rather well.
I highly recommend Drupal. It is very simple (internally, too, the codebase is small and clean), it has dozens of possibilities and tremendous support. Once you start with Drupal you will never go back to anything else.
Note that I'm not connected with the Drupal staff; I've just created dozens of Drupal sites, many of them in just minutes. My last one took me 2 hrs; see it here: http://iPadDevZone.com
UPDATE #1:
It really depends on your DB schema's complexity. In the best case you just use the CCK module (part of core now) and create your node type. "Node" is the Drupal name for a piece of content. All you do is administer your node type's fields through the web GUI (text, image, numbers, dates, custom, etc.). Then, when a user creates content of this node type, he/she can fill in all the fields, which are stored in separate DB table fields. This is, however, hidden from you, if you wish not to know about it; it is just a web GUI. Then you choose how the node is presented, and which properties are shown where.
Watch videos in CCK resources section in the bottom of this page: http://drupal.org/project/cck
If you need to do some programming, it is also very easy to use so-called PHP code snippets, which are entered as part of your content (node) and executed when the page is displayed.
Drupal has node revisions built into core. You can see all the versions and roll back if you wish.
You can set permissions at quite a granular level, so you can control what your users may or may not do.
I would take a look at Symphony. I haven't used it myself, but it seems really easy to use and to customize!
http://symphony-cms.com/
Seems to me an online database system would be better than a CMS system.
So in addition to what's been posted above:
www.quickbase.com (by Intuit) - think around $150/mo
www.rollbase.com - check on price, full featured
www.rhythmdata.com - easy to set up, but I don't think it has the advanced features you're looking for.
Good luck!
B
I appreciate these answers, but most of them are really platforms that are much better at something else (e.g., Drupal really is a CMS and has some support for custom fields, but it's not at all easy). Since this is a brand-new site from scratch, it doesn't really make sense to start with something that does custom database fields as an afterthought, I think.
The closest I've found is Zoho Creator. It really is like "MS Access for Web 2.0", and it even supports importing from Access. The pricing could get expensive, though, and it feels like it might eventually be quite constraining. I'm still evaluating.
Are there any other products like Zoho Creator?

Database or format for help system?

I'm implementing a help system for my app. It's written in C and Gtk+.
Help is divided into books. Each book is just a bunch of HTML pages with resources.
The problem I've encountered is that each book is ~30 MB (using the WebKit GTK port to display it). After zipping it becomes ~7 MB, but opening a document becomes extremely slow :( So I'm thinking about using some kind of library able to provide me with: full-text search index creation, document listing, a tree structure (a la a file system), and compression of course.
Any ideas on such thing?
P.S. Not all of the requirements are must-haves; I'm still exploring this part and am not sure all of them are required, so it's OK if something is missing.
SQLite supports compressed (read-only) databases via an extension, and is ideal for a single-user database.
However, you should think about whether you need to compress at all. Hard disks are so big these days that a 700 MB library on a computer isn't much of a worry.
Alternatively, you could go for Firebird, which as far as I know doesn't support compressed databases. You could compress your individual pages instead, in which case you would need to build your own index for full-text search, which I would consider unnecessary work.
Firebird supports a feature called "Embedded Server" which is especially designed for deployment with Windows applications.
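As a sketch of the SQLite route (assuming a build with the FTS5 extension, which most current SQLite builds, including the one bundled with Python, have): compress each page individually and keep a separate full-text index, so a search never has to decompress a whole book:

    import sqlite3, zlib

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (path TEXT PRIMARY KEY, html BLOB)")
    conn.execute("CREATE VIRTUAL TABLE pages_fts USING fts5(path, body)")

    def add_page(path, html):
        # Store the page zlib-compressed; index the uncompressed text.
        conn.execute("INSERT INTO pages VALUES (?, ?)",
                     (path, zlib.compress(html.encode("utf-8"))))
        conn.execute("INSERT INTO pages_fts VALUES (?, ?)", (path, html))

    add_page("book1/intro.html", "<h1>Welcome</h1> Getting started with the app")

    # Full-text search touches only the index; just the hits get inflated.
    for (path,) in conn.execute(
            "SELECT path FROM pages_fts WHERE pages_fts MATCH ?", ("started",)):
        raw = conn.execute("SELECT html FROM pages WHERE path = ?",
                           (path,)).fetchone()[0]
        print(path, zlib.decompress(raw).decode("utf-8"))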
I think you should make an ordered list of the features you want, then pick the top two or three and do those before advancing. E.g. getting it into a database is something you'd want to do before thinking about compression.

Managing a large collection of music

I'd like to write my own music-streaming web application for my personal use, but I'm racking my brain over how to manage it. Existing files and their locations rarely change, but they still can (fixing a filename or ID3 tags, /The Chemical Brothers instead of /Chemical Brothers). How would the community manage all of these files? I can gather a lot of information through just an ID3 reader and my file system, but it would also be nice to keep track of things like play counts. Would using iTunes's .xml file be a good choice, i.e. keeping my music current in iTunes and basing my web application's data on it? I was thinking of keeping track of all my music by md5'ing each file and using that as the unique identifier, but if I change the ID3 tags, will that change the md5 value?
I suppose my real question is: how can you keep track of large amounts of music? Keep the meta info in a database? How to connect the file and the DB entry is my real question, versus just using a read-when-needed filesystem setup.
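To pin down the md5 part: hashing the whole file changes whenever the tags change, since retagging rewrites bytes in the file. What I have in mind is something like this sketch, which hashes only the audio payload by skipping the ID3v2 block at the front and the ID3v1 trailer at the end (it ignores rarer layouts such as ID3v2 footers or APE tags):

    import hashlib

    def audio_md5(path):
        """md5 of an MP3's audio bytes only, so retagging keeps the id stable."""
        with open(path, "rb") as f:
            data = f.read()
        start, end = 0, len(data)
        # ID3v2: 10-byte header; the tag size is a 4-byte "syncsafe"
        # integer (7 useful bits per byte), not counting the header.
        if data[:3] == b"ID3" and len(data) >= 10:
            size = 0
            for b in data[6:10]:
                size = (size << 7) | (b & 0x7F)
            start = 10 + size
        # ID3v1: fixed 128-byte block at the very end, starting with "TAG".
        if len(data) >= 128 and data[-128:-125] == b"TAG":
            end -= 128
        return hashlib.md5(data[start:end]).hexdigest()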
I missed part 2 of your question (the md5 thing). I don't think an MD5/SHA/... solution will work well, because it doesn't let you find duplicates in your collection (like popular tracks that appear on many different compilations). And especially with big collections, that's something you will want to do someday.
There's a technique called acoustic fingerprinting that shows a lot of promise; have a look here for a quick intro. Even if there are minor differences in recording levels (like those popular "normalized" tracks), the acoustic fingerprint should remain the same. I say should, because none of the techniques I tested is really 100% error-free. Another advantage of acoustic fingerprints is that they can help you with tagging: a service like FreeDB will only work on complete CDs, while acoustic fingerprints can identify single tracks.
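As a concrete taste of the technique, here is a sketch assuming the pyacoustid bindings for the Chromaprint library (one such fingerprinter; naming it is my assumption, not an endorsement from testing):

    import acoustid  # pip install pyacoustid; needs the chromaprint library

    # Two files with the same audio but different tags (or slightly
    # different levels) should yield very similar fingerprints,
    # unlike a byte-level hash, which any change breaks.
    duration, fp = acoustid.fingerprint_file("song.mp3")
    print(duration, fp[:32])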
For inspiration, and maybe even for a complete solution, check out Ampache. I don't know what you call large, but Ampache (a PHP application backed by a MySQL DB) easily handles music collections of tens of thousands of tracks.
Recently I discovered Subsonic; the web site says "manage 100,000+ files in your music collection without hassle", but I haven't been able to test it yet. It's written in Java and the source looks pretty neat at first sight, so maybe there's inspiration to be had there too.

Document/Image Database Repository Design Question

Question:
Should I write my application to directly access a database image repository, or write a middleware piece to handle document requests?
Background:
I have a custom document imaging and workflow application that currently stores about 15 million documents/document images (90%+ single-page Group 4 TIFFs; the rest PDF, Word and Excel documents). The image repository is a commercial, 3rd-party application that is very expensive and frankly has too much overhead. I just need a system to store and retrieve document images.
I'm considering moving the images directly into a SQL Server 2005 database. The indexing information is very limited: basically 2 index fields. It's a life-insurance policy administration system, so I index images with a policy number and a system-wide unique ID number. There are other index values, but they're stored and maintained separately from the image data. Those index values give me the ability to look up the unique ID for individual image retrieval.
The database server is a dual quad-core Windows 2003 box with SAN drives hosting the DB files. The current image repository size is about 650 GB. I haven't done any testing to see how large the converted database will be. I'm not really asking about the database design; I'm working with our DBAs on that aspect. If that changes, I'll be back :-)
The current system to be replaced is obviously a middleware application, but it's a very heavyweight system spread across 3 Windows servers. If I go this route, it would be a single-server system.
My primary concerns are scalability and performance, heavily weighted toward performance. I have about 100 users, and usage growth will probably be slow for the next few years.
Most users are primarily read users; they don't add images to the system very often. We have a department that handles scanning and otherwise adding images to the repository. We also have a few other applications that receive documents (via FTP) and insert them into the repository automatically as they are received, either with full index information or as "batches" that a user reviews and indexes.
Most (90%+) of the documents/images are very small, < 100 KB and probably < 50 KB, so I believe that storing the images in the database file will be more efficient than getting SQL 2008 and using FILESTREAM.
Oftentimes scalability and performance are ultimately married to each other, in the sense that six months from now management comes back and says "Function Y in Application X is running unacceptably slowly; how do we speed it up?" And all too often the answer is to upgrade the back-end solution. And when it comes to upgrading back ends, it's almost always less expensive to scale out than to scale up in terms of hardware.
So, long story short, I would recommend building a middleware app that specifically handles incoming requests from the user app and then routes them to the appropriate destination. This will sufficiently abstract your front-end user app from the back-end storage solution, so that when scalability does become an issue, only the middleware app will need to be updated.
This is straightforward. Write the application to an interface, use some kind of factory mechanism to supply that interface, and implement that interface however you want.
Once you're happy with your interface, then the application is (mostly) isolated from the implementation, whether it's talking straight to a DB or to some other component.
Thinking ahead a bit on your interface design, while doing bone-stupid "it's simple, it works here, it works now" implementations, offers a good balance of future-proofing the system without over-engineering it.
It's easy to argue that you don't even need an interface at this juncture, just a simple class that you instantiate. But if your contract is well defined (i.e. the interface or class signature), that is what protects you from change (such as redoing the back-end implementation). You can always replace the class with an interface later if you find it necessary.
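A minimal sketch of that shape (in Python for brevity; every name here is hypothetical, not from the question):

    from abc import ABC, abstractmethod

    class DocumentStore(ABC):
        """The contract the front end codes against; back ends can change freely."""
        @abstractmethod
        def fetch(self, policy_no: str, doc_id: str) -> bytes: ...
        @abstractmethod
        def store(self, policy_no: str, doc_id: str, image: bytes) -> None: ...

    class SqlImageStore(DocumentStore):
        """The "bone stupid" first implementation: images straight in the DB."""
        def __init__(self, conn):
            self.conn = conn  # any DB-API connection

        def fetch(self, policy_no, doc_id):
            row = self.conn.execute(
                "SELECT image FROM documents WHERE policy_no = ? AND doc_id = ?",
                (policy_no, doc_id)).fetchone()
            return row[0]

        def store(self, policy_no, doc_id, image):
            self.conn.execute(
                "INSERT INTO documents (policy_no, doc_id, image) VALUES (?, ?, ?)",
                (policy_no, doc_id, image))

    def make_store(config) -> DocumentStore:
        # The factory: callers never name a concrete back end, so swapping
        # in a middleware- or file-system-backed store is a one-line change.
        if config["backend"] == "sql":
            return SqlImageStore(config["conn"])
        raise ValueError(f"unknown backend {config['backend']!r}")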
As far as scalability, test it. Then you know not only if you may need to scale, but perhaps when as well. "Works great for 100 users, problematic for 200, if we hit 150 we might want to consider taking another look at the back end, but it's good for now."
That's due diligence and a responsible design tactic, IMHO.
I agree with gabriel1836. An added benefit would be that you could run a hybrid system for a time, since you aren't going to convert 15 million documents from your proprietary system to your home-grown system overnight.
Also, I would strongly encourage you to store the documents outside of the database. Store them on a file system (local, SAN, NAS, it doesn't matter) and store pointers to the documents in the database.
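Reusing the hypothetical DocumentStore interface from the earlier sketch, the pointer approach could look like this (again a sketch, again made-up names):

    import pathlib, uuid

    class FileSystemStore(DocumentStore):
        """Images live on disk (local/SAN/NAS); the DB only stores pointers."""
        def __init__(self, conn, root):
            self.conn = conn
            self.root = pathlib.Path(root)

        def store(self, policy_no, doc_id, image):
            path = self.root / policy_no / f"{doc_id}-{uuid.uuid4().hex}.tif"
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(image)
            self.conn.execute(
                "INSERT INTO doc_pointers (policy_no, doc_id, path) VALUES (?, ?, ?)",
                (policy_no, doc_id, str(path)))

        def fetch(self, policy_no, doc_id):
            (path,) = self.conn.execute(
                "SELECT path FROM doc_pointers WHERE policy_no = ? AND doc_id = ?",
                (policy_no, doc_id)).fetchone()
            return pathlib.Path(path).read_bytes()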
I'd love to know what document management system you are using now.
Also, don't underestimate the effort of replacing the capture (scanning and importing) provided by the proprietary system.
