DataBase for documents - database

I am beginning a new project and want to choose a db.
I don't have a fixed language yet but was thinking to go with Java 15 and spring.
The issue I have is that, this time, I will manage lots of documents (mainly scans or pictures).
I don't know which db does it best, or if I need to add a particular API.
What is the best database choice to manage docs ?
Do I need to consider specifics API to help ?
Thank you

Related

Pros/Cons of incorporating multiple database types into same project

I'm beginning to pursue my first online project that I am planning will need to scale as such I have opted for a NoSQL DB. Some reading into this and modeling of what my queries would look like and there are two databases I am considering. Cassandra seems like the right choice for item lookups by keyword but MongoDB sounds like the right choice for initially entering the data in as it can retain the account structure in document form.
This split decision has left me wondering: Are there any major companies that use multiple database types for storage of different items as in using both Cassandra and Mongo together?
I would think scaling up would be more difficult but are the added benefits (if there are any) worth the trouble? I'm not the expert on this. I'm hoping you are. Thanks in advance for sharing your experience.
Cassandra can handle both use cases so you can use the same database for your purposes.
Stargate (https://stargate.io/) is an open-source API platform which provides a data gateway to Cassandra with REST API, GraphQL API, Document API and even native CQL access.
The Document API lets you save and search schemaless JSON documents to/from Cassandra directly from your app.
You can try it out for free on Astra with no credit card required. In just a few clicks, you'll be able to launch a Cassandra cluster with Stargate pre-configured so you can use the Document API straight out-of-the box and build a proof-of-concept app immediately without having to worry about downloading/installing/configuring a Cassandra cluster.
There are even sample apps you can access straight from the Astra dashboard so you can see Stargate in action. For more info, see Using the Document API on Astra. Cheers!
Using multiple database technologies in the same project is somewhat common nowadays and it is called "Polyglot persistence".
Many people use this method to take advantage of multiple systems - and as you mentioned Cassandra is right for somethings and something else (maybe MongoDB) is best for something else, so using a combination can give the advantage of both worlds.
Scaling, Replication, Support can be more costly when you use multiple technologies because you need expertise in both to support.
So if you really have use cases where Cassandra wont be a good choice and you have some primary use cases where Cassandra is the best choice then yes, going with two databases can be the best option provided you are ready to take the trouble of supporting two systems.

How useful is Lucene/Solr in database search?

I am new in development and I need your advice.
Our student team is going to develop an application for online restaurant booking, where also will be search tool (restaurant and dishes search).
We want to use modern search tool like Lucene, but we are not sure if it is what we really need.
Due to knowledge information, this is more for text search with different kinds of indexes and so on, while our app will make search in database. BUT, if we want to add new features in future, I guess we need good search engine background today.
So, let me know if Lucene is able to do "select" operations or something like it, or this technology is just for text searches?
Sedond question, what can you advise in realisation of this feature? Where to start with?
Thank you in advance.
It all depends. You usually don't start with Lucene and Solr, you use it to attain a goal or implement a specific behavior you need. Usually Solr is your secondary storage, built from your primary database - i.e. you're inserting data into Solr to solve a specific need, for example proper full text search with relevancy scoring.
If you're just starting up, go with the technology you know - i.e. usually a regular RDBMS. You can then attach Solr if you need those features that they're really good at, and wait with introducing new technology until it's necessary. The need first, then the technology. Maybe Lucene/Solr isn't the right technology for what you end up needing when you get to that point.
One of the main tenants of modern development is "YAGNI" - You Ain't Gonna Need It. You implement features when you need them, not for some imagined behavior that may or may not show up down the road.

Good (CMS-based?) platform for simple database apps

I need to implement yet another database website. Let's say roughly 5 tables, 25 columns, and (eventually) thousands to tens of thousands of rows. Easy data entry and maintenance are more important than presentation of the data to non-privileged users. It's a niche site, so performance is not a concern. We'll have no trouble finding somewhere to host it.
So: what's a good platform for this? Intituitively I feel that there ought to be some platform that allows this to be done with no code written - some web version of MS Access. Obviously I'm happy to code business rules, and special logic that distinguishes this from every other database app.
I've looked at Drupal (with Views) and it looks possible, but with quite a bit of effort. Will look at Al Fresco next. A CMS-y platform helps because then you can nicely integrate static content, you get nice styling, plugins, etc etc.
Really good data entry (tracking changes, logging, ability to roll back, mass imports...) would be great. If authorised users could do arbitrary SQL queries (yes, I know...) that would be a big bonus. Image management support a small bonus.
Django is what you are looking for. In fact, you could probably set up what you ask without much coding at all, just configuration.
Once complete, authorised users can add 'rows' with a nice but simple GUI, or, of course, you can batch import via database commands.
I'm a Python newbie, and I've already created 2 Django-based sites. I have created more than a dozen Drupal-based sites, and Django is easier and produces significantly faster sites.
Your need somewhat sits between two chairs : bespoke application and CMS-based. I'd advocate for the CMS approach, if and only if you feel the need for content structure customization will grow in the future, slowly removing the need for direct SQL queries.
I am biased since working with eZ Publish for many years now, but it satisfies the requirements you expressed natively :
Really good data entry (tracking changes, logging, ability to roll back, mass imports...)
[...] Image management support a small bonus.
An idea of the content edition feel can be watched here:
http://ez.no/Demos-Videos/eZ-Publish-Administration-Interface-Video-Tutorial
and you can download and test-drive eZ Publish Community Edition there : http://share.ez.no/latest
It is a PHP-based solution, strong professional community (http://share.ez.no), over 1100 add-ons available on http://projects.ez.no. The underlying libs are mostly relying on Apache Zeta Components, high-quality, robust set of PHP5 libraries.
Last note : the content model is abstracted, meaning you'd not have to create a new table everytime a new type of content should be stored : a simple content class definition from the administration interface, and the rest is taken care of, including the edition interface for the new content type. Might remove the need for hardcore SQL queries ?
Hope it helped,
Drupal can do most of what you need (I don't know of a module that will let you enter arbitrary SQL queries), but you will end up with some overhead of tables and modules you don't really need. It's up to you to decide if that's a problem or not. I don't think the overhead would hurt performance in your case.
The advantages of using Drupal would be the large community, the stability of the platform and the flexibility to add more functionality when needed. Also, the large user base ensures that most code has been tested rather well.
I highly recommend Drupal. It is very simple (also internally codebase is small and clean) it has dosens of possibilities and tremendous support. Once you start with Drupal you will never go to anything else.
Note that I'm not connected with Drupal staff, I've just created dosens of Drupas sites and many of them in just a minutes. My last one took me 2 hrs, see it here http://iPadDevZone.com
UPDATE #1:
It really depends on your DB schema complexity. The best case is that you just use CCK module (part of core now) and create your node type. Node is Drupal name for content. All you do is just web admin your node type fields (text, image, numbers, dates, custom, etc). Then, if user creates content with this node type he/she can enter all the fields which are stored in separate db table fields. This is however hidden for you - if you wish not to know about it - it is just a web gui. Then you choose how the node is presented, which properties as shown and where.
Watch videos in CCK resources section in the bottom of this page: http://drupal.org/project/cck
If you need to do some programming then it is also very easy to use so called PHP code sniplets which are entered as part of your content (node) and executed when the page is displayed.
Drupal has node revisions built in the core. You can see all the versions and roll back if you wish.
You can set the permissions in quite granular level so you can control what your users may or may not.
I would take a look at Symphony. I havn't been using it myself, but it seems like it's really easy to use and to customize!
http://symphony-cms.com/
Seems to me an online database system would be better than a CMS system.
So in addition to what's been posted above:
www.quickbase.com (by Intuit) - think around $150/mo
www.rollbase.com - check on price, full featured
www.rhythmdata.com - easy to set up, but don't think it's got the advanced features you're looking for.
Good luck!
B
I appreciate these answers, but most of them are really platforms that are much better at something else (eg, Drupal really is a CMS, and has some support for custom fields - but it's not at all easy). Since this is a brand new site from scratch, it doesn't really make sense to start with something that does custom database fields as an afterthought, I think.
The closest I've found is Zoho Creator. It really is like "MS Access for Web 2.0" - and even supports importing from Access. The pricing could get expensive though. It feels like it might eventually be quite constraining. I'm still evaluating.
Are there any other products like Zoho Creator?

Erlang: 2 database on 2 webserver?

I have created a blogging system with php+postgresql.
Now I want to add a web chat ( in REAL TIME for Million of users simultaneously ) where every message is saved in database.
I am thinking to use Erlang+Mnesia on a different webserver for this issue.
Message's table will be like this:
message_id, user_id, message, date
user_id should be related with users table in Postgresql database in another webserver.
How can I do that without lose performance ?
If you have any other creative solutions tell me please ;).
I'm not sure why you want to save every single message in a database, but mnesia doesn't sound like a particularly good choice for doing that. Mnesia is more of a distributed key-value store, that you can use to keep the state of your application, when you need to store "tabular data" and query it, in a simple to medium-complex fashion.
For large amounts of text, I've heard lucene is supposed to be good, it has fulltext search features etc. which are said to be efficient, you might want to look into it:
Apache Lucene Project page
Other than that, using erlang as chatserver, using mnesia to hold all the other state sounds like a good idea, You could write a javascript client that uses something like JSONP (to overcome the cross-domaine-issue) and mochiweb on the erlang site to do the webserver part.
Writing the rest of the core chat system should be fairly simple, the fun part, so to say :)
Mnesia can certainly do what you suggest but if you've already got postgres set up is there some reason you don't want to use that? It might be simpler than creating a whole seperate service and if you want erlang to run the chat service then it has postgres drivers.
This project is using postgresql with great success.
http://zotonic.com/
You may want to use the same code for db access to postgresql.

Django database scalability

We have a new django powered project which have a potential heavy-traffic characteristic(means a heavy db interaction). So we need to consider the database scalability in advance. With some researches, the following questions are still not clear to us:
coarse-grained: how to specify one db table(a django model) to a specific db(maybe in another server)?
fine-grained: how to specify a group of table rows to a specific db(so-called sharding, also can in another db server)?
how to specify write and read to different db?(which will be helpful for future mysql master/slave replication)
We are finding the solution with:
be transparent to application program(means we don't need to have additional codes in views.py)
should be in ORM level(means only needs to specify in models.py)
compatible with the current(or future) django release(to keep a minimal change for future's upgrading of django)
I'm still doing the research. And will share in this thread later if I've got some fruits.
Hope anyone with the experience can answer. Thanks.
Don't forget about caching either. Using memcached to relieve your DB of load is key to building a high performance site.
As alex said, django-core doesn't support your specific requests for those features, though they are definitely on the todo list.
If you don't do this in the application layer, you're basically asking for performance trouble. There aren't any really good open source automation layers for this sort of task, since it tends to break SQL axioms. If you're really concerned about it, you should be coding the entire application for it, not simply hoping that your ORM will take care of it.
There is the GSoC project by Alex Gaynor that in future will allow to use multiple databases in one Django project. But now there is no cross-RDBMS working solution.
There is no solution right now too.
And again - there is no cross-RDBMS solution. But if you are using MySQL you can try excellent third-party Django application called - mysql_replicated. It allows to setup master-slave replication scenario easily.
here for some reason we r using django with sqlalchemy. maybe combination of django and sqlalchemy also works for your needs.

Resources