Joomla 4 Smart Search: Extra database for indexing data - database

I'm just about to migrate my site from Joomla 3 to Joomla 4. It's a pretty big website with thousands of articles that leads to problems using the new Joomla smart search.
The indexing of the articles for the smart search lets the database grow up to almost 2 GB. That's way too much. Just wondering whether anyone with the same problem figured out how to save the data for the component into an extra database what might be a solution for this mess.
As the old Joomla search won't be supported with Joomla 5 anymore, I'd like to avoid this alternative.

Related

Advice for implementing a user-search from with React UI and Spring Boot server

I have a Spring Boot/React application. I have a list of users in my database I will have populated already from LDAP.
As part of a form, I need to allow users to specify a list of users. Since they could be searching from (and technically specifying as well), up to 400,000 users (most will be in the 10k or less range), I'm assuming I'd need to do this both client and server-side.
Does anyone have any recommendations on the approach or technologies?
I'm not using a small amount of data, but I don't want to over-engineer it either (tips are mostly for server-side, but any are welcome).
If you are using hibernate as the ORM in your application, you may also checkout Hibernate Search. This seems to serve your purpose as I feel that searching through a list of users can be done using a normal text based index. Hibernate search leverages Lucene, which is suitable for text based indexing and searching.
While another answer is good and works perfectly fine when you have a small set of data but be aware of the few design issue with it.
Lucene is not distributed and you can't easily scale it to multiple horizontal machines without duplicating the whole index, which is perfectly fine when you have a small set of data and in-fact it's pretty fast as there will be no network call(in case of elasticsearch, it will be).
If you want to build a stateless application that is easy to HS(horizontally scalablele) then going with Lucene will not be helpful as it stateful and you need to create Lucene index before your newly spawned app-server finished local indexing in Lucene.
Elasticsearch(ES) is rest-based and is written in JAVA and has very good java-client which you can easily use for simple to complex use-cases.
Last but not the least, please go through the STOF answer of none other than shay banon, creator of Elasticsearch, who explains why he created ES in first place :) and which will give more trade-off and insights to choose a best solution for your use-case.

DataBase for documents

I am beginning a new project and want to choose a db.
I don't have a fixed language yet but was thinking to go with Java 15 and spring.
The issue I have is that, this time, I will manage lots of documents (mainly scans or pictures).
I don't know which db does it best, or if I need to add a particular API.
What is the best database choice to manage docs ?
Do I need to consider specifics API to help ?
Thank you

How much users Joomla 3 can handle at once?

I plan to build a site based on Joomla 3 CMS and client says that they will have from 750 to 1000 registered users per a day.
I wonder whether Joomla 3 is able to handle all those users at once or not?
That is a rather broad question.
It depends on how you have optimised your site and what type of hosting platform you are using. A $10/year shared platform obviously wouldn't work with Joomla! with that many users.
Joomla! is very efficient and shouldn't have any problems with this, but if you start adding in lots of plugins, components and modules which are not off a great quality then it will effect performance. Hence the broadness of your question.

Good (CMS-based?) platform for simple database apps

I need to implement yet another database website. Let's say roughly 5 tables, 25 columns, and (eventually) thousands to tens of thousands of rows. Easy data entry and maintenance are more important than presentation of the data to non-privileged users. It's a niche site, so performance is not a concern. We'll have no trouble finding somewhere to host it.
So: what's a good platform for this? Intituitively I feel that there ought to be some platform that allows this to be done with no code written - some web version of MS Access. Obviously I'm happy to code business rules, and special logic that distinguishes this from every other database app.
I've looked at Drupal (with Views) and it looks possible, but with quite a bit of effort. Will look at Al Fresco next. A CMS-y platform helps because then you can nicely integrate static content, you get nice styling, plugins, etc etc.
Really good data entry (tracking changes, logging, ability to roll back, mass imports...) would be great. If authorised users could do arbitrary SQL queries (yes, I know...) that would be a big bonus. Image management support a small bonus.
Django is what you are looking for. In fact, you could probably set up what you ask without much coding at all, just configuration.
Once complete, authorised users can add 'rows' with a nice but simple GUI, or, of course, you can batch import via database commands.
I'm a Python newbie, and I've already created 2 Django-based sites. I have created more than a dozen Drupal-based sites, and Django is easier and produces significantly faster sites.
Your need somewhat sits between two chairs : bespoke application and CMS-based. I'd advocate for the CMS approach, if and only if you feel the need for content structure customization will grow in the future, slowly removing the need for direct SQL queries.
I am biased since working with eZ Publish for many years now, but it satisfies the requirements you expressed natively :
Really good data entry (tracking changes, logging, ability to roll back, mass imports...)
[...] Image management support a small bonus.
An idea of the content edition feel can be watched here:
http://ez.no/Demos-Videos/eZ-Publish-Administration-Interface-Video-Tutorial
and you can download and test-drive eZ Publish Community Edition there : http://share.ez.no/latest
It is a PHP-based solution, strong professional community (http://share.ez.no), over 1100 add-ons available on http://projects.ez.no. The underlying libs are mostly relying on Apache Zeta Components, high-quality, robust set of PHP5 libraries.
Last note : the content model is abstracted, meaning you'd not have to create a new table everytime a new type of content should be stored : a simple content class definition from the administration interface, and the rest is taken care of, including the edition interface for the new content type. Might remove the need for hardcore SQL queries ?
Hope it helped,
Drupal can do most of what you need (I don't know of a module that will let you enter arbitrary SQL queries), but you will end up with some overhead of tables and modules you don't really need. It's up to you to decide if that's a problem or not. I don't think the overhead would hurt performance in your case.
The advantages of using Drupal would be the large community, the stability of the platform and the flexibility to add more functionality when needed. Also, the large user base ensures that most code has been tested rather well.
I highly recommend Drupal. It is very simple (also internally codebase is small and clean) it has dosens of possibilities and tremendous support. Once you start with Drupal you will never go to anything else.
Note that I'm not connected with Drupal staff, I've just created dosens of Drupas sites and many of them in just a minutes. My last one took me 2 hrs, see it here http://iPadDevZone.com
UPDATE #1:
It really depends on your DB schema complexity. The best case is that you just use CCK module (part of core now) and create your node type. Node is Drupal name for content. All you do is just web admin your node type fields (text, image, numbers, dates, custom, etc). Then, if user creates content with this node type he/she can enter all the fields which are stored in separate db table fields. This is however hidden for you - if you wish not to know about it - it is just a web gui. Then you choose how the node is presented, which properties as shown and where.
Watch videos in CCK resources section in the bottom of this page: http://drupal.org/project/cck
If you need to do some programming then it is also very easy to use so called PHP code sniplets which are entered as part of your content (node) and executed when the page is displayed.
Drupal has node revisions built in the core. You can see all the versions and roll back if you wish.
You can set the permissions in quite granular level so you can control what your users may or may not.
I would take a look at Symphony. I havn't been using it myself, but it seems like it's really easy to use and to customize!
http://symphony-cms.com/
Seems to me an online database system would be better than a CMS system.
So in addition to what's been posted above:
www.quickbase.com (by Intuit) - think around $150/mo
www.rollbase.com - check on price, full featured
www.rhythmdata.com - easy to set up, but don't think it's got the advanced features you're looking for.
Good luck!
B
I appreciate these answers, but most of them are really platforms that are much better at something else (eg, Drupal really is a CMS, and has some support for custom fields - but it's not at all easy). Since this is a brand new site from scratch, it doesn't really make sense to start with something that does custom database fields as an afterthought, I think.
The closest I've found is Zoho Creator. It really is like "MS Access for Web 2.0" - and even supports importing from Access. The pricing could get expensive though. It feels like it might eventually be quite constraining. I'm still evaluating.
Are there any other products like Zoho Creator?

Anyone using CouchDB? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I've followed the CouchDB project with interest over the last couple of years, and see it is now an Apache Incubator project. Prior to that, the CouchDB web site was full of do not use for production code type disclaimers, so I'd done no more than keep an eye on it. I'd be interested to know your experiences if you've been using CouchDB either for a live project, or a technology pilot.
After 18 Months of prototypes, testing and waiting for CouchDb to get ready we moved an internal application over to CouchDB in December 2008. So far I'm very happy with that move. It gets rid of a lot of filesystem objects for us (PDFs and JPEGs, now stored as attachments in CouchDB). This enables us to get rid of NFS and easier cluster/replicate our frontend webservers.
To what degree CouchDB is ready for you depends very much on the culture of your organization. We have an in-house development team maintaining several internal Erlang applications. Since CouchDB is written in Erlang and the codebase is of quite decent quality we felt confident that we could fix show stopper issues in CouchDB should the need arise - or at least get our data back out. We also hired one of the CouchDB core team as an consultant - just in case.
But CouchDB for sure isn't 1.0 yet. There are crashes in the Web worker processes all the time (if you misuse them). Replication breaks for us and we don't get error messages about it. Documentation is still very lacking. Still I'm confident that it will not eat our data and development moves forward with reasonable pace.
To give you an idea about our application: currently our biggest database is about 512000 records taking 7.5 GB of diskspace.
I use the CouchDB to power a Facebook application (over 35k monthly active users). For a while it was using MySQL but after porting the entire project over from Perl to Erlang, I decided to go for the gold and migrate all of the data into CouchDB and use that instead.
CouchDB has been a great data store to work with. I think that it is on track to becoming a major player in web-based services.
I got to know one of the people (Jan) working on it a while ago (like 6 months) and have been playing with it ever since. I found the community around CouchDB to be both very knowledgable and helpful so that whenever I ran into an issue it was resolved in a matter of minutes or hours at least.
We just kicked off a project the other week which basically requires us to store data in the non-relational way and due to CouchDB's document oriented store we selected it as one of the technologies to use. So this is actually the first time that I will run it in production, but I'm still pretty confident about it. :)
Just an update here (2009-10-25):
Our first CouchDB install is 20 GB, it hosts 40 million records. It's been running in production since January 2009, and it's been great. Read (GET) speed is outstanding and we use it as a store for complex data, and then it's just pull.
Our second couchdb installment has two databases, one is 160,000,000+ documents (210 GB), and growing between 150,000-300,000 documents a day. The other is only 35,000,000 documents (7 GB). This setup has a lot more reads and writes and initial tests are performing very well.
View building on the 160,000,000 document database took roughly a week, but since then we upgraded to a larger Amazon EC2 instance and we are also getting ready to update to CouchDB 0.10.x (from 0.9.1) as this release includes a lot of performance improvements in view building.
I am using couchdb in a few scenarios, as a document store for http://devk.it (under development) and in a much larger scale as a template store for a distributed email delivery system.
CouchDB is very slick for what it does, but I was not able to get it to run at as high of a concurrency level as I would have expected. Also note that the maximum document size is fairly limiting at 1MB due to the hardcoded max input buffer size in mochiweb. You can however alter a header file and recompile to get around this limit.
I'm using CouchDB to store (and serve) article ratings on my blog. It's not exactly heavy traffic but it's been rock solid so far.
Also planning on adding comments sometime which I'll most likely also store in CouchDB.
I've found it quite easy to get started with, on OSX you can just download CouchDBX to get started quickly. I use a Sinatra backend with RestClient to interact with 'the couch' through straight HTTP verbs and such.
Great fun.
At the moment I'm working with CouchDB for a computer science thesis. I'm writing about my progresses and opinions on my blog, http://metalelf0dev.blogspot.com. I think the project is well done, but the existing documentation isn't organized as it should. A quick tutorial about the Futon web interface could be really useful for starters IMHO :)
I used couchdb twice in production. First was the wiki likes project and I think that couchdb was perfect candidate for that role. Saving the version of all docs helps a lot.
The second project was quite query loaded and idea was dumping social data first, then query it with various filters. It was looked like standard CouchDB query features looks a bit pure for our needs. But we add Lucene like a full text indexer and after that doing many queries during Lucene part. And that solution looks good enough.

Resources