Integration of Drupal 7 and DSpace

I need to integrate DSpace with Drupal.
I've installed DSpace and use the XMLUI interface, and it's working fine. Now I need to integrate it with Drupal, and I also want to employ a single sign-on system.
I have tried this DSpace module for Drupal and followed the instructions given, yet I am still not able to access the DSpace content.

There are a few different components in your question:
Single sign-on
DSpace supports stackable authentication methods. Depending on which single sign-on system you wish to use, you may need to consider different approaches. The overall documentation on authentication methods can be found here:
https://wiki.duraspace.org/display/DSDOC4x/Authentication+Plugins
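To make "stackable" concrete, here is a minimal Python sketch of the idea (not DSpace's actual Java code): the configured methods are tried in order, and the first one that recognises the credentials wins. The method names, user tables and the `authenticate` helper are all hypothetical; in DSpace the order comes from the authentication configuration described on the page above, and SSO methods that take the identity from the web request rather than from a password are not modelled here.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical shape of one method: (username, password) -> user id, or None if it can't authenticate.
AuthMethod = Callable[[str, str], Optional[str]]


def authenticate(stack: Sequence[Tuple[str, AuthMethod]],
                 username: str, password: str) -> Optional[Tuple[str, str]]:
    """Try each method in configured order; return (method name, user id) for the first success."""
    for name, method in stack:
        user_id = method(username, password)
        if user_id is not None:
            return name, user_id
    return None  # no method in the stack accepted the credentials


def directory_login(username: str, password: str) -> Optional[str]:
    """Hypothetical stand-in for an LDAP-style directory lookup."""
    directory = {"alice": "s3cret"}
    return "ldap:" + username if directory.get(username) == password else None


def local_login(username: str, password: str) -> Optional[str]:
    """Hypothetical stand-in for DSpace's own password table."""
    local_users = {"admin": "changeme"}
    return "local:" + username if local_users.get(username) == password else None


# Order matters: the directory is consulted first, the local table is the fallback.
STACK = [("ldap", directory_login), ("password", local_login)]

if __name__ == "__main__":
    print(authenticate(STACK, "admin", "changeme"))  # ('password', 'local:admin')
```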
Read-only operations on DSpace from Drupal
DSpace 4 ships with a Jersey-based REST API:
https://wiki.duraspace.org/display/DSDOC4x/REST+API
Based on this API, you could use Drupal modules like the following for the integration:
http://drupal.org/project/rest_client
http://drupal.org/project/clients
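A Drupal module would make these calls in PHP, but the HTTP exchange is the same in any language. Below is a minimal Python sketch, assuming the REST webapp is deployed at `http://localhost:8080/rest` and that `GET /communities` returns a JSON array of objects with `id` and `name` fields; check the REST API page above for the exact paths in your DSpace version.

```python
import json
import urllib.request
from typing import List, Tuple

# Assumption: where the DSpace REST webapp is deployed.
BASE_URL = "http://localhost:8080/rest"


def build_request(path: str) -> urllib.request.Request:
    """Build a read-only GET request for a REST path such as '/communities', asking for JSON."""
    return urllib.request.Request(BASE_URL + path, headers={"Accept": "application/json"})


def parse_listing(body: str) -> List[Tuple[int, str]]:
    """Reduce a JSON array of DSpace objects to (id, name) pairs for display in Drupal."""
    return [(obj["id"], obj["name"]) for obj in json.loads(body)]


def fetch_communities() -> List[Tuple[int, str]]:
    """List communities; this needs a running DSpace instance."""
    with urllib.request.urlopen(build_request("/communities"), timeout=10) as resp:
        return parse_listing(resp.read().decode("utf-8"))
```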
Depositing content in DSpace from Drupal
The aforementioned REST API does not support write operations yet. However, DSpace supports SWORD 2, a very elaborate standard for facilitating deposit.
I am not sure if there is a working Drupal module for SWORD 2 deposit. I found an older one here:
http://drupal.org/project/sword
The SWORD 2 specification itself can be found at:
http://swordapp.org/
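For reference, a SWORD 2 binary deposit is a plain HTTP POST of a package to a collection's deposit IRI with a few extra headers. The Python sketch below only builds the request; the collection IRI is hypothetical (the real one is listed in the server's SWORD service document), and it assumes the target collection accepts the SimpleZip packaging format and HTTP Basic authentication.

```python
import base64
import hashlib
import urllib.request

# Hypothetical Col-IRI; take the real one from the SWORD service document.
COLLECTION_IRI = "http://localhost:8080/swordv2/collection/123456789/2"
# Packaging identifier for a plain zip, as defined by the SWORD 2 specification.
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def build_deposit(package: bytes, filename: str, user: str, password: str,
                  in_progress: bool = False) -> urllib.request.Request:
    """Build (but do not send) a SWORD 2 binary deposit of a zip package."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Content-MD5": hashlib.md5(package).hexdigest(),  # hex digest, as SWORD clients commonly send
        "Packaging": SIMPLE_ZIP,
        "In-Progress": "true" if in_progress else "false",  # "false" means the deposit is complete
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(COLLECTION_IRI, data=package, headers=headers, method="POST")
```

Sending it with `urllib.request.urlopen(req)` should return a deposit receipt (an Atom entry) if the server accepts the package.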

Related

Apache Solr: Can Apache Solr be used as a third-party system for indexing and searching documents from different websites?

I am working on implementing a research web application or portal that integrates different research portals or websites using an open source platform called search kit. The web application will act as a central point of access to research publications on different research portals. To do this, I also need to implement a third-party system that does the following:
Searches for documents on the other research portals based on a user query and displays the results to the users on my web application.
Indexes the documents.
Can be used by system administrators to configure the web application, whereby system administrators can add, remove or modify the URLs of the websites Solr is pulling documents from.
Displays the results to the user in one standard format.
My question is: can Apache Solr be used to implement the third-party system? If not, what open source platform or approach would you recommend I use to implement it?
In general, Solr seems like a good fit here, but you might need some custom code (apart from configuration) here and there. To go through the points:
Querying is one of the main features of Solr, so this is definitely possible.
Indexing is handled by Solr.
There was a component for Solr called "Data Import Handler" that supported indexing from URLs (see the docs). However, this was removed from the main Solr distribution, and was moved to a separate package. This package doesn't seem to be actively maintained though, so you will probably run into some problems if you decide to use it. The alternative is to develop your document-pulling code yourself.
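If you write the document-pulling code yourself, the Solr side is small: turn each fetched page into a Solr document and POST a JSON array of documents to the core's update handler. A minimal Python sketch follows; the core name `research` and the fields `id`, `site`, `title` and `content` are hypothetical and must exist in your schema, and fetching and text extraction are left out because they depend on each portal. Storing the host in a `site` field is one way to let you later filter or delete documents per source site when an administrator changes the list of URLs.

```python
import json
import urllib.request
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical core; the fields used below must exist in its schema.
SOLR_UPDATE_URL = "http://localhost:8983/solr/research/update?commit=true"


def to_solr_doc(source_url: str, title: str, text: str) -> Dict[str, str]:
    """Map one pulled document to a Solr document; the URL doubles as the unique id."""
    return {
        "id": source_url,
        "site": urlparse(source_url).netloc,  # lets you filter or delete per source site
        "title": title,
        "content": text,
    }


def build_update(docs: List[Dict[str, str]]) -> urllib.request.Request:
    """Build a JSON update request that adds (or overwrites, by id) the docs and commits."""
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_update(docs))`; committing on every batch is simple but slow, so for large crawls you may prefer to commit once at the end.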
Solr can return the results in multiple formats, but it still might not support the exact format you would like. In this case, you need to build your own transformation on top of the results from Solr.
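As an example of such a transformation, this Python sketch reduces Solr's standard JSON response (`response.numFound` and `response.docs`) to one fixed record shape for your front end. The field names `id`, `title` and `site` are hypothetical, and multi-valued fields, which Solr returns as lists, are collapsed to their first value.

```python
import json
from typing import Any, Dict, List


def first(value: Any) -> Any:
    """Solr returns multi-valued fields as lists; keep only the first value."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def normalize(solr_json: str) -> Dict[str, Any]:
    """Convert a Solr JSON response into {'total': int, 'results': [...]}."""
    response = json.loads(solr_json)["response"]
    results: List[Dict[str, Any]] = []
    for doc in response["docs"]:
        results.append({
            "url": first(doc.get("id")),
            "title": first(doc.get("title")) or "(untitled)",
            "site": first(doc.get("site")),
        })
    return {"total": response["numFound"], "results": results}
```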

After consolidating 3 different Alfresco applications, OOTB search is not working

I have consolidated 3 different Alfresco applications; initially we had 3 separate applications in Alfresco.
Now the repository-layer code of all 3 applications runs on a single instance, and there are 3 different Share instances.
However, the Alfresco OOTB search functionality is not working properly.
I found the root cause: all 3 applications have a search.get.js file, and each of these files contains some custom code.
So can anyone please suggest a way to use all 3 search.get.js files to achieve the search functionality?

Solr luceneMatchVersion syntax

I have Solr 4.10 and a collection on it whose solrconfig.xml has the following value for <luceneMatchVersion>:
<luceneMatchVersion>4.7</luceneMatchVersion>
Is this correct? I saw other examples that have values such as LUCENE_35. I also need to know: how could I express LUCENE_xx from my current Solr version?
You should use:
<luceneMatchVersion>4.10.4</luceneMatchVersion>
I recommend checking your current Solr version; in my case it was 4.10.4.
If you are going to reindex, then both numbers should match. The only reason you might want them to differ is if you had an index created with, say, Lucene 4.7; then you would have
<luceneMatchVersion>4.7</luceneMatchVersion>
Then you upgrade Lucene to 4.10.
Now, if among the changes between 4.7 and 4.10 there are things that work differently regarding analysis (the same sentence analysed in both versions produces different output), then you might want to keep the version number at 4.7; otherwise some queries that contain affected terms might not work (as they were analysed differently at index time than at query time). You have to assess how critical that issue might be.
That is why the recommendation is to upgrade, change the setting to the current number, and reindex. This way you are sure to avoid any issue.
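On the LUCENE_xx question: as far as I know, recent Lucene versions parse this setting leniently, so `LUCENE_47`, `4.7` and `4.7.0` all mean the same version, and `LUCENE_4_10_4` means `4.10.4`. The Python sketch below mimics that normalisation so you can translate between the two spellings; it is an illustration, not Solr's own parser, and older 4.x releases may not accept every spelling, so the dotted form of your actual Solr version is the safe choice.

```python
import re


def to_dotted(value: str) -> str:
    """Normalise a luceneMatchVersion value such as 'LUCENE_47' or '4.10.4' to 'major.minor.bugfix'."""
    v = value.strip().upper()
    v = re.sub(r"^LUCENE_(\d+)_(\d+)_(\d+)$", r"\1.\2.\3", v)  # LUCENE_4_10_4 -> 4.10.4
    v = re.sub(r"^LUCENE_(\d+)_(\d+)$", r"\1.\2.0", v)         # LUCENE_4_10   -> 4.10.0
    v = re.sub(r"^LUCENE_(\d)(\d)$", r"\1.\2.0", v)            # LUCENE_47     -> 4.7.0
    parts = v.split(".")
    if not 2 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognised version: {value!r}")
    if len(parts) == 2:
        parts.append("0")                                      # 4.7 -> 4.7.0
    return ".".join(str(int(p)) for p in parts)


def to_constant(value: str) -> str:
    """Express any accepted spelling as a LUCENE_x_y_z style name."""
    return "LUCENE_" + to_dotted(value).replace(".", "_")


if __name__ == "__main__":
    for raw in ("LUCENE_35", "4.7", "4.10.4", "LUCENE_80"):
        print(raw, "->", to_dotted(raw), to_constant(raw))
```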
If anyone is using Drupal, the Search API Solr (search_api_solr) module has config templates by version in /sites/all/modules/search_api_solr/solr-conf/.
The template README.md states the following:
The solr-conf-templates directory contains config-set templates for different Solr versions.
These are templates and are not to be used as config-sets!
To get a functional config-set you need to generate it via the Drupal admin UI or with drush solr-gsc. See README.md in the module directory for details.
The module's README.md lists these instructions:
1. Make sure you have Apache Solr started and accessible (i.e. via port 8983). You can start it without having a core configured at this stage.
2. Visit Drupal configuration (/admin/config/search/search-api) and create a new Search API Server according to the search_api documentation using "Solr" as Backend and the connector that matches your setup. Input the correct core name (which you will create at step 4, below).
3. Download the config.zip from the server's details page or by using drush solr-gsc with proper options, for example for a server named "my_solr_server": drush solr-gsc my_solr_server config.zip 8.4.
4. Copy the config.zip to the Solr server and extract.
I generated a config file for 8.x, and it uses this:
<luceneMatchVersion>${solr.luceneMatchVersion:LUCENE_80}</luceneMatchVersion>
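The `${name:default}` syntax is Solr's property substitution: if the property `solr.luceneMatchVersion` is set (for example as a JVM system property), its value is used; otherwise the default after the colon (`LUCENE_80`) applies. A minimal Python sketch of that resolution rule, with a hypothetical `substitute` helper:

```python
import re
from typing import Mapping

# Matches ${name} or ${name:default}
_PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def substitute(text: str, props: Mapping[str, str]) -> str:
    """Replace each ${name:default} with props[name], falling back to the default."""
    def resolve(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value for property {name!r} and no default")
    return _PLACEHOLDER.sub(resolve, text)
```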

How to integrate Solr with Web Application

After reading many Solr books and articles all over the net, I now have an idea of the power of this server.
But how do I integrate it into a real application? For example, a website written in PHP.
Right now, I understand that Solr produces XML, JSON, etc. results, so to integrate this into a web application, is the "simple" work to convert this information to render it in a page, or are there other techniques to avoid this?
In my case, I have to develop a search engine to scan many documents and find results.
My idea was:
Use Solr to build an index and search documents
Use a web application to show the results
Looking on the net, I haven't found anything that explains how to integrate Solr into a real application; all the reading is about "How to use Solr... with Solr...", nothing about a real integration.
Does anyone have a useful resource on how to integrate Solr into a real application, with some clean examples?
Edit: It looks like Apache maintains their own list of recommended client APIs, and their recommended tool for PHP is Google's library (though they refer to it as SolPHP). Given this, I imagine that this is the best place to start.
A Solr library for the programming language you're using could save you some of the trouble in implementing the integration. For instance, if your site is written in PHP, you could try Google's Solr library for PHP.
I have done most of my Solr work in Java, so I have used SolrJ quite a bit. This is a well-supported tool because it comes from Apache in parallel with the Solr product itself.
If you are doing work in any other languages, you are likely to find libraries available for them. The amount of time they save you may vary according to the quality of the library itself.
When I was using Solr in my project, only my application server (Tomcat) communicated with the Solr server. I wrote a class which executes GET requests to the Solr server based on input provided by the end user. When Solr returns XML/JSON to the application server, you can parse it and process it like any other business data (render an *.html page). So, summing up, the web browser never communicates directly with Solr; everything goes through an application server:
WebBrowser -> GET to application server -> GET to Solr server
show *.html <- parse XML/JSON, render *.html <- return XML/JSON
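The flow above can be sketched as follows: the application server builds a `/select` URL from the user's input, asks for JSON with `wt=json`, and renders the documents as HTML itself. This Python sketch assumes a hypothetical core named `documents` at `http://localhost:8983/solr` and a single-valued `title` field; in PHP or Java (SolrJ) the steps are the same.

```python
import html
import json
import urllib.parse

# Hypothetical core name; adjust to your Solr installation.
SOLR_SELECT = "http://localhost:8983/solr/documents/select"


def build_query_url(user_query: str, rows: int = 10) -> str:
    """Turn the end user's input into a Solr /select URL that returns JSON."""
    # q is passed as-is, so any Lucene query syntax in the input is interpreted by Solr.
    params = {"q": user_query, "wt": "json", "rows": rows}
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)


def render_results(solr_json: str) -> str:
    """Parse Solr's JSON and render the hits as an HTML list, escaping all text."""
    docs = json.loads(solr_json)["response"]["docs"]
    items = []
    for doc in docs:
        label = doc.get("title", doc.get("id", ""))  # assumes a single-valued title field
        items.append("<li>" + html.escape(str(label)) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"
```

In a real handler you would fetch the URL with `urllib.request.urlopen(...)` on the server and pass the decoded body to `render_results`; the browser only ever sees the HTML.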

Drupal site: better performance

I'm working on a Drupal community site and I want to ask about these things:
1 - How do I hide the Drupal information from my site?
2 - How do I make the Drupal site more secure?
3 - How do I make my site work as fast as possible when there are a lot of visitors and users on the site and a lot of interaction with the database at the same time?
4 - How do I configure Drupal to work under high server load, and how do I configure my server's hardware to handle high load?
Thank you.
1 - How to hide the Drupal information from my site?
What information? You can show/hide anything in your theme implementation.
2 - How to make the Drupal site more secure?
Stay up to date.
3 - How to make my site work as fast as possible when there are a lot of visitors and users on the site and a lot of interaction with the database at the same time?
4 - How to configure Drupal to work with high server load and how to configure my server's hardware to work with high load?
Start with the Pantheon project; use it and learn from it:
Pressflow (a performance tuned version of Drupal)
Varnish Reverse Proxy Cache for anonymous users
APC for OpCode Caching
Memcached for easing the load on the DB
Use as few modules as possible.
The first area to need help in a social setup (lots of logged-in users posting content) is likely going to be the DB, so learning how to use Memcached will go a long way toward helping you scale at the start.
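The "Varnish for anonymous users" point boils down to one rule: only serve a request from cache when it carries no Drupal session cookie. In practice this lives in Varnish's VCL; the Python sketch below just illustrates the decision, assuming Drupal 7's session cookie naming (`SESS<hash>` over HTTP, `SSESS<hash>` over HTTPS).

```python
from http.cookies import SimpleCookie

# Drupal 7 session cookie prefixes: SESS<hash> over HTTP, SSESS<hash> over HTTPS.
SESSION_PREFIXES = ("SESS", "SSESS")


def is_cacheable(method: str, cookie_header: str) -> bool:
    """Return True if a reverse proxy may answer this request from its cache."""
    if method not in ("GET", "HEAD"):
        return False  # never cache writes such as form submissions
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    # Any Drupal session cookie means a logged-in user: pass the request to Drupal.
    return not any(name.startswith(SESSION_PREFIXES) for name in cookies)
```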
For further reading on Drupal Performance you might want to read everything from 2bits:
http://2bits.com/contents/articles
1 - How to hide the Drupal information from my site?
Remove the credits block.
Use template files, so that the look and feel is different from default Drupal sites.
Optimise your JS and CSS, so that it is difficult to identify that it is from Drupal.
Remove the CHANGELOG.txt file from the root.
2 - How to make the Drupal site more secure?
Have the latest stable version of Drupal and keep all your modules up to date. (Regularly check for security patches if there are no updates.)
Install the Security Review module.
The theme is the weakest link in Drupal security. While theming, make sure that you follow all the Drupal standards. Remember to sanitize data and use Drupal functions wherever possible.
3 - How to make my site work as fast as possible when there are a lot of visitors and users on the site and a lot of interaction with the database at the same time?
Memcache: a high-performance, distributed memory object caching system. Eases the load on the DB.
Intelligent use of the cache API in your custom modules.
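The idea behind Memcache and the cache API is cache-aside: look the value up first and only run the expensive query on a miss, storing the result with an expiry. In Drupal 7 custom modules this is done with `cache_get()` and `cache_set()`; the Python sketch below shows only the pattern, with a hypothetical in-memory `TTLCache` standing in for Memcached and an injectable clock so expiry can be tested. It never evicts stale entries, which a real cache backend would.

```python
import time
from typing import Any, Callable, Dict, Tuple

_MISS = object()  # sentinel so that a cached None still counts as a hit


class TTLCache:
    """Hypothetical in-memory stand-in for Memcached: values expire after ttl seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}
        self._clock = clock

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is None or entry[0] <= self._clock():
            return _MISS  # absent or expired
        return entry[1]

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)


def cached(cache: TTLCache, key: str, ttl: float, compute: Callable[[], Any]) -> Any:
    """Cache-aside: return the cached value, or compute, store and return it."""
    value = cache.get(key)
    if value is _MISS:
        value = compute()  # e.g. the expensive database query
        cache.set(key, value, ttl)
    return value
```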
4 - How to configure Drupal to work with high server load and how to configure my server's hardware to work with high load?
CDN: Content Delivery Network; use this if you can afford it.
Pressflow: out-of-the-box performance for your Drupal site, from Four Kitchens.
Varnish: reverse proxy cache.
Try using Pantheon's hosting service at: http://getpantheon.com/
We are using it, and are very happy with it so far.
1: Don't bother.
2: Make sure you keep your Drupal installation (including third-party modules) up to date.
3 and 4: Caching is a good step to take. Drupal comes with some handy caching features built-in (in the Performance settings), and modules like CacheRouter and Boost take you a long way further.
1 - What information exactly do you want to hide?
3 - Use the Devel modules to see what's happening. There's a lot of tweaking involved here, especially if you're using Views.
4 - Cache modules such as Boost do a lot. Then there are things such as the web server: Nginx, for example, is generally faster than Apache, especially at serving static content (with PHP-FPM for dynamic content). You should also check out Memcached, APC or another PHP cache, and of course Varnish cache is pretty awesome.
I saw above you mentioning making Drupal work with more than one database. If you mean replication, I think that's directly covered in Pressflow's introduction here.
