Why does Sling/Jackrabbit use twice the disc space to store a file? - jackrabbit

Why does Sling/Jackrabbit use twice the disc space to store a file and what can I do to prevent it from doing this?
I'm working on a project where we're storing files and associated data in Sling. For operational reasons we would like to avoid excessive disc usage. We've hit a problem where any time we store a file in Sling, the amount of disc space used appears to be double the size of the file.
For instance, I have 47 megs of files. When uploaded to Sling I have 53 megs in the datastore; there are a lot of different files, so I can accept that amount of inflation. However, if we look at the total size of the Jackrabbit repository we see a different story ...
jackrabbit]# du -s
118688 .
jackrabbit]# du -s repository/datastore/
53296 repository/datastore/
jackrabbit]# du -s workspaces/default/blobs/
53464 workspaces/default/blobs/
Now the data associated with these files is quite small (<1 meg), so I don't see why the workspace is storing so much data when it's all supposed to be stored in the datastore. This problem gets worse when I upload a larger file; here I've added a 277 meg file to the above repository.
jackrabbit]# du -s
686508 .
jackrabbit]# du -s repository/datastore/
337152 repository/datastore/
jackrabbit]# du -s workspaces/default/blobs/
337320 workspaces/default/blobs/
I've uploaded sets of multi-gigabyte files and this behaviour seems to be consistent.
I'm using the default repository.xml file created by the Sling standalone 6 when it's run; this does appear to have the FileDataStore configured (see below). I was under the impression that with the FileDataStore configured Jackrabbit wouldn't store its files in blobs, but that is what appears to be happening here. Can anyone provide me with a reason why this behaviour is exhibited, or a way to disable it? It seems exceedingly strange that a system would essentially store a file twice.
<Repository>
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
<param name="path" value="${rep.home}/repository"/>
</FileSystem>
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore"/>
<Security appName="Jackrabbit">
<SecurityManager class="org.apache.jackrabbit.core.DefaultSecurityManager" workspaceName="security">
</SecurityManager>
<AccessManager class="org.apache.sling.jcr.jackrabbit.server.impl.security.PluggableDefaultAccessManager">
</AccessManager>
<LoginModule class="org.apache.sling.jcr.jackrabbit.server.impl.security.PluggableDefaultLoginModule">
<param name="anonymousId" value="anonymous"/>
<param name="adminId" value="admin"/>
</LoginModule>
</Security>
<Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default"/>
<Workspace name="${wsp.name}">
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
<param name="path" value="${wsp.home}"/>
</FileSystem>
<PersistenceManager class="org.apache.jackrabbit.core.persistence.db.DerbyPersistenceManager">
<param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
<param name="schemaObjectPrefix" value="${wsp.name}_"/>
<param name="shutdownOnClose" value="false"/>
</PersistenceManager>
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${wsp.home}/index"/>
<param name="supportHighlighting" value="true"/>
</SearchIndex>
</Workspace>
<Versioning rootPath="${rep.home}/version">
<FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
<param name="path" value="${rep.home}/version" />
</FileSystem>
<PersistenceManager class="org.apache.jackrabbit.core.persistence.db.DerbyPersistenceManager">
<param name="url" value="jdbc:derby:${rep.home}/version/db;create=true"/>
<param name="schemaObjectPrefix" value="version_"/>
<param name="shutdownOnClose" value="false"/>
</PersistenceManager>
</Versioning>
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
<param name="path" value="${rep.home}/repository/index"/>
<param name="supportHighlighting" value="true"/>
</SearchIndex>
</Repository>

Related

WildFly 10.1 Failed to create journal-datasource to store message

I'm using an old version of WildFly (10.1) with the ActiveMQ Artemis that comes with it, and I'm trying to switch the message store from the file system to a database (I know that the ActiveMQ Artemis default store is the better option).
So I'm using this configuration for ActiveMQ Artemis:
<subsystem xmlns="urn:jboss:domain:messaging-activemq:1.0">
<server name="default">
<security-setting name="#">
<role name="guest" send="true" consume="true" create-non-durable-queue="true" delete-non-durable-queue="true"/>
</security-setting>
<address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10"/>
<http-connector name="http-connector" socket-binding="http" endpoint="http-acceptor"/>
<http-connector name="http-connector-throughput" socket-binding="http" endpoint="http-acceptor-throughput">
<param name="batch-delay" value="50"/>
</http-connector>
<in-vm-connector name="in-vm" server-id="0"/>
<http-acceptor name="http-acceptor" http-listener="default"/>
<http-acceptor name="http-acceptor-throughput" http-listener="default">
<param name="batch-delay" value="50"/>
<param name="direct-deliver" value="false"/>
</http-acceptor>
<in-vm-acceptor name="in-vm" server-id="0"/>
<jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>
<jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>
<connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm"/>
<connection-factory name="RemoteConnectionFactory" connectors="http-connector" entries="java:jboss/exported/jms/RemoteConnectionFactory"/>
<pooled-connection-factory name="activemq-ra" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory" connectors="in-vm" transaction="xa"/>
</server>
</subsystem>
and then, using jboss-cli.bat (I'm on Windows), I run this command:
/subsystem=messaging-activemq/server=vmvanz:write-attribute(name=journal-datasource, value=ExampleDS)
And I get this error:
{
"outcome" => "failed",
"failure-description" => "WFLYCTL0201: Unknown attribute 'journal-datasource'",
"rolled-back" => true
}
I tried the same command in WildFly 14 and it works. I read the documentation for 10.1 and it shows the same command line.
Does anyone know whether this is a problem with this version, or know some other way to configure it?
WildFly 10.1 doesn't support the journal-datasource attribute. It ships with version 1.0 of the ActiveMQ messaging configuration schema. This is specified in your configuration here:
<subsystem xmlns="urn:jboss:domain:messaging-activemq:1.0">
The journal-datasource attribute wasn't available until version 2.0 of the ActiveMQ messaging configuration schema which first appeared in WildFly 11.
Also, the documentation you linked is for WildFly 13. This is visible in the URL:
https://docs.wildfly.org/13/Admin_Guide.html#Messaging_JDBC_Store_for_Messaging_Journal
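For comparison, on a WildFly version whose messaging subsystem uses the 2.0 schema, the setting would be persisted in the server configuration roughly as follows. This is a hedged sketch rather than a verified configuration: the journal element and its datasource attribute are my understanding of how the journal-datasource management attribute maps to XML, and ExampleDS is simply the datasource name used in the question. Check it against the schema shipped with your server before relying on it.

```xml
<!-- Sketch only: assumes a WildFly 11+ server using the 2.0 messaging schema. -->
<subsystem xmlns="urn:jboss:domain:messaging-activemq:2.0">
  <server name="default">
    <!-- Assumed XML form of the journal-datasource management attribute. -->
    <journal datasource="ExampleDS"/>
    <!-- Remaining server settings stay as they were. -->
  </server>
</subsystem>
```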

Sitecore Solr Index Configuration Set different root path

I need to add two different Sitecore paths, /sitecore/content/System/My Path and /sitecore/content/System/My Path 2, to a single Solr index.
What is the correct way to do this?
Should I just add a new crawler definition to the locations section with the new root value? Should I use a | separator in the root element? Or do I need to copy the whole index section with the new path in root?
<index id="my_index" type="Sitecore.ContentSearch.SolrProvider.SwitchOnRebuildSolrSearchIndex, Sitecore.ContentSearch.SolrProvider">
<param desc="name">$(id)</param>
<param desc="core">myindex</param>
<param desc="rebuildcore">myindex_swap</param>
<param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
<configuration ref="contentSearch/indexConfigurations/countryIndexConfiguration" />
<strategies hint="list:AddStrategy">
<strategy ref="contentSearch/indexUpdateStrategies/manual" />
</strategies>
<locations hint="list:AddCrawler">
<crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
<Database>web</Database>
<Root>/sitecore/content/System/My Path</Root>
</crawler>
</locations>
</index>
Just copy the crawler and give it a different tag name from the original one, e.g. crawler1. The tag name itself doesn't matter, so you can use more specific names, e.g. news or blogs. The important thing is that you cannot have two tags with the same name under the locations tag.
Sample config with 2 roots:
<locations hint="list:AddCrawler">
<news type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
<Database>web</Database>
<Root>/sitecore/content/System/news</Root>
</news>
<blogs type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
<Database>web</Database>
<Root>/sitecore/content/System/blogs</Root>
</blogs>
</locations>
We are using multiple tags with the same name (crawler) under locations and it's working fine. It supports multiple tags with the same name.

Getting org.apache.jackrabbit.core.state.ItemStateException while working with jackrabbit repository

<PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.PostgreSQLPersistenceManager">
<param name="driver" value="org.postgresql.Driver" />
<param name="url" value="jdbc:postgresql://192.168.1.200:5433/NEWDMS" />
<param name="user" value="postgres" />
<param name="password" value="eminence" />
<param name="schema" value="postgresql" />
<param name="schemaObjectPrefix" value="${wsp.name}_" />
<param name="externalBLOBs" value="false" />
</PersistenceManager>
I have created a transient repository and made changes to the repository.xml file.
While accessing the Jackrabbit repository I am getting the following exception: org.apache.jackrabbit.core.state.ItemStateException: failed to read bundle: deadbeef-face-babe-cafe-babecafebabe
java.lang.IllegalArgumentException: Invalid namespace index: 3158064
The "failed to read bundle: deadbeef-face-babe..." message is a symptom of a repository inconsistency. Start with these knowledge base articles: Fix the "jcr:system" node, Consistency Check, and Tar Data File Rotation. The third link highlights this configuration parameter of the persistence manager that might be of interest (although I see that you are not using the default TPM persistence manager, so it might not be relevant):
<param name="maxFileSize" value="512" />

GAE log on localhost to file

When I debug my GAE application on localhost, how can I save the log created with the Logger class to a file? I can see it in the console now (stderr), but I don't want to redirect the console to a file. I found some solutions for Python but can't make them work for Java. Can you help me, please?
Add an ApplicationAppender in your log.xml setting file:
<appender name="applicationAppender" class="org.apache.log4j.DailyRollingFileAppender">
<param name="encoding" value="UTF-8"/>
<param name="file" value="C:/logs/yourlogname.log"/>
<param name="DatePattern" value="'.'yyyy-MM-dd" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{HH:mm:ss} %-5p %l - %m%n" />
</layout>
</appender>
This is for log4j, but it is a standard logging paradigm. Also you need to have your logger definition use the appender you create.
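As a rough sketch of that last point, a log4j 1.x XML configuration attaches an appender to a logger through an appender-ref element. The fragment below assumes the appender name applicationAppender from the snippet above and attaches it to the root logger. The info level is an arbitrary choice for illustration, not something stated in the answer.

```xml
<!-- Sketch only: route the root logger through the file appender defined above. -->
<root>
  <!-- Assumed threshold; adjust to the level you want written to the file. -->
  <level value="info"/>
  <!-- ref must match the name attribute of the appender. -->
  <appender-ref ref="applicationAppender"/>
</root>
```

The root element belongs inside the same configuration file as the appender, after the appender definitions.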
In Linux and OSX, you can use tee to direct output to a file while still making it visible on standard out:
my_command | tee filename

How do I prevent SimpleSecurityManager being used in JackRabbit?

How do I stop Jackrabbit using SimpleSecurityManager?
I'm trying to call session.getUserManager() but I get a repository exception, as SimpleSecurityManager.getUserManager() explicitly throws it.
<Security appName="Jackrabbit">
<SecurityManager class="org.apache.jackrabbit.core.DefaultSecurityManager" workspaceName="security">
</SecurityManager>
<AccessManager class="org.apache.jackrabbit.core.security.simple.SimpleAccessManager">
<!-- <param name="config" value="${rep.home}/access.xml"/> -->
</AccessManager>
<LoginModule class="org.apache.jackrabbit.core.security.authentication.DefaultLoginModule">
<param name="anonymousId" value="anonymous"/>
<param name="adminId" value="admin"/>
</LoginModule>
</Security>
Rest of code for those that will ask...
Repository repository = new TransientRepository();
Session jackrabbitSession = repository.login(credentials);
UserManager userManager = session.getUserManager();
The user manager is a Jackrabbit extension. It's not a part of the JCR. So, you need to use a JackrabbitSession, not just a Session. Here's a link to the wiki:
http://wiki.apache.org/jackrabbit/UserManagement
