Updating SOLR schema using version numbers

I am designing a SOLR schema for my project, and will create the fields using the Schema-API.
I will likely need to add new fields to the schema in the future.
With SQL databases, I usually store a schema version number in a well-known table. Then when my app starts up, it checks to make sure the database schema is current. If not, I execute all of the needed updates (which are numbered) to bring it up to date.
How can I achieve this with SOLR using the schema-api? Specifically, how/where could I store and retrieve a version number?

My current workaround/solution is to store the version in the name of a SOLR field, for example by creating a field called "schema_version_2". When my app starts up, I retrieve the list of fields using the schema-api and iterate over them, looking for a field called "schema_version_XX".
Then I can determine whether I need to apply any upgrades to the SOLR schema for my app. If necessary, my app upgrades to the latest schema version (typically adding/modifying fields). At the end, I increment the version, for example by deleting the "schema_version_2" field and creating a new field called "schema_version_3".
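For illustration, here is a minimal sketch of that workaround using SolrJ's Schema API support (a SolrJ 7/8-era client; the core URL, the target version number, and the actual upgrade steps are placeholders I made up):

import java.util.Map;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class SchemaVersionCheck {

    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            int current = readVersion(solr); // 0 if no schema_version_* field exists yet
            int target = 3;                  // latest version this build of the app expects

            if (current < target) {
                // apply the numbered upgrade steps here, e.g. further SchemaRequest.AddField calls

                // bump the marker field: delete the old one, create the new one
                if (current > 0) {
                    new SchemaRequest.DeleteField("schema_version_" + current).process(solr);
                }
                new SchemaRequest.AddField(
                        Map.<String, Object>of("name", "schema_version_" + target,
                                               "type", "string", "stored", false)
                ).process(solr);
            }
        }
    }

    // Scan the field list for a field named schema_version_<N> and return N.
    static int readVersion(SolrClient solr) throws Exception {
        SchemaResponse.FieldsResponse fields = new SchemaRequest.Fields().process(solr);
        for (Map<String, Object> field : fields.getFields()) {
            String name = (String) field.get("name");
            if (name != null && name.startsWith("schema_version_")) {
                return Integer.parseInt(name.substring("schema_version_".length()));
            }
        }
        return 0;
    }
}

The same calls can be made over plain HTTP if you prefer: GET /solr/mycore/schema/fields to list the fields, and POST /solr/mycore/schema with add-field / delete-field commands to change them.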
I would still like to know what pattern and solution developers with more SOLR experience use to solve this problem.

Related

Does Vespa support dynamic fields?

I am looking for something like Solr's dynamic fields in Vespa. I need to add new fields to the existing schema without redeploying the whole application.
The only thing related to this that I found in the Vespa documentation is http://docs.vespa.ai/documentation/search-definitions.html#modify-search-definitions, where it is mentioned that we can add a new field and then run vespa-deploy prepare and vespa-deploy activate. But isn't that a resubmission of the whole application? What is the option with the least overhead?
http://docs.vespa.ai/documentation/search-definitions.html#modify-search-definitions is the right document, yes
Altering the search definition is safe, as the deploy prepare step will output the actions needed (e.g. no action, restart, or re-feed). Most changes require neither a restart nor a re-feed, as Vespa is built to be production friendly, and adding a field is easy since it requires no such action.
Note that there is no support for default values for fields.
As the Vespa configuration is declarative, the full application package is submitted, but the config servers will calculate changes and deploy the delta to the nodes. This makes it easy to keep the application package config in a code repo like git - what you see in the repo is what is deployed.
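To make the first answer concrete, the change described amounts to something like this (the field name and the application-package path are made up for the example):

# added inside the document block of the application's search definition (.sd) file
field album type string {
    indexing: summary | index
}

# then redeploy; the prepare step reports whether any restart or re-feed is needed
vespa-deploy prepare /path/to/application-package
vespa-deploy activate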
It depends on what you mean by "dynamic".
If the fields number in the hundreds, are controlled by the application owners, and change daily, then changing the schema and redeploying works perfectly fine: fields can be updated individually (even when indexed), they incur no overhead if not used, and deploying an application with new fields is cheap and does not require any restarts or anything like that.
If you need tens of thousands of fields, or the fields are added and removed by users, then Vespa does not have a solution for you out of the box. You'd need to mangle the fields into the same index by adding e.g. a "myfield_" prefix to each token. This is what engines that support this do internally to make it efficient.

Get information from publishItem to publish end in Sitecore

I have a setup with Sitecore and Solr.
I'm looking to gather information (the different template IDs) during publishItem, and then, when the publish has ended, call Solr with the names that need to be reindexed.
I've managed to get all the template IDs both using a PublishItemProcessor and via the publish:itemProcessed event, where I store the template IDs in PublishContext.CustomData as a HashSet.
But how can I, once publishing is done, get this information I've gathered during publishing? I want to call Solr once, and only once, after everything is published, with the information gathered during the publish.
Hope this makes sense, please help out.
You don't need a hack to reindex indexes after publishing.
Sitecore has this functionality out of the box.
You use index update strategies to maintain indexes. You can configure each index with a unique set of index update strategies. You should not specify more than three update strategies per index for performance reasons.
Sitecore provides a varied set of index update strategies, and you can extend this set with more strategies.
All the strategies that are delivered with Sitecore are defined under the following node in the Sitecore.ContentSearch.Solr.Index.IndexName configuration files:
<configuration ref="contentSearch/indexConfigurations/defaultSolrIndexConfiguration" />
<strategies hint="list:AddStrategy">
  <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
</strategies>
You need to use one of these default strategies:
RebuildAfterFullPublish
OnPublishEndAsync
You can find more information about search, indexing, and crawling here:
https://doc.sitecore.net/sitecore_experience_platform/setting_up__maintaining/search_and_indexing

Solr search engine: updating documents

I'm using the Solr search engine and I'm new to it. I want the data to update automatically every time my database gets updated or new data is created in the tables. I tried delta import and full import, but with these methods I have to trigger the update manually whenever I need it.
Which way is best for updating Solr documents?
How can I make it happen automatically?
Thanks for your help.
There isn't a built in way to do this using Solr. I wouldn't recommend running a full or delta import when just updating one row in a table. What most Solr deployments do with a database is update the corresponding document when updating a row. This will be application specific, but this is the most efficient and standard way of dealing with this issue.
Using full or delta imports would be something that would run nightly or every few hours typically.
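As a rough illustration of that per-row approach, a SolrJ sketch might look like this (the core name, field names, and the surrounding persistence code are assumptions, not a fixed recipe):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ProductIndexer {

    private final SolrClient solr =
            new HttpSolrClient.Builder("http://localhost:8983/solr/products").build();

    // Call this from the same code path that writes the row to the database.
    public void onRowSaved(String id, String name, double price) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);        // same unique key as the database row
        doc.addField("name", name);
        doc.addField("price", price);
        solr.add(doc);                 // adds or replaces the document with this id
        solr.commit();                 // or rely on autoSoftCommit instead of committing per update
    }

    public void onRowDeleted(String id) throws Exception {
        solr.deleteById(id);
        solr.commit();
    }
}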
So basically, you want to process a document before it is added to Solr.
This can be achieved by adding a new update processor to the update processor chain. You can go through: Solr split joined dates to multivalue field.
There they split the data in one field and saved it as a multivalued field.
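If you do go the custom update processor route, the shape of the code is roughly the following (the class and field names are hypothetical, and the factory still has to be registered in an updateRequestProcessorChain in solrconfig.xml):

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class SplitJoinedDatesProcessorFactory extends UpdateRequestProcessorFactory {

    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
        return new SplitJoinedDatesProcessor(next);
    }

    static class SplitJoinedDatesProcessor extends UpdateRequestProcessor {

        SplitJoinedDatesProcessor(UpdateRequestProcessor next) {
            super(next);
        }

        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
            SolrInputDocument doc = cmd.getSolrInputDocument();
            Object joined = doc.getFieldValue("dates_joined");   // hypothetical incoming field
            if (joined != null) {
                for (String part : joined.toString().split(",")) {
                    doc.addField("dates", part.trim());          // hypothetical multivalued field
                }
                doc.removeField("dates_joined");
            }
            super.processAdd(cmd);                               // hand the document on down the chain
        }
    }
}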

Create tables in Hibernate automatically or manually?

I'm currently developing a servlet homepage (Spring + Hibernate + MySQL).
At the moment I'm using the Hibernate property hibernate.hbm2ddl.auto set to update.
This is working fine and Hibernate creates and updates my tables.
However, I've read in multiple places that this is not recommended in production and that it is unsafe.
But if I don't set this option my tables are not created, and I really don't want to create my tables manually on the server. I have limited time and I'm working on this alone.
How is this usually done? It seems like quite a lot of work to add all the tables manually, imo.
In production, you typically have already existing tables with a large amount of data that you don't want to lose, and that you want to migrate to the new schema. Hibernate can't do that automagically for you. It doesn't know that the data that was previously in column A must now be in the new column B.
So you'll need to create a migration script. Of course, you can use Hibernate to generate the new schema for you in development, see what the differences from the old schema are, and build your script from that. But yes, having an app in production and migrating it takes some work.
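One way to let Hibernate generate the DDL in development without touching the database is SchemaExport; this is a sketch assuming the Hibernate 3/4-era API and a hibernate.cfg.xml that lists the mapped classes:

import org.hibernate.cfg.Configuration;
import org.hibernate.tool.hbm2ddl.SchemaExport;

public class GenerateSchema {
    public static void main(String[] args) {
        Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
        SchemaExport export = new SchemaExport(cfg);
        export.setOutputFile("target/create-schema.sql");
        export.setDelimiter(";");
        export.setFormat(true);
        // script = true  -> write the statements to the output file
        // export = false -> do NOT execute them against the database
        export.create(true, false);
    }
}

Comparing that file against the previous release's schema gives you the starting point for a hand-written migration script, while production keeps hibernate.hbm2ddl.auto at validate (or unset).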

Best strategy to initially populate a Grails database backend

I'd like to know your approach/experiences when it's time to initially populate the Grails DB that will hold your app data. Assuming you have CSVs with data, is it "safer" to create a script (with whatever tool fits you) that:
1. Generates the BootStrap commands with the domain classes, runs them in a test or dev environment, and then uses the native DB commands to export the data to prod?
2. Creates the DB insert script directly, assuming GORM's version = 0 and manually incrementing the soon-to-be auto-generated IDs?
My fear is that the second approach may lead to inconsistencies, since Hibernate is responsible for ID generation, and there may be something else I'm missing.
Thanks in advance.
Take a look at this link. It allows you to run Groovy scripts in the normal Grails context, giving you access to all Grails features, including GORM. I'm currently importing data from a legacy database and have found that writing a Groovy script that uses the Groovy SQL interface to pull out the data and then puts that data into domain objects appears to be the easiest approach. Once you have the data imported, you just use the commands specific to your database system to move that data to the production database.
Update:
Apparently the updated entry referenced from the blog entry I link to no longer exists. I was able to get this working using code at the following link which is also referenced in the comments.
http://pastie.org/180868
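For what it's worth, the general shape of such an import script is only a few lines of Groovy (the connection details, the legacy table, and the Book domain class are invented for the example):

import groovy.sql.Sql

// connect straight to the legacy database with the plain Groovy SQL API
def sql = Sql.newInstance("jdbc:mysql://localhost/legacy", "user", "pass", "com.mysql.jdbc.Driver")

// pull rows out and push them through GORM so ids, versions and constraints are handled for you
sql.eachRow("select title, author from old_books") { row ->
    new Book(title: row.title, author: row.author).save(failOnError: true)
}
sql.close()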
Finally, it seems that the simplest solution is to consider that GORM, as of the current release (1.2), uses a single sequence for all auto-generated IDs. Taking this into account when creating whatever scripts you need (in the language of your preference) should suffice. I understand it's planned for the 1.3 release that every table will have its own sequence.
