Database search with Lucene.net and how to update the index - sql-server

I have a SQL Server database with about 40 tables that need to be searched. I just started looking into Lucene for .NET. The tables that need to be searched don't have any column that identifies when a row was last updated or created, and we don't want to change the table structure right now. What are my options for identifying whether a row in a table has been modified, so that I can update the corresponding document in the Lucene index? The same goes for newly created rows. Any help is greatly appreciated.

If you can't tell what has changed by looking at the database, then just assume all of the rows have changed and update them all in Lucene. That handles your new rows as well.
If this is too slow or time consuming, then that gives you a reason why you should change your table structure to store the last updated date.
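For the reindex-everything route, a minimal sketch is below. It is written against the Java Lucene API purely for illustration (Lucene.Net mirrors it closely), and the index path, connection string, table, and column names are all placeholders. IndexWriter.updateDocument deletes any existing document matching the key term and then adds the new one, so a single pass covers both modified and newly created rows.

import java.nio.file.Paths;
import java.sql.*;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.store.FSDirectory;

public class FullReindex {
    public static void main(String[] args) throws Exception {
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("lucene-index")), cfg);
             Connection con = DriverManager.getConnection("jdbc:sqlserver://localhost;databaseName=mydb");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT Id, Title, Body FROM SomeTable")) {
            while (rs.next()) {
                Document doc = new Document();
                String id = rs.getString("Id");
                doc.add(new StringField("id", id, Field.Store.YES));    // key term, not analyzed
                doc.add(new TextField("title", rs.getString("Title"), Field.Store.YES));
                doc.add(new TextField("body", rs.getString("Body"), Field.Store.NO));
                // replaces the existing document with this id, or adds it if it is new
                writer.updateDocument(new Term("id", id), doc);
            }
            writer.commit();
        }
    }
}

Repeat the loop per table (or prefix the id with the table name) so that keys from different tables do not collide.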

Related

How do I automatically populate a new table in MS-Access with info from an existing table?

I am very new to MS Access and I am struggling with some things that seem like they should be the most basic. I have imported a table of data from Excel and have defined the data types for the fields. I have no problem there, but now I want to make a new table that has as a primary key one of the fields from the imported table. It looks like I can manually create this table, set the relationship, and then go back and type in each record associated with the new primary key, but this seems completely ridiculous. Surely there must be a way to automatically create one record for each unique instance in the matching field from the original table. Yet, I've scrolled through hundreds of pages of Access tutorials and Googled the question and found no satisfactory guidance.
Do I completely misunderstand what Access is all about? How do I create a new table with entries from a field on an existing table? What am I missing?
You don't specify which version of Access you are using; the suggestions listed below apply to 2010, but should be similar in other versions.
You can create new tables from existing tables either by using a 'Make Table' query after selecting 'Create' -> 'Query Design', or by manually creating your table first and then using an 'Append' query.
Without knowing the design of your table it's hard to be more specific.
Are you populating your new table's primary key ahead of time, or relying on Auto Number to do it (preferred method)?

How do SharePoint index columns work?

I have been trying to find out what happens in SharePoint when we select a column as an index column. I mean, what happens in the back end (database level)? I tried many articles, but they just say that it improves performance without explaining how. I found this question and others related to performance, but I couldn't get a complete picture.
SharePoint marks the column as index enabled, and then adds records with the item identifier and the indexed field's id and value to the SQL table NameValuePair_[loc].

Solr generate key

I'm working with Solr and indexing data from a DB.
When I import the data using a SQL query, I get some rows with the same key.
I need a way for Solr to generate a new field with a unique key.
How can I do that?
Thanks
I am not sure if this is possible or not, but maybe you need to reconsider your logic here...
Indexing into Solr should be re-runnable. So imagine that you come along one day and decide to change the schema of your core.
If you generate a new key every time you import a document, you will end up creating duplicate items when you re-run your data import.
Maybe you need to revisit your DB design to have a unique key, or maybe in the select query you can create a derived or calculated column value based on multiple columns. But I am sure that pushing this problem to Solr is not the solution.
Ideally the unique key should come from the DB (are you sure you cannot get one, by composing some columns, etc.?).
But if you cannot, Solr supports UUID generation for this; see the Solr documentation on UUID generation for how it works in your Solr version.
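To illustrate the composite-key suggestion, here is a rough SolrJ sketch; the core URL, connection string, table, and column names are invented for the example. The point is that the document id is built deterministically from the columns that are only unique in combination, so re-running the import overwrites existing documents instead of duplicating them.

import java.sql.*;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ImportWithCompositeKey {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
             Connection con = DriverManager.getConnection("jdbc:<your-db-url>");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT order_id, line_no, product FROM order_lines")) {
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                // build the unique key from the columns that are only unique together
                doc.addField("id", rs.getString("order_id") + "_" + rs.getString("line_no"));
                doc.addField("product", rs.getString("product"));
                solr.add(doc);
            }
            solr.commit();
        }
    }
}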

Updating solr index with deleted records

I was trying to figure out how to update the index for the deleted records. I'm indexing from the database. I search for documents in the database, put them in an array and index them by creating a SolrInputDocument.
So, I couldn't figure out how to update the index for the deleted records (because they don't exist in the database now).
I'm using the php-solr-pecl extension.
You need to handle the deletion of the documents separately from Solr.
Solr won't handle it for you.
For incremental indexing, you need to keep track of the documents deleted from the database and then fire a delete query for them to clean up the index.
For this you have to maintain a timestamp and a delete flag to identify those documents.
For a full rebuild, you can just clear the index and reindex everything.
However, in case of failure you may lose all the data.
Solr's DataImportHandler (DIH) provides a bit of handling for this.
Create a delete trigger on the database table which inserts the deleted record's id into another table (or add a boolean "deleted" field and mark the record instead of actually deleting it; considering the trade-offs, I would choose the trigger).
Once in a while, do a batch delete on the index based on that "deleted" table, also removing the processed ids from the table itself.
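As a rough sketch of that batch-delete step (the question uses the php-solr-pecl extension, but the flow is the same; this is shown with the SolrJ client, and the side table deleted_records, core URL, and connection string are placeholders):

import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class PurgeDeletedFromSolr {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection("jdbc:<your-db-url>");
             HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT record_id FROM deleted_records")) {
            List<String> ids = new ArrayList<>();
            while (rs.next()) {
                ids.add(rs.getString("record_id"));
            }
            if (!ids.isEmpty()) {
                solr.deleteById(ids);    // remove the deleted rows' documents from the index
                solr.commit();
                // clear the side table only after Solr has the deletions
                st.executeUpdate("DELETE FROM deleted_records");
            }
        }
    }
}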
We faced the same issue and came up with a batch deletion approach.
We created a program that deletes documents from Solr based on the unique id: if a unique id is present in Solr but not in the database, that document can be deleted from Solr.
(Get the unique id list from Solr) minus (unique id list from the database)
You can just use SQL MINUS to get the list of unique ids belonging to the documents that need to be deleted.
Alternatively, you can do everything on the Java side: get the list from the database, get the list from Solr, compare the two lists, and delete based on that. This would be a lot faster for a huge number of documents. You can use binary search to do the comparison.
Something like
Collections.binarySearch(DatabaseUniqueidArray, "SOLRuniqueid");
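Spelled out a little more (same assumptions as the sketch above; the uniqueid column, id field, core URL, and query are placeholders), the compare-and-delete pass could look like this:

import java.sql.*;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;

public class DeleteOrphanedDocuments {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
             Connection con = DriverManager.getConnection("jdbc:<your-db-url>");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT uniqueid FROM documents")) {
            // unique ids currently present in the database
            Set<String> dbIds = new HashSet<>();
            while (rs.next()) {
                dbIds.add(rs.getString("uniqueid"));
            }
            // unique ids currently present in Solr; for very large indexes,
            // page through the results (e.g. with cursorMark) instead of one request
            SolrQuery q = new SolrQuery("*:*");
            q.setFields("id");
            q.setRows(1000000);
            List<String> toDelete = new ArrayList<>();
            for (SolrDocument d : solr.query(q).getResults()) {
                String id = (String) d.getFieldValue("id");
                if (!dbIds.contains(id)) {      // in Solr but not in the database
                    toDelete.add(id);
                }
            }
            if (!toDelete.isEmpty()) {
                solr.deleteById(toDelete);
                solr.commit();
            }
        }
    }
}

A hash set lookup is used here instead of the sorted-list binary search mentioned above; either way of comparing the two id lists works.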

Change tracking -- simplest scenario

I am coding in ASP.NET C# 4. The database is SQL Server 2012.
I have a table that has 2000 rows and 10 columns. I want to load this table in memory and if the table is updated/inserted in any way, I want to refresh the in-memory copy from the DB.
I looked into SQL Server Change Tracking, and while it does what I need, it appears I have to write quite a bit of code to select from the change functions -- more coding than I want to do for a simple scenario that I have.
What is the best (simplest) solution for this problem? Do I go with CacheDependency?
I currently have a similar problem: I'm implementing a REST service that returns a table with 50+ columns, and I want to cache the data on the client to reduce traffic.
I'm thinking about this implementation:
All my tables have the fields
ID AutoIncrement (primary key)
Version RowVersion (a numeric value that will be incremented every time the record is updated)
To calculate a "fingerprint" of the table I use the select
select count(*), max(id), sum(version) from ...
Deleting records changes the first value, inserting changes the second, and updating changes the third.
So if one of the three values changes, I have to reload the table.
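A minimal sketch of that fingerprint check (the question is about ASP.NET/C#, but the query and comparison are the same; shown here with JDBC, and the table name is a placeholder):

import java.sql.*;
import java.util.Objects;

public class TableFingerprint {
    private String lastFingerprint;

    // returns true when the in-memory copy should be reloaded from the DB
    public boolean hasChanged(Connection con) throws SQLException {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT COUNT(*), MAX(id), SUM(version) FROM mytable")) {
            rs.next();
            // deletes change the count, inserts change max(id), updates change sum(version)
            String fingerprint = rs.getLong(1) + "/" + rs.getLong(2) + "/" + rs.getLong(3);
            boolean changed = !Objects.equals(fingerprint, lastFingerprint);
            lastFingerprint = fingerprint;   // the first call reports a change, triggering the initial load
            return changed;
        }
    }
}

Poll this on a timer (or before serving cached data) and reload the table whenever it returns true.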
