Add comments to solr document - solr

What would be the most native way to add comments to documents in Solr? I would like to add comments with a user_id, a datetime and the actual comment text.

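Depending on your needs, if you want to query on the comments (that is, maintain the 1 doc -> N comments relationship in a more DB-like way), you might want to use block join (nested child documents) as well. Be aware of its limitations, though: the parent and its children are indexed as a single block, so adding or changing one comment means reindexing the whole block.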
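A minimal sketch of the block join approach, assuming hypothetical field names (doc_type, title, comment_user_id, comment_date, comment_text) and a schema that defines the `_root_` field, as Solr's default configsets do. It builds a JSON update body that nests each comment as an anonymous child document via the `_childDocuments_` key, plus a `{!parent}` query that returns parents having at least one matching comment. Nothing is sent to Solr here.

```python
import json
from datetime import datetime, timezone


def build_nested_update(doc_id, title, comments):
    """Return a Solr JSON update body with each comment as a child document.

    All field names are hypothetical; adapt them to your schema.
    """
    children = []
    for i, c in enumerate(comments):
        children.append({
            "id": f"{doc_id}-comment-{i}",  # child documents need their own unique ids
            "doc_type": "comment",
            "comment_user_id": c["user_id"],
            # Solr date fields expect ISO 8601 in UTC, e.g. 2024-01-02T03:04:05Z
            "comment_date": c["date"].astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "comment_text": c["text"],
        })
    parent = {
        "id": doc_id,
        "doc_type": "parent",
        "title": title,
        "_childDocuments_": children,  # anonymous children in the JSON update format
    }
    return json.dumps([parent])


def parents_with_matching_comment(term):
    """Block join parent query: parents with at least one comment matching term.

    The which= filter must match every parent document and no child document.
    """
    return f'{{!parent which="doc_type:parent"}}comment_text:{term}'


if __name__ == "__main__":
    body = build_nested_update("doc1", "Some title", [
        {"user_id": "u42",
         "date": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
         "text": "great"},
    ])
    print(body)
    print(parents_with_matching_comment("great"))
```

Newer Solr versions also accept child documents under named fields; the anonymous `_childDocuments_` form is used here only because it is the simplest to show.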

If you just want to load the comments together with the document, serialize the array of comment objects (for example to a JSON string) and store it in an additional field.
If you also want to search the comments, you have to split the comment fields up and store the comment text in a searchable multiValued field.
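The two options above can be sketched together. This example assumes comments arrive as dicts with dates already formatted as ISO strings, and it uses hypothetical field names: comments_json (meant to be stored but not indexed) and the multiValued fields comment_text and comment_user_id (meant to be indexed).

```python
import json


def add_comment_fields(doc, comments):
    """Return a copy of doc with the comments attached in two forms.

    comments_json   : the whole list as one JSON string, for display only
    comment_text    : multiValued and indexed, so comment text is searchable
    comment_user_id : multiValued, in the same order as comment_text
    All field names are hypothetical.
    """
    out = dict(doc)  # do not mutate the caller's dict
    out["comments_json"] = json.dumps(comments)
    out["comment_text"] = [c["text"] for c in comments]
    out["comment_user_id"] = [c["user_id"] for c in comments]
    return out


if __name__ == "__main__":
    print(add_comment_fields({"id": "1"}, [
        {"user_id": "u1", "date": "2024-01-01T00:00:00Z", "text": "hi"},
    ]))
```

One caveat with flattening: a query such as `comment_user_id:u1 AND comment_text:foo` matches a document if any comment is by u1 and any comment contains foo, not necessarily the same comment. If you need per-comment conditions, block join is the better fit.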

Related

Solr schema design and performance

I have a books database with three entities: books, pages and titles (titles found on a page). I am confused about, and concerned with, the performance of two approaches to the schema design:
1- Treat books as documents, i.e. a book field, a multiValued pages field and a multiValued titles field. In this approach all of a book's data is represented in one Solr document with very large fields.
2- Treat pages as documents, which leads to much smaller fields but a larger number of documents.
I looked at this official resource but could not find a clear answer to my question.
Assuming you are going to take Solr results and present them through another application, I would make the smallest item, Titles, the model for documents, which makes it much easier to show where a result appears. Doing it this way minimizes the amount of application code you need to write. If your users query Solr directly, I might use Page as my document instead; presumably you would then use Solr's highlighting feature to help your users see how their search term(s) matched.
For Title documents I would model the schema as follows:
Book ID + Page Number + Title [string - unique key]
Book ID [integer]
Book Name [tokenized text field]
Page Number [TrieIntField; on Solr 7 and later use IntPointField, since Trie fields are deprecated]
Title [tokenized text field]
Content for that book/title/page combination [tokenized text field]
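Here is a sketch of how one book might be flattened into Title documents that follow the field list above. The input shape (a list of `(page_number, [(title, content), ...])`) and the snake_case field names are assumptions. The unique key combines Book ID, Page Number and Title as described, which presumes a title never appears twice on the same page. If it can, use the title's position on the page in the key instead.

```python
def title_documents(book_id, book_name, pages):
    """Flatten one book into one Solr document per (page, title).

    pages is a hypothetical input shape: [(page_number, [(title, content), ...]), ...]
    """
    docs = []
    for page_number, titles in pages:
        for title, content in titles:
            docs.append({
                # composite unique key: Book ID + Page Number + Title
                "id": f"{book_id}-{page_number}-{title}",
                "book_id": book_id,
                "book_name": book_name,
                "page_number": page_number,
                "title": title,
                "content": content,
            })
    return docs


if __name__ == "__main__":
    for doc in title_documents(7, "Dune", [(1, [("Intro", "text of the intro")])]):
        print(doc)
```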
There may be other attributes you want to capture, such as author, publication date or publisher, but you do not say what other information you have, so I have left them out of this example.
Textual queries can then involve Book Name, Title and Content. You may want to define a single field that is indexed but not stored and serves as the target of <copyField/> declarations in your schema.xml, so you can easily search all three at the same time.
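If the core uses a managed schema, the catch-all field can also be created through Solr's Schema API instead of editing schema.xml by hand. This sketch only builds the request body you would POST to `/solr/<core>/schema`. The field name `text_all` is hypothetical, and `text_general` is assumed to exist, as it does in Solr's default configset. The field must be multiValued because it receives values from several source fields, and the add-field command comes first so the destination exists before the copy rules reference it.

```python
import json


def catch_all_commands(sources, dest="text_all"):
    """Build a Schema API body adding a catch-all field plus copyField rules.

    dest and the text_general type are assumptions; adjust to your schema.
    """
    return {
        "add-field": {
            "name": dest,
            "type": "text_general",
            "indexed": True,
            "stored": False,      # searched, never returned
            "multiValued": True,  # receives values from several source fields
        },
        # repeated commands of the same kind can be sent as an array
        "add-copy-field": [{"source": s, "dest": dest} for s in sources],
    }


if __name__ == "__main__":
    print(json.dumps(catch_all_commands(["book_name", "title", "content"]), indent=2))
```

Queries can then target the single field, for example `q=text_all:dune`, or set it as the default field with `df=text_all`.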
For analysis, without knowing more about the data being indexed, I would use the ICU Tokenizer and the Snowball Porter Stemming Filter, with a language specified on the text fields, to handle non-English data, assuming all the books are in the same language. If the books are in English, use the Standard Tokenizer instead of the ICU one.
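As a hedged sketch, the analyzer described above could be defined through the Schema API's add-field-type command. The type name and language are parameters you choose, and the language value must be one that SnowballPorterFilterFactory supports (for example "English" or "German"). A lowercase filter is added before stemming because the Snowball stemmers expect lowercase input. Note that `solr.ICUTokenizerFactory` is only available when Solr's analysis-extras module is on the classpath.

```python
import json


def stemmed_text_type(name, language, use_icu):
    """Schema API body defining a stemmed text field type.

    use_icu=True selects the ICU tokenizer (needs the analysis-extras module);
    otherwise the Standard tokenizer is used, e.g. for English text.
    """
    tokenizer = "solr.ICUTokenizerFactory" if use_icu else "solr.StandardTokenizerFactory"
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "analyzer": {
                "tokenizer": {"class": tokenizer},
                "filters": [
                    {"class": "solr.LowerCaseFilterFactory"},
                    {"class": "solr.SnowballPorterFilterFactory", "language": language},
                ],
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(stemmed_text_type("text_de", "German", use_icu=True), indent=2))
```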

Can Solr use field values of a known document in a query?

I would like to perform a Solr search using the values of certain fields of an indexed document, which I can identify by its id. With MLT (MoreLikeThis) this is possible to some extent, but I would prefer a regular query parser. Can I somehow use a subquery to inject its result into the main query?
For example, say I have indexed information about books into Solr, where each document represents a book with an id, title and author field. At query time I only have the document id available, and I would like to search for books by the same author in a single step. Is this possible without using MLT?
You can use the join query parser:
http://HOST:PORT/CORE/select?q={!join from=author to=author}id:<ID>
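When the join query is built from code, the local-params syntax has to be URL-encoded. This sketch assumes a base URL such as `http://localhost:8983/solr` and fields named `id` and `author`. Because the join also matches the source document itself, an optional filter query excludes it.

```python
from urllib.parse import urlencode


def same_author_url(base_url, core, doc_id, exclude_self=True):
    """Build a select URL that finds books by the same author as doc_id.

    base_url, core and the id/author field names are assumptions.
    """
    params = {"q": f"{{!join from=author to=author}}id:{doc_id}"}
    if exclude_self:
        # the join also returns the source document; a negative fq drops it
        params["fq"] = f"-id:{doc_id}"
    # urlencode percent-encodes {, !, =, } and : so the local params survive
    return f"{base_url}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(same_author_url("http://localhost:8983/solr", "books", "123"))
```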

solr - complex data structure

I have the following data structure for creating an index:
user
  userid
  username
  userstatus
  friends
    friendid
    friendstatus
    friendcreateddate
I think dynamic fields won't work for me, since I need to query on specific field names.
I need to search on friendstatus and friendcreateddate. Can someone advise me on the best document structure?
That is a very simple data structure. You just need to look at an example schema.xml and put your own field definitions in there. A field like "friends" would be declared with multiValued="true", and userid would be declared as the <uniqueKey>.
Follow this guide: http://wiki.apache.org/solr/SchemaXml
Ignore complicated features such as dynamic fields, which you probably don't need.
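Here is a sketch of the multiValued approach, assuming a hypothetical input dict that mirrors the structure in the question. Each friend attribute becomes its own multiValued field, and the values are kept in the same order so that position i describes the same friend. Dates are converted to the ISO 8601 UTC form that Solr date fields expect.

```python
from datetime import datetime, timezone


def user_document(user):
    """Flatten a user and their friends into one Solr document.

    Input shape and field names are hypothetical.
    """
    friends = user.get("friends", [])
    return {
        "userid": user["userid"],  # declared as <uniqueKey> in schema.xml
        "username": user["username"],
        "userstatus": user["userstatus"],
        # one multiValued field per friend attribute, all in the same order
        "friendid": [f["friendid"] for f in friends],
        "friendstatus": [f["friendstatus"] for f in friends],
        "friendcreateddate": [
            f["friendcreateddate"].astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            for f in friends
        ],
    }


if __name__ == "__main__":
    print(user_document({
        "userid": "u1", "username": "ann", "userstatus": "active",
        "friends": [{"friendid": "u2", "friendstatus": "accepted",
                     "friendcreateddate": datetime(2023, 5, 6, 7, 8, 9, tzinfo=timezone.utc)}],
    }))
```

Be aware that Solr does not correlate values across separate multiValued fields. For example, `friendstatus:accepted AND friendcreateddate:[2024-01-01T00:00:00Z TO *]` matches a user if some friend is accepted and some friend, possibly a different one, was added after that date. If your searches need both conditions on the same friend, index one document per friendship or use nested child documents.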

Solr - How to index on multiple entities?

I have two tables, contacts and inventory, which are not related. I want to index both tables and search them using Solr.
Is this possible?
If one part of your application needs to search contacts and another needs to search the inventory, create two separate indices. Storing wildly different data in the same index is almost never a good idea; it complicates things unnecessarily. As the Solr wiki wisely says:
The more heterogeneous (different kinds of data) you have in one field or in one index, the less useful it is.
You don't need multiple Solr instances to accommodate multiple indices; you can easily manage this with multiple cores (multi-core) in a single instance.
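For a standalone (non-SolrCloud) Solr, here is a sketch of how the two cores might be created with the CoreAdmin API and then searched separately. The base URL and core names are assumptions, and `_default` is the configset bundled with Solr 7 and later. In SolrCloud you would create collections through the Collections API instead.

```python
from urllib.parse import urlencode


def create_core_url(base_url, name, config_set="_default"):
    """CoreAdmin CREATE request for one core (standalone Solr)."""
    params = {"action": "CREATE", "name": name, "configSet": config_set}
    return f"{base_url}/admin/cores?{urlencode(params)}"


def select_url(base_url, core, query):
    """Each core is searched through its own /select handler."""
    return f"{base_url}/{core}/select?{urlencode({'q': query})}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"
    for core in ("contacts", "inventory"):
        print(create_core_url(base, core))
    print(select_url(base, "inventory", "sku:ABC"))
```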
I found a very helpful answer to this question here, including some guidance on using "multiple indexes" vs. "multiple document types in one index". The post also links to example code on GitHub that I found very useful.
Yes, you can do that. Create a Solr schema that contains all the fields needed for both tables, plus another field that holds the table name. During indexing, set the table name field on every document you index. When searching, always include a filter on the table name field.
As an alternative, you can set up multiple Solr instances, but you should only do this if you are dealing with massive amounts of data (millions of table rows, for example).
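The single-index approach from the last answer can be sketched as follows. The `table_name` field, the `source_id` field and the `<table>-<id>` unique key are hypothetical choices. Ids from two unrelated tables can collide, so the key is prefixed with the table name. The restriction is expressed as a filter query (`fq`), which Solr caches separately and which does not affect scoring.

```python
def tag_rows(table, rows, key="id"):
    """Turn database rows into Solr documents labelled with their source table."""
    docs = []
    for row in rows:
        doc = dict(row)
        doc["source_id"] = row[key]          # keep the original primary key
        doc["id"] = f"{table}-{row[key]}"    # unique across both tables
        doc["table_name"] = table
        docs.append(doc)
    return docs


def search_params(table, query):
    """Query parameters restricting a search to one table."""
    return {"q": query, "fq": f"table_name:{table}"}


if __name__ == "__main__":
    print(tag_rows("contacts", [{"id": 1, "name": "Ann"}]))
    print(search_params("inventory", "widget"))
```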

Should the descriptive tags associated with an entity be stored in a separate database table?

I have a Questions model, and just like Stack Overflow, each question can be tagged with multiple descriptive tags by a user.
What I'm trying to decide is whether it's necessary for the Tags associated with a question to be stored in a separate table in the database.
Or could I store the Tags as a single field of the Questions table as a list of space-separated strings?
I'm not sure which makes more sense - is there any good reason to separate the data?
Using a comma-separated string for a multi-valued attribute is another SQL Antipattern. :-)
How long does the string need to be? Stated another way: how many tags can a given entry have? (It depends on how long the individual tags are.)
How do you account for strings that contain the separator character? What if a character you currently use as a separator becomes a legitimate character in a tag?
How do you insert or delete elements from the list in SQL? (You have to fetch the whole list into the application, explode the list, filter through it, and re-post it to the database.)
How can you do aggregates like COUNT(*) in SQL?
How do you search efficiently for all entries that share a given tag? (You have to use costly pattern-matching queries.)
The solution is to use a separate table, as most other folks on this thread are advising.
Separating tags into their own table, plus a further table holding the many:many relationship between Tags and Questions, is what's known in relational land as "normal form". It makes it easier and faster to perform tasks such as getting all questions tagged with a certain tag, finding the most popular tags, etc.
(Just in case you don't know: a "many:many relationship" table has just two columns, a foreign key into Tags and one into Questions, with no uniqueness constraint on either column alone. The pair of columns is usually made the primary key, so a question cannot carry the same tag twice.)
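Here is a minimal sketch of the three-table design using Python's built-in sqlite3 module. The table names questions, tags and question_tags are hypothetical. The composite primary key on the link table prevents the same tag from being attached twice, and the two query helpers cover the tasks mentioned above: finding all questions with a given tag and finding the most popular tags. Neither needs string pattern matching.

```python
import sqlite3


def make_db():
    """Create an in-memory database with questions, tags and a link table."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE questions (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
        CREATE TABLE question_tags (
            question_id INTEGER NOT NULL REFERENCES questions(id),
            tag_id      INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (question_id, tag_id)  -- the pair is unique
        );
    """)
    return con


def tag_question(con, question_id, tag_name):
    """Attach a tag, creating it if needed; re-tagging is silently ignored."""
    con.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag_name,))
    (tag_id,) = con.execute("SELECT id FROM tags WHERE name = ?", (tag_name,)).fetchone()
    con.execute("INSERT OR IGNORE INTO question_tags VALUES (?, ?)", (question_id, tag_id))


def questions_tagged(con, tag_name):
    """Ids of all questions carrying the given tag."""
    rows = con.execute("""
        SELECT q.id FROM questions q
        JOIN question_tags qt ON qt.question_id = q.id
        JOIN tags t ON t.id = qt.tag_id
        WHERE t.name = ? ORDER BY q.id
    """, (tag_name,))
    return [r[0] for r in rows]


def most_popular_tags(con, limit=10):
    """(tag name, usage count) pairs, most used first."""
    return con.execute("""
        SELECT t.name, COUNT(*) AS n FROM tags t
        JOIN question_tags qt ON qt.tag_id = t.id
        GROUP BY t.id, t.name ORDER BY n DESC, t.name LIMIT ?
    """, (limit,)).fetchall()


if __name__ == "__main__":
    db = make_db()
    db.executemany("INSERT INTO questions (id, title) VALUES (?, ?)", [(1, "a"), (2, "b")])
    tag_question(db, 1, "solr")
    tag_question(db, 2, "solr")
    print(questions_tagged(db, "solr"), most_popular_tags(db))
```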
I would put the questions in one table, the tags in another, and have a separate table connecting tags to questions. This is the best way to build that database: it keeps all tags consistent and greatly reduces redundancy.
By separating the data like this, you can ensure that searching for a specific tag brings back the same items. You don't have to worry about whether the tag is spelled the same way throughout all the questions. It also makes it easier to limit the set of allowed tags.
You should definitely store the tags in a separate table, it makes everything easier, and that's the whole idea of a 'relational' database.

Resources