Is it possible to make that kind of query in Solr?

I have a string-type field in Solr with values like 'JOHN JACKSON', 'JAKE SMITH', 'JOHNATAN JAMESON'.
Is it possible to tell Solr, when I type 'J', to return first the records that contain 'J' more times than the others?

You may use solr.EdgeNGramFilterFactory. You can set minGramSize to 1.
This FilterFactory is very useful in matching prefix substrings (or suffix substrings if side="back") of particular terms in the index during query time.
Reference: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
So for your examples above,
for JOHN JACKSON, it will store:
J, JO, JOH, JOHN, J, JA, JAC, JACK, JACKS, JACKSO, JACKSON
and for JAKE SMITH:
J, JA, JAK, JAKE, S, SM, SMI, SMIT, SMITH
Now when someone searches for J, the first document (JOHN JACKSON) will get a higher score, because J appears twice among its indexed grams.
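For reference, a minimal sketch of what such a field type could look like in schema.xml (the type and field names here are my own placeholders, not from the question):
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- minGramSize="1" so single-letter prefixes like "J" are indexed -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- no n-gramming at query time, so the typed prefix matches the indexed grams -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>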

Related

Extraction of matched dataset from MatchThem

I have browsed almost every possible page on the subject and I still can't find a way to extract a matched dataset with the MatchThem package.
By analogy, MatchIt's match.data() function lets you extract the dataset of matched data (for example, 3:1). Although MatchThem's complete() function is the equivalent, it apparently does not allow extracting exclusively the imputed AND matched dataset.
Here is an example of multiple imputation with 3:1 matching from which I am trying to extract multiple matched datasets:
library(mice)
library(MatchThem)
#Multiple imputations
mids_object <- mice(data, maxit = 5, m = 3, seed = 20211022, printFlag = FALSE) # m = 3 is deliberately low for this example.
#Matching
mimids_object <- matchthem(primary_subtype ~ age + bmi + ps, data = mids_object, approach = "within", ratio = 3, method = "optimal")
#Details of matched data
print(mimids_object)
Printing | dataset: #1
A matchit object
method: Variable ratio 3:1 optimal pair matching
distance: Propensity score
- estimated with logistic regression
number of obs: 761 (original), 177 (matched)
target estimand: ATT
covariates: age, bmi, ps
#Extracting matched dataset
complete(mimids_object, action = "long") -> complete_mi_matched
#Summary of extracted dataset to check correct number of match
summary(complete_mi_matched$primary_subtype)
classic ADK         SRC
        702          59
It should show the matched 3:1 proportion, with 177 matched (177 classic ADK and 59 SRC).
I must be missing something. Thanks in advance for your help or suggestions.
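Not an answer from the original thread, but one direction that may be worth trying (an assumption about MatchThem's behaviour, not something stated above): complete() typically returns every unit with the matching columns appended, including a weights column that is 0 for unmatched units, so the matched subset might be obtained by filtering on it:
# Sketch only: assumes complete() appends a 'weights' column that is 0 for unmatched units
complete_mi_all <- complete(mimids_object, action = "long")
complete_mi_matched <- subset(complete_mi_all, weights > 0)
summary(complete_mi_matched$primary_subtype)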

Latex - labeling tables while using foreach loop

I have a loop in LaTeX where I access 5 different tables to include in my document. The loop has two elements: one variable holding the short name of the category (\n, which can be A, O, I, R or H) and one holding the long name (\m, which can be "Apartment", "Office", etc.).
This loop works as intended for the caption and for the input. But it does not work for \label. In other words, the loop produces 5 tables, pulling the right files each time. It puts the correct caption on these tables (Apartment, Office, etc.), but \label does not get populated correctly: it produces only one label, "output_reg\n", instead of 5 labels such as "output_reg_A", "output_reg_O", etc.
I would appreciate all the help I can get!
\documentclass{article}
\usepackage{tikz}
\begin{document}
\foreach \n\m in {A/Apartments,O/Office,R/Retail,I/Industrial,H/Hotel}
{ \begin{table}
\small
\centering
\caption{Regression results \n - \m } \label{output_reg_\n}
\begin{tabular}{ccccc}
a & a & \\
a & a &
\end{tabular}
\end{table}
}
content
% I want to be able to reference the tables as \ref{output_reg_A}, \ref{output_reg_O}, and so on.
\end{document}
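Not from the original post, but if the issue is that \n is no longer expanded by the time the label name is written out, one workaround to try (a sketch, unverified against this exact setup) is to build the \label call with \n already expanded inside the loop:
% inside the loop body, instead of \label{output_reg_\n}:
\edef\tmplabel{\noexpand\label{output_reg_\n}}\tmplabel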

Is there a more effective way to combine 24 columns into a single column as an array in R?

I have the code below, which takes 24 columns (hours) of data and combines them into a single-column array for each row in a data frame:
# Adds all of the values into column twentyfourhours with "," as the separator.
agg_bluetooth_data$twentyfourhours <- paste(agg_bluetooth_data[,1],
agg_bluetooth_data[,2], agg_bluetooth_data[,3], agg_bluetooth_data[,4],
agg_bluetooth_data[,5], agg_bluetooth_data[,6], agg_bluetooth_data[,7],
agg_bluetooth_data[,8], agg_bluetooth_data[,9], agg_bluetooth_data[,10],
agg_bluetooth_data[,11], agg_bluetooth_data[,12], agg_bluetooth_data[,13],
agg_bluetooth_data[,14], agg_bluetooth_data[,15], agg_bluetooth_data[,16],
agg_bluetooth_data[,17], agg_bluetooth_data[,18], agg_bluetooth_data[,19],
agg_bluetooth_data[,20], agg_bluetooth_data[,21], agg_bluetooth_data[,22],
agg_bluetooth_data[,23], agg_bluetooth_data[,24], sep=",")
However, after this I still have to write more lines of code to remove spaces, add brackets around it, and delete the original columns. None of this is difficult to do, but I feel like there should be shorter/cleaner code to get the results I am looking for. Does anyone have any suggestions?
There is a built-in rowSums function. It looks like you want an analogous "rowPaste" function. We can do this with apply:
# create example dataset
df <- data.frame(
v=1:10,
x=letters[1:10],
y=letters[6:15],
z=letters[11:20],
stringsAsFactors = FALSE
)
# rowPaste columns 2 through 4
apply(df[, 2:4], 1, paste, collapse=",")
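Applied to the data frame from the question (assuming, as in the original code, that the 24 hour columns are columns 1 through 24), that would look something like:
# collapse the 24 hour columns into one comma-separated string per row
agg_bluetooth_data$twentyfourhours <- apply(agg_bluetooth_data[, 1:24], 1, paste, collapse = ",")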
Another option, using @Dan Y's example data (it might be helpful if you posted a subset of your data using dput, though).
library(tidyr)
library(dplyr)
df %>%
unite('new_col', v, x, y, z, sep = ',')
new_col
1 1,a,f,k
2 2,b,g,l
3 3,c,h,m
4 4,d,i,n
5 5,e,j,o
6 6,f,k,p
7 7,g,l,q
8 8,h,m,r
9 9,i,n,s
10 10,j,o,t
You can then perform the necessary edits with mutate. There's also a fair amount of flexibility in the column selection within the unite call. Check out the "Useful Functions" section of the select documentation.
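For the 24 hour columns in the question, a sketch of the same idea (again assuming they are columns 1 through 24, as in the original code; unite drops the source columns by default, which also takes care of deleting them):
# with tidyr (and dplyr for the pipe) already loaded as above
agg_bluetooth_data <- agg_bluetooth_data %>%
  unite("twentyfourhours", 1:24, sep = ",")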

Limiting the number of common nodes in a relationship - Neo4j

Say I have Person and Movie nodes with a [:LIKES] relationship. I am trying to limit the number of persons that like the same movie, without limiting my results to only include 1 movie.
For example:
MATCH (p:Person)-[:LIKES]->(m:Movie)
WHERE
p.age < 30 AND
p.gender = "Male"
RETURN p, m
So in the query above I would like to get all the results but filter them so that only 2 Persons will like the same movie.
Is this possible?
This knowledge base article goes over different ways to limit match results per row.
For a non-APOC approach you can get the slice of the collection of people who liked the movie:
MATCH (p:Person)-[:LIKES]->(m:Movie)
WHERE
p.age < 30 AND
p.gender = "Male"
RETURN m, collect(p)[..2] as peopleWhoLiked
If you want a separate row per person, then UNWIND the peopleWhoLiked list before the return.
For the second approach, you'll need APOC Procedures.
In order to use LIMIT, you'll need to first match on all movies, then perform the limited match to :Person nodes using apoc.cypher.run().
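A sketch of that APOC pattern might look like the following (not from the original answer; it assumes APOC is installed and uses apoc.cypher.run so that LIMIT applies per movie):
MATCH (m:Movie)
CALL apoc.cypher.run(
  'WITH $m AS m
   MATCH (p:Person)-[:LIKES]->(m)
   WHERE p.age < 30 AND p.gender = "Male"
   RETURN p LIMIT 2',
  {m: m}) YIELD value
RETURN m, collect(value.p) AS peopleWhoLiked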
To get all movies that have exactly two (under-30 male) likers:
MATCH (p:Person)-[:LIKES]->(m:Movie)
WHERE
p.age < 30 AND
p.gender = "Male"
WITH m, COLLECT(p) AS likers
WHERE SIZE(likers) = 2
RETURN m, likers;

Solr, multivalued field: how can I return documents where ALL values in the field are contained within a set?

For example, if I have these 2 Documents:
id: 1
multifield: 2, 5
id: 2
multifield: 2, 5, 9
Then say I have a set that I'm querying with, which is {2, 5, 7}. What I would want is document 1 returned because 2 and 5 are both contained in the set. But document 2 should not be returned because 9 is not in the set.
Both the multivalued field and my set are of arbitrary length. Hopefully that makes sense.
Figured this out. This was the inspiration, specifically the answer suggesting to use Function Queries.
Using the same data as in the question, I will add a calculated field to my documents which contains the number of values in my multivalued field.
id: 1
multifield: 2, 5
nummultifield: 2
id: 2
multifield: 2, 5, 9
nummultifield: 3
Then I'll use an frange query with some function queries. For each item in my set, I'll use the termfreq function, which will return 1 or 0. I will then sum up all of these values. Finally, if that sum equals the calculated field nummultifield, then I know that every value in the document is present in the set. Remember my set is {2, 5, 7}, so my function query will look something like this:
fq={!frange l=0 u=0}sub( nummultifield, sum( termfreq(multifield,2), termfreq(multifield,5), termfreq(multifield,7)))
If we fill in the values for Document 1 and 2, it will look like this:
Document 1: sub( 2, sum( 1,1,0 ) ) = 0, which is in the range [0,0], so Doc 1 is returned
Document 2: sub( 3, sum( 1,1,0 ) ) = 1, which is not in the range [0,0], so it is not returned
I've tested it out and it works great. You need to make sure you don't duplicate any values in multifield or you'll get weird results. Incidentally, this trick of using frange could be used whenever you want to fake a boolean result from one or more function queries.
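Put together as a full request (the core URL and other parameters are placeholders), that filter query might be used like this:
http://localhost:8983/solr/select?q=*:*
&fq={!frange l=0 u=0}sub(nummultifield,sum(termfreq(multifield,2),termfreq(multifield,5),termfreq(multifield,7)))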
Faceting may be what you are looking for.
http://wiki.apache.org/solr/SolrFacetingOverview
http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr
how to search for more than one facet in solr?
I adapted this from the Lucid Imagination link.
Choose all documents that have values 2 or 5 or 7:
http://localhost:8983/solr/select?q=*:*
&facet=on
&facet.field=multifield
&fq=multifield:(2 OR 5 OR 7)
Incomplete: I don't know of any options to exclude all other values.
