Hi all, is there any way to find the frequency of a phrase in a document in Solr?
I have a document like the one below.
I need to find how many times the phrase occurs in the document.
sample input:
1) "Text messaging, or texting, is the act of composing and sending electronic messages"
2) "35 U.S.C. § 271(e)(2)(A)"
output: count of the phrase in the document
`
{
"id":1,
"filecontent": "Text messaging, or texting, is the act of composing and sending electronic messages, typically consisting of alphabetic and numeric characters, between two or more users of mobile phones, tablets, desktops/laptops, or other devices. Text messages may be sent over a cellular network, or may also be sent via an Internet connection.
The term originally referred to messages sent using the Short Message Service (SMS). It has grown beyond alphanumeric text to include multimedia messages (known as MMS) containing digital images, videos, and sound content, as well as ideograms known as emoji (happy faces, sad faces, and other icons).
As of 2017, text messages ABC are used by youth and adults for personal, family and social purposes and in business. Governmental and non-governmental organizations use text messaging for communication between colleagues. As with emailing, in the 2010s, the sending of short informal messages has become an accepted part of many cultures.[1] This makes texting a quick and easy way to communicate with friends and colleagues, including in contexts where a call would be impolite or inappropriate (e.g., calling very late at night or when one knows the other person is busy with family or work activities). Like e-mail and voice mail, and unlike calls (in which the caller hopes to speak directly with the recipient), texting does not require the caller and recipient to both be free at the same moment; this permits communication even between busy individuals. Text messages can also be used to interact with automated systems, for example, to order products or services from e-commerce websites, or to participate 35 U.S.C. § 271(e)(2)(A) in online contests. Advertisers and service providers use direct text marketing to send messages to mobile users about promotions, payment due dates, and other notifications instead of using postal mail, email, or voicemail. by youth and adults for personal, by youth and adults for personal, by youth and adults for personal, messaging , textx"
}`
Put debug=results at the end of the Solr URL.
The debug output will also give you the phrase frequency.
Special thanks to matslindh.
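As a minimal sketch of the tip above, the snippet below builds a select URL with debug=results appended. The host, port, core name (`mycore`) and the helper `build_phrase_query_url` are assumptions made for illustration; only the `filecontent` field, the `id` value and `debug=results` come from this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_phrase_query_url(base_url, field, phrase, doc_id=None):
    """Return a select URL that searches `field` for an exact phrase and
    asks Solr for per-document score explanations via debug=results."""
    # Escape backslashes and double quotes, then wrap the phrase in quotes
    # so that it is parsed as a phrase query rather than separate terms.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'{field}:"{escaped}"',
        "debug": "results",
        "wt": "json",
    }
    if doc_id is not None:
        # Restrict the result to a single document (hypothetical use of fq).
        params["fq"] = f"id:{doc_id}"
    return f"{base_url}?{urlencode(params)}"


# Assumed local Solr instance and core name; adjust to your setup.
url = build_phrase_query_url(
    "http://localhost:8983/solr/mycore/select",
    "filecontent",
    "by youth and adults for personal",
    doc_id=1,
)
print(url)
```

The phrase frequency then appears in the explain text of the debug section of the response; its exact layout depends on the Solr version and the similarity in use.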
For example, the variable name %%=(#variableName)=%% would be used to select another variable name, %%=(#anotherVariableName)=%%, from the Options in the Distributed Marketing Multivariate. This AMPscript "%%=(#anotherVariableName)=%%" would then trigger a sentence or code snippet in the body of the email.
Ultimately, I'm looking for ways that a Sales User in Salesforce Lightning can choose from a predetermined list of sentences and code blocks to build an email when using Quick Send. If there are any suggestions or examples anywhere, I would be eternally grateful.
I am building a knowledge graph of Chinese stocks and want to build a news recommendation system on top of it. I want to use the TransE algorithm for the entity and relation embeddings, but I do not have a dataset. How can I build a dataset from my own knowledge graph?
One start would be to use data from Wikidata. It has some information on Chinese companies (I suppose you are referring to companies listed on Chinese stock exchanges). For instance, https://www.wikidata.org/wiki/Q831445 displays information about Sinopec.
The data from Wikidata can be downloaded from the API, the large dump files at https://dumps.wikimedia.org/wikidatawiki/ or the SPARQL endpoint at https://query.wikidata.org/.
You can get a list of companies listed on the Shenzhen Stock Exchange with the SPARQL query:
SELECT
?company ?companyLabel
?industry ?industryLabel
{
?company wdt:P414 wd:Q517750 .
OPTIONAL { ?company wdt:P452 ?industry }
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,zh". }
}
The result is (also) available at https://w.wiki/9DM . This result can be extended by modifying the query and it can be downloaded in various formats. With the DESCRIBE SPARQL keyword you can get the triple format that may be useful for the TransE algorithm, e.g., DESCRIBE wd:Q831445 with the result at https://w.wiki/9DN .
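Many TransE implementations expect integer-indexed (head, relation, tail) triples, so one way to turn such query results into a dataset is sketched below. The function name `index_triples` and the entries `company_A`, `company_B` and `industry_X` are placeholders invented for illustration, not real Wikidata items; only P414, P452 and Q517750 are taken from the query above.

```python
def index_triples(triples):
    """Map string triples to integer ids.

    Returns (entity_ids, relation_ids, indexed), where `indexed` holds
    one (head_id, relation_id, tail_id) tuple per input triple.
    """
    entity_ids = {}
    relation_ids = {}
    indexed = []
    for head, relation, tail in triples:
        # setdefault assigns the next free id the first time a name is seen.
        h = entity_ids.setdefault(head, len(entity_ids))
        r = relation_ids.setdefault(relation, len(relation_ids))
        t = entity_ids.setdefault(tail, len(entity_ids))
        indexed.append((h, r, t))
    return entity_ids, relation_ids, indexed


# Placeholder triples shaped like the rows of the SPARQL result.
triples = [
    ("company_A", "P414", "Q517750"),
    ("company_A", "P452", "industry_X"),
    ("company_B", "P414", "Q517750"),
]
entity_ids, relation_ids, indexed = index_triples(triples)
print(indexed)
```

Keeping the two dictionaries around lets you translate the learned embedding rows back to Wikidata identifiers afterwards.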
It is also possible to process the large dump files and build a knowledge graph embedding with Gensim's Word2Vec; see "Wembedder: Wikidata entity embedding web service" at https://arxiv.org/abs/1710.04099 . You can explore one result of this approach with the Wembedder webapp, e.g., https://tools.wmflabs.org/wembedder/most-similar/Q51747 displays the result of a "most similar" query in the knowledge graph embedding with Air China.
In the DevPost Watson Developer Challenge for Conversational Applications post, I saw that Watson is (maybe) able to analyze the following phrase: "I want to visit Tokyo, Sydney, Manchester, and Reykjavik during a trip that takes 30 days".
Is there a better way to extract that array of locations without having to predefine a maximum number of location variables (i.e. set location1 - 5) and manually specify various grammar items like $ (Locations)={location1} * (Locations)={location2} * (Locations)={location3} * (Locations)={location4} as per the Pizza example dialog? I would like to follow up with a comment such as "That's a lot" if there are more than 4 locations, or "Sure" if there are fewer.
You could try something like Alchemy or relationship extraction to identify all of the locations, and then simply add them to the user profile in Dialog. But today, the best way to do this within a broader conversation is to do it the same way the pizza sample does, as you outlined above.
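To illustrate the "extract first, then count" idea outside of Dialog, here is a rough sketch in plain Python. It does not call any Watson or Alchemy API: the gazetteer `KNOWN_LOCATIONS` and the helpers `extract_locations` and `follow_up` are assumptions standing in for whatever entity extraction service you use.

```python
import re

# Stand-in for an entity extraction service: a tiny hand-made gazetteer.
KNOWN_LOCATIONS = {"Tokyo", "Sydney", "Manchester", "Reykjavik"}


def extract_locations(utterance, known=KNOWN_LOCATIONS):
    """Return every known location in the utterance, in order of appearance.

    The result is a list of any length, so no location1..location5
    variables have to be declared up front.
    """
    tokens = re.findall(r"[A-Za-z]+", utterance)
    return [token for token in tokens if token in known]


def follow_up(locations):
    """Pick the follow-up comment from the question: more than 4 is 'a lot'."""
    return "That's a lot" if len(locations) > 4 else "Sure"


utterance = (
    "I want to visit Tokyo, Sydney, Manchester, and Reykjavik "
    "during a trip that takes 30 days"
)
locations = extract_locations(utterance)
print(locations, follow_up(locations))
```

With four locations the condition "more than 4" is not met, so this example answers "Sure".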
I am an iOS developer from Germany. Apple has deprecated UDID access for normal App Store apps, so I am searching for a replacement and finally came across the DieID. I investigated a bit, but I was not able to figure out whether this ID is unique. (The UDID is a Unique Device Identifier; it lets the developer identify the user, because the UDID is unique for every device. The UDID is made from SHA1(serial + IMEI + wifiMac + bluetoothMac) or SHA1(serial + ECID + wifiMac + bluetoothMac). All of these parameters have been made inaccessible in iOS 7 and 8.)
Does anyone know if the DieID is unique for every die? I already contacted Samsung, but they haven't replied yet.
The die ID should be unique: to identify a die, you need the lot number, the wafer number and the position on the wafer (maybe also the production line, if the wafers don't have unique numbers). This information allows a die to be identified uniquely, and each chip has a different die ID.
Nevertheless, I can't find any documentation on the dieID for iOS. I can't guarantee that the dieID information is the full ID of the die, so I can't guarantee uniqueness; I can only say "it should be".
There's a movie whose name I can't remember. It's about a carnival or amusement park with a horror house and a bunch of teens who are murdered one by one by something with a clown's mask. I saw this movie about 20 years ago, and its sequel, but can't remember it exactly. (And I also forgot its title.) As a result, I started wondering about how to solve something technical.
Assume that I have a database with the story plot and other data of each and every movie published. (Something like the IMDb.) I would have an edit field where a user can just enter a description in plain text. The system would then start analysing this text to find the movie(s) that match this description.
For example (different movie), I enter this in the edit field: "Some movie about an Egyptian king who attacks a bunch of Indians on horseback, but he's badly wounded and his horse dies while he lost this battle."
The system should then report the movie "Alexander" from 2004 as answer, but possibly a few more. (Even allowing a few errors in the description.)
To create such a system, where a description gets analysed to find a matching record by searching through descriptions, what techniques would I need for something as complex as that? I don't want to build something like that right now; I'm asking more out of curiosity, in case I ever want to pick up an interesting new project.
(I wanted to award extra points for those who recognise the movie I've mentioned in the beginning. But one Google-attempt later and I found it myself!)
Btw, it's not the search engine itself that interests me, but analysing the description to get to something a search engine will understand! With the example movie, it's human logic that helped me to find the title. (And it's annoying that this movie isn't for sale in the Netherlands.) Human logic will always be a requirement but it's about analysing the user input, which is in the form of a story or description, with possible errors.
You should check out document classification.
A few document classification techniques:
Naive Bayes classifier
tf–idf
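To give an idea of how tf-idf could be applied to this problem, here is a small pure-Python sketch that ranks plot summaries against a free-text description by cosine similarity. The titles and plots are invented placeholders, and the smoothed idf formula is just one common variant, chosen here as an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_by_tfidf(description, plots):
    """Rank plots (title -> text) against a description.

    Returns (title, score) pairs, best match first, scored by the
    cosine similarity of the tf-idf vectors.
    """
    counts = {title: Counter(tokenize(text)) for title, text in plots.items()}
    n_docs = len(plots)

    # Document frequency: in how many plots does each term occur?
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    def weigh(term_counts):
        # Smoothed idf, so terms that occur in no plot cause no division by zero.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in term_counts.items()
        }

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = weigh(Counter(tokenize(description)))
    scores = [(title, cosine(query, weigh(tc))) for title, tc in counts.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Invented placeholder plots, not real database entries.
PLOTS = {
    "Movie A": "A group of teens visits a carnival where a killer in a clown mask murders them one by one",
    "Movie B": "A king leads his army on horseback into battle and is badly wounded",
    "Movie C": "Two friends open a bakery in a small town",
}
ranking = rank_by_tfidf("teens murdered at a carnival by someone in a clown mask", PLOTS)
print(ranking[0][0])
```

A Naive Bayes classifier would need labelled training examples per movie, whereas this ranking approach works directly on the stored plot texts.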
From what I can tell by your own comments, Google is the technique to be used. ;-) But, honestly, I think more or less any search engine would do.
Edit: heh, you removed your comment, but I do remember you mentioned Google as the one deserving extra points.
Edit+: well, you mentioned Google again, but I don't want to remove my first edit. ;-)
Pure speculation: would something trivial, such as taking every word of more than 4 letters in the description ("Egyptian, Indian, horse, battle", etc.) and fuzzy matching against a database of such summaries, work? Perhaps with some normalisation, e.g. king == leader == emperor?
Hmmm ... Young Man, Girlfriend, swimming pool, mother, wedding: does that get us to The Graduate? Well, I guess with a small amount of specifics ("Robinson") it might.
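The speculation above could be tried out with something like the following sketch, which uses `difflib` from the Python standard library for the fuzzy part. The synonym table, the 0.8 cutoff, the film titles and the summaries are all made up for illustration; words in the synonym table are kept even when they are short, since "king" itself has only 4 letters.

```python
import re
from difflib import get_close_matches

# Hypothetical normalisation table: king == leader == emperor.
SYNONYMS = {"king": "leader", "emperor": "leader"}


def keywords(text, min_letters=5):
    """Keep words of more than 4 letters (plus known synonyms), normalised."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        SYNONYMS.get(word, word)
        for word in words
        if len(word) >= min_letters or word in SYNONYMS
    }


def fuzzy_score(description, summary, cutoff=0.8):
    """Count description keywords that closely match some summary keyword."""
    candidates = sorted(keywords(summary))
    return sum(
        1
        for word in keywords(description)
        if get_close_matches(word, candidates, n=1, cutoff=cutoff)
    )


def best_match(description, summaries):
    """Return the title whose summary shares the most fuzzy keywords."""
    return max(summaries, key=lambda title: fuzzy_score(description, summaries[title]))


# Invented summaries, not real database entries.
SUMMARIES = {
    "Film X": "An emperor attacks Indian warriors on horseback and is badly wounded",
    "Film Y": "Teenagers visit a carnival and run into a masked clown",
}
description = "Some movie about an Egyptian king who attacks a bunch of indians on horseback"
print(best_match(description, SUMMARIES))
```

Here "king" and "emperor" meet through the synonym table, while "indians" and "Indian" meet through the fuzzy comparison.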
You can do lots of interesting stuff with the IMDb keyword search:
http://akas.imdb.com/keyword/carnival/clown/murder/
You can specify multiple keywords; it suggests movies and more keywords that appear in a similar context to your given keywords.
The data contained in IMDb is publicly available for non-commercial use and can be downloaded as text files. You could build a database from it.