Scopus DOIs not working in article retrieval APIs

I’m trying to use the Elsapy module to extract the abstracts of documents on certain topics.
I am able to do this but, unfortunately, only for a fraction of the documents found.
For example, a particular search returns 16 documents but I am only able to extract the information (e.g. abstracts) from 4 of them.
Upon further inspection, it seems that the documents I can't get the abstracts of:
- don't have a PII, and
- have DOIs that don't work.
I have tested the DOIs in the Article Retrieval interactive API guide:
- The ones that returned abstracts worked fine.
- The others returned the error:
RESOURCE_NOT_FOUND: The resource specified cannot be found.
This happens even though I have found the original articles and checked that their DOIs are correct.
An example of one that didn’t work is:
Sengupta, N. K., & Sibley, C. G. (2019). The political attitudes and subjective wellbeing of the one percent. Journal of Happiness Studies, 20(7), 2125-2140. doi:10.1007/s10902-018-0038-4
I have found that the ones that do ‘work’ all have the general form:
10.1016/j.ssmph.2019.100471
10.1016/j.apacoust.2015.03.004
Please let me know if you know why this is and how I can fix it.
Thanks for your help :)

The Article Retrieval API works only for Elsevier content hosted on sciencedirect.com; all Elsevier articles have PII identifiers. The example DOI 10.1007/s10902-018-0038-4 does not work because that article is published by Springer and, consequently, is not available on ScienceDirect. The DOIs that do work all start with 10.1016, which is Elsevier's DOI prefix.
Kindly note that this is not a bug and everything is working as expected.
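If it helps, here is a minimal sketch of filtering your search results down to records the Article Retrieval API can actually serve before requesting abstracts. The field names ('pii', 'prism:doi') are assumptions based on typical Scopus search results; adjust them to whatever your Elsapy search really returns.

ELSEVIER_DOI_PREFIX = "10.1016/"

def retrievable_on_sciencedirect(record):
    """True if the record has a PII or an Elsevier-prefixed DOI."""
    has_pii = bool(record.get("pii"))
    doi = record.get("prism:doi", "")
    return has_pii or doi.startswith(ELSEVIER_DOI_PREFIX)

def split_results(results):
    """Separate results worth sending to Article Retrieval from the rest."""
    elsevier = [r for r in results if retrievable_on_sciencedirect(r)]
    other = [r for r in results if not retrievable_on_sciencedirect(r)]
    return elsevier, other

For the non-Elsevier records you would need a different source for the abstracts, for example Elsevier's Abstract Retrieval API (which covers Scopus records regardless of publisher) or the publisher's own service.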

Related

Reference in B2C_1A_TrustFrameworkExtensions missing in Identity Experience Framework examples

I'm getting an error when uploading my customized policy, which is based on Microsoft's SocialAccounts example ([tenant] is a placeholder I added):
Policy "B2C_1A_TrustFrameworkExtensions" of tenant "[tenant].onmicrosoft.com" makes a reference to ClaimType with id "client_id" but neither the policy nor any of its base policies contain such an element
I've done some customization to the file, including adding local account signon, but comparing copies of TrustFrameworkExtensions.xml in the examples, I can't see where this element is defined. It is not defined in TrustFrameworkBase.xml, which is where I would expect it.
I figured it out, although it doesn't make sense to me. Hopefully this helps someone else running into the same issue.
The TrustFrameworkBase.xml is not the same in each scenario. When the Microsoft documentation said not to modify it, I assumed that meant the "base" was always the same. The implication of this design is: if you try to mix and match between scenarios, you also need to find the supporting pieces in TrustFrameworkBase.xml and move them into your extensions document. It also means that if Microsoft provides an update to their reference policies and you want to update, you need to remember which scenario you implemented originally, and potentially which other ones you pulled pieces from, or do a line-by-line comparison. Not the end of the world, but also not how I'd design an inheritance structure.
This also explains why I had to work through previous validation errors, including missing <DisplayName> and <Protocol> elements in the <TechnicalProfile> element.
Yes - I agree that is a problem.
My suggestion is always to use the "SocialAndLocalAccountsWithMfa" scenario as the sample.
That way you will always have the correct attributes and you know which one to use if there is an update.
It's easy enough to comment out the MFA stuff in the user journeys if you don't want it.
There is one exception. If you want to use "username" instead of "email", the reads/writes etc. are only in the username sample.

Custom Searcher - Blending of hits from different sources

We have a need for "blending of hits from different sources"; per your documentation, the recommendation is to write a custom Searcher in Java. Is there a demo of this written somewhere on GitHub? I wouldn't even know where to start :( I understand I can create search "chains", preferably asynchronous, and then blend results in Java before returning them... but then how would I handle pagination, limits, etc.? This all seems very complicated for someone who doesn't even know Java that well. So, I am hoping someone has already written a demo for this? Please? Anyone?
Thank you so much
EDIT to make my question clearer:
We are writing a search engine that fetches data from various websites. Some websites have 10 million indexable items, others only 100,000. When we present the results to the end user, we want to include results from all our sources (when a match applies), say 10 results from each of the websites we crawl, so that they all get an equal amount of attention on the page. If we don't do custom blending, what happens is that the largest website with the most items wins all our traffic.
I understand that we can send 10 separate queries to Vespa and blend the results in our front end, but that seems very inefficient. Hence the question about a custom Searcher. Thank you so much!
That documentation covers some very advanced use cases which you do not have. Are your sources different Vespa schemas or content clusters? If so, Vespa will by default blend the hits returned from each according to their relevance scores, so there's nothing you need to do.
The two other most common use cases are:
- Some (or all) of the data sources are external, so you need to write a Searcher component to fetch the external data and turn it into a Result.
- You want the data to be blended in some custom way (rather than by relevance score). If so, you need to exclude the default blending Searcher (com.yahoo.prelude.searcher.BlendingSearcher) and write your own.
If you provide some more information about your use cases I can give you some code examples.
EDIT: Use grouping to solve the need explained under "EDIT" in the question:
- Create a "siteid" field when feeding (e.g. in document processing).
- Use the grouping expression all(group(siteid) each(max(10) output(summary())))
- See http://docs.vespa.ai/documentation/grouping.html
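To make that concrete, here is a rough sketch of sending such a grouped query over Vespa's HTTP search API from Python. The endpoint, the "siteid" field, and the hits-per-site limit are assumptions; adjust them to your deployment and Vespa version.

import requests

VESPA_ENDPOINT = "http://localhost:8080/search/"

def search_blended(user_query, hits_per_site=10):
    # Grouping is appended to the YQL statement after a '|'; each site
    # contributes at most hits_per_site document summaries.
    yql = (
        "select * from sources * where userQuery() "
        "| all(group(siteid) each(max({0}) output(summary())))".format(hits_per_site)
    )
    response = requests.get(
        VESPA_ENDPOINT,
        # hits=0 suppresses the regular top-level hit list; the per-site
        # hits come back under the grouping section of the response.
        params={"yql": yql, "query": user_query, "hits": 0},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()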

Where does React's `scryRenderedDOMComponentsWithClass` method name come from?

Working on testing a React component, I was reading the docs and found scryRenderedDOMComponentsWithClass. I'm having trouble understanding this method because its name is unpronounceable, so I can't map the name to a mental model of what it's doing. (There are a number of related names, such as scryRenderedDOMComponentsWithTag.)
What does the scry part of this method name refer to? Scary? Scurry? What concept is this name trying to illustrate?
Short answer
"Scry" in this context just means "find all". See this comment on ReactTestUtils.scryRenderedComponentsWithClass. It's a single word, not an abbreviation, and it's pronounced like "cry" but with an "s" at the beginning.
Longer (and nerdier) answer
Elsewhere in that same file, you'll see a reference to DOM.scry:
/**
 * Todo: Support the entire DOM.scry query syntax. For now, these simple
 * utilities will suffice for testing purposes.
 * @lends ReactTestUtils
 */
zpao explains in a comment on a GitHub issue:
That's a reference to an internal Facebook module. It's basically querySelectorAll with fallback behavior for handling old browsers and special cases. It is pretty unremarkable and doesn't actually translate super well here (except maybe a scryRenderedDOMComponentsWithQSA or something, but meh). We're working on improving the testing in other ways so I don't think there's anything we really want to do with this right now.
jimfb takes it a bit further in another GitHub issue, explaining that the name is a reference to Dungeons & Dragons:
Back in the day, we had a bunch of D&D fans on the team.
For reference:
http://www.dandwiki.com/wiki/SRD:Scrying
http://www.dandwiki.com/wiki/SRD3e:Scry_Skill
https://en.wikipedia.org/wiki/Scrying
Historically, we've used scry to indicate a helper that finds a set of results. As the framework matures, we should start choosing function names based on what the functions actually do instead of fantasy words that have very little meaning to the typical developer.
Though I would agree that the word has very little meaning to most, it's worth noting that "scry" is a real English word:
scry
[skrahy]
verb (used without object), scried, scrying.
to use divination to discover hidden knowledge or future events, especially by means of a crystal ball.
Interestingly, according to the data from Google's Ngram Viewer, it seems that the word fell out of normal usage in the early 19th century and then wallowed in obscurity until the 1980s, presumably after D&D gained popularity.
So I can't say I object to jimfb calling it a "fantasy word", especially considering the kind of imagery my imagination conjures up when I hear it.

How to perform Geo Spatial search with django-haystack + solr

I'm currently using django-haystack with Xapian. I couldn't find any documentation on how to perform geospatial queries with Xapian, but there seems to be some momentum on the Solr side, so I'm currently experimenting with that.
I couldn't get SpatialSolr to work properly locally, but for now I'm working with spatial-solr-light, which seems to work fine. It accepts queries like
http://127.0.0.1:8080/solr/select/?q=blahblah&spatial={!radius=1.0%20sort=true}lat:10.0,lng:-10.0
Can anyone point me to a patch for haystack that allows me to pass custom queries like that? I could use raw_search(), but I can't chain the results. In any case, I would like to find a cleaner way to do something like
sqs.spatial(....)
There are some patches from other people mentioned on the Google group (links below), but most of them are unreachable.
References:
https://github.com/fizx/solr-spatial-light
http://groups.google.com/group/django-haystack/browse_thread/thread/d0e23d45c0baa300/2298b6cf43389e18?lnk=gst&q=Spatial#2298b6cf43389e18
http://groups.google.com/group/django-haystack/browse_thread/thread/f88d625679941d77/420892adac151a64
http://groups.google.com/group/django-haystack/browse_thread/thread/e3a70112ce944b00/33bd673fbaaed0a7?lnk=gst&q=jteam#33bd673fbaaed0a7
If you're not tied to Xapian, look at "Django, Sphinx and search by distance". I had a similar problem when I ran across this question, and that seems to solve it. Thanks to django-sphinx, it's about as easy to set up as Haystack. Sphinx also seems to offer more flexibility.
Here's a fork of django-haystack that adds in support for the Spatial Solr Plugin:
https://github.com/sidmitra/django-haystack-spatialsolrplugin
And corresponding notes are here:
https://github.com/sidmitra/django-haystack-spatialsolrplugin/wiki/_pages
Sidmitra, I made a port of your solution for Haystack 1.2.X and Solr 3.4, with some limitations to be frank: no support for schema generation at the moment, only the LatLong geo type is supported, and sorting by distance is not perfect (but it works).
https://github.com/frutik/django-haystack/tree/1.2.X
I second the recommendation of https://github.com/sidmitra/django-haystack-spatialsolrplugin.
It seems to be out of date, but I could beat it into shape with some work. Issues I had:
- It was hard to find the Java SSP, and when I did find it, it was the wrong version. http://www.dutchworks.nl/en/home/download.html was the link that worked for me.
- The classpaths in the example XML files I found on the net were all wrong; I had to remove .solrext. from all of them.
- The plugin was very picky about which directory it lived in; it couldn't talk to anything else until it was happily in solr/lib.
- solr_backend.py required the following patch (around line 505):
if self.spatial_query:
    final_query = '{{!spatial circles={lat},{long},{radius} }}{0}'.format(
        final_query, **self.spatial_query)
I had further issues with making the solrconfig.xml so that GeoDistanceComponent never loaded before the query had a valid rsp.
In other words, you can certainly make it work, but you have to be able to deal with a number of error messages in both python and java before you get there.
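If you just need results while you fight the plugin, you can also hit Solr directly with the same spatial syntax quoted in the question and bypass haystack entirely. A rough sketch, where the host, core path, and the lat/lng field names are assumptions about your setup:

import requests

SOLR_SELECT = "http://127.0.0.1:8080/solr/select/"

def spatial_search(q, lat, lng, radius_km=1.0):
    # Builds the {!radius=... sort=true}lat:...,lng:... local-params value
    # that spatial-solr-light expects in the 'spatial' parameter.
    params = {
        "q": q,
        "spatial": "{{!radius={r} sort=true}}lat:{lat},lng:{lng}".format(
            r=radius_km, lat=lat, lng=lng),
        "wt": "json",  # ask Solr for a JSON response
    }
    resp = requests.get(SOLR_SELECT, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()

The results come back outside the SearchQuerySet machinery, so you lose chaining, but it is a useful sanity check that the Solr side of the plugin is working.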

How to automatically excerpt user generated content?

I run a website that allows users to write blog posts. I would really like to summarize the written content and use it to fill the <meta name="description".../> tag, for example.
What methods can I employ to automatically summarize/describe the contents of user-generated content?
Are there any (preferably free) methods out there that have solved this problem?
(I've seen other websites just copy the first 100 or so words, but this strikes me as a sub-optimal solution.)
Think of the task of summarization as a challenge to 'select the most important sentences' from the document.
The approach described in The Automatic Creation of Literature Abstracts by H. P. Luhn (1958) is naive but actually performs quite well. Try giving it a shot.
If your website is in Python, coding this algorithm with NLTK (the Natural Language Toolkit) is a fun task.
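Something along these lines, a rough Luhn-style sketch rather than a faithful reimplementation (the scoring and tokenizers are my own simplifications; it needs the NLTK punkt and stopwords data downloaded):

from collections import Counter
from heapq import nlargest

from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

def summarize(text, max_sentences=3):
    # Score each sentence by the total corpus frequency of its significant
    # words (non-stopword alphabetic tokens), then keep the top few sentences.
    stop = set(stopwords.words("english"))
    words = [w.lower() for w in word_tokenize(text)
             if w.isalpha() and w.lower() not in stop]
    freq = Counter(words)

    def score(sentence):
        return sum(freq.get(w.lower(), 0) for w in word_tokenize(sentence))

    sentences = sent_tokenize(text)
    best = nlargest(max_sentences, sentences, key=score)
    # Keep the original order so the excerpt still reads naturally.
    return " ".join(s for s in sentences if s in best)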
Make it predictable.
From a user's perspective, simply using the first paragraph is not bad at all. Any automation is bound to fall flat in some cases, so I suggest displaying the first paragraph (maybe truncated at some point) as the summary and offering the ability to override it via an optional field.
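For what it's worth, a small sketch of that predictable fallback, assuming posts are plain dicts with hypothetical "body" and "description" keys:

def meta_description(post, limit=160):
    # Prefer the author's optional override; otherwise fall back to the
    # first paragraph, truncated at a word boundary near the limit.
    if post.get("description"):
        return post["description"][:limit]
    first_paragraph = post["body"].split("\n\n", 1)[0]
    if len(first_paragraph) <= limit:
        return first_paragraph
    return first_paragraph[:limit].rsplit(" ", 1)[0] + "..."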
I might try using Mechanical Turk or any number of other crowdsourcing options.
Another item to check out is a SourceForge project, the AutoSummary Semantic Analysis Engine.
Not a trivial task... You should look for articles or books on "extractive summarization"
A few starters could be:
Books:
Natural Language Processing with Python
Foundations of Statistical Natural Language Processing
Articles:
Language independent extractive summarization
Extractive summarization: how to identify the gist of a text
Extractive Summarization using Inter- and Intra- Event Relevance
Yahoo has a free API for this:
http://developer.yahoo.com/search/content/V1/termExtraction.html
Apple's patent 6424362 - Auto-summary of document content contains sample code which might be useful...
This borders on artificial intelligence so there's not going to be an "easy" solution out there, but there are products that target this problem.
Check out Copernic Summarizer, for one.
Noun phrases typically tend to be important elements of a sentence. Picking sentence(s) with a high density of noun phrases could yield a good summary. You could get noun phrases using a POS tagger.
For a good summary, it is desirable that it reads as a meaningful, complete sentence; a broken sentence is slightly jarring. (A rough sketch of the noun-phrase idea follows below.)
Alternatively, when the author posts the article, they can highlight the keywords to use in the description, which can then be put into the meta description tag automatically.
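As promised above, here is one way to rank sentences by noun-phrase count with NLTK's POS tagger and a simple regexp chunker. This is an illustration, not the answerer's code; the chunk grammar is an assumption, and it needs the NLTK punkt and tagger data downloaded.

from nltk import RegexpParser, pos_tag, sent_tokenize, word_tokenize

# A crude noun-phrase pattern: optional determiner, adjectives, then nouns.
CHUNKER = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

def noun_phrase_count(sentence):
    # Tag the sentence, chunk it, and count the NP subtrees found.
    tree = CHUNKER.parse(pos_tag(word_tokenize(sentence)))
    return sum(1 for subtree in tree.subtrees() if subtree.label() == "NP")

def densest_sentences(text, n=2):
    # Pick the n sentences containing the most noun phrases.
    sentences = sent_tokenize(text)
    return sorted(sentences, key=noun_phrase_count, reverse=True)[:n]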
