How to index the landing pages of many URLs using Solr? - solr

I have a list of thousands of web URLs (originating from bookmarks), and I need to be able to search the landing pages of these URLs. I don't need web-crawler functionality, but I do need de-duplication.
I'm new to Solr and am trying to figure out the simplest way to create the index. I'm thinking about using SimplePostTool, which accepts multiple URLs as parameters. However, I have difficulty understanding how to use de-duplication with SimplePostTool.
Is there any other method for doing this indexing?
Any hints are appreciated.
Thanks.
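One possible approach, sketched under assumptions: de-duplicate on the client before posting, by normalized URL and by a hash of the page text, then send the surviving documents to Solr's JSON update handler. The field names `url` and `content`, the core name in the usage note, and the helper names are all hypothetical; fetching the pages is left out. Solr also ships a server-side de-duplication update processor configured in solrconfig.xml, which may be a better fit if near-duplicate detection is needed.

```python
import hashlib
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def content_signature(text):
    """Hash whitespace-normalized text so identical pages share one signature."""
    collapsed = " ".join(text.split())
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def build_docs(pages):
    """pages: iterable of (url, text). Returns one Solr document per unique page."""
    seen_urls, seen_sigs, docs = set(), set(), []
    for url, text in pages:
        norm = normalize_url(url)
        sig = content_signature(text)
        if norm in seen_urls or sig in seen_sigs:
            continue  # same address or same content already queued
        seen_urls.add(norm)
        seen_sigs.add(sig)
        # Field names are assumptions; they must exist in the Solr schema.
        docs.append({"id": sig, "url": norm, "content": text})
    return docs


def post_to_solr(docs, update_url):
    """Send the documents as a JSON array to a Solr update handler."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With a local Solr and a hypothetical core named `bookmarks`, the call would look like `post_to_solr(build_docs(pages), "http://localhost:8983/solr/bookmarks/update?commit=true")`.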

Related

Determine last page in Solr using cursorMark

I am building an API that allows searching on a number of fields in Solr with pagination. It returns the cursorMark from the Solr results in the API response for users to use in their next query.
I'm following the Google AIP for pagination, which requires returning a blank string when there are no more results left; however, Solr keeps returning the same cursor.
I'm looking for a simple solution that doesn't involve keeping state on the server or needing to make multiple requests. If I knew the position of the page in the total list of results I could determine if the page is the last, but I'm not able to access this.
Is there a simple way to achieve this?
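A minimal stateless sketch, assuming the API layer knows the `rows` value and the cursor it sent: treat the page as the last one when Solr returns fewer documents than requested, or when `nextCursorMark` equals the cursor that was sent. The function name is hypothetical. One caveat: when the total number of results is an exact multiple of `rows`, the client receives one extra token and only learns from the following (empty) page that the results are exhausted.

```python
def next_page_token(sent_cursor, solr_response, rows):
    """Return the token for the next page, or "" when no results remain."""
    docs = solr_response["response"]["docs"]
    next_cursor = solr_response.get("nextCursorMark", "")
    # A short page means the result set ended inside this page.
    # An unchanged cursor means Solr has nothing further to return.
    if len(docs) < rows or next_cursor == sent_cursor:
        return ""
    return next_cursor
```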

Indexing an internal Angular site with Elasticsearch

I have an intranet site built on AngularJS 1.x.
The problem I'm trying to solve is to:
1. Index all of the pages in this site
2. Provide search using Elasticsearch
The issues I'm running into are:
I'm not sure how to produce this index to feed into Elasticsearch.
I tried curl, but curl does not execute the AngularJS application and therefore does not see the content of the partial HTML pages.
Guidance would be very much appreciated.
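As a sketch of the feeding step only: once the text of each page has been obtained somehow (for example by rendering it in a headless browser, or by reading the data the Angular app itself loads; both are assumptions, not something stated above), the documents can be serialized into the newline-delimited format that Elasticsearch's `_bulk` endpoint expects. The index name and the `url`, `title`, and `body` fields are hypothetical.

```python
import json


def bulk_payload(pages, index_name):
    """pages: iterable of (url, title, text). Returns an NDJSON bulk body."""
    lines = []
    for url, title, text in pages:
        # Action line: index this document, using the URL as its id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": url}}))
        # Source line: the document itself.
        lines.append(json.dumps({"url": url, "title": title, "body": text}))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)
```

The resulting string would be sent as the body of a POST to `/_bulk` with the `Content-Type: application/x-ndjson` header.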

How to add powerful search functionality to my web app

I am building an app to browse events. The front end uses the Angular framework and the back end uses Laravel.
How do I add powerful search functionality, where the user enters a query in an input element and I pass it to a Laravel controller?
I then need to return relevant events based on the query.
As of now I am using a very basic algorithm: each word in the query is pushed into an array, and articles are discarded. Based on the length of the array, I try to match each word against some fields in the table and return the unique events.
Is there a better, faster and more efficient way of doing this?
You could use laravel-search.
It has support for a few advanced search methods.
I recommend using Elasticsearch for this. You can find a tutorial here: http://www.fullstackstanley.com/read/simple-search-with-laravel-and-elasticsearch. You can also use the wrapper https://github.com/mmanos/laravel-search, which supports Elasticsearch and other search engines.
Well, you added the elasticsearch tag, so I guess that's what you are interested in. Elasticsearch has no authentication system, so the best way is to create a controller in Laravel that communicates with Elasticsearch and returns the JSON response.
A basic setup will not be hard to do; anything beyond that will require you to learn more advanced Elasticsearch concepts.
Elasticsearch will handle the search algorithm for you; you just need to pass in the value the user is searching for.
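To illustrate what "passing the value" could look like, here is a sketch of a request body built around Elasticsearch's `multi_match` query. It is written in Python only for illustration; the same JSON structure would be built in the Laravel controller. The field names `title`, `description`, and `venue`, the boost on `title`, and the function name are assumptions about the events index.

```python
import json


def build_event_query(user_input, fields=("title^2", "description", "venue"), size=20):
    """Build a search request body from the raw text the user typed."""
    text = user_input.strip()
    if not text:
        # Nothing typed: fall back to listing events.
        return {"query": {"match_all": {}}, "size": size}
    return {
        "query": {
            "multi_match": {
                "query": text,            # analyzed and scored by Elasticsearch
                "fields": list(fields),   # "^2" boosts matches in that field
                "fuzziness": "AUTO",      # tolerate small typos
            }
        },
        "size": size,
    }


body = json.dumps(build_event_query("jazz concert berlin"))
```

The serialized `body` would be posted to the index's `_search` endpoint, and the hits returned to the front end as JSON.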

AngularJS sitemap SEO

I don't see any up-to-date answers on similar topics (hopefully something has changed with recent crawler releases), which is why I'm asking a specific question.
I have an AngularJS website that lists products, which can be added or removed (the links are updated accordingly). URLs have the following format:
http://example.com/#/product/564b9fd3010000bf091e0bf7/published
http://example.com/#/product/6937219vfeg9920gd903bg03/published
The product's ID (6937219vfeg9920gd903bg03) is retrieved by our back-end.
My problem is that Google doesn't list them, probably because I don't have a sitemap.xml file on my server.
On any given day a page can be added (and therefore a new URL) or removed.
How can I manage this?
Do I have to edit the file manually (or in a batch) each time?
Is there a smart way to tell Google: "Hey my friend, look at this page"?
Generally, you can create a sitemap for a JavaScript/AngularJS site, and according to this guidance from Google:
https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html
they will crawl it.
You can also use Fetch as Google to validate that the pages render correctly.
There is also a study of how Google executes JavaScript:
http://searchengineland.com/tested-googlebot-crawls-javascript-heres-learned-220157
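Rather than editing the file by hand, the sitemap could be regenerated from the back end's list of product IDs whenever a product is added or removed. A minimal sketch, assuming the IDs are available as a list; the function name and URL template are hypothetical. Whether crawlers treat `#/` fragment URLs as distinct pages is a separate question worth checking; the template below reuses the format from the question as-is.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(product_ids, url_template):
    """Return sitemap XML with one <url><loc> entry per product ID."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for pid in product_ids:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = url_template.format(id=pid)
    return ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap(
    ["564b9fd3010000bf091e0bf7", "6937219vfeg9920gd903bg03"],
    "http://example.com/#/product/{id}/published",
)
```

The output would be written to sitemap.xml (or served from a back-end route) and prefixed with an XML declaration if required.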

Angular and Symfony2 - Different Templates for Search Engines

I want to implement a search tool with AngularJS on a Symfony site. The Angular app is only part of a single template. However, the search results also need to be crawlable by search engines such as Google, Yahoo and so on. My idea was to build a switch into a Symfony action: if the action detects that the request comes from a search engine, it serves a template that contains a regular search index with the first results, without the Angular part.
The question is: could this cause problems?
