Google Streetview API returns irrelevant images - google-street-view

I would like to incorporate street-level images in an Android app. My first port of call has been the Google Street View API. Much to my surprise, the images returned often lack relevance. Here are a couple of examples:
https://maps.googleapis.com/maps/api/streetview?size=480x300&location=49.602363,6.133369&key=YOUR-API-KEY
which corresponds to the street address 45, avenue de la Gare, Luxembourg, and returns what appears to be the inside of a hotel bedroom.
https://maps.googleapis.com/maps/api/streetview?size=480x300&location=49.847802,6.098523&key=YOUR-API-KEY
which returns the image of two people walking near a field. Contrary to what I thought was standard Google policy, the people are clearly recognizable.
In both cases I notice that the images are not by Google.
Is there anything that can be done to limit the images by source, so that I only deliver imagery taken by Google, which I assume is more reliable? Alternatively, are there other sources of street-level imagery? I have looked at Mapillary, but found the imagery lacking, and at HERE Maps, which appears to have an overly complicated API.
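One avenue worth testing, sketched below: the Street View Static API documents a `source=outdoor` parameter that restricts searches to outdoor imagery, and a metadata endpoint whose `copyright` field names the image author. Checking the metadata before fetching could let you skip third-party photo spheres; treat the `copyright` check as a heuristic to verify against the current documentation, not a guarantee.

```typescript
// Sketch: only build a Street View image URL when the metadata suggests
// Google-captured outdoor imagery. The copyright test is a heuristic.
const KEY = "YOUR-API-KEY"; // placeholder, as in the question
const BASE = "https://maps.googleapis.com/maps/api/streetview";

async function googleStreetViewUrl(location: string): Promise<string | null> {
  const params = `location=${encodeURIComponent(location)}&source=outdoor&key=${KEY}`;
  const meta = await fetch(`${BASE}/metadata?${params}`).then(r => r.json());

  // status is "OK" when imagery exists; copyright names the author,
  // e.g. "© Google" for official captures.
  if (meta.status !== "OK" || !/Google/.test(meta.copyright ?? "")) {
    return null; // no official outdoor imagery at this location
  }
  return `${BASE}?size=480x300&${params}`;
}

googleStreetViewUrl("49.602363,6.133369").then(console.log);
```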

Related

Crawling and scraping random websites

I want to build a web crawler that goes randomly around the internet and puts broken (HTTP status code 4xx) image links into a database.
So far I have successfully built a scraper using the Node packages request and cheerio. I understand its limitation is websites that create content dynamically, so I'm thinking of switching to Puppeteer. Making this as fast as possible would be nice, but it is not necessary, as the server should run indefinitely.
My biggest question: Where do I start to crawl?
I want the crawler to find random webpages recursively that are likely to have content and might have broken links. Can someone suggest a smart approach to this problem?
List of Domains
In general, the following services provide lists of domain names:
Alexa Top 1 Million: top-1m.csv.zip (free)
A CSV file containing 1 million rows with the most visited websites according to Alexa's algorithms.
Verisign: Top-Level Domain Zone File Information (free IIRC)
You can ask Verisign directly via the linked page to give you their list of .com and .net domains. You have to fill out a form to request the data. If I recall correctly, the list is given free of charge for research purposes (and possibly others), but it might take several weeks until you get approval.
whoisxmlapi.com: All Registered Domains (requires payment)
The company sells all kinds of lists containing information regarding domain names, registrars, IPs, etc.
premiumdrops.com: Domain Zone lists (requires payment)
Similar to the previous one, you can get lists of domains for different TLDs.
Crawling Approach
In general, I would assume that the older a website is, the more likely it is to contain broken images (though that is already a bold assumption in itself). So, you could try to crawl older websites first if you use a list that contains the date when each domain was registered. In addition, you can speed up the crawling process by using multiple instances of Puppeteer.
To give you a rough idea of the crawling speed: if your server can crawl 5 websites per second (which requires 10-20 parallel browser instances, assuming 2-4 seconds per page), you would need roughly two days for 1 million pages (1,000,000 / 5 / 60 / 60 / 24 ≈ 2.3).
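A minimal sketch of the detection step with Puppeteer, assuming a hypothetical `urlsToCrawl` queue fed from one of the domain lists above (the database write is left as a console stub):

```typescript
import puppeteer from "puppeteer";

// Hypothetical queue, fed from a domain list (Alexa, zone files, ...).
const urlsToCrawl = ["https://example.com/"];

async function crawl(): Promise<void> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Every subresource response passes through here; flag images
  // that came back with a 4xx status.
  page.on("response", response => {
    const isImage = response.request().resourceType() === "image";
    if (isImage && response.status() >= 400 && response.status() < 500) {
      console.log(`broken image: ${response.url()}`); // replace with a DB insert
    }
  });

  for (const url of urlsToCrawl) {
    await page.goto(url, { waitUntil: "networkidle2" });
  }
  await browser.close();
}

crawl().catch(console.error);
```

Running several of these loops in parallel (one page per browser instance) is how you would reach the 5-pages-per-second estimate above.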
I don't know if that's what you're looking for, but this website renders a new random website whenever you click the New Random Website button. It might be useful if you could scrape it with Puppeteer.
I recently had this question myself and was able to solve it with the help of this post. To clarify what others have said previously: you can get lists of websites from various sources. Thomas Dondorf's suggestion to use Verisign's TLD zone file information is now outdated, as I learned when I tried contacting them. Instead, you should look at ICANN's CZDS (Centralized Zone Data Service). This service gives you access to zone file information (by request) for any TLD, not just .com and .net, allowing you to potentially crawl more websites. In terms of crawling, as you said, Puppeteer would be a great choice.

How can I analyse Instagram pictures?

I have the task of crawling and analysing a set of Instagram users' posts, including texts and pictures. I don't need to know the users' identities.
In particular, I have to use the images to train a machine learning classifier/regressor, so I need to store the pictures temporarily in order to extract visual features.
Reading the Instagram API policy, I am a bit confused:
Comply with any requirements or restrictions imposed on usage of Instagram user photos and videos ("User Content") by their respective owners. You are solely responsible for making use of User Content in compliance with owners' requirements or restrictions.
14. Only store or cache User Content for the period necessary to provide your app's service.
17. Don't apply computer vision technology to User Content, without our prior permission.
11 => What are the common owners' restrictions? I mean, when people post a picture on Instagram, do they set any restrictions, or are the pictures publicly available? And then, are there cases of pictures with restrictions?
14 => I am not building or selling any app. I need to store the pictures only for the time needed for the analysis. However, according to this point, I could store the pictures on a drive and then remove them when finished. Did I get the point, or am I wrong?
17 => I guess that this point is related to image manipulation, not analysis.
I hope that someone can clarify what I can do, considering my purpose. As an example, on this page you can see a project in which a set of Instagram pictures has been crawled, labelled, and analysed. The downloadable file doesn't contain the pictures, just their Instagram IDs.

Mapping without Google Maps (on a stand-alone server)

I've been asked to create a stand-alone site/app that's not connected to the web (all on a local server).
One part of it is to have a map of a nature reserve with a bunch of links that will show footpaths, different animals' habitat areas, visitor centres, and such.
So there's a map (static picture) and when you click on it some overlay goes on top of it.
At least that's the way I see it now.
I've looked here: http://www.carto.net/williams/yosemite/ but it just looks mucho ugly.
Getting Maps Premium is not an option, as it's not that cheap. And the reason they don't want to use the free Maps/Earth API is that the internet connection there is still very slow (satellite internet only, and nobody knows when optic cable will be hooked up).
Looking for some recommendations as to how to proceed. Drawing paths/areas on the picture of the map by hand seems extremely inefficient and time-consuming.
I'd need some way to use coordinates to automatically draw areas and lines over the map, and then somehow export that as a graphics file (or SVG) that'll be layered on top of the original map simply using Ajax.
Will ArcGIS Pro be the way to go, or should I start learning SVG? Do you know some good SVG books/tutorials (as related to mapping)? Maybe there's some other way around it altogether...
They do have detailed maps of the area in ArcGIS (what format they are in, I don't know yet).
Just looking for some ideas, any help will be appreciated. Thanks in advance.
Do you know GeoServer? It's more or less all-in-one, compatible with different types of datasets, and widely customisable.
Starting from "raw" SVG and writing the whole thing yourself will probably be prohibitively time-consuming.
If you have very little data (say, fewer than 50 geometries) that is fixed, you could also use OpenLayers without any backend server.
For the data, you could use an OpenLayers.Layer.Image if your (overlay) map consists of a small raster image. For vector data, you can use an OpenLayers.Layer.Text or an OpenLayers.Layer.Vector together with formats such as OpenLayers.Format.KML or OpenLayers.Format.GeoJSON.
You can click through the current release examples.
I admit that this is not an easy task for a beginner, but it's fun hacking the maps together.
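For a concrete starting point, here is a minimal sketch using the current OpenLayers (`ol`) npm package rather than the legacy OpenLayers 2 names above; the map image, its pixel extent, and `trails.geojson` are hypothetical placeholders:

```typescript
import Map from "ol/Map";
import View from "ol/View";
import ImageLayer from "ol/layer/Image";
import Static from "ol/source/ImageStatic";
import VectorLayer from "ol/layer/Vector";
import VectorSource from "ol/source/Vector";
import GeoJSON from "ol/format/GeoJSON";
import Projection from "ol/proj/Projection";
import { getCenter } from "ol/extent";

// Pixel extent of the scanned reserve map (hypothetical size).
const extent = [0, 0, 4000, 3000];
const pixels = new Projection({ code: "reserve-map", units: "pixels", extent });

const map = new Map({
  target: "map", // id of the <div> holding the map
  layers: [
    // The static reserve map as the base layer.
    new ImageLayer({
      source: new Static({ url: "reserve-map.png", projection: pixels, imageExtent: extent }),
    }),
    // Footpaths and habitat areas, drawn from coordinates in a GeoJSON
    // file (assumed to use the same pixel coordinate space).
    new VectorLayer({
      source: new VectorSource({ url: "trails.geojson", format: new GeoJSON() }),
    }),
  ],
  view: new View({ projection: pixels, center: getCenter(extent), zoom: 2 }),
});
```

Everything here is static files, so it runs fine on a local server with no internet connection.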

United States Weather Radar Data Feed or API?

Is there a government or private API for accessing weather radar data in the United States?
NOAA has a SOAP API: http://www.nws.noaa.gov/forecasts/xml/
Several private APIs are listed here:
http://www.programmableweb.com/apis/directory/1?apicat=Weather
I was looking for radar data a while back to overlay on a Google map. This site offers it for free, and they provide some sample code to get started for Google Maps and some other online maps:
IEM Open GIS Consortium
The map tiles they provide are not limited to radar and as far as I can tell they are all free to use.
Radarmatic has a JSON API at http://radarmatic.com/api.html
Update: link broken, project no longer active
A better way to approach this would be to use the "Weather and Climate Toolkit" offered at the Weather and Climate Toolkit homepage.
The software can batch process raw radar data, and you can get just about anything you want this way if you are able to place it on your map after processing. It can export to JSON, GeoTIFF, and some other formats. If you want more options for your app/project, this is the easiest way to do it, as you can get rain, snow, hail, wind velocity, dual-polarization products, etc. quite easily once you learn your way around the software.
Weather radar data from every WSR-88D radar site comes in two raw forms: Level-2 and Level-3. Level-2 data ("super resolution" and base data) is available from the Amazon AWS servers (NEXRAD on AWS), and Level-3 data is available from the NWS Radar Operations Center server.
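As a hedged sketch of the Level-2 route: the NEXRAD on AWS bucket is public, so you can list a site's volume files for a given day with plain anonymous S3 requests (bucket name and key layout as described in the NEXRAD on AWS documentation; verify them before relying on this):

```typescript
// List NEXRAD Level-2 volume files for one radar site and day from the
// public noaa-nexrad-level2 S3 bucket, then print direct download URLs.
const BUCKET = "https://noaa-nexrad-level2.s3.amazonaws.com";
const site = "KTLX";      // Oklahoma City radar, as an example
const day = "2016/05/09"; // keys are laid out as YYYY/MM/DD/SITE/

async function listVolumes(): Promise<void> {
  const res = await fetch(`${BUCKET}/?list-type=2&prefix=${day}/${site}/`);
  const xml = await res.text();
  // The S3 ListObjectsV2 response is XML; pull out the <Key> entries.
  const keys = [...xml.matchAll(/<Key>([^<]+)<\/Key>/g)].map(m => m[1]);
  for (const key of keys) {
    console.log(`${BUCKET}/${key}`);
  }
}

listVolumes().catch(console.error);
```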
You can get images updated every three minutes from NWS RIDGE. It's not really an API -- just images sitting in a directory -- but the naming convention and structure of the images is fully documented.

Google Wave applications

My understanding is that Google Wave is a communications and collaboration tool. But is it limited to an IM/Twitter-type interface, or can it do much more? Can it be something completely different from the top-down conversation format?
Say I want to build a collaborative photo editing app with Google Wave. Which API should I use? Or am I not getting it?
That would be a gadget, I believe (possibly combined with a robot). I'm not sure whether photo editing would really be a practical application of Wave, although a "collaborative canvas" certainly works.
The gadget would be used for the user interface side of things, and the robot could be used for more complex effects that you didn't want to implement in JavaScript. You'd add a bit of data representing "I want posterisation applied" (for example) and the robot would see that, apply the effect and then send back the modified blip with the posterised version.
The main problem I'd see with collaborative photo editing is the amount of potentially changed data for each edit. I suspect it would technically work, but it may not be great in terms of space/bandwidth usage...
If you are interested in collaborative diagramming, take a look at the video demo on the following page:
http://www.googlewaveblogger.com/collaboration/gravity-the-best-business-example-of-google-wave-period/
Midway through the video, you can see several users collaboratively editing a SAP business process (flowchart). Super cool.
There are three aspects to Google Wave:
A product: Wave is a web app using HTML5 built on GWT
A protocol: Wave also denotes the underlying format for storing and sharing waves
A platform: Wave provides a set of open APIs for developers
The platform can further be divided into Wave extensions, and the Embed API. Wave extensions include robots and gadgets, and the Embed API allows you to embed waves into third party applications and websites. A gadget is an application that runs within a wave, and a robot is an automated participant in a wave.
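To give a flavour of the gadget side, below is a rough, historical sketch of the shared-state API that Wave gadgets exposed (`wave.getState()`, `submitDelta`, `setStateCallback`); the `effect` key and the rendering stub are hypothetical, and since Wave has been discontinued, this is illustration rather than a working recipe:

```typescript
// Historical sketch of a Wave gadget script, written as TypeScript.
// "wave" was a global injected into gadgets; declared here for the compiler.
declare const wave: {
  getState(): {
    get(key: string): string | null;
    submitDelta(delta: Record<string, string>): void;
  };
  setStateCallback(cb: () => void): void;
};

// Re-render whenever any participant changes the shared state.
function onStateChanged(): void {
  const effect = wave.getState().get("effect"); // hypothetical key
  if (effect === "posterise") {
    // ...display the posterised image here...
  }
}

// Any participant can request an effect; the delta is broadcast to all
// participants, and a robot on the wave could watch for it and do the
// heavy image processing server-side.
function requestPosterise(): void {
  wave.getState().submitDelta({ effect: "posterise" });
}

wave.setStateCallback(onStateChanged);
```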
Some links that might be useful to you:
Google wave blog post: http://googleblog.blogspot.com/2009/05/went-walkabout-brought-back-google-wave.html
Google Wave API overview: http://code.google.com/apis/wave/guide.html
Google Wave Federation Architecture whitepaper: http://www.waveprotocol.org/whitepapers/google-wave-architecture
Google Wave Data Model and Client-Server Protocol whitepaper: http://www.waveprotocol.org/whitepapers/internal-client-server-protocol
Google Wave Extensions, An Inside Look: http://mashable.com/2009/06/11/google-wave-extensions/
Here is a searchable collection of Google Wave Gadgets and Robots in order to look at some examples of what you can do.
You can check out the Cards gadget, for example that has source code available.
