API For Associating Cell Number With Provider - mobile

Given a cell number in the US (although other countries would be nice too), how do I go about figuring out the provider? Are there any web services that you know of, or ETL dumps, that I can use to get this information?
Ideally this component would accept a cell phone number and return a service provider.

MX Telecom looks most promising. They have an XML interface detailed here.
Another nice one (pay to get CSV file)
Here is a list of other service providers.
Before local number portability, you could determine carrier by the 3 digits after area code. This no longer works for people who "took their numbers with them" when they switched carriers.
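As a rough sketch of the pre-portability lookup described above: the area code (NPA) plus the three digits after it (the exchange, NXX) index into an assignment table. The table contents, carrier names, and function names below are made up for illustration; real assignment data would have to come from one of the providers mentioned. For ported numbers the result is only the carrier the prefix was originally assigned to.

```python
import re

# Hypothetical NPA-NXX assignments, for illustration only (not real data).
PREFIX_TO_CARRIER = {
    ("555", "010"): "Example Wireless",
    ("555", "011"): "Sample Mobile",
}


def split_us_number(raw):
    """Return (area_code, exchange, line) for a 10-digit US number."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit US number: {raw!r}")
    return digits[:3], digits[3:6], digits[6:]


def original_carrier(raw):
    """Look up the carrier the prefix was originally assigned to, or None."""
    area_code, exchange, _ = split_us_number(raw)
    return PREFIX_TO_CARRIER.get((area_code, exchange))
```

A call such as original_carrier("+1 (555) 010-1234") returns the table entry for prefix 555-010. Whether that is still the current carrier cannot be told from the prefix alone.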

I don't know about other countries. Here is a list of codes for Indian mobile phones. I am not aware of any web service for this, but the list can be stored in a DB and looked up.


Crawling and scraping random websites

I want to build a web crawler that goes randomly around the internet and puts broken (HTTP status code 4xx) image links into a database.
So far I have successfully built a scraper using the Node packages request and cheerio. I understand the limitation is websites that create content dynamically, so I'm thinking of switching to puppeteer. Making this as fast as possible would be nice, but is not necessary, as the server should run indefinitely.
My biggest question: Where do I start to crawl?
I want the crawler to find random webpages recursively, that likely have content and might have broken links. Can someone help to find a smart approach to this problem?
List of Domains
In general, the following services provide lists of domain names:
Alexa Top 1 Million: top-1m.csv.zip (free)
CSV file containing 1 million rows with the most visited websites according to Alexa's algorithms.
Verisign: Top-Level Domain Zone File Information (free IIRC)
You can ask Verisign directly via the linked page to give you their list of .com and .net domains. You have to fill out a form to request the data. If I recall correctly, the list is given free of charge for research purposes (maybe also for other reasons), but it might take several weeks until you get the approval.
whoisxmlapi.com: All Registered Domains (requires payment)
The company sells all kind of lists containing information regarding domain names, registrars, IPs, etc.
premiumdrops.com: Domain Zone lists (requires payment)
Similar to the previous one, you can get lists of different domain TLDs.
Crawling Approach
In general, I would assume that the older a website, the more likely it might be that it contains broken images (but that is already a bold assumption in itself). So, you could try to crawl older websites first if you use a list that contains the date when the domain was registered. In addition, you can speed up the crawling process by using multiple instances of puppeteer.
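The "older websites first" idea can be sketched as a simple ordering step before crawling. This assumes your list gives you (domain, registration date) pairs; the function name and the choice to put domains with an unknown date last are my own assumptions.

```python
from datetime import date


def crawl_order(domains):
    """Order (domain, registration_date) pairs oldest first.

    Entries whose registration date is None are placed at the end.
    """
    return sorted(domains, key=lambda d: (d[1] is None, d[1] or date.max))
```

The crawler would then take domains from the front of the returned list.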
To give you a rough idea of the crawling speed: if your server can crawl 5 websites per second (which requires 10-20 parallel browser instances, assuming 2-4 seconds per page), you would need a little over two days for 1 million pages (1,000,000 / 5 / 60 / 60 / 24 ≈ 2.3).
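The same back-of-the-envelope estimate as a small helper, so you can plug in your own numbers. The function names are mine, and it assumes each browser instance handles one page at a time.

```python
import math


def crawl_days(pages, pages_per_second):
    """Days needed to crawl `pages` at a steady rate."""
    return pages / pages_per_second / 60 / 60 / 24


def browsers_needed(pages_per_second, seconds_per_page):
    """Parallel browser instances needed to sustain the rate."""
    return math.ceil(pages_per_second * seconds_per_page)
```

With 1,000,000 pages at 5 pages per second, crawl_days gives roughly 2.3 days. At 2 and 4 seconds per page, browsers_needed gives 10 and 20 instances respectively.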
I don't know if that's what you're looking for, but this website renders a new random website whenever you click the New Random Website button. It might be useful if you can scrape it with puppeteer.
I recently had this question myself and was able to solve it with the help of this post. To clarify what other people have said previously, you can get lists of websites from various sources. Thomas Dondorf's suggestion to use Verisign's TLD zone file information is currently outdated, as I learned when I tried contacting them. Instead, you should look at ICANN's CZDS (Centralized Zone Data Service). This website allows you to access TLD zone file information (by request) for other TLDs as well, not just .com and .net, allowing you to potentially crawl more websites. In terms of crawling, as you said, Puppeteer would be a great choice.

Storing settings on the database for a web app?

I'm developing an open-source web application (a helpdesk) that users will download and install. The application will have some settings, for example: title, colors, default e-mail, logs. These settings will be edited by the user in the admin panel, because most users will not know how to change them in code.
My question is: what is the best way to store these in a (MySQL) database model? Keep in mind that the application will be upgraded over time and will add more settings to that settings table.
Thank you in advance
There are a lot of different ways to do this, and it depends on what you think the final needs will be.
There's nothing wrong with saving parameters in a dedicated table, each parameter/user/value on a separate row. This is a fast way to set and get the information, and allows you to easily get access to reports by parameter and value and user, for example, what colors are the most popular.
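A minimal sketch of the one-row-per-parameter approach. It uses Python's built-in sqlite3 as a stand-in for MySQL so that it runs anywhere. The table and column names are assumptions, and in MySQL the upsert would be written with REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE instead.

```python
import sqlite3


def create_schema(conn):
    # One row per (user, parameter); new parameters need no schema change.
    conn.execute(
        "CREATE TABLE settings ("
        " user_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (user_id, name))"
    )


def set_setting(conn, user_id, name, value):
    # SQLite upsert; overwrites the existing row for this user/parameter.
    conn.execute(
        "INSERT OR REPLACE INTO settings (user_id, name, value) VALUES (?, ?, ?)",
        (user_id, name, value),
    )


def get_setting(conn, user_id, name, default=None):
    row = conn.execute(
        "SELECT value FROM settings WHERE user_id = ? AND name = ?",
        (user_id, name),
    ).fetchone()
    return row[0] if row else default


def popular_values(conn, name):
    """Report: how often each value of a parameter is used, most common first."""
    return conn.execute(
        "SELECT value, COUNT(*) FROM settings WHERE name = ?"
        " GROUP BY value ORDER BY COUNT(*) DESC, value",
        (name,),
    ).fetchall()
```

The popular_values query is the kind of report mentioned above (for example, which colors are most popular). It is easy to write here because every value sits in its own row.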
If you are just using this for configuration, you can store the parameters as an XML or JSON string in a text/blob field. This has the benefit of giving you a single load to get all of the parameter values. Even more powerful, if your application already has default values for the parameters, you can easily extend the application without changing the database records. For a large number of parameters, this reduces the number of DB calls to load up all the parameters.
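A small sketch of the JSON variant with application-side defaults. The default values and function names are invented for the example. The stored string would live in a single text column, and only values that differ from the defaults need to be saved.

```python
import json

# Hypothetical defaults shipped with the application code.
DEFAULTS = {
    "title": "Helpdesk",
    "color": "#336699",
    "default_email": "support@example.com",
    "logs": True,
}


def load_settings(stored_json):
    """Merge the stored JSON string (or None) over the defaults."""
    stored = json.loads(stored_json) if stored_json else {}
    return {**DEFAULTS, **stored}


def dump_overrides(settings):
    """Serialize only the values that differ from the defaults."""
    overrides = {
        key: value
        for key, value in settings.items()
        if key not in DEFAULTS or DEFAULTS[key] != value
    }
    return json.dumps(overrides, sort_keys=True)
```

When a later version adds a new key to DEFAULTS, existing installations pick it up on the next load_settings call without any change to their database record.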

Is there a way to access to the database of Google Maps for the public transport?

I'd like to have access to the line numbers and types of public transportation in a certain city and, for each line, its stops. So for instance, I'd like to have:
Number  Type   Stops
1       Metro  Stop1.1, Stop1.2, Stop1.3, ...
6       Bus    Stop6.1, Stop6.2, Stop6.3, ...
17      Tram   Stop17.1, Stop17.2, Stop17.3, ...
...     ...    ...
Of course, I don't really care about the format. I just want to know how to get access to the data, so that I don't have to re-enter it manually on my website!
Thanks for any help :-)
GTFS Data Exchange
GTFS Exchange was retired in 2016.
It is still useful for legacy GTFS files that are no longer updated or available elsewhere. However, you should now refer to the following databases for static GTFS:
Transitland
TransitFeeds
TransitWiki
Google's own compilation: PublicFeeds.wiki
They overlap a lot but not completely, so it's always good to check all of them out. Also know that some smaller transit agencies still don't share their data with these services, so it's worth checking their website.
Finally, to my knowledge, Google Maps uses these same files as its source data. In their reference page for GTFS, they mention the following:
Submitting a Transit feed to Google
If you're at a public agency that oversees public transportation for your city, you can use the GTFS specification to provide schedules and geographic information to Google Maps and other Google applications that show transit information.
If you provide a transportation service that is open to the public, and operates with fixed schedules and routes, we welcome your participation; it is simple and free.
For live updates though, I think that Google uses GTFS Realtime.
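Once you have a static GTFS feed, a table like the one in the question can be built by joining routes.txt, trips.txt, stop_times.txt and stops.txt. Here is a rough sketch using only the standard csv module. The column names come from the GTFS reference, while the function names and the shortcut of taking the first listed trip as representative for each route are my own simplifications (routes with several variants will need more care).

```python
import csv
import io

# Subset of the route_type codes defined in the GTFS reference.
ROUTE_TYPES = {"0": "Tram", "1": "Metro", "2": "Rail", "3": "Bus"}


def read_rows(text):
    """Parse the contents of one GTFS .txt file into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def stops_per_route(routes, trips, stop_times, stops):
    """Return {route_short_name: (type, [stop names in order])}."""
    stop_names = {s["stop_id"]: s["stop_name"] for s in stops}

    # Simplification: use the first trip listed for each route.
    first_trip = {}
    for trip in trips:
        first_trip.setdefault(trip["route_id"], trip["trip_id"])

    times_by_trip = {}
    for st in stop_times:
        times_by_trip.setdefault(st["trip_id"], []).append(st)

    result = {}
    for route in routes:
        trip_id = first_trip.get(route["route_id"])
        ordered = sorted(
            times_by_trip.get(trip_id, []),
            key=lambda st: int(st["stop_sequence"]),
        )
        kind = ROUTE_TYPES.get(route["route_type"], route["route_type"])
        names = [stop_names[st["stop_id"]] for st in ordered]
        result[route["route_short_name"]] = (kind, names)
    return result
```

For a real feed, read each file from the unzipped archive (for example with open(path, newline="", encoding="utf-8-sig")) and pass its contents to read_rows.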
Is there a way... in a word, no. This data isn't available through the Maps API (yet).

Silverlight application global data

I am re-making a Console game, that my boss made a considerably long time ago, in Silverlight. It's totally text based. In the Console version, each computer that had it installed had its own map, which was divided into a grid of rooms.
What I want to do is make the map global; when anybody runs the Silverlight version, they will all see the same map. Existing rooms never change; new rooms are only ever added to the map.
So, currently I'm storing all the data in IsolatedStorage, which is obviously not global. How should I store the data and retrieve it so that everybody playing can see the same map?
If it helps any, the server that it will be hosted on is a Linux server, and has MySQL.
See this answer to a person who was trying to do something very similar (he wanted high score data, you want map data): High Scores self contained in .xap
The fact that you are running a Linux server complicates things a little. Instead of running a WCF or asmx service, you could consider a Java-based web service, or just make a normal HTTP page that queries the MySQL database and returns data which your Silverlight app can request and consume (this is still a "web service", albeit a very primitive one).
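To illustrate how primitive such an HTTP endpoint can be, here is a sketch using only Python's standard library. Python is just one option that runs on a Linux host, not something the question asked for. The /map path, the room fields, and load_rooms (a placeholder standing in for the actual MySQL query) are all assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def load_rooms():
    """Placeholder for the MySQL query; returns hard-coded sample rooms."""
    return [
        {"x": 0, "y": 0, "description": "Start room"},
        {"x": 0, "y": 1, "description": "Dark corridor"},
    ]


def rooms_payload(rooms):
    """Encode the room list as a UTF-8 JSON response body."""
    return json.dumps({"rooms": rooms}).encode("utf-8")


class MapHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/map":
            self.send_error(404)
            return
        body = rooms_payload(load_rooms())
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the endpoint until interrupted."""
    HTTPServer(("", port), MapHandler).serve_forever()
```

Calling serve() starts the endpoint. The Silverlight client would then request /map and parse the returned JSON, and adding a room would be a second endpoint that inserts a row.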

Web application for managing elections campaign

I’m trying to help a friend in his election campaign.
We mainly need a tool to manage a list of possible voters. We need to be able to:
1. Easily update details about the voters, and
2. Query for voters according to various parameters, and show and print the resulting lists
To enable campaigners to work from multiple workstations, we would like the system to be distributed, probably web based.
We would also like that to be in Hebrew, if possible.
Is there any existing tool that easily enables it?
If not, can you recommend an easy way to implement such a tool?
(I have a solid programming knowledge, but not much time to devote to that)
You can achieve this easily with iFreeTools Creator. Just create the entities and attributes for Voters and add campaigners as users providing their Google email-id.
Regarding your requirements:
* This app is web-based. It runs on Google App Engine.
* The interface is English only, but data can be in Unicode. Entity names and attribute names are also "data", so they can be in Unicode too.
Other related features which might be useful in this context:
* You can import voter list using CSV files.
* Campaigners can search for voters near their workstation by filtering records by proximity to a geo-location.
// Disclosure : I wrote code for this web-app. Hope you like it. Feedback welcome.
Some possible answers might be found in the same question I asked in the web apps forum
