Developing visualization on zip code level data

I'm trying to use the zip code data from the Census (any year that works). I have 5-digit zip code data that I need to match up to a map/shapefile. The problem is that the Census ZIP Code Tabulation Area (ZCTA) files, which can be found here http://www.census.gov/cgi-bin/geo/shapefiles2010/layers.cgi, have holes in them. I am assuming these are lakes, parks, and wilderness areas. However, none of the zip code maps I've seen have these holes. Is there a resource or method for combining these to create a solid zip code area?
Also, are there any alternatives to ZIP Code Tabulation Areas? Unfortunately, Google Maps will not work for what I am doing. :(

Your assumption about the holes in your coverage is correct. According to the Census Bureau's information on ZIP Code Tabulation Areas: "For the 2010 Census, large water bodies and large unpopulated land areas do not have ZCTAs."
Although zip codes aren't really area features, just mail-routing constructs, there are other zip code shapefiles that cover the contiguous US. See this GIS StackExchange post https://gis.stackexchange.com/questions/2682/most-up-to-date-source-for-us-zip-code-boundaries for pointers to them.
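If you stay with the ZCTA files and only want to remove the holes inside each ZCTA, one simple method is to keep just the outer boundary of every polygon. This is a minimal sketch, assuming the shapefile has already been converted to GeoJSON-style geometries (for example with `ogr2ogr -f GeoJSON`); the function name is made up for the example. It fills holes inside a ZCTA, but it cannot close the gaps between ZCTAs where the Census Bureau assigned no ZCTA at all.

```python
def fill_holes(geometry: dict) -> dict:
    """Return a copy of a GeoJSON Polygon/MultiPolygon with its interior rings removed.

    In GeoJSON, a Polygon's "coordinates" is a list of linear rings: the first
    ring is the exterior boundary and any further rings are holes.
    """
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Polygon":
        # Keep only the exterior ring.
        return {"type": "Polygon", "coordinates": [coords[0]]}
    if kind == "MultiPolygon":
        # Apply the same rule to every member polygon.
        return {"type": "MultiPolygon", "coordinates": [[polygon[0]] for polygon in coords]}
    raise ValueError(f"unsupported geometry type: {kind}")


# Hypothetical ZCTA with one hole (e.g. a lake).
zcta = {
    "type": "Polygon",
    "coordinates": [
        [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]],  # exterior
        [[4, 4], [6, 4], [6, 6], [4, 6], [4, 4]],      # hole
    ],
}
solid = fill_holes(zcta)
```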

How to generate text from Wikidata JSON

I am looking for pointers for libraries or methods that would be able to generate full text from the structured information returned by Wikidata - if possible in multiple languages.
To be clearer: from data like the one provided here (this is the JSON version), I would like to be able to generate text similar to the intro paragraph of the Wikipedia page for the same item:
Orvieto Cathedral (Italian: Duomo di Orvieto; Cattedrale di Santa Maria Assunta) is a large 14th-century Roman Catholic cathedral dedicated to the Assumption of the Virgin Mary and situated in the town of Orvieto in Umbria, central Italy.
The reason is that Wikipedia provides this text whenever a page exists, but I would also like to have something for the Wikidata items that have no Wikipedia page.
My problem #1 here is: I don't know what something like this is called, so I have no idea what to google for. Any pointers to start from are appreciated, including services or APIs.

This problem falls under the data-to-text generation task. I do not know of any services that currently offer a solution, but you can look at the WebNLG challenge, which has the same objective and similar data. AFAIK, mostly template-based methods are used to automatically insert data from Wikidata into Wikipedia as text.
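To make "template-based" concrete, here is a minimal sketch. It assumes the item's statements (for example P31 "instance of", P131 "located in the administrative territorial entity", P17 "country") have already been resolved from item IDs to labels in the target language; the templates, field names, and the `generate_intro` function are hypothetical. Each language needs its own templates, because articles, gender, and word order differ.

```python
# Hypothetical per-language templates, with a shorter fallback when facts are missing.
TEMPLATES = {
    "en": {
        "full": "{label} is {article} {instance_of} in {located_in}, {country}.",
        "short": "{label} is {article} {instance_of}.",
    },
}


def english_article(noun: str) -> str:
    # Naive rule based on the first letter; it gets words like "hour" or "university" wrong.
    return "an" if noun[:1].lower() in ("a", "e", "i", "o", "u") else "a"


def generate_intro(facts: dict[str, str], lang: str = "en") -> str:
    templates = TEMPLATES[lang]
    values = dict(facts)
    if lang == "en":
        values["article"] = english_article(facts["instance_of"])
    # Use the full template only when every fact it needs is available.
    has_location = all(key in facts for key in ("located_in", "country"))
    return templates["full" if has_location else "short"].format(**values)


# Facts already resolved from Wikidata to English labels (hypothetical input).
orvieto = {
    "label": "Orvieto Cathedral",
    "instance_of": "cathedral",
    "located_in": "Orvieto",
    "country": "Italy",
}
print(generate_intro(orvieto))  # Orvieto Cathedral is a cathedral in Orvieto, Italy.
```

Real systems layer many such templates (and aggregation rules that merge facts into one sentence) on top of this basic idea.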

How can I detect visual blocks in a PDF?

I'm trying to OCR resumes. My first problem is getting the main blocks of a document before running OCR.
Since all resumes have "visual blocks" (professional experience, skills, languages, hobbies, whatever...), I wonder if there's an open-source solution to split a document into blocks, regardless of its layout design (that's where some kind of AI would come in, I assume).
Thank you

First, decompress your PDF using zlib (most PDF content streams use the FlateDecode filter, which is zlib compression).
You will then be able to see the PDF in a readable format: https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF#A_first_example
The PDF format is somewhat similar to PostScript.
Also try converting your PDF to PostScript to see how the contents are arranged.
You can decompress the PDF using pdf-parser: https://blog.didierstevens.com/2008/10/30/pdf-parserpy/
Try this as well: https://gist.github.com/averagesecurityguy/ba8d9ed3c59c1deffbd1390dafa5a3c2
Once you can see how your data is presented, you can start applying algorithms to extract more meaning.
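One very simple algorithm of that kind: once you have text lines with vertical positions (from the `Td`/`Tm` operators in the decompressed streams, or from an OCR engine's line boxes), start a new block wherever the vertical gap between consecutive lines is clearly larger than the normal line spacing. This is a sketch with hypothetical input and an arbitrary `gap_factor`; PDF y coordinates grow upward, so convert them first (e.g. `top = page_height - y`). Multi-column resume layouts would also need splitting on horizontal position.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    top: float     # distance from the top of the page (grows downward, as in OCR output)
    height: float


def group_into_blocks(lines: list[Line], gap_factor: float = 1.5) -> list[list[str]]:
    """Start a new block whenever the gap above a line exceeds gap_factor * previous line height."""
    blocks: list[list[str]] = []
    previous = None
    for line in sorted(lines, key=lambda item: item.top):
        if previous is None or line.top - (previous.top + previous.height) > gap_factor * previous.height:
            blocks.append([])
        blocks[-1].append(line.text)
        previous = line
    return blocks


lines = [
    Line("EXPERIENCE", 100, 12),
    Line("Acme Corp, 2015-2018", 114, 10),
    Line("SKILLS", 160, 12),
    Line("Python, SQL", 174, 10),
]
print(group_into_blocks(lines))
```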

Interpolating custom data onto a PDF

I am building an Angular test preparation app (with Laravel 5.1 API). One of the requirements is to allow the user to print a certificate of achievement.
The client wants the person's name and credentials interpolated into the document (e.g., highlighted below). Here is a snapshot of the PDF template they sent:
The way I'm handling PDF viewing is simply by storing the file on S3 and giving them a link to that file.
Interpolating information into a PDF doesn't seem trivial, and I haven't found much information on doing it programmatically, but there are tools like DocHub that allow you to edit while viewing the PDF.
I'm interested in learning:
is doing this programmatically trivial?
are there 3rd party tools I'm unaware of?
would I even be able to send this information along to the S3 link to interpolate in the first place?

Using PDF as a format for editing is usually a bad choice. If you have a form with fixed fields, then it's easy. Create a PDF template with an interactive form. In this form, based on AcroForm technology, you'll define fields with fixed coordinates, and a fixed size. You can then add content to these fields.
One major disadvantage of this approach is the lack of flexibility. Did you notice that I used the word "fixed" three times in the previous paragraph? If text doesn't fit the predefined field, you're out of luck. If the field is oversized, you'll end up with plenty of white space. This approach is great if you can predict what the data will be like. A typical use case is a ticket or a voucher: the empty form is a really nice page, with only a couple of fields where an automated system can put a name, a date, a time, and a seat number.
This isn't the best approach for the example you show in your screenshot. In a PDF, the position of every line of text, every word, every character is known in advance. If you want to replace a short word with a long word (or vice versa), then all those positions (of each line, of the complete page, possibly of the complete document) need to be recalculated. That's madness. Only people with very poor design skills come up with such an idea.
A better idea is to store the template as HTML. See, for instance, chapter 5 of iText's pdfHTML tutorial, where we have this snippet of HTML:
<html>
<head>
<title>Invitation to SXSW 2018</title>
</head>
<body>
<u><b>Re: Invitation</b></u>
<br>
<p>Dear <name>SXSW visitor</name>,
we hope you had a great SXSW film festival experience last year.
And we would like to invite you to the next edition of SXSW Film
that takes place from March 9 until March 17, 2018.</p>
<p>Sincerely,<br>
The SXSW crew<br>
<date>August 4, 2017</date></p>
</body>
</html>
Actually, it's not really HTML, because the <name> tag and the <date> tag don't exist in HTML. All HTML processors (browsers as well as pdfHTML) ignore those tags and treat their content as if the tag were a <span>.
It doesn't make much sense to have such tags in the context of pure HTML, but it makes a lot of sense in the case of pdfHTML. With pdfHTML, you can configure custom tags and get a result that looks like the PDFs shown below:
Look at the document for "John Doe" and compare it with the document for "Bruno Lowagie". The name "John Doe" is much shorter than my name, hence more words fit on that first line. The text flows nicely (we could also have chosen to justify the text on both sides). This "flow" is impossible to achieve with your approach, because you will never get a PDF template to reflow nicely.
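In pdfHTML, the custom tags are configured on the Java side. As a language-neutral illustration of the same template idea, here is a Python sketch that fills the content of the <name> and <date> elements before the HTML is handed to whatever HTML-to-PDF converter you use. It is a swapped-in string-templating step, not pdfHTML's API; the function name and values are hypothetical.

```python
import html
import re

TEMPLATE = """<p>Dear <name>SXSW visitor</name>,
we hope you had a great SXSW film festival experience last year.</p>
<p>Sincerely,<br>
The SXSW crew<br>
<date>August 4, 2017</date></p>"""


def fill_custom_tags(template: str, values: dict[str, str]) -> str:
    """Replace the content of each <tag>...</tag> named in values, HTML-escaping the new text."""
    if not values:
        return template
    # Only match the custom tags we were asked to fill, so <p>, <br>, etc. are left alone.
    names = "|".join(re.escape(name) for name in values)
    pattern = re.compile(rf"<({names})>(.*?)</\1>", re.DOTALL)
    return pattern.sub(
        lambda m: f"<{m.group(1)}>{html.escape(values[m.group(1)])}</{m.group(1)}>",
        template,
    )


print(fill_custom_tags(TEMPLATE, {"name": "John Doe", "date": "March 1, 2018"}))
```

Because the text is still flowing HTML at this point, a longer name simply reflows when the page is rendered, which is the advantage described above.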
OK, I get it, you probably say, but what about the practical aspects? You talk about a Java / .Net library, but I am working with Laravel and Angular.js. First, let me tell you that I don't think you'll find any good PDF tools for Laravel or Angular.js, because of the nature of PDF and those development environments (in my opinion, those technologies don't play well together). Regardless of my opinion, this shouldn't be much of a problem for you because you work in an Amazon environment. AWS supports Java, and the Java code needed to get pdfHTML working is minimal. Most of the code samples I wrote for the pdfHTML tutorial are shorter than 15 lines. So why not try Java and pdfHTML?

If you're already using Amazon services, why not use an AWS Lambda function, in combination with iText 7 (Java), to generate the PDF on demand?
That way, you can be sure that the PDF is correct and has a nice layout every time.
Generating the PDF can be done by:
1. converting HTML,
2. programmatically creating your entire document, or
3. filling and flattening an XFA form.
I think for your use case, option 1 or 2 is the most sustainable.
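To show what option 2 means at the lowest level, here is a stdlib-only Python sketch that writes a one-page PDF by hand: catalog, page tree, page, content stream, the standard Helvetica font, and the cross-reference table. It only illustrates the file structure and is not a substitute for iText: the function name and layout are made up, text sits at fixed coordinates with no line wrapping, and only Latin-1 text is supported.

```python
def _pdf_string(text: str) -> bytes:
    # Escape the characters that are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("latin-1") + b")"


def build_certificate_pdf(name: str, credential: str) -> bytes:
    # Page content: three text objects at fixed positions (points from the bottom-left).
    content = b"\n".join([
        b"BT /F1 28 Tf 72 700 Td " + _pdf_string("Certificate of Achievement") + b" Tj ET",
        b"BT /F1 18 Tf 72 640 Td " + _pdf_string(f"Awarded to {name}") + b" Tj ET",
        b"BT /F1 14 Tf 72 610 Td " + _pdf_string(credential) + b" Tj ET",
    ])
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for object 0 (head of the free list)
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_offset)
    return bytes(out)
```

Every position here is hard-coded, which is exactly why a library that handles layout (or an HTML template) is the more sustainable choice.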

Online Maps (Google, Nokia)

Is it possible to highlight a list of countries with a different colors?
I need to display some countries' statistics on the world map.
Right now I use an image and fill each region with a color (calculated for each country) based on the country's coordinates. It's a simple solution and it works well. But now I also need to show the countries' names (and I don't think that's the last customization).
There is a polygon solution, but it uses an array of coordinates. I don't think it's a suitable way to highlight countries' territories.
I haven't found a solution yet. Any suggestions?
Thanks in advance.

Highlighting countries or regions to support statistics is known as choropleth mapping, but unfortunately there usually isn't direct library support for choropleth maps bundled into an online map API. This means you'll have to create your own framework, but fortunately it is possible to create one: I wrote an example using jQuery + HERE Maps to answer the question here.
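At its core, a choropleth is just a mapping from each region's statistic to a fill color, which can be computed independently of any map API. A minimal sketch, assuming a linear scale between two endpoint colors (the defaults are arbitrary) and hypothetical region codes as keys; the resulting hex strings would then be used as the fill style of each country polygon.

```python
def lerp_color(low: str, high: str, t: float) -> str:
    """Linearly interpolate between two '#rrggbb' colors; t is clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    a = [int(low[i:i + 2], 16) for i in (1, 3, 5)]
    b = [int(high[i:i + 2], 16) for i in (1, 3, 5)]
    return "#" + "".join(f"{round(x + (y - x) * t):02x}" for x, y in zip(a, b))


def choropleth_colors(values: dict[str, float], low: str = "#ffffcc", high: str = "#800026") -> dict[str, str]:
    """Map each region code to a color proportional to its value within the data's range."""
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return {region: lerp_color(low, high, (v - lo) / span) for region, v in values.items()}


print(choropleth_colors({"FRA": 10, "DEU": 20, "ITA": 30}))
```

A linear scale is the simplest choice; classed scales (quantiles, fixed breaks) are often easier to read when the data is skewed.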
Updated WKT solution now available
Access to KML shapes is no longer required, since the Geocoder API now offers an IncludeShapes attribute which returns the shape of a country in WKT format. A WKT parser can be found here.
A simple WKT choropleth example can be found here.
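For simple shapes, WKT is also easy to read yourself: a POLYGON is a parenthesized list of rings, each a comma-separated list of "x y" pairs (for geographic data usually longitude, then latitude), and a MULTIPOLYGON wraps several polygons. This deliberately small sketch handles only 2D POLYGON and MULTIPOLYGON (no Z/M values, no EMPTY) and returns nested lists of (x, y) tuples; it is not the linked parser.

```python
import re

RING_RE = r"\([^()]*\)"                                   # innermost "( x y, x y, ... )"
POLYGON_RE = rf"\(\s*{RING_RE}(?:\s*,\s*{RING_RE})*\s*\)"  # "( ring, ring, ... )"


def _ring(text: str) -> list[tuple[float, float]]:
    points = []
    for pair in text.strip()[1:-1].split(","):
        x, y = pair.split()  # 2D only; Z/M values would raise here
        points.append((float(x), float(y)))
    return points


def _polygon(text: str) -> list[list[tuple[float, float]]]:
    # First ring is the exterior, any further rings are holes.
    return [_ring(ring) for ring in re.findall(RING_RE, text)]


def parse_wkt(wkt: str):
    kind, _, rest = wkt.strip().partition("(")
    kind = kind.strip().upper()
    rest = "(" + rest
    if kind == "POLYGON":
        return kind, _polygon(rest)
    if kind == "MULTIPOLYGON":
        inner = rest.strip()[1:-1]  # drop the outermost parentheses
        return kind, [_polygon(p) for p in re.findall(POLYGON_RE, inner)]
    raise ValueError(f"unsupported WKT type: {kind!r}")


print(parse_wkt("POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))"))
```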
KML-based solution
For any framework, you will need a file holding the boundaries of the countries or regions you need. The example uses a KML file, but you could also start with polygons if you have them. Country borders are a political minefield, which I guess is the reason most online mapping APIs steer clear of them. As a hint: try starting with something like http://geocommons.com/overlays/119819 and simplify it as much as possible to speed up the rendering; many small wiggles in the coastlines and small outlying islands are unnecessary.
Of course, you could also search for "create choropleth map" in a search engine of your choice, use a tool to create a static image for your data (potentially at several zoom levels), and then use this as the basis of a map tile overlay. This requires a lot more work up front, but it pushes all the calculations to the server side and hence is faster to display.
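The answer doesn't name a simplification method; a standard choice is the Ramer-Douglas-Peucker algorithm, sketched below in plain Python. The tolerance is in the same units as the coordinates (degrees for lon/lat data) and has to be tuned by eye. This variant measures distance to the segment between the endpoints rather than to the infinite line.

```python
import math

Point = tuple[float, float]


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points: list[Point], tolerance: float) -> list[Point]:
    """Ramer-Douglas-Peucker: drop points closer than `tolerance` to the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the point farthest from the segment between the endpoints.
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```

On closed rings, aggressive tolerances can collapse tiny islands to two or three points; dropping such rings afterwards matches the advice above to remove small outlying islands.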
Working example can be found on GitHub here

You could put the country's name into the image. It's not that difficult to place text into an image. The only tricky bit is if you are using tiles, you need to deal with names that cross tile boundaries by drawing the name once for each tile.
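The tile-boundary case can be handled by computing, for a label's bounding box in global pixel coordinates, every tile it touches and the offset at which to draw the same text in each one. This sketch assumes the common 256x256 Web Mercator tile scheme; the function names are made up, and wrap-around at the antimeridian is ignored.

```python
import math

TILE_SIZE = 256  # assumption: standard 256x256 px Web Mercator tiles
MAX_LAT = 85.05112878  # Web Mercator's latitude limit


def lonlat_to_pixel(lon: float, lat: float, zoom: int) -> tuple[float, float]:
    """Global pixel coordinates of a point at the given zoom level."""
    scale = TILE_SIZE * 2 ** zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    siny = math.sin(math.radians(lat))
    x = (lon + 180.0) / 360.0 * scale
    y = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * scale
    return x, y


def tiles_for_label(x: float, y: float, width: float, height: float):
    """Yield (tile_x, tile_y, offset_x, offset_y) for every tile the label box touches.

    (x, y) is the label's top-left corner in global pixels; drawing the text at
    (offset_x, offset_y) inside each tile makes the pieces line up across tiles.
    """
    first_tx, last_tx = math.floor(x / TILE_SIZE), (math.ceil(x + width) - 1) // TILE_SIZE
    first_ty, last_ty = math.floor(y / TILE_SIZE), (math.ceil(y + height) - 1) // TILE_SIZE
    for tx in range(first_tx, last_tx + 1):
        for ty in range(first_ty, last_ty + 1):
            yield tx, ty, x - tx * TILE_SIZE, y - ty * TILE_SIZE


# A 20x12 px label starting 6 px before the right edge of tile (0, 0).
print(list(tiles_for_label(250, 10, 20, 12)))
```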

Google Wave - basic structure

Is a wave limited to the sharing of textual information (HTML), or am I correct in assuming that a wave can contain arbitrary data (represented in XML), as long as it also contains the JavaScript necessary to render it in a meaningful way?
I ask because the collaborative document preparation demonstrated in the Google I/O video looks very powerful, but there are many other types of documents besides simple RTF text. In my case, I would be looking to develop Gantt charts interactively.

There is a lot that can be done inside each wave. Not all features have been made available yet, but here is a link to some samples: http://wave-samples-gallery.appspot.com/ . The gallery includes my Slashdot Gadget: http://wave-samples-gallery.appspot.com/about_app?app_id=18006
The Slashdot Gadget actually takes the RSS feed for Slashdot and displays the latest headlines.
Here is the XML: http://www.m1cr0sux0r.com/slashdot.xml

I got access to Google Wave a few days ago, and here's what the raw data for their Sokoban game (which supports two players playing simultaneously on the same board) looks like, for example:
<blip>
<p _t="title">
</p>
<p>
<w:gadget author="blixt#wavesandbox.com" prefs="" state="" title="" url="http://sokoban-server.appspot.com/com.example.simplegadget.client.SokobanGadget.gadget.xml">
<w:pref name="playerAllocation" value="1 1,blixt">
</w:pref>
<w:pref name="totalMoves" value="8">
</w:pref>
<w:pref name="playerPositions" value="1 4,2">
</w:pref>
<w:pref name="rockPositions" value="6 2,2 3,2 14,2 15,2 16,2 4,3">
</w:pref>
</w:gadget>
</p>
</blip>
So yes, you can store any data you like in a single blip, with the possibility to go backwards in "time" to see older versions of the data etc.
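The gadget state in such a blip is just name/value pairs in the <w:pref> elements, so it can be read with an ordinary XML parser. The raw dump above uses the w: prefix without declaring it, so this Python sketch wraps it in a root element that declares a placeholder namespace URI (urn:example:wave, made up for the example, not Wave's real one) before parsing it with xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:wave"  # placeholder; the real namespace URI is not shown in the dump

BLIP = """<blip>
<p _t="title">
</p>
<p>
<w:gadget author="blixt#wavesandbox.com" prefs="" state="" title="" url="http://sokoban-server.appspot.com/com.example.simplegadget.client.SokobanGadget.gadget.xml">
<w:pref name="playerAllocation" value="1 1,blixt">
</w:pref>
<w:pref name="totalMoves" value="8">
</w:pref>
<w:pref name="playerPositions" value="1 4,2">
</w:pref>
<w:pref name="rockPositions" value="6 2,2 3,2 14,2 15,2 16,2 4,3">
</w:pref>
</w:gadget>
</p>
</blip>"""


def gadget_prefs(blip_xml: str) -> dict[str, str]:
    """Return the name/value pairs stored in every <w:pref> element of a blip."""
    # Declare the otherwise undeclared "w" prefix so the XML is well-formed.
    root = ET.fromstring(f'<root xmlns:w="{NS}">{blip_xml}</root>')
    return {pref.get("name"): pref.get("value") for pref in root.iter(f"{{{NS}}}pref")}


print(gadget_prefs(BLIP))
```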
By the way, if you're interested in seeing code for a robot that sits in a wave and interacts with users, I made one for a game I'm developing: Google Code Project for multifarce (the game itself isn't really public yet and, as such, isn't particularly functional). The bot source is here: multifarce Wave robot source
Basically, all you need to get a bot running are the last 14 lines of that code. I love it! =)