Diff tool which generates documentation?

I have tried Beyond Compare, and it seems to be a good tool.
But I haven't found a way to export an overview of the differences.
The report should be in a format that most people can read: DOC, RTF, PDF, HTML...
What I need is to display the differences between two folders. It would be enough to show which files have been changed, but it would also be nice if the documentation let you go deeper and actually see which rows in a file have been changed.
Are there any tools that can do this?

Beyond Compare has some functions to do that.
For example, in the folder diff view, select the files you want to report and then select Actions->File Compare Report. HTML is one of the output formats supported there.

Araxis Merge covered all of my needs.
Simple to use
Generated a nice overview of files in folder structure
Could click on changed files to see the changes in the content
The colors could be better, but that can be solved by supplying a custom CSS file. :)

This script that colorizes diff output as HTML might be useful. There are many other tools; another one is difftool.
On a somewhat different note, I once used a code coverage tool that also generated HTML code views from gcov coverage information. It's called lcov.
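If a scripted approach is acceptable, the same kind of report can be produced with nothing but the Python standard library. The sketch below (folder names and the output directory are placeholders) walks two folders with filecmp, lists the files that changed, and writes a per-file HTML view of the changed lines with difflib; treat it as a starting point rather than a finished tool.

# Minimal sketch: list changed files between two folders and write an HTML
# report, using only the Python standard library. Folder names are placeholders.
# Note: filecmp.dircmp uses a shallow (stat-based) comparison by default.
import filecmp
import difflib
from pathlib import Path

def report(dir_a, dir_b, out_dir):
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    index = ["<html><body><h1>Changed files</h1><ul>"]

    def walk(cmp, rel=""):
        for name in cmp.diff_files:  # files present in both folders but different
            a = Path(cmp.left) / name
            b = Path(cmp.right) / name
            html = difflib.HtmlDiff().make_file(
                a.read_text(errors="replace").splitlines(),
                b.read_text(errors="replace").splitlines(),
                fromdesc=str(a), todesc=str(b))
            detail = out_dir / (rel.replace("/", "_") + name + ".html")
            detail.write_text(html)
            index.append('<li><a href="%s">%s%s</a></li>' % (detail.name, rel, name))
        for sub_name, sub_cmp in cmp.subdirs.items():  # recurse into subfolders
            walk(sub_cmp, rel + sub_name + "/")

    walk(filecmp.dircmp(dir_a, dir_b))
    index.append("</ul></body></html>")
    (out_dir / "index.html").write_text("\n".join(index))

report("release_1.0", "release_1.1", "diff_report")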

Interpolating custom data onto a PDF

I am building an Angular test preparation app (with Laravel 5.1 API). One of the requirements is to allow the user to print a certificate of achievement.
The client wants the person's name and credentials interpolated into the document (e.g., highlighted below). Here is a snapshot of the PDF template they sent:
The way I'm handling PDF viewing is simply by storing the file on S3 and giving them a link to that file.
Interpolating information into a PDF doc doesn't seem trivial, and I haven't found much information on doing it programmatically, but there are tools like DocHub that allow you to edit while viewing the PDF.
I'm interested in learning:
is doing this programmatically trivial?
are there 3rd party tools I'm unaware of?
would I even be able to send this information along to the S3 link to interpolate in the first place?
Using PDF as a format for editing is usually a bad choice. If you have a form with fixed fields, then it's easy. Create a PDF template with an interactive form. In this form, based on AcroForm technology, you'll define fields with fixed coordinates, and a fixed size. You can then add content to these fields.
One major disadvantage with this approach is the lack of flexibility. Did you notice that I used the word "fixed" three times in the previous paragraph? If text doesn't fit the predefined field, you're out of luck. If the field is overdimensioned, you'll end up with plenty of white space. This approach is great if you can predict what the data will be like. A typical use case is a ticket or a voucher. For instance: the empty form is a really nice page, with only a couple of fields where an automated system can put a name, a date, a time, and a seat number.
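As a rough illustration of the fixed-field approach described above (using Python's pypdf library rather than the iText/AcroForm toolchain the answer has in mind; the template file and the field names "name" and "credentials" are made-up examples), filling a form could look something like this:

# Hedged sketch: fill fixed AcroForm fields in a PDF template with pypdf.
# "certificate_template.pdf" and the field names are hypothetical.
from pypdf import PdfReader, PdfWriter

reader = PdfReader("certificate_template.pdf")
writer = PdfWriter()
writer.append(reader)  # copy the pages, including the interactive form

writer.update_page_form_field_values(
    writer.pages[0],
    {"name": "Jane Doe", "credentials": "RN, BSN"},
)

with open("certificate_jane_doe.pdf", "wb") as f:
    writer.write(f)

The same caveat applies: whatever you put into "name" has to fit the box that was drawn in the template.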
This isn't the best approach for the example you show in your screen shot. The position of every line of text, every word, every character is known in advance. If you want to replace a short word with a long word (or vice-versa), then all those positions (of each line, of the complete page, possibly of the complete document) need to be recalculated. That's madness. Only people with very poor design skills come up with such an idea.
A better idea is to store the template as HTML. See for instance chapter 5 of iText's pdfHTML tutorial, where we have this snippet of HTML:
<html>
<head>
<title>Invitation to SXSW 2018</title>
</head>
<body>
<u><b>Re: Invitation</b></u>
<br>
<p>Dear <name>SXSW visitor</name>,
we hope you had a great SXSW film festival experience last year.
And we would like to invite you to the next edition of SXSW Film
that takes place from March 9 until March 17, 2018.</p>
<p>Sincerely,<br>
The SXSW crew<br>
<date>August 4, 2017</date></p>
</body>
</html>
Actually, it's not really HTML, because the <name> tag and the <date> tag don't exist in HTML. All HTML processors (browsers as well as pdfHTML) ignore those tags and treat their content as if the tag were a <span>.
It doesn't make much sense to have such tags in the context of pure HTML, but it does make a lot of sense in the case of pdfHTML. With pdfHTML, you can configure custom tags, and get a result that looks like the PDFs described below:
Look at the document for "John Doe" and compare it with the document for "Bruno Lowagie". The name "John Doe" is much shorter than my name, hence more words fit on that first line. The text flows nicely (we could also have chosen to justify the text on both sides). This "flow" is impossible to achieve with your approach, because you will never get a PDF template to reflow nicely.
OK, I get it, you probably say, but what about the practical aspects? You talk about a Java / .Net library, but I am working with Laravel and Angular.js. First, let me tell you that I don't think you'll find any good PDF tools for Laravel or Angular.js, because of the nature of PDF and those development environments (in my opinion, those technologies don't play well together). Regardless of my opinion, this shouldn't be much of a problem for you because you work in an Amazon environment. AWS supports Java, and the Java code needed to get pdfHTML working is minimal. Most of the code samples I wrote for the pdfHTML tutorial are shorter than 15 lines. So why not try Java and pdfHTML?
If you're already using Amazon services, why not use an AWS Lambda function, in combination with iText 7 (Java), to generate the PDF on demand?
That way, you are guaranteed that the PDF is correct and has a nice layout every time.
Generating the PDF can be done by:
converting HTML,
programmatically creating your entire document, or
filling and flattening an XFA form.
I think for your use case, option 1 or option 2 is the most sustainable; a sketch of option 1 follows below.
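For option 1, the conversion step itself is small. The answer has iText 7 and Java in mind; purely to illustrate the flow, here is a hedged Python sketch using WeasyPrint instead, assuming the HTML template uses simple $name and $date placeholders rather than the custom <name>/<date> tags pdfHTML supports:

# Hedged sketch of option 1 (HTML template -> PDF), using Python and WeasyPrint
# instead of the Java/iText stack discussed above. The template file name and
# the $name/$date placeholders are assumptions.
from string import Template
from weasyprint import HTML

with open("certificate.html", encoding="utf-8") as f:
    template = Template(f.read())

html = template.substitute(name="Jane Doe", date="August 4, 2017")
HTML(string=html).write_pdf("certificate.pdf")

Because the HTML is rendered fresh for every certificate, the text reflows around long or short names, which is exactly the flexibility a flat PDF template cannot give you.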

Where can I find the simple information of the format for uploading questions?

Situation:
I want to train and configure the Retrieve and Rank service in a simple way.
I just uploaded some PDFs and now I want to upload some questions.
In the documentation I cannot find any simple information on how the CSV file must be structured, which fields are mandatory, and which are optional.
Something like: "[YOUR QUESTION (MUST)]",[DOCUMENT ID (MUST)], [RANKING (OPTIONAL)]
The document ID you will find in xyz in section xyz.
I cannot find this kind of information in the help.
https://www.ibm.com/watson/developercloud/doc/retrieve-rank/training_data.shtml#script
Impact:
There is no way to get "real" documentation of the configuration outside the tutorial.
Possible Solution:
Provide additional documentation.
Maybe I was not able to find it and someone can guide me to the right place?
OK, I found the solution myself by trial and error. The following steps work for me:
1) You need a plain text file, and the file extension should be .txt
2) Inside the file you have to write your questions like this:
What is the best place to be?
Why should I travel to the USA?
Don't write them with surrounding quotes, like this:
"What is the best place to be?"
For me the help was misleading, because it talks about CSV files.
You can also take a look at the comment from @dalelane; he is right, and he highlights the entry text for the file upload.
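In other words, the upload expects one bare question per line. If your questions currently live in a spreadsheet export, a small hedged Python sketch (the file names and the "question" column are assumptions) can produce that plain-text file:

# Hedged sketch: convert a CSV export (assumed to have a "question" column)
# into the plain-text format described above: one unquoted question per line.
import csv

with open("questions.csv", newline="", encoding="utf-8") as src, \
     open("questions.txt", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        dst.write(row["question"].strip() + "\n")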

Difficulty with filename and filemime when using Migrate module

I am using the Drupal 7 Migrate module to create a series of nodes from JPG and EPS files. I can get them to import just fine. But I notice that when I am done importing them if I look at the nodes it creates, none of the attached filefield and thumbnail files contain filename information.
Upon inspecting the file_managed table I see that both the filename and filemime fields are empty for ONLY the files that I attached via the migrate module. This also creates an issue with downloading the files.
Now I think the problem has to do with the fact that I am using "file_link" instead of "file_copy" as the file operation I specify. The problem is I am importing around 2 TB (that's terabytes) of image files. We had to put in a special request with Rackspace just to get access to that much disk space on our server. So I can't go around copying files from one directory to the next because of space issues, which makes "file_link" seem like the obvious choice.
Now you probably want to see how I am doing this exactly, so here is the code snippet:
$jpg_arguments = MigrateFileFieldHandler::arguments(NULL,
  'file_link', FILE_EXISTS_RENAME, 'en',
  array('source_field' => 'jpg_name'),
  array('source_field' => 'jpg_filename'),
  array('source_field' => 'jpg_filename'));
$this->addFieldMapping('field_image', 'jpg_uri')
  ->arguments($jpg_arguments);
As you can see I am specifying no base path (just like the beer.inc example file does). I have set file_link, the language, and the source fields for the description, title, and alt.
It is able to generate thumbnails from the JPGs, but those columns of data are still missing from the db table. I traced through the functions as best I could, but I don't see what is causing this. I tried running the URI in the table through the functions that generate the filename and the filemime, and they output just fine. It is like something is removing just those segments of data.
Does anyone have any idea what this could be? I am using the Drupal 7 Migrate module version 2.2. It is running on Drupal 7.8.
Thanks,
Patrick
OK, so I have found the answer to yet another question of mine. This is actually an issue with the Migrate module itself. The issue is documented here. I will be retracting this bounty (as soon as I figure out how).

How to export text from all pages of a MediaWiki?

I have a MediaWiki running which represents a dictionary of German terms and their translation to a local dialect. Each page holds one term, its translation and a number of additional infos.
Now, for a printable version of the dictionary, I need a full export of all terms and their translations. Since this is an extract of each page's content, I guess I need a complete export of all pages in their newest version in a parsable format, e.g. XML or CSV.
Has anyone done that or can point me to a tool?
I should mention, that I don't have full access to the server, e.g. no command line, but I am able to add MediaWiki extensions or access the MySQL database.
You can export the page content directly from the database. It will be the raw wiki markup, as when using Special:Export. But it will be easier to script the export, and you don't need to make sure all your pages are in some special category.
Here is an example:
SELECT page_title, page_touched, old_text
FROM revision,page,text
WHERE revision.rev_id=page.page_latest
AND text.old_id=revision.rev_text_id;
If your wiki uses PostgreSQL, the table "text" is named "pagecontent", and you may need to specify the schema. In that case, the same query would be:
SET search_path TO mediawiki,public;
SELECT page_title, page_touched, old_text
FROM revision,page,pagecontent
WHERE revision.rev_id=page.page_latest
AND pagecontent.old_id=revision.rev_text_id;
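Since you have database access but no command line, you can also run the MySQL query above from a small script and dump the result to a file. A hedged Python sketch using the PyMySQL library (connection details are placeholders; note that old_text can be stored compressed or in external storage depending on the wiki's configuration, in which case this simple approach won't be enough):

# Hedged sketch: run the MySQL query above and dump the result to CSV.
# Connection details are placeholders for your wiki's database.
import csv
import pymysql

conn = pymysql.connect(host="localhost", user="wikiuser",
                       password="secret", database="wikidb")
query = """
SELECT page_title, page_touched, old_text
FROM revision, page, text
WHERE revision.rev_id = page.page_latest
  AND text.old_id = revision.rev_text_id
"""

def to_str(value):
    # MediaWiki stores titles and text in binary columns, so decode bytes.
    return value.decode("utf-8", "replace") if isinstance(value, bytes) else value

with open("wiki_dump.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["page_title", "page_touched", "old_text"])
    with conn.cursor() as cur:
        cur.execute(query)
        for row in cur:
            writer.writerow([to_str(v) for v in row])
conn.close()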
This worked very well for me. Notice I redirected the output to the file backup.xml. From a Windows Command Processor (CMD.exe) prompt:
cd \PATH_TO_YOUR_WIKI_INSTALLATION\maintenance
\PATH_OF_PHP.EXE\php dumpBackup.php --full > backup.xml
I'm not completely satisfied with the solution, but I ended up specifying a common category for all pages; then I can enter this category and all of the contained page names in the Special:Export box. It seems to work, although I'm not sure if it will still work when I reach a few thousand pages.
Export
cd maintenance
php5 ./dumpBackup.php --current > /path/wiki_dump.xml
Import
cd maintenance
php5 ./importDump.php < /path/wiki_dump.xml
It looks less than simple. http://meta.wikimedia.org/wiki/Help:Export might help, but probably not.
If the pages are all structured in the same way, you might be able to write a web scraper with something like Scrapy.
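If scraping rendered pages feels heavy, the MediaWiki Action API can hand you the raw wikitext of every page without shell or database access. A hedged Python sketch (the api.php URL is a placeholder for your wiki, and older MediaWiki versions may use a different continuation format):

# Hedged sketch: fetch the latest wikitext of all main-namespace pages through
# the MediaWiki Action API. The api.php URL is a placeholder.
import requests

API = "https://your-wiki.example.org/w/api.php"
params = {
    "action": "query",
    "generator": "allpages",
    "gapnamespace": 0,
    "gaplimit": 50,
    "prop": "revisions",
    "rvprop": "content",
    "format": "json",
    "continue": "",
}

pages = {}
while True:
    data = requests.get(API, params=params).json()
    for page in data.get("query", {}).get("pages", {}).values():
        revisions = page.get("revisions")
        if revisions:
            pages[page["title"]] = revisions[0]["*"]  # raw wikitext
    if "continue" not in data:
        break
    params.update(data["continue"])  # follow the API's continuation

print("exported %d pages" % len(pages))

From there you can write the titles and wikitext out as XML or CSV and parse the term/translation pairs for the printable dictionary.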
You can use the special page, Special:Export to export to XML; here is Wikipedia's version.
You might also consider Extension:Collection if you eventually want it in human-readable (e.g. PDF) form.
You can set https://www.mediawiki.org/wiki/Manual:$wgExportAllowAll to true, then export all pages from Special:Export.

How Do I Use Multiple po Files in CakePHP?

I'm just beginning the process of exploring i18n in CakePHP and I can't seem to find the right combination of files and functions that will allow me to use multiple po files. If I want to use a single po file (default.po) for every bit of translatable text, that works fine, but I see that becoming an unmaintainable hairball very, very quickly. I've read the docs and the few articles I can find, but none really dive into i18n beyond the trivial use of one .po file.
Here's where I am right now:
I've "baked" my po templates (.pot files) and copied those into app/locale/eng/LC_MESSAGES (I'm not going to be using the default text as the key so that I can easily spot missing keys). For now, I have -views-layouts-default.po and -views-pages-index.po.
In those .po files, I've entered the text I want to use for each key.
In my homepage (views/pages/index.ctp) and default layout (views/layouts/default.ctp) I've wrapped the text key I want to translate with the __() function.
When I load the homepage, though, all I see are the keys. No text has been translated. If I throw up a default.po file, though, any keys I drop in there are populated just fine. I'm clearly missing some piece of the puzzle, but I can't find it. Any help would be much appreciated.
Thanks.
I found the piece I was missing thanks to the CakePHP Google Group. I had been playing with the __d() convenience function, but didn't have a clear picture of how to tie it to my .po files. The answer is easy once you know it:
The domain translation:
__d('login', 'PLEASE_LOGIN');
Will look for the "PLEASE_LOGIN" key in the file named login.po. I didn't know (and hadn't read anywhere) that domain == po file name (without extension). Learning that made all the difference.
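So, for the call above, CakePHP reads app/locale/eng/LC_MESSAGES/login.po, which would contain an entry along these lines (the English text here is just an illustration):

msgid "PLEASE_LOGIN"
msgstr "Please log in to continue."

Any other domain you pass as the first argument to __d() maps to another .po file in the same way.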
