Export specific sections in pandoc when converting from Markdown

I have a Markdown document that was generated using Knitr (literate programming). This markdown document gets converted to Microsoft Word (docx) and HTML using pandoc. Now I would like to include specific parts from the Markdown in HTML, and others in docx. The concrete use case is that I'm able to generate JS+HTML charts using rCharts which is fine for HTML, but obviously doesn't render in docx, so I would like to use a simple PNG image in that case.
Is there some specific pandoc syntax or trick that I can use for this?

One way to solve this is to post-process the markdown generated by knitr.
I output some mustache tags and then render them using the R package whisker.
Roughly, the code looks like:
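library(knitr)    # knit(), pandoc()
library(whisker)  # whisker.render()
# rmd is the path to the .Rmd file and e the evaluation environment (both defined elsewhere)
md <- knit(rmd, envir=e)
docx.temp <- tempfile()
html.temp <- tempfile()
# render the knitted markdown twice, once per target format
writeLines(whisker.render(readLines(md), list(html=TRUE)), html.temp)
writeLines(whisker.render(readLines(md), list(html=FALSE)), docx.temp)
docx <- pandoc(docx.temp, format="docx")
html <- pandoc(html.temp, format="html")
file.copy(docx, "./report.docx", overwrite=TRUE)
file.copy(html, "./report.html", overwrite=TRUE)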
With the Rmd (knitr) containing something roughly like
{{^html}}
```{r}
WITHOUT HTML
```
{{/html}}
{{#html}}
```{r}
WITH HTML
```
{{/html}}
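To make the effect of the two section types concrete, here is a minimal Python sketch of what the rendering step does to such a template. It is not whisker and not a full mustache implementation: it only handles flat boolean sections, and the function name render_sections is made up for this illustration.

```python
import re

# {{#name}}...{{/name}} is kept when the flag is true,
# {{^name}}...{{/name}} is kept when the flag is false or missing.
_SECTION = re.compile(r"\{\{([#^])(\w+)\}\}\n?(.*?)\{\{/\2\}\}\n?", re.DOTALL)


def render_sections(template: str, flags: dict) -> str:
    """Keep or drop flat boolean sections; no variables, no nesting."""
    def keep_or_drop(match):
        kind, name, body = match.groups()
        truthy = bool(flags.get(name, False))
        show = truthy if kind == "#" else not truthy
        return body if show else ""
    return _SECTION.sub(keep_or_drop, template)


template = "{{^html}}\nPNG chart\n{{/html}}\n{{#html}}\nJS chart\n{{/html}}\n"
print(render_sections(template, {"html": True}))   # keeps only the JS chart line
print(render_sections(template, {"html": False}))  # keeps only the PNG chart line
```

Rendering the same template once per flag value gives one markdown file per output format, which is what the two whisker.render calls above do.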

Related

Loading pre-trained CBOW/skip-gram embeddings from a file that has unknown encoding?

I'm trying to load pre-trained word embeddings for the Arabic language (Mazajak embeddings: http://mazajak.inf.ed.ac.uk:8000/). The embeddings file does not have a particular extension and I'm struggling to get it to load. What's the usual process to load these embeddings?
I've tried with open("get_sg250", encoding = encoding) as file: file.readlines() with different encodings, but none of them seems to be the answer (utf-8 does not work at all). If I try windows-1256 I get gibberish:
e.g.
['8917028 300\n',
'</s> Hل®:0\x16ء:؟X§؛R8ڈ؛\xa0سî9K\u200fƒ::m¤9¼»“8¤p\u200c؛tعA:UU¾؛“_ع9‚Nƒ¹®G§¹قفگ؛ww$؛\u200eba:\x14.„:R¸پ:0–\x0b:–ü\x06:×#¦؛Yٍ²؛m ظ:{\x14¦:µ\x01‡:ه\x17S¹Yr¯:j\x03-¹ff€9×£P¸\n',
'W‚؛UUه9¼»é¹""§؛\u200c¶د:UU؟:\u200eb؟¹{\x14\u200d¸,ù19ïî\u200d؛ئ\x12¯؛\x00\x00ا:\u200c6°7A§a؛ذé„؛ذi†؛®G\x14:حجŒ8\x03\u200cè9ه\x17¸؛ق]¦؛ڈآ5¸قفا9حج^:\x00€ٹ؛q=²:\x00\x00¢9\x14®أ9×£T¹لz‚:\x1bèG؛®G7؛ڑ™<:m\xa0ƒ¹""´9\x14®\x1d:"¢²؛®G-؛ڑ™~:±ن¸:\x18ث«:¸\x1e…؛`,8؛Hل\u200d¹±ن.:\x1f…¥؛لْ‚:ڑ™s:R¸\x0b؛ئ’\x07؛0–C؛ڈآ¸:ذéھ:ة/خ¹A\'¸:ڑ™ز:m\xa0\x1e:è´ظ::ي‡؛\n',
'×\x05؛Œ%8؛ش\x06~؛أُu:\x00\x00\n',
":‰ˆ\x149\x14®?؛ِ(\x05:«ھ…:)\\‡833G:Haط؛\x1f…¼:¼»'9\x00\x00 ؛=\n",
'6؛R¸‚¹¼;€؛\x1bè¾؛\x1bèw؛قف؛:A§\x1a؛""j؛K~J:Hل\x14؛ىرد:\u200c6\x0c؛–|ب؛‚Nm:cةد·:mک؛‰ˆھ9\x00\x00ü9DD(¹ذi\x1f:ذé¬؛,ù™9¼»\x1e:wwƒ؛\x03\u200cF87ذ©·×£Q؛\x1f…w؛ئ\x12ح؛\x00\x00\x007ٍ‹U8\x0etZ6“ك«؛cةط؛Haد؛–ü¼؛33?¹Œ%َ9أُخ9=\n',
'‹؛ق]ع:ڈآ/؛0–ق¹¤pُ¹Dؤخ:¤p¤؛\x1bèت9\u200ebé¹ùE‹:–üb7=ٹ؛:؟Xv؛×£c؛ِ(·؛è4\xa0؛cة‹؛0\x16ˆ؛ئ’U:""#؛ة/j:R8،:أُى9ذé€:ىQX:\x1f…L:""›؛K\u200f•؛ڈآں؛‰ˆ8¸ww´:""o؛è´…؛\n',
'W·؛¤pگ:{”¶؛\x0etJ¹\u200eb>:ùإة؛`¬أ؛ِ(ü9K\u200f™:‚N؛:لz;:ِ(ٹ:Œ¥ˆ؛§\n',
'ں؛ِ¨\xad:ڑ™q؛\u200c6\x19:×£H9¤p\x1c:\x03\u200cخ¹–üٹ8UU\x13؛Hلؤ¹è´ء؛ïnژ؛®Gک:è´¯9\x0etN؛O\x1b\x0b؛\x00\x00Z:\n',
'Wڑ؛""J؛؟طخ:\x03\u200c¹:لْ¬؛\u200c6ک9ڑ™D؛\x1bèT8ق]ƒ:¼»س:0–-:~±³:,y‰:è´،¸jƒأ:m\xa0]:A\'د:j\x03\x15؛Haد:""½:wwù¹ه\x17ء؛×#س:&؟œ9×£5؛Hلz¹\\ڈ€¹)\\¨؛O\x1bْ¹ه\x17\x1b¹ڈB×؛\x03\u200c™؛ىQز¹لz¤¹ذi\x1c:\\ڈژ9ùإV¹R¸€:ùإü9ww?9‰\x08\u200d:~±ؤ¹‚Nù¹‰ˆ\x10¹UUn؛\x11\x11ƒ؛ٍ‹چ8‰ˆ½:\x1bèî¹O\x1bè¶`¬´؛=\n',
'¢:\n',
I've also tried using pickle but that also doesn't work.
Any suggestions on what I could try out?
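One thing worth checking: a first line of the form "count dimension", followed by the token </s> and raw bytes, looks like the binary layout written by the original word2vec tool. That is an assumption about this file, not something the download page is known to state. If it holds, gensim's KeyedVectors.load_word2vec_format(path, binary=True) is the usual loader. The standard-library sketch below only illustrates what such a reader does; the function name and the synthetic sample data are made up for the example.

```python
import io
import struct


def read_word2vec_binary(stream, limit=None):
    """Read word vectors from a binary stream in the assumed word2vec layout.

    Assumed layout: an ASCII header "<vocab_size> <dim>\\n", then for each
    entry the word's bytes, one space, and <dim> little-endian float32 values.
    """
    vocab_size, dim = map(int, stream.readline().split())
    count = vocab_size if limit is None else min(limit, vocab_size)
    vectors = {}
    for _ in range(count):
        word = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("stream ended inside a word")
            if ch == b" ":
                break
            if ch != b"\n":  # tolerate a newline left after the previous vector
                word += ch
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise EOFError("stream ended inside a vector")
        vectors[word.decode("utf-8", errors="replace")] = struct.unpack(f"<{dim}f", raw)
    return vectors


# Tiny synthetic stream in the assumed layout (two words, three dimensions).
sample = io.BytesIO(
    b"2 3\n"
    + b"</s> " + struct.pack("<3f", 1.0, 2.0, 3.0) + b"\n"
    + "قط ".encode("utf-8") + struct.pack("<3f", 0.5, -0.5, 0.25) + b"\n"
)
demo = read_word2vec_binary(sample)
print(list(demo))

# For a real file, open it in binary mode, for example:
# with open("get_sg250", "rb") as fh:
#     first_words = read_word2vec_binary(fh, limit=5)
```

The key point is the binary mode: the vectors are raw floats, so no text encoding passed to open() can decode them.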

How would I extract a title from a script using bs4

I am trying to extract the title from HTML located in a </script> where I want to assign a variable only to the Timer 5 mins 3 sec.
Here's the HTML
</script>
<title>Timer 5 mins 3 sec - 24/9/2020</title>
Here's what I've done so far
import requests
from bs4 import BeautifulSoup

# url and headers are defined elsewhere
with requests.Session() as s:
    r = s.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
itemitle = soup.find(True,{"script":"title"})
print(itemitle)
But this does not seem to find it
title is itself a tag, so you can select it with the type (tag) selector. As the snippet shows, it is not inside the script tag, e.g.
soup.select_one('title').
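With bs4 4.7.1+ you can use :contains to require that the title contains the substring "Timer", or a longer substring. Newer Soup Sieve releases prefer the spelling :-soup-contains and warn about :contains.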
e.g.
soup.select_one('title:contains("Timer")')
This assumes the content is not dynamically generated. If it is, you will need to determine whether it comes from an additional XHR request visible in the network tab, or from the JavaScript generating it.
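Selecting the title tag returns the whole text, including the date. To keep only the "Timer 5 mins 3 sec" part, one option is to cut the text at the last " - " separator. This is a small sketch that assumes the title always has the form "label - date"; the helper name timer_label is made up for the example.

```python
def timer_label(title_text: str) -> str:
    """Return the text before the last ' - ' separator, or the whole text if there is none."""
    label, sep, _date = title_text.rpartition(" - ")
    return (label if sep else title_text).strip()


# With bs4 this would be called as timer_label(soup.select_one("title").get_text()).
print(timer_label("Timer 5 mins 3 sec - 24/9/2020"))
```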

How to save file in detectron2

When I use the cv2_imshow code of my custom dataset, I can view the results of detections on the image via Google Colaboratory. Now, I want to save this image to Google Drive.
v = Visualizer(im[:, :, ::-1], metadata=microcontroller_metadata, scale=1.2)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(v.get_image()[:, :, ::-1])
However, when I use the demo.py code provided by detectron2, I get results with kites and other classes, which are COCO classes and not my custom classes.
I use this code to run demo.py
!python demo.py --config-file detectron2/configs/COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml --input gdcnn/0_img_input/validate/validate{a}.jpg --confidence-threshold 0.2 --output path/to/googledrive/predictionfasterrcnn.jpg --opts MODEL.WEIGHTS output/model_final.pth
You can save the file like this:
v.save(filepath)
or
cv2.imwrite(filepath, v.get_image()[:, :, ::-1])
Alternatively, save the output image with the cv2 function for writing images:
cv2.imwrite(filename, img)

How to export some selected records from SQLFORM.grid as doc or pdf format in web2py

I want to tick the checkbox in front of some rows and then export the selected records in doc or pdf format. How can I do this?
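def test():
    # selectable adds a checkbox per row and passes the ids of the ticked rows to the callback
    form = SQLFORM.grid(db.problem, selectable=lambda ids: export(ids))
    return dict(form=form)

def export(ids):
    # ids is the list of selected record ids; the doc/pdf export logic is still missing
    pass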
If I set csv=True in SQLFORM.grid, several export formats are offered, but doc and pdf are not among them!
Thanks!
Jian,
Unfortunately, working with .doc, .docx (Microsoft Word) and .pdf isn't as simple as you might think it is.
For Word, you'll need python-docx, which can be installed with $ pip install python-docx, and you can find the documentation and sample code here.
To create a PDF document, you'll only need to import pyfpdf using from gluon.contrib.fpdf import FPDF, since web2py already comes with it. But, the same case applies here: You'll need to read some documentation and write some code.

How to export Rich Text fields as HTML from Notes with LotusScript?

I'm working on a data migration task, where I have to export a somewhat large Lotus Notes application into a blogging platform. My first task was to export the articles from Lotus Notes into CSV files.
I created an Agent in LotusScript to export the data into CSV files. I use a modified version of this IBM DeveloperWorks forum post, and it basically does the job. But the contents of the Rich Text field are stripped of any formatting. This is not what I want; I want the Rich Text field rendered as HTML.
The documentation for the GetItemValue method explicitly states that the text is rendered into plain text. So I began to research for something that would retrieve the HTML. I found the NotesMIMEEntity class and some sample code in the IBM article How To Access HTML in a Rich Text Field Using LotusScript.
But for the technique described in the above article to work, the Rich Text field needs to have the property "Store Contents as HTML and MIME". This is not the case with my Lotus Notes database. I tried to set the property on the fields in question, but it didn't do the trick.
Is it possible to use the NotesMIMEEntity and set the "Store Contents as HTML and MIME" property after the content has been added, to export the field rendered as HTML?
Or what are my options for exporting the Notes database Rich Text fields as HTML?
Bonus information: I'm using IBM Lotus Domino Designer version 8.5
There is this fairly unknown command that does exactly what you want: retrieve the URL using the command OpenField.
Example that converts only the Body-field:
http://SERVER/your%5Fdatabase%5Fpath.nsf/NEW%5FVIEW/docid/Body?OpenField
Here is how I did it, using the OpenField command, see D.Bugger's post above
Function GetHtmlFromField(doc As NotesDocument, fieldname As String) As String
    Dim obj
    Set obj = CreateObject("Microsoft.XMLHTTP")
    obj.open "GET", "http://www.mydomain.dk/database.nsf/0/" + doc.Universalid + "/" + fieldname + "?openfield&charset=utf-8", False, "", ""
    obj.send("")
    Dim html As String
    html = Trim$(obj.responseText)
    GetHtmlFromField = html
End Function
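The same request can be made from outside Notes. Below is a rough Python sketch that builds the OpenField URL and fetches it with the standard library. The host, database path and document id in the example call are placeholders, and the function names are made up for this illustration; the "0" view and the charset parameter follow the LotusScript function above.

```python
from urllib.parse import quote
from urllib.request import urlopen


def open_field_url(host, db_path, unid, field="Body", view="0", charset="utf-8"):
    """Build a Domino ?OpenField URL for one field of one document."""
    # Database paths on Windows servers use backslashes; URLs need forward slashes.
    parts = db_path.replace("\\", "/").split("/")
    path = "/".join(quote(part) for part in parts)
    return (f"http://{host}/{path}/{view}/{unid.upper()}/{quote(field)}"
            f"?OpenField&charset={charset}")


def fetch_field_html(url, charset="utf-8", timeout=30.0):
    """Fetch the rendered field; the server must allow the request (see ACL)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode(charset, errors="replace")


# Placeholder values, only to show the shape of the resulting URL.
example_url = open_field_url("SERVER", "apps\\blog.nsf", "0123456789abcdef0123456789abcdef")
print(example_url)
```

Only the URL builder is exercised here; fetch_field_html needs a reachable Domino server and is therefore not called.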
I'd suggest looking at Midas' Rich Text LSX (http://www.geniisoft.com/showcase.nsf/MidasLSX)
I haven't used it personally, but I remember it from years ago as the best option for working with Rich Text. I'd bet it saves you a lot of headaches.
As for the NotesMIMEEntity class, I don't believe there is a way to convert RichText to MIME, only MIME to RichText (or retain the MIME within the document for emailing purposes).
If you upgrade to Notes Domino 8.5.1 then you can use the new ConvertToMIME method of the NotesDocument class. See the docs. This should do what you want.
Alternatively, the easiest way to get the Domino server to render the RichText is to retrieve it via a URL call. Set up a simple form that has just the RichText field and then use your favourite HTTP API to pull in the page. It should then be pretty straightforward to pull out the body.
Keep it simple.
Change the BODY field to Store contents as HTML and MIME
Open the doc in editmode.
Save.
Close.
You can now use the NotesMIMEEntity to get what you need from script.
You can use the NotesDXLExporter class to export the Rich Text and use an XSLT to transform the output to what you need.
I know you mentioned using LotusScript, but if you don't mind writing a small Java agent (in the Notes client), this can be done fairly easily - and there is no need to modify the existing form design.
The basic idea is to have your Java code open a particular document through a localhost http request (which is simple in Java) and to have your code capture that html output and save it back to that document. You basically allow the Domino rendering engine to do the heavy lifting.
You would want to do this:
Create a form which contains only the rich-text field you want to convert, and with Content Type of HTML
Create a view with a selection formula for all of the documents you want to convert, and with a form formula which computes to the new form
Create the Java agent which just walks your view, and for each document gets its docid, opens a URL in the form http://SERVER/your_database_path.nsf/NEW_VIEW/docid?openDocument, grabs the http response and saves it.
I put up some sample code in a similar SO post here:
How to convert text and rich text fields in a document to html using lotusscript?
Works in Domino 10 (have not tested with 9)
HTMLStrings$ = NotesRichTextItem.Converttohtml([options]) As String
See the documentation:
https://help.hcltechsw.com/dom_designer/10.0.1/basic/H_CONVERTOHTML_METHOD_NOTESRICHTEXTITEM.html
UPDATE (2022)
HCL has not supported this method since version 11, and the documentation no longer includes any information about it.
In my tests it still works in v12, but HCL recommends not using it.
Casper's recommendation above works well, but make sure the ACL allows Anonymous access; otherwise the HTML you get back will be the HTML of your login form.
If you do not need to get the Richtext from the items specifically, you can use ?OpenDocument, which is documented (at least) here: https://www.ibm.com/developerworks/lotus/library/ls-Domino_URL_cheat_sheet/
https://www.ibm.com/support/knowledgecenter/SSVRGU_9.0.1/com.ibm.designer.domino.main.doc/H_ABOUT_URL_COMMANDS_FOR_OPENING_DOCUMENTS_BY_KEY.html
OpenDocument also allows you to expand sections (I am unsure if OpenField does)
Syntax is:
http://Host/Database/View/DocumentUniversalID?OpenDocument
But be sure to include the charset parameter as well - Japanese documents were unreadable without specifying utf-8 as the charset.
Here is the method I use that takes a NotesDocument and returns the HTML for the doc as a string.
private string ConvertDocumentToHml(Domino.NotesDocument doc, string sectionList = null)
{
    var server = doc.ParentDatabase.Server.Split('/')[0];
    var dbPath = doc.ParentDatabase.FilePath;
    string viewName = "0";
    string documentId = doc.UniversalID.ToUpper();
    var ub = new UriBuilder();
    ub.Host = server;
    ub.Path = dbPath.Replace("\\", "/") + "/" + viewName + "/" + documentId;
    if (string.IsNullOrEmpty(sectionList))
    {
        ub.Query = "OpenDocument&charset=utf-8";
    }
    else
    {
        ub.Query = "OpenDocument&charset=utf-8&ExpandSection=" + sectionList;
    }
    var url = ub.ToString();
    var req = HttpWebRequest.CreateHttp(url);
    try
    {
        var resp = req.GetResponse();
        string respText = null;
        using (var sr = new StreamReader(resp.GetResponseStream()))
        {
            respText = sr.ReadToEnd();
        }
        return respText;
    }
    catch (WebException ex)
    {
        return "";
    }
}
