How would I extract a title from a script using bs4

I am trying to extract the title from HTML that follows a </script> tag, and I want to assign only "Timer 5 mins 3 sec" to a variable.
Here's the HTML:
</script>
<title>Timer 5 mins 3 sec - 24/9/2020</title>
Here's what I've done so far:
import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')
    itemitle = soup.find(True, {"script": "title"})
    print(itemitle)
But this does not seem to find it.

title is the tag itself, so you can use the type (tag) selector; as your HTML shows, it is not inside the script tag. For example:
soup.select_one('title')
With bs4 4.7.1+ you can also use :contains to require that the title contain the substring "Timer" (or a longer substring). In soupsieve 2.1+ the preferred name is :-soup-contains, and :contains is deprecated. For example:
soup.select_one('title:contains("Timer")')
This assumes the content is not dynamically generated. If it is, you will need to determine whether it comes from an additional XHR request (visible in the browser's network tab) or from the JavaScript that generates it.
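Selecting the tag returns the whole title, but the question only wants "Timer 5 mins 3 sec". A minimal sketch, assuming the title always separates the timer text from the date with " - " (as in the sample HTML), could split the tag's text on that separator. The helper name extract_timer is hypothetical:

```python
from typing import Optional

from bs4 import BeautifulSoup

html = """
</script>
<title>Timer 5 mins 3 sec - 24/9/2020</title>
"""


def extract_timer(markup: str) -> Optional[str]:
    """Return the title text before the first " - ", or None if there is no <title>."""
    soup = BeautifulSoup(markup, "html.parser")
    title = soup.select_one("title")
    if title is None:
        return None
    # Assumes the date is appended after " - ", as in the sample title.
    return title.get_text(strip=True).split(" - ", 1)[0]


timer = extract_timer(html)  # "Timer 5 mins 3 sec"
```

If the separator can vary between pages, a regular expression on the title text would be more robust than a fixed split.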

Related

Using Python Selenium - How to enter text on the web <p> tag

I am running into trouble: how do I enter custom text into a paragraph on a web page?
The paragraph tag already contains default text, for example:
<p class="MuiTypography-root jss43 MuiTypography-body1">35 Years</p>
I want to change this tag to show my own text, let's say "25 Years".
Please assist!
I tried the following (I've imported the requisite libraries and checked that there is no iframe):
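from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

gender = WebDriverWait(driver, 60).until(EC.presence_of_element_located(
    (By.CLASS_NAME, 'MuiTypography-root')))
gender.click()  # this clicks, and I can see the items
gender.send_keys('25 Years')  # not working
gender.innerHtml('25 Years')  # not working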

Try locating the element with (By.XPATH, '//div[@class="jss42"]/p') instead of By.CLASS_NAME.

IronPDF html query parameter

Does IronPDF support HTML query parameters, or is there an alternative method?
I've been using IronPDF to convert an HTML file to PDF using the following method: var pdf = ironRenderer.RenderUrlAsPdf(reportPath);
However, the HTML located at reportPath now requires a parameter, userid. I have tried var pdf = ironRenderer.RenderUrlAsPdf(reportPath?userid=1); but that gives me the following error: CheckHtmlFilePath - File not found: .../index.html%3Fuserid=1'
I can't find anything in the IronPDF documentation saying that parameters are supported. Does anyone have any workarounds?

Instead of adding the parameter to the variable name, add it to the string. For example:
reportPath += "?userid=1";
var pdf = ironRenderer.RenderUrlAsPdf(reportPath);
Check whether the URL already has a query string and append the parameter accordingly (with & instead of ? if one is already present). If you post more code, I can describe this in more detail.
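The "check for an existing parameter" step can be sketched in Python with the standard library's urllib.parse; the helper name add_query_param and the example URLs are hypothetical. This only builds the URL string: the CheckHtmlFilePath error suggests the value is being treated as a local file path, and whether IronPDF forwards a query string to a local file is not established here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_param(url, key, value):
    """Append key=value, using '?' or '&' depending on whether a query already exists."""
    parts = urlsplit(url)
    # Keep any existing parameters, then add the new one at the end.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, add_query_param("http://localhost/report/index.html", "userid", "1") gives "http://localhost/report/index.html?userid=1", and a URL that already ends in ?lang=en gets &userid=1 appended instead.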

How to visualize LabelMe database using Matlab

The LabelMe database can be downloaded from http://www.cs.toronto.edu/~norouzi/research/mlh/data/LabelMe_gist.mat
However, there is another link http://labelme.csail.mit.edu/Release3.0/
The webpage has a toolbox, but I could not find any database to download. So I was wondering whether I could use LabelMe_gist.mat, which has several fields. The field names contains the labels for the images, and img presumably contains the images. How do I display the training and test images? I tried
im = imread(img)
Error using imread>parse_inputs (line 486)
The filename or url argument must be a string.
Error in imread (line 336)
[filename, fmt_s, extraArgs, msg] = parse_inputs(varargin{:});
but surely this is not the way. Please help.
load LabelMe_gist.mat;
load('LabelMe_gist.mat', 'img')

Since your post gave no idea what kind of data this is, I went ahead and downloaded it. It turns out img is a collection of 22019 RGB images of size 32x32, which is why img is a 32 x 32 x 3 x 22019 variable. Because img already holds pixel data, imread (which expects a filename) is the wrong tool; the i-th image can be displayed directly with imshow(img(:,:,:,i));
Here is an animation of all of them (press Ctrl+C to interrupt):
for iImage = 1:size(img,4)
    figure(1); clf;
    imshow(img(:,:,:,iImage));
    drawnow;
end
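For readers working in Python, the same 4-D layout (height x width x channels x image index) can be sliced with NumPy. A sketch, assuming the array has already been loaded, for example with scipy.io.loadmat("LabelMe_gist.mat")["img"] (loadmat cannot read v7.3 MAT files). Python indices start at 0, so MATLAB's img(:,:,:,i) becomes img[:, :, :, i - 1]. The helpers get_image and montage are hypothetical; montage tiles the first images into one grid instead of animating them:

```python
import numpy as np


def get_image(img: np.ndarray, i: int) -> np.ndarray:
    """Return the i-th (0-based) H x W x 3 image from an H x W x 3 x N array."""
    return img[:, :, :, i]


def montage(img: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Tile the first rows*cols images into one (rows*H) x (cols*W) x C array."""
    h, w, c, n = img.shape
    k = rows * cols
    if k > n:
        raise ValueError("not enough images for the requested grid")
    # Move the image axis first, then split it into a rows x cols grid.
    tiles = np.moveaxis(img[:, :, :, :k], 3, 0).reshape(rows, cols, h, w, c)
    # Interleave grid and pixel axes: (rows, H, cols, W, C) -> (rows*H, cols*W, C).
    return tiles.transpose(0, 2, 1, 3, 4).reshape(rows * h, cols * w, c)
```

Passing montage(img, 10, 10) to matplotlib's imshow would then show the first 100 images at once.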

Export specific sections in pandoc when converting from Markdown

I have a Markdown document that was generated using Knitr (literate programming). This markdown document gets converted to Microsoft Word (docx) and HTML using pandoc. Now I would like to include specific parts from the Markdown in HTML, and others in docx. The concrete use case is that I'm able to generate JS+HTML charts using rCharts which is fine for HTML, but obviously doesn't render in docx, so I would like to use a simple PNG image in that case.
Is there some specific pandoc syntax or trick that I can use for this?

So one way to solve this is to post-process the generated markdown from knitr.
I output some Mustache tags and then render them using the R package whisker.
The code looks roughly like this:
library(knitr)
library(whisker)
md <- knit(rmd, envir=e)
docx.temp <- tempfile()
html.temp <- tempfile()
writeLines(whisker.render(readLines(md), list(html=TRUE)), html.temp)
writeLines(whisker.render(readLines(md), list(html=FALSE)), docx.temp)
docx <- pandoc(docx.temp, format="docx")
html <- pandoc(html.temp, format="html")
file.copy(docx, "./report.docx", overwrite=TRUE)
file.copy(html, "./report.html", overwrite=TRUE)
with the Rmd (knitr) file containing something roughly like:
{{^html}}
```{r}
WITHOUT HTML
```
{{/html}}
{{#html}}
```{r}
WITH HTML
```
{{/html}}
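The same post-processing idea can be sketched in Python. The hypothetical render_sections helper below implements only the two Mustache section forms used in the template ({{#name}} and the inverted {{^name}}), without nesting or variables, so a full Mustache library would be the safer choice for anything larger:

```python
import re

# Matches {{#name}}...{{/name}} or {{^name}}...{{/name}}; nested sections are not supported.
SECTION = re.compile(r"\{\{([#^])(\w+)\}\}\n?(.*?)\{\{/\2\}\}\n?", re.DOTALL)


def render_sections(text, flags):
    """Keep or drop Mustache-style sections according to boolean flags (hypothetical helper)."""

    def keep(match):
        kind, name, body = match.groups()
        enabled = bool(flags.get(name, False))
        # '#' sections render when the flag is true, '^' (inverted) sections when it is false.
        return body if enabled == (kind == "#") else ""

    return SECTION.sub(keep, text)
```

Writing render_sections(md_text, {"html": True}) and render_sections(md_text, {"html": False}) to two files and running pandoc on each (e.g. pandoc report_html.md -o report.html) mirrors the R workflow.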

How to export Rich Text fields as HTML from Notes with LotusScript?

I'm working on a data migration task, where I have to export a somewhat large Lotus Notes application into a blogging platform. My first task was to export the articles from Lotus Notes into CSV files.
I created an agent in LotusScript to export the data into CSV files, using a modified version of this IBM DeveloperWorks forum post, and it basically does the job. But the contents of the Rich Text field are stripped of any formatting, which is not what I want: I want the Rich Text field rendered as HTML.
The documentation for the GetItemValue method explicitly states that the text is rendered into plain text. So I began looking for something that would retrieve the HTML. I found the NotesMIMEEntity class and some sample code in the IBM article How To Access HTML in a Rich Text Field Using LotusScript.
But for the technique described in that article to work, the Rich Text field needs to have the property "Store Contents as HTML and MIME", and this is not the case in my Lotus Notes database. I tried setting the property on the fields in question, but it didn't do the trick.
Is it possible to use the NotesMIMEEntity and set the "Store Contents as HTML and MIME" property after the content has been added, to export the field rendered as HTML?
Or what are my options for exporting the Notes database Rich Text fields as HTML?
Bonus information: I'm using IBM Lotus Domino Designer version 8.5

There is a little-known URL command that does exactly what you want: retrieve the field through a URL using the OpenField command.
Example that converts only the Body field:
http://SERVER/your_database_path.nsf/NEW_VIEW/docid/Body?OpenField

Here is how I did it, using the OpenField command (see D.Bugger's post above):
Function GetHtmlFromField(doc As NotesDocument, fieldname As String) As String
    Dim obj
    Set obj = CreateObject("Microsoft.XMLHTTP")
    obj.open "GET", "http://www.mydomain.dk/database.nsf/0/" + doc.Universalid + "/" + fieldname + "?openfield&charset=utf-8", False, "", ""
    obj.send("")
    Dim html As String
    html = Trim$(obj.responseText)
    GetHtmlFromField = html
End Function
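For comparison, a Python sketch of the same OpenField request using only the standard library. The host, database path, and helper names are placeholders; like the LotusScript version, it assumes the server answers the request without a login:

```python
from urllib.parse import quote, urlunsplit
from urllib.request import urlopen


def openfield_url(host, db_path, unid, field, view="0"):
    """Build http://host/db_path/view/unid/field?OpenField&charset=utf-8 (hypothetical helper)."""
    # quote() percent-encodes spaces etc. but keeps "/" so subfolders in db_path survive.
    path = "/" + "/".join(quote(part) for part in (db_path, view, unid, field))
    return urlunsplit(("http", host, path, "OpenField&charset=utf-8", ""))


def get_html_from_field(host, db_path, unid, field, view="0", timeout=30.0):
    """Fetch the server-rendered HTML of one field, with surrounding whitespace stripped."""
    with urlopen(openfield_url(host, db_path, unid, field, view), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```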

I'd suggest looking at Midas' Rich Text LSX (http://www.geniisoft.com/showcase.nsf/MidasLSX)
I haven't used them personally, but I remember them from years ago as the best option for working with Rich Text. I'd bet it saves you a lot of headaches.
As for the NotesMIMEEntity class, I don't believe there is a way to convert RichText to MIME, only MIME to RichText (or retain the MIME within the document for emailing purposes).

If you upgrade to Notes/Domino 8.5.1, you can use the new ConvertToMIME method of the NotesDocument class (see the docs). This should do what you want.
Alternatively, the easiest way to get the Domino server to render the RichText is to retrieve it via a URL call. Set up a simple form that contains just the RichText field, then use your favourite HTTP API to pull in the page. It should then be straightforward to pull out the body.
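For the final step, pulling the body out of the fetched page, a sketch with BeautifulSoup (an assumption; any HTML parser would do) could return the inner HTML of <body>. The helper name is hypothetical:

```python
from bs4 import BeautifulSoup


def extract_body_html(page: str) -> str:
    """Return the inner HTML of <body>, or "" if the page has no <body> element."""
    soup = BeautifulSoup(page, "html.parser")
    body = soup.body  # shorthand for soup.find("body")
    return body.decode_contents() if body is not None else ""
```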

Keep it simple.
Change the BODY field to "Store contents as HTML and MIME".
Open the doc in edit mode.
Save.
Close.
You can now use the NotesMIMEEntity to get what you need from script.

You can use the NotesDXLExporter class to export the Rich Text and use an XSLT to transform the output to what you need.

I know you mentioned using LotusScript, but if you don't mind writing a small Java agent (in the Notes client), this can be done fairly easily - and there is no need to modify the existing form design.
The basic idea is to have your Java code open a particular document through a localhost http request (which is simple in Java) and to have your code capture that html output and save it back to that document. You basically allow the Domino rendering engine to do the heavy lifting.
You would want to do this:
Create a form which contains only the rich-text field you want to convert, and which has a Content Type of HTML
Create a view with a selection formula for all of the documents you want to convert, and with a form formula which computes to the new form
Create the Java agent which just walks your view, and for each document gets its docid, opens a URL in the form http://SERVER/your_database_path.nsf/NEW_VIEW/docid?openDocument, grabs the http response and saves it.
I put up some sample code in a similar SO post here:
How to convert text and rich text fields in a document to html using lotusscript?

Works in Domino 10 (have not tested with 9)
HTMLString$ = notesRichTextItem.ConvertToHTML([options])  ' returns a String
See the documentation:
https://help.hcltechsw.com/dom_designer/10.0.1/basic/H_CONVERTOHTML_METHOD_NOTESRICHTEXTITEM.html
UPDATE (2022)
HCL no longer supports this method as of version 11, and the documentation no longer includes any information about it.
I have run some tests and it still works in v12, but HCL recommends not using it.

Casper's recommendation above works well, but make sure the ACL allows Anonymous access; otherwise the HTML you get back will be that of your login form.

If you do not specifically need the rich text from individual items, you can use ?OpenDocument, which is documented (at least) here: https://www.ibm.com/developerworks/lotus/library/ls-Domino_URL_cheat_sheet/
https://www.ibm.com/support/knowledgecenter/SSVRGU_9.0.1/com.ibm.designer.domino.main.doc/H_ABOUT_URL_COMMANDS_FOR_OPENING_DOCUMENTS_BY_KEY.html
OpenDocument also allows you to expand sections (I am unsure whether OpenField does).
The syntax is:
http://Host/Database/View/DocumentUniversalID?OpenDocument
Be sure to include the charset parameter as well: Japanese documents were unreadable without specifying utf-8 as the charset.
Here is the method I use that takes a NotesDocument and returns the HTML for the doc as a string.
private string ConvertDocumentToHtml(Domino.NotesDocument doc, string sectionList = null)
{
    // Requires: using System; using System.IO; using System.Net;
    var server = doc.ParentDatabase.Server.Split('/')[0];
    var dbPath = doc.ParentDatabase.FilePath;
    string viewName = "0";
    string documentId = doc.UniversalID.ToUpper();

    var ub = new UriBuilder();
    ub.Host = server;
    ub.Path = dbPath.Replace("\\", "/") + "/" + viewName + "/" + documentId;
    // charset=utf-8 keeps non-Latin text (e.g. Japanese) readable
    if (string.IsNullOrEmpty(sectionList))
    {
        ub.Query = "OpenDocument&charset=utf-8";
    }
    else
    {
        ub.Query = "OpenDocument&charset=utf-8&ExpandSection=" + sectionList;
    }

    var url = ub.ToString();
    var req = HttpWebRequest.CreateHttp(url);
    try
    {
        // Dispose both the response and the reader
        using (var resp = req.GetResponse())
        using (var sr = new StreamReader(resp.GetResponseStream()))
        {
            return sr.ReadToEnd();
        }
    }
    catch (WebException)
    {
        return "";
    }
}
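A Python port of the URL-building part of the C# method, as a sketch: it keeps the "0" view name, upper-cases the UNID, converts backslashes in the file path, and appends the section list unchanged, like the C# code. The function name is hypothetical, and the request itself could then be made with urllib.request.urlopen:

```python
from typing import Optional
from urllib.parse import urlunsplit


def domino_opendocument_url(host: str, db_path: str, unid: str,
                            section_list: Optional[str] = None, view: str = "0") -> str:
    """Mirror the OpenDocument URL built by the C# method (hypothetical helper)."""
    # Domino file paths may use backslashes; URLs need forward slashes.
    path = "/" + db_path.replace("\\", "/") + "/" + view + "/" + unid.upper()
    query = "OpenDocument&charset=utf-8"
    if section_list:
        query += "&ExpandSection=" + section_list
    return urlunsplit(("http", host, path, query, ""))
```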