Remove html from RichTextField - wagtail

I'm trying to remove the HTML that wraps the RichTextField content. I thought I could do it using "raw_data", but that doesn't seem to work. I could use a regex to remove it, but surely there is a Wagtail/Django way to do this?
for block in post.faq.raw_data:
    print(block['value']['answer'])
Outputs:
<p data-block-key="y925g">The time is almost 4.30</p>
Expected output (just the raw text):
The time is almost 4.30
StructBlock:
class FaqBlock(blocks.StructBlock):
    question = blocks.CharBlock(required=False)
    answer = blocks.RichTextBlock(required=False)

You can do this in Beautiful Soup easily.
soup = BeautifulSoup(unescape(html), "html.parser")
inner_text = ' '.join(soup.find_all(string=True))
In your case, html = value.answer, which you can pass into a template tag or filter.
EDIT: example filter:
from bs4 import BeautifulSoup
from django import template
from html import unescape
register = template.Library()
@register.filter()
def plaintext(richtext):
    return BeautifulSoup(unescape(richtext), "html.parser").get_text(separator=" ")
BeautifulSoup has a get_text() method that takes a separator; it does the same as the join statement I wrote earlier. The default separator is the empty string, which joins all the text elements together without a gap.
<h3>Rich Text</h3>
<p>{{ page.intro|richtext }}</p>
<h3>Plain Text</h3>
<p>{{ page.intro|plaintext }}</p>
If you want to retain line breaks, it needs a bit more parsing to replace block elements with a \n. The streamvalue.render_as_block() method does that for you, but there's no method like this for RichTextField since it's just a string. You can find code examples to do this if you need.
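For the line-break case mentioned above, here is a minimal sketch that uses only Python's standard-library html.parser rather than Beautiful Soup. The function name plaintext_with_breaks, the helper class, and the BLOCK_TAGS set are hypothetical names chosen for illustration, and the list of block tags is an assumption you would adjust to whatever your rich text features actually emit.

```python
from html.parser import HTMLParser

# Tags whose closing tag should end a line. This set is an assumption
# made for this sketch, not an exhaustive list of block-level elements.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and inserts newlines at block boundaries."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_starttag(self, tag, attrs):
        # <br> and <br/> both arrive here; treat them as a line break.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")


def plaintext_with_breaks(richtext):
    """Return the text of an HTML string, one line per block element."""
    parser = _TextExtractor()
    parser.feed(richtext)
    parser.close()
    # Trim each line and drop the empty ones left by nested or adjacent blocks.
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)
```

Called on the value from the question, plaintext_with_breaks('<p data-block-key="y925g">The time is almost 4.30</p>') returns 'The time is almost 4.30'. The same function could be registered as a template filter in place of the Beautiful Soup version.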


Hugo code fences output two tags, pre and code

Is there any way to tweak how Hugo output codefences?
if I have some markdown like so:
```csharp
//some code
```
It will be generated as:
<pre class="language-csharp">
<code class="language-csharp">
//some code
Can I somehow change the pre+code output?
I'm trying to integrate Mermaid.js into my site and this fails due to having the two tags.
If it manages to hook onto the code tag, the Mermaid output is just shown as code inside the pre
And if it hooks onto the pre, then the inner text is wrong and can't be parsed.
For anyone stuck on this issue, here is how I ended up solving it.
In the template for our pages, we take the content of the markdown file.
Then find-replace language-mermaid with just mermaid.
This prevents collision with other libraries like Prism.JS.
And it allows Mermaid.JS to correctly find the proper tag and class to hook into.
<div>
{{ $content := .Content }}
//other replace hacks ....
//...
{{ $content = replace $content "language-mermaid" "mermaid" }}
{{ safeHTML $content}}
This results in generated files containing the following output.
<pre>
<code class="mermaid">
...
It's an ugly hack, but it works, so that is good enough for us right now.
As of 6 March 2022, this is not possible. According to the official documentation, only images, links, and headings can be adjusted in this way.
However, you should be able to create your own shortcode that provides the features you need.

Dangerously Set innerHTML React

I have React frontend and strapi backend.
When inserting data into my strapi backend, the resulting output in my frontend contains html elements.
How can I show the output without the HTML elements? I have the following Gatsby code block,
import ReactMarkdown from "react-markdown"
<ReactMarkdown children={info_} />
The data within {info_} is output with the HTML elements. How can I use dangerouslySetInnerHTML in my code, or is there some other way to achieve this?
If you display an html node within the dangerouslySetInnerHTML property, you put your application at risk for XSS attacks. Long story short, the html could contain malicious code that would harm the user. If you do it, you need to sanitize the content before displaying it. The best option would be to use a battle-tested library such as sanitize-html-react.
You can use DOMParser to create a document from your HTML input and then extract the text like this:
new DOMParser().parseFromString(info_, 'text/html').body.textContent;
Here's an example using a functional form:
I tried putting this into a snippet demo, but the Stack Overflow snippet environment doesn't like something about the syntax. 🤷 ☹️ You can copy and paste it in your JS console to try it.
Note that the embedded script never runs, but its source text is included in the output. If you want just part of the created document's text, you can use a method like Document.querySelector on the created document rather than its body.
function getTextFromHtml (html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  return doc.body.textContent ?? '';
}
// Use:
// assuming `info_` is a string of valid HTML like this:
const info_ = `
<div>
<p>Some text</p>
<p>Some more text</p>
<script>console.log('This script executed!')</script>
</div>
`;
const textContent = getTextFromHtml(info_);
console.log(textContent);
Afterward, you'll have plain text, so you won't need dangerouslySetInnerHTML.

Fragment id linking in wagtail's rich text content

I have a bunch of content in a Wagtail 2.0 rich text field that looks like
Page heading
(intro blurb)
heading 1
(heading-1-relevant text)
heading 2
(heading-2-relevant text)
...
and I would like to give each heading an id so that any text can be made a link to jump to the relevant content. I can't seem to find an option to give headings an explicit id, and the "link" button in the rich text editor does not seem to let me pick active fragment identifiers in the content.
Is there a way to add fragment identifier based navigation on the same page work with Wagtail's rich text editor?
I'm revisiting my own question a year later because this is still something we need. The solution we came up with is to simply wrap the RichText HTML serialization and put fragment id injection on top:
import re
from django import template
from django.utils.text import slugify
from wagtail.core.rich_text import RichText
# We'll be wrapping the original RichText.__html__(), so make
# sure we have a reference to it that we can call.
__original__html__ = RichText.__html__
# This matches an h1/.../h6, using a regexp that is only
# guaranteed to work because we know that the source of
# the HTML code we'll be working with generates nice
# and predictable HTML code (and note the non-greedy
# "one or more" for the heading content).
heading_re = r"<h([1-6])([^>]*)>(.+?)</h\1>"
def add_id_attribute(match):
    """
    This is a regexp replacement function that takes
    in the above regex match results, and then turns:
        <h1>some text</h1>
    Into:
        <h1><a id="some-text"></a>some text</h1>
    where the id attribute value is generated by running
    the heading text through Django's slugify() function.
    """
    n = match.group(1)
    attributes = match.group(2)
    text_content = match.group(3)
    id = slugify(text_content)
    return f'<h{n}{attributes}><a id="{id}"></a>{text_content}</h{n}>'

def with_heading_ids(self):
    """
    We don't actually change how RichText.__html__ works, we just replace
    it with a function that does "whatever it already did", plus a
    substitution pass that adds fragment ids and their associated link
    elements to any headings that might be in the rich text content.
    """
    html = __original__html__(self)
    return re.sub(heading_re, add_id_attribute, html)
# Rebind the RichText's html serialization function such that
# the output is still entirely functional as far as wagtail
# can tell, except with headings enriched with fragment ids.
RichText.__html__ = with_heading_ids
This works rather well, does not require any hacking in draftail or wagtail, and is very easy to enable/disable simply by loading this code as part of the server startup process (we have it living in our wagtailcustom_tags.py file, so when Django loads up all template tag sets, the RichText "enrichment" kicks in automatically).
We had initially tried to extend the ... | richtext template filter, but while that's entirely possible, that only works for custom blocks we ourselves wrote, with our own custom templates, and so turned out to not be a solution given the idea that it should "just work".
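To see what the substitution does without a running Wagtail site, the regex pass can be tried on a plain string. This is only a sketch: simple_slugify is a hypothetical, simplified stand-in for Django's slugify() (it does not reproduce Django's Unicode normalisation), and add_heading_ids is an illustrative name, not part of Wagtail.

```python
import re

# Same pattern as in the answer above: an h1..h6 element, its attributes,
# and a non-greedy match of the heading content.
heading_re = r"<h([1-6])([^>]*)>(.+?)</h\1>"


def simple_slugify(text):
    # Hypothetical stand-in for django.utils.text.slugify: lowercase the text,
    # drop anything that is not a word character, whitespace, or hyphen,
    # then collapse runs of whitespace and hyphens into single hyphens.
    text = re.sub(r"[^\w\s-]", "", text.lower())
    return re.sub(r"[-\s]+", "-", text).strip("-_")


def add_id_attribute(match):
    # Rebuild the heading with an empty anchor carrying the fragment id.
    n, attributes, text_content = match.groups()
    heading_id = simple_slugify(text_content)
    return f'<h{n}{attributes}><a id="{heading_id}"></a>{text_content}</h{n}>'


def add_heading_ids(html):
    # Apply the substitution to every heading found in the HTML string.
    return re.sub(heading_re, add_id_attribute, html)


print(add_heading_ids("<h2>Heading 1</h2><p>heading-1-relevant text</p>"))
```

The paragraph is left untouched and only the heading gains an anchor, so a link to #heading-1 elsewhere on the page would jump to it. Note that if a heading contains inline markup, the markup is passed to the slug function as well, which is one reason the original answer stresses that it relies on predictable HTML.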
To have control over the structure of your page body, it's preferable to encourage users to use heading blocks, rather than headings within the rich text block. Then you can have a heading block type which has two fields, a 'text' and an 'id', and you can specify a template that outputs the h element with the id attribute.
class Heading2Block(blocks.StructBlock):
    heading = blocks.CharBlock(classname='full title')
    link_id = blocks.CharBlock(help_text='For making hyperlinks to this heading')

    class Meta:
        template = 'blocks/h2.html'
Put the following in blocks/h2.html:
<h2{% if value.link_id %} id="{{ value.link_id|slugify }}"{% endif %}>{{ value.heading }}</h2>
In earlier versions of Wagtail it was possible to remove the h widget from the Hallo.js rich text editor, and this was a good way of encouraging user adoption of the heading block. A similar restriction is not currently available in Draftail, but there is a pull request which reimplements it.

Angularjs translate in nested tag

Good day,
I'm trying to translate this portion of HTML using the directive approach:
<h1>First text to translate<small>Second text to translate</small></h1>
But I encounter some difficulties. For example if I try:
<h1 translate>KEY<small>Second text to translate</small></h1>
the key is not translated and appears verbatim on the page. If I try:
<h1 translate="KEY"><small>Second text to translate</small></h1>
this time the key is translated but the second text disappears.
To make it work I must use the translate service inside the controller or remove the nesting. Any advice?
You can use it as a filter instead of directive:
<h1>{{'KEY' | translate}}<small>{{'Second text to translate' | translate}}</small></h1>
See https://angular-translate.github.io/docs/#/api/pascalprecht.translate.filter:translate

How to format carriage returns in a Backbone model in a Mustache template

I'm using Backbone models as input into Mustache templates to generate HTML.
I have a Backbone model with a number of attributes, such as name, description and id. The description attribute can contain carriage returns, which I want to render as <br> tags when they're rendered in the template.
By default, Mustache simply outputs the carriage returns directly, so the markup looks tidy, but the rendered result has no breaks.
I don't particularly want to replace \n\r in the description attribute, as that property could be used elsewhere (e.g. in alt or meta tags).
The only idea I have so far is to add a duplicate description attribute that has the formatted text.
Is there nothing in Mustache that formats HTML line breaks as <br> tags?
Mustache is very limited on purpose. If you need anything special in a Mustache template, you prepare your data in JavaScript so that Mustache's interpolation and loops can handle it. In your case, that means splitting your string on EOLs to get an array:
// Adjust the regex to suit your data, this one is pretty loose.
var lines = string.split(/[\r\n]+/)
                  .map(function(line) { return { line: line } });
and then loop over that array in Mustache:
{{#lines}}
{{line}}<br>
{{/lines}}
mu is too short's answer is correct. I just want to add that the .map function isn't supported in IE8 (and older).
I ended up using a loop to achieve the same effect, as we need to support IE8:
var descriptionArray = description.split(/[\r\n]+/);
var descriptionLines = new Array();
for (var line = 0; line < descriptionArray.length; line++) {
    // Use the same lowercase key as the {{line}} tag in the template above;
    // Mustache keys are case-sensitive.
    descriptionLines.push({ line: descriptionArray[line] });
}
