How to remove alt attribute of img tag in HTML Purifier? - htmlpurifier

By default, HTML Purifier adds an alt attribute to each img tag (really annoying behavior). So
<img src="123.jpg" />
becomes
<img src="123.jpg" alt="123.jpg" />
The documentation mentions an Attr.DefaultImageAlt option. It defaults to NULL, which means the basename of the src attribute is used for the alt. When I set Attr.DefaultImageAlt to an empty string, the result becomes
<img src="123.jpg" alt="" />
Can anyone suggest how to get rid of the alt attribute completely?

What you're observing stems from the fact that the alt attribute is mandatory for img tags according to the standards, and HTML Purifier takes the standards into account.
That means that, unless you tweak its fundamental HTML handling behaviour (be it by patching HTML Purifier, or by overriding its understanding of certain tags or attributes), HTML Purifier cannot be made to omit the alt= attribute.
(Browsers actually have a similar behaviour, though it may not be as apparent - if you remove alt=, they will still have an internal alt= value that they use instead.)
If this information doesn't change your opinion on how to handle the attribute, read on:
Patching
(i.e. changing the behaviour by changing the HTML Purifier source code.)
If you want to patch HTML Purifier to allow alt to be absent, you should patch library/HTMLPurifier/AttrTransform/ImgRequired.php. You can also see how the Attr.DefaultImageAlt directive is used there - if you supply a value of null (rather than an empty string), part of the filename will be used as the alt value.
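To make the null-versus-empty-string distinction concrete, here is a rough JavaScript model of the defaulting behaviour described above. It is a sketch based only on that description (the function name and its arguments are hypothetical), not HTML Purifier's PHP source, and it ignores the case of an img without a src:

```javascript
// Hypothetical model of how a missing alt gets filled in, based on the
// behaviour described above; this is NOT HTML Purifier's (PHP) implementation.
function fillMissingAlt(attrs, defaultImageAlt = null) {
  const result = { ...attrs };
  if (result.alt !== undefined) {
    return result; // an alt that is already present is left alone
  }
  if (defaultImageAlt !== null) {
    // Any non-null setting, including "", is used verbatim. This is why
    // an empty string produces alt="" rather than no alt attribute at all.
    result.alt = defaultImageAlt;
  } else if (typeof result.src === "string") {
    // null: fall back to the part of src after the last "/".
    result.alt = result.src.slice(result.src.lastIndexOf("/") + 1);
  }
  return result;
}

console.log(fillMissingAlt({ src: "123.jpg" }));     // { src: '123.jpg', alt: '123.jpg' }
console.log(fillMissingAlt({ src: "123.jpg" }, "")); // { src: '123.jpg', alt: '' }
```

In neither branch does the attribute disappear, which matches what the question observes.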
Overriding
(i.e. changing the behaviour without changing the HTML Purifier source code.)
If you want to override the HTML Purifier behaviour, check out the Customize! documentation on the HTML Purifier site.
Without having tested it, I believe you need to make two changes to override the behaviour you see:
1) Make alt non-mandatory:
$config = HTMLPurifier_Config::createDefault();
$htmlDef = $config->getHTMLDefinition(true);
$htmlDef->addAttribute('img', 'alt', new HTMLPurifier_AttrDef_Text());
The absence of a * after alt (a trailing * marks an attribute as required) should help you there.
2) Remove or replace the ImgRequired attribute-transformation.
You can see that the HTMLPurifier_AttrTransform_ImgRequired class ends up getting registered to both $htmlDef->info_attr_transform_post['img'] and $htmlDef->info_attr_transform_pre['img'] in library/HTMLPurifier/HTMLModule/Image.php. You should be able to do something like this:
$htmlDef->info_attr_transform_pre['img'] = array();
$htmlDef->info_attr_transform_post['img'] = array();
// You can *replace* the old behaviour with your own by writing
// your own class and loading it here:
// $htmlDef->info_attr_transform_pre['img'][] = new YourOwnClass();
// $htmlDef->info_attr_transform_post['img'][] = new YourOwnClass();
There may be some roadblocks on the way to getting this to work (e.g. the class may be registered somewhere subtly different from where I just said it would be; it's been a few years since I tinkered with HTML Purifier on this level!), but this should set you on a good path to getting your hands dirty with HTML Purifier code. :)

Related

How can my template include an element whose type is determined by an expression in angularjs?

It's 2022 and sadly I'm learning AngularJS (already past end of life!)
I need to use what might be called a dynamic element/component. Pseudocode example:
In controller:
this.theElementName = 'b';
In the template:
<{{$ctrl.theElementName}}>this is some text</{{$ctrl.theElementName}}>
I would want this to create <b>this is some text</b>.
The reason is that I want to generate an array of different directives to render, and I don't want code like:
<b ng-if="$ctrl.theElementName === 'b'">this is some text</b>
<div ng-if="$ctrl.theElementName === 'div'">this is some text</div>
<directive-abc ng-if="$ctrl.theElementName === 'directive-abc'">this is some text</directive-abc>
...
In Svelte, it's
<svelte:element this={theElementName} />
In Vue it's
<component :is="theElementName" />
EDIT: in response to the reluctant 'that person', clarifying the use-case
Consider a user-configurable UI. The result of the configuration might be an array of the desired components. I would then need to loop over it and output those different components in my template. Of course, the components would need a standard interface for the properties passed in, events emitted, etc., but that can all be designed for.
My code could do a big switch statement, but that requires prior knowledge of every possible component that might be used now or in the future. By doing it the way I intend to, however, a future person could add a component without needing to touch this code.
You can write a directive, my-directive, and use it like this:
<div my-directive="$ctrl.theElementName">...
to generate:
<div><component-a>...
<div><component-b>...
<div><component-c>...
All the directive has to do is generate an HTML string and compile it:
element.append($compile('<' + scope.myDirective + '>...')(scope))
(Also remember to update the content when the value changes, e.g. in $onChanges, if you want to support that.)
The directive may also copy some or all attributes from the original element, etc.
P.S. Be cautious: if the component name comes from a database, for example, building HTML from it may allow injection.
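If the element name can come from configuration or a database, one way to act on that warning is to validate the name before it is concatenated into the string passed to $compile. The sketch below is an illustration under assumptions: buildTemplate and TAG_NAME are hypothetical names, and the regular expression simply accepts lowercase tag/directive-style names such as b, div or directive-abc.

```javascript
// Hypothetical guard for the string that the directive hands to $compile.
// Accepts lowercase names made of letters, digits and hyphens, starting
// with a letter (e.g. "b", "div", "directive-abc").
const TAG_NAME = /^[a-z][a-z0-9-]*$/;

function buildTemplate(name, allowed) {
  if (typeof name !== "string" || !TAG_NAME.test(name)) {
    throw new Error("Refusing to compile suspicious element name: " + String(name));
  }
  // Optional explicit allowlist (a Set of names) for stricter setups.
  if (allowed !== undefined && !allowed.has(name)) {
    throw new Error("Element name not in allowlist: " + name);
  }
  return "<" + name + "></" + name + ">";
}

console.log(buildTemplate("directive-abc")); // <directive-abc></directive-abc>
```

Inside the directive's link function, the call would then become something like element.append($compile(buildTemplate(scope.myDirective))(scope)), so a value such as img src=x onerror=... is rejected instead of being compiled.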
Not a brilliant solution, but documenting what is more of a workaround.
ng-include can be used to source another template file. That file can contain the component you need to include.
<ng-include src="'/path/to/' + theElementName + '.html'"></ng-include>

Getting wordpress posts with react shows special chars instead of apostrophe

I am getting what I am assuming is json data from a wordpress blog endpoint like so:
https://example.com/wp-json/wp/v2/posts
I am looping through and showing the titles for now:
<div>{posts && posts.map((post) => <h1 key={post.id}>{post.title.rendered}</h1>)}</div>
But the post titles are not displaying properly. For example, the word Don't shows up as Don&#8217;t.
I have discovered that I can use dangerouslySetInnerHTML to fix this issue but is it safe? The fact that it has the word 'dangerously' in it is worrying.
I believe dangerouslySetInnerHTML is the way to go about this - but I will go into more detail as to why "dangerously" is in "dangerouslySetInnerHTML" and hopefully that will help you make an informed decision for your situation.
What dangerouslySetInnerHTML does is render any HTML string given to it within the DOM element.
For example:
<h1 dangerouslySetInnerHTML={{__html: post.title.rendered}} />
(as an aside, note the __html key has two underscores)
This will properly render the string Don&#8217;t as Don’t.
This is all pretty harmless. However, if the value of post.title.rendered could be set by an untrusted party (such as an arbitrary user), and that user wanted to do some damage, they could enter a string such as:
<img src="x" onerror="console.log('I did some evil stuff')">
The browser would run that onerror handler as soon as the broken image fails to load, because React would have generated the following DOM:
<h1>
<img src="x" onerror="console.log('I did some evil stuff')">
</h1>
(A plain <script> element would not run in this position, because browsers do not execute scripts inserted via innerHTML, but inline event handlers such as onerror are executed.)
So with all that in mind, if you are sure that the value of this field is within your control (and not anyone else's) and you also know that there will not be any arbitrary code in these strings, then go ahead and use dangerouslySetInnerHTML.
However, if there is the possibility that someone besides yourself could manipulate this field, I would instead look to something like decode-html-entities - this way you can have the presentation you want, without compromising your app/users.
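If you go the decoding route, the idea is to turn the entities into plain characters and let React render the result as ordinary text, which it escapes. Below is a minimal sketch under that assumption: decodeHtmlEntities is a hypothetical helper that handles numeric entities such as &#8217; plus a few named ones, whereas a maintained library covers the full named-entity list.

```javascript
// Minimal, hypothetical entity decoder: numeric entities (decimal and hex)
// plus a few common named ones. Unknown entities are left untouched.
const NAMED_ENTITIES = new Map([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
  ["nbsp", "\u00a0"],
]);

function decodeHtmlEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Out-of-range code points are left as written instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match;
  });
}

console.log(decodeHtmlEntities("Don&#8217;t")); // Don’t
```

In the component this would be used as <h1 key={post.id}>{decodeHtmlEntities(post.title.rendered)}</h1>. Any real HTML tags in a title would then appear as literal text rather than being rendered, which is the safe way to fail.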

Problem with output from the database yii2

I have the paths to the files I want to output stored in the database, like:
<audio src="/yii2-biblioteca/frontend/web/uploads/audio/lya1.mp3" controls type="audio/mpeg">
and I am using:
<?=HtmlPurifier::process($model->audio)?>
for the output.
I used the same approach for images and it works, but not for the audio or the PDF embed.
At first the PDF worked; then I changed some things with a JS function that was not supposed to have a negative impact. I reverted everything to the state when it worked, but it still doesn't work.
The PDF example: <embed src="/yii2-biblioteca/frontend/web/uploads/pdf/dying.pdf" type="application/pdf" width="100%" height="100%" />
Yii2's HTMLPurifier wrapper takes a second argument:
echo HtmlPurifier::process($html, [
// options go here
]);
For <embed>, you should be able to use the HTML.SafeEmbed setting:
echo HtmlPurifier::process($html, [
'HTML.SafeEmbed' => true,
]);
Unfortunately, for <audio>, the underlying problem here is that HTML Purifier isn't HTML5-aware, which is going to make adding that a lot more complicated.
There are user-supplied patches to allow HTML Purifier to understand HTML5, but as far as I know, none has been audited and so it's hard to say what this will do to the security of your site. (Arguably, HTML Purifier with userland supplied HTML5 definitions is still better than no HTML Purifier at all, though.)
I've given some rough instructions about how to make HTML Purifier (the library itself, not its Yii2 wrapper) aware of only the <audio> tag over on another question. Quoting the relevant pieces:
You'll have to look at the "Customize!" end-user documentation, where it will tell you how to add tags and attributes that HTML Purifier is not aware of.
To quote the most vivid code example from the linked documentation
(this code teaches HTML Purifier about the <form> tag):
Time for some code:
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
$config->set('HTML.DefinitionRev', 1);
$config->set('Cache.DefinitionImpl', null); // remove this later!
$def = $config->getHTMLDefinition(true);
[...]
$form = $def->addElement(
'form', // name
'Block', // content set
'Flow', // allowed children
'Common', // attribute collection
array( // attributes
'action*' => 'URI',
'method' => 'Enum#get|post',
'name' => 'ID'
)
);
$form->excludes = array('form' => true);
Each of the parameters corresponds to one of the questions we asked. Notice that we added an asterisk to the end of the action attribute to indicate that it is required. If someone specifies a form without that attribute, the tag will be axed. Also, the extra line at the end is a special extra declaration that prevents forms from being nested within each other.
Once you've followed those instructions to make your purifying routine aware of <audio>, adding the tag <audio> to your configuration whitelist will work.
So, in brief, if you want to be able to purify just <audio> tags without losing them altogether, you're going to have to do some research on the tags' capability and add the information to HTML Purifier.
You could base your code on what you can find in xemlock/htmlpurifier-html5's HTML5Definition.php file if you don't want to work on it from scratch.

How to change valid HTML tags that get rendered in ng-bind-html?

I have a text editor (textAngular) that I've modified to limit the number of valid HTML tags I can generate using that tool. Now, I want to only support a limited number of HTML elements (h3, h4, h5, h6, ol, ul) to produce a news story, but I also want to disable some of the HTML that ng-bind-html treats as valid. Namely, I want to remove , tags as valid tags because they could have disastrous results for this user-generated content.
Is it possible to remove and tags as something rendered by ng-bind-html?
Unfortunately, no, it isn't possible to configure the valid HTML tags.
ng-bind-html uses the $sanitize service to strip invalid tags/attributes, and you can see in the source code that all the configuration is private:
// Safe Block Elements - HTML5
var blockElements = angular.extend({}, optionalEndTagBlockElements, makeMap("address,article," +
"aside,blockquote,caption,center,del,dir,div,dl,figure,figcaption,footer,h1,h2,h3,h4,h5," +
"h6,header,hgroup,hr,ins,map,menu,nav,ol,pre,script,section,table,ul"));
// Inline Elements - HTML5
var inlineElements = angular.extend({}, optionalEndTagInlineElements, makeMap("a,abbr,acronym,b," +
"bdi,bdo,big,br,cite,code,del,dfn,em,font,i,img,ins,kbd,label,map,mark,q,ruby,rp,rt,s," +
"samp,small,span,strike,strong,sub,sup,time,tt,u,var"));
If you really want it, one way to do it is to copy angular-sanitize.js and modify the valid HTML tags configuration directly.
Please note that if you do it that way, every ng-bind-html in your entire application will also be affected. If that is undesired, you have to write your own custom directive and inject/use your modified version of $sanitize instead.
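As a rough illustration of what modifying that configuration could look like, the sketch below re-implements a makeMap-style helper (turning a comma-separated list into a lookup object, as the excerpt above uses it) and builds a narrowed allowlist. The isValidElement function and the decision to include li are assumptions for illustration; in the real file, every map that gets merged in (optionalEndTagBlockElements, the inline elements, and so on) would need the same treatment.

```javascript
// makeMap-style helper: "a,b,c" -> { a: true, b: true, c: true }.
// Re-implemented here so the sketch runs on its own.
function makeMap(str) {
  const obj = {};
  for (const name of str.split(",")) {
    obj[name] = true;
  }
  return obj;
}

// Narrowed allowlist for the news-story editor. li is added (an assumption)
// because ol/ul are of little use without list items.
const validElements = makeMap("h3,h4,h5,h6,ol,ul,li");

// Hypothetical check a modified sanitizer could run on each start tag.
function isValidElement(tagName) {
  return Object.prototype.hasOwnProperty.call(validElements, tagName.toLowerCase());
}

console.log(isValidElement("H3"));     // true
console.log(isValidElement("script")); // false
```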
If you're already modifying textAngular, you could modify something around the taCustomRenderers section of the code and use ta-bind instead of ng-bind-html. They do nearly the same thing, except that ta-bind runs all the extra renderers.
Custom renderers code: textAngularSetup, textAngular (probably the place where you can strip out unwanted code).

Stop AngularJS inserting <span class="ng-scope"></span> using ng-include

I'm using the Foundation layout framework, which automatically floats the last sibling of .column to the right, and I really appreciate this behaviour. However, AngularJS takes it upon itself to insert span.ng-scope after every div.column, which somehow causes browsers to consider the last span the last sibling of .column (even though it is not).
Specifically the css in Foundation responsible for this is:
[class*="column"] + [class*="column"]:last-child { float: right; }
As I understand it, [attribute*="substring"] should select only siblings that match, so, for the above, only elements whose class attribute contains column (including columns). I would think a span tag whose class attribute does not contain column should not match (and thus be ignored by :last-child). However, this does not seem to be the case.
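The reason is that :last-child tests an element's position among all of its element siblings, not only among those that match the attribute selector, so an extra span at the end disqualifies the last column. The small JavaScript model below illustrates this; it is a hypothetical simulation over plain objects, not a DOM or CSS engine API, and it assumes a span.ng-scope follows each column as described.

```javascript
// Models `[class*="column"] + [class*="column"]:last-child` over a list of
// sibling elements, each given as { tag, className }.
function hasColumnClass(el) {
  return el.className.includes("column"); // [class*="column"] is a substring match
}

function matchesFloatRule(siblings, index) {
  const el = siblings[index];
  const prev = siblings[index - 1]; // `+` looks at the immediately preceding sibling
  const isLastChild = index === siblings.length - 1; // position among ALL siblings
  return hasColumnClass(el) && prev !== undefined && hasColumnClass(prev) && isLastChild;
}

function floatedIndexes(siblings) {
  return siblings.map((_, i) => i).filter((i) => matchesFloatRule(siblings, i));
}

const plain = [
  { tag: "div", className: "four columns" },
  { tag: "div", className: "eight columns" },
];
const withNgInclude = [
  { tag: "div", className: "four columns" },
  { tag: "span", className: "ng-scope" },
  { tag: "div", className: "eight columns" },
  { tag: "span", className: "ng-scope" },
];

console.log(floatedIndexes(plain));         // [ 1 ]
console.log(floatedIndexes(withNgInclude)); // []
```

With the spans present, both conditions fail: no column is directly preceded by another column, and the last child is a span rather than a column.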
Regardless, the span is causing the problem:
Angular buggering it up (jsfiddle)
Works fine without Angular (same jsfiddle, no ng-include)
Is there a way to configure angular to stop inserting those span tags? I would, begrudgingly, modify the css selector to somehow ignore all span tags; however I might eventually need/want to use a span tag.
Since you indicated the div can be moved inside, this works:
<ng-include src="'main.tmpl'"></ng-include>
Then in your template:
<div class="row">
<article id="sidepanels" class="four columns">
...
</div>
I'm not aware of any way to prevent angular from inserting the span tags (I think it keeps track of scopes that way -- for garbage collection).
Also, you can try my version of the include directive, which does not create a scope: Gist source.
As no scopes are created, AngularJS should not create an additional element to maintain the scope (it actually uses data attributes to store a link to the scope).
