Configuration Documentation
Table of Contents
- Attr
- AutoFormat
- CSS
- Cache
- Core
  - AggressivelyFixLt
  - AllowHostnameUnderscore
  - CollectErrors
  - ColorKeywords
  - ConvertDocumentToFragment
  - DirectLexLineNumberSyncInterval
  - DisableExcludes
  - EnableIDNA
  - Encoding
  - EscapeInvalidChildren
  - EscapeInvalidTags
  - EscapeNonASCIICharacters
  - HiddenElements
  - Language
  - LexerImpl
  - MaintainLineNumbers
  - NormalizeNewlines
  - RemoveInvalidImg
  - RemoveProcessingInstructions
- Filter
- HTML
  - Allowed
  - AllowedAttributes
  - AllowedComments
  - AllowedCommentsRegexp
  - AllowedElements
  - AllowedModules
  - Attr.Name.UseCDATA
  - BlockWrapper
  - CoreModules
  - CustomDoctype
  - DefinitionID
  - DefinitionRev
  - Doctype
  - FlashAllowFullScreen
  - ForbiddenAttributes
  - ForbiddenElements
  - MaxImgLength
  - Nofollow
  - Parent
  - Proprietary
  - SafeEmbed
  - SafeIframe
  - SafeObject
  - SafeScripting
  - TargetBlank
  - TidyAdd
  - TidyLevel
  - TidyRemove
  - Trusted
- Output
- Test
- URI
Types
string
: String
istring
: Case-insensitive string
text
: Text
itext
: Case-insensitive text
int
: Integer, cast using (int).
float
: Float. A numeric string (as determined by is_numeric()) is also accepted, and will be cast to a float using (float).
bool
: Boolean. You may also pass an integer 0 or 1 (other integers are not permitted) or a string: "on", "true" or "1" for true, and "off", "false" or "0" for false.
lookup
: Lookup array. An array whose values are all true, e.g. array('key' => true, 'key2' => true). You are alternatively permitted to pass an array list of the keys, array('key', 'key2'), or a comma-separated string of keys, "key, key2". If you pass an array list of values, ensure that your values are strictly numerically indexed: array('key1', 2 => 'key2') will not do what you expect and emits a warning.
list
: Array list, e.g. array('val1', 'val2'). You are alternatively permitted to pass a comma-separated string of values, "val1, val2". If your array is not in this form, array_values is run on the array and a warning is emitted.
hash
: Associative array, e.g. array('key1' => 'val1', 'key2' => 'val2'). You are alternatively permitted to pass a comma-separated string of key-colon-value strings, e.g. "key1: val1, key2: val2".
mixed
: Mixed
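As a rough illustration of the alternative input forms listed above, the sketch below gives one lookup directive the same value in three ways and one boolean directive in several accepted spellings. The include path is a placeholder, and the dotted 'Namespace.Directive' key passed to set() assumes HTML Purifier 4.x.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Lookup array: three equivalent spellings of the same value.
$config->set('Attr.AllowedFrameTargets', array('_blank' => true, '_self' => true));
$config->set('Attr.AllowedFrameTargets', array('_blank', '_self'));
$config->set('Attr.AllowedFrameTargets', '_blank, _self');

// Boolean: a real boolean, the integers 0/1, or one of the listed strings.
$config->set('AutoFormat.Linkify', true);
$config->set('AutoFormat.Linkify', 1);
$config->set('AutoFormat.Linkify', 'on');
```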
Attr
Attr.AllowedClasses
Version added | 4.0.0 |
---|---|
Type | Lookup array (or null) |
Default | NULL |
Attr.AllowedFrameTargets
Type | Lookup array |
---|---|
Default | array() |
Attr.AllowedRel
Version added | 1.6.0 |
---|---|
Type | Lookup array |
Default | array() |
Attr.AllowedRev
Version added | 1.6.0 |
---|---|
Type | Lookup array |
Default | array() |
Attr.ClassUseCDATA
Version added | 4.0.0 |
---|---|
Type | Boolean (or null) |
Default | NULL |
Attr.DefaultImageAlt
Version added | 3.2.0 |
---|---|
Type | String (or null) |
Default | NULL |
Attr.DefaultInvalidImage
Type | String |
---|---|
Default | '' |
Attr.DefaultInvalidImageAlt
Type | String |
---|---|
Default | 'Invalid image' |
Attr.DefaultTextDir
Type | String |
---|---|
Allowed values | "ltr", "rtl" |
Default | 'ltr' |
Attr.EnableID
Version added | 1.2.0 |
---|---|
Type | Boolean |
Default | false |
Aliases | HTML.EnableAttrID |
Attr.ForbiddenClasses
Version added | 4.0.0 |
---|---|
Type | Lookup array |
Default | array() |
Attr.IDBlacklist
Type | Array list |
---|---|
Default | array() |
Attr.IDBlacklistRegexp
Version added | 1.6.0 |
---|---|
Type | String (or null) |
Default | NULL |
Attr.IDPrefix
Version added | 1.2.0 |
---|---|
Type | String |
Default | '' |
Attr.IDPrefixLocal
Version added | 1.2.0 |
---|---|
Type | String |
Default | '' |
AutoFormat
AutoFormat.AutoParagraph
Version added | 2.0.1 |
---|---|
Type | Boolean |
Default | false |
This directive turns on auto-paragraphing, where double newlines are converted into paragraphs whenever possible. Auto-paragraphing:
- Always applies to inline elements or text in the root node,
- Applies to inline elements or text with double newlines in nodes that allow paragraph tags,
- Applies to double newlines in paragraph tags.
p tags must be allowed for this directive to take effect. We do not use br tags for paragraphing, as that is semantically incorrect.
To prevent auto-paragraphing as a content-producer, refrain from using double newlines except to specify a new paragraph or in contexts where they have special meaning (whitespace usually has no meaning except in tags like pre, so this should not be difficult). To prevent the paragraphing of inline text adjacent to block elements, wrap it in div tags (the behavior is slightly different outside of the root node).
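A minimal sketch of turning the directive on, assuming HTML Purifier 4.x and a placeholder include path. The input string and the element whitelist are made up for illustration; the only requirement taken from the text above is that p is allowed.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('AutoFormat.AutoParagraph', true);
// p must be permitted, otherwise the directive cannot take effect.
$config->set('HTML.Allowed', 'p,b,i,a[href]');

$purifier = new HTMLPurifier($config);

// The blank line between the two sentences marks a paragraph break.
echo $purifier->purify("First paragraph.\n\nSecond paragraph.");
```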
AutoFormat.Custom
Version added | 2.0.1 |
---|---|
Type | Array list |
Default | array() |
This directive can be used to add custom auto-format injectors. Specify an array of injector names (class name minus the prefix) or concrete implementations. Injector class must exist.
AutoFormat.DisplayLinkURI
Version added | 3.2.0 |
---|---|
Type | Boolean |
Default | false |
This directive turns on the in-text display of URIs in <a> tags, and disables those links. For example, <a href="http://example.com">example</a> becomes example (http://example.com).
AutoFormat.Linkify
Version added | 2.0.1 |
---|---|
Type | Boolean |
Default | false |
This directive turns on linkification, auto-linking http, ftp and https URLs. a tags with the href attribute must be allowed.
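The following sketch shows one way to enable linkification, assuming HTML Purifier 4.x and a placeholder include path. The sample input is invented; the whitelist simply satisfies the requirement that a with href is allowed.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('AutoFormat.Linkify', true);
// a[href] must be allowed so that bare URLs can be wrapped in links.
$config->set('HTML.Allowed', 'p,a[href]');

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<p>See http://example.com for details.</p>');
```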
AutoFormat.PurifierLinkify.DocURL
Version added | 2.0.1 |
---|---|
Type | String |
Default | '#%s' |
Aliases | AutoFormatParam.PurifierLinkifyDocURL |
Location of the configuration documentation to link to. %s is replaced with the directive's namespace and name, without the leading percent sign.
AutoFormat.PurifierLinkify
Version added | 2.0.1 |
---|---|
Type | Boolean |
Default | false |
Internal auto-formatter that converts configuration directives in syntax %Namespace.Directive to links. a tags with the href attribute must be allowed.
AutoFormat.RemoveEmpty.RemoveNbsp.Exceptions
Version added | 4.0.0 |
---|---|
Type | Lookup array |
Default | array ( 'td' => true, 'th' => true, ) |
When %AutoFormat.RemoveEmpty and %AutoFormat.RemoveEmpty.RemoveNbsp are enabled, this directive defines which HTML elements should not be removed if they contain only a non-breaking space.
AutoFormat.RemoveEmpty.RemoveNbsp
Version added | 4.0.0 |
---|---|
Type | Boolean |
Default | false |
When enabled, HTML Purifier will treat any elements that contain only non-breaking spaces as well as regular whitespace as empty, and remove them when %AutoFormat.RemoveEmpty is enabled.
See %AutoFormat.RemoveEmpty.RemoveNbsp.Exceptions for a list of elements that don't have this behavior applied to them.
AutoFormat.RemoveEmpty
Version added | 3.2.0 |
---|---|
Type | Boolean |
Default | false |
When enabled, HTML Purifier will attempt to remove empty elements that contribute no semantic information to the document. The following types of nodes will be removed:
- Tags with no attributes and no content, and that are not empty elements (remove <a></a> but not <br />), and
- Tags with no content, except for:
  - The colgroup element, or
  - Elements with the id or name attribute, when those attributes are permitted on those elements.
Please be very careful when using this functionality; while it may not seem that empty elements contain useful information, they can alter the layout of a document given appropriate styling. This directive is most useful when you are processing machine-generated HTML; please avoid using it on regular user HTML.
Elements that contain only whitespace will be treated as empty. Non-breaking spaces, however, do not count as whitespace. See %AutoFormat.RemoveEmpty.RemoveNbsp for alternate behavior.
This algorithm is not perfect; you may still notice some empty tags, particularly if a node had elements, but those elements were later removed because they were not permitted in that context, or tags that, after being auto-closed by another tag, were empty. This is for safety reasons to prevent clever code from breaking validation. The general rule of thumb: if a tag looked empty on the way in, it will get removed; if HTML Purifier made it empty, it will stay.
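A small sketch of enabling empty-element removal together with the non-breaking-space variant, assuming HTML Purifier 4.x and a placeholder include path. The sample markup is invented for illustration, and no particular output is claimed.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('AutoFormat.RemoveEmpty', true);
// Optional: also treat elements holding only non-breaking spaces as empty.
// td and th are exempt by default (see the .Exceptions directive).
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);

$purifier = new HTMLPurifier($config);

// Made-up input containing an empty paragraph and a whitespace-only span.
$dirty = '<p>Text</p><p></p><span> </span><br />';
echo $purifier->purify($dirty);
```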
AutoFormat.RemoveSpansWithoutAttributes
Version added | 4.0.1 |
---|---|
Type | Boolean |
Default | false |
This directive causes span tags without any attributes to be removed. It will also remove spans that had all attributes removed during processing.
CSS
CSS.AllowImportant
Version added | 3.1.0 |
---|---|
Type | Boolean |
Default | false |
CSS.AllowTricky
Version added | 3.1.0 |
---|---|
Type | Boolean |
Default | false |
display:none; is considered a tricky property that will only be allowed if this directive is set to true.
CSS.AllowedFonts
Version added | 4.3.0 |
---|---|
Type | Lookup array (or null) |
Default | NULL |
Allows you to manually specify a set of allowed fonts. If NULL, all fonts are allowed. This directive affects generic names (serif, sans-serif, monospace, cursive, fantasy) as well as specific font families.
CSS.AllowedProperties
Version added | 3.1.0 |
---|---|
Type | Lookup array (or null) |
Default | NULL |
If HTML Purifier's set of style properties is unsatisfactory for your needs, you can overload it with your own list of properties to allow. Note that this method is subtractive: it does its job by taking away from HTML Purifier's usual feature set, so you cannot add a property that HTML Purifier never supported in the first place.
Warning: If another directive conflicts with the properties here, that directive will win and override.
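As a sketch of the subtractive behavior described above (assuming HTML Purifier 4.x and a placeholder include path), the whitelist below keeps three properties that HTML Purifier already supports. The property choice and the sample markup are illustrative only.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Lookup array given as a plain list; only these properties stay allowed.
$config->set('CSS.AllowedProperties', array('color', 'font-weight', 'text-align'));

$purifier = new HTMLPurifier($config);

// float is not in the whitelist above, so it is a candidate for removal.
echo $purifier->purify('<p style="color:red; float:left;">Text</p>');
```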
CSS.DefinitionRev
Version added | 2.0.0 |
---|---|
Type | Integer |
Default | 1 |
Revision identifier for your custom definition. See %HTML.DefinitionRev for details.
CSS.ForbiddenProperties
Version added | 4.2.0 |
---|---|
Type | Lookup array |
Default | array() |
This is the logical inverse of %CSS.AllowedProperties, and it will override that directive or any other directive. If possible, %CSS.AllowedProperties is recommended over this directive, because it can sometimes be difficult to tell whether or not you've forbidden all of the CSS properties you truly would like to disallow.
CSS.MaxImgLength
Version added | 3.1.1 |
---|---|
Type | String (or null) |
Default | '1200px' |
This parameter sets the maximum allowed length on img tags, effectively the width and height properties. Only absolute units of measurement (in, pt, pc, mm, cm) and pixels (px) are allowed. This is in place to prevent imagecrash attacks; disable with null at your own risk. This directive is similar to %HTML.MaxImgLength, and both should be concurrently edited, although there are subtle differences in the input format (the CSS max is a number with a unit).
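Because the two limits take different input formats, a sketch of editing them together may help. It assumes HTML Purifier 4.x and a placeholder include path; the value 800 is an arbitrary example.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// HTML attribute limit: a bare integer number of pixels.
$config->set('HTML.MaxImgLength', 800);
// CSS limit: a string made of a number and a unit.
$config->set('CSS.MaxImgLength', '800px');

$purifier = new HTMLPurifier($config);
```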
CSS.Proprietary
Version added | 3.0.0 |
---|---|
Type | Boolean |
Default | false |
Whether or not to allow safe, proprietary CSS values.
CSS.Trusted
Version added | 4.2.1 |
---|---|
Type | Boolean |
Default | false |
Cache
Cache.DefinitionImpl
Version added | 2.0.0 |
---|---|
Type | String (or null) |
Default | 'Serializer' |
Aliases | Core.DefinitionCache |
Cache.SerializerPath
Version added | 2.0.0 |
---|---|
Type | String (or null) |
Default | NULL |
Absolute path with no trailing slash to store serialized definitions in. Default is within the HTML Purifier library inside DefinitionCache/Serializer. This path must be writable by the webserver.
Cache.SerializerPermissions
Version added | 4.3.0 |
---|---|
Type | Integer |
Default | 493 |
Directory permissions of the files and directories created inside the DefinitionCache/Serializer or other custom serializer path.
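The default of 493 is easier to read in octal: 7*64 + 5*8 + 5 = 493, i.e. 0755. The sketch below sets both cache directives, assuming HTML Purifier 4.x; the include path and the cache directory are placeholders, and the directory is assumed to exist and be writable by the web server.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Hypothetical cache location: absolute path, no trailing slash.
$cacheDir = '/var/cache/htmlpurifier';
$config->set('Cache.SerializerPath', $cacheDir);

// An octal literal is just another way of writing the same integer.
$config->set('Cache.SerializerPermissions', 0755);
var_dump(0755 === 493); // prints bool(true)
```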
Core
Core.AggressivelyFixLt
Version added | 2.1.0 |
---|---|
Type | Boolean |
Default | true |
This directive enables aggressive pre-filter fixes HTML Purifier can perform in order to ensure that open angled-brackets do not get killed during parsing stage. Enabling this will result in two preg_replace_callback calls and at least two preg_replace calls for every HTML document parsed; if your users make very well-formed HTML, you can set this directive false. This has no effect when DirectLex is used.
Notice: This directive's default turned from false to true in HTML Purifier 3.2.0.
Core.AllowHostnameUnderscore
Version added | 4.6.0 |
---|---|
Type | Boolean |
Default | false |
By RFC 1123, underscores are not permitted in host names. (This is in contrast to the specification for DNS, RFC 2181, which allows underscores.) However, most browsers do the right thing when faced with an underscore in the host name, and so some poorly written websites are written with the expectation this should work. Setting this parameter to true relaxes our allowed character check so that underscores are permitted.
Core.CollectErrors
Version added | 2.0.0 |
---|---|
Type | Boolean |
Default | false |
Core.ColorKeywords
Version added | 2.0.0 |
---|---|
Type | Associative array |
Default | array ( 'maroon' => '#800000', 'red' => '#FF0000', 'orange' => '#FFA500', 'yellow' => '#FFFF00', 'olive' => '#808000', 'purple' => '#800080', 'fuchsia' => '#FF00FF', 'white' => '#FFFFFF', 'lime' => '#00FF00', 'green' => '#008000', 'navy' => '#000080', 'blue' => '#0000FF', 'aqua' => '#00FFFF', 'teal' => '#008080', 'black' => '#000000', 'silver' => '#C0C0C0', 'gray' => '#808080', ) |
Core.ConvertDocumentToFragment
Type | Boolean |
---|---|
Default | true |
Aliases | Core.AcceptFullDocuments |
Core.DirectLexLineNumberSyncInterval
Version added | 2.0.0 |
---|---|
Type | Integer |
Default | 0 |
Specifies the number of tokens the DirectLex line number tracking implementations should process before attempting to resynchronize the current line count by manually counting all previous newlines. When at 0, this functionality is disabled. Lower values will decrease performance, and this is only strictly necessary if the counting algorithm is buggy (in which case you should report it as a bug). This has no effect when %Core.MaintainLineNumbers is disabled or DirectLex is not being used.
Core.DisableExcludes
Version added | 4.5.0 |
---|---|
Type | Boolean |
Default | false |
This directive disables SGML-style exclusions, e.g. the exclusion of <object> in any descendant of a <pre> tag. Disabling excludes will allow some invalid documents to pass through HTML Purifier, but HTML Purifier will also be less likely to accidentally remove large documents during processing.
Core.EnableIDNA
Version added | 4.4.0 |
---|---|
Type | Boolean |
Default | false |
Core.Encoding
Type | Case-insensitive string |
---|---|
Default | 'utf-8' |
Core.EscapeInvalidChildren
Type | Boolean |
---|---|
Default | false |
Warning: this configuration option no longer does anything as of 4.6.0.
When true, a child that is not allowed in the context of the parent element will be transformed into text as if it were ASCII. When false, that element and all internal tags will be dropped, though text will be preserved. There is no option for dropping the element but preserving child nodes.
Core.EscapeInvalidTags
Type | Boolean |
---|---|
Default | false |
Core.EscapeNonASCIICharacters
Version added | 1.4.0 |
---|---|
Type | Boolean |
Default | false |
Core.HiddenElements
Type | Lookup array |
---|---|
Default | array ( 'script' => true, 'style' => true, ) |
This directive is a lookup array of elements which should have their contents removed when they are not allowed by the HTML definition. For example, the contents of a script tag are not normally shown in a document, so if script tags are to be removed, their contents should be removed too. This is opposed to a b tag, which defines some presentational changes but does not hide its contents.
Core.Language
Version added | 2.0.0 |
---|---|
Type | String |
Default | 'en' |
Core.LexerImpl
Version added | 2.0.0 |
---|---|
Type | Mixed (or null) |
Default | NULL |
This parameter determines what lexer implementation can be used. The valid values are:
null
: Recommended; the lexer implementation will be auto-detected based on your PHP version and configuration.
string lexer identifier
: This is a slim way of manually overriding the implementation. Currently recognized values are: DOMLex (the default PHP5 implementation) and DirectLex (the default PHP4 implementation). Only use this if you know what you are doing: usually, the auto-detection will manage things for cases you aren't even aware of.
object lexer instance
: Super-advanced: you can specify your own custom implementation that implements the interface defined by HTMLPurifier_Lexer. I may remove this option simply because I don't expect anyone to use it.
Core.MaintainLineNumbers
Version added | 2.0.0 |
---|---|
Type | Boolean (or null) |
Default | NULL |
If true, HTML Purifier will add line number information to all tokens. This is useful when error reporting is turned on, but can result in significant performance degradation and should not be used when unnecessary. This directive must be used with the DirectLex lexer, as the DOMLex lexer does not (yet) support this functionality. If the value is null, an appropriate value will be selected based on other configuration.
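Since line numbers are only available with DirectLex, the two directives are typically set together. A minimal sketch, assuming HTML Purifier 4.x and a placeholder include path:

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Override auto-detection and pick the lexer by its string identifier.
$config->set('Core.LexerImpl', 'DirectLex');
// Attach line numbers to tokens; expect some performance cost.
$config->set('Core.MaintainLineNumbers', true);

$purifier = new HTMLPurifier($config);
```

Leaving both directives at NULL keeps the recommended auto-detected behavior.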
Core.NormalizeNewlines
Version added | 4.2.0 |
---|---|
Type | Boolean |
Default | true |
Whether or not to normalize newlines to the operating system default. When false, HTML Purifier will attempt to preserve mixed newline files.
Core.RemoveInvalidImg
Version added | 1.3.0 |
---|---|
Type | Boolean |
Default | true |
This directive enables pre-emptive URI checking in img tags, as the attribute validation strategy is not authorized to remove elements from the document. Revert to pre-1.3.0 behavior by setting to false.
Core.RemoveProcessingInstructions
Version added | 4.2.0 |
---|---|
Type | Boolean |
Default | false |
When enabled, processing instructions of the form <? ... ?> are removed outright. This may be useful if the HTML you are validating contains XML processing instruction gunk; however, it can also be user-unfriendly for people attempting to post PHP snippets.
Core.RemoveScriptContents
Version added | 2.0.0 |
---|---|
Type | Boolean (or null) |
Default | NULL |
This directive enables HTML Purifier to remove not only script tags but all of their contents.
Filter
Filter.Custom
Version added | 3.1.0 |
---|---|
Type | Array list |
Default | array() |
This directive can be used to add custom filters; it is nearly the equivalent of the now deprecated HTMLPurifier->addFilter() method. Specify an array of concrete implementations.
Filter.ExtractStyleBlocks.Escaping
Version added | 3.0.0 |
---|---|
Type | Boolean |
Default | true |
Aliases | Filter.ExtractStyleBlocksEscaping, FilterParam.ExtractStyleBlocksEscaping |
Whether or not to escape the dangerous characters <, > and & as \3C, \3E and \26, respectively. This can be safely set to false if the contents of StyleBlocks will be placed in an external stylesheet, where there is no risk of it being interpreted as HTML.
Filter.ExtractStyleBlocks.Scope
Version added | 3.0.0 |
---|---|
Type | String (or null) |
Default | NULL |
Aliases | Filter.ExtractStyleBlocksScope, FilterParam.ExtractStyleBlocksScope |
If you would like users to be able to define external stylesheets, but only allow them to specify CSS declarations for a specific node and prevent them from fiddling with other elements, use this directive. It accepts any valid CSS selector, and will prepend this to any CSS declaration extracted from the document. For example, if this directive is set to #user-content and a user uses the selector a:hover, the final selector will be #user-content a:hover.
The comma shorthand may be used: in the above example, with the directive set to #user-content, #user-content2, the final selector will be #user-content a:hover, #user-content2 a:hover.
Warning: It is possible for users to bypass this measure using a naughty + selector. This is a bug in CSS Tidy 1.3, not HTML Purifier, and I am working to get it fixed. Until then, HTML Purifier performs a basic check to prevent this.
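The scoping described above can be sketched as follows, assuming HTML Purifier 4.x with CSSTidy available. Both include paths are placeholders and the input markup is invented; the StyleBlocks context variable is the one named under %Filter.ExtractStyleBlocks.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';
require_once '/path/to/csstidy.class.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Filter.ExtractStyleBlocks', true);
// Every extracted selector gets this prefix.
$config->set('Filter.ExtractStyleBlocks.Scope', '#user-content');

$purifier = new HTMLPurifier($config);
$html = $purifier->purify('<style>a:hover {color:#F00;}</style><a href="#">link</a>');

// Extracted, scoped CSS; place it in a stylesheet of your choosing.
$styles = $purifier->context->get('StyleBlocks');
foreach ($styles as $style) {
    echo $style, "\n";
}
```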
Filter.ExtractStyleBlocks.TidyImpl
Version added | 3.1.0 |
---|---|
Type | Mixed (or null) |
Default | NULL |
Aliases | FilterParam.ExtractStyleBlocksTidyImpl |
If left NULL, HTML Purifier will attempt to instantiate a csstidy class to use for internal cleaning. This will usually be good enough. However, for trusted user input, you can set this to false to disable cleaning. In addition, you can supply your own concrete implementation of Tidy's interface to use, although I don't know why you'd want to do that.
Filter.ExtractStyleBlocks
Version added | 3.1.0 |
---|---|
Type | Boolean |
Default | false |
External deps | CSSTidy |
This directive turns on the style block extraction filter, which removes style blocks from input HTML, cleans them up with CSSTidy, and places them in the StyleBlocks context variable, for further use by you, usually to be placed in an external stylesheet, or a style block in the head of your document.
Sample usage:
```php
<?php
    header('Content-type: text/html; charset=utf-8');
    echo '<?xml version="1.0" encoding="UTF-8"?>';
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
  <title>Filter.ExtractStyleBlocks</title>
<?php
    require_once '/path/to/library/HTMLPurifier.auto.php';
    require_once '/path/to/csstidy.class.php';

    $dirty = '<style>body {color:#F00;}</style> Some text';

    $config = HTMLPurifier_Config::createDefault();
    $config->set('Filter.ExtractStyleBlocks', true);
    $purifier = new HTMLPurifier($config);

    $html = $purifier->purify($dirty);

    // This implementation writes the stylesheets to the styles/ directory.
    // You can also echo the styles inside the document, but it's a bit
    // more difficult to make sure they get interpreted properly by
    // browsers; try the usual CSS armoring techniques.
    $styles = $purifier->context->get('StyleBlocks');
    $dir = 'styles/';
    if (!is_dir($dir)) mkdir($dir);
    // Name the files after a hash of the input that was purified.
    $hash = sha1($dirty);
    foreach ($styles as $i => $style) {
        // Write each extracted style block to its own file.
        file_put_contents($name = $dir . $hash . "_$i", $style);
        echo '<link rel="stylesheet" type="text/css" href="'.$name.'" />';
    }
?>
</head>
<body>
  <div>
    <?php echo $html; ?>
  </div>
</body>
</html>
```
Warning: It is possible for a user to mount an imagecrash attack using this CSS. Counter-measures are difficult; it is not simply enough to limit the range of CSS lengths (using relative lengths with many nesting levels allows for large values to be attained without actually specifying them in the stylesheet), and the flexible nature of selectors makes it difficult to selectively disable lengths on image tags (HTML Purifier, however, does disable CSS width and height in inline styling). There are probably two effective counter measures: an explicit width and height set to auto in all images in your document (unlikely) or the disabling of width and height (somewhat reasonable). Whether or not these measures should be used is left to the reader.
Filter.YouTube
Version added | 3.1.0 |
---|---|
Type | Boolean |
Default | false |
Warning: Deprecated in favor of %HTML.SafeObject and %Output.FlashCompat (turn both on to allow YouTube videos and other Flash content).
This directive enables YouTube video embedding in HTML Purifier. Check this document on embedding videos for more information on what this filter does.
HTML
HTML.Allowed
Version added | 2.0.0 |
---|---|
Type | Case-insensitive text (or null) |
Default | NULL |
This is a preferred convenience directive that combines %HTML.AllowedElements and %HTML.AllowedAttributes. Specify elements and attributes that are allowed using: element1[attr1|attr2],element2.... For example, if you would like to only allow paragraphs and links, specify a[href],p. You can specify attributes that apply to all elements using an asterisk, e.g. *[lang]. You can also use newlines instead of commas to separate elements.
Warning: All of the constraints on the component directives are still enforced. The syntax is a subset of TinyMCE's valid_elements whitelist: directly copy-pasting it here will probably result in broken whitelists. If %HTML.AllowedElements or %HTML.AllowedAttributes are set, this directive has no effect.
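A short sketch of the whitelist syntax, combining the paragraph-and-link example with a global lang attribute. It assumes HTML Purifier 4.x and a placeholder include path; the input markup is made up.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Links with href, paragraphs, and lang on any allowed element.
$config->set('HTML.Allowed', 'a[href],p,*[lang]');

$purifier = new HTMLPurifier($config);

// title and <b> are not in the whitelist above.
$dirty = '<p lang="en">Hi <a href="http://example.com" title="x">there</a> <b>bold</b></p>';
echo $purifier->purify($dirty);
```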
HTML.AllowedAttributes
Version added | 1.3.0 |
---|---|
Type | Lookup array (or null) |
Default | NULL |
If HTML Purifier's attribute set is unsatisfactory, overload it! The syntax is "tag.attr" or "*.attr" for the global attributes (style, id, class, dir, lang, xml:lang).
Warning: If another directive conflicts with the elements here, that directive will win and override. For example, %HTML.EnableAttrID will take precedence over *.id in this directive. You must set that directive to true before you can use IDs at all.
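The interaction with the ID directive can be sketched like this, assuming HTML Purifier 4.x and a placeholder include path. Attr.EnableID is used here because HTML.EnableAttrID is listed as its alias; the attribute list itself is only an example.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// IDs must be switched on separately, or *.id below has no effect.
$config->set('Attr.EnableID', true);
// "tag.attr" for one element, "*.attr" for a global attribute.
$config->set('HTML.AllowedAttributes', array('a.href', '*.id'));

$purifier = new HTMLPurifier($config);
```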
HTML.AllowedComments
Version added | 4.4.0 |
---|---|
Type | Lookup array |
Default | array() |
HTML.AllowedCommentsRegexp
Version added | 4.4.0 |
---|---|
Type | String (or null) |
Default | NULL |
Make sure you anchor the regular expression as ^regex$, otherwise you may accept comments that you did not mean to! In particular, the regex /foo|bar/ is probably not sufficiently strict, since it also allows foobar.
See also %HTML.AllowedComments (these directives are union'ed together, so a comment is considered valid if any directive deems it valid).
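The anchoring pitfall can be checked with plain preg_match, independent of HTML Purifier. The comment bodies and the anchored pattern below are illustrative choices, not values taken from the library.

```php
<?php
// Hypothetical comment bodies used only for this illustration.
$comments = array('foo', 'bar', 'foobar');

$loose  = '/foo|bar/';      // unanchored: matches anywhere in the comment
$strict = '/^(foo|bar)$/';  // anchored: the whole comment must be foo or bar

foreach ($comments as $comment) {
    printf(
        "%-6s loose=%d strict=%d\n",
        $comment,
        preg_match($loose, $comment),
        preg_match($strict, $comment)
    );
}
```

Here foobar matches the loose pattern but not the strict one. Note that the alternation is grouped: /^foo|bar$/ would anchor each branch separately and still accept unintended comments.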
HTML.AllowedElements
Version added | 1.3.0 |
---|---|
Type | Lookup array (or null) |
Default | NULL |
If HTML Purifier's tag set is unsatisfactory for your needs, you can overload it with your own list of tags to allow. If you change this, you probably also want to change %HTML.AllowedAttributes; see also %HTML.Allowed which lets you set allowed elements and attributes at the same time.
If you attempt to allow an element that HTML Purifier does not know about, HTML Purifier will raise an error. You will need to manually tell HTML Purifier about this element by using the advanced customization features.
Warning: If another directive conflicts with the elements here, that directive will win and override.
HTML.AllowedModules
Version added | 2.0.0 |
---|---|
Type | Lookup array (or null) |
Default | NULL |
A doctype comes with a set of usual modules to use. Without having to muck about with the doctypes, you can quickly activate or disable these modules by specifying which modules you wish to allow with this directive. This is most useful for unit testing specific modules, although end users may find it useful for their own ends.
If you specify a module that does not exist, the manager will silently fail to use it, so be careful! User-defined modules are not affected by this directive. Modules defined in %HTML.CoreModules are not affected by this directive.
HTML.Attr.Name.UseCDATA
Version added | 4.0.0 |
---|---|
Type | Boolean |
Default | false |
HTML.BlockWrapper
Version added | 1.3.0 |
---|---|
Type | String |
Default | 'p' |
String name of element to wrap inline elements that are inside a block context. This only occurs in the children of blockquote in strict mode.
Example: with the default value, <blockquote>Foo</blockquote> would become <blockquote><p>Foo</p></blockquote>. The <p> tags can be replaced with whatever you desire, as long as it is a block level element.
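A minimal sketch of swapping the wrapper element, assuming HTML Purifier 4.x and a placeholder include path. A strict doctype is selected because the wrapping only occurs in strict mode; div is just one possible block-level choice.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Wrapping only happens for blockquote children in strict mode.
$config->set('HTML.Doctype', 'XHTML 1.0 Strict');
// Use div instead of the default p as the wrapper.
$config->set('HTML.BlockWrapper', 'div');

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<blockquote>Foo</blockquote>');
```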
HTML.CoreModules
Version added | 2.0.0 |
---|---|
Type | Lookup array |
Default | array ( 'Structure' => true, 'Text' => true, 'Hypertext' => true, 'List' => true, 'NonXMLCommonAttributes' => true, 'XMLCommonAttributes' => true, 'CommonAttributes' => true, ) |
Certain modularized doctypes (XHTML, namely) have certain modules that must be included for the doctype to be a conforming document type: put those modules here. By default, XHTML's core modules are used. You can set this to a blank array to disable core module protection, but this is not recommended.
HTML.CustomDoctype
Version added | 2.0.1 |
---|---|
Type | String (or null) |
Default | NULL |
HTML.DefinitionID
Version added | 2.0.0 |
---|---|
Type | String (or null) |
Default | NULL |
Unique identifier for a custom-built HTML definition. If you edit the raw version of the HTMLDefinition, introducing changes that the configuration object does not reflect, you must specify this variable. If you change your custom edits, you should change this directive, or clear your cache. Example:
```php
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', '1');
$def = $config->getHTMLDefinition();
$def->addAttribute('a', 'tabindex', 'Number');
```
In the above example, the configuration is still at the defaults, but using the advanced API, an extra attribute has been added. The configuration object normally has no way of knowing that this change has taken place, so it needs an extra directive: %HTML.DefinitionID. If someone else attempts to use the default configuration, these two pieces of code will not clobber each other in the cache, since one has an extra directive attached to it.
You must specify a value to this directive to use the advanced API features.
HTML.DefinitionRev
Version added | 2.0.0 |
---|---|
Type | Integer |
Default | 1 |
Revision identifier for your custom definition specified in %HTML.DefinitionID. This serves the same purpose: uniquely identifying your custom definition, but this one does so in a chronological context: revision 3 is more up-to-date than revision 2. Thus, when this gets incremented, the cache handling is smart enough to clean up any older revisions of your definition as well as flush the cache.
HTML.Doctype
Type | String (or null) |
---|---|
Allowed values | "HTML 4.01 Transitional", "HTML 4.01 Strict", "XHTML 1.0 Transitional", "XHTML 1.0 Strict", "XHTML 1.1" |
Default | NULL |
HTML.FlashAllowFullScreen
Version added | 4.2.0 |
---|---|
Type | Boolean |
Default | false |
Whether or not to permit embedded Flash content from %HTML.SafeObject to expand to the full screen. Corresponds to the allowFullScreen parameter.
HTML.ForbiddenAttributes
Version added | 3.1.0 |
---|---|
Type | Lookup array |
Default | array() |
While this directive is similar to %HTML.AllowedAttributes, for forwards-compatibility with XML, this directive has a different syntax. Instead of tag.attr, use tag@attr. To disallow href attributes in a tags, set this directive to a@href. You can also disallow an attribute globally with attr or *@attr (either syntax is fine; the latter is provided for consistency with %HTML.AllowedAttributes).
Warning: This directive complements %HTML.ForbiddenElements; accordingly, check out that directive for a discussion of why you should think twice before using this directive.
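The tag@attr syntax can be sketched as follows, assuming HTML Purifier 4.x and a placeholder include path. The particular attributes forbidden here are arbitrary examples.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// a@href forbids href on a only; *@style forbids style on every element.
$config->set('HTML.ForbiddenAttributes', array('a@href', '*@style'));

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<p style="color:red"><a href="http://example.com">link</a></p>');
```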
HTML.ForbiddenElements
Version added | 3.1.0 |
---|---|
Type | Lookup array |
Default | array() |
This was, perhaps, the most requested feature ever in HTML Purifier. Please don't abuse it! This is the logical inverse of %HTML.AllowedElements, and it will override that directive, or any other directive.
If possible, %HTML.Allowed is recommended over this directive, because it can sometimes be difficult to tell whether or not you've forbidden all of the behavior you would like to disallow. If you forbid img with the expectation of preventing images on your site, you'll be in for a nasty surprise when people start using the background-image CSS property.
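To make the img pitfall concrete, the sketch below forbids the element and also closes the CSS route mentioned above through %CSS.ForbiddenProperties. It assumes HTML Purifier 4.x and a placeholder include path, and it is not claimed to cover every way of displaying an image.

```php
<?php
require_once '/path/to/library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Forbidding the element alone still leaves CSS backgrounds available.
$config->set('HTML.ForbiddenElements', array('img'));
// Illustrative follow-up: forbid the background properties as well.
$config->set('CSS.ForbiddenProperties', array('background-image', 'background'));

$purifier = new HTMLPurifier($config);
```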
HTML.MaxImgLength
Version added | 3.1.1 |
---|---|
Type | Integer (or null) |
Default | 1200 |
This directive controls the maximum number of pixels in the width and height attributes in img tags. This is in place to prevent imagecrash attacks; disable with null at your own risk. This directive is similar to %CSS.MaxImgLength, and both should be concurrently edited, although there are subtle differences in the input format (the HTML max is an integer).
HTML.Nofollow
Version added | 4.3.0 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
HTML.Parent
Version added | 1.3.0 |
---|---|
Type | String |
Default | 'div' |
Used in |
|
String name of the element that the HTML fragment passed to the library will be inserted into. An interesting variation would be using span as the parent element, meaning that only inline tags would be allowed.
HTML.Proprietary
Version added | 3.1.0 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
Whether or not to allow proprietary elements and attributes in your documents, as per HTMLPurifier_HTMLModule_Proprietary.
Warning: This can cause your documents to stop validating!
HTML.SafeEmbed
Version added | 3.1.1 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
Whether or not to permit embed tags in documents, with a number of extra security features added to prevent script execution. This is similar to what websites like MySpace do to embed tags. Embed is a proprietary element and will cause your website to stop validating; you should see if you can use %Output.FlashCompat with %HTML.SafeObject instead first.
HTML.SafeIframe
Version added | 4.4.0 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
Whether or not to permit iframe tags in untrusted documents. This directive must be accompanied by a whitelist of permitted iframes, such as %URI.SafeIframeRegexp, otherwise it will fatally error. This directive has no effect on strict doctypes, as iframes are not valid.
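The pairing of this directive with a whitelist might look like the sketch below. It assumes HTML Purifier is installed with HTMLPurifier.auto.php on the include path, reuses one of the example patterns given under %URI.SafeIframeRegexp, and uses a made-up video URL.

```php
<?php
// Assumption: HTML Purifier is installed and its autoloader is reachable.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// Enable iframes, and supply the whitelist this directive requires.
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^http://www.youtube.com/embed/%');

$purifier = new HTMLPurifier($config);

// Illustrative input only; the video ID is a placeholder.
echo $purifier->purify('<iframe src="http://www.youtube.com/embed/abc123"></iframe>');
```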
HTML.SafeObject
Version added | 3.1.1 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
Whether or not to permit object tags in documents, with a number of extra security features added to prevent script execution. This is similar to what websites like MySpace do to object tags. You should also enable %Output.FlashCompat in order to generate Internet Explorer compatibility code for your object tags.
HTML.SafeScripting
Version added | 4.5.0 |
---|---|
Type | Lookup array |
Default | array() |
Used in |
|
Whether or not to permit script tags to external scripts in documents. Inline scripting is not allowed, and the script must match an explicit whitelist.
HTML.Strict
Version added | 1.3.0 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
HTML.TargetBlank
Version added | 4.4.0 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
If enabled, target=blank attributes are added to all outgoing links. (This includes links from an HTTPS version of a page to an HTTP version.)
HTML.TidyAdd
Version added | 2.0.0 |
---|---|
Type | Lookup array |
Default | array() |
Used in |
|
HTML.TidyLevel
Version added | 2.0.0 |
---|---|
Type | String |
Allowed values | "none", "light", "medium", "heavy" |
Default | 'medium' |
Used in |
|
General level of cleanliness the Tidy module should enforce. There are four allowed values:
- none: No extra tidying should be done
- light: Only fix elements that would be discarded otherwise due to lack of support in doctype
- medium: Enforce best practices
- heavy: Transform all deprecated elements and attributes to standards compliant equivalents
HTML.TidyRemove
Version added | 2.0.0 |
---|---|
Type | Lookup array |
Default | array() |
Used in |
|
HTML.Trusted
Version added | 2.0.0 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
HTML.XHTML
Version added | 1.1.0 |
---|---|
Type | Boolean |
Default | true |
Aliases | Core.XHTML |
Used in |
|
Output
Output.CommentScriptContents
Version added | 2.0.0 |
---|---|
Type | Boolean |
Default | true |
Aliases | Core.CommentScriptContents |
Used in |
|
Output.FixInnerHTML
Version added | 4.3.0 |
---|---|
Type | Boolean |
Default | true |
Used in |
|
If true, HTML Purifier will protect against Internet Explorer's
mishandling of the innerHTML
attribute by appending
a space to any attribute that does not contain angled brackets, spaces
or quotes, but contains a backtick. This slightly changes the
semantics of any given attribute, so if this is unacceptable and
you do not use innerHTML
on any of your pages, you can
turn this directive off.
Output.FlashCompat
Version added | 4.1.0 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
If true, HTML Purifier will generate Internet Explorer compatibility code for all object code. This is highly recommended if you enable %HTML.SafeObject.
Output.Newline
Version added | 2.0.1 |
---|---|
Type | String (or null) |
Default | NULL |
Used in |
|
Newline string to format final output with. If left null, HTML Purifier will auto-detect the default newline type of the system and use that; you can manually override it here. Remember, \r\n is Windows, \r is Mac, and \n is Unix.
Output.SortAttr
Version added | 3.2.0 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
If true, HTML Purifier will sort attributes by name before writing them back to the document, converting a tag like: <el b="" a="" c="" /> to <el a="" b="" c="" />. This is a workaround for a bug in FCKeditor which causes it to swap attributes order, adding noise to text diffs. If you're not seeing this bug, chances are, you don't need this directive.
Output.TidyFormat
Version added | 1.1.1 |
---|---|
Type | Boolean |
Default | false |
Aliases | Core.TidyFormat |
Used in |
|
Determines whether or not to run Tidy on the final output for pretty formatting reasons, such as indentation and wrap.
This can greatly improve readability for editors who are hand-editing the HTML, but is by no means necessary as HTML Purifier has already fixed all major errors the HTML may have had. Tidy is a non-default extension, and this directive will silently fail if Tidy is not available.
If you are looking to make the overall look of your page's source better, I recommend running Tidy on the entire page rather than just user-content (after all, the indentation relative to the containing blocks will be incorrect).
Test
Test.ForceNoIconv
Type | Boolean |
---|---|
Default | false |
Used in |
|
URI
URI.AllowedSchemes
Type | Lookup array |
---|---|
Default | array ( 'http' => true, 'https' => true, 'mailto' => true, 'ftp' => true, 'nntp' => true, 'news' => true, ) |
Used in |
|
The data and file URI schemes are also supported, but they are not enabled by default.
URI.Base
Version added | 2.1.0 |
---|---|
Type | String (or null) |
Default | NULL |
Used in |
|
The base URI is the URI of the document this purified HTML will be inserted into. This information is important if HTML Purifier needs to calculate absolute URIs from relative URIs, such as when %URI.MakeAbsolute is on. You may use a non-absolute URI for this value, but behavior may vary (%URI.MakeAbsolute deals nicely with both absolute and relative paths, but forwards-compatibility is not guaranteed). Warning: If set, the scheme on this URI overrides the one specified by %URI.DefaultScheme.
URI.DefaultScheme
Type | String |
---|---|
Default | 'http' |
Used in |
|
Defines through what scheme the output will be served, in order to select the proper object validator when no scheme information is present.
URI.DefinitionID
Version added | 2.1.0 |
---|---|
Type | String (or null) |
Default | NULL |
Unique identifier for a custom-built URI definition. If you want to add custom URIFilters, you must specify this value.
URI.DefinitionRev
Version added | 2.1.0 |
---|---|
Type | Integer |
Default | 1 |
Revision identifier for your custom definition. See %HTML.DefinitionRev for details.
URI.Disable
Version added | 1.3.0 |
---|---|
Type | Boolean |
Default | false |
Aliases | Attr.DisableURI |
Used in |
|
Disables all URIs in all forms. Not sure why you'd want to do that (after all, the Internet's founded on the notion of a hyperlink).
URI.DisableExternal
Version added | 1.2.0 |
---|---|
Type | Boolean |
Default | false |
URI.DisableExternalResources
Version added | 1.3.0 |
---|---|
Type | Boolean |
Default | false |
URI.DisableResources
Version added | 4.2.0 |
---|---|
Type | Boolean |
Default | false |
Disables embedding resources, essentially meaning no pictures. You can still link to them though. See %URI.DisableExternalResources for why this might be a good idea.
Note: While this directive has been available since 1.3.0, it didn't actually start doing anything until 4.2.0.
URI.Host
Version added | 1.2.0 |
---|---|
Type | String (or null) |
Default | NULL |
Used in |
|
Defines the domain name of the server, so we can determine whether or not an absolute URI is from your website. Not strictly necessary, as users should be using relative URIs to reference resources on your website. It will, however, let you use absolute URIs to link to subdomains of the domain you post here: e.g. example.com will allow sub.example.com. However, higher up domains will still be excluded: if you set %URI.Host to sub.example.com, example.com will be blocked. Note: This directive overrides %URI.Base because a given page may be on a sub-domain, but you wish HTML Purifier to be more relaxed and allow some of the parent domains too.
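The subdomain rule described above can be sketched in a few lines. This is only an illustration of the documented behavior, not the library's implementation; host_is_internal() is a hypothetical helper name.

```php
<?php
// Hypothetical helper illustrating the documented rule; not library code.
function host_is_internal(string $configuredHost, string $uriHost): bool
{
    $configured = strtolower($configuredHost);
    $candidate = strtolower($uriHost);

    // The configured host itself is always internal.
    if ($candidate === $configured) {
        return true;
    }

    // Subdomains must end with ".<configured host>".
    $suffix = '.' . $configured;
    return substr($candidate, -strlen($suffix)) === $suffix;
}

var_dump(host_is_internal('example.com', 'sub.example.com'));
var_dump(host_is_internal('sub.example.com', 'example.com'));
```

Matching on a leading dot keeps an unrelated host such as badexample.com from being treated as a subdomain of example.com.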
URI.HostBlacklist
Version added | 1.3.0 |
---|---|
Type | Array list |
Default | array() |
Used in |
|
URI.MakeAbsolute
Version added | 2.1.0 |
---|---|
Type | Boolean |
Default | false |
Converts all URIs into absolute forms. This is useful when the HTML being filtered assumes a specific base path, but will actually be viewed in a different context (and setting an alternate base URI is not possible). %URI.Base must be set for this directive to work.
URI.Munge
Version added | 1.3.0 |
---|---|
Type | String (or null) |
Default | NULL |
Munges all browsable (usually http, https and ftp) absolute URIs into another URI, usually a URI redirection service. This directive accepts a URI, formatted with a %s where the url-encoded original URI should be inserted (sample: http://www.google.com/url?q=%s).
Uses for this directive:
- Prevent PageRank leaks, while being fairly transparent to users (you may also want to add some client side JavaScript to override the text in the statusbar). Notice: Many security experts believe that this form of protection does not deter spam-bots.
- Redirect users to a splash page telling them they are leaving your website. While this is poor usability practice, it is often mandated in corporate environments.
Prior to HTML Purifier 3.1.1, this directive also enabled the munging of browsable external resources, which could break things if your redirection script was a splash page or used meta tags. To revert to previous behavior, please use %URI.MungeResources.
You may want to also use %URI.MungeSecretKey along with this directive in order to enforce what URIs your redirector script allows. Open redirector scripts can be a security risk and negatively affect the reputation of your domain name.
Starting with HTML Purifier 3.1.1, there are also these substitutions:
Key | Description | Example <a href=""> |
---|---|---|
%r | 1 if the URI embeds a resource, blank if the URI is merely a link | |
%n | The name of the tag this URI came from | a |
%m | The name of the attribute this URI came from | href |
%p | The name of the CSS property this URI came from, or blank if irrelevant | |
Admittedly, these letters are somewhat arbitrary; the only stipulation was that they couldn't be a through f. r is for resource (I would have preferred e, but you take what you can get), n is for name, m was picked because it came after n (and I couldn't use a), p is for property.
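To make the substitutions concrete, here is a rough sketch of how a munge target might be expanded. It is an illustration under stated assumptions, not the library's code: munge_uri() and the redirect.example target are hypothetical, and every substituted value is assumed to be URL-encoded in the same way as %s.

```php
<?php
// Hypothetical helper for illustration; not part of HTML Purifier.
function munge_uri(string $target, array $values): string
{
    // URL-encode every substitution value, then replace all keys in one pass.
    return strtr($target, array_map('rawurlencode', $values));
}

// Hypothetical redirector target using several of the keys above.
$target = 'http://redirect.example/?u=%s&tag=%n&attr=%m&res=%r';

// Values as they would apply to <a href="http://example.com/">.
$munged = munge_uri($target, array(
    '%s' => 'http://example.com/',
    '%r' => '',      // blank: the URI is merely a link
    '%n' => 'a',
    '%m' => 'href',
    '%p' => '',      // blank: no CSS property involved
));

echo $munged, "\n";
```

strtr() with an array performs all replacements in a single pass, so the percent signs produced by the encoding are never re-interpreted as substitution keys.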
URI.MungeResources
Version added | 3.1.1 |
---|---|
Type | Boolean |
Default | false |
Used in |
|
If true, any URI munging directives like %URI.Munge will also apply to embedded resources, such as <img src="">. Be careful enabling this directive if you have a redirector script that does not use the Location HTTP header; all of your images and other embedded resources will break.
Warning: It is strongly advised you use this in conjunction with %URI.MungeSecretKey to mitigate the security risk of an open redirector.
URI.MungeSecretKey
Version added | 3.1.1 |
---|---|
Type | String (or null) |
Default | NULL |
Used in |
|
This directive enables secure checksum generation along with %URI.Munge. It should be set to a secure key that is not shared with anyone else. The checksum can be placed in the URI using %t. Use of this checksum affords an additional level of protection by allowing a redirector to check if a URI has passed through HTML Purifier with this line:
$checksum === hash_hmac("sha256", $url, $secret_key)
If the output is TRUE, the redirector script should accept the URI.
Please note that it would still be possible for an attacker to procure secure hashes en masse by abusing your website's Preview feature or the like, but this service affords an additional level of protection that should be combined with website blacklisting.
Remember this has no effect if %URI.Munge is not on.
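On the redirector side, the check above might be wrapped as in the following sketch. The function name, the placeholder key, and the sample URL are all hypothetical. The sketch swaps the === comparison for PHP's hash_equals(), which compares in constant time; the result is otherwise the same.

```php
<?php
// Hypothetical redirector-side helper; name and parameters are illustrative.
function is_trusted_redirect(string $url, string $checksum, string $secretKey): bool
{
    // Recompute the checksum the same way the documented line does.
    $expected = hash_hmac('sha256', $url, $secretKey);

    // hash_equals() is a constant-time comparison, used here in place of ===.
    return hash_equals($expected, $checksum);
}

$secretKey = 'replace-with-your-secret';            // placeholder value
$url = 'http://example.com/';                       // illustrative URI
$checksum = hash_hmac('sha256', $url, $secretKey);  // what %t would carry

var_dump(is_trusted_redirect($url, $checksum, $secretKey));
var_dump(is_trusted_redirect($url, 'tampered', $secretKey));
```

A redirector would typically read the URL and checksum from its query string and refuse to redirect when the function returns false.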
URI.OverrideAllowedSchemes
Type | Boolean |
---|---|
Default | true |
Used in |
|
URI.SafeIframeRegexp
Version added | 4.4.0 |
---|---|
Type | String (or null) |
Default | NULL |
Used in |
|
A PCRE regular expression that will be matched against an iframe URI. This is a relatively inflexible scheme, but works well enough for the most common use-case of iframes: embedded video. This directive only has an effect if %HTML.SafeIframe is enabled. Here are some example values:
- %^http://www.youtube.com/embed/% - Allow YouTube videos
- %^http://player.vimeo.com/video/% - Allow Vimeo videos
- %^http://(www.youtube.com/embed/|player.vimeo.com/video/)% - Allow both
Note that this directive does not give you enough granularity to, say, disable
all autoplay
videos. Pipe up on the HTML Purifier forums if this
is a capability you want.
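A candidate pattern can be tried against sample URIs with plain preg_match() before it goes into the configuration. The sketch below only mirrors the documented matching and is not library code; iframe_uri_allowed() and the sample URIs are hypothetical.

```php
<?php
// Hypothetical helper that mirrors the documented matching; not library code.
function iframe_uri_allowed(string $pattern, string $uri): bool
{
    return preg_match($pattern, $uri) === 1;
}

// The "allow both" example value from above; % is the PCRE delimiter.
$pattern = '%^http://(www.youtube.com/embed/|player.vimeo.com/video/)%';

$samples = array(
    'http://www.youtube.com/embed/abc123',
    'http://player.vimeo.com/video/12345',
    'http://evil.example/embed/abc123',
);

foreach ($samples as $uri) {
    echo $uri, ' => ', iframe_uri_allowed($pattern, $uri) ? 'allowed' : 'rejected', "\n";
}
```

Note that the example patterns match http URIs only, and that their unescaped dots match any character; escaping them as \. gives a stricter match.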