Refactor sanitize_token for better subclassing #26

Closed
mfa wants to merge 3 commits

@mfa commented Apr 24, 2013
Contributor

For my application I need to strip out all unallowed tokens.
At the moment the only way to do that is to copy the whole sanitize_token method, remove a few lines, and add one.
Splitting up the sanitize_token method would simplify this a lot and reduce code duplication (in my code).

Example usage:

import html5lib
from html5lib import treebuilders, treewalkers, serializer, sanitizer

class MySanitizer(sanitizer.HTMLSanitizer):
    # reduce tokens to only a few
    acceptable_elements = ['b', 'br', 'center', 'em', 'h3', 'h4', 'h5', 'h6',
                           'i', 'li', 'ol', 'p', 'span', 'strike', 'strong',
                           'tt', 'ul']

    allowed_elements = acceptable_elements

    def unallowed_token(self, token):
        # remove all unallowed tokens
        return ""

def sanitize(html_fragment):
    # parse with the custom sanitizer plugged in as the tokenizer
    p = html5lib.HTMLParser(tokenizer=MySanitizer, tree=treebuilders.getTreeBuilder("dom"))
    dom_tree = p.parseFragment(html_fragment)

    # walk the sanitized DOM tree and serialize it back to a string
    walker = treewalkers.getTreeWalker("dom")
    stream = walker(dom_tree)

    s = serializer.htmlserializer.HTMLSerializer(omit_optional_tags=False,
                                                 quote_attr_values=True)
    return u"".join(s.serialize(stream))
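
To illustrate the kind of split described above, here is a minimal standalone sketch that does not use html5lib at all. Everything in it is an assumption made for illustration: the class names SketchSanitizer and DroppingSanitizer, the allowed_token hook, the run helper, and the dict-shaped tokens are hypothetical, and the escaping default only loosely mirrors how a sanitizer might treat a rejected tag. It is not the code in this pull request.

```python
class SketchSanitizer(object):
    """Hypothetical stand-in for a sanitizer; not html5lib's actual class."""

    allowed_elements = ["b", "em", "p"]

    def sanitize_token(self, token):
        # Dispatch only: subclasses override the hooks, not this method.
        if token["type"] in ("StartTag", "EndTag", "EmptyTag"):
            if token["name"] in self.allowed_elements:
                return self.allowed_token(token)
            return self.unallowed_token(token)
        return token

    def allowed_token(self, token):
        # Assumed default: pass allowed tags through unchanged.
        return token

    def unallowed_token(self, token):
        # Assumed default: turn the rejected tag into visible text.
        if token["type"] == "EndTag":
            text = "</%s>" % token["name"]
        else:
            text = "<%s>" % token["name"]
        return {"type": "Characters", "data": text}


class DroppingSanitizer(SketchSanitizer):
    def unallowed_token(self, token):
        # Drop rejected tags entirely instead of escaping them.
        return None


def run(sanitizer, tokens):
    # Feed tokens through the sanitizer and skip the ones it dropped.
    out = []
    for token in tokens:
        result = sanitizer.sanitize_token(token)
        if result is not None:
            out.append(result)
    return out
```

With this shape, a subclass that wants different handling for rejected tags overrides one small method and leaves the dispatch logic in sanitize_token untouched.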

@gsnedders
Member

Well, Travis CI shows that this change makes all the sanitizer tests fail. That ought to be fixed. Also, you should use the standard English "disallow" and not "unallow". That said, in principle this looks good.

@gsnedders
Member

Merged in 52f9ca6
