a an@sdZddlmZddlZddlZddlZzddlmZddlm Z Wn"e yfddl mZm Z Yn0ddl m Z ddlmZdd lmZmZdd lmZmZzeWneyeZYn0zeWneyeZYn0zeWneyeefZYn0gd Zed ejejBjZ ed ejjZ!ejdgej"ddkrTej#fndRj$Z%edejj&Z'edejj&Z(edejj$Z)ddZ*edjZ+edejejBZ,e -dZ.e j-ddeidZ/Gddde0Z1e1Z2e2j3Z3edejedejgZ4gd Z5ed!ejed"ejed#gZ6d$gZ7e4e5e6e7fd%d&Z8d'd(Z9d)d*Z:e8je:_gd+Z;d,gZd3d4Z?ed5ejZ@d6d7ZAdS)8zcA cleanup tool for HTML. Removes unwanted tags and content. See the `Cleaner` class for details. )absolute_importN)urlsplit) unquote_plus)rr)etree)defs) fromstringXHTML_NAMESPACE) xhtml_to_html_transform_result) clean_htmlcleanCleanerautolink autolink_html word_breakword_break_htmlzexpression\s*\(.*?\)z @\s*importzzdescendant-or-self::*[@style]zdescendant-or-self::a [normalize-space(@href) and substring(normalize-space(@href),1,1) != '#'] |descendant-or-self::x:a[normalize-space(@href) and substring(normalize-space(@href),1,1) != '#']x)Z namespacesc @seZdZdZdZdZdZdZdZdZ dZ dZ dZ dZ dZdZdZdZdZdZdZdZejZdZdZddhZdd Zed d d d gd d d d dZddZddZddZ ddZ!ddZ"d"ddZ#ddZ$e%&de%j'j(Z)ddZ*d d!Z+dS)#r a Instances cleans the document of each of the possible offending elements. The cleaning is controlled by attributes; you can override attributes in a subclass, or set them in the constructor. ``scripts``: Removes any ``