Description
I was trying to strip tags, and anything that could be used to build HTML tags, from data entered in a Rails application, and had a few tests covering this. It seems that Nokogiri 1.13.5 has changed the behavior of Rails::Html::FullSanitizer (probably because of its libxml2 upgrade; see the Loofah issue below).
The difference seems to depend on which characters sit inside <>, right next to the brackets. Previously everything between the brackets was removed; now the '<' and '>' characters are escaped and what is inside is kept.
Before Nokogiri 1.13.5 (output from 1.13.4):
s1 = 'Hello <world!>'
ActionView::Base.full_sanitizer.sanitize(s1)
# => "Hello "
s2 = 'Hello <... world!>'
ActionView::Base.full_sanitizer.sanitize(s2)
# => "Hello "
s3 = 'Kitty is <-NOT-> bad!'
ActionView::Base.full_sanitizer.sanitize(s3)
# => "Kitty is bad!"
Nokogiri 1.13.5:
s1 = 'Hello <world!>'
ActionView::Base.full_sanitizer.sanitize(s1)
# => "Hello "
s2 = 'Hello <... world!>'
ActionView::Base.full_sanitizer.sanitize(s2)
# => "Hello &lt;... world!&gt;"
s3 = 'Kitty is <-NOT-> bad!'
ActionView::Base.full_sanitizer.sanitize(s3)
# => "Kitty is &lt;-NOT-&gt; bad!"
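The change in behavior can be mimicked in plain Ruby. This is only a toy model, assuming the newer parser treats '<' as the start of a tag only when a letter follows it (optionally after '/'). It is not how libxml2 or Rails::Html::FullSanitizer is actually implemented, and both method names are made up for illustration.

```ruby
# Hypothetical helpers, plain Ruby only (no Nokogiri or Rails required).

# Toy model of the pre-1.13.5 result: drop everything between '<' and '>'.
def strip_anything_bracketed(text)
  text.gsub(/<[^>]*>/, "")
end

# Replacement table used to escape the characters left over after stripping.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

# Toy model of the 1.13.5 result (an assumption, see above): drop only the
# spans that look like a tag, then escape any remaining '&', '<' and '>'.
def strip_tag_like_then_escape(text)
  without_tags = text.gsub(%r{</?[A-Za-z][^>]*>}, "")
  without_tags.gsub(/[&<>]/, HTML_ESCAPES)
end
```

Under this model 'Hello <world!>' loses its bracketed part in both versions, while '<... world!>' and '<-NOT->' survive the second helper with their brackets escaped, which matches the difference reported here.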
This issue in Loofah is probably the same: flavorjones/loofah#230 (closed).
I am not sure whether this is a problem for other people. In our case we wanted to remove any HTML that could later be used to carefully build an XSS attack, so it is not a big deal, but it is surprising.