Skip to content

Out-of-bounds Read in BehaviorTree.CPP Script Validation #923

Open
@cktii

Description

@cktii

When parsing invalid scripts that generate large error messages, the ValidateScript function may attempt to create a string from a non-NULL-terminated buffer, causing strlen to read beyond the buffer's bounds. This occurs because a fixed-size stack buffer is used to store error messages without ensuring proper NULL termination.

Result ValidateScript(const std::string& script)

Found in commit: 48f6c5b

Bug Class

Out-of-bounds Read / Buffer Over-read (CWE-125)

Root Cause

In src/script_parser.cpp, the ValidateScript function allocates a fixed-size buffer of 2048 bytes for error messages:

Result ValidateScript(const std::string& script)
{
  char error_msgs_buffer[2048];

  auto input = lexy::string_input<lexy::utf8_encoding>(script);
  auto result =
      lexy::parse<BT::Grammar::stmt>(input, ErrorReport().to(error_msgs_buffer));
  if(result.has_value() && result.error_count() == 0)
  {
    try
    {
      std::vector<BT::Ast::ExprBase::Ptr> exprs = LEXY_MOV(result).value();
      if(exprs.empty())
      {
        return nonstd::make_unexpected("Empty Script");
      }
      // valid script
      return {};
    }
    catch(std::runtime_error& err)
    {
      return nonstd::make_unexpected(err.what());
    }
  }
  return nonstd::make_unexpected(error_msgs_buffer);
}

}  // namespace BT

This buffer is passed to lexy's error reporter, which fills it with error messages. When parsing fails, the buffer content is used to construct a string via nonstd::make_unexpected.
The string construction internally calls strlen to determine the buffer length. However, there's no guarantee that:

  • The buffer is NULL-terminated
  • The error messages fit within the buffer
  • The last byte is reserved for a NULL terminator

This leads to strlen reading past the buffer bounds looking for a NULL terminator that isn't there.

Stack trace

==2543337==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffff5a09820 at pc 0x5555555ae3af bp 0x7fffffffcd10 sp 0x7fffffffc4d8
READ of size 2049 at 0x7ffff5a09820 thread T0
    #0 0x5555555ae3ae in strlen (/home/user/BehaviorTree.CPP/build3/script_fuzzer+0x5a3ae) (BuildId: 390cd92534c04116bddfa049abd4832d7b583113)
    #1 0x555555692d1c in std::char_traits<char>::length(char const*) /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/char_traits.h:399:9
    #2 0x555555679d3c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>::basic_string<std::allocator<char>>(char const*, std::allocator<char> const&) /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/basic_string.h:648:30
    #3 0x5555558ff01b in nonstd::expected_lite::expected<std::monostate, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>::expected<char*, 0>(nonstd::expected_lite::unexpected_type<char*>&&) /home/user/BehaviorTree.CPP/include/behaviortree_cpp/contrib/expected.hpp:1979:36
    #4 0x5555558fe109 in BT::ValidateScript(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&) /home/user/BehaviorTree.CPP/src/script_parser.cpp:94:10
[...]

Reproducer

#include "behaviortree_cpp/scripting/script_parser.hpp"
#include <iostream>

int main()
{
  std::string script = "+6e66>6666.6+66\r6>6;6e62=6+6e66>66666'; en';o';o'; en'; "
                       "\177n\177r;6.6+66.H>6+6\2006\036;@e66";

  auto result = BT::ValidateScript(script);

  if(!result)
  {
    std::cout << "Script validation failed as expected" << std::endl;
  }

  return 0;
}

NOTE: This requires that libbheaviortree.so is compiled with -fsanitize=address.

Proposed fix

I'm by no means an expert for lexy, but given that lexy's parse function accepts a template callback for error handling:

/// Parses the production into a value, invoking the callback on error.
template <typename Production, typename Input, typename ErrorCallback>
constexpr auto parse(const Input& input, const ErrorCallback& callback)
{
    return parse_action<void, Input, ErrorCallback>(callback)(Production{}, input);
}

A recommended fix seems to implement a safe error callback that manages its own memory instead of a fix-sized stack-based buffer, something similar to:

class SafeErrorReport {
    std::string error_buffer;
    std::size_t count = 0;

    struct sink {
        std::string& buffer;
        std::size_t& count;

        template <typename Input, typename Reader, typename Tag>
        void operator()(const lexy::error_context<Input>& context,
                       const lexy::error<Reader, Tag>& error) {
            buffer += "error: while parsing ";
            buffer += context.production();
            buffer += "\n";
            count++;
        }

        std::size_t finish() && {
            return count;
        }
    };

public:
    auto sink() { return sink{error_buffer, count}; }
    const std::string& get_errors() const { return error_buffer; }
};

This is based on my understanding of lexy's error_report.hpp. That would turn ValidateScript into something like this:

Result ValidateScript(const std::string& script)
{
    auto input = lexy::string_input<lexy::utf8_encoding>(script);
    SafeErrorReport error_report;  // Replace char buffer with our safe handler
    
    auto result = lexy::parse<BT::Grammar::stmt>(input, error_report);
    
    if(!result.has_value() || result.error_count() != 0)
    {
        return nonstd::make_unexpected(error_report.get_errors());
    }

    try
    {
        std::vector<BT::Ast::ExprBase::Ptr> exprs = LEXY_MOV(result).value();
        if(exprs.empty())
        {
            return nonstd::make_unexpected("Empty Script");
        }
        // valid script
        return {};
    }
    catch(std::runtime_error& err)
    {
        return nonstd::make_unexpected(err.what());
    }
}

NOTE: Not 100% sure on this fix, just loosely based on my quick understand of lexy's code..

Impact

The vulnerability could allow an attacker to cause out-of-bounds memory reads by providing specially crafted invalid scripts. While this doesn't lead to direct code execution, it could potentially:

  • Lead to information disclosure if sensitive data is stored adjacent to the buffer
  • Cause program crashes (DoS)

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions