23andMe’s Yamale Python code injection, and properly sanitizing eval()

YAML Security Vulnerability

Background

The JFrog security research team (formerly Vdoo) has recently disclosed a code injection issue in Yamale, a popular schema validator for YAML that is used by over 200 repositories. The issue has been assigned CVE-2021-38305.

The injection issue

An attacker who can control the contents of the schema file that’s supplied to Yamale (the -s/--schema command line parameter) can provide a seemingly valid schema file that causes arbitrary Python code to run. Note that the schema file is one of the two mandatory parameters to Yamale (the other being the YAML file to validate).

The issue lies in the parser.parse function:

safe_globals = ('True', 'False', 'None')
safe_builtins = dict((f, __builtins__[f]) for f in safe_globals)
...
def parse(validator_string, validators=None):
    validators = validators or val.DefaultValidators
    try:
        tree = ast.parse(validator_string, mode='eval')
        # evaluate with access to a limited global scope only
        return eval(compile(tree, '', 'eval'),
                    {'__builtins__': safe_builtins},
                    validators)
    ...

In our case, validator_string is the user’s input coming from the schema file.

We can see that arbitrary input flows into eval, which can generally be manipulated for code injection.

In this case, eval has been nerfed and the globals parameter has been blanked out, save for the True, False and None builtins. This means that an attacker cannot easily run malicious code, since code like __import__('os').system('evil_command') will fail because the __import__ builtin will not be available. As we explain below, however, a vulnerability still exists.
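To illustrate the restriction, here is a small sketch of our own (not Yamale code) that mirrors the safe_builtins dict above and shows a naive injection attempt failing:

```python
SAFE_BUILTINS = {'True': True, 'False': False, 'None': None}

blocked = None
try:
    # __import__ is missing from the stripped-down builtins,
    # so the name lookup raises NameError and nothing runs.
    eval("__import__('os').system('evil_command')",
         {'__builtins__': SAFE_BUILTINS}, {})
except NameError as exc:
    blocked = exc

print("blocked:", blocked)  # e.g. blocked: name '__import__' is not defined
```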

Bypassing eval protections

Does emptying the builtins prevent attackers from running arbitrary code?

The answer is NO; in fact, even completely empty builtins will not help.

The underlying issue is that through Python reflection, an attacker can “claw back” any needed builtin and run arbitrary code.

For example, the following string will run the Python HTTP server, even with an emptied builtins:

[x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['print']([x for x in (1).__class__.__base__.__subclasses__() if x.__name__ == 'catch_warnings'][0]()._module.__builtins__['__import__']('os').system('cd /; python3 -m http.server'))
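A stripped-down, harmless variant of the same technique (our own illustration, not Yamale code; it is CPython-specific, since the contents of the subclass list vary between versions) shows how reflection walks from a plain int up to object, finds the catch_warnings class, and recovers the real builtins through its _module attribute:

```python
import warnings  # ensure catch_warnings appears in object's subclass list

# Even with completely empty builtins, the expression below reaches
# the warnings module and pulls a builtin (len) out of its
# __builtins__ dict -- the same "claw back" used by the payload above.
payload = ("[c for c in (1).__class__.__base__.__subclasses__()"
           " if c.__name__ == 'catch_warnings'][0]()"
           "._module.__builtins__['len']('clawed back')")

result = eval(payload, {'__builtins__': {}}, {})
print(result)  # 11 -- len() was recovered despite empty builtins
```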

There have been many writeups on this subject, but the short answer is: if you are passing completely unsanitized input to eval (regardless of builtins), then you are susceptible to arbitrary code injection.

Let’s see how it is still possible to use eval without opening ourselves to code injection.

Yamale’s fix and sanitizing eval()

  1. Yamale’s maintainers chose to sanitize the input string via a whitelist before it’s passed to eval.
    If the eval’d string contains any substring that’s not on the whitelist, the operation fails.
    This is a perfectly acceptable solution, as long as the whitelist is restrictive enough.
    Note that we do not recommend using a blacklist, since attackers can usually find some combination of values that evades the blacklist while still performing something malicious.
  2. If possible, we highly recommend using ast.literal_eval instead of eval.
    literal_eval can only handle simple expressions, but should be sufficient for a lot of simple use cases, without exposing the code to any vulnerabilities.
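For example (our own sketch, not Yamale code), ast.literal_eval accepts Python literals but rejects attribute access, function calls, and anything else an attacker could abuse:

```python
import ast

# Literals (numbers, strings, lists, dicts, tuples, booleans, None)
# are evaluated safely:
print(ast.literal_eval("[1, 2, {'a': 3}]"))  # [1, 2, {'a': 3}]

# Anything beyond a literal -- including the reflection trick used
# in the bypass above -- is rejected outright:
try:
    ast.literal_eval("(1).__class__.__base__")
except ValueError as exc:
    print("rejected:", exc)
```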

Can we exploit it remotely?

As mentioned, an attacker needs to be able to specify the contents of the schema file in order to inject Python code.

This can be exploited remotely if some piece of vendor code allows an attacker to do that, for example:

subprocess.run(["yamale", "-s", remote_userinput, "/path/to/file_to_validate"])

However, this situation is a bit contrived and would probably not occur in production code in a remote/network context.

A more likely situation is the exploitation of such issues (vulnerabilities triggered through command line parameters) via a separate parameter injection issue.

Imagine the following vendor code:

import re
import subprocess

def run_yamale_fixed_schema(path: str):
    # Check for malicious shell metacharacters
    if re.search(r"[;`$()|]+", path):
        # ATTEMPTED COMMAND INJECTION!
        return

    # Run Yamale child process
    cmdstr = f"yamale -s safe_schema.yaml {path}"
    subprocess.run(cmdstr, shell=True)

Although the code above will successfully stop shell command injection attempts, the fact that the space and - characters are allowed lets an attacker inject arbitrary flags. For example, imagine the following path:

-s evil_schema.yaml /path/to/file_to_validate

The full cmdstr would be:

yamale -s safe_schema.yaml -s evil_schema.yaml /path/to/file_to_validate

Note that a multiply-defined -s parameter behaves differently depending on the argument parser used (and the parsing options specified), but in Yamale’s case (argparse with default options) the latter value wins. This means the attacker can unexpectedly control the schema file and perform the code injection attack remotely.
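The last-wins behavior is easy to confirm with a minimal argparse sketch (our own illustration; the option names mirror Yamale’s CLI):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-s", "--schema")
parser.add_argument("path")

# With argparse's default "store" action, a repeated option silently
# overwrites the earlier value -- the attacker's schema wins:
args = parser.parse_args(
    ["-s", "safe_schema.yaml", "-s", "evil_schema.yaml", "file.yaml"])
print(args.schema)  # evil_schema.yaml
```

As a side note, when building such command lines it helps to pass arguments as a list (no shell) and to insert a literal "--" before user-controlled positionals, since argparse stops option parsing at "--" and a path beginning with "-" can no longer be interpreted as a flag.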

Also note that parameter injection attacks can sometimes be achieved even when the space character is not allowed, by using various escape mechanisms.

Conclusion and Acknowledgements

To conclude, we recommend using one of the above methods to sanitize eval, and, if possible, avoiding eval entirely by replacing it with a more specific API for your required task.

We would like to thank Yamale’s maintainers for validating and fixing the issue in record time, and for responsibly creating a CVE for the issue after the fixed version was available.

Questions? Thoughts? Contact us at research@jfrog.com for any inquiries related to security vulnerabilities.
