Parse Your Way Through Errors

In our code base there's a static function to log an error:

final class Debug
{
    public static function error(
        $msg,
        $obj = false,
        $category = 'user',
        $type = 'user'
    ) {
        // ...
    }
}

Debug::error('API call failed', $response, 'facebook', 'api-call-failed');

The first argument is a message, the second argument can be a value of any type, and the third and fourth arguments are used for grouping errors by category and type. This is very useful for analytics.

Ignore for a second that we should use the PSR-3: Logger Interface. We could easily move this Debug class behind an implementation of that interface if we ever wanted to.

The problem

Did you notice that the category and type both default to "user"? I don't know when or why this was introduced in our code base, but it's bad. When the category and type are not set, the errors get lost in the category "user" and the type "user".
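To make the effect of those defaults concrete, here is a minimal sketch. DebugStub is a hypothetical stand-in for our real Debug class (whose body isn't shown above); it only counts errors per category/type pair, and the messages are made up.

```php
<?php

// Hypothetical stand-in for the real Debug class: it only records the grouping.
final class DebugStub
{
    public static $groups = [];

    public static function error($msg, $obj = false, $category = 'user', $type = 'user')
    {
        $key = $category . '/' . $type;
        self::$groups[$key] = (self::$groups[$key] ?? 0) + 1;
    }
}

// One call with an explicit category and type, two calls relying on the defaults.
DebugStub::error('API call failed', null, 'facebook', 'api-call-failed');
DebugStub::error('Payment declined');
DebugStub::error('Image upload failed', ['size' => 1024]);

print_r(DebugStub::$groups);
```

Two of the three errors end up in the "user/user" bucket, where they can't be told apart from each other.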

How can we find these Debug::error statements without a category and a type? We don't want to ignore these errors.

Use a regex?

Maybe... If the arguments are simple:

Figure: RegExr experiment to find Debug::error calls (RegExr)

Note that this is the first regex that came to my mind. We could improve that regex but let's not waste our time. We need a better way. We need something that truly understands our code.
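Here's a rough illustration of why a regex runs out of steam. The pattern below is an assumption for the sake of the example, not the regex from the screenshot: it grabs everything between the parentheses and splits on commas. The helper name naiveArgumentCount is made up as well.

```php
<?php

// A naive approach (hypothetical): capture the argument list, split on commas.
function naiveArgumentCount(string $code): ?int
{
    if (!preg_match('/Debug::error\((.*)\);/s', $code, $match)) {
        return null;
    }

    return count(explode(',', $match[1]));
}

// Simple arguments: the count is right.
$simple = 'Debug::error("message", $var);';

// Still two arguments, but with a comma inside a string and a nested call.
$tricky = 'Debug::error("Oops, failed", compact("a", "b"));';

echo naiveArgumentCount($simple), PHP_EOL; // prints 2
echo naiveArgumentCount($tricky), PHP_EOL; // prints 4, although there are only 2 arguments
```

The second call would slip through as "has four arguments", which is exactly the kind of mistake we're trying to avoid.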

Nikita's PHP Parser to the rescue

A parser is useful for static analysis, manipulation of code and basically any other application dealing with code programmatically. A parser constructs an Abstract Syntax Tree (AST) of the code and thus allows dealing with it in an abstract and robust way.

https://github.com/nikic/PHP-Parser

Parsing a programming language is hard. This package makes it easy!

Let's see how we can use the package to find Debug::error's without a category and type.

$ mkdir parser-demo
$ cd parser-demo
$ composer require nikic/php-parser

This package is bundled with a binary (php-parse) to play with the parser:

$ vendor/bin/php-parse '<?php Debug::error("message", $var);'
====> Code <?php Debug::error("message", $var);
==> Node dump:
array(
    0: Expr_StaticCall(
        class: Name(
            parts: array(
                0: Debug
            )
        )
        name: error
        args: array(
            0: Arg(
                value: Scalar_String(
                    value: message
                )
                byRef: false
                unpack: false
            )
            1: Arg(
                value: Expr_Variable(
                    name: var
                )
                byRef: false
                unpack: false
            )
        )
    )
)

The abstract syntax tree is a list of statements. The first statement is an Expr_StaticCall. A static call has a class, a name and args. If the count of args is not equal to 4, we know that the category and/or type is missing. Let's translate that into code:

$ touch find.php

First, we'll need a way to find all PHP files in our code base:

<?php

$directory = $argv[1];
$directoryIterator = new RecursiveDirectoryIterator($directory);
$directoryIterator = new RecursiveIteratorIterator($directoryIterator);
$files = new RegexIterator($directoryIterator, '/^.+\.php$/i', RecursiveRegexIterator::GET_MATCH);

We can now pass a path to our code base as a command line argument:

$ php find.php ~/path/to/code
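If you're wondering what the iterator chain actually yields, here is a small sketch that runs it against a throwaway directory. The directory layout and file names are made up for the example; the point is that GET_MATCH hands back the preg_match() result, so the path sits at index 0.

```php
<?php

// Build a throwaway directory tree so the sketch runs anywhere.
$root = sys_get_temp_dir() . '/parser-demo-' . uniqid();
mkdir($root . '/src/Api', 0777, true);
file_put_contents($root . '/src/Api/Client.php', '<?php // client');
file_put_contents($root . '/src/readme.txt', 'not PHP');
file_put_contents($root . '/index.PHP', '<?php // upper-case extension');

$iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
);
$files = new RegexIterator($iterator, '/^.+\.php$/i', RegexIterator::GET_MATCH);

$found = [];
foreach ($files as $file) {
    // GET_MATCH yields the array of matches, so the full path is element 0.
    $relative = substr($file[0], strlen($root));
    $found[] = str_replace('\\', '/', $relative);
}

sort($found);
print_r($found);
```

The text file is skipped, and the /i modifier makes sure index.PHP is picked up too.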

Secondly, we'll need a way to traverse all nodes of an AST. Meet the NodeTraverser.

<?php

use PhpParser\Error;
use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;
use PhpParser\NodeVisitor\NameResolver;

require_once __DIR__ . '/vendor/autoload.php';

// ... directory iterator

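// ParserFactory::create() is the PHP-Parser API up to 4.x; 5.x replaced it
// with createForNewestSupportedVersion().
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);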

foreach ($files as $file) {
    try {
        $code = file_get_contents($file[0]);
        $statements = $parser->parse($code);
        $traverser = new NodeTraverser;
        $traverser->addVisitor(new NameResolver);
        $traverser->addVisitor(new DebugErrorVisitor($file[0]));
        $traverser->traverse($statements);
    } catch (Error $e) {
        echo 'Parse Error: ', $e->getMessage(), PHP_EOL;
    }
}

The NodeTraverser will call a method (enterNode) on each visitor for each node in the AST. Let's implement DebugErrorVisitor:

<?php

use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;
use PhpParser\Node\Expr\StaticCall;

require_once __DIR__ . '/vendor/autoload.php';

final class DebugErrorVisitor extends NodeVisitorAbstract
{
    private $file;

    public function __construct($file)
    {
        $this->file = $file;
    }

    public function enterNode(Node $node)
    {
        if (!$node instanceof StaticCall) {
            return;
        }

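        // Dynamic calls such as $class::error() have an expression here, not a Name.
        if (!$node->class instanceof Node\Name || $node->class->toString() !== 'Debug') {
            return;
        }

        // The method name is a plain string in older PHP-Parser releases and an
        // Identifier object (castable to string) in 4.x; skip dynamic names.
        if ($node->name instanceof Node\Expr || (string) $node->name !== 'error') {
            return;
        }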

        if (count($node->args) !== 4) {
            echo "Found {$node->class}::{$node->name} in {$this->file} on line {$node->getLine()}", PHP_EOL;
        }
    }
}

// ... directory iterator

  1. If $node is not a StaticCall we'll return.
  2. If the class is not a Name that resolves to Debug we'll return (for a dynamic call such as $class::error() the class is an expression instead of a Name).
  3. If the name of the method we're calling is not error we'll return.
  4. If the count of arguments is not 4, we found a Debug::error we're interested in.
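Why does the first check matter so much? Because enterNode is called for every node in the tree, not just the top-level statements. The sketch below mimics that callback order with toy classes. ToyNode, ToyVisitor, ToyTraverser and TypeCollector are hypothetical stand-ins written for this example, not PHP-Parser classes, and the real traverser does more (leaveNode, node replacement and so on).

```php
<?php

// Toy stand-in for an AST node: a type name plus child nodes.
final class ToyNode
{
    public $type;
    public $children;

    public function __construct(string $type, array $children = [])
    {
        $this->type = $type;
        $this->children = $children;
    }
}

interface ToyVisitor
{
    public function enterNode(ToyNode $node): void;
}

final class ToyTraverser
{
    private $visitors = [];

    public function addVisitor(ToyVisitor $visitor): void
    {
        $this->visitors[] = $visitor;
    }

    public function traverse(array $nodes): void
    {
        foreach ($nodes as $node) {
            // Every visitor sees the node first, then we descend into its children.
            foreach ($this->visitors as $visitor) {
                $visitor->enterNode($node);
            }
            $this->traverse($node->children);
        }
    }
}

// Records the type of every node it is handed.
final class TypeCollector implements ToyVisitor
{
    public $seen = [];

    public function enterNode(ToyNode $node): void
    {
        $this->seen[] = $node->type;
    }
}

// Roughly the shape of the node dump for Debug::error("message", $var).
$tree = [
    new ToyNode('Expr_StaticCall', [
        new ToyNode('Name'),
        new ToyNode('Arg', [new ToyNode('Scalar_String')]),
        new ToyNode('Arg', [new ToyNode('Expr_Variable')]),
    ]),
];

$collector = new TypeCollector();
$traverser = new ToyTraverser();
$traverser->addVisitor($collector);
$traverser->traverse($tree);

echo implode(' > ', $collector->seen), PHP_EOL;
```

The visitor gets to see six nodes for that one-line statement, which is why DebugErrorVisitor bails out early for anything that isn't a StaticCall.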

Let's put it all together:

<?php

use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;
use PhpParser\Node\Expr\StaticCall;
use PhpParser\Error;
use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;
use PhpParser\NodeVisitor\NameResolver;

require_once __DIR__ . '/vendor/autoload.php';

final class DebugErrorVisitor extends NodeVisitorAbstract
{
    private $file;

    public function __construct($file)
    {
        $this->file = $file;
    }

    public function enterNode(Node $node)
    {
        if (!$node instanceof StaticCall) {
            return;
        }

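        // Dynamic calls such as $class::error() have an expression here, not a Name.
        if (!$node->class instanceof Node\Name || $node->class->toString() !== 'Debug') {
            return;
        }

        // The method name is a plain string in older PHP-Parser releases and an
        // Identifier object (castable to string) in 4.x; skip dynamic names.
        if ($node->name instanceof Node\Expr || (string) $node->name !== 'error') {
            return;
        }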

        if (count($node->args) !== 4) {
            echo "Found {$node->class}::{$node->name} in {$this->file} on line {$node->getLine()}", PHP_EOL;
        }
    }
}

$directory = $argv[1];
$directoryIterator = new RecursiveDirectoryIterator($directory);
$directoryIterator = new RecursiveIteratorIterator($directoryIterator);
$files = new RegexIterator($directoryIterator, '/^.+\.php$/i', RecursiveRegexIterator::GET_MATCH);

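// ParserFactory::create() is the PHP-Parser API up to 4.x; 5.x replaced it
// with createForNewestSupportedVersion().
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);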

foreach ($files as $file) {
    try {
        $code = file_get_contents($file[0]);
        $statements = $parser->parse($code);
        $traverser = new NodeTraverser;
        $traverser->addVisitor(new NameResolver);
        $traverser->addVisitor(new DebugErrorVisitor($file[0]));
        $traverser->traverse($statements);
    } catch (Error $e) {
        echo 'Parse Error: ', $e->getMessage(), PHP_EOL;
    }
}

We can now call the script with a path to our code base and it will accurately find every Debug::error call that lacks a category and a type. Perhaps we can use this script in our Jenkins setup to fail the build if there's a Debug::error without a category and type.
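For the build-failing part, the script would need to exit with a non-zero status when it finds something. A minimal sketch, assuming the visitor collects its findings in an array instead of echoing them; exitCodeFor is a hypothetical helper, and the path and line number are made up:

```php
<?php

// Hypothetical helper: report each finding on STDERR and pick an exit code.
function exitCodeFor(array $findings): int
{
    foreach ($findings as $finding) {
        fwrite(STDERR, $finding . PHP_EOL);
    }

    // Any finding makes the build fail.
    return count($findings) === 0 ? 0 : 1;
}

// Made-up finding, standing in for what the visitor would collect.
$findings = ['Found Debug::error in src/Api/Client.php on line 42'];
$exitCode = exitCodeFor($findings);

// In the real script this would be the last line: exit($exitCode);
```

Most CI servers, Jenkins included, treat a non-zero exit status of a shell step as a failed build, so nothing else should be needed on that side.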

We could take this one step further and ask for a category and type on the command line (using readline) and replace it automatically but I'll leave that as an exercise for the reader. 😏

Happy programming!
