Refactoring to Collections

How many times have you seen code like this?

public function displayCoffeeRunsOpenForOrders()
{
    $coffeeRuns = $this->coffeeRunsRepository->getAll();
    $openForOrders = array();

    foreach ($coffeeRuns as $coffeeRun) {
        if ($coffeeRun->ordersCanBeMade()) {
            $openForOrders[] = $coffeeRun;
        }
    }

    return $openForOrders;
}

Our office is very close to a great coffee bar, and people are going there from time to time to get coffee for everyone in the office. Keeping track of who's going for coffee and who wants which coffee can be daunting, which is why I'm making this app. Let's take a closer look at that piece of code.

We're manually traversing an array of CoffeeRun objects that we got out of the Repository to filter out those who are open for orders. If we find one, we add it to a new array. This way we build a list of CoffeeRuns that are still accepting orders.

To Iterator or not, that's the question

Now, there's nothing very wrong with this, but it could be made a whole lot better, I think. Let's say that the Repository contains 1000 CoffeeRuns. We're now forcing the Repository to return an array of 1000 CoffeeRun objects when we call getAll(). That's not great. But let's think about it... The Repository already represents a list of all CoffeeRun objects. Why can't we just iterate over it? Something like this:

public function displayCoffeeRunsOpenForOrders()
{
    $coffeeRuns = $this->coffeeRunsRepository;
    $openForOrders = array();

    foreach ($coffeeRuns as $coffeeRun) {
        if ($coffeeRun->ordersCanBeMade()) {
            $openForOrders[] = $coffeeRun;
        }
    }

    return $openForOrders;
}

Having a repository we can iterate over like that would already be a first step. We could implement the Iterator interface within the Repository, and have this functionality immediately. But that would leave the foreach out there, and that would also present a problem for our dependency injection: the class where this displayCoffeeRunsOpenForOrders is a controller, and it has the Repository injected in its constructor, like this:

public function __construct(CoffeeRunsRepository $repository)
{
    $this->coffeeRunsRepository = $repository;
}

If we'd count on the fact that the Repository is now an Iterator, we should either typehint on it, but then we lose our typehint for CoffeeRunsRepository, which we don't want. Or we could make the CoffeeRunsRepository interface extend the Iterator interface, but then we get a really big interface we need to implement for every implementation of the Repository. Both of which are annoying to say the least.

Think SOLIDly Different

Let's try something completely different. In fact, the foreach in that method, is blocking progress for us right now; Let's extract that whole block to a method. This is what was in the controller before the extraction:

public function displayCoffeeRunsOpenForOrders()
{
    $coffeeRuns = $this->coffeeRunsRepository->getAll();
    $openForOrders = array();

    foreach ($coffeeRuns as $coffeeRun) {
        if ($coffeeRun->ordersCanBeMade()) {
            $openForOrders[] = $coffeeRun;
        }
    }

    return $openForOrders;
}

In the Repository we added a new method openForOrders(), and we kept the same implementation as before, which now resides in the implementation of that method:

interface CoffeeRunsRepository
{
    // ...

    /**
     * @return CoffeeRun[] A list of CoffeeRuns open for orders
     */
    public function openForOrders();

    // ...
}
final class CoffeeRunsRepositoryInMemory implements CoffeeRunsRepository
{
    // ...

    public function openForOrders()
    {
        $coffeeRuns = $this->getAll();
        $openForOrders = array();

        foreach ($coffeeRuns as $coffeeRun) {
            if ($coffeeRun->ordersCanBeMade()) {
                $openForOrders[] = $coffeeRun;
            }
        }

        return $openForOrders;
    }

    // ...
}

This is what's left in the controller after extraction:

public function displayCoffeeRunsOpenForOrders()
{
    return $this->coffeeRunsRepository->openForOrders();
}

I don't like the Repository word in there, while we're thinking about the Repository as just being a Collection. Let's rename the variable in our controller. Notice how well this reads:

public function displayCoffeeRunsOpenForOrders()
{
    return $this->coffeeruns->openForOrders();
}

"Return CoffeeRuns that are Open For Orders". I like it. What's even better, is that the filtering of the Collection is now happening in the object that worries about the Collection, namely the Repository. This means that the specific implementations can optimize for their own benifits. E.g. the database implementation of the Repository can use a specific WHERE query so that it doesn't need to filter CoffeeRuns in PHP, and as a result of that it doesn't need to load a whole list of CoffeeRun objects into memory.

Bump your head twice

What if we want a second method in our controller that displays all CoffeeRuns that are open for orders, and will happen before noon? The openForOrders() method won't be enough. We could use it, and then use a foreach as in the first example of the blog post, but then we're back to square one. We could also make it a Repository method, and that would be okay, but only if we don't have to do it too many times. Let's look for a different solution.

It looks like we now have two problems:

  1. We want to encapsulate the "looping over a list and picking out specific items"
  2. We want to combine several of these actions on a list, or at least do them in succession

The problems always come up when we're talking to the Repository. We made two changes to it, that were both half solutions. The Repository has too many responsibilities (Storage and Collection). What would happen if we'd return an object representing a list of CoffeeRun objects from the Repository, and use that to iterate over? The Repository only takes care of the storage, and the Collection object takes care of the Collection stuff.

What we want to do is "filter" CoffeeRuns so that we get CoffeeRuns that are open for orders, and then filter those again so that we get those open for orders that are happening before noon. In pseudo code:

coffeeRuns = repository.getAll
coffeeRuns
    .thatAre(openForOrders)
    .thatAre(happeningBeforeNoon)

If we can express openForOrders and happeningBeforeNoon as functions that take a CoffeeRun object and return a boolean (we'll call them predicates), then the thatAre() implementation could be very generic! Let's check it out:

$openForOrders = function (CoffeeRun $coffeeRun)
{
    return $coffeeRun->ordersCanBeMade();
};

$noon = new DateTime('today noon');
$happeningBeforeNoon = function (CoffeeRun $coffeeRun) use ($noon)
{
    return $coffeeRun->getDate() < $noon;
};

Now we'll create the CoffeeRuns Collection object, with a thatAre method:

interface CoffeeRuns
{
    /**
     * @param callable function that takes a CoffeeRun and returns a boolean
     *
     * @return CoffeeRuns that match the predicate
     */
    public function thatAre(callable $matchingPredicate);

    /**
     * @return CoffeeRun[] An array of CoffeeRuns
     */
    public function asArray();
}

Let's create a naïve array based implementation of it:

final class CoffeeRunsArray implements CoffeeRuns
{
    private $coffeeRuns;

    public function __construct(array $coffeeRuns)
    {
        $this->coffeeRuns = $coffeeRuns;
    }

    public function thatAre(callable $matchingPredicate)
    {
        return new static(
            array_filter(
                $this->coffeeRuns,
                $matchingPredicate
            )
        );
    }

    public function asArray()
    {
        $this->coffeeRuns;
    }
}

Right now, we can have our CoffeeRunsRepository return a CoffeeRuns Collection object for the getAll() method instead of CoffeeRun[] (an array of CoffeeRun) objects, and remove the specific methods we created, like openForOrders().

interface CoffeeRunsRepository
{
    /**
     * @return CoffeeRuns A collection of CoffeeRuns
     */
    public function getAll();
}

In the controller, we now get this:

final class CoffeeController
{
    private $coffeeRuns;
    private $openForOrders;
    private $happeningBeforeNoon;

    public function __construct(CoffeeRunsRepository $repository)
    {
        $this->coffeeRuns = $repository;

        $this->openForOrders = function (CoffeeRun $coffeeRun)
        {
            return $coffeeRun->ordersCanBeMade();
        };

        $noon = new DateTime('today noon');
        $this->happeningBeforeNoon = function (CoffeeRun $coffeeRun) use ($noon)
        {
            return $coffeeRun->getDate() < $noon;
        };
    }

    public function displayCoffeeRunsOpenForOrders()
    {
        $coffeeRuns =
            $this->coffeeRuns->getAll()
            ->thatAre($this->openForOrders);

        return $coffeeRuns->asArray();
    }

    public function displayCoffeeRunsOpenForOrdersAndBeforeNoon()
    {
        $coffeeRuns =
            $this->coffeeRuns->getAll()
            ->thatAre($this->openForOrders)
            ->thatAre($this->happeningBeforeNoon);

        return $coffeeRuns->asArray();
    }
}

As you can see, the individual methods got very readable. The operations on the Collections are encapsulated. I extracted the $openForOrders and $happeningBeforeNoon predicates to class variables, because I didn't like the duplication in both controller methods, however, I'm not satisfied with their presence in the constructor. We could create a Predicate interface and make the thatAre() method accept Predicate instances. As a result our controller would just do something like $coffeeRuns->thatAre(new OpenForOrders());. But let's leave it like this for now. I've got some critical questions coming in:

Wait a minute! You didn't solve the memory problem!

Exactly. Let's do that right now.

final class CoffeeRunsWithIterator implements CoffeeRuns
{
    private $coffeeRuns;
    private $predicates;

    public function __construct(Iterator $coffeeRuns, array $predicates = array())
    {
        $this->coffeeRuns = $coffeeRuns;
        $this->predicates = $predicates;
    }

    public function thatAre(callable $matchingPredicate)
    {
        $predicates = $this->predicates;
        $predicates[] = $matchingPredicate;

        return new static(
            $this->coffeeRuns,
            $predicates
        );
    }

    public function asArray()
    {
        $collectionAsArray = array();

        foreach ($this->coffeeRuns as $coffeeRun) {
            foreach ($this->predicates as $predicate) {
                if (call_user_func($predicate, $coffeeRun) !== true) {
                    continue 2;
                }
            }

            $collectionAsArray[] = $coffeeRun;
        }

        return $collectionAsArray;
    }
}

This is an implementation of the CoffeeRuns Collection interface that takes an Iterator. Whenever the thatAre() method gets called with a predicate, we don't do anything, except for creating a new instance of the Collection class with the same Iterator in it, but with the predicate added to the list of predicates that must be matched for each element in the Collection. It's only at the last step, when we try to extract the contents of the collection, that we run the Iterator and check every value against the given predicates. We still need to run the predicates against the whole collection, but because they are within an Iterator, the Iterator can e.g. get them from the database one by one. That's an implementation detail of the Repository.

What if we want to be able to use MySQL WHERE statements in our Repositories?

Let's say that we want to create a MySQL implementation of the Repository. We don't really want to write a query that will get the whole list of CoffeeRuns, even though we worked with an Iterator before (and thus preserving memory issues), so just passing closures to the thatAre() method will not help us a whole lot to write efficient queries. To improve on that, we could start passing our predicates as objects instead of closures, like this:

interface CoffeeRuns
{
    /**
     * @param Predicate that CoffeeRuns need to satisfy
     *
     * @return CoffeeRuns that match the predicate
     */
    public function thatAre(Predicate $matching);

    /**
     * @return CoffeeRun[] An array of CoffeeRuns
     */
    public function asArray();
}

A predicate could look something like this:

interface Predicate
{
    /**
     * @param CoffeeRun the CoffeeRun we want to check
     *
     * @return bool if the Predicate is satisfied by given CoffeeRun
     */
    public function isSatisfiedBy(CoffeeRun $coffeeRun);
}

And to get the same functionality as before we can implement it like this:

final class OpenForOrders implements Predicate
{
    public function isSatisfiedBy(CoffeeRun $coffeeRun)
    {
        return $coffeeRun->ordersCanBeMade();
    }
}

When we have this in place, implementing a repository that can translate these predicates to the correct WHERE statements is pretty much trivial: without doing an actual query, the Repository returns a CoffeeRuns collection object that can be filtered using Predicates and when that is done, the query can be built by translating the Predicates to the appropriate WHERE statements. Now the Predicate object can still be used to check if a given CoffeeRun satisfies the rule encapsulated within it, but within the Repository, a more efficient way is used against the whole database table full of CoffeeRuns. Eric Evans calls this concept Specification, and you can read all about it in his great book Domain Driven Design.

Doing something with all objects in our collection

Let's say we want people to stopOrdering() for all CoffeeRuns that are happening before noon.

$stopOrdering = function (CoffeeRun $coffeeRun)
{
    $coffeeRun->stopOrdering();

    return $coffeeRun;
};

$coffeeRuns = $repository->getAll();
$stoppedOrderingForTheseRuns = $coffeeRuns
    ->thatAre($happeningBeforeNoon)
    ->updateEach($stopOrdering);

Let's add it to the Collection interface:

interface CoffeeRuns
{
    /**
     * @param Predicate that CoffeeRuns need to satisfy
     *
     * @return CoffeeRuns that match the predicate
     */
    public function thatAre(Predicate $matching);

    /**
     * @param callable function that takes a CoffeeRun and returns an updated one
     *
     * @return CoffeeRuns that got updated
     */
    public function updateEach(callable $update);

    /**
     * @return CoffeeRun[] An array of CoffeeRuns
     */
    public function asArray();
}

The implementation for the CoffeeRunsArray could be something like this:

final class CoffeeRunsArray implements CoffeeRuns
{
    private $coffeeRuns;

    public function __construct(array $coffeeRuns)
    {
        $this->coffeeRuns = $coffeeRuns;
    }

    public function thatAre(Predicate $predicate)
    {
        return new static(
            array_filter(
                $this->coffeeRuns,
                function (CoffeeRun $run) use ($predicate) {
                    return $predicate->isSatisfiedBy($run);
                }
            )
        );
    }

    public function updateEach(callable $update)
    {
        return new static(
            array_map(
                $update,
                $this->coffeeRuns
            )
        );
    }

    public function asArray()
    {
        $this->coffeeRuns;
    }
}

As you can see, the implementation is trivial in this case, but again it provides us with a powerful abstraction. The code that operates on the Collection doesn't have to know anything about how the Collection is implemented, it just knows about the methods we provided for interacting with it. For those that are into functional programming, we just implemented filter and map here.

I hope you see the value in this. For pieces of code where a lot of actions need to happen on Collections of objects, this might be the way to go. Since the code doesn't have to know anything about the implementation of the Collection, you can change it at any time. For instance, when you start out, you can just use a simple Array implementation, and swap it out for an Iterator later on when you notice memory consumption going up. It can also make things pretty readable, as demonstrated above with the $coffeeRuns->thatAre(new OpenForOrders());.

You can also choose to go A LOT further, or less far into this, as you desire. Good luck on your explorations!

PS: I gave a talk on this subject at the PHP.GENT meetup, you can find the slides here.

Categories: PHP