A while ago (at least a year according to the GitHub repository this article is about) I needed a way to abstract paging in order to create a continuous iterator for PHP. Unfortunately, I couldn't find any solutions that fit all of these requirements:

  1. It must be an iterator. Kind of an important thing... I'll explain why next.
  2. It must behave like an array. Not only traversable, but randomly accessible.
  3. It must cache data that has already been fetched. Since I'm accessing elements randomly, I don't want to re-fetch a page that has already been downloaded.
  4. It must be flexible about how pages are fetched. The source may be an API, a database, or some other external system. It also needs to handle the obvious stuff like page sizes and total record counts.

So I created elliotchance/iterator to do exactly this.
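Requirements 1 and 2 map directly onto PHP's built-in interfaces. A class satisfying both would declare something like this sketch (not the library's actual declaration, just the shape of it):

// Iterator makes the object traversable with foreach (requirement 1).
// ArrayAccess ($x[4]) and Countable (count($x)) make it behave like a
// randomly accessible array (requirement 2).
abstract class PagedCollection implements Iterator, ArrayAccess, Countable
{
}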

Iterator, You Say?

"In computer programming, an iterator is an object that enables a programmer to traverse a container, particularly lists."

Traversing a container simply means that I (well, not me personally, but the code) can step through the items in a collection. The loop doing the traversal does not need to know how many items there are in total, how each item is retrieved, or where the items come from. It only needs to be able to ask the container for the next item and to recognise when there are no more items left.

To create an iterator we only need to implement the built-in Iterator interface. PHP automatically recognises instances that implement it and knows how to traverse them. For example, consider the following code:

foreach ($collection as $item) {
    echo $item;
}

If $collection implemented Iterator, PHP would basically do this:

$collection->rewind();
while ($collection->valid()) {
    $item = $collection->current();
    echo $item;
    $collection->next();
}
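
To make that concrete, here is a minimal, hypothetical Iterator over a plain array, implementing the five methods the interface requires (four of which the while loop above calls directly):

class NumberCollection implements Iterator
{
    private $items = [10, 20, 30];
    private $position = 0;

    public function rewind()  { $this->position = 0; }
    public function valid()   { return isset($this->items[$this->position]); }
    public function current() { return $this->items[$this->position]; }
    public function key()     { return $this->position; }
    public function next()    { $this->position++; }
}

foreach (new NumberCollection() as $number) {
    echo $number . " "; // 10 20 30
}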

How Do We Make This Useful?

There are a lot of systems that deliver a whole collection in parts, often called pages. A perfect example of this would be an API that returns the results of a search query. If there were thousands of results, trying to return them all in a single response would be very expensive for both the server sending the results and the client that has to read them all. It's also possible the client doesn't even need all of these records and has simply forgotten to limit the results to a sensible number.

A page may contain 100 items, but what if we do want to print all one thousand? We would have to make 10 separate API calls, something ugly like this:

$pageNumber = 1;

// GitHub's API rejects requests that don't send a User-Agent header.
$context = stream_context_create([
    'http' => ['header' => "User-Agent: paging-example\r\n"],
]);

while (true) {
    $url = "https://api.github.com/search/repositories?" . http_build_query([
        'q' => 'fridge',
        'page' => $pageNumber,
        'per_page' => 100, // GitHub defaults to 30 items per page
    ]);
    $result = json_decode(file_get_contents($url, false, $context), true);

    // An empty page means there are no more results.
    if (!$result['items']) {
        break;
    }

    foreach ($result['items'] as $item) {
        // Do something with $item
    }

    $pageNumber++; // without this we would fetch page 1 forever
}

This is complicated and error-prone, and here we are only printing the items in the order they arrive (the best possible scenario). If we needed to access them in any other way, the code would quickly become unwieldy.
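
For instance, supporting random access means translating an item index into a page number and an offset within that page, roughly like this sketch (assuming 100 items per page and one-based API pages):

// Suppose we want item 250 (zero-based) out of the full result set.
$index    = 250;
$pageSize = 100;

// Which API page holds it? API pages here are one-based.
$pageNumber = (int) ($index / $pageSize) + 1; // page 3

// Where does it sit within that page?
$offset = $index % $pageSize; // offset 50

That arithmetic, plus caching, plus knowing when to stop, is exactly the boilerplate we want an iterator to hide.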

An Iterator a Day Keeps the Doctor Away

Rather than implementing Iterator ourselves, we can extend AbstractPagedIterator, which reduces the problem to supplying just the pieces needed to randomly traverse a paged collection.

use Elliotchance\Iterator\AbstractPagedIterator;

class MyPagedIterator extends AbstractPagedIterator
{
    /**
     * The total number of items we expect to find. The last page may be partial.
     * @return integer
     */
    public function getTotalSize()
    {
        return 8;
    }

    /**
     * The number of items per page. All pages must be the same size (except the
     * last page).
     * @return integer
     */
    public function getPageSize()
    {
        return 3;
    }

    /**
     * Lazy-load a specific page.
     * @param integer $pageNumber
     * @return array
     */
    public function getPage($pageNumber)
    {
        $pages = [
            [1, 2, 3],
            [4, 5, 6],
            [7, 8],
        ];
        return $pages[$pageNumber];
    }
}

[1, 2, 3] can be thought of as the three items on the first page, [4, 5, 6] are the items on the second page, and [7, 8] are the remaining items on the final page. Now we can use all the PHP magic as if we had a continuous collection of 8 items:

$iterator = new MyPagedIterator();

echo $iterator[4]; // 5

echo count($iterator); // 8

foreach ($iterator as $item) {
    echo "$item ";
}
// 1 2 3 4 5 6 7 8

A More Practical Example

Above we can see the implementation, but it's most useful when we are actually doing something under the hood to fetch those values. Let's find some fridges on GitHub!

use Elliotchance\Iterator\AbstractPagedIterator;

class GithubSearcher extends AbstractPagedIterator
{
    protected $totalSize = 0;
    protected $searchTerm;

    public function __construct($searchTerm)
    {
        $this->searchTerm = $searchTerm;

        // This makes sure totalSize is set before we try to access the data.
        $this->getPage(0);
    }

    public function getTotalSize()
    {
        return $this->totalSize;
    }

    public function getPageSize()
    {
        return 100;
    }

    public function getPage($pageNumber)
    {
        $url = "https://api.github.com/search/repositories?" . http_build_query([
            'q' => $this->searchTerm,
            'page' => $pageNumber + 1, // the iterator is zero-based, GitHub is one-based
            'per_page' => $this->getPageSize(), // GitHub defaults to 30 items per page
        ]);

        // GitHub's API rejects requests that don't send a User-Agent header.
        $context = stream_context_create([
            'http' => ['header' => "User-Agent: paging-example\r\n"],
        ]);
        $result = json_decode(file_get_contents($url, false, $context), true);

        $this->totalSize = $result['total_count'];

        return $result['items'];
    }
}

$repositories = new GithubSearcher('fridge');

echo "Found " . count($repositories) . " results:\n";

foreach ($repositories as $repo) {
    echo $repo['full_name'] . "\n"; // one repository per line
}

Should output something like:

Found 137 results:
octocat/my-fridge
... 136 more
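
And because fetched pages are cached (requirement 3 above), random access is cheap. As a rough sketch, assuming the search returned at least 150 results:

// This should trigger a single request for page 2 (items 100-199)...
echo $repositories[149]['full_name'] . "\n";

// ...while this hits the page cache, so no further HTTP request is made.
echo $repositories[100]['full_name'] . "\n";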

Now we can iterate the data on demand. Hooray!