Generating A PDF From A Web Page Using PHP And Chrome

Generating a PDF document from a web page through PHP can be problematic. It's often something that seems quite simple, but actually generating the document can be difficult and time consuming.

There are a number of libraries available that allow you to generate PDF documents, but in reality they solve one problem and introduce several more. Packages either convert HTML to PDF or let you build a PDF by placing elements into the document manually.

Generating a PDF manually is not normally a good way to go, as it is a very time consuming process and the result is difficult to test. You will probably have to spend time creating a rendering engine just for this purpose, and that will take time to create and maintain.

HTML to PDF generators seem like a great idea, and they used to be a pretty good way of converting a web page into a PDF document. They work by taking the HTML and CSS and processing them to generate the page layout within a PDF document. Unfortunately, the libraries available just can't keep up with the progress of CSS and so don't work that well with modern layout techniques like flexbox. In my experience you can spend a considerable amount of time tweaking and fixing the web page to work with the PDF generator.

Perhaps the best way to render HTML and CSS is to use a web browser, and many web browsers have the ability to generate PDF documents via the print interface. This led me to see if I could easily combine PHP with a web browser in order to generate great looking PDF documents quickly and easily. Chrome, or its open source equivalent Chromium, is commonly remote controlled in order to facilitate behavioural testing. This browser can also be used to render PDF documents.

My initial research showed that there are a number of APIs available that provide this facility. You simply provide the API with a URL or some markup and ask for a PDF file in return. Whilst these solutions worked quite nicely, I wanted something I could install onto a server to avoid any rate limiting (or cost) problems. There is also the problem of access and privacy: the PDF documents being generated are often invoices, so sending personal data to a third party API might not be the best solution.

The Chrome PHP library provides a mechanism to expose Chrome to PHP, meaning that you can interact with it through PHP like you would with any other browser. This means that you can access pages through Chrome and then generate PDF documents based on that content.

To install the package you need to run the following composer command.

composer require chrome-php/chrome

Once the package is installed you can start using it by including the Composer autoload.php file and creating an instance of the main HeadlessChromium\BrowserFactory class.

<?php
require('./vendor/autoload.php');

use HeadlessChromium\BrowserFactory;

$browserFactory = new BrowserFactory();

The great thing about this package is that, when creating the BrowserFactory object, it will attempt to detect the presence of Chromium on your system. You can also supply a string that points to a locally installed copy of Chrome, which means that you can bundle Chrome with your project and interact with it using PHP.

If you don't have Chromium installed you can download the version for your platform from the Chromium download site. Look at the bottom of the download site page for a list of operating systems that you can download the executable for. Once downloaded you can extract it into your project and reference it through the BrowserFactory constructor.

If you are running this on Linux then you just need to reference the chrome executable.

$browserFactory = new BrowserFactory('./chrome-linux/chrome');

If you are on Mac things get a little bit more complicated. Chromium comes as an .app package, which is like a directory. You need to drill into the package to point to the correct executable inside.

$browserFactory = new BrowserFactory('./chrome-mac/Chromium.app/Contents/MacOS/Chromium');

Note that if you are using Chromium on Mac you will need to ensure that the application is registered with the operating system security. If you don't do this the application will be blocked and you'll just get an error message.
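If the same project needs to run on both Linux and Mac, the two paths above can be chosen at runtime. The following is only a rough sketch: chromiumPath() is a hypothetical helper of my own, not part of the Chrome PHP library, and the paths assume Chromium was extracted into the project root as described above. PHP_OS_FAMILY is a built-in PHP constant (available since PHP 7.2).

```php
<?php
// Hypothetical helper: maps an OS family to the Chromium paths used in this article.
function chromiumPath(string $osFamily): ?string
{
    $paths = [
        'Linux'  => './chrome-linux/chrome',
        'Darwin' => './chrome-mac/Chromium.app/Contents/MacOS/Chromium',
    ];

    // Null means "no bundled binary known for this platform".
    return $paths[$osFamily] ?? null;
}

// PHP_OS_FAMILY is 'Linux' on Linux and 'Darwin' on Mac.
$path = chromiumPath(PHP_OS_FAMILY);
echo $path ?? 'no bundled binary, fall back to auto-detection', PHP_EOL;
```

As far as I can tell the BrowserFactory constructor also accepts null, in which case it falls back to detecting Chromium itself, so the result of chromiumPath() should be usable directly as the constructor argument.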

With the browser factory created you can create an instance of a browser. This will start a Chrome instance that you can then interact with.

$browser = $browserFactory->createBrowser();

There are a number of options that you can pass to the createBrowser() method that influence how the browser will behave. For example, headless mode is on by default, so in order to turn headless mode off you would pass in "headless" as "false", like this.

// Disable headless mode.
$browser = $browserFactory->createBrowser(['headless' => false]);
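Since the options are just an array, it can be handy to build them in one place. This is a minimal sketch: browserOptions() is a hypothetical helper, and the "windowSize" option (width and height in pixels) is taken from my reading of the library's documentation, so check it against the version you have installed.

```php
<?php
// Hypothetical helper: builds an options array for createBrowser().
function browserOptions(bool $debug): array
{
    return [
        // Show the browser window only while debugging.
        'headless'   => !$debug,
        // Window size in pixels as [width, height] (assumed option name).
        'windowSize' => [1280, 800],
    ];
}

// Print the options that would be used for a debugging session.
print_r(browserOptions(true));
```

The resulting array would then be passed straight to createBrowser() in place of the inline array.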

With this browser object in hand we can then create a page and navigate to it, waiting for the navigation to finish. It is also good practice to close down the browser once you are finished with it.

// Create a new page and navigate to a URL.
$page = $browser->createPage();
$url = 'https://www.hashbangcode.com/article/drupal-9-getting-good-score-google-pagespeed-insights';
$page->navigate($url)->waitForNavigation();

// Close the browser.
$browser->close();

The above example doesn't actually produce anything; it just loads the page in the browser. In order to save the page as a PDF document we need to use the page object to generate a PDF document and then save this to a file, which we'll call download.pdf.

// Create a new page and navigate to a URL.
$page = $browser->createPage();
$url = 'https://www.hashbangcode.com/article/drupal-9-getting-good-score-google-pagespeed-insights';
$page->navigate($url)->waitForNavigation();

// Generate PDF document.
$pdf = $page->pdf();
$pdf->saveToFile('download.pdf');

// Close the browser.
$browser->close();

As the browser might have a problem whilst fetching the page we need to wrap the code in a "try/finally" statement, with the final step being to always close the browser. This creates more resiliency in the code and prevents Chrome from eating memory if the code fails to complete. Putting all of this together we get the following.

<?php
require('./vendor/autoload.php');

use HeadlessChromium\BrowserFactory;

// Start headless Chrome.
$browserFactory = new BrowserFactory();
$browser = $browserFactory->createBrowser();

try {
    // Create a new page and navigate to a URL.
    $page = $browser->createPage();
    $url = 'https://www.hashbangcode.com/article/drupal-9-getting-good-score-google-pagespeed-insights';
    $page->navigate($url)->waitForNavigation();

    // Generate PDF document.
    $pdf = $page->pdf();
    $pdf->saveToFile('download.pdf');
} finally {
    // Close the browser. 
    $browser->close();
}

The resulting PDF document here is an exact match of what the web page would look like when printed. This is an important consideration to realise here. The PDF document is generated through the Chrome print functionality, which will take into account any print CSS styles you have on the page. If you want to see what the PDF file looks like before you start this process then you can just use your browser to print the document.

We don't always have to pass a URL to the page object, we can also pass in a bunch of HTML. In the following example I have created a very simple block of HTML that I'm then rendering as a PDF document.

<?php
require('./vendor/autoload.php');

use HeadlessChromium\BrowserFactory;

// Start headless Chrome.
$browserFactory = new BrowserFactory();
$browser = $browserFactory->createBrowser();

try {
    // Create a new page and set its HTML content directly.
    $page = $browser->createPage();

    $page->setHtml('<h1>Title</h1><p>Content</p>');

    // Generate PDF document.
    $pdf = $page->pdf();
    $pdf->saveToFile('generated.pdf');
} finally {
    // Close the browser. 
    $browser->close();
}
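Because the PDF is produced through the print functionality, any print styles in the markup are applied. As a rough illustration, the sketch below wraps some content in a full page with a print-only style block. printableHtml() and the ".no-print" class are names I have made up for this example; they are not part of the library.

```php
<?php
// Hypothetical helper: wraps body markup in a page that carries print-only styles.
function printableHtml(string $title, string $bodyHtml): string
{
    // Escape the title so it is safe to place inside the markup.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{$safeTitle}</title>
<style>
  @media print {
    nav, .no-print { display: none; }
    h1 { page-break-after: avoid; }
  }
</style>
</head>
<body>
<h1>{$safeTitle}</h1>
{$bodyHtml}
</body>
</html>
HTML;
}

// Build a small document; the body markup is trusted input in this sketch.
$html = printableHtml('Title', '<p>Content</p><p class="no-print">Screen only</p>');
echo $html, PHP_EOL;
```

The returned string can then be passed to setHtml() in place of the inline markup used above.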

If you want to stream the PDF to the user instead of writing it to a file, you can read the raw response data from the PDF object. The data is base64 encoded, so we just need to decode it before sending it to the user. In the following example we set some headers so that the PDF is downloaded as a file called "downloaded.pdf" and then send the data to the user.

// Set headers for PDF.
header('Content-Type: application/pdf');
header('Content-Disposition: attachment; filename="downloaded.pdf"');

// Extract the PDF data.
$pdf = $page->pdf();
$response = $pdf->getResponseReader()->waitForResponse()->getResultData('data');

// Stream PDF data to the user.
echo base64_decode($response);
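It is probably worth checking the decoded data before sending it, since a failed render would otherwise be streamed to the user as a broken download. The following is a minimal sketch of that check. decodePdf() is a hypothetical helper, and it relies only on two things I am sure of: base64_decode() returns false in strict mode for invalid input, and PDF files begin with the bytes "%PDF-".

```php
<?php
// Hypothetical helper: decodes the base64 payload and checks it looks like a PDF.
function decodePdf(string $base64): string
{
    // Strict mode makes base64_decode() return false on invalid characters.
    $binary = base64_decode($base64, true);

    // Every PDF file starts with the "%PDF-" marker.
    if ($binary === false || strncmp($binary, '%PDF-', 5) !== 0) {
        throw new RuntimeException('Response did not contain a valid PDF.');
    }

    return $binary;
}

// Stand-in payload for demonstration; real data would come from the PDF object.
$sample = base64_encode("%PDF-1.4\n% stand-in bytes, not a real document");
$binary = decodePdf($sample);
echo strlen($binary), " bytes decoded\n";
```

In the streaming example the final line would then become echo decodePdf($response); with the headers sent only once the call has succeeded.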

The pdf() method can take a set of options that allow the PDF file to be changed in different ways. These are just the same settings that exist with the Chrome print to PDF functionality. Most of these options deal with altering the margins and width of the paper, but the printBackground setting might be useful if you want your PDF file to be printed with the backgrounds set by your print stylesheet.

// Render PDF document with backgrounds.
$pdf = $page->pdf(['printBackground' => true]);
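To give an idea of the paper and margin settings, here is a rough sketch that builds an options array for A4 paper. a4PdfOptions() is a hypothetical helper. The option names follow Chrome's print to PDF settings as I understand the library exposes them, and the sizes are in inches (A4 is 210 x 297 mm, or roughly 8.27 x 11.69 inches), so treat the exact keys as an assumption to verify against your installed version.

```php
<?php
// Hypothetical helper: builds a pdf() options array for A4 paper.
// Sizes are given in inches, as used by Chrome's print to PDF settings.
function a4PdfOptions(float $marginInches = 0.4, bool $landscape = false): array
{
    return [
        'landscape'       => $landscape,
        'printBackground' => true,
        'paperWidth'      => 8.27,  // 210 mm
        'paperHeight'     => 11.69, // 297 mm
        'marginTop'       => $marginInches,
        'marginBottom'    => $marginInches,
        'marginLeft'      => $marginInches,
        'marginRight'     => $marginInches,
    ];
}

// Print the options for a portrait page with half inch margins.
print_r(a4PdfOptions(0.5));
```

The array would be passed to pdf() in the same way as the printBackground example above.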

I've only really scratched the surface of the Chrome PHP library here, but it's very well written and looks extensible. The fact that you can embed a downloaded Chrome instance is perhaps the most powerful feature here. Whilst PDF generation using this library is very useful, I plan on using it more in the future as it provides a great way of interacting with Chrome using pure PHP.

Comments

Excellent - will give it a try - am using an external API at the moment and would like to improve speed and reduce costs - this is potentially able to do both !!

Then I will need to find a self hosted PDF -> Word convertor :-)

Jim
