Drupal 9: Content Sharing Between Drupal Sites Using oEmbed

9th October 2022 - 21 minutes read time

A while ago I wrote about including oEmbed providers in a Drupal site and briefly touched upon creating custom providers. I didn't go into any more detail there so I decided to write another article looking at creating custom oEmbed providers.

In this article I will cover what oEmbed is and how to set up a custom oEmbed endpoint. I will then go on to show how to set up and configure a Drupal site to consume that custom oEmbed endpoint.

What Is oEmbed?

oEmbed is a specification that allows websites to share content through the use of an API. This allows users to share a URL for a web page and see a representation of that page embedded in any site. Using this technique allows for videos and images to be shared between sites easily, without the user having to worry about image dimensions or generating thumbnails.

It's up to the oEmbed provider to decide on what sort of content they will publish as part of an oEmbed request. The response will have a 'type' parameter that can be "photo", "video", "link" or "rich" (which is essentially just HTML), each of which will return different parameters depending on what is needed. You can see the full specification of oEmbed on the oembed.com website.

As it is possible for third party sites to add pretty much anything to the response there are some security considerations to be taken into account. The oEmbed group maintains a list of available providers (seen at https://oembed.com/providers.json) that you are recommended to adhere to. It is also recommended that all embedded content is rendered inside an iframe element to prevent anything within the iframe leaking out into the rest of the page.

Let's take an example of embedding a Tweet using oEmbed. Twitter has added itself to the oEmbed providers list and so we can find out how to embed Tweets using this registry.

This is the entry for Twitter in the providers.json registry.

{
  "provider_name": "Twitter",
  "provider_url": "http://www.twitter.com/",
  "endpoints": [
    {
      "schemes": [
        "https://twitter.com/*",
        "https://twitter.com/*/status/*",
        "https://*.twitter.com/*/status/*"
      ],
      "url": "https://publish.twitter.com/oembed"
    }
  ]
}

Using this information (and looking at Twitter) we can see that a Tweet can be embedded using the following format.

https://publish.twitter.com/oembed?url=https://twitter.com/<username>/status/<statusid>

We can ask for the oEmbed information for a Tweet using the following URL.

https://publish.twitter.com/oembed?url=https://twitter.com/bechillcomedian/status/1560398156965232640

This generates the following JSON.

{
  "url": "https:\/\/twitter.com\/bechillcomedian\/status\/1560398156965232640",
  "author_name": "Bec Hill",
  "author_url": "https:\/\/twitter.com\/bechillcomedian",
  "html": "\u003Cblockquote class=\"twitter-tweet\"\u003E\u003Cp lang=\"en\" dir=\"ltr\"\u003EWhile sitting on a flight yesterday I noticed that the seat fabric pattern definitely seems to be a code. Yet Google provides no help. \u003Ca href=\"https:\/\/twitter.com\/standupmaths?ref_src=twsrc%5Etfw\"\u003E@standupmaths\u003C\/a\u003E can your followers help enlighten me? \u003Ca href=\"https:\/\/t.co\/iSm2z8Op9r\"\u003Epic.twitter.com\/iSm2z8Op9r\u003C\/a\u003E\u003C\/p\u003E&mdash; Bec Hill (@bechillcomedian) \u003Ca href=\"https:\/\/twitter.com\/bechillcomedian\/status\/1560398156965232640?ref_src=twsrc%5Etfw\"\u003EAugust 18, 2022\u003C\/a\u003E\u003C\/blockquote\u003E\n\u003Cscript async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"\u003E\u003C\/script\u003E\n",
  "width": 550,
  "height": null,
  "type": "rich",
  "cache_age": "3153600000",
  "provider_name": "Twitter",
  "provider_url": "https:\/\/twitter.com",
  "version": "1.0"
}

The parameters in this response are part of the oEmbed standard. In this case we are receiving a "type" of "rich", which means that the "html" parameter will be present and will contain the HTML to embed, escaped for transport as a JSON string (it is not URL-encoded). If this was a "photo" type we would receive a "url" parameter that would be an absolute URL back to the original photo.
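
As a rough illustration of how a consumer might act on the "type" parameter, here is a minimal sketch in plain PHP. The function name oembed_embeddable_part() is made up for this example and is not part of Drupal or the oEmbed specification; the sample JSON is a cut-down stand-in for a real response.

```php
<?php

/**
 * Picks the embeddable part of a decoded oEmbed response.
 *
 * Hypothetical helper for illustration only.
 */
function oembed_embeddable_part(array $data): ?string {
  switch ($data['type'] ?? '') {
    case 'rich':
    case 'video':
      // These types carry ready-made markup in the "html" parameter.
      return $data['html'] ?? NULL;

    case 'photo':
      // Photos carry the image URL in the "url" parameter.
      return $data['url'] ?? NULL;

    default:
      // "link" (or unknown) responses have nothing to embed directly.
      return NULL;
  }
}

// A cut-down "rich" response; json_decode() undoes the JSON escaping.
$json = '{"type":"rich","version":"1.0","html":"\u003Cp\u003EHello\u003C\/p\u003E","width":550,"height":null}';
$data = json_decode($json, TRUE, 512, JSON_THROW_ON_ERROR);
$part = oembed_embeddable_part($data);
echo $part, PHP_EOL;
```

Note that after decoding, the "html" value is ordinary markup, which is why it needs to be treated as untrusted third party content.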

What this response means is that if we add the Twitter status URL to an oEmbed aware web page we would see something like this.

A screenshot of a Tweet being embedded into a site through oEmbed.

All we have done here is add a link, the rest is generated by Twitter through the oEmbed API. You can see the original Tweet here.

It is up to the consumer to understand that a link is for a given oEmbed provider and act upon that information accordingly. The oEmbed data is then consumed into the site and allows for rich embedded content to be shared between websites.
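
To make the consumer side a little more concrete, the following is a minimal sketch of building the request URL for an endpoint. The helper name oembed_request_url() is an assumption made for this example; the only part taken from the oEmbed specification is that the "url" argument is sent as a URL-encoded query parameter, optionally alongside others such as "format".

```php
<?php

/**
 * Builds an oEmbed request URL.
 *
 * Hypothetical helper for illustration only.
 */
function oembed_request_url(string $endpoint, string $contentUrl, array $extra = []): string {
  // http_build_query() percent-encodes the content URL so that its own
  // query string or fragment cannot be confused with the endpoint's.
  $query = http_build_query(['url' => $contentUrl] + $extra, '', '&', PHP_QUERY_RFC3986);
  return $endpoint . '?' . $query;
}

$request = oembed_request_url(
  'https://publish.twitter.com/oembed',
  'https://twitter.com/bechillcomedian/status/1560398156965232640',
  ['format' => 'json']
);
echo $request, PHP_EOL;
```

The unencoded form used earlier in this article works for simple URLs, but encoding the parameter is the safer habit once the shared URL carries a query string of its own.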

Setting Up Drupal As An oEmbed Provider

Now that we have covered what oEmbed is, we can look at creating a Drupal module that will respond to an oEmbed request with the required JSON responses. All this needs is a route and a controller to send back the JSON response.

The route is pretty simple: we just point it at a controller action and set it to be publicly available. This lives in a file called oembed_content.routing.yml in the root of the module.

oembed_content_endpoint:
  path: '/oembed/endpoint'
  defaults:
    _controller: '\Drupal\oembed_content\Controller\OembedController::endpoint'
  requirements:
    # This is a publicly accessible path.
    _access: 'TRUE'

Next is the controller class itself, in which we need to accept the "url" parameter being passed to the action. We can use the filter_has_var() and filter_input() PHP functions to ensure that the parameter exists and that it is a valid URL.

Once we have the filtered URL in hand we can then extract the path of the node that we want to load and load this from the database. If we managed to load the node then we render the summary of the node and construct the json payload before sending it on using the JsonResponse class.

Here is the code of the controller in full, with comments to show what is going on at any given step. This class is located at src/Controller/OembedController.php within the module directory.

<?php

namespace Drupal\oembed_content\Controller;

use Drupal\Core\Controller\ControllerBase;
use Drupal\path_alias\AliasManagerInterface;
use Symfony\Component\DependencyInjection\ContainerInterface;
use Symfony\Component\HttpFoundation\JsonResponse;
use Symfony\Component\HttpFoundation\RequestStack;
use Drupal\Core\Render\RendererInterface;

/**
 * An oEmbed controller class to respond to oEmbed content requests.
 */
class OembedController extends ControllerBase {

  /**
   * The request stack.
   *
   * @var \Symfony\Component\HttpFoundation\RequestStack
   */
  protected $requestStack;

  /**
   * The path alias manager.
   *
   * @var \Drupal\path_alias\AliasManagerInterface
   */
  protected $pathAliasManager;

  /**
   * The rendering service.
   *
   * @var \Drupal\Core\Render\RendererInterface
   */
  protected $renderer;

  /**
   * Creates a OembedController object.
   *
   * @param \Symfony\Component\HttpFoundation\RequestStack $request_stack
   *   The request stack.
   * @param \Drupal\path_alias\AliasManagerInterface $path_alias_manager
   *   The path alias manager.
   * @param \Drupal\Core\Render\RendererInterface $renderer
   *   The rendering service.
   */
  public function __construct(RequestStack $request_stack, AliasManagerInterface $path_alias_manager, RendererInterface $renderer) {
    $this->requestStack = $request_stack;
    $this->pathAliasManager = $path_alias_manager;
    $this->renderer = $renderer;
  }

  /**
   * {@inheritdoc}
   */
  public static function create(ContainerInterface $container) {
    return new static(
      $container->get('request_stack'),
      $container->get('path_alias.manager'),
      $container->get('renderer')
    );
  }

  /**
   * Action triggered by the route oembed_content_endpoint.
   *
   * @return \Symfony\Component\HttpFoundation\JsonResponse
   *   A JSON response containing the oEmbed payload, or an empty 404.
   *
   * @throws \Exception
   */
  public function endpoint() {
    // Check that the 'url' parameter was passed in the query string.
    if (!filter_has_var(INPUT_GET, 'url')) {
      // Return a blank json response with a 404 http code.
      return new JsonResponse([], 404);
    }

    // Extract and filter the 'url' parameter.
    $url = filter_input(INPUT_GET, 'url', FILTER_VALIDATE_URL, ['flags' => FILTER_FLAG_PATH_REQUIRED]);

    // Extract the internal path from the passed URL.
    $host = $this->requestStack->getCurrentRequest()->getSchemeAndHttpHost();
    $path = str_replace($host, '', $url);
    $internalPath = $this->pathAliasManager->getPathByAlias($path);

    if (preg_match('/node\/(\d+)/', $internalPath, $matches)) {
      // This will load the node object.
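      // ControllerBase only populates the entity type manager lazily, so it
      // must be fetched through the entityTypeManager() method.
      $node = $this->entityTypeManager()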
        ->getStorage('node')
        ->load($matches[1]);
    }

    if (!isset($node) || !$node->isPublished()) {
      // Node wasn't loaded, or is unpublished so return a 404.
      return new JsonResponse([], 404);
    }

    // Get the summary from the body field.
    $summary = $node->get('body')->view('summary');

    // Generate the result, rendering the summary field and adding the author
    // information.
    $result = [
      'type' => 'rich',
      'html' => trim((string) $this->renderer->render($summary)),
      'title' => $node->getTitle(),
      'author_name' => $node->getRevisionUser()->get('name')->value,
      'author_url' => $node->getRevisionUser()->toUrl()->setAbsolute()->toString(),
      'version' => '1.0',
      'provider_name' => 'my_oembed_content',
      'height' => '500',
      'width' => '500',
    ];

    return new JsonResponse($result, 200);
  }

}
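
The path extraction step in the controller can be looked at in isolation. The following is a minimal sketch in plain PHP, not the controller's actual code: the function name is made up, alias resolution is deliberately left out, and the pattern is anchored so that only a path of exactly /node/<id> on the expected host is accepted.

```php
<?php

/**
 * Extracts a node ID from an absolute URL on an expected host.
 *
 * Hypothetical helper for illustration; path aliases are not resolved here.
 */
function oembed_extract_node_id(string $url, string $expectedHost): ?int {
  $parts = parse_url($url);
  if ($parts === FALSE || ($parts['host'] ?? '') !== $expectedHost) {
    // Refuse malformed URLs and URLs that point at a different site.
    return NULL;
  }
  // Anchor the pattern so that only "/node/<digits>" matches.
  if (preg_match('#^/node/(\d+)$#', $parts['path'] ?? '', $matches)) {
    return (int) $matches[1];
  }
  return NULL;
}

var_dump(oembed_extract_node_id('https://www.example.com/node/2', 'www.example.com'));
var_dump(oembed_extract_node_id('https://www.example.com/foo/node/2/edit', 'www.example.com'));
var_dump(oembed_extract_node_id('https://evil.test/node/2', 'www.example.com'));
```

Whether the stricter matching is wanted depends on which URLs you intend to share; an unanchored pattern is more forgiving but will also accept paths such as /foo/node/2/edit.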

With this module active we can now test the oEmbed endpoint using the following URL.

https://www.example.com/oembed/endpoint?url=https://www.example.com/node/2

Assuming that this node exists then we get a response that looks a little like this.

{
  "type":"rich",
  "html":"\u003Cdiv class=\u0022text-content clearfix field field--name-body field--type-text-with-summary field--label-hidden field__item\u0022\u003E\u003Cp\u003ELorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\u003C\/p\u003E\u003C\/div\u003E\n",
  "title":"oEmbed Content",
  "author_name":"admin",
  "author_url":"https:\/\/www.example.com\/user\/1",
  "version":"1.0",
  "provider_name":"my_oembed_content",
  "height":"500",
  "width":"500"
}

We can now use this oEmbed endpoint to consume content from one Drupal site into another. This takes a little bit of setting up first.

Consuming oEmbed Content From One Drupal Site To Another

We can now configure a separate Drupal site to consume the content from the oEmbed endpoint we just created. There is one problem though. Out of the box, Drupal will only add providers from the approved providers.json list, which is provided by oEmbed. This is a security feature that prevents arbitrary oEmbed providers being added to any site.

For this reason we need to require and install the oembed_providers Drupal module. This module allows us to inject our own provider configurations and augment the existing list of providers.

composer require drupal/oembed_providers
drush pm:enable oembed_providers

How you inject the media into your content depends on your requirements, but my preference is to use the core media_library module to provide the media library interface, which can be installed at this point along with the other module. This also allows media items to be injected into the content of pages using CKEditor, which is what I'll be doing in this example.

The first thing we need to do is create a module in our consumer Drupal site that will inform the media module that we have a custom oEmbed provider that we want to use. We need to use the hook_media_source_info_alter() hook, which informs the media module about an oEmbed provider. The hook is only used to label and configure the oEmbed provider within Drupal and so doesn't mention where the endpoint actually exists or any other implementation details. These details are held in the providers.json file and Drupal would normally look up the details from that list.

Normally, adding a provider like this would cause Drupal to throw an error since it doesn't exist in the providers.json file. The oembed_providers module allows us to add additional providers using the same hook that we would use for normal providers.

With the oembed_providers module installed, head over to the page at Configuration > Media > oEmbed Providers and click on the "+ Add oEmbed provider" button. This will show you an interface where you can configure the endpoints for your custom oEmbed provider.

The important part is the configuration of the endpoints, which contains a number of different options.

  • Endpoint schemes : The schemes are essentially a list of URLs that will be considered "valid" items on the other site. Since we only care about nodes here we will enter the node path, but this might contain taxonomy terms or other forms of content. The format consists of a wildcarded path to the item of content we want to retrieve.
  • Endpoint URL : The endpoint URL is essentially the controller action we created above. This is where the oEmbed endpoint exists and will be used to fetch the embed information from the given URL.
  • Discovery : The oEmbed specification can provide a discovery mechanism that allows sites to find out what formats are available. We haven't implemented this here so it is left unticked.
  • Available formats : It is possible for an oEmbed endpoint to respond in both json and xml, which is part of the oEmbed specification. Since we know our endpoint only responds using json we only tick this option.
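
To give a feel for how a wildcarded endpoint scheme relates to the URLs that users paste in, here is a minimal sketch of the matching in plain PHP. This is an illustration under my own assumptions rather than the code Drupal runs; the function name is made up and Drupal has its own matching logic.

```php
<?php

/**
 * Tests a URL against an oEmbed endpoint scheme containing "*" wildcards.
 *
 * Hypothetical helper for illustration only.
 */
function oembed_scheme_matches(string $scheme, string $url): bool {
  // Escape regex metacharacters, then turn each escaped "*" into ".*".
  $pattern = str_replace('\*', '.*', preg_quote($scheme, '#'));
  return preg_match('#^' . $pattern . '$#', $url) === 1;
}

var_dump(oembed_scheme_matches('https://www.example.com/node/*', 'https://www.example.com/node/2'));
var_dump(oembed_scheme_matches('https://www.example.com/node/*', 'https://www.example.com/user/1'));
```

A scheme of https://www.example.com/node/* therefore accepts any node URL on the provider site and rejects everything else, which is the behaviour we want for this example.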

Here is the endpoint form, filled in with information about our endpoint.

A screenshot of the custom oEmbed providers interface, showing the configuration of a custom oEmbed provider.

It is also possible to add multiple endpoints to a single provider, which can allow us to pull in different types of content depending on what URL was entered by the user.

Make a note of the machine name that you created before as this is important information that will be used in the hook. Once complete, you should see the provider listed something like this.

A screenshot of a configured custom oEmbed provider on a Drupal site.

Now it is time to connect our new provider to the media module in Drupal.

This is done (as I mentioned before) using a hook_media_source_info_alter() hook. An array is defined in this hook that defines the provider we want to use, which links up with the machine name of the provider we just defined. The important part here is the "providers" field, which must match the 'my_oembed_content' machine name we just defined.

Here is the hook in full, which is located in the module file of a custom module. 

<?php

/**
 * @file
 * The consume_oembed_content.module file.
 */

use Drupal\Core\StringTranslation\TranslatableMarkup;
use Drupal\media_library\Form\OEmbedForm;

/**
 * Implements hook_media_source_info_alter().
 */
function consume_oembed_content_media_source_info_alter(array &$sources) {
  $sources['oembed_content'] = [
    'id' => 'my_oembed_content',
    'label' => new TranslatableMarkup('My oEmbed Content'),
    'description' => new TranslatableMarkup('Embed a node.'),
    'allowed_field_types' => ['string'],
    'default_thumbnail_filename' => 'no-thumbnail.png',
    'providers' => ['my_oembed_content'],
    'class' => 'Drupal\media\Plugin\media\Source\OEmbed',
    'forms' => [
      'media_library_add' => OEmbedForm::class,
    ],
    'provider' => 'oembed_content',
  ];
}

Once you have added this hook you must then either activate the module or clear the caches. Doing so will reveal the new provider when we come to set up the media item.

The following is a screenshot of the media source selection field on the media type creation form. This shows the newly configured oEmbed provider we added through the hook and the oEmbed Providers module.

A screenshot of the Drupal media source selection dialog, showing a custom oEmbed provider.

The next step is sorting out the new media source configuration. Here, we just tell Drupal to automatically create a new field that will contain our embedded content and select "my_oembed_content" from the list of allowed providers.

A screenshot of the Drupal media source selection dialog, showing the further configuration of the media selection dialog.

With all of this in place it is now just a case of embedding our content. When editing a page, open up the media library selection dialog and paste in a link to the content we want to embed. Drupal will figure out the rest of the details and present us with our embedded content.

This is a screenshot of a page of content on the Drupal site we just configured, pulling content from the oEmbed endpoint we created earlier.

A screenshot of a Drupal web page, with content being generated through an oEmbedded node from another site.

As you can see, the output is pretty basic, but once you have this mechanism working you can go on to extend it as much as you need.

Since you have control over the rendering process for the content you can extend this as much as you like to include custom templates or other rendering options in your oEmbed endpoint. In the example above I just render the summary and return this, but there is nothing to stop you making richly featured HTML embeds with injected styles and script elements. Drupal will render the embedded item as an iframe (in accordance with the oEmbed specification) and so you have the ability to add styles that do not affect the rest of the page.

You can also include the other fields that come from the oEmbed provider endpoint. When setting up your media item you will see a list of field mapping options. This allows you to add fields to your media item and map other parts of the oEmbed payload into your media item. Doing so means that you can capture things like author information or title, which we haven't allowed for in the above example.

By mapping oEmbed fields into your media item you can control the output of your media item in content a little more on the client end, rather than relying on the provider to sort out all of the theme items.

Conclusion

I have shown how to get basic content embedding from one site to another using oEmbed. It is possible to take this much further by mapping fields or customising the oEmbed endpoint output. The mechanism works very well though and is very easy for end users to make use of.

A nice part of doing this with Drupal is that the response from the endpoint is cached in the consuming site, which means we don't need to ask for it every time we load the page. This does mean that you need to clear your caches if you want to refresh your embedded content.

When setting up your oEmbed endpoint you also need to make sure that you take into account any permissions of the content you allow access to. This means checking things like published status or other permissions. This is especially important as the oEmbed endpoint is publicly available and so can be used to harvest your content if you don't lock it down correctly.

Also, be careful about enumeration attacks on your content. By using node IDs in this way you can allow an attacker to just find all of the content on your site by counting upwards. Using UUIDs or other URLs that are not based on node IDs is a good way around this.
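
As a sketch of what a non-enumerable URL might look like, the following plain PHP pulls a UUID out of a path. Everything here is an assumption made for illustration: the /share/ prefix is an invented route and the function name is hypothetical. In a Drupal controller the extracted value could then be handed to the entity repository service's loadEntityByUuid() method instead of loading by node ID.

```php
<?php

/**
 * Extracts a UUID from a path such as "/share/<uuid>".
 *
 * Hypothetical helper; the "/share/" prefix is made up for this example.
 */
function oembed_extract_uuid(string $path): ?string {
  $pattern = '#^/share/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$#i';
  if (preg_match($pattern, $path, $matches)) {
    // Normalise to lower case before using the value for a lookup.
    return strtolower($matches[1]);
  }
  return NULL;
}

var_dump(oembed_extract_uuid('/share/3F2504E0-4F89-11D3-9A0C-0305E82C3301'));
var_dump(oembed_extract_uuid('/share/2'));
```

Because a UUID cannot be guessed by incrementing a counter, an attacker would need to already know the URL of a piece of content before requesting its embed.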

What confused me when I first looked at this system was how content gets from one site to the other. Drupal is just consuming the oEmbed API and pulling in the relevant content so there really isn't a lot of detail here. As long as your users understand how to add media items with a single URL then they can inject whatever content they want. The media interface will automatically detect problems from the endpoint, which avoids problems like embedding non-existent content.

One note of warning is to be careful not to change the provider hook information after you have configured things. This will lead to some interesting errors about missing plugins that can be a pain to sort out.

Comments

Great article!

One thing you might want to add is an access check that e.g. the node is published 

larowlan (Sun, 10/09/2022 - 21:32)

Thanks for reading larowlan!

Good point about the published status! I did add something in the code on the published status of the page, but I've added further clarifications.

I'm the maintainer of the oEmbed Providers module. Great article! Beginning in the 2.x version of the module, we introduced Provider buckets, which are groupings of providers that are dynamically exposed as a media source. Instead of using hook_media_source_info_alter() to create a new media source in code, you could create a Provider bucket called "My oEmbed Content" with a machine name of "my_oembed_content" and a description of "Embed a node.". The oEmbed Providers module will then register the Provider bucket as a media source. (Take a look at https://git.drupalcode.org/project/oembed_providers/-/blob/2.x/oembed_p…)

Chris Burge (Thu, 10/13/2022 - 14:51)

Hi Chris!

Thanks for reading! I really appreciate your comment about improvements to the module. I did see the provider buckets but didn't use them; so it's good to know they are useful.

Thanks for the work on creating the oEmbed Providers module. It really helps to make oEmbed a useful part of Drupal :)
