Understanding Drupal 8's Migrate API

Chris

This past weekend, I was honored to be able to present a session at 2017's New England Drupal Camp (NEDCamp) about Drupal 8's Migrate API. Redfin has implemented many Drupal 8 migrations to date both from CSV data sources and legacy Drupal sites (Drupal 6 and 7). As a result, we want to share with you what we've learned in hopes of saving you the time often spent in the trials and errors of data migration.

My Master's degree is in Technology Education, so I understand that people learn things differently. Some people are auditory or visual learners while others like to read. To that end, I wanted to summarize our session here. If you are an auditory or visual learner, please check out these resources:

Otherwise, let's peek at the concepts...

The overall process is:

  1. Scaffold out a module (.info.yml file). You can use Drupal Console for this.
  2. Write YAML files to define your migrations.
  3. Set up any global configuration if needed (your legacy database connection credentials in settings.php, a folder for your CSV files, etc.).
  4. Extend/alter/write any custom plugins or code to support those migrations.
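
For step 1, a minimal sketch of the module's .info.yml file might look something like this. The module name (my_migrations) and the dependency list are my assumptions for illustration, so adjust them to match your own project.

```yaml
# my_migrations.info.yml (hypothetical module name, for illustration only)
name: 'My Migrations'
type: module
description: 'Custom migrations into the new Drupal 8 site.'
core: 8.x
package: Custom
dependencies:
  # Core Migrate API.
  - drupal:migrate
  # Contrib helpers: extra plugins and Drush commands.
  - migrate_plus:migrate_plus
  - migrate_tools:migrate_tools
```

If you are migrating from a legacy Drupal site, you would likely add drupal:migrate_drupal to that list as well.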

While you can use the Drupal Migrate UI to do this, I recommend building your Drupal 8 site the way you want (maybe taking advantage of newer paradigms like Paragraphs, for example), and then worrying about your migrations. There are four main modules that come into play when not using the UI. Two are in core: migrate and migrate_drupal. "migrate" lets you get anything into Drupal 8, while "migrate_drupal" extends and supports that to enhance your experience when migrating IN from a Drupal 6 or 7 (or 8!) site.

Two modules in the contrib space help you above and beyond what is provided in core. Migrate Tools provides Drush command integration for managing your migrations, while Migrate Plus provides loads of enhancements for migrating your data (additional plugins, etc). It's important to make sure you're using the right version of the module for your version of Drupal, by the way--but that's easy since you're using Composer, right?

Write Some Migrations

You will need to drop a migration YAML file in your module's /migrations folder. Generally, the YAML file is named after the id you specify for your migration plugin. As a full example, see core's d7_node.yml migration.

The Migrate API follows a traditional software design pattern called Extract, Transform, Load. To avoid some confusion with the concept of "Load" (in this case meaning loading data INTO your Drupal database), there's some different terminology used in Migrate:

  • Extract == Source
  • Transform == Process
  • Load == Destination

One thing that Migrate Plus provides is the concept of a "Migration Group." This allows multiple migrations to share some configuration across all of them. For example, if all migrations are coming from the same MySQL database (say, a Drupal 6 database), then that shared configuration can go into the migration group configuration once rather than into each individual migration.
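
As a rough sketch, a migration group is its own small configuration file shipped in your module's config/install folder. The group id (legacy_d6), the labels, and the database key (legacy) below are made-up names for illustration.

```yaml
# config/install/migrate_plus.migration_group.legacy_d6.yml
# "legacy_d6" and "legacy" are hypothetical names.
id: legacy_d6
label: 'Legacy Drupal 6 imports'
description: 'Everything coming out of the old Drupal 6 database.'
source_type: 'Drupal 6'
# Anything under shared_configuration is merged into every migration
# that declares "migration_group: legacy_d6".
shared_configuration:
  source:
    # Key of the legacy connection in the $databases array in settings.php.
    key: legacy
```

An individual migration can still set its own source values where it needs something different from the group.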

There's some global configuration that goes into each individual migration, for example its "id" key (the unique ID of this migration), and its "label" (friendly name in the UI / Drush).

You can also specify "dependencies" (on a module, for example). Beyond that, you can enforce "migration_dependencies," which means that before THIS migration is run, THAT one needs to run. This is a great way to ensure that anything being referenced (like the targets of entity references, or taxonomy terms) is migrated into the site before anything that uses it.
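
Put together, the top of a migration file might look roughly like this. All of the ids here (example_articles, example_users, example_tags, legacy_d6, my_migrations) are hypothetical.

```yaml
# Top-level keys of a single migration; every id below is hypothetical.
id: example_articles
label: 'Articles from the legacy site'
# Only meaningful with Migrate Plus; ties this migration to a group.
migration_group: legacy_d6
dependencies:
  enforced:
    module:
      # Removes this migration's configuration when the module is uninstalled.
      - my_migrations
migration_dependencies:
  required:
    # Authors must exist before articles can reference them.
    - example_users
  optional:
    - example_tags
```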

Each migration then should specify three unique sections--source, process, and destination (sound familiar?).

Source

The source section specifies which plugin to use. These plugins are usually found in the module involved. For example, if you want to migrate data in from Drupal 7 nodes, take a look in core/modules/node/src/Plugin/migrate/source for the implementation / plugin id to use. Often, though, you'll actually find yourself writing a new class in your own module that extends one of these.

Each source plugin might have some additional configuration that goes along with it. For example, with "plugin: d7_node" you might also specify "node_type: page" to migrate in only basic pages from your Drupal 7 site. You might also specify "key" here to say which database key from the $databases array in settings.php to use (if that wasn't specified globally in your migration group!).
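
In the migration file, that source section could be sketched like this (the database key name "legacy" is an assumption on my part):

```yaml
source:
  # Core source plugin for Drupal 7 nodes.
  plugin: d7_node
  # Only pull in basic pages.
  node_type: page
  # Hypothetical key into the $databases array in settings.php.
  key: legacy
```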

The purpose of all source plugins is to provide a Row object (core/modules/migrate/src/Row.php) that is uniform and can be consumed by the next part of the process.

If you do write your own source plugin, two methods I find myself frequently overriding are query() (so I can add conditions to the typical source query, like only grabbing the last year's worth of blog posts) and prepareRow(). The method prepareRow() is your hook-like opportunity to manipulate the Row object that's about to be transformed and loaded into the database. You can add additionally-queried information, or translate certain values into others, or anything you need in prepareRow(). The only thing to beware of is that every Row becomes a destination entity, so a join that returns several rows per node gives you several Rows for what should be one entity. If you're doing something like joining on taxonomy terms, you're better off looking those up in prepareRow() and adding them as a new property with setSourceProperty() rather than, say, LEFT JOINing them on in the query.

Destination (yes, I skipped Process)

The destination plugins work largely the same way. You simply specify what entity (usually) you're migrating into. For example, you might have plugin: 'entity:node' under the destination key, along with any additional configuration related to that destination. One example for entity:node is to add default_bundle: page here so that you don't need to set the bundle (type: page, for nodes) in your process section (which we're about to get to). Similarly, if migrating files, you can specify source_base_path: 'https://example.org' to automatically download images from that path when importing!

Like source plugins, destination plugins have additional configuration here that is tied to the plugin.
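
A minimal sketch of a destination section for basic pages, using the default_bundle option mentioned above:

```yaml
destination:
  # Save each Row as a node entity.
  plugin: 'entity:node'
  # Used when the process section does not set the bundle itself.
  default_bundle: page
```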

There are so many things that are entities in Drupal 8 that the possibilities are vast here. You can migrate nodes, users, comments, and files, sure. But you can also migrate configuration, form displays, entity view modes, anything in contrib, or your own legacy code! Migrate Plus also provides a "Table" destination so you can migrate directly into MySQL tables inside of Drupal (note that this is generally not best practice if you're migrating into entities--you're better off using the entity:whatever plugin so you take full advantage of the entity API).

Process

This is where all the real magic happens, in my opinion. To keep this blog post short (is it too late for that?), I won't go too deep into all the process plugins available, but I will talk about a few special cases and then encourage you to check out the documentation for yourself.

The "get" plugin is the most basic of all. It simply means "take the value off the Row object for property x, and map it to value y." In your real migration's yml file it would look like destVal: sourceVal - which simply means "take what's in $row->sourceVal and put it in the destination's destVal property."
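
Here is a small sketch showing the shorthand next to the spelled-out form of "get." The source property names (legacy_title, legacy_subtitle) and the field_subtitle field are invented for illustration.

```yaml
process:
  # Shorthand: copy the Row's legacy_title into the node title.
  title: legacy_title
  # The same mapping style written out in full with the get plugin.
  field_subtitle:
    plugin: get
    source: legacy_subtitle
```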

The "migration_lookup" plugin goes one simple step further than get and translates the incoming ID to the new ID value on the new site. For example, if you have a migration that migrates person nodes and the nid was 65 for John Smith on the Drupal 6 site, but is 907 on the new Drupal 8 site, a reference to that person (say on the "authors" field of your "Research" content type) would also need to be translated. This plugin transforms the incoming 65 to the correct 907 by referencing a migration that has already been run (remember the migration_dependencies key?).
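
For the authors example, the mapping might be sketched like this, assuming a hypothetical example_people migration that already brought the person nodes over and a hypothetical source property holding the old nids:

```yaml
process:
  field_authors:
    plugin: migration_lookup
    # Hypothetical id of the migration that imported the person nodes.
    migration: example_people
    # Hypothetical source property containing the old nids (e.g. 65).
    source: field_author_nids
```

Listing example_people under migration_dependencies keeps the ordering honest, so the lookup has something to find.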

Multiple plugins can even be chained together to form a "pipeline" of transformations that happen in order. For example, if your old database only had full names like "Chris Wells" and "Patrick Corbett," but you wanted to make usernames out of them, you could run the name through the machine_name plugin to change it to "chris_wells" instead. But what if there was already a "chris_wells" user? Well, you can then run the new value through dedupe_entity to append a "1" or "2" etc. until it's unique. You can create fairly complex pipelines here in the yml file without having to touch any PHP code.
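
That username pipeline could be sketched as follows. The source property full_name is an assumption, and the plugin names are the ones used in this post, so check what your version of core calls them.

```yaml
process:
  name:
    # Step 1: "Chris Wells" becomes "chris_wells".
    - plugin: machine_name
      source: full_name
    # Step 2: if that username is taken, append a number until it is unique.
    - plugin: dedupe_entity
      entity_type: user
      field: name
      postfix: _
```

Each step receives the output of the step before it, which is why only the first one needs a source.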

Sometimes a field in Drupal has a "nested value," like the body or link fields. The body field has a "value" and a "format" on it. To map these, you use a slash (/) to separate the field and the sub-field, like 'body/value': description and 'body/format': format -- just be sure to use those "ticks" (apostrophes, single-quotes, whatever you call them) around these types of keys.
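
In the process section, that looks something like this (description and format are assumed to be properties on the source Row):

```yaml
process:
  # Quoted keys, because of the slash.
  'body/value': description
  'body/format': format
```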

Feel free to check out all the core process plugins, and even ones provided by contrib, like: get, migration_lookup, machine_name, dedupe_entity, default_value, concat, explode, flatten, static_map, substr, skip_on_empty, skip_row_if_not_set, menu_link_parent, callback, entity_generate, and entity_lookup!

There's one more special one, formerly known as "iterator" and now called "sub_process." This lets you run a multi-step pipeline against each item in an array of structured data arrays. Make sure to pay some special attention to that one.
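
As a rough sketch, suppose the source Row has a hypothetical legacy_links property holding an array of arrays, each with url and link_text keys, and you want them in a multi-value link field (field_links, also hypothetical):

```yaml
process:
  field_links:
    plugin: sub_process
    # Hypothetical source property: [{url: ..., link_text: ...}, ...]
    source: legacy_links
    # This inner process section runs once per item in the array.
    process:
      uri: url
      title: link_text
```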

Put it all together

So by now you've created your shiny new Drupal 8 site just how you want and you've written a module (.info.yml file, really). You've placed all these migrations in it. You can place them in config/install and they will be read in as configuration. You can then edit them as needed using drush config-edit or similar in Drupal Console. You could also uninstall and reinstall your module each time you alter the yml files.

Alternatively, you can place them in /migrations (off your module root), and they will be loaded as plugins instead of configuration. This way is likely preferred since you can just flush the plugin cache when you make changes to the YML file.

Once you also have your source set up (database, CSV files, XML, whatever), you can start to run your migrations with Drush!

The most commonly-used Drush commands for migrating (in my world) are:

  • migrate-status - where are things at?
  • migrate-import - run a migration (single or whole group)
  • migrate-rollback - "un-run" a migration
  • migrate-reset-status - used to reset a migration to "Idle" state if the PHP code you're writing bombs out or you press Ctrl-C in frustration.

Others I don't use as frequently are:

  • migrate-stop - stops a running migration and resets it to idle (I usually press Ctrl-C and then do a drush mrs (migrate-reset-status))
  • migrate-fields-source - list all the available properties on Row to import (I usually just inspect this in the debugger in prepareRow())
  • migrate-messages - display messages captured during migration (PHP warnings, etc) (I usually just look at the database table where these are stored instead of printing them in terminal)

WHOA.

So there you have it. Migration in a nutshell! Please do feel free to leave comments and questions below or reach out to us at Redfin if you need help migrating data into your shiny new Drupal 8 site.
