by Adam Brett

Introduction to Puppet

This article was published on Friday, April 08, 2016, which was more than 18 months ago. This means the content may be out of date or no longer relevant. You should verify that the technical information in this article is still current before relying upon it for your own purposes.

Before I joined my current company, I'd used Bash, Chef, and Ansible commercially to provision servers and, more commonly, Vagrant boxes. Each has its own benefits and drawbacks, and each works in a slightly different way. When I joined my current company, I got to add a new tool to my belt: Puppet.

Today, in a similar vein to my Ansible and Chef tutorials, I'm going to show you how to provision a basic LAMP stack, and then refactor our basic manifest into a more manageable and maintainable structure, so by the end you should have a good grounding in creating maintainable Puppet.

Vagrant and Puppet

To learn Puppet, we need a server we can provision and destroy often, and the quickest and easiest way to do that is with a virtual-machine using Vagrant and VirtualBox. Make sure you have both installed before continuing.

At the time of writing, VirtualBox 5 doesn't work well with Vagrant, so make sure you install VirtualBox 4.3.

Basics

Let's start by creating a new directory to hold our project. Open up a terminal and type:

mkdir -p ~/Projects/puppet && cd ~/Projects/puppet

Next, we want to create a new Vagrantfile to control our virtual-machine:

vagrant init ubuntu/trusty64

This will create a new Vagrantfile in the current directory, and set the base image to ubuntu/trusty64. This is the official base image for the current Long-Term Support (LTS) release from Canonical (the company behind Ubuntu), and that's what we deploy new servers as at my current company.

Now you have your Vagrantfile in the root of your project directory, let's open it up and take a look. This file contains the basic configuration for the virtual-machine you want to run in VirtualBox, and then a bunch of commented out lines.

You don't need the commented out lines right now, so remove them all and leave yourself with the bare essentials:

VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
end

We'll need a way to access our webserver once it's provisioned, so we'll tell Vagrant to forward port 80 from our box to port 8080 on localhost. To do that, add the following on the line before end:

config.vm.network "forwarded_port", guest: 80, host: 8080

Now, there is one last thing we need to do to configure Vagrant, and then we're finished with it. We need to tell Vagrant that we want to use Puppet as its provisioner, and where to find the commands to run. To do this, add the following line to your Vagrantfile, again on the line before end:

config.vm.provision "puppet"

Once you've done that, the entire contents of your Vagrantfile should look like this:

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.network "forwarded_port", guest: 80, host: 8080

  config.vm.provision "puppet"
end

Basic Terms

Puppet, like Chef, runs as a daemon on your server and comes in two forms. Unlike Chef, it is a single daemon that behaves differently depending on how it's invoked and configured. Also unlike Chef, Puppet has no local client: all operations happen remotely, on the server.

Puppet Apply runs when you tell it to, and is the version we'll be using in this article. It's useful for provisioning a single or small number of servers.

Puppet Agent phones home to something called a Puppet Master, which is the central location for your Puppet Manifests. It does this on a schedule, and ensures the server remains in sync.

Puppet organises the Resources on your server into Manifests. Manifests can contain Classes or Defines, which are in turn combined into Modules. A Resource is a built-in way to do something, like install a package, or change the contents of a file.
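
As a rough sketch of how these pieces fit together, here is a Define that wraps a single File Resource and is then declared twice. The web_dir name, its owner parameter, and the /srv paths are all hypothetical, made up for illustration only.

```puppet
# A Define is a reusable group of Resources that can be declared
# many times, each time with a different title.
# (web_dir and its parameter are hypothetical names.)
define web_dir ($owner = 'www-data') {
  # $title is the name given where the Define is declared.
  file { $title:
    ensure => directory,
    owner  => $owner,
  }
}

# Two declarations of the same Define, with different titles.
web_dir { '/srv/site-a': }
web_dir { '/srv/site-b': owner => 'root' }
```

A Class, by contrast, can only be declared once per server, which is why we use Classes for things like "Apache is installed".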

Puppet is unlike other configuration management tools in one significant and infuriating way: It does not execute in order of definition. Instead, Puppet is a declarative language. You define how you want the server to look, and Puppet decides the order in which to do it. This is the source of the majority of errors in Puppet Manifests. To overcome this weakness, there is a module pattern you can follow, which we'll get on to later.

Your First Manifest

Create a new directory in the root of your project called manifests, and inside there create a new file called default.pp.

Although Puppet itself is Ruby, all Manifests use the Puppet DSL. In its simplest form, the DSL looks like this:

declared_type name ($params) {
    # do something
}

Or more concretely:

# $name is a reserved variable in Puppet, so the parameter uses a different name.
class hello_world ($greeting = 'World') {
    info("Hello, ${greeting}")
}

The simplest of all types is the Node type, so that's where we're going to start. Add the following to your default.pp file:

node default {

}

Here, we're specifically saying on the node (read: server) default (read: all), do this. A node definition matches a server based on hostname, so we can replace this with a specific hostname of a server:

node 'dev.adambrett.co.uk' {

}

And it also supports Regular Expressions, like so:

node /[A-Za-z0-9]+\.dev\.adambrett\.co\.uk/ {

}

As in Ruby, any string wrapped in // instead of single or double quotes becomes a regular expression. We're using Vagrant which means we have one server and can set the value to default.

To install a LAMP stack, there are three basic steps we need to take:

  1. Install Apache
  2. Install MySQL
  3. Install PHP

That's all we need to do. We're using an Ubuntu base image, so we can rely on apt to install the necessary packages for us. To do that in Puppet we need to use Puppet's Package Resource[^1].

node default {
    package { 'apache2':
        ensure => present
    }

    package { 'mysql-server':
        ensure => present
    }

    package { 'php5':
        ensure => present
    }
}

Now we're ready to provision. Drop back to your terminal and run vagrant up, and you should see something like this:

[Screenshot: puppet first run]

That's it. If all you wanted was a working LAMP server, you now have it. SSH into the Vagrant box using vagrant ssh from the root of your project and put an info.php file in /var/www/html with:

<?php phpinfo();

Then load it in your browser at http://localhost:8080/info.php, and you should see everything working as expected.

Breaking It Down

It's important to understand the code you write, so although this is the simplest Puppet manifest we can put together, we should break it down.

In this Manifest, we're telling Puppet that we want to manage each of these packages, first passing in the package name to name the Resource. Every Resource in Puppet requires a name, and for most resources, this should match the name of the server artifact you're managing. This means the name of the package in Apt, or the full path to a file.

Secondly, every name must be unique for that Resource Type across all Manifests on your server. That means if you're using 3rd-party modules, or including modules of your own, the names must still be unique.

We can stop this becoming a problem in 3 different ways. The first is to wrap your definition in an if statement:

if ! defined(Package['apache2']) {
    package { 'apache2':
        ensure => present
    }
}

Now our definition won't get run if it's already defined elsewhere, which is a safe way to ensure that you don't get conflicts. The second way is to give your Resource Definition a name that is unique to your module, then set the real Package name using the name attribute:

package { 'my-module-apache2':
    ensure => present,
    name   => 'apache2'
}

Most Resource Types in Puppet will have an extra name attribute to allow you to do this, but it's not great practice.

You should notice a couple of important bits of syntax here. First, the name of the Resource, my-module-apache2, is arbitrary. It can be anything you want, including a variable (we'll look at that later), as long as it's unique, but it is best practice for it to match the name of the server artifact you're managing.

Secondly, strings without variables should use single quotes, not double quotes. This isn't required by the Puppet parser, which allows either, but it is best practice to use single quotes for strings without variables and double quotes for strings with variables. Strings with variable interpolation require double quotes. Single-quoted strings are literal strings, and variables will not get interpolated.
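
To make the difference concrete, here is a minimal sketch using the built-in notice function. The $document_root variable and the message text are only there for illustration.

```puppet
$document_root = '/vagrant'

# Single quotes: a literal string. The text ${document_root}
# appears in the message exactly as written.
notice('Serving from ${document_root}')

# Double quotes: the variable is interpolated, so the message
# contains /vagrant instead.
notice("Serving from ${document_root}")
```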

The final thing to notice here is the alignment of the assignment arrows. Like single quotes for strings, this is not required by the parser, but it is community best practice, and misaligned arrows will be flagged by any linting tools you decide to use later.

The third and best way to overcome the restriction with Resource naming is to make sure that each shared Resource has its own Manifest, which we can import everywhere it's needed. You can include Manifests from other Modules in lots of places, and Puppet is smart enough to make sure it includes it the first time, and ignores the following includes. We'll look at that next.

Refactoring

The next step after you have any application working is to make it better, and Puppet is no different. The first thing we can do is make our manifest DRY, and get rid of the repetition.

DRY isn't always the goal of course, and doesn't make sense in certain circumstances. As you will see in a moment, this is not one of those circumstances, but this is a good chance to introduce you to the way Puppet does loops.

Calling this a loop is certainly a fallacy. As I said before, Puppet DSL is declarative, which means it doesn't have loops. This can be a pain, but the developers of Puppet have good reasons for excluding them. Instead, if we have a collection of Resource definitions of the same type, we can combine them into a single definition. It looks something like this:

node default {
    package { ['apache2', 'mysql-server', 'php5']:
        ensure => present
    }
}

We've now replaced the three install steps with a single install step that installs all the packages we require. Puppet will run that same Resource Definition for each item in our list, substituting the name with each item in turn.
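
The list doesn't have to be written inline. As a small sketch, the same thing works with the list held in a variable; the name $lamp_packages is my own invention here.

```puppet
node default {
    # Hypothetical variable name holding the list of packages.
    $lamp_packages = ['apache2', 'mysql-server', 'php5']

    # One Package Resource is created per item in the list.
    package { $lamp_packages:
        ensure => present
    }
}
```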

You can test it works by running vagrant destroy followed by another vagrant up in the root of your project.

Extracting Include Files

When you're installing packages for a new server, you want to do more than install the packages – you want to configure them too. You want to tell Apache to use /vagrant (which is the default location for Vagrant to mount the current directory) instead of /var/www/html, or install php_mysql and php_pdo so you can access your MySQL Server from PHP.
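
As a hedged sketch of the second case, the extra PHP extensions are one more Package Resource. This assumes that on Ubuntu 14.04 the php5-mysql package is what provides the MySQL and PDO MySQL extensions for PHP 5; check the package name for your own distribution.

```puppet
# Assumption: php5-mysql is the Ubuntu 14.04 package that provides
# the MySQL and PDO MySQL extensions for PHP 5.
package { 'php5-mysql':
    ensure => present
}
```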

The first step in this process is extracting each Package to its own Manifest, and then its own Module.

Before we do that, let's restore our install tasks to three separate tasks to make them easier to break up:

node default {
    package { 'apache2':
        ensure => present
    }

    package { 'mysql-server':
        ensure => present
    }

    package { 'php5':
        ensure => present
    }
}

Now create three new files in your manifests directory:

In manifests/apache.pp:

class apache {
    package { 'apache2':
        ensure => present
    }
}

and manifests/mysql.pp:

class mysql {
    package { 'mysql-server':
        ensure => present
    }
}

then manifests/php.pp:

class php {
    package { 'php5':
        ensure => present
    }
}

The first thing you should notice is that we're now wrapping our Package definitions in a class, not a node. Classes in Puppet are like Classes in Object Oriented Programming (OOP). They're not included if you don't explicitly include them. Unlike OOP, Puppet classes do not become objects, they cannot be instantiated, and they do not have methods (although they do have Properties, and Private Functions, which we'll see later).

Now we need to update our default manifest to include these new files. Change your default.pp to match the following:

import 'apache'
import 'mysql'
import 'php'

node default {
    include apache
    include mysql
    include php
}

We're import-ing our new Manifests outside of our Node Definition, and then include-ing them inside the Node Definition, because a single Manifest, which translates to a .pp file, can include more than one Node Definition, matching different nodes, like so:

import 'apache'
import 'mysql'
import 'php'

node /web/ {
    include apache
    include php
}

node /database/ {
    include mysql
}

That's all we need to do. include is a special keyword in Puppet that imports another manifest, as is, and runs it. It is functionally the same as PHP's include_once, as you can include a Manifest as often as you like, and it will ignore all but the first.
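
A minimal sketch of that behaviour, assuming the apache class from manifests/apache.pp is available:

```puppet
node default {
    include apache

    # Including the same class again is harmless: the class is
    # only added to the server's configuration once.
    include apache
}
```

This is what makes the "one Manifest per shared Resource" approach from earlier workable, as any number of other classes can include apache without causing a duplicate definition.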

When we start looking at Modules, we won't need to use import, as we'll be creating the Modules following Puppet's Autoload conventions, so Manifests will be automatically import-ed when we reference them.

Test this new configuration by running vagrant destroy followed by vagrant up in your Terminal again. Everything should work as before.

Templates and Files

As I mentioned above, you will want to include related tasks in your new include files. As an example, we'll update your Apache DocumentRoot setting to point at /vagrant instead of /var/www/html, so you can serve your local files. Of all the ways to do this, the simplest is to include your VirtualHost as part of your Manifest, either in the form of a Template or a File.

The difference between a Template and a File is a small but important one. Puppet will copy a file to your server as it appears in your Manifest, without modification, whereas a Template can contain variables that Puppet will substitute with real values before copying across to the server. This is the path we'll take for our VirtualHost.
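
Side by side, the two approaches look roughly like this. It's only a sketch: every path and file name below is hypothetical, and the Template example assumes Puppet has been told where to find Templates, which we'll set up shortly.

```puppet
# File: copied to the server exactly as it is, using the source attribute.
# (Both paths are hypothetical.)
file { '/etc/apache2/conf-available/static.conf':
  ensure => present,
  source => '/vagrant/files/static.conf',
}

# Template: rendered first, so variables such as $document_root are
# replaced with real values before the result is written to the server.
$document_root = '/vagrant'

file { '/etc/apache2/sites-available/example.conf':
  ensure  => present,
  content => template('example.conf.erb'),
}
```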

To create our VirtualHost, we need to follow these four steps:

  1. Install the new VirtualHost (with the DocumentRoot set to /vagrant)
  2. Disable the 000-default VirtualHost
  3. Enable our new VirtualHost
  4. Reload the Apache config.

Templates

We'll start by copying our VirtualHost to the server, and to do that we'll use a Template. We can use a File for this, and hard-code the DocumentRoot, but the process is virtually the same as using Templates, and a Template is more flexible.

As Puppet is Ruby underneath, for templates it uses normal ERB files. Don't worry too much, the syntax is simple to learn.

To get started, create a directory in the root of your project called templates, and in that new directory create a file called virtual-hosts.conf.erb. It's customary to call a template file its normal name, including file extension, and then append .erb to the end.

I ssh'd into our already provisioned Vagrant box (using vagrant ssh), and grabbed the contents of the existing VirtualHost (at /etc/apache2/sites-available/000-default.conf). Now put the contents of that file in templates/virtual-hosts.conf.erb (I have stripped out the comments for the sake of brevity):

<VirtualHost *:80>
  ServerAdmin webmaster@localhost
  DocumentRoot /var/www/html

  ErrorLog ${APACHE_LOG_DIR}/error.log
  CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

Now replace the line:

DocumentRoot /var/www/html

with:

DocumentRoot <%= @document_root %>

This is how you echo a variable in ERB. The name of the variable in this case is document_root. The @ at the start tells us that it's a variable from our Puppet Class, and not a variable local to the ERB file (which wouldn't have the @).
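
If you'd like to try this without creating a file, Puppet's inline_template function renders an ERB string in the same way. The sketch below is for experimenting only; the $line and $local names and the port variable are made up.

```puppet
$document_root = '/vagrant'

# @document_root is looked up in the Puppet scope that calls the function.
$line = inline_template('DocumentRoot <%= @document_root %>')
notice($line)

# A variable that is local to the ERB code has no @ prefix.
$local = inline_template('<% port = 8080 %>Forwarded port: <%= port %>')
notice($local)
```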

Because of the way Apache works, you need to add a new config section for the /vagrant directory, otherwise you'll get 403 Forbidden errors, so add the following below your DocumentRoot line:

<Directory <%= @document_root %>>
  Require all granted
</Directory>

All together, it should look like this:

<VirtualHost *:80>
  ServerAdmin webmaster@localhost
  DocumentRoot <%= @document_root %>

  <Directory <%= @document_root %>>
    Require all granted
  </Directory>

  ErrorLog ${APACHE_LOG_DIR}/error.log
  CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

Now save the file and open up manifests/apache.pp. We need to add a new Resource to manage this Template and create the file on the server:

file { '/etc/apache2/sites-available/vagrant.conf':
  ensure  => present,
  content => template('virtual-hosts.conf.erb')
}

You don't need to include the templates directory in the template path, as we will configure Puppet to tell it where to find our Templates. When we look at Modules later, Template paths will be relative to the current Module.

The whole file should now look like this:

class apache {
    package { 'apache2':
        ensure => present
    }

    file { '/etc/apache2/sites-available/vagrant.conf':
        ensure  => present,
        content => template('virtual-hosts.conf.erb')
    }
}

The final thing we need to do to copy our Template across is to tell Puppet what to use for the value of document_root when it's copying the Template in to place. As we referenced the variable with the @ symbol in our Template, we need to make sure the variable is in the Class scope at the point we call the template function.

To do this, add it to the top of the class, prefixed with a $ sign, like so:

class apache {
    $document_root = '/vagrant'

    package { 'apache2':
        ensure => present
    }

    file { '/etc/apache2/sites-available/vagrant.conf':
        ensure  => present,
        content => template('virtual-hosts.conf.erb')
    }
}

This is an example of a Class Property in Puppet. We can put the definition anywhere in the Class definition, and we can reference it in other Manifests by prefixing it with the name of the Class that it's declared in:

$apache::document_root
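
As a small sketch of what that looks like in practice, here is a class that reads the value from outside the apache class. The site_info class name is hypothetical, and the include is there on the assumption that the apache class needs to have been evaluated before its variables can be read.

```puppet
# Hypothetical class, for illustration only.
class site_info {
    # Make sure the apache class has been evaluated first.
    include apache

    # Read the variable using its fully qualified name.
    notice("Apache serves files from ${apache::document_root}")
}
```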

We need to update our Vagrant configuration to make our Templates available inside the VM, and tell Puppet where to find them. Update your Vagrantfile to match the following:

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.network "forwarded_port", guest: 80, host: 8080

  config.vm.provision "puppet" do |puppet|
      puppet.options = ["--templatedir", "/tmp/vagrant-puppet/templates"]
  end

  config.vm.synced_folder "templates", "/tmp/vagrant-puppet/templates"
end

Drop back to your terminal and run vagrant reload --provision. This demonstrates the idempotency of Puppet. We can provision a server over and over, and Puppet will not re-run anything that hasn't changed.

We need to vagrant reload here, and not vagrant provision, because we've added a new directory, and vagrant needs to reboot the VM to make this available. If we hadn't added the new directory, and wanted to run our changed Puppet Manifests, we would use vagrant provision instead, to avoid rebooting the VM.

We can test that this has worked by sshing into the server and checking the contents of the new file:

vagrant ssh
cat /etc/apache2/sites-available/vagrant.conf

And you should see something like this:

<VirtualHost *:80>
  ServerAdmin webmaster@localhost
  DocumentRoot /vagrant

  <Directory /vagrant>
    Require all granted
  </Directory>

  ErrorLog ${APACHE_LOG_DIR}/error.log
  CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

Now we have three steps left to complete:

  1. Disable the default VirtualHost
  2. Enable our new VirtualHost
  3. Reload Apache

Files

Ubuntu's configuration convention for Apache VirtualHosts is to put the contents of all possible VirtualHosts in /etc/apache2/sites-available and then use the a2ensite and a2dissite tools to create and remove symlinks for each active VirtualHost in the /etc/apache2/sites-enabled directory.

When using Puppet, it's best to bypass these tools and create and remove the symlink ourselves using the File Resource. We need to add two new Resources to our manifests/apache.pp file: One to remove the existing symlink, and one to create our new one. They look something like this:

file { '/etc/apache2/sites-enabled/000-default.conf':
  ensure => absent
}

file { '/etc/apache2/sites-enabled/vagrant.conf':
  ensure => link,
  target => '/etc/apache2/sites-available/vagrant.conf',
}

Drop back to your terminal, run vagrant provision and you should see the normal Puppet output displayed, including your new Resources. When it's done, ssh into your vagrant box and run ls /etc/apache2/sites-enabled. You should see that 000-default.conf is now missing, and vagrant.conf is in its place.

Execution Order

Up to now, we've been applying our changes to the server incrementally. This means that our dependencies weren't an issue because of the order in which we've built our manifests. If we destroy our server and try to re-create it, we'll see the following errors:

[Screenshot: puppet fail]

This is because Puppet is a declarative language. Puppet will try to work out the best execution order itself, and 9 times out of 10, gets it wrong. In this instance, the package apache2 creates the directory /etc/apache2/sites-available, which needs to exist before we can copy our VirtualHost into it. Puppet isn't smart enough to work this out, so we need to tell it.

To do this, we add an execution order to our class. This means referencing the Resources we've declared and then linking them together. The best way to do this is to set the order at the bottom of your Class, like so:

class apache {
  $document_root = '/vagrant'

  package { 'apache2':
    ensure => present
  }

  file { '/etc/apache2/sites-available/vagrant.conf':
    ensure  => present,
    content => template('virtual-hosts.conf.erb')
  }

  file { '/etc/apache2/sites-enabled/000-default.conf':
    ensure => absent
  }

  file { '/etc/apache2/sites-enabled/vagrant.conf':
    ensure => link,
    target => '/etc/apache2/sites-available/vagrant.conf',
  }

  Package['apache2'] ->
  File['/etc/apache2/sites-enabled/000-default.conf'] ->
  File['/etc/apache2/sites-available/vagrant.conf'] ->
  File['/etc/apache2/sites-enabled/vagrant.conf']
}

This is an example of referencing a Resource in your Puppet Manifest. When referencing a Resource for any reason, you use the capitalised version of the Resource Type (Package, File), followed by the name that you gave it when you declared it, in square brackets.

Here, we're linking them together with Chaining Arrows, which is the simplest, clearest, and easiest method to maintain. Another option is to add the requirements to the Resource Definition, like so:

class apache {
  $document_root = '/vagrant'

  package { 'apache2':
    ensure => present
  }

  file { '/etc/apache2/sites-available/vagrant.conf':
    ensure  => present,
    content => template('virtual-hosts.conf.erb'),
    require => Package['apache2']
  }

  file { '/etc/apache2/sites-enabled/000-default.conf':
    ensure  => absent,
    require => Package['apache2']
  }

  file { '/etc/apache2/sites-enabled/vagrant.conf':
    ensure  => link,
    target  => '/etc/apache2/sites-available/vagrant.conf',
    require => File['/etc/apache2/sites-available/vagrant.conf']
  }
}

This method looks clearer at first, but in reality grows out of hand over time, and becomes harder to manage and maintain.
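
For completeness, require has a mirror image called before, which declares the same relationship from the other side. A minimal sketch, using the same Resources as above:

```puppet
package { 'apache2':
  ensure => present,
  # Same relationship as require, declared on the Resource that comes first.
  before => File['/etc/apache2/sites-available/vagrant.conf'],
}

file { '/etc/apache2/sites-available/vagrant.conf':
  ensure  => present,
  content => template('virtual-hosts.conf.erb'),
}
```

Mixing require and before across a large Manifest is one of the ways this style becomes hard to follow, which is part of why I prefer keeping the order in one place with Chaining Arrows.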

After updating your Manifest to match the above example, re-create and provision your server, and everything should work as expected.

Notifying Services

When changing the configuration for a service, you need to tell that service that it's changed, so it can either reload the config, or restart itself to pick it up.

Notifying is how you do this in Puppet. There are two options for this, but the best is to reference the Resources at the bottom and use a different style of Chaining Arrow to declare a refresh. Before we do that, we need to create a new Resource to tell Puppet we want to manage the Service, as well as the Package. Update your apache.pp to match the following:

class apache {
  $document_root = '/vagrant'

  package { 'apache2':
    ensure => present
  }

  service { 'apache2':
    ensure => running,
    enable => true
  }

  file { '/etc/apache2/sites-available/vagrant.conf':
    ensure  => present,
    content => template('virtual-hosts.conf.erb')
  }

  file { '/etc/apache2/sites-enabled/000-default.conf':
    ensure => absent
  }

  file { '/etc/apache2/sites-enabled/vagrant.conf':
    ensure => link,
    target => '/etc/apache2/sites-available/vagrant.conf',
  }

  Package['apache2'] ->
  File['/etc/apache2/sites-enabled/000-default.conf'] ->
  File['/etc/apache2/sites-available/vagrant.conf'] ->
  File['/etc/apache2/sites-enabled/vagrant.conf']

  File['/etc/apache2/sites-available/vagrant.conf'] ~>
  Service['apache2']
}

Let's break this down and look at each part we've added:

service { 'apache2':
  ensure => running,
  enable => true
}

This is the Service Resource Type. It tells Puppet we want to manage the service with the name apache2. We want to ensure that it's always running, and enable => true means we want to start it at boot.

We've not updated our run order to include the service, because Puppet is smart enough to work that part out, and adding it in the wrong place will create a cyclic dependency with the following section:

File['/etc/apache2/sites-available/vagrant.conf'] ~>
Service['apache2']

This new block tells Puppet that if the File Resource File['/etc/apache2/sites-available/vagrant.conf'] changes, then notify (~>) the Service Resource Service['apache2'] that it needs to refresh. If we needed to explicitly add the service to our run order, it would need to go after this file.
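
The other option I mentioned is to declare the refresh on the Resource itself, using the notify metaparameter. As a sketch, the equivalent of the block above would be:

```puppet
file { '/etc/apache2/sites-available/vagrant.conf':
  ensure  => present,
  content => template('virtual-hosts.conf.erb'),
  # Equivalent to File[...] ~> Service['apache2'].
  notify  => Service['apache2'],
}

service { 'apache2':
  ensure => running,
  enable => true,
  # The same relationship could instead be declared from this side with:
  # subscribe => File['/etc/apache2/sites-available/vagrant.conf'],
}
```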

That should be all we need to do now. If you run vagrant provision, or vagrant destroy followed by vagrant up, you should end up with a VM that's serving your current working directory. We can test this by dropping an info.php file containing <?php phpinfo(); into the root, then visiting http://localhost:8080/info.php in your browser.

Modules

As your Manifests grow, and you start to write more and more of them, it becomes a good idea to group related ones into Modules. In Puppet, a Module is a collection of Manifests, Templates, and Files all related to a singular purpose or application. We'll refactor our three Manifests into Modules, and then take advantage of the Puppet Autoloader to include them in our Node Definition.

Start by creating a modules directory in the root of your project, at the same level as manifests and templates. Inside there, we need a directory to hold each module, so create directories for apache, mysql, and php. The last step is to create the directories to hold the files inside our Modules, so inside modules/apache, create manifests and templates, and inside modules/mysql and modules/php create a manifests directory, as we don't have any Templates for those. Here's a one-liner to make it simple for you:

mkdir -p modules/{apache/{manifests,templates},mysql/manifests,php/manifests}

Now move your Manifests from your manifests directory in the root of the project to the manifests directory inside each of the respective Modules:

mv manifests/apache.pp modules/apache/manifests/apache.pp
mv manifests/mysql.pp modules/mysql/manifests/mysql.pp
mv manifests/php.pp modules/php/manifests/php.pp

And move your virtual-hosts.conf.erb Template into the modules/apache/templates directory:

mv templates/virtual-hosts.conf.erb modules/apache/templates/virtual-hosts.conf.erb

Your templates directory in the root should be empty now, so let's get rid of it:

rmdir templates

The last thing we need to do is update our Vagrant configuration to tell it where to find the modules, and that our root templates directory doesn't exist anymore. Update it to match the following:

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"

  config.vm.network "forwarded_port", guest: 80, host: 8080

  config.vm.provision "puppet" do |puppet|
    puppet.module_path = "modules"
  end
end

This is actually simpler than before! That's thanks to Puppet's Autoloading Conventions.

Autoloading

Now everything is in the right place, we need to update our files to follow Puppet's Module Autoload Conventions. The entrypoint for a module is init.pp, so let's rename our Manifests to match it:

mv modules/apache/manifests/apache.pp modules/apache/manifests/init.pp
mv modules/mysql/manifests/mysql.pp modules/mysql/manifests/init.pp
mv modules/php/manifests/php.pp modules/php/manifests/init.pp

Our class names can now remain the same, as they are the entrypoint for the Module.

Next, we update the path to our template in modules/apache/manifests/init.pp, so:

file { '/etc/apache2/sites-available/vagrant.conf':
  ensure  => present,
  content => template('virtual-hosts.conf.erb')
}

Becomes:

file { '/etc/apache2/sites-available/vagrant.conf':
  ensure  => present,
  content => template('apache/virtual-hosts.conf.erb')
}

We don't need the path to modules, as it's relative to the modules directory by default, and Puppet always expects templates to be in a directory called templates, so we don't need to include that either. Instead we include the name of the Module, and the name of the Template, and that's it.
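
Files in a Module follow a similar convention, except they live in a directory called files and are referenced with a puppet:/// URL. We don't have any Files in our Module, so this is only a sketch: it assumes a hypothetical modules/apache/files/maintenance.html exists.

```puppet
# Hypothetical: assumes modules/apache/files/maintenance.html exists.
# As with Templates, the files directory is left out of the path.
file { '/var/www/html/maintenance.html':
  ensure => present,
  source => 'puppet:///modules/apache/maintenance.html',
}
```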

Now update manifests/default.pp to remove our import statements at the top – they're no longer needed as we're autoloading from Modules instead. The rest can remain the same, as we're referencing Module names instead of Manifests now. In its entirety, default.pp should be:

node default {
    include apache
    include mysql
    include php
}

Drop back to your terminal and run vagrant destroy -f && vagrant up to test everything works as expected.

The Module Pattern

As your Modules grow, including your Resources in a single Manifest will become unmanageable. When designing a Module structure, we want to keep them maintainable and manageable, as well as making what's going on obvious to others (and your future self). Here are the requirements to keep in mind:

  1. A module should have a single entry point where someone reviewing it can get an overview of its behaviour
  2. Modules that have configuration should be configurable in a single way and single place
  3. Modules should consist of single-responsibility classes. As far as possible these classes should be private details hidden from the user
  4. For the common use cases, users should not need to know individual resource names
  5. For the most common use case, users should not need to provide any parameters
  6. Modules should have a consistent design and behaviour

The most important consideration when designing a Module is not what it does, but instead making it accessible for others to use, maintain, and understand.

Generalising our requirements from earlier so that they apply to most software you want to install, we end up with:

  1. Install the packages and any dependencies
  2. Create configuration files with environment specific values
  3. Start the service or services and restart it if the config files change

This implied run order and basic pattern applies to most pieces of software. These three points translate to distinct groups of actions and, sticking with the above principle of single-responsibility classes, we will create a class for each group.

To keep everything clear and obvious these will be install, config, and service.

We'll refactor our Apache Module to follow this format, and I'll leave refactoring the MySQL and PHP Modules as an exercise for the reader.

We'll start by creating our three class files:

touch modules/apache/manifests/{install.pp,config.pp,service.pp}

Now add empty classes to each of these files:

# modules/apache/manifests/install.pp
class apache::install {}
# modules/apache/manifests/config.pp
class apache::config {}
# modules/apache/manifests/service.pp
class apache::service {}

These classes live in the Apache Module, so we namespace them using the apache:: prefix. This is how we will refer to them in other Manifests.
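As a small illustration of how a namespaced name gets used elsewhere (a sketch only, not something to add to the Module we're building), a class can be declared by its full name, or pointed at with a capitalised Class[...] reference. The notify Resource and its title here are hypothetical, included purely to give the reference something to hang off:

```puppet
# Declare the class by its fully qualified name.
include apache::install

# Reference the declared class from another Resource.
# 'apache-installed' is a made-up title for illustration.
notify { 'apache-installed':
  message => 'Apache packages are in place',
  require => Class['apache::install'],
}
```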

Now we need to extract the appropriate parts from our init.pp manifest. In install.pp, add our Package Resource:

class apache::install {
  package { 'apache2':
    ensure => present
  }
}

That's nice and simple, so let's add our configuration files to config.pp:

class apache::config {
  file { '/etc/apache2/sites-available/vagrant.conf':
    ensure  => present,
    content => template('apache/virtual-hosts.conf.erb')
  }

  file { '/etc/apache2/sites-enabled/000-default.conf':
    ensure => absent
  }

  file { '/etc/apache2/sites-enabled/vagrant.conf':
    ensure => link,
    target => '/etc/apache2/sites-available/vagrant.conf',
  }
}

More resources here, but nothing we haven't seen before. Let's add our service to service.pp:

class apache::service {
  service { 'apache2':
    ensure => running,
    enable => true
  }
}

Easy. Now you should have the following left in init.pp:

class apache {
  $document_root = '/vagrant'

  Package['apache2'] ->
  File['/etc/apache2/sites-enabled/000-default.conf'] ->
  File['/etc/apache2/sites-available/vagrant.conf'] ->
  File['/etc/apache2/sites-enabled/vagrant.conf']

  File['/etc/apache2/sites-available/vagrant.conf'] ~>
  Service['apache2']
}

We're left with the configuration value for document_root, and our run order. We need to update this to include the new classes we've created, so init.pp serves as our entrypoint, and the details of our classes are not visible to the user.

class apache {
  $document_root = '/vagrant'

  class { 'apache::install': } ->
  class { 'apache::config': } ~>
  class { 'apache::service': }
}

Here, instead of including the classes as we were in our Node Definition, we're creating Resources for them. This has the added benefit of being able to define a run order on our classes, instead of the Resources in them. This means we can write our classes without worrying ourselves with the run order for our Resources.

As this class is small and not at all complex, we're also using the Chaining Arrows inline to short-cut the ordering, instead of referencing the Resources separately at the bottom as we were before. We are also using the notifying (~>) arrow between config and service, which means that if anything in the config class changes, the Resources in the service class will be refreshed automatically, which for a Service means a restart.
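For comparison, here is a sketch of the same ordering written the longer way, with the classes declared first and the relationships expressed through Class[...] references at the bottom. It should behave the same as the inline form; which one you prefer is largely a matter of readability:

```puppet
class apache {
  $document_root = '/vagrant'

  # Declare the three classes without any ordering.
  class { 'apache::install': }
  class { 'apache::config': }
  class { 'apache::service': }

  # Then order them: install before config, and config
  # notifies service so that changes trigger a refresh.
  Class['apache::install'] ->
  Class['apache::config'] ~>
  Class['apache::service']
}
```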

We can try running this now, but it's not going to work yet. There's one thing left to do, and that's to get the value of $document_root into the config class.

Parameters

Puppet Classes can take parameters, which become available in the scope of the class. We have one class that needs a parameter: the config class – for $document_root, so let's add that now:

class apache::config($document_root = '/var/www/html') {
  file { '/etc/apache2/sites-available/vagrant.conf':
    ensure  => present,
    content => template('apache/virtual-hosts.conf.erb')
  }

  file { '/etc/apache2/sites-enabled/000-default.conf':
    ensure => absent
  }

  file { '/etc/apache2/sites-enabled/vagrant.conf':
    ensure => link,
    target => '/etc/apache2/sites-available/vagrant.conf',
  }
}

A class can have more than one parameter, with each separated by a comma, but all we need is $document_root, so that's all I've added. I've set the default value to /var/www/html, because as per our requirements, default parameters should be good for the majority of use-cases, and using /vagrant is not going to be useful to most people.
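To show what a second parameter would look like, here is a minimal sketch. The $server_admin parameter, its default, and the example file path are hypothetical and not part of the Module we're building; they only illustrate the comma-separated syntax:

```puppet
class apache::config(
  $document_root = '/var/www/html',
  # Hypothetical second parameter, for illustration only.
  $server_admin  = 'webmaster@localhost'
) {
  # Both parameters are ordinary variables inside the class body.
  # The path below is a made-up example, not a real Apache file.
  file { '/tmp/apache-config-example.txt':
    ensure  => present,
    content => "root=${document_root} admin=${server_admin}\n"
  }
}
```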

We need to pass our parameter in to the config class in init.pp, but we can't pass in /vagrant, as init.pp is the entrypoint for our generic Module, so that should pass in /var/www/html by default too. init.pp becomes:

class apache($document_root = '/var/www/html') {
  class { 'apache::install': } ->
  class { 'apache::config':
    document_root => $document_root
  } ~>
  class { 'apache::service': }
}

We pass in the $document_root variable from the entrypoint into our config class as we would a parameter to any of the built-in Resource Types. Doing it this way is a form of dependency injection, and allows us to write tests for our Puppet Manifests in isolation later on.

We're going to configure the path in our Node Definition, because it's environment specific, so that's where it belongs:

node default {
    class { 'apache':
      document_root => '/vagrant'
    }
    include mysql
    include php
}

In our Node Definition, we swap out the include for a Class Resource, and pass in our parameter for this Node. We can destroy our server and re-create it now, and everything will work as before, but before we do, there's one last optimisation we can make.

Params.pp

We can DRY this Module up, as we're repeating the default value for $document_root in init and config. The pattern we use for this is to create a params class, which stores our defaults, and then set the default value in our classes to that of the params class.

Create params.pp in your modules/apache/manifests directory, and add the following:

class apache::params {
  $document_root = '/var/www/html'
}

Now replace the first line of init.pp with:

class apache(
  $document_root = $apache::params::document_root
) inherits apache::params {

So it should look like this:

class apache(
  $document_root = $apache::params::document_root
) inherits apache::params {
  class { 'apache::install': } ->
  class { 'apache::config':
    document_root => $document_root
  } ~>
  class { 'apache::service': }
}

And the same for config:

class apache::config(
  $document_root = $apache::params::document_root
) inherits apache::params {
  file { '/etc/apache2/sites-available/vagrant.conf':
    ensure  => present,
    content => template('apache/virtual-hosts.conf.erb')
  }

  file { '/etc/apache2/sites-enabled/000-default.conf':
    ensure => absent
  }

  file { '/etc/apache2/sites-enabled/vagrant.conf':
    ensure => link,
    target => '/etc/apache2/sites-available/vagrant.conf',
  }
}

Now run vagrant destroy -f && vagrant up, and everything will work as it did before.
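With the defaults living in the params class, a Node that is happy with /var/www/html shouldn't need to pass anything at all. A sketch, assuming a second, hypothetical Node named web01.example.com:

```puppet
# Hypothetical Node that relies entirely on the Module defaults.
node 'web01.example.com' {
  # No parameters given, so $document_root falls back to
  # $apache::params::document_root ('/var/www/html').
  include apache
}
```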

Operating System Facts

It can be necessary to write a Module that supports more than one operating system, for example, Ubuntu and CentOS. The problem with this is that each one names packages and services differently.

Thanks to the params.pp pattern, and Facter, Puppet's tool for collecting facts about the system it is running on, there is a simple way to handle this. If we extract our package and service names into variables on the params class, and then reference them in the relevant classes, a single case statement in params.pp can take care of everything.
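If you want to see what Facter reports before relying on it, one quick way is to log the fact from a Manifest. This is a throwaway sketch rather than something to keep in the Module:

```puppet
# Logs the OS family Facter detected for this Node.
# On Ubuntu this fact is 'Debian'; on CentOS it is 'RedHat'.
notice("Detected OS family: ${::osfamily}")
```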

Let's start by parameterising the package name for apache2:

class apache::install(
  $package_name = $apache::params::package_name
) inherits apache::params {
  package { $package_name:
    ensure => present
  }
}

Then let's add it to params.pp:

class apache::params {
  $document_root = '/var/www/html'
  $package_name = 'apache2'
}

A vagrant destroy -f && vagrant up will confirm that this is still working for you.

To add support for CentOS, we need $package_name to be httpd. We could add this as a parameter in init.pp and pass it to the class in our Node Definition, but to keep the Module easy to use, we don't want our users to need any knowledge of the underlying packages and Module structure. We want our defaults to work for them out of the box.

To do this, we use a case statement in params.pp, like so:

class apache::params {
  $document_root = '/var/www/html'

  case $::osfamily {
    'RedHat': {
      $package_name = 'httpd'
    }
    'Debian': {
      $package_name = 'apache2'
    }
    default: {
      fail("Unsupported Operating System ${::osfamily}")
    }
  }
}

Using this format, the default value on all Debian-family operating systems will be apache2, and the default value on all RedHat-family operating systems will be httpd. I have included a default option that will throw an error and end the execution if $::osfamily isn't one of these two. Including a default case is not required, but it is best practice, and linting tools will flag a case statement that lacks one.
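The service name can be handled in the same way. The following is a sketch under the assumption that we add a $service_name variable (a name I've chosen here, not something defined earlier) alongside $package_name. On RedHat-family systems the Apache service is called httpd, and on Debian-family systems it is apache2:

```puppet
# modules/apache/manifests/params.pp
class apache::params {
  $document_root = '/var/www/html'

  case $::osfamily {
    'RedHat': {
      $package_name = 'httpd'
      $service_name = 'httpd'
    }
    'Debian': {
      $package_name = 'apache2'
      $service_name = 'apache2'
    }
    default: {
      # fail() aborts compilation with this message.
      fail("Unsupported Operating System ${::osfamily}")
    }
  }
}

# modules/apache/manifests/service.pp
class apache::service(
  $service_name = $apache::params::service_name
) inherits apache::params {
  service { $service_name:
    ensure => running,
    enable => true
  }
}
```

Note that this only covers the names. The file paths in the config class are still Debian-specific, so full CentOS support would need those extracted into params.pp as well.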

Conclusion

With this post I have tried to start with the smallest amount of knowledge required to get to something useful, and then build up the more in-depth techniques in small, easy-to-follow chunks. I hope this has given you a solid foundation in Puppet and made you realise you don't need to jump in at the deep end. Hopefully you'll be able to go off and apply this to your code or environment in a way that's useful to you.
