A datum is a piece of information.
Datum is a PowerShell module used to aggregate DSC configuration data from multiple sources allowing you to define generic information (Roles) and specific overrides (i.e. per Node, Location, Environment) without repeating yourself.
To see it in action, I recommend looking at the full day workshop (reduced to 3.5 hours) by Raimund Andree and Jan-Hendrik Peters: both the video recording and the DSC Workshop repository are available online.
Datum works with PowerShell 7.
A new version is in the works, encouragement welcome. :)
This PowerShell Module enables you to easily manage a Policy-Driven Infrastructure using Desired State Configuration (DSC), by letting you organise the Configuration Data in a hierarchy adapted to your business context, and injecting it into Configurations based on the Nodes and the Roles they implement.
This (opinionated) approach lets you raise cattle instead of pets, while facilitating the management of Configuration Data (the Policy for your infrastructure) and providing defaults with the flexibility of specific overrides, per layer, based on your environment.
The Configuration Data is composed in a customisable hierarchy, where the storage can use the file system and the format can be Yaml, Json or PSD1, which allows the use of version control systems such as git.
The idea follows the model developed by the Puppet, Chef and Ansible communities (possibly others), in the configuration data management area:
- Puppet Hiera and Role and Profiles method (very similar in principle, as I used their great documentation for inspiration. Thanks Glenn S. for the pointers, and James McG for helping me understand!)
- Chef Databags, Roles and attributes (thanks Steve for taking the time to explain!)
- Ansible Playbook and Roles (Thanks Trond H. for the introduction!)
Although not in v1 yet, Datum is currently used in Production to manage several hundred machines, and is actively maintained. A stable v1 release is expected for March 2018, while some concepts are thought through and prototype code is refactored.
To simplify the key concept, a Datum hierarchy is some blocks of data (nested hashtables) organised in layers, so that a subset of data can be overridden by another block of data from another layer.
Assuming you have configured two layers of data representing:
- Per Node Overrides
- Generic Role Data
If you Define a data block for the Generic data:
# Generic layer
Data1:
  Property11: DefaultValue11
  Property12: DefaultValue12
Data2:
  Property21: DefaultValue21
  Property22: DefaultValue22
You can transform the Data by overriding what you want in the per Node override:
Data2:
  Property21: NodeOverrideValue21
  Property22: NodeOverrideValue22
The resulting data would now be:
# Merged result
Data1:
  Property11: DefaultValue11
  Property12: DefaultValue12
Data2:
  Property21: NodeOverrideValue21
  Property22: NodeOverrideValue22
The order of precedence you define for your layers goes from the most specific (at the top of your list) to the most generic (at the bottom).
On the file system, this data could be represented in two folders, one per layer, and a Datum configuration file, a Datum.yml:
C:\Demo
│   Datum.yml
├───NodeOverride
│       Data.yml
└───RoleData
        Data.yml
The Datum.yml would look like this (the order is important):
ResolutionPrecedence:
- NodeOverride\Data
- RoleData\Data
You can now use Datum to look up the merged data, per key:
$Datum = New-DatumStructure -DefinitionFile .\Demo\Datum.yml
Lookup 'Data1' -DatumTree $Datum
# Name Value
# ---- -----
# Property11 DefaultValue11
# Property12 DefaultValue12
Lookup 'Data2' -DatumTree $Datum
# Name Value
# ---- -----
# Property21 NodeOverrideValue21
# Property22 NodeOverrideValue22
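The same override principle can be sketched in plain PowerShell, without Datum. This is only an illustration of the idea, not Datum's implementation: the function name Find-FirstValue and the in-memory $Layers list are hypothetical stand-ins for the two Data.yml files, ordered from most specific to most generic.

```powershell
# Hypothetical in-memory stand-ins for NodeOverride\Data.yml and RoleData\Data.yml,
# ordered from most specific to most generic (like ResolutionPrecedence).
$Layers = @(
    @{  # NodeOverride layer
        Data2 = @{ Property21 = 'NodeOverrideValue21'; Property22 = 'NodeOverrideValue22' }
    },
    @{  # RoleData layer
        Data1 = @{ Property11 = 'DefaultValue11'; Property12 = 'DefaultValue12' }
        Data2 = @{ Property21 = 'DefaultValue21'; Property22 = 'DefaultValue22' }
    }
)

function Find-FirstValue {
    param(
        [Parameter(Mandatory)] [string] $Key,
        [Parameter(Mandatory)] [hashtable[]] $Layers
    )
    # Return the value from the first (most specific) layer that defines the key.
    foreach ($layer in $Layers) {
        if ($layer.ContainsKey($Key)) {
            return $layer[$Key]
        }
    }
}

$data1 = Find-FirstValue -Key 'Data1' -Layers $Layers   # only defined in the generic layer
$data2 = Find-FirstValue -Key 'Data2' -Layers $Layers   # overridden by the node layer
```

Here Data1 falls through to the generic layer because the more specific layer does not define it, while Data2 is taken whole from the override layer.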
This demonstrates the override principle, but it will always return the same thing. How do we make it relative to a Node's metadata?
The idea is that we want to apply the override only on certain conditions, that could be expressed like:
- A node is given the role SomeRole, it's in London, and is named SRV01
- The Role SomeRole defines default data for Data1 and Data2
- But because SRV01 is in London, use Data2 defined in the london location instead (leave Data1 untouched).
In this scenario we would create two layers as per the file layout below:
Demo2
│   Datum.yml
├───Locations
│       London.yml
└───Roles
        SomeRole.yml

# SomeRole.yml
Data1:
  Property11: RoleValue11
  Property12: RoleValue12
Data2:
  Property21: RoleValue21
  Property22: RoleValue22

# London.yml
Data2:
  Property21: London Override Value21
  Property22: London Override Value22
Now let's create a Node hashtable that describes our SRV01:
$SRV01 = @{
Nodename = 'SRV01'
Location = 'London'
Role = 'SomeRole'
}
Let's create SRV02 as a witness, which is in Paris (the override won't apply).
$SRV02 = @{
Nodename = 'SRV02'
Location = 'Paris'
Role = 'SomeRole'
}
And we configure the Datum.yml's ResolutionPrecedence with relative paths using the Node's properties:
# Datum.yml
ResolutionPrecedence:
- 'Locations\$($Node.Location)'
- 'Roles\$($Node.Role)'
We can now mount the Datum tree, and do a lookup in the context of a Node:
Import-Module Datum
$Datum = New-DatumStructure -DefinitionFile .\Datum.yml
lookup 'Data1' -Node $SRV01 -DatumTree $Datum
# Name Value
# ---- -----
# Property11 RoleValue11
# Property12 RoleValue12
lookup 'Data2' -Node $SRV01 -DatumTree $Datum
# Name Value
# ---- -----
# Property21 London Override Value21
# Property22 London Override Value22
And for our witness, not in the London location, Data2 is not overridden:
lookup 'Data2' -Node $SRV02 -DatumTree $Datum
# Name Value
# ---- -----
# Property21 RoleValue21
# Property22 RoleValue22
Magic!
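To take some of the magic away, here is a minimal sketch of how a node-relative ResolutionPrecedence could pick the right layer. It is not Datum's code: Find-NodeValue and the in-memory $Tree are hypothetical, and the substitution is a simplified, case-sensitive literal replacement of the $($Node.Property) tokens rather than real expression evaluation.

```powershell
# Hypothetical in-memory tree standing in for the Locations and Roles folders.
$Tree = @{
    'Locations\London' = @{
        Data2 = @{ Property21 = 'London Override Value21'; Property22 = 'London Override Value22' }
    }
    'Roles\SomeRole'   = @{
        Data1 = @{ Property11 = 'RoleValue11'; Property12 = 'RoleValue12' }
        Data2 = @{ Property21 = 'RoleValue21'; Property22 = 'RoleValue22' }
    }
}

# Single quotes keep the $($Node...) tokens literal, as they are in Datum.yml.
$ResolutionPrecedence = @(
    'Locations\$($Node.Location)',
    'Roles\$($Node.Role)'
)

function Find-NodeValue {
    param(
        [string] $Key,
        [hashtable] $Node,
        [hashtable] $Tree,
        [string[]] $Precedence
    )
    foreach ($prefix in $Precedence) {
        # Simplified substitution: replace each literal $($Node.<Property>) token
        # with the matching value from the node hashtable.
        $path = $prefix
        foreach ($property in $Node.Keys) {
            $path = $path.Replace('$($Node.' + $property + ')', [string] $Node[$property])
        }
        # The first layer that exists and defines the key wins (most specific first).
        if ($Tree.ContainsKey($path) -and $Tree[$path].ContainsKey($Key)) {
            return $Tree[$path][$Key]
        }
    }
}

$SRV01 = @{ Nodename = 'SRV01'; Location = 'London'; Role = 'SomeRole' }
$SRV02 = @{ Nodename = 'SRV02'; Location = 'Paris';  Role = 'SomeRole' }

$london = Find-NodeValue -Key 'Data2' -Node $SRV01 -Tree $Tree -Precedence $ResolutionPrecedence
$paris  = Find-NodeValue -Key 'Data2' -Node $SRV02 -Tree $Tree -Precedence $ResolutionPrecedence
```

For SRV02 the first prefix expands to Locations\Paris, which does not exist in the tree, so the lookup falls through to the role layer.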
The overall goal, better covered in the book Infrastructure As Code by Kief Morris, is to enable a team to "quickly, easily, and confidently adapt their infrastructure to meet the changing needs of their organization".
To do so, we define our Infrastructure in a set of Policies: human-readable documents describing the intended result (or, Desired State), in structured, declarative aggregation of data, that are also usable by computers: The Configuration Data.
We then interpret and transform the data to pass it over to the platform (DSC) and technology components (DSC Resources) grouped in manageable units (Resources, Configurations, and PowerShell Modules).
Finally, the decentralised execution of the platform can let the nodes converge towards their policy.
The policies and their execution are composed in layers of abstraction, so that people with different responsibilities, specialisations and accountabilities have access to the right amount of data in the layer they operate for their task.
At its simplest, a scalable implementation brings together:
- A Role defining the configurations to include, along with the data,
- Nodes implementing that role,
- Configurations (DSC Composite Resources) included in the role.
The abstraction via roles lets you apply a generic 'template' to all nodes, while enabling Node specific data such as Name, GUID, Encryption Certificate Thumbprint for credentials.
At a high level, we can compose a Role that will apply to a set of nodes, with what we'd like to see configured.
In this document, we define a generic role we intend to use for Windows Servers, and include the different Configurations we need (Shared1,SoftwareBaseline).
We then provide the data for the parameters to those configurations.
# WindowsServerDefault.yml
Configurations: #Configurations to Include for Nodes of this role
  - Shared1
  - SoftwareBaseline

Shared1: # Parameters for Configuration Shared1
  DestinationPath: C:\MyRoleParam.txt
  Param1: This is the Role Value!

SoftwareBaseline: # Parameters for DSC Composite Configuration SoftwareBaseline
  Sources:
    - Name: chocolatey
      Disabled: false
      Source: https://chocolatey.org/api/v2

  Packages:
    - Name: chocolatey
    - Name: NotepadPlusplus
      Version: '7.5.2'
    - Name: Putty
The Software baseline for this role is self documenting. Its specific data apply to that role, and can be different for another role, while the underlying code would not change. Adding a new package to the list is simple and does not require any DSC or Chocolatey knowledge.
We define the nodes with the least amount of uniqueness, to avoid snowflakes. Below, we only say where the Node is located, what role is associated to it, its name (SRV01, the file's BaseName) and a unique identifier.
# SRV01.yml
NodeName: 9d8cc603-5c6f-4f6d-a54a-466a6180b589
role: WindowsServerDefault
Location: LON
This is where the Configuration Data is massaged in usable ways for the underlying technologies (DSC resources).
Here we are creating a SoftwareBaseline by:
- Installing Chocolatey from a Nuget Feed (using the Resource ChocolateySoftware)
- Registering a Set of Sources provided from the Configuration Data
- Installing a Set of packages as per the Configuration data
Configuration SoftwareBaseline {
    Param(
        $PackageFeedUrl = 'https://chocolatey.org/api/v2',
        $Sources = @(),
        $Packages
    )

    Import-DscResource -ModuleName PSDesiredStateConfiguration
    Import-DscResource -ModuleName Chocolatey -ModuleVersion 0.0.46

    # Install Chocolatey itself from the package feed
    ChocolateySoftware ChocoInstall {
        Ensure         = 'Present'
        PackageFeedUrl = $PackageFeedUrl
    }

    # Register each source, defaulting Ensure to 'Present'
    foreach ($Source in $Sources) {
        if (!$Source.Ensure) { $Source.Add('Ensure', 'Present') }
        Get-DscSplattedResource -ResourceName ChocolateySource -ExecutionName "$($Source.Name)_src" -Properties $Source
    }

    # Install each package, defaulting Ensure to 'Present' and Version to 'latest'
    foreach ($Package in $Packages) {
        if (!$Package.Ensure) { $Package.Add('Ensure', 'Present') }
        if (!$Package.Version) { $Package.Add('Version', 'latest') }
        Get-DscSplattedResource -ResourceName ChocolateyPackage -ExecutionName "$($Package.Name)_pkg" -Properties $Package
    }
}
In this configuration example, Systems Administrators do not need to be Chocolatey Software specialists to know how to create a Software baseline using the Chocolatey DSC Resources.
Finally, the root configuration is where each node is processed (and the Magic happens).
We import the Modules or DSC Resources needed by the Configurations and, for each Node, we look up the Configurations implemented by the policies (Lookup 'Configurations'). For each of those, we look up the parameters that apply and splat them to the DSC Resources (sort of...).
This file does not need to change: it dynamically uses what's in $ConfigurationData!
# RootConfiguration.ps1
configuration "RootConfiguration"
{
    Import-DscResource -ModuleName PSDesiredStateConfiguration
    Import-DscResource -ModuleName SharedDscConfig -ModuleVersion 0.0.3
    Import-DscResource -ModuleName Chocolatey -ModuleVersion 0.0.46

    node $ConfigurationData.AllNodes.NodeName {
        (Lookup 'Configurations').Foreach{
            $ConfigurationName = $_
            $Properties = $(lookup $ConfigurationName -DefaultValue @{})
            Get-DscSplattedResource -ResourceName $ConfigurationName -ExecutionName $ConfigurationName -Properties $Properties
        }
    }
}

RootConfiguration -ConfigurationData $ConfigurationData -Out "$BuildRoot\BuildOutput\MOF\"
Although Datum has been primarily targeted at DSC Configuration Data, it can be used in other contexts where the hierarchical model and lookup make sense.
The Datum hierarchy, similar to Puppet's Hiera, is typically defined in a Datum.yml at the base of the Config Data files. Although Datum comes only with a built-in Datum File Provider (not SHIPS) supporting the JSON, Yaml, and PSD1 formats, it can call external PowerShell modules implementing the Provider functionalities.
A branch of the Datum Tree would be defined within the DatumStructure of the Datum.yml like so:
# Datum.yml
DatumStructure:
  - StoreName: AllNodes
    StoreProvider: Datum::File
    StoreOptions:
      Path: "./AllNodes"
Instantiating a variable from that definition would be done with this:
$Datum = New-DatumStructure -DefinitionFile Datum.yml
This returns a hashtable with a key 'AllNodes' (StoreName), by using the internal command (under the hood):
Datum\New-DatumFileProvider -Path "./AllNodes"
Should you create a module (e.g. named 'MyReddis') implementing the function New-DatumReddisProvider, you could write the following Datum.yml to use it (as long as it's in your PSModulePath):
# Datum.yml
DatumStructure:
  - StoreName: AllNodes
    StoreProvider: MyReddis::Reddis
    StoreOptions:
      YourParameter: ParameterValue
If you do, please let me know, I'm interested :)
You can have several root branches, of different Datum Store Providers, with custom options (but prefer to Keep it super simple).
So, what should those store providers look like? What do they do?
In short, they abstract the underlying data storage and format, in a way that will allow us to consistently do key/value lookups.
The main reason it is not based on SHIPS (or Jim Christopher, aka @beefarino's Simplex module, which I tried and enjoyed!) is that the PowerShell Providers did not seem to provide enough abstraction for read-only key/value pair access. These are still very useful (and used) as an intermediary abstraction, such as the FileSystem provider used in the Datum FileProvider.
In short, I wanted a uniform key that could abstract the container, the storage, and the structure within the format. Imagine the standard FileSystem provider:
Directory > File > PSD1
Where the file SERVER01.PSD1 is in the folder .\AllNodes\, and has the following data:
# SERVER01.PSD1
@{
    Name     = 'SERVER01'
    MetaData = @{
        Subkey = 'Data Value'
    }
}
I wanted the key 'AllNodes\SERVER01\MetaData\Subkey' to return 'Data Value'.
However, while the notation with the Path Separator (\) is used for lookups (more on this later), the provider abstracts the storage+format using the familiar dot notation.
From the example above where we loaded our Datum Tree, we'd use the following to return the value:
$Datum.AllNodes.SERVER01.MetaData.Subkey
So we're just accessing variable properties, and our Config Data stored on the FileSystem, is just mounted in a variable (in case of the FileProvider).
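The relationship between the path-style key and the dot notation can be sketched as a simple walk down nested hashtables. This is a hedged illustration, not the File Provider's code: Get-ValueByPath is a hypothetical helper, and the $Datum variable below is built by hand rather than by New-DatumStructure.

```powershell
# Hypothetical in-memory tree mirroring .\AllNodes\SERVER01.PSD1.
$Datum = @{
    AllNodes = @{
        SERVER01 = @{
            Name     = 'SERVER01'
            MetaData = @{ Subkey = 'Data Value' }
        }
    }
}

function Get-ValueByPath {
    param(
        [hashtable] $Tree,
        [string] $Path
    )
    $current = $Tree
    # Split on the path separator and walk down one key at a time.
    foreach ($segment in $Path.Split('\')) {
        if ($current -isnot [hashtable] -or -not $current.ContainsKey($segment)) {
            return $null   # the path does not exist in the tree
        }
        $current = $current[$segment]
    }
    return $current
}

$viaPath = Get-ValueByPath -Tree $Datum -Path 'AllNodes\SERVER01\MetaData\Subkey'
$viaDot  = $Datum.AllNodes.SERVER01.MetaData.Subkey
```

Both expressions reach the same leaf; the path form is simply easier to compose from a prefix and a suffix, which is what the lookup relies on.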
With the dot notation we have access using absolute keys to all values via the root $datum, but this is not much different from having all data in one big hashtable or PSD1 file... This is why we have...
We can mount different Datum Stores (unit of Provider + Parameters) as branches onto our root variable.
Typically, I mount the following structure (with many more files not listed here):
DSC_ConfigData
│   Datum.yml
├───AllNodes
│   ├───DEV
│   └───PROD
├───Environments
├───Roles
└───SiteData
I can access the data with:
$Datum.AllNodes.DEV.SRV01
or
$Datum.SiteData.London
But to be a hierarchy, there should be an order of precedence, and the lookup is a function that resolves a relative path, in the paths defined by the order of precedence.
Datum.yml defines another section for ResolutionPrecedence: this is an ordered list of prefixes to use to search for a relative path, from the most specific to the most generic.
Suppose you do a Lookup for a relative path of property\Subkey, and the Yaml contains the following block:
ResolutionPrecedence:
- 'AllNodes'
- 'Environments'
- 'Location'
- 'Roles\All'
In this case the lookup function would try the following absolute paths sequentially:
$Datum.AllNodes.property.Subkey
$Datum.Environments.property.Subkey
$Datum.Location.property.Subkey
$Datum.Roles.All.property.Subkey
Although you can configure Datum to behave differently based on your needs, like merging together the data found at each layer, the most common and simple case is when you only want the 'MostSpecific' data defined in the hierarchy (and this is the default behaviour).
In that case, even if you usually define the data in the roles layer (the most generic layer), if there's an override in a more specific layer, it will be used instead.
But the ordering shown above is not very flexible. How do we apply the relation between the list of roles, the current Node, Environment, location and so on?
As we've seen that a Node implements a role, is in a location and from a specific Environment, how do we express these relations (or any relation that would make sense in your context)?
We can define the names and values of that information in the Node metadata (SRV01.yml) like so:
# SRV01.yml
NodeName: 9d8cc603-5c6f-4f6d-a54a-466a6180b589
role: WindowsServerDefault
Location: LON
Environment: DEV
And use variable substitution in the ResolutionPrecedence block of the Datum.yml so that the Search Prefix can be dynamic from one Node to another:
# in Datum.yml
ResolutionPrecedence:
- 'AllNodes\$($Node.Environment)\$($Node.Name)'
- 'AllNodes\$($Node.Environment)\All'
- 'Environments\$($Node.Environment)'
- 'SiteData\$($Node.Location)'
- 'Roles\$($Node.Role)'
- 'Roles\All'
The lookup of the Property Path 'property\Subkey' would try the following for the above ResolutionPrecedence:
$Datum.AllNodes.($Node.Environment).($Node.Name).property.Subkey
$Datum.AllNodes.($Node.Environment).All.property.Subkey
$Datum.Environments.($Node.Environment).property.Subkey
$Datum.SiteData.($Node.Location).property.Subkey
$Datum.Roles.($Node.Role).property.Subkey
$Datum.Roles.All.property.Subkey
If you remember the part of the Root Configuration:
node $ConfigurationData.AllNodes.NodeName {
# ...
}
It goes through all the Nodes in $ConfigurationData.AllNodes, so the absolute path is changing based on the current value of $Node.
Regardless of the Datum Store Provider used (there's only the Datum File Provider built-in, but you can write your own), Datum tries to handle the data similarly to an ordered case-insensitive Dictionary, where possible (e.g. PSD1 files do not support ordering). All data is referenced under one variable, so it looks like a big tree with many branches and leaves like the one below.
$Datum
+
|
+--+AllNodes
|  +-+DEV
|  | +-+SRV01
|  |     NodeName: SRV01
|  |     role: Role1
|  |     Location: Lon
|  |     ExampleProperty1: 'From Node'
|  |     Test: '[TEST=Roles\Role1\Shared1\DestinationPath]'
|  |
|  +-+PROD
|
+--+Environments
|  +-+DEV
|  |   Description: 'This is the DEV Environment'
|  +-+PROD
|      Description: 'This is the PROD Environment'
|
+--+Roles
|  +-+Role1
|      Configurations
|        - Shared1
|
+--+SiteData
   +-+Lon
If you provide a key, Datum will return All values underneath (to the right):
$Datum.Environments
# DEV PROD
# --- ----
# {Description, Test} {Description}
In the Tree described above, the Lookup function iterates through the ResolutionPrecedence's key prefixes, and appends the provided key suffix.
For the following ResolutionPrecedence:
ResolutionPrecedence:
- 'AllNodes\$($Node.Environment)\$($Node.Name)'
- 'Roles\$($Node.Role)'
- 'Roles\All'
Within the $Node block, doing Lookup 'Configurations' will actually look for:
$Datum.AllNodes.($Node.Environment).($Node.Name).Configurations
$Datum.Roles.($Node.Role).Configurations
$Datum.Roles.All.Configurations
By default the merge behaviour is to not merge, which means the first occurrence will be returned and the lookup stopped.
The other merge behaviours depend on the (rough) data type of the key to be merged.
Datum identifies 4 main types, using whichever of the following matches first:
- Hashtable: Hashtables or Ordered Dictionaries
- Array of Hashtables: Every IEnumerable (except string) that can be cast as [Hashtable[]]
- Array of Base type objects: Every other IEnumerable (except string)
- Base Types: Everything else (Int, String, PSCredential, DateTime...)
Their merge behaviour can be defined in the Datum.yml, either by using a Short name that references a preset, or a structure that details the behaviour based on the type.
There is a default Behaviour (MostSpecific by default), and you can specify ordered overrides:
default_lookup_options: MostSpecific
This is the recommended setting and also the default, so that any sub-key merge has to be explicitly declared like so:
lookup_options:
  <Key Name>: MostSpecific/First|hash/MergeTopKeys|deep/MergeRecursively
  <Other Key>:
    merge_hash: MostSpecific/First|deep|hash/*
    merge_basetype_array: MostSpecific/First|Sum/Add|Unique
    merge_hash_array: MostSpecific/First|Sum|DeepTuple/DeepItemMergeByTuples|UniqueKeyValTuples
    merge_options:
      knockout_prefix: --
      tuple_keys:
        - Name
        - Version
The key to be used here is the suffix as used with the Lookup function: e.g. 'Configurations', 'Role1\Data1'.
Each layer will be merged with the result of the previous layer merge:
Precedence 0 +
             |
             +---+ Merge 0 +
             |             |
Precedence 1 +             |
                           +---+ Merge 1 +
                           |             |
Precedence 2 +-------------+             |
                                         +---+ Merge 2
                                         |
Precedence 3 +---------------------------+
The Short name presets represent the following:
First, MostSpecific or any un-matched string:
  merge_hash: MostSpecific
  merge_baseType_array: MostSpecific
  merge_hash_array: MostSpecific
hash or MergeTopKeys:
  merge_hash: hash
  merge_baseType_array: MostSpecific
  merge_hash_array: MostSpecific
  merge_options:
    knockout_prefix: '--'
deep or MergeRecursively:
  merge_hash: deep
  merge_baseType_array: Unique
  merge_hash_array: DeepTuple
  merge_options:
    knockout_prefix: '--'
    tuple_Keys:
      - Name
      - Version
The Lookup Options can also define keys using a (valid) Regex. For this, the key has to start with ^, for instance:
lookup_options:
  ^LCM_Config\\.*: deep
The lookup will always favor a non-regex exact match, and failing that will then use the first matching regex, before falling back on the default_lookup_options.
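The order in which a merge behaviour is selected for a key can be sketched as follows. This is an assumption-laden illustration rather than Datum's code: Resolve-LookupOption is a hypothetical function, and the options are held in an ordered dictionary so that "first matching regex" is well defined.

```powershell
function Resolve-LookupOption {
    param(
        [string] $Key,
        [System.Collections.IDictionary] $LookupOptions,
        [string] $Default = 'MostSpecific'
    )
    # 1. An exact (non-regex) key match always wins.
    if ($LookupOptions.Contains($Key)) {
        return $LookupOptions[$Key]
    }
    # 2. Otherwise use the first key starting with '^' whose regex matches.
    foreach ($candidate in $LookupOptions.Keys) {
        if ($candidate.StartsWith('^') -and $Key -match $candidate) {
            return $LookupOptions[$candidate]
        }
    }
    # 3. Fall back on the default behaviour.
    return $Default
}

# Hypothetical options, equivalent to a small lookup_options block.
$options = [ordered]@{
    'SoftwareBaseline' = 'hash'
    '^LCM_Config\\.*'  = 'deep'
}

$exact    = Resolve-LookupOption -Key 'SoftwareBaseline'    -LookupOptions $options
$byRegex  = Resolve-LookupOption -Key 'LCM_Config\Settings' -LookupOptions $options
$fallback = Resolve-LookupOption -Key 'SomethingElse'       -LookupOptions $options
```

In this sketch the three lookups resolve to hash, deep and MostSpecific respectively.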
If you've been following this far, you might wonder how it works for subkeys.
Say you want to merge a subkey of a configuration where the role defines the following:
Configurations:
  - SoftwareBaseline

SoftwareBaseline:
  PackageFeed: https://chocolatey.org/api/v2
  Packages:
    - Name: Package1
      Version: v0.0.2
And an override file somewhere in the hierarchy:
SoftwareBaseline:
  Packages:
    - Name: Package2
      Version: v4.5.2
You want the packages to have a deep tuple merge (that is, merge the hashtables based on matching key/value pairs, where $ArrayItem.Property1 -eq $otherArrayItem.Property1, more on this later).
If the default Merge behaviour is MostSpecific, and no override exists for SoftwareBaseline, it will never merge Packages, and always return the most specific value.
If you add a Merge behaviour of hash for the key SoftwareBaseline, it will merge the keys PackageFeed and Packages but not below. That means the result of a Lookup 'SoftwareBaseline' will be (assuming the Role has the lowest ResolutionPrecedence):
SoftwareBaseline:
  PackageFeed: https://chocolatey.org/api/v2
  Packages:
    - Name: Package2
      Version: v4.5.2
The PackageFeed key is present, but only the most specific Packages value has been used (there's only 1 package).
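The effect of the hash behaviour described above can be reproduced with a short top-level merge. This is a sketch under the assumption that only first-level keys are combined; Merge-TopKeys is a hypothetical name, not a Datum function.

```powershell
function Merge-TopKeys {
    param(
        [hashtable] $MostSpecific,
        [hashtable] $Generic
    )
    # Start from a shallow copy of the generic data, then let the specific keys overwrite it.
    $result = $Generic.Clone()
    foreach ($key in $MostSpecific.Keys) {
        $result[$key] = $MostSpecific[$key]
    }
    return $result
}

# The role data and the override from the example above, as hashtables.
$role = @{
    PackageFeed = 'https://chocolatey.org/api/v2'
    Packages    = @( @{ Name = 'Package1'; Version = 'v0.0.2' } )
}
$override = @{
    Packages = @( @{ Name = 'Package2'; Version = 'v4.5.2' } )
}

$merged = Merge-TopKeys -MostSpecific $override -Generic $role
```

PackageFeed survives from the role, while the whole Packages array is replaced by the override because nothing below the top keys is merged.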
To also merge the Packages, you need to also define the Packages Subkey like so:
default_lookup_options: MostSpecific
lookup_options:
  SoftwareBaseline: hash
  SoftwareBaseline\Packages:
    merge_hash_array: DeepTuple
    merge_options:
      Tuple_Keys:
        - Name
        - Version
If you omit the first key (SoftwareBaseline), and the Lookup is only doing a lookup of that root key, it will never 'walk down' the variable to see what needs merging below the top key. This is the default behaviour in DscInfraSample's RootConfiguration.ps1.
However, if you do a lookup directly to the subkey, Lookup 'SoftwareBaseline\Packages', it'll now work (as it does not have to 'walk down' the variable).
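To give an idea of what a tuple-based merge of the Packages arrays involves, here is a simplified sketch. It is not Datum's DeepTuple implementation: Merge-ByTuple is a hypothetical function, and it assumes two items are the same entry only when all the tuple keys (Name and Version here) have equal values.

```powershell
function Merge-ByTuple {
    param(
        [hashtable[]] $MostSpecific,
        [hashtable[]] $Generic,
        [string[]] $TupleKeys = @('Name', 'Version')
    )
    # Start with copies of the most specific items.
    $result = [System.Collections.Generic.List[hashtable]]::new()
    foreach ($item in $MostSpecific) { $result.Add($item.Clone()) }

    foreach ($genericItem in $Generic) {
        # Find an already-merged item whose tuple keys all have the same values.
        $found = $null
        foreach ($candidate in $result) {
            $same = $true
            foreach ($key in $TupleKeys) {
                if ($candidate[$key] -ne $genericItem[$key]) { $same = $false; break }
            }
            if ($same) { $found = $candidate; break }
        }
        if ($null -eq $found) {
            # No counterpart: keep the generic item as its own entry.
            $result.Add($genericItem.Clone())
        }
        else {
            # Counterpart found: fill in only the properties it does not define.
            foreach ($key in $genericItem.Keys) {
                if (-not $found.ContainsKey($key)) { $found[$key] = $genericItem[$key] }
            }
        }
    }
    return $result.ToArray()
}

$rolePackages     = @( @{ Name = 'Package1'; Version = 'v0.0.2' } )
$overridePackages = @( @{ Name = 'Package2'; Version = 'v4.5.2' } )

# @( ) keeps the result an array even when only one item comes back.
$packages = @( Merge-ByTuple -MostSpecific $overridePackages -Generic $rolePackages )
```

With the example data the two packages have different tuples, so both are kept: Package2 from the override, followed by Package1 from the role.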
- Default
- general
- per lookup override
The data stored in Datum is usually defined by the Provider and underlying technology. For the Datum File Provider and the Yaml format, that would be mostly text/strings, integers, and Booleans, composed in dictionaries (ordered, hashtable, or PSCustomObject) or collections.
More complex objects, such as credentials, can be stored or referenced by use of a Data handler.
(To be Continued)
Back in 2014, Steve Murawski, then working for Stack Exchange, led the way by implementing some tooling, and open sourced it on PowerShell.Org's GitHub. This work was complemented by Dave Wyatt's contribution, mainly around the Credential store. After these two main contributors moved on from DSC and Pull Server mode, the project stalled (in the Dev branch), despite its unique value.
I refreshed this to be more geared for PowerShell 5, and updated the dependencies as some projects had evolved and moved to different maintainers, locations, and names.
As I was re-writing it, I found that the version offered a very good way to manage configuration data, but in a prescriptive way, lacking a bit of flexibility for some much needed customisation (layers and ordering). Steve also pointed me to Chef's Databag, and later I discovered Puppet's Hiera, which is where I get most of my inspiration.