Ansible scripts for provisioning ETL, Portal, Downloader and Submission systems
The easiest way to get the latest Ansible on Mac OS X is with Homebrew (note that the Ansible version should be at least 1.9):
$ brew install ansible
For other platforms, refer to the Ansible documentation.
$ sudo pip install shade
These commands install the required dependencies, but you might need to install additional clients from here.
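`shade` is the OpenStack client library that Ansible's `os_*` modules build on. Purely as an illustration of the kind of task it enables (the cloud profile, instance name, image, flavor and keypair below are placeholders, not values taken from these playbooks), provisioning an instance looks roughly like this:

```yaml
# Hypothetical sketch only -- not a task from these playbooks.
# Boot an OpenStack instance with the os_server module, which requires
# the shade library on the machine running Ansible.
- name: Launch an instance (illustrative values)
  os_server:
    cloud: my-openstack           # cloud profile from clouds.yaml (placeholder)
    name: dcc-hadoop-worker-1     # hostname pattern used later in this README
    image: ubuntu-12.04           # placeholder image name
    flavor: m1.xlarge             # pick a flavor matching the resource table below
    key_name: my-keypair          # placeholder keypair
    state: present
```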
Create a new file, `vars/main.yml`, using `vars/main.yml.template` as a template, providing the necessary settings.
Furthermore, there are several variables that need to be set in `group_vars/vars/main.yml` (an illustrative example follows the list):

- Proxy
  - `http_proxy`
- External URLs
  - `external_submission_url`
  - `external_docs_url`
- Misc
  - `icgc_url`
- Contact
  - `smtp_server`
  - `sender`
  - `recipients`
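As a rough sketch of what the finished file might look like (all values below are placeholders; whether `recipients` is a list or a single string is an assumption, so check the template):

```yaml
# group_vars/vars/main.yml -- illustrative placeholder values only
http_proxy: "http://proxy.example.org:3128"

external_submission_url: "https://submission.example.org"
external_docs_url: "https://docs.example.org"

icgc_url: "https://icgc.example.org"

smtp_server: "smtp.example.org"
sender: "noreply@example.org"
recipients:                       # assumed to be a list; the template may differ
  - "admin@example.org"
```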
Edit `/etc/ssh_config` and add the following to avoid having to manually accept each server's host key when connecting:
```
Host 10.5.74.*
  StrictHostKeyChecking no
  UserKnownHostsFile=/dev/null
```
Execute the following command:
You can also execute playbooks individually:
$ ansible-playbook -i config/hosts submission.yml
While a playbook runs, our tasks and roles are configured to add new groups of hosts to the in-memory inventory.
For the following explanations, we will use the first few lines of the `download.yml` playbook as an example:
```yaml
- include: tasks/setup.yml group=hadoop-master:hadoop-worker:download
- include: tasks/setup-existing.yml group=hadoop-master:hadoop-worker:download
```
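The `group=` parameter passed to these includes controls which in-memory groups the hosts end up in. The actual logic lives in `tasks/setup.yml`; as a minimal sketch of the underlying Ansible mechanism (the hostname and group names below are placeholders, not the real task):

```yaml
# Illustrative only -- not the actual contents of tasks/setup.yml.
# add_host registers a host in the in-memory inventory so that later plays
# can target it through the groups it was added to.
- name: Register a provisioned instance in the in-memory inventory
  add_host:
    name: dcc-hadoop-worker-1        # placeholder hostname
    groups: hadoop-worker,download   # derived from the include's group= parameter
```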
- `all_instances` - All hosts from all groups that are either being provisioned and/or used are lumped together into a single host group. The purpose of this group is so we can write out the hostnames and IPs of every server to the `/etc/hosts` file of every server. This way, every server knows about every other server.
- We do some parsing of host names to generate groups of related servers based on the way they are named. For example, in the `hadoop-worker` group we have the hostnames `dcc-hadoop-worker-[1:2]`. A group called `hadoop_worker` is created containing the hosts `dcc-hadoop-worker-1` and `dcc-hadoop-worker-2`. The utility of this becomes more obvious when you look at groups and playbooks that use a wider array of servers with different purposes, such as the `portal.yml` playbook with the `portal` host group. A rough sketch of this grouping technique follows the list.
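The sketch below uses Ansible's `group_by` module to derive such a group name from a hostname; the regular expression is illustrative only and may not match the playbooks' exact implementation:

```yaml
# Illustrative only: turn a hostname such as dcc-hadoop-worker-1 into the
# group hadoop_worker by stripping the prefix and trailing index and
# replacing dashes with underscores.
- name: Group hosts by the role encoded in their hostname
  group_by:
    key: "{{ inventory_hostname | regex_replace('^dcc-(.*)-[0-9]+$', '\\1') | regex_replace('-', '_') }}"
```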
The playbooks do not load any production data into either Elasticsearch or Postgres. This is left up to the user.
Ensure that the host images for the OpenStack instances provide enough resources. The hosts provisioned by the playbooks in testing had the following configuration:
| Resource | Value        |
| -------- | ------------ |
| OS       | Ubuntu 12.04 |
| RAM      | 16 GB        |
| CPUs     | 8            |
| Storage  | 160 GB       |
Once the Hadoop cluster is provisioned, it is a good idea to inspect the web UI of the NameNode/master to ensure the nodes are up and their services are enabled. The created NameNode can easily provision more RAM than you provided it with, so that is something to be aware of. Also ensure that the configuration on the nodes is not stale.
All software downloaded from ICGC comes with install scripts. Should you require a more up-to-date version of any of the software, the install script should provide the functionality to update it.
Example of installing a specific version:
$ ./install -r 4.0.4
A list can be found here.
Installs the latest Oracle Java 8 using a PPA repository.
- Create a custom playbook, e.g. `java.yml`:

  ```yaml
  - name: Installs Oracle JDK with PPA
    hosts: all
    gather_facts: no
    sudo: yes
    roles:
      - jdk-ppa
  ```
- Create a custom hosts file, e.g. `java_hosts`:

  ```
  [java]
  127.0.0.1
  ```
- Run the playbook:

  ```
  ansible-playbook -i java_hosts java.yml
  ```
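To confirm the result on the targeted hosts, a small follow-up play such as the one below could be appended to `java.yml` (an optional sketch, not part of the `jdk-ppa` role):

```yaml
# Optional sketch: verify the installed Java version on the same hosts.
- name: Verify Oracle JDK installation
  hosts: all
  gather_facts: no
  tasks:
    - name: Print the Java version
      command: java -version
      register: java_version
      changed_when: false

    - debug: var=java_version.stderr   # `java -version` writes to stderr
```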