
OpenStack Nova and Hypervisor disk consumption

Recently I found myself in a situation at $DAYJOB where I needed to account for the local disk consumption of a nova-compute node, a hypervisor. I had a heck of a time gathering all the information I needed to figure out why the space was being consumed the way it was, and since I couldn’t find a single source for all this information I felt it was best to write up a post about it (so that I can find it next time I’m in the same scenario).

This post explores the various ways Nova consumes hypervisor disk space with regard to instance images and booting.

This post assumes a Nova setup with libvirt/KVM as the hypervisor, ephemeral disk space provided by a filesystem directory, and instances booting with ephemeral disks as opposed to Block Storage volumes.

There are two main ways Nova consumes the underlying hypervisor disk space: cached images downloaded from the image service (Glance) and instance ephemeral disk files. All of this content is stored in Nova’s state path, which by default is /var/lib/nova/. In our setup, we mount a filesystem at Nova’s state path so that we can contain disk usage to the images and instances booted on the system, without risk of filling up / or other critical filesystems.

Image cache

Nova makes use of an image cache on each hypervisor. This is a place where each image that’s used to boot an instance is downloaded and preserved for a period of time. Each time a new instance is created on a hypervisor, the cache is checked to see if the image requested for the instance is already there. If it is, that image is used as the basis for the instance. If not, the image is downloaded from the image store and placed in the cache. This cache typically lives on the same filesystem where Nova stores the instance data (the state path), and thus the images in the cache count against the overall disk space available for instances. Whether to clean unused cached images is a configuration toggle, along with a minimum age an image must reach before it is discarded.

The amount of space consumed by an image in the cache depends on details of the image itself. While the listed size of the image in the image store may be small, either due to compression or just overall content, the virtual size of the image may be much larger, and Nova will resize each downloaded image to match its virtual size.

The virtual size of the image depends on how the image was created, the source of the image, and the options used while creating the image. When using the qemu-img tool to create an image, a size can be specified. This will become the virtual size of the image. If creating an image of an existing instance, either by way of an image creation or a backup (which uses the same method), the size of the image will be matched to the size of disk for the flavor the instance uses. If the instance’s flavor states a 200G disk, then the image virtual size from that instance will be 200G, regardless of how little space is actually consumed within the instance.

During instance creation, Nova downloads the image from Glance, checks the virtual size of the image, and resizes the file to that virtual size. The file is saved in the instances/_base/ subdirectory within Nova’s state path. The resize creates a sparse file, where the apparent size matches the virtual size while the actual consumed size may be much lower. The du utility can show the difference: du -h --apparent-size <file> vs du -h <file>.
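
For example, to compare a cached image’s virtual size against what it actually consumes on disk (the path here is just an example):

```
$ qemu-img info /var/lib/nova/instances/_base/<image_hash>
$ du -h --apparent-size /var/lib/nova/instances/_base/<image_hash>
$ du -h /var/lib/nova/instances/_base/<image_hash>
```

qemu-img info reports both the virtual size and the actual disk size of the file.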

Instance ephemeral disk

Each server instance that Nova manages will have its own directory to store data. Part of that data is the ephemeral disk data, the data within the instance itself. The amount of space consumed by the ephemeral disk depends on configuration details of Nova, and on the flavor of the instance.

Copy on write

During an instance creation attempt, Nova will download and resize the base image, if the base image doesn’t already exist. Then Nova may either create a copy on write file for the instance, linked to the base image, or copy the entire base image for the instance with no linkage. This decision is based on a configuration entry, use_cow_images, which defaults to True.

A copy on write file is an overlay file that will overlay on top of the base image file and keep track of any changes to the filesystem within the image.

If copy on write is desired, overlays are created from the base image to the instances/<uuid>/disk file within the state path. Otherwise a direct copy will be made to the same path.
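
For reference, the overlay Nova creates is roughly equivalent to something like this (paths are placeholders, and exact arguments vary by release):

```
$ qemu-img create -f qcow2 -b /var/lib/nova/instances/_base/<image_hash> \
    /var/lib/nova/instances/<uuid>/disk
```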

Preallocation

In either case, Nova may make a call to preallocate enough blocks on that file to be able to fill the size of the flavor’s disk. This is determined by the configuration entry preallocate_images.

If copy on write is used, the file will appear in one of two ways. Without image preallocation, the file will only be as large as the amount of change that has occurred in it since boot, so it can be quite small to start with, but may expand to the full size of the flavor’s disk. With image preallocation set, both the apparent and actual size will be the full size of the flavor’s disk.

If a direct copy of the image file is used, then the file will appear in one of two ways. Without image preallocation, the file will appear exactly as it does in the image cache. The apparent size can be quite large, but the actual size will be relatively smaller. With image preallocation, both the apparent and the actual size will be the full size of the flavor’s disk.

Launching

Qemu will be launched referencing the disk file in instances/<uuid>/, which may or may not be linked to the cached image file. This linkage is what determines whether or not an image file in the cache is still “in use”, and will prevent Nova from removing the file when it ages out.
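
You can see that linkage by inspecting the instance’s disk file; for a copy on write disk, qemu-img info will report a backing file line pointing at the cached image (the path is an example):

```
$ qemu-img info /var/lib/nova/instances/<uuid>/disk
```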

Conclusion

The amount of disk space consumed on a hypervisor depends on numerous factors, such as source image virtual size, the number of active unique images used to boot instances on the hypervisor, and configuration settings regarding disk preallocation and copy on write files. The vast majority of overall consumption of space by Nova will be the sum of all the cached images and all the ephemeral disks for all the instances booted on a given hypervisor.

Configuration items that drive decisions

  • preallocate_images: Can be set to none or space. If space, a fallocate call is made on the instance (overlay) disk to allocate enough blocks to cover the flavor’s disk size. Without preallocating, the underlying hypervisor filesystem can become overcommitted, and if instances cause enough data change to their disk files, the host filesystem may become exhausted. An operator could prevent exhaustion by relying on the DiskFilter scheduling filter to avoid scheduling instances where disk has been fully committed, but there are defects and drawbacks to this filter (a subject for a future post). The default value is none.
  • use_cow_images: Can be set to True or False. If True, the instance’s disk file is a copy on write file, attached to the base image in nova’s image cache (instances/_base/). When this happens, the base image for any booted instance is always held open, and cannot be cleaned. This can drive up the storage overhead on a hypervisor. The default value is True.
  • remove_unused_base_images: Can be set to True or False. If True, when a cached image is no longer used by an instance on the hypervisor, and has reached a minimum age, the image will be removed from the cache. This can prevent unbound growth of the image cache on a hypervisor. The default is True.
  • remove_unused_original_minimum_age_seconds: An integer of seconds to indicate how old an image file must be before it is a candidate for removal if unused. The default value is 86400.
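
Pulled together, the relevant bits of nova.conf would look something like this (the values shown are the defaults discussed above; the option group may vary by release):

```ini
[DEFAULT]
preallocate_images = none
use_cow_images = True
remove_unused_base_images = True
remove_unused_original_minimum_age_seconds = 86400
```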

Why I love open source!

The other day I decided it was time I got familiar with Docker. Yes, I know, I’m a bit late to the party, but better late than never. I understood some of the concepts around Docker, just not necessarily the mechanics, so it was time to dive in.

Docker these days has a handy utility named docker-machine. This tool is used to create a target system to create docker containers on. This is really useful if you’re on a Mac and don’t have a kernel that supports containers natively. By using docker-machine I was able to provision a VM via VirtualBox that was all set up to run containers. From that point on, docker commands ran as expected, and containers showed up inside the VirtualBox VM.
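
For the record, getting that local VirtualBox-backed docker host is a one-liner (the machine name is arbitrary):

```
$ docker-machine create --driver virtualbox default
```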

This is all well and good, but being a curious nerd I wanted to see what other drivers there were for docker-machine. Unsurprisingly, there are numerous drivers, many of them cloud based. There is an OpenStack driver as well, which is great! My day job is all about OpenStack and I have numerous clouds at my disposal. The idea of utilizing my cloud to run docker containers just seems natural to me, so that’s what I tried to do.

Unfortunately I ran into a problem. Our clouds work in a way that requires the allocation of a “floating IP” address to an instance in order for that instance to be accessible by the outside world. The docker-machine OpenStack driver supports this, by passing in the correct arguments to tell the driver where to allocate the floating IP from, a pool. It turns out that my account on the cloud I was targeting has admin level rights (a scenario many of my customers will be in), and thus was able to see more available floating IP addresses in the pool than a normal user would, many of which had been allocated to a different project (projects, or tenants in OpenStack are a way to segregate groups of users and resources within a cloud). The docker-machine driver simply attempted to use the first address it thought was available for the instance it just created. In my case, this address had already been allocated to a different tenant and the OpenStack API returned an error when the assignment to my instance was attempted.
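
For reference, the create call looks roughly like this; the flavor, image, and pool names are placeholders, and flag names may differ slightly between docker-machine versions:

```
$ docker-machine create --driver openstack \
    --openstack-flavor-name m1.small \
    --openstack-image-name "Ubuntu 14.04" \
    --openstack-floating-ip-pool public \
    docker-on-openstack
```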

I understood the problem, and I had a general idea of how to fix it. The driver should filter the floating IP addresses by my project ID when searching for an available floating IP to use. I essentially had three choices at this point:
Door #1: If the project didn’t have a public bug tracker I could give up on the tool or write a negative review somewhere or a snarky tweet about it and find something else to play with.
Door #2: If the project had a public bug tracker I could file a bug in the tracker and explain the scenario that led to the error. I would just have to wait and hope somebody at the project cared enough to fix my bug.
Door #3: Because docker-machine is an open source project, I had a third choice. I could pull down the code and try my hand at fixing the problem myself.

Obviously I went with door #3. I’ve never looked at the source code behind Docker before (written in Go), but I figured I could fudge my way through a small change. Thankfully the Docker project has spent a fair amount of time thinking about how to make contributing to the project an easy process. Numerous documents exist to help guide a first time contributor through setting up a development environment, understanding the code testing tools, and walking through the submission and review process. Following these guides I was able to start making modifications to the docker-machine code and testing them out on my laptop. What I thought was going to be a simple change turned out to be a more involved code addition, which led me to reading the code and developer documents for a supporting library that docker-machine uses.

Through much iteration and testing, I was finally able to create a change that resolved my issue in a satisfactory way. Being a good open source citizen, I then submitted this change back up to the project in hopes of inclusion in a future release. I myself am not currently blocked in using this tool, but I’d like my customers to be able to use this tool as well, in a way that doesn’t require me to distribute a modified binary to them.

This is the real joy of Open Source to me. I found a tool I want to make use of, I discovered a way in which the tool doesn’t quite work right for me, I have access to the code to debug the problem, I have access to the documentation and supporting code to develop a solution, and I have the opportunity to contribute a change back to the tool. This process feels so natural to me now that any other way just seems broken. Open Source has enabled me to make my life better, as well as potentially making the life of other users of the software better too, and that gets me right in the feels.

Ansible copying content from one remote system to another

Just a quick tip if you’re trying to do the same thing I was trying to do.

The Problem

I am generating some content on Server A. I want to replicate this content onto Servers B and C.

The Solution

TL;DR: Read the file contents from Server A. Write files on Servers B and C from those contents.

Ansible provides a couple modules that make this possible. The first one we’ll look at is the slurp module. This module allows you to read in the contents of a file from a remote system. Here is a task to read the content, utilizing the run_once mechanism:
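
Something like this (the file paths are just placeholders for whatever content you generated):

```yaml
- name: Read the generated files from the source host
  slurp:
    src: "{{ item }}"
  register: pki_certs
  run_once: true
  with_items:
    - /etc/pki/tls/certs/server.crt
    - /etc/pki/tls/private/server.key
```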

This will read each of those files in and save the results in the pki_certs variable. Ansible will only do this once, presumably on the host where this content was generated with a previous run_once task. However the variable data will be assigned to every host to make it easily accessible.

Next we need to write out the content on our other systems. There are a couple things to consider. First, because the files were read via a with_items loop, the registered content is in a list, specifically in pki_certs.results. This is easy enough to deal with because the results list is a list of dictionaries, and the name of the file is part of that dictionary. The filename resides in the item key, while the content resides in the content key. This allows us to template out both the path to be written as well as the content to be written at that path.

The next thing we need to consider is that the slurp module stores content in base 64 encoding. That means when we write it back out, we need to decode it from base 64, otherwise Ansible will happily write out some long strings that look nothing like your file. To decode from base 64, simply use the b64decode filter on the content variable.

The last thing to consider has to do with YAML, whitespace, and Ansible. This may not come into play with every file, but these files have multiple lines. A somewhat recent change in Ansible means that if the “short form” of task description is used (with key=value parameters), your written-out file will have double linefeeds. The simple solution is to use “long form” task syntax, as you’ll see below:
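
A sketch of that task, using long form syntax and the b64decode filter (the ownership and mode shown are assumptions, adjust to taste):

```yaml
- name: Write the files out on every host
  copy:
    dest: "{{ item.item }}"
    content: "{{ item.content | b64decode }}"
    owner: root
    group: root
    mode: "0640"
  with_items: "{{ pki_certs.results }}"
```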

Because every host has access to the pki_certs variable this task can run across all of them. You might see a change registered for the first host in the loop, even though it was the source of the content, due to permissions or ownership changes, however subsequent runs will be nice and clean.

Hopefully this helps you out and saves you from spending an afternoon poking around at it like I just did!

Persistent SSH connections with context!

SSH, the Secure Shell, is an awesome tool. Rather indispensable for somebody like me who has to operate on remote systems. I use it constantly to either run code from a privileged host or log into systems to diagnose problems. My entire cloud of servers is just a terminal session away.

I’m also a huge fan of laptops. I really like being portable with my computer. Partly because I work from home, which means I often work from a coffee shop, or various parts of my home. I don’t have a “workstation” that I’m tied to, and I haven’t for years. I fell in love with the ease in just closing up my laptop and walking outside, or riding my bike to the cafe and opening it back up to continue work right where I left off.

Unfortunately, over time, the ease of transport has lessened, and for good reasons. First up is the VPN, or Virtual Private Network. VPNs allow me as a remote person to securely log into my employer’s network in order to access resources, or SSH into systems. VPNs are ubiquitous now for remote workers. In the good days, my VPN was automatic. If I closed my laptop and relocated within my house, upon opening my laptop the VPN would re-establish itself without my interaction. SSH, with its built-in ability to re-establish communication, would often come back fine, and whatever I was working on, i.e. my context, would be saved. But as time went on, automatic VPNs began to be viewed as insecure. They required stored credentials on my laptop, and that meant that whoever had my laptop had access to those credentials. To combat this, VPNs started using “One Time Passwords”, or OTPs. OTPs come in many flavors, but essentially they combine a Thing You Have (like a number-generating physical device) with a Thing You Know (a passphrase only you know) into a unique string of characters: the numbers from the device plus your passphrase. This combo can be used only once to authenticate, and after that it is invalid. More secure, but this ended the days of automatically established VPNs, and it often meant that the time it took me to re-establish my VPN went beyond SSH’s ability to recover a connection. Because of this I’d often find myself walking around my house with my laptop open rather than closed, to keep my connections running. Not nearly as cool and convenient as just closing it and walking around.

Of course, this doesn’t consider transitions from my home to a coffee shop. There are two problems there: the length of time to get to my destination exceeds SSH’s recovery window, and the local network details will have changed, preventing SSH recovery completely. This means whenever I go somewhere that is not my home, I have to re-establish my SSH session(s) and recover my context.

Keeping context is a solved problem. There are tools out there that help with this. GNU Screen and Tmux are very popular options. These utilities essentially create a terminal session that is insulated from disconnections. When you reconnect to wherever a screen or tmux session is running, you can re-attach to the session and all your context is back. These tools have been around for a while and work really well, when you remember to do your work inside one of them. However getting to them is still a manual process. I have to wait for my SSH session to finally realize it can’t re-establish my connection, then I have to re-issue the SSH connection command on my local laptop, and once connected I have to re-attach to whatever session I was working on. Not a lot of work, but certainly an annoyance.

What I want is something that will keep my SSH connections persistent. Persistent across network outages or even network relocations. Not only do I want the connection itself persistent, but I want the context within that connection to be persistent as well. I don’t just want my ssh connection to re-establish itself should it timeout, I want to be re-attached to whatever session I was working in.

Thankfully there are a few tools out there that help with this! Mosh and autossh.

Mosh is kind of the new kid on the block, and is rather interesting. It does a few more things than just keep a persistent connection with context. It also does some things which really help with performance (perceived and actual) over slow connections. When you start a mosh session, it uses ssh to connect to the target and starts some software there, software that your local mosh client will use to communicate with. When the network dies or changes, mosh will quickly re-establish communication with the remote software and your terminal acts as if nothing has changed.

I played around a bit with mosh when it first came out and discovered some things I didn’t like about the setup. First, mosh requires new software be installed on your connection target. This can either be extremely easy or a nightmare, depending on the target, corporate policy, etc… The other thing I really didn’t like about mosh is what it does to your local terminal window. I currently use OSX as my operating system, and within it I use iTerm2 as my terminal emulator. Often I use the built-in search function of iTerm2 to find things in scrollback, or I simply use the touchpad to scroll back my iTerm2 window to read things that have “scrolled off” my screen. These actions are quick and natural and useful. Unfortunately, the way mosh works, neither of those things is possible. Scrolling back will only show you the things in your terminal from BEFORE you started your mosh session. Anything that has happened within your mosh session and has scrolled off your screen is lost. Mosh says to use screen or tmux to capture that, and to use the scrollback capability of screen or tmux to review or search it. Because of these reasons I don’t use mosh, although I will say it is really neat, and does feel extremely fast. If I worked more on very laggy connections I might feel differently about it.

The other option I mentioned is AutoSSH. AutoSSH is similar to mosh, in that it attempts to re-establish a broken connection, but it is different in a few key ways. First, it’s a pure ssh implementation. It does not require additional software to be installed on the remote host, and it does not attempt any communication over anything other than ssh. It does not however attempt to keep context. All it will do by itself is re-establish an ssh connection to a given remote host. In order to retain context, screen or tmux are needed. Thankfully it is trivial to use screen or tmux in a way that automatically (re)connects to a session. In my case, I use screen. Screen has one important feature over Tmux for me, and that feature is the way it does scrollback. When using screen in a iTerm2 window, anything that scrolls off the screen is still in the “history” of iTerm2, which means I can scroll up with the touch pad, or use iTerm2’s search feature to find things. This does not work when using Tmux, so I have gone with screen.

Screen has the ability, with one action, to either create a new session or, if the named session already exists, disconnect that session from wherever it may be attached and reconnect it to where you are now. That is accomplished via $ screen -D -R session_name. This can be added to an execution of autossh, so that when autossh initially establishes your connection, or ever re-establishes your connection, the execution will run:
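
For example (the host and session names are placeholders; -M picks an arbitrary monitoring port and -t forces a tty so screen is happy):

```
$ autossh -M 20000 -t server.example.com "screen -D -R session_name"
```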

This is nearly perfect, but it doesn’t seem to react as fast as mosh does to network disconnects and reconnects. This is due to some defaults in autossh, namely how frequently it polls the monitoring port for activity. The default poll time is 600 seconds, which can be quite a long time. I’ve found that a poll time of 5 seconds keeps things feeling fast. Adjusting it is as simple as adding an environment variable when launching autossh. Due to a bug, one also needs to adjust the time autossh will wait before it first starts polling a connection.
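
Something like this does the trick; AUTOSSH_POLL and AUTOSSH_FIRST_POLL are autossh’s own environment variables, the values are just what works for me:

```
$ AUTOSSH_POLL=5 AUTOSSH_FIRST_POLL=5 autossh -M 20000 -t server.example.com "screen -D -R session_name"
```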

Now autossh will start monitoring my connection after 5 seconds, and monitor it every 5 seconds for changes. When it reconnects, it will automatically reattach my screen session for context. Any scrollback is still in my terminal window so my local native terminal actions still work, which means I can roam at will without losing my work! Granted, this does require that screen is installed on the remote host, but screen is nearly ubiquitous these days, and hardly ever contentious to get installed if it isn’t already on your remote host.

This setup has made my life more awesome, and I hope it will make your life more awesome too, dear reader. If you have anything to add, or other tricks for this style of work life you’d like to share, please use the comments boxes. They require my approval but I’ll get to them quite quickly!

SSH Key Rotation with Ansible

Introduction to SSH Keys

SSH keys are fantastic things. They provide a 2-part blob of data, a private part and a public part, that can be used to authenticate ssh connections. You keep the private part private, often with a passphrase to “unlock” it, while you can hand out the public part to things like GitHub, compute clouds, and other remote systems you wish to connect to via SSH. The public part of your SSH key pair gets stored in a special file that SSH servers on remote systems read, the authorized_keys file. When you connect, your ssh client will provide details about your private key that the remote end can validate against your public key to authenticate you. This is a great convenience over having to provide a password every single time.

This convenience for users is also a necessity for infrastructure administration. SSH is ubiquitous in the Linux world, and the vast majority of administration is accomplished over SSH. Without the ability to use SSH Keys (or similar auth mechanisms) one would not be able to automate actions across many systems easily.

With convenience comes responsibility though. Having a key that an automated process can use to manipulate your fleet of systems is great, but it’s also a pretty juicy attack vector. For that reason it is good practice to rotate your keys often. Rotating keys is the act of replacing the keys you’re currently using with new keys, and removing the ability for old keys to be used to log into your systems.

Rotating keys requires a new key. Creating a new key is fairly simple. Getting the public part of this key out into your fleet, and removing existing public keys is a bit harder. Thankfully we have orchestration and automation tools such as Ansible. The rest of this blog post will discuss how to use Ansible to automate rotating your ssh credentials across your fleet.

Orchestrating SSH Key Rotation

Let’s consider the steps necessary to rotate a key:

  1. Create a new key
  2. Add new key to authorized_keys files on your fleet
  3. Test new key
  4. Remove previous keys from authorized_keys files

As stated before, step 1 is simple, and for the sake of this post we’ll assume that this has been completed, and there is a new key-pair, located at ~/.ssh/id_rsa_new and ~/.ssh/id_rsa_new.pub. The private key part is id_rsa_new, the public is id_rsa_new.pub. It’s the pub we need to distribute. For now, we’ll also assume that this key has not yet replaced the existing key, and we can still use the existing key to reach our fleet.

Step 2 is adding the new key to the authorized_keys file. This is where our Ansible playbook will begin. First we need a play header and a couple variables defined to reference the public and private parts of our new key-pair.
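
Something like this; the variable names are simply what I’ll use in the snippets below, and the hosts pattern is up to you:

```yaml
---
- name: Rotate ssh keys
  hosts: all
  vars:
    new_priv_key: "{{ lookup('env', 'HOME') }}/.ssh/id_rsa_new"
    new_pub_key: "{{ lookup('env', 'HOME') }}/.ssh/id_rsa_new.pub"
  tasks:
```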

Next we’ll need a task to copy the public part of our new key-pair to the remote hosts. For this we will use the authorized_key module. This module allows us to provide a key to add, which we will do.
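
Roughly like so; the user here is whatever remote account’s authorized_keys you are managing:

```yaml
    - name: Add the new public key to authorized_keys
      authorized_key:
        user: "{{ ansible_ssh_user }}"
        key: "{{ lookup('file', new_pub_key) }}"
```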

Now for step 3, we will want to test this new key, to make sure that our new key addition is working. To do this, we will need to direct Ansible to use our new private key when connecting to our servers. We can use a set_fact task to set the ansible_ssh_private_key_file variable to our new private key.
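
A sketch of that task:

```yaml
    - name: Use the new private key for the rest of the play
      set_fact:
        ansible_ssh_private_key_file: "{{ new_priv_key }}"
```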

Our next task will make use of this new key when creating the connection (provided SSH ControlPersist is not in play).

The next task is step 4, removing previous keys. Because of our previous task, this step will make use of the new key, and accomplish step 3 along the way.

Currently, the authorized_key Ansible module does not have a method to remove all but the specified ssh key. However I have sent a pull request to accomplish this, by way of the exclusive keyword. The task here will assume that this pull request has merged.
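
Roughly:

```yaml
    - name: Remove all keys other than the new one
      authorized_key:
        user: "{{ ansible_ssh_user }}"
        key: "{{ lookup('file', new_pub_key) }}"
        exclusive: yes
```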

This task looks just like the first task, but with the addition of exclusive=yes. If you don’t want to use the modified authorized_key module, you could make use of the copy module which could get content from the new_pub_key file similar to how authorized_key gets content from the file.

If all has gone well, all that should be left in the authorized_keys file is the public part of our new key-pair. Our new key has been successfully rotated in and the old key is no longer allowed to log in.

Next Steps

There are more things we could do with our playbook. We could automate the creation of the key itself, which would look something like this:
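
A sketch, with the key type and size as assumptions:

```yaml
    - name: Generate a new key-pair
      shell: ssh-keygen -t rsa -b 4096 -N "" -f {{ new_priv_key }}
      args:
        creates: "{{ new_priv_key }}"
      delegate_to: localhost
      when: inventory_hostname == play_hosts[0]
```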

The when conditional here makes sure that only one key is generated, by only running on the first host. Delegation is also used to make the action happen on the system calling ansible, rather than a remote host.

We could also move the private key file into a location that our local ssh config is prepared to use by default:
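
Something like this, again delegated and run only once:

```yaml
    - name: Move the new private key into the default location
      shell: mv {{ new_priv_key }} {{ lookup('env', 'HOME') }}/.ssh/id_rsa
      delegate_to: localhost
      when: inventory_hostname == play_hosts[0]
```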

Any number of other tasks could be added around these, or specific options to the existing tasks. This blog post is just enough to get you started.

Conclusion

SSH keys are awesome. Anybody using ssh should be using keys. Keys are powerful, and thus need care. Rotate keys frequently and make sure to invalidate old keys. Automation can make this process a lot easier and more reliable.

For convenience, here is a complete playbook code block:
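
This is a condensed sketch of the whole thing, with the same assumptions as above (key paths, remote user, and the exclusive option from my pull request):

```yaml
---
- name: Rotate ssh keys
  hosts: all
  vars:
    new_priv_key: "{{ lookup('env', 'HOME') }}/.ssh/id_rsa_new"
    new_pub_key: "{{ lookup('env', 'HOME') }}/.ssh/id_rsa_new.pub"
  tasks:
    - name: Generate a new key-pair
      shell: ssh-keygen -t rsa -b 4096 -N "" -f {{ new_priv_key }}
      args:
        creates: "{{ new_priv_key }}"
      delegate_to: localhost
      when: inventory_hostname == play_hosts[0]

    - name: Add the new public key to authorized_keys
      authorized_key:
        user: "{{ ansible_ssh_user }}"
        key: "{{ lookup('file', new_pub_key) }}"

    - name: Use the new private key for the rest of the play
      set_fact:
        ansible_ssh_private_key_file: "{{ new_priv_key }}"

    - name: Remove all keys other than the new one
      authorized_key:
        user: "{{ ansible_ssh_user }}"
        key: "{{ lookup('file', new_pub_key) }}"
        exclusive: yes
```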

And lastly here is a horn-less unicorn pooping a rainbow I found on photobucket, because this post has been far too serious.

Wheeeeee!

Linuxfest Northwest 2013

Time flies, and another LFNW is upon us. I haven’t fully decompressed from attending my first OpenStack Summit but now I have to switch gears and get ready for LFNW.

This year I’m giving 3 presentations, one of which is an 80-minute session that I’m sharing with a good friend — a repeat of a popular session we did last year. Here is a link to the sessions:

LFNW is a free event with good quality sessions given by good quality people. The “hallway track” is just as valuable as the other tracks as you can build lasting relationships with the tech movers and shakers of the area.

This will be something like my 10th year going. I hope to see some of you there too!

Documenting python code using sphinx and github

Documentation is good right? Doesn’t everybody like to have a good source of documents when working with a piece of software? I know I sure do. But creating documentation can be a drag, and creating pretty documentation can be even more of a drag, more time consuming than writing the code in the first place. Ain’t nobody got time for that!

Thankfully there are utilities out there that will help with creating documentation. My language of choice is Python and for generating documentation I like to use Sphinx. Sphinx appeals to me because much of the documentation can be auto-generated based on existing code. It works with python docstrings in a way that can be PEP257 compatible. It takes very little setup to get from nothing to decent looking documentation. For an example of what sphinx output can look like, see my pyendeavor project.

The first step is to make frequent use of docstrings in the code. Not only will this help to generate useful documentation later, but it is also really handy for anybody working with the code to understand what the code is doing. A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. It’s more than just a comment, because it will become an attribute of the object itself. When I look at a function or a class init I ask 3 main questions:

  1. What does this code do?
  2. What are the inputs to it?
  3. What will I get in return?

These questions can be broken down into the docstring very easily:
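
Something like this; the function is made up purely to show the shape of a basic docstring:

```python
def connect(url, timeout=30):
    """Open a connection to the given service URL.

    Takes a url string and an optional timeout in seconds.
    Returns a Connection object ready for use.
    """
```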

Those three lines answer the question pretty well, and if we were just going to look at the code and not try to generate html/pdf/whatever documentation we could be done. Instead let’s try to give it a little more structure, structure that sphinx will appreciate:
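
The same docstring, using the field-list markup sphinx knows how to render:

```python
def connect(url, timeout=30):
    """Open a connection to the given service URL.

    :param url: the service endpoint to connect to
    :param timeout: how long to wait for a response, in seconds
    :returns: a Connection object ready for use
    """
```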

A human reading this docstring is still going to know what’s going on pretty well, and sphinx is going to read it even better:


The python source files themselves serve as input to the documentation creation tool. Just go about the business of writing code, keep the docstrings flowing and updated with changes, and very useful documentation can be produced.

How does one generate the documentation though? How does one use sphinx? I’m glad you asked! Sphinx has a utility that can help get started — sphinx-apidoc. First make a docs/ subdir of the project (or whatever you want to call it). This is where some sphinx control files will go, although the rendered output doesn’t necessarily have to go there. For my example software pyendeavor I have a single python package, pyendeavor, located in the src/pyendeavor directory. To get started with sphinx, I would issue the command:
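
Which would be something like:

```
$ sphinx-apidoc -F -o docs src/
```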

The -F flag causes a full setup to happen, the -o docs tells sphinx to direct its output to the newly created docs directory, and the src/ tells sphinx to look in src/ for my modules.

Sphinx uses reStructuredText as the input format, which is pretty easy to work with.  And if we look at the files it generated for us, we’ll see that there isn’t much there:

And if we look in the pyendeavor file we’ll see more, but here is a snippet:
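
It’s a snippet along these lines (the exact headings sphinx-apidoc generates vary a bit by version):

```rst
pyendeavor Package
==================

.. automodule:: pyendeavor
    :members:
    :undoc-members:
    :show-inheritance:
```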

This just tells sphinx to read the module files and generate content for module members, undocumented members, and to follow inheritance. These are all just commands that sphinx understands, but you don’t really have to.

There is a conf.py file that will need some attention. Sphinx will need to know how to import the code, so a system path entry to where the code can be found needs to be added.  There is a helpful comment near the top, just clear the hash and update the path:
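
After the edit it reads something like this, with the path pointing at the src/ directory:

```python
import os
import sys
sys.path.insert(0, os.path.abspath('../src'))
```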

Now we are ready to make some content! Make sure to be in the docs/ dir and run make. A list of possible make targets will be displayed, but the one we care about is ‘html’, so run that one:
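
That is simply:

```
$ make html
```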

Lots of output, but the end result is some html pages in _build/html/:

A browser can be used to view index.html and all the linked docs. How awesome! Useful documentation without having to do much more than just use docstrings in the code (which should be done anyway).

Now that docs can be generated, they should be put somewhere useful. That’s where the github part of this post comes in. A lot of projects are posted on github and I’ve started using it for more of mine too. One nice feature is a way to create a webspace for a project by pushing content to a ‘gh-pages’ branch of a project. The following steps will help set up a repo to have a place to publish the html content of sphinx. They are based on the directions I found here, however instead of using a directory outside our project space we’re going to make use of a git workdir so that we never have to leave the project directory to get things done.

First let’s create a directory to hold our new branch, from the top level of our source.
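
Something like:

```
$ mkdir gh-pages
```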

Next create a new workdir named html within that directory:
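
Assuming the git-new-workdir script from git’s contrib/ directory is on your PATH:

```
$ git-new-workdir . gh-pages/html
```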

(More information about git-new-workdir can be found here but essentially it is a way to create a subdirectory that can be checked out to a new branch, but all the git content will be linked. A git fetch in the topdir of the clone will also update the git content in the workdir path. No need for multiple pulls.)

Now we have to prepare the workdir for sphinx content. To do this we need to create an empty gh-pages branch within the html directory:
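
One way to do that:

```
$ cd gh-pages/html
$ git checkout --orphan gh-pages
```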

Initially there will be a copy of the source tree in the html directory that can be blown away with:
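
Still within the html directory:

```
$ git rm -rf .
```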

Back in the docs/ directory a change needs to be made to the Makefile to tell it to output content to where we want it, the gh-pages/html/ directory. Look for:
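
In the stock Makefile that sphinx generated, the line in question is the build directory setting:

```make
BUILDDIR      = _build
```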

and change it to
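
so that it points at the workdir instead:

```make
BUILDDIR      = ../gh-pages
```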

Now from the docs/ directory, run make html again. This time you’ll notice that the output goes to ../gh-pages/html/

Switching back to that directory the files can be added and committed with:
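
For example (the commit message is whatever you like):

```
$ git add .
$ git commit -m "Add generated documentation"
```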

Now the branch can be pushed to github:
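
That’s just:

```
$ git push origin gh-pages
```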

After up to 10 minutes or so, the pages site for the repo can be visited, like mine for pyendeavor. Github also has a feature for README files in the base of your repo, supporting/rendering markdown and reStructuredText. This README.rst file can also be included in your pages output with a simple tweak to the index.rst file in docs/. The .. include:: directive will tell sphinx to include the content from the README.rst file when generating html output:
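
The tweak is a single line added to index.rst, assuming README.rst lives one directory up:

```rst
.. include:: ../README.rst
```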

The README.rst file source can be seen here. The same content of the README file will display in the generated docs, so updates to this content only have to happen in one place.

It is a good idea to add the control files in docs/ to the repository and keep them under source control:
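
Something like:

```
$ git add docs/
$ git commit -m "Add sphinx control files"
```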

If new modules or packages are added to the source tree then a new run of sphinx-apidoc is necessary. The -f flag will tell sphinx that it is OK to overwrite the existing files:
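
Roughly:

```
$ sphinx-apidoc -f -o docs src/
```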

When functions change or new docstrings are added to the code new html content needs to be generated by running make html and committing/pushing the new content in the gh-pages/html/ directory.

That’s all there is to it, sphinx generated docs rendered by github via the gh-pages branch. All within a single directory tree. Have fun, and get to documenting!

Python magic and remote APIs

I’m a pretty big fan of Python as a programming language. It allows me to program by discovery, that is, poke and prod at things until they work. Not having to compile an entire program every time I change something is pretty fantastic, as is the ability to insert a debug statement and be able to break a program at that point, then run arbitrary python code within that context. Pretty indispensable to how I write software.

Another thing I like about Python, which some may not, is the ability to do magic things. Not quite so magic as xkcd would like us to believe, but fun stuff indeed.

Recently one of the services at work grew a json API to bang against, and for fun I thought I’d whip up some python to play with it. My team had a few utility scripts that would bang on the old xmlrpc interface to get some data, I wanted to see how much faster it was with json.

First, if you have to do any web stuff, you really should be using the Requests module. It is so, so much better than using urllib(2) directly.

The API I wanted to program against had an Auth end point that would return to you a token string. This string could be included with later API calls to provide authentication. Requests lets you create a session object that can have attributes that carry on to all web calls, such as a custom auth header with a token.
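
Here is a rough sketch of that; the URL, payload fields, and header name are all assumptions about the service, not its real API:

```python
import requests

# Trade credentials for a token (endpoint and payload are made up)
resp = requests.post('https://cserv.example.com/api/auth',
                     json={'username': 'me', 'password': 'secret'})
token = resp.json()['token']

# Build a session that sends the token on every subsequent call
session = requests.Session()
session.headers.update({'X-Auth-Token': token})
```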

Now the session object can be used just like requests itself, and it’ll include the new header we’ve added. While this was neat at first, I quickly realized that I wanted to make this session object an attribute of a more generic object for working with the API. Each time you use session or requests you have to fill in a url and that’s tedious, so I made a python class to handle that for me. One bit of magic I used here was a python property.

A python property is a way to populate a class attribute on the fly / as needed, without the code using your object needing to know that it’s happening behind the scenes. It’s a getter/setter without having to get and set, and it caches the value for future getting. My class sets some data during the init process, and creates a property for the session attribute, which can then be used in later functions, like a login or query function.
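
A sketch of what that class looks like; the URLs, payload fields, and query parameters are stand-ins for the real service:

```python
import requests

class CServ(object):
    """A thin client for the internal API (details here are made up)."""

    def __init__(self, url, username, password):
        self.url = url
        self.username = username
        self.password = password
        self._session = None

    @property
    def session(self):
        # Build and cache the authenticated session on first access
        if self._session is None:
            self._auth()
        return self._session

    def _auth(self):
        # The login dance: get a token, then stash a requests.Session
        # that carries it in a header
        resp = requests.post(self.url + '/auth',
                             json={'username': self.username,
                                   'password': self.password})
        resp.raise_for_status()
        session = requests.Session()
        session.headers.update({'X-Auth-Token': resp.json()['token']})
        self._session = session

    def query(self, qclass, attribute, key, value):
        # Fill in the url and payload details so callers don't have to
        payload = {'class': qclass, 'attribute': attribute,
                   'key': key, 'value': value}
        resp = self.session.post(self.url + '/query', json=payload)
        resp.raise_for_status()
        return resp.json()
```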

With this structure we can do things like:
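
For example (the class and attribute names here follow the examples later in the post):

```python
cserv = CServ('https://cserv.example.com/api', 'me', 'secret')
result = cserv.query('Computer.Computer', 'primary_ip', 'number', 1234)
```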

We get back a json blob that has what the API returned to us. What happened was that the query function built up the information for the requests bit, which was passed into self.session.post(). Since this was the first time trying to access self.session we went through the @property tagged session() function. That function determined that self._session was not populated yet and called _auth(). _auth() in turn did the login dance to generate the token, built up a requests.Session object, tweaked the header and stuffed it into self._session. session() then returned that to the caller thus delivering the actual session object. Magic!  The next time session is accessed it will quickly return the value of self._session. Properties are awesome and useful.

A CServ() object is okay, but not useful on its own. If I wanted to get a bit of data about a computer and use that data numerous times I’d either have to store a copy of the data in a local variable, or do queries each time I wanted the data. Neither are efficient. What I really want is to be able to create a Computer object, and from that object access attributes directly, like say Computer.name. Here is where some more magic comes in. We already know we can create properties to back attributes. We could go find out what all possible things we could look up about a computer in our CServ service, then write out properties for each of those. That… doesn’t sound fun. Particularly if you think about this CServ having Computer items, Switch items, Account items, etc… That would be a lot of typing!

What if instead there was a way to do dynamic properties?  What if when we tried to access Computer.primary_ip the code would just know to do a query to look up the ‘primary_ip’ attribute of a Computer.Computer cserv API class?  Well we’re in luck, because there is a way!

First we’re going to create a subclass of CServ, CServOBJ. This class will be a base class for any number of objects, like Computers, Accounts, etc.. We can save a lot of code duplication by putting the shared bits in CServOBJ.

Right now we don’t need to overload the __init__ method, so we can dive right into the magic. In python, when you attempt to access an object’s attribute, behind the scenes an object’s __getattr__(attribute) method is called. Normally you don’t see it because it’s all built in, but we can override it to make attribute access do something different. In our case, we want to do an API look up to get the value if we don’t already have it, so we’ll overload the function:
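
A sketch of the overloaded method, along with the _setAttrib() helper it relies on:

```python
class CServOBJ(CServ):
    """Base class for objects backed by the remote API."""

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails; return a cached
        # value if we have one, otherwise fetch it via the API
        try:
            return self.__dict__[name]
        except KeyError:
            self._setAttrib(name)
            return self.__dict__[name]

    def _setAttrib(self, name):
        # Build the lookup from class attributes and cache the result
        value = self.query(self._qclass, name, self._qval,
                           getattr(self, self._qval))
        setattr(self, name, value)
```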

Objects in python also have a built-in __dict__ that keeps track of all the attributes. Our simple little bit of code will try to return the value for the name of the attribute the function gets. If that attribute doesn’t exist in the built-in dict, a KeyError would happen. We catch that error and call our _setAttrib() function. This function is where the look up is built, using some other class attributes we’ll get to later. A session call is made and the value is fed into the setattr python built-in. All this work happens behind the scenes to the bit of code just trying to access the attribute, and the lookup only happens once. That’s all we really need for now in the base class, so let’s create a Computer class.
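
And the Computer class itself, with the API class name as an assumption:

```python
class Computer(CServOBJ):
    """A computer record in the remote service."""

    # The class name the remote API expects, and the attribute used
    # as the lookup key in queries
    _qclass = 'Computer.Computer'
    _qval = 'number'

    def __init__(self, number, *args, **kwargs):
        self.number = number
        super(Computer, self).__init__(*args, **kwargs)
```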

That’s all there is to it. _qclass is defined as a class attribute; it does not change per-object. It is the class name passed into the remote API. The object creation takes a number, which is the identifier for computers in our system. It assigns that to the number attribute so that if we reference computer.number we don’t make another API call. _qval is the placeholder, common across all the objects, for what to use as a lookup key. The parent class’s init is called (which skips all the way up to CServ) to complete the object creation.

With this setup, we can program against it very easily to access and cache data:
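
For example:

```python
comp = Computer(1234, 'https://cserv.example.com/api', 'me', 'secret')
comp.number      # set at init time, no API call
comp.primary_ip  # first access triggers a query behind the scenes
comp.primary_ip  # cached now, no further API calls
```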

MAGIC!

Now if you are like me, you spend a lot of time in things like ipython or bpython to interactively program stuff and play with objects and whatnot. These environments provide tab completion, help functions, etc… With our current code though, we couldn’t tab complete the available attributes. Only the number attribute and the functions we’ve created would show up. To fix that, we need to overload the built-in __dir__ function. This function is used when getting a listing of what is available to an object, dir(object). A useful exploratory tool. ipython/bpython use this method to see what tab completion options to provide you. Luckily our internal service provides an API call to get a listing of possible attributes, so we can hook that into __dir__. But of course we only want to do this API call once (per object), so we will want to make it a property. Since there is nothing API-class specific, we can put the code into the CServOBJ class:
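
A sketch of those additions to CServOBJ; the 'attributes' query is a stand-in for whatever the real listing call is:

```python
class CServOBJ(CServ):
    # ... __getattr__ and _setAttrib as shown earlier ...

    def __init__(self, *args, **kwargs):
        self._attribs = None
        super(CServOBJ, self).__init__(*args, **kwargs)

    @property
    def attribs(self):
        # Load the list of available API attributes once, on first use
        if self._attribs is None:
            self._attribs = self.query(self._qclass, 'attributes',
                                       self._qval, getattr(self, self._qval))
        return self._attribs

    def __dir__(self):
        # Built-ins, plus anything set on this object, plus the API attributes
        return sorted(set(dir(self.__class__) +
                          list(self.__dict__.keys()) +
                          list(self.attribs)))
```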

Since we are creating a property at this level we will grow an __init__ function to prep for that. Then we define the attribs() function. A short-cut is taken here: instead of calling out to some other function to load the attributes, the load is done directly. Any time a Computer object gets a dir() call, our overloaded function will return a sorted list that is the combination of the built-in functions/attributes, anything that has been added to the specific object, and the available API attributes. Tab-completion achieved!

This has been a quick look into some of the magic you can do with Python.  It’s not quite antigravity, but it is useful, and food for thought for anybody that’s programming against remote APIs.  Objects are good, objects with dynamic attributes are better, and tab completion is icing on the cake.  Enjoy!

Hanging up the Hat

Seven years ago I put on a Red Hat.  I had been a part of the community for a few years prior to that as well, first as a newbie in #redhat looking for help, later as somebody providing help to others, then as an early cabal member in the newly formed Fedora project, later as a leader of Fedora Legacy.  Along the way I’ve made some very strong friendships that I hope will continue.

This is my last week as a Red Hatter.

I’d like to think that the Fedora project is a better place now than when I joined.  I hope my time here has made a positive difference.  My new role will leave me with less time to participate directly in Fedora, although I will likely continue maintaining a few packages here and there for my personal use.

I do plan on being at FUDCon in January, I hope to see many of you there.

I’ll likely be shutting down this particular blog this week, but a new one will start and I’ll link to it from here if you really like me and wish to keep tabs on what I’m doing 🙂

Text Mode for Fedora 18

Anaconda has been through a pretty major UI rewrite.  Anybody that has tried either Fedora 18 Alpha or any of the nightly images since then should be well aware of this.

The UI rewrite was done for many reasons and accomplishes many goals.  I’m not going to rehash that here.  What I am going to talk about is what happened to text mode.

Text mode in F17 and before was ncurses based.  This gave some kind of pretty UI to do things.  There were drawbacks though.  ncurses didn’t work on all the terminals people throw at Anaconda, in particular dumb serial terminals and x3270, the terminal for s390x.  Because of that we also had a (non-interactive) very simple display mode called ‘cmdline’.  This just did simple line printing of progress during a kickstart.  Unfortunately due to the way the old UI was coded there wasn’t a good complete separation of presentation from computation so many things were written 3 times.  Once for gui, once for text, and once for cmdline.  Fun right?

With Fedora 18 there is one text UI.  It is used on full featured consoles as well as dumb ones.  It doesn’t use ncurses, it just uses simple line printing.  It can be used over serial (interactively!) and over x3270 (non-interactively).  It is a simple question and answer prompt.

The design of text mode for F18 and beyond is closely modelled after the design of the GUI for F18 and beyond.  A hub and a set of spokes, so that users can do tasks in whatever order they wish, potentially while things happen in the background.  There is a main setup hub where the user can set a time zone, set a root password, and do some basic storage configuration (pick target disks and a strategy to clear space on those disks).  Once all tasks are complete the user can progress into the actual installation where we just throw up a running list of tasks the backend is accomplishing.

There are very few things you can do with text mode in Fedora 18.  You cannot pick languages,  and you cannot pick installation source.  These can be provided via boot time arguments.  You also cannot do advanced storage configuration which would need a kickstart file to accomplish via “text mode”.  We are planning to add some functionality for Fedora 19, but we haven’t decided which items and how rich those items will be.  Text mode is still de-emphasized in favor of direct GUI or remote GUI by way of VNC.  For kickstarts however text mode is still pretty great.  The minimal UI does not prevent a fully customized kickstart from being executed.

Give text mode a spin!  It’s simple, fast, unobtrusive, and gets the job done.