Interactively debugging Python code in (mini) Kubernetes
by Jesse Keating
Lately I've been playing around with Kubernetes. If you don't know what Kubernetes (k8s) is, then the rest of this post is going to be very confusing to you.
Needed concepts
I'm going to talk about a few things. Here are some links to places to get up to speed should any of this not make any sense:
Background
I'm somewhat late to the k8s game, and I'm still trying to get my bearings. One task I set out to figure out is how to replicate the development workflow I had built up with Docker Compose, launching containers of my application locally for testing. The application I've been developing consists of a Zookeeper service and three Python services. I have the Python services broken out into three separate containers based on the same image, but with different launch commands. When testing locally, I often want to introduce code that I haven't yet committed to a repository, and instead of building new images every time, I use a volume mount to bring my code into the container at runtime. This works well with Python thanks to a feature of pip, the tool for installing Python packages: pip can perform an editable install, which places a symlink in the installation target path pointing back to the source directory of the install. In my Dockerfile I clone the source code to /zuul, and then perform the pip install from there. What this means is that after the install, I can simply alter the files in the original checkout in /zuul and restart the process for the changes to take effect. To take that further, this gives me the ability to attach a volume mount from my laptop's zuul checkout directory (where I have edited files) to the /zuul path within the container at start time, so that I can make use of edited files without rebuilding the image.
On to Kubernetes!
My workflow worked great in docker-compose, but now I want to do this with k8s. To demonstrate how this works, I've created a simple Python web server in a demo app. I've put the source for this on GitHub for convenience.
First, I need to install and run minikube, a tool to run a Kubernetes cluster locally on my laptop. Minikube will download some data and launch a virtual machine in which to run the k8s services, including its own Docker daemon.
Build Docker image
With minikube running, I next need to configure my Docker client to make use of the Docker Engine within minikube. A simple command, `eval $(minikube docker-env)`, will set up my client. Now I can build my Docker image so that it'll be available for use within k8s. I've cloned my repository into a src/derpops/demoapp directory relative to my home directory. From there I just need a simple Docker build command to build my image. NOTE! I'm using a tag other than latest so that Kubernetes will not try to pull the latest version of my image. My image will only exist locally, so a pull would fail.
$ docker build -t demoapp:demo .
Sending build context to Docker daemon 129.5kB
Step 1 : FROM python:alpine
 ---> 83da41380580
Step 2 : RUN apk --no-cache add --update git
 ---> Using cache
 ---> bf6e5c3fd23c
Step 3 : RUN git clone https://github.com/j2sol/demoapp.git /demoapp
 ---> Using cache
 ---> eacba0083289
Step 4 : RUN pip install -e /demoapp
 ---> Using cache
 ---> 3f881178557d
Step 5 : RUN pip install rpdb
 ---> Using cache
 ---> 81845e2506d2
Step 6 : RUN apk del git
 ---> Using cache
 ---> 19d2d2bb91d1
Step 7 : CMD demoapp
 ---> Using cache
 ---> 9eddb5c0209a
Successfully built 9eddb5c0209a
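For reference, the build output above corresponds to a Dockerfile along these lines (reconstructed from the build steps; the real file lives in the repo):

```dockerfile
FROM python:alpine
RUN apk --no-cache add --update git
RUN git clone https://github.com/j2sol/demoapp.git /demoapp
RUN pip install -e /demoapp
RUN pip install rpdb
RUN apk del git
CMD demoapp
```

Note the editable (`-e`) install, which is what makes the volume-mount trick below possible.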
Creating a Deployment
With the image in place, I can now create a k8s Deployment. The Deployment lets me define a container to launch, with the image I built above, and a command to run within the container. I can do this with a simple kubectl command:
$ kubectl create deployment demoapp --image demoapp:demo
deployment "demoapp" created
$ kubectl get pods
NAME                       READY     STATUS    RESTARTS   AGE
demoapp-3399253556-ngkv0   1/1       Running   0          1s
If my deployment is created successfully, a new pod will show up, which will be running my code.
Creating a Service
To see if my code is working properly, I need to be able to reach the web server. A k8s Service is necessary to set up the networks appropriately. Just like with the Deployment, a simple kubectl command will suffice to create the service:
$ kubectl create service nodeport demoapp --tcp=8000
service "demoapp" created
$ kubectl get services
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
demoapp      10.0.0.232   <nodes>       8000:32369/TCP   1m
kubernetes   10.0.0.1     <none>        443/TCP          23h
This command created a service that will allow me to reach port 8000 of the container running the application. My code specifically tells the http.server library to use port 8000 and to listen on all addresses. This type of service is a NodePort, which makes the port reachable from every node. However, since this is minikube, I only have one node, and I can ask minikube for the node's IP address with the service command. I'll ask it to just display the URL, instead of opening the URL in my browser, and then use curl to access the URL:
$ minikube service demoapp --url
http://192.168.64.3:32369
$ curl http://192.168.64.3:32369
Hi there!%
Injecting new code
My application works, but now I want to alter the code. To get new code into my container, I need to add a volume mount to my deployment.
VolumeMounts are used to expose content into a container. A VolumeMount definition combines the name of a volume and a path at which to mount it within the container. The name matches a defined Volume, of which many types are supported. The type I'm interested in, the hostPath type, exposes a file or a directory from the node (the machine the container is running on). Minikube automatically exposes a folder from the host machine into the VM where k8s is running, which is the node. In my case, the /Users directory on my laptop is exposed as /Users on the node, and thus I can make use of it in a hostPath volume.
To add a volume to my Deployment, I'll need to create a YAML file describing my Deployment spec. I can do this quickly by repeating the earlier command with arguments added to do a dry run and print YAML:
$ kubectl create deployment demoapp --image demoapp:demo --dry-run -o yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: demoapp
  name: demoapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demoapp
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demoapp
    spec:
      containers:
      - image: demoapp:demo
        name: demoapp
        resources: {}
status: {}
I saved this output to a new file, demoapp-deployment.yaml, where I can make adjustments as needed. I need to alter the containers spec for the demoapp container to define a volumeMount:
        resources: {}
        volumeMounts:
        - name: demosource
          mountPath: /demoapp/demoapp
status: {}
This references a volume by the name of demosource, which I also need to define in the spec as a new key:
          mountPath: /demoapp/demoapp
      volumes:
      - name: demosource
        hostPath:
          path: /Users/jkeating/src/derpops/demoapp/demoapp
status: {}
These additions will cause the directory /Users/jkeating/src/derpops/demoapp/demoapp to be mounted at the path /demoapp/demoapp within the container when it launches. This will overlay the version of code on my laptop on top of the code that already exists in the container.
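Put together, the pod template portion of my Deployment spec now looks roughly like this (trimmed to the relevant section; the full file is in the repository):

```yaml
      containers:
      - image: demoapp:demo
        name: demoapp
        resources: {}
        volumeMounts:
        - name: demosource
          mountPath: /demoapp/demoapp
      volumes:
      - name: demosource
        hostPath:
          path: /Users/jkeating/src/derpops/demoapp/demoapp
```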
To get the new definition of my Deployment in use, I'll delete the existing Deployment and create a new one from the YAML file:
$ kubectl delete deployment demoapp
deployment "demoapp" deleted
$ kubectl create -f demoapp-deployment.yaml
deployment "demoapp" created
$ kubectl get pods
NAME                       READY     STATUS        RESTARTS   AGE
demoapp-2344086337-x6697   1/1       Running       0          1s
demoapp-3399253556-ngkv0   1/1       Terminating   0          30m
The existing pod is being terminated while a new pod is running. If all went well, I should still be able to use curl to reach the web server:
$ curl http://192.168.64.3:32369
Hi there!%
The output is the same as earlier, as I haven't changed any code. But what if I change the value of the variable RESPONSE to something new by editing the file demoapp/__init__.py on my laptop?
        self.end_headers()
        RESPONSE='''Edited code'''
        self.wfile.write(bytes(RESPONSE, 'UTF-8'))
To get my new code in use, I can simply delete the Pod my Deployment created. A Deployment uses a ReplicaSet to control how many Pods are active; deleting the pod will trigger the creation of a new one, which should pick up my new code. To determine which pod to delete, I'll use the get pods command via kubectl to list the running pods. Then I'll use the delete pod command to delete the demoapp pod, and make sure a new one is created:
$ kubectl get pods
NAME                       READY     STATUS    RESTARTS   AGE
demoapp-2344086337-x6697   1/1       Running   0          11m
$ kubectl delete pod demoapp-2344086337-x6697
pod "demoapp-2344086337-x6697" deleted
$ kubectl get pods
NAME                       READY     STATUS        RESTARTS   AGE
demoapp-2344086337-c6qth   1/1       Running       0          1s
demoapp-2344086337-x6697   1/1       Terminating   0          11m
The new pod should have my new code, which I'll verify with curl once more:
$ curl http://192.168.64.3:32369
Edited code%
Interactive debugging
Getting new code into the container is fun, but what is even more useful is being able to interactively debug this new code. Python developers should be familiar with pdb, the Python debugger. This utility can be used to insert a breakpoint into the source code in order to interactively debug the code at that point in the execution. Pdb is fantastic when you are able to execute the code directly, but it requires being attached to the tty that started the Python process. That's a difficult feat inside of a system like k8s. Thankfully there is a wrapper around pdb built specifically for connecting to remote Python processes, called rpdb. Rpdb redirects the process's stdout/stdin to a socket handler, which can be accessed over TCP. This allows me to define a breakpoint in the code and then connect to the socket remotely in order to attach to the debugger and interact with the process. (One downside of rpdb is that it is not part of the standard library, so it has to be explicitly installed into the Python environment. I've done this in my Dockerfile.)
To debug my code, I first have to add the breakpoint to one of my source files. At the appropriate line, I need to insert import rpdb; rpdb.set_trace('0.0.0.0'). I specifically need to tell rpdb to listen on 0.0.0.0 instead of the default 127.0.0.1, which makes rpdb listen on all addresses rather than just localhost. This is less secure, but required in order to reach it through k8s. Once again, I'll edit the demoapp/__init__.py file on my laptop:
        self.end_headers()
        RESPONSE='''Edited code'''
        import rpdb; rpdb.set_trace('0.0.0.0')
        self.wfile.write(bytes(RESPONSE, 'UTF-8'))
I also need to update my k8s Service to expose the new port. Rpdb will listen on port 4444 by default, so I'll use that in my service. First I'll delete the existing demoapp service and then re-create it, adding a second port:
$ kubectl delete service demoapp
service "demoapp" deleted
$ kubectl create service nodeport demoapp --tcp=8000 --tcp=4444
service "demoapp" created
$ kubectl get services
NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)                         AGE
demoapp      10.0.0.145   <nodes>       8000:31661/TCP,4444:30611/TCP   0s
kubernetes   10.0.0.1     <none>        443/TCP                         1d
I can test that the new service works by pointing curl at the new port:
$ curl http://192.168.64.3:31661
Edited code%
Now that the code is edited and the service is created to forward ports through, I can restart the container. Once again I'll delete the running pod to trigger the creation of a replacement.
$ kubectl get pods
NAME                       READY     STATUS    RESTARTS   AGE
demoapp-2344086337-c6qth   1/1       Running   0          17m
$ kubectl delete pod demoapp-2344086337-c6qth
pod "demoapp-2344086337-c6qth" deleted
$ kubectl get pods
NAME                       READY     STATUS        RESTARTS   AGE
demoapp-2344086337-c6qth   1/1       Terminating   0          18m
demoapp-2344086337-mvh0g   1/1       Running       0          0s
Since my breakpoint is inside the do_GET function, I'll need to use curl to initiate a GET request. This will seem to hang in the terminal, as the breakpoint has been reached and execution is paused. At this point, I should be able to connect to the waiting debugger, using the Service information to determine which port to connect to. In another terminal I can use nc to connect to the debugger! From this point on, it's debugging as usual.
$ curl http://192.168.64.3:31661
_
---------------------------------------
$ nc 192.168.64.3 30611
> /demoapp/demoapp/__init__.py(11)do_GET()
-> self.wfile.write(bytes(RESPONSE, 'UTF-8'))
(Pdb) l
  6             self.send_response(200)
  7             self.send_header('Content-type', 'text/html')
  8             self.end_headers()
  9             RESPONSE='''Edited code'''
 10             import rpdb; rpdb.set_trace('0.0.0.0')
 11  ->         self.wfile.write(bytes(RESPONSE, 'UTF-8'))
 12
 13     def run(server_class=http.server.HTTPServer, handler_class=myHandlers):
 14         server_address = ('', 8000)
 15         myserver = server_class(server_address, handler_class)
 16         myserver.serve_forever()
(Pdb)
From here I can change the value of RESPONSE once more and then continue execution, which should cause my curl command to return with my new message:
$ curl http://192.168.64.3:31661
Live debugging sure is fun!%
----------------------------------
 16         myserver.serve_forever()
(Pdb) RESPONSE='''Live debugging sure is fun!'''
(Pdb) c
_
Conclusion
Kubernetes is a pretty huge leap forward in container orchestration. With that advancement comes some complexity, and a whole lot of new concepts to learn. However, the basic building blocks are there to continue using workflows that have been useful in the past. This workflow to debug code live is just a small example of what is possible with k8s, minikube, and containers in general.
I've added a complete demoapp-deployment.yaml file to the git repository, including a Service definition. Hopefully this example will be useful! As always, comment here or on Twitter should you have any thoughts to share.
Happy kubing!
tags: k8s - kubernetes - minikube - python - debugging