on 18 August 2021

How to run Apache Spark on MicroK8s and Ubuntu Core, in the cloud: Part 2

If you followed Part 1 of this blog post, you’ll have a working setup that runs MicroK8s on Ubuntu Core in a VM on your local workstation using Multipass. But you’re itching to get this up and running in the cloud. I know, so am I! So let’s step through that now. Currently, this is known to work on Google Compute Engine (GCE), so we’ll run it on Google Cloud Platform (GCP). In the future, you will be able to run Ubuntu Core on all the major clouds. If you don’t already have a GCP account, run and make yourself one and come back here as quickly as you can!

Fresh bake: building an Ubuntu Core VM image

For the first step, we’ll need a freshly built OS image of Ubuntu Core. We can use QEMU for this. By default, your Ubuntu Core systems are linked to your Ubuntu ONE account, so that you and only you can log in.

Just be sure that you have uploaded your SSH public key to your Ubuntu ONE account profile before you start, or you could have some trouble logging into your new VM. This time we’ll use Ubuntu Core 20. The commands below assume that you have already downloaded the compressed image, ubuntu-core-20-amd64.img.xz, into the current directory:


sudo apt install xz-utils qemu
unxz ubuntu-core-20-amd64.img.xz

qemu-img resize -f raw ubuntu-core-20-amd64.img 60G

qemu-system-x86_64 -smp 2 -m 2048 -net nic,model=virtio -net user,hostfwd=tcp::8022-:22,hostfwd=tcp::8090-:80 -vga qxl -drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=ubuntu-core-20-amd64.img,cache=none,format=raw,id=disk1,if=none -device virtio-blk-pci,drive=disk1,bootindex=1 -machine accel=kvm
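That QEMU command line is dense, so here is a minimal sketch that builds the same options as a bash array with one comment per option. The function name qemu_args and its two parameters (the image path and the host port forwarded to SSH) are illustrative and not part of the original setup. The firmware path is the one used above; on Ubuntu it is normally provided by the ovmf package, which is an assumption about your workstation.

```shell
#!/usr/bin/env bash
# Sketch only: prints the QEMU options used above, one per line.
# qemu_args and its parameters are hypothetical helper names.

qemu_args() {
  local image="$1"            # path to the raw Ubuntu Core image
  local ssh_port="${2:-8022}" # host port forwarded to the guest's SSH port
  local args=(
    -smp 2                    # two virtual CPUs
    -m 2048                   # 2048 MiB of RAM
    -net nic,model=virtio     # virtio network card in the guest
    # user-mode networking; host ports forward to guest ports 22 and 80
    -net "user,hostfwd=tcp::${ssh_port}-:22,hostfwd=tcp::8090-:80"
    -vga qxl                  # QXL display adapter for the console window
    # UEFI firmware (OVMF), attached read-only as flash
    -drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on
    # the Ubuntu Core disk, exposed as a virtio block device and booted first
    -drive "file=${image},cache=none,format=raw,id=disk1,if=none"
    -device virtio-blk-pci,drive=disk1,bootindex=1
    -machine accel=kvm        # use KVM hardware acceleration
  )
  printf '%s\n' "${args[@]}"
}
```

To launch the VM from it, read the lines back into an array with mapfile -t args < <(qemu_args ubuntu-core-20-amd64.img) and then run qemu-system-x86_64 "${args[@]}".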

You’ll need to follow the onscreen instructions in the QEMU console window to associate the image with your Ubuntu ONE account.

Once done, verify that the logins work by going back to your terminal and connecting to the VM via SSH:

ssh <your Ubuntu ONE username>@localhost -p 8022
sudo snap install lxd

If all has gone well, you should have been able to log in and install LXD. If so, great: you can now shut down and power off the VM. To convert the VM image we just built to VHDX format for import into GCP, use the following commands:

sudo apt install qemu-utils

qemu-img convert ubuntu-core-20-amd64.img -O vhdx -o subformat=dynamic ubuntu-core-20-amd64.vhdx
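Before uploading anything, it may be worth confirming that the conversion produced what you expect. The sketch below reads the human-readable output of qemu-img info and checks its "file format:" line. The helper names image_format and require_format are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: helpers that inspect the text printed by `qemu-img info`.
# Both function names are hypothetical.

# Reads `qemu-img info` output on stdin and prints the reported file format.
image_format() {
  awk -F': ' '/^file format:/ { print $2; exit }'
}

# Reads the same output on stdin; fails unless the format matches $1.
require_format() {
  local want="$1" got
  got="$(image_format)"
  if [ "$got" != "$want" ]; then
    echo "expected format ${want}, got ${got:-unknown}" >&2
    return 1
  fi
}
```

A possible use is qemu-img info ubuntu-core-20-amd64.vhdx | require_format vhdx, which returns a non-zero status if the file is not reported as VHDX.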

G-force! Moving to the cloud

We should be able to import the VM image to GCP now! Use the following commands:

curl -O

tar xzf google-cloud-sdk-352.0.0-linux-x86_64.tar.gz

gcloud auth login
gcloud config set project <YOUR_PROJECT>

gsutil cp ubuntu-core-20-amd64.vhdx gs://<YOUR_BUCKET>/

gcloud compute images import ubuntu-core-20 --data-disk --source-file=gs://<YOUR_BUCKET>/ubuntu-core-20-amd64.vhdx 

Now, we need to enable secure boot for the image, or the Ubuntu Core 20 subsystem won’t play with us. Enabling secure boot also means that the system will run with full disk encryption. So the first step is to create another image based on the one we just imported, but with the UEFI_COMPATIBLE flag turned on:

gcloud compute images create ubuntu-core-20-secureboot --source-image=ubuntu-core-20 --guest-os-features="UEFI_COMPATIBLE"

Alrighty, it is time. Are you ready? Run the following command to launch that Ubuntu Core VM on the cloud. The command enables nested virtualisation, and it’s going to launch an 8-core, 32GB-RAM N2-series instance with secure boot and a second 60GB block device for LXD. (We need an N2-series because these instances support nested virtualisation; but if you want to take a lower or higher instance spec, be my guest!)

gcloud beta compute instances create ubuntu-core-20 --zone=europe-west1-b --machine-type=n2-standard-8 --network-interface network=default --network-tier=PREMIUM --maintenance-policy=MIGRATE --service-account=<YOUR_SERVICE_ACCOUNT> --scopes=,,,,, --min-cpu-platform="Intel Cascade Lake" --image=ubuntu-core-20-secureboot --boot-disk-size=60GB --boot-disk-type=pd-balanced --boot-disk-device-name=ubuntu-core-20 --shielded-secure-boot --no-shielded-vtpm --no-shielded-integrity-monitoring --reservation-affinity=any --enable-nested-virtualization --create-disk=size=60,mode=rw,auto-delete=yes,name=storage-disk,device-name=storage-disk
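Once you are logged in to the new instance (the SSH step comes next), you may want to confirm that nested virtualisation really is available before LXD tries to start a VM. A common check is to look for the vmx (Intel) or svm (AMD) CPU flag in /proc/cpuinfo. The helper name virt_cpu_count below is made up for this sketch.

```shell
#!/usr/bin/env bash
# Sketch only: counts the lines of a cpuinfo file that contain the vmx or svm
# flag (one "flags" line per logical CPU). virt_cpu_count is a hypothetical name.

virt_cpu_count() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -Ecw 'vmx|svm' "$cpuinfo" || true
}
```

Running virt_cpu_count on the instance should print a number greater than zero if hardware virtualisation is exposed to the guest; 0 suggests that nested virtualisation is not enabled.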

Once the instance is up and running, we can ssh to it in the cloud using the keypair we associated with our Ubuntu ONE account, as before. The commands after the ssh line are run inside the instance:

GCE_IIP=$(gcloud compute instances list | grep ubuntu-core-20 | awk '{ print $5 }')
ssh <your Ubuntu ONE username>@$GCE_IIP
sudo lxd init --auto --storage-create-device=/dev/sdb --storage-backend=zfs
sudo lxc init ubuntu:focal microk8s --vm -c limits.memory=28GB -c limits.cpu=7
sudo lxc config device override microk8s root size=40GB
sudo lxc start microk8s
sleep 90 # give the instance time to boot
sudo lxc exec microk8s -- sudo snap install microk8s --classic
sudo lxc exec microk8s -- sudo microk8s enable storage registry dns ingress
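The fixed sleep 90 above is a guess at how long the LXD VM takes to boot. As an alternative, here is a minimal sketch of a retry helper that polls a command until it succeeds. The name wait_for and its arguments are illustrative and not part of LXD or MicroK8s.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for ATTEMPTS DELAY CMD...
# Runs CMD until it succeeds, at most ATTEMPTS times, sleeping DELAY seconds
# between tries. Returns 0 on success and 1 if every attempt failed.

wait_for() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, wait_for 30 5 sudo lxc exec microk8s -- true could stand in for the sleep, on the assumption that lxc exec keeps failing until the agent inside the VM is ready to accept commands.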

Well, look at that! You’ve got LXD and MicroK8s on board your shiny new Ubuntu Core cloud server, all nested and virtualised. Now that you’ve done this, you can head over to Part 3. See you there!
