This blog is going to be a part of a larger series as I progress through the MLOps ZoomCamp 2022 course. Future blogs will include links to the others so you can easily follow along.
The course aims to teach the practical aspects of productionizing ML services, from collecting requirements to model deployment and monitoring. This post is mostly for my own reference, to remind me of any commands or configurations I work on during the course. However, if my notes help you out in any way, that is wonderful.
All resources, including the course video playlist and GitHub repository, are at the very bottom of the post under the resources & citation section.
Disclaimer: This is by no means production code or secure practice, and no guarantees of confidentiality, integrity, or availability are made. This tutorial is simply the steps I took, with a focus on the course content, not security. Further, there may be many gaps not covered here due to the focus on commands, process, and direct course content. For the full, detailed tutorial, visit the video playlist at the bottom and support DataTalks.Club directly!
This part begins with creating a new AWS instance. As part of that setup we create the instance and download its private key. Once the private key is downloaded, we can move it to our SSH folder. By default on Mac the file permissions will be 644, which ssh will reject as too permissive. To resolve this we can set the permissions to 600 with the following commands:
mv ~/Downloads/macbook-awskey.pem ~/.ssh/macbook-awskey.pem
chmod 600 ~/.ssh/macbook-awskey.pem
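To confirm the change took effect, the permission string can be read back with ls -l. This is a minimal sketch, assuming a POSIX shell; the helper names key_perms and secure_key are hypothetical and exist only for illustration.

```shell
# key_perms: print the permission string (e.g. -rw-------) for a file.
# Hypothetical helper; ls -l is used because it behaves the same on
# macOS and Linux.
key_perms() {
    ls -l "$1" | cut -c1-10
}

# secure_key: restrict a private key to owner read/write, then report
# the resulting permission string.
secure_key() {
    chmod 600 "$1" && key_perms "$1"
}
```

Running secure_key ~/.ssh/macbook-awskey.pem should report -rw-------, which is the mode ssh expects for a private key.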
The next step is to verify that we can connect to our AWS instance using our key pair. To do this we use ssh with the -i flag and the path to our key. The default user name is ubuntu, and the instance’s public IP address is shown in the AWS console.
ssh -i ~/.ssh/macbook-awskey.pem ubuntu@YOUR_IP_HERE
This works, but typing the long command gets annoying, and this course runs for a few weeks. Let’s configure SSH so that we can use a host alias instead. First, if you don’t already have a configuration file, create one with the touch command. Any editor works, but I’ll use my favourite, vi. Remember, once in vi, the [i] key switches to insert mode, and [esc] followed by :wq writes the file to disk and quits vi.
touch ~/.ssh/config
vi ~/.ssh/config
Add the following entry to the file:
Host mlops-zoomcamp
    HostName YOUR_IP_HERE
    User ubuntu
    IdentityFile /Users/YOUR_USERNAME/.ssh/macbook-awskey.pem
Note: The IP address in the HostName field above will need to be updated every time the AWS instance is restarted.
To test this connection, we should be able to simply execute ssh mlops-zoomcamp from here on out.
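Since the instance gets a new public IP after each restart, editing the config by hand gets repetitive. The sketch below is one way to script that update. It assumes the entry is written with the literal keywords Host and HostName separated by whitespace; the function name update_host_ip is hypothetical.

```shell
# update_host_ip: rewrite the HostName value inside one Host block of an
# ssh config file. Hypothetical helper; assumes "Host" and "HostName" are
# spelled exactly like that and separated from their values by whitespace.
update_host_ip() {
    config_file="$1"
    host_alias="$2"
    new_ip="$3"
    tmp_file="$(mktemp)"
    awk -v host="$host_alias" -v ip="$new_ip" '
        # Track whether we are inside the block for the requested alias.
        $1 == "Host" { in_block = ($2 == host) }
        # Replace only the HostName line of that block, keeping indentation.
        in_block && $1 == "HostName" { sub(/HostName[ \t]+.*/, "HostName " ip) }
        { print }
    ' "$config_file" > "$tmp_file" && mv "$tmp_file" "$config_file"
}
```

For example, update_host_ip ~/.ssh/config mlops-zoomcamp 203.0.113.10 would point the alias at a new address (203.0.113.10 is a documentation placeholder, not a real instance) while leaving other Host entries untouched.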
This system comes with Python 3.10.4, so we install Anaconda to manage all packages and to get an older Python version that is compatible with the course content.
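The install commands are not captured in these notes, so the following is only a sketch of how the Anaconda installer is typically fetched and run. The installer file name is an assumption; check the Anaconda archive and pick the release whose bundled Python matches what the course expects. The function name install_anaconda is hypothetical.

```shell
# Assumption: this installer release is illustrative only; substitute the
# one you actually want from the Anaconda archive.
ANACONDA_INSTALLER="Anaconda3-2022.05-Linux-x86_64.sh"

# install_anaconda: download the installer and run it interactively.
# Hypothetical wrapper; nothing is downloaded until it is called.
install_anaconda() {
    wget "https://repo.anaconda.com/archive/${ANACONDA_INSTALLER}" &&
    bash "${ANACONDA_INSTALLER}"
}
```

After calling install_anaconda and opening a new shell, which python and python --version should show the Anaconda interpreter rather than the system one.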
Next, we can install Docker, then download Docker Compose into a software folder named soft and make the docker-compose binary executable.
sudo apt update
sudo apt install docker.io
mkdir -p ~/soft
cd ~/soft
wget https://github.com/docker/compose/releases/download/v2.5.0/docker-compose-linux-x86_64 -O docker-compose
chmod +x docker-compose
Next, we want docker-compose to be accessible from everywhere, so we can set up our .bashrc dotfile to include the soft folder in the PATH. To do this, we edit the .bashrc file and add the path at the bottom. Afterwards we can exit vi and reload the .bashrc file with source. We can verify docker-compose works by executing the command from outside of the soft folder.
vi ~/.bashrc
source ~/.bashrc
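The exact line added to .bashrc is not shown above. As a sketch, assuming the binary lives in a soft folder under the home directory, the PATH entry could be appended from the shell instead of through vi. The helper name add_to_path_once is hypothetical; it skips the append when the same line is already present, so running it twice is harmless.

```shell
# add_to_path_once: append a PATH export for a directory to an rc file,
# but only if that exact line is not already there.
# Hypothetical helper; the rc file defaults to ~/.bashrc.
add_to_path_once() {
    dir="$1"
    rc_file="${2:-$HOME/.bashrc}"
    # The escaped \${PATH} is written literally so it expands at login time.
    line="export PATH=\"${dir}:\${PATH}\""
    grep -qxF "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
}
```

Calling add_to_path_once "$HOME/soft" followed by source ~/.bashrc should make which docker-compose resolve from any directory.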
While we are at it, we can also verify docker itself by executing the hello-world container.
sudo docker run hello-world
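Afterwards we can add our user to the docker group so we don’t have to prefix every docker command with sudo for the rest of this tutorial. The group change only applies to new login sessions, so log out and back in afterwards.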
sudo usermod -aG docker $USER
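Group membership is read when a session starts, so the current shell will not see the docker group right away. A small check like the one below, with a hypothetical helper name in_group, shows whether the current session has picked it up.

```shell
# in_group: succeed if the current session's user belongs to the named group.
# Hypothetical helper built on id -nG, which lists group names.
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF "$1"
}

# Report whether docker can be used without sudo in this session.
if in_group docker; then
    echo "docker group active: sudo is no longer needed"
else
    echo "docker group not active yet: log out and back in, or run 'newgrp docker'"
fi
```

Once the check reports the group as active, docker run hello-world should work without sudo.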
At this point we deviate from the video a bit to configure an SSH key pair for git. We need to generate an SSH key on this machine so we can push our homework back to GitHub. The command to generate a new RSA key is below, followed by a cat of the public key so we can add it to GitHub.
ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
cat ~/.ssh/id_rsa.pub
Once the key is added to your GitHub profile, we can verify the process worked by cloning the repository over SSH using our new key.
git clone git@github.com:crowdere/mlops-zoomcamp.git
Next, we want to make a working directory for our notebooks and set up Jupyter Notebook so we can reach its web interface. This will be done using the same setup as the MLOps video, with the Remote SSH plugin for VSCode.
Now that this is running, we can download the plugin, connect to our AWS instance, and then expose the port via ssh tunnelling.
We will also port forward the connection with our VSCode remote connection, as demonstrated in the MLOps video.
And finally, at this point all of our hard work is nearly complete. We can verify the port forwarding is configured correctly by visiting the Jupyter Notebook server in a local browser and pasting our token in.
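For anyone not using VSCode, the same forwarding can be done with plain ssh. The sketch below only prints the command so it can be reviewed before being pasted into a local terminal. The defaults are assumptions: the mlops-zoomcamp host alias from the SSH config and port 8888, which is Jupyter's default. The function name jupyter_tunnel_cmd is hypothetical.

```shell
# jupyter_tunnel_cmd: print the ssh command that forwards a local port to
# the same port on the instance. Hypothetical helper; it prints rather than
# runs the command so nothing connects until you choose to.
jupyter_tunnel_cmd() {
    host_alias="${1:-mlops-zoomcamp}"
    port="${2:-8888}"
    # -N: do not run a remote command; -L: local port forwarding
    printf 'ssh -N -L %s:localhost:%s %s\n' "$port" "$port" "$host_alias"
}

jupyter_tunnel_cmd
# prints: ssh -N -L 8888:localhost:8888 mlops-zoomcamp
```

With that command running on the local machine, the notebook server should be reachable at localhost on the forwarded port.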
At this point we have completed the first video, and to tie things up nicely we will also execute the only command from the optional video 1.3. This command installs our first dependency, pyarrow. Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to store, process and move data fast [Apache 2022]. This library is used for reading in the Parquet files for training our model in the upcoming blogs.
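The install command itself is not reproduced above. As a sketch, assuming pip is used and python3 points at the Anaconda interpreter, the install can be made conditional so it only runs when pyarrow is missing. The helper names has_module and ensure_pyarrow are hypothetical.

```shell
# has_module: succeed if the given Python module can be imported.
# Hypothetical helper; assumes python3 is the interpreter in use.
has_module() {
    python3 -c "import $1" 2>/dev/null
}

# ensure_pyarrow: install pyarrow with pip only when it is not importable.
ensure_pyarrow() {
    has_module pyarrow || pip install pyarrow
}
```

Calling ensure_pyarrow once on the instance should leave has_module pyarrow succeeding, which is enough for pandas to read Parquet files.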