MLOps ZoomCamp 2022


MLOps Zoomcamp 2022 | Image credit to DataTalks.Club — Alexey Grigorev

This blog is part of a larger series that I will write as I progress through the MLOps ZoomCamp 2022 course. Future posts will link to the others so you can easily follow along.

The course covers the practical aspects of productionizing ML services, from collecting requirements to model deployment and monitoring. This post is mostly for my own reference, to remind me of the commands and configurations I work on during the course. However, if my notes help you out in any way, that is wonderful.

Links to the course video playlist and GitHub repository are at the very bottom of the post, under the resources & citation section.

Disclaimer: This is by no means production code or secure practice, and no expectations of confidentiality, integrity, or availability are made. This tutorial is simply a record of the steps I took, with a focus on the course content, not security. Further, there may be many gaps not covered here due to the focus on commands, process, and direct course content. For the full, detailed tutorial, visit the video playlist at the bottom and support DataTalks.Club directly!

This part begins with creating a new AWS instance. The first steps of our configuration are to create the instance and download its private key. Once the private key is downloaded, we can move it to our SSH folder. By default on a Mac the file permissions will be 644, which ssh will reject as overly permissive. To resolve this, we can set the permissions to 600 with the following commands:

mv ~/Downloads/macbook-awskey.pem ~/.ssh/macbook-awskey.pem
chmod 600 ~/.ssh/macbook-awskey.pem
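To confirm the permission change took effect before trying to connect, a small check like the one below can help. This is only a sketch: the helper name key_perms_ok is hypothetical, and the key path is assumed to match the file name used above.

```shell
# Succeeds only when the file mode is exactly 600 (owner read/write, nothing else).
key_perms_ok() {
  # The first 10 characters of `ls -l` are the file type and permission bits.
  perms="$(ls -l "$1" | cut -c1-10)"
  [ "$perms" = "-rw-------" ]
}

# Assumed key location from the step above; adjust to your own file name.
KEY="$HOME/.ssh/macbook-awskey.pem"
if [ -f "$KEY" ]; then
  if key_perms_ok "$KEY"; then
    echo "permissions OK"
  else
    echo "permissions too open; run: chmod 600 $KEY"
  fi
fi
```

Reading the mode from ls rather than stat keeps the check the same on macOS and Linux, since the two systems use different stat flags.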

The next step is to verify that we can connect to our AWS instance using our key pair. To do this, we use ssh with the -i flag and the path to our key. The default user name is ubuntu, and the instance's IP address is shown in the AWS console.

ssh -i ~/.ssh/macbook-awskey.pem ubuntu@YOUR_IP_HERE
Successfully connected to the instance | Image credit to author

This works; however, typing the long command is annoying, and this course runs for a few weeks. Let’s now configure SSH so that we can use a host name instead. First, if you don’t already have a configuration file, you can make one with the touch command. Any editor works, but I’ll use my favourite, vi. Remember, once in vi, the [i] key switches to insert mode, and [esc] followed by :wq writes the file to disk and quits.

touch ~/.ssh/config
vi ~/.ssh/config

Host mlops-zoomcamp
    HostName x.x.x.x
    User ubuntu
    IdentityFile ~/.ssh/macbook-awskey.pem
    StrictHostKeyChecking no

Note: The IP address in the HostName field will need to be updated each time the AWS instance is stopped and started again, because the instance receives a new public IP address.
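Since the address changes regularly, the update can be scripted instead of edited by hand. The function below is a sketch under assumptions: update_host_ip is a hypothetical helper name, and it expects the config to have one HostName line under each Host entry, as in the example above.

```shell
# Rewrite the HostName of one Host entry in an SSH config file.
# $1 = config file, $2 = Host alias, $3 = new IP address
update_host_ip() {
  tmp="$(mktemp)"
  awk -v host="$2" -v ip="$3" '
    # Track whether the current line belongs to the requested Host entry.
    $1 == "Host" { in_block = ($2 == host) }
    # Replace only the HostName value, keeping any leading indentation.
    in_block && $1 == "HostName" { sub(/HostName[ \t]+.*/, "HostName " ip) }
    { print }
  ' "$1" > "$tmp" && mv "$tmp" "$1"
}

# Example (not run here): update_host_ip "$HOME/.ssh/config" mlops-zoomcamp 203.0.113.10
```

Other Host entries in the same file are left untouched, because the replacement only applies while inside the matching entry.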

To test this connection, we should be able to simply execute ssh mlops-zoomcamp from here on out.

Successful bash welcome message for our new AWS instance using custom host name | Image credit to author

Because this system comes with Python 3.10.4, we can install Anaconda to manage all packages and downgrade our Python version to ensure compatibility with the course content.

wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
bash Anaconda3-2022.05-Linux-x86_64.sh

Next, we can install Docker and Docker Compose. Docker comes from apt, while the docker-compose binary is downloaded into a soft (software) folder and then made executable.

sudo apt update
sudo apt install docker.io
mkdir soft
cd soft
wget https://github.com/docker/compose/releases/download/v2.5.0/docker-compose-linux-x86_64 -O docker-compose
chmod +x docker-compose

Next, we want docker-compose to be accessible everywhere, so we can set up our .bashrc file to include the soft folder in the PATH. To do this, we edit the .bashrc file and add the export line below to the bottom. Afterwards we can exit vi and reload the .bashrc file with source. We can verify docker-compose works by executing the command outside of the ~/soft folder.

vi ~/.bashrc
export PATH="$HOME/soft:$PATH"
source ~/.bashrc
docker-compose
Executing which command to verify the path was set correctly in bashrc | Image credit to author
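If you would rather not open an editor, the PATH line can also be appended from the shell. The helper below is a sketch (append_once is a hypothetical name, not part of the course material) that avoids adding the same line twice when the setup is repeated.

```shell
# Append a line to a file only if that exact line is not already present.
# $1 = line to add, $2 = target file
append_once() {
  line="$1"
  file="$2"
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (not run here); single quotes keep $HOME and $PATH unexpanded
# until the shell reads .bashrc:
# append_once 'export PATH="$HOME/soft:$PATH"' "$HOME/.bashrc"
```

After running it, source ~/.bashrc is still needed for the current session to pick up the change.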

While we are at it, we can also verify docker itself by executing the hello-world container.

sudo docker run hello-world

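Afterwards we can add our user to the docker group so we don’t have to use sudo each time, for the sake of this tutorial. The new group membership only takes effect in a new login session, so log out and reconnect after running the command.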

sudo usermod -aG docker $USER
Executing docker without sudo verified we are successfully added to the group | Image credit to author
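If docker still asks for sudo, it is worth checking whether the current session has actually picked up the group. A minimal sketch, with in_group as a hypothetical helper name:

```shell
# Succeeds when the current session's user belongs to the named group.
in_group() {
  # `id -nG` prints all group names for the current user on one line.
  id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if in_group docker; then
  echo "docker group active"
else
  echo "docker group not active in this session; log out and back in"
fi
```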

At this point we deviate from the video a bit to set up an SSH key pair for GitHub. We need to generate an SSH key on this machine so we can push our homework back to GitHub. The steps to generate a new RSA key are below, followed by a cat of the public key so we can add it to GitHub.

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
ssh-add ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub

Once the public key is added to your GitHub profile, we can verify that it works by cloning the repository via its SSH URL.

git clone git@github.com:crowdere/mlops-zoomcamp.git
Successfully cloned the repo using our SSH key | Image credit to author

Next, we want to make a working directory for our notebooks and start a Jupyter Notebook server that we can reach from our local machine. This will be done using the same setup as the MLOps video, with the Remote SSH VS Code plugin.

cd ~
mkdir notebooks
cd notebooks
jupyter notebook

Now that this is running, we can download the plugin, connect to our AWS instance, and then expose the port via ssh tunnelling.

Remote SSH VSCode plugin | Image credit to author
Connecting via our SSH configuration host name | Image credit to author

We will also port forward the connection with our VS Code remote connection, as demonstrated in the MLOps video.

Port forwarding via SSH connection over VSCode remote plugin | Image credit to author
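For readers not using VS Code, the same forwarding can be sketched with plain ssh. This assumes Jupyter is listening on its default port 8888 on the instance and reuses the mlops-zoomcamp host alias from the SSH config; the tunnel_cmd helper is a hypothetical name, and none of this is shown in the video.

```shell
# Print the ssh command that forwards a local port to the same port on the instance.
# $1 = port (assumed default 8888), $2 = host alias (assumed mlops-zoomcamp)
tunnel_cmd() {
  port="${1:-8888}"
  host_alias="${2:-mlops-zoomcamp}"
  # -N: do not run a remote command; -L: forward local port to remote host:port.
  printf 'ssh -N -L %s:localhost:%s %s\n' "$port" "$port" "$host_alias"
}

tunnel_cmd
```

Run the printed command in a separate terminal on your local machine and leave it open for as long as you need the notebook.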

And finally, at this point all of our hard work is nearly complete. We can verify the port forwarding is configured correctly by opening the forwarded Jupyter Notebook address in a local browser and pasting in our token.

Successfully accessing Jupyter notebooks over internet | Image credit to author

At this point we have completed the first video, and to tie things up nicely we will also execute the only command from the optional video 1.3. This command installs our first dependency, pyarrow. Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to store, process, and move data fast [Apache 2022]. This library is used for reading the Parquet files for training our model in the upcoming blogs.

Installation of pyarrow dependency | Image credit to author


