Aman Mujeeb explores the A360 AI platform through the eyes of a new data scientist, and compares the ease of use of A360 AI to an industry heavyweight: Amazon SageMaker Studio. The following is an adaptation of Aman’s post on Medium about his comparative exploration, and has been edited with the author’s permission for content and clarity.
I am an Industrial Engineer from Penn State with one year of experience in data science. I have worked on numerous projects involving supervised learning, unsupervised learning, deep learning, and the TensorFlow machine learning library. As a new data scientist, I wanted to find a machine learning model development and deployment platform that is easy to use and requires less code and time to complete routine data science tasks. As part of my work, I decided to compare the A360 AI platform to another platform commonly used by experienced data scientists: Amazon SageMaker Studio.
Summary of Findings
In this blog post, I will share my experience building and deploying a machine learning model on both platforms and explain why I believe that A360 AI outperforms Amazon SageMaker Studio.
A360 AI’s seamless UI makes it easier to navigate and use than Amazon SageMaker Studio. The entire ML workflow becomes easier when a data scientist chooses to build their models in A360 AI. For instance, A360 AI logs models and saves model artifacts automatically, while SageMaker Studio requires users to perform these tasks manually. The greatest differences I noted were in the time and the amount of code required to complete routine data science experimentation tasks. In fact, deploying a model in A360 AI took only 20% of the time it took in SageMaker Studio.
Comparison Use Case
I used both SageMaker Studio and A360 AI to build and deploy a highway traffic classification model. Selected features are used to build a supervised learning model with scikit-learn that predicts whether a chosen route is congested with traffic (1) or open (0). The data set I used would be helpful for a logistics company looking to find the optimal route for its drivers.
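To give a sense of the setup, here is a minimal sketch of such a classifier. The feature names, synthetic data, and congestion rule below are hypothetical stand-ins for illustration, not the actual data set:

```python
# Hypothetical sketch of the traffic classifier described above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Stand-in features: e.g. hour of day, vehicle count, average speed (scaled)
X = rng.random((500, 3))
# Label: 1 = congested, 0 = open (synthetic rule, for illustration only)
y = (X[:, 1] > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))   # held-out accuracy
print(clf.feature_importances_)    # relative feature importance per column
```

The `feature_importances_` attribute of a fitted forest is what produces a relative-importance view like the plot shown here.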
The plot below shows the relative importance of a subset of the features from the data set I used in this exercise:
Relative Importance of Selected Features
Key Differences Between A360 AI and SageMaker Studio
In the following sections, I’ll delineate the key differences between SageMaker Studio and A360 AI throughout the model training and deployment lifecycle:
Launching the Workspace
DevOps and IT Support
Model Development, Hyperparameter Tuning, Logging, and Saving
Model Deployment
Model Monitoring
Snapshot View
Ease of Use
Workflow Time Spent
1. Launching the Workspace
Create the Notebook and Upload Data:
In this example, I used Jupyter Notebooks for model development. Before creating a notebook workspace in Amazon SageMaker Studio, one first needs to create an S3 bucket to store data. Data can be uploaded to S3 buckets via the AWS web portal.
Creating an S3 Bucket in AWS
Uploading Data to an S3 Bucket in AWS
After creating an S3 bucket for your project in the AWS web management portal, you can create a SageMaker instance by navigating to the Studio web page under the SageMaker control panel on AWS and selecting “Launch SageMaker Studio.”
Launching SageMaker Studio from the Amazon SageMaker Website
Clicking the “Launch” button will take you to a page to set up a SageMaker domain, with both “Quick” and “Standard” setup options. These options are where you can choose a JupyterLab notebook workspace. After setting up the SageMaker domain, you’ll be able to launch it from the control panel.
Configuring SageMaker Domain
Launching the SageMaker Domain from the Control Panel
In A360 AI, creating an online data repo and project workspace can be done together. Every A360 AI project has a default data repo that stores model artifacts, experiment tracking data, and other information. These data repos can also be used for reading and writing training data. Currently, A360 AI data repos are 1:1 with S3 buckets, although the platform abstracts away the underlying details. You can choose to create a new S3 bucket/data repo or specify an existing S3 bucket.
Creating a Project in A360 AI
Adding a Data Repo to a New Project in A360 AI
After creating a project, you can create a workspace and assign it to that project. A360 AI gives you a drop-down menu of available container images that you can use to provision a JupyterLab notebook instance. I chose to start a JupyterLab notebook with the TensorFlow ML library installed by default. The notebook environment also has SciPy, pandas, and other common data science libraries pre-installed. Of course, if I wanted to install other tools, I could do that from the Jupyter notebook using the !pip install command. When setting up a workspace, you can allocate CPU and memory for your notebook server with either preconfigured or custom options. I chose the preconfigured option with the smallest available compute cluster: 2 CPUs and 8 GB of memory.
Creating a Workspace in A360 AI
Uploading Data Using A360 AI
Like SageMaker Studio and AWS, A360 AI gives users the ability to upload data to an S3 bucket through a GUI. In this exercise, I wanted to explore uploading and loading data using command-line tools and notebook code. To load data in SageMaker Studio, users can use boto3 to interact with S3 in the notebook environment. A360 AI provides an MDK (model development kit) for users to interact with data repos/S3 buckets. In Amazon SageMaker Studio it took me 13 lines of code to load two CSV files (X and Y as shown below), whereas in A360 AI it took me only 7 lines of code.
A360 AI and SageMaker Studio both perform well in terms of launching the workspace but SageMaker Studio does require support from the DevOps team to get started, as I will explain in the next section.
2. DevOps and IT Support
While launching Amazon SageMaker Studio, certain permission levels and IAM access were needed, which are not easy for a data scientist to understand. If the AWS administrator and the data scientist at a company are separate people, they will need a lot of back-and-forth communication to ensure that permissions and access are correctly configured. Because of the seamless provisioning provided to data scientist users on A360 AI, I did not face any such issue while creating or launching the workspace.
Permission Error in AWS
3. Model Development, Hyperparameter Tuning, Logging, and Saving
SageMaker Studio requires training data to be prepared in a specific way, which means going through the documentation to learn the AWS CLI and boto3. SageMaker Studio has extensive model support, so data scientists can use it to build machine learning models and provide business insights.
A360 AI does not require users to know a CLI or boto3; one just needs to create a project with a data repo that automatically connects to cloud storage (such as S3) from the UI, without any credential management. The whole procedure is quite easy, and the documentation for it is fairly short and simple.
A360 AI provides example code for defining hyperparameters and supports different forms of machine learning and deep learning classifiers out-of-the-box and through import. A360 AI also tracks hyperparameter experiments, tracks training runs, and provides logs to access model and system performance during training.
A360 AI is easier than SageMaker Studio in this respect, as users do not need to specify the model’s performance metrics each time they want to log them.
Using SageMaker Studio, it was really hard for me to keep track of all the models I built, whereas using A360 I was able to save and log all of my models with hyperparameters using only a few lines of code by utilizing the A360 MDK.
Training a Model and Logging Experiments and Runs in A360 AI
Training a Model and Logging Experiments and Runs in SageMaker Studio
Tracking experimentation in SageMaker Studio was similar to A360 AI in a few respects, but I was not able to log my desired metrics in a neatly formatted table because I could not find documentation for it online. A360 AI was easier to use than SageMaker Studio for creating experiments and logging data.
The documentation for A360 AI is easy to understand, and the use cases provided in the example repository on GitHub (https://github.com/andromeda360/a360-example-repo) are extremely helpful. SageMaker Studio has barely any examples of logging experiments, its documentation is quite lengthy, and logging requires using the YAML language, which most data scientists are not familiar with.
Saving the Model:
I encountered an issue saving the model automatically on SageMaker Studio, so I had to save it manually. The following code was needed in SageMaker Studio to save the model and upload it to the desired S3 bucket, whereas this is done automatically in A360 AI.
Saving a Model in SageMaker Studio
4. Model Deployment
To deploy a model in SageMaker Studio, you first need to convert it to the joblib or pickle format. After conversion, you use boto3 to upload it to AWS. You also need to prepare a prediction (entry) script to run inference with the model, and a second script to configure the deployment of the joblib/pickled model to a serving instance.
A360 AI allows data scientists to directly access the model through the dashboard UI and deploy the model with a few clicks. All the user needs to do is click “package,” then publish the model with an intuitive UI. A360 AI also encourages the peer-review process for model deployment. Sending model code for review before deployment allows teams to ensure the model they are putting into the production environment has gone through a proper QA process.
Once the model is submitted for review, the reviewer gets a notification to check the model and approve it for deployment. The reviewer does not need to go to the workspace to look for the notebooks, they are rendered as a snapshot view on the UI (see more in the snapshot view section below).
A360 AI is far better than SageMaker Studio in terms of deployment, as this can be done in a few clicks.
The first script below was used in Amazon SageMaker Studio for deployment in AWS. The second script was used to run inference with that deployed model.
Deployment of an Inferencing Model to AWS in SageMaker Studio
Inference Script in AWS
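For reference, SageMaker’s scikit-learn serving container looks for hook functions such as `model_fn` and `predict_fn` in the entry script. A minimal sketch of that pattern (not the exact script used here):

```python
# Minimal SageMaker scikit-learn entry-script sketch. The serving container
# calls model_fn once at startup and predict_fn for each incoming request.
import os

import joblib

def model_fn(model_dir):
    """Load the serialized model from the directory SageMaker provides."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))

def predict_fn(input_data, model):
    """Run inference on the deserialized request payload."""
    return model.predict(input_data)
```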
In A360 AI, once the model is saved and the experiment is created, users can publish the chosen model and send it for one final review through the UI. A requirements text file is used to ensure that the proper dependencies are installed in the container with the deployed model.
Model Console in A360 AI
Packaging a Model in A360 AI
Requirements for Inferencing with Deployed Model
Making predictions with the model requires a simple prediction script.
A360 AI Prediction Script
Deploying a model is also performed through the UI in A360 AI. In this case, I deployed my model to a Kubernetes cluster on AWS.
Deploying a Model in A360 AI
Running inference – or making predictions – can be done using a simple REST command pointed at the specific API endpoint for that deployed model. I tested the deployment and endpoint from a Jupyter notebook in A360 AI.
Testing the Endpoint in A360 AI from a Jupyter Notebook
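The test call itself is a plain HTTP request. A sketch of one (the endpoint URL and payload shape here are hypothetical and depend on the actual deployment):

```python
import requests

def query_endpoint(url: str, rows: list) -> list:
    """POST feature rows to a model endpoint and return its JSON response."""
    resp = requests.post(url, json={"data": rows}, timeout=10)
    resp.raise_for_status()
    return resp.json()

# Example call (hypothetical endpoint and feature values):
# preds = query_endpoint("https://<endpoint-host>/predict", [[7, 120, 35.5]])
```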
5. Model Monitoring

A360 AI provides an integrated monitoring dashboard for users to monitor cloud endpoint resource usage, availability, and hit frequency, as well as data and concept drift. Amazon SageMaker Studio requires extra code to log performance metrics and does not provide a dashboard.
Resource Usage Monitoring in A360 AI
Availability Monitoring in A360 AI
Hit Frequency Monitoring in A360 AI
A360 AI’s neat UI lets data scientists monitor their models without code. Amazon SageMaker Studio requires more than 300 lines of code to monitor model deployments, which can be difficult to understand and time-consuming.
A360 AI is designed not only for data scientists but also for machine learning engineers, allowing them to easily deploy models and monitor infrastructure. SageMaker Studio is primarily designed for data scientists to develop models and lacks the same infrastructure and deployment support.
SageMaker Notebook Code for Batch Visualization of Performance
SageMaker Monitoring Script
6. Snapshot View
A360 AI also allows users to collaborate with their team and makes it easy for the team to get a snapshot view of different stages of the ML workflow. The snapshot view provides a model overview, model artifacts, and model logs, all in one place.
Snapshot View in A360 AI
7. Ease of Use
SageMaker Studio requires following a specific procedure for communicating with S3 in order to load data, update data, and deploy a model. This also requires setting up boto3. Although clear documentation with use cases is provided for SageMaker, a new user could be intimidated by the number of new procedures used in model workflows and the number of services provided by Amazon.
A360 AI has a seamless procedure for building, logging, and deploying a model. A360 AI does not require extensive knowledge of AWS and makes a data scientist’s job much easier. Also, A360 AI’s documentation is well organized, and deploying a model does not require extensive modifications. The welcome dashboard in A360 AI also shows all of the tasks a user can perform in an intuitive and simple layout. In the image below, the dashboard menu mirrors the sidebar panes under the Data Science console, with Projects, Notebook Servers (Workspaces), Models, Data, and Monitoring.
A360 AI Main Dashboard
A360 AI outperforms Amazon SageMaker Studio in terms of usability, model development, and deployment. The entire A360 AI platform is streamlined, simpler than SageMaker, and more intuitive.
8. Workflow Time Spent
Hours Spent on Example Workflow in Each Platform
Working with A360 AI saves data scientists a significant amount of time building and deploying models compared to Amazon SageMaker Studio. A360 AI’s MDK helped me build and deploy the model in this example five times faster than Amazon SageMaker Studio due to its ease of use. It took me around 3 hours to train and deploy the model on A360 AI, whereas the same procedure took me 15 hours on SageMaker Studio.
The entire model development and deployment process on A360 AI required 300 lines of code compared to 700 lines of code on Amazon SageMaker Studio.
Lines of Code Required for Example Workflow
My personal preference is Andromeda 360’s A360 AI interface over Amazon’s SageMaker Studio.
A360 AI helps data scientists focus more on developing and tuning the model rather than spending time on deployment. A360 AI also reduces costs by requiring less IT and DevOps support.
Moreover, A360 AI gives data scientists confidence that their model is deployable in the first place, so it can be tested and fixed quickly if any issue is found.
That said, if one is already an AWS user, they do not need to transition to another platform to build and deploy their machine learning models. Instead, they can simply head over to SageMaker Studio.