This topic is very timely as the 2022 World Cup is happening now in Qatar! This data science use case was done by Ayesha Farheen, Mohammed Khabab, and Umme Salma.
Introduction
“The Power of A360 AI” is a series of articles written by Andromeda 360, Inc. data scientists where we showcase machine learning-powered business applications we have built for our clients to improve business outcomes. A360 AI is an open AI delivery platform that enables enterprises to build and deploy machine learning (ML) models securely into production within minutes.
Read this post to find out why A360 AI capabilities are more effective than most cloud-specific AI platforms such as AWS SageMaker. A360 AI Community Edition is freely available to data scientists. You can sign up here and try these sample use cases yourself.
Business Use Case
Football clubs spend massive amounts of money every year buying professional players during the transfer window, and predicting a player’s value in the transfer market is one of the more difficult tasks for club managers. This project aims to accurately predict the market values of FIFA players. The predictions can serve as a baseline that simplifies negotiation and estimates a player’s market value in an objective, quantitative way.
Dataset
The dataset used here was scraped from sofifa and contains information on 18,179 players. There are around 74 features, including height, weight, age, preferred foot, skill moves, skill ratings, and international reputation. Each skill rating is subdivided into domains, which are scored from 0 to 100, and the skill rating itself is obtained by taking the mean of its domain scores. The skill ratings and their respective domains are:
- Ball Skills: Ball Control, Dribbling
- Passing: Crossing, Short Pass, Long Pass
- Defense: Marking, Slide Tackle, Stand Tackle
- Mental: Aggression, Reactions, Attack Position, Interceptions, Vision, Composure
- Physical: Acceleration, Stamina, Balance, Sprint Speed, Agility, Jumping
- Shooting: Heading, Shot Power, Finishing, Long Shots, Curve, Free Kick, Accuracy, Penalties, Volleys
- Goalkeeping: Positioning, Diving, Handling, Kicking, Reflexes
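As a minimal sketch of how a skill rating relates to its domains, the snippet below averages the domain scores for three of the skills listed above. The example player's scores are made-up illustrative values, not rows from the sofifa dataset, and the function name skill_ratings is hypothetical.

```python
from statistics import mean

# Subset of the skill -> domain mapping listed above.
SKILL_DOMAINS = {
    "Ball Skills": ["Ball Control", "Dribbling"],
    "Passing": ["Crossing", "Short Pass", "Long Pass"],
    "Defense": ["Marking", "Slide Tackle", "Stand Tackle"],
}


def skill_ratings(domain_scores):
    """Return each skill rating as the mean of its domain scores (0 to 100)."""
    return {
        skill: mean(domain_scores[domain] for domain in domains)
        for skill, domains in SKILL_DOMAINS.items()
    }


# Illustrative scores for one hypothetical player.
player = {
    "Ball Control": 90, "Dribbling": 88,
    "Crossing": 70, "Short Pass": 80, "Long Pass": 75,
    "Marking": 30, "Slide Tackle": 25, "Stand Tackle": 35,
}

ratings = skill_ratings(player)
print(ratings)
```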
Workflow
In this project, we followed the standard data science workflow:
- Exploratory Data Analysis (EDA)
- Data preprocessing and feature engineering
- Model training, experiments, and evaluation
- Model deployment
- Inference
- Monitoring
Note that the notebooks for each step above have been uploaded to the A360 AI example GitHub repository.
1. Exploratory data analysis
Since the business value and goals are clear, the first step is to clean the dataset, generate visualizations from it, and see what insights we can find. The plots below were generated after cleaning.

Figure 1
(1) A player’s age is an important indicator of market value, as it reflects both experience and ability. We used age to estimate market value, bearing in mind that players’ values usually increase until their mid-20s and decrease thereafter. The plot shows that most players fall in the 18-22 age range, and that a younger player with a few years of experience costs more than an older one. A player who is close to retirement has fewer years left to keep performing, so market value declines as players get older.

Figure 2
(2) Market value is determined not only by a player’s talent but also by popularity and “superstar status”. In other words, a player’s market value also depends on crowd-pulling power, independent of how good the player is: a player’s image off the pitch influences the number of jerseys sold and the money earned from image rights. Figure 2 shows that players with a 5-star or 4-star international reputation are valued much higher than players with a lower reputation. An athlete’s popularity has commercial value, which is particularly important for a club. Even though players like Messi, Ronaldo and Ibrahimovic are getting closer to retirement, their brand value is still very high because they have built up international status over their careers. Their faces are widely recognized, which gives them extra leverage when negotiating sponsorship deals with famous commercial brands.

Figure 3
(3) Player position is an important factor in determining market value: whether a player is a goalkeeper, defender, midfielder, or forward matters when estimating it. Position affects salaries and transfer fees because it reflects a player’s degree of specialization and ability to attract fans. Attackers receive more attention and higher rewards than goalkeepers, as they are more visible to the crowd and therefore have a greater capacity to draw it.

Figure 4
(4) Figure 4 plots value against players’ overall rating. Value increases with overall rating, but many players have similar overall ratings. In such cases we look at age: among players with similar overall ratings, the younger ones are valued higher.

Figure 5
(5) Figure 5 shows the five highest valued players in the dataset.
2. Data Preprocessing
The EDA shows how a player’s market value rises or falls with the features discussed in the Exploratory Data Analysis section. Next, we encode the categorical columns so that all features are numerical. Since the dataset has 70+ features, the main obstacle is finding which of them best describe a player’s value. For that we used a feature importance technique to reduce the dimensionality, keeping only the most important features.
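The article does not say which feature importance technique was used, so the following is only a sketch under assumptions: it encodes one categorical column as integers and ranks features by the impurity-based importances of a scikit-learn RandomForestRegressor. The data is synthetic, the column names are illustrative, and the target is constructed to depend only on overall rating and age.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the player table (not the sofifa data).
overall = rng.uniform(50, 95, n)
age = rng.uniform(17, 40, n)
foot = rng.choice(["Left", "Right"], n)
noise_a = rng.normal(size=n)
noise_b = rng.normal(size=n)

# Encode the categorical column as 0/1 so every feature is numerical.
foot_encoded = (foot == "Right").astype(int)

feature_names = ["overall", "age", "preferred_foot", "noise_a", "noise_b"]
X = np.column_stack([overall, age, foot_encoded, noise_a, noise_b])

# Assumed target: value driven by overall rating and age only.
y = 2.0 * overall - 1.0 * age + rng.normal(scale=0.5, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Rank features by importance and keep the top two.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top_features = [name for name, _ in ranked[:2]]
print(top_features)
```

On real data the cutoff (here, the top two) is a judgment call; a common alternative is to keep features until their cumulative importance passes a chosen threshold.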
Highlight of A360 AI functionality
On the A360 AI platform, we provide an easy-to-use API called MDK (model development kit) that connects cloud storage with the working Jupyter Lab environment. In SageMaker, you would have to use the Python package boto3 and write at least 10-20 lines of code to download or upload your data from or to an S3 bucket. With the A360 MDK API (a360ai), you can load a CSV file directly from S3 and write the feature-engineered dataframe back to S3 with just one line of code, such as a360ai.load_dataset and a360ai.write_data.
3. Model training
After preprocessing, we split the data into training and test sets in an 80:20 ratio and applied various ML algorithms: Decision Tree, Random Forest, XGBoost, and the AutoML tool AutoKeras. Among these, Random Forest performed best, with an R2 score of 0.97, as shown in Figure 6.

Figure 6
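The training step can be sketched roughly as follows, assuming scikit-learn. The data here is synthetic, so the scores it prints are not the article's results, and XGBoost and AutoKeras are left out to keep the example dependency-light.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)

# Synthetic features and target standing in for the preprocessed player data.
X = rng.uniform(0, 100, size=(1000, 5))
y = 3.0 * X[:, 0] + 0.02 * X[:, 1] ** 2 + rng.normal(scale=2.0, size=1000)

# 80:20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model and score it on the held-out test set with R2.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best_model = max(scores, key=scores.get)
print(scores, best_model)
```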
Highlight of A360 AI functionality
With A360 MDK, you can easily track your model experiments and hyperparameter tuning. After you add a few lines of code to log your hyperparameters and metrics, MDK tracks your experiments and presents a clean table of the metrics for each set of hyperparameters, so you can quickly see which model has the best result.
4. Model Deployment
On the A360 AI platform, deployment is fast and easy. You only need to set a final run with our MDK API in the modeling notebook. Then A360 AI’s packaging technology, called Starpack, fetches the model artifacts and training data baseline and packages the model as a Docker container. This container is then deployed automatically into a scalable, secure Kubernetes pod and exposed as a cloud endpoint REST API. All of this takes only a few clicks in the platform UI (A360 Deployment Hub). Below are a few screenshots of the deployment process in the UI. Saving the endpoint API key during deployment is a required step, because the API key is needed to invoke the cloud endpoint. The whole deployment process took only about 5 minutes to complete.

Figure 7: A360 platform for deployment
Highlight of A360 AI functionality
Starpack is the key technology of A360 AI. It implements Model Deployment as Code (MDaC), building upon the concepts of Terraform, which takes an Infrastructure as Code (IaC) approach. Starpack uses a declarative language (a YAML specification) to deploy ML models automatically, leveraging GitOps. Together with a UI console, A360 AI completely abstracts the infrastructure complexity of ML deployment away from data scientists.
In addition, we have created a web app with Streamlit and deployed it on Heroku.
5. Inference
Once our REST API is available, we can invoke it with new input data. The endpoint URL can be easily retrieved from the Deployment Hub UI. In the notebook, we simply used the Python requests library with the API key to send input data in JSON format to the endpoint and got the prediction result back. The inference process on the A360 AI platform is very straightforward.
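A call of this kind might look roughly like the sketch below. The endpoint URL, the X-API-Key header name, and the payload fields are all assumptions made for illustration, since the article does not document the endpoint's actual contract. To stay dependency-free, the sketch builds the request with the standard library's urllib.request rather than the Python requests package used in the notebook, and it does not send anything unless predict is called.

```python
import json
import urllib.request


def build_prediction_request(endpoint_url, api_key, features):
    """Build a POST request carrying the input features as a JSON body."""
    body = json.dumps(features).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": api_key,  # assumed header name, not confirmed by the article
    }
    return urllib.request.Request(
        endpoint_url, data=body, headers=headers, method="POST"
    )


def predict(request, timeout=10):
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder values: replace with the URL and key from the Deployment Hub UI.
req = build_prediction_request(
    "https://example.invalid/predict",
    "YOUR_API_KEY",
    {"age": 23, "overall": 85, "international_reputation": 3},
)
print(req.get_method(), req.full_url)
```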
6. Monitoring - Data Drift
Deploying the model is a big milestone, since it can now actually be used in the business application. However, the job is not done yet. Data scientists will want to monitor model performance closely as new data continues to come in.
Highlight of A360 AI functionality
A360 AI has a pre-built monitoring dashboard that helps data scientists monitor data drift. By default, the dashboard uses the metric sigma (mean value of the standard deviation of the training data) to monitor drift. If the sigma value exceeds 2-3, the dashboard flags data drift, and data scientists should examine the new incoming data to see whether model retraining is required.
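The exact formula behind the dashboard's sigma metric is not given here, so the sketch below rests on one plausible interpretation: for each feature, measure how far the mean of the incoming data has moved from the training mean, in units of the training standard deviation. The function names, the sample values, and the default threshold of 2 are assumptions for illustration only.

```python
from statistics import mean, stdev


def drift_sigma(train_values, new_values):
    """Shift of the new data's mean, in units of the training standard deviation."""
    return abs(mean(new_values) - mean(train_values)) / stdev(train_values)


def flag_drift(train, new, threshold=2.0):
    """Flag every feature whose sigma exceeds the (assumed) threshold."""
    return {
        name: drift_sigma(train[name], new[name]) > threshold
        for name in train
    }


# Made-up training and incoming samples for two features.
train = {"age": [20, 22, 24, 26, 28], "overall": [60, 65, 70, 75, 80]}
new = {"age": [21, 23, 25, 27, 29], "overall": [90, 92, 94, 96, 98]}

flags = flag_drift(train, new)
print(flags)
```

In this example the age distribution has barely moved, while the overall rating has shifted by about three training standard deviations and is flagged.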
Conclusion
Here we showcased a business use case, estimating a player’s market value in an objective, quantitative way, and walked you through the data science process you can follow on the A360 AI platform to tackle this problem. We also demonstrated how our MDK and Starpack can make data scientists more efficient at building and deploying ML models, processing data from the cloud, and monitoring data drift.