IPA: Building an AI-Driven Kubernetes Autoscaler
– Shafin Hasnat
Feb 18, 2025
Before We Begin
Generative AI is rapidly evolving, improving its reasoning and problem-solving capabilities. Logs generated by Kubernetes applications contain valuable insights for intelligent scaling decisions. Instead of relying on predefined thresholds, AI can analyze logs, reason through the data, and suggest scaling recommendations.
This is where IPA (Intelligent Pod Autoscaler) comes in – a Kubernetes autoscaler powered by LLM-based AI. Unlike Kubernetes’ native HPA (Horizontal Pod Autoscaler) and VPA (Vertical Pod Autoscaler), which depend on manually set threshold values, IPA takes a more dynamic approach. Defining thresholds can be tricky, especially with limited understanding of traffic patterns. IPA eliminates this guesswork by externally analyzing metrics and logs, then intelligently suggesting both horizontal and vertical scaling strategies for running pods.
Introduction
The Intelligent Pod Autoscaler (IPA) is a Kubernetes operator designed to transform how applications scale in containerized environments. By combining real-time metrics from Prometheus with the analytical power of Large Language Models (LLMs), IPA introduces a smarter, more adaptive approach to scaling.
Traditional autoscalers rely on static, threshold-based rules, often leading to inefficiencies – either over-provisioning resources or failing to scale in time to meet demand. IPA eliminates these limitations by leveraging AI to analyze complex metric patterns, predict workload trends, and intelligently adjust both horizontal and vertical scaling.
Seamlessly integrating into Kubernetes clusters, IPA collects cluster-wide and application-specific metrics, feeding them into an LLM that detects subtle correlations and trends. This enables dynamic scaling recommendations that proactively adjust to workload fluctuations – whether it’s ensuring stability during traffic spikes or optimizing costs during low-demand periods.
For DevOps teams, IPA is a game-changer, delivering greater operational efficiency, enhanced application performance, and smarter resource utilization – keeping your applications always right-sized, at the right time.
Architecture and Workflow
The Intelligent Pod Autoscaler (IPA) operates as a Custom Resource Definition (CRD) in Kubernetes, enabling users to define IPA custom resources that specify deployment details for applications running in the cluster. It is possible to scale multiple target deployments with a single IPA custom resource.
Key Components of IPA Architecture:
- IPA Controller: The controller continuously collects application-specific and cluster-wide metrics from Prometheus. These metrics include CPU and memory utilization, network request rates, and overall cluster resource usage. Every minute, the controller triggers a reconciliation process, packaging this data into a POST request and sending it to the IPA Agent for analysis.
- IPA Agent: The IPA Agent can be deployed as a shared service or as a dedicated instance. This component leverages the power of the Gemini LLM to analyze real-time metrics and predict optimal scaling decisions. Based on its insights, the agent generates precise scaling recommendations, determining the ideal number of pods and the appropriate resource requests and limits. The IPA Controller interacts with the IPA Agent via its /llmagent endpoint to fetch these recommendations.
- Feedback Loop: Once the IPA Agent generates scaling recommendations, the controller applies the changes, updating the Kubernetes deployment with the optimal pod count and fine-tuned resource allocations. This automated feedback loop ensures that applications stay efficiently scaled, adapting seamlessly to demand fluctuations.
Development
Kubernetes Operator
Developing a Kubernetes operator is straightforward using the Kubebuilder framework, written in Go. The Intelligent Pod Autoscaler (IPA) uses the API group ipa.shafinhasnat.me and API version v1alpha1.
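A project with this group and version can be scaffolded with Kubebuilder roughly as follows (a sketch; the domain and repo flags are assumptions inferred from the API group, not taken from the IPA repo):
# Scaffold the operator project
kubebuilder init --domain shafinhasnat.me --repo github.com/shafinhasnat/ipa
# Generate the IPA kind under group ipa, version v1alpha1
kubebuilder create api --group ipa --version v1alpha1 --kind IPA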
Default API types are defined in api/v1alpha1/ipa_types.go and serve as the foundation for IPA’s functionality, allowing users to configure deployment details and scaling behavior. Here are the default API types of the IPA operator:
type IPASpec struct {
	// Metadata holds the full IPA configuration.
	Metadata Metadata `json:"metadata"`
}

type Metadata struct {
	// PrometheusUri is the URI of the in-cluster Prometheus service.
	PrometheusUri string `json:"prometheusUri"`
	// LLMAgent is the base URL of the IPA agent.
	LLMAgent string `json:"llmAgent"`
	// IPAGroup lists the target deployments to scale.
	IPAGroup []IPAGroup `json:"ipaGroup"`
}

type IPAGroup struct {
	// Deployment and Namespace identify the target deployment.
	Deployment string `json:"deployment"`
	Namespace  string `json:"namespace"`
	// Ingress optionally names the Ingress used to collect HTTP request metrics.
	Ingress string `json:"ingress,omitempty"`
}
The Reconcile method in internal/controller/ipa_controller.go is responsible for executing the reconciliation process. This method continuously monitors the cluster, collects metrics, and communicates with the IPA Agent to determine optimal scaling decisions.
To perform these tasks, Reconcile relies on functions defined in internal/agent/agent.go, which call the in-cluster Prometheus service with predefined PromQL queries. These queries collect the target deployment’s replica spec, memory and CPU usage rates, available node memory, and the incoming request rate on the specified ingress. Kubernetes events for the target deployment are also collected in this phase. The results are then aggregated into a single string and sent to the IPA Agent as a POST request to the /llmagent path.
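To illustrate the metrics-collection half of this cycle, a query like the controller’s can be issued against the Prometheus HTTP API by hand (the PromQL below is an illustrative example, not the operator’s exact query set):
# Query the in-cluster Prometheus HTTP API (illustrative PromQL)
curl -G http://prometheus-server.default.svc/api/v1/query \
  --data-urlencode 'query=rate(container_cpu_usage_seconds_total{namespace="ipaapp"}[5m])'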
To run the IPA operator in a development environment, use the following command:
make install run
Running this for the first time installs the Go dependencies, deploys the CRD to the local Kubernetes cluster, and starts the controller. I used Minikube to provision the cluster during development.
Here are some useful commands:
# Build and push controller
make docker-build docker-push IMG=shafinhasnat/ipa:<version>
# Deploy and undeploy controller
make deploy IMG=shafinhasnat/ipa:<version>
make undeploy
# Build CRD installer
make build-installer IMG=shafinhasnat/ipa:<version>
IPA Agent
The IPA agent is a key component: the IPA controller relies on it to analyze metrics and generate scaling recommendations. It is a Flask application that interacts with the Gemini API to provide these recommendations. The IPA controller makes a POST request to the agent with the metrics string in the body. To function properly, the base URL of the IPA agent must be set in spec.metadata.llmAgent within the IPA custom resource manifest. The IPA agent logs its output to the llm.log file.
The IPA agent image shafinhasnat/ipaagent is available on Docker Hub and can be used to deploy a self-hosted instance.
docker run -d -e GEMINI_API_KEY=<GEMINI_API_KEY> -p 80:5000 shafinhasnat/ipaagent
A shared IPA agent is running with base URL – https://ipaagent.shafinhasnat.me. Please refer to the IPA agent repo.
API Documentation
Environment variables to run the IPA agent locally:
- GEMINI_API_KEY (required)
- DUMP_DATASET (optional)
API endpoints:
- [GET] / – IPA agent log dashboard
- [POST] /askllm – Run LLM metrics analysis with the Gemini API. Body: {"metrics": <str>}
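The analysis endpoint can be exercised directly, for example against the shared agent (the metrics string below is a made-up placeholder):
# POST a metrics string to the agent for analysis
curl -X POST https://ipaagent.shafinhasnat.me/askllm \
  -H 'Content-Type: application/json' \
  -d '{"metrics": "deployment=testapp replicas=1 cpu=85% memory=60% requests_per_min=900"}'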
Usage
IPA requires a target deployment to autoscale, so applications must be deployed as a Deployment in a namespace. It is recommended to use a namespace other than default.
Another dependency is Prometheus. The IPA controller executes queries via Prometheus API calls, so the Prometheus service name must be specified in spec.metadata.prometheusUri within the IPA custom resource manifest.
If the deployment is a web application, IPA requires NGINX Ingress Controller metrics to collect HTTP request counts. The Ingress resource name should be set in the ingress field of the corresponding spec.metadata.ipaGroup entry in the manifest.
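For reference, ingress-nginx exposes request counts through the nginx_ingress_controller_requests metric; a per-ingress request rate can be checked with a query like this (assuming Prometheus scrapes the ingress controller):
# Request rate for an ingress over the last minute
curl -G http://prometheus-server.default.svc/api/v1/query \
  --data-urlencode 'query=rate(nginx_ingress_controller_requests{ingress="<ingress-name>"}[1m])'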
To set up a test application, deploy a Deployment named testapp in the ipaapp namespace, expose it using a ClusterIP service, and connect the service with an Ingress resource named testappingress. Here are kubectl commands to create the test application ecosystem:
kubectl create namespace ipaapp
kubectl create deployment testapp --image=shafinhasnat/cpuload --namespace=ipaapp --replicas=1 --port=8080
kubectl expose deployment testapp --name=testappsvc --namespace=ipaapp --port=8001 --target-port=8080 --type=ClusterIP
kubectl create ingress testappingress --namespace=ipaapp --class=nginx --rule="cpuload.shafinhasnat.me/*=testappsvc:8001"
Now, the application is accessible at the cpuload.shafinhasnat.me host.
Setting up IPA requires creating the IPA custom resource definition in the cluster first. To set up the CRD:
kubectl apply -f https://raw.githubusercontent.com/shafinhasnat/ipa/refs/heads/main/dist/install.yaml
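To confirm the CRD registered, a quick check like the following works (the exact CRD name may differ):
# Verify the IPA CRD is installed
kubectl get crd | grep ipa.shafinhasnat.me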
At this point, a running IPA agent is required, along with the target application to scale and Prometheus. In this case, we use the shared IPA agent mentioned above. The IPA custom resource manifest must be configured with the correct values to function properly. Below is the IPA custom resource manifest for the test application:
apiVersion: ipa.shafinhasnat.me/v1alpha1
kind: IPA
metadata:
  name: ipa
spec:
  metadata:
    prometheusUri: http://prometheus-server.default.svc
    llmAgent: https://ipaagent.shafinhasnat.me
    ipaGroup:
    - deployment: testapp
      namespace: ipaapp
      ingress: testappingress
It is possible to specify multiple deployments under the spec.metadata.ipaGroup path. Apply the manifest with the kubectl apply -f command, and IPA will immediately start collecting metrics and taking action.
Observation
The reconciliation period for IPA is one minute: every minute, the IPA controller collects metrics and makes a REST call to the specified IPA agent. Real-time logs can be viewed on the IPA agent dashboard at the / path. Based on the agent’s response, the number of pods in the testapp deployment scales up and down, and resource requests and limits are adjusted according to the load on the application.
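The scaling activity can be followed with standard kubectl commands, for example:
# Watch replica changes on the target deployment
kubectl -n ipaapp get deployment testapp -w
# Inspect the resource requests and limits the controller has applied
kubectl -n ipaapp get deployment testapp -o jsonpath='{.spec.template.spec.containers[0].resources}'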
A script was run to generate load on the test application for 5 minutes (a minimal stand-in is sketched below). Based on the load, here is the observed replica and resource scaling summary for a 10-minute span.
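The original load script is not included in the post; a minimal stand-in that exercises the test endpoint for 5 minutes could look like this:
# Send continuous requests to the test application for 5 minutes
end=$((SECONDS + 300))
while [ $SECONDS -lt $end ]; do
  curl -s -o /dev/null http://cpuload.shafinhasnat.me/
done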
The scaling recommendations were generated by Gemini 1.5 Flash (gemini-1.5-flash); results may vary with the LLM model being used.
Conclusion
IPA is a hobby project aimed at leveraging AI in the DevOps ecosystem. The results can be improved by fine-tuning models with custom datasets. A similar approach can also be applied to cluster scaling.
Source code:
IPA - https://github.com/shafinhasnat/ipa
IPA agent - https://github.com/shafinhasnat/ipa_llm_agent