[Initiative]: Cloud Native AI Scheduling Challenges Whitepaper #1641

@raravena80

Description

Name

Cloud Native AI Scheduling Challenges Whitepaper

Short description

Whitepaper about the scheduling challenges for AI/ML workloads in Cloud Native environments

Responsible group

TOC

Does the initiative belong to a subproject?

Yes

Subproject name

Cloud Native AI Working Group

Primary contact

@raravena80

Additional contacts

@zanetworker
@ronaldpetty

Initiative description

https://docs.google.com/document/d/1KNmTKwI_cRXZ0KVBqdBhkO1EuS4PhLIUvT16Y2a5erU/edit?tab=t.0#heading=h.l5opvu2gvmzq

This paper aims to enumerate the challenges and opportunities in optimizing resource allocation (that is, scheduling) for Cloud Native Artificial Intelligence (CNAI) workloads, and to educate readers about them. Cloud Native environments make it easy to scale resources, which makes them well suited to the two main types of AI workloads: training and inference. However, a standard Cloud Native scheduler, such as the one provided with Kubernetes, is by default better suited to microservice-type workloads and not yet to AI workloads.

Deliverable(s) or exit criteria

Final draft version to be handed off to the CNCF publishing staff.

Metadata

    Labels

    kind/initiative: An initiative or an item related to initiative processes

    Projects

    Status

    New

    Status

    status/new
