
Tags:

  • clkao

  • What you will learn today

Problems

  • Setting up a machine learning environment is hard

  • Reproducible research requires a lot of tooling

  • It can take a long time (months) to apply AI/ML research results in production

ModelOps

chart

Paradigm: Interactivity & Agile

Jupyter Architecture





  • Wish list

    • A colleague of CLKAO's plays with a new language every year



JupyterHub (on K8s) Architecture

jupyterhub

Container & Orchestration

  • Containers: isolated process space and filesystem
  • Orchestrator: decides where to run things
    • Kubernetes: declare the desired state
  • Container networking: connect (or disconnect) with other pods
  • Persistent storage: for when a container requires a persistent filesystem
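
The "declare desired state" idea can be sketched as a tiny reconciliation step: compare what is declared with what is observed, and compute the actions that close the gap. This is only an illustration in plain Python, not Kubernetes code; the `Action` type, the `reconcile` function, and the workload names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str   # "start" or "stop"
    name: str   # workload name (hypothetical)
    count: int  # number of replicas to start or stop


def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[Action]:
    """Return the actions that move the observed state toward the desired state."""
    actions = []
    # Look at every workload that is either declared or currently running.
    for name in sorted(desired.keys() | observed.keys()):
        want = desired.get(name, 0)
        have = observed.get(name, 0)
        if want > have:
            actions.append(Action("start", name, want - have))
        elif have > want:
            actions.append(Action("stop", name, have - want))
    return actions


# Declared replica counts versus what is actually running (made-up values).
desired = {"hub": 1, "proxy": 1, "notebook": 3}
observed = {"hub": 1, "notebook": 1, "old-job": 2}

plan = reconcile(desired, observed)
for action in plan:
    print(action.kind, action.name, action.count)
```

An orchestrator repeats this comparison continuously, so the user only edits the desired state and never issues the start/stop commands directly.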



  • Docker

    • Built on top of Linux LXC



  • Docker and Kubernetes Orchestration

kubernetes




GPU

  • OpenCL: open standard
  • NVIDIA: proprietary



aiacademy: JupyterHub


  • Keycloak
  • Spawner

  • JupyterHub routing
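
As a rough illustration of the routing idea: in JupyterHub the proxy sends requests under `/user/<name>/` to that user's server and everything else to the Hub. The sketch below mimics that with a longest-prefix lookup. The `resolve` function, the route table, and the host names are hypothetical and are not the actual proxy implementation.

```python
from __future__ import annotations


def resolve(routes: dict[str, str], path: str) -> str:
    """Pick the target registered under the longest matching path prefix.

    Assumes the table always contains the default route "/".
    """
    best = "/"
    for prefix in routes:
        if path.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    return routes[best]


# Hypothetical route table: the default route goes to the Hub, and each
# running user server gets its own /user/<name>/ entry.
routes = {
    "/": "http://hub:8081",
    "/user/alice/": "http://jupyter-alice:8888",
}

print(resolve(routes, "/user/alice/lab"))  # alice's server
print(resolve(routes, "/hub/login"))       # falls through to the Hub
print(resolve(routes, "/user/bob/lab"))    # no server for bob, so the Hub
```

When a spawner starts a new user server, a matching `/user/<name>/` entry is added to the table; when the server stops, the entry is removed and requests fall back to the Hub.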

Components

  • JupyterHub + JupyterLab

    • kubespawner
    • per-project storage, GPU quota
  • Keycloak: SSO

  • GitLab: course materials and data management

    • Image building & registry
  • Custom DaemonSet

    • git-sync
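
To make the per-project GPU quota concrete, here is a minimal sketch of an admission check that could run before a notebook server is spawned. It is not how kubespawner or the aiacademy deployment does it; `ProjectQuota`, `can_spawn`, and the numbers are hypothetical, and a real cluster would more likely enforce limits through Kubernetes resource quotas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectQuota:
    gpu_limit: int    # maximum GPUs the project may hold at once
    storage_gib: int  # size of the project's persistent volume, in GiB


def can_spawn(quota: ProjectQuota, gpus_in_use: int, gpus_requested: int) -> bool:
    """Allow a spawn only if the project stays within its GPU limit."""
    if gpus_requested < 0:
        return False
    return gpus_in_use + gpus_requested <= quota.gpu_limit


# Hypothetical project with 4 GPUs and 100 GiB of storage.
quota = ProjectQuota(gpu_limit=4, storage_gib=100)

print(can_spawn(quota, gpus_in_use=3, gpus_requested=1))  # fits exactly
print(can_spawn(quota, gpus_in_use=3, gpus_requested=2))  # over the limit
```

Checking the quota before spawning keeps one project from starving the others of a scarce resource such as GPU cards.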

What can possibly go wrong?

  • What can cause cascade failures?

  • SPoF!
    • storage: needs to be distributed & HA
    • hub failures
    • issues that never fully die
  • hardware failure: memory, GPU card, power
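
One way to reason about cascade failures and single points of failure is to write down which component depends on which, then ask what else goes down when one of them fails. The sketch below does that with a breadth-first walk over the reversed dependency edges. The dependency map is made up for illustration and does not describe the real deployment; `impacted_by` is a hypothetical helper.

```python
from __future__ import annotations

from collections import deque


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges: for each dependency, record who relies on it.
    dependents: dict[str, set[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for component in dependents.get(current, ()):
            if component not in impacted:
                impacted.add(component)
                queue.append(component)
    return impacted


# Illustrative dependency map (an assumption, not taken from the talk).
depends_on = {
    "hub": ["storage", "keycloak"],
    "notebook": ["hub", "storage"],
    "git-sync": ["gitlab"],
}

for name in ["storage", "keycloak", "gitlab"]:
    print(name, "->", sorted(impacted_by(name, depends_on)))
```

A component whose failure takes down many others, such as non-replicated storage in this made-up map, is a candidate single point of failure and the first place to add redundancy.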






The littlest JupyterHub

  • TLJH (The Littlest JupyterHub) vs. Z2JH (Zero to JupyterHub with Kubernetes)

PrimeHub

KubeCon

Further reading
