[Akash DataLab] -- Google Colab like Client for Akash Notebook Deployment with native GPU acceleration and Git-like versioning of data. #609
-
Hey @dominikusbrian - thanks for writing this up. This is definitely a very important use case and market for us to go after. That said, I'll say there actually already are a couple ways for users of Akash to run Jupyter notebooks:
Not dissuading you from pursuing your initiative, but I want to make sure you are aware of the above things. If you still want to pursue this, I think it needs a lot more detail in terms of what will be built, cost, timelines, etc.
-
Introduction
Nowadays, one of the best ways to interact with AI agents for development or data analytics is through a notebook interface. This is commonly done through Google Colab (for most people), self-hosted Jupyter Notebook on JupyterHub / JupyterLab, or sometimes through services like Binder (mainly in academia), among many others. The notebooks can simply sit on top of CPU infrastructure and, when needed, GPUs too.
Traditionally, all of this is done on a centralized server, with keys or access shared among a known group of people. This approach, however, restricts the availability and accessibility of the notebooks to a limited group. Manual replication, by copying the .ipynb file and reproducing it elsewhere, is often troubled by mismatched infrastructure or settings that the user needs to adjust by hand. For developers, this environment management is part of daily life, but for most people it creates a technical barrier to access and hinders further progress in the usage of valuable data.
In light of recent developments in AI, big data has become even bigger, with rising demand for data used in LLM tuning, training, and other GPU-accelerated ML model development. Along with this, there is a strong group of people working hard to make sure this opportunity/market is better distributed outside centralized Big Tech like Google, Meta, and so on. The need for better infrastructure that facilitates this decentralized AI and decentralized data interaction in one place is apparent. This is the original conception of the need for Akash DataLab -- a Google Colab-like client for Akash notebook deployment with native GPU acceleration and Git-like versioning of data.
On the data side, the pain point this leads to is the availability and accessibility of pre- and post-processed data.
If that pain point is lifted, progress in usability and high-quality consumption of data will accelerate considerably. Moreover, these days some of the ETL (extract, transform, load) and other basic data operations can readily be assisted by LLM-driven AI agent(s), either as standalone support tools or as companions in exploring the data. Even better, such AI agent(s) could be groomed specifically to be the guardian of a given dataset, and therefore knowledgeable about the analyses others have already performed on it. Users can simply go on from there to validate, improve, or build on the previous progress to make further advancements.
So instead of only having access to the finalized, fully-trained AI, for instance, a user/developer can simply "fork" the training pipeline and modify it directly. This brings transparency and reproducibility, on top of the decentralized nature of the infrastructure (if this is built on Akash and other decentralized superclouds).
The above scenario works even better for datasets that are in the public domain, with analysis performed in the open (whether immediately or eventually). For private cases, the AI Data Guardian should be trained/instructed to regulate who has access to what, and so on.
Proposed Architecture
To initiate the possible infrastructure for building this, one can look toward integrating three main components, discussed individually under Discussions below.
More on this, along with an illustration of the tech stack and so on, will be updated and shared here.
Discussions
[UI]
This should be a one-page, self-explanatory view of how the process goes, with the user just one click away from getting their deployment up and running. Below is an example of the quick link from RAPIDS AI (the team that develops cuDF and other exciting AI tools).
[Kernel]
Perhaps this is done by curating a series of templates, each with an Akash SDL for a specific piece of infrastructure. After the user clicks the deployment link, they put in their specific request, much like in DigitalOcean's Paperspace or Hugging Face Spaces, where the user specifies what kind of application they want to run (unlike the usual cloud options, which ask what spec/infrastructure to deploy). The kernel is then prepared automatically through a specific combination of an Akash SDL and a pre-installed Python kernel configured for the application of that specific SDL deployment (so the CUDA version and other hardware-related dependencies are checked).
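As a rough illustration of that templating step, here is a minimal sketch that fills a GPU-enabled Akash SDL from a user-chosen application template. The SDL structure follows the public Akash SDL v2.0 format; the template names, container images, and resource defaults are hypothetical placeholders, not an existing Akash DataLab API.

```python
# Minimal sketch, assuming a small catalog of application templates.
# Images/tags below are illustrative; a real catalog would pin versions so
# the CUDA toolkit matches the requested GPU.
TEMPLATES = {
    "cudf-notebook": {"image": "rapidsai/notebooks:24.04-cuda12.0-py3.10",
                      "gpu_model": "a100"},
    "pytorch-notebook": {"image": "pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime",
                         "gpu_model": "rtx4090"},
}

SDL_TEMPLATE = """---
version: "2.0"
services:
  notebook:
    image: {image}
    expose:
      - port: 8888
        as: 80
        to:
          - global: true
profiles:
  compute:
    notebook:
      resources:
        cpu:
          units: {cpus}
        memory:
          size: {memory}
        storage:
          size: {storage}
        gpu:
          units: 1
          attributes:
            vendor:
              nvidia:
                - model: {gpu_model}
  placement:
    akash:
      pricing:
        notebook:
          denom: uakt
          amount: {max_price}
deployment:
  notebook:
    akash:
      profile: notebook
      count: 1
"""


def render_sdl(app: str, cpus: int = 4, memory: str = "16Gi",
               storage: str = "50Gi", max_price: int = 100000) -> str:
    """Fill in the SDL for the application template the user picked."""
    t = TEMPLATES[app]
    return SDL_TEMPLATE.format(image=t["image"], gpu_model=t["gpu_model"],
                               cpus=cpus, memory=memory,
                               storage=storage, max_price=max_price)


if __name__ == "__main__":
    print(render_sdl("cudf-notebook"))
```

The rendered SDL could then go through the usual Akash deployment flow, so the user never has to write it by hand.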
Users can then interact with and modify the kernel into something of their own. The kernel should be available for users to store and reuse at a later time. On the backend, depending on the idle duration, the kernel can be killed and the updated config stored for future use.
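One simple way that "store the updated config" step could work is to snapshot the installed packages before an idle kernel is reclaimed. The helper below is hypothetical and uses plain `pip freeze`; a real implementation might capture a conda environment or a whole container image instead.

```python
import pathlib
import subprocess


def snapshot_kernel(config_dir: str = "/data/kernel-config") -> None:
    """Record the kernel's installed packages so it can be rebuilt later."""
    packages = subprocess.run(
        ["pip", "freeze"], capture_output=True, text=True, check=True
    ).stdout
    out = pathlib.Path(config_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Restoring is then `pip install -r requirements.txt` in the new kernel.
    (out / "requirements.txt").write_text(packages)
```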
This can help make deploying a GPU-accelerated AI/data application, like using cuDF with Google Colab, a reality on Akash.
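For context, this is the kind of notebook cell such a kernel would need to support out of the box: cuDF exposes a pandas-like API that runs on the GPU (the CSV path here is just a placeholder).

```python
import cudf  # GPU dataframe library from RAPIDS

# Hypothetical dataset; any reasonably sized CSV works the same way.
df = cudf.read_csv("transactions.csv")

# The groupby/aggregation executes on the GPU.
summary = df.groupby("account_id")["amount"].agg(["sum", "mean", "count"])

# Convert back to pandas only when a CPU-bound library needs the result.
print(summary.to_pandas().head())
```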
[Data Backend]
A version-controlled database integrating the data that lives across different deployments. It should also be flexible enough to inject or extract any schema or data form. TiDB is an interesting new player in a space that has been dominated by Databricks, Snowflake, and so on.
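Because TiDB speaks the MySQL wire protocol, the "inject or extract any schema or data form" idea can be sketched with standard Python tooling; the host, credentials, and table below are placeholders.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# TiDB is MySQL-compatible, so a standard MySQL driver works (default port 4000).
engine = create_engine("mysql+pymysql://user:password@tidb-host:4000/datalab")

with engine.begin() as conn:
    # Inject an arbitrary schema ...
    conn.execute(text("""
        CREATE TABLE IF NOT EXISTS runs (
            run_id   VARCHAR(64) PRIMARY KEY,
            notebook VARCHAR(255),
            metrics  JSON
        )
    """))

# ... and pull it straight back into the notebook as a dataframe.
runs = pd.read_sql("SELECT run_id, notebook, metrics FROM runs", engine)
```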
On the versioning aspect, Dolt is an interesting ecosystem to work with: DoltHub, DoltLab, and their other tools could be integrated, hosted on Akash, and provided as a free service (at least up to a certain level of usage).
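Dolt is likewise MySQL-compatible and exposes its Git-like operations as SQL stored procedures (DOLT_CHECKOUT, DOLT_ADD, DOLT_COMMIT), so a notebook could branch and commit a dataset roughly as sketched below; the server details and table are placeholders.

```python
import pymysql

# Placeholder connection to a Dolt SQL server (which could itself be an
# Akash deployment); autocommit so the Dolt procedures see the changes.
conn = pymysql.connect(host="dolt-host", port=3306, user="user",
                       password="password", database="dataset",
                       autocommit=True)
with conn.cursor() as cur:
    # Work on an experiment branch instead of main.
    cur.execute("CALL DOLT_CHECKOUT('-b', 'cleanup-experiment')")
    cur.execute("UPDATE measurements SET value = NULL WHERE value < 0")
    # Stage and commit the change, much like `git add` / `git commit`.
    cur.execute("CALL DOLT_ADD('-A')")
    cur.execute("CALL DOLT_COMMIT('-m', 'null out negative sensor readings')")
conn.close()
```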
Ecosystem Review
This section will cover existing solutions (mostly in Web2) that do all of the above, something similar, or part of the desired features.
Some survey of Web3 initiatives around the above components will also be done.
The recent development of AI stemming from the decentralized storage infrastructure Arweave, called the AO network, along with its AOS (decentralized operating system), is also interesting to track: https://cookbook_ao.g8way.io/welcome/getting-started.html
About Naming and Branding
Not much thought here for now; I picked Akash DataLab off the top of my head just to catch attention and illustrate what the final platform might look/sound like. Another option might be something like Co-DataLab? "Co" for collaboration, to emphasize that the main goal is indeed open collaboration.