Minor update the github io page, for KDD'24 tutorial placeholder (#310)
* init for KDD'24 tutorial placeholder
yxdyc authored May 7, 2024
1 parent 453db43 commit 17c2c76
Showing 5 changed files with 122 additions and 17 deletions.
8 changes: 1 addition & 7 deletions docs/sphinx_doc/README.md
@@ -17,10 +17,4 @@ pip install sphinx sphinx-autobuild sphinx_rtd_theme recommonmark
mv build/html position_to_publish
```

-- For convenience (you don’t have to compile from scratch again), the built
-directory (including the html files) can be download as follows:
-```bash
-# cd docs/sphinx_doc
-wget https://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/data_juicer/sphinx_API_build_0801.zip
-unzip sphinx_API_build_0801.zip
-```
+The automated GitHub Actions deployment workflow can be found [here](https://github.com/modelscope/data-juicer/blob/main/.github/workflows/deploy_sphinx_docs.yml).
9 changes: 2 additions & 7 deletions docs/sphinx_doc/README_ZH.md
@@ -15,10 +15,5 @@ pip install sphinx sphinx-autobuild sphinx_rtd_theme recommonmark
mv build/html position_to_publish
```

-- For convenience (so you don't have to compile from scratch again), the built directory (including the HTML files) can be downloaded as follows:
-
-```bash
-# cd docs/sphinx_doc
-wget https://dail-wlcb.oss-cn-wulanchabu.aliyuncs.com/data_juicer/sphinx_API_build_0801.zip
-unzip sphinx_API_build_0801.zip
-```
+For the automated deployment configuration on GitHub, see [here](
+https://github.com/modelscope/data-juicer/blob/main/.github/workflows/deploy_sphinx_docs.yml).
107 changes: 107 additions & 0 deletions docs/sphinx_doc/source/_static/tutorial_kdd24.html
@@ -0,0 +1,107 @@
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Multi-modal Data Processing for Foundation Models: Practical Guidances and Use Cases</title>
<!-- Bootstrap -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>
</head>

<body>
<nav class="navbar navbar-expand-lg navbar-light bg-light">
<a class="navbar-brand" href="#">KDD 2024 Hands-on Tutorial</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav mr-auto">
<li class="nav-item active"> <a class="nav-link" href="#">Home <span class="sr-only">(current)</span></a> </li>
<li class="nav-item"> <a class="nav-link" href="#Schedule">Schedule</a> </li>
<li class="nav-item"> <a class="nav-link" href="#Organizers">Organizers</a> </li>
</ul>
</div>
</nav>
<header>
<div class="jumbotron">
<div class="container">
<div class="row">
<div class="col-10 col-lg-12">
<h2 class="text-center">KDD 2024 Hands-on Tutorial</h2>
<h1 class="text-center"><strong>Multi-modal Data Processing for Foundation Models: Practical Guidances and Use Cases</strong></h1>
<h4 class="text-center"><em><strong>Date & Time</strong></em>: X:XX pm - Y:YY pm, August XX, 2024</h4>
<h4 class="text-center"><em><b>Location</b></em>: To be updated</h4>
</div>
</div>
</div><p>In the era of foundation models, the ability to process multi-modal data efficiently and effectively has become paramount.
In this tutorial, participants will dive into the essential techniques for processing multi-modal data. We will explore how large-scale, high-quality data enhances model performance and introduce the open-source Data-Juicer system, designed to tackle the complexities of data variety, quality, and scale.
Attendees will gain practical experience with Data-Juicer's operators, mastering data formatting, mapping, filtering, deduplication, and selection.
A significant portion of the tutorial is dedicated to the Data-Juicer Sandbox Lab and to typical use cases for static and dynamic data, including text, image, audio, and video. The lab is a playground integrated with unified models and evaluators. It facilitates experiments with data recipes, which are methodical sequences of operators that streamline the creation of scalable data processing pipelines. This hands-on experience is designed not only to solidify the concepts discussed but also to provide a space for innovation and exploration, highlighting how data recipes can be optimized and deployed in high-performance distributed environments.</p>
<p>By the end of this tutorial, attendees will be equipped with the practical knowledge and skills to navigate the complexities of multi-modal data processing. They will leave with actionable knowledge of an industrial open-source system and an enriched perspective on the importance of high-quality data in AI, poised to implement sustainable and scalable solutions in their own projects.</p>
</div>
<div class="container">
<div class="row">
<div class="col-sm-6 col-lg-12">
<p class="text-justify">
</p>
<h5>Tutorial Slides</h5> <p><a href="to_be_uploaded" target="_blank"><b>slides.pdf</b></a></p>
</div>
</div>
</div>

</header>
<section>

<div class="container">
<div class="row">
<div class="col-12 mb-2 text-center">
<h2><a id="Schedule">Schedule</a></h2>
</div>
</div>
<div class="row">
<div class="col-sm-6 col-lg-12" style="margin-bottom: 3em;">
<h6 class="text-left"><b>Date</b>: August XX, 2024</h6>
<h6 class="text-left"><b>Location</b>: To be updated.</h6>
<h6 class="text-left">(xx min) | Introduction and Overview: Multi-modal Data Processing and the
Data-Juicer System</h6>
<h6 class="text-left">(xx min) | Building Blocks of Data Processing: Data-Juicer’s Operators</h6>
<h6 class="text-left">(xx min) | Composing Atomic Capabilities: Data-Juicer’s Data Recipes</h6>
<h6 class="text-left">(xx min) | Exploring Data Recipes: The Data-Juicer Sandbox Lab</h6>
<h6 class="text-left">(xx min) | From Exploration to Production: High-Performance Data Factory</h6>
<h6 class="text-left">(xx min) | Static Data Use Cases: Text and Image Data Processing</h6>
<h6 class="text-left">(xx min) | Dynamic Data Use Cases: Video and Audio Data Processing</h6>
<h6 class="text-left">(xx min) | Conclusion and Resources</h6>
<p> </p>
</div>
</div>
</div>
<div class="container">

<div class="row">
<div class="col-lg-12 mb-4 mt-2 text-center">
<h2><a id="Organizers">Organizers</a></h2>
<h5>We are the <a href="https://github.com/modelscope/data-juicer" target="_blank">Data-Juicer</a> team from Alibaba Tongyi</h5>
<img src="https://img.alicdn.com/imgextra/i3/O1CN017Eq5kf27AlA2NUKef_!!6000000007757-0-tps-1280-720.jpg" width = "640" height = "360" alt="Data-Juicer"/>

</div>
</div>
</div>


</section>
<div class="container"> </div>
<footer class="text-center">
<div class="container">
<div class="row">
<div class="col-12"> </div>
</div>
</div>
</footer>
<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script src="js/jquery-3.2.1.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="js/popper.min.js"></script>
<script src="js/bootstrap-4.0.0.js"></script>
</body>
</html>
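The navbar of this page links to in-page anchors (`#Schedule`, `#Organizers`) that are defined by `<a id=...>` elements further down, so renaming an `id` would silently break navigation. Below is a minimal sketch of a check that uses only Python's standard `html.parser`. The class and function names are hypothetical helpers, not part of Data-Juicer or Sphinx:

```python
from html.parser import HTMLParser


class FragmentLinkChecker(HTMLParser):
    """Hypothetical helper: collect in-page fragment links and element ids."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Any element can serve as a fragment target through its id attribute.
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        href = attrs.get("href") or ""
        # A bare "#" only points at the top of the page, so it is skipped.
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def dangling_fragments(html_text):
    """Hypothetical helper: return fragment targets that no element defines."""
    checker = FragmentLinkChecker()
    checker.feed(html_text)
    checker.close()
    # Compare only after the whole page is parsed, so targets may appear
    # after the links that point to them.
    return [f for f in checker.fragments if f not in checker.ids]


if __name__ == "__main__":
    sample = (
        '<a class="nav-link" href="#Schedule">Schedule</a>'
        '<a class="nav-link" href="#Organizers">Organizers</a>'
        '<h2><a id="Schedule">Schedule</a></h2>'
    )
    print(dangling_fragments(sample))  # ['Organizers']
```

For the page added in this commit, both `#Schedule` and `#Organizers` have matching `id`s, so passing its contents to `dangling_fragments` should return an empty list. The bare `#` used by the brand and Home links is skipped on purpose.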
2 changes: 1 addition & 1 deletion docs/sphinx_doc/source/conf.py
@@ -7,7 +7,7 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'data_juicer'
-copyright = '2023, Data-Juicer Team'
+copyright = '2024, Data-Juicer Team'
author = 'Data-Juicer Team'

# The theme to use for HTML and HTML Help pages. See the documentation for
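The `conf.py` change bumps a hard-coded year, which has to be edited again every January. Below is a minimal sketch of an alternative, assuming a `2023-<current year>` range is wanted. The fragment is hypothetical rather than the project's actual `conf.py`, and the `html_static_path` line is an assumption, since the diff does not show whether it is already set:

```python
# Hypothetical conf.py fragment: a sketch, not the project's actual file.
import datetime

project = 'data_juicer'
author = 'Data-Juicer Team'

# Derive the end year from the build date so the notice does not go stale;
# 2023 is kept as the first year, matching the original setting.
copyright = f'2023-{datetime.date.today().year}, {author}'

# Assumed setting: Sphinx copies the contents of these directories into the
# _static/ folder of the HTML output, which is what lets the relative link
# `_static/tutorial_kdd24.html` in index.rst resolve.
html_static_path = ['_static']
```

One trade-off: because the year is computed at build time, rebuilding in a new year changes the rendered notice even when no source file changed.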
13 changes: 11 additions & 2 deletions docs/sphinx_doc/source/index.rst
@@ -6,10 +6,19 @@
Welcome to data-juicer's documentation!
=======================================

+Tutorial
+---------------------------------------
+
+We will give a tutorial at KDD'24, *Multi-modal Data Processing for Foundation Models: Practical Guidances and Use Cases*; see more details `here <_static/tutorial_kdd24.html>`_!
+
+
+
+
+
.. toctree::
:maxdepth: 2
:glob:
-:caption: Data-Juicer API Reference
+:caption: API Reference

data_juicer.core
data_juicer.ops
@@ -22,7 +31,7 @@ Welcome to data-juicer's documentation!
data_juicer.config
data_juicer.format

-Indices and tables
+Indices and Tables
==================

* :ref:`genindex`