Commit

added acknowledgement in homepage (#77)
* minor improvement for homepage

* minor fix
yxdyc authored Nov 16, 2023
1 parent 461b681 commit afe06dc
Showing 3 changed files with 38 additions and 4 deletions.
23 changes: 21 additions & 2 deletions README.md
@@ -30,6 +30,10 @@ Data-Juicer is a one-stop data processing system to make data higher-quality,
juicier, and more digestible for LLMs.
This project is being actively updated and maintained, and we will periodically enhance and add more features and data recipes. We welcome you to join us in promoting LLM data development and research!

If you find Data-Juicer useful for your research or development, please kindly
cite our [work](#references).


----

## News
@@ -62,7 +66,8 @@ Table of Contents
* [Demos](#demos)
* [License](#license)
* [Contributing](#contributing)
* [References](#references)
* [Acknowledgement](#acknowledgement)
* [References](#references)

## Features

@@ -315,10 +320,24 @@ docker exec -it <container_id> bash
Data-Juicer is released under Apache License 2.0.

## Contributing
We greatly welcome contributions of new features, bug fixes, and discussions. Please refer to [How-to Guide for Developers](docs/DeveloperGuide.md).
We are in a rapidly developing field and greatly welcome contributions of new
features, bug fixes, and better documentation. Please refer to the
[How-to Guide for Developers](docs/DeveloperGuide.md).

Welcome to join our [Slack channel](https://join.slack.com/t/data-juicer/shared_invite/zt-23zxltg9d-Z4d3EJuhZbCLGwtnLWWUDg?spm=a2c22.12281976.0.0.7a8253f30mgpjw), or [DingDing group](https://qr.dingtalk.com/action/joingroup?spm=a2c22.12281976.0.0.7a8253f30mgpjw&code=v1,k1,C0DI7CwRFrg7gJP5aMC95FUmsNuwuKJboT62BqP5DAk=&_dt_no_comment=1&origin=11) for discussion.

## Acknowledgement
Data-Juicer is used across various LLM products and research initiatives,
including industrial LLMs from Alibaba Cloud's Tongyi, such as Dianjin for
financial analysis and Zhiwen for reading assistance, as well as Alibaba
Cloud's Platform for AI (PAI).
We look forward to hearing more about your experience, along with suggestions and discussions for collaboration!

Data-Juicer thanks and refers to several community projects, such as
[Huggingface-Datasets](https://github.com/huggingface/datasets), [Bloom](https://huggingface.co/bigscience/bloom), [RedPajama](https://github.com/togethercomputer/RedPajama-Data), [Pile](https://huggingface.co/datasets/EleutherAI/pile), [Alpaca-Cot](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT), [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), [DeepSpeed](https://www.deepspeed.ai/), [Arrow](https://github.com/apache/arrow), [Ray](https://github.com/ray-project/ray), [Beam](https://github.com/apache/beam), [LM-Harness](https://github.com/EleutherAI/lm-evaluation-harness), [HELM](https://github.com/stanford-crfm/helm), ....



## References
If you find our work useful for your research or development, please kindly cite the following [paper](https://arxiv.org/abs/2309.02033).
```
17 changes: 16 additions & 1 deletion README_ZH.md
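@@ -29,6 +29,9 @@
Data-Juicer is a one-stop data processing system that aims to provide higher-quality, richer, and more "digestible" data for large language models (LLMs).
This project is being actively updated and maintained, and we will periodically enhance and add more features and data recipes. You are welcome to join us in advancing the development of and research on LLM data!

If Data-Juicer is helpful for your research and development, please cite our [work](#参考文献).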


----

## News
@@ -61,6 +64,7 @@ Data-Juicer is a one-stop data processing system that aims to provide large language models (LLM
* [Demos](#演示样例)
* [License](#开源协议)
* [Contributing](#贡献)
* [Acknowledgement](#致谢)
* [References](#参考文献)

## Features
@@ -299,10 +303,21 @@ Data-Juicer is released under Apache License 2.0.

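## Contributing

We very much welcome contributions of new features, bug fixes, and discussions. Please refer to the [Developer Guide](docs/DeveloperGuide_ZH.md).
Large models are a rapidly developing field, and we very much welcome contributions of new features, bug fixes, and documentation improvements. Please refer to the [Developer Guide](docs/DeveloperGuide_ZH.md).

You are welcome to join our [Slack channel](https://join.slack.com/t/data-juicer/shared_invite/zt-23zxltg9d-Z4d3EJuhZbCLGwtnLWWUDg?spm=a2c22.12281976.0.0.7a8275bc8g7ypp) or [DingDing group](https://qr.dingtalk.com/action/joingroup?spm=a2c22.12281976.0.0.7a8275bc8g7ypp&code=v1,k1,C0DI7CwRFrg7gJP5aMC95FUmsNuwuKJboT62BqP5DAk=&_dt_no_comment=1&origin=11).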

## Acknowledgement

Data-Juicer is used by various LLM products and research efforts, including industry LLMs from Alibaba Cloud's Tongyi, such as Dianjin
(financial analysis) and Zhiwen (reading assistant), as well as Alibaba Cloud's Platform for AI (PAI). We look forward to more of your feedback, suggestions, and collaboration!


Data-Juicer thanks and draws on the following open-source community projects:
[Huggingface-Datasets](https://github.com/huggingface/datasets), [Bloom](https://huggingface.co/bigscience/bloom), [RedPajama](https://github.com/togethercomputer/RedPajama-Data), [Pile](https://huggingface.co/datasets/EleutherAI/pile), [Alpaca-Cot](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT), [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), [DeepSpeed](https://www.deepspeed.ai/), [Arrow](https://github.com/apache/arrow), [Ray](https://github.com/ray-project/ray), [Beam](https://github.com/apache/beam), [LM-Harness](https://github.com/EleutherAI/lm-evaluation-harness), [HELM](https://github.com/stanford-crfm/helm), ....



## References
If you find our work helpful for your research and development, please cite the following [paper](https://arxiv.org/abs/2309.02033).

2 changes: 1 addition & 1 deletion thirdparty/README.md
@@ -5,7 +5,7 @@ Dependencies of Auto Evaluation Toolkit, see [`tools/evaluator/README.md`](../to
## Installation

The auto-evaluation toolkit requires customized [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) and [HELM](https://github.com/stanford-crfm/helm).
To avoid dependency problems when installing those packages, we recommand using NGC's PyTorch container (`nvcr.io/nvidia/pytorch:22.12-py3`).
To avoid dependency problems when installing those packages, we recommend using NGC's PyTorch container (`nvcr.io/nvidia/pytorch:22.12-py3`).
Assuming the path to your shared file system (where your data and model checkpoints are saved) is `/mnt/shared`, start the docker container with following commands.

```shell