
Add models from Hugging Face/transformers from MLAgility - part 1 #615

Merged: 63 commits, merged on Jul 26, 2023
Changes from all commits (63 commits)
b326b3c
popular_on_huggingface/bert-base-uncased.py
jcwchen Jun 21, 2023
510f609
add transformers models
jcwchen Jun 21, 2023
fede10e
remove gpt1 and gpt2 for now
jcwchen Jun 22, 2023
97de789
config
jcwchen Jun 22, 2023
3af24e7
get model name from build_dir
jcwchen Jul 11, 2023
543ba43
find_model_hash_name
jcwchen Jul 11, 2023
8e41e92
subprocess.PIPE
jcwchen Jul 11, 2023
5dda38a
new models
jcwchen Jul 12, 2023
22f6123
7 models
jcwchen Jul 12, 2023
7dbf731
only keep 4
jcwchen Jul 12, 2023
2ce4161
remove 4
jcwchen Jul 12, 2023
ee483be
remove albert-base-v2
jcwchen Jul 12, 2023
e1690be
del model and sess
jcwchen Jul 13, 2023
d466f26
check_path
jcwchen Jul 13, 2023
777f0a6
drop models in CI
jcwchen Jul 13, 2023
94cd553
add bert_generation
jcwchen Jul 13, 2023
c865e0f
--binary
jcwchen Jul 13, 2023
6426b71
Merge branch 'jcw/add-hgf' of https://github.com/jcwchen/models into …
jcwchen Jul 13, 2023
f29974b
disable bert_generation.py
jcwchen Jul 13, 2023
da253f3
no binary
jcwchen Jul 13, 2023
5d1fc5f
cancel in progress
jcwchen Jul 13, 2023
fcffd4a
binary
jcwchen Jul 13, 2023
8272854
minimal
jcwchen Jul 13, 2023
db26881
--mini
jcwchen Jul 14, 2023
ebfbc04
manually check
jcwchen Jul 14, 2023
9f528a9
only keep
jcwchen Jul 14, 2023
b1c9ae4
run_test_dir
jcwchen Jul 14, 2023
5c42644
coma
jcwchen Jul 14, 2023
9db1f1d
cache_converted_dir = "~/.cache"
jcwchen Jul 14, 2023
19a9851
delete and clean cache
jcwchen Jul 14, 2023
3fc4703
clean
jcwchen Jul 14, 2023
240c30c
clean all
jcwchen Jul 14, 2023
abe2b16
only clean
jcwchen Jul 14, 2023
f36b21b
--cache-dir", cache_converted_dir
jcwchen Jul 14, 2023
ceefb83
disable openai_clip-vit-large-patch14
jcwchen Jul 14, 2023
da4cfa9
disable
jcwchen Jul 14, 2023
e05446a
only keep 4
jcwchen Jul 14, 2023
dd2c86a
comma
jcwchen Jul 14, 2023
51e2e09
runs-on: macos-latest
jcwchen Jul 14, 2023
abefcb0
not using conda
jcwchen Jul 14, 2023
9dfe0ed
final_model_path
jcwchen Jul 15, 2023
63f2b00
git-lfst pull dir
jcwchen Jul 15, 2023
288d29a
git diff
jcwchen Jul 15, 2023
22ede0b
Merge branch 'new-models' into jcw/add-hgf
jcwchen Jul 18, 2023
7c34b5a
use onnx.load to compare
jcwchen Jul 20, 2023
35a3ca8
test_utils.pull_lfs_file(final_model_path)
jcwchen Jul 21, 2023
20817c2
only test changed models
jcwchen Jul 24, 2023
a1483a8
test_utils
jcwchen Jul 24, 2023
d126f3b
get_cpu_info
jcwchen Jul 24, 2023
4495108
ext names
jcwchen Jul 24, 2023
5acb493
test_utils.get_changed_models()
jcwchen Jul 24, 2023
0a2ae7c
compare 2
jcwchen Jul 24, 2023
1ddb303
fix init
jcwchen Jul 24, 2023
cd3e260
transformers==4.29.2
jcwchen Jul 24, 2023
8ad0a61
test
jcwchen Jul 24, 2023
251d682
initializer
jcwchen Jul 24, 2023
c9b2756
update bert-generation
jcwchen Jul 24, 2023
c5178b0
fixed numpy
jcwchen Jul 24, 2023
9dafea5
print(f"initializer {k}")
jcwchen Jul 24, 2023
3774ffa
update bert from mac
jcwchen Jul 24, 2023
1de728d
remove bert-generation
jcwchen Jul 24, 2023
450e2ad
mlagility_subdir_count number
jcwchen Jul 24, 2023
3cd5bba
remove unused onnx
jcwchen Jul 25, 2023
4 changes: 4 additions & 0 deletions .github/workflows/codeql.yml
@@ -19,6 +19,10 @@ on:
schedule:
- cron: '31 11 * * 4'

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
analyze:
name: Analyze
4 changes: 4 additions & 0 deletions .github/workflows/linux_ci.yml
@@ -9,6 +9,10 @@ on:
pull_request:
branches: [ main, new-models]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
# This workflow contains a single job called "build"
build:
17 changes: 10 additions & 7 deletions .github/workflows/mlagility_validation.yml
@@ -6,20 +6,23 @@ on:
pull_request:
branches: [ main, new-models]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
build:
runs-on: ubuntu-latest
runs-on: macos-latest
strategy:
matrix:
python-version: ['3.8']
python-version: ["3.8"]

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@c85c95e3d7251135ab7dc9ce3241c5835cc595a9 # v3.5.3
name: Checkout repo
- uses: conda-incubator/setup-miniconda@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@61a6322f88396a6271a6ee3565807d608ecaddd1 # v4.7.0
with:
miniconda-version: "latest"
activate-environment: mla
python-version: ${{ matrix.python-version }}

- name: Install dependencies and mlagility
@@ -34,4 +37,4 @@ jobs:
run: |
# TODO: remove the following after mlagility has resolved the version conflict issue
pip install -r models/mlagility/requirements.txt
python workflow_scripts/run_mlagility.py
python workflow_scripts/run_mlagility.py --drop
4 changes: 4 additions & 0 deletions .github/workflows/windows_ci.yml
@@ -9,6 +9,10 @@ on:
pull_request:
branches: [ main, new-models]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
# This workflow contains a single job called "build"
build:
Git LFS file not shown (20 files)
2 changes: 2 additions & 0 deletions models/mlagility/requirements.txt
@@ -1,2 +1,4 @@
numpy==1.24.4
torch==2.0.1
torchvision==0.15.2
transformers==4.29.2
3 changes: 2 additions & 1 deletion workflow_scripts/check_model.py
@@ -16,7 +16,8 @@ def has_vnni_support():

def run_onnx_checker(model_path):
model = onnx.load(model_path)
onnx.checker.check_model(model, full_check=True)
del model
onnx.checker.check_model(model_path, full_check=True)


def ort_skip_reason(model_path):
3 changes: 1 addition & 2 deletions workflow_scripts/generate_onnx_hub_manifest.py
@@ -14,8 +14,7 @@
import onnx
from onnx import shape_inference
import argparse
from test_models import get_changed_models
from test_utils import pull_lfs_file
from test_utils import get_changed_models, pull_lfs_file


# Acknowledgments to pytablereader codebase for this function
11 changes: 11 additions & 0 deletions workflow_scripts/mlagility_config.py
@@ -15,4 +15,15 @@
"torch_hub/densenet121.py",
"torch_hub/inception_v3.py",
"torch_hub/googlenet.py",
#"transformers/bert_generation.py", # inconsistent model created from mlagility
#"popular_on_huggingface/bert-base-uncased.py",
#"popular_on_huggingface/xlm-roberta-large.py",
#"popular_on_huggingface/bert-large-uncased.py",
"popular_on_huggingface/openai_clip-vit-large-patch14.py",
#"popular_on_huggingface/xlm-roberta-base.py", # output nan
#"popular_on_huggingface/roberta-base.py", # output nan
"popular_on_huggingface/distilbert-base-uncased.py",
#"popular_on_huggingface/distilroberta-base.py", # output nan
"popular_on_huggingface/distilbert-base-multilingual-cased.py",
#"popular_on_huggingface/albert-base-v2", # Status Message: indices element out of data bounds, idx=8 must be within the inclusive range [-2,1]
]
49 changes: 34 additions & 15 deletions workflow_scripts/run_mlagility.py
@@ -7,6 +7,7 @@
import subprocess
import sys
import ort_test_dir_utils
import test_utils


def get_immediate_subdirectories_count(dir_name):
@@ -21,7 +22,7 @@ def find_model_hash_name(stdout):
line = line.replace("\\", "/")
# last part of the path is the model hash name
return line.split("/")[-1]
raise Exception(f"Cannot find Build dir in {stdout}.")
raise Exception(f"Cannot find Build dir in {stdout}.")


ZOO_OPSET_VERSION = "16"
@@ -33,34 +34,45 @@ def find_model_hash_name(stdout):


def main():
# calculate first; otherwise the directories might be deleted by shutil.rmtree
mlagility_subdir_count = get_immediate_subdirectories_count(mlagility_models_dir)

parser = argparse.ArgumentParser(description="Test settings")

parser.add_argument("--all_models", required=False, default=False, action="store_true",
help="Test all ONNX Model Zoo models instead of only changed models")
parser.add_argument("--create", required=False, default=False, action="store_true",
help="Create new models from mlagility if not exist.")
parser.add_argument("--drop", required=False, default=False, action="store_true",
help="Drop downloaded models after verification. (For space limitation in CIs)")
parser.add_argument("--skip", required=False, default=False, action="store_true",
help="Skip checking models if already exist.")


args = parser.parse_args()
errors = 0

changed_models_set = set(test_utils.get_changed_models())
print(f"Changed models: {changed_models_set}")
for model_info in models_info:
directory_name, model_name = model_info.split("/")
_, model_name = model_info.split("/")
model_name = model_name.replace(".py", "")
model_zoo_dir = model_name
print(f"----------------Checking {model_zoo_dir}----------------")
final_model_dir = osp.join(mlagility_models_dir, model_zoo_dir)
final_model_name = f"{model_zoo_dir}-{ZOO_OPSET_VERSION}.onnx"
final_model_path = osp.join(final_model_dir, final_model_name)
if not args.all_models and final_model_path not in changed_models_set:
print(f"Skip checking {final_model_path} because it is not changed.")
continue
if osp.exists(final_model_path) and args.skip:
print(f"Skip checking {model_zoo_dir} because {final_model_path} already exists.")
continue
try:
print(f"----------------Checking {model_zoo_dir}----------------")
final_model_dir = osp.join(mlagility_models_dir, model_zoo_dir)
final_model_name = f"{model_zoo_dir}-{ZOO_OPSET_VERSION}.onnx"
final_model_path = osp.join(final_model_dir, final_model_name)
if osp.exists(final_model_path) and args.skip:
print(f"Skip checking {model_zoo_dir} because {final_model_path} already exists.")
continue
cmd = subprocess.run(["benchit", osp.join(mlagility_root, model_info), "--cache-dir", cache_converted_dir,
"--onnx-opset", ZOO_OPSET_VERSION, "--export-only"],
cwd=cwd_path, stdout=subprocess.PIPE,
stderr=sys.stderr, check=True)
model_hash_name = find_model_hash_name(cmd.stdout)
print(model_hash_name)
mlagility_created_onnx = osp.join(cache_converted_dir, model_hash_name, "onnx", model_hash_name + base_name)
if args.create:
ort_test_dir_utils.create_test_dir(mlagility_created_onnx, "./", final_model_dir)
@@ -75,14 +87,21 @@ def main():
except Exception as e:
errors += 1
print(f"Failed to check {model_zoo_dir} because of {e}.")

if args.drop:
subprocess.run(["benchit", "cache", "delete", "--all", "--cache-dir", cache_converted_dir],
cwd=cwd_path, stdout=sys.stdout, stderr=sys.stderr, check=True)
subprocess.run(["benchit", "cache", "clean", "--all", "--cache-dir", cache_converted_dir],
cwd=cwd_path, stdout=sys.stdout, stderr=sys.stderr, check=True)
shutil.rmtree(final_model_dir, ignore_errors=True)
shutil.rmtree(cache_converted_dir, ignore_errors=True)
total_count = len(models_info) if args.all_models else len(changed_models_set)
if errors > 0:
print(f"All {len(models_info)} model(s) have been checked, but {errors} model(s) failed.")
print(f"All {total_count} model(s) have been checked, but {errors} model(s) failed.")
sys.exit(1)
else:
print(f"All {len(models_info)} model(s) have been checked.")
print(f"All {total_count} model(s) have been checked.")


mlagility_subdir_count = get_immediate_subdirectories_count(mlagility_models_dir)
if mlagility_subdir_count != len(models_info):
print(f"Expected {len(models_info)} model(s) in {mlagility_models_dir}, but got {mlagility_subdir_count} model(s) under models/mlagility."
f"Please check if you have added new model(s) to models_info in mlagility_config.py.")
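A note on the parsing logic in `run_mlagility.py` above: `find_model_hash_name` scans benchit's stdout for the build-directory line and takes the last path component as the model hash. A minimal, self-contained sketch — the exact "Build dir" log format and the sample paths are assumptions, not taken from mlagility's actual output:

```python
def find_model_hash_name(stdout: bytes) -> str:
    for line in stdout.decode("utf-8").splitlines():
        # assumed trigger string; the real condition lives above the shown hunk
        if "Build dir" in line:
            # normalize Windows separators so the split below works on any OS
            line = line.replace("\\", "/")
            # last part of the path is the model hash name
            return line.split("/")[-1]
    raise Exception(f"Cannot find Build dir in {stdout}.")

print(find_model_hash_name(b"Build dir: /home/ci/.cache/abc123"))  # → abc123
```

The backslash normalization is what makes the same `split("/")` work for both the Linux and the macos-latest runners this PR switches between.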
34 changes: 3 additions & 31 deletions workflow_scripts/test_models.py
@@ -23,25 +23,6 @@ def get_all_models():
return model_list


def get_changed_models():
model_list = []
cwd_path = Path.cwd()
# git fetch first for git diff on GitHub Action
subprocess.run(["git", "fetch", "origin", "main:main"],
cwd=cwd_path, stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
# obtain list of added or modified files in this PR
obtain_diff = subprocess.Popen(["git", "diff", "--name-only", "--diff-filter=AM", "origin/main", "HEAD"],
cwd=cwd_path, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdoutput, _ = obtain_diff.communicate()
diff_list = stdoutput.split()

# identify list of changed ONNX models in ONXX Model Zoo
model_list = [str(model).replace("b'", "").replace("'", "")
for model in diff_list if onnx_ext_name in str(model) or tar_ext_name in str(model)]
return model_list


def main():
parser = argparse.ArgumentParser(description="Test settings")
# default all: test by both onnx and onnxruntime
@@ -53,12 +34,12 @@ def main():
parser.add_argument("--create", required=False, default=False, action="store_true",
help="Create new test data by ORT if it fails with existing test data")
parser.add_argument("--all_models", required=False, default=False, action="store_true",
help="Test all ONNX Model Zoo models instead of only chnaged models")
help="Test all ONNX Model Zoo models instead of only changed models")
parser.add_argument("--drop", required=False, default=False, action="store_true",
help="Drop downloaded models after verification. (For space limitation in CIs)")
args = parser.parse_args()

model_list = get_all_models() if args.all_models else get_changed_models()
model_list = get_all_models() if args.all_models else test_utils.get_changed_models()
# run lfs install before starting the tests
test_utils.run_lfs_install()

@@ -106,16 +87,7 @@ def main():
print("[PASS] {} is checked by onnx. ".format(model_name))
if args.target == "onnxruntime" or args.target == "all":
try:
# git lfs pull those test_data_set_* folders
root_dir = Path(model_path).parent
for _, dirs, _ in os.walk(root_dir):
for dir in dirs:
if "test_data_set_" in dir:
test_data_set_dir = os.path.join(root_dir, dir)
for _, _, files in os.walk(test_data_set_dir):
for file in files:
if file.endswith(".pb"):
test_utils.pull_lfs_file(os.path.join(test_data_set_dir, file))
test_utils.pull_lfs_directory(Path(model_path).parent)
check_model.run_backend_ort_with_data(model_path)
print("[PASS] {} is checked by onnxruntime. ".format(model_name))
except Exception as e:
35 changes: 35 additions & 0 deletions workflow_scripts/test_utils.py
@@ -25,6 +25,18 @@ def pull_lfs_file(file_name):
print(f'LFS pull completed for {file_name} with return code= {result.returncode}')


def pull_lfs_directory(directory_name):
# git lfs pull those test_data_set_* folders
for _, dirs, _ in os.walk(directory_name):
for dir in dirs:
if "test_data_set_" in dir:
test_data_set_dir = os.path.join(directory_name, dir)
for _, _, files in os.walk(test_data_set_dir):
for file in files:
if file.endswith(".pb"):
pull_lfs_file(os.path.join(test_data_set_dir, file))


def run_lfs_prune():
result = subprocess.run(['git', 'lfs', 'prune'], cwd=cwd_path, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(f'LFS prune completed with return code= {result.returncode}')
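The walk in the new `pull_lfs_directory` only matches `test_data_set_*` folders that sit directly under the given directory (the join uses `directory_name`, not the walk root). A sketch of that collection logic with the LFS pull swapped out for simply recording the path — the helper name and the synthetic directory layout are illustrative, not from the repo:

```python
import os
import tempfile

def find_test_data_pb_files(directory_name):
    found = []
    for _, dirs, _ in os.walk(directory_name):
        for d in dirs:
            if "test_data_set_" in d:
                test_data_set_dir = os.path.join(directory_name, d)
                for _, _, files in os.walk(test_data_set_dir):
                    for f in files:
                        if f.endswith(".pb"):
                            # the real code calls pull_lfs_file() here
                            found.append(os.path.join(test_data_set_dir, f))
    return found

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "test_data_set_0"))
    open(os.path.join(root, "test_data_set_0", "input_0.pb"), "w").close()
    print(len(find_test_data_pb_files(root)))  # → 1
```

Pulling only the `.pb` protobuf files keeps the CI download limited to the test tensors a model actually needs, rather than every LFS object in the repo.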
@@ -62,3 +74,26 @@ def remove_tar_dir():
def remove_onnxruntime_test_dir():
if os.path.exists(TEST_ORT_DIR) and os.path.isdir(TEST_ORT_DIR):
rmtree(TEST_ORT_DIR)


def get_changed_models():
tar_ext_name = ".tar.gz"
onnx_ext_name = ".onnx"
model_list = []
cwd_path = Path.cwd()
# TODO: use the main branch instead of new-models
branch_name = "new-models" # "main"
# git fetch first for git diff on GitHub Action
subprocess.run(["git", "fetch", "origin", f"{branch_name}:{branch_name}"],
cwd=cwd_path, stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
# obtain list of added or modified files in this PR
obtain_diff = subprocess.Popen(["git", "diff", "--name-only", "--diff-filter=AM", "origin/" + branch_name, "HEAD"],
cwd=cwd_path, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdoutput, _ = obtain_diff.communicate()
diff_list = stdoutput.split()

# identify list of changed ONNX models in ONNX Model Zoo
model_list = [str(model).replace("b'", "").replace("'", "")
for model in diff_list if onnx_ext_name in str(model) or tar_ext_name in str(model)]
return model_list
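The filtering step at the end of the relocated `get_changed_models` can be exercised on its own. A minimal sketch of that list comprehension, decoupled from the git subprocess calls — the sample file paths are hypothetical:

```python
def filter_changed_models(diff_output: bytes) -> list:
    """Keep only ONNX models and tarballs from raw `git diff --name-only` output."""
    onnx_ext_name = ".onnx"
    tar_ext_name = ".tar.gz"
    diff_list = diff_output.split()
    # str() on a bytes item yields "b'path'", so strip the prefix and quotes
    return [str(model).replace("b'", "").replace("'", "")
            for model in diff_list
            if onnx_ext_name in str(model) or tar_ext_name in str(model)]

sample = b"models/mlagility/distilbert/distilbert-16.onnx\nREADME.md\n"
print(filter_changed_models(sample))
```

The `str(bytes)` round-trip mirrors the original code; decoding each entry with `.decode()` would be the more idiomatic alternative, but the sketch stays faithful to what the diff shows.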