Suraj/update triton main (#1)

* Changed copyright (triton-inference-server#5705)

* Modify timeout test in L0_sequence_batcher to use portable backend (triton-inference-server#5696)

* Modify timeout test in L0_sequence_batcher to use portable backend

* Use identity backend that is built by default on Windows

* updated upstream container name (triton-inference-server#5713)

* Fix triton container version (triton-inference-server#5714)

* Update the L0_model_config test expected error message (triton-inference-server#5684)

* Use better value in timeout test L0_sequence_batcher (triton-inference-server#5716)

* Use better value in timeout test L0_sequence_batcher

* Format

* Update JAX install (triton-inference-server#5613)

* Add notes about socket usage to L0_client_memory_growth test (triton-inference-server#5710)

* Check TensorRT error message more granularly (triton-inference-server#5719)

* Check TRT err msg more granularly

* Clarify source of error messages

* Consolidate tests for message parts

* Pin Python Package Versions for HTML Document Generation (triton-inference-server#5727)

* updating with pinned versions for python dependencies

* updated with pinned sphinx and nbclient versions

* Test full error returned when custom batcher init fails (triton-inference-server#5729)

* Add testing for batcher init failure, add wait for status check

* Formatting

* Change search string

* Add fastertransformer test (triton-inference-server#5500)

Add a fastertransformer test that uses 1 GPU.

* Fix L0_backend_python on Jetson  (triton-inference-server#5728)

* Don't use mem probe in Jetson

* Clarify failure messages in L0_backend_python

* Update copyright

* Add JIRA ref, fix _test_jetson

* Add testing for Python custom metrics API (triton-inference-server#5669)

* Add testing for python custom metrics API

* Add custom metrics example to the test

* Fix for CodeQL report

* Fix test name

* Address comment

* Add logger and change the enum usage

* Add testing for Triton Client Plugin API (triton-inference-server#5706)

* Add HTTP client plugin test

* Add testing for HTTP asyncio

* Add async plugin support

* Fix qa container for L0_grpc

* Add testing for grpc client plugin

* Remove unused imports

* Fix up

* Fix L0_grpc models QA folder

* Update the test based on review feedback

* Remove unused import

* Add testing for .plugin method

* Install jemalloc (triton-inference-server#5738)

* Add --metrics-address and testing (triton-inference-server#5737)

* Add --metrics-address, add tests to L0_socket, re-order CLI options for consistency

* Use non-localhost address

* Add testing for basic auth plugin for HTTP/gRPC clients (triton-inference-server#5739)

* Add HTTP basic auth test

* Add testing for gRPC basic auth

* Fix up

* Remove unused imports
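
For context, a minimal sketch of how the basic-auth plugin exercised by this test might be attached to an HTTP client. The `tritonclient.http.auth.BasicAuth` import path and the `register_plugin()` call are assumptions based on the client plugin API, not code taken from this commit.

```python
import tritonclient.http as httpclient
from tritonclient.http.auth import BasicAuth  # assumed import path

# Every request issued by this client should then carry an
# "Authorization: Basic ..." header injected by the plugin.
client = httpclient.InferenceServerClient(url="localhost:8000")
client.register_plugin(BasicAuth("username", "password"))
print(client.is_server_live())
```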

* Add multi-gpu, multi-stream testing for dlpack tensors (triton-inference-server#5550)

* Add multi-gpu, multi-stream testing for dlpack tensors

* Update note on SageMaker MME support for ensemble (triton-inference-server#5723)

* Run L0_backend_python subtests with virtual environment (triton-inference-server#5753)

* Update 'main' to track development of 2.35.0 / r23.06 (triton-inference-server#5764)

* Include jemalloc into the documentation (triton-inference-server#5760)

* Enhance tests in L0_model_update (triton-inference-server#5724)

* Add model instance name update test

* Add gap for timestamp to update

* Add some tests with dynamic batching

* Extend supported test on rate limit off

* Continue test if off mode failed

* Fix L0_memory_growth (triton-inference-server#5795)

(1) Reduce MAX_ALLOWED_ALLOC to be stricter for bounded tests and more generous for unbounded tests.
(2) Allow unstable measurements from Perf Analyzer.
(3) Improve logging for future triage.

* Add note on --metrics-address (triton-inference-server#5800)

* Add note on --metrics-address

* Copyright

* Minor fix for running "mlflow deployments create -t triton --flavor triton ..." (triton-inference-server#5658)

UnboundLocalError: local variable 'meta_dict' referenced before assignment

The above error appears when listing models in the Triton model repository.

* Adding test for new sequence mode (triton-inference-server#5771)

* Adding test for new sequence mode

* Update option name

* Clean up testing spacing and new lines

* MLFlow Triton Plugin: Add support for s3 prefix and custom endpoint URL (triton-inference-server#5686)

* MLFlow Triton Plugin: Add support for s3 prefix and custom endpoint URL

Signed-off-by: Xiaodong Ye <[email protected]>

* Update the function order of config.py and use os.path.join to replace filtering a list of strings then joining

Signed-off-by: Xiaodong Ye <[email protected]>

* Update onnx flavor to support s3 prefix and custom endpoint URL

Signed-off-by: Xiaodong Ye <[email protected]>

* Fix two typos in MLFlow Triton plugin README.md

Signed-off-by: Xiaodong Ye <[email protected]>

* Address review comments (replace => strip)

Signed-off-by: Xiaodong Ye <[email protected]>

* Address review comments (init regex only for s3)

Signed-off-by: Xiaodong Ye <[email protected]>

* Remove unused local variable: slash_locations

Signed-off-by: Xiaodong Ye <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>

* Fix client script (triton-inference-server#5806)

* Add MLFlow test for already loaded models. Update copyright year (triton-inference-server#5808)

* Use the correct gtest filter (triton-inference-server#5824)

* Add error message test on S3 access decline (triton-inference-server#5825)

* Add test on access decline

* Fix typo

* Add MinIO S3 access decline test

* Make sure bucket exists during access decline test

* Restore AWS_SECRET_ACCESS_KEY on S3 local test (triton-inference-server#5832)

* Restore AWS_SECRET_ACCESS_KEY

* Add reason for restoring keys

* nnshah1 stream infer segfault fix (triton-inference-server#5842)

match logic from infer_handler.cc

* Remove unused test (triton-inference-server#5851)

* Add and document memory usage in statistic protocol (triton-inference-server#5642)

* Add and document memory usage in statistic protocol

* Fix doc

* Fix up

* [DO NOT MERGE Add test. FIXME: model generation

* Fix up

* Fix style

* Address comment

* Fix up

* Set memory tracker backend option in build.py

* Fix up

* Add CUPTI library in Windows image build

* Add note to build with memory tracker by default

* use correct lib dir on CentOS (triton-inference-server#5836)

* use correct lib dir on CentOS

* use new location for opentelemetry-cpp

* Document that gpu-base flag is optional for cpu-only builds (triton-inference-server#5861)

* Update Jetson tests in Docker container (triton-inference-server#5734)

* Add flags for ORT build

* Separate list with commas

* Remove unnecessary detection of nvcc compiler

* Fixed Jetson path for perf_client, datadir

* Create version directory for custom model

* Remove probe check for shm, add shm exceed error for Jetson

* Copyright updates, fix Jetson Probe

* Fix be_python test num on Jetson

* Remove extra comma, non-Dockerized Jetson comment

* Remove comment about Jetson being non-dockerized

* Remove no longer needed flag

* Update `main` post-23.05 release (triton-inference-server#5880)

* Update README and versions for 23.05 branch

* Changes to support 23.05 (triton-inference-server#5782)

* Update python and conda version

* Update CMAKE installation

* Update checksum version

* Update ubuntu base image to 22.04

* Use ORT 1.15.0

* Set CMAKE to pull latest version

* Update libre package version

* Removing unused argument

* Adding condition for ubuntu 22.04

* Removing installation of the package from the devel container

* Nnshah1 u22.04 (triton-inference-server#5770)

* Update CMAKE installation

* Update python and conda version

* Update CMAKE installation

* Update checksum version

* Update ubuntu base image to 22.04

* updating versions for ubuntu 22.04

* remove re2

---------

Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>

* Set ONNX version to 1.13.0

* Fix L0_custom_ops for ubuntu 22.04 (triton-inference-server#5775)

* add back rapidjson-dev

---------

Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: nv-kmcgill53 <[email protected]>

* Fix L0_mlflow (triton-inference-server#5805)

* working thread

* remove default install of blinker

* merge issue fixed

* Fix L0_backend_python/env test (triton-inference-server#5799)

* Fix L0_backend_python/env test

* Address comment

* Update the copyright

* Fix up

* Fix L0_http_fuzz (triton-inference-server#5776)

* installing python 3.8.16 for test

* spelling

Co-authored-by: Neelay Shah <[email protected]>

* use util functions to install python3.8 in an easier way

---------

Co-authored-by: Neelay Shah <[email protected]>

* Update Windows versions for 23.05 release (triton-inference-server#5826)

* Rename Ubuntu 20.04 mentions to 22.04 (triton-inference-server#5849)

* Update DCGM version (triton-inference-server#5856)

* Update DCGM version (triton-inference-server#5857)

* downgrade DCGM version to 2.4.7 (triton-inference-server#5860)

* Updating link for latest release notes to 23.05

---------

Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: nv-kmcgill53 <[email protected]>
Co-authored-by: Iman Tabrizian <[email protected]>

* Disable memory tracker on Jetpack until the library is available (triton-inference-server#5882)

* Fix datadir for x86 (triton-inference-server#5894)

* Add more test on instance signature (triton-inference-server#5852)

* Add testing for new error handling API (triton-inference-server#5892)

* Test batch input for libtorch (triton-inference-server#5855)

* Draft ragged TensorRT unit model gen

* Draft libtorch special identity model

* Autoformat

* Update test, fix ragged model gen

* Update suffix for io for libtorch

* Remove unused variables

* Fix io names for libtorch

* Use INPUT0/OUTPUT0 for libtorch

* Reorder to match test model configs

* Remove unnecessary capitalization

* Auto-format

* Capitalization is necessary

* Remove unnecessary export

* Clean up Azure dependency in server build (triton-inference-server#5900)

* [DO NOT MERGE]

* Remove Azure dependency in server component build

* Finalize

* Fix dependency

* Fixing up

* Clean up

* Add response parameters for streaming GRPC inference to enhance decoupled support (triton-inference-server#5878)

* Update 'main' to track development of 2.36.0 / 23.07 (triton-inference-server#5917)

* Add test for detecting S3 http2 upgrade request (triton-inference-server#5911)

* Add test for detecting S3 http2 upgrade request

* Enhance testing

* Copyright year update

* Add Redis cache build, tests, and docs (triton-inference-server#5916)

* Updated handling for uint64 request priority

* Ensure HPCX dependencies found in container (triton-inference-server#5922)

* Add HPCX dependencies to search path

* Copy hpcx to CPU-only container

* Add ucc path to CPU-only image

* Fixed if statement

* Fix df variable

* Combine hpcx LD_LIBRARY_PATH

* Add test case where MetricFamily is deleted before deleting Metric (triton-inference-server#5915)

* Add test case for metric lifetime error handling

* Address comment

* Use different MetricFamily name

* Add testing for Pytorch instance group kind MODEL (triton-inference-server#5810)

* Add testing for Pytorch instance group kind MODEL

* Remove unused item

* Update testing to verify the infer result

* Add copyright

* Remove unused import

* Update pip install

* Update the model to use the same add sub logic

* Add torch multi-gpu and multi-device models to L0_io

* Fix up model version

* Add test for sending instance update config via load API (triton-inference-server#5937)

* Add test for passing config via load api

* Add more docs on instance update behavior

* Update to suggested docs

Co-authored-by: Ryan McCormick <[email protected]>

* Use dictionary for json config

* Modify the config fetched from Triton instead

---------

Co-authored-by: Ryan McCormick <[email protected]>
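
As a rough illustration of the load-API path tested above, a sketch (assuming the `config` override of `load_model()` and a hypothetical model name) of sending an updated instance_group through the load request:

```python
import json

import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
# Re-load an already loaded model with an overriding config so only the
# instance_group change is applied (instance update, not a full reload).
new_config = {"instance_group": [{"kind": "KIND_CPU", "count": 2}]}
client.load_model("my_model", config=json.dumps(new_config))
```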

* Fix L0_batcher count check (triton-inference-server#5939)

* Add testing for json tensor format (triton-inference-server#5914)
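
For reference, the JSON (non-binary) tensor format under test is the standard KServe v2 inference request body; the model and tensor names below are hypothetical:

```python
import requests

body = {
    "inputs": [
        {"name": "INPUT0", "shape": [1, 4], "datatype": "INT32", "data": [1, 2, 3, 4]}
    ],
    "outputs": [{"name": "OUTPUT0"}],
}
# Tensor data is carried as JSON arrays instead of the binary extension.
r = requests.post("http://localhost:8000/v2/models/my_model/infer", json=body)
print(r.json())
```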

* Add redis config and use local logfile for redis server (triton-inference-server#5945)

* Add redis config and use local logfile for redis server

* Move redis log config to CLI

* Have separate redis logs for unit tests and CLI tests

* Add test on rate limiter max resource decrease update (triton-inference-server#5885)

* Add test on rate limiter max resource decrease update

* Add test with explicit resource

* Check server log for decreased resource limit

* Add docs on decoupled final response feature (triton-inference-server#5936)
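
A hedged sketch of what the documented feature looks like from a gRPC client: detecting the end of a decoupled model's response stream via the `triton_final_response` parameter. The `enable_empty_final_response` flag and the parameter access below are assumptions taken from the decoupled-model docs, not from this commit.

```python
import tritonclient.grpc as grpcclient

def callback(result, error):
    if error is None:
        params = result.get_response().parameters
        if params["triton_final_response"].bool_param:
            print("final response received for this request")

client = grpcclient.InferenceServerClient(url="localhost:8001")
client.start_stream(callback=callback)
inputs = []  # build InferInput objects per the model config (omitted here)
client.async_stream_infer(
    "decoupled_model", inputs, enable_empty_final_response=True
)
client.stop_stream()
```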

* Allow changing ping behavior based on env variable in SageMaker and entrypoint updates (triton-inference-server#5910)

* Allow changing ping behavior based on env variable in SageMaker

* Add option for additional args

* Make ping further configurable

* Allow further configuration of grpc and http ports

* Update docker/sagemaker/serve

* Update docker/sagemaker/serve

---------

Co-authored-by: GuanLuo <[email protected]>

* Remove only MPI libraries in HPCX in L0_perf_analyzer (triton-inference-server#5967)

* Be more specific with MPI removal

* Delete all libmpi libs

* Ensure L0_batch_input requests received in order (triton-inference-server#5963)

* Add print statements for debugging

* Add debugging print statements

* Test using grpc client with stream to fix race

* Use streaming client in all non-batch tests

* Switch all clients to streaming GRPC

* Remove unused imports, vars

* Address comments

* Remove random comment

* Set inputs as separate function

* Split set inputs based on test type

* Add test for redis cache auth credentials via env vars (triton-inference-server#5966)

* Auto-formatting (triton-inference-server#5979)

* Auto-format

* Change to clang-format-15 in CONTRIBUTING

* Adding tests ensuring locale setting is passed to python backend interpreter

* Refactor build.py CPU-only Linux libs for readability (triton-inference-server#5990)

* Improve the error message when the number of GPUs is insufficient (triton-inference-server#5993)

* Update README to include CPP-API Java Bindings (triton-inference-server#5883)

* Update env variable to use for overriding /ping behavior (triton-inference-server#5994)

* Add test that >1000 model files can be loaded in S3 (triton-inference-server#5976)

* Add test for >1000 files

* Capitalization for consistency

* Add bucket cleaning at end

* Move test pass/fail to end

* Check number of files in model dir at load time

* Add testing for GPU tensor error handling (triton-inference-server#5871)

* Add testing for GPU tensor error handling

* Fix up

* Remove exit 0

* Fix jetson

* Fix up

* Add test for Python BLS model loading API (triton-inference-server#5980)

* Add test for Python BLS model loading API

* Fix up

* Update README and versions for 23.06 branch

* Fix LD_LIBRARY_PATH for PyTorch backend

* Return updated df in add_cpu_libs

* Remove unneeded df param

* Update test failure messages to match Dataloader changes (triton-inference-server#6006)

* Add dependency for L0_python_client_unit_tests (triton-inference-server#6010)

* Improve performance tuning guide (triton-inference-server#6026)

* Enabling nested spans for trace mode OpenTelemetry (triton-inference-server#5928)

* Adding nested spans to OTel tracing + support of ensemble models

* Move multi-GPU dlpack test to a separate L0 test (triton-inference-server#6001)

* Move multi-GPU dlpack test to a separate L0 test

* Fix copyright

* Fix up

* OpenVINO 2023.0.0 (triton-inference-server#6031)

* Upgrade OV to 2023.0.0

* Upgrade OV model gen script to 2023.0.0

* Add test to check the output memory type for onnx models (triton-inference-server#6033)

* Add test to check the output memory type for onnx models

* Remove unused import

* Address comment

* Add testing for implicit state for PyTorch backend (triton-inference-server#6016)

* Add testing for implicit state for PyTorch backend

* Add testing for libtorch string implicit models

* Fix CodeQL

* Mention that libtorch backend supports implicit state

* Fix CodeQL

* Review edits

* Fix output tests for PyTorch backend

* Allow uncompressed conda execution environments (triton-inference-server#6005)

Add test for uncompressed conda execution environments

* Fix implicit state test (triton-inference-server#6039)

* Adding target_compile_features cxx_std_17 to tracing lib (triton-inference-server#6040)

* Update 'main' to track development of 2.37.0 / 23.08

* Fix intermittent failure in L0_model_namespacing (triton-inference-server#6052)

* Fix PyTorch implicit model mounting in gen_qa_model_repository (triton-inference-server#6054)

* Fix broken links pointing to the `grpc_server.cc` file (triton-inference-server#6068)

* Fix L0_backend_python expected instance name (triton-inference-server#6073)

* Fix expected instance name

* Copyright year

* Fix L0_sdk: update the search name for the client wheel (triton-inference-server#6074)

* Fix name of client wheel to be looked for

* Fix up

* Add GitHub action to format and lint code (triton-inference-server#6022)

* Add pre-commit

* Fix typos, exec/shebang, formatting

* Remove clang-format

* Update contributing md to include pre-commit

* Update spacing in CONTRIBUTING

* Fix contributing pre-commit link

* Link to pre-commit install directions

* Wording

* Restore clang-format

* Fix yaml spacing

* Exclude templates folder for check-yaml

* Remove unused vars

* Normalize spacing

* Remove unused variable

* Normalize config indentation

* Update .clang-format to enforce max line length of 80

* Update copyrights

* Update copyrights

* Run workflows on every PR

* Fix copyright year

* Fix grammar

* Entrypoint.d files are not executable

* Run pre-commit hooks

* Mark not executable

* Run pre-commit hooks

* Remove unused variable

* Run pre-commit hooks after rebase

* Update copyrights

* Fix README.md typo (decoupled)

Co-authored-by: Ryan McCormick <[email protected]>

* Run pre-commit hooks

* Grammar fix

Co-authored-by: Ryan McCormick <[email protected]>

* Redundant word

Co-authored-by: Ryan McCormick <[email protected]>

* Revert docker file changes

* Executable shebang revert

* Make model.py files non-executable

* Passin is proper flag

* Run pre-commit hooks on init_args/model.py

* Fix typo in init_args/model.py

* Make copyrights one line

---------

Co-authored-by: Ryan McCormick <[email protected]>

* Fix default instance name change when count is 1 (triton-inference-server#6088)

* Add test for sequence model instance update (triton-inference-server#5831)

* Add test for sequence model instance update

* Add gap for file timestamp update

* Update test for non-blocking sequence update

* Update documentation

* Remove mentioning increase instance count case

* Add more documentation for scheduler update test

* Update test for non-blocking batcher removal

* Add polling due to async scheduler destruction

* Use _ as private

* Fix typo

* Add docs on instance count decrease

* Fix typo

* Separate direct and oldest to different test cases

* Separate nested tests in a loop into multiple test cases

* Refactor scheduler update test

* Improve doc on handling future test failures

* Address pre-commit

* Add best effort to reset model state after a single test case failure

* Remove reset model method to make harder for chaining multiple test cases as one

* Remove description on model state clean up

* Fix default instance name (triton-inference-server#6097)

* Removing unused tests (triton-inference-server#6085)

* Update post-23.07 release  (triton-inference-server#6103)

* Update README and versions for 2.36.0 / 23.07

* Update Dockerfile.win10.min

* Fix formatting issue

* Fix formatting issue

* Fix whitespaces

* Fix whitespaces

* Fix whitespaces

* Improve asyncio testing (triton-inference-server#6122)

* Reduce instance count to 1 for python bls model loading test (triton-inference-server#6130)

* Reduce instance count to 1 for python bls model loading test

* Add comment when calling unload

* Fix queue test to expect exact number of failures (triton-inference-server#6133)

* Fix queue test to expect exact number of failures

* Increase the execution time to more accurately capture requests

* Add CPU & GPU metrics in Grafana dashboard.json for K8s on-prem deployment (fix triton-inference-server#6047) (triton-inference-server#6100)

Signed-off-by: Xiaodong Ye <[email protected]>

* Adding the support tracing of child models invoked from a BLS model (triton-inference-server#6063)

* Adding tests for bls

* Added fixme, cleaned previous commit

* Removed unused imports

* Fixing commit tree:
Refactor code, so that OTel tracer provider is initialized only once
Added resource cmd option, testing
Added docs

* Clean up

* Update docs/user_guide/trace.md

Co-authored-by: Ryan McCormick <[email protected]>

* Revision

* Update doc

* Clean up

* Added ostream exporter to OpenTelemetry for testing purposes; refactored trace tests

* Added opentelemetry trace collector set up to tests; refactored otel exporter tests to use OTel collector instead of netcat

* Revising according to comments

* Added comment regarding 'parent_span_id'

* Added permalink

* Adjusted test

---------

Co-authored-by: Ryan McCormick <[email protected]>

* Test python environments 3.8-3.11 (triton-inference-server#6109)

Add tests for python 3.8-3.11 for L0_python_backends

* Improve L0_backend_python debugging (triton-inference-server#6157)

* Improve L0_backend_python debugging

* Use utils function for artifacts collection

* Add unreachable output test for reporting source of disconnectivity (triton-inference-server#6149)

* Update 'main' to track development of 2.38.0 / 23.09 (triton-inference-server#6163)

* Fix the versions in the doc (triton-inference-server#6164)

* Update docs with NVAIE messaging (triton-inference-server#6162)

Update docs with NVAIE messaging

* Add sanity tests for parallel instance loading (triton-inference-server#6126)

* Remove extra whitespace (triton-inference-server#6174)

* Remove a test case that sanity checks input value of --shape CLI flag (triton-inference-server#6140)

* Remove test checking for --shape option

* Remove the entire test

* Add test when unload/load requests for same model is received at the same time (triton-inference-server#6150)

* Add test when unload/load requests for same model received the same time

* Add test_same_model_overlapping_load_unload

* Use a load/unload stress test instead

* Pre-merge test name update

* Address pre-commit error

* Revert "Address pre-commit error"

This reverts commit 781cab1.

* Record number of occurrence of each exception

* Make assert failures clearer in L0_trt_plugin (triton-inference-server#6166)

* Add end-to-end CI test for decoupled model support (triton-inference-server#6131) (triton-inference-server#6184)

* Add end-to-end CI test for decoupled model support

* Address feedback

* Test preserve_ordering for oldest strategy sequence batcher (triton-inference-server#6185)

* added debugging guide (triton-inference-server#5924)

* added debugging guide

* Run pre-commit

---------

Co-authored-by: David Yastremsky <[email protected]>

* Add deadlock gdb section to debug guide (triton-inference-server#6193)

* Fix character escape in model repository documentation (triton-inference-server#6197)

* Fix docs test (triton-inference-server#6192)

* Add utility functions for array manipulation (triton-inference-server#6203)

* Add utility functions for outlier removal

* Fix functions

* Add newline to end of file

* Add gc collect to make sure gpu tensor is deallocated (triton-inference-server#6205)

* Testing: add gc collect to make sure gpu tensor is deallocated

* Address comment

* Check for log error on failing to find explicit load model (triton-inference-server#6204)

* Set default shm size to 1MB for Python backend (triton-inference-server#6209)

* Trace Model Name Validation (triton-inference-server#6199)

* Initial commit

* Cleanup using new standard formatting

* QA test restructuring

* Add newline to the end of test.sh

* HTTP/gRPC protocol changed to pivot on ready status & error status. Log file name changed in QA test.

* Fixing unhandled error memory leak

* Handle index function memory leak fix

* Fix the check for error message (triton-inference-server#6226)

* Fix copyright for debugging guide (triton-inference-server#6225)

* Add watts units to GPU power metric descriptions (triton-inference-server#6242)

* Update post-23.08 release  (triton-inference-server#6234)

* CUDA 12.1 > 12.2

* DLIS-5208: onnxruntime+windows - stop treat warnings on compile as errors

* Revert "DLIS-5208: onnxruntime+windows - stop treat warnings on compile as errors"

This reverts commit 0cecbb7.

* Update Dockerfile.win10.min

* Update Dockerfile.win10.min

* Update README and versions for 23.08 branch

* Update Dockerfile.win10

* Fix the versions in docs

* Add the note about stabilization of the branch

* Update docs with NVAIE messaging (triton-inference-server#6162) (triton-inference-server#6167)

Update docs with NVAIE messaging

Co-authored-by: David Zier <[email protected]>

* Resolve merge conflict

---------

Co-authored-by: tanmayv25 <[email protected]>
Co-authored-by: David Zier <[email protected]>

* Add tests/docs for queue size (pending request count) metric (triton-inference-server#6233)
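
A quick sketch of inspecting the new queue-size metric; it assumes the default metrics port 8002 and the `nv_inference_pending_request_count` metric name.

```python
import requests

metrics = requests.get("http://localhost:8002/metrics").text
for line in metrics.splitlines():
    # One entry per model/version, e.g.
    # nv_inference_pending_request_count{model="m",version="1"} 0
    if line.startswith("nv_inference_pending_request_count"):
        print(line)
```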

* Adding safe string to number conversions (triton-inference-server#6173)

* Added catch for out of range error for trace setting update

* Added wrapper to safe parse options

* Added option names to errors

* Adjustments

* Quick fix

* Fixing option name for Windows

* Removed repetitive code

* Adjust getopt_long for Windows to use longindex

* Moved try catch into ParseOption

* Removed unused input

* Improved names

* Refactoring and clean up

* Fixed Windows

* Refactored getopt_long for Windows

* Refactored trace test, pinned otel's collector version to avoid problems with go requirements

* Test Python execute() to return Triton error code (triton-inference-server#6228)

* Add test for Python execute error code

* Add all supported error codes into test

* Move ErrorCode into TritonError

* Expose ErrorCode internal in TritonError
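
A minimal sketch of the behavior being tested: a Python backend `execute()` returning a response whose error carries an explicit code. The exact `pb_utils.TritonError` constants are assumptions based on the error-code support referenced above.

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for _ in requests:
            error = pb_utils.TritonError(
                "resource temporarily unavailable",
                pb_utils.TritonError.UNAVAILABLE,  # assumed error-code constant
            )
            responses.append(pb_utils.InferenceResponse(error=error))
        return responses
```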

* Add docs on IPv6 (triton-inference-server#6262)

* Add test for TensorRT version-compatible model support (triton-inference-server#6255)

* Add tensorrt version-compatibility test

* Generate one version-compatible model

* Fix copyright year

* Remove unnecessary variable

* Remove unnecessary line

* Generate TRT version-compatible model

* Add sample inference to TRT version-compatible test

* Clean up utils and model gen for new plan model

* Fix startswith capitalization

* Remove unused imports

* Remove unused imports

* Add log check

* Upgrade protobuf version (triton-inference-server#6268)

* Add testing for retrieving shape and datatype in backend API (triton-inference-server#6231)

Add testing for retrieving output shape and datatype info from backend API

* Update 'main' to track development of 2.39.0 / 23.10 (triton-inference-server#6277)

* Apply UCX workaround (triton-inference-server#6254)

* Add ensemble parameter forwarding test (triton-inference-server#6284)

* Exclude extra TRT version-compatible models from tests (triton-inference-server#6294)

* Exclude compatible models from tests.

* Force model removal, in case it does not exist

Co-authored-by: Ryan McCormick <[email protected]>

---------

Co-authored-by: Ryan McCormick <[email protected]>

* Adding installation of docker and docker-buildx (triton-inference-server#6299)

* Adding installation of docker and docker-buildx

* remove whitespace

* Use targetmodel from header as model name in SageMaker (triton-inference-server#6147)

* Use targetmodel from header as model name in SageMaker

* Update naming for model hash

* Add more error messages, return codes, and refactor HTTP server (triton-inference-server#6297)

* Fix typo (triton-inference-server#6318)

* Update the request re-use example (triton-inference-server#6283)

* Update the request re-use example

* Review edit

* Review comment

* Disable developer tools build for In-process API + JavaCPP tests (triton-inference-server#6296)

* Add Python binding build. Add L0_python_api to test Python binding (triton-inference-server#6319)

* Add L0_python_api to test Python binding

* Install Python API in CI image

* Fix QA build

* Increase network timeout for valgrind (triton-inference-server#6324)

* Tests and docs for ability to specify subdirectory to download for LocalizePath (triton-inference-server#6308)

* Added custom localization tests for s3 and azure, added docs

* Refactor HandleInfer into more readable chunks (triton-inference-server#6332)

* Refactor model generation scripts (triton-inference-server#6336)

* Refactor model generation scripts

* Fix codeql

* Fix relative path import

* Fix package structure

* Copy the gen_common file

* Add missing uint8

* Remove duplicate import

* Add testing for scalar I/O in ORT backend (triton-inference-server#6343)

* Add testing for scalar I/O in ORT backend

* Review edit

* ci

* Update post-23.09 release (triton-inference-server#6367)

* Update README and versions for 23.09 branch (triton-inference-server#6280)

* Update `Dockerfile` and `build.py`  (triton-inference-server#6281)

* Update configuration for Windows Dockerfile (triton-inference-server#6256)

* Adding installation of docker and docker-buildx

* Enable '--expt-relaxed-constexpr' flag for custom ops models

* Update Dockerfile version

* Disable unit tests for Jetson

* Update condition (triton-inference-server#6285)

* removing Whitespaces (triton-inference-server#6293)

* removing Whitespaces

* removing whitespaces

* Add security policy (triton-inference-server#6376)

* Adding client-side request cancellation support and testing (triton-inference-server#6383)

* Add L0_request_cancellation (triton-inference-server#6252)

* Add L0_request_cancellation

* Remove unittest test

* Add cancellation to gRPC server error handling

* Fix up

* Use identity model

* Add tests for gRPC client-side cancellation (triton-inference-server#6278)

* Add tests for gRPC client-side cancellation

* Fix CodeQL issues

* Formatting

* Update qa/L0_client_cancellation/client_cancellation_test.py

Co-authored-by: Ryan McCormick <[email protected]>

* Move to L0_request_cancellation

* Address review comments

* Removing request cancellation support from asyncio version

* Format

* Update copyright

* Remove tests

* Handle cancellation notification in gRPC server (triton-inference-server#6298)

* Handle cancellation notification in gRPC server

* Fix the request ptr initialization

* Update src/grpc/infer_handler.h

Co-authored-by: Ryan McCormick <[email protected]>

* Address review comment

* Fix logs

* Fix request complete callback by removing reference to state

* Improve documentation

---------

Co-authored-by: Ryan McCormick <[email protected]>

---------

Co-authored-by: Ryan McCormick <[email protected]>
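
A hedged sketch of the client-side cancellation paths exercised above; the `cancel()` handle returned by `async_infer()` and the `cancel_requests` argument of `stop_stream()` are assumptions based on the client changes referenced here.

```python
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
inputs = []  # build InferInput objects for the target model (omitted here)

# Async request: keep the returned handle so the request can be cancelled.
handle = client.async_infer("my_model", inputs, callback=lambda result, error: None)
handle.cancel()

# Streaming: cancel anything still in flight while tearing the stream down.
client.start_stream(callback=lambda result, error: None)
client.async_stream_infer("my_model", inputs)
client.stop_stream(cancel_requests=True)
```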

* Fixes on the gRPC frontend to handle AsyncNotifyWhenDone() API (triton-inference-server#6345)

* Fix segmentation fault in gRPC frontend

* Finalize all states upon completion

* Fixes all state cleanups

* Handle completed states when cancellation notification is received

* Add more documentation steps

* Retrieve dormant states to minimize the memory footprint for long streams

* Update src/grpc/grpc_utils.h

Co-authored-by: Ryan McCormick <[email protected]>

* Use a boolean state instead of raw pointer

---------

Co-authored-by: Ryan McCormick <[email protected]>

* Add L0_grpc_state_cleanup test (triton-inference-server#6353)

* Add L0_grpc_state_cleanup test

* Add model file in QA container

* Fix spelling

* Add remaining subtests

* Add failing subtests

* Format fixes

* Fix model repo

* Fix QA docker file

* Remove checks for the error message when shutting down server

* Fix spelling

* Address review comments

* Add schedulers request cancellation tests (triton-inference-server#6309)

* Add schedulers request cancellation tests

* Merge gRPC client test

* Reduce testing time and cover cancelling other requests as a consequence of request cancellation

* Add streaming request cancellation test

---------

Co-authored-by: Iman Tabrizian <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Jacky <[email protected]>

* Add missing copyright (triton-inference-server#6388)

* Add basic generate endpoints for LLM tasks (triton-inference-server#6366)

* PoC of parsing request prompt and converting to Triton infer request

* Remove extra trace

* Add generate endpoint

* Enable streaming version

* Fix bug

* Fix up

* Add basic testing. Cherry pick from triton-inference-server#6369

* format

* Address comment. Fix build

* Minor cleanup

* cleanup syntax

* Wrap error in SSE format

* Fix up

* Restrict number of response on non-streaming generate

* Address comment on implementation.

* Re-enable trace on generate endpoint

* Add more comprehensive llm endpoint tests (triton-inference-server#6377)

* Add security policy (triton-inference-server#6376)

* Start adding some more comprehensive tests

* Fix test case

* Add response error testing

* Complete test placeholder

* Address comment

* Address comments

* Fix code check

---------

Co-authored-by: dyastremsky <[email protected]>
Co-authored-by: GuanLuo <[email protected]>

* Address comment

* Address comment

* Address comment

* Fix typo

---------

Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: dyastremsky <[email protected]>
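
A hedged sketch of calling the new generate endpoints. The `/generate` and `/generate_stream` paths follow the HTTP protocol added here, while `my_llm`, `text_input`, `text_output`, and `max_tokens` are model-specific names assumed only for illustration.

```python
import json

import requests

base = "http://localhost:8000/v2/models/my_llm"

# Non-streaming: a single JSON response object.
r = requests.post(f"{base}/generate",
                  json={"text_input": "What is Triton?", "max_tokens": 16})
print(r.json().get("text_output"))

# Streaming: responses arrive as Server-Sent Events ("data: {...}" lines).
with requests.post(f"{base}/generate_stream",
                   json={"text_input": "What is Triton?", "max_tokens": 16},
                   stream=True) as resp:
    for line in resp.iter_lines():
        if line.startswith(b"data: "):
            print(json.loads(line[len(b"data: "):]))
```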

* Add Python backend request cancellation test (triton-inference-server#6364)

* Add cancelled response status test

* Add Python backend request cancellation test

* Add Python backend decoupled request cancellation test

* Simplified response if cancelled

* Test response_sender.send() after closed

* Rollback test response_sender.send() after closed

* Rollback non-decoupled any response on cancel
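
Roughly, the decoupled cancellation tests above exercise a model that checks `InferenceRequest.is_cancelled()` and stops producing responses early; the `_generate()` helper below is hypothetical and the rest is a sketch of the Python backend API, not the test model itself.

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            for response in self._generate(request):  # hypothetical generator
                if request.is_cancelled():
                    break  # stop responding once the client cancels
                sender.send(response)
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        return None
```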

* Add TRT-LLM backend build to Triton (triton-inference-server#6365) (triton-inference-server#6392)

* Add TRT-LLM backend build to Triton (triton-inference-server#6365)

* Add trtllm backend to build

* Temporarily adding version map for 23.07

* Fix build issue

* Update comment

* Comment out python binding changes

* Add post build

* Update trtllm backend naming

* Update TRTLLM base image

* Fix cmake arch

* Revert temp changes for python binding PR

* Address comment

* Move import to the top (triton-inference-server#6395)

* Move import to the top

* pre commit format

* Add Python backend when vLLM backend built (triton-inference-server#6397)

* Update build.py to build vLLM backend (triton-inference-server#6394)

* Support parameters object in generate route

* Update 'main' to track development of 2.40.0 / 23.11 (triton-inference-server#6400)

* Fix L0_sdk (triton-inference-server#6387)

* Add documentation on request cancellation (triton-inference-server#6403)

* Add documentation on request cancellation

* Include python backend

* Update docs/user_guide/request_cancellation.md

Co-authored-by: Iman Tabrizian <[email protected]>

* Update docs/user_guide/request_cancellation.md

Co-authored-by: Neelay Shah <[email protected]>

* Update docs/README.md

Co-authored-by: Neelay Shah <[email protected]>

* Update docs/user_guide/request_cancellation.md

Co-authored-by: Ryan McCormick <[email protected]>

* Remove inflight term from the main documentation

* Address review comments

* Fix

* Update docs/user_guide/request_cancellation.md

Co-authored-by: Jacky <[email protected]>

* Fix

---------

Co-authored-by: Iman Tabrizian <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Jacky <[email protected]>

* Fixes in request cancellation doc (triton-inference-server#6409)

* Document generate HTTP endpoint (triton-inference-server#6412)

* Document generate HTTP endpoint

* Address comment

* Fix up

* format

* Address comment

* Update SECURITY.md to not display commented copyright (triton-inference-server#6426)

* Fix missing library in L0_data_compression (triton-inference-server#6424)

* Fix missing library in L0_data_compression

* Fix up

* Add Javacpp-presets repo location as env variable in Java tests (triton-inference-server#6385)

Simplify testing when upstream (javacpp-presets) build changes. Related to triton-inference-server/client#409

* TRT-LLM backend build changes (triton-inference-server#6406)

* Update url

* Debugging

* Debugging

* Update url

* Fix build for TRT-LLM backend

* Remove TRTLLM TRT and CUDA versions

* Fix up unused var

* Fix up dir name

* FIx cmake patch

* Remove previous TRT version

* Install required packages for example models

* Remove packages that are only needed for testing

* Add gRPC AsyncIO request cancellation tests (triton-inference-server#6408)

* Fix gRPC test failure and refactor

* Add gRPC AsyncIO cancellation tests

* Better check if a request is cancelled

* Use f-string

* Fix L0_implicit_state (triton-inference-server#6427)

* Fixing vllm build (triton-inference-server#6433)

* Fixing torch version for vllm

* Switch Jetson model TensorRT models generation to container (triton-inference-server#6378)

* Switch Jetson model TensorRT models generation to container

* Adding missed file

* Fix typo

* Fix typos

* Remove extra spaces

* Fix typo

* Bumped vllm version (triton-inference-server#6444)

* Adjust test_concurrent_same_model_load_unload_stress (triton-inference-server#6436)

* Adding emergency vllm latest release (triton-inference-server#6454)

* Fix notify state destruction and inflight states tracking (triton-inference-server#6451)

* Ensure notify_state_ gets properly destructed

* Fix inflight state tracking to properly erase states

* Prevent removing the notify_state from being erased

* Wrap notify_state_ object within unique_ptr

* Update TRT-LLM backend url (triton-inference-server#6455)

* TRTLLM backend post release

* TRTLLM backend post release

* Update submodule url for permission issue

* Update submodule url

* Fix up

* Not using postbuild function to workaround submodule url permission issue

* Added docs on python based backends (triton-inference-server#6429)


Co-authored-by: Neelay Shah <[email protected]>

* L0_model_config Fix (triton-inference-server#6472)

* Minor fix for L0_model_config

* Add test for Python model parameters (triton-inference-server#6452)

* Test Python BLS with different sizes of CUDA memory pool (triton-inference-server#6276)

* Test with different sizes of CUDA memory pool

* Check the server log for error message

* Improve debugging

* Fix syntax

* Add documentation for K8s-onprem StartupProbe (triton-inference-server#5257)

Co-authored-by: dyastremsky <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>

* Update `main` post-23.10 release   (triton-inference-server#6484)

* Update README and versions for 23.10 branch (triton-inference-server#6399)

* Cherry-picking vLLM backend changes (triton-inference-server#6404)

* Update build.py to build vLLM backend (triton-inference-server#6394)

* Add Python backend when vLLM backend built (triton-inference-server#6397)

---------

Co-authored-by: dyastremsky <[email protected]>

* Add documentation on request cancellation (triton-inference-server#6403) (triton-inference-server#6407)

* Add documentation on request cancellation

* Include python backend

* Update docs/user_guide/request_cancellation.md

* Update docs/user_guide/request_cancellation.md

* Update docs/README.md

* Update docs/user_guide/request_cancellation.md

* Remove inflight term from the main documentation

* Address review comments

* Fix

* Update docs/user_guide/request_cancellation.md

* Fix

---------

Co-authored-by: Iman Tabrizian <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Jacky <[email protected]>

* Fixes in request cancellation doc (triton-inference-server#6409) (triton-inference-server#6410)

* TRT-LLM backend build changes (triton-inference-server#6406) (triton-inference-server#6430)

* Update url

* Debugging

* Debugging

* Update url

* Fix build for TRT-LLM backend

* Remove TRTLLM TRT and CUDA versions

* Fix up unused var

* Fix up dir name

* FIx cmake patch

* Remove previous TRT version

* Install required packages for example models

* Remove packages that are only needed for testing

* Fixing vllm build (triton-inference-server#6433) (triton-inference-server#6437)

* Fixing torch version for vllm

Co-authored-by: Olga Andreeva <[email protected]>

* Update TRT-LLM backend url (triton-inference-server#6455) (triton-inference-server#6460)

* TRTLLM backend post release

* TRTLLM backend post release

* Update submodule url for permission issue

* Update submodule url

* Fix up

* Not using postbuild function to workaround submodule url permission issue

* remove redundant lines

* Revert "remove redundant lines"

This reverts commit 86be7ad.

* restore missed lines

* Update build.py

Co-authored-by: Olga Andreeva <[email protected]>

* Update build.py

Co-authored-by: Olga Andreeva <[email protected]>

---------

Co-authored-by: Tanmay Verma <[email protected]>
Co-authored-by: dyastremsky <[email protected]>
Co-authored-by: Iman Tabrizian <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Jacky <[email protected]>
Co-authored-by: Kris Hung <[email protected]>
Co-authored-by: Katherine Yang <[email protected]>
Co-authored-by: Olga Andreeva <[email protected]>

* Adding structure reference to the new document (triton-inference-server#6493)

* Improve L0_backend_python test stability (ensemble / gpu_tensor_lifecycle) (triton-inference-server#6490)

* Test torch allocator gpu memory usage directly rather than global gpu memory for more consistency

* Add L0_generative_sequence test (triton-inference-server#6475)

* Add testing backend and test

* Add test to build / CI. Minor fix on L0_http

* Format. Update backend documentation

* Fix up

* Address comment

* Add negative testing

* Fix up

* Downgrade vcpkg version (triton-inference-server#6503)

* Collecting sub dir artifacts in GitLab yaml. Removing collect function from test script. (triton-inference-server#6499)

* Use post build function for TRT-LLM backend (triton-inference-server#6476)

* Use postbuild function

* Remove updating submodule url

* Enhanced python_backend autocomplete (triton-inference-server#6504)

* Added testing for python_backend autocomplete: optional input and model_transaction_policy

* Parse reuse-grpc-port and reuse-http-port as booleans (triton-inference-server#6511)

Co-authored-by: Francesco Petrini <[email protected]>

* Fixing L0_io (triton-inference-server#6510)

* Fixing L0_io

* Add Python-based backends CI (triton-inference-server#6466)

* Bumped vllm version

* Add python-bsed backends testing

* Add python-based backends CI

* Fix errors

* Add vllm backend

* Fix pre-commit

* Modify test.sh

* Remove vllm_opt qa model

* Remove vLLM backend tests

* Resolve review comments

* Fix pre-commit errors

* Update qa/L0_backend_python/python_based_backends/python_based_backends_test.py

Co-authored-by: Tanmay Verma <[email protected]>

* Remove collect_artifacts_from_subdir function call

---------

Co-authored-by: oandreeva-nv <[email protected]>
Co-authored-by: Tanmay Verma <[email protected]>

* Enabling option to restrict access to HTTP APIs based on header value pairs (similar to gRPC)

* Upgrade DCGM from 2.4.7 to 3.2.6 (triton-inference-server#6515)

* Enhance GCS credentials documentations (triton-inference-server#6526)

* Test file override outside of model directory (triton-inference-server#6516)

* Add boost-filesystem

* Update ORT version to 1.16.2 (triton-inference-server#6531)

* Adjusting expected error msg (triton-inference-server#6517)

* Update 'main' to track development of 2.41.0 / 23.12 (triton-inference-server#6543)

* Enhance testing for pending request count (triton-inference-server#6532)

* Enhance testing for pending request count

* Improve the documentation

* Add more documentation

* Add testing for Python backend request rescheduling (triton-inference-server#6509)

* Add testing

* Fix up

* Enhance testing

* Fix up

* Revert test changes

* Add grpc endpoint test

* Remove unused import

* Remove unused import

* Update qa/L0_backend_python/request_rescheduling/grpc_endpoint_test.py

Co-authored-by: Iman Tabrizian <[email protected]>

* Update qa/python_models/bls_request_rescheduling/model.py

Co-authored-by: Iman Tabrizian <[email protected]>

---------

Co-authored-by: Iman Tabrizian <[email protected]>
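
For orientation, a sketch of the request-rescheduling mechanism being tested: a model marks a request to be released back to Triton for another execution. The `set_release_flags()` call and flag constant are assumptions from the Python backend rescheduling API, and `_needs_more_work()` is a hypothetical helper.

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            if self._needs_more_work(request):  # hypothetical helper
                # Hand the request back to Triton so it is executed again.
                request.set_release_flags(
                    pb_utils.TRITONSERVER_REQUEST_RELEASE_RESCHEDULE)
                responses.append(None)  # no response for this pass
            else:
                responses.append(pb_utils.InferenceResponse(output_tensors=[]))
        return responses
```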

* Check that the wget is installed (triton-inference-server#6556)

* secure deployment considerations guide (triton-inference-server#6533)

* draft document

* updates

* updates

* updated

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* updates

* update

* updates

* updates

* Update docs/customization_guide/deploy.md

Co-authored-by: Kyle McGill <[email protected]>

* Update docs/customization_guide/deploy.md

Co-authored-by: Kyle McGill <[email protected]>

* fixing typos

* updated with clearer warnings

* updates to readme and toc

---------

Co-authored-by: Kyle McGill <[email protected]>

* Fix typo and change the command line order (triton-inference-server#6557)

* Fix typo and change the command line order

* Improve visual experience. Add 'clang' package

* Add error during rescheduling test to L0_generative_sequence (triton-inference-server#6550)

* changing references to concrete instances

* Add testing for implicit state enhancements (triton-inference-server#6524)

* Add testing for single buffer

* Add testing for implicit state with buffer growth

* Improve testing

* Fix up

* Add CUDA virtual address size flag

* Add missing test files

* Parameter rename

* Test fixes

* Only build implicit state backend for GPU=ON

* Fix copyright (triton-inference-server#6584)

* Mention TRT LLM backend supports request cancellation (triton-inference-server#6585)

* update model repository generation for onnx models for protobuf (triton-inference-server#6575)

* Fix L0_sagemaker (triton-inference-server#6587)

* Add C++ server wrapper to the doc (triton-inference-server#6592)

* Add timeout to client apis and tests (triton-inference-server#6546)

Client PR: triton-inference-server/client#429

* Change name generative -> iterative (triton-inference-server#6601)

* name changes

* updated names

* Add documentation on generative sequence (triton-inference-server#6595)

* Add documentation on generative sequence

* Address comment

* Reflect the "iterative" change

* Updated description of iterative sequences

* Restricted HTTP API documentation 

Co-authored-by: Ryan McCormick <[email protected]>

* Add request cancellation and debugging guide to generated docs (triton-inference-server#6617)

* Support for http request cancellation. Includes fix for seg fault in generate_stream endpoint.

* Bumped vLLM version to v0.2.2 (triton-inference-server#6623)

* Upgrade ORT version (triton-inference-server#6618)

* Use compliant preprocessor (triton-inference-server#6626)

* Update README.md (triton-inference-server#6627)

* Extend request objects lifetime and fixes possible segmentation fault (triton-inference-server#6620)

* Extend request objects lifetime

* Remove explicit TRITONSERVER_InferenceRequestDelete

* Format fix

* Include the inference_request_ initialization to cover RequestNew

---------

Co-authored-by: Neelay Shah <[email protected]>

* Update protobuf after python update for testing (triton-inference-server#6638)

This fixes the issue where the Python client raises
`AttributeError: 'NoneType' object has no attribute 'enum_types_by_name'`
errors after the Python version is updated.

* Update post-23.11 release (triton-inference-server#6653)

* Update README and versions for 2.40.0 / 23.11 (triton-inference-server#6544)

* Removing path construction to use SymLink alternatives

* Update version for PyTorch

* Update windows Dockerfile configuration

* Update triton version to 23.11

* Update README and versions for 2.40.0 / 23.11

* Fix typo

* Adding 'ldconfig' to configure dynamic linking in container (triton-inference-server#6602)

* Point to tekit_backend (triton-inference-server#6616)

* Point to tekit_backend

* Update version

* Revert tekit changes (triton-inference-server#6640)

---------

Co-authored-by: Kris Hung <[email protected]>

* PYBE Timeout Tests (triton-inference-server#6483)

* New testing to confirm large request timeout values can be passed and retrieved within Python BLS models.

* Add note on lack of ensemble support (triton-inference-server#6648)

* Added request id to span attributes (triton-inference-server#6667)

* Add test for optional internal tensor within an ensemble (triton-inference-server#6663)

* Add test for optional internal tensor within an ensemble

* Fix up

* Set CMake version to 3.27.7 (triton-inference-server#6675)

* Set CMake version to 3.27.7

* Set CMake version to 3.27.7

* Fix double slash typo

* restore typo (triton-inference-server#6680)

* Update 'main' to track development of 2.42.0 / 24.01 (triton-inference-server#6673)

* iGPU build refactor (triton-inference-server#6684) (triton-inference-server#6691)

* Mlflow Plugin Fix (triton-inference-server#6685)

* Mlflow plugin fix

* Fix extra content-type headers in HTTP server (triton-inference-server#6678)

* Fix iGPU CMakeFile tags (triton-inference-server#6695)

* Unify iGPU test build with x86 ARM

* adding TRITON_IGPU_BUILD to core build definition; adding logic to skip caffe2plan test if TRITON_IGPU_BUILD=1

* re-organizing some copies in Dockerfile.QA to fix igpu devel build

* Pre-commit fix

---------

Co-authored-by: kyle <[email protected]>

* adding default value for TRITON_IGPU_BUILD=OFF (triton-inference-server#6705)

* adding default value for TRITON_IGPU_BUILD=OFF

* fix newline

---------

Co-authored-by: kyle <[email protected]>

* Add test case for decoupled model raising exception (triton-inference-server#6686)

* Add test case for decoupled model raising exception

* Remove unused import

* Address comment

* Escape special characters in general docs (triton-inference-server#6697)

* vLLM Benchmarking Test (triton-inference-server#6631)

* vLLM Benchmarking Test

* Allow configuring GRPC max connection age and max connection age grace (triton-inference-server#6639)

* Add ability to configure GRPC max connection age and max connection age grace
* Allow passing gRPC connection age args when they are set from the command line
----------
Co-authored-by: Katherine Yang <[email protected]>

---------

Signed-off-by: Xiaodong Ye <[email protected]>
Co-authored-by: Olga Andreeva <[email protected]>
Co-authored-by: GuanLuo <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: Tanmay Verma <[email protected]>
Co-authored-by: Kris Hung <[email protected]>
Co-authored-by: Jacky <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: dyastremsky <[email protected]>
Co-authored-by: Katherine Yang <[email protected]>
Co-authored-by: Iman Tabrizian <[email protected]>
Co-authored-by: Gerard Casas Saez <[email protected]>
Co-authored-by: Misha Chornyi <[email protected]>
Co-authored-by: R0CKSTAR <[email protected]>
Co-authored-by: Elias Bermudez <[email protected]>
Co-authored-by: ax-vivien <[email protected]>
Co-authored-by: Neelay Shah <[email protected]>
Co-authored-by: nv-kmcgill53 <[email protected]>
Co-authored-by: Matthew Kotila <[email protected]>
Co-authored-by: Nikhil Kulkarni <[email protected]>
Co-authored-by: Misha Chornyi <[email protected]>
Co-authored-by: Iman Tabrizian <[email protected]>
Co-authored-by: David Yastremsky <[email protected]>
Co-authored-by: Timothy Gerdes <[email protected]>
Co-authored-by: Mate Mijolović <[email protected]>
Co-authored-by: David Zier <[email protected]>
Co-authored-by: Hyunjae Woo <[email protected]>
Co-authored-by: Tanay Varshney <[email protected]>
Co-authored-by: Francesco Petrini <[email protected]>
Co-authored-by: Dmitry Mironov <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
Co-authored-by: Sai Kiran Polisetty <[email protected]>
Co-authored-by: oandreeva-nv <[email protected]>
Co-authored-by: kyle <[email protected]>
Co-authored-by: Neal Vaidya <[email protected]>
Co-authored-by: siweili11 <[email protected]>
1 parent ad9d754 commit 7b98b8b
Showing 883 changed files with 76,704 additions and 32,486 deletions.
4 changes: 3 additions & 1 deletion .clang-format
@@ -2,6 +2,7 @@
BasedOnStyle: Google

IndentWidth: 2
ColumnLimit: 80
ContinuationIndentWidth: 4
UseTab: Never
MaxEmptyLinesToKeep: 2
@@ -34,4 +35,5 @@ BinPackArguments: true
BinPackParameters: true
ConstructorInitializerAllOnOneLineOrOnePerLine: false

IndentCaseLabels: true
IndentCaseLabels: true

84 changes: 84 additions & 0 deletions .github/workflows/codeql.yml
@@ -0,0 +1,84 @@
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

name: "CodeQL"

on:
pull_request:

jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
permissions:
actions: read
contents: read
security-events: write

strategy:
fail-fast: false
matrix:
language: [ 'python' ]
# CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
# Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

steps:
- name: Checkout repository
uses: actions/checkout@v3

# Initializes the CodeQL tools for scanning.
- name: Initialize CodeQL
uses: github/codeql-action/init@v2
with:
languages: ${{ matrix.language }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.

# Details on CodeQL's query packs refer to:
# https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
queries: +security-and-quality


# Autobuild attempts to build any compiled languages (C/C++, C#, Go, or Java).
# If this step fails, then you should remove it and run the build manually (see below)
- name: Autobuild
uses: github/codeql-action/autobuild@v2

# Command-line programs to run using the OS shell.
# See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

# If the Autobuild fails above, remove it, uncomment the following three lines,
# and modify them (or add more) to build your code. For guidance, please refer
# to the example below.

# - run: |
# echo "Run, Build Application using script"
# ./location_of_script_within_repo/buildscript.sh

- name: Perform CodeQL Analysis
uses: github/codeql-action/analyze@v2
with:
category: "/language:${{matrix.language}}"
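
The init step above selects the extra query suite inline with "queries: +security-and-quality". As its comments note, the same selection could live in a separate config file instead. What follows is only a hedged sketch of such a file, assuming it were added at .github/codeql/codeql-config.yml and referenced from the init step via its config-file input; neither the file nor the paths-ignore entries are part of this commit.

# Hypothetical .github/codeql/codeql-config.yml (illustration only, not in this commit)
name: "Triton CodeQL config"
queries:
  # Equivalent of the "+security-and-quality" line in the workflow; default
  # queries remain enabled alongside this suite.
  - uses: security-and-quality
paths-ignore:
  # Assumed directories to skip; adjust or drop as appropriate.
  - "docs/**"
  - "qa/**"

With such a config file in place, the init step would pass config-file: ./.github/codeql/codeql-config.yml instead of the inline queries line.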
39 changes: 39 additions & 0 deletions .github/workflows/pre-commit.yaml
@@ -0,0 +1,39 @@
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

name: pre-commit

on:
pull_request:

jobs:
pre-commit:
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v3
- uses: pre-commit/[email protected]
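
Depending on the pinned version, pre-commit/action may rebuild every hook environment on each run of the job above. Below is a hedged sketch of the same job with those environments cached explicitly; the actions/cache step and its key are illustrative assumptions, not part of this commit.

jobs:
  pre-commit:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
      # Hypothetical addition: reuse ~/.cache/pre-commit, invalidating the
      # cache whenever the hook configuration changes.
      - uses: actions/cache@v3
        with:
          path: ~/.cache/pre-commit
          key: pre-commit-${{ hashFiles('.pre-commit-config.yaml') }}
      # Same pinned pre-commit action as in the diff above.
      - uses: pre-commit/action@...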

5 changes: 5 additions & 0 deletions .gitignore
@@ -1,3 +1,8 @@
/build
/builddir
/.vscode
*.so
__pycache__
tmp
*.log
test_results.txt
74 changes: 74 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,74 @@
# Copyright 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

repos:
- repo: https://github.com/timothycrosley/isort
rev: 5.12.0
hooks:
- id: isort
additional_dependencies: [toml]
- repo: https://github.com/psf/black
rev: 23.1.0
hooks:
- id: black
types_or: [python, cython]
- repo: https://github.com/PyCQA/flake8
rev: 5.0.4
hooks:
- id: flake8
args: [--max-line-length=88, --select=C,E,F,W,B,B950, --extend-ignore = E203,E501]
types_or: [python, cython]
- repo: https://github.com/pre-commit/mirrors-clang-format
rev: v16.0.5
hooks:
- id: clang-format
types_or: [c, c++, cuda, proto, textproto, java]
args: ["-fallback-style=none", "-style=file", "-i"]
- repo: https://github.com/codespell-project/codespell
rev: v2.2.4
hooks:
- id: codespell
additional_dependencies: [tomli]
args: ["--toml", "pyproject.toml"]
exclude: (?x)^(.*stemmer.*|.*stop_words.*|^CHANGELOG.md$)
# More details about these pre-commit hooks here:
# https://pre-commit.com/hooks.html
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: check-case-conflict
- id: check-executables-have-shebangs
- id: check-merge-conflict
- id: check-json
- id: check-toml
- id: check-yaml
exclude: ^deploy(\/[^\/]+)*\/templates\/.*$
- id: check-shebang-scripts-are-executable
- id: end-of-file-fixer
types_or: [c, c++, cuda, proto, textproto, java, python]
- id: mixed-line-ending
- id: requirements-txt-fixer
- id: trailing-whitespace
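
The check-yaml hook above carries its own exclude pattern; pre-commit also accepts a few file-wide settings at the top level of this same file. A brief sketch under stated assumptions: the third_party/ path is made up for illustration, and no such top-level settings are part of this commit.

# Hypothetical top-level settings for .pre-commit-config.yaml (illustration only)
# 'exclude' is applied to every hook; 'fail_fast' stops the run after the
# first failing hook instead of reporting all of them.
exclude: '^third_party/'
fail_fast: false
repos:
  # ...the repos list shown in the diff above would follow unchanged...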
7 changes: 7 additions & 0 deletions CITATION.cff
@@ -0,0 +1,7 @@
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Triton Inference Server: An Optimized Cloud and Edge Inferencing Solution."
url: https://github.com/triton-inference-server
repository-code: https://github.com/triton-inference-server/server
authors:
- name: "NVIDIA Corporation"
67 changes: 49 additions & 18 deletions CMakeLists.txt
@@ -1,4 +1,4 @@
# Copyright 2020-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -38,6 +38,7 @@ option(TRITON_ENABLE_TRACING "Include tracing support in server" OFF)
option(TRITON_ENABLE_NVTX "Include NVTX support in server" OFF)
option(TRITON_ENABLE_GPU "Enable GPU support in server" ON)
option(TRITON_ENABLE_MALI_GPU "Enable Arm Mali GPU support in server" OFF)
option(TRITON_IGPU_BUILD "Enable options for iGPU compilation in server" OFF)
set(TRITON_MIN_COMPUTE_CAPABILITY "6.0" CACHE STRING
"The minimum CUDA compute capability supported by Triton" )
set(TRITON_EXTRA_LIB_PATHS "" CACHE PATH "Extra library paths for Triton Server build")
@@ -54,6 +55,7 @@ option(TRITON_ENABLE_VERTEX_AI "Include Vertex AI API in server" OFF)
# Metrics
option(TRITON_ENABLE_METRICS "Include metrics support in server" ON)
option(TRITON_ENABLE_METRICS_GPU "Include GPU metrics support in server" ON)
option(TRITON_ENABLE_METRICS_CPU "Include CPU metrics support in server" ON)

# Cloud storage
option(TRITON_ENABLE_GCS "Include GCS Filesystem support in server" OFF)
@@ -85,6 +87,10 @@ if(TRITON_ENABLE_TRACING AND NOT TRITON_ENABLE_STATS)
message(FATAL_ERROR "TRITON_ENABLE_TRACING=ON requires TRITON_ENABLE_STATS=ON")
endif()

if (TRITON_ENABLE_METRICS_CPU AND NOT TRITON_ENABLE_METRICS)
message(FATAL_ERROR "TRITON_ENABLE_METRICS_CPU=ON requires TRITON_ENABLE_METRICS=ON")
endif()

if (TRITON_ENABLE_METRICS_GPU AND NOT TRITON_ENABLE_METRICS)
message(FATAL_ERROR "TRITON_ENABLE_METRICS_GPU=ON requires TRITON_ENABLE_METRICS=ON")
endif()
@@ -113,6 +119,19 @@ FetchContent_Declare(
GIT_TAG ${TRITON_THIRD_PARTY_REPO_TAG}
)

# Some libs are installed to ${TRITON_THIRD_PARTY_INSTALL_PREFIX}/{LIB}/lib64 instead
# of ${TRITON_THIRD_PARTY_INSTALL_PREFIX}/{LIB}/lib on Centos
set (LIB_DIR "lib")
# /etc/os-release does not exist on Windows
if(EXISTS "/etc/os-release")
file(STRINGS /etc/os-release DISTRO REGEX "^NAME=")
string(REGEX REPLACE "NAME=\"(.*)\"" "\\1" DISTRO "${DISTRO}")
message(STATUS "Distro Name: ${DISTRO}")
if(DISTRO MATCHES "CentOS.*")
set (LIB_DIR "lib64")
endif()
endif()

set(TRITON_CORE_HEADERS_ONLY OFF)

FetchContent_MakeAvailable(repo-third-party repo-core)
@@ -152,7 +171,16 @@ endif()
if (WIN32)
set(_FINDPACKAGE_PROTOBUF_CONFIG_DIR "${TRITON_THIRD_PARTY_INSTALL_PREFIX}/protobuf/cmake")
else()
set(_FINDPACKAGE_PROTOBUF_CONFIG_DIR "${TRITON_THIRD_PARTY_INSTALL_PREFIX}/protobuf/lib/cmake/protobuf")
set(_FINDPACKAGE_PROTOBUF_CONFIG_DIR "${TRITON_THIRD_PARTY_INSTALL_PREFIX}/protobuf/${LIB_DIR}/cmake/protobuf")
endif()

# Triton with Opentelemetry is not supported on Windows
# FIXME: add location for Windows, when support is added
# JIRA DLIS-4786
if (WIN32)
set(_FINDPACKAGE_OPENTELEMETRY_CONFIG_DIR "")
else()
set(_FINDPACKAGE_OPENTELEMETRY_CONFIG_DIR "${TRITON_THIRD_PARTY_INSTALL_PREFIX}/opentelemetry-cpp/${LIB_DIR}/cmake/opentelemetry-cpp")
endif()

if (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
@@ -168,15 +196,15 @@ endif() # TRITON_ENABLE_GCS
if(${TRITON_ENABLE_S3})
set(TRITON_DEPENDS ${TRITON_DEPENDS} aws-sdk-cpp)
endif() # TRITON_ENABLE_S3
if(${TRITON_ENABLE_AZURE_STORAGE})
set(TRITON_DEPENDS ${TRITON_DEPENDS} azure-storage-cpplite)
endif() # TRITON_ENABLE_AZURE_STORAGE
if(${TRITON_ENABLE_HTTP} OR ${TRITON_ENABLE_METRICS} OR ${TRITON_ENABLE_SAGEMAKER} OR ${TRITON_ENABLE_VERTEX_AI})
set(TRITON_DEPENDS ${TRITON_DEPENDS} libevent libevhtp)
endif() # TRITON_ENABLE_HTTP || TRITON_ENABLE_METRICS || TRITON_ENABLE_SAGEMAKER || TRITON_ENABLE_VERTEX_AI
if(${TRITON_ENABLE_GRPC})
set(TRITON_DEPENDS ${TRITON_DEPENDS} grpc)
endif() # TRITON_ENABLE_GRPC
if(NOT WIN32 AND ${TRITON_ENABLE_TRACING})
set(TRITON_DEPENDS ${TRITON_DEPENDS} opentelemetry-cpp)
endif() # TRITON_ENABLE_TRACING

ExternalProject_Add(triton-server
PREFIX triton-server
@@ -189,21 +217,23 @@ ExternalProject_Add(triton-server
${_CMAKE_ARGS_VCPKG_TARGET_TRIPLET}
-DGTEST_ROOT:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/googletest
-DgRPC_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/grpc/lib/cmake/grpc
-Dc-ares_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/c-ares/lib/cmake/c-ares
-Dabsl_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/absl/lib/cmake/absl
-Dnlohmann_json_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/nlohmann_json/lib/cmake/nlohmann_json
-Dc-ares_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/c-ares/${LIB_DIR}/cmake/c-ares
-Dabsl_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/absl/${LIB_DIR}/cmake/absl
-DCURL_DIR:STRING=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/curl/${LIB_DIR}/cmake/CURL
-Dnlohmann_json_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/nlohmann_json/${LIB_DIR}/cmake/nlohmann_json
-DLibevent_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/libevent/lib/cmake/libevent
-Dlibevhtp_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/libevhtp/lib/cmake/libevhtp
-Dstorage_client_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/google-cloud-cpp/lib/cmake/storage_client
-Dazure-storage-cpplite_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/azure-storage-cpplite
-Dgoogle_cloud_cpp_common_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/google-cloud-cpp/lib/cmake/google_cloud_cpp_common
-DCrc32c_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/crc32c/lib/cmake/Crc32c
-DAWSSDK_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/cmake/AWSSDK
-Daws-cpp-sdk-core_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/cmake/aws-cpp-sdk-core
-Daws-cpp-sdk-s3_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/cmake/aws-cpp-sdk-s3
-Daws-c-event-stream_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/aws-c-event-stream/cmake
-Daws-c-common_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/aws-c-common/cmake
-Daws-checksums_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/lib/aws-checksums/cmake
-Dstorage_client_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/google-cloud-cpp/${LIB_DIR}/cmake/storage_client
-Dgoogle_cloud_cpp_common_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/google-cloud-cpp/${LIB_DIR}/cmake/google_cloud_cpp_common
-DCrc32c_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/crc32c/${LIB_DIR}/cmake/Crc32c
-DAWSSDK_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/cmake/AWSSDK
-Daws-cpp-sdk-core_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/cmake/aws-cpp-sdk-core
-Daws-cpp-sdk-s3_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/cmake/aws-cpp-sdk-s3
-Daws-c-event-stream_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/aws-c-event-stream/cmake
-Daws-c-common_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/aws-c-common/cmake
-Daws-checksums_DIR:PATH=${TRITON_THIRD_PARTY_INSTALL_PREFIX}/aws-sdk-cpp/${LIB_DIR}/aws-checksums/cmake
-Dopentelemetry-cpp_DIR:PATH=${_FINDPACKAGE_OPENTELEMETRY_CONFIG_DIR}
-DTRITON_IGPU_BUILD:BOOL=${TRITON_IGPU_BUILD}
-DTRITON_THIRD_PARTY_REPO_TAG:STRING=${TRITON_THIRD_PARTY_REPO_TAG}
-DTRITON_COMMON_REPO_TAG:STRING=${TRITON_COMMON_REPO_TAG}
-DTRITON_CORE_REPO_TAG:STRING=${TRITON_CORE_REPO_TAG}
@@ -223,6 +253,7 @@ ExternalProject_Add(triton-server
-DTRITON_MIN_COMPUTE_CAPABILITY:STRING=${TRITON_MIN_COMPUTE_CAPABILITY}
-DTRITON_ENABLE_METRICS:BOOL=${TRITON_ENABLE_METRICS}
-DTRITON_ENABLE_METRICS_GPU:BOOL=${TRITON_ENABLE_METRICS_GPU}
-DTRITON_ENABLE_METRICS_CPU:BOOL=${TRITON_ENABLE_METRICS_CPU}
-DTRITON_ENABLE_GCS:BOOL=${TRITON_ENABLE_GCS}
-DTRITON_ENABLE_AZURE_STORAGE:BOOL=${TRITON_ENABLE_AZURE_STORAGE}
-DTRITON_ENABLE_S3:BOOL=${TRITON_ENABLE_S3}