- SysML: The New Frontier of Machine Learning Systems
- Data Validation for Machine Learning
- Coordination-aware assurance for end-to-end machine learning systems: the R3E approach
- https://ai.googleblog.com/2019/12/improving-out-of-distribution-detection.html
- https://arxiv.org/pdf/1904.07204.pdf
- https://arxiv.org/pdf/1910.01500.pdf
- https://mlperf.org/training-overview
- https://aimatrix.ai/en-us/
- https://www.microsoft.com/en-us/research/blog/reliability-in-reinforcement-learning/
- https://dzone.com/articles/qa-how-reliable-are-your-machine-learning-systems
- https://dl.acm.org/doi/10.1145/3352020.3352024
- Towards Observability Data Management at Scale
- https://www.sysml.cc/doc/2019/199.pdf
- https://monitorml.com/index.html
- https://dl.acm.org/doi/pdf/10.5555/1251203.1251209
- https://github.com/rdsea/bigdataincidentanalytics/tree/reasoning
- https://www.alibabacloud.com/blog/using-alibaba-cloud-tsdb-in-big-data-cluster-monitoring-scenarios_595164
- https://chromium.googlesource.com/external/github.com/tensorflow/tensorflow/+/r0.10/tensorflow/g3doc/tutorials/monitors/index.md
- https://mlsys.org/Conferences/2019/doc/2019/167.pdf
- https://github.com/tensorflow/data-validation
- https://towardsdatascience.com/hands-on-tensorflow-data-validation-61e552f123d7
- https://databricks.com/session/apache-spark-data-validation
- https://papers.nips.cc/paper/7947-a-simple-unified-framework-for-detecting-out-of-distribution-samples-and-adversarial-attacks.pdf
- https://cloud.google.com/blog/products/gcp/improving-data-quality-for-machine-learning-and-analytics-with-cloud-dataprep
- https://dl.acm.org/doi/fullHtml/10.1145/3332301
- https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines
- https://shivaram.org/publications/keystoneml-icde17.pdf
- Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis
- Jeff Smith. 2018. Machine Learning Systems: Designs that scale (1st ed.). Manning Publications Co., USA. https://www.manning.com/books/machine-learning-systems
- Prediction-Serving Systems. https://queue.acm.org/detail.cfm?id=3210557
- Ryan Chard, Logan Ward, Zhuozhao Li, Yadu Babuji, Anna Woodard, Steven Tuecke, Kyle Chard, Ben Blaiszik, and Ian Foster. 2019. Publishing and Serving Machine Learning Models with DLHub. In Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning) (PEARC ’19). Association for Computing Machinery, New York, NY, USA, Article 73, 1–7. DOI: https://doi.org/10.1145/3332186.3332246
- https://cloud.google.com/ml-engine/docs/custom-prediction-routines
- https://predictionio.apache.org/
- https://github.com/EthicalML/awesome-production-machine-learning#model-deployment-and-orchestration-frameworks
- https://www.usenix.org/system/files/conference/hotedge18/hotedge18-papers-talagala.pdf
- https://arxiv.org/pdf/1706.08420.pdf
- https://arxiv.org/pdf/1907.08349.pdf
- https://arxiv.org/pdf/1908.00080.pdf
- http://proceedings.mlr.press/v70/kumar17a.html
- https://www.ericsson.com/en/blog/2019/12/tinyml-as-a-service
- https://static.sched.com/hosted_files/osseu19/f9/elc2019-tinymlaas.pdf
- https://heartbeat.fritz.ai/how-to-fit-large-neural-networks-on-the-edge-eb621cdbb33
- https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
- https://research.google/pubs/pub43146/