Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 8,706 public repositories matching this topic...
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Updated Mar 20, 2024 - Python
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Updated Dec 24, 2024 - Python
Learn and understand Docker and container technologies, with real DevOps practice!
Updated Nov 23, 2024 - Go
List of Data Science Cheatsheets to rule the world
Updated Jul 18, 2024
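Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink basics, concepts, internals, hands-on practice, performance tuning, and source code analysis. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, and more, along with case studies of large-scale Flink production projects (PV/UV, log storage, real-time deduplication of tens of billions of records, monitoring and alerting). Support for my column "Big Data Real-Time Compute Engine Flink: Practice and Performance Optimization" is welcome.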
Updated May 25, 2024 - Java
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Updated Dec 12, 2024 - Python
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...
Updated Dec 24, 2024 - Java
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Updated Dec 23, 2024 - Python
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Updated Dec 21, 2024 - Scala
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Updated Dec 24, 2024 - Jupyter Notebook
Alluxio, data orchestration for analytics and machine learning in the cloud
Updated Nov 27, 2024 - Java
A Flexible and Powerful Parameter Server for large-scale machine learning
Updated Jan 16, 2024 - Java
Created by Matei Zaharia
Released May 26, 2014
- Followers: 425
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia