diff --git a/docs/ai/llm/readme.md b/docs/ai/llm/readme.md index 53aeeb4bf6f..be07e0a2695 100644 --- a/docs/ai/llm/readme.md +++ b/docs/ai/llm/readme.md @@ -44,6 +44,7 @@ Moving from information to knowledge age - [AI Voice Generator: Versatile Text to Speech Software | Murf AI](https://murf.ai/) - [Tammy AI](https://tammy.ai/) - captions (Android App) +- [MeetGeek | Record, Transcribe & Share Meeting Notes](https://meetgeek.ai/) #### [Midjourney](https://www.midjourney.com/) diff --git a/docs/databases/data-warehousing/11-dw-databases.md b/docs/databases/data-warehousing/11-dw-databases.md index 2e05a11cba7..0dc93d94e0d 100755 --- a/docs/databases/data-warehousing/11-dw-databases.md +++ b/docs/databases/data-warehousing/11-dw-databases.md @@ -1,8 +1,8 @@ # DW - Databases 1. SnowFlake -2. AWS Redshift -3. Snowflake +2. Clickhouse +3. AWS Redshift 4. AWS Athena 5. Google BigQuery 6. Elastic @@ -13,7 +13,9 @@ 11. FireBolt 12. [Databricks](technologies/apache/databricks/readme.md) -## A new class of cloud data warehouses built for AWS +## Firebolt + +A new class of cloud data warehouses built for AWS Firebolt has completely redesigned the cloud data warehouse to deliver a super fast, incredibly efficient analytics experience at any scale diff --git a/docs/databases/nosql-databases/clickhouse.md b/docs/databases/nosql-databases/clickhouse.md new file mode 100644 index 00000000000..64416f8be4b --- /dev/null +++ b/docs/databases/nosql-databases/clickhouse.md @@ -0,0 +1,143 @@ +# ClickHouse + +ClickHouse is an open source column-oriented database management system capable of realtime generation of analytical data reports using SQL queries. + +### Key Features + +- True column-oriented storage +- Vectorized query execution +- Data compression +- Parallel and distributed query execution +- Real time query processing +- Real time data ingestion +- On-disk locality of reference +- Cross-datacenter replication +- High availability +- SQL support +- Local and distributed joins +- Pluggable external dimension tables +- Arrays and nested data types +- Approximate query processing +- Probabilistic data structures +- Full support of IPv6 +- Features for web analytics +- State-of-the-art algorithms +- Detailed documentation - Clean documented code + +#### History  + +ClickHouse is developed by a Russian company called Yandex. It is designed for multiple projects within Yandex. Yandex needed a DBMS to analyze large amounts of data, thus they began to develop their own column-oriented DBMS. The prototype of ClickHouse appeared in 2009 and it was released to open-source in 2016. + +#### Compression  + +[Dictionary Encoding](https://dbdb.io/browse?compression=dictionary-encoding) [Delta Encoding](https://dbdb.io/browse?compression=delta-encoding) [Naïve (Page-Level)](https://dbdb.io/browse?compression=naive-page-level) + +In addition to general-purpose encoding with LZ4 (default) or Zstd, ClickHouse supports dictionary encoding via LowCardinality data type, as well as delta, double-delta and Gorilla encodings via column codecs. + +#### Concurrency Control + +[Not Supported](https://dbdb.io/browse?concurrency-control=not-supported) + +ClickHouse does not support multi-statement transactions. + +#### Data Model  + +[Relational](https://dbdb.io/browse?data-model=relational) + +ClickHouse uses the relational database model. + +#### Foreign Keys + +[Not Supported](https://dbdb.io/browse?foreign-keys=not-supported) + +ClickHouse does not support foreign keys. 
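To make the compression and encoding options above concrete, here is a minimal sketch (assuming a local ClickHouse server with the default HTTP interface on port 8123; the `events` table and its columns are invented for illustration) that creates a MergeTree table using `LowCardinality` dictionary encoding and per-column codecs on top of the default LZ4 compression:

```python
import requests

CLICKHOUSE_URL = "http://localhost:8123/"  # default HTTP interface (assumed local server)

# LowCardinality gives dictionary encoding; CODEC(...) layers per-column
# encodings on top of the table-wide default (LZ4).
ddl = """
CREATE TABLE IF NOT EXISTS events
(
    ts       DateTime CODEC(Delta, ZSTD),       -- delta encoding for timestamps
    user_id  UInt64   CODEC(DoubleDelta, LZ4),  -- double-delta for mostly-increasing ids
    country  LowCardinality(String),            -- dictionary-encoded low-cardinality column
    value    Float64  CODEC(Gorilla)            -- Gorilla encoding for float time series
)
ENGINE = MergeTree
ORDER BY (country, ts)
"""

response = requests.post(CLICKHOUSE_URL, data=ddl)
response.raise_for_status()
print("table created")
```

Per-column codecs are optional; the table-wide LZ4 or ZSTD default is often good enough, and specialised codecs mainly pay off on monotonic or repetitive columns.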
+
+#### Indexes
+
+[Log-Structured Merge Tree](https://dbdb.io/browse?indexes=log-structured-merge-tree)
+
+ClickHouse supports primary key indexes. The indexing mechanism is called a sparse index. In the MergeTree, data are sorted by primary key lexicographically in each part. ClickHouse then selects some marks for every Nth row, where N is chosen adaptively by default. Together these marks serve as a sparse index, which allows efficient range queries.
+
+#### Joins
+
+[Hash Join](https://dbdb.io/browse?joins=hash-join)
+
+ClickHouse uses hash join by default, which is done by placing the right-hand side of the join in an in-memory hash table. If there is not enough memory for a hash join, it falls back to a merge join.
+
+#### Logging
+
+[Physical Logging](https://dbdb.io/browse?logging=physical-logging)
+
+ClickHouse replicates its data on multiple nodes and monitors data synchronicity on replicas. It recovers after failures by syncing data from other replica nodes.
+
+#### Parallel Execution
+
+[Intra-Operator (Horizontal)](https://dbdb.io/browse?parallel-execution=intra-operator) [Inter-Operator (Vertical)](https://dbdb.io/browse?parallel-execution=inter-operator)
+
+By default, ClickHouse utilizes half of the available cores for single-node queries and one replica of each shard for distributed queries. It can be tuned to use only one core, all cores of the whole cluster, or anything in between.
+
+#### Query Compilation
+
+[Code Generation](https://dbdb.io/browse?query-compilation=code-generation)
+
+ClickHouse supports runtime code generation. The code is generated for every kind of query on the fly, removing all indirection and dynamic dispatch. Runtime code generation works best when it fuses many operations together and fully utilizes CPU execution units.
+
+#### Query Execution
+
+[Vectorized Model](https://dbdb.io/browse?query-execution=vectorized-model)
+
+#### Query Interface
+
+[Custom API](https://dbdb.io/browse?query-interface=custom-api) [SQL](https://dbdb.io/browse?query-interface=sql) [HTTP / REST](https://dbdb.io/browse?query-interface=http-rest) [Command-line / Shell](https://dbdb.io/browse?query-interface=command-line-shell)
+
+ClickHouse provides two types of parsers: a full SQL parser and a data format parser. It uses the SQL parser for all types of queries and the data format parser only for INSERT queries. Beyond the query language, it provides multiple user interfaces, including an HTTP interface, a JDBC driver, a TCP interface, a command-line client, etc.
+
+#### Storage Architecture
+
+[Disk-oriented](https://dbdb.io/browse?storage-architecture=disk-oriented) [In-Memory](https://dbdb.io/browse?storage-architecture=in-memory) [Hybrid](https://dbdb.io/browse?storage-architecture=hybrid)
+
+ClickHouse has multiple types of table engines. The table engine determines where the data is stored, the concurrency level, whether indexes are supported, and some other properties. The key table engine family for production use is [MergeTree](https://clickhouse.tech/docs/en/engines/table_engines/mergetree_family/mergetree/), which allows for resilient storage of large volumes of data and supports replication. There is also a [Log family](https://clickhouse.tech/docs/en/engines/table_engines/log_family/log_family/) for lightweight storage of temporary data and a [Distributed engine](https://clickhouse.tech/docs/en/engines/table_engines/special/distributed/) for querying a cluster.
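As a follow-up to the sparse-index and HTTP-interface notes above, this sketch (again assuming a local server on port 8123 and reusing the hypothetical `events` table from the earlier example) runs an aggregation whose filter on the leading `ORDER BY` columns lets ClickHouse skip granules via the sparse primary index, and reads the result back as JSON:

```python
import requests

CLICKHOUSE_URL = "http://localhost:8123/"  # default HTTP interface (assumed local server)

# The WHERE clause filters on the leading ORDER BY columns (country, ts),
# so the sparse primary index can be used to skip whole granules.
query = """
SELECT
    country,
    count() AS events,
    avg(value) AS avg_value
FROM events
WHERE country = 'IN'
  AND ts >= now() - INTERVAL 1 DAY
GROUP BY country
FORMAT JSON
"""

response = requests.post(CLICKHOUSE_URL, data=query)
response.raise_for_status()
print(response.json()["data"])  # FORMAT JSON wraps the rows in a "data" array
```

The same query could equally be issued through the command-line client or the native TCP protocol; the HTTP interface is simply the lowest-friction option for a quick script.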
+ +#### Storage Model  + +[Decomposition Storage Model (Columnar)](https://dbdb.io/browse?storage-model=decomposition-storage-model-columnar) + +ClickHouse is a column-oriented DBMS and it stores data by columns. + +#### Storage Organization + +[Indexed Sequential Access Method (ISAM)](https://dbdb.io/browse?storage-organization=indexed-sequential-access-method-isam) [Sorted Files](https://dbdb.io/browse?storage-organization=sorted-files) + +#### Stored Procedures  + +[Not Supported](https://dbdb.io/browse?stored-procedures=not-supported) + +Currently, stored procedures and UDF are listed as open issues in ClickHouse. + +#### System Architecture  + +[Shared-Nothing](https://dbdb.io/browse?system-architecture=shared-nothing) + +ClickHouse system in a distributed setup is a cluster of shards. It uses asynchronous multimaster replication and there is no single point of contention across the system. + +#### Views  + +[Virtual Views](https://dbdb.io/browse?views=virtual-views) [Materialized Views](https://dbdb.io/browse?views=materialized-views) + +ClickHouse supports both virtual views and materialized views. The materialized views store data transformed by corresponding SELECT query. The SELECT query can contain DISTINCT, GROUP BY, ORDER BY, LIMIT, etc. + +## Internals + +[Modern SQL in 2023 - ClickHouse - YouTube](https://www.youtube.com/watch?v=zhrOYQpgvkk) + +## Links + +[Fast Open-Source OLAP DBMS - ClickHouse](https://clickhouse.com/) + +[GitHub - ClickHouse/ClickHouse: ClickHouse® is a free analytics DBMS for big data](https://github.com/ClickHouse/ClickHouse) + +[ClickHouse - YouTube](https://www.youtube.com/c/ClickHouseDB) + +[What Is ClickHouse? | ClickHouse Docs](https://clickhouse.com/docs/en/intro) + +Used by - Zerodha diff --git a/docs/databases/nosql-databases/comparisions.md b/docs/databases/nosql-databases/comparisions.md new file mode 100644 index 00000000000..e8280fcde93 --- /dev/null +++ b/docs/databases/nosql-databases/comparisions.md @@ -0,0 +1,14 @@ +# Comparisions + +### Clickhouse vs Snowflake + +ClickHouse is designed for real-time data analytics and exploration at scale. Snowflake is a cloud data warehouse that is well-optimized for executing long-running reports and ad-hoc data analysis. When it comes to real-time analytics, ClickHouse shines with faster queries at a fraction of the cost. + +- Cost: ClickHouse is cost-effective. ClickHouse Cloud is 3-5x more cost-effective than Snowflake. +- Performance: ClickHouse has faster queries. ClickHouse Cloud querying speeds are over 2x faster than Snowflake. +- Data compression: ClickHouse Cloud results in 38% better data compression than Snowflake. +- Architecture: ClickHouse uses Shared-Nothing Architecture by default, but also supports Shared-Disk Architecture. +- Querying: ClickHouse uses SQL for querying, with support for SQL joins. +- Integration: ClickHouse integrates with some common tools for visual analytics, including Superset, Grafana and Tableau. + +[ClickHouse vs Snowflake](https://clickhouse.com/comparison/snowflake) diff --git a/docs/databases/nosql-databases/druid/architecture.md b/docs/databases/nosql-databases/druid/architecture.md index 6312fcca305..1e4d1af6e7c 100755 --- a/docs/databases/nosql-databases/druid/architecture.md +++ b/docs/databases/nosql-databases/druid/architecture.md @@ -1,45 +1,75 @@ # Architecture Druid has a multi-process, distributed architecture that is designed to be cloud-friendly and easy to operate. 
Each Druid process type can be configured and scaled independently, giving you maximum flexibility over your cluster. This design also provides enhanced fault tolerance: an outage of one component will not immediately affect other components. + Druid's process types are: -- [**Historical**](http://druid.io/docs/latest/design/historical.html) processes are the workhorses that handle storage and querying on "historical" data (including any streaming data that has been in the system long enough to be committed). Historical processes download segments from deep storage and respond to queries about these segments. They don't accept writes. - - Each Historical process serves up data that has been partitioned into segments. These segments are assigned to Historical by the Coordinator via ZooKeeper - - When a Historical process is assigned a segment, it will copy the file from deep storage to its local storage - - When a query is received from the Broker process, the Historical process returns the results- [**MiddleManager**](http://druid.io/docs/latest/design/middlemanager.html) processes handle ingestion of new data into the cluster. They are responsible for reading from external data sources and publishing new Druid segments. - - The MiddleManager process is a worker process that executes submitted tasks. Middle Managers forward tasks to Peons that run in separate JVMs. The reason we have separate JVMs for tasks is for resource and log isolation. Each [Peon](https://druid.apache.org/docs/latest/design/peons.html) is capable of running only one task at a time, however, a MiddleManager may have multiple Peons. - - During real-time ingestion, the MiddleManager also serves queries on real-time data before it has been pushed to deep storage. - - When a query is received from the Broker process, the MiddleManager process executes that query on real-time data and returns results.- [**Broker**](http://druid.io/docs/latest/design/broker.html) processes receive queries from external clients and forward those queries to Historicals and MiddleManagers. When Brokers receive results from those subqueries, they merge those results and return them to the caller. End users typically query Brokers rather than querying Historicals or MiddleManagers directly. - - Broker process is responsible for knowing the internal state of the cluster (from the ZooKeeper) - - The broker finds out information from ZooKeeper about the Druid cluster - - Which Historical processes are serving which segments - - Which MiddleManager processes are serving which tasks' data - - When a query is run, the Broker will figure out which process to contact- [**Coordinator**](http://druid.io/docs/latest/design/coordinator.html) processes watch over the Historical processes. They are responsible for assigning segments to specific servers, and for ensuring segments are well-balanced across Historicals. - - Segment management and distribution - - It communicates with the Historical nodes to: - - **Load -** Copy a segment from deep storage and start serving it - - **Drop -** Delete a segment from its local copy and stop serving it- [**Overlord**](http://druid.io/docs/latest/design/overlord.html) processes watch over the MiddleManager processes and are the controllers of data ingestion into Druid. They are responsible for assigning ingestion tasks to MiddleManagers and for coordinating segment publishing. 
- - Accepting ingestion supervisors and tasks - - Coordinating which servers run which tasks - - Managing locks so tasks don't conflict with each other - - Returning supervisor and task status to callers- [**Router**](http://druid.io/docs/latest/development/router.html) processes areoptionalprocesses that provide a unified API gateway in front of Druid Brokers, Overlords, and Coordinators. They are optional since you can also simply contact the Druid Brokers, Overlords, and Coordinators directly. +### [Historical](http://druid.io/docs/latest/design/historical.html) + +Historical processes are the workhorses that handle storage and querying on "historical" data (including any streaming data that has been in the system long enough to be committed). Historical processes download segments from deep storage and respond to queries about these segments. They don't accept writes. + +- Each Historical process serves up data that has been partitioned into segments. These segments are assigned to Historical by the Coordinator via ZooKeeper +- When a Historical process is assigned a segment, it will copy the file from deep storage to its local storage +- When a query is received from the Broker process, the Historical process returns the results + +### [MiddleManager](http://druid.io/docs/latest/design/middlemanager.html) + +MiddleManager processes handle ingestion of new data into the cluster. They are responsible for reading from external data sources and publishing new Druid segments. + +- The MiddleManager process is a worker process that executes submitted tasks. Middle Managers forward tasks to Peons that run in separate JVMs. The reason we have separate JVMs for tasks is for resource and log isolation. Each [Peon](https://druid.apache.org/docs/latest/design/peons.html) is capable of running only one task at a time, however, a MiddleManager may have multiple Peons. +- During real-time ingestion, the MiddleManager also serves queries on real-time data before it has been pushed to deep storage. +- When a query is received from the Broker process, the MiddleManager process executes that query on real-time data and returns results. + +### [Broker](http://druid.io/docs/latest/design/broker.html) + +Broker processes receive queries from external clients and forward those queries to Historicals and MiddleManagers. When Brokers receive results from those subqueries, they merge those results and return them to the caller. End users typically query Brokers rather than querying Historicals or MiddleManagers directly. + +- Broker process is responsible for knowing the internal state of the cluster (from the ZooKeeper) +- The broker finds out information from ZooKeeper about the Druid cluster + - Which Historical processes are serving which segments + - Which MiddleManager processes are serving which tasks' data + - When a query is run, the Broker will figure out which process to contact + +### [Coordinator](http://druid.io/docs/latest/design/coordinator.html) + +Coordinator processes watch over the Historical processes. They are responsible for assigning segments to specific servers, and for ensuring segments are well-balanced across Historicals. 
+ +- Segment management and distribution +- It communicates with the Historical nodes to: + - **Load -** Copy a segment from deep storage and start serving it + - **Drop -** Delete a segment from its local copy and stop serving it + +### [Overlord](http://druid.io/docs/latest/design/overlord.html) + +Overlord processes watch over the MiddleManager processes and are the controllers of data ingestion into Druid. They are responsible for assigning ingestion tasks to MiddleManagers and for coordinating segment publishing. + +- Accepting ingestion supervisors and tasks +- Coordinating which servers run which tasks +- Managing locks so tasks don't conflict with each other +- Returning supervisor and task status to callers + +### [Router](http://druid.io/docs/latest/development/router.html) + +Router processes are optional processes that provide a unified API gateway in front of Druid Brokers, Overlords, and Coordinators. They are optional since you can also simply contact the Druid Brokers, Overlords, and Coordinators directly. ![image](../../../media/Druid_Architecture-image1.jpg) + Druid processes can be deployed individually (one per physical server, virtual server, or container) or can be colocated on shared servers. One common colocation plan is a three-type plan: 1. **"Data"** servers run Historical and MiddleManager processes. - 2. **"Query"** servers run Broker and (optionally) Router processes. - 3. **"Master"** servers run Coordinator and Overlord processes. They may run ZooKeeper as well. + In addition to these process types, Druid also has three external dependencies. These are intended to be able to leverage existing infrastructure, where present. - **[Deep storage](http://druid.io/docs/latest/design/index.html#deep-storage),** shared file storage accessible by every Druid server. This is typically going to be a distributed object store like S3 or HDFS, cassandra, Google Cloud Storage or a network mounted filesystem. Druid uses this to store any data that has been ingested into the system. - [**Metadata store**](http://druid.io/docs/latest/design/index.html#metadata-storage), shared metadata storage. This is typically going to be a traditional RDBMS like PostgreSQL or MySQL. - [**ZooKeeper**](http://druid.io/docs/latest/design/index.html#zookeeper) is used for internal service discovery, coordination, and leader election. + The idea behind this architecture is to make a Druid cluster simple to operate in production at scale. For example, the separation of deep storage and the metadata store from the rest of the cluster means that Druid processes are radically fault tolerant: even if every single Druid server fails, you can still relaunch your cluster from data stored in deep storage and the metadata store. 
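To make the query path described above concrete, here is a minimal sketch of issuing a Druid SQL query over HTTP. It assumes a local quickstart-style cluster with the Router listening on port 8888 (queries can also go straight to a Broker), and the `wikipedia` datasource name is just a placeholder:

```python
import requests

# The Router acts as a unified gateway (assumed local quickstart, port 8888);
# it forwards the query to a Broker, which fans out to Historicals /
# MiddleManagers and merges the partial results.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql/"

sql = """
SELECT channel, COUNT(*) AS edits
FROM wikipedia
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY channel
ORDER BY edits DESC
LIMIT 10
"""

response = requests.post(DRUID_SQL_URL, json={"query": sql})
response.raise_for_status()
for row in response.json():  # rows come back as a JSON array of objects
    print(row["channel"], row["edits"])
```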
+ The following diagram shows how queries and data flow through this architecture: ![image](../../../media/Druid_Architecture-image2.jpg) diff --git a/docs/databases/nosql-databases/druid/documentation.md b/docs/databases/nosql-databases/druid/documentation.md index 46f06d27762..71998bbfc65 100755 --- a/docs/databases/nosql-databases/druid/documentation.md +++ b/docs/databases/nosql-databases/druid/documentation.md @@ -1,6 +1,6 @@ # Documentation -Getting Started +### Getting Started - [Design](http://druid.io/docs/latest/design/index.html) - [What is Druid?](http://druid.io/docs/latest/design/index.html#what-is-druid) @@ -22,9 +22,10 @@ Getting Started - [Tutorial: Compacting segments](http://druid.io/docs/latest/tutorials/tutorial-compaction.html) - [Tutorial: Deleting data](http://druid.io/docs/latest/tutorials/tutorial-delete-data.html) - [Tutorial: Writing your own ingestion specs](http://druid.io/docs/latest/tutorials/tutorial-ingestion-spec.html) - - [Tutorial: Transforming input data](http://druid.io/docs/latest/tutorials/tutorial-transform-spec.html)- [Clustering](http://druid.io/docs/latest/tutorials/cluster.html) + - [Tutorial: Transforming input data](http://druid.io/docs/latest/tutorials/tutorial-transform-spec.html) +- [Clustering](http://druid.io/docs/latest/tutorials/cluster.html) -Data Ingestion +### Data Ingestion - [Ingestion overview](http://druid.io/docs/latest/ingestion/index.html) - [Data Formats](http://druid.io/docs/latest/ingestion/data-formats.html) @@ -48,7 +49,7 @@ Data Ingestion - [FAQ](http://druid.io/docs/latest/ingestion/faq.html) - [Misc. Tasks](http://druid.io/docs/latest/ingestion/misc-tasks.html) -Querying +### Querying - [Overview](http://druid.io/docs/latest/querying/querying.html) - [Timeseries](http://druid.io/docs/latest/querying/timeseriesquery.html) @@ -59,7 +60,8 @@ Querying - [DataSource Metadata](http://druid.io/docs/latest/querying/datasourcemetadataquery.html) - [Search](http://druid.io/docs/latest/querying/searchquery.html) - [Select](http://druid.io/docs/latest/querying/select-query.html) -- [Scan](http://druid.io/docs/latest/querying/scan-query.html)- Components +- [Scan](http://druid.io/docs/latest/querying/scan-query.html) +- Components - [Datasources](http://druid.io/docs/latest/querying/datasource.html) - [Filters](http://druid.io/docs/latest/querying/filters.html) - [Aggregations](http://druid.io/docs/latest/querying/aggregations.html) @@ -75,9 +77,10 @@ Querying - [Sorting Orders](http://druid.io/docs/latest/querying/sorting-orders.html) - [Virtual Columns](http://druid.io/docs/latest/querying/virtual-columns.html) -Design +### Design -- [Overview](http://druid.io/docs/latest/design/index.html)- Storage +- [Overview](http://druid.io/docs/latest/design/index.html) +- Storage - [Segments](http://druid.io/docs/latest/design/segments.html) - Node Types - [Historical](http://druid.io/docs/latest/design/historical.html) @@ -93,7 +96,7 @@ Design - [Metadata Storage](http://druid.io/docs/latest/dependencies/metadata-storage.html) - [ZooKeeper](http://druid.io/docs/latest/dependencies/zookeeper.html) -Operations +### Operations - [API Reference](http://druid.io/docs/latest/operations/api-reference.html) - [Coordinator](http://druid.io/docs/latest/operations/api-reference.html#coordinator) @@ -116,7 +119,7 @@ Operations - [TLS Support](http://druid.io/docs/latest/operations/tls-support.html) - [Password Provider](http://druid.io/docs/latest/operations/password-provider.html) -Configuration +### Configuration - [Configuration 
Reference](http://druid.io/docs/latest/configuration/index.html) - [Recommended Configuration File Organization](http://druid.io/docs/latest/configuration/index.html#recommended-configuration-file-organization) @@ -131,7 +134,7 @@ Configuration - [General Query Configuration](http://druid.io/docs/latest/configuration/index.html#general-query-configuration) - [Configuring Logging](http://druid.io/docs/latest/configuration/logging.html) -Development +### Development - [Overview](http://druid.io/docs/latest/development/overview.html) - [Libraries](http://druid.io/docs/latest/development/libraries.html) @@ -139,7 +142,8 @@ Development - [JavaScript](http://druid.io/docs/latest/development/javascript.html) - [Build From Source](http://druid.io/docs/latest/development/build.html) - [Versioning](http://druid.io/docs/latest/development/versioning.html) -- [Integration](http://druid.io/docs/latest/development/integrating-druid-with-other-technologies.html)- Experimental Features +- [Integration](http://druid.io/docs/latest/development/integrating-druid-with-other-technologies.html) +- Experimental Features - [Overview](http://druid.io/docs/latest/development/experimental.html) - [Approximate Histograms and Quantiles](http://druid.io/docs/latest/development/extensions-core/approximate-histograms.html) - [Datasketches](http://druid.io/docs/latest/development/extensions-core/datasketches-extension.html) @@ -147,7 +151,7 @@ Development - [Router](http://druid.io/docs/latest/development/router.html) - [Kafka Indexing Service](http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html) -Misc +### Misc - [Druid Expressions Language](http://druid.io/docs/latest/misc/math-expr.html) - [Papers & Talks](http://druid.io/docs/latest/misc/papers-and-talks.html) diff --git a/docs/databases/nosql-databases/druid/intro.md b/docs/databases/nosql-databases/druid/intro.md index 20cecdd0b24..80cd676a00c 100755 --- a/docs/databases/nosql-databases/druid/intro.md +++ b/docs/databases/nosql-databases/druid/intro.md @@ -1,7 +1,8 @@ # Druid -Apache Druid (incubating) is a real-time analytics database designed for fast slice-and-dice analytics ("[OLAP](http://en.wikipedia.org/wiki/Online_analytical_processing)" queries) on large data sets. Druid is most often used as a database for powering use cases where real-time ingest, fast query performance, and high uptime are important. As such, Druid is commonly used for powering GUIs of analytical applications, or as a backend for highly-concurrent APIs that need fast aggregations. Druid works best with event-oriented data.- **High performance, column oriented, distributed data store** +Apache Druid (incubating) is a real-time analytics database designed for fast slice-and-dice analytics ("[OLAP](http://en.wikipedia.org/wiki/Online_analytical_processing)" queries) on large data sets. Druid is most often used as a database for powering use cases where real-time ingest, fast query performance, and high uptime are important. As such, Druid is commonly used for powering GUIs of analytical applications, or as a backend for highly-concurrent APIs that need fast aggregations. Druid works best with event-oriented data. +- High performance, column oriented, distributed data store - Druid is primarily used to store, query, and analyze large event streams - Druid is optimized for sub-second queries to slice-and-dice, drill down, search, filter, and aggregate this data. 
Druid is commonly used to power interactive applications where performance, concurrency, and uptime are important. @@ -43,15 +44,20 @@ Once Druid has ingested your data, a copy is stored safely in [deep storage](htt ## Druid Important Points Druid does not natively support nested data, so, we need to flatten arrays in our JSON events by providing a [flattenspec](https://druid.apache.org/docs/latest/ingestion/index.html#flattenspec), or by doing some preprocessing before the event lands in it. + Druid assigns types to columns - string, long, float, complex, etc. The type enforcement at the column level can be restrictive if the incoming data presents with mixed types for a particular field/fields. Each column except the timestamp can be of type dimension or metric. + One can filter and group by on dimension columns, but not on metric columns. This needs some forethought when picking which columns to pre-aggregate and which ones will be used for slice-and-dice analyses. + Partition keys must be picked carefully for load-balancing and scaling up. Streaming new updates to the table after creation requires using one of the [supported ways of ingesting](https://druid.apache.org/docs/latest/ingestion/index.html#streaming) - Kafka, Kinesis, or Tranquility. + Druid works well for event analytics in environments where data is somewhat predictable and rollups and pre-aggregations can be defined a priori. It involves some maintenance and tuning overhead in terms of engineering, but for event analytics that doesn't involve complex joins, it can serve queries with low latency and scale up as required. + Druid is fundamentally a column store, and is designed for analytical queries (GROUPBYs with complex WHERE clauses) that need to scan across multiple partitions. Druid stores its index in [segment files](http://druid.io/docs/latest/design/segments.html), which are partitioned by time. Segment files are columnar, with the data for each column laid out in **separate data structures**. By storing each column separately, Druid can decrease query latency by scanning only those columns that are required for a query. There are different column types for different data types (string, numbers, etc.). Different columns can have different encoding and compression algorithms applied. For example, string columns will be dictionary encoded, LZF compressed, and have search indexes created for faster filtering. Numeric columns will have completely different compression and encoding algorithms applied. Druid segments are immutable once finalized, so updates in Druid have limitations. Although more recent versions of Druid have added "lookups", or the ability to join a mutable table external to Druid with an immutable one in Druid, I would not recommend Druid for any workflows where the same underlying data is frequently updated and those updates need to complete in less than a second (say, powering a social media profile page). Druid supports bulk updates, which are more commonly seen with analytic workloads. 
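The flattening requirement mentioned in the Important Points above is easiest to see with an example. Below is a hedged sketch of a `flattenSpec` fragment that lifts nested JSON fields into flat Druid columns; the event shape and field names are invented, and exactly where this fragment sits in an ingestion spec (for example inside the `inputFormat`) varies with the Druid version:

```python
import json

# Hypothetical incoming event:
# {"ts": "2023-01-01T00:00:00Z", "user": {"id": 42, "geo": {"country": "IN"}}, "tags": ["a", "b"]}

input_format = {
    "type": "json",
    "flattenSpec": {
        "useFieldDiscovery": True,  # keep discovering top-level fields such as "ts"
        "fields": [
            # JSONPath expressions lift nested values into flat columns
            {"type": "path", "name": "user_id", "expr": "$.user.id"},
            {"type": "path", "name": "country", "expr": "$.user.geo.country"},
            {"type": "path", "name": "first_tag", "expr": "$.tags[0]"},
        ],
    },
}

print(json.dumps(input_format, indent=2))
```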
## Druid File Format -## Segments +### Segments - Datasources are broken into files called segments - Segments are grouped into time chunks and potentially, further partitioned at creation time @@ -67,7 +73,6 @@ Druid is fundamentally a column store, and is designed for analytical queries (G - Columnar formats are the standard for analytical workloads due to superior scan performance - String columns in Druid are indexed with bitmap indexes - Columns are further compressed with general-purpose algorithms like LZ4 (Lossless compression) -- Example ![image](../../../media/Druid-image1.jpg) @@ -75,7 +80,8 @@ Druid is fundamentally a column store, and is designed for analytical queries (G - Filtering if a domain exists require reading less data - Compression of like data performs better than a row-oriented format - Druid only needs to read the columns involved in a query, eliminating extraneous fetches from disk and memory -- ![image](../../../media/Druid-image2.jpg) + +![image](../../../media/Druid-image2.jpg) ## Druid Data Modelling @@ -89,7 +95,7 @@ Druid is fundamentally a column store, and is designed for analytical queries (G - When designing a table, you must - Choose the datasource name - The source for the input data - - The columns you want to store- **Columns** + - The columns you want to store - Creating a data model is more than just copying the table structure of a RDBMS table - When transitioning from a RDBMS, you will need to choose - Which columns should be included @@ -98,34 +104,37 @@ Druid is fundamentally a column store, and is designed for analytical queries (G - In Druid, columns will become - Dimensions (stored as-is) - Metrics (partially aggregated) -- **Supported Types** + +### Supported Types ![image](../../../media/Druid-image3.jpg) -- **Multi-value and Nested Dimensions** +#### Multi-value and Nested Dimensions ![image](../../../media/Druid-image4.jpg) -- **Druid Native Batch** - - Native batch is a built-in way to index data in Druid - - Native batch ingestion is useful when you have a file that you want to load into Druid - - There has to be a method to convert your file to Druid's file format and make segments (Native batch performs this conversion) - - Druid also supports batching ingestion with Hadoop - - **Native Batch Architecture** +#### Druid Native Batch + +- Native batch is a built-in way to index data in Druid +- Native batch ingestion is useful when you have a file that you want to load into Druid +- There has to be a method to convert your file to Druid's file format and make segments (Native batch performs this conversion) +- Druid also supports batching ingestion with Hadoop +- **Native Batch Architecture** ![image](../../../media/Druid-image5.jpg) -- **Druid SQL Query API** - - Data stored in Druid can be queried with a SQL API - - These calls go over a HTTP REST API - - The payload of the REST call is JSON - - The query results come back as JSON or CSV - - Not every SQL feature is supported - - The query engine is implemented with Apache Calcite - - The SQL queries are translated into Druid's native queries - - There is a native JSON query API available too - - The SQL API is virtually at parity with the JSON and is easier to use - - Where clauses +### Druid SQL Query API + +- Data stored in Druid can be queried with a SQL API +- These calls go over a HTTP REST API + - The payload of the REST call is JSON + - The query results come back as JSON or CSV +- Not every SQL feature is supported +- The query engine is implemented with Apache Calcite +- 
The SQL queries are translated into Druid's native queries +- There is a native JSON query API available too + - The SQL API is virtually at parity with the JSON and is easier to use +- Where clauses ![image](../../../media/Druid-image6.jpg) @@ -193,18 +202,12 @@ Druid is fundamentally a column store, and is designed for analytical queries (G - Duplicate data removal using HyperLogLog- - Druid not build for historical storage - MiddleManager creates segments, and then put it in deep storage, Historical pulls data from deep storage, create segment cache and serves it. Local Cache is there until the retention policy. -- Middle manager will skip the data if it's not in the right format. There's no way - -to know which data was skipped. +- Middle manager will skip the data if it's not in the right format. There's no way to know which data was skipped. ## Ingestion spec tuning 1. Task duration - 60 minutes (PT60M) (current - 5 minutes) - 2. Task completion time - 60 minutes (current - 5 minutes) - 3. Segment size - 5 Million - 4. Segment granularity - 1 day/1hour (current - 1 hour) - 5. Handoff period - currently too large (tune it) diff --git a/docs/databases/nosql-databases/druid/others.md b/docs/databases/nosql-databases/druid/others.md index 3fe014c3af7..fb8620b9d6d 100755 --- a/docs/databases/nosql-databases/druid/others.md +++ b/docs/databases/nosql-databases/druid/others.md @@ -3,8 +3,11 @@ ## Plywood Plywood is a JavaScript library that simplifies building interactive visualizations and applications for large data sets. Plywood acts as a middle-layer between data visualizations and data stores. + Plywood is architected around the principles of nested [Split-Apply-Combine](http://www.jstatsoft.org/article/view/v040i01/v40i01.pdf), a powerful divide-and-conquer algorithm that can be used to construct all types of data visualizations. Plywood comes with its own [expression language](https://github.com/implydata/plywood/blob/master/docs/expressions) where a single Plywood expression can translate to multiple database queries, and where results are returned in a nested data structure so they can be easily consumed by visualization libraries such as [D3.js](http://d3js.org/). + You can use Plywood in the browser and/or in node.js to easily create your own visualizations and applications. + Plywood also acts as a very advanced query planner for Druid, and Plywood will determine the most optimal way to execute Druid queries. diff --git a/docs/databases/nosql-databases/druid/paper.md b/docs/databases/nosql-databases/druid/paper.md index 1351a79ae78..305913695d7 100755 --- a/docs/databases/nosql-databases/druid/paper.md +++ b/docs/databases/nosql-databases/druid/paper.md @@ -1,42 +1,48 @@ # Paper -1. Realtime Node +### 1. Realtime Node -Real-time nodes encapsulate the functionality to ingest and query event streams. Events indexed via these nodes are immediately available for querying. The nodes are only concerned with events for some small time range and periodically hand off immutable batches of events they have collected over this small time range to other nodes in the Druid cluster that are specialized in dealing with batches of immutable events. Real-time nodes leverage Zookeeper [19] for coordination with the rest of the Druid cluster. The nodes announce their online state and the data they serve in Zookeeper. +Real-time nodes encapsulate the functionality to ingest and query event streams. Events indexed via these nodes are immediately available for querying. 
The nodes are only concerned with events for some small time range and periodically hand off immutable batches of events they have collected over this small time range to other nodes in the Druid cluster that are specialized in dealing with batches of immutable events. Real-time nodes leverage Zookeeper for coordination with the rest of the Druid cluster. The nodes announce their online state and the data they serve in Zookeeper.
-Real-time nodes maintain an in-memory index buffer for all in- coming events. These indexes are incrementally populated as events are ingested and the indexes are also directly queryable. Druid be- haves as a row store for queries on events that exist in this JVM heap-based buffer. To avoid heap overflow problems, real-time nodes persist their in-memory indexes to disk either periodically or after some maximum row limit is reached. This persist process converts data stored in the in-memory buffer to a column oriented storage format described in Section 4. Each persisted index is im- mutable and real-time nodes load persisted indexes into off-heap memory such that they can still be queried. This process is de- scribed in detail in [33] and is illustrated in Figure 2.
+Real-time nodes maintain an in-memory index buffer for all incoming events. These indexes are incrementally populated as events are ingested and the indexes are also directly queryable. Druid behaves as a row store for queries on events that exist in this JVM heap-based buffer. To avoid heap overflow problems, real-time nodes persist their in-memory indexes to disk either periodically or after some maximum row limit is reached. This persist process converts data stored in the in-memory buffer to a column oriented storage format described in Section 4. Each persisted index is immutable and real-time nodes load persisted indexes into off-heap memory such that they can still be queried. This process is illustrated in Figure 2.
-On a periodic basis, each real-time node will schedule a back- ground task that searches for all locally persisted indexes. The task merges these indexes together and builds an immutable block of data that contains all the events that have been ingested by a real- time node for some span of time. We refer to this block of data as a "segment". During the handoff stage, a real-time node uploads this segment to a permanent backup storage, typically a distributed file system such as S3 [12] or HDFS [36], which Druid refers to as "deep storage". The ingest, persist, merge, and handoff steps are fluid; there is no data loss during any of the processes.
-2. Historical Nodes
+On a periodic basis, each real-time node will schedule a background task that searches for all locally persisted indexes. The task merges these indexes together and builds an immutable block of data that contains all the events that have been ingested by a real-time node for some span of time. We refer to this block of data as a "segment". During the handoff stage, a real-time node uploads this segment to a permanent backup storage, typically a distributed file system such as S3 or HDFS, which Druid refers to as "deep storage". The ingest, persist, merge, and handoff steps are fluid; there is no data loss during any of the processes.
-Historical nodes encapsulate the functionality to load and serve the immutable blocks of data (segments) created by real-time nodes.
In many real-world workflows, most of the data loaded in a Druid cluster is immutable and hence, historical nodes are typically the main workers of a Druid cluster. Historical nodes follow a shared- nothing architecture and there is no single point of contention among the nodes. The nodes have no knowledge of one another and are operationally simple; they only know how to load, drop, and serve immutable segments. +### 2. Historical Nodes -Similar to real-time nodes, historical nodes announce their on- line state and the data they are serving in Zookeeper. Instructions to load and drop segments are sent over Zookeeper and contain infor- mation about where the segment is located in deep storage and how to decompress and process the segment. Before a historical node downloads a particular segment from deep storage, it first checks a local cache that maintains information about what segments already exist on the node. If information about a segment is not present in the cache, the historical node will proceed to download the segment from deep storage. Once pro- cessing is complete, the segment is announced in Zookeeper. At this point, the segment is queryable. The local cache also allows for historical nodes to be quickly updated and restarted. On startup, the node examines its cache and immediately serves whatever data it finds. Historical nodes can support read consistency because they only deal with immutable data. Immutable data blocks also enable a sim- ple parallelization model: historical nodes can concurrently scan and aggregate immutable blocks without blocking. +Historical nodes encapsulate the functionality to load and serve the immutable blocks of data (segments) created by real-time nodes. In many real-world workflows, most of the data loaded in a Druid cluster is immutable and hence, historical nodes are typically the main workers of a Druid cluster. Historical nodes follow a shared-nothing architecture and there is no single point of contention among the nodes. The nodes have no knowledge of one another and are operationally simple; they only know how to load, drop, and serve immutable segments. -*Tiers* +Similar to real-time nodes, historical nodes announce their online state and the data they are serving in Zookeeper. Instructions to load and drop segments are sent over Zookeeper and contain information about where the segment is located in deep storage and how to decompress and process the segment. Before a historical node downloads a particular segment from deep storage, it first checks a local cache that maintains information about what segments already exist on the node. If information about a segment is not present in the cache, the historical node will proceed to download the segment from deep storage. Once processing is complete, the segment is announced in Zookeeper. At this point, the segment is queryable. The local cache also allows for historical nodes to be quickly updated and restarted. On startup, the node examines its cache and immediately serves whatever data it finds. Historical nodes can support read consistency because they only deal with immutable data. Immutable data blocks also enable a simple parallelization model: historical nodes can concurrently scan and aggregate immutable blocks without blocking. + +#### Tiers Historical nodes can be grouped in different tiers, where all nodes in a given tier are identically configured. Different performance and fault-tolerance parameters can be set for each tier. 
The purpose of tiered nodes is to enable higher or lower priority segments to be dis- tributed according to their importance. For example, it is possible to spin up a "hot" tier of historical nodes that have a high num- ber of cores and large memory capacity. The "hot" cluster can be configured to download more frequently accessed data. A parallel "cold" cluster can also be created with much less powerful backing hardware. The "cold" cluster would only contain less frequently accessed segments. -3. Broker Nodes + +### 3. Broker Nodes Broker nodes act as query routers to historical and real-time nodes. Broker nodes understand the metadata published in Zookeeper about what segments are queryable and where those segments are located. Broker nodes route incoming queries such that the queries hit the right historical or real-time nodes. Broker nodes also merge partial results from historical and real-time nodes before returning a final consolidated result to the caller. -*Caching* -Broker nodes contain a cache with a LRU [31, 20] invalidation strategy. The cache can use local heap memory or an external distributed key/value store such as Memcached [16]. Each time a bro- ker node receives a query, it first maps the query to a set of seg- ments. Results for certain segments may already exist in the cache and there is no need to recompute them. For any results that do not exist in the cache, the broker node will forward the query to the correct historical and real-time nodes. Once historical nodes return their results, the broker will cache these results on a per segment ba- sis for future use. +#### Caching + +Broker nodes contain a cache with a LRU invalidation strategy. The cache can use local heap memory or an external distributed key/value store such as Memcached. Each time a bro- ker node receives a query, it first maps the query to a set of seg- ments. Results for certain segments may already exist in the cache and there is no need to recompute them. For any results that do not exist in the cache, the broker node will forward the query to the correct historical and real-time nodes. Once historical nodes return their results, the broker will cache these results on a per segment ba- sis for future use. Real-time data is never cached and hence requests for real-time data will al- ways be forwarded to real-time nodes. Real-time data is perpetually changing and caching the results is unreliable. The cache also acts as an additional level of data durability. In the event that all historical nodes fail, it is still possible to query results if those results already exist in the cache. -4. Coordinator Nodes -Druid coordinator nodes are primarily in charge of data manage- ment and distribution on historical nodes. The coordinator nodes tell historical nodes to load new data, drop outdated data, replicate data, and move data to load balance. Druid uses a multi-version concurrency control swapping protocol for managing immutable segments in order to maintain stable views. If any immutable seg- ment contains data that is wholly obsoleted by newer segments, the outdated segment is dropped from the cluster. Coordinator nodes undergo a leader-election process that determines a single node that runs the coordinator functionality. The remaining coordinator nodes act as redundant backups. +### 4. Coordinator Nodes -A coordinator node runs periodically to determine the current state of the cluster. 
It makes decisions by comparing the expected state of the cluster with the actual state of the cluster at the time of the run. As with all Druid nodes, coordinator nodes maintain a Zookeeper connection for current cluster information. Coordinator nodes also maintain a connection to a MySQL database that con- tains additional operational parameters and configurations. One of the key pieces of information located in the MySQL database is a table that contains a list of all segments that should be served by historical nodes. This table can be updated by any service that cre- ates segments, for example, real-time nodes. The MySQL database also contains a rule table that governs how segments are created, destroyed, and replicated in the cluster. -*Rules* +Druid coordinator nodes are primarily in charge of data management and distribution on historical nodes. The coordinator nodes tell historical nodes to load new data, drop outdated data, replicate data, and move data to load balance. Druid uses a multi-version concurrency control swapping protocol for managing immutable segments in order to maintain stable views. If any immutable segment contains data that is wholly obsoleted by newer segments, the outdated segment is dropped from the cluster. Coordinator nodes undergo a leader-election process that determines a single node that runs the coordinator functionality. The remaining coordinator nodes act as redundant backups. + +A coordinator node runs periodically to determine the current state of the cluster. It makes decisions by comparing the expected state of the cluster with the actual state of the cluster at the time of the run. As with all Druid nodes, coordinator nodes maintain a Zookeeper connection for current cluster information. Coordinator nodes also maintain a connection to a MySQL database that contains additional operational parameters and configurations. One of the key pieces of information located in the MySQL database is a table that contains a list of all segments that should be served by historical nodes. This table can be updated by any service that creates segments, for example, real-time nodes. The MySQL database also contains a rule table that governs how segments are created, destroyed, and replicated in the cluster. + +#### Rules Rules govern how historical segments are loaded and dropped from the cluster. Rules indicate how segments should be assigned to different historical node tiers and how many replicates of a segment should exist in each tier. Rules may also indicate when segments should be dropped entirely from the cluster. Rules are usually set for a period of time -LoadBalancing -These query patterns suggest replicating recent historical seg- ments at a higher rate, spreading out large segments that are close in time to different historical nodes, and co-locating segments from different data sources. +#### LoadBalancing + +These query patterns suggest replicating recent historical segments at a higher rate, spreading out large segments that are close in time to different historical nodes, and co-locating segments from different data sources. 
## Storage Engine diff --git a/docs/databases/nosql-databases/druid/readme.md b/docs/databases/nosql-databases/druid/readme.md index 09e643584be..3af1bcfc565 100755 --- a/docs/databases/nosql-databases/druid/readme.md +++ b/docs/databases/nosql-databases/druid/readme.md @@ -1,6 +1,6 @@ # Druid -- [Druid](databases/nosql-databases/druid/intro.md) +- [Druid Intro](databases/nosql-databases/druid/intro.md) - [Architecture](databases/nosql-databases/druid/architecture.md) - [Documentation](documentation) - [Paper](paper) diff --git a/docs/databases/nosql-databases/readme.md b/docs/databases/nosql-databases/readme.md index 24fdc8e2dbc..0c44b452e9e 100755 --- a/docs/databases/nosql-databases/readme.md +++ b/docs/databases/nosql-databases/readme.md @@ -8,3 +8,7 @@ - [Cassandra](cassandra/readme.md) - [AWS DyanamoDB](aws-dynamodb/readme.md) - [Time Series DB](time-series-db/readme.md) +- [YugabyteDB](databases/nosql-databases/yugabytedb.md) +- [Clickhouse](databases/nosql-databases/clickhouse.md) +- [Snowflake](databases/nosql-databases/snowflake.md) +- [Comparisions](databases/nosql-databases/comparisions.md) diff --git a/docs/databases/nosql-databases/snowflake.md b/docs/databases/nosql-databases/snowflake.md new file mode 100644 index 00000000000..a86bc20b63f --- /dev/null +++ b/docs/databases/nosql-databases/snowflake.md @@ -0,0 +1,91 @@ +# Snowflake + +Snowflake is a cloud-based database and is currently offered as a pay-as-you-go service in the Amazon cloud. It is developed by Snowflake Computing. + +Snowflake adopts a shared-nothing architecture. It uses Amazon S3 for its underlying data storage. It performs query execution within in elastic clusters of virtual machines, called virtual warehouse. The Cloud Service layer stores the collection of services that manage computation clusters, queries, transactions, and all the metadata like database catalogs and access control information in a key-value store (FoundationDB). + +#### History + +Implementation of Snowflake began in late 2012 and has been generally available since June 2015. + +#### Concurrency Control  + +[Multi-version Concurrency Control (MVCC)](https://dbdb.io/browse?concurrency-control=multi-version-concurrency-control-mvcc) + +Snowflake supports MVCC. As Snowflake's underlying data storage is done by Amazon S3, each write operation instead of performing writes in place, it creates a new entire file including the changes. The stale version of data is replaced by the newly created file, but is not deleted immediately. Snowflake allows users to define how long the stale version will be kept in S3, which is up to 90 days. Based on MVCC, Snowflake also supports time travel query. + +#### Data Model  + +[Relational](https://dbdb.io/browse?data-model=relational) [Document / XML](https://dbdb.io/browse?data-model=document-xml) + +Snowflake is relational as it supports ANSI SQL and ACID transactions. It offers built-in functions and SQL extensions for traversing, flattening, and nesting of semi-structured data, with support for popular formats such as JSON and Avro. When storing semi-structured data, Snowflake can perform automatic type inference to find the most common types and store them using the same compressed columnar format as native relational data. Thus it can accelerate query execution on them. 
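As a small illustration of the semi-structured support and time travel described above, here is a hedged sketch using the `snowflake-connector-python` package; the account and credential placeholders, the `raw_events` table, and its VARIANT column `v` are all invented for the example:

```python
import snowflake.connector

# Placeholder connection details - replace with real account information.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="my_db",
    schema="public",
)
cur = conn.cursor()

# VARIANT path syntax plus LATERAL FLATTEN turns nested JSON into rows and columns.
cur.execute("""
    SELECT
        v:user.country::string AS country,
        f.value:name::string   AS item_name
    FROM raw_events,
         LATERAL FLATTEN(input => v:items) f
    LIMIT 10
""")
print(cur.fetchall())

# Time travel (backed by the retained file versions described above):
# read the table as it was five minutes ago.
cur.execute("SELECT COUNT(*) FROM raw_events AT(OFFSET => -60*5)")
print(cur.fetchone())

cur.close()
conn.close()
```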
+
+#### Foreign Keys
+
+[Supported](https://dbdb.io/browse?foreign-keys=supported)
+
+Snowflake supports defining and maintaining constraints, including foreign keys, but does not enforce them, except for NOT NULL constraints, which are always enforced.
+
+#### Indexes
+
+[Not Supported](https://dbdb.io/browse?indexes=not-supported)
+
+Snowflake does not support indexes, as maintaining them would be expensive given its architecture. Instead, Snowflake uses min-max based pruning and other techniques to accelerate data access.
+
+#### Isolation Levels
+
+[Snapshot Isolation](https://dbdb.io/browse?isolation-levels=snapshot-isolation)
+
+According to the Snowflake paper and talks, Snowflake supports snapshot isolation. However, the documentation states that Read Committed is the only isolation level supported.
+
+#### Joins
+
+[Hash Join](https://dbdb.io/browse?joins=hash-join)
+
+#### Query Compilation
+
+[Not Supported](https://dbdb.io/browse?query-compilation=not-supported)
+
+#### Query Execution
+
+[Vectorized Model](https://dbdb.io/browse?query-execution=vectorized-model)
+
+Snowflake processes data in a pipelined fashion, in batches of a few thousand rows in columnar format. It also uses a push model rather than a pull model: relational operators push their intermediate results to downstream operators.
+
+#### Query Interface
+
+[SQL](https://dbdb.io/browse?query-interface=sql)
+
+#### Storage Architecture
+
+[Disk-oriented](https://dbdb.io/browse?storage-architecture=disk-oriented)
+
+Snowflake's data storage is handled by the Amazon S3 service. During query execution, the responsible worker nodes use an HTTP-based interface to read and write data. Each worker node also uses its local disk as a cache.
+
+#### Storage Model
+
+[Hybrid](https://dbdb.io/browse?storage-model=hybrid)
+
+Snowflake horizontally partitions data into large immutable files which are equivalent to blocks or pages in a traditional database system. Within each file, the values of each attribute or column are grouped together and heavily compressed, a well-known scheme called PAX or hybrid columnar. Each table file has a header which, among other metadata, contains the offsets of each column within the file.
+
+#### Stored Procedures
+
+[Not Supported](https://dbdb.io/browse?stored-procedures=not-supported)
+
+#### System Architecture
+
+[Shared-Disk](https://dbdb.io/browse?system-architecture=shared-disk)
+
+Snowflake uses Amazon S3 for its underlying data storage. It performs query execution within elastic clusters of virtual machines, called virtual warehouses. During query execution, virtual warehouses use an HTTP-based interface to read and write data from S3. The Cloud Services layer stores the collection of services that manage compute clusters, queries, transactions, and all the metadata such as database catalogs and access control information, in FoundationDB.
+
+#### Views
+
+[Virtual Views](https://dbdb.io/browse?views=virtual-views)
+
+## Links
+
+[The Snowflake Data Cloud - Mobilize Data, Apps, and AI](https://www.snowflake.com/en/)
+
+[What is Snowflake?
8 Minute Demo - YouTube](https://www.youtube.com/watch?v=9PBvVeCQi0w) + +[Snowflake Explained In 9 Mins | What Is Snowflake Database | Careers In Snowflake | MindMajix - YouTube](https://www.youtube.com/watch?v=hJHWmYcdDn8) diff --git a/docs/databases/others/yugabytedb.md b/docs/databases/nosql-databases/yugabytedb.md similarity index 100% rename from docs/databases/others/yugabytedb.md rename to docs/databases/nosql-databases/yugabytedb.md diff --git a/docs/databases/others/databases-others.md b/docs/databases/others/databases-others.md index 9cc0b918caa..01fdc65795a 100755 --- a/docs/databases/others/databases-others.md +++ b/docs/databases/others/databases-others.md @@ -133,39 +133,6 @@ JanusGraph is a highly scalable [graph database](https://en.wikipedia.org/wiki/G [https://docs.janusgraph.org](https://docs.janusgraph.org/) -## ClickHouse - -ClickHouse is anopensourcecolumn-oriented database management system capable ofrealtime generation of analytical data reports usingSQLqueries. - -**Key Features** - -- True column-oriented storage -- Vectorized query execution -- Data compression -- Parallel and distributed query execution -- Real time query processing -- Real time data ingestion -- On-disk locality of reference -- Cross-datacenter replication -- High availability -- SQL support -- Local and distributed joins -- Pluggable external dimension tables -- Arrays and nested data types -- Approximate query processing -- Probabilistic data structures -- Full support of IPv6 -- Features for web analytics -- State-of-the-art algorithms -- Detailed documentation -- Clean documented code - - - - - -Used by - Zerodha - ## tidb TiDB ("Ti" stands for Titanium) is an open-source NewSQL database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads. It is MySQL compatible and features horizontal scalability, strong consistency, and high availability. diff --git a/docs/databases/others/readme.md b/docs/databases/others/readme.md index fdc31e1fd8f..b16b31e2201 100755 --- a/docs/databases/others/readme.md +++ b/docs/databases/others/readme.md @@ -2,7 +2,6 @@ - [Databases - Others](databases-others) - [Technologies / Tools](technologies-tools) -- [YugabyteDB](yugabytedb) - [ETL (Extract Transform Load)](etl-extract-transform-load) - [Course AWS Certified Database Specialty](course-aws-certified-database-specialty) - [Course Advanced Database Systems](course-advanced-database-systems) diff --git a/docs/management/people-management/people-team-management.md b/docs/management/people-management/people-team-management.md index ade528037df..f68ba7596e3 100755 --- a/docs/management/people-management/people-team-management.md +++ b/docs/management/people-management/people-team-management.md @@ -31,11 +31,11 @@ Fool me once, shame on you. Fool me twice, shame on me. 
Fool me three times, sha - Higher incentives led to worse performances - Financial incentives, can result in a negative impact on overall performance - New approach - intrinsic motivation - - Autonomy - - Traditional management is great if you want compliance, but if you want engagement self-direction works better - - 20 percent time - - Mastery - - Purpose + - Autonomy + - Traditional management is great if you want compliance, but if you want engagement self-direction works better + - 20 percent time + - Mastery + - Purpose ## Psychological Safety @@ -63,17 +63,18 @@ Creating **psychological safety** in the workplace for learning, innovation, and - When leaders are curious and admit that they don't know everything, people are encouraged to speak up - When people take risks and speak up, it's important for leaders to respond productively - You don't have to be a leader to help create a fearless work environment - - I need help - - I don't know - - I made a mistake - - What challenges are you facing? - - What can I do to help you? + - I need help + - I don't know + - I made a mistake + - What challenges are you facing? + - What can I do to help you? ![image](../../media/People-Team-Management-Culture-image1.jpg) ![image](../../media/People-Team-Management-Culture-image2.jpg) Let your team know + - Over-communication is helpful - The more everyone proactively shares progress + concerns, the better. - Empathize empathy - Not sure what someone meant by their note? Assume positive intent. Feeling bothered by the way someone communicated their request? Kindly share the feedback of what you observed + how you'd like things to be different next time. @@ -111,9 +112,9 @@ The fundamental principle of the situational leadership model is that there is * - Depends on - - Skill - - Motivation - - Urgency + - Skill + - Motivation + - Urgency ![image](../../media/People-Team-Management-Culture-image4.jpg) @@ -122,10 +123,10 @@ The fundamental principle of the situational leadership model is that there is * ![image](../../media/People-Team-Management-Culture-image6.jpg) - Three are many ways to lead - - Coach - - Shepherd - - Shaman - - Champion + - Coach + - Shepherd + - Shaman + - Champion - Saying no ![image](../../media/People-Team-Management-Culture-image7.jpg) @@ -264,12 +265,12 @@ Escalation matrix ### [**Managing people**](https://klinger.io/posts/managing-people-%F0%9F%A4%AF) - As a manager, everything is your fault - - There is no point being angry at your team -- ever - - You are in charge of processes and people - - And you got more information than they do, always - - You either created the processes where this outcome happened - - or you hired (or did not fire) the wrong people - - Ultimately everything is your fault + - There is no point being angry at your team -- ever + - You are in charge of processes and people + - And you got more information than they do, always + - You either created the processes where this outcome happened + - or you hired (or did not fire) the wrong people + - Ultimately everything is your fault - You manage processes; you lead people - Processes are expectations made explicit - Decisions vs Opinions diff --git a/docs/management/people-management/types-of-leadership.md b/docs/management/people-management/types-of-leadership.md index 5f5184babb8..c992faf9376 100644 --- a/docs/management/people-management/types-of-leadership.md +++ b/docs/management/people-management/types-of-leadership.md @@ -1,8 +1,11 @@ # Types of Leadership ### 1. 
Transformational leadership + ### 2. Situational leadership + ### 3. Authoritarian leadership + ### 4. Bureaucratic leadership As the cousin of the autocratic style, bureaucratic leadership runs on rules, policy, and maintaining the status quo. The standard procedure always wins out. Proponents of this style will listen to employees, and may even acknowledge their good ideas, but if those ideas don't fit within the established system, they'll never get the green light. diff --git a/docs/management/project-management/pm101.md b/docs/management/project-management/pm101.md index df32c7f717b..8d294f0ffc1 100755 --- a/docs/management/project-management/pm101.md +++ b/docs/management/project-management/pm101.md @@ -16,11 +16,12 @@ 10. Project Stakeholders 11. Project Management Life Cycle and its types 12. Project Management Processes - - Initiating Process Group - - Planning Process Group - - Executing Process Group - - Monitoring and Controlling Process Group - - Closing Process Group + +- Initiating Process Group +- Planning Process Group +- Executing Process Group +- Monitoring and Controlling Process Group +- Closing Process Group ## What is a Project