From 12ff67ebb4d9b3f5ff04d4da4285050138ea8984 Mon Sep 17 00:00:00 2001 From: Deepak Date: Fri, 29 Nov 2024 23:01:50 +0530 Subject: [PATCH] updated docs --- docs/algorithms/general/algo-ds.md | 301 ++++++------------ .../mysql/optimizing-locking-operations.md | 6 + .../1-linux-general-unix-linux-commands.md | 6 + docs/frontend/frontend-intro/wordpress.md | 40 +++ .../sql/dml-data-manipulation-language.md | 10 + docs/languages/sql/stored-procedure.md | 169 +++++++++- .../elasticsearch/analysis-and-analyzers.md | 28 +- .../elasticsearch/architecture.md | 17 +- docs/technologies/elasticsearch/elastalert.md | 8 +- .../elasticsearch-the-definitive-guide.md | 55 +--- ...k-efk-stack-elastic-stack-elasticsearch.md | 22 +- .../elasticsearch/full-text-searches.md | 2 +- .../elasticsearch/getting-started.md | 20 +- .../elasticsearch/information-retrieval.md | 13 +- .../elasticsearch/internal-working.md | 33 +- docs/technologies/elasticsearch/others.md | 19 +- 16 files changed, 401 insertions(+), 348 deletions(-) diff --git a/docs/algorithms/general/algo-ds.md b/docs/algorithms/general/algo-ds.md index e91fc011f1f..e3ecad1c714 100755 --- a/docs/algorithms/general/algo-ds.md +++ b/docs/algorithms/general/algo-ds.md @@ -3,235 +3,126 @@ ## Algorithms 1. Union-Find Algorithm - - - Dynamic Connectivity - - - Quick Find - - - Quick Union - - - Improvements - - - Weighted Quick Union - - - Weighted Quick Union with Path Compression - + - Dynamic Connectivity + - Quick Find + - Quick Union + - Improvements + - Weighted Quick Union + - Weighted Quick Union with Path Compression 2. Analysis of algorithms - - - Scientific Method of Analysis - - - Empirical Method of Analysis - + - Scientific Method of Analysis + - Empirical Method of Analysis 3. Stacks and Queues - - - Stacks - - - Resizing Arrays - - - Queues - - - Deque - - - Randomized Queues - + - Stacks + - Resizing Arrays + - Queues + - Deque + - Randomized Queues 4. Elementary Sort - - - Selection Sort - - - Insertion Sort - - - Shell Sort - - - Shuffling - - - Shuffle Sort - - - Knuth Shuffle - - - Convex Hull - + - Selection Sort + - Insertion Sort + - Shell Sort + - Shuffling + - Shuffle Sort + - Knuth Shuffle + - Convex Hull 5. Merge Sort - - - Bottom up mergesort - + - Bottom up mergesort 6. Quick Sort - - - Quick Select (Selection) - - - 3- way partition quicksort (Duplicate Keys) - - - System sorts - + - Quick Select (Selection) + - 3- way partition quicksort (Duplicate Keys) + - System sorts 7. Priority Queues - - - Binary heaps - - - Heap sort - + - Binary heaps + - Heap sort 8. Elementary Symbol Tables - - - Elementary Implementations - -Sorted array (Binary Search) - -Unordered List (Sequential Search) - -2. Ordered Operations - -3. Binary Search Trees - -4. Ordered Operations in BSTs - -5. Deletion in BSTs - -9. Balanced Search Trees - - - 2-3 Search Trees - - - Red-Black Trees - - - B-Trees - -10. Geometric applications of BST - -- 1d Range Search - -- Line Segment Intersection - -- Kd-Trees - -- Interval Search Trees - -- Rectangle Intersection - -11. Hash Tables - -- Uniform Hashing Assumption - -- Separate Chaining - -- Linear Probing - -12. Symbol Table Applications - -- Sets - -- Dictionary Clients - -- Indexing Clients - -- Sparse Vectors + - Elementary Implementations + - Sorted array (Binary Search) + - Unordered List (Sequential Search) +9. Ordered Operations +10. Binary Search Trees +11. Ordered Operations in BSTs +12. Deletion in BSTs +13. Balanced Search Trees + - 2-3 Search Trees + - Red-Black Trees + - B-Trees +14. 
Geometric applications of BST + - 1d Range Search + - Line Segment Intersection + - Kd-Trees + - Interval Search Trees + - Rectangle Intersection +15. Hash Tables + - Uniform Hashing Assumption + - Separate Chaining + - Linear Probing +16. Symbol Table Applications + - Sets + - Dictionary Clients + - Indexing Clients + - Sparse Vectors ## Data Structures 1. Undirected Graphs - - - Implementation - - - Adjacency Matrix - - - Adjacency List - - - DFS - - - BFS - - - Connected Components - + - Implementation + - Adjacency Matrix + - Adjacency List + - DFS + - BFS + - Connected Components 2. Directed Graphs - - - Topological Sort - - - Topological order of an acyclic digraph - - - Strongly Connected Components - - - Kosaraju-Sharir algorithm for computing strong components of a digraph - + - Topological Sort + - Topological order of an acyclic digraph + - Strongly Connected Components + - Kosaraju-Sharir algorithm for computing strong components of a digraph 3. Minimum Spanning Trees - - - Kruskal's Algorithm - - - Prim's Algorithm - + - Kruskal's Algorithm + - Prim's Algorithm 4. Shortest Path - - - Dijkstra's Algorithm - - - Bellman Ford Algorithm (Negative Weights) - + - Dijkstra's Algorithm + - Bellman Ford Algorithm (Negative Weights) 5. Maximum Flow and Minimum Cut - - - Ford-Fulkerson Algorithm - + - Ford-Fulkerson Algorithm 6. Radix Sorts - - - Key-Indexed Counting - - - LSD Radix Sort - - - MSD Radix Sort - - - 3-way Radix Quicksort - - - Suffix Arrays - + - Key-Indexed Counting + - LSD Radix Sort + - MSD Radix Sort + - 3-way Radix Quicksort + - Suffix Arrays 7. Tries - - - R-way Tries - - - Ternary Search Tries - + - R-way Tries + - Ternary Search Tries 8. Substring Search - - - KMP (Knuth-Morris-Pratt) - - - Boyer-Moore - - - Rabin-Karp - + - KMP (Knuth-Morris-Pratt) + - Boyer-Moore + - Rabin-Karp 9. Regular Expressions - - - DFA - - - NFA - + - DFA + - NFA 10. Data Compression - -- Run Length Encoding - -- Huffman Compression - -- LZW Compression - -- Burrows-Wheeler - + - Run Length Encoding + - Huffman Compression + - LZW Compression + - Burrows-Wheeler 11. Reductions - 12. Linear Programming - -- Brewer's Problem - -- Simplex Algorithm - + - Brewer's Problem + - Simplex Algorithm 13. Intractability - -- P - -- NP - -- NP-Complete + - P + - NP + - NP-Complete ## Strategies for algorithms 1. B.U.D. (Bottleneck, Unnecessary work, Duplicated work) - 2. Space / Time Tradeoffs ## Resources 1. Coursera - Algorithms Part 1 - 2. Coursera - Algorithms Part 2 https://www.toptal.com/algorithms/interview-questions @@ -239,23 +130,14 @@ https://www.toptal.com/algorithms/interview-questions ## Most Important Algos / DS / Programming Concepts 1. Depth first search - 2. Breadth first search - 3. Matching parenthesis - 4. Hash Tables - 5. Variables / Pointer manipulations - 6. Reversing a linked list - 7. Sorting fundamentals - 8. Recursion - 9. Custom data structures (suffix tree) - 10. **Binary search** ## BUD Optimization Strategy @@ -269,15 +151,10 @@ https://4tee-learn.blogspot.com/2017/12/optimisation-technique-15-bud.html ## Questions to asking when solving a coding interview questions 1. What is the data types of the inputs? - - - Can we assume the string is ASCII or Unicode? - + - Can we assume the string is ASCII or Unicode? 2. Do we have to worry about load factors? - 3. Do we have to validate inputs? - 4. Can we assume this fits in memory? - 5. Can we use additional data structures? 
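+
+As a worked example of binary search (the last item in the most important concepts list above), here is a minimal Python sketch, assuming a sorted input list:
+
+```python
+def binary_search(arr, target):
+    """Return the index of target in sorted list arr, or -1 if absent."""
+    lo, hi = 0, len(arr) - 1
+    while lo <= hi:
+        mid = (lo + hi) // 2
+        if arr[mid] == target:
+            return mid
+        elif arr[mid] < target:
+            lo = mid + 1  # target can only be in the right half
+        else:
+            hi = mid - 1  # target can only be in the left half
+    return -1
+
+assert binary_search([2, 3, 5, 7, 11, 13], 7) == 3
+assert binary_search([2, 3, 5, 7, 11, 13], 4) == -1
+```
+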
https://www.freecodecamp.org/news/learn-algorithms-and-data-structures-in-python

diff --git a/docs/databases/sql-databases/mysql/optimizing-locking-operations.md b/docs/databases/sql-databases/mysql/optimizing-locking-operations.md
index b02b75a1dfc..e9bdc49e7f9 100755
--- a/docs/databases/sql-databases/mysql/optimizing-locking-operations.md
+++ b/docs/databases/sql-databases/mysql/optimizing-locking-operations.md
@@ -85,6 +85,12 @@ DELETE FROM orders WHERE order_id < 100;
```

Here, InnoDB locks the gaps between existing rows where `order_id` is less than 100 to prevent new rows from being inserted into these gaps, which would affect the results of this `DELETE` operation.

+Therefore it's better to give `DELETE` a bounded range with `BETWEEN`, so that InnoDB only locks that range instead of taking gap locks across everything below the boundary.
+
+```sql
+DELETE FROM orders WHERE order_id BETWEEN 0 AND 100;
+```
+
### Index Locks

InnoDB locks the index entries for the rows being deleted. This prevents other transactions from modifying the indexes until the transaction is complete.
diff --git a/docs/devops/terminal-bash/1-linux-general-unix-linux-commands.md b/docs/devops/terminal-bash/1-linux-general-unix-linux-commands.md
index 4ca9cda2bda..3475071273b 100755
--- a/docs/devops/terminal-bash/1-linux-general-unix-linux-commands.md
+++ b/docs/devops/terminal-bash/1-linux-general-unix-linux-commands.md
@@ -921,6 +921,12 @@ screen -r

# To attach to specific session
screen -r session_name

+# attach to an already attached session (detach from old terminal and attach to new terminal)
+screen -r -d 30608
+
+# scroll in a session (copy mode)
+# Ctrl+A then ESC, then use up/down arrow keys to scroll
+
# list the current running screen sessions
screen -ls

diff --git a/docs/frontend/frontend-intro/wordpress.md b/docs/frontend/frontend-intro/wordpress.md
index 0e18fcfbe7b..766f3322ce6 100755
--- a/docs/frontend/frontend-intro/wordpress.md
+++ b/docs/frontend/frontend-intro/wordpress.md
@@ -8,6 +8,46 @@ https://www.advancedcustomfields.com

[Overview of WordPress (Beginners Guide 2020)](https://www.youtube.com/watch?v=jmqu4HC3zmo)

+## Biggest WordPress Users
+
+- WordPress.com
+- IBM Jobs
+- Microsoft News
+- Facebook Newsroom
+- Mercedes-Benz
+- BBC America
+- Forbes Blogs
+- Time Magazine
+- CNN Press Room
+- Quartz
+- The White House
+- Usain Bolt
+- Katy Perry
+- Brian Smith
+- Boing Boing
+- The Bloggess
+- FiveThirtyEight
+- The Herald Sun
+- Flickr Blog
+- TechCrunch
+- Sony Music
+- Bata
+- Bloomberg Professional
+- Yelp Blog
+- The New York Observer
+- PlayStation Blog
+- Rolling Stones
+- Spotify Newsroom
+- Disney Books
+- Etsy Journal
+- TED Blog
+
+[30+ Examples of Biggest Companies Using WordPress - weDevs](https://wedevs.com/blog/103311/top-brands-using-wordpress/)
+
+[Largest user base you have served? : r/Wordpress](https://www.reddit.com/r/Wordpress/comments/1dyshjs/largest_user_base_you_have_served/?rdt=42578)
+
+[40+ Most Notable Big Name Brands that are Using WordPress](https://www.wpbeginner.com/showcase/40-most-notable-big-name-brands-that-are-using-wordpress/)
+
## wp-admin

[WordPress Admin Dashboard Tutorial 2020 - Step By Step For Beginners In WP-ADMIN!](https://www.youtube.com/watch?v=Ov_zUmMyJnQ)
diff --git a/docs/languages/sql/dml-data-manipulation-language.md b/docs/languages/sql/dml-data-manipulation-language.md
index 8e3ca55b1c2..86ec46c4db5 100755
--- a/docs/languages/sql/dml-data-manipulation-language.md
+++ b/docs/languages/sql/dml-data-manipulation-language.md
@@ -25,6 +25,16 @@ DELETE statements are used to remove rows from a table.
DELETE FROM table_name WHERE some_column = some_value;
```

+It's usually better to use `BETWEEN` than an open-ended comparison operator, so that InnoDB takes range locks on just the bounded range instead of locking gaps across the whole table.
+
+```sql
+-- good
+DELETE FROM orders WHERE order_id BETWEEN 0 AND 100;
+
+-- bad
+DELETE FROM orders WHERE order_id < 100;
+```
+
### INSERT INTO

```sql
diff --git a/docs/languages/sql/stored-procedure.md b/docs/languages/sql/stored-procedure.md
index f673507fe6c..9c6404bb123 100755
--- a/docs/languages/sql/stored-procedure.md
+++ b/docs/languages/sql/stored-procedure.md
@@ -80,7 +80,37 @@ ForMySQL 8, connect your database viaWorkbench, go toAdministration -> User and

## Queries

-### Creating a Stored Procedure
+### Stored Procedure - CopyUsersLogInBatches
+
+```sql
+DELIMITER $$
+
+CREATE PROCEDURE CopyUsersLogInBatches()
+BEGIN
+    DECLARE batch_size INT DEFAULT 1000000;
+    DECLARE start_id INT;
+    DECLARE max_id INT;
+
+    -- Initialize start_id and max_id
+    SELECT MIN(id) INTO start_id FROM users_log;
+    SELECT MAX(id) INTO max_id FROM users_log;
+
+    -- Loop to copy data in batches
+    WHILE start_id <= max_id DO
+        INSERT INTO users_log_backup_27_nov_2024
+        SELECT *
+        FROM users_log
+        WHERE id BETWEEN start_id AND start_id + batch_size - 1;
+
+        -- Update the start_id for the next batch
+        SET start_id = start_id + batch_size;
+    END WHILE;
+END$$
+
+DELIMITER ;
+```
+
+### Stored Procedure - DeleteUsersLogInBatches

```sql
DELIMITER $$
@@ -105,6 +135,112 @@ END$$
DELIMITER ;
```

+### Stored Procedure with Progress and Total Rows Deleted - DeleteOldSessionsInBatches
+
+```sql
+DELIMITER $$
+
+CREATE PROCEDURE DeleteOldSessionsInBatches()
+BEGIN
+    DECLARE batch_size INT DEFAULT 10000; -- Number of rows to delete in each batch
+    DECLARE rows_deleted INT DEFAULT 0; -- Counter for rows deleted in each iteration
+    DECLARE total_deleted INT DEFAULT 0; -- Total rows deleted across all batches
+
+    -- Loop to delete data in batches
+    REPEAT
+        -- Delete a batch of rows
+        DELETE FROM entrancecorner.django_session
+        WHERE expire_date BETWEEN NOW() - INTERVAL 180 DAY AND NOW() - INTERVAL 165 DAY
+        LIMIT batch_size;
+
+        -- Get the number of rows deleted in this batch
+        SET rows_deleted = ROW_COUNT();
+
+        -- Update the total count of rows deleted
+        SET total_deleted = total_deleted + rows_deleted;
+
+        -- Output progress message
+        SELECT CONCAT('Deleted ', rows_deleted, ' rows in this batch. Total so far: ', total_deleted) AS Progress;
+
+    UNTIL rows_deleted = 0 -- Exit when no more rows match the criteria
+    END REPEAT;
+
+    -- Final message with total rows deleted
+    SELECT CONCAT('Deletion process completed. 
Total rows deleted: ', total_deleted) AS FinalMessage; +END$$ + +DELIMITER ; +``` + +### Stored Procedure - DeleteContentRevisionsEfficiently + +```sql +DELIMITER $$ + +CREATE PROCEDURE DeleteContentRevisionsEfficiently() +BEGIN + DECLARE current_model_name VARCHAR(255); -- Placeholder for the current model_name + DECLARE finished INT DEFAULT 0; -- Loop termination flag + DECLARE rows_deleted INT DEFAULT 0; -- Counter for rows deleted + DECLARE total_deleted INT DEFAULT 0; -- Total rows deleted + + -- Cursor to iterate over distinct model_name values + DECLARE model_cursor CURSOR FOR + SELECT DISTINCT model_name FROM content_revisions; + + -- Handler for the end of the cursor + DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished = 1; + + -- Open the cursor + OPEN model_cursor; + + -- Loop through each model_name + fetch_loop: LOOP + FETCH model_cursor INTO current_model_name; + + -- Exit loop if no more data + IF finished = 1 THEN + LEAVE fetch_loop; + END IF; + + -- Print progress: start processing current model_name + SELECT CONCAT('Processing model_name: ', current_model_name) AS ProgressMessage; + + -- Delete rows for the current model_name with ranking logic + DELETE cr + FROM content_revisions cr + JOIN ( + SELECT id + FROM ( + SELECT id, + ROW_NUMBER() OVER (PARTITION BY model_id ORDER BY created DESC, revision_no DESC) AS rn + FROM content_revisions + WHERE model_name = current_model_name + ) ranked_revisions + WHERE rn > 5 + ) to_delete + ON cr.id = to_delete.id; + + -- Get the number of rows deleted for the current group + SET rows_deleted = ROW_COUNT(); + + -- Update the total count + SET total_deleted = total_deleted + rows_deleted; + + -- Print progress: rows deleted for the current model_name + SELECT CONCAT('Deleted ', rows_deleted, ' rows for model_name: ', current_model_name, '. Total deleted so far: ', total_deleted) AS ProgressMessage; + END LOOP; + + -- Close the cursor + CLOSE model_cursor; + + -- Final message + SELECT CONCAT('Deletion process completed. Total rows deleted: ', total_deleted) AS FinalMessage; +END$$ + +DELIMITER ; +``` + ### Calling a Stored Procedure ```sql @@ -113,3 +249,34 @@ call DeleteUsersLogInBatches(); -- drop stored procedure drop procedure DeleteUsersLogInBatches; ``` + +### Stored Procedure with Progress Output + +```sql +DELIMITER $$ + +CREATE PROCEDURE DeleteUsersLogInBatches() +BEGIN + DECLARE batch_size INT DEFAULT 1000000; -- Number of rows to delete in each batch + DECLARE start_id INT DEFAULT 0; -- Starting ID for the first batch + DECLARE end_id INT DEFAULT 14900000; -- Target maximum ID for deletion + + -- Loop to delete data in batches + WHILE start_id < end_id DO + -- Delete rows in the current batch + DELETE FROM users_log + WHERE id BETWEEN start_id AND start_id + batch_size - 1; + + -- Output progress message + SELECT CONCAT('Deleted rows with IDs from ', start_id, ' to ', start_id + batch_size - 1) AS Progress; + + -- Update the start_id for the next batch + SET start_id = start_id + batch_size; + END WHILE; + + -- Final message + SELECT 'Deletion process completed.' 
AS FinalMessage;
+END$$
+
+DELIMITER ;
+```
\ No newline at end of file
diff --git a/docs/technologies/elasticsearch/analysis-and-analyzers.md b/docs/technologies/elasticsearch/analysis-and-analyzers.md
index 825daf90513..118d5a45b3b 100755
--- a/docs/technologies/elasticsearch/analysis-and-analyzers.md
+++ b/docs/technologies/elasticsearch/analysis-and-analyzers.md
@@ -1,51 +1,51 @@
# Analysis and Analyzers

-Analysisis aprocess that consists of the following:
+Analysis is a process that consists of the following:

-- First, tokenizing a block of text into individualtermssuitable for use in an inverted index,
+- First, tokenizing a block of text into individual terms suitable for use in an inverted index,
-- Then normalizing these terms into a standard form to improve their "searchability," orrecall
+- Then normalizing these terms into a standard form to improve their "searchability," or recall

-This job isperformed by analyzers. Ananalyzeris really just a wrapper that combines three functions into asingle package:
+This job is performed by analyzers. An analyzer is really just a wrapper that combines three functions into a single package:

## Character filters

-First, the string is passed through anycharacter filtersin turn. Their job is to tidy up the string before tokenization. A character filter could be used to strip out HTML, or to convert&characters to the wordand.
+First, the string is passed through any character filters in turn. Their job is to tidy up the string before tokenization. A character filter could be used to strip out HTML, or to convert & characters to the word and.

## Tokenizer

-Next, the string is tokenized into individual terms by atokenizer. A simple tokenizer might split the text into terms whenever it encounters whitespace or punctuation.
+Next, the string is tokenized into individual terms by a tokenizer. A simple tokenizer might split the text into terms whenever it encounters whitespace or punctuation.

## Token filters

-Last, each term is passed through anytoken filtersin turn, which can change terms (for example, lowercasingQuick), remove terms (for example, stopwords such asa, and, the) or add terms (for example, synonyms likejumpandleap).
+Last, each term is passed through any token filters in turn, which can change terms (for example, lowercasing Quick), remove terms (for example, stop words such as a, and, the) or add terms (for example, synonyms like jump and leap).

## Standard analyzer

-The standard analyzeris the default analyzer that Elasticsearch uses. It is the best general choice for analyzing text that may be in any language. It splits the text onword boundaries, asdefined by the [Unicode Consortium](http://www.unicode.org/reports/tr29/), and removes most punctuation. Finally, it lowercases all terms. It would produce
+The standard analyzer is the default analyzer that Elasticsearch uses. It is the best general choice for analyzing text that may be in any language. It splits the text on word boundaries, as defined by the [Unicode Consortium](http://www.unicode.org/reports/tr29/), and removes most punctuation. Finally, it lowercases all terms. It would produce

-set, the, shape, to, semi, transparent, by, calling, set_trans, 5
+`set, the, shape, to, semi, transparent, by, calling, set_trans, 5`

## Simple analyzer

-The simple analyzer splitsthe text on anything that isn't a letter, and lowercases the terms. It would produce
+The simple analyzer splits the text on anything that isn't a letter, and lowercases the terms. It would produce

-set, the, shape, to, semi, transparent, by, calling, set, trans
+`set, the, shape, to, semi, transparent, by, calling, set, trans`

## Whitespace analyzer

-The whitespace analyzer splitsthe text on whitespace. It doesn't lowercase. It would produce
+The whitespace analyzer splits the text on whitespace. It doesn't lowercase. It would produce

-Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
+`Set, the, shape, to, semi-transparent, by, calling, set_trans(5)`

## Language analyzers

-Language-specific analyzersare available for [many languages](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/analysis-lang-analyzer.html). They are able to take the peculiarities of the specified language into account. For instance, theenglishanalyzer comes with a set of Englishstopwords (common words likeandorthethat don't have much impact on relevance), which it removes. This analyzer also is able tostemEnglishwords because it understands the rules of English grammar.
+Language-specific analyzers are available for [many languages](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/analysis-lang-analyzer.html). They are able to take the peculiarities of the specified language into account. For instance, the english analyzer comes with a set of English stop words (common words like and or the that don't have much impact on relevance), which it removes. This analyzer also is able to stem English words because it understands the rules of English grammar.

-Theenglishanalyzer would produce the following:
+The english analyzer would produce the following:

-set, shape, semi, transpar, call, set_tran, 5
+`set, shape, semi, transpar, call, set_tran, 5`

-Note howtransparent, calling, andset_transhave been stemmed to their root form.
+Note how transparent, calling, and set_trans have been stemmed to their root form.
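+
+To try these analyzers yourself, the `_analyze` API returns the tokens an analyzer emits for a given text. A minimal sketch using Python's `requests` library, assuming a local node on `localhost:9200`:
+
+```python
+import requests
+
+# Ask Elasticsearch to run the standard analyzer over a sample sentence.
+resp = requests.post(
+    "http://localhost:9200/_analyze",
+    json={
+        "analyzer": "standard",
+        "text": "Set the shape to semi-transparent by calling set_trans(5)",
+    },
+)
+
+# Print just the emitted tokens; swap in "simple", "whitespace" or "english"
+# above to compare against the outputs listed in the sections above.
+print([t["token"] for t in resp.json()["tokens"]])
+# ['set', 'the', 'shape', 'to', 'semi', 'transparent', 'by', 'calling', 'set_trans', '5']
+```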

https://www.elastic.co/guide/en/elasticsearch/guide/current/analysis-intro.html
diff --git a/docs/technologies/elasticsearch/architecture.md b/docs/technologies/elasticsearch/architecture.md
index 8f3c7ad8f87..11f8b16acec 100755
--- a/docs/technologies/elasticsearch/architecture.md
+++ b/docs/technologies/elasticsearch/architecture.md
@@ -2,11 +2,11 @@

## Cluster and Node

-Anodeis a running instance ofElasticsearch, while aclusterconsists of one or more nodes with the samecluster.namethat are working together to share their data and workload. As nodes are added to or removed from the cluster, the cluster reorganizes itself to spread the data evenly.
+A node is a running instance of Elasticsearch, while a cluster consists of one or more nodes with the same cluster.name that are working together to share their data and workload. As nodes are added to or removed from the cluster, the cluster reorganizes itself to spread the data evenly.

-One node in the cluster is elected to be themasternode, whichis in charge of managing cluster-wide changes like creating or deleting an index, or adding or removing a node from the cluster. The master node does not need to be involved in document-level changes or searches, which means that having just one master node will not become a bottleneck as traffic grows. Any node can become the master. Our example cluster has only one node, so it performs the master role.
+One node in the cluster is elected to be the master node, which is in charge of managing cluster-wide changes like creating or deleting an index, or adding or removing a node from the cluster. The master node does not need to be involved in document-level changes or searches, which means that having just one master node will not become a bottleneck as traffic grows. Any node can become the master. Our example cluster has only one node, so it performs the master role.

-As users, we can talk toany node in the cluster, including the master node. 
Every node knows where each document lives and can forward our request directly to the nodes that hold the data we are interested in. Whichever node we talk to manages the process of gathering the response from the node or nodes holding the data and returning the final response to the client. It is all managed transparently by Elasticsearch. +As users, we can talk to any node in the cluster, including the master node. Every node knows where each document lives and can forward our request directly to the nodes that hold the data we are interested in. Whichever node we talk to manages the process of gathering the response from the node or nodes holding the data and returning the final response to the client. It is all managed transparently by Elasticsearch. ## Index @@ -36,10 +36,7 @@ Often, we use the terms object and document interchangeably. However, there is a ## Document Metadata -`_index` - Where the document lives - -`_type` - The class of object that the document represents - -`_id` - The unique identifier for the document - -`_version` - Every time a change is made to a document (including deleting it), the `_version` number is incremented. +- `_index` - Where the document lives +- `_type` - The class of object that the document represents +- `_id` - The unique identifier for the document +- `_version` - Every time a change is made to a document (including deleting it), the `_version` number is incremented. diff --git a/docs/technologies/elasticsearch/elastalert.md b/docs/technologies/elasticsearch/elastalert.md index b14b1d67fe2..daaf7470599 100755 --- a/docs/technologies/elasticsearch/elastalert.md +++ b/docs/technologies/elasticsearch/elastalert.md @@ -33,18 +33,14 @@ In addition to this basic usage, there are many other features that make alerts ## Types 1. spike - 2. frequency - 3. flatline - 4. new_term - 5. change ## Common Configuration Example -1. Required settings +### 1. Required settings - es_host - es_port @@ -53,7 +49,7 @@ In addition to this basic usage, there are many other features that make alerts - type - alert -2. Optional settings +### 2. Optional settings - import - use_ssl diff --git a/docs/technologies/elasticsearch/elasticsearch-the-definitive-guide.md b/docs/technologies/elasticsearch/elasticsearch-the-definitive-guide.md index 80f66372a4e..59e13cffa62 100755 --- a/docs/technologies/elasticsearch/elasticsearch-the-definitive-guide.md +++ b/docs/technologies/elasticsearch/elasticsearch-the-definitive-guide.md @@ -30,7 +30,7 @@ A distributed real-time document store whereevery fieldis indexed and searchable - Support for more than one index. - Index level configuration (number of shards, index storage, ...). - Various set of APIs - - HTTPRESTfulAPI + - HTTP RESTful API - Native JavaAPI. - All APIs perform automatic node operation rerouting. - Document oriented @@ -63,110 +63,71 @@ A distributed real-time document store whereevery fieldis indexed and searchable ## Contents -1. **Getting Started** +### 1. Getting Started - You Know, for Search - - Life Inside a Cluster - - Data In, Data Out - - Distributed Document Store - - Searching - The Basic Tools - - Mapping and Analysis - - Full-Body Search - - Sorting and Relevance - - Distributed Search Execution - - Index Management - - Inside a Shard -2. **Search in Depth** +### 2. Search in Depth - Structured Search - - Full-Text Search - - Multifield Search - - Proximity Matching - - Partial Matching - - Controlling Relevance -3. **Dealing with Human Language** +### 3. 
Dealing with Human Language

- Getting Started with Languages
-
-- Indentifying Words
+- Identifying Words
-
- Normalizing Tokens
-
- Reducing Words to Their Root Form
-
- Stopwords: Performance Versus Precision
-
- Synonyms
-
- Typoes and Mispelings

-4. **Aggregations**
+### 4. Aggregations

- High-Level Concepts
-
- Aggregation Test-Drive
-
- Building Bar Charts
-
- Looking at Time
-
- Scoping Aggregations
-
- Sorting Queries and Aggregations
-
- Sorting Multivalue Buckets
-
- Approximate Aggregations
-
- Significant Terms
-
- Doc Values and Fielddata
-
- Closing Thoughts

-5. **Geolocation**
+### 5. Geolocation

- Geo Points
-
- Geohashes
-
- Geo Aggregations
-
-- Goe Shapes
+- Geo Shapes

-6. **Modeling Your Data**
+### 6. Modeling Your Data

- Handling Relationships
-
- Nested Objects
-
- Parent-Child Relationship
-
- Designing for Scale

-7. **Administration, Monitoring, and Deployment**
+### 7. Administration, Monitoring, and Deployment

- Monitoring
-
- Production Deployment
-
- Post-Deployment

## References
diff --git a/docs/technologies/elasticsearch/elk-efk-stack-elastic-stack-elasticsearch.md b/docs/technologies/elasticsearch/elk-efk-stack-elastic-stack-elasticsearch.md
index 38802b14c44..d241bee5286 100755
--- a/docs/technologies/elasticsearch/elk-efk-stack-elastic-stack-elasticsearch.md
+++ b/docs/technologies/elasticsearch/elk-efk-stack-elastic-stack-elasticsearch.md
@@ -6,23 +6,17 @@

## Elasticsearch

-Elasticsearch is a search and analytics engine.
-
-Elasticsearch is a NoSQL database that is based on the Lucene search engine.
-
-Elasticsearch uses Apache Lucene to index documents for fast searching.
+- Elasticsearch is a search and analytics engine.
+- Elasticsearch is a NoSQL database that is based on the Lucene search engine.
+- Elasticsearch uses Apache Lucene to index documents for fast searching.

## Solr, ElasticSearch

-Search platform
-
-Highly available
-
-Very scalable
-
-Fault tolerant search platform
-
-Provides full-text search
+- Search platform
+- Highly available
+- Very scalable
+- Fault tolerant search platform
+- Provides full-text search

## Logstash
diff --git a/docs/technologies/elasticsearch/full-text-searches.md b/docs/technologies/elasticsearch/full-text-searches.md
index 36c33719bb1..59130e1f868 100755
--- a/docs/technologies/elasticsearch/full-text-searches.md
+++ b/docs/technologies/elasticsearch/full-text-searches.md
@@ -18,7 +18,7 @@ Typically, an entity will correspond to a table in your database, and the attrib

## Faceted Search

-Faceted searchis a technique that involves augmenting traditional search techniques with a faceted navigation system, allowing users to narrow down search results by applying multiple filters based on [faceted classification](https://en.wikipedia.org/wiki/Faceted_classification) of the items.A faceted classification system classifies each information element along multiple explicit dimensions, called facets, enabling the classifications to be accessed and ordered in multiple ways rather than in a single, pre-determined, [taxonomic](https://en.wikipedia.org/wiki/Taxonomy_(general)) order. 
+Faceted search is a technique that involves augmenting traditional search techniques with a faceted navigation system, allowing users to narrow down search results by applying multiple filters based on [faceted classification](https://en.wikipedia.org/wiki/Faceted_classification) of the items. A faceted classification system classifies each information element along multiple explicit dimensions, called facets, enabling the classifications to be accessed and ordered in multiple ways rather than in a single, pre-determined, [taxonomic](https://en.wikipedia.org/wiki/Taxonomy_(general)) order.

Facets correspond to properties of the information elements. They are often derived by analysis of the text of an item using [entity extraction](https://en.wikipedia.org/wiki/Entity_extraction) techniques or from pre-existing fields in a database such as author, descriptor, language, and format. Thus, existing web-pages, product descriptions or online collections of articles can be augmented with navigational facets.
diff --git a/docs/technologies/elasticsearch/getting-started.md b/docs/technologies/elasticsearch/getting-started.md
index 98e11889bd8..f6e80191ca4 100755
--- a/docs/technologies/elasticsearch/getting-started.md
+++ b/docs/technologies/elasticsearch/getting-started.md
@@ -49,23 +49,23 @@ The only difference is that theupdateAPI achieves this through a single client r

## Index

-Index (noun)
+### Index (noun)

-As explained previously, an *index* is like a *database* in a traditional relational database. It is the place to store related documents. The plural of *index*is *indices*or *indexes*.
+As explained previously, an *index* is like a *database* in a traditional relational database. It is the place to store related documents. The plural of *index* is *indices* or *indexes*.

-Index (verb)
+### Index (verb)

-*To index a document*is to store a document in an*index (noun)*so that it can be retrieved and queried. It is much like theINSERTkeyword in SQL except that, if the document already exists, the new document would replace the old.
+*To index a document* is to store a document in an *index (noun)* so that it can be retrieved and queried. It is much like the `INSERT` keyword in SQL except that, if the document already exists, the new document would replace the old.

-Inverted index
+### Inverted index

-Relational databases add an *index*, such as a B-tree index, to specific columns in order to improve the speed of data retrieval. Elasticsearch and Lucene use a structure calledan*inverted index*for exactly the same purpose.
+Relational databases add an *index*, such as a B-tree index, to specific columns in order to improve the speed of data retrieval. Elasticsearch and Lucene use a structure called an *inverted index* for exactly the same purpose.

**By default, every field in a document is indexed (has an inverted index) and thus is searchable. A field without an inverted index is not searchable**

## Query

-Elasticsearch provides a rich, flexible, query language called the*query DSL*, whichallows us to build much more complicated, robust queries.
+Elasticsearch provides a rich, flexible, query language called the *query DSL*, which allows us to build much more complicated, robust queries.

The *domain-specific language* (DSL) is specified using a JSON request body. 
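+
+As a sketch of what such a request body looks like (Python `requests` against an assumed local node, with an illustrative `books` index and field names), combining the two kinds of clauses described in the next sections:
+
+```python
+import requests
+
+# A bool compound clause wrapping a match leaf clause and a range filter.
+query = {
+    "query": {
+        "bool": {
+            "must": [{"match": {"title": "elasticsearch"}}],
+            "filter": [{"range": {"year": {"gte": 2014}}}],
+        }
+    }
+}
+
+resp = requests.post("http://localhost:9200/books/_search", json=query)
+for hit in resp.json()["hits"]["hits"]:
+    print(hit["_score"], hit["_source"])
+```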
@@ -77,7 +77,7 @@ Leaf query clauses look for a particular value in a particular field, such as th ## Compound query clauses -Compound query clauses wrap other leaforcompound queries and are used to combine multiple queries in a logical fashion (such as the [bool](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html) or [dis_max](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html) query), or to alter their behaviour (such as the [constant_score](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-constant-score-query.html) query). +Compound query clauses wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion (such as the [bool](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html) or [dis_max](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-dis-max-query.html) query), or to alter their behaviour (such as the [constant_score](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-constant-score-query.html) query). ## Relevance Score @@ -111,6 +111,6 @@ A *search* can be any of the following: Data in Elasticsearch can be broadly divided into two types: exact values and full text. -*Exact values*are exactly what they sound like.Examples are a date or a user ID, but can also include exact strings such as a username or an email address. The exact valueFoois not the same as the exact valuefoo. The exact value2014is not the same as the exact value2014-09-15. +*Exact values* are exactly what they sound like.Examples are a date or a user ID, but can also include exact strings such as a username or an email address. The exact value Foo is not the same as the exact value foo. The exact value 2014 is not the same as the exact value 2014-09-15. -*Full text*, on the other hand, refersto textual data - usually written in some human language --- like the text of a tweet or the body of an email. +*Full text*, on the other hand, refers to textual data - usually written in some human language --- like the text of a tweet or the body of an email. diff --git a/docs/technologies/elasticsearch/information-retrieval.md b/docs/technologies/elasticsearch/information-retrieval.md index b53b2be521f..31b4b3cc6e9 100755 --- a/docs/technologies/elasticsearch/information-retrieval.md +++ b/docs/technologies/elasticsearch/information-retrieval.md @@ -6,9 +6,9 @@ Talks about "How Google Search indexes pages" ## tf-idf (term frequency - inverse document frequency) -In [information retrieval](https://en.wikipedia.org/wiki/Information_retrieval), tf--idforTFIDF, short forterm frequency--inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a [document](https://en.wikipedia.org/wiki/Document) in a collection or [corpus](https://en.wikipedia.org/wiki/Text_corpus). It is often used as a [weighting factor](https://en.wikipedia.org/wiki/Weighting_factor) in searches of information retrieval, [text mining](https://en.wikipedia.org/wiki/Text_mining), and [user modeling](https://en.wikipedia.org/wiki/User_modeling). The tf--idf value increases [proportionally](https://en.wikipedia.org/wiki/Proportionality_(mathematics)) to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general. 
Tf--idf is one of the most popular term-weighting schemes today; 83% of text-based recommender systems in digital libraries use tf--idf.
+In [information retrieval](https://en.wikipedia.org/wiki/Information_retrieval), tf-idf or TFIDF, short for term frequency - inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a [document](https://en.wikipedia.org/wiki/Document) in a collection or [corpus](https://en.wikipedia.org/wiki/Text_corpus). It is often used as a [weighting factor](https://en.wikipedia.org/wiki/Weighting_factor) in searches of information retrieval, [text mining](https://en.wikipedia.org/wiki/Text_mining), and [user modeling](https://en.wikipedia.org/wiki/User_modeling). The tf-idf value increases [proportionally](https://en.wikipedia.org/wiki/Proportionality_(mathematics)) to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general. Tf-idf is one of the most popular term-weighting schemes today; 83% of text-based recommender systems in digital libraries use tf-idf.

-Variations of the tf--idf weighting scheme are often used by [search engines](https://en.wikipedia.org/wiki/Search_engine) as a central tool in scoring and ranking a document's [relevance](https://en.wikipedia.org/wiki/Relevance_(information_retrieval)) given a user [query](https://en.wikipedia.org/wiki/Information_retrieval). tf--idf can be successfully used for [stop-words](https://en.wikipedia.org/wiki/Stop-words) filtering in various subject fields, including [text summarization](https://en.wikipedia.org/wiki/Automatic_summarization) and classification.
+Variations of the tf-idf weighting scheme are often used by [search engines](https://en.wikipedia.org/wiki/Search_engine) as a central tool in scoring and ranking a document's [relevance](https://en.wikipedia.org/wiki/Relevance_(information_retrieval)) given a user [query](https://en.wikipedia.org/wiki/Information_retrieval). tf-idf can be successfully used for [stop-words](https://en.wikipedia.org/wiki/Stop-words) filtering in various subject fields, including [text summarization](https://en.wikipedia.org/wiki/Automatic_summarization) and classification.

One of the simplest [ranking functions](https://en.wikipedia.org/wiki/Ranking_function) is computed by summing the tf--idf for each query term; many more sophisticated ranking functions are variants of this simple model.
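+
+A toy computation of one common tf-idf variant (raw term frequency times log inverse document frequency; real implementations differ in smoothing details):
+
+```python
+import math
+
+docs = [
+    "the quick brown fox",
+    "the lazy dog",
+    "the quick dog jumps",
+]
+
+def tf_idf(term, doc):
+    words = doc.split()
+    tf = words.count(term) / len(words)        # term frequency in this document
+    df = sum(term in d.split() for d in docs)  # number of documents containing the term
+    idf = math.log(len(docs) / df)             # inverse document frequency
+    return tf * idf
+
+# "the" occurs in every document, so idf = log(3/3) = 0 and its weight vanishes,
+# while a rarer term like "fox" gets a positive weight.
+print(tf_idf("the", docs[0]))  # 0.0
+print(tf_idf("fox", docs[0]))  # 0.25 * log(3) ~= 0.27
+```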

@@ -64,10 +64,9 @@ BM25 and its newer variants, e.g. BM25F (a version of BM25 that can take documen

- Dimensionality reduction
- Vector similarity scoring

-Solr
+### Solr

1. Streaming expressions
-
2. Vectors fields/functions in solr

![image](../../media/Technologies-Elasticsearch-Information-Retrieval-image6.jpg)
@@ -86,15 +85,15 @@

## Neural Search

-The core idea of neural search is to leverage state-of-the-art deep neural networks to buildeverycomponent of a search system. In short, neural search is deep neural network-powered information retrieval.In academia, it's often calledneural IR.
+The core idea of neural search is to leverage state-of-the-art deep neural networks to build every component of a search system. In short, neural search is deep neural network-powered information retrieval. In academia, it's often called neural IR.

-## What can it do?
+### What can it do?

Thanks to recent advances in deep neural networks, a neural search system can go way beyond simple text search. It enables advanced intelligence on all kinds of unstructured data, such as images, audio, video, PDF, 3D mesh, you name it.

For example, retrieving animation according to some beats; finding the best-fit memes according to some jokes; scanning a table with your iPhone's LiDAR camera and finding similar furniture at IKEA. Neural search systems enable what traditional search can't: multi/cross-modal data retrieval.

-## Think outside the (search)box
+### Think outside the (search)box

Many neural search-powered applications do not have a search box:

diff --git a/docs/technologies/elasticsearch/internal-working.md b/docs/technologies/elasticsearch/internal-working.md
index 65a38de593a..4987074e867 100755
--- a/docs/technologies/elasticsearch/internal-working.md
+++ b/docs/technologies/elasticsearch/internal-working.md
@@ -2,10 +2,8 @@

Here is the sequence of steps necessary to successfully create, index, or delete a document on both the primary and any replica shards:

-1. The client sends a create, index, or delete request toNode 1.
-
+1. The client sends a create, index, or delete request to Node 1.
-2. The node uses the document's `_id` to determine that the document belongs to shard0. It forwards the request to Node 3, where the primary copy of shard0is currently allocated.
+2. The node uses the document's `_id` to determine that the document belongs to shard0. It forwards the request to Node 3, where the primary copy of shard0 is currently allocated.
-
-3. Node 3 executes the request on the primary shard. If it is successful, it forwards the request in parallel to the replica shards onNode 1 and Node 2. Once all of the replica shards report success, Node 3 reports success to the coordinating node, which reports success to the client.
+3. Node 3 executes the request on the primary shard. If it is successful, it forwards the request in parallel to the replica shards on Node 1 and Node 2. Once all of the replica shards report success, Node 3 reports success to the coordinating node, which reports success to the client.

By the time the client receives a successful response, the document change has been executed on the primary shard and on all replica shards. Your change is safe.

@@ -14,39 +12,36 @@ By the time the client receives a successful response, the document change has b

Here is the sequence of steps to retrieve a document from either a primary or replica shard:

-1. The client sends a get request toNode 1.
-
-2. The node uses the document's_idto determine that the document belongs to shard0. Copies of shard0exist on all three nodes. On this occasion, it forwards the request toNode 2.
-
-3. Node 2returns the document toNode 1, which returns the document to the client.
+1. The client sends a get request to Node 1.
+2. The node uses the document's `_id` to determine that the document belongs to shard0. Copies of shard0 exist on all three nodes. On this occasion, it forwards the request to Node 2.
+3. Node 2 returns the document to Node 1, which returns the document to the client.

![image](../../media/Technologies-Elasticsearch-Internal-Working-image2.jpg)

Here is the sequence of steps used to perform a partial update on a document:

-1. The client sends an update request toNode 1.
+1. The client sends an update request to Node 1.

-2. It forwards the request toNode 3, where the primary shard is allocated.
+2. It forwards the request to Node 3, where the primary shard is allocated.

-3. Node 3retrieves the document from the primary shard, changes the JSON in the_sourcefield, and tries to reindex the document on the primary shard. If the document has already been changed by another process, it retries step 3 up toretry_on_conflicttimes, before giving up.
+3. Node 3 retrieves the document from the primary shard, changes the JSON in the `_source` field, and tries to reindex the document on the primary shard. 
If the document has already been changed by another process, it retries step 3 up to retry_on_conflict times, before giving up.

-4. IfNode 3has managed to update the document successfully, it forwards the new version of the document in parallel to the replica shards onNode 1andNode 2to be reindexed. Once all replica shards report success, Node 3reports success to the coordinating node, which reports success to the client.
+4. If Node 3 has managed to update the document successfully, it forwards the new version of the document in parallel to the replica shards on Node 1 and Node 2 to be reindexed. Once all replica shards report success, Node 3 reports success to the coordinating node, which reports success to the client.

![image](../../media/Technologies-Elasticsearch-Internal-Working-image3.jpg)

-Here is the sequence of steps necessary to retrieve multiple documents with a singlemgetrequest:
+Here is the sequence of steps necessary to retrieve multiple documents with a single `mget` request:

-1. The client sends anmgetrequest toNode 1.
-
-2. Node 1builds a multi-get request per shard, and forwards these requests in parallel to the nodes hosting each required primary or replica shard. Once all replies have been received, Node 1builds the response and returns it to the client.
+1. The client sends an `mget` request to Node 1.
+2. Node 1 builds a multi-get request per shard, and forwards these requests in parallel to the nodes hosting each required primary or replica shard. Once all replies have been received, Node 1 builds the response and returns it to the client.

![image](../../media/Technologies-Elasticsearch-Internal-Working-image4.jpg)

-The sequence of stepsfollowed by thebulkAPI are as follows:
+The sequence of steps followed by the bulk API is as follows:

-1. The client sends abulkrequest toNode 1.
+1. The client sends a bulk request to Node 1.

-2. Node 1builds a bulk request per shard, and forwards these requests in parallel to the nodes hosting each involved primary shard.
+2. Node 1 builds a bulk request per shard, and forwards these requests in parallel to the nodes hosting each involved primary shard.

3. The primary shard executes each action serially, one after another. As each action succeeds, the primary forwards the new document (or deletion) to its replica shards in parallel, and then moves on to the next action. Once all replica shards report success for all actions, the node reports success to the coordinating node, which collates the responses and returns them to the client.

diff --git a/docs/technologies/elasticsearch/others.md b/docs/technologies/elasticsearch/others.md
index 792e2bec9f5..2960f5c1b6d 100755
--- a/docs/technologies/elasticsearch/others.md
+++ b/docs/technologies/elasticsearch/others.md
@@ -20,21 +20,21 @@

https://github.com/fluent/fluent-bit

Fluentd is an open source data collector for unified logging layer.

-- **Unified Logging with JSON**
+### Unified Logging with JSON

-Fluentd tries to structure data as JSON as much as possible: this allows Fluentd tounifyall facets of processing log data: collecting, filtering, buffering, and outputting logs acrossmultiple sources and destinations([Unified Logging Layer](http://www.fluentd.org/blog/unified-logging-layer)). The downstream data processing is much easier with JSON, since it has enough structure to be accessible while retaining flexible schemas. 
+Fluentd tries to structure data as JSON as much as possible: this allows Fluentd to unify all facets of processing log data: collecting, filtering, buffering, and outputting logs across multiple sources and destinations ([Unified Logging Layer](http://www.fluentd.org/blog/unified-logging-layer)). The downstream data processing is much easier with JSON, since it has enough structure to be accessible while retaining flexible schemas.

-- **Pluggable Architecture**
+### Pluggable Architecture

Fluentd has a flexible plugin system that allows the community to extend its functionality. Our 500+ community-contributed plugins connect dozens of [data sources](https://www.fluentd.org/datasources) and [data outputs](https://www.fluentd.org/dataoutputs). By leveraging the plugins, you can start making better use of your logs right away.

-- **Minimum Resources Required**
+### Minimum Resources Required

Fluentd is written in a combination of C language and Ruby, and requires very little system resource. The vanilla instance runs on 30-40MB of memory and can process 13,000 events/second/core. If you have tighter memory requirements (-450kb), check out [Fluent Bit](http://fluentbit.io/), the lightweight forwarder for Fluentd.

-- **Built-in Reliability**
+### Built-in Reliability

-Fluentd supports memory- and file-based buffering to prevent inter-node data loss. Fluentd also supports robust failover and can be set up for high availability.[2,000+ data-driven companies](https://www.fluentd.org/testimonials) rely on Fluentd to differentiate their products and services through a better use and understanding of their log data.
+Fluentd supports memory- and file-based buffering to prevent inter-node data loss. Fluentd also supports robust failover and can be set up for high availability. [2,000+ data-driven companies](https://www.fluentd.org/testimonials) rely on Fluentd to differentiate their products and services through a better use and understanding of their log data.

![image](../../media/Technologies-Elasticsearch-Others-image1.jpg)

@@ -84,6 +84,11 @@ https://www.elastic.co/guide/en/apm/agent/python/current/flask-support.html

https://toptechtips.github.io/2019-07-08-add_python_code_to_apm

-## Opensearch
+## OpenSearch

https://github.com/opensearch-project/OpenSearch
+
+### Elasticsearch vs Amazon OpenSearch
+
+- [Amazon OpenSearch vs. Elasticsearch | Elastic](https://www.elastic.co/amazon-opensearch-service)
+- [Elasticsearch vs. OpenSearch: Performance and resource utilization analysis | Elastic Blog](https://www.elastic.co/blog/elasticsearch-opensearch-performance-gap)