Data Structure | Description | Complexity (Common Operations) | Best Used For | Not Ideal For | Derived From / Notes |
---|---|---|---|---|---|
Array (Basic) | A contiguous block of memory holding elements accessible by index. | Access: O(1), Append: O(1) amortized, Insert/Remove (front/mid): O(n), Search (unsorted): O(n) | Random access, static collections, small lookups | Frequent insertions/deletions in the middle | Primitive, forms basis of many structures |
Linked List (Basic) | A linear collection of nodes where each node points to the next (singly) or to both next and previous (doubly). | Insert/Remove at head: O(1) (and at tail with a tail pointer), Search: O(n), Random Access: O(n) | Insertions/removals at the ends, implementing stacks/queues | Scenarios requiring fast random access | Built from nodes and pointers/references |
Stack (Basic) | LIFO structure typically backed by an array or linked list. | Push/Pop: O(1), Peek: O(1) | Function call tracking, undo operations, parsing | Situations needing random access or non-LIFO traversal | Often implemented using arrays or linked lists |
Queue (Basic) | FIFO structure typically backed by an array or linked list. | Enqueue/Dequeue: O(1), Peek: O(1) | Breadth-first search, scheduling tasks | Random access or stack-like (LIFO) behaviors | Often implemented using arrays or linked lists |
Hash Table / Hash Map (Intermediate) | Stores key-value pairs with O(1) average lookup/insertion via hashing. | Insert/Lookup/Remove: O(1) average, O(n) worst case | Fast lookups by key, caching, dictionaries | Maintaining sorted order, range queries | Built on top of arrays + hashing |
Binary Search Tree (BST) (Intermediate) | A tree where each node’s left child < node < right child, often balanced to achieve good complexity. | Insert/Lookup/Remove: O(log n) average if balanced, O(n) worst | Ordered data, range queries, maintaining sorted structure | Worst-case skewed trees without balancing | Derived from basic tree nodes, often balanced (AVL, Red-Black) |
Heap / Priority Queue (Intermediate) | A binary tree (often represented as an array) that maintains a heap property for quick min/max access. | Get min/max: O(1), Insert: O(log n), Remove top: O(log n) | Priority scheduling, shortest-path algorithms (Dijkstra) | Situations needing full sorting of all elements upfront | Typically a binary tree represented as an array |
Graph (Adjacency List) (Intermediate) | Stores nodes and their edges as lists of adjacent vertices. | Add vertex: O(1), Add edge: O(1), Lookup adjacency: O(deg(v)) | Sparse graphs, graph algorithms (DFS, BFS, shortest paths) | Dense graphs (use adjacency matrix) | Derived from arrays/lists, flexible representation |
Trie (Prefix Tree) (Advanced) | A tree-based structure for storing strings by character prefixes. | Insert/Lookup (m-length string): O(m), Space: potentially large | Autocomplete, prefix queries, fast lookup of words | Large memory usage if few words share prefixes | Specialized tree keyed by characters |
Balanced BST (AVL/Red-Black) (Advanced) | Self-balancing BST maintaining O(log n) operations. | Insert/Lookup/Remove: O(log n) | Maintaining sorted data with guaranteed O(log n) ops | Cases needing a very simple implementation or constant-time lookups | Complexity comes from rotations and balancing strategies |
B-Tree / B+ Tree (Advanced) | A tree optimized for disk-based or block-based storage, used in databases and filesystems. | Insert/Lookup/Remove: O(log n) | Database indexing, file systems | Small in-memory data sets | Generalization of BST, multi-way branching |
Segment Tree (Advanced) | A tree structure for efficient range queries and updates (e.g. sum, min, max over a range). | Query/Update: O(log n) | Range queries (sum, min, max) with updates | Scenarios without range queries or updates | Built atop array-based tree representation |
Fenwick Tree (BIT) (Advanced) | A structure for cumulative frequencies or prefix sums. | Update/Prefix query: O(log n) | Prefix sum queries, frequency counting | Complex operations beyond prefix sums | Derived from array, uses clever indexing |
Disjoint Set (Union-Find) (Advanced) | Keeps track of elements partitioned into disjoint sets for fast union/find. | Union/Find: O(α(n)) (inverse Ackermann function, effectively constant in practice) | Kruskal’s MST, clustering problems | Order or range queries; it only tracks connectivity/membership | Linked structure with path compression and union by rank |
Algorithm | Genre | Data Structures Commonly Used | Complexity (Typical) | Best Used For | Not Ideal For | Notes |
---|---|---|---|---|---|---|
Bubble Sort (Basic) | Sorting | Array | O(n²) | Very small datasets, educational purposes | Large datasets, performance-critical tasks | Simple but inefficient |
Insertion Sort (Basic) | Sorting | Array | O(n²), O(n) best case | Small or partially sorted data | Large random datasets | Stable, easy to implement |
Selection Sort (Basic) | Sorting | Array | O(n²) | Situations where minimal swaps are needed | Large datasets | Always O(n²), not stable |
Merge Sort (Intermediate) | Sorting | Array (often split arrays) | O(n log n) | General-purpose sorting | Memory-limited or strictly in-place requirements | Stable, divide-and-conquer |
Quick Sort (Intermediate) | Sorting | Array | Average O(n log n), worst O(n²) | General-purpose, often very fast in practice | Worst-case scenarios without good pivot choice | Not stable, often in-place |
Binary Search (Basic) | Searching | Sorted Array | O(log n) | Fast lookups in sorted arrays | Unsorted data | Requires sorted data |
Breadth-First Search (BFS) (Intermediate) | Graph Traversal | Queue, Graph (Adjacency List) | O(V + E) | Shortest paths in unweighted graphs, level-order traversal | Weighted graphs without unit weights | Explores neighbors first |
Depth-First Search (DFS) (Intermediate) | Graph Traversal | Stack (implicit recursion), Graph (Adjacency List) | O(V + E) | Traversing or searching graph/trees, topological sort | Finding shortest paths in weighted graphs | Explores depth before breadth |
Dijkstra’s Algorithm (Advanced) | Shortest Path | Priority Queue (Min-Heap), Graph | O((V+E) log V) | Single-source shortest paths on weighted graphs | Negative edge weights | Efficient with a min-heap (see the sketch after these tables) |
Bellman-Ford (Advanced) | Shortest Path | Array/Adjacency List | O(VE) | Graphs with negative edges (no negative cycles) | Very large or dense graphs due to O(VE) complexity | More flexible than Dijkstra, handles negatives |
Floyd-Warshall (Advanced) | Shortest Path | Matrix | O(V³) | All-pairs shortest paths, small dense graphs | Very large graphs | Easy to implement, high complexity |
Topological Sort (Intermediate) | Graph Ordering | Graph (Adjacency List), Queue (Kahn’s), Stack (DFS) | O(V + E) | Ordering tasks with dependencies (DAGs) | Graphs with cycles (not applicable) | Requires a DAG |
Knuth-Morris-Pratt (KMP) (Intermediate) | String Searching | Array (prefix function) | O(n + m) (n: text length, m: pattern length) | Finding a substring pattern efficiently | Very short patterns or trivial matches | Precomputes failure function for pattern |
Rabin-Karp (Intermediate) | String Searching | Rolling Hash (Array) | Average O(n + m), worst O(nm) | Multiple pattern searches, average fast | Worst case can degrade to O(nm) | Uses hashing to speed pattern matching |
Dynamic Programming (DP) (Advanced, Family of Techniques) | Optimization/Counting | Arrays/Tables, sometimes Trees/Graphs | Depends on subproblem count and transition cost | Optimal solutions to complex problems via subproblems | Problems without overlapping subproblems or optimal substructure | Approach rather than a single algorithm |
Greedy Algorithms (Family) | Optimization | Often arrays, priority queues, sorting | Problem-specific (varies) | Selecting local optimum at each step (e.g. Huffman coding, interval scheduling) | Problems requiring a global optimum that isn't achieved by local choices | Not one algorithm, but a strategy |
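Several rows in the algorithms table (Dijkstra, Bellman-Ford, Floyd-Warshall, KMP, and others) have no dedicated section below. As one hedged sketch bridging that gap, here is a minimal Dijkstra implementation using heapq as the min-priority queue, matching the table's O((V+E) log V) entry. The graph encoding (dict mapping vertex to a list of (neighbor, weight) pairs) and the function name are illustrative assumptions, not part of the original document.
import heapq
def dijkstra(graph, source):
    """Single-source shortest paths on a graph with non-negative weights.
    graph: dict mapping vertex -> list of (neighbor, weight) pairs (assumed format).
    Returns a dict of shortest distances from source."""
    dist = {source: 0}
    heap = [(0, source)]  # (distance, vertex) pairs ordered by distance
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # Stale entry; a shorter path to u was already settled
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
# Example: shortest distances from vertex 'A'
g = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2)], 'C': []}
print(dijkstra(g, 'A'))  # {'A': 0, 'B': 1, 'C': 3}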
Description: Arrays are fundamental data structures that store elements in contiguous memory locations. In Python, lists serve as dynamic arrays that automatically resize when needed. Key features:
- Backed by contiguous memory blocks for O(1) random access by index
- Dynamic resizing (typically doubles capacity when full) gives amortized O(1) append
- Cache-friendly due to memory locality
- Excellent for random access and iteration
- Supports both primitive types and objects
- Drawbacks include O(n) insertions/deletions in middle due to shifting elements
Common use cases:
- Fast random access to elements
- Iteration over sequential data
- Building blocks for other data structures (stacks, queues, heaps)
- Situations where memory locality matters for performance
- Small to medium collections with mostly reads and appends
# Create an array and demonstrate O(1) random access
arr = [1, 2, 3, 4, 5]
print(arr[3]) # O(1) random access -> prints 4
# Demonstrate O(1) amortized append at end
arr.append(6) # Fast append at end
print(arr) # [1, 2, 3, 4, 5, 6]
# Demonstrate O(n) insertion in middle requiring shifts
arr.insert(2, 10) # Insert at index 2, shifts elements right
print(arr) # [1, 2, 10, 3, 4, 5, 6]
# Demonstrate O(n) deletion requiring shifts
del arr[3] # Delete at index 3, shifts elements left
print(arr) # [1, 2, 10, 4, 5, 6]
# Demonstrate iteration and memory locality
total = 0
for x in arr:  # Fast iteration due to memory locality
    total += x
print(f"Sum: {total}")
# Demonstrate dynamic resizing
for i in range(10):
    arr.append(i)  # Array will resize automatically when needed
print(f"Length after growth: {len(arr)}")
Description: Implementation of a singly linked list demonstrating fundamental linked data structure concepts. Key features:
Data Structures Used:
- Node Class:
  - Contains value field for data storage
  - Contains next pointer for linking to subsequent node
  - Enables O(1) pointer manipulation for insertions/deletions
  - Forms building block for the linked structure
- LinkedList Class:
  - Maintains head pointer to first node
  - Tracks size for O(1) length queries
  - Provides interface for list operations
  - Manages node connections and traversal
Key advantages over arrays:
- O(1) insertions/deletions at known positions
- Dynamic size without resizing/reallocation
- No over-allocated spare capacity (though each node carries pointer overhead)
- Easy implementation of stacks/queues
Common applications:
- Implementation of stacks and queues
- Memory allocation systems
- Undo systems in software
- Music playlists
- Browser history
class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

class SinglyLinkedList:
    """
    A singly linked list implementation with basic operations.
    Each node contains a value and a pointer to the next node.
    Key features:
    - O(1) insertions at head (tail insertion is O(n) here, since no tail pointer is kept)
    - O(n) search and deletion
    - Dynamic size
    - No random access
    """
    def __init__(self):
        self.head = None
        self.size = 0

    def insert_at_head(self, val):
        """Insert a new node at the head in O(1) time"""
        new_node = Node(val)
        new_node.next = self.head
        self.head = new_node
        self.size += 1

    def insert_at_tail(self, val):
        """Insert a new node at the tail in O(n) time"""
        new_node = Node(val)
        if not self.head:
            self.head = new_node
        else:
            current = self.head
            while current.next:
                current = current.next
            current.next = new_node
        self.size += 1

    def delete(self, val):
        """Delete first occurrence of val in O(n) time"""
        if not self.head:
            return False
        if self.head.val == val:
            self.head = self.head.next
            self.size -= 1
            return True
        current = self.head
        while current.next:
            if current.next.val == val:
                current.next = current.next.next
                self.size -= 1
                return True
            current = current.next
        return False

    def search(self, val):
        """Search for val in O(n) time"""
        current = self.head
        while current:
            if current.val == val:
                return True
            current = current.next
        return False

    def get_size(self):
        """Return size of list in O(1) time"""
        return self.size
# Example usage
ll = SinglyLinkedList()
ll.insert_at_head(10) # List: 10
ll.insert_at_head(20) # List: 20->10
ll.insert_at_tail(30) # List: 20->10->30
print(ll.search(10)) # True
print(ll.search(40)) # False
print(ll.get_size()) # 3
ll.delete(10) # List: 20->30
print(ll.get_size()) # 2
Description: A Stack is a fundamental data structure that follows Last-In-First-Out (LIFO) ordering. Key features:
- Uses a Python list internally which provides O(1) amortized append/pop operations
- Only allows access to the top element (most recently added)
- Perfect for tracking function calls, undo operations, and parsing expressions
- Common operations:
  - push() - Add element to top
  - pop() - Remove and return top element
  - peek() - View top element without removing
- Memory efficient since elements are stored contiguously
- Can be implemented using arrays or linked lists
- Often used in:
  - Function call stacks
  - Expression evaluation
  - Backtracking algorithms
  - Browser history
  - Undo/Redo operations
The implementation below uses a Python list which automatically handles resizing.
class Stack:
    """
    A Stack data structure that implements LIFO (Last-In-First-Out) ordering.
    Attributes:
        items (list): Internal list to store stack elements
    Methods:
        push(item): Add item to top of stack in O(1) time
        pop(): Remove and return top item in O(1) time
        peek(): Return top item without removing it in O(1) time
        is_empty(): Check if stack is empty in O(1) time
        size(): Return number of items in stack in O(1) time
    """
    def __init__(self):
        self.items = []

    def push(self, item):
        """Add item to top of stack"""
        self.items.append(item)  # O(1)

    def pop(self):
        """Remove and return top item"""
        if self.is_empty():
            raise IndexError("Pop from empty stack")
        return self.items.pop()  # O(1)

    def peek(self):
        """Return top item without removing it"""
        if self.is_empty():
            raise IndexError("Peek at empty stack")
        return self.items[-1]  # O(1)

    def is_empty(self):
        """Check if stack is empty"""
        return len(self.items) == 0  # O(1)

    def size(self):
        """Return number of items in stack"""
        return len(self.items)  # O(1)
# Example usage
stack = Stack()
stack.push("function_call_1") # Simulating function call stack
stack.push("function_call_2")
print(stack.peek()) # "function_call_2"
print(stack.pop()) # "function_call_2"
print(stack.pop()) # "function_call_1"
print(stack.is_empty()) # True
Description: A Queue is a fundamental data structure that follows First-In-First-Out (FIFO) ordering. While it can be implemented using a regular list, Python's collections.deque (double-ended queue) provides a more efficient implementation because:
- Uses a doubly-linked list internally for O(1) operations at both ends
- Optimized for fast appends and pops from either end
- Better than list for queue operations since list.pop(0) is O(n)
- Thread-safe and memory efficient
- Maintains insertion order
- Ideal for breadth-first search, task scheduling, and producer-consumer patterns
- Common operations:
  - append() - Add to right end (enqueue)
  - popleft() - Remove from left end (dequeue)
  - appendleft() - Add to left end
  - pop() - Remove from right end
The implementation below uses collections.deque which provides all these benefits automatically.
from collections import deque
class Queue:
    """
    A Queue data structure that implements FIFO (First-In-First-Out) ordering.
    Attributes:
        items (collections.deque): Internal deque to store queue elements
    Methods:
        enqueue(item): Add item to end of queue in O(1) time
        dequeue(): Remove and return item from front of queue in O(1) time
        peek(): Return front item without removing it in O(1) time
        is_empty(): Check if queue is empty in O(1) time
        size(): Return number of items in queue in O(1) time
    Example:
        >>> q = Queue()
        >>> q.enqueue(1)
        >>> q.enqueue(2)
        >>> q.dequeue()
        1
        >>> q.peek()
        2
    """
    def __init__(self):
        self.items = deque()

    def enqueue(self, item):
        """Add item to end of queue"""
        self.items.append(item)  # O(1)

    def dequeue(self):
        """Remove and return item from front of queue, or None if empty"""
        return self.items.popleft() if self.items else None  # O(1)

    def peek(self):
        """Return front item without removing it, or None if empty"""
        return self.items[0] if self.items else None  # O(1)

    def is_empty(self):
        """Check if queue is empty"""
        return len(self.items) == 0

    def size(self):
        """Return number of items in queue"""
        return len(self.items)
# Example usage showing FIFO behavior
queue = Queue()
queue.enqueue(1) # Add to end
queue.enqueue(2) # Add to end
queue.enqueue(3) # Add to end
print(queue.dequeue()) # Remove from front -> 1
print(queue.peek()) # Look at front -> 2
print(queue.size()) # Number of items -> 2
Description: A Hash Table (or Hash Map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. It uses a hash function to compute an index into an array of buckets/slots, from which the desired value can be found. The main features are:
- Uses key-value pairs for storage
- Hash function converts keys into array indices
- O(1) average time complexity for insertions, deletions and lookups
- Handles collisions through methods like chaining or open addressing (a chaining sketch follows the example below)
- Python's built-in dict uses open addressing with pseudo-random probing
- Dynamically resizes to maintain performance (bounded load factor)
- Space complexity is O(n) where n is the number of key-value pairs
- Not sorted by key - CPython dicts preserve insertion order since Python 3.7, but hash tables in general guarantee no ordering
Python's dictionary is a highly optimized hash table implementation that provides fast, constant-time operations for most use cases while handling collision resolution and dynamic resizing automatically.
# Create a hash map and demonstrate key operations
hash_map = {}
# Insertions - O(1) average
hash_map['name'] = 'Alice'
hash_map['age'] = 25
hash_map['city'] = 'New York'
# Lookups - O(1) average
print(hash_map.get('name')) # 'Alice'
print(hash_map.get('invalid')) # None - key doesn't exist
# Update existing key - O(1)
hash_map['age'] = 26
# Delete operation - O(1)
del hash_map['city']
# Check key existence - O(1)
print('name' in hash_map) # True
print('city' in hash_map) # False
# Get all keys and values
print(hash_map.keys()) # dict_keys(['name', 'age'])
print(hash_map.values()) # dict_values(['Alice', 26])
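The dict example above hides collision handling entirely. As a hedged illustration of the separate-chaining strategy mentioned in the description (this is not how CPython's dict works internally; CPython uses open addressing), here is a minimal chained hash map; the class and method names are assumptions for illustration.
class ChainedHashMap:
    """Minimal hash map using separate chaining (illustrative sketch only)"""
    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        """Pick the bucket for a key via hash modulo table size"""
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # Overwrite existing key
                return
        bucket.append((key, value))      # Colliding keys share the same list

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default
cm = ChainedHashMap()
cm.put('name', 'Alice')
cm.put('age', 25)
print(cm.get('name'))   # 'Alice'
print(cm.get('email'))  # None - key absent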
Description: A Binary Search Tree (BST) is a hierarchical data structure where each node has at most two children (left and right). The key property is that for any node, all values in its left subtree are less than the node's value, and all values in its right subtree are greater. This ordering enables O(log n) search, insert and delete operations on average when the tree is relatively balanced. However, this implementation shows an unbalanced BST which can degrade to O(n) performance if values are inserted in sorted order (creating a linear chain). The tree uses a node-based structure where each node contains:
- A value
- A left child pointer
- A right child pointer

This simple structure makes it easy to implement but lacks the self-balancing properties of more advanced trees like AVL or Red-Black trees. The BST is useful for maintaining a sorted collection of data with reasonably fast operations when the data is randomly distributed.
class BSTNode:
    """
    Binary Search Tree node implementation with basic operations.
    Maintains BST property: left subtree values < node value < right subtree values
    Operations:
    - insert: Add new value maintaining BST property
    - search: Find if value exists in tree
    - delete: Remove value maintaining BST property (sketched after the example below)
    - min/max: Find min/max value in tree
    - inorder: Get values in sorted order
    """
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

    def insert(self, val):
        """Insert value into BST maintaining BST property"""
        if val < self.val:
            if self.left is None:
                self.left = BSTNode(val)
            else:
                self.left.insert(val)
        else:
            if self.right is None:
                self.right = BSTNode(val)
            else:
                self.right.insert(val)

    def search(self, val):
        """Search for value in BST"""
        if val == self.val:
            return True
        elif val < self.val:
            if self.left is None:
                return False
            return self.left.search(val)
        else:
            if self.right is None:
                return False
            return self.right.search(val)

    def min_value(self):
        """Get minimum value in BST"""
        current = self
        while current.left:
            current = current.left
        return current.val

    def max_value(self):
        """Get maximum value in BST"""
        current = self
        while current.right:
            current = current.right
        return current.val

    def inorder(self):
        """Get values in sorted order"""
        result = []
        if self.left:
            result.extend(self.left.inorder())
        result.append(self.val)
        if self.right:
            result.extend(self.right.inorder())
        return result
# Example usage
root = BSTNode(5)
for val in [2, 7, 1, 3, 6, 8]:
    root.insert(val)
print(root.search(3)) # True
print(root.search(4)) # False
print(root.min_value()) # 1
print(root.max_value()) # 8
print(root.inorder()) # [1, 2, 3, 5, 6, 7, 8]
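The docstring above lists delete among the operations, but it is not implemented in the class. Below is a minimal sketch of the standard approach, assuming the same BSTNode fields; the function name is illustrative. It returns the new subtree root and replaces a two-child node's value with its inorder successor before deleting that successor.
def bst_delete(node, val):
    """Delete val from the subtree rooted at node; returns the new subtree root"""
    if node is None:
        return None
    if val < node.val:
        node.left = bst_delete(node.left, val)
    elif val > node.val:
        node.right = bst_delete(node.right, val)
    else:
        # Zero or one child: splice the node out
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Two children: copy inorder successor's value, then delete it
        node.val = node.right.min_value()
        node.right = bst_delete(node.right, node.val)
    return node
root = bst_delete(root, 5)  # Remove the original root value
print(root.inorder())       # [1, 2, 3, 6, 7, 8]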
Description: Implementation of a min-heap (priority queue) using Python's heapq module. Key features:
Data Structures Used:
- List/Array:
  - Stores heap elements in level order
  - Maintains heap property where parent <= children
  - Enables O(1) access to min element at index 0
  - Allows O(log n) insertions and deletions
  - Provides implicit binary tree structure
- Binary Tree Structure (implicit):
  - For node at index i:
    - Left child at 2i + 1
    - Right child at 2i + 2
    - Parent at (i-1)//2
  - Enables efficient upheap/downheap operations
  - Maintains balanced tree shape
The heap data structure is ideal for:
- Priority queues
- Task scheduling
- Graph algorithms like Dijkstra's
- Event-driven simulations
- Median finding
Key operations and complexities:
- Push: O(log n) - Add element and bubble up
- Pop: O(log n) - Remove min and bubble down
- Peek: O(1) - Access min element
- Heapify: O(n) - Build heap from array
import heapq
# Initialize an empty min-heap
heap = []
# Push elements - O(log n) per operation
heapq.heappush(heap, 5)
heapq.heappush(heap, 2)
heapq.heappush(heap, 7)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)
# Peek at min element without removing - O(1)
print(heap[0]) # -> 1
# Pop min elements - O(log n) per operation
print(heapq.heappop(heap)) # -> 1
print(heapq.heappop(heap)) # -> 2
print(heapq.heappop(heap)) # -> 3
# Create heap from list in O(n) time
nums = [4, 1, 7, 3, 8, 5]
heapq.heapify(nums) # Converts list to valid heap in-place
print(nums[0]) # -> 1 (min element)
# Replace top element - O(log n), more efficient than pop + push
print(heapq.heapreplace(nums, 6)) # -> 1 (returns old top element)
print(nums[0]) # -> 3 (new min after replacement)
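heapq only provides a min-heap. A common workaround for max-heap behavior, shown here as a sketch rather than a heapq feature, is to push negated values and negate again on the way out.
# Simulate a max-heap by storing negated values
max_heap = []
for num in [5, 2, 7, 1]:
    heapq.heappush(max_heap, -num)
print(-max_heap[0])              # 7 - current max
print(-heapq.heappop(max_heap))  # 7 - remove max
print(-heapq.heappop(max_heap))  # 5 - next largest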
Description: Implementation of a graph data structure using an adjacency list representation. Key features:
Data Structures Used:
- Dictionary/Hash Map:
  - Maps vertices to their adjacency lists
  - Provides O(1) vertex lookup
  - Enables efficient edge insertion
  - Dynamically grows as vertices are added
- Adjacency Lists (Arrays):
  - Store neighbors for each vertex
  - Enable O(1) edge insertion
  - Space efficient for sparse graphs
  - Fast iteration over neighbors
Key advantages over adjacency matrix:
- Space efficient O(V + E) vs O(V^2)
- Faster to add vertices
- Better for sparse graphs
- More efficient neighbor iteration
Common applications:
- Social networks
- Road/transportation networks
- Computer networks
- Web page links
- Dependency graphs
class Graph:
    """
    Graph implementation using adjacency list representation.
    Supports both directed and undirected graphs.
    Operations:
    - add_vertex: Add a new vertex to graph
    - add_edge: Add edge between vertices
    - get_neighbors: Get adjacent vertices
    - has_edge: Check if edge exists
    Time Complexity:
    - Adding vertex/edge: O(1) amortized
    - Getting neighbors: O(1)
    - Checking edge from u: O(degree(u))
    """
    def __init__(self, directed=False):
        self.graph = {}  # Adjacency list
        self.directed = directed

    def add_vertex(self, v):
        """Add vertex if it doesn't exist"""
        if v not in self.graph:
            self.graph[v] = []

    def add_edge(self, u, v):
        """Add edge from u to v"""
        # Add vertices if they don't exist
        self.add_vertex(u)
        self.add_vertex(v)
        # Add edge
        self.graph[u].append(v)
        # If undirected, add reverse edge
        if not self.directed:
            self.graph[v].append(u)

    def get_neighbors(self, v):
        """Get list of vertices adjacent to v"""
        return self.graph.get(v, [])

    def has_edge(self, u, v):
        """Check if edge exists from u to v"""
        return v in self.graph.get(u, [])
# Example usage
g = Graph(directed=True) # Create directed graph
# Add edges
g.add_edge(0, 1)
g.add_edge(0, 2)
g.add_edge(1, 2)
# Check graph structure
print(g.graph) # {0: [1, 2], 1: [2], 2: []}
print(g.get_neighbors(0)) # [1, 2]
print(g.has_edge(0, 2)) # True
print(g.has_edge(2, 0)) # False (directed)
Description: A Trie (also called prefix tree) is an efficient tree-like data structure for storing and retrieving strings. Key features:
Data Structures Used:
- TrieNode Class:
  - Dictionary/hash map for children: enables O(1) access to child nodes
  - Boolean flag for word endings: marks complete words
  - Integer counter for prefix frequency: tracks usage statistics
  - Forms tree structure through node linking
- Trie Class:
  - Root node as entry point
  - Methods for insertion and search
  - Maintains tree invariants
Key advantages over other string storage:
- O(m) lookup time where m is string length
- Space-efficient for common prefixes
- Prefix-based operations like autocomplete
- No hash collisions unlike hash tables
Common applications:
- Autocomplete/type-ahead features
- Spell checkers
- IP routing tables
- Dictionary implementations
- Phone directories
class TrieNode:
    """
    A node in the Trie data structure.
    Attributes:
        children: Dict mapping characters to child TrieNodes
        end_of_word: Boolean flag marking if node represents end of valid word
        count: Number of words containing the prefix represented by this node
    The TrieNode forms the building block of a Trie, storing:
    - Links to child nodes in a map for O(1) access
    - Flag to mark complete words
    - Count to track frequency of prefixes
    """
    def __init__(self):
        """Initialize a trie node with empty children map and word flag"""
        self.children = {}        # Map from char to TrieNode
        self.end_of_word = False  # Flag to mark end of valid word
        self.count = 0            # Track frequency of words with this prefix

class Trie:
    def __init__(self):
        """Initialize an empty trie with just a root node"""
        self.root = TrieNode()

    def insert(self, word):
        """
        Insert a word into the trie
        Time: O(m) where m is word length
        Space: O(m) for new nodes
        """
        current = self.root
        # Traverse/create path for each character
        for ch in word:
            if ch not in current.children:
                current.children[ch] = TrieNode()
            current = current.children[ch]
            current.count += 1  # Increment prefix count
        current.end_of_word = True

    def search(self, word):
        """
        Search for exact word in trie
        Time: O(m) where m is word length
        """
        current = self.root
        for ch in word:
            if ch not in current.children:
                return False
            current = current.children[ch]
        return current.end_of_word

    def starts_with(self, prefix):
        """
        Check if any word starts with given prefix
        Time: O(p) where p is prefix length
        """
        current = self.root
        for ch in prefix:
            if ch not in current.children:
                return False
            current = current.children[ch]
        return True

    def count_prefix(self, prefix):
        """
        Count words that start with given prefix
        Time: O(p) where p is prefix length
        """
        current = self.root
        for ch in prefix:
            if ch not in current.children:
                return 0
            current = current.children[ch]
        return current.count
# Example usage
trie = Trie()
words = ["apple", "app", "apricot", "bear", "bet"]
for word in words:
    trie.insert(word)
print(trie.search("apple")) # True
print(trie.search("app")) # True
print(trie.search("appl")) # False
print(trie.starts_with("app")) # True
print(trie.count_prefix("app")) # 2 - "apple" and "app"
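The description mentions autocomplete as a key application; here is a small sketch built on the Trie above that walks to the prefix node and collects completions with a depth-first traversal. The helper name is illustrative, and the result order depends on dict insertion order.
def autocomplete(trie, prefix):
    """Return all inserted words that start with prefix"""
    node = trie.root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    results = []
    def collect(n, path):
        # Depth-first walk accumulating complete words
        if n.end_of_word:
            results.append(path)
        for ch, child in n.children.items():
            collect(child, path + ch)
    collect(node, prefix)
    return results
print(autocomplete(trie, "ap"))  # ['app', 'apple', 'apricot'] (order may vary)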
Description: AVL trees are self-balancing binary search trees where the heights of the left and right subtrees of any node differ by at most one. Key features:
- Uses height-balanced property to ensure O(log n) operations
- Each node stores a balance factor = height(left) - height(right)
- When balance factor becomes > 1 or < -1, rotations restore balance
- Maintains sorted order for efficient searching
- More strictly balanced than Red-Black trees, so better search but more expensive updates
- Common in memory-based applications needing fast lookups
The first snippet below shows the basic node structure and a naive insert without rotations; a complete implementation with rotations follows it. A full implementation requires:
- Balance factor tracking
- Left and right rotations
- Left-Right and Right-Left double rotations
- Height updates after modifications
class AVLNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.height = 1

# Naive BST-style insert, shown first for contrast;
# the complete AVL implementation with rotations follows below.
def avl_insert(root, val):
    # Insert like a plain BST
    if root is None:
        return AVLNode(val)
    if val < root.val:
        root.left = avl_insert(root.left, val)
    else:
        root.right = avl_insert(root.right, val)
    # An AVL tree would update height and balance factor here, then rotate if needed
    return root

root_avl = None
for v in [10, 20, 30]:
    root_avl = avl_insert(root_avl, v)  # Degenerates into a chain without rotations
# Here's a complete AVL implementation with rotations:
def get_height(node):
    """Get height of node, handling None case"""
    if not node:
        return 0
    return node.height

def get_balance(node):
    """Get balance factor of node (left height - right height)"""
    if not node:
        return 0
    return get_height(node.left) - get_height(node.right)

def update_height(node):
    """Update height of node based on children"""
    if node:
        node.height = max(get_height(node.left), get_height(node.right)) + 1

def right_rotate(y):
    """Perform right rotation:
            y              x
           / \            / \
          x   T3  -->   T1   y
         / \                / \
        T1  T2             T2  T3
    """
    x = y.left
    T2 = x.right
    x.right = y
    y.left = T2
    # Update heights bottom-up
    update_height(y)
    update_height(x)
    return x

def left_rotate(x):
    """Perform left rotation:
          x                  y
         / \                / \
        T1  y     -->      x   T3
           / \            / \
          T2  T3         T1  T2
    """
    y = x.right
    T2 = y.left
    y.left = x
    x.right = T2
    # Update heights bottom-up
    update_height(x)
    update_height(y)
    return y

def avl_insert_with_balance(root, val):
    """Insert value into AVL tree while maintaining balance.
    Returns new root after any necessary rotations."""
    # Standard BST insert
    if not root:
        return AVLNode(val)
    # Recursively insert into appropriate subtree
    if val < root.val:
        root.left = avl_insert_with_balance(root.left, val)
    elif val > root.val:
        root.right = avl_insert_with_balance(root.right, val)
    else:
        return root  # Duplicate values not allowed
    # Update height of current node
    update_height(root)
    # Get balance factor to check if rebalancing needed
    balance = get_balance(root)
    # Left Left case: single right rotation
    if balance > 1 and val < root.left.val:
        return right_rotate(root)
    # Right Right case: single left rotation
    if balance < -1 and val > root.right.val:
        return left_rotate(root)
    # Left Right case: left rotation then right rotation
    if balance > 1 and val > root.left.val:
        root.left = left_rotate(root.left)
        return right_rotate(root)
    # Right Left case: right rotation then left rotation
    if balance < -1 and val < root.right.val:
        root.right = right_rotate(root.right)
        return left_rotate(root)
    return root

# Example usage showing tree balancing:
root = None
values = [10, 20, 30, 40, 50, 25]  # Would cause a right-heavy chain without AVL rotations
print("Inserting values:", values)
for val in values:
    root = avl_insert_with_balance(root, val)
    print(f"Inserted {val}, root is now {root.val}")
Description: B-Trees are self-balancing search trees designed for efficient disk access and database operations. Unlike binary trees, a B-Tree node can have multiple keys and children. Key features:
- Each node holds at most M-1 keys and M children (where M is the order/degree)
- All leaves are at the same level (perfect balance)
- Nodes are typically sized to match disk blocks for I/O efficiency
- Great for databases and file systems due to reduced disk seeks
- Maintains sorted data for range queries
- Guarantees O(log n) operations even with large datasets
The implementation below shows the node structure along with search and insertion, including the node splitting needed to keep every node within its key bounds; deletion, which additionally requires merging and redistributing keys, is omitted for brevity.
class BTreeNode:
    def __init__(self, t):
        """Initialize B-Tree node with minimum degree t"""
        self.t = t          # Minimum degree
        self.keys = []      # List of keys
        self.children = []  # List of child nodes
        self.leaf = True    # Is this a leaf node
        self.n = 0          # Current number of keys

class BTree:
    def __init__(self, t):
        """Initialize empty B-Tree with minimum degree t"""
        self.root = BTreeNode(t)
        self.t = t

    def search(self, k):
        """Search for key k in the B-Tree"""
        return self._search_recursive(self.root, k)

    def _search_recursive(self, x, k):
        """Helper method to recursively search for key k"""
        i = 0
        # Find the first key greater than or equal to k
        while i < x.n and k > x.keys[i]:
            i += 1
        # If we found the key
        if i < x.n and k == x.keys[i]:
            return (x, i)
        # If we're at a leaf, key not found
        if x.leaf:
            return None
        # Recurse to appropriate child
        return self._search_recursive(x.children[i], k)

    def insert(self, k):
        """Insert key k into the B-Tree"""
        root = self.root
        # Handle root splitting
        if root.n == (2 * self.t) - 1:
            new_root = BTreeNode(self.t)
            new_root.leaf = False
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_non_full(self.root, k)

    def _split_child(self, x, i):
        """Split x's full child at index i, promoting its median key into x"""
        t = self.t
        y = x.children[i]  # Full child with 2t - 1 keys
        z = BTreeNode(t)
        z.leaf = y.leaf
        # Upper t - 1 keys (and children, if internal) move to the new node z
        z.keys = y.keys[t:]
        z.n = t - 1
        if not y.leaf:
            z.children = y.children[t:]
            y.children = y.children[:t]
        median = y.keys[t - 1]
        y.keys = y.keys[:t - 1]
        y.n = t - 1
        # Link z as a new child of x and push the median key up
        x.children.insert(i + 1, z)
        x.keys.insert(i, median)
        x.n += 1

    def _insert_non_full(self, x, k):
        """Insert key k into a node x that is guaranteed not to be full"""
        i = x.n - 1
        if x.leaf:
            # Shift larger keys right and place k in sorted position
            x.keys.append(None)
            while i >= 0 and k < x.keys[i]:
                x.keys[i + 1] = x.keys[i]
                i -= 1
            x.keys[i + 1] = k
            x.n += 1
        else:
            # Find the child that should receive k
            while i >= 0 and k < x.keys[i]:
                i -= 1
            i += 1
            # Split the child first if it is full
            if x.children[i].n == (2 * self.t) - 1:
                self._split_child(x, i)
                if k > x.keys[i]:
                    i += 1
            self._insert_non_full(x.children[i], k)
# Example usage
btree = BTree(t=3) # Create B-Tree with minimum degree 3
# Insert some keys
keys = [10, 20, 5, 6, 12, 30, 7, 17]
for k in keys:
    btree.insert(k)
# Search for keys
print(btree.search(6) is not None) # True
print(btree.search(15) is not None) # False
Description: A segment tree is a specialized tree data structure for efficiently performing range queries and updates on an array. Key data structures and design choices:
Data Structures Used:
- Array-based Binary Tree:
  - Uses array to represent complete binary tree structure
  - Parent at index i has children at 2i and 2i+1
  - Enables O(1) navigation between nodes
  - More cache-friendly than pointer-based tree
  - Requires 4n space to handle all cases
- Internal Nodes:
  - Store aggregate value (sum) for their range
  - Enable efficient range decomposition
  - Allow O(log n) query/update operations
  - Form a hierarchical structure of ranges
- Leaf Nodes:
  - Contain actual array values
  - Form base level of tree
  - Enable point updates to propagate up
  - Map 1:1 with input array elements
The segment tree excels at:
- Range sum/min/max queries in O(log n)
- Point updates while maintaining ranges
- Problems requiring fast range operations
- Space-efficient range processing
Key advantages over alternatives:
- Faster than naive O(n) range sum approach
- More flexible than prefix sum arrays
- Supports both queries and updates efficiently
- Simple array-based implementation
class SegmentTree:
    """
    A segment tree data structure that supports:
    - Range queries in O(log n) time
    - Point updates in O(log n) time
    - Space complexity O(4n)
    Implementation details:
    - Uses array representation of binary tree
    - Each node stores sum of its range
    - Parent node at index i has children at 2i and 2i+1
    - Leaf nodes contain actual array values
    """
    def __init__(self, arr):
        self.n = len(arr)
        # Tree array needs 4n space to handle all cases
        self.tree = [0] * (4 * self.n)
        if self.n > 0:
            self._build_tree(arr, 0, self.n - 1, 1)

    def _build_tree(self, arr, start, end, node):
        """Recursively build segment tree from input array"""
        # Base case - leaf node
        if start == end:
            self.tree[node] = arr[start]
            return self.tree[node]
        # Recursively build left and right subtrees
        mid = (start + end) // 2
        left_sum = self._build_tree(arr, start, mid, 2 * node)
        right_sum = self._build_tree(arr, mid + 1, end, 2 * node + 1)
        # Internal node stores sum of children
        self.tree[node] = left_sum + right_sum
        return self.tree[node]

    def range_sum(self, left, right):
        """Get sum of elements from index left to right inclusive"""
        if left < 0 or right >= self.n:
            raise ValueError("Range out of bounds")
        return self._range_sum_util(0, self.n - 1, left, right, 1)

    def _range_sum_util(self, start, end, left, right, node):
        """Helper method for range sum query"""
        # No overlap
        if right < start or left > end:
            return 0
        # Complete overlap
        if left <= start and right >= end:
            return self.tree[node]
        # Partial overlap - recurse on children
        mid = (start + end) // 2
        return (self._range_sum_util(start, mid, left, right, 2 * node) +
                self._range_sum_util(mid + 1, end, left, right, 2 * node + 1))
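    # The class docstring advertises point updates, but no update method was
    # implemented above; this is a minimal sketch of one, assuming the same
    # sum aggregation used elsewhere in the class.
    def update(self, index, value):
        """Set arr[index] = value and refresh affected range sums in O(log n)"""
        if index < 0 or index >= self.n:
            raise ValueError("Index out of bounds")
        self._update_util(0, self.n - 1, index, value, 1)

    def _update_util(self, start, end, index, value, node):
        """Descend to the leaf for index, recomputing sums on the way back up"""
        if start == end:
            self.tree[node] = value
            return
        mid = (start + end) // 2
        if index <= mid:
            self._update_util(start, mid, index, value, 2 * node)
        else:
            self._update_util(mid + 1, end, index, value, 2 * node + 1)
        self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]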
# Example usage
arr = [1, 2, 3, 4, 5]
seg_tree = SegmentTree(arr)
print(seg_tree.range_sum(1, 3)) # Sum of arr[1..3] = 2+3+4 = 9
print(seg_tree.range_sum(0, 4)) # Sum of entire array = 15
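# Point update using the update method sketched above, then re-query
seg_tree.update(2, 10)           # arr conceptually becomes [1, 2, 10, 4, 5]
print(seg_tree.range_sum(1, 3))  # 2 + 10 + 4 = 16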
Description: A Fenwick Tree (Binary Indexed Tree) is a space-efficient data structure that provides efficient methods for calculating prefix sums in a dynamic array.
Data Structures Used:
- Array/List as Primary Structure:
  - Stores partial sums in a flat array format
  - Uses 1-based indexing for simpler arithmetic
  - Requires only n+1 slots vs roughly 4n for a segment tree
  - Enables O(log n) updates and queries through binary arithmetic
  - More cache-friendly than pointer-based trees
- Binary Number Properties:
  - Leverages binary representation of indices
  - Uses least significant bit (LSB) for tree traversal
  - Each index i covers a range of size LSB(i)
  - Parent/child relationships implicit in binary form
  - No explicit pointers needed between nodes
Key advantages over other structures:
- More memory efficient than segment trees
- Simpler implementation than other range query structures
- Cache-friendly array-based storage
- Fast O(log n) updates and queries
- No pointer overhead or complex tree maintenance
Common applications:
- Range sum queries in dynamic arrays
- Cumulative frequency tables
- Count of inversions in an array (sketched after the example below)
- Rectangle sum queries in 2D arrays
- Dynamic ranking systems
The structure achieves its efficiency by using a clever binary representation technique where each index i is responsible for storing the sum of elements determined by its rightmost set bit. Updates and queries traverse the implicit tree structure using bitwise operations (i & -i) to efficiently move between levels. This binary arithmetic approach eliminates the need for explicit parent-child pointers while maintaining O(log n) performance.
class FenwickTree:
    """
    A Fenwick Tree (Binary Indexed Tree) data structure that supports:
    - Point updates in O(log n) time
    - Range sum queries in O(log n) time
    - Space efficient O(n) storage
    Key operations:
    - update(i, delta): Add delta to element at index i
    - prefix_sum(i): Get sum of elements from 1 to i
    - range_sum(l, r): Get sum of elements from l to r
    - get(i): Get value at index i
    - set(i, val): Set value at index i
    Implementation details:
    - Uses 1-based indexing internally
    - Leverages binary arithmetic (i & -i) for tree traversal
    - Each node implicitly stores partial sums based on binary representation
    - More space efficient than segment tree (n vs 4n nodes)
    """
    def __init__(self, n):
        """Initialize Fenwick tree with size n"""
        self.size = n
        self.fw = [0] * (n + 1)  # 1-based indexing

    def update(self, i, delta):
        """Add delta to element at index i
        Time complexity: O(log n)
        """
        if i <= 0 or i > self.size:
            raise ValueError("Index out of bounds")
        while i <= self.size:
            self.fw[i] += delta
            i += i & (-i)  # Add least significant bit

    def prefix_sum(self, i):
        """Get sum of elements from index 1 to i inclusive
        Time complexity: O(log n)
        """
        if i <= 0:
            return 0
        if i > self.size:
            i = self.size
        total = 0
        while i > 0:
            total += self.fw[i]
            i -= i & (-i)  # Remove least significant bit
        return total

    def range_sum(self, left, right):
        """Get sum of elements from index left to right inclusive"""
        if left > right:
            raise ValueError("Left bound larger than right bound")
        return self.prefix_sum(right) - self.prefix_sum(left - 1)

    def get(self, i):
        """Get value at index i"""
        return self.range_sum(i, i)

    def set(self, i, val):
        """Set value at index i"""
        delta = val - self.get(i)
        self.update(i, delta)
# Example usage showing key operations
arr = [1, 2, 3, 4, 5]
ft = FenwickTree(len(arr))
# Build tree from array
for i, val in enumerate(arr, 1):  # 1-based indexing
    ft.update(i, val)
print(ft.prefix_sum(3)) # Sum of first 3 elements: 6
print(ft.range_sum(2, 4)) # Sum of elements 2-4: 9
ft.set(2, 10) # Change value at index 2 to 10
print(ft.range_sum(2, 4)) # New sum of elements 2-4: 17
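The applications list above mentions counting inversions; here is a minimal sketch built on the FenwickTree class, assuming the input is a permutation of 1..n (real-world data would need coordinate compression first). The function name is illustrative.
def count_inversions(perm):
    """Count pairs (i, j) with i < j and perm[i] > perm[j] in O(n log n)"""
    ft = FenwickTree(len(perm))
    inversions = 0
    # Scan right to left; for each value, count smaller values already seen
    for val in reversed(perm):
        inversions += ft.prefix_sum(val - 1)
        ft.update(val, 1)
    return inversions
print(count_inversions([3, 1, 2, 5, 4]))  # 3 -> pairs (3,1), (3,2), (5,4)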
Description: Union-Find (also known as Disjoint Set) is a data structure that efficiently tracks disjoint (non-overlapping) sets of elements. It uses several key data structures working together:
Data Structures Used:
- Array/List for Parent Pointers:
  - Maps each element to its parent element
  - Forms tree structure implicitly through indices
  - Enables O(1) access to parent nodes
  - Path compression updates these pointers to point at the root
- Array/List for Ranks:
  - Tracks approximate height/depth of each tree
  - Used to keep trees balanced during unions
  - Prevents degenerate linear chains
  - Critical for O(log n) height bound
- Array/List for Set Sizes:
  - Maintains count of elements in each set
  - Enables O(1) size queries
  - Useful for keeping sets balanced
  - Updated during union operations
The data structure maintains a forest of trees where each tree represents a set, with two key optimizations:
- Path Compression:
  - During find operations, updates parent pointers to point directly to root
  - Flattens tree structure over time
  - Reduces future traversal costs
  - Critical for amortized near-constant time
- Union by Rank:
  - Attaches shorter tree under root of taller tree
  - Prevents trees from becoming too deep
  - Maintains logarithmic height bound
  - Works with path compression for efficiency
This combination makes it extremely efficient for:
- Finding connected components in graphs
- Detecting cycles in graphs (sketched after the example below)
- Tracking equivalence classes
- Implementing Kruskal's MST algorithm
The structure achieves nearly O(1) amortized time complexity for both union and find operations through the synergy of these optimizations and data structures.
class UnionFind:
    """
    A data structure for efficiently managing disjoint sets of elements.
    Key features:
    - Uses path compression and union by rank optimizations
    - Near O(1) amortized time for operations
    - Tracks set sizes and total number of disjoint sets
    - Perfect for connectivity problems and finding connected components
    Operations:
    - find(x): Find set representative for element x
    - union(x, y): Merge sets containing x and y
    - connected(x, y): Check if x and y are in same set
    - get_size(x): Get size of set containing x
    Time Complexity:
    - All operations are O(α(n)) amortized, where α is inverse Ackermann
    - For practical purposes, O(1) since α(n) ≤ 4 for all reasonable n
    Space Complexity: O(n) to store parent pointers and ranks
    """
    def __init__(self, n):
        """Initialize disjoint set with n elements"""
        self.parent = list(range(n))  # Each element starts as its own parent
        self.rank = [0] * n           # Track tree heights for union by rank
        self.size = [1] * n           # Track size of each set
        self.count = n                # Number of disjoint sets

    def find(self, x):
        """Find set representative with path compression"""
        if self.parent[x] != x:
            # Recursively set parent to root and compress path
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        """Union two sets by rank"""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # Already in same set
        # Union by rank - attach lower-rank tree under root of higher-rank tree
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]
        # If same rank, increment rank of root
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        self.count -= 1  # Decrease number of disjoint sets
        return True

    def connected(self, x, y):
        """Check if two elements are in the same set"""
        return self.find(x) == self.find(y)

    def get_size(self, x):
        """Get size of set containing element x"""
        return self.size[self.find(x)]
# Example usage showing key operations
uf = UnionFind(5)
print(uf.connected(0, 1)) # False - different sets initially
uf.union(0, 1) # Union first two elements
uf.union(2, 3) # Union next two elements
print(uf.connected(0, 1)) # True - now connected
print(uf.get_size(0)) # 2 - size of set containing 0
print(uf.count) # 3 - three disjoint sets remain
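The applications list above mentions cycle detection; here is a minimal sketch using the UnionFind class on an undirected edge list. The idea: if union() returns False, both endpoints were already connected, so the edge closes a cycle. The function name and edge-list format are illustrative assumptions.
def has_cycle(n, edges):
    """Return True if the undirected graph on n vertices contains a cycle"""
    uf = UnionFind(n)
    for u, v in edges:
        if not uf.union(u, v):  # u and v already connected
            return True
    return False
print(has_cycle(4, [(0, 1), (1, 2), (2, 3)]))          # False - a tree
print(has_cycle(4, [(0, 1), (1, 2), (2, 0), (2, 3)]))  # True - triangle 0-1-2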
Description: Bubble sort is a simple comparison-based sorting algorithm that demonstrates fundamental sorting concepts. Key features:
Data Structures Used:
- Array/List as Primary Structure:
  - Stores elements to be sorted in contiguous memory
  - Enables O(1) access to adjacent elements for comparisons
  - Modified in-place without auxiliary storage
  - Maintains stable ordering of equal elements
  - Size remains fixed during sorting
- Boolean Flag for Optimization:
  - Tracks if any swaps occurred in a pass
  - Enables early termination if array is sorted
  - Improves best case to O(n) for nearly sorted arrays
  - Simple but effective optimization technique
Key advantages:
- Simple implementation with minimal extra space O(1)
- Stable sorting preserves relative order
- Works well for small or nearly sorted arrays
- Can detect sorted arrays early
- Good for teaching sorting concepts
Limitations:
- O(n²) time complexity makes it inefficient for large arrays
- Requires many array accesses and swaps
- Not suitable for large datasets
- Other algorithms like quicksort are usually better
The algorithm gets its name from the way smaller elements "bubble up" to their correct positions with each pass through the array. While not efficient for large datasets, it remains useful for:
- Educational purposes demonstrating sorting concepts
- Small arrays where simplicity is preferred
- Nearly sorted arrays that need minor corrections
- When stable sorting is required with minimal space
def bubble_sort(arr):
    """
    Sorts array in-place using bubble sort algorithm by:
    1. Making multiple passes through array
    2. Comparing adjacent elements and swapping if out of order
    3. Optimizing by stopping if no swaps needed (array is sorted)
    Time: O(n²), Space: O(1)
    """
    n = len(arr)
    for i in range(n):
        # Flag to detect if any swaps happened in this pass
        swapped = False
        # Last i elements are already in place
        for j in range(n - i - 1):
            # Compare adjacent elements
            if arr[j] > arr[j + 1]:
                # Swap them if they are in wrong order
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        # If no swapping occurred, array is already sorted
        if not swapped:
            break
# Example showing algorithm's behavior
data = [64, 34, 25, 12, 22, 11, 90]
print("Original array:", data)
bubble_sort(data)
print("Sorted array:", data) # [11, 12, 22, 25, 34, 64, 90]
# Test with different input patterns
nearly_sorted = [1, 2, 4, 3, 5] # Nearly sorted - best case
bubble_sort(nearly_sorted)
print("Nearly sorted case:", nearly_sorted) # [1, 2, 3, 4, 5]
reverse_sorted = [5, 4, 3, 2, 1] # Worst case
bubble_sort(reverse_sorted)
print("Reverse sorted case:", reverse_sorted) # [1, 2, 3, 4, 5]
Description: Insertion sort is a simple sorting algorithm that builds a sorted portion of the array one element at a time. Key features:
Data Structures Used:
- Array/List:
  - Primary data structure that is sorted in-place
  - Maintains sorted portion at beginning
  - Unsorted portion follows sorted portion
  - Enables O(1) access for comparisons and swaps
  - No extra space needed beyond input array
- Temporary Variable:
  - Stores current element being inserted
  - Enables shifting of elements without losing value
  - Acts as swap space during insertions
The algorithm is ideal for:
- Small arrays/lists
- Nearly sorted data
- Online sorting (processing elements as they arrive)
- Linked list sorting due to local swaps
Key advantages:
- O(1) extra space - sorts in place
- O(n) best case for nearly sorted arrays
- Stable sort - maintains relative order of equal elements
- Simple implementation
- Adaptive - runs faster on partially sorted arrays
def insertion_sort(arr):
    """
    Builds sorted portion of array one element at a time by:
    1. Taking next unsorted element
    2. Shifting larger sorted elements right
    3. Inserting element in correct position
    Time: O(n²), Space: O(1)
    """
    # Iterate through array starting from second element
    for i in range(1, len(arr)):
        # Store current element to insert
        key = arr[i]
        # Start comparing with previous sorted elements
        j = i - 1
        # Shift elements right while they're larger than key
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        # Insert key in correct position
        arr[j + 1] = key
# Example showing algorithm's behavior
data = [64, 34, 25, 12, 22, 11, 90]
print("Original array:", data)
insertion_sort(data)
print("Sorted array:", data) # [11, 12, 22, 25, 34, 64, 90]
# Test with different input patterns
nearly_sorted = [1, 2, 4, 3, 5] # Nearly sorted
insertion_sort(nearly_sorted)
print("Nearly sorted case:", nearly_sorted) # [1, 2, 3, 4, 5]
reverse_sorted = [5, 4, 3, 2, 1] # Worst case
insertion_sort(reverse_sorted)
print("Reverse sorted case:", reverse_sorted) # [1, 2, 3, 4, 5]
Description: Selection sort is an in-place comparison sorting algorithm that relies on several key data structures and properties:
Data Structures Used:
- Input Array:
  - Serves as both input and output storage
  - Divided conceptually into sorted and unsorted regions
  - Modified in-place through swapping
  - Enables O(1) access to elements
  - No extra space needed beyond input array
  - Size remains constant throughout sorting
- Two Logical Regions:
  - Sorted Region (Left Side):
    - Grows from left to right
    - Contains elements in final sorted order
    - Elements never move once placed
    - Size increases by 1 each iteration
  - Unsorted Region (Right Side):
    - Shrinks from right to left
    - Contains remaining elements to sort
    - Scanned fully to find each minimum
    - Size decreases by 1 each iteration
- Key Properties:
  - In-place: uses O(1) extra space by modifying input array directly
  - Unstable: can change relative order of equal elements through swapping
  - Quadratic time: O(n²) in all cases due to nested loops
  - Simple implementation: uses basic array operations only
  - Not adaptive: running time independent of input order
  - Minimal memory writes: O(n) swaps vs O(n²) comparisons
def selection_sort(arr):
    """
    In-place comparison sorting algorithm that:
    1. Divides array into sorted and unsorted regions
    2. Repeatedly finds minimum from unsorted region
    3. Swaps it with first unsorted element
    Time: O(n²), Space: O(1)
    """
    n = len(arr)
    # Iterate through array, growing sorted region
    for i in range(n):
        # Find minimum element in unsorted region
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        # Swap minimum with first unsorted element
        # Only swap if needed to minimize operations
        if min_idx != i:
            arr[i], arr[min_idx] = arr[min_idx], arr[i]
# Example showing algorithm's behavior
data = [64, 34, 25, 12, 22, 11, 90]
print("Original array:", data)
selection_sort(data)
print("Sorted array:", data) # [11, 12, 22, 25, 34, 64, 90]
Description: Merge sort is a divide-and-conquer sorting algorithm that relies on several key data structures and properties:
Data Structures Used:
- Input Array:
  - Stores original unsorted data
  - Divided into subarrays during recursion
  - Overwritten in place during merging
  - Enables O(1) access to elements
- Temporary Arrays (Left/Right):
  - Created during merge step
  - Store sorted subarrays temporarily
  - Enable stable merging of elements
  - Total O(n) extra space required
  - Freed after each merge completes
- Recursion Stack:
  - Implicitly tracks subproblems
  - Stores local variables and return addresses
  - Maximum depth O(log n)
  - Enables divide-and-conquer strategy
Key Properties:
- Stability: Preserves relative order of equal elements by using <= in merge
- Parallelizable: Subarrays can be sorted independently
- Cache-friendly: Sequential access patterns during merge
- External sort friendly: Can efficiently merge sorted files
Time/Space Analysis:
- Time: O(n log n) guaranteed
  - log n levels of recursion
  - O(n) work per level during merges
- Space: O(n) auxiliary space
  - O(n) for temporary arrays
  - O(log n) for recursion stack
Ideal Use Cases:
- Sorting linked lists (no random access needed)
- External sorting of large files
- When stable sorting is required
- When predictable performance matters more than space
def merge(arr, left, right):
    """
    Merges two sorted subarrays (left and right) back into arr in sorted order.
    Uses two pointers to track positions in the subarrays being merged.
    """
    i = j = k = 0
    # Compare elements from both arrays and merge in sorted order
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # Use <= for stability
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    # Copy remaining elements from left array, if any
    while i < len(left):
        arr[k] = left[i]
        i += 1
        k += 1
    # Copy remaining elements from right array, if any
    while j < len(right):
        arr[k] = right[j]
        j += 1
        k += 1

def merge_sort(arr):
    """
    Divide-and-conquer sorting algorithm that:
    1. Divides array into two halves
    2. Recursively sorts the halves
    3. Merges sorted halves using auxiliary arrays
    Time: O(n log n), Space: O(n)
    """
    if len(arr) <= 1:  # Base case
        return
    # Divide array into two halves
    mid = len(arr) // 2
    left = arr[:mid]
    right = arr[mid:]
    # Recursively sort the halves
    merge_sort(left)
    merge_sort(right)
    # Merge sorted halves
    merge(arr, left, right)
# Example usage showing divide-and-conquer and stability
data = [3, 1, 4, 1, 5, 9, 2, 6, 5]
merge_sort(data)
print(data) # [1, 1, 2, 3, 4, 5, 5, 6, 9]
Description: Quick sort is a highly efficient, in-place sorting algorithm that uses a divide-and-conquer strategy.
Data Structures Used:
- Input Array:
  - Used directly for in-place sorting without auxiliary storage
  - Enables O(1) swaps between elements
  - Provides good cache locality during partitioning
  - Allows random access for efficient pivoting
- Call Stack:
  - Implicitly used for recursive calls
  - Stores function parameters and local variables
  - Enables divide-and-conquer by tracking subproblems
  - Depth is O(log n) on average, O(n) worst case
- Two Index Pointers:
  - A scan index and a boundary index traverse the array during partition
  - Enable efficient element comparison and swapping
  - Help maintain the partition invariant
  - Allow in-place partitioning without extra space
Key advantages of these structures:
- In-place sorting saves memory
- Good cache performance from array locality
- Minimal auxiliary space needed
- Efficient element swapping
The algorithm works by selecting a 'pivot' element and partitioning the array around it, so that smaller elements end up to the pivot's left and larger elements to its right. The Lomuto partition scheme used below does this with a single left-to-right scan: index j examines each element while index i tracks the boundary of the region of elements less than or equal to the pivot.
Time complexity:
- Average case: O(n log n) - balanced partitions
- Worst case: O(n²) - poorly balanced partitions (e.g. sorted array)
Space complexity:
- Average case: O(log n) - recursive call stack depth
- Worst case: O(n) - unbalanced recursion
Quick sort is widely used in practice due to its excellent cache performance from in-place sorting and generally good average case performance. The choice of pivot strategy can significantly impact performance.
def partition(arr, low, high):
    """
    Partitions array around a pivot using the two-index (Lomuto) technique.
    Returns final pivot position.
    """
    # Choose rightmost element as pivot
    pivot = arr[high]
    i = low - 1  # Boundary index of the smaller-element region
    # Move elements smaller than pivot to left side
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    # Place pivot in correct position
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

def quick_sort_helper(arr, low, high):
    """
    Recursive helper that sorts array in-place using partitioning.
    """
    if low < high:
        # Get pivot position after partitioning
        pivot_idx = partition(arr, low, high)
        # Recursively sort elements before and after pivot
        quick_sort_helper(arr, low, pivot_idx - 1)
        quick_sort_helper(arr, pivot_idx + 1, high)

def quick_sort(arr):
    """
    Main quicksort function that sorts array in-place.
    """
    quick_sort_helper(arr, 0, len(arr) - 1)
    return arr
# Example usage showing different cases
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
quick_sort(data)
print(data) # [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 9]
Description: Binary search is an efficient search algorithm that works on sorted arrays by repeatedly dividing the search interval in half.
Data Structures Used:
- Sorted Array:
  - Primary data structure that enables O(1) random access
  - Must be sorted to allow eliminating half the search space
  - Enables comparison-based decisions at each step
  - Supports efficient memory access patterns
  - Common implementations use arrays/lists
- Three Pointers:
  - Low: tracks start of current search range
  - Mid: divides range for comparison with target
  - High: tracks end of current search range
  - Together enable O(1) space complexity
  - Allow efficient range manipulation
The algorithm compares the middle element with the target: if equal, we've found it; if target is greater, we search the right half; if smaller, we search the left half. This divide-and-conquer approach gives binary search a time complexity of O(log n), making it much faster than linear search O(n) for large datasets.
Key advantages:
- O(log n) time complexity through halving
- O(1) space as only pointers needed
- Cache-friendly array access pattern
- Highly efficient for large sorted datasets
Common applications:
- Dictionary/phone book lookups
- Database indexing and queries
- Finding insertion points
- Root finding algorithms
- Package version resolution
The key requirement is that the input array must be sorted, as the algorithm relies on the ordering to eliminate half the remaining elements in each step. This makes binary search ideal for frequently searched but rarely modified data.
def binary_search(arr, target):
"""
Performs binary search to find target in a sorted array.
Args:
arr: Sorted array to search in
target: Value to find
Returns:
Index of target if found, -1 if not found
"""
# Initialize pointers to track search range
low = 0
high = len(arr) - 1
# Keep searching while valid range exists
while low <= high:
# Find middle element
        mid = low + (high - low) // 2  # Overflow-safe midpoint (matters in fixed-width integer languages)
# Check if middle element is target
if arr[mid] == target:
return mid
# If target is greater, ignore left half
elif arr[mid] < target:
low = mid + 1
# If target is smaller, ignore right half
else:
high = mid - 1
# Target not found
return -1
# Example usage showing different cases
sorted_arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(binary_search(sorted_arr, 1)) # 0 (first element)
print(binary_search(sorted_arr, 5)) # 4 (middle element)
print(binary_search(sorted_arr, 10)) # 9 (last element)
print(binary_search(sorted_arr, 11)) # -1 (not found)
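For the insertion-point use case listed above, the same halving logic is exposed by Python's standard bisect module; a brief illustration:
from bisect import bisect_left, bisect_right

sorted_arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(bisect_left(sorted_arr, 5))    # 4: leftmost index where 5 could be inserted
print(bisect_right(sorted_arr, 5))   # 5: rightmost insertion index
print(bisect_left(sorted_arr, 11))   # 10: insert at the end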
Description: BFS (Breadth-First Search) is a fundamental graph traversal algorithm that uses two key data structures:
Data Structures Used:
- Queue (collections.deque):
  - Stores nodes to be processed in FIFO order
  - O(1) enqueue/dequeue operations
  - Naturally creates level-by-level traversal
  - Maintains the frontier between explored and unexplored nodes
  - Typically implemented with collections.deque for efficiency
  - Size is O(V) in the worst case for complete graphs
- Visited Set (Hash Set):
  - Tracks already-explored nodes
  - O(1) lookups to check for visited nodes
  - Prevents cycles and re-processing
  - Essential for graphs with cycles
  - Implemented as a hash table/set for efficiency
  - Size is O(V) to store all vertices
The algorithm works by:
- Starting from the source node: add it to the queue and visited set
- While the queue is not empty:
  - Dequeue the next node to process
  - Add all unvisited neighbors to the queue and visited set
- The queue ensures level-by-level traversal
- The visited set prevents cycles
Key advantages of this approach:
- Guarantees shortest path in unweighted graphs
- Visits nodes in order of distance from source
- Memory efficient O(V) space complexity
- Handles disconnected components if needed
- Natural for level-order tree traversal
Time complexity is O(V + E) since we:
- Visit each vertex once O(V)
- Explore each edge once O(E)
Space complexity is O(V) for both the queue and the visited set.
from collections import deque
def bfs(graph, start):
"""
Performs breadth-first search traversal of a graph starting from given node.
Args:
graph: Dictionary representing adjacency list of graph
start: Starting vertex for traversal
Returns:
List of nodes in BFS traversal order
"""
# Initialize visited set and queue with start node
visited = set([start])
queue = deque([start])
traversal = [] # Track traversal order
# Process nodes level by level using queue
while queue:
# Get next node from front of queue
current = queue.popleft()
traversal.append(current)
# Add unvisited neighbors to queue
for neighbor in graph[current]:
if neighbor not in visited:
visited.add(neighbor)
queue.append(neighbor)
return traversal
# Example usage:
# Graph represented as adjacency list
graph = {
0: [1, 2], # Node 0 connected to 1,2
1: [2, 3], # Node 1 connected to 2,3
2: [3, 4], # Node 2 connected to 3,4
3: [4], # Node 3 connected to 4
4: [] # Node 4 has no outgoing edges
}
# Perform BFS from node 0
result = bfs(graph, 0)
print(result) # [0, 1, 2, 3, 4]
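Because BFS reaches every node by a fewest-edge route, recording parent pointers during the traversal yields a shortest path in an unweighted graph. A minimal sketch, reusing the adjacency-list graph above (the parent dictionary doubles as the visited set):
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Return one fewest-edge path from start to goal, or None if unreachable."""
    parent = {start: None}  # Doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent pointers back to the start, then reverse
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

print(bfs_shortest_path(graph, 0, 4))  # [0, 2, 4]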
Description: DFS (Depth-First Search) is a graph traversal algorithm that explores a graph by going as deep as possible along each branch before backtracking.
Data Structures Used:
- Stack (Implicit or Explicit):
  - Handles the Last-In-First-Out (LIFO) order needed for backtracking
  - The recursive implementation uses the call stack implicitly
  - The iterative version uses an explicit stack data structure
  - Enables O(1) push/pop operations for tracking the path
  - Natural fit for the backtracking pattern
- Visited Set/Hash Table:
  - Tracks explored nodes to avoid cycles
  - Provides O(1) lookups to check whether a node was visited
  - Essential for handling graphs with cycles
  - Grows proportionally to the number of vertices
  - Prevents infinite loops in cyclic graphs
- Graph Representation (Adjacency List):
  - Maps vertices to lists of neighbors
  - Enables O(1) access to a node's neighbors
  - Space efficient for sparse graphs
  - Supports both directed and undirected graphs
  - Allows flexible graph structures
Key advantages of DFS:
- Memory efficient for deep graphs compared to BFS
- Natural fit for recursive problems
- Good for topological sorting
- Path finding in maze-like structures
- Tree/graph structure analysis
Time Complexity: O(V + E) where:
- V is number of vertices (each vertex visited once)
- E is number of edges (each edge explored once)
- Stack operations are O(1)
- Visited set operations are O(1)
Space Complexity: O(V) for:
- Visited set to track explored nodes
- Recursion/explicit stack in worst case
- Linear space usage in deepest case
- Proportional to graph depth
def dfs_recursive(graph, start, visited=None):
# Initialize visited set if this is the first call
if visited is None:
visited = set()
# Mark current node as visited and process it
visited.add(start)
print(start) # Process node (can be modified based on needs)
# Recursively visit all unvisited neighbors
for neighbor in graph[start]:
if neighbor not in visited:
dfs_recursive(graph, neighbor, visited)
def dfs_iterative(graph, start):
# Use explicit stack instead of recursion
visited = set()
stack = [start]
while stack:
node = stack.pop()
if node not in visited:
visited.add(node)
print(node) # Process node
# Add unvisited neighbors to stack
# Reverse to maintain similar order to recursive
for neighbor in reversed(graph[node]):
if neighbor not in visited:
stack.append(neighbor)
# Example usage with both implementations
g = {0: [1, 2], 1: [2], 2: [3], 3: []}
print("Recursive DFS:")
dfs_recursive(g, 0) # 0,1,2,3
print("\nIterative DFS:")
dfs_iterative(g, 0) # 0,1,2,3
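For the path-finding use case, the same iterative stack pattern answers reachability queries. A minimal sketch using the graph g from above:
def has_path(graph, start, goal):
    """Iterative DFS reachability check: True if goal is reachable from start."""
    stack = [start]
    visited = set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in visited:
            continue
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                stack.append(neighbor)
    return False

print(has_path(g, 0, 3))  # True
print(has_path(g, 3, 0))  # False (edges are directed)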
Description: Finds the shortest path from a start node to all others in a weighted graph with no negative edges. Uses a min-heap priority queue to always process the node with smallest current distance first. Maintains a distance dictionary to track shortest paths. The priority queue ensures we process nodes in order of increasing distance, making it more efficient than checking all edges repeatedly. Time complexity is O((V+E)logV) where V is number of vertices and E is number of edges. Space complexity is O(V) for the distance dictionary and priority queue.
import heapq
def dijkstra(graph, start):
# Initialize distances and previous nodes for path reconstruction
dist = {node: float('inf') for node in graph}
prev = {node: None for node in graph}
dist[start] = 0
# Priority queue stores (distance, node) tuples
pq = [(0, start)]
visited = set()
while pq:
current_dist, u = heapq.heappop(pq)
        # Skip stale heap entries for nodes whose distance is already finalized
if u in visited:
continue
visited.add(u)
# Explore neighbors
for v, w in graph[u]:
if v in visited:
continue
# Relaxation step
new_dist = dist[u] + w
if new_dist < dist[v]:
dist[v] = new_dist
prev[v] = u
heapq.heappush(pq, (new_dist, v))
return dist, prev
def get_shortest_path(prev, start, end):
path = []
current = end
while current is not None:
path.append(current)
current = prev[current]
return path[::-1]
# Example weighted directed graph represented as adjacency list
# Each neighbor is a tuple of (vertex, weight)
weighted_graph = {
'A': [('B',4), ('C',2)],
'B': [('C',1), ('D',5)],
'C': [('B',1), ('D',8), ('E',10)],
'D': [('E',2)],
'E': []
}
# Find shortest paths from A
distances, previous = dijkstra(weighted_graph, 'A')
print(f"Distances from A: {distances}")
# Get shortest path from A to E
path = get_shortest_path(previous, 'A', 'E')
print(f"Shortest path from A to E: {path}")
# Output:
# Distances from A: {'A': 0, 'B': 3, 'C': 2, 'D': 8, 'E': 10}
# Shortest path from A to E: ['A', 'C', 'B', 'D', 'E']
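When only a single destination matters, the loop can return as soon as the goal is popped from the heap, since the first pop of a node yields its final distance. A minimal early-exit sketch over the same weighted_graph:
import heapq

def dijkstra_to_target(graph, start, goal):
    """Distance from start to goal only; float('inf') if unreachable."""
    dist = {node: float('inf') for node in graph}
    dist[start] = 0
    pq = [(0, start)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            return d  # First pop of the goal is its final distance
        if u in visited:
            continue
        visited.add(u)
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return float('inf')

print(dijkstra_to_target(weighted_graph, 'A', 'D'))  # 8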
Description: The Bellman-Ford algorithm finds shortest paths in a weighted graph that can handle negative edge weights (unlike Dijkstra's), with the constraint that there cannot be negative cycles. Here's a detailed breakdown:
Data Structures:
- Graph representation: List of edges where each edge is a tuple (u,v,w) representing edge from u to v with weight w
- Distance dictionary (dist): Maps each vertex to its current shortest distance from the start vertex
  - Initially all distances are set to infinity, except the start vertex (0)
  - Gets updated as shorter paths are found
  - Key: vertex, Value: current shortest distance
Key Algorithm Steps:
- Initialize distances - O(V) space for the distance dictionary
- Relax all edges V-1 times:
  - For each edge (u,v,w), if dist[u] + w < dist[v], update dist[v]
  - V-1 iterations suffice because the longest cycle-free path visits each vertex once
  - Each iteration processes all E edges
- Optional: Check for negative cycles by attempting one more relaxation pass
  - If any distance can still be reduced, a negative cycle exists
Time Complexity: O(VE) - must process all E edges, V-1 times
Space Complexity: O(V) for the distance dictionary
Key Advantages:
- Can handle negative edge weights (unlike Dijkstra's)
- Simpler implementation than Dijkstra's (no priority queue needed)
- Can detect negative cycles
- Guarantees shortest paths if no negative cycles exist
The algorithm is especially useful for:
- Networks with negative weights (e.g., forex trading)
- Detecting arbitrage opportunities
- Situations where simple implementation is preferred over speed
def bellman_ford(graph, start):
"""
Find shortest paths from start vertex to all vertices in weighted graph.
Handles negative edge weights and detects negative cycles.
Args:
graph: List of edge tuples (u,v,w) where u,v are vertices and w is weight
start: Starting vertex
Returns:
dist: Dictionary mapping each vertex to shortest distance from start
None if negative cycle exists
Time Complexity: O(VE) where V is vertices and E is edges
Space Complexity: O(V) for distance dictionary
"""
# Initialize distances
dist = {}
for u,v,w in graph:
dist[u] = float('inf')
dist[v] = float('inf')
dist[start] = 0
# Get number of vertices
V = len(dist)
# Relax all edges V-1 times
    for _ in range(V - 1):
for u,v,w in graph:
if dist[u] != float('inf') and dist[u] + w < dist[v]:
dist[v] = dist[u] + w
# Check for negative cycles
for u,v,w in graph:
if dist[u] != float('inf') and dist[u] + w < dist[v]:
return None # Negative cycle exists
return dist
# Example usage with negative weights
edges = [
('A','B',4), ('B','C',3), ('A','C',5),
('C','D',-2), ('D','B',-1), ('B','E',2),
('E','D',3)
]
# Find shortest paths from A
distances = bellman_ford(edges, 'A')
if distances:
print("Shortest distances from A:", distances)
else:
print("Graph contains negative cycle")
# Example with negative cycle
negative_cycle = [
('A','B',2), ('B','C',-3),
('C','A',-1) # Creates cycle A->B->C->A with total weight -2
]
result = bellman_ford(negative_cycle, 'A')
print("Graph with negative cycle:", result) # None
Description: The Floyd-Warshall algorithm finds shortest paths between all pairs of vertices in a weighted graph, including those with negative edges but no negative cycles. Key features:
- Uses a 2D distance matrix/table to track shortest paths between every vertex pair
- Dynamically builds up paths by considering all vertices as potential intermediate points
- Matrix[i][j] represents shortest path from vertex i to j
- O(V^3) time complexity where V is number of vertices
- O(V^2) space complexity for the distance matrix
- More efficient than running Dijkstra's V times for dense graphs
- Great for:
  - Finding all-pairs shortest paths in one pass
  - Detecting negative cycles (if any diagonal entry becomes negative)
  - Computing the transitive closure of a graph
  - Finding maximum paths by negating edge weights
def floyd_warshall(matrix):
"""
Finds shortest paths between all pairs of vertices in a weighted graph.
Handles negative edges but detects negative cycles.
Time: O(V^3), Space: O(V^2) where V is number of vertices
"""
# Input validation
if not matrix or not matrix[0]:
return []
n = len(matrix)
if any(len(row) != n for row in matrix):
raise ValueError("Matrix must be square")
    # Initialize the distance matrix with a copy of the input
    # (entries are numbers, with float('inf') marking "no edge")
    dist = [row[:] for row in matrix]
# Floyd-Warshall algorithm
# For each vertex k as intermediate point
for k in range(n):
for i in range(n):
for j in range(n):
# Skip if path through k involves infinity
if dist[i][k] == float('inf') or dist[k][j] == float('inf'):
continue
# Update if path through k is shorter
dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
# Check for negative cycles by examining diagonal
# If any vertex has negative path to itself, there's a negative cycle
for i in range(n):
if dist[i][i] < 0:
raise ValueError("Graph contains negative cycle")
return dist
# Example demonstrating key features:
# - Handling of infinite distances
# - Finding paths through intermediate vertices
# - Different path lengths including direct and indirect paths
INF = float('inf')
matrix = [
# Direct paths: A->B=5, A->D=10
# Indirect path: A->B->C->D = 9 (shorter than direct A->D)
[0, 5, INF, 10],
[INF, 0, 3, INF],
[INF, INF, 0, 1],
[INF, INF, INF, 0]
]
result = floyd_warshall(matrix)
# Print formatted results showing discovered paths
print("Shortest paths matrix:")
for i, row in enumerate(result):
formatted = [f"{x:^7.0f}" if x != float('inf') else " INF " for x in row]
print(f"From vertex {i}: {formatted}")
# Expected shortest paths matrix (vertices 0-3):
# From vertex 0: [0, 5, 8, 9]
# From vertex 1: [INF, 0, 3, 4]
# From vertex 2: [INF, INF, 0, 1]
# From vertex 3: [INF, INF, INF, 0]
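For the transitive-closure use case listed above, the same triple loop works over booleans, replacing min/+ with or/and (Warshall's algorithm). A minimal sketch:
def transitive_closure(adj):
    """adj[i][j] is truthy if edge i->j exists; returns a reachability matrix."""
    n = len(adj)
    # Every vertex reaches itself
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

print(transitive_closure([[0, 1, 0], [0, 0, 1], [0, 0, 0]]))
# [[True, True, True], [False, True, True], [False, False, True]]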
Description: Topological sort orders vertices in a Directed Acyclic Graph (DAG) such that all edges point forward in the ordering. Key data structures used:
- Dictionary/Hash Map for graph representation:
  - Maps vertices to lists of neighbors
  - Provides O(1) access to adjacency lists
  - Space efficient for sparse graphs
  - Easy to iterate over edges
- Queue for processing zero in-degree vertices:
  - Enables breadth-first processing order
  - O(1) enqueue/dequeue operations
  - Maintains FIFO ordering of vertices
  - Key to Kahn's algorithm approach
- Dictionary for tracking in-degree counts:
  - Maps vertices to their number of incoming edges
  - O(1) updates when processing edges
  - Helps identify available vertices
  - Detects cycles if counts remain non-zero
The algorithm is ideal for:
- Dependency resolution
- Task scheduling
- Build systems
- Course prerequisites
- Process ordering
from collections import deque
def topological_sort(graph):
"""
Performs topological sort on a directed acyclic graph (DAG) using Kahn's algorithm.
Returns vertices in an order where for each directed edge u->v, u comes before v.
Args:
graph: Dict representing DAG where keys are vertices and values are lists of neighbors
Returns:
List of vertices in topological order
Raises:
ValueError if graph contains a cycle
"""
# Track in-degree (number of incoming edges) for each vertex
in_degree = {u: 0 for u in graph}
for u in graph:
for v in graph[u]:
in_degree[v] = in_degree.get(v, 0) + 1
# Initialize queue with vertices that have no incoming edges
queue = deque([u for u in graph if in_degree[u] == 0])
order = []
# Process vertices in queue
while queue:
u = queue.popleft()
order.append(u)
# Reduce in-degree of neighbors and add to queue if in-degree becomes 0
for v in graph[u]:
in_degree[v] -= 1
if in_degree[v] == 0:
queue.append(v)
# Check if all vertices were processed (no cycles)
if len(order) != len(graph):
raise ValueError("Graph contains a cycle - topological sort not possible")
return order
# Example usage showing different DAG scenarios
if __name__ == "__main__":
# Example 1: Simple linear DAG
dag1 = {
'A': ['B'],
'B': ['C'],
'C': ['D'],
'D': []
}
print("Linear DAG:", topological_sort(dag1)) # ['A', 'B', 'C', 'D']
# Example 2: DAG with multiple paths
dag2 = {
'CS101': ['CS201', 'CS210'],
'CS201': ['CS301'],
'CS210': ['CS301'],
'CS301': ['CS401'],
'CS401': []
}
print("Course prerequisites:", topological_sort(dag2))
# Example 3: DAG with independent paths
dag3 = {
'start': ['a1', 'b1'],
'a1': ['a2'],
'a2': ['end'],
'b1': ['b2'],
'b2': ['end'],
'end': []
}
print("Parallel paths:", topological_sort(dag3))
Description: The Knuth-Morris-Pratt (KMP) algorithm is an efficient string pattern matching algorithm that uses a prefix table (also called failure function or LPS array) to avoid unnecessary character comparisons. Key features:
Data Structures:
- Pattern string: The substring we're searching for
- Text string: The main string we're searching in
- Prefix table (int[]): An array storing, for each position, the length of the longest proper prefix that is also a suffix
  - Helps skip comparisons by remembering previously matched characters
  - Size is equal to the pattern length
  - Built in O(m) time, where m is the pattern length
  - Used to determine how far to shift the pattern when a mismatch occurs
Time Complexity:
- O(m) preprocessing to build prefix table
- O(n) for pattern search where n is text length
- Total: O(m + n) vs O(mn) for naive approach
Common Use Cases:
- String searching/pattern matching
- DNA sequence matching
- Finding repeated substrings
- Text editors and word processors
- Network packet inspection
def build_lps_array(pattern):
"""
Build Longest Proper Prefix which is also Suffix (LPS) array
LPS[i] = length of longest proper prefix that is also suffix for pattern[0..i]
"""
lps = [0] * len(pattern) # Initialize LPS array with 0s
length = 0 # Length of previous longest prefix suffix
i = 1 # Start from second character
while i < len(pattern):
if pattern[i] == pattern[length]:
# If characters match, increment length and store in LPS
length += 1
lps[i] = length
i += 1
else:
if length != 0:
# If mismatch after some matching characters
# Use LPS array to determine new length
length = lps[length - 1]
else:
# If mismatch at start
lps[i] = 0
i += 1
return lps
def kmp_search(text, pattern):
"""
Knuth-Morris-Pratt algorithm for pattern matching
Returns: List of all starting indices where pattern is found in text
Time Complexity: O(n + m) where n = len(text), m = len(pattern)
"""
if not pattern or not text:
return []
if len(pattern) > len(text):
return []
matches = []
lps = build_lps_array(pattern)
i = 0 # Index for text
j = 0 # Index for pattern
while i < len(text):
# Characters match - move both pointers
if text[i] == pattern[j]:
i += 1
j += 1
# Found complete pattern
if j == len(pattern):
matches.append(i - j)
# Look for more matches - use LPS to avoid recomparing
j = lps[j - 1]
else:
if j > 0:
# Mismatch after some matches - use LPS array
j = lps[j - 1]
else:
# Mismatch at start - move text pointer
i += 1
return matches
# Example usage showing multiple pattern occurrences
text = "AABAACAADAABAAABAA"
pattern = "AABA"
matches = kmp_search(text, pattern)
print(f"Pattern '{pattern}' found at indices: {matches}") # [0, 9, 13]
# Example with no matches
print(kmp_search("ABCDEF", "XYZ")) # []
# Example with DNA sequence matching
dna = "ACGTACGTACGT"
seq = "ACGT"
print(f"DNA sequence '{seq}' found at indices: {kmp_search(dna, seq)}") # [0, 4, 8]
Description: The Rabin-Karp algorithm uses rolling hash functions and string matching to efficiently find pattern strings in text. Key features:
- Uses a hash table to store pattern hashes for O(1) lookup
- Rolling hash function allows O(1) hash updates when sliding window
- Can match multiple patterns simultaneously (unlike KMP)
- Average case O(n+m) time complexity, where:
  - n is the length of the text
  - m is the length of the pattern(s)
- Worst case O(nm) if many hash collisions occur
- Common applications:
  - Plagiarism detection
  - Multiple pattern matching
  - DNA sequence alignment
  - Finding duplicate file content
Data structures used:
- Hash table: Stores pattern hashes for constant time lookup
- Sliding window: Maintains current substring being compared
- Rolling hash: Special hash function that can be updated in O(1)
def rabin_karp(text, patterns):
"""
Rabin-Karp string matching algorithm that can find multiple patterns.
Uses rolling hash for efficient sliding window comparison.
Args:
text: String to search in
patterns: Single pattern string or list of pattern strings
Returns:
Dictionary mapping each pattern to list of indices where it was found
"""
# Handle single pattern case
if isinstance(patterns, str):
patterns = [patterns]
# Initialize variables
base = 256 # Number of characters in input alphabet
prime = 101 # Prime number for hash calculation
results = {p: [] for p in patterns}
# Process each pattern
for pattern in patterns:
m = len(pattern)
n = len(text)
if m > n:
continue
# Calculate initial hash values
pattern_hash = 0
text_hash = 0
        h = pow(base, m - 1, prime)  # base^(m-1) mod prime, used to remove the leading character
# Calculate hash value for pattern and first window of text
for i in range(m):
pattern_hash = (base * pattern_hash + ord(pattern[i])) % prime
text_hash = (base * text_hash + ord(text[i])) % prime
# Slide pattern over text one by one
for i in range(n - m + 1):
# Check character by character if hash values match
if pattern_hash == text_hash:
if text[i:i+m] == pattern:
results[pattern].append(i)
# Calculate hash value for next window by removing leading digit
# and adding trailing digit
if i < n - m:
text_hash = ((text_hash - ord(text[i]) * h) * base +
ord(text[i + m])) % prime
# Handle negative values
if text_hash < 0:
text_hash += prime
return results
# Example usage
text = "AABAACAADAABAAABAA"
pattern = "AABA"
print(f"Pattern '{pattern}' found at indices: {rabin_karp(text, pattern)[pattern]}") # [0, 9, 13]
# Multiple pattern matching
dna = "ACGTACGTACGT"
patterns = ["ACGT", "CGT"]
matches = rabin_karp(dna, patterns)
for p in patterns:
print(f"Pattern '{p}' found at indices: {matches[p]}")
Description: Dynamic Programming (DP) is a problem-solving technique that breaks down complex problems into simpler subproblems and stores their solutions to avoid redundant computations. Key features:
Data Structures Used:
- Arrays/Lists: For tabulation approach to store solutions bottom-up
- Hash Maps/Dictionaries: For memoization approach to cache computed results
- Recursion Stack: When using recursive memoization approach
Common Use Cases:
- Optimization problems (finding min/max values)
- Counting problems (number of ways to achieve something)
- Problems with overlapping subproblems
Example: Computing nth Fibonacci number demonstrates both main DP approaches:
- Top-down memoization using hash map to cache results
- Bottom-up tabulation using array to build solution iteratively
Benefits:
- Reduces time complexity from O(2^n) to O(n)
- Space-time tradeoff using extra memory for better runtime
- Avoids redundant calculations by storing intermediate results
# Top-down memoization approach
def fib_memoization(n, memo=None):
"""Calculate nth Fibonacci number using memoization (top-down DP)
Time: O(n), Space: O(n)"""
if memo is None:
memo = {}
if n in memo:
return memo[n]
if n < 2:
return n
memo[n] = fib_memoization(n-1, memo) + fib_memoization(n-2, memo)
return memo[n]
# Bottom-up tabulation approach
def fib_tabulation(n):
"""Calculate nth Fibonacci number using tabulation (bottom-up DP)
Time: O(n), Space: O(n)"""
if n < 2:
return n
# Initialize table to store solutions of subproblems
dp = [0] * (n + 1)
dp[1] = 1
# Fill table in bottom-up manner
for i in range(2, n + 1):
dp[i] = dp[i-1] + dp[i-2]
return dp[n]
# Space-optimized iterative approach
def fib_optimized(n):
"""Calculate nth Fibonacci number using constant space
Time: O(n), Space: O(1)"""
if n < 2:
return n
prev, curr = 0, 1
for _ in range(2, n + 1):
prev, curr = curr, prev + curr
return curr
# Example usage demonstrating all approaches
n = 10
print(f"Fibonacci({n}) using memoization:", fib_memoization(n)) # 55
print(f"Fibonacci({n}) using tabulation:", fib_tabulation(n)) # 55
print(f"Fibonacci({n}) using optimized:", fib_optimized(n)) # 55
Description: Greedy algorithms make locally optimal choices at each step, hoping to find a global optimum. A classic example is interval scheduling - selecting the maximum number of non-overlapping intervals from a set of intervals.
Key data structures and why they're used:
- List of tuples for intervals:
  - The list provides O(n log n) sorting efficiency
  - Tuples are immutable and memory-efficient for storing (start, end) pairs
  - O(1) random access for iterating through sorted intervals
  - Appending selected intervals is O(1)
- Single variable for tracking the current end time:
  - A simple integer variable suffices since we only need the latest end time
  - O(1) comparisons for checking overlaps
  - Constant space overhead
The greedy strategy is to:
- Sort intervals by end time (earliest ending first)
- Take first interval
- Take next interval that starts after current end time
- Repeat until no more intervals
This produces optimal solution in O(n log n) time due to sorting, with O(n) space complexity.
# Store intervals as list of tuples (start_time, end_time)
# List allows O(n log n) sorting and O(1) appends
# Tuples are immutable and space-efficient for pairs
intervals = [(1,3), (2,4), (3,5), (0,6), (5,7)]
# Sort intervals by end time in O(n log n)
# This is the key greedy choice - earliest finishing intervals
# allow most room for additional intervals
intervals.sort(key=lambda x: x[1])
# Store selected intervals in result list
# List allows O(1) appends and maintains selection order
result = []
# Track end time of last selected interval
# Initialize to -1 to handle first interval
current_end = -1
# Iterate through sorted intervals once, O(n)
# Greedily select non-overlapping intervals
for interval in intervals:
start = interval[0]
end = interval[1]
# If current interval starts after previous end
# we can select it without overlap
if start >= current_end:
result.append(interval)
current_end = end
# Result contains maximum set of non-overlapping intervals
print(f"Maximum non-overlapping intervals: {result}")
Description: A suffix array is a space-efficient data structure that holds the starting indexes of all suffixes of a string in sorted lexicographical order. Key features and data structures used:
- Core data structure is an integer array storing suffix start positions
  - More memory efficient than storing the actual suffix strings
  - Allows O(1) random access to any suffix position
  - Maintains sorted order implicitly through positions
- Often paired with an LCP (Longest Common Prefix) array
  - Stores the length of the common prefix between adjacent suffixes
  - Enables efficient string processing algorithms
  - Takes O(n) additional space
- Common use cases:
  - Pattern matching
  - Finding repeated substrings
  - Data compression
  - Bioinformatics (DNA sequence analysis)
The naive construction approach shown here is O(n² log n), though more efficient O(n) algorithms exist like SA-IS. The naive version helps illustrate the core concept while being simpler to understand.
def suffix_array(s):
"""
Build a suffix array for string s using naive approach.
Data structures used:
- List of tuples: Store (suffix, position) pairs
- Allows O(1) access and O(n log n) sorting
- Tuples are immutable and memory efficient
- First element is suffix string for sorting
- Second element preserves original position
- Final list: Store sorted positions
- Maintains order of suffix positions
- Allows O(1) access by index
Time complexity: O(n² log n)
- O(n) to generate suffixes
- Each suffix comparison takes O(n)
- Sorting takes O(n log n) comparisons
- Total: O(n) * O(n log n) = O(n² log n)
Space complexity: O(n²)
- Storing n suffixes of average length n/2
"""
# Create list of (suffix, position) tuples
# Each suffix starts at position i and goes to end
suffixes = [(s[i:], i) for i in range(len(s))]
# Sort suffixes lexicographically
# Python's sort is stable - maintains relative order of equal elements
suffixes.sort() # O(n² log n) due to string comparisons
# Extract just the positions in sorted order
# List comprehension maintains order while dropping suffixes
return [pos for (suffix, pos) in suffixes]
# Example usage showing suffix array construction
text = "banana"
# Returns [5,3,1,0,4,2] corresponding to suffixes:
# a, ana, anana, banana, na, nana
print(suffix_array(text))
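The LCP array mentioned above can be derived from the suffix array in O(n) with Kasai's algorithm, which carries over matched characters between consecutive suffixes. A minimal sketch:
def lcp_array(s, sa):
    """lcp[i] = length of the common prefix of suffixes sa[i-1] and sa[i] (Kasai, O(n))."""
    n = len(s)
    rank = [0] * n
    for i, pos in enumerate(sa):
        rank[pos] = i                 # Position of each suffix in sorted order
    lcp = [0] * n
    h = 0                             # Matched length carried between iterations
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]       # Suffix preceding suffix i in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1                # Key invariant: next LCP drops by at most 1
        else:
            h = 0
    return lcp

print(lcp_array("banana", suffix_array("banana")))  # [0, 1, 3, 0, 0, 2]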
Description: Suffix trees are advanced data structures that enable extremely fast substring searching in O(m) time (where m is the pattern length). They achieve this by preprocessing the text to build a tree-like structure with the following key components:
Data Structures Used:
- Tree Structure: A rooted tree where:
  - Each edge is labeled with a substring of the text
  - Each internal node has multiple children
  - Each leaf represents a suffix of the original text
  - Enables O(m) substring search by following edges that match the pattern
- Hash Maps/Dictionaries:
  - Used to store the children of each node
  - Map characters to child nodes
  - Provide O(1) access to the next node during traversal
  - Critical for efficient navigation during search
- Suffix Links:
  - Special pointers between internal nodes
  - Enable O(n) construction using Ukkonen's algorithm
  - Connect nodes representing related substrings
  - Essential for linear-time building
While powerful, suffix trees are complex to implement correctly and have a significant memory footprint. The sketch below shows the node structure together with a simple O(n²) construction that inserts suffixes one at a time; a full Ukkonen implementation reuses this node layout but adds suffix links and an "active point" to reach linear time, at the cost of much trickier bookkeeping.
class SuffixTreeNode:
"""
A node in a suffix tree data structure.
Attributes:
children (dict): Maps characters to child nodes
start (int): Starting index of substring on edge to this node
end (int): Ending index of substring on edge to this node
suffix_link (SuffixTreeNode): Link to longest proper suffix
leaf (bool): Whether this is a leaf node
The suffix tree allows O(m) substring searches where m is pattern length.
Each node represents a substring of the original text, with:
- Edge labels = substrings from text
- Root to leaf paths = all suffixes
- Suffix links for O(n) construction
"""
def __init__(self, start=-1, end=None):
self.children = {} # Map from chars to child nodes
self.start = start # Start index of substring on edge
self.end = end # End index of substring on edge
        self.suffix_link = None # Link to longest proper suffix (used by Ukkonen's algorithm; unused in the naive build below)
self.leaf = False # Whether this is a leaf node
def get_edge_length(self):
"""Get length of substring on edge to this node"""
if self.end is None:
return 0
return self.end - self.start
def add_child(self, char, node):
"""Add a child node under given character"""
self.children[char] = node
def get_child(self, char):
"""Get child node for given character"""
return self.children.get(char)
def build_suffix_tree(text):
    """
    Build a suffix tree by inserting each suffix one at a time.
    This naive construction runs in O(n^2) time but is straightforward to
    verify; Ukkonen's algorithm reaches O(n) with suffix links and an
    active point, at the cost of substantially more bookkeeping.
    """
    # Terminal character guarantees every suffix ends at a leaf
    text = text + "$"
    n = len(text)
    root = SuffixTreeNode()
    for i in range(n):  # Insert suffix text[i:]
        node = root
        j = i
        while j < n:
            child = node.get_child(text[j])
            if child is None:
                # No edge starts with text[j]: add a leaf for the rest of the suffix
                leaf = SuffixTreeNode(j, n)
                leaf.leaf = True
                node.add_child(text[j], leaf)
                break
            # Walk down the edge, comparing characters
            k = child.start
            while k < child.end and j < n and text[k] == text[j]:
                k += 1
                j += 1
            if k == child.end:
                # Consumed the whole edge label: continue below this node
                node = child
            else:
                # Mismatch inside the edge: split it at position k
                split = SuffixTreeNode(child.start, k)
                node.add_child(text[child.start], split)
                child.start = k
                split.add_child(text[k], child)
                # New leaf for the remainder of the current suffix
                leaf = SuffixTreeNode(j, n)
                leaf.leaf = True
                split.add_child(text[j], leaf)
                break
    return root
# Example usage
text = "banana"
root = build_suffix_tree(text)
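The payoff is O(m) substring search: walk down from the root, matching the pattern against edge labels. A minimal sketch; note the edge indices refer to the terminated string, so the text passed in must include the trailing '$' added during construction:
def has_substring(root, text, pattern):
    """Check whether pattern occurs in text by walking edge labels.
    `text` must include the terminal '$' appended during construction."""
    node = root
    j = 0
    while j < len(pattern):
        child = node.get_child(pattern[j])
        if child is None:
            return False
        k = child.start
        while k < child.end and j < len(pattern):
            if text[k] != pattern[j]:
                return False
            k += 1
            j += 1
        node = child
    return True

print(has_substring(root, text + "$", "ana"))  # True
print(has_substring(root, text + "$", "nab"))  # False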
Description: Kruskal's algorithm finds a Minimum Spanning Tree (MST) by greedily selecting edges in order of increasing weight while avoiding cycles. Key features:
Data Structures Used:
- Priority Queue/Sorted List: Edges are sorted by weight to process the smallest weights first
- Disjoint Set (Union-Find): Efficiently tracks connected components to detect cycles
  - Uses arrays/lists for O(1) access to parent pointers
  - Path compression and union by rank optimizations
- Graph representation: Edge list format works well since we need to sort edges
  - Each edge stores (weight, vertex1, vertex2)
  - No need for adjacency lists/matrix since we process edges sequentially
The algorithm works by:
- Sorting all edges by weight (O(E log E))
- Processing edges in order, using Union-Find to:
  - Check whether the vertices are already connected (the edge would form a cycle)
  - If not connected, add the edge to the MST and merge the components
- Continuing until the MST has (V-1) edges
Time Complexity: O(E log E) for the sort + O(E * α(V)) for the Union-Find operations
Space Complexity: O(V) for the Union-Find data structure
class UnionFind:
"""
A Union-Find (Disjoint Set) data structure that efficiently tracks disjoint sets
and supports union operations. Used in Kruskal's to detect cycles.
Key features:
- Uses path compression in find() for near O(1) operations
- Uses union by rank to keep trees balanced
- parent[]: Each index maps to its parent node (initially itself)
- rank[]: Tracks approximate depth of each tree to keep balanced
Time complexity:
- find(): O(α(n)) amortized (α is inverse Ackermann, effectively constant)
- union(): O(α(n)) amortized
"""
def __init__(self, n):
"""Initialize n disjoint sets, each element is its own set"""
self.parent = list(range(n)) # Each element starts as its own parent
self.rank = [0]*n # Track tree depths to keep balanced
def find(self, x):
"""Find set representative for element x with path compression"""
if self.parent[x] != x:
# Path compression: Make all nodes on path point to root
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
"""
Union sets containing x and y if they're different.
Uses union by rank to keep trees balanced.
Returns True if union performed, False if already in same set.
"""
rx, ry = self.find(x), self.find(y)
if rx != ry:
# Union by rank: Attach smaller rank tree under root of higher rank
if self.rank[rx] < self.rank[ry]:
rx, ry = ry, rx # Ensure rx has higher/equal rank
self.parent[ry] = rx
# Increase rank of rx if ranks were equal
if self.rank[rx] == self.rank[ry]:
self.rank[rx] += 1
return True
return False
def kruskal(edges, n):
"""
Kruskal's MST Algorithm using Union-Find to detect cycles.
Args:
edges: List of tuples (weight, u, v) representing weighted edges
n: Number of vertices (0 to n-1)
Returns:
List of (u,v,weight) tuples in the MST
Time complexity: O(E log E) due to sorting edges
Space complexity: O(V) for UnionFind + O(E) for edge list
"""
# Sort edges by weight to process cheapest first
edges.sort(key=lambda x: x[0])
# Initialize Union-Find to track connected components
uf = UnionFind(n)
# Build MST by adding edges that don't create cycles
mst = []
for w, u, v in edges:
# Union returns True if u,v were in different components
if uf.union(u,v):
mst.append((u,v,w))
return mst
# Example usage demonstrating a simple graph:
# Vertices: 0,1,2,3 forming a diamond shape
# Edges: (weight, from_vertex, to_vertex)
edges = [
(1,0,1), # Weight 1 edge from 0->1
(3,0,2), # Weight 3 edge from 0->2
(2,1,2), # Weight 2 edge from 1->2
(4,2,3) # Weight 4 edge from 2->3
]
print(kruskal(edges, 4)) # Returns MST edges in order added
# Output: [(0,1,1), (1,2,2), (2,3,4)] representing minimum spanning tree
Description: Prim's algorithm finds a Minimum Spanning Tree (MST) by iteratively growing a tree from a starting vertex. Key features:
Data Structures:
- Priority Queue (Min Heap): Essential for efficiently selecting the next lightest edge. Stores (weight, vertex, parent) tuples and provides O(log V) operations.
- Visited Set: Tracks vertices already in MST to avoid cycles. O(1) lookups.
- Adjacency List: Graph representation allowing O(1) access to neighbors. More space efficient than matrix for sparse graphs.
The algorithm maintains a "frontier" of edges in the priority queue, always greedily selecting the lightest edge that connects to an unvisited vertex. This guarantees optimality because:
- Any MST must cross each cut of the graph with its lightest edge
- The priority queue ensures we always select the current lightest edge
- The visited set ensures we don't create cycles
Ideal for dense graphs where Kruskal's algorithm's edge sorting becomes expensive.
import heapq
def prim(graph, start):
"""
Prim's MST Algorithm using a min heap priority queue.
Args:
graph: Dict representing adjacency list {vertex: [(neighbor, weight), ...]}
start: Starting vertex to grow MST from
Returns:
List of (u,v,weight) tuples in the MST
Time complexity: O(E log V) using binary heap
Space complexity: O(V + E) for visited set and heap
"""
visited = set() # Track vertices in MST
mst = [] # Store MST edges
total_cost = 0 # Track total weight of MST
# Priority queue stores (weight, vertex, parent) tuples
min_heap = [(0, start, None)]
while min_heap and len(visited) < len(graph):
weight, curr_vertex, parent = heapq.heappop(min_heap)
# Skip if vertex already in MST
if curr_vertex in visited:
continue
visited.add(curr_vertex)
# Add edge to MST (except for start vertex)
if parent is not None:
mst.append((parent, curr_vertex, weight))
total_cost += weight
# Add edges to unvisited neighbors
for neighbor, edge_weight in graph[curr_vertex]:
if neighbor not in visited:
heapq.heappush(min_heap, (edge_weight, neighbor, curr_vertex))
return mst, total_cost
# Example usage showing different graph configurations:
# Diamond shaped graph (same as Kruskal's example)
diamond_graph = {
0: [(1,1), (2,3)],
1: [(0,1), (2,2), (3,5)],
2: [(0,3), (1,2), (3,4)],
3: [(1,5), (2,4)]
}
# More complex graph with multiple possible MSTs
complex_graph = {
0: [(1,4), (2,3)],
1: [(0,4), (2,1), (3,2)],
2: [(0,3), (1,1), (3,4), (4,5)],
3: [(1,2), (2,4), (4,7)],
4: [(2,5), (3,7)]
}
# Test both graphs
print("Diamond Graph MST:")
mst_edges, total_weight = prim(diamond_graph, 0)
print(f"MST Edges: {mst_edges}")
print(f"Total Weight: {total_weight}\n")
print("Complex Graph MST:")
mst_edges, total_weight = prim(complex_graph, 0)
print(f"MST Edges: {mst_edges}")
print(f"Total Weight: {total_weight}")
Description: A* is an advanced pathfinding algorithm that combines Dijkstra's shortest path with heuristic guidance to efficiently find optimal paths. Key features:
Data Structures Used:
- Priority Queue (Min-Heap): Stores nodes to explore, prioritized by f(n) = g(n) + h(n)
  - g(n): Actual cost from start to the current node
  - h(n): Heuristic estimate of the cost from the current node to the goal
  - Enables efficient O(log n) selection of the most promising node
- Dictionary/Hash Map:
  - Tracks parent pointers for path reconstruction
  - Stores g-scores and f-scores for each node
  - Provides O(1) lookups
- Set:
  - Maintains visited nodes
  - Ensures O(1) lookup to avoid revisiting nodes
- Grid Array:
  - 2D array representing the map/maze
  - Allows O(1) access to check valid moves
The algorithm guarantees finding the optimal path when using an admissible heuristic (never overestimates). Common use cases include:
- Video game pathfinding
- Robot navigation
- Route planning
- Maze solving
Below is an implementation for grid-based pathfinding using Manhattan distance heuristic.
import heapq
from typing import List, Tuple, Dict, Set, Optional
def heuristic(a: Tuple[int, int], b: Tuple[int, int]) -> float:
"""Calculate heuristic distance between two points.
Args:
a: Starting point coordinates (x,y)
b: Goal point coordinates (x,y)
Returns:
Estimated distance between points using Manhattan distance
"""
# Manhattan distance on a grid - admissible heuristic that never overestimates
return abs(a[0] - b[0]) + abs(a[1] - b[1])
def get_neighbors(pos: Tuple[int, int], grid: List[List[int]]) -> List[Tuple[int, int]]:
"""Get valid neighboring positions.
Args:
pos: Current position (x,y)
grid: 2D grid map
Returns:
List of valid neighbor positions
"""
rows, cols = len(grid), len(grid[0])
x, y = pos
neighbors = []
# Check all 4 adjacent cells
for dx, dy in [(1,0), (-1,0), (0,1), (0,-1)]:
nx, ny = x + dx, y + dy
if (0 <= nx < rows and 0 <= ny < cols and grid[nx][ny] == 0):
neighbors.append((nx, ny))
return neighbors
def reconstruct_path(parent: Dict[Tuple[int, int], Tuple[int, int]],
current: Tuple[int, int]) -> List[Tuple[int, int]]:
"""Reconstruct path from parent pointers.
Args:
parent: Dictionary mapping positions to their parent in the path
current: Current/goal position to trace back from
Returns:
List of positions forming the path from start to goal
"""
path = []
while current is not None:
path.append(current)
current = parent.get(current)
return path[::-1]
def astar_search(grid: List[List[int]],
start: Tuple[int, int],
goal: Tuple[int, int]) -> Optional[List[Tuple[int, int]]]:
"""Find shortest path using A* search algorithm.
Args:
grid: 2D grid where 0=walkable, 1=blocked
start: Starting position coordinates (x,y)
goal: Goal position coordinates (x,y)
Returns:
List of positions forming shortest path if one exists, None otherwise
"""
# Priority queue ordered by f-score (g + h)
open_set = [(0, start)]
# Track g-scores (actual distance from start)
g_score = {start: 0}
# Track parent pointers for path reconstruction
parent = {start: None}
# Track visited nodes
closed_set: Set[Tuple[int, int]] = set()
while open_set:
# Get node with lowest f-score
f_score, current = heapq.heappop(open_set)
if current in closed_set:
continue
if current == goal:
return reconstruct_path(parent, current)
closed_set.add(current)
# Check all neighbors
for neighbor in get_neighbors(current, grid):
if neighbor in closed_set:
continue
# 1 is the distance between adjacent cells
tentative_g = g_score[current] + 1
if neighbor not in g_score or tentative_g < g_score[neighbor]:
parent[neighbor] = current
g_score[neighbor] = tentative_g
f_score = tentative_g + heuristic(neighbor, goal)
heapq.heappush(open_set, (f_score, neighbor))
return None # No path exists
# Example usage with different maze configurations
simple_maze = [
[0,0,0,1],
[0,1,0,0],
[0,0,0,0],
[1,0,0,0]
]
complex_maze = [
[0,0,0,1,0],
[1,1,0,1,0],
[0,0,0,0,0],
[0,1,1,1,0],
[0,0,0,1,0]
]
# Test simple maze
start, goal = (0,0), (3,3)
path = astar_search(simple_maze, start, goal)  # One optimal 7-cell path, e.g. [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 3)]; the exact route depends on tie-breaking
print("Simple maze path:", path)
# Test complex maze
start, goal = (0,0), (4,4)
path = astar_search(complex_maze, start, goal)  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 3), (2, 4), (3, 4), (4, 4)]
print("Complex maze path:", path)
# Test impossible maze
impossible_maze = [
[0,0,1],
[1,1,1],
[0,0,0]
]
start, goal = (0,0), (2,2)
path = astar_search(impossible_maze, start, goal)
print("Impossible maze path:", path)  # None - the wall of 1's blocks every route from (0,0) to (2,2)
Description: Dynamic Programming solution for finding the length of the Longest Increasing Subsequence (LIS) in O(n²) time. Key features:
Data Structures Used:
- DP Array:
  - 1D array storing the length of the LIS ending at each index
  - dp[i] represents the length of the LIS ending at index i
  - Enables efficient subproblem reuse and optimal solution building
  - O(n) space complexity
  - Provides O(1) access to previous solutions
- Input Array:
  - Original sequence stored as an array/list
  - Left unmodified to preserve values during the DP computation
  - Supports O(1) element comparisons
The algorithm systematically builds solutions by:
- Initializing dp[i] = 1 for all indices (single element subsequences)
- For each position i, checking all previous positions j < i
- If arr[j] < arr[i], can extend subsequence ending at j
- Taking maximum of all possible extensions
Common applications include:
- Stock price analysis
- Box stacking problems
- Chain optimization
- Sequence analysis
def longest_increasing_subsequence(arr):
"""Find length and one possible longest increasing subsequence.
Args:
arr: Input array of numbers
Returns:
Tuple of (length, subsequence) where subsequence is one possible LIS
"""
if not arr:
return 0, []
n = len(arr)
# dp[i] stores length of LIS ending at index i
dp = [1] * n
# prev[i] stores previous index in LIS ending at i
prev = [-1] * n
# Track index of cell containing longest subsequence
max_length = 1
max_index = 0
# Build solutions bottom-up
for i in range(n):
for j in range(i):
if arr[j] < arr[i]:
# Can extend subsequence ending at j
if dp[j] + 1 > dp[i]:
dp[i] = dp[j] + 1
prev[i] = j
# Update max if needed
if dp[i] > max_length:
max_length = dp[i]
max_index = i
# Reconstruct subsequence by walking back through prev pointers
subsequence = []
curr = max_index
while curr != -1:
subsequence.append(arr[curr])
curr = prev[curr]
return max_length, subsequence[::-1] # Reverse to get correct order
# Example usage showing both length and subsequence
sequence = [10, 9, 2, 5, 3, 7, 101, 18]
length, subseq = longest_increasing_subsequence(sequence)
print(f"Length: {length}") # 4
print(f"One possible LIS: {subseq}") # [2, 5, 7, 101]
# Test empty array
print(longest_increasing_subsequence([])) # (0, [])
# Test array with one element
print(longest_increasing_subsequence([5])) # (1, [5])
# Test strictly decreasing array
print(longest_increasing_subsequence([5,4,3,2,1])) # (1, [5])
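When only the length is required, a well-known refinement runs in O(n log n): keep the smallest possible tail value for each subsequence length and binary-search it with bisect. A minimal sketch:
from bisect import bisect_left

def lis_length(arr):
    """O(n log n) LIS length: tails[k] holds the smallest tail of any
    increasing subsequence of length k+1 seen so far."""
    tails = []
    for x in arr:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[i] = x      # x improves (lowers) an existing tail
    return len(tails)

print(lis_length([10, 9, 2, 5, 3, 7, 101, 18]))  # 4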
Description: The Minimum Edit Distance (Levenshtein Distance) algorithm calculates the minimum number of operations needed to transform one string into another. Key features:
Data Structures Used:
- 2D DP Table (Matrix):
  - Size (m+1) x (n+1), where m, n are the string lengths
  - Each cell [i,j] represents the minimum edits needed for the prefixes s1[0:i], s2[0:j]
  - Enables efficient subproblem reuse and optimal solution building
  - O(mn) space complexity
- Strings:
  - Input strings stored as arrays/strings for O(1) character access
  - Left unmodified to preserve the original strings during comparison
  - Support efficient substring operations
The algorithm systematically fills the DP table by considering three possible operations at each step:
- Insertion: Add a character (cost of 1)
- Deletion: Remove a character (cost of 1)
- Substitution: Replace a character (cost of 1 if chars different, 0 if same)
Common applications include:
- Spell checking and correction
- DNA sequence alignment
- Natural language processing
- Fuzzy string matching
def edit_distance(s1: str, s2: str) -> int:
"""Calculate minimum edit distance between two strings.
Args:
s1: First string
s2: Second string
Returns:
Minimum number of operations (insert/delete/substitute) to transform s1 to s2
The DP table is filled using the recurrence relation:
If chars match: dp[i][j] = dp[i-1][j-1]
If chars differ: dp[i][j] = 1 + min(
dp[i-1][j], # deletion
dp[i][j-1], # insertion
dp[i-1][j-1] # substitution
)
"""
m, n = len(s1), len(s2)
# Create DP table with base cases
dp = [[0]*(n+1) for _ in range(m+1)]
# Base cases - transforming to/from empty string
for i in range(m+1):
dp[i][0] = i # Deletions needed
for j in range(n+1):
dp[0][j] = j # Insertions needed
# Fill DP table
for i in range(1, m+1):
for j in range(1, n+1):
if s1[i-1] == s2[j-1]:
# Characters match - no operation needed
dp[i][j] = dp[i-1][j-1]
else:
# Take minimum of three operations
dp[i][j] = 1 + min(
dp[i-1][j], # deletion
dp[i][j-1], # insertion
dp[i-1][j-1] # substitution
)
return dp[m][n]
# Example usage with different test cases
print(edit_distance("kitten", "sitting")) # 3 operations
print(edit_distance("sunday", "saturday")) # 3 operations
print(edit_distance("", "abc")) # 3 operations (all insertions)
print(edit_distance("abc", "")) # 3 operations (all deletions)
print(edit_distance("same", "same")) # 0 operations
Below are several additional dynamic programming (DP) examples, illustrating both memoization (top-down) and tabulation (bottom-up) approaches along with common use cases. They cover classic problems such as 0/1 Knapsack, Coin Change, Longest Common Subsequence (LCS), and a simple path-counting problem. Each snippet is preceded by a brief note on where the approach is beneficial and when to prefer tabulation versus memoization.
The 0/1 Knapsack problem is a classic optimization problem that demonstrates the power of dynamic programming. Given a set of items, each with a weight and a value, and a capacity constraint on the total weight, the goal is to maximize the total value without exceeding the capacity.
Key Data Structures Used:
- Arrays/Lists: Store the weights and values of items
  - Provide O(1) access to item properties
  - Maintain item ordering for consistent lookup
- 2D DP Array/Dictionary:
  - Memoization: Dictionary mapping (item_index, remaining_capacity) to the optimal value
  - Tabulation: 2D array where dp[i][w] represents the optimal value for the first i items with capacity w
  - Enables O(1) lookup of subproblem solutions
  - Space complexity O(n*W), where n is the number of items and W is the capacity
This problem appears frequently in:
- Resource allocation (CPU, memory, storage)
- Investment portfolio optimization
- Cargo loading and logistics
- Project selection under budget constraints
- Scheduling with limited time/resources
The binary (0/1) constraint of taking an item completely or not at all makes this harder than fractional knapsack, requiring dynamic programming rather than a greedy approach.
Memoization (Top-Down):
def knapsack_memoization(weights: list[int], values: list[int], capacity: int) -> int:
"""
Solves the 0/1 Knapsack problem using memoization (top-down DP).
Args:
weights: List of item weights
values: List of item values
capacity: Maximum weight capacity
Returns:
Maximum value achievable within weight capacity
Time Complexity: O(n*W) where n is number of items and W is capacity
Space Complexity: O(n*W) for memoization dictionary
"""
if not weights or not values or len(weights) != len(values):
return 0
n = len(weights)
memo = {} # Maps (item_index, remaining_capacity) to optimal value
def dfs(i: int, remaining_cap: int) -> int:
# Base cases
if i == n: # No more items to consider
return 0
if remaining_cap <= 0: # No capacity left
return 0
# Check if subproblem already solved
if (i, remaining_cap) in memo:
return memo[(i, remaining_cap)]
# Try excluding current item
exclude_value = dfs(i + 1, remaining_cap)
# Try including current item if possible
include_value = 0
if weights[i] <= remaining_cap:
include_value = values[i] + dfs(i + 1, remaining_cap - weights[i])
# Take maximum and cache result
optimal_value = max(exclude_value, include_value)
memo[(i, remaining_cap)] = optimal_value
return optimal_value
return dfs(0, capacity)
# Example usage with different test cases
weights = [2, 3, 4, 5] # Item weights
values = [3, 4, 5, 6] # Corresponding values
capacity = 5 # Knapsack capacity
# Should return 7 (optimal solution takes items with weights 2,3 and values 3,4)
print(knapsack_memoization(weights, values, capacity))
# Edge cases
print(knapsack_memoization([], [], 5)) # 0 (empty lists)
print(knapsack_memoization([1], [10], 0)) # 0 (no capacity)
print(knapsack_memoization([5], [10], 3)) # 0 (item too heavy)
Tabulation (Bottom-Up):
def knapsack_tabulation(weights: list[int], values: list[int], capacity: int) -> int:
"""Solve 0/1 knapsack using bottom-up dynamic programming.
Args:
weights: List of item weights
values: List of item values
capacity: Maximum weight capacity
Returns:
Maximum achievable value within weight capacity
Time Complexity: O(n*W) where n is number of items and W is capacity
Space Complexity: O(n*W) for dp table
"""
if not weights or not values or len(weights) != len(values):
return 0
n = len(weights)
# dp[i][w] represents max value achievable using first i items with capacity w
dp = [[0]*(capacity+1) for _ in range(n+1)]
# Build table bottom-up
for i in range(1, n+1):
for w in range(capacity+1):
# Not taking item i-1
dp[i][w] = dp[i-1][w]
# Taking item i-1 if possible
if weights[i-1] <= w:
dp[i][w] = max(dp[i][w],
values[i-1] + dp[i-1][w-weights[i-1]])
return dp[n][capacity]
# Example usage with different test cases
weights = [2, 3, 4, 5] # Item weights
values = [3, 4, 5, 6] # Corresponding values
capacity = 5 # Knapsack capacity
# Should return 7 (optimal solution takes items with weights 2,3 and values 3,4)
print(knapsack_tabulation(weights, values, capacity))
# Edge cases
print(knapsack_tabulation([], [], 5)) # 0 (empty lists)
print(knapsack_tabulation([1], [10], 0)) # 0 (no capacity)
print(knapsack_tabulation([5], [10], 3)) # 0 (item too heavy)
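The tabulation can be compressed further to a single 1D array by iterating capacities in reverse, which guarantees each item is counted at most once. A minimal sketch:
def knapsack_1d(weights: list[int], values: list[int], capacity: int) -> int:
    """0/1 knapsack in O(W) space; the reverse loop prevents reusing an item."""
    dp = [0] * (capacity + 1)
    for i in range(len(weights)):
        # Iterate capacities high-to-low so dp[w - weights[i]] still
        # reflects the state before item i was considered
        for w in range(capacity, weights[i] - 1, -1):
            dp[w] = max(dp[w], values[i] + dp[w - weights[i]])
    return dp[capacity]

print(knapsack_1d([2, 3, 4, 5], [3, 4, 5, 6], 5))  # 7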
The Coin Change problem (minimizing coins) is common in making change problems, currency systems, and combinational optimization tasks. Given a set of denominations and a target amount, we want the fewest coins to make that amount. If it can't be formed exactly, return -1.
Data Structures Used:
- Dynamic Programming Array/Table:
  - 1D array of size amount+1 storing the minimum coins needed for each subproblem
  - dp[i] represents the minimum coins needed for amount i
  - Enables O(1) lookup of previously solved subproblems
  - Critical for avoiding redundant calculations
- Set/Array for Coin Denominations:
  - Stores the available coin values
  - Allows O(1) access to each denomination
  - Order doesn't matter since we try all combinations
The problem exhibits optimal substructure (minimum solution for amount n uses minimum solutions for smaller amounts) and overlapping subproblems (same smaller amounts calculated repeatedly), making it ideal for dynamic programming. Common applications include:
- Making change with minimum coins
- Currency exchange optimization
- Resource allocation problems
- Payment systems
Memoization (Top-Down):
def coin_change_memo(coins: list[int], amount: int) -> int:
"""Find minimum number of coins needed to make given amount.
Args:
coins: List of coin denominations available
amount: Target amount to make
Returns:
Minimum number of coins needed, or -1 if amount cannot be made
Uses memoization (top-down DP) to avoid recalculating subproblems.
Time complexity: O(amount * len(coins))
Space complexity: O(amount) for memoization cache
"""
# Cache to store minimum coins needed for each amount
memo = {}
def dfs(remaining: int) -> int:
"""Recursive helper with memoization."""
# Base cases
if remaining == 0:
return 0
if remaining < 0:
return float('inf')
if remaining in memo:
return memo[remaining]
# Try using each coin and take minimum
min_coins = float('inf')
for coin in coins:
min_coins = min(min_coins, 1 + dfs(remaining - coin))
# Cache result before returning
memo[remaining] = min_coins
return min_coins
# Handle edge cases
if not coins or amount < 0:
return -1
result = dfs(amount)
return result if result != float('inf') else -1
# Example usage with test cases
test_cases = [
([1,2,5], 11), # Should return 3 (5+5+1)
([2], 3), # Should return -1 (impossible)
([1], 0), # Should return 0 (empty sum)
([1,5,10,25], 30), # Should return 3 (25+5)
([], 5) # Should return -1 (no coins)
]
for coins, amount in test_cases:
result = coin_change_memo(coins, amount)
print(f"Coins={coins}, Amount={amount}: {result}")
Tabulation (Bottom-Up):
def coin_change_tab(coins: list[int], amount: int) -> int:
"""
Solves the coin change problem using tabulation (bottom-up DP).
Args:
coins: List of coin denominations
amount: Target amount to make change for
Returns:
Minimum number of coins needed, or -1 if impossible
Time Complexity: O(amount * len(coins))
Space Complexity: O(amount) for dp array
"""
# Handle edge cases
if not coins or amount < 0:
return -1
if amount == 0:
return 0
# Initialize dp array with infinity
# dp[i] represents min coins needed for amount i
dp = [float('inf')] * (amount + 1)
dp[0] = 0 # Base case - 0 coins needed for amount 0
# Build up solution for each amount from 1 to target
for curr_amount in range(1, amount + 1):
# Try using each coin denomination
for coin in coins:
if coin <= curr_amount:
# Take minimum of current solution and
# 1 + solution for remaining amount
dp[curr_amount] = min(
dp[curr_amount],
1 + dp[curr_amount - coin]
)
# Return -1 if no solution found
return dp[amount] if dp[amount] != float('inf') else -1
# Example usage with test cases
test_cases = [
([1,2,5], 11), # Should return 3 (5+5+1)
([2], 3), # Should return -1 (impossible)
([1], 0), # Should return 0 (empty sum)
([1,5,10,25], 30), # Should return 2 (25+5)
([], 5) # Should return -1 (no coins)
]
for coins, amount in test_cases:
result = coin_change_tab(coins, amount)
print(f"Coins={coins}, Amount={amount}: {result}")
The Longest Common Subsequence (LCS) problem finds the longest sequence of characters that appears in order (though not necessarily contiguously) in both strings. Its most important applications are listed under Common Applications below.
Data Structures Used:
- 2D DP Array/Dictionary:
  - Tabulation: dp[i][j] represents the LCS length of the suffixes s1[i:] and s2[j:]
  - Memoization: a dictionary maps each (i,j) position to the LCS length of those same suffixes
  - Enables O(1) lookup of subproblem solutions
  - Space complexity O(m*n) where m,n are the string lengths
- Strings/Arrays:
  - Input sequences stored as strings/arrays for O(1) character access
  - Output LCS built by backtracking through the DP array (see the sketch after the tabulation code below)
  - Maintains character ordering for the subsequence property
Common Applications:
- Bioinformatics: DNA/protein sequence alignment and comparison
- Version Control (diff): Finding common lines between file versions
- Spell Checking: Identifying similar words for suggestions
- Plagiarism Detection: Finding shared text between documents
- File Comparison: Detecting similarities in text files
The dynamic programming approach is necessary because the problem exhibits:
- Optimal substructure (solution built from smaller subproblems)
- Overlapping subproblems (same subproblems solved repeatedly)
Memoization (Top-Down):
def lcs_memo(s1: str, s2: str) -> int:
"""
Find length of Longest Common Subsequence using memoization.
Args:
s1: First string
s2: Second string
Returns:
Length of longest common subsequence
Time Complexity: O(m*n) where m,n are string lengths
Space Complexity: O(m*n) for memoization dictionary
"""
# Handle edge cases
if not s1 or not s2:
return 0
memo = {} # Cache for subproblems
def dfs(i: int, j: int) -> int:
"""Recursive helper with memoization"""
# Base case - reached end of either string
if i == len(s1) or j == len(s2):
return 0
# Return cached result if available
if (i,j) in memo:
return memo[(i,j)]
# If characters match, include in LCS and move both pointers
if s1[i] == s2[j]:
memo[(i,j)] = 1 + dfs(i+1, j+1)
else:
# Characters don't match - try both possibilities
# and take maximum
memo[(i,j)] = max(
dfs(i+1, j), # Skip character in s1
dfs(i, j+1) # Skip character in s2
)
return memo[(i,j)]
return dfs(0, 0)
# Example usage with different test cases
test_cases = [
("abcde", "ace"), # 3 (ace)
("abc", "abc"), # 3 (abc)
("abc", "def"), # 0 (no common subsequence)
("", "abc"), # 0 (empty string)
("long", "stone"), # 3 (lon)
]
for s1, s2 in test_cases:
result = lcs_memo(s1, s2)
print(f"LCS length of '{s1}' and '{s2}': {result}")
Tabulation (Bottom-Up):
def lcs_tab(s1: str, s2: str) -> int:
"""
Finds length of Longest Common Subsequence using tabulation (bottom-up DP).
Args:
s1: First string
s2: Second string
Returns:
Length of longest common subsequence
Time Complexity: O(m*n) where m,n are string lengths
Space Complexity: O(m*n) for dp table
"""
# Handle edge cases
if not s1 or not s2:
return 0
# Create dp table with extra row/col for empty string base case
m, n = len(s1), len(s2)
dp = [[0]*(n+1) for _ in range(m+1)]
# Fill dp table bottom-up
for i in range(m-1, -1, -1):
for j in range(n-1, -1, -1):
if s1[i] == s2[j]:
# Characters match - include in LCS
dp[i][j] = 1 + dp[i+1][j+1]
else:
# Characters don't match - take max of skipping either
dp[i][j] = max(dp[i+1][j], dp[i][j+1])
return dp[0][0]
# Example usage with different test cases
test_cases = [
("abcde", "ace"), # 3 (ace)
("abc", "abc"), # 3 (abc)
("abc", "def"), # 0 (no common subsequence)
("", "abc"), # 0 (empty string)
("long", "stone"), # 3 (lon)
]
for s1, s2 in test_cases:
result = lcs_tab(s1, s2)
print(f"LCS length of '{s1}' and '{s2}': {result}")
When to Use Memoization (Top-Down):
Memoization is often more intuitive if you think recursively: define a recursive function that computes the answer for a subproblem and store the results. It’s useful if you find it easier to start from the original problem and break it down into subproblems. Memoization can also be beneficial when you only need to solve a subset of possible subproblems (not all of them), potentially saving work.
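In Python specifically, functools.lru_cache can often stand in for a hand-rolled memo dictionary. A minimal sketch applying it to the coin-change recurrence shown earlier (coin_change_lru is our name for this variant):
from functools import lru_cache
def coin_change_lru(coins: list[int], amount: int) -> int:
    """Coin change with memoization delegated to functools.lru_cache."""
    @lru_cache(maxsize=None)
    def dfs(remaining: int) -> float:
        if remaining == 0:
            return 0
        if remaining < 0:
            return float('inf')
        # Try each coin; inf propagates when no combination works
        return min((1 + dfs(remaining - c) for c in coins), default=float('inf'))
    if not coins or amount < 0:
        return -1
    result = dfs(amount)
    return int(result) if result != float('inf') else -1
print(coin_change_lru([1, 2, 5], 11))  # 3
The trade-off is the usual one for top-down DP: recursion depth grows with the target amount, so very large inputs may hit Python's recursion limit where the tabulated version would not.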
When to Use Tabulation (Bottom-Up):
Tabulation requires you to identify the order in which to fill in a DP table starting from base cases. It can be more efficient and sometimes simpler once you know the exact dependencies between subproblems. If you know you’ll need to compute the answers for all subproblems, bottom-up can be cleaner and might have slightly better constants since it avoids the overhead of recursion and hashing for memoization.
Where DP Is Used:
DP is used in a wide variety of scenarios:
- Optimization problems (maximizing or minimizing some value, e.g., knapsack, coin change)
- Counting problems (number of ways to achieve something, e.g., counting paths, counting combinations)
- Sequence problems (finding longest subsequences, edit distances)
- Graph algorithms (shortest paths in special cases, e.g. DAGs)
- Scheduling and resource allocation (splitting tasks to optimize time or cost)
- Text processing and biological sequence alignment (LCS, edit distance)
In essence, DP is particularly useful when:
- The problem can be broken down into smaller overlapping subproblems.
- There is an optimal substructure: the optimal solution can be constructed from optimal solutions to subproblems.
- There is no simple greedy choice that always leads to an optimal solution.
Description: Huffman coding constructs an optimal prefix-free binary code by building a special binary tree from symbol frequencies.
Data Structures Used:
- Min Heap (Priority Queue):
  - Stores nodes prioritized by frequency
  - Enables efficient O(log n) selection of the minimum-frequency nodes
  - Critical for greedy selection of nodes to merge
- Binary Tree:
  - Internal nodes store merged frequencies
  - Leaf nodes contain characters
  - Left/right edges represent 0/1 bits
  - Path from root to leaf gives a character's code
- Hash Map/Dictionary:
  - Maps characters to frequencies initially
  - Later maps characters to binary codes
  - Provides O(1) lookups
The algorithm works by repeatedly merging the two lowest-frequency nodes until a single tree remains. This greedy approach guarantees an optimal prefix-free code, one that minimizes the total weighted path length (the average number of bits per character). Common applications include:
- Text and file compression utilities
- General-purpose data compression (e.g., the Huffman stage of DEFLATE, used by zip/gzip)
- Network protocols (e.g., HTTP/2 HPACK header compression)
Below is a code snippet for building the Huffman tree and printing the optimal binary codes.
import heapq
from collections import defaultdict
class HuffmanNode:
"""Node in Huffman tree containing character, frequency and children"""
def __init__(self, char, freq):
self.char = char
self.freq = freq
self.left = None
self.right = None
def __lt__(self, other):
# For heap comparison
return self.freq < other.freq
def build_huffman_tree(text):
"""Build Huffman tree from input text"""
# Count frequencies
freq = defaultdict(int)
for char in text:
freq[char] += 1
# Create leaf nodes and add to min heap
heap = []
for char, count in freq.items():
node = HuffmanNode(char, count)
heapq.heappush(heap, node)
# Merge nodes until only root remains
while len(heap) > 1:
# Get two minimum frequency nodes
left = heapq.heappop(heap)
right = heapq.heappop(heap)
# Create internal node with combined frequency
internal = HuffmanNode(None, left.freq + right.freq)
internal.left = left
internal.right = right
# Add back to heap
heapq.heappush(heap, internal)
return heap[0]
def generate_codes(root, code="", codes=None):
"""Generate binary codes for each character by traversing tree"""
if codes is None:
codes = {}
# Leaf node - store code
if root.char is not None:
codes[root.char] = code
        return codes  # return the dict so a single-leaf tree still yields its codes
# Traverse left (0) and right (1)
generate_codes(root.left, code + "0", codes)
generate_codes(root.right, code + "1", codes)
return codes
def huffman_encoding(text):
"""Encode text using Huffman coding"""
# Handle empty input
if not text:
return "", None
# Build tree and generate codes
root = build_huffman_tree(text)
codes = generate_codes(root)
# Encode text
encoded = "".join(codes[char] for char in text)
return encoded, root
# Example usage
text = "this is an example for huffman encoding"
encoded, tree = huffman_encoding(text)
print("Original text:", text)
# Output:
# Original text: this is an example for huffman encoding
print("Huffman Codes:", generate_codes(tree))
print("Encoded text:", encoded)
# Note: the exact tree shape, and therefore the individual codes and the
# encoded bit string, depends on how the heap breaks ties between nodes of
# equal frequency, so the printed output can vary across runs and
# implementations. Every valid result is prefix-free (no code is a prefix of
# another), frequent characters such as ' ' and 'n' receive shorter codes than
# rare ones such as 'x', and the total encoded length is the same for every
# optimal tree.
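Decoding is the mirror image: walk the tree bit by bit from the root, emitting a character whenever a leaf is reached. A minimal sketch assuming the encoded string and tree produced above (huffman_decoding is a hypothetical helper; it ignores the degenerate single-symbol input, where every code would be empty):
def huffman_decoding(encoded: str, root) -> str:
    """Decode a Huffman-encoded bit string by walking the tree."""
    if not encoded or root is None:
        return ""
    decoded = []
    node = root
    for bit in encoded:
        # 0 goes left, 1 goes right, mirroring generate_codes
        node = node.left if bit == "0" else node.right
        if node.char is not None:  # reached a leaf
            decoded.append(node.char)
            node = root  # restart from the root for the next character
    return "".join(decoded)
print(huffman_decoding(encoded, tree) == text)  # True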