The Ultimate Guide to Mastering Data Structures and Algorithms

Being proficient in data structures and algorithms is the computer science equivalent of having a superpower. Everyone assumes you're a genius, you receive high-paying offers from prestigious companies, and you even gain a high social market value on internet forums. However, the journey to becoming an algorithm expert is not an easy one; it's often filled with frustration and imposter syndrome.

Having navigated this path and secured offers from top-tier tech firms, I want to give back and help even the most confused programmers. Consider this article your guide.

The "Why" Before the "How"

The most crucial part of learning something new is genuinely wanting to learn it. Before we begin, ask yourself: why do you want to learn this? Don't do it just to land a job at a major tech company. Do it because mastering these concepts fundamentally changes the way you think.

Data structures and algorithms are about solving problems efficiently. A bad programmer solves a problem inefficiently, and a truly bad programmer doesn't even understand why their solution is inefficient. So, let's start there: how do we measure an algorithm's efficiency?

Understanding Algorithmic Efficiency

Imagine we wrote a function that iterates through every number in a list and adds it to a total sum variable. If you consider the addition as a single operation, running this function on a list with 10 numbers costs 10 operations. Running it on a list with 20 numbers costs 20 operations.

Now, say we wrote another function that simply returns the first number in a list. No matter how large the list is, this function will never cost more than one operation.
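To make the comparison concrete, here is a minimal sketch of both functions in Python (the function names are mine, chosen for illustration):

```
def sum_list(numbers):
    # O(n): one addition per element, so the work grows with the list.
    total = 0
    for value in numbers:
        total += value
    return total

def first_element(numbers):
    # O(1): a single lookup, no matter how long the list is.
    return numbers[0]
```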

Clearly, these two algorithms have different time complexities—a relationship between the growth of input size and the growth of operations. We communicate these time complexities using Big O notation, referring to the input size as n.

Several common complexities include:

- Constant: O(1)
- Logarithmic: O(log n)
- Linear: O(n)
- N log n: O(n log n)
- Quadratic: O(n²)
- Exponential: O(2^n)
- Factorial: O(n!)

Our first algorithm runs in O(n), meaning its operations grow linearly with the input size (the number of items in the list). Our second algorithm is not dependent on the input size at all, so it runs in constant time, or O(1).

Let's look at how many operations a program might execute with an input size of 5 versus 50.

| Time Complexity | Input Size (n=5) | Input Size (n=50) |
| :--- | :--- | :--- |
| O(log n) | ~2 | ~6 |
| O(n) | 5 | 50 |
| O(n²) | 25 | 2500 |
| O(n!) | 120 | A very, very large number |

The difference might not seem significant when the input is small, but this gap becomes dramatic as the input size increases. If n were 10,000, a function running in O(log n) would take only about 14 operations, while a function running in O(n!) would likely set your computer on fire.

Note: For Big O notation, we drop constants. So, O(10 * n) and O(n / 10) are both equivalent to O(n) because the growth is still linear. Big O notation is also used for space complexity, which measures how much memory an algorithm uses as n grows.

A Quick Refresher on Logarithms

Unless you spend all day practicing algorithms, you might have forgotten what logarithms are. Logarithms are the inverse of exponential functions: for example, log₂(8) = 3 because 2³ = 8. (In computer science, log almost always means log base 2.)

Let's say you want to find a word in a dictionary. One method is to check every word one by one. This is an O(n) approach, but nobody does that. So, how can humans find any word in a dictionary with over a hundred thousand words in seconds? We use a smarter method: cutting the search window in half each time by checking the middle element. If our word comes before the middle word, we search the left half; otherwise, we search the right half. We repeat this until we find our word.

If our input n doubled in size, our operations would only grow by one, which is exactly the O(log n) growth described above. In computer science, this halving strategy is known as binary search.
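Here is a minimal sketch of binary search in Python, assuming the input list is already sorted and returning -1 when the target is missing:

```
def binary_search(sorted_items, target):
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2             # check the middle element
        if sorted_items[mid] == target:
            return mid                      # found it
        elif sorted_items[mid] < target:
            low = mid + 1                   # search the right half
        else:
            high = mid - 1                  # search the left half
    return -1                               # not in the list

binary_search([1, 3, 5, 7, 9], 7)           # returns index 3
```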

The Importance of Sorting

For a binary search to work, the collection must be sorted. This requirement opens up a huge field of computer science: sorting algorithms.

A very basic sort is the selection sort. You have one pointer at the start of the list and another pointer that performs a linear scan to find the next minimum element. Then, you swap those elements and increment the first pointer. Since you're doing a linear scan for every element in the collection, this runs in O(n²).
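A minimal in-place version in Python might look like the following (written for clarity rather than performance):

```
def selection_sort(items):
    # O(n^2): for each position, linearly scan the rest for the minimum.
    for i in range(len(items)):
        min_index = i
        for j in range(i + 1, len(items)):
            if items[j] < items[min_index]:
                min_index = j
        # Swap the found minimum into position i.
        items[i], items[min_index] = items[min_index], items[i]
    return items
```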

A more efficient algorithm is the merge sort. A merge sort splits a collection in half into sub-collections until those sub-collections can be sorted in constant time. Then, it works its way back up by merging the sorted sub-collections until the entire collection is sorted. Since it splits the collection in half each time, there are log n levels of splitting and, consequently, log n levels of merging. Merging all the sub-collections at a given level takes O(n) time in total. Doing that across log n levels gives a total time complexity of O(n log n). No comparison-based algorithm can sort an arbitrary collection in a better time complexity, so we generally consider O(n log n) to be the cost of sorting.
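Here is a straightforward, non-optimized sketch of merge sort in Python, splitting recursively and merging the sorted halves:

```
def merge_sort(items):
    # Base case: a list of 0 or 1 elements is already sorted.
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])          # sort the left half
    right = merge_sort(items[mid:])         # sort the right half
    return merge(left, right)               # O(n) merge of two sorted halves

def merge(left, right):
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])                 # one of these is already empty
    merged.extend(right[j:])
    return merged
```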

Fundamental Data Structures

Arrays and Linked Lists

Perhaps the most fundamental data structure is an array, which is an indexable, contiguous chunk of memory. Arrays have a fixed size; you can't insert a ninth element into an array designed for eight.

A data structure with a flexible size is a linked list. The idea is to package your data into nodes that hold two things: your data and a pointer to the next node. If a node's next pointer points to null, it's the end of the list. You can append a new node in constant time by setting the last node's pointer to the new node, provided you keep a reference to that last node (otherwise you would first have to walk the entire list to find it). With some additional pointer manipulation, we can also insert and delete nodes in constant time.

Sometimes, nodes also contain a pointer to the previous node in a variation called a doubly linked list. However, a major downside of linked lists is that you can't access elements in constant time via an index like you can with an array.
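A bare-bones singly linked list in Python could look like the sketch below; the class and method names are illustrative, and a tail reference is kept so appends stay O(1):

```
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None        # None marks the end of the list

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        # O(1) because we keep a reference to the last node.
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node
```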

In practice, most programming languages have dynamically sized arrays (or lists) that work by creating arrays with a fixed size. If the array reaches full capacity and you try to add a new element, the programming language automatically creates a new, larger array (often double the capacity), copies the current elements over, and updates the reference to this new array. Since this expensive resizing happens only occasionally, appending to a list is constant time on average, a guarantee known as amortized constant time.
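To make the resizing idea concrete, here is a toy sketch of the doubling strategy, using a fixed-length Python list to stand in for a raw array (real implementations differ in details such as the growth factor):

```
class DynamicArray:
    # A toy dynamic array: a fixed-size list stands in for a raw array.
    def __init__(self):
        self.capacity = 8
        self.size = 0
        self.data = [None] * self.capacity

    def append(self, value):
        if self.size == self.capacity:
            # Full: allocate a larger array (double the capacity) and copy.
            self.capacity *= 2
            new_data = [None] * self.capacity
            for i in range(self.size):
                new_data[i] = self.data[i]
            self.data = new_data
        self.data[self.size] = value
        self.size += 1
```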

Trees: Going Beyond Linear Structures

Linked lists aren't the only data structure that uses nodes to refer to other nodes. What if, instead of pointing to a single next node, nodes pointed to a left and a right child node? This structure is called a binary tree. These child nodes can also be seen as the roots of their own subtrees, as trees are recursive in nature. A node with no children is called a leaf node.

A common type of tree is the binary search tree (BST), which follows these rules:

- Every value in a node's left subtree must be smaller than the node's value.
- Every value in a node's right subtree must be greater than or equal to the node's value.

Let's say you're looking for an element x. You start at the root. If x is smaller than the root's value, you go to the left subtree. If it's larger, you go to the right. You repeat this process until you find your number (or run out of nodes, meaning it isn't in the tree). In the worst case, you have to traverse the height of the tree, so the time complexity to search, insert, and delete elements is O(h), where h is the height.
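A quick sketch of that search in Python, assuming nodes with value, left, and right attributes:

```
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def bst_search(root, target):
    # Walk down the tree: left for smaller values, right for larger ones.
    node = root
    while node is not None:
        if target == node.value:
            return node
        elif target < node.value:
            node = node.left
        else:
            node = node.right
    return None                 # the value isn't in the tree
```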

This is efficient because, on average, the height will be O(log n). But what if you inserted every element of a sorted array into an empty BST? You'd end up with a structure that behaves like a linked list, meaning the height would be n. We can guarantee O(log n) time complexities by using a self-balancing binary search tree, such as a Red-Black Tree, which maintains additional properties to ensure the height of the tree remains O(log n).

Binary search trees are workhorse data structures that can solve many problems with decent efficiency, but they particularly shine when you're asked a question specifically about binary search trees.

Heaps and Priority Queues

Another important type of tree is a heap. Unlike a BST, a heap only enforces ordering between each node and its children: every parent has a higher priority than the nodes below it. Because of this, the element with the highest priority is always at the root, which is why a heap is also commonly known as a priority queue. A min-heap will have the minimum element at the root, and a max-heap will have the maximum element at the root.

Don't be mistaken, though—the rest of the heap is unsorted. Searching for an element in the rest of the heap is no better than searching in an unsorted array. We only care about what's at the top.

To insert a new element into a heap, we first find the next available spot to add a leaf node. Then, we compare it with its parent. If it has a higher priority, it swaps places and "bubbles up" the tree. This process involves at most O(log n) comparisons. You can build a heap from a random collection by inserting n times, which costs O(n log n). There's also a way to build a heap in O(n) time, which is a fascinating process worth researching further.
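In Python, the standard heapq module exposes a min-heap built on top of a plain list; here is a quick sketch of the operations described above:

```
import heapq

heap = []
for n in [7, 2, 9, 4]:
    heapq.heappush(heap, n)      # each insert "bubbles up" in O(log n)

print(heapq.heappop(heap))       # 2: the root is always the minimum

# Building a heap from an existing list all at once runs in O(n).
numbers = [5, 1, 8, 3]
heapq.heapify(numbers)
print(numbers[0])                # 1: the minimum sits at index 0
```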

Finally, since a heap is a nearly complete binary tree, we can use its properties to represent it efficiently with an array, even though we visualize it as a tree.

Traversing Trees: DFS and BFS

One way to traverse a tree is using a Depth-First Search (DFS), which involves going deep down one path to explore it fully before moving on to the next. For example, one DFS traversal of a tree might visit nodes in the order: 10, 13, 4, 6, 12, 8, 1.

Another option is a Breadth-First Search (BFS), where we explore the tree level by level. A BFS traversal of the same tree might be: 10, 13, 12, 4, 6, 8, 1.

For implementation, DFS can use a stack, which is a list-based data structure with two main operations: adding to the end (push) and removing from the end (pop). This makes a stack LIFO (Last-In, First-Out). DFS is often implemented using recursion, which indirectly uses the call stack. If your recursion goes too deep, you'll get a stack overflow error.

Here is a simple way to implement DFS with a stack (shown in Python, assuming binary-tree nodes with value, left, and right attributes):

```
def dfs(root):
    # Depth-first traversal using an explicit stack (LIFO).
    stack = [root]
    while stack:
        node = stack.pop()                  # pop from the end of the list
        print(node.value)
        # Add the node's children; the most recently pushed is explored next.
        for child in (node.right, node.left):
            if child is not None:
                stack.append(child)
```

In contrast, a BFS uses a queue, which is FIFO (First-In, First-Out). Instead of popping from the end of the list, you pop from the beginning. Using the same algorithm as above but with a queue instead of a stack will result in a BFS traversal.
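As a sketch, here is the same traversal with a queue, using collections.deque so popping from the front is cheap:

```
from collections import deque

def bfs(root):
    # Breadth-first traversal: visit the tree level by level using a queue.
    queue = deque([root])
    while queue:
        node = queue.popleft()              # pop from the front (FIFO)
        print(node.value)
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)
```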

Graphs: The Ultimate Network Structure

All trees are graphs, but not all graphs are trees. A graph is a collection of vertices (points) and edges (connections between vertices). A graph can be directed, meaning edges only go one way (from A to B), or undirected, meaning the connection is two-way (from A to B and B to A). Two common ways to represent edges are adjacency lists and adjacency matrices.

Graphs are incredibly versatile. Here are a few scenarios where they excel.

1. Finding the Shortest Path (Unweighted)

Imagine you want to find the shortest connection between two people on a social network. This is a classic graph problem. You can model people as vertices and their relationships (friendships) as edges. The shortest path between a source node and a target node can be found with a Breadth-First Search (BFS). Since BFS moves out level by level, the first time it reaches the target node, it is guaranteed to have found the shortest path in terms of the number of connections.
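A rough sketch of that idea in Python, assuming the network is stored as an adjacency list (a dict mapping each person to a list of their friends); the function name is mine:

```
from collections import deque

def shortest_connection(friends, source, target):
    # friends: dict mapping each person to a list of their friends.
    # BFS expands level by level, so the first time we reach the target
    # we have used the fewest possible connections.
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        person, distance = queue.popleft()
        if person == target:
            return distance
        for friend in friends.get(person, []):
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, distance + 1))
    return -1  # the two people are not connected at all
```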

2. Finding the Shortest Path (Weighted)

Consider a map system, like Google Maps, that needs to find the shortest path between two locations. This is different from the last example because, while both deal with shortest paths, a map system has weighted edges (e.g., distance or travel time). For example, a direct edge between two points might have a weight of 20, while an alternative path through two edges might have weights of 8 each, for a total of 16. A BFS would give us the path with fewer vertices but not necessarily the one with the smallest total weight.

One famous algorithm that computes the shortest path in a graph with positive weighted edges is Dijkstra's algorithm, which often uses a heap. It's a powerful tool worth researching if you're interested in pathfinding problems.
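As a rough, heap-based sketch of Dijkstra's algorithm in Python, assuming an adjacency list of (neighbor, weight) pairs and non-negative weights:

```
import heapq

def dijkstra(graph, source):
    # graph: dict mapping each vertex to a list of (neighbor, weight) pairs;
    # all weights are assumed to be non-negative.
    distances = {source: 0}
    heap = [(0, source)]                 # (distance so far, vertex)
    while heap:
        dist, vertex = heapq.heappop(heap)
        if dist > distances.get(vertex, float("inf")):
            continue                     # stale entry; a shorter path is known
        for neighbor, weight in graph.get(vertex, []):
            new_dist = dist + weight
            if new_dist < distances.get(neighbor, float("inf")):
                distances[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return distances                     # shortest distance to each reachable vertex
```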

3. Topological Sorting

Imagine a university course schedule with classes and their prerequisites. Suppose you wanted to find a valid order in which to take all your classes while fulfilling all prerequisites. This problem can be solved using a graph. Model the classes as vertices and the prerequisites as directed edges. This algorithm is known as a topological sort.

The process (sketched in code after this list) is as follows:

1. Count how many incoming edges each vertex has.
2. Add all vertices with zero incoming edges to a queue.
3. While the queue is not empty, pop a vertex, print it, and for all of its children (courses that have it as a prerequisite), decrement their incoming edge count.
4. If a child's incoming edge count becomes zero, add it to the queue.
5. Repeat until the queue is empty.
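Here is a sketch of that procedure in Python (this queue-based approach is often called Kahn's algorithm), assuming the graph is given as a dict mapping each course to the list of courses that depend on it:

```
from collections import deque

def topological_sort(graph):
    # graph: dict mapping each course to the courses that depend on it
    # (a directed edge course -> dependent for every prerequisite relation).
    incoming = {course: 0 for course in graph}      # step 1: count incoming edges
    for dependents in graph.values():
        for dependent in dependents:
            incoming[dependent] = incoming.get(dependent, 0) + 1

    queue = deque(c for c, count in incoming.items() if count == 0)  # step 2
    order = []
    while queue:                                    # steps 3-5
        course = queue.popleft()
        order.append(course)
        for dependent in graph.get(course, []):
            incoming[dependent] -= 1
            if incoming[dependent] == 0:
                queue.append(dependent)
    return order        # if a cycle exists, this list comes back short
```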

Hash Maps: The Key to Constant-Time Lookups

Hash maps are often considered the holy grail of data structures, offering nearly constant-time retrieval of data. The saying goes that if you don't know what you're doing, just try throwing a hash map at the problem.

A hash map is a data structure built on top of an array, optimized to store key-value pairs. What makes it so powerful is that you can retrieve, delete, and store data in, on average, constant time.

For example, imagine a map whose keys are strings representing people's names and whose values are their corresponding ages. We can directly access an individual's age, for instance, ages['Trent Black']. It's almost as if strings can be used as array indices.
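In Python, such a map is simply a dict (the ages below are made up for illustration):

```
ages = {'Trent Black': 42, 'Ziz': 25}
print(ages['Trent Black'])      # 42, retrieved in O(1) on average
```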

How is this possible? It's because of a hash function. A hash function takes a key and returns a hash code (an integer). If we take that number modulo the length of the underlying array, we can use the result as an index to store the value. The hash function must be deterministic, meaning it always produces the same hash code for the same key; otherwise, we would never be able to find the value again.

But what if the hash of 'Trent Black' and the hash of 'Ziz' both result in the same index? This is called a collision. One way to handle this is to store the values at that index in a linked list. When we go to store 'Ziz' and see that the index already has a value, we append the new value to the linked list at that position. This is why a hash map is not strictly O(1) in the worst case. If you have a terrible hash function, it won't be balanced, and you will have to do a lot of linked list traversing.
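A toy version of this scheme in Python, using ordinary lists in place of linked lists for the chains (the class and method names are illustrative, and real implementations are far more sophisticated):

```
class SimpleHashMap:
    # Separate chaining: each slot of the underlying array holds a small
    # chain of (key, value) pairs that hashed to the same index.
    def __init__(self, capacity=16):
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)    # overwrite an existing key
                return
        bucket.append((key, value))         # collision: chain onto the bucket

    def get(self, key):
        bucket = self.buckets[self._index(key)]
        for existing_key, value in bucket:
            if existing_key == key:
                return value
        raise KeyError(key)
```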

In practice, modern languages use very good hash functions, so you don't have to write your own. A common example of a hash map is the dict in Python or the Object/Map in JavaScript. Remember, constant-time lookup of data is extremely powerful, so try to use it when you can.

Your Learning Path Forward

And that's a high-level overview of the essential concepts in data structures and algorithms—a semester-long course condensed into a brief article.

Of course, you haven't learned everything in great detail, but it's far easier to master a complex subject once you understand the big picture. Traditional education often dives deep into one topic before moving to the next, similar to a Depth-First Search. A potentially more effective method is to get a general understanding first and then build upon it, much like a Breadth-First Search.