Algorithm search algorithm Python


Python search algorithms: sequential search, binary search, block search, hash table search, and more

Sequential search

Sequential search, also known as linear search, is one of the simplest search methods. It works on linear tables with either a sequential storage structure or a linked storage structure. The time complexity of the algorithm is O(n).

Basic ideas

Start from the first element and compare each element m with the element x to be found, one by one. When the values are the same (i.e. m == x), the index of element m is returned. If no match has been found after the last comparison, -1 is returned.

Advantages and disadvantages

Disadvantage: when n is large, the average search length is large and the efficiency is low.

Advantage: there is no requirement on how the data elements are stored in the table. In addition, a linear linked list can only be searched sequentially.

Algorithm implementation

  • The most basic search algorithm, traversing an unordered list
  • Time complexity O(n)
def sequential_search(lis, key):
    # Compare each element with key in turn
    for i, item in enumerate(lis):
        if item == key:
            return i  # index of the first match
    return -1  # key is not in the list

if __name__ == '__main__':
    LIST = [1, 5, 8, 123, 22, 54, 7, 99, 300, 222]
    result = sequential_search(LIST, 123)
    print(result)

Binary search

Binary search is a search algorithm for finding a specific element in an ordered array. The search starts from the middle element of the array. If the middle element is exactly the element being searched for, the search ends. If the target is greater than or less than the middle element, the search continues in the half of the array that is greater than or less than the middle element, again starting from the middle element of that half. If the remaining range becomes empty at some step, the element cannot be found.

This search algorithm reduces the search range by half with each comparison.

Algorithm description

Given an array A of n elements sorted in ascending order and a target value t:

  1. Let l be 0 and r be n-1;
  2. If l > r, the search ends in failure;
  3. Let m (the index of the middle element) be ⌊(l+r)/2⌋;
  4. If A[m] < t, let l be m + 1 and return to step 2;
  5. If A[m] > t, let r be m - 1 and return to step 2;
  6. Otherwise A[m] = t, and the search ends successfully, returning m.

Complexity analysis

Time complexity: binary search halves the search range with each comparison, so the time complexity is O(log n).

Space complexity: O(1)

Algorithm implementation

  • Binary search algorithm for an ordered lookup table
def binary_search(lis, key):
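A minimal sketch of how the steps in the algorithm description might be written out, assuming lis is sorted in ascending order and that -1 signals a failed search (the same convention as sequential search). The variable names low, high and mid are illustrative choices.

```python
def binary_search(lis, key):
    """Return the index of key in the ascending-sorted list lis, or -1."""
    low, high = 0, len(lis) - 1
    while low <= high:               # an empty range means failure
        mid = (low + high) // 2      # floor of the midpoint
        if lis[mid] < key:
            low = mid + 1            # continue in the upper half
        elif lis[mid] > key:
            high = mid - 1           # continue in the lower half
        else:
            return mid               # found
    return -1


LIST = [1, 5, 7, 8, 22, 54, 99, 123, 200, 222, 444]
print(binary_search(LIST, 99))
```

The loop keeps only two indices, which is why the space complexity stays at O(1).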

Interpolation search

Interpolation search chooses the probe position by comparing the keyword key being searched for with the keywords of the smallest and largest records in the lookup table. Its core is the interpolation formula mid = low + (key - a[low]) / (a[high] - a[low]) * (high - low).

For lookup tables that are long and have uniformly distributed keywords, it is more efficient than binary search: the average time complexity is O(log log n) rather than O(log n).

Algorithmic idea

Based on the binary search algorithm, the choice of the probe point is changed from the fixed midpoint to an adaptive position, which can improve search efficiency. Like binary search, interpolation search requires an ordered table.

Note: for a lookup table that is long and has uniformly distributed keywords, the average performance of interpolation search is much better than that of binary search. Conversely, if the values in the array are distributed very unevenly, interpolation search may not be an appropriate choice.

Complexity analysis

Time complexity: O(log log n) on average if the elements are evenly distributed; in the worst case, O(n) may be required.

Space complexity: O(1).

Algorithm implementation

  • Interpolation search algorithm
def binary_search(lis, key):
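A minimal sketch of the interpolation idea, assuming lis holds numbers sorted in ascending order. The function name interpolation_search is chosen here for clarity and is not taken from the original listing. Integer arithmetic is used so that the probe position is always a valid index.

```python
def interpolation_search(lis, key):
    """Return the index of key in the ascending-sorted numeric list lis, or -1."""
    low, high = 0, len(lis) - 1
    # The key can only be present while it lies inside [lis[low], lis[high]].
    while low <= high and lis[low] <= key <= lis[high]:
        if lis[high] == lis[low]:
            return low  # loop condition implies lis[low] == key
        # Probe proportionally to where key sits between the two end values.
        mid = low + (key - lis[low]) * (high - low) // (lis[high] - lis[low])
        if lis[mid] < key:
            low = mid + 1
        elif lis[mid] > key:
            high = mid - 1
        else:
            return mid
    return -1


LIST = [1, 5, 7, 8, 22, 54, 99, 123, 200, 222, 444]
print(interpolation_search(LIST, 444))
```

The explicit check for equal end values avoids a division by zero when the remaining range holds a single distinct value.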

Fibonacci search

The Fibonacci sequence, also known as the golden section sequence, is the sequence 1, 1, 2, 3, 5, 8, 13, 21, …. Mathematically, it is defined recursively: F(1) = 1, F(2) = 1, F(n) = F(n-1) + F(n-2) (n >= 3). The ratio of two adjacent numbers in the sequence (smaller to larger) tends to the golden ratio value 0.618.

Fibonacci search splits the table according to the Fibonacci sequence instead of halving it as binary search does. Find a number F[n] in the Fibonacci sequence that is slightly greater than the number of elements in the lookup table, extend the original lookup table to length F[n], and then perform Fibonacci segmentation: the F[n] elements are divided into a first part of F[n-1] elements and a second part of F[n-2] elements. Determine which part contains the element to be found and repeat the process on that part until the element is found.

Complexity analysis

In the worst case the time complexity is O(log n), and the expected complexity is also O(log n).

Algorithm implementation

  • Fibonacci search algorithm
  • Time complexity O(log n)
def fibonacci_search(lis, key):
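A minimal sketch of the segmentation described above, assuming lis is sorted in ascending order. It uses a common variant that pads the table to F[k] - 1 elements (rather than exactly F[k]) by repeating the last element, which keeps the split arithmetic simple. The helper list fib and the returned -1 on failure are illustrative choices.

```python
def fibonacci_search(lis, key):
    """Return the index of key in the ascending-sorted list lis, or -1."""
    n = len(lis)
    if n == 0:
        return -1
    # Build Fibonacci numbers until fib[k] - 1 >= n.
    fib = [1, 1]
    while fib[-1] - 1 < n:
        fib.append(fib[-1] + fib[-2])
    k = len(fib) - 1
    # Pad with copies of the last element so the table has fib[k] - 1 entries.
    padded = lis + [lis[-1]] * (fib[k] - 1 - n)
    low, high = 0, n - 1
    while low <= high:
        mid = low + fib[k - 1] - 1      # split point: fib[k-1] - 1 elements to the left
        if key < padded[mid]:
            high = mid - 1
            k -= 1                      # left part has fib[k-1] - 1 elements
        elif key > padded[mid]:
            low = mid + 1
            k -= 2                      # right part has fib[k-2] - 1 elements
        else:
            return min(mid, n - 1)      # a hit in the padding maps to the last element
    return -1


LIST = [1, 5, 7, 8, 22, 54, 99, 123, 200, 222, 444]
print(fibonacci_search(LIST, 444))
```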

Tree table search

1. Binary search tree algorithm.

A binary search tree is built from the data to be searched so that, at every node, the values in the left branch are less than the values in the right branch. A search then compares the target with a node, starting at the root, and descends into the left or right branch to narrow the range. Searching this way is very efficient, but the tree must be built first.

Algorithmic idea

A binary search tree is either an empty tree or a binary tree with the following properties:

1) If the left subtree of any node is not empty, the values of all nodes in the left subtree are less than the value of that node;

2) If the right subtree of any node is not empty, the values of all nodes in the right subtree are greater than the value of that node;

3) The left and right subtrees of any node are also binary search trees.

Property of the binary search tree: an in-order traversal of a binary search tree yields an ordered sequence of values.

Complexity analysis

Like binary search, the time complexity of insertion and search is O(log n) on average, but in the worst case it is still O(n). The reason is that the tree can become unbalanced as elements are inserted and deleted.

Algorithm implementation

  • Binary search tree lookup, Python implementation
class BSTNode:
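A minimal sketch of a binary search tree with insertion, lookup and in-order traversal, assuming keys are comparable and duplicates are ignored. The helper names bst_insert, bst_search and in_order are hypothetical and chosen only for this illustration.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert key into the subtree rooted at root and return the subtree root."""
    if root is None:
        return BSTNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root  # an equal key is already present, so nothing changes


def bst_search(root, key):
    """Return the node holding key, or None if key is absent."""
    node = root
    while node is not None:
        if key < node.key:
            node = node.left
        elif key > node.key:
            node = node.right
        else:
            return node
    return None


def in_order(root):
    """Return the keys in ascending order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


root = None
for k in [8, 3, 10, 1, 6, 14]:
    root = bst_insert(root, k)
print(in_order(root))
```

Inserting keys that are already sorted would produce a chain of right children, which is the unbalanced worst case mentioned in the complexity analysis.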

2. Balanced search trees: the 2-3 tree

Definition of the 2-3 search tree

Unlike a binary tree, a 2-3 tree stores one or two keys in each node. An ordinary 2-node stores one key and has two child nodes; a 3-node stores two keys and has three child nodes. The 2-3 search tree is defined as follows:

1) it is either empty, or:

2) a 2-node, which stores one key and its corresponding value and has two links, to a left and a right subtree. The left subtree is also a 2-3 tree, and all of its keys are smaller than the node's key. The right subtree is also a 2-3 tree, and all of its keys are larger than the node's key.

3) a 3-node, which stores two keys and their corresponding values and has three links, to a left, a middle and a right subtree. The left subtree is also a 2-3 tree, and all of its keys are smaller than the smaller of the two keys; the middle subtree is also a 2-3 tree, and all of its keys lie between the node's two keys; the right subtree is also a 2-3 tree, and all of its keys are larger than the larger of the two keys.

Properties of the 2-3 search tree

1) an in-order traversal of a 2-3 search tree yields an ordered sequence;

2) in a perfectly balanced 2-3 search tree, the distance from the root node to every empty link is the same. (This is the meaning of “balance” in a balanced tree. The longest distance from the root node to a leaf node corresponds to the worst case of the search algorithm; in a balanced tree every leaf is at the same distance from the root, so the worst case also has logarithmic complexity.)

The search efficiency of a 2-3 tree is closely related to the height of the tree.

The height lies between log3(N) (all nodes are 3-nodes) and log2(N) (all nodes are 2-nodes). For a 2-3 tree with 1 million nodes, the height is therefore between 12 and 20, and for a 2-3 tree with 1 billion nodes, between 18 and 30.
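The height figures can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes the height is bounded below by the base-3 logarithm of the node count (every node a 3-node) and above by the base-2 logarithm (every node a 2-node). The function name height_bounds is hypothetical.

```python
import math


def height_bounds(n):
    """Return (lower, upper) height estimates for a 2-3 tree with n nodes."""
    return math.log(n, 3), math.log2(n)


for n in (10**6, 10**9):
    lower, upper = height_bounds(n)
    print(f"{n} nodes: height between {lower:.1f} and {upper:.1f}")
```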

For insertion, each local transformation requires only a constant number of operations, because it only modifies the links of the nodes involved and does not need to inspect other nodes, so the efficiency of insertion is similar to that of searching.

Algorithm implementation
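A minimal sketch of lookup only in a 2-3 tree, under simplifying assumptions: nodes store keys without their associated values, insertion and rebalancing are left out, and the example tree is built by hand. The names Node23 and search_23 are hypothetical.

```python
class Node23:
    def __init__(self, keys, children=None):
        self.keys = keys                  # one key (2-node) or two keys (3-node), ascending
        self.children = children or []    # empty for a leaf, else len(keys) + 1 subtrees


def search_23(node, key):
    """Return True if key is stored in the 2-3 tree rooted at node."""
    while node is not None:
        # Find the first key that is not smaller than the target.
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False                  # reached a leaf without a match
        node = node.children[i]           # descend into the left, middle or right subtree
    return False


# Hand-built tree: a 3-node root with three leaf children.
root = Node23([10, 20], [Node23([3, 7]), Node23([15]), Node23([25, 30])])
print(search_23(root, 15), search_23(root, 8))
```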

3. Balanced search trees: the red-black tree

Definition of the red-black tree

The red-black tree is a balanced search tree whose links are colored red or black and which satisfies the following requirements:

① Red links lean left;

② No node has two red links connected to it;

③ The whole tree is perfectly black balanced, that is, the number of black links on the path from the root node to every leaf node is the same.

Properties of the red-black tree

The whole tree is perfectly black balanced, that is, on the path from the root node to every leaf node, the number of black links is the same (this corresponds to the second property of 2-3 trees: the distance from the root node to every leaf node is the same).

Complexity analysis

In the worst case, every node in the corresponding 2-3 tree is a 2-node except along the leftmost path, which consists entirely of 3-nodes. The leftmost path, which alternates red and black links, is then twice as long as the all-black paths.

A typical red-black tree illustrates this: the longest path (alternating red and black links) is about twice as long as the shortest path.

Algorithm implementation

#Red black tree
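A minimal sketch of insertion into a left-leaning red-black tree, the variant that matches the three requirements listed in the definition. It assumes that each node's color describes the link from its parent, that keys are comparable, and that duplicates are ignored. All names (RBNode, rb_insert and the helpers) are hypothetical.

```python
RED, BLACK = True, False


class RBNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.color = RED  # new nodes are attached with a red link


def is_red(node):
    return node is not None and node.color == RED


def rotate_left(h):
    """Turn a right-leaning red link into a left-leaning one."""
    x = h.right
    h.right = x.left
    x.left = h
    x.color = h.color
    h.color = RED
    return x


def rotate_right(h):
    """Turn a left-leaning red link into a right-leaning one."""
    x = h.left
    h.left = x.right
    x.right = h
    x.color = h.color
    h.color = RED
    return x


def flip_colors(h):
    """Split a temporary 4-node by passing the red link up to the parent."""
    h.color = RED
    h.left.color = BLACK
    h.right.color = BLACK


def _insert(h, key):
    if h is None:
        return RBNode(key)
    if key < h.key:
        h.left = _insert(h.left, key)
    elif key > h.key:
        h.right = _insert(h.right, key)
    # Restore the invariants on the way back up.
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h


def rb_insert(root, key):
    """Insert key and return the new root, which is always black."""
    root = _insert(root, key)
    root.color = BLACK
    return root


root = None
for k in range(1, 11):
    root = rb_insert(root, k)
print(root.key)
```

Even though the keys above arrive in sorted order, the rotations keep the tree balanced, which is exactly the case where a plain binary search tree degenerates.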

4. B-tree and B+ tree

B-tree introduction

A B-tree can be regarded as an extension of the 2-3 search tree: it allows each node to hold up to M-1 keys, and therefore up to M child nodes. A B-tree of order M has the following properties:

① The root node has at least two child nodes (unless it is a leaf);

② Each node has at most M-1 keys, arranged in ascending order;

③ The keys in the child subtree that lies between two adjacent keys of a node all fall between the values of those two keys;

④ Number of keywords of a non-leaf node = number of pointers to child nodes - 1;

⑤ Keywords of a non-leaf node: k[1], k[2], …, k[M-1], with k[i] < k[i+1];

⑥ Every other non-leaf node has at least ⌈M/2⌉ child nodes;

⑦ All leaf nodes are on the same layer.

Idea of B-tree algorithm

The search of a B-tree starts from the root node and performs a binary search on the (ordered) keyword sequence in the node. If it hits, the search ends. Otherwise, it enters the child node covering the range that contains the query keyword. This repeats until the corresponding child pointer is null or the node is already a leaf.

Characteristics of B-tree

1. the keyword set is distributed in the whole tree;

2. any keyword appears in only one node;

3. the search may end at a non leaf node;

4. its search performance is equivalent to a binary search in the complete set of keywords;

5. automatic hierarchical control;

Because every non-leaf node other than the root must contain at least ⌈M/2⌉ children, a minimum utilization of each node is guaranteed, which bounds the height of the tree, so the worst-case search performance is O(log n).

Introduction to the B+ tree

The B+ tree is a variant of the B-tree and is also a multiway search tree:

1. its definition is basically the same as that of the B-tree, except that:

2. the number of subtree pointers of a non-leaf node is the same as its number of keywords;

3. the subtree pointer p[i] of a non-leaf node points to the subtree whose keyword values belong to [k[i], k[i+1]) (in a B-tree this is an open interval);

4. a chain pointer is added to all leaf nodes;

5. all keywords appear in the leaf nodes.

Idea of the B+ tree algorithm

The search of a B+ tree is basically the same as that of a B-tree. The difference is that a B+ tree search can only hit at a leaf node (a B-tree search can hit at a non-leaf node), and its performance is equivalent to a binary search over the complete set of keywords.

Characteristics of the B+ tree

1. all keywords appear in the linked list of leaf nodes (dense index), and the keywords in the linked list are in order;

2. a search can never end at a non-leaf node;

3. the non-leaf nodes act as an index over the leaf nodes (sparse index), and the leaf nodes act as the data layer that stores the (keyword) data;

4. it is better suited to file index systems.

Algorithm implementation

  • B-tree lookup
class BTree:  # B-tree
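A minimal sketch of the B-tree search idea only: binary search inside each node, then descend into the child that covers the key's range. Insertion, node splitting and the B+ tree leaf chain are left out, and the example tree (order 3) is built by hand. The names BTreeNode and btree_search are hypothetical.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # ascending keys stored in this node
        self.children = children or []    # empty for a leaf, else len(keys) + 1 subtrees


def btree_search(node, key):
    """Return (node, position) where key is stored, or None if it is absent."""
    while node is not None:
        i = bisect_left(node.keys, key)   # binary search within the node
        if i < len(node.keys) and node.keys[i] == key:
            return node, i                # the search may end at a non-leaf node
        if not node.children:
            return None                   # reached a leaf without a hit
        node = node.children[i]           # child covering the range that contains key
    return None


# Hand-built tree of order 3: at most two keys and three children per node.
root = BTreeNode([17, 35], [BTreeNode([8, 12]), BTreeNode([26, 30]), BTreeNode([65, 87])])
hit = btree_search(root, 30)
print(hit[0].keys, hit[1])
```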

5. Tree table search summary

The average search performance of the binary search tree is good, O(log n), but it degenerates to O(n) in the worst case. To avoid this, we can use a balanced search tree built on the binary search tree. Among balanced search trees, the 2-3 search tree rebalances itself after each insertion, which keeps the height of the tree within a fixed range and so guarantees the worst-case time complexity. However, the 2-3 search tree is difficult to implement. The red-black tree is a simple and efficient implementation of the 2-3 tree: it uses color tags on links to represent the 3-nodes of the 2-3 tree. The red-black tree is an efficient balanced search tree and is widely used; many programming languages use red-black trees in their internal implementations.

In addition, the B-tree and B+ tree, another extension of the 2-3 search tree, are widely used in file systems and database systems.

Block search, O(log(m) + n/m)

Block search, also called index sequential search, is an improved form of sequential search. It requires a sequential table.

Algorithmic idea

Divide the n data elements into m blocks (m ≤ n) that are “ordered by blocks”.

The elements within each block do not have to be ordered, but the blocks themselves must be “ordered by blocks”.

That is, the keyword of any element in block 1 must be less than the keyword of any element in block 2;

the keyword of any element in block 2 must be less than the keyword of any element in block 3, and so on.

Algorithm flow

1. First, select the largest keyword in each block to form an index table;

2. The search has two parts: first, perform a binary search or sequential search on the index table to determine which block the record being searched for is in;

3. Then search within that block using the sequential method.

Complexity analysis

Time complexity: O(log(m) + n/m)

#Block search is a comprehensive optimization of sequential search and binary search, and its performance is between the two
#Change index selection and use ascending sort for indexes
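A minimal sketch of the flow described above, assuming the data is already supplied as a list of non-empty blocks that are ordered by blocks. The index holds each block's largest keyword and is searched with a binary search. The names build_index and block_search are hypothetical.

```python
from bisect import bisect_left


def build_index(blocks):
    """Index table: the largest keyword of each block, ascending by construction."""
    return [max(block) for block in blocks]


def block_search(blocks, index, key):
    """Return (block number, position in block) of key, or None if absent."""
    b = bisect_left(index, key)       # first block whose maximum is >= key
    if b == len(blocks):
        return None                   # key is larger than every keyword
    for i, value in enumerate(blocks[b]):   # sequential search inside the block
        if value == key:
            return b, i
    return None


blocks = [[3, 1, 7], [12, 9, 15], [22, 30, 18]]
index = build_index(blocks)
print(block_search(blocks, index, 9))
```

The binary search over the index contributes the log(m) term and the scan inside one block contributes the n/m term.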

Hash search

A hash table is a structure that stores data indexed by key. Given the key to be searched for, you can find its corresponding value directly.

Algorithmic idea

The idea of hashing is very simple. If all keys are small integers, a simple unordered array is enough: use the key as the index, and store the corresponding value at that position. In this way, the value of any key can be accessed quickly. Hashing extends this simple case to handle more complex types of keys.

Algorithm flow

1) Construct a hash table with a given hash function;

2) Resolve address conflicts according to the selected conflict handling method;

Common methods for resolving conflicts: separate chaining (the “zipper” method) and linear probing.

3) Perform the hash lookup on the resulting hash table.

Complexity analysis

Search complexity: for a hash table without conflicts, the search complexity is O(1) (note that the hash table must be built before searching).

Algorithm implementation

  • Data type checks and element overflow handling are omitted.
class HashTable:
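A minimal sketch of a hash table that follows the three steps in the algorithm flow, under stated assumptions: keys are non-negative integers, the hash function is key modulo the table size, conflicts are resolved by linear probing, and deletion is not supported. The method names and the OverflowError raised on a full table are illustrative choices.

```python
class HashTable:
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size    # None marks an empty slot

    def _hash(self, key):
        return key % self.size        # assumed hash function: remainder of the key

    def insert(self, key):
        address = self._hash(key)
        for _ in range(self.size):    # probe each slot at most once
            if self.slots[address] is None:
                self.slots[address] = key
                return
            if self.slots[address] == key:
                return                # key is already stored
            address = (address + 1) % self.size   # linear probing: try the next slot
        raise OverflowError("hash table is full")

    def search(self, key):
        address = self._hash(key)
        for _ in range(self.size):
            if self.slots[address] is None:
                return False          # an empty slot ends the probe sequence
            if self.slots[address] == key:
                return True
            address = (address + 1) % self.size
        return False


table = HashTable(5)
for k in (12, 67, 56, 16, 25):
    table.insert(k)
print(table.search(16), table.search(3))
```

With no conflicts a lookup touches one slot, which is the O(1) case; every conflict adds one more probe.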

This work is licensed under a CC agreement; reprints must credit the author and link to this article.

This article was first published on my blog, Stray_Camel(^U^)ノ~YO