演算法圖鑑第 2 章 - Data Structures

發表於 2019-12-08 00:00 3337 字 17 min read
這篇筆記是我參考【演算法圖鑑】和網路資源所製作。如果內容有誤,還請不吝賜教! Introduction In general, data structures are different ways we combine and arrange the data. However, there are various data structures (e.g. stack, queue, heap,...

這篇筆記是我參考【演算法圖鑑】和網路資源所製作。如果內容有誤,還請不吝賜教!

Introduction

In general, data structures are different ways we combine and arrange the data. However, there are various data structures (e.g. stack, queue, heap, tree, ...), different arrangements of data give all sorts of structures.

In this chapter, we will see some kinds of sequential structures, which means data is put in linear order.

array, list, stack, queue, hash table

Array

In the majority of programming languages, declaration is needed for variables as well as array, which means computer has to assign specific memory spaces before use of an array. However, in python declarations of any variables involving array are never a prerequisite of using them since python is a dynamically typed language (i.e. types of all objects are not defined until runtime).

After declaration(ask computer for memory space), data is put in a clear order. And if we want to refer or retrieve data, we use index to find its location. However if we want to insert or remove data from the array, it cost much more time to be done due to searching for our goal and shifting the remainders.

Time Complexity

  • Add/Delete data from an array : O(n)O(n)
  • Get data by index : O(1)O(1)

Implementation

Below is the implementation of array. (In python, an array is called a "list")

Make a List

list = [var1, var2, ... , varN]
var = any ( even another array )
x = [10, 20, 'string']  # [10, 20, 'string']
y = [1, 2, 3]  # [1, 2, 3]
z = ['p', 'y', 't', 'h', 'o', 'n']  # ['p', 'y', 't', 'h', 'o', 'n']
print(x, y, z)

Unlike tuple, values in a list are still mutable. Hence, it's illegal to use a list as a key in hash tables.

Using multiple variables to get values from a list

x1, x2, x3 = x
# 10 20 'string'
y1, y2, y3 = y
# 1  2  3

Using *var expression to assign multiple values to a variable(This variable becomes a list).

*xa, xb = x
# xa = [10, 20] | xb = 'string'
z1, z2, *z3, z4 = z
# z1 z2 -> front, z4 -> rear, z3 -> remainders
print(z1, z2, z3, z4)  # p y ['t', 'h', 'o'] n

Get a value from a list via index

print(z[1])  # 'y'

Index of the first item is 0!

Other methods in class 'list'

Count / Index

list.count(any)
# return how many times a value appears in a list
list.index(any)
# find the index of a value in a list
# return the smallest index if the value appears many times
# return -1 if it doesn't exist in the list
a = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
print(a.count(3))  # 3
print(a.index(4))  # 6

Check if a value is in the list

# if any in list:
#     do ...

if 3 in a:
    print('3 is in a.')

Revise an item in a list

x = [1, 2, 3, 4, 5, 6, 7, 8]
x[0] = 1111111
x[1] = 2222222
print(x)  # [1111111, 2222222, 3, 4, 5, 6, 7, 8]

Add data

list.append(any)
# Add a value at the end

list.extend([array])
list += [array]
# Add a bunch of values at the end

list.insert(index, any)
# Insert a value at given position

Delete data

list.remove(any)
# Remove a value in a list
# If it appears many times, remove the frontmost one

list.pop(index)
# Remove the value in given position
# If the argument is not assigned, remove the last value

list.clear()
# Clear all the values in the list

del list
# delete the list itself

Get length of a list

len(list)

Concepts clarification

1. Wrong way to copy a list

x = [1, 2, 3]
y = x

In 'y = x', python consider this statement as 'making a pointer y pointing to the list[1, 2, 3] just like x.' Which means:

When changing values in x, items in y will also be revised!

Correct ways to copy a list: (1) list comprehention

list_copy = [i for i in list]

(2) 'copy()' method

list_copy = list.copy()

(3) 'slice' in python

list_copy = list[:]

Examples:

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [i for i in a]
b = a.copy()
b = a[:]

2. list 'multiplying' a. list *= n, n > 0

c = ['p', 'y']
c *= 3  # c = ['p', 'y', 'p', 'y', 'p', 'y']

b. list *= n, n <= 0 = list.clear()

c = ['O', 'w', 'O']
c *= -1 # c = []

Tuple

Most features of a tuple are similar to an array(They both support item retrieving via indexes). However, the most significant difference between array and tuple is that tuple is immutable. Thus tuples can be used as keys of hash tables. Futhermore tuples are more efficient than lists in python.

Time Complexity

  • Get data by index : O(1)O(1)

Implementation

Make a tuple

tuple = var1, var2, ... , varN
varI = any (even another array)

x = 10, 20, 'string'
# (10, 20, 'string')
y = 1, 2, 3
# (1, 2, 3)
z = 'p', 'y', 't', 'h', 'o', 'n'
# ('p', 'y', 't', 'h', 'o', 'n')
print(x, y)

Using multiple variables to get values from a tuple

x1, x2, x3 = x
# 10 20 'string'
y1, y2, y3 = y
# 1  2  3

Get a value from a tuple via index

print(z[1])  # 'y'

Index of the first item is 0!

Other methods in class 'tuple'

Count / Index

tuple.count(any)
# return how many times a value appears in a tuple
list.index(any)
# find the index of a value in a tuple
# return the smallest index if the value appears many times
# return -1 if it doesn't exist in the tuple

a = 1, 2, 2, 3, 3, 3, 4, 4, 4, 4
print(a.count(3))  # 3
print(a.index(4))  # 6

Check if a value exists in a tuple

# if any in tuple:
#     do ...

if 3 in a:
    print('3 is in a.')

Get length of a tuple

len(tuple)

Copy a tuple

  1. tuple comprehension
x = tuple(i for i in iterator/generator)
  1. slice in python
y = x[:]

Linked List

Head -> Node -> ... -> Last Node -> Null (-> stands for pointer)

There's no indexing system in list. Instead, lists use pointer to store the position of the next data. Hence, it takes time to find the specific data since we have to search linearly. On the other hand, adding as well as deleting can be done with ease because we only have to revise the direction of pointers.

Adding node will look like this:

  • Head -> A => C -> D -> Null
  • Head -> A => B -> C -> D -> Null
    • (=> points toward C -> B)

Deleting node will look like this:

  • Head -> A => O -> B -> C -> D -> Null
  • Head -> A => B -> C -> D -> Null
    • (=> points toward O -> B)

Time Complexity

  • Add/Delete a node : O(1)O(1)
  • Find a specific data : O(n)O(n)

Implementation

class ListNode:
    def __init__(self, value):
        self.val = value
        self.next = None


class LinkedList:
    def __init__(self, head: ListNode):
        self.head = head
        self.tail = self.head
        self.cur = self.head
        self.last = None
    
    # add/insert a node in the linked list
    def append(self, node: ListNode, behind: ListNode =None):
        self.cur = self.head
        if not behind:
            # if 'behind' is not given,
            # then add the node at rear end
            self.tail.next = node
            self.tail = self.tail.next
        elif behind.val == self.tail.val:
            # if 'behind' is simply the tail,
            # then do the same thing as the above :D
            self.tail.next = node
            self.tail = self.tail.next
        else:
            while self.cur.next:
                if self.cur.val == behind.val:
                    # found it!
                    temp = self.cur.next
                    self.cur.next = node
                    self.cur = self.cur.next
                    self.cur.next = temp
                    break
                self.cur = self.cur.next
            # if 'behind' doesn't exist in the list,
            # then just don't do anything

    def pop(self, node: ListNode =None):
        # if there's only 1 node in the whole linked list,
        # then don't pop anything
        if not self.head.next:
            return None

        self.cur = self.head
        self.last = None

        while self.cur:
            try:
                if self.cur.val == node.val:
                    self.last.next = self.cur.next
                    break
            except AttributeError:
                # This means 'node' is not given
                # then we'll delete the last node
                
                # find the new tail
                self.tail = self.head
                while self.tail.next.next:
                    self.tail = self.tail.next
                self.tail.next = None
                break

            self.last = self.cur
            self.cur = self.cur.next

    def traverse(self) -> list:
        # returns a list of values of nodes in order
        self.cur = self.head
        values = []

        while self.cur:
            values.append(self.cur.val)
            self.cur = self.cur.next
        return values

    def generate(self):
        # returns a generator
        self.cur = self.head

        while self.cur:
            yield self.cur.val
            self.cur = self.cur.next


linked = LinkedList(ListNode(0))
for i in range(1, 10):
    linked.append(ListNode(i))
print(linked.traverse())
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

linked.append(ListNode(5.5), behind=ListNode(5))
for i in linked.generate():
    print(i, end=' ')
# 0 1 2 3 4 5 5.5 6 7 8 9

linked.pop()
print(linked.traverse())
# [0, 1, 2, 3, 4, 5, 5.5, 6, 7, 8]

linked.pop(ListNode(5.5))
for i in linked.generate():
    print(i, end=' ')
# 0 1 2 3 4 5 6 7 8

More Extensions

1. Circular Linked List

Last node points toward the head node.

Head -> Node -> ... -> Last Node -> Head -> ... (-> stands for pointer)

2. Doubly Linked List

Every node has 2 pointers : one points toward the next one, the other points toward the previous node.

Head <-> Node <-> ... <-> Last Node <-> Null (-> stands for pointer)

Stack

Stack is also a sequential structure. However, similar to piling up plates, the last data we put in will be the first to be taken out, and this feature is called "LIFO" (last in first out).

Implementation

class Stack:
    def __init__(self, init_stk: list):
        self._stk = init_stk

    # add data - push
    def push(self, dt):
        self._stk.append(dt)

    # delete data - pop
    def pop(self):
        self._stk.pop()

    # get top - top
    def top(self):
        return self._stk[-1]

    # get length - size
    def size(self):
        return len(self._stk)


stack = Stack([1, 2, 3])
stack.push(4)  # 1 2 3 4
stack.push(5)  # 1 2 3 4 5
print(stack.size())  # 5
print(stack.top())  # 5
stack.pop()  # 1 2 3 4
print(stack.top())  # 4

Queue

Queue is completely different from stack. Just like waiting in line, the first data we push in will also be the first to be poped out, and this feature is called "FIFO" (first in first out).

Implementation

class Queue:
    def __init__(self, init_qu: list):
        self._qu = init_qu

    # 加入資料 - push
    def push(self, dt):
        self._qu.append(dt)

    # 取出資料 - pop
    def pop(self):
        self._qu.pop(0)

    # 取得前面的資料
    def get_front(self):
        return self._qu[0]

    # 取得長度
    def size(self):
        return len(self._qu)


queue = Queue([1, 2, 3])
queue.push(4)  # 1 2 3 4
queue.push(5)  # 1 2 3 4 5
print(queue.size())  # 5
print(queue.get_front())  # 1
queue.pop()  # 2 3 4 5
print(queue.get_front())  # 2

More Extensions

1. Deque (Double Ends Queue)

A normal queue only offers one side for pushing in data and the other for popping out. However, the design of dequeue allows more convenient operations - we can push in and pop out data at both ends.

Implementation

class Deque:
    def __init__(self, init_deq: list):
        self._deq = init_deq

    # 插入資料 - push (front and back)
    def push(self, deq, command):
        # command : 'f' -> front ; 'b' -> back
        self._deq.insert((len(self._deq) + 1) * int(command == 'b'), deq)

    # 取出資料 - pop (front and back)
    def pop(self, command):
        # command : 'f' -> front ; 'b' -> back
        self._deq.pop(-1 * int(command == 'b'))

    # 取得頭尾兩端的資料
    def get(self, command):
        # command : 'f' -> front ; 'b' -> back
        return self._deq[-1 * int(command == 'b')]

    # 取得長度
    def size(self):
        return len(self._deq)


deque = Deque([1, 2, 3])
deque.push(0, 'f')  # 0 1 2 3
deque.push(4, 'b')  # 0 1 2 3 4
print(deque.size())  # 5
print(deque.get('f'))  # 0
print(deque.get('b'))  # 4
deque.pop('f')  # 1 2 3 4
deque.pop('b')  # 1 2 3
print(deque.get('f'))  # 1
print(deque.get('b'))  # 3

2. Priority Queue

In comparison with queue, we pop out the data with the biggest weight from a priority queue instead of taking out the data we first put in.

Implementation

class PriorityQueue:
    def __init__(self, init_pq: list):
        init_pq.sort()
        self._pq = init_pq
        self._prior = 'big'

    # 設定優先條件
    def set_prior(self, command):
        # command : big or small
        self._prior = command

    # 加入資料
    def insert(self, pq):
        # 先確定是否要插入在邊界,再確認要放在中間的哪個位置
        if pq <= self._pq[0]:
            self._pq.insert(0, pq)
        elif pq >= self._pq[-1]:
            self._pq.insert(-1, pq)
        else:
            for i in range(len(self._pq) - 1):
                if self._pq[i] <= pq <= self._pq[i + 1]:
                    self._pq.insert(i + 1, pq)
                    break

    # 取出資料
    def pop(self):
        self._pq.pop(-1 * int(self._prior == 'big'))

    # 取出最優先的資料
    def get(self):
        return self._pq[-1 * int(self._prior == 'big')]

    # 取得資料長度
    def size(self):
        return len(self._pq)


# 'big'
pr_qu = PriorityQueue([0, 5, 2, 3, 5, 7])  # 0 2 3 5 5 7
pr_qu.insert(4)  # 0 2 3 4 5 5 7
pr_qu.insert(6)  # 0 2 3 4 5 5 6 7
print(pr_qu.size())  # 8
print(pr_qu.get())  # 7
pr_qu.pop()  # 0 2 3 4 5 5 6
print(pr_qu.get())  # 6

# 'small'
pr_qu.set_prior('small')  # 0 2 3 4 5 5 6
print(pr_qu.get())  # 0
pr_qu.pop()  # 2 3 4 5 5 6
pr_qu.pop()  # 3 4 5 5 6
print(pr_qu.size())  # 5
print(pr_qu.get())  # 3

Hash Table

Hash Function

Hash function is a function that turns given data into a fixed-length value, called "hash code".

Hash function has the following features:

  1. Fixed Length of Hash Code - No matter how massive the given data is, the length of hash code is always fixed.
  2. Same Data, Same Hash Code
  3. Similar Data, Different Hash Code
  4. Unidirectional - It's impossible to infer the original data from its hash code.
  5. Hash Collision - Different data might have the same hash code, when this occurs, we call it "hash collision". However, chances of collision are extremely slim.

Implementation

def hash(x):
    value = 0
    for i in str(x):
        value += _pow(ord(i) * 987654321, 123456789, 10 ** 20)
    code = int(str(value)[:15:])
    if code < 10 ** 14:
        return code + 123456789876543
    else:
        return code


def _pow(a, b, n):
    if b == 0:
        return 1
    if b == 1:
        return a
    if b % 2 == 0:
        return _pow(a, int(b / 2), n) ** 2 % n
    else:
        return _pow(a, int((b - 1) / 2), n) ** 2 * a % n


print(hash('python'))  # 189770782082354
print(hash('pythona') == hash('pythonad'))  # True - hash collision occured

Hash Table

Hash table is like a dictionary. Each item is composed of a key and a value, at which the key points. The most significant advantage of hash table is that searching can be done in a really short time, rather than doing linear search. This makes hash table a superior data structure.

Following shows how to build a hash table :

  1. We now have a key and its value.
  2. Index = hash(key) mod (length of table) (complexity : O(1)O(1))
  3. Put the value at table[index]. If hash collision occurs (i.e. table[index] is not empty), we use other methods to put in the value. The implementation below uses a linked list to connect these data. (complexity : O(1)  O(n)O(1)\ \sim \ O(n))

This is how hash table works while doing searching :

  1. The user gives a key.
  2. Index = hash(key) mod (length of table) (complexity : O(1)O(1))
  3. Use the index to find the value and return it if the value exists. (complexity : O(1)  O(n)O(1)\ \sim \ O(n))

Time Complexity

  • Insert/Delete an element : O(1)O(1)(average) to O(n)O(n)(worst case - too many collisions)
  • Searching : O(1)O(1)(average) to O(n)O(n)(worst case - too many collisions)

Implementation

1. Make a class 'Hash Table'

class HashTable:
    def __init__(self, data: dict):
        self._data = []
        self._index = 0

        for self._i in data.keys():
            self._data.append([self._i, data[self._i]])

        self._table = [[] for self._i in range(int(len(self._data) ** 0.5) + 1)]
        HashTable.make_table(self, self._data)

    @staticmethod
    def __pow(a, b, n) -> int:
        if b == 0:
            return 1
        if b == 1:
            return a
        if b % 2 == 0:
            return HashTable.__pow(a, int(b / 2), n) ** 2 % n
        else:
            return HashTable.__pow(a, int((b - 1) / 2), n) ** 2 * a % n

    # 雜湊函數 - 除留餘數法 + 字元碼運算
    @staticmethod
    def __hash(x) -> int:
        value = 0
        for i in str(x):
            value += HashTable.__pow(ord(i) * 987654321, 123456789, 10 ** 20)
        code = int(str(value)[:15:])
        if code < 10 ** 14:
            return code + 123456789876543
        else:
            return code

    def make_table(self, data: list):
        for i in data:
            # 這裡先不模擬碰撞時指標的存取
            # 碰撞時,直接存入二維陣列,稱為 "鏈結法"
            self._table[HashTable.__hash(i[0]) % len(self._table)].append(i)

    def search(self, goal):
        for i in self._table[HashTable.__hash(goal) % len(self._table)]:
            if i[0] == goal:
                return i[1]
        else:
            return '{} is not in the table'.format(goal)

    def show(self):
        print('-' * 40)

        for i in range(len(self._table)):
            print('NO.{} :'.format(i))

            for j in self._table[i]:
                print('{} = {}'.format(j[0], j[1]))

        print('-' * 40)


hash_table = HashTable({'Joe': 0, 'Sue': 1, 'Dan': 0, 'Nell': 1, 'Ally': 1, 'Bob': 0})
print(hash_table.search('Sue'))  # 1
print(hash_table.search('Ally'))  # 1
print(hash_table.search('John'))  # John is not in the table
hash_table.show()

hash_table.make_table([['John', 0], ['Mary', 1]])
print(hash_table.search('Mary'))  # 1
hash_table.show()

2. Using Python's Dictionary P.S. Python's dictionary is actually a implementation of hash table.

Make a dictionary

dict1 = {key: value, key: value, ..., key: value}
dict2 = dict(key=value, key=value, ..., key=value)

Get values

from_dict_get_value = dict[key]

More ways to use a dictionary (dict)

  1. using for loop to make a dictionary in some situation
d = {chr(i): i for i in range(65, 91)}  # 建立大寫字母的ASCII字元碼對照表
  1. Add/Revise value dict[key] = value

if key already exists:

  • revise value else:
  • add value
  1. Delete an item (key + value)
del dict[key]
dict.pop(key)
  1. Deletion and assignment var = dict.pop(key)
  2. Get all the keys and values
dict.keys()  # return keys
dict.values()  # return values
dict.items() # return items (keys + values)

Returnings of keys(), values(), and items() are neither tuples nor lists. Instead, they all have their own types respectively.

d = {0: 1, 1: 2}
print(type(d.keys()))    # <class 'dict_keys'>
print(type(d.values()))  # <class 'dict_values'>
print(type(d.items()))   # <class 'dict_items'>
  1. Check if a key/value exists if ... in d.keys()/d.values()/d.items()

item in dict == item in dict.keys() != item in dict.items()

  1. Two dictionaries are equal if and only if there keys as well as their values are all in correspondence with one another.