1145 Hashing - Average Search Time

题目

The task of this problem is simple: insert a sequence of distinct positive integers into a hash table first. Then try to find another sequence of integer keys from the table and output the average search time (the number of comparisons made to find whether or not the key is in the table). The hash function is defined to be H(key)=key%TSize where TSize is the maximum size of the hash table. Quadratic probing (with positive increments only) is used to solve the collisions.
Note that the table size is better to be prime. If the maximum size given by the user is not prime, you must re-define the table size to be the smallest prime number which is larger than the size given by the user.
Input Specification:
Each input file contains one test case. For each case, the first line contains 3 positive numbers: MSize, N, and M, which are the user-defined table size, the number of input numbers, and the number of keys to be found, respectively. All the three numbers are no more than 10^4. Then N distinct positive integers are given in the next line, followed by M positive integer keys in the next line. All the numbers in a line are separated by a space and are no more than 10^5.

Output Specification:
For each test case, in case it is impossible to insert some number, print in a line X cannot be inserted. where X is the input number. Finally print in a line the average search time for all the M keys, accurate up to 1 decimal place.
Sample Input:
4 5 4
10 6 4 15 11
11 4 15 2
Sample Output:
15 cannot be inserted.
2.8

词汇

distinct：不同的
Quadratic probing: 平方探测
collisions: 冲突

分析

本题没有什么特别复杂的算法思想，可以说就是单纯的考察hash表和常见的hash函数（忘了）然后就是对于输出格式的要求，下面借机会复习一下hash函数。

平方探测是开放定址法的一种递增函数，下面介绍一下常见的几种冲突解决方法。

常见的冲突处理方法

开放定址法：

$$H_i=(H(key)+d_i) MOD m, i=1,2,…k(k≤m-1)$$.$H(key)$是哈希函数；m是表长，$d_i$是增长序列，常见的有如下三种：
1. $d_i=1, 2, 3…, m-1$, 即线性探测再散列
2. $d_i=0, 1^2, -1^2, 2^2, -2^2,…, k^2, (k≤m/2)$, 平方探测再散列
3. $d_i=伪随机数$，即伪随机探测再散列
这个方法的思想就是当当前地址不可行的话，就根据所选择的增长序列，更改地址直到不发生冲突或者无法插入。
再hash法

$H_i=RH_i(key)$，其中$RH_i$是不同的散列函数。这种方法就是当发生冲突时，换用不同的hash函数产生不同的地址，直到不发生冲突为止。这种方法不容易产生聚集，但是增加了计算时间。
链地址法

将关键字存储在同一地址下，产生冲突的数据就用链表的形式进行链接即可（如图）。
建立公共溢出区域

这也是一种处理冲突的思路，设hash函数的值域是[0, m-1], 则设向量hashTable[0..m-1]为基本表，每个分量存放一个记录，另设立向量OverTable[0..v]为溢出表。所有关键字和基本表中的关键字为同义词的记录，不管他们的hash地址是多少，发生冲突就填入溢出表。

上述四种常见的冲突解决方法，对于本题就是开放地址法的一种特殊形式。

C++限定输出精度

对于浮点数，限制输出精度应该是很常见的操作，但是由于不经常使用，这里复习一下。

std：：setprecision()

当使用cout 时，可以使用这个函数来设定输出精度（会自动在末尾补0）,用例如下：
```
 std::cout << std::setprecision(5) << f << '\n'; //精度为5位
```
printf()
```
printf("%6.2f", floatNum) //100.1打印出来为100.10
```
6表示数字至少占6字符宽度（包括小数点），2表示小数点后有两位，小数部分不够时会在后面补0。

下面给出本题代码：

代码

//
// Created by masterCai on 2020/5/12.
//

#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

bool is_prime(int n){ // 判断素数
    for (int i = 2; i*i <= n; ++i) { // 这里技巧 使用i*i判断 降低时间
        if (n%i == 0){
            return false;
        }
    }
    return true;
}

int main(){
    int tsize, n, m, a;
    scanf("%d %d %d", &tsize, &n, &m);
    while (!is_prime(tsize)){//找到合适的hashTable大小
        tsize += 1;
    }
    vector<int> v(tsize); //hash table
    for (int i = 0; i < n; ++i) { // 对于每个数 先进行hash
        scanf("%d", &a);
        bool flag = false; //是否可以插入
        for (int j = 0; j < tsize; ++j) {// 使用平方探测解决冲突
            int pos = (a+j*j)%tsize;
            if(v[pos] == 0){
                v[pos] = a;
                flag = true;
                break;
            }
        }
        if(!flag){
            printf("%d cannot be inserted.\n", a);
        }
    }
    int ans = 0;
    for (int i = 0; i < m; ++i) { // 对每个key进行查找
        scanf("%d", &a);
        for (int j = 0; j <= tsize; ++j) {// 查找同样需要使用平方探测
            ans += 1; // 每尝试一次计数
            int pos = (a+j*j)%tsize; //平方探测
            if (v[pos] == a || v[pos] == 0){//如果找到或者找不到则退出
                break;
            }
        }
    }
    printf("%.1lf\n", ans*1.0/m); //控制输出精度
    return 0;
}

哈希平方探测数据结构

1146 Topological Order

1146 Topological Order题目This is a problem given in the Graduate Entrance Exam in 2018: Which of the following is NOT a t

2020-05-13 pat

pat 拓扑排序

1151. LCA in a Binary Tree

1151. LCA in a Binary Tree题目The lowest common ancestor (LCA) of two nodes U and V in a tree is the deepest node that has

2020-05-11 pat

pat algorithm 二叉树中序遍历前序遍历