集合是一种鲁棒性很好的数据结构,应用在与当元素顺序的重要性不如元素的唯一性和测试元素是否包含在集合中的效率时,大部分情况下这种数据结构极其有用。表现形式通常是从列表中删除重复项以及相关的数学运算,如交集、并集、差分和对称差分等集合操作。
python的 set支持 x in set, len(set),和 for x in set。作为一个无序的数据结构,set 不记录元素位置或者插入点。因此,set 不支持 indexing, 或其它类序列的操作。
python的内置集合类型有两种:
set(): 一种可变的、无序的、有限的集合,其元素是唯一的、不可变的(可哈希的)对象
frozenset(): 一种不可变的、可哈希的、无序的集合,其元素是唯一的,不可变的哈希对象。
PyObject
在 set 中,对应的 set 的值的存储是通过结构 setentry来保存数据值的;
源文件:include/setobject.h
typedef struct {
PyObject *key;
Py_hash_t hash; /* Cached hash code of the key */
} setentry;
This subtype of PyObject is used to hold the internal data for both set and frozenset objects. It is like a PyDictObject in that it is a fixed size for small sets (much like tuple storage) and will point to a separate, variable sized block of memory for medium and large sized sets (much like list storage). None of the fields of this structure should be considered public and are subject to change. All access should be done through the documented API rather than by manipulating the values in the structure.
typedef struct {
PyObject_HEAD
Py_ssize_t fill; /* Number active and dummy entries*/ // 包括已经使用的entry与空entry值的总和
Py_ssize_t used; /* Number active entries */ // 已经使用可用的总量
/* The table contains mask + 1 slots, and that's a power of 2.
* We store the mask instead of the size because the mask is more
* frequently needed.
*/
Py_ssize_t mask; // 与hash求和的mask
/* The table points to a fixed-size smalltable for small tables
* or to additional malloc'ed memory for bigger tables.
* The table pointer is never NULL which saves us from repeated
* runtime null-tests.
*/
setentry *table; // 保存数据的数组数组指针
Py_hash_t hash; /* Only used by frozenset objects */
Py_ssize_t finger; /* Search finger for pop() */
setentry smalltable[PySet_MINSIZE]; // 保存数据的数组 默认初始化为8个元素,通过table指向
PyObject *weakreflist; /* List of weak references */
} PySetObject;
总之一个 set 就对应一个 PySetObject 类型数据,set 会根据保存的元素自动调整大小。