Python Sets

柠栀 2025/9/19 python

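# 1. What is a set?

A set is an unordered, mutable container that stores unique, hashable elements.

- Unordered: elements have no fixed position, so there is no indexing.
- Mutable: elements can be added and removed, but an existing element cannot be modified in place.
- Unique: a set never contains duplicate elements.
- Element requirement: every element must be hashable, which in practice means immutable types such as numbers, strings, and tuples whose items are themselves hashable.

Key point: uniqueness and the lack of ordering are the most fundamental differences between sets and lists or tuples.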
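
The properties listed above can be checked directly in the interpreter. The following is a minimal sketch; the variable names are illustrative and not part of any API.

```python
# Uniqueness: duplicates collapse into a single element.
colors = {"red", "green", "red", "blue"}
print(len(colors))  # 3

# Unordered: sets do not support indexing.
try:
    colors[0]
except TypeError as exc:
    index_error = str(exc)
    print(f"TypeError: {index_error}")

# Elements must be hashable: a list cannot be a set element.
try:
    bad = {[1, 2], 3}
except TypeError as exc:
    hash_error = str(exc)
    print(f"TypeError: {hash_error}")

# A tuple is hashable only if everything inside it is hashable.
try:
    also_bad = {(1, [2, 3])}
except TypeError:
    nested_list_rejected = True
    print("A tuple containing a list is rejected as well")
```

The last case is why "hashable" is a more precise requirement than "immutable": a tuple is immutable, yet it cannot go into a set if it contains a list.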

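# 2. Creating sets

There are three main ways to create a set:

# 2.1. Using curly braces {}

The most common way to create a set is a curly-brace literal {}. Note that an empty {} creates a dictionary, not a set; an empty set must be created with set().

Sets are widely used for deduplication, difference analysis, set operations (intersection, union, difference), and fast membership lookups, which makes them an important tool for writing efficient Python code.

# Create sets
fruits = {"apple", "banana", "orange"}
numbers = {1, 2, 3, 4, 5}
mixed = {1, "hello", 3.14, (1, 2)}  # may hold hashable elements of different types

print(fruits)  # {'banana', 'orange', 'apple'} (order may vary)
print(numbers) # {1, 2, 3, 4, 5}
print(mixed)   # {1, 3.14, (1, 2), 'hello'} (order may vary)

# Empty set: pay special attention!
empty_set = set()  # this is the only way to create one
print(empty_set)   # set()
print(type(empty_set)) # <class 'set'>

# Wrong: this creates an empty dictionary!
not_a_set = {}
print(type(not_a_set)) # <class 'dict'>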

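# 2.2. Using set()

A set can also be built from any iterable (list, tuple, string, and so on) with the set() constructor. This is a good fit for deduplicating or otherwise processing existing data.

- From a list: typically used to remove duplicate list elements.
- From a string: yields the set of distinct characters in the string.
- From a tuple: works the same as a list; any iterable is accepted.
- Empty set: must be created with set(), not {}, which would create an empty dictionary.

# From a list (common: used for deduplication)
list_data = [1, 2, 2, 3, 3, 3, 4, 5]
set_from_list = set(list_data)
print(set_from_list) # {1, 2, 3, 4, 5} - duplicates removed automatically

# From a string
set_from_string = set("hello")
print(set_from_string) # {'e', 'h', 'l', 'o'} - deduplicated and unordered

# From a tuple
set_from_tuple = set((1, 2, 2, 3))
print(set_from_tuple) # {1, 2, 3}

# Create an empty set
another_empty_set = set()
print(another_empty_set) # set()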

# 2.3. Set comprehensions

These work like list comprehensions but use curly braces:

# Build a set of squares
squares = {x ** 2 for x in range(5)}
print(squares) # {0, 1, 4, 9, 16}

# Set comprehension with a condition
even_squares = {x ** 2 for x in range(10) if x % 2 == 0}
print(even_squares) # {0, 4, 16, 36, 64}
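
Because the result is a set, a comprehension deduplicates whatever the expression produces. The sketch below normalizes case before collecting; the `words` list is made-up sample data.

```python
words = ["Apple", "apple", "BANANA", "banana", "Cherry"]

# Values that differ only by case collapse into one element.
normalized = {w.lower() for w in words}
print(sorted(normalized))  # ['apple', 'banana', 'cherry']

# Any expression works: collect the distinct word lengths.
lengths = {len(w) for w in words}
print(sorted(lengths))  # [5, 6]
```

`sorted()` is used here only to make the printed order predictable.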

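# 3. Basic set operations

# 3.1. Adding elements

There are two common ways to add elements to a set:

- add(): adds a single element. If the element is already present, nothing happens.
- update(): adds multiple elements at once from one or more iterables (lists, tuples, sets, and so on).

fruits = {"apple", "banana"}

# add() - add a single element
fruits.add("orange")
print(fruits) # {'banana', 'orange', 'apple'}

fruits.add("apple")  # adding an existing element has no effect
print(fruits) # {'banana', 'orange', 'apple'} (unchanged)

# update() - add multiple elements from an iterable
fruits.update(["grape", "mango"])
print(fruits) # {'banana', 'orange', 'apple', 'mango', 'grape'}

fruits.update(("pineapple", "kiwi"))
print(fruits) # now also contains 'pineapple' and 'kiwi'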

Note: sets deduplicate automatically, so adding an element that is already present changes nothing.
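
One pitfall worth illustrating is the difference between add() and update() when the argument is a string. A minimal sketch with illustrative variable names:

```python
# add() treats its argument as ONE element, even when it is a string.
letters = {"a"}
letters.add("kiwi")
print(letters == {"a", "kiwi"})  # True

# update() iterates over its argument, so a bare string is split into characters.
chars = set()
chars.update("kiwi")
print(chars == {"k", "i", "w"})  # True

# Wrap the string in a list (or tuple) to add it as a whole.
words = set()
words.update(["kiwi"])
print(words == {"kiwi"})  # True

# update() also accepts several iterables at once.
combined = {1}
combined.update([2, 3], (3, 4), {5})
print(combined == {1, 2, 3, 4, 5})  # True
```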

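# 3.2. Removing elements

- remove(elem): removes the given element; raises KeyError if it is not present.
- discard(elem): removes the given element; does nothing if it is not present (safer when the element may be missing).
- pop(): removes and returns an arbitrary element. Because sets are unordered, you cannot choose which one; raises KeyError if the set is empty.
- clear(): removes all elements from the set.

fruits = {"apple", "banana", "cherry", "date"}

# remove() - remove a given element; raises an error if it is missing
fruits.remove("banana")
print(fruits) # {'cherry', 'date', 'apple'}

# fruits.remove("watermelon") # KeyError: 'watermelon'

# discard() - remove a given element; no error if it is missing
fruits.discard("cherry")
print(fruits) # {'date', 'apple'}

fruits.discard("watermelon") # no error

# pop() - remove and return an arbitrary element (the set is unordered)
popped_item = fruits.pop()
print(f"Removed: {popped_item}")
print(fruits)

# clear() - empty the set
fruits.clear()
print(fruits) # set()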

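discard() is usually the better default because it does not fail when the element is missing; prefer remove() when a missing element indicates a bug you want to surface.
To remove elements while looping, iterate over a copy of the set or build a new set, because changing a set's size during iteration raises RuntimeError.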
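
Removing elements in a loop needs some care. The sketch below first shows the failure mode, then two common workarounds; the data and names are illustrative.

```python
# Mutating a set while iterating over it raises RuntimeError.
numbers = {1, 2, 3, 4, 5, 6}
raised = False
try:
    for n in numbers:
        if n % 2 == 0:
            numbers.discard(n)
except RuntimeError:
    raised = True
print(raised)  # True

# Option 1: iterate over a snapshot and mutate the original.
numbers = {1, 2, 3, 4, 5, 6}
for n in list(numbers):
    if n % 2 == 0:
        numbers.discard(n)
print(sorted(numbers))  # [1, 3, 5]

# Option 2: build a new set with a comprehension.
source = {1, 2, 3, 4, 5, 6}
odds = {n for n in source if n % 2 != 0}
print(sorted(odds))  # [1, 3, 5]
```

Option 2 leaves the original set untouched, which is often easier to reason about.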

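# 3.3. Querying a set

- Membership test: use in or not in to check whether an element is in the set. Sets are implemented as hash tables, so this is very fast (O(1) on average).
- Size: use len(s) to get the number of elements.
- Maximum and minimum: if the elements can be compared with each other, use max(s) and min(s).
- Iteration: use a for loop to visit every element (in no particular order).

fruits = {"apple", "banana", "orange"}

# Check whether an element is present
print("apple" in fruits)   # True
print("grape" not in fruits) # True

# Get the size of the set
print(len(fruits)) # 3

# Find the largest and smallest values
numbers = {10, 4, 7, 2, 15}
print(f"Max: {max(numbers)}")  # 15
print(f"Min: {min(numbers)}")  # 2

# Iterate over the set
for n in numbers:
    print(f"Set element: {n}")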

# 4. Mathematical set operations

The real power of sets is their support for mathematical set operations:

# 4.1. Union

Contains every element that appears in any of the sets.

set1 = {1, 2, 3}
set2 = {3, 4, 5}

# union() or the | operator
union1 = set1.union(set2)
union2 = set1 | set2
print(union1) # {1, 2, 3, 4, 5}
print(union2) # {1, 2, 3, 4, 5}

# Union of more than two sets
set3 = {5, 6, 7}
big_union = set1.union(set2, set3)
print(big_union) # {1, 2, 3, 4, 5, 6, 7}
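
The method and operator forms are not fully interchangeable: union() accepts any iterable, while | requires both operands to be sets. A short sketch (variable names are illustrative):

```python
base = {1, 2, 3}

# The method form accepts any iterable, and several at once.
merged = base.union([3, 4], (5,))
print(merged == {1, 2, 3, 4, 5})  # True

# The operator form rejects non-set operands.
operator_rejects_list = False
try:
    base | [3, 4]
except TypeError:
    operator_rejects_list = True
print(operator_rejects_list)  # True

# Neither form modifies the original set.
print(base == {1, 2, 3})  # True
```

The same distinction applies to intersection(), difference(), and symmetric_difference() versus &, -, and ^.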

# 4.2. Intersection

Contains only the elements that are present in every one of the sets.

set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

# intersection() or the & operator
intersection1 = set1.intersection(set2)
intersection2 = set1 & set2
print(intersection1) # {3, 4}
print(intersection2) # {3, 4}

# Intersection of more than two sets
set3 = {4, 5, 6}
common = set1 & set2 & set3
print(common) # {4}

# 4.3. Difference

Contains the elements that are in the first set but not in any of the others.

set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

# difference() or the - operator
difference1 = set1.difference(set2)  # in set1 but not in set2
difference2 = set1 - set2
print(difference1) # {1, 2}
print(difference2) # {1, 2}

# Order matters!
difference3 = set2 - set1  # in set2 but not in set1
print(difference3) # {5, 6}

# 4.4. Symmetric difference

Contains the elements that are in exactly one of the two sets (everything except the intersection).

set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}

# symmetric_difference() or the ^ operator
sym_diff1 = set1.symmetric_difference(set2)
sym_diff2 = set1 ^ set2
print(sym_diff1) # {1, 2, 5, 6}
print(sym_diff2) # {1, 2, 5, 6}
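
All four operations above return a new set. Each also has an in-place counterpart that mutates the left-hand set instead. The following sketch walks through them on one made-up set called `tags`:

```python
tags = {"a", "b", "c"}
alias = tags  # second reference to the same set object

tags |= {"d"}            # update():                      {'a', 'b', 'c', 'd'}
tags &= {"a", "b", "d"}  # intersection_update():         {'a', 'b', 'd'}
tags -= {"b"}            # difference_update():           {'a', 'd'}
tags ^= {"a", "z"}       # symmetric_difference_update(): {'d', 'z'}

print(sorted(tags))   # ['d', 'z']
print(alias is tags)  # True - the same object was modified throughout
```

Because the object is mutated, every other reference to it (such as `alias` here) sees the change; use the non-mutating forms when that is not what you want.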

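# 5. Testing set relationships

# 5.1. Subsets and supersets

A = {1, 2, 3}
B = {1, 2, 3, 4, 5}
C = {1, 2}

# Subset tests
print(C.issubset(A))     # True - C is a subset of A
print(C.issubset(B))     # True - C is a subset of B
print(C <= B)            # True - equivalent operator form

# Proper subset (the sets must not be equal)
print(C < B)             # True - C is a proper subset of B
print(A < A)             # False - a set is not a proper subset of itself

# Superset tests
print(B.issuperset(A))   # True - B is a superset of A
print(B >= A)            # True - equivalent operator form
print(B > A)             # True - B is a proper superset of A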
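
A typical use of subset tests is checking that a set of requirements is fully covered. The sketch below uses a hypothetical helper, `missing_permissions`, and made-up permission names purely for illustration.

```python
def missing_permissions(required, granted):
    """Return the permissions in `required` that `granted` lacks."""
    return set(required) - set(granted)

required = {"read", "write"}
admin = {"read", "write", "delete"}
guest = {"read"}

print(required <= admin)  # True
print(required <= guest)  # False
print(missing_permissions(required, guest))  # {'write'}

# The method forms accept any iterable; the operators require sets.
print({1, 2}.issubset([1, 2, 3]))  # True
operator_needs_set = False
try:
    {1, 2} <= [1, 2, 3]
except TypeError:
    operator_needs_set = True
print(operator_needs_set)  # True

# Equal sets are subsets of each other, but not proper subsets.
print({1, 2} <= {1, 2}, {1, 2} < {1, 2})  # True False
```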

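# 5.2. Disjoint sets

Use the isdisjoint() method to check whether two sets have no elements in common.

- Returns True if the two sets share no elements.
- Returns False if they share at least one element.

This is often used to check whether two sets are independent of each other, for example before merging them when overlaps are not allowed.

set1 = {1, 2, 3}
set2 = {4, 5, 6}
set3 = {3, 4, 5}

print(set1.isdisjoint(set2))  # True - no common elements
print(set1.isdisjoint(set3))  # False - both contain 3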
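
As a small illustration, isdisjoint() can serve as a conflict check. The time-slot data below is invented for the example; note that the argument may be any iterable, not only a set.

```python
booked_slots = {"09:00", "10:00", "14:00"}

# No overlap with the booked slots, so the request is conflict-free.
requested = ["11:00", "15:00"]
print(booked_slots.isdisjoint(requested))  # True

# "10:00" is already booked, so this request conflicts.
conflicting = ["10:00", "16:00"]
print(booked_slots.isdisjoint(conflicting))  # False

# Equivalent check via an intersection; this builds a temporary set,
# whereas isdisjoint() can stop at the first shared element.
has_overlap = bool(booked_slots & set(conflicting))
print(has_overlap)  # True

# The empty set is disjoint from every set, including itself.
print(set().isdisjoint(set()))  # True
```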

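# 6. Iterating over a set

Because a set is unordered, the order in which elements are visited is not guaranteed:

fruits = {"apple", "banana", "cherry"}

# Iterate over the elements directly
for fruit in fruits:
    print(fruit)

# The output order may differ between runs, for example:
# banana
# cherry
# apple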
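
When a predictable order is needed, for display or for reproducible tests, one common approach is to sort the elements first. A minimal sketch:

```python
fruits = {"cherry", "apple", "banana"}

# sorted() returns a new list; the set itself stays unordered.
ordered = sorted(fruits)
print(ordered)  # ['apple', 'banana', 'cherry']

# reverse=True gives descending order.
descending = sorted(fruits, reverse=True)
print(descending)  # ['cherry', 'banana', 'apple']

# enumerate() can number the sorted elements.
for position, fruit in enumerate(ordered, start=1):
    print(position, fruit)
```

This only works when the elements can be compared with each other; a set mixing, say, strings and integers cannot be sorted without a key function.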

# 7. Immutable sets

frozenset is the immutable version of set. It supports everything a set does except modification, and because it cannot change it is hashable.

# Create a frozenset
frozen = frozenset([1, 2, 3, 2, 1])
print(frozen)        # frozenset({1, 2, 3})
print(type(frozen))  # <class 'frozenset'>

# Read-only operations work as on a regular set
print(len(frozen))           # 3
print(1 in frozen)           # True
print(frozen.union({4, 5}))  # frozenset({1, 2, 3, 4, 5})

# Mutating methods do not exist
# frozen.add(4)      # AttributeError
# frozen.remove(1)   # AttributeError

# Main use: as a dictionary key
valid_dict = {
    frozenset({1, 2, 3}): "set1",
    frozenset({4, 5}): "set2"
}
print(valid_dict[frozenset({1, 2, 3})])  # set1

# A regular set cannot be a dictionary key because it is unhashable
# invalid_dict = {{1, 2, 3}: "value"}  # TypeError
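
Being hashable also means a frozenset can be an element of another set, which gives a simple way to represent a "set of sets". The sketch below uses illustrative names only.

```python
# A regular set cannot contain other regular sets.
nested_set_failed = False
try:
    groups = {{1, 2}, {3}}
except TypeError:
    nested_set_failed = True
print(nested_set_failed)  # True

# It can contain frozensets; equal frozensets collapse into one element.
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}
print(len(groups))  # 2

# A frozenset and a set compare equal when they hold the same elements.
print(frozenset({1, 2}) == {1, 2})  # True

# In mixed operations the result takes the type of the left operand.
left_frozen = frozenset({1}) | {2}
left_set = {1} | frozenset({2})
print(type(left_frozen).__name__)  # frozenset
print(type(left_set).__name__)     # set
```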

# 8. Typical use cases for sets

# 8.1. Deduplication

# Remove duplicate elements from a list
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique_numbers = list(set(numbers))
print(unique_numbers) # [1, 2, 3, 4] (order may vary)

# Count the distinct characters in a string
text = "hello world"
unique_chars = set(text)
print(f"Distinct characters: {unique_chars}") # {'e', 'd', 'h', 'r', 'l', 'o', 'w', ' '}
print(f"Number of distinct characters: {len(unique_chars)}") # 8
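
Converting to a set discards the original order. When the order of first appearance matters, two common alternatives are sketched below; `dedupe` is a hypothetical helper name, and the dict-based version relies on dictionaries preserving insertion order (guaranteed since Python 3.7).

```python
items = ["b", "a", "b", "c", "a"]

# dict keys are unique and keep insertion order.
unique_in_order = list(dict.fromkeys(items))
print(unique_in_order)  # ['b', 'a', 'c']

# If sorted order is acceptable, sort the set instead.
unique_sorted = sorted(set(items))
print(unique_sorted)  # ['a', 'b', 'c']

def dedupe(iterable):
    """Return the items of `iterable` in first-seen order, without duplicates."""
    seen = set()   # set gives fast "have we seen this?" checks
    result = []
    for item in iterable:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(dedupe(items))  # ['b', 'a', 'c']
```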

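# 8.2. Membership testing

A membership test on a set takes O(1) time on average, much faster than the O(n) of a list.

# Membership tests on large data: a set is far faster than a list
import time

# Prepare the test data
large_list = list(range(1000000))
large_set = set(large_list)

# Time the list lookup (perf_counter has higher resolution than time.time)
start_time = time.perf_counter()
result1 = 999999 in large_list
list_time = time.perf_counter() - start_time

# Time the set lookup
start_time = time.perf_counter()
result2 = 999999 in large_set
set_time = time.perf_counter() - start_time

# Sample output from one run; actual numbers depend on the machine
print(f"List lookup time: {list_time:.6f} s") # List lookup time: 0.005703 s
print(f"Set lookup time: {set_time:.6f} s") # Set lookup time: 0.000005 s
print(f"The set is {list_time/set_time:.0f}x faster than the list") # The set is 1196x faster than the list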
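
A single timed lookup is noisy. For a steadier comparison, one option is the standard-library timeit module, which repeats the operation many times. This is a sketch under the assumption that a 100,000-element collection and 200 repetitions are enough to show the gap; the absolute numbers will vary by machine.

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)
target = 99_999  # worst case for the list: the last element

# Total time for 200 lookups in each container.
list_seconds = timeit.timeit(lambda: target in data_list, number=200)
set_seconds = timeit.timeit(lambda: target in data_set, number=200)

print(f"list: {list_seconds:.6f} s for 200 lookups")
print(f"set:  {set_seconds:.6f} s for 200 lookups")
print(f"ratio: about {list_seconds / set_seconds:.0f}x")
```

Building the set costs O(n) up front, so the conversion pays off mainly when the same collection is queried many times.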

# 9. Set operations in practice

# Find the elements two sets have in common
students_math = {"Alice", "Bob", "Charlie", "David"}
students_physics = {"Bob", "David", "Eve", "Frank"}

# Students taking both courses
both = students_math & students_physics
print(f"Taking both courses: {both}") # {'Bob', 'David'}

# Students taking exactly one course
only_one = students_math ^ students_physics
print(f"Taking only one course: {only_one}") # {'Alice', 'Charlie', 'Eve', 'Frank'}
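
The same data can answer a few more questions with difference and union. The sketch repeats the two sets so that it runs on its own.

```python
students_math = {"Alice", "Bob", "Charlie", "David"}
students_physics = {"Bob", "David", "Eve", "Frank"}

# Students taking math but not physics, and the reverse.
only_math = students_math - students_physics
only_physics = students_physics - students_math
print(sorted(only_math))     # ['Alice', 'Charlie']
print(sorted(only_physics))  # ['Eve', 'Frank']

# Everyone enrolled in at least one course.
everyone = students_math | students_physics
print(len(everyone))  # 6

# The symmetric difference is the union of the two one-sided differences.
print((students_math ^ students_physics) == (only_math | only_physics))  # True
```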

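# 10. Set vs. list vs. tuple

| Feature | List | Tuple | Set |
| --- | --- | --- | --- |
| Syntax | `[1, 2, 3]` | `(1, 2, 3)` | `{1, 2, 3}` |
| Ordering | Ordered (indexable) | Ordered (indexable) | Unordered |
| Mutability | Mutable | Immutable | Mutable |
| Duplicates | Allowed | Allowed | Not allowed |
| Element types | Any type | Any type | Hashable types only |
| Main use | Ordered collections of data | Fixed collections of data | Deduplication, fast lookups, set operations |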

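# 11. Choosing the right type

- Use a list when you need to keep elements in order, allow duplicates, or access elements by index.
- Use a tuple when the data should not be modified or must serve as a dictionary key.
- Use a set when you:
  - need to remove duplicate elements
  - need fast membership tests
  - need mathematical set operations (union, intersection, and so on)
  - do not care about element order

Remember: whenever you need uniqueness and fast lookups, consider a set.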
